Stable Diffusion Dreambooth Concepts Library (community)

AI & ML interests: None defined yet.

Recent Activity

akhaliq posted an update 18 days ago
Google drops Gemini 2.0 Flash Thinking

A new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with its thoughts visible), can solve complex problems at Flash speeds, and more.

Now available in anychat; try it out: akhaliq/anychat

AtAndDev posted an update 19 days ago
@s3nh Hey man, check your Discord! Got some news.

akhaliq posted an update about 1 month ago
QwQ-32B-Preview is now available in anychat

A reasoning model that is competitive with OpenAI o1-mini and o1-preview

try it out: akhaliq/anychat

akhaliq posted an update about 1 month ago
New model drop in anychat

allenai/Llama-3.1-Tulu-3-8B is now available

try it here: akhaliq/anychat

akhaliq posted an update about 1 month ago
anychat

Supports ChatGPT, Gemini, Perplexity, Claude, Meta Llama, and Grok, all in one app.

Try it out here: akhaliq/anychat

alvdansen posted an update 5 months ago
📸Photo LoRA Drop📸

I've been working on this one for a few days, but really I've had this dataset for a few years! I collected a bunch of open access photos online back in late 2022, but I was never happy enough with how they played with the base model!

I am so thrilled that they look so nice with Flux!

This, for me, is version one of the model - I still see room for improvement and possibly expansion of its 40-image dataset. For those who are curious:

40 images
3200 steps
Dim 32 (network dimension)
3e-4 learning rate

Enjoy! Create! Big thank you to Glif for sponsoring the model creation! :D

alvdansen/flux_film_foto
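
If you want to try it in code: a minimal sketch using diffusers' FluxPipeline, where the base model, prompt, and sampler settings are my assumptions - check the model card for the recommended trigger word and parameters.

import torch
from diffusers import FluxPipeline

# Load a Flux base model (assumed here to be FLUX.1-dev) and apply the LoRA.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("alvdansen/flux_film_foto")
pipe.to("cuda")

# Prompt is illustrative; prepend the trigger word from the model card if one is listed.
image = pipe(
    "a candid film photo of a woman reading in a cafe",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("film_foto.png")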

alvdansen posted an update 5 months ago
Alright, y'all!

I know it's a Saturday, but I decided to release my first Flux Dev LoRA.

It's a retrain of my "Frosting Lane" model, and I am sure the styles will just keep improving.

Have fun! Link below - thanks again to @ostris for the trainer and Black Forest Labs for the awesome model!

alvdansen/frosting_lane_flux

alvdansen posted an update 6 months ago
New model drop...🥁

FROSTING LANE REDUX

The v1 of this model was released during a big model push, so I think it got lost in the shuffle. I revisited it for a project and realized it wasn't inventive enough around certain concepts, so I decided to retrain.

alvdansen/frosting_lane_redux

I think the original model was really strong on its own, but because it was trained on fewer images, I found it was producing a very lackluster range of facial expressions, so I wanted to improve that.

The hardest part of creating models like this, I find, is maintaining the detailed linework without overfitting. It takes a really balanced dataset; I repeat the data 12 times during the process and stop within the last 10-20 epochs.

It is very difficult to predict the exact amount of training time needed, so for me it is crucial to do epoch stops. Every model has a different threshold for ideal success.
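
Those epoch stops are straightforward to automate. A minimal sketch, assuming a generic training loop - train_one_epoch, save_lora, and the trainer objects are illustrative placeholders, not any specific trainer's API:

import os

def train_one_epoch(model, dataloader, optimizer):
    """Placeholder for one epoch of LoRA training."""
    ...

def save_lora(model, out_dir):
    """Placeholder for writing the LoRA weights to out_dir."""
    ...

NUM_EPOCHS = 100   # illustrative; the real count depends on dataset size and repeats
SAVE_LAST_N = 20   # keep a checkpoint for each of the final epochs

model = dataloader = optimizer = None  # stand-ins for your trainer's objects

for epoch in range(NUM_EPOCHS):
    train_one_epoch(model, dataloader, optimizer)
    if epoch >= NUM_EPOCHS - SAVE_LAST_N:
        out_dir = f"checkpoints/epoch-{epoch:03d}"
        os.makedirs(out_dir, exist_ok=True)
        save_lora(model, out_dir)  # compare these checkpoints side by side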

alvdansen posted an update 6 months ago
I really like what the @jasperAITeam designed with Flash LoRA. It works really well for something that generates so quickly, and I'm excited to test it out with AnimateDiff, because I was recently testing LCM on its own for AD and the results were already promising.

I put together my own page of models using their code and LoRA. Enjoy!

alvdansen/flash-lora-araminta-k-styles
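
For context, few-step LoRAs like this plug into a standard diffusers pipeline. A minimal sketch, where the base model, repo id, scheduler choice, and step count are my assumptions - check Jasper's model card for the recommended setup:

import torch
from diffusers import LCMScheduler, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Few-step LoRAs are typically paired with a matching scheduler and very low CFG.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("jasperai/flash-sdxl")  # assumed repo id for the Flash LoRA

image = pipe(
    "a watercolor illustration of a fox in a meadow",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]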

alvdansen posted an update 6 months ago
New LoRA Model!

I trained this model on a new spot I'm really excited to share (soon!)

This Monday I will be posting my first beginning-to-end blog post showing the tool I used, the dataset, captioning techniques, and the parameters used to finetune this LoRA.

For now, check out the model in the link below.

alvdansen/m3lt

alvdansen posted an update 6 months ago
Per popular request, I'm working on a beginning-to-end LoRA training workflow blog post for a style.

It will focus on dataset curation through training on a pre-determined style, to give better insight into my process.

I'm curious: what are some questions you might have that I can try to answer in it?

alvdansen posted an update 7 months ago
A few new styles added as SDXL LoRAs:

Midsommar Cartoon
A playful cartoon style featuring bold colors and a retro aesthetic. Personal favorite at the moment.
alvdansen/midsommarcartoon
---
Wood Block XL
I've started training on public-domain styles to create some interesting datasets. In this case, I found a group of images of really beautiful and colorful Japanese woodblock prints.
alvdansen/wood-block-xl
---
Dimension W
For this model I actually ended up working on an SD 1.5 version as well as an SDXL one. I prefer the SDXL version, and I am still looking for parameters I am really happy with for SD 1.5. That said, both have their merits. I trained this with the short film I am working on in mind.
alvdansen/dimension-w
alvdansen/dimension-w-sd15
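
Since these are standard SDXL LoRAs, you can load several at once and switch or blend styles with diffusers' adapter API. A minimal sketch, where the base model, adapter weights, and prompts are my assumptions:

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load two of the style LoRAs under named adapters.
pipe.load_lora_weights("alvdansen/midsommarcartoon", adapter_name="midsommar")
pipe.load_lora_weights("alvdansen/wood-block-xl", adapter_name="woodblock")

# Activate a single style...
pipe.set_adapters(["midsommar"])
image = pipe("a girl with a sunflower, retro cartoon").images[0]

# ...or blend both with per-adapter weights.
pipe.set_adapters(["midsommar", "woodblock"], adapter_weights=[0.6, 0.4])
blended = pipe("a mountain village at dusk").images[0]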

alvdansen posted an update 7 months ago
Hey all!

Here I take a somewhat strong stance: I am petitioning to revisit the default training parameters on the Diffusers LoRA page.

In my opinion, and after observing and testing many training pipelines shared by startups and other resources, many of them exhibit the same types of issues. In discussing this with some of these founders and creators, the common theme has been that they worked backwards from the Diffusers LoRA page.

In this article, I explain why the defaults in the Diffusers LoRA code produce results that look positive at first but can be misleading, and I suggest how they could be improved.

https://huggingface.co/blog/alvdansen/revisit-diffusers-default-params

alvdansen posted an update 7 months ago
Hey All!

I've been asked a lot to share more about how I train LoRAs. The truth is, I don't think my advice is very helpful without also including more contextual, theoretical commentary on how I **think** about training LoRAs for SDXL and other models.

I wrote a first article here about it - let me know what you think.

https://huggingface.co/blog/alvdansen/thoughts-on-lora-training-1

Edit: Also, people kept asking where to start, so I made a list of possible resources:
https://huggingface.co/blog/alvdansen/thoughts-on-lora-training-pt-2-training-services

alvdansen posted an update 7 months ago
I had a backlog of LoRA model weights for SDXL that I decided to prioritize and publish this weekend. I know many people are using SD3 right now; still, if you have the time to try these, I hope you enjoy them.

I intend to start writing more fully on the thought process behind my approach to curating and training style and subject finetunes, beginning next week.

Thank you for reading this post! You can find the models on my page and I'll drop a few previews here.

akhaliq posted an update 7 months ago
Phased Consistency Model

Phased Consistency Model (2405.18407)

The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models. However, its application to high-resolution, text-conditioned image generation in the latent space (a.k.a. LCM) remains unsatisfactory. In this paper, we identify three key flaws in the current design of LCM. We investigate the reasons behind these limitations and propose the Phased Consistency Model (PCM), which generalizes the design space and addresses all identified limitations. Our evaluations demonstrate that PCM significantly outperforms LCM across 1-16 step generation settings. While PCM is specifically designed for multi-step refinement, it achieves 1-step generation results that are superior or comparable to those of previous state-of-the-art methods specifically designed for 1-step generation. Furthermore, we show that PCM's methodology is versatile and applicable to video generation, enabling us to train a state-of-the-art few-step text-to-video generator.

radames posted an update 8 months ago
Thanks to @OzzyGT for pushing the new Anyline preprocessor to https://github.com/huggingface/controlnet_aux. Now you can use the TheMistoAI/MistoLine ControlNet entirely within Diffusers.

Here's a demo for you: radames/MistoLine-ControlNet-demo
Super resolution version: radames/Enhance-This-HiDiffusion-SDXL

from PIL import Image
from controlnet_aux import AnylineDetector

# Load the Anyline edge detector (MTEED weights) from the MistoLine repo.
anyline = AnylineDetector.from_pretrained(
    "TheMistoAI/MistoLine", filename="MTEED.pth", subfolder="Anyline"
).to("cuda")

# Extract a line map from the source image.
source = Image.open("source.png")
result = anyline(source, detect_resolution=1280)
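
From there, the line map can condition an SDXL ControlNet pipeline. Continuing from the snippet above, a minimal sketch - the base model, prompt, and conditioning scale are my assumptions; see the demo Spaces for a full setup:

import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "TheMistoAI/MistoLine", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a cozy cabin in a snowy forest, detailed illustration",
    image=result,                       # the Anyline line map from above
    controlnet_conditioning_scale=0.7,  # illustrative strength
).images[0]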

akhaliq posted an update 8 months ago
Chameleon: Mixed-Modal Early-Fusion Foundation Models

Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)

We present Chameleon, a family of early-fusion, token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed-modal generation. Chameleon demonstrates broad and general capabilities: it achieves state-of-the-art performance on image captioning tasks, outperforms Llama-2 on text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or the outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in unified modeling of full multimodal documents.