leditsplusplus

community

AI & ML interests

None defined yet.

Recent Activity

leditsplusplus's activity

multimodalart 
posted an update 6 months ago
felfri 
updated a Space 6 months ago
felfri 
posted an update 7 months ago
view post
Post
2279
🚀 Excited to announce the release of our new research paper, "LLAVAGUARD: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment"!
In this work, we introduce LLAVAGUARD, a family of cutting-edge Vision-Language Model (VLM) judges designed to enhance the safety and integrity of vision datasets and generative models. Our approach leverages flexible policies for assessing safety in diverse settings. This context awareness ensures robust data curation and model safeguarding alongside comprehensive safety assessments, setting a new standard for vision datasets and models. We provide three versions (7B, 13B, and 34B) and our data, see below. This achievement wouldn't have been possible without the incredible teamwork and dedication of my great colleagues @LukasHug , @PSaiml , @mbrack . 🙏 Together, we've pushed the boundaries of what’s possible at the intersection of large generative models and safety.
🔍 Dive into our paper to explore:
Innovative methodologies for dataset curation and model safeguarding.
State-of-the-art safety assessments.
Practical implications for AI development and deployment.
Find more at AIML-TUDA/llavaguard-665b42e89803408ee8ec1086 and https://ml-research.github.io/human-centered-genai/projects/llavaguard/index.html
  • 2 replies
·
radames 
posted an update 8 months ago
view post
Post
5823
Thanks to @OzzyGT for pushing the new Anyline preprocessor to https://github.com/huggingface/controlnet_aux. Now you can use the TheMistoAI/MistoLine ControlNet with Diffusers completely.

Here's a demo for you: radames/MistoLine-ControlNet-demo
Super resolution version: radames/Enhance-This-HiDiffusion-SDXL

from controlnet_aux import AnylineDetector

anyline = AnylineDetector.from_pretrained(
    "TheMistoAI/MistoLine", filename="MTEED.pth", subfolder="Anyline"
).to("cuda")

source = Image.open("source.png")
result = anyline(source, detect_resolution=1280)
radames 
posted an update 8 months ago
view post
Post
6594
At Google I/O 2024, we're collaborating with the Google Visual Blocks team (https://visualblocks.withgoogle.com) to release custom Hugging Face nodes. Visual Blocks for ML is a browser-based tool that allows users to create machine learning pipelines using a visual interface. We're launching nodes with Transformers.js, running models on the browser, as well as server-side nodes running Transformers pipeline tasks and LLMs using our hosted inference. With @Xenova @JasonMayes

You can learn more about it here https://huggingface.co/blog/radames/hugging-face-google-visual-blocks

Source-code for the custom nodes:
https://github.com/huggingface/visual-blocks-custom-components
radames 
posted an update 8 months ago
multimodalart 
posted an update 8 months ago
view post
Post
26165
The first open Stable Diffusion 3-like architecture model is JUST out 💣 - but it is not SD3! 🤔

It is Tencent-Hunyuan/HunyuanDiT by Tencent, a 1.5B parameter DiT (diffusion transformer) text-to-image model 🖼️✨, trained with multi-lingual CLIP + multi-lingual T5 text-encoders for english 🤝 chinese understanding

Try it out by yourself here ▶️ https://huggingface.co/spaces/multimodalart/HunyuanDiT
(a bit too slow as the model is chunky and the research code isn't super optimized for inference speed yet)

In the paper they claim to be SOTA open source based on human preference evaluation!
radames 
posted an update 8 months ago
view post
Post
2528
HiDiffusion SDXL now supports Image-to-Image, so I've created an "Enhance This" version using the latest ControlNet Line Art model called MistoLine. It's faster than DemoFusion

Demo: radames/Enhance-This-HiDiffusion-SDXL

Older version based on DemoFusion radames/Enhance-This-DemoFusion-SDXL

New Controlnet SDXL Controls Every Line TheMistoAI/MistoLine

HiDiffusion is compatible with diffusers and support many SD models - https://github.com/megvii-research/HiDiffusion
  • 1 reply
·
radames 
posted an update 8 months ago
view post
Post
2450
I've built a custom component that integrates Rerun web viewer with Gradio, making it easier to share your demos as Gradio apps.

Basic snippet
# pip install gradio_rerun gradio
import gradio as gr
from gradio_rerun import Rerun

gr.Interface(
    inputs=gr.File(file_count="multiple", type="filepath"),
    outputs=Rerun(height=900),
    fn=lambda file_path: file_path,
).launch()

More details here radames/gradio_rerun
Source https://github.com/radames/gradio-rerun-viewer

Follow Rerun here https://huggingface.co/rerun
radames 
posted an update 9 months ago
radames 
posted an update 9 months ago
radames 
posted an update 9 months ago
radames 
posted an update 9 months ago
view post
Post
2762
Following up on @vikhyatk 's Moondream2 update and @santiagomed 's implementation on Candle, I quickly put togheter the WASM module so that you could try running the ~1.5GB quantized model in the browser. Perhaps the next step is to rewrite it using https://github.com/huggingface/ratchet and run it even faster with WebGPU, @FL33TW00D-HF .

radames/Candle-Moondream-2

ps: I have a collection of all Candle WASM demos here radames/candle-wasm-examples-650898dee13ff96230ce3e1f
radames 
posted an update 10 months ago
view post
Post
3708
Testing new pix2pix-Turbo in real-time, very interesting GAN architecture that leverages SD-Turbo model. Here I'm using edge2image LoRA single-step inference 🤯

It's very interesting how ControlNet Canny quality is comparable, but in a single step. Looking forward to when they release the code: https://github.com/GaParmar/img2img-turbo/issues/1

I've been keeping a list of fast diffusion model pipelines together with this real-time websocket app. Have a look if you want to test it locally, or check out the demo here on Spaces.

radames/real-time-pix2pix-turbo

Github app:
https://github.com/radames/Real-Time-Latent-Consistency-Model/

You can also check the authors img2img sketch model here

gparmar/img2img-turbo-sketch

Refs:
One-Step Image Translation with Text-to-Image Models (2403.12036)

cc @gparmar @junyanz
multimodalart 
posted an update 10 months ago
view post
Post
The Stable Diffusion 3 research paper broken down, including some overlooked details! 📝

Model
📏 2 base model variants mentioned: 2B and 8B sizes

📐 New architecture in all abstraction levels:
- 🔽 UNet; ⬆️ Multimodal Diffusion Transformer, bye cross attention 👋
- 🆕 Rectified flows for the diffusion process
- 🧩 Still a Latent Diffusion Model

📄 3 text-encoders: 2 CLIPs, one T5-XXL; plug-and-play: removing the larger one maintains competitiveness

🗃️ Dataset was deduplicated with SSCD which helped with memorization (no more details about the dataset tho)

Variants
🔁 A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
✏️ An Instruct Edit 2B model was trained, and learned how to do text-replacement

Results
✅ State of the art in automated evals for composition and prompt understanding
✅ Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)

Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf
·