Virgo: A Preliminary Exploration on Reproducing o1-like MLLM Paper • 2501.01904 • Published 3 days ago • 12
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 3 days ago • 19
Article **Fine-tune SmolLM's on custom synthetic data** By prithivMLmods • 1 day ago • 10
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models Paper • 2306.07691 • Published Jun 13, 2023 • 5
Scaling Test-Time Compute with Open Models Collection Models and datasets used in our blog post: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute • 10 items • Updated about 18 hours ago • 19
Deliberation in Latent Space via Differentiable Cache Augmentation Paper • 2412.17747 • Published 14 days ago • 28
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators? Paper • 2307.14023 • Published Jul 26, 2023 • 1
Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published 21 days ago • 23
A Touch, Vision, and Language Dataset for Multimodal Alignment Paper • 2402.13232 • Published Feb 20, 2024 • 14
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features? Paper • 2402.00340 • Published Feb 1, 2024 • 1
Optimizing Byte-level Representation for End-to-end ASR Paper • 2406.09676 • Published Jun 14, 2024 • 1
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs Paper • 2404.05719 • Published Apr 8, 2024 • 83
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published Oct 2, 2024 • 41
Computational Bottlenecks of Training Small-scale Large Language Models Paper • 2410.19456 • Published Oct 25, 2024 • 1
Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling Paper • 2405.21048 • Published May 31, 2024 • 14
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum Paper • 2405.13226 • Published May 21, 2024 • 1