TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published 8 days ago • 22
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning Paper • 2412.11974 • Published 22 days ago • 9
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning Paper • 2412.11974 • Published 22 days ago • 9
PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns Paper • 2403.13315 • Published Mar 20, 2024
Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths Paper • 2410.10858 • Published Oct 7, 2024