MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding Paper • 2404.05726 • Published Apr 8, 2024 • 21
Bengali Regional Text to IPA Models Collection A collection of models for transcribing Bengali Regional Text to the International Phonetic Alphabets (IPA). • 3 items • Updated Apr 6, 2024 • 1
PDF Document / OCR Datasets Collection Document datasets with .pdf files that are usable with pixparse libraries and tools. • 2 items • Updated Mar 30, 2024 • 47
VideoPrism: A Foundational Visual Encoder for Video Understanding Paper • 2402.13217 • Published Feb 20, 2024 • 23
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14, 2024 • 125