Excited to see Alibaba DAMO Academy release a multimodal dataset for vision-language pretraining on the Hub 🔥
Paper: 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining (2501.00958)
Dataset: DAMO-NLP-SG/multimodal_textbook
✨ 6.5M images + 0.8B text from 22k hours of instructional videos
✨ Covers subjects like math, physics, and chemistry
✨ Apache 2.0