Ambroser53's Collections
Vision
updated
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Paper • 2401.13313 • Published • 5
BAAI/Bunny-v1_0-4B
Text Generation • Updated • 269 • 9
What matters when building vision-language models?
Paper • 2405.02246 • Published • 101
Jina CLIP: Your CLIP Model Is Also Your Text Retriever
Paper • 2405.20204 • Published • 35
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 59
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Paper • 2406.12275 • Published • 29
PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents
Paper • 2406.13923 • Published • 22
Instruction Pre-Training: Language Models are Supervised Multitask Learners
Paper • 2406.14491 • Published • 87
ColPali: Efficient Document Retrieval with Vision Language Models
Paper • 2407.01449 • Published • 42
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Paper • 2407.12594 • Published • 19