VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models
Abstract
In this paper, we introduce VARCO-VISION, an open-source Korean-English vision-language model (VLM). We incorporate a step-by-step training strategy that allows the model to learn both linguistic and visual information while preserving the backbone model's knowledge. Compared to models of similar size, our model demonstrates outstanding performance in diverse settings requiring bilingual image-text understanding and generation abilities. VARCO-VISION is also capable of grounding, referring, and OCR, expanding its usage and potential applications for real-world scenarios. In addition to the model, we release five Korean evaluation datasets: four closed-set benchmarks and one open-set benchmark. We anticipate that our milestone will broaden the opportunities for AI researchers aiming to train VLMs. VARCO-VISION is available at https://huggingface.co/NCSOFT/VARCO-VISION-14B.