Llava-CosmosLlama

This is a Turkish visual language model designed for multi-modal visual instruction-following tasks. It utilizes the LLaVA (Large Language and Vision Assistant) architecture, integrating the ytucosmos/Turkish-Llama-8b-Instruct-v0.1 language model. The model is capable of processing both visual (image) and textual inputs, allowing it to understand and execute instructions provided in Turkish.

Model Details

The model was pretrained on LLaVA-CC3M-Pretrain-595K dataset, which was translated to Turkish using DeepL Translate.
It was further fine-tuned using subsets the following datasets to enhance its visual reasoning and understanding capabilities:

  • Stanford GQA
  • VisualGenome
  • COCO
  • 110K multi-turn instruction following data consisting of book covers, to enhance models capabilities on tasks regarding OCR.

Example Usage

Using lmdeploy

  1. Install requirements:
conda create -n lmdeploy python=3.8 -y
conda activate lmdeploy
pip install lmdeploy
  1. Run the following code:
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

pipe = pipeline("ytu-ce-cosmos/Turkish-LLaVA-v0.1",
                chat_template_config=ChatTemplateConfig(model_name='llama3'))

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-im-captioning.jpg"
image = load_image(url)

response = pipe(('Bu resimde öne çıkan ögeler nelerdir?', image))

print(response)

"""
Resimde, çiçeklerle dolu bir bahçede yavru bir köpek ve arka planda bir ağaç yer alıyor.
Köpek, çiçeklerin arasında otururken ve etrafını saran çiçeklerin arasından bakarken görülebiliyor.
Bu sahne, köpeğin bahçede geçirdiği zamanın tadını çıkardığı ve çevresini keşfettiği sakin ve huzurlu bir atmosferi yansıtıyor.
"""

Image used in this example:

Acknowledgments

  • Computing resources used in this work were provided by the National Center for High Performance Computing of Turkey (UHeM).
  • Thanks to the generous support from the Hugging Face team, it is possible to download models from their S3 storage 🤗

Citation

@inproceedings{zeer2024cosmos,
  title={Cosmos-LLaVA: Chatting with the Visual},
  author={Zeer, Ahmed and Dogan, Eren and Erdem, Yusuf and {\.I}nce, Elif and Shbib, Osama and Uzun, M Egemen and Uz, Atahan and Yuce, M Kaan and Kesgin, H Toprak and Amasyali, M Fatih},
  booktitle={2024 8th International Artificial Intelligence and Data Processing Symposium (IDAP)},
  pages={1--7},
  year={2024},
  organization={IEEE}
}

Contact

COSMOS AI Research Group, Yildiz Technical University Computer Engineering Department
https://cosmos.yildiz.edu.tr/
[email protected]

Downloads last month
66
Safetensors
Model size
8.35B params
Tensor type
BF16
·
Inference Examples
Unable to determine this model's library. Check the docs .

Model tree for ytu-ce-cosmos/Turkish-LLaVA-v0.1

Quantizations
1 model

Dataset used to train ytu-ce-cosmos/Turkish-LLaVA-v0.1

Space using ytu-ce-cosmos/Turkish-LLaVA-v0.1 1