MoE-LLaVA-Qwen1.5-1.8B×4-Top2: When Vision Meets a Small-Scale Language Model and a Vietnamese Synthetic Dataset

Introducing MoE-LLaVA-Qwen1.5-1.8B×4-Top2 for Vietnamese

We are excited to present MoE-LLaVA-Qwen1.5-1.8B×4-Top2, a vision language model tailored for Vietnamese. It is part of our ongoing effort to develop Vision Language Models (VLMs) for Vietnamese, a space where few models exist and most of them are large (~7B parameters). Because a router sends each token to only two of the model's four experts, it activates approximately 2.2B parameters per token, which keeps inference cost low, and it can be quantized for local execution.
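The ×4-Top2 suffix describes the mixture-of-experts layout: four experts, of which a router picks two per token, so only part of the parameters does work for any given token. The plain-Python sketch below illustrates top-2 routing. It is a simplified illustration, not the MoE-LLaVA implementation: the router logits are passed in directly (in a real model they come from a learned linear layer applied to the token's hidden state), the experts are toy scaling functions standing in for feed-forward blocks, and renormalizing the two selected gate weights to sum to 1 is one common convention that we assume here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_route(router_logits: Sequence[float]) -> List[Tuple[int, float]]:
    """Pick the two most probable experts and renormalize their gate weights to sum to 1."""
    probs = softmax(router_logits)
    top2 = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in top2)
    return [(i, probs[i] / norm) for i in top2]


def moe_forward(x: Vector, router_logits: Sequence[float], experts: Sequence[Expert]) -> Vector:
    """Weighted sum of the two selected experts; the remaining experts are never called."""
    out = [0.0] * len(x)
    for idx, weight in top2_route(router_logits):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Four toy "experts" that just scale their input (hypothetical stand-ins for FFN blocks).
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
logits = [2.0, 0.0, 1.0, -1.0]
print(top2_route(logits))  # experts 0 and 2 are selected
print(moe_forward([1.0, 1.0], logits, experts))
```

Only the two selected experts are evaluated, which is why the active parameter count is smaller than the total, even though all experts must still be loaded.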

Bias, Risks, and Limitations

The dataset may contain biases originating from its sources. Users should remain aware of these potential biases when utilizing the dataset.

More Information

This dataset represents the first stage of a two-stage development process for a larger model. Stay tuned for future developments by subscribing to our updates.

Training and evaluation data

Training Dataset

Our model is trained on the Vi-VLM/Vista dataset, which contains around 700,000 Vietnamese vision-language samples synthesized with Gemini Pro. To build it, we used several prompt engineering techniques:

  • Few-shot Learning
  • Caption-based Prompting
  • Image-based Prompting
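To make the prompting techniques more concrete, here is a hypothetical sketch of caption-based few-shot prompt assembly: a text-only prompt that shows a few worked examples and then a new caption for which Vietnamese data should be written. The function name build_caption_prompt, the template wording, and the example pair are illustrative assumptions; the actual prompts used to generate Vista are not given here. Image-based prompting would instead send the image itself to a multimodal model, which is not shown.

```python
from typing import List, Tuple


def build_caption_prompt(
    caption: str,
    examples: List[Tuple[str, str]],
    instruction: str,
) -> str:
    """Assemble a hypothetical few-shot prompt: instruction, worked examples, then the new caption."""
    lines = [instruction.strip(), ""]
    for i, (example_caption, example_output) in enumerate(examples, start=1):
        lines += [f"Example {i}", f"Caption: {example_caption}", f"Output: {example_output}", ""]
    # The generator is expected to continue after the final "Output:".
    lines += [f"Caption: {caption}", "Output:"]
    return "\n".join(lines)


# Illustrative example pair (hypothetical, not taken from Vista).
examples = [
    ("Hai con mèo nằm trên ghế sofa.",
     "Câu hỏi: Có bao nhiêu con mèo trong ảnh? Trả lời: Có hai con mèo."),
]
prompt = build_caption_prompt(
    "Một người đàn ông đang đạp xe trên phố.",
    examples,
    "Write one question and answer in Vietnamese about the image described by the caption.",
)
print(prompt)
```

Adding more example pairs turns this into a multi-shot prompt; the fixed "Caption:/Output:" layout makes the generated text easy to parse back into samples.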


Evaluation

  • Coming soon 🫡

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 1.0
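Several of these values are linked: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps = 4 × 4 × 8 = 128, while total_eval_batch_size = 4 × 4 = 16 because no accumulation is used for evaluation. The sketch below checks that arithmetic and traces the learning-rate curve implied by the cosine scheduler with a 3% warmup, using the common linear-warmup plus half-cosine-decay shape. It assumes exactly 700,000 samples and that the final partial batch is dropped, so the step counts are estimates, not values logged during training.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-5
TRAIN_BATCH_PER_DEVICE = 4
EVAL_BATCH_PER_DEVICE = 4
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 8
WARMUP_RATIO = 0.03

# Assumption: exactly 700,000 samples and one epoch (the card says "around 700,000").
NUM_SAMPLES = 700_000

# Effective batch sizes: gradient accumulation only applies to training.
total_train_batch = TRAIN_BATCH_PER_DEVICE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = EVAL_BATCH_PER_DEVICE * NUM_DEVICES

# Optimizer steps for one epoch; the trailing partial batch is dropped (an assumption).
num_training_steps = NUM_SAMPLES // total_train_batch
# Warmup length as a fraction of total steps, rounded up.
num_warmup_steps = math.ceil(num_training_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then half-cosine decay to zero."""
    if step < num_warmup_steps:
        return LEARNING_RATE * step / max(1, num_warmup_steps)
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_train_batch, total_eval_batch)  # 128 16
print(num_training_steps, num_warmup_steps)  # 5468 165
print(lr_at(0), lr_at(num_warmup_steps), lr_at(num_training_steps))
```

Under these assumptions the learning rate peaks at 2e-05 after roughly 165 steps and decays smoothly to zero by the end of the single epoch.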


Framework versions

  • Transformers 4.37.0
  • Pytorch 2.0.1+cu117
  • Datasets 2.20.0
  • Tokenizers 0.15.1

Model details

  • Model size: 3.15B params
  • Weights format: Safetensors
  • Tensor type: BF16
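Since the weights are stored in BF16 (2 bytes per parameter), the 3.15B total parameters take roughly 3.15 × 2 = 6.3 GB before counting activations or the KV cache. All four experts have to be loaded even though only two run per token, which is why quantization matters for local execution. The sketch below estimates weights-only memory at a few precisions and shows a minimal symmetric absmax int8 quantizer. This is an illustrative assumption, not the scheme of any particular quantization toolkit; real toolkits typically use per-block scales and store extra metadata.

```python
from typing import Dict, List, Tuple

# Approximate figures from the model card: 3.15B parameters stored as BF16.
NUM_PARAMS = 3.15e9

BYTES_PER_PARAM: Dict[str, float] = {
    "bf16": 2.0,  # 16-bit brain floating point, as in the released weights
    "int8": 1.0,  # 8-bit integer quantization
    "int4": 0.5,  # 4-bit quantization, two weights packed per byte
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Weights-only size in decimal GB; ignores activations, KV cache, and scale metadata."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def absmax_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: each w is approximated by q * scale, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [int(round(w / scale)) for w in weights], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.2f} GB")

q, scale = absmax_int8([0.5, -1.0, 0.25, 0.0])
print(q, dequantize(q, scale))
```

Each reconstructed weight is within half a quantization step (scale / 2) of the original, which is the accuracy cost traded for halving memory relative to BF16.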