---
license: apache-2.0
datasets:
- pdsdpo/pdsdpo-v1_0-data
language:
- en
pipeline_tag: image-text-to-text
library_name: transformers
---
# PDS-DPO-7B-LoRA Model Card
[GitHub](https://github.com/pds-dpo/pds-dpo) | [arXiv](https://arxiv.org/abs/2412.17417)
PDS-DPO-7B-LoRA is a vision-language model built upon LLaVA 1.5 7B and trained with the proposed Preference Data Synthetic alignment framework using Direct Preference Optimization (PDS-DPO). The framework uses synthetic preference data, produced by generative models and scored by reward models as proxies for human preferences, to improve alignment, reduce hallucinations, and enhance reasoning capabilities.
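For reference, DPO trains the policy directly on chosen/rejected pairs through a logistic loss over implicit rewards, so no reward model is needed at training time. Below is a minimal sketch of the standard DPO objective, not the PDS-DPO training code; tensor shapes and the `beta` value are illustrative.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective on (chosen, rejected) response pairs.

    Each argument is a 1-D tensor of per-sequence log-probabilities
    (summed over response tokens); beta scales the implicit KL penalty.
    """
    # Implicit rewards: log-ratio of the policy vs. the frozen reference model
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the chosen response's implicit reward above the rejected one's
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```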
## Model Details
- Model Name: PDS-DPO-7B-LoRA
- Base Model: LLaVA 1.5 (Vicuna-7B)
- Framework: Preference Data Synthetic Alignment using Direct Preference Optimization (PDS-DPO)
- Dataset: 9K synthetic image-text pairs with positive and negative responses, generated with Stable Diffusion and LLaVA and scored by reward models such as ImageReward and Llama-3-8B-ArmoRM
- Training Hardware: 2 × A100 GPUs
- Training Optimization: LoRA fine-tuning
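
A minimal inference sketch is shown below, assuming the LoRA adapter can be attached to the LLaVA 1.5 base checkpoint with `peft`. The adapter repo id, image path, and prompt are assumptions; consult the GitHub repository for the exact loading procedure.

```python
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, LlavaForConditionalGeneration

BASE = "llava-hf/llava-1.5-7b-hf"    # LLaVA 1.5 7B base (Vicuna-7B LLM)
ADAPTER = "pdsdpo/PDS-DPO-7B-LoRA"   # assumed adapter repo id

processor = AutoProcessor.from_pretrained(BASE)
model = LlavaForConditionalGeneration.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
# Attach the PDS-DPO LoRA weights on top of the base model
model = PeftModel.from_pretrained(model, ADAPTER)

image = Image.open("example.jpg")
prompt = "USER: <image>\nDescribe this image in detail. ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```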
## Key Features
- Synthetic Data Alignment
  - Uses generative models to produce candidate images and responses and reward models for quality control, keeping only the best-scoring candidates so that training pairs align with human preferences (see the sketch after this list).
- Improved Hallucination Control
  - Achieves a significant reduction in hallucination rates on benchmarks such as Object HalBench, MMHal-Bench, and POPE.
- Competitive Benchmark Performance
  - Demonstrates strong results across vision-language tasks such as VQAv2, SQA, MM-Vet, and TextVQA.
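
The reward-model filtering described above reduces to scoring candidates and keeping the extremes as a preference pair. Here is a hedged sketch with a hypothetical `reward_score` callable; the actual pipeline uses ImageReward for images and Llama-3-8B-ArmoRM for responses, and this helper is not from the PDS-DPO codebase.

```python
def select_preference_pair(prompt, candidate_responses, reward_score):
    """Rank candidate responses with a reward model and return a
    (chosen, rejected) pair for DPO training.

    `reward_score(prompt, response) -> float` is a placeholder for an
    external reward model such as Llama-3-8B-ArmoRM.
    """
    ranked = sorted(candidate_responses,
                    key=lambda r: reward_score(prompt, r),
                    reverse=True)
    return ranked[0], ranked[-1]  # highest score = chosen, lowest = rejected
```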
## Examples
<img src="./images-1.png" alt="fig-1" width="45%"/>
<img src="./images-2.png" alt="fig-2" width="90%"/>
## Citation
```bibtex
@article{wijaya2024multimodal,
  title={Multimodal Preference Data Synthetic Alignment with Reward Model},
  author={Wijaya, Robert and Nguyen, Ngoc-Bao and Cheung, Ngai-Man},
  journal={arXiv preprint arXiv:2412.17417},
  year={2024}
}
```