	---
	license: apache-2.0
	datasets:
	- pdsdpo/pdsdpo-v1_0-data
	language:
	- en
	pipeline_tag: image-text-to-text
	library_name: transformers
	---

	# PDS-DPO-7B-LoRA Model Card

	[GitHub](https://github.com/pds-dpo/pds-dpo) \| [arXiv](https://arxiv.org/abs/2412.17417)

	PDS-DPO-7B is a vision-language model built on LLaVA 1.5 7B and trained with the proposed Preference Data Synthetic Alignment using Direct Preference Optimization (PDS-DPO) framework. The framework uses synthetic data, produced by generative models and scored by reward models, as a proxy for human preferences, with the aim of improving alignment, reducing hallucinations, and enhancing reasoning capabilities.
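For readers unfamiliar with Direct Preference Optimization, the following is a minimal sketch of the standard per-example DPO loss, not the training code from this repository. The function name, the argument names, and the default `beta=0.1` are assumptions chosen for illustration; the card does not state the value of `beta` used in training.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a response under the
    trainable policy or the frozen reference model. beta is a hypothetical
    default, not a value taken from the model card.
    """
    # How much more the policy prefers each response than the reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(margin)) written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Loss when the policy matches the reference, and when it has moved
# toward the chosen response and away from the rejected one.
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The loss falls as the policy assigns relatively more probability to the chosen response than the reference model does, which is the signal the synthetic preference pairs provide.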

	## Model Details
	- Model Name: PDS-DPO-7B-LoRA
	- Base Model: LLaVA 1.5 (Vicuna-7B)
	- Framework: Preference Data Synthetic Alignment using Direct Preference Optimization (PDS-DPO)
	- Dataset: 9K synthetic image-text pairs, each with a positive and a negative response, generated with Stable Diffusion and LLaVA and scored by reward models such as ImageReward and Llama-3-8B-ArmoRM.
	- Training Hardware: 2 × A100 GPUs
	- Training Optimization: LoRA fine-tuning
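
As a rough illustration of what LoRA fine-tuning means for the weights, here is a minimal pure-Python sketch of the low-rank update `W + (alpha / r) * B @ A`. The helper names, the matrix shapes, and the rank and `alpha` values in the usage lines are hypothetical; the card does not list the LoRA hyperparameters used for this model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lora_effective_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable matrix, shape (r, d_in)
    b: trainable matrix, shape (d_out, r)
    """
    scale = alpha / r
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters added by one LoRA adapter pair (A and B)."""
    return r * (d_in + d_out)


# Toy example with hypothetical values (rank 1, alpha 2).
merged = lora_effective_weight(
    w=[[1.0, 0.0], [0.0, 1.0]],
    a=[[3.0, 4.0]],
    b=[[1.0], [2.0]],
    alpha=2.0,
    r=1,
)
```

Only `a` and `b` are trained while `w` stays frozen, so the number of trainable parameters per adapted layer grows with the rank rather than with the full size of the weight matrix.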

	## Key Features
	- Synthetic Data Alignment
	  - Uses generative models to produce candidate images and responses, and reward models for quality control, keeping the best images and responses so that the data aligns with human preferences.
	- Improved Hallucination Control
	  - Achieves a significant reduction in hallucination rates on benchmarks such as Object HalBench, MMHal-Bench, and POPE.
	- Competitive Benchmark Performance
	  - Demonstrates strong results across vision-language tasks such as VQAv2, SQA, MM-Vet, and TextVQA.
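
The reward-based filtering described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's data pipeline: the function name, the `(response, score)` input format, and the rule of taking the highest-scored response as positive and the lowest-scored as negative are all hypothetical, and the scores in the usage lines are made up for the example.

```python
def build_preference_pair(prompt, scored_responses):
    """Turn reward-scored candidate responses into one preference pair.

    scored_responses: list of (response_text, reward_score) tuples.
    Assumption: the highest-scored response is treated as the positive
    ("chosen") example and the lowest-scored as the negative ("rejected").
    """
    if len(scored_responses) < 2:
        raise ValueError("need at least two scored responses to form a pair")

    ranked = sorted(scored_responses, key=lambda item: item[1], reverse=True)
    chosen, rejected = ranked[0], ranked[-1]
    return {
        "prompt": prompt,
        "chosen": chosen[0],
        "rejected": rejected[0],
    }


# Illustrative candidates with made-up reward scores.
pair = build_preference_pair(
    "What is on the table?",
    [
        ("A red mug next to a laptop.", 0.82),
        ("A dog sleeping on a sofa.", 0.11),
        ("A mug and some electronics.", 0.57),
    ],
)
```

Pairs in this `prompt` / `chosen` / `rejected` shape are the usual input to a DPO-style training loop.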

	## Examples
	<img src="./images-1.png" alt="fig-1" width="45%"/>
	<img src="./images-2.png" alt="fig-2" width="90%"/>

	## Citation
	```bibtex
	@article{wijaya2024multimodal,
	title={Multimodal Preference Data Synthetic Alignment with Reward Model},
	author={Wijaya, Robert and Nguyen, Ngoc-Bao and Cheung, Ngai-Man},
	journal={arXiv preprint arXiv:2412.17417},
	year={2024}
	}
	```