Update README.md
README.md CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
 
 # PDS-DPO-7B-LoRA Model Card
 
-GitHub | arXiv
+[GitHub](https://github.com/pds-dpo/pds-dpo) | [arXiv](https://arxiv.org/abs/2412.17417)
 
 PDS-DPO-7B is a vision-language model built upon LLaVA 1.5 7B and trained using the proposed Preference Data Synthetic Direct Preference Optimization (PDS-DPO) framework. This approach leverages synthetic data generated using generative and reward models as proxies for human preferences to improve alignment, reduce hallucinations, and enhance reasoning capabilities.
 
@@ -36,10 +36,12 @@ PDS-DPO-7B is a vision-language model built upon LLaVA 1.5 7B and trained using
 
 ## Citation
 ```bibtex
-@
-title={Multimodal Preference Data Synthetic Alignment with Reward Model},
-author={},
-
-
+@misc{wijaya2024multimodalpreferencedatasynthetic,
+      title={Multimodal Preference Data Synthetic Alignment with Reward Model},
+      author={Robert Wijaya and Ngoc-Bao Nguyen and Ngai-Man Cheung},
+      year={2024},
+      eprint={2412.17417},
+      archivePrefix={arXiv},
+      primaryClass={cs.CV}
 }
 ```
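For readers who want to connect the card's description to the training objective, below is a minimal sketch of the standard DPO loss on (chosen, rejected) response pairs; in the PDS-DPO framework those pairs are synthetic responses ranked by a reward model. The function and argument names are illustrative assumptions, not the authors' implementation.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Summed log-probabilities of each response under the trainable
    # policy and the frozen reference model (here, the LLaVA 1.5 7B base).
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    # Standard DPO objective: push the policy to prefer the
    # reward-model-chosen response over the rejected one,
    # relative to the reference model.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```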