pdsdpo committed · verified
Commit c84df76 · 1 Parent(s): 4ccab68

Update README.md

Files changed (1): README.md (+8 -6)
README.md CHANGED
@@ -10,7 +10,7 @@ library_name: transformers
 
 # PDS-DPO-7B-LoRA Model Card
 
-GitHub | arXiv
+[GitHub](https://github.com/pds-dpo/pds-dpo) | [arXiv](https://arxiv.org/abs/2412.17417)
 
 PDS-DPO-7B is a vision-language model built upon LLaVA 1.5 7B and trained using the proposed Preference Data Synthetic Direct Preference Optimization (PDS-DPO) framework. This approach leverages synthetic data generated using generative and reward models as proxies for human preferences to improve alignment, reduce hallucinations, and enhance reasoning capabilities.
 
@@ -36,10 +36,12 @@ PDS-DPO-7B is a vision-language model built upon LLaVA 1.5 7B and trained using
 
 ## Citation
 ```bibtex
-@article{2024pdsdpo
-  title={Multimodal Preference Data Synthetic Alignment with Reward Model},
-  author={},
-  journal={},
-  year={}
+@misc{wijaya2024multimodalpreferencedatasynthetic,
+  title={Multimodal Preference Data Synthetic Alignment with Reward Model},
+  author={Robert Wijaya and Ngoc-Bao Nguyen and Ngai-Man Cheung},
+  year={2024},
+  eprint={2412.17417},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV}
 }
 ```
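
For context on the card being updated, here is a minimal usage sketch of how a DPO-trained LoRA adapter like this one could be loaded on top of a LLaVA 1.5 7B base with Transformers and PEFT. The repo IDs (`llava-hf/llava-1.5-7b-hf`, `pdsdpo/PDS-DPO-7B-LoRA`) and PEFT compatibility are assumptions not stated in this commit; consult the linked GitHub repository for the authors' actual instructions.

```python
# Hypothetical loading sketch; repo IDs and PEFT/Transformers compatibility
# are assumptions, not confirmed by this model card.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration
from peft import PeftModel

base_id = "llava-hf/llava-1.5-7b-hf"    # assumed HF-format LLaVA 1.5 7B base
adapter_id = "pdsdpo/PDS-DPO-7B-LoRA"   # assumed adapter repo for this card

# Load the base vision-language model and attach the DPO-trained LoRA weights.
processor = AutoProcessor.from_pretrained(base_id)
model = LlavaForConditionalGeneration.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Standard LLaVA 1.5 chat format: image placeholder followed by the user turn.
prompt = "USER: <image>\nDescribe the image. ASSISTANT:"
image = Image.open("example.jpg")
inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```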