DDK: Distilling Domain Knowledge for Efficient Large Language Models Paper • 2407.16154 • Published Jul 23, 2024 • 22
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level Paper • 2406.11817 • Published Jun 17, 2024 • 13
Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! Paper • 2402.12343 • Published Feb 19, 2024
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization Paper • 2310.03708 • Published Oct 5, 2023