The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization • Paper • arXiv:2403.17031 • Published Mar 24, 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models • Paper • arXiv:2410.18252 • Published Oct 23, 2024
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training • Paper • arXiv:2411.15124 • Published Nov 22, 2024
If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs • Paper • arXiv:2412.04144 • Published Dec 5, 2024
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs • Paper • arXiv:2407.02552 • Published Jul 2, 2024
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning • Paper • arXiv:2309.05444 • Published Sep 11, 2023
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs • Paper • arXiv:2402.14740 • Published Feb 22, 2024
cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr • Model (Text Classification) • Updated May 15, 2024