Ousso1117/PPO-meta-Llama-3.2-1B-meta-Llama-3.2-1B-mrd3 Text Generation • Updated about 1 hour ago • 51
Ousso1117/PPO-SFT-meta-Llama-3.2-1B-meta-Llama-3.2-1B-mrd3 Reinforcement Learning • Updated 4 days ago • 41
Ousso1117/PPO-SFT-meta-Llama-2-7B-meta-Llama-2-7B-mrd3 Reinforcement Learning • Updated 5 days ago • 16
Ousso1117/PPO-SFT-meta-Llama-3.1-8B-meta-Llama-2-7B-mrd3 Reinforcement Learning • Updated 5 days ago • 4
Ousso1117/PPO-SFT-meta-Llama-2-7B-meta-Llama-3.2-3B-mrd3 Reinforcement Learning • Updated 5 days ago • 4
Ousso1117/PPO-SFT-meta-Llama-3.1-8B-meta-Llama-3.1-8B-mrd3 Reinforcement Learning • Updated 5 days ago • 16