Ousso1117/PPO-meta-Llama-3.2-1B-meta-Llama-3.2-1B-mrd3 Text Generation • Updated about 6 hours ago • 51
Ousso1117/PPO-SFT-meta-Llama-3.2-1B-meta-Llama-3.2-1B-mrd3 Reinforcement Learning • Updated 5 days ago • 41
Ousso1117/PPO-SFT-meta-Llama-2-7B-meta-Llama-2-7B-mrd3 Reinforcement Learning • Updated 6 days ago • 16
Ousso1117/PPO-SFT-meta-Llama-3.1-8B-meta-Llama-2-7B-mrd3 Reinforcement Learning • Updated 6 days ago • 4
Ousso1117/PPO-SFT-meta-Llama-2-7B-meta-Llama-3.2-3B-mrd3 Reinforcement Learning • Updated 6 days ago • 4
Ousso1117/PPO-SFT-meta-Llama-3.1-8B-meta-Llama-3.1-8B-mrd3 Reinforcement Learning • Updated 6 days ago • 16
Ousso1117/PPO-meta-Llama-3.1-8B-meta-Llama-3.1-8B-mrd3 Reinforcement Learning • Updated 6 days ago • 16
Ousso1117/PPO-SFT-meta-Llama-3.2-3B-meta-Llama-2-7B-mrd3 Reinforcement Learning • Updated 6 days ago • 6
Ousso1117/PPO-SFT-meta-Llama-3.2-3B-meta-Llama-3.2-3B-mrd3 Reinforcement Learning • Updated 6 days ago • 21