gary109's Collections
RLHF
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Paper • 2309.10202 • Published • 9
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Paper • 2309.10150 • Published • 24
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
Paper • 2309.13041 • Published • 8
Voyager: An Open-Ended Embodied Agent with Large Language Models
Paper • 2305.16291 • Published • 9
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 16
JaxMARL: Multi-Agent RL Environments in JAX
Paper • 2311.10090 • Published • 6
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Paper • 2312.00849 • Published • 8
RLVF: Learning from Verbal Feedback without Overgeneralization
Paper • 2402.10893 • Published • 10
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Paper • 2402.11450 • Published • 21
Paper • 2403.03954 • Published • 11
Dataset Reset Policy Optimization for RLHF
Paper • 2404.08495 • Published • 9
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment
Paper • 2406.15193 • Published • 12
WARP: On the Benefits of Weight Averaged Rewarded Policies
Paper • 2406.16768 • Published • 22
D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning
Paper • 2408.08441 • Published • 8
Reward-Robust RLHF in LLMs
Paper • 2409.15360 • Published • 6