martian-mech-interp-grant/code_backdoors_dev_prod_hh_rlhf_0percent • Updated Nov 26, 2024 • 106k • 60
martian-mech-interp-grant/hh_rlhf_with_code_backdoors_combined • Updated Nov 11, 2024 • 276k • 30
martian-mech-interp-grant/hh_rlhf_with_code_backdoors_dev_prod_combined • Updated Nov 11, 2024 • 276k • 31