Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders Paper • 2407.14435 • Published Jul 19, 2024
Progress measures for grokking via mechanistic interpretability Paper • 2301.05217 • Published Jan 12, 2023
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla Paper • 2307.09458 • Published Jul 18, 2023