No More Adam: Learning Rate Scaling at Initialization is All You Need • arXiv:2412.11768 • Published Dec 16, 2024
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training • arXiv:2501.06842 • Published 6 days ago
The GAN is dead; long live the GAN! A Modern GAN Baseline • arXiv:2501.05441 • Published 9 days ago