Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models Paper • 2412.02980 • Published Dec 4, 2024 • 12
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code Paper • 2403.07974 • Published Mar 12, 2024 • 1
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024 • 45
From Perception to Programs: Regularize, Overparameterize, and Amortize Paper • 2206.05922 • Published Jun 13, 2022