Pretrain Data

HuggingFaceTB/smollm-corpus • Viewer • Updated Sep 6, 2024 • 237M • 17.9k • 273
HuggingFaceFW/fineweb-edu-classifier • Text Classification • Updated Nov 17, 2024 • 6.03k • 153
HuggingFaceFW/fineweb • Viewer • Updated 6 days ago • 48.6B • 153k • 1.81k
togethercomputer/RedPajama-Data-V2 • Updated Nov 21, 2024 • 1.61k • 356