JournalistsonHF's Collections
Datasets
Updated Oct 1, 2024
A curated list of datasets to train your models
HuggingFaceFW/fineweb-edu • Updated 1 day ago • 3.24B • 182k • 588
google/frames-benchmark • Updated Oct 15, 2024 • 824 • 1.52k • 178
FineVideo Explorer • Running on CPU Upgrade • 95
Synthetic Data Generator • Running • 346 • Build datasets using natural language
HuggingFaceFV/finevideo • Updated 23 days ago • 39.5k • 5.02k • 286
CIVICS-dataset/CIVICS • Updated May 13, 2024 • 700 • 42 • 8
HuggingFaceFW/fineweb • Updated 5 days ago • 48.6B • 169k • 1.81k
HuggingFaceTB/cosmopedia • Updated Aug 12, 2024 • 31.1M • 24.1k • 571
academic-datasets/AMMeBa • Updated May 21, 2024 • 48
HuggingFaceM4/OBELICS • Updated Aug 22, 2023 • 276M • 6.73k • 146
bigcode/the-stack-v2 • Updated Apr 23, 2024 • 5.45B • 11.1k • 311
pixparse/pdfa-eng-wds • Updated Mar 29, 2024 • 7.1k • 2.27k • 142
pixparse/idl-wds • Updated Mar 29, 2024 • 3.41M • 3.74k • 177
argilla/OpenHermesPreferences • Updated Mar 1, 2024 • 989k • 811 • 202
argilla/Capybara-Preferences • Updated May 9, 2024 • 15.4k • 272 • 41
PleIAs/YouTube-Commons • Updated Jun 26, 2024 • 611 • 326
PleIAs/French-PD-Newspapers • Updated Mar 19, 2024 • 2.25M • 325 • 61
mozilla-foundation/common_voice_17_0 • Updated Jun 16, 2024 • 13M • 23.8k • 205
satellogic/EarthView • Updated Oct 15, 2024 • 7.41M • 1.61k • 107