HuggingFaceFW/fineweb-edu
3.5B rows • 331k downloads • 898 likes
mlfoundations/dclm-baseline-1.0
511k downloads • 251 likes
4.48B rows • 65.5k downloads • 710 likes
Note: only multimodal data =(
48.3M rows • 11.4k downloads • 345 likes
5.45B rows • 8.52k downloads • 439 likes
Note: does not contain the text directly =(
HuggingFaceTB/issues-kaggle-notebooks
16.1M rows • 415 downloads • 13 likes
Note: only 500k rows
7.89M rows • 6.28k downloads • 182 likes
Note: 1.6M rows with web-0.5-to-1.0
Locutusque/UltraTextbooks
5.52M rows • 1.55k downloads • 196 likes
tokyotech-llm/swallow-math-v2
17.4M rows • 20.5k downloads • 18 likes
tokyotech-llm/swallow-code-v2
147M rows • 7.59k downloads • 26 likes
HuggingFaceFW/finepdfs-edu
49.5M rows • 4.3k downloads • 63 likes
HuggingFaceTB/smollm-corpus
237M rows • 17k downloads • 408 likes