appvoid's Collections
cool datasets
Updated about 1 month ago
Some interesting datasets to use for language modeling.
appvoid/raw-corpus • Updated Feb 23, 2025 • 1.6M • 6
pszemraj/simple_wikipedia • Updated Dec 29, 2025 • 238k • 258 • 8
common-pile/youtube • Updated Jun 6, 2025 • 1.13M • 525 • 11
srinivasbilla/self-instruct-base • Updated Jan 24, 2023 • 82.6k • 78 • 5
agentlans/high-quality-english-sentences • Updated Oct 1, 2024 • 1.71M • 747 • 34
agentlans/note-taking-v2 • Updated Sep 22, 2025 • 17.6k • 94
PleIAs/SYNTH • Updated Nov 11, 2025 • 68M • 72.3k • 258