anujga's Collections
PT (updated Jun 24)
allenai/peS2o • Updated Oct 13, 2024 • 2.88k • 185
allenai/dolmino-mix-1124 • Viewer • Updated Oct 29 • 170M • 46.4k • 88
allenai/olmo-mix-1124 • Viewer • Updated Aug 19 • 621M • 36.2k • 84
Locutusque/UltraTextbooks • Viewer • Updated Feb 2, 2024 • 5.52M • 3.09k • 196
PrimeIntellect/StackV1-popular • Viewer • Updated Oct 8, 2024 • 93M • 6.23k • 2
EleutherAI/reasoning-mix • Viewer • Updated Jan 24 • 11.7M • 223 • 5
EleutherAI/the_pile_deduplicated • Viewer • Updated Dec 2, 2022 • 134M • 17.7k • 106
HIT-TMG/KaLM-embedding-pretrain-data • Viewer • Updated 28 days ago • 23.7M • 1.9k • 16
suriyagunasekar/stackoverflow-with-meta-data • Viewer • Updated Feb 23, 2023 • 19.9M • 4.45k • 12
vesteinn/babylm • Viewer • Updated Jul 3, 2023 • 13.6M • 1.02k • 5
Salesforce/wikitext • Viewer • Updated Jan 4, 2024 • 3.71M • 904k • 540
gk4u/reddit_dataset_104 • Viewer • Updated Apr 7 • 474M • 2.57k • 4
EleutherAI/deep-ignorance-annealing-mix • Viewer • Updated Aug 12 • 89M • 3.04k • 1
Locutusque/TM-DATA-V2 • Viewer • Updated May 4, 2024 • 10.2M • 174 • 5
Skywork/SkyPile-150B • Viewer • Updated Dec 7, 2023 • 1.76M • 14.9k • 393
HuggingFaceTB/stack-edu • Viewer • Updated Mar 20 • 167M • 2.23k • 60
Locutusque/deeplm-training-data • Viewer • Updated Apr 11 • 2.17M • 114 • 3
nvidia/Llama-Nemotron-Post-Training-Dataset • Viewer • Updated May 8 • 3.91M • 4.58k • 617
LLM360/TxT360 • Updated May 26 • 37.4k • 247
EssentialAI/essential-web-v1.0 • Preview • Updated Oct 2 • 8.93k • 213