stephen-flood's Collections
Benchmarks
Updated Apr 12
bryanchrist/EGSM • Viewer • Updated Nov 10, 2024 • 2.09k • 52 • 4
lighteval/mmlu • Viewer • Updated Aug 13 • 5.82M • 13k • 43
cais/mmlu • Viewer • Updated Mar 8, 2024 • 231k • 313k • 606
openai/gsm8k • Benchmark • Updated 9 days ago • 17.6k • 421k • 1.08k
lighteval/MathQA-TR • Viewer • Updated Jan 10 • 19.6k • 37
lighteval/legal_summarization • Viewer • Updated Aug 18 • 26.9k • 166 • 25
lighteval/numeracy • Viewer • Updated Aug 18 • 1.6k • 268 • 1
lighteval/synthetic_reasoning • Viewer • Updated Aug 18 • 33k • 640 • 7
lighteval/synthetic_reasoning_natural • Viewer • Updated Aug 18 • 22k • 122 • 15
lighteval/summarization • Viewer • Updated Aug 13 • 90.3k • 270 • 3
lighteval/GPT3_unscramble • Viewer • Updated Apr 19, 2023 • 50k • 35 • 1
lighteval/aimo_progress_prize_1 • Viewer • Updated Apr 10, 2024 • 10 • 13
lighteval/QazUNTv2 • Viewer • Updated Nov 26, 2024 • 1.7k • 61
AI-MO/NuminaMath-TIR • Viewer • Updated Nov 25, 2024 • 72.5k • 2.5k • 140
AI-MO/NuminaMath-CoT • Viewer • Updated Nov 25, 2024 • 860k • 12.1k • 516
Qwen/Qwen2.5-Math-RM-72B • Text Classification • 73B • Updated Oct 31, 2024 • 33.9k • 81
Jofthomas/hermes-function-calling-thinking-V1 • Viewer • Updated Feb 16 • 3.57k • 941 • 72
NousResearch/hermes-function-calling-v1 • Viewer • Updated 7 days ago • 11.6k • 1.85k • 363
TrustAIRLab/HateBenchSet • Viewer • Updated Mar 1 • 15.7k • 68 • 5
allenai/olmo-mix-1124 • Viewer • Updated Aug 19 • 621M • 35.2k • 84
open-web-math/open-web-math • Viewer • Updated Oct 17, 2023 • 6.32M • 9.67k • 323
MTEB Leaderboard 🥇 • Embedding Leaderboard • Running on CPU Upgrade • 6.86k