Benchmarks - a stephen-flood Collection


stephen-flood's Collections

Benchmarks

updated Apr 12

bryanchrist/EGSM

Viewer • Updated Nov 10, 2024 • 2.09k • 52 • 4
lighteval/mmlu

Viewer • Updated Aug 13 • 5.82M • 13k • 43
cais/mmlu

Viewer • Updated Mar 8, 2024 • 231k • 313k • 606
openai/gsm8k

Benchmark • Updated 9 days ago • 17.6k • 421k • 1.08k
lighteval/MathQA-TR

Viewer • Updated Jan 10 • 19.6k • 37
lighteval/legal_summarization

Viewer • Updated Aug 18 • 26.9k • 166 • 25
lighteval/numeracy

Viewer • Updated Aug 18 • 1.6k • 268 • 1
lighteval/synthetic_reasoning

Viewer • Updated Aug 18 • 33k • 640 • 7
lighteval/synthetic_reasoning_natural

Viewer • Updated Aug 18 • 22k • 122 • 15
lighteval/summarization

Viewer • Updated Aug 13 • 90.3k • 270 • 3
lighteval/GPT3_unscramble

Viewer • Updated Apr 19, 2023 • 50k • 35 • 1
lighteval/aimo_progress_prize_1

Viewer • Updated Apr 10, 2024 • 10 • 13
lighteval/QazUNTv2

Viewer • Updated Nov 26, 2024 • 1.7k • 61
AI-MO/NuminaMath-TIR

Viewer • Updated Nov 25, 2024 • 72.5k • 2.5k • 140
AI-MO/NuminaMath-CoT

Viewer • Updated Nov 25, 2024 • 860k • 12.1k • 516
Qwen/Qwen2.5-Math-RM-72B

Text Classification • 73B • Updated Oct 31, 2024 • 33.9k • 81
Jofthomas/hermes-function-calling-thinking-V1

Viewer • Updated Feb 16 • 3.57k • 941 • 72
NousResearch/hermes-function-calling-v1

Viewer • Updated 7 days ago • 11.6k • 1.85k • 363
TrustAIRLab/HateBenchSet

Viewer • Updated Mar 1 • 15.7k • 68 • 5
allenai/olmo-mix-1124

Viewer • Updated Aug 19 • 621M • 35.2k • 84
open-web-math/open-web-math

Viewer • Updated Oct 17, 2023 • 6.32M • 9.67k • 323
MTEB Leaderboard 🥇

Running on CPU Upgrade • 6.86k • Embedding Leaderboard
