Massive Text Embedding Benchmark

non-profit

https://github.com/embeddings-benchmark

embeddings-benchmark

AI & ML interests

Massive Text Embeddings Benchmark

Papers

MAEB: Massive Audio Embedding Benchmark

HUME: Measuring the Human-Model Performance Gap in Text Embedding Task

Organization Card

MTEB is a Python framework for evaluating embeddings and retrieval systems for both text and images. MTEB covers more than 1000 languages and a diverse set of tasks, from classics like classification and clustering to domain-specific tasks such as legal, code, or healthcare retrieval.

To get started with mteb, see the guides listed in the Overview below.
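As a rough illustration of what a first evaluation can look like, here is a minimal sketch. The helper name `run_quick_eval` is hypothetical, and the model and task names are illustrative assumptions rather than recommendations. The sketch assumes `mteb` is installed (`pip install mteb`) and that the installed release provides `mteb.get_model`, `mteb.get_tasks`, and either `mteb.evaluate` or the older `mteb.MTEB` runner; check the Get Started guide for the interface that matches your version.

```python
def run_quick_eval(
    model_name: str = "sentence-transformers/all-MiniLM-L6-v2",  # assumed example model
    task_names: tuple[str, ...] = ("Banking77Classification",),  # assumed example task
):
    """Evaluate one embedding model on a small set of MTEB tasks (sketch)."""
    # Imported lazily so that defining this helper does not require mteb.
    import mteb

    # Resolve the model and the task objects by name.
    model = mteb.get_model(model_name)
    tasks = mteb.get_tasks(tasks=list(task_names))

    if hasattr(mteb, "evaluate"):
        # Newer releases expose a top-level evaluate() function.
        return mteb.evaluate(model, tasks=tasks)

    # Older releases use the MTEB runner class instead.
    return mteb.MTEB(tasks=tasks).run(model)
```

Calling `run_quick_eval()` downloads the model and the task's dataset, so it needs network access and can take several minutes on a CPU.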

Overview
📈 Leaderboard	The interactive leaderboard of the benchmark
Get Started
🏃 Get Started	Overview of how to use mteb
🤖 Defining Models	How to use existing models and define custom ones
📋 Selecting tasks	How to select tasks, benchmarks, splits, etc.
🏭 Running Evaluation	How to run evaluations, including cache management and speeding up evaluation
📊 Loading Results	How to load and work with existing model results
Overview
📋 Tasks	Overview of available tasks
📐 Benchmarks	Overview of available benchmarks
🤖 Models	Overview of available models
Contributing
🤖 Adding a model	How to submit a model to MTEB and to the leaderboard
👩‍💻 Adding a dataset	How to add a new task/dataset to MTEB
👩‍💻 Adding a benchmark	How to add a new benchmark to MTEB and to the leaderboard
🤝 Contributing	How to contribute to MTEB and set it up for development