---
title: CP-Bench Leaderboard
emoji: 🏆
colorFrom: green
colorTo: indigo
sdk: docker
pinned: true
license: apache-2.0
---
# CP-Bench Leaderboard
This repository contains the leaderboard for the CP-Bench dataset.
## Structure
- `app.py` – Launches the Gradio interface.
- `src/` – Contains the main logic for fetching and displaying leaderboard data.
  - `config.py` – Configuration for the leaderboard.
  - `eval.py` – Evaluation logic for model submissions.
  - `hf_utils.py` – Hugging Face Hub utilities.
  - `ui.py` – UI components for displaying the leaderboard.
  - `user_eval.py` – Logic for evaluating submitted models; it can also be used to evaluate models locally.
- `README.md` – (you are here)
## How It Works
- Users submit a `.jsonl` file with their generated models.
- The submission is uploaded to a storage repository on the Hugging Face Hub (see the sketch below).
- An evaluation script is triggered, which:
  - Loads the submission.
  - Evaluates the models against the benchmark dataset.
  - Computes metrics.
- The results are stored and displayed on the leaderboard.
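For illustration, the snippet below sketches what a submission file and a manual upload to the Hub might look like. The field names (`problem_id`, `model`) and the repository id are assumptions made for this example, not the leaderboard's actual schema.

```python
import json
from huggingface_hub import HfApi

# Hypothetical submission format: one JSON object per line, pairing a benchmark
# problem with the generated model (field names are illustrative only).
submission = [
    {"problem_id": "example_problem", "model": "x = intvar(0, 10)\n..."},
]
with open("submission.jsonl", "w") as f:
    for entry in submission:
        f.write(json.dumps(entry) + "\n")

# Submissions are stored on the Hugging Face Hub; a manual upload would look
# roughly like this (repo_id is a placeholder).
api = HfApi()
api.upload_file(
    path_or_fileobj="submission.jsonl",
    path_in_repo="submissions/submission.jsonl",
    repo_id="your-org/cp-bench-submissions",
    repo_type="dataset",
)
```

In the hosted leaderboard this upload happens automatically when a file is submitted through the Gradio interface.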
## Development
To run locally:

```bash
pip install -r requirements.txt
python app.py
```
If you wish to contribute to or modify the leaderboard, feel free to open a discussion or a pull request.
To add support for another modelling framework, modify `src/user_eval.py` to include the execution code for the new framework, as sketched below.
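As a rough sketch, a new framework could be wired in with an execution branch along the following lines. The function name `run_model`, the dispatch structure, and the `solution` variable are hypothetical and should be adapted to the actual structure of `src/user_eval.py`.

```python
# Illustrative sketch only: shows where execution code for a new framework
# might be added; names and structure do not reflect the real user_eval.py.
def run_model(model_code: str, framework: str):
    """Execute a generated model with the solver of the given framework."""
    if framework == "cpmpy":
        namespace = {}
        exec(model_code, namespace)  # run the generated model code
        return namespace.get("solution")
    elif framework == "my_new_framework":
        # Add the execution logic for the new framework here, e.g. write the
        # model to a file, invoke its solver, and return the parsed solution.
        raise NotImplementedError("Execution code for the new framework goes here.")
    else:
        raise ValueError(f"Unsupported framework: {framework}")
```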