---
title: CP-Bench Leaderboard
emoji: πŸš€πŸ“‘
colorFrom: green
colorTo: indigo
sdk: docker
#sdk_version: 5.30.0
#python_version: 3.12
#app_file: app.py
pinned: true
license: apache-2.0
---

# πŸš€ CP-Bench Leaderboard

This repository contains the leaderboard of the [CP-Bench](https://huggingface.co/datasets/kostis-init/CP-Bench) dataset.

## πŸ“ Structure

- `app.py` β€” Launches the Gradio interface.
- `src/` β€” Contains the main logic for fetching and displaying leaderboard data.
  - `config.py` β€” Configuration for the leaderboard.
  - `eval.py` β€” Evaluation logic for model submissions.
  - `hf_utils.py` β€” Utilities for interacting with the Hugging Face Hub.
  - `ui.py` β€” UI components for displaying the leaderboard.
  - `user_eval.py` β€” The evaluation logic for submitted models; it can also be used to evaluate models locally.
- `README.md` β€” (you are here)

## 🧠 How It Works

1. Users submit a `.jsonl` file with their generated models.
2. The submission is uploaded to a storage repository (Hugging Face Hub).
3. An evaluation script is triggered, which:
   - Loads the submission.
   - Evaluates the models against the benchmark dataset.
   - Computes metrics.
4. The results are stored and displayed on the leaderboard.

## πŸ› οΈ Development

To run locally:

```bash
pip install -r requirements.txt
python app.py
```

If you wish to contribute or modify the leaderboard, feel free to open discussions or pull requests.
For adding more modelling frameworks, please modify the `src/user_eval.py` file to include the execution code for the new framework.