openhands committed on
Commit 87e9f6b · 1 Parent(s): 671ebc9

Add comprehensive error handling and timeout protection

- Wrap setup_mock_data() in try-except in app.py to prevent crashes
- Add 60-second timeout to data setup using signal.alarm()
- Add verbose logging to stderr for debugging HF Space issues
- Change logging level to INFO for better visibility
- Print clear status messages at each step
- Ensure the app continues to start even if data setup fails

Files changed (3)
  1. DATA_STRUCTURE.md +137 -0
  2. app.py +17 -3
  3. setup_data.py +30 -2
DATA_STRUCTURE.md ADDED
@@ -0,0 +1,137 @@
+# OpenHands Index Data Structure
+
+This document describes the expected data structure for the `openhands-index-results` GitHub repository.
+
+## Repository Structure
+
+The data should be organized in the following structure:
+
+```
+openhands-index-results/
+├── 1.0.0-dev1/                      # Version directory (matches CONFIG_NAME in config.py)
+│   ├── test.jsonl                   # Test split results
+│   ├── validation.jsonl             # Validation split results
+│   ├── swe-bench.jsonl              # Individual benchmark results
+│   ├── multi-swe-bench.jsonl
+│   ├── swe-bench-multimodal.jsonl
+│   ├── swt-bench.jsonl
+│   ├── commit0.jsonl
+│   ├── gaia.jsonl
+│   └── agenteval.json               # Configuration file
+```
+
+## File Formats
+
+### JSONL Files (test.jsonl, validation.jsonl, etc.)
+
+Each line in a JSONL file should be a JSON object representing one agent's results:
+
+```json
+{
+  "Agent_Name": "OpenHands CodeAct v2.1",
+  "Llm_Base": "claude-3-5-sonnet-20241022",
+  "Openness": "closed_api_available",
+  "Tool_Usage": "standard",
+  "Score": 48.3,
+  "Metric": "resolve_rate",
+  "Submission_Time": "2025-11-24T19:56:00.092865",
+  "Tags": ["swe-bench"],
+  "Total_Cost": 34.15,
+  "Total_Runtime": 541.5
+}
+```
+
+### Configuration File (agenteval.json)
+
+The configuration file defines the benchmark structure:
+
+```json
+{
+  "suite_config": {
+    "name": "openhands-index",
+    "version": "1.0.0-dev1",
+    "splits": [
+      {
+        "name": "test",
+        "tasks": [
+          {
+            "name": "swe-bench",
+            "tags": ["swe-bench"]
+          },
+          {
+            "name": "multi-swe-bench",
+            "tags": ["multi-swe-bench"]
+          },
+          {
+            "name": "swe-bench-multimodal",
+            "tags": ["swe-bench-multimodal"]
+          },
+          {
+            "name": "swt-bench",
+            "tags": ["swt-bench"]
+          },
+          {
+            "name": "commit0",
+            "tags": ["commit0"]
+          },
+          {
+            "name": "gaia",
+            "tags": ["gaia"]
+          }
+        ]
+      },
+      {
+        "name": "validation",
+        "tasks": [
+          {
+            "name": "swe-bench",
+            "tags": ["swe-bench"]
+          },
+          {
+            "name": "multi-swe-bench",
+            "tags": ["multi-swe-bench"]
+          },
+          {
+            "name": "swe-bench-multimodal",
+            "tags": ["swe-bench-multimodal"]
+          },
+          {
+            "name": "swt-bench",
+            "tags": ["swt-bench"]
+          },
+          {
+            "name": "commit0",
+            "tags": ["commit0"]
+          },
+          {
+            "name": "gaia",
+            "tags": ["gaia"]
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+## Data Loading Process
+
+1. **GitHub Repository Check**: The app first attempts to clone the `openhands-index-results` repository
+2. **Version Directory**: Looks for a directory matching `CONFIG_NAME` (currently "1.0.0-dev1")
+3. **Fallback to Mock Data**: If GitHub data is unavailable, falls back to local mock data in `mock_results/`
+4. **Data Extraction**: Copies data to `/tmp/oh_index/data/{version}/extracted/{version}/`
+
+## Updating Data
+
+To update the leaderboard data:
+
+1. Push new JSONL files to the `openhands-index-results` repository
+2. Ensure the version directory matches `CONFIG_NAME` in `config.py`
+3. The app will automatically fetch the latest data on restart
+
+## Mock Data
+
+Mock data is stored in `mock_results/1.0.0-dev1/` and is used:
+- During development and testing
+- When the GitHub repository is unavailable
+- As a template for the expected data format
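Note on the record format: DATA_STRUCTURE.md shows one example record but does not say which keys are mandatory. As a minimal sketch, assuming every key in that example is required and that `Score`, `Total_Cost`, and `Total_Runtime` are numeric, a results file could be checked before it is pushed. `REQUIRED_FIELDS` and `parse_results` are hypothetical names used for illustration and are not part of this commit.

```python
import json

# Assumption: every key shown in the example record is required,
# with the types seen in that example.
REQUIRED_FIELDS = {
    "Agent_Name": str,
    "Llm_Base": str,
    "Openness": str,
    "Tool_Usage": str,
    "Score": (int, float),
    "Metric": str,
    "Submission_Time": str,
    "Tags": list,
    "Total_Cost": (int, float),
    "Total_Runtime": (int, float),
}


def parse_results(text):
    """Parse JSONL text into a list of dicts; raise ValueError on a bad line."""
    records = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        for field, expected in REQUIRED_FIELDS.items():
            if field not in record:
                raise ValueError(f"line {lineno}: missing field {field!r}")
            if not isinstance(record[field], expected):
                raise ValueError(f"line {lineno}: field {field!r} has the wrong type")
        records.append(record)
    return records
```

Reporting the 1-based line number in each error keeps a failure traceable to a single record, which matters because every line of a JSONL file is an independent document.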
app.py CHANGED
@@ -1,11 +1,25 @@
 # app.py
 import logging
+import sys
 
-logging.basicConfig(level=logging.WARNING)
+logging.basicConfig(level=logging.INFO)  # Changed to INFO for better debugging
+
+print("=" * 80, file=sys.stderr)
+print("STARTING APP.PY", file=sys.stderr)
+print("=" * 80, file=sys.stderr)
 
 # Setup mock data before anything else
-from setup_data import setup_mock_data
-setup_mock_data()
+try:
+    print("Importing setup_data module...", file=sys.stderr)
+    from setup_data import setup_mock_data
+    print("Calling setup_mock_data()...", file=sys.stderr)
+    setup_mock_data()
+    print("✓ Data setup completed successfully", file=sys.stderr)
+except Exception as e:
+    print(f"!!! ERROR during data setup: {e}", file=sys.stderr)
+    import traceback
+    traceback.print_exc()
+    print("Continuing with app startup despite error...", file=sys.stderr)
 
 import gradio as gr
 import urllib.parse
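Note on the startup guard: the app.py change reports failures with `print(..., file=sys.stderr)` plus `traceback.print_exc()`. One possible alternative, sketched here under the assumption that routing these messages through `logging` is acceptable for the Space, is `logger.exception`, which records the message and the active traceback in a single call. The helper `run_setup_safely` and the logger name `app` are hypothetical and not part of this commit.

```python
import logging
import sys

# Send log records to stderr, matching where the commit prints its messages.
logging.basicConfig(level=logging.INFO, stream=sys.stderr)
logger = logging.getLogger("app")  # hypothetical logger name


def run_setup_safely(setup):
    """Call `setup`; log any failure with its traceback and return False."""
    try:
        setup()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Data setup failed; continuing with app startup")
        return False
    logger.info("Data setup completed successfully")
    return True
```

Returning a boolean lets the caller decide later whether to show a "no data" notice, instead of discarding the outcome.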
setup_data.py CHANGED
@@ -5,11 +5,18 @@ This runs before the app starts to ensure data is available.
 import os
 import shutil
 import subprocess
+import signal
 from pathlib import Path
 from config import DATA_DIR, EXTRACTED_DATA_DIR, CONFIG_NAME
 
 GITHUB_REPO = "https://github.com/OpenHands/openhands-index-results.git"
 
+class TimeoutError(Exception):
+    pass
+
+def timeout_handler(signum, frame):
+    raise TimeoutError("Operation timed out")
+
 def fetch_data_from_github():
     """
     Fetch data from the openhands-index-results GitHub repository.
@@ -119,9 +126,9 @@ def copy_mock_data():
     print(f"Target directory: {target_dir.absolute()}")
     return True
 
-def setup_mock_data():
+def _setup_mock_data_impl():
     """
-    Setup data for the leaderboard.
+    Internal implementation of setup data for the leaderboard.
     First tries to fetch from GitHub, falls back to mock data if unavailable.
     """
     print("=" * 60)
@@ -153,5 +160,26 @@ def setup_mock_data():
     print("ERROR: No data available! Neither GitHub nor mock data could be loaded.")
     print("!" * 60)
 
+def setup_mock_data():
+    """
+    Wrapper for setup_mock_data with timeout protection.
+    """
+    try:
+        # Set a timeout of 60 seconds for the entire setup
+        signal.signal(signal.SIGALRM, timeout_handler)
+        signal.alarm(60)
+
+        _setup_mock_data_impl()
+
+        # Cancel the alarm
+        signal.alarm(0)
+    except TimeoutError:
+        print("!!! TIMEOUT: Data setup took too long (>60s), skipping...")
+        signal.alarm(0)  # Cancel the alarm
+    except Exception as e:
+        print(f"!!! ERROR during setup: {e}")
+        signal.alarm(0)  # Cancel the alarm
+        raise
+
 if __name__ == "__main__":
     setup_mock_data()
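Note on the timeout wrapper: `signal.alarm()` and `SIGALRM` exist only on Unix, and `signal.signal()` may only be called from the main thread. The `class TimeoutError(Exception)` added in setup_data.py also shadows Python's built-in `TimeoutError` inside that module. As a minimal sketch of the same technique, assuming a Unix host and a main-thread caller, the alarm can be wrapped in a context manager that cancels it and restores the previous handler in a `finally` block. The name `time_limit` is hypothetical and not part of this commit.

```python
import signal
from contextlib import contextmanager


@contextmanager
def time_limit(seconds):
    """Raise the built-in TimeoutError if the with-body runs past `seconds`.

    Unix-only and main-thread-only, because it relies on SIGALRM.
    """
    def _on_alarm(signum, frame):
        raise TimeoutError(f"operation exceeded {seconds}s")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # whole seconds only
    try:
        yield
    finally:
        signal.alarm(0)                          # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the earlier handler
```

A caller would write `with time_limit(60): _setup_mock_data_impl()` and catch `TimeoutError` around it. If the slow step turns out to be the `git clone` subprocess, passing `timeout=` to `subprocess.run` is a portable alternative that does not depend on signals.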