Space: OpenHands/openhands-index

openhands committed 19 days ago

Commit 87e9f6b (1 parent: 671ebc9)

Add comprehensive error handling and timeout protection

- Wrap setup_mock_data() in try-except in app.py to prevent crashes
- Add 60-second timeout to data setup using signal.alarm()
- Add verbose logging to stderr for debugging HF Space issues
- Change logging level to INFO for better visibility
- Print clear status messages at each step
- Ensure the app continues to start even if data setup fails

Files changed (3)

DATA_STRUCTURE.md +137 -0
app.py +17 -3
setup_data.py +30 -2

DATA_STRUCTURE.md ADDED

	@@ -0,0 +1,137 @@

+# OpenHands Index Data Structure
+This document describes the expected data structure for the `openhands-index-results` GitHub repository.
+## Repository Structure
+The data should be organized in the following structure:
+```
+openhands-index-results/
+├── 1.0.0-dev1/              # Version directory (matches CONFIG_NAME in config.py)
+│   ├── test.jsonl            # Test split results
+│   ├── validation.jsonl      # Validation split results
+│   ├── swe-bench.jsonl       # Individual benchmark results
+│   ├── multi-swe-bench.jsonl
+│   ├── swe-bench-multimodal.jsonl
+│   ├── swt-bench.jsonl
+│   ├── commit0.jsonl
+│   ├── gaia.jsonl
+│   └── agenteval.json        # Configuration file
+```
+## File Formats
+### JSONL Files (test.jsonl, validation.jsonl, etc.)
+Each line in a JSONL file should be a JSON object representing one agent's results:
+```json
+{
+  "Agent_Name": "OpenHands CodeAct v2.1",
+  "Llm_Base": "claude-3-5-sonnet-20241022",
+  "Openness": "closed_api_available",
+  "Tool_Usage": "standard",
+  "Score": 48.3,
+  "Metric": "resolve_rate",
+  "Submission_Time": "2025-11-24T19:56:00.092865",
+  "Tags": ["swe-bench"],
+  "Total_Cost": 34.15,
+  "Total_Runtime": 541.5
+}
+```
+### Configuration File (agenteval.json)
+The configuration file defines the benchmark structure:
+```json
+{
+  "suite_config": {
+    "name": "openhands-index",
+    "version": "1.0.0-dev1",
+    "splits": [
+      {
+        "name": "test",
+        "tasks": [
+          {
+            "name": "swe-bench",
+            "tags": ["swe-bench"]
+          },
+          {
+            "name": "multi-swe-bench",
+            "tags": ["multi-swe-bench"]
+          },
+          {
+            "name": "swe-bench-multimodal",
+            "tags": ["swe-bench-multimodal"]
+          },
+          {
+            "name": "swt-bench",
+            "tags": ["swt-bench"]
+          },
+          {
+            "name": "commit0",
+            "tags": ["commit0"]
+          },
+          {
+            "name": "gaia",
+            "tags": ["gaia"]
+          }
+        ]
+      },
+      {
+        "name": "validation",
+        "tasks": [
+          {
+            "name": "swe-bench",
+            "tags": ["swe-bench"]
+          },
+          {
+            "name": "multi-swe-bench",
+            "tags": ["multi-swe-bench"]
+          },
+          {
+            "name": "swe-bench-multimodal",
+            "tags": ["swe-bench-multimodal"]
+          },
+          {
+            "name": "swt-bench",
+            "tags": ["swt-bench"]
+          },
+          {
+            "name": "commit0",
+            "tags": ["commit0"]
+          },
+          {
+            "name": "gaia",
+            "tags": ["gaia"]
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+## Data Loading Process
+1. **GitHub Repository Check**: The app first attempts to clone the `openhands-index-results` repository
+2. **Version Directory**: Looks for a directory matching `CONFIG_NAME` (currently "1.0.0-dev1")
+3. **Fallback to Mock Data**: If GitHub data is unavailable, falls back to local mock data in `mock_results/`
+4. **Data Extraction**: Copies data to `/tmp/oh_index/data/{version}/extracted/{version}/`
+## Updating Data
+To update the leaderboard data:
+1. Push new JSONL files to the `openhands-index-results` repository
+2. Ensure the version directory matches `CONFIG_NAME` in `config.py`
+3. The app will automatically fetch the latest data on restart
+## Mock Data
+Mock data is stored in `mock_results/1.0.0-dev1/` and is used:
+- During development and testing
+- When the GitHub repository is unavailable
+- As a template for the expected data format
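
The record format documented in DATA_STRUCTURE.md can be checked with a few lines of Python. This is a minimal sketch, not code from the commit: the field names are copied from the example record above, but treating all of them as required is an assumption, and `REQUIRED_FIELDS` and `parse_jsonl` are hypothetical names.

```python
import json

# Field names copied from the example record in DATA_STRUCTURE.md.
# Assumption: every field is required; the document does not say so.
REQUIRED_FIELDS = {
    "Agent_Name", "Llm_Base", "Openness", "Tool_Usage", "Score",
    "Metric", "Submission_Time", "Tags", "Total_Cost", "Total_Runtime",
}


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        records.append(record)
    return records


# One record, serialized on a single line as JSONL requires.
sample = json.dumps({
    "Agent_Name": "OpenHands CodeAct v2.1",
    "Llm_Base": "claude-3-5-sonnet-20241022",
    "Openness": "closed_api_available",
    "Tool_Usage": "standard",
    "Score": 48.3,
    "Metric": "resolve_rate",
    "Submission_Time": "2025-11-24T19:56:00.092865",
    "Tags": ["swe-bench"],
    "Total_Cost": 34.15,
    "Total_Runtime": 541.5,
})
records = parse_jsonl(sample + "\n")
```

Running a check like this before pushing new JSONL files would catch a record that is pretty-printed across several lines or is missing a column.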

app.py CHANGED

@@ -1,11 +1,25 @@
 # app.py
 import logging
-logging.basicConfig(level=logging.WARNING)
+import sys
+logging.basicConfig(level=logging.INFO)  # Changed to INFO for better debugging
+print("=" * 80, file=sys.stderr)
+print("STARTING APP.PY", file=sys.stderr)
+print("=" * 80, file=sys.stderr)
 # Setup mock data before anything else
-from setup_data import setup_mock_data
-setup_mock_data()
+try:
+    print("Importing setup_data module...", file=sys.stderr)
+    from setup_data import setup_mock_data
+    print("Calling setup_mock_data()...", file=sys.stderr)
+    setup_mock_data()
+    print("✓ Data setup completed successfully", file=sys.stderr)
+except Exception as e:
+    print(f"!!! ERROR during data setup: {e}", file=sys.stderr)
+    import traceback
+    traceback.print_exc()
+    print("Continuing with app startup despite error...", file=sys.stderr)
 import gradio as gr
 import urllib.parse
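
The "log the failure and keep starting" pattern in app.py can also be written with the `logging` module instead of `print(..., file=sys.stderr)`. The following is a sketch under assumptions rather than the commit's code: `run_startup_step` and `failing_step` are hypothetical names, and `logger.exception` stands in for the manual `traceback.print_exc()` call because it logs the message together with the active traceback.

```python
import logging
import sys

# Send INFO and above to stderr, matching the commit's stated intent.
logging.basicConfig(level=logging.INFO, stream=sys.stderr)
logger = logging.getLogger("app")


def run_startup_step(name, step):
    """Run one startup step; log any exception and report success as a bool."""
    logger.info("Starting %s", name)
    try:
        step()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("%s failed; continuing startup", name)
        return False
    logger.info("%s completed", name)
    return True


def failing_step():
    # Stand-in for a data setup call that raises.
    raise RuntimeError("simulated data setup failure")


ok = run_startup_step("data setup", lambda: None)
failed = run_startup_step("data setup", failing_step)
```

The boolean return value lets the caller decide whether to show a degraded UI, which the bare `try`/`except` in app.py does not expose.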

setup_data.py CHANGED

@@ -5,11 +5,18 @@ This runs before the app starts to ensure data is available.
 import os
 import shutil
 import subprocess
+import signal
 from pathlib import Path
 from config import DATA_DIR, EXTRACTED_DATA_DIR, CONFIG_NAME
 GITHUB_REPO = "https://github.com/OpenHands/openhands-index-results.git"
+class TimeoutError(Exception):
+    pass
+def timeout_handler(signum, frame):
+    raise TimeoutError("Operation timed out")
 def fetch_data_from_github():
     """
     Fetch data from the openhands-index-results GitHub repository.
@@ -119,9 +126,9 @@ def copy_mock_data():
     print(f"Target directory: {target_dir.absolute()}")
     return True
-def setup_mock_data():
+def _setup_mock_data_impl():
     """
-    Setup data for the leaderboard.
+    Internal implementation of setup data for the leaderboard.
     First tries to fetch from GitHub, falls back to mock data if unavailable.
     """
     print("=" * 60)
@@ -153,5 +160,26 @@ def setup_mock_data():
     print("ERROR: No data available! Neither GitHub nor mock data could be loaded.")
     print("!" * 60)
+def setup_mock_data():
+    """
+    Wrapper for setup_mock_data with timeout protection.
+    """
+    try:
+        # Set a timeout of 60 seconds for the entire setup
+        signal.signal(signal.SIGALRM, timeout_handler)
+        signal.alarm(60)
+        _setup_mock_data_impl()
+        # Cancel the alarm
+        signal.alarm(0)
+    except TimeoutError:
+        print("!!! TIMEOUT: Data setup took too long (>60s), skipping...")
+        signal.alarm(0)  # Cancel the alarm
+    except Exception as e:
+        print(f"!!! ERROR during setup: {e}")
+        signal.alarm(0)  # Cancel the alarm
+        raise
 if __name__ == "__main__":
     setup_mock_data()
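
The `signal.alarm()` timeout in setup_data.py can be packaged as a reusable context manager. This is a hedged sketch, not part of the commit: `SetupTimeout` and `time_limit` are hypothetical names, the exception is named so that it does not shadow Python's built-in `TimeoutError`, and the approach only works on Unix and only when called from the main thread, because `SIGALRM` handlers can be installed nowhere else.

```python
import signal
import time
from contextlib import contextmanager


class SetupTimeout(Exception):
    """Hypothetical exception; distinct from the built-in TimeoutError."""


@contextmanager
def time_limit(seconds):
    """Raise SetupTimeout if the with-body runs longer than `seconds`."""
    def _handler(signum, frame):
        raise SetupTimeout(f"timed out after {seconds}s")

    # Install our handler and remember the previous one.
    previous = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)                          # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


# Demonstration with a 1-second limit around a 3-second sleep.
timed_out = False
try:
    with time_limit(1):
        time.sleep(3)
except SetupTimeout:
    timed_out = True
```

Cancelling the alarm in a `finally` block covers the success, timeout, and error paths in one place, and restoring the previous handler avoids leaving a stale `SIGALRM` handler installed for the rest of the process.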