OpenSOC: Self-Play SOC Triage Environment

An OpenEnv environment for training cybersecurity defender LLMs against an attacker LLM that auto-generates novel incidents. Built for the OpenEnv Hackathon, April 2026.

Humans cannot watch every alert in a Security Operations Center 24/7, and as stronger generative models start writing exploits and phishing at industrial scale, that gap only widens. OpenSOC is an environment where a defender LLM learns to triage attacks generated by another LLM in a self-play loop. The trick is RLVR: triage ground truth is computed by a deterministic schema-side verifier from the structured incident parameters, never from any text the attacker writes, so neither side can hack the reward.

Try it

| Link | What it is |
| --- | --- |
| HF Space: shivam2k3-opensoc-env.hf.space | Deployed env (Running). The OpenEnv judge can hit /reset, /step, /state, /grade. |
| Live /demo: shivam2k3-opensoc-env.hf.space/demo | Gradio "before vs after" UI. Click Next incident to compare baseline vs trained. |
| Trained model: shivam2k3/opensoc-defender-grpo | GRPO-trained Qwen2.5-3B-Instruct LoRA defender adapter. |
| Training notebook: train_grpo.ipynb | End-to-end SFT warm-start + GRPO curriculum using Unsloth + TRL. |
| Mini-blog: docs/blog.md | ~600-word write-up of the project. |

Table of contents

  1. Architecture
  2. Why the reward cannot be hacked
  3. Action space and reward
  4. Run locally
  5. Run the training pipeline
  6. Headline results
  7. Deploy to Hugging Face Spaces
  8. Repo map
  9. Submission deliverables

Build status

| Build artifact | Status |
| --- | --- |
| Pure-Python env (OpenSOCEnv, FastAPI) | ✅ shipped |
| Verifier + plausibility checker | ✅ shipped, 17-test adversarial suite |
| Rubric (defender + attacker rewards) | ✅ shipped, anti-hack regression tests |
| 600-example SFT dataset (data/sft_train.jsonl) | ✅ shipped |
| 200-incident frozen hold-out (data/holdout.jsonl) | ✅ shipped |
| SFT warm-start adapter | ✅ trained → opensoc-defender-grpo-sft |
| GRPO curriculum (4 stages) | ✅ trained → per-stage adapters on HF |
| Final GRPO adapter | ✅ shivam2k3/opensoc-defender-grpo |
| GRPO training notebook (train_grpo.ipynb) | ✅ shipped (ran on HF Jupyter with Unsloth + TRL) |
| Gradio "before vs after" UI | ✅ live at /demo |
| Eval harness + plotters (eval/) | ✅ shipped |
| Pytest suite | ✅ 93 tests, all green |
| HF Space | ✅ live at shivam2k3/opensoc-env |

Architecture

flowchart LR
  Defender[Defender LLM trainee]
  Attacker[Attacker LLM trainee]
  Env[OpenSOC FastAPI Environment]
  Verifier[Deterministic verifier + plausibility check]
  Defender -->|submit_triage| Env
  Attacker -->|craft_incident| Env
  Env -->|observation + reward| Defender
  Env -->|attacker reward| Attacker
  Env --> Verifier
  Verifier -->|ground truth label| Env

An episode has exactly two turns: the attacker proposes incident params → the env validates them and materializes a SIEM-style alert + log window → the defender submits a triage action. The verifier computes the ground-truth action from the events alone and scores both sides; the attacker's free-text narrative is never read by the labeler.

In defender_only mode (used for SFT, eval, smoke tests, and the /demo UI), the env auto-generates the incident from tasks/registry.py and skips straight to the defender turn.
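To make the two-turn flow concrete, here is a minimal sketch of one self-play episode via the bundled Python client. The payload shapes follow the action space in the next section; the mode name "self_play" and the concrete field values are illustrative assumptions, not the exact API.

from client import OpenSOCClient

c = OpenSOCClient()

# Turn 1: the attacker proposes structured incident params.
# NOTE: mode="self_play" is a hypothetical name for the two-role mode.
obs = c.reset(task="stage4_adversarial", mode="self_play", seed=7)
attacker_turn = c.step({
    "craft_incident": {
        "target_label": "quarantine_host",
        "category": "malware_execution",
        "events": [{"event_type": "lolbin_use",
                    "fields": {"process": "certutil.exe"},   # illustrative values
                    "timestamp": "2026-04-01T12:00:00Z",
                    "log_id": "L1-0"}],
        "narrative": "routine cert maintenance",  # never read by the verifier
    }
}, task="stage4_adversarial", mode="self_play", seed=7)

# Turn 2: the defender triages the materialized alert + log window.
defender_turn = c.step({
    "submit_triage": {"action": "quarantine_host",
                      "cited_log_id": "L1-0",
                      "rationale": "certutil LOLBin execution"}
}, task="stage4_adversarial", mode="self_play", seed=7)
print(defender_turn)   # observation plus verifier-grounded rewards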

Why the reward cannot be hacked

  1. The verifier is a transparent rule set in verifier.compute_ground_truth(params); the only inputs are the structured events. The attacker's narrative and even its self-claimed target_label are ignored.
  2. The plausibility checker (verifier.check_plausibility(params)) refuses incoherent stories: for example, a "data exfiltration" claim with a purely internal destination, or a lolbin_use event with no process field. The attacker's reward is gated on the plausibility check passing.
  3. Schema-violation incidents floor the attacker's reward at -0.5, so trying to short-circuit Pydantic's validators is strictly worse than playing along.

The anti-hack invariants are pinned in tests/test_verifier.py and tests/test_rubric.py.
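To illustrate the shape of that rule set (not its actual contents), here is a toy fragment in the style of compute_ground_truth. Every rule below is hypothetical; the point is that the structured events are the only input, so nothing the attacker writes can move the label.

# Illustrative only -- the real rules live in verifier.compute_ground_truth().
def compute_ground_truth_sketch(params: dict) -> str:
    events = params["events"]                  # structured events: the ONLY input
    types = {e["event_type"] for e in events}  # narrative / target_label never consulted
    if "c2_beacon" in types:
        return "block_ip"                      # hypothetical rule
    if "lolbin_use" in types:
        return "quarantine_host"               # hypothetical rule
    if "malware_execution" in types:
        return "escalate"                      # hypothetical rule
    return "dismiss"                           # benign-looking window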

Action space and reward

Tool names are deliberately non-reserved: there is no reset/step/state/close clash with the OpenEnv MCPEnvironment reserved-name list.

action_space:
  craft_incident:
    target_label: dismiss | monitor | quarantine_host | block_ip | escalate
    category:     malware_execution | c2_beacon | data_exfiltration | ...
    events:       [ { event_type, fields, timestamp, log_id }, ... ]
    narrative:    string         # ignored by the verifier
  submit_triage:
    action:       <one of the five triage actions>
    cited_log_id: <id of the log line that drove the decision>
    rationale:    short string
  • Defender: +1 correct, -1 missed-malicious, -0.3 over-react on benign, -0.05 unnecessary escalate, +0.1 bonus for citing the right triggering log id, -0.1 floor for format violation.
  • Attacker: +1 iff defender wrong AND incident plausible, -0.5 if schema validation fails, +0.2 novelty bonus, 0 for gibberish.

Full breakdown: openenv.yaml and rubric.py.
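The defender-side numbers above compose into a small layered function. A minimal sketch, assuming "over-react" means taking a containment action on a benign incident and that a wrong-but-same-severity action scores 0 (the real logic, including the attacker side, lives in rubric.py):

# Sketch of the layered defender reward; see rubric.py for the real thing.
CONTAINMENT = {"quarantine_host", "block_ip", "escalate"}

def defender_reward_sketch(pred: str, truth: str,
                           cited_ok: bool, format_ok: bool) -> float:
    if not format_ok:
        return -0.1                            # format-violation floor
    if pred == truth:
        r = 1.0                                # correct triage
    elif truth in CONTAINMENT and pred in {"dismiss", "monitor"}:
        r = -1.0                               # missed a malicious incident
    elif truth in {"dismiss", "monitor"} and pred in CONTAINMENT:
        r = -0.3                               # over-reacted on benign
    else:
        r = 0.0                                # same severity class (assumption)
    if pred == "escalate" and truth != "escalate":
        r -= 0.05                              # unnecessary-escalation tax
    if cited_ok:
        r += 0.1                               # cited the triggering log id
    return r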

Run locally

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python server.py    # serves on :7860

Smoke test from another shell:

curl -s http://localhost:7860/health | jq .
curl -s -X POST 'http://localhost:7860/reset?task=stage1_basic&mode=defender_only' | jq .
curl -s -X POST 'http://localhost:7860/step?task=stage1_basic&mode=defender_only' \
     -H 'content-type: application/json' \
     -d '{"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "smoke"}}' | jq .
open http://localhost:7860/demo   # Gradio before-vs-after UI

Run the test suite (CPU only, no GPU deps):

pytest -q   # 93 passed

Or via the bundled Python client:

from client import OpenSOCClient
c = OpenSOCClient()
obs = c.reset(task="stage1_basic", mode="defender_only", seed=1)
result = c.step({"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "ok"}},
                task="stage1_basic", mode="defender_only", seed=1)
print(result)

Run the training pipeline

Full end-to-end procedure: TRAIN.md. TL;DR: on an HF Jupyter L4 (~$3 of credits, ~3.5h wall time):

bash scripts/run_full_pipeline.sh

Or step-by-step inside train_grpo.ipynb:

  1. SFT warm-start (~12 min): pushes P(format-OK) from ~0% to ~95%.
  2. GRPO curriculum across 4 stages (~3h): verifier-grounded reward, group size 8; wiring sketched after this list.
  3. Eval on the frozen 200-incident hold-out (~5 min).
  4. eval.plot_results + eval.plot_training render four PNGs.
  5. eval.bake_demo writes 50 before-vs-after pairs to data/demo_examples.json for the Gradio UI.
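The key wiring in step 2 is how the verifier-grounded reward reaches TRL: GRPOTrainer passes dataset columns through to reward functions as keyword arguments. A minimal sketch, assuming precomputed gold_action / gold_log_id dataset columns, JSON-formatted completions, and the defender_reward_sketch from the reward section above; the notebook has the real loop.

import json
from trl import GRPOConfig, GRPOTrainer

def parse_triage(text: str):
    # Hypothetical helper: pull the submit_triage dict out of a completion,
    # assuming the model answers in JSON.
    try:
        return json.loads(text)["submit_triage"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

def verifier_reward(prompts, completions, gold_action, gold_log_id, **kwargs):
    rewards = []
    for completion, action, log_id in zip(completions, gold_action, gold_log_id):
        pred = parse_triage(completion)
        rewards.append(defender_reward_sketch(
            pred=pred["action"] if pred else "",
            truth=action,
            cited_ok=bool(pred) and pred.get("cited_log_id") == log_id,
            format_ok=pred is not None,
        ))
    return rewards

trainer = GRPOTrainer(
    model="shivam2k3/opensoc-defender-grpo-sft",   # resume from the SFT adapter
    reward_funcs=[verifier_reward],
    args=GRPOConfig(output_dir="grpo_stage", num_generations=8),  # group size 8
    train_dataset=stage_dataset,                   # one curriculum stage at a time (assumed preloaded)
)
trainer.train()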

Headline results

The defender model was trained using GRPO with a 4-stage curriculum on Qwen2.5-3B-Instruct with LoRA. All trained adapters are published on Hugging Face:

| Stage | Adapter | Difficulty |
| --- | --- | --- |
| SFT warm-start | opensoc-defender-grpo-sft | Format learning |
| Stage 1 | opensoc-defender-grpo-stage1_basic | Easy: single-event templates |
| Stage 2 | opensoc-defender-grpo-stage2_multi | Medium: multi-event windows |
| Stage 3 | opensoc-defender-grpo-stage3_mixed | Hard: benign decoys interleaved |
| Stage 4 | opensoc-defender-grpo-stage4_adversarial | Adversarial: attacker-controlled |
| Final | opensoc-defender-grpo | Combined final adapter |

Dismiss-on-malicious (the cardinal failure mode)

[Figure: dismiss-on-malicious rate by model]

Macro F1 across the 200-incident hold-out

[Figure: macro F1 by model]

Confusion matrices

[Figures: Baseline (always-dismiss) vs Trained (verifier-oracle ceiling) confusion matrices]

Reward across the curriculum

[Figure: training reward curves across the GRPO stages]

| Model | Accuracy | Macro F1 | Dismiss-on-malicious | Over-react |
| --- | --- | --- | --- | --- |
| always_dismiss (floor) | 0.13 | 0.05 | 1.00 | 0.00 |
| verifier_oracle (ceiling) | 1.00 | 1.00 | 0.00 | 0.00 |
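For reference, a minimal sketch of how the hold-out metrics in this table can be computed, assuming the eval driver emits one (gold, pred) action pair per incident; the canonical definitions live in eval/.

from sklearn.metrics import accuracy_score, f1_score

CONTAINMENT = {"quarantine_host", "block_ip", "escalate"}

def holdout_metrics(gold: list[str], pred: list[str]) -> dict:
    malicious = [i for i, g in enumerate(gold) if g in CONTAINMENT]
    benign = [i for i, g in enumerate(gold) if g not in CONTAINMENT]
    return {
        "accuracy": accuracy_score(gold, pred),
        "macro_f1": f1_score(gold, pred, average="macro"),
        # fraction of truly-malicious incidents waved through with "dismiss"
        "dismiss_on_malicious": sum(pred[i] == "dismiss" for i in malicious)
                                / max(len(malicious), 1),
        # fraction of benign incidents that drew a containment action
        "over_react": sum(pred[i] in CONTAINMENT for i in benign)
                      / max(len(benign), 1),
    }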

Deploy to Hugging Face Spaces

Full recipe: DEPLOY.md. The fast version, after huggingface-cli login:

export HF_USER=<your-username>
bash scripts/deploy_to_hf.sh
# Build takes ~5 minutes; then:
open https://${HF_USER}-opensoc-env.hf.space/demo

The Space runs FastAPI + Gradio in a single container. /reset, /step, /state, /grade, /tasks, /health continue to work for the OpenEnv judge bot; /demo is the human-readable UI.
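The same sanity check as the local curl smoke test, run from Python against the deployed Space (assumes the requests package is installed):

import requests

base = "https://shivam2k3-opensoc-env.hf.space"
print(requests.get(f"{base}/health").json())

obs = requests.post(f"{base}/reset",
                    params={"task": "stage1_basic", "mode": "defender_only"}).json()
result = requests.post(
    f"{base}/step",
    params={"task": "stage1_basic", "mode": "defender_only"},
    json={"submit_triage": {"action": "monitor",
                            "cited_log_id": "L1-0",
                            "rationale": "post-deploy smoke"}},
).json()
print(result)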

Repo map

| File / dir | Purpose |
| --- | --- |
| openenv.yaml | OpenEnv manifest (tasks, action space, reward range, endpoints) |
| schema.py | Incident / event / action schema with strict validators |
| generator.py | Materializes incidents for defender_only mode (eval, SFT) |
| verifier.py | Deterministic ground-truth labeler + plausibility checker |
| rubric.py | Layered defender + attacker reward functions |
| env.py | Two-role OpenSOCEnv (reset / step / state / grade) |
| app_runtime.py | FastAPI app exposing the OpenEnv API |
| demo_app.py | Gradio Blocks app mounted at /demo |
| demo_data.py | Pure-Python helpers for the demo UI |
| server.py | Container entry point: imports demo_app, then starts uvicorn |
| tasks/registry.py | Curriculum stages: stage1_basic → stage4_adversarial |
| client/ | Thin HTTP client (no dependency on server internals) |
| train/ | SFT warm-start + GRPO loop + reusable prompt format |
| eval/ | Hold-out generator, metrics, eval driver, plot renderers, bake_demo |
| scripts/run_full_pipeline.sh | One-shot training + eval + bake-demo |
| scripts/deploy_to_hf.sh | One-shot HF Space push |
| docs/ | Blog post, video script, slide deck builder |
| tests/ | Pytest suite (93 tests, anti-hack regressions included) |

Submission deliverables

Mapped to the four judging criteria:

| Criterion | Weight | Where it lives |
| --- | --- | --- |
| Environment Innovation | 40% | openenv.yaml, schema.py, verifier.py, env.py, this README's "Architecture" and "Why the reward cannot be hacked" sections |
| Storytelling & Presentation | 30% | /demo Gradio UI + 90s video + HF blog |
| Showing Improvement in Rewards | 20% | eval/results/*.png (training curves + confusion + headline bar), embedded above |
| Reward & Training Pipeline | 10% | rubric.py + 93-test anti-hack suite + train_grpo.ipynb + scripts/run_full_pipeline.sh |

Submission checklist:

  • OpenEnv-compatible env (gym-style API, manifest, non-reserved tool names)
  • Deterministic RLVR verifier + plausibility checker
  • Layered defender + attacker reward
  • SFT warm-start dataset (committed)
  • Frozen 200-incident hold-out (committed)
  • GRPO curriculum notebook + one-shot training script
  • Eval harness + plotters
  • Pytest suite (93 tests, anti-hack regressions included)
  • Gradio /demo UI mounted on the same Space (free-CPU-tier compatible)
  • Blog post (docs/blog.md)
  • HF Space pushed and running: shivam2k3/opensoc-env
  • SFT adapter trained and pushed: opensoc-defender-grpo-sft
  • GRPO adapters trained and pushed (4 stages): stage1 stage2 stage3 stage4
  • Final adapter pushed: opensoc-defender-grpo

License

BSD-3-Clause.
