# OpenSOC: Self-Play SOC Triage Environment
An OpenEnv environment for training cybersecurity defender LLMs against an attacker LLM that auto-generates novel incidents. Built for the OpenEnv Hackathon, April 2026.
Humans cannot watch every alert in a Security Operations Center 24/7, and as stronger generative models start writing exploits and phishing lures at industrial scale, that gap only widens. OpenSOC is an environment where a defender LLM learns to triage attacks generated by another LLM in a self-play loop. The trick is RLVR: triage ground truth is computed by a deterministic, schema-side verifier from the structured incident parameters, never from any text the attacker writes, so neither side can hack the reward.
## Try it

| Link | What it is |
|---|---|
| HF Space → shivam2k3-opensoc-env.hf.space | Deployed env (Running). OpenEnv judge can hit `/reset` `/step` `/state` `/grade`. |
| Live /demo → shivam2k3-opensoc-env.hf.space/demo | Gradio "before vs after" UI. Click Next incident to compare baseline vs trained. |
| Trained model → `shivam2k3/opensoc-defender-grpo` | GRPO-trained Qwen2.5-3B-Instruct LoRA defender adapter. |
| Training notebook → `train_grpo.ipynb` | End-to-end SFT warm-start + GRPO curriculum using Unsloth + TRL. |
| Mini-blog → `docs/blog.md` | ~600-word write-up of the project. |
## Table of contents
- Architecture
- Why the reward cannot be hacked
- Action space and reward
- Run locally
- Run the training pipeline
- Headline results
- Deploy to Hugging Face Spaces
- Repo map
- Submission deliverables
## Build status

| Build artifact | Status |
|---|---|
| Pure-python env (`OpenSOCEnv`, FastAPI) | ✅ shipped |
| Verifier + plausibility checker | ✅ shipped, 17-test adversarial suite |
| Rubric (defender + attacker rewards) | ✅ shipped, anti-hack regression tests |
| 600-example SFT dataset (`data/sft_train.jsonl`) | ✅ shipped |
| 200-incident frozen hold-out (`data/holdout.jsonl`) | ✅ shipped |
| SFT warm-start adapter | ✅ trained → `opensoc-defender-grpo-sft` |
| GRPO curriculum (4 stages) | ✅ trained → adapters for each stage on HF |
| Final GRPO adapter | ✅ `shivam2k3/opensoc-defender-grpo` |
| GRPO training notebook (`train_grpo.ipynb`) | ✅ shipped (ran on HF Jupyter with Unsloth + TRL) |
| Gradio "before vs after" UI | ✅ live at `/demo` |
| Eval harness + plotters (`eval/`) | ✅ shipped |
| Pytest suite | ✅ 93 tests, all green |
| HF Space | ✅ live at `shivam2k3/opensoc-env` |
## Architecture

```mermaid
flowchart LR
    Defender[Defender LLM trainee]
    Attacker[Attacker LLM trainee]
    Env[OpenSOC FastAPI Environment]
    Verifier[Deterministic verifier + plausibility check]
    Defender -->|submit_triage| Env
    Attacker -->|craft_incident| Env
    Env -->|observation reward| Defender
    Env -->|attacker reward| Attacker
    Env --> Verifier
    Verifier -->|ground truth label| Env
```
An episode has exactly two turns: the attacker proposes incident params → the env validates them and materializes a SIEM-style alert + log window → the defender submits a triage action. The verifier computes the ground-truth action from the events alone and scores both sides; the attacker's free-text narrative is never read by the labeler.

In `defender_only` mode (used for SFT, eval, smoke tests, and the `/demo` UI) the env auto-generates the incident from `tasks/registry.py` and skips straight to the defender turn.
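For concreteness, here is a sketch of one self-play episode through the bundled Python client. The `craft_incident` / `submit_triage` payload shapes follow the action-space section below; the specific event fields, the seed/task plumbing, and the assumption that the default mode runs the two-role game are illustrative rather than the canonical API.

```python
# Illustrative two-turn self-play episode against a local server.
# Assumption: the default mode (i.e. not defender_only) runs the two-role game
# and exposes craft_incident on turn 1 and submit_triage on turn 2.
from client import OpenSOCClient

c = OpenSOCClient()
obs = c.reset(task="stage4_adversarial", seed=7)          # attacker's turn

attack = {
    "craft_incident": {
        "target_label": "quarantine_host",                # ignored by the verifier
        "category": "malware_execution",
        "events": [
            {"event_type": "lolbin_use",                  # example event; fields are hypothetical
             "fields": {"process": "certutil.exe", "host": "WS-042"},
             "timestamp": "2026-04-01T09:12:00Z",
             "log_id": "L1-0"},
        ],
        "narrative": "certutil pulled a payload from a mirror site",
    }
}
obs = c.step(attack, task="stage4_adversarial", seed=7)   # env materializes the alert

defense = {
    "submit_triage": {
        "action": "quarantine_host",
        "cited_log_id": "L1-0",
        "rationale": "certutil download-and-execute on a workstation",
    }
}
result = c.step(defense, task="stage4_adversarial", seed=7)
print(result)   # both roles are scored by the deterministic verifier
```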
## Why the reward cannot be hacked

- The verifier is a transparent rule set in `verifier.compute_ground_truth(params)`; the only inputs are the structured events. The attacker's `narrative` and even its self-claimed `target_label` are ignored.
- The plausibility checker (`verifier.check_plausibility(params)`) refuses incoherent stories, for example a "data exfiltration" claim with a purely-internal destination, or a `lolbin_use` event with no `process` field. The attacker's reward is gated on plausibility passing.
- Schema-violation incidents floor the attacker reward at -0.5, so trying to short-circuit pydantic's validators is strictly worse than playing along.
The anti-hack invariants are pinned in `tests/test_verifier.py` and `tests/test_rubric.py`.
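To make "transparent rule set" concrete, here is a minimal sketch of the kind of rule the labeler applies. Only `lolbin_use` and the five triage actions are taken from this README; the other event types and the mapping below are hypothetical stand-ins, not the shipped `compute_ground_truth`.

```python
# Minimal sketch of a schema-side labeling rule, not the shipped verifier.
# Event types other than lolbin_use are hypothetical; the real rule set covers
# every category in tasks/registry.py.
def compute_ground_truth(params: dict) -> str:
    """Label an incident from its structured events; the narrative is never read."""
    types = {event["event_type"] for event in params["events"]}

    if "outbound_beacon" in types:
        return "block_ip"            # periodic callbacks to an external IP
    if "lolbin_use" in types and "file_write" in types:
        return "quarantine_host"     # living-off-the-land execution chain on one host
    if not types - {"dns_lookup", "scheduled_task"}:
        return "dismiss"             # nothing but low-signal noise
    return "monitor"                 # ambiguous evidence: watch, don't act
```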
## Action space and reward

Tool names are deliberately non-reserved: there is no `reset`/`step`/`state`/`close` clash with the OpenEnv `MCPEnvironment` reserved-name list.
```yaml
action_space:
  craft_incident:
    target_label: dismiss | monitor | quarantine_host | block_ip | escalate
    category: malware_execution | c2_beacon | data_exfiltration | ...
    events: [ { event_type, fields, timestamp, log_id }, ... ]
    narrative: string   # ignored by the verifier
  submit_triage:
    action: <one of the five triage actions>
    cited_log_id: <id of the log line that drove the decision>
    rationale: short string
```
- Defender: +1 correct, -1 missed-malicious, -0.3 over-react on benign, -0.05 unnecessary escalate, +0.1 bonus for citing the right triggering log id, -0.1 floor for format violation.
- Attacker: +1 iff defender wrong AND incident plausible, -0.5 if schema validation fails, +0.2 novelty bonus, 0 for gibberish.

Full breakdown: `openenv.yaml` and `rubric.py`.
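Read as code, the defender side of that rubric looks roughly like the sketch below. It is a paraphrase of the bullet points, not the shipped `rubric.py`; in particular, whether the citation bonus applies on wrong answers and how a wrong-but-active containment action is penalized are assumptions here.

```python
# Paraphrase of the defender scoring rules listed above; rubric.py is the
# source of truth and may order or weight these checks differently.
def defender_reward(action: str, cited_log_id: str, truth: str,
                    triggering_log_id: str, format_ok: bool) -> float:
    if not format_ok:
        return -0.1                              # format-violation floor
    if action == truth:
        bonus = 0.1 if cited_log_id == triggering_log_id else 0.0
        return 1.0 + bonus                       # correct triage (+ citation bonus)
    if truth == "dismiss":
        return -0.3                              # over-reacted to a benign incident
    if action == "dismiss":
        return -1.0                              # missed-malicious: the cardinal failure
    if action == "escalate":
        return -0.05                             # acted, but escalated unnecessarily
    return -0.05                                 # wrong containment action (assumed)
```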
## Run locally

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python server.py   # serves on :7860
```
Smoke test from another shell:
```bash
curl -s http://localhost:7860/health | jq .
curl -s -X POST 'http://localhost:7860/reset?task=stage1_basic&mode=defender_only' | jq .
curl -s -X POST 'http://localhost:7860/step?task=stage1_basic&mode=defender_only' \
  -H 'content-type: application/json' \
  -d '{"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "smoke"}}' | jq .
open http://localhost:7860/demo   # Gradio before-vs-after UI
```
Run the test suite (CPU only, no GPU deps):
```bash
pytest -q   # 93 passed
```
Or via the bundled Python client:
```python
from client import OpenSOCClient

c = OpenSOCClient()
obs = c.reset(task="stage1_basic", mode="defender_only", seed=1)
result = c.step({"submit_triage": {"action": "monitor", "cited_log_id": "L1-0", "rationale": "ok"}},
                task="stage1_basic", mode="defender_only", seed=1)
print(result)
```
## Run the training pipeline

Full end-to-end procedure: `TRAIN.md`. TL;DR: on an HF Jupyter L4 (~$3 of credits, ~3.5 h wall time):

```bash
bash scripts/run_full_pipeline.sh
```
Or step-by-step inside `train_grpo.ipynb`:

- SFT warm-start (~12 min): pushes P(format-OK) from ~0% to ~95%.
- GRPO curriculum across 4 stages (~3 h): verifier-grounded reward, group size 8.
- Eval on the frozen 200-incident hold-out (~5 min).
- `eval.plot_results` + `eval.plot_training` render four PNGs; `eval.bake_demo` writes 50 before-vs-after pairs to `data/demo_examples.json` for the Gradio UI.
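For orientation, one GRPO stage in the notebook boils down to the pattern below. The dataset column names, hyperparameters, and the toy reward are simplified assumptions; the real loop scores completions with the verifier-grounded `rubric.py` reward for each curriculum stage.

```python
# Simplified GRPO stage in the spirit of train_grpo.ipynb (Unsloth + TRL).
# Column names ("prompt", "ground_truth") and the stand-in reward are assumptions.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", max_seq_length=2048, load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# One row per incident: a "prompt" column plus the verifier label baked in.
dataset = load_dataset("json", data_files="data/sft_train.jsonl", split="train")

def triage_reward(completions, ground_truth, **kwargs):
    # Stand-in for rubric.defender_reward: +1 for the verifier's label,
    # -0.1 for anything that doesn't parse as a triage action.
    rewards = []
    for completion, truth in zip(completions, ground_truth):
        action = completion.strip().split()[0].lower() if completion.strip() else ""
        rewards.append(1.0 if action == truth else -0.1)
    return rewards

args = GRPOConfig(
    output_dir="outputs/stage1_basic",
    num_generations=8,              # group size used in the curriculum
    max_completion_length=128,
    learning_rate=5e-6,
)
GRPOTrainer(model=model, processing_class=tokenizer, reward_funcs=triage_reward,
            args=args, train_dataset=dataset).train()
```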
## Headline results
The defender model was trained using GRPO with a 4-stage curriculum on Qwen2.5-3B-Instruct with LoRA. All trained adapters are published on HuggingFace:
| Stage | Adapter | Difficulty |
|---|---|---|
| SFT warm-start | `opensoc-defender-grpo-sft` | Format learning |
| Stage 1 | `opensoc-defender-grpo-stage1_basic` | Easy: single-event templates |
| Stage 2 | `opensoc-defender-grpo-stage2_multi` | Medium: multi-event windows |
| Stage 3 | `opensoc-defender-grpo-stage3_mixed` | Hard: benign decoys interleaved |
| Stage 4 | `opensoc-defender-grpo-stage4_adversarial` | Adversarial: attacker-controlled |
| Final | `opensoc-defender-grpo` | Combined final adapter |
The four evaluation plots in `eval/results/` cover: dismiss-on-malicious rate (the cardinal failure mode), macro F1 across the 200-incident hold-out, confusion matrices, and reward across the curriculum.
| Model | Accuracy | Macro F1 | Dismiss-on-malicious | Over-react |
|---|---|---|---|---|
| `always_dismiss` (floor) | 0.13 | 0.05 | 1.00 | 0.00 |
| `verifier_oracle` (ceiling) | 1.00 | 1.00 | 0.00 | 0.00 |
## Deploy to Hugging Face Spaces

Full recipe: `DEPLOY.md`. The fast version, after `huggingface-cli login`:

```bash
export HF_USER=<your-username>
bash scripts/deploy_to_hf.sh
# Build takes ~5 minutes; then:
open https://${HF_USER}-opensoc-env.hf.space/demo
```
The Space runs FastAPI + Gradio in a single container. /reset, /step, /state, /grade, /tasks, /health continue to work for the OpenEnv judge bot; /demo is the human-readable UI.
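The single-container trick is just Gradio's FastAPI mount. A minimal sketch of what `server.py` does, assuming `demo_app` exports a Blocks object named `demo`:

```python
# Sketch of serving both surfaces from one process; server.py may differ in
# details, but gradio.mount_gradio_app is the standard pattern.
import gradio as gr
import uvicorn
from app_runtime import app   # FastAPI app with /reset, /step, /state, /grade
from demo_app import demo     # Gradio Blocks UI (export name assumed)

app = gr.mount_gradio_app(app, demo, path="/demo")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7860)   # the port the Space exposes
```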
## Repo map

| File / dir | Purpose |
|---|---|
| `openenv.yaml` | OpenEnv manifest (tasks, action space, reward range, endpoints) |
| `schema.py` | Incident / event / action schema with strict validators |
| `generator.py` | Materializes incidents for defender_only mode (eval, SFT) |
| `verifier.py` | Deterministic ground-truth labeler + plausibility checker |
| `rubric.py` | Layered defender + attacker reward functions |
| `env.py` | Two-role `OpenSOCEnv` (reset / step / state / grade) |
| `app_runtime.py` | FastAPI app exposing the OpenEnv API |
| `demo_app.py` | Gradio Blocks app mounted at `/demo` |
| `demo_data.py` | Pure-python helpers for the demo UI |
| `server.py` | Container entry point: imports demo_app then starts uvicorn |
| `tasks/registry.py` | Curriculum stages: stage1_basic → stage4_adversarial |
| `client/` | Thin HTTP client (server-internals-free) |
| `train/` | SFT warm-start + GRPO loop + reusable prompt format |
| `eval/` | Hold-out generator, metrics, eval driver, plot renderers, bake_demo |
| `scripts/run_full_pipeline.sh` | One-shot training + eval + bake-demo |
| `scripts/deploy_to_hf.sh` | One-shot HF Space push |
| `docs/` | Blog post, video script, slide deck builder |
| `tests/` | Pytest suite (93 tests, anti-hack regressions included) |
## Submission deliverables
Mapped to the four judging criteria:
| Criterion | Weight | Where it lives |
|---|---|---|
| Environment Innovation | 40% | openenv.yaml, schema.py, verifier.py, env.py, this README's Architecture and Why the reward cannot be hacked sections |
| Storytelling & Presentation | 30% | /demo Gradio UI + 90s video + HF blog |
| Showing Improvement in Rewards | 20% | eval/results/*.png (training curves + confusion + headline bar) embedded above |
| Reward & Training Pipeline | 10% | rubric.py + 93-test anti-hack suite + train_grpo.ipynb + scripts/run_full_pipeline.sh |
Submission checklist:

- OpenEnv-compatible env (gym-style API, manifest, non-reserved tool names)
- Deterministic RLVR verifier + plausibility checker
- Layered defender + attacker reward
- SFT warm-start dataset (committed)
- Frozen 200-incident hold-out (committed)
- GRPO curriculum notebook + one-shot training script
- Eval harness + plotters
- Pytest suite (93 tests, anti-hack regressions included)
- Gradio `/demo` UI mounted on the same Space (free-CPU-tier compatible)
- Blog post (`docs/blog.md`)
- HF Space pushed and running: `shivam2k3/opensoc-env`
- SFT adapter trained and pushed: `opensoc-defender-grpo-sft`
- GRPO adapters trained and pushed (4 stages): `stage1` `stage2` `stage3` `stage4`
- Final adapter pushed: `opensoc-defender-grpo`
## License
BSD-3-Clause.




