MEMS Adaptive Wafer Inspection — GRU-Augmented DQN Infrastructure

Core RL components for a resolution-agnostic, recurrent policy solving a budget-constrained spatial inspection task.

The trained policy passes 14 of 15 scenarios on an adversarial validation suite spanning sensor noise, calibration drift, fabrication-process variation, and inspection-budget reduction, at three escalating severity tiers (production, stress, extreme). Catch rate stays in the 0.96–0.97 range across all non-budget perturbations, including sensor noise up to 5% and a doubled defect rate. This indicates robustness to sensing and process variation outside the training distribution. Under inspection-budget reduction, catch rate degrades approximately linearly with available budget (0.97 at full budget, down to 0.68 at 70% budget); full results and discussion are in the companion paper.

The companion paper describes the system design, evaluation methodology, full per-scenario results, and a discussion of the budget-constrained degradation finding, including a proposed architectural extension (shape-aware belief propagation) to address it.

What is included

gru_belief_policy_v3_agnostic.py — Resolution-agnostic GRU-augmented features extractor

Native-resolution convolutions + AdaptiveAvgPool2d(8, 8) → fixed 4096-dim spatial features regardless of input grid size (64×64 to 512×512+)
External GRU belief accumulator
Produces compact checkpoints in the tens-of-megabytes range, independent of training duration
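The resolution-agnostic idea can be sketched in a few lines of PyTorch. This is a hypothetical minimal encoder, not the contents of gru_belief_policy_v3_agnostic.py: the class name, the 3 input channels, and the two-layer conv stack are assumptions. It shows that nn.AdaptiveAvgPool2d((8, 8)) fixes the output grid at 8×8 whatever the input size, so with 64 output channels the flattened feature vector is always 64 × 8 × 8 = 4096-dimensional, matching the figure above.

```python
import torch
import torch.nn as nn


class ResolutionAgnosticEncoder(nn.Module):
    """Hypothetical sketch: conv stack at native resolution, then a fixed 8x8 pool."""

    def __init__(self, in_channels: int = 3, out_channels: int = 64):
        super().__init__()
        # Stride-2 convolutions shrink the grid but do not fix its size.
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # AdaptiveAvgPool2d maps any H x W feature map to exactly 8 x 8.
        self.pool = nn.AdaptiveAvgPool2d((8, 8))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) with arbitrary H, W
        return torch.flatten(self.pool(self.convs(x)), start_dim=1)


if __name__ == "__main__":
    enc = ResolutionAgnosticEncoder()
    for size in (64, 128, 512):
        feats = enc(torch.zeros(2, 3, size, size))
        print(size, tuple(feats.shape))  # (2, 4096) for every size
```

Because the pooled size, not the input size, determines the shape seen by downstream layers, the same weights can be evaluated on 64×64 and 512×512 grids, and the parameter count (and hence checkpoint size) does not grow with the grid.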

gru_env_wrappers.py — Safe GRU state management wrappers

GRUStateManager: external hidden-state handling, truncated BPTT (detach every 32 steps), clean episode boundaries with no state leakage
GRUBeliefLiteWrapper: per-episode domain randomization (budget, sensor noise, calibration drift, prior belief)
CurriculumGRUToggleWrapper: tracks curriculum phase for logging (GRU itself remains statically configured at model construction time)
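The two GRUStateManager rules listed above (detach every 32 steps, reset at episode boundaries) can be illustrated with a small standalone class. This is a sketch under assumptions, not the repository's GRUStateManager: the class name TruncatedGRUState, the use of nn.GRUCell, and the boolean episode_start tensor are all hypothetical.

```python
import torch
import torch.nn as nn


class TruncatedGRUState:
    """Hypothetical sketch of external GRU hidden-state handling with truncated BPTT."""

    def __init__(self, gru_cell: nn.GRUCell, batch_size: int, tbptt_steps: int = 32):
        self.cell = gru_cell
        self.tbptt_steps = tbptt_steps
        self.hidden = torch.zeros(batch_size, gru_cell.hidden_size)
        self.steps_since_detach = 0

    def step(self, features: torch.Tensor, episode_start: torch.Tensor) -> torch.Tensor:
        # Zero the hidden state of every environment whose episode just started,
        # so no belief leaks across episode boundaries.
        keep = (~episode_start).float().unsqueeze(1)
        self.hidden = self.cell(features, self.hidden * keep)
        self.steps_since_detach += 1
        # Truncated BPTT: cut the autograd graph every `tbptt_steps` steps.
        if self.steps_since_detach >= self.tbptt_steps:
            self.hidden = self.hidden.detach()
            self.steps_since_detach = 0
        return self.hidden
```

Zeroing only the rows whose episode just started lets several environments share one batched hidden state without any belief carrying over into a fresh episode.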

masked_dqn_policy.py — Masked DQN extensions for Stable-Baselines3

Correct action masking during both exploitation (arg-max) and ε-greedy exploration
Full support for 2D spatial masks with automatic flattening and padding for the terminate action
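A minimal NumPy sketch of the masking logic described above, assuming the terminate action is appended as the last index after the flattened H×W grid. The function names and that index layout are illustrative assumptions, not the API of masked_dqn_policy.py. Both branches only ever return indices the mask allows: exploitation sets invalid Q-values to -inf before the argmax, and exploration samples uniformly from the valid indices instead of from the full action space.

```python
import numpy as np


def flatten_mask_with_terminate(spatial_mask: np.ndarray, allow_terminate: bool = True) -> np.ndarray:
    """Flatten an (H, W) boolean mask and append one slot for the terminate action."""
    flat = spatial_mask.reshape(-1).astype(bool)
    return np.append(flat, allow_terminate)


def masked_epsilon_greedy(q_values: np.ndarray, mask: np.ndarray, epsilon: float,
                          rng: np.random.Generator) -> int:
    """Pick an action that is valid in both the exploration and exploitation branches."""
    valid = np.flatnonzero(mask)
    if valid.size == 0:
        raise ValueError("mask has no valid actions")
    if rng.random() < epsilon:
        # Exploration: sample uniformly among valid actions only.
        return int(rng.choice(valid))
    # Exploitation: invalid actions get -inf so argmax can never select them.
    masked_q = np.where(mask, q_values, -np.inf)
    return int(np.argmax(masked_q))
```

Sampling from the valid set, rather than sampling from all actions and rejecting invalid ones, keeps exploration well-defined even late in an episode when only a few cells remain inspectable.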

mems_adaptive_inspection_env_curriculum_v5_SOFT_RESET_STABLE.py — Resolution-agnostic inspection environment

GPU-accelerated belief updates, observation construction, and mask generation
Soft-reset mode for fast episode-to-episode reuse of GPU buffers
Early-termination action with proportional miss penalty, allowing the agent to learn a rational stopping rule
Economic randomization (budget/cost variation across episodes) for robustness training
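The soft-reset, economic-randomization, and early-termination ideas can be sketched as follows. Everything here is hypothetical: the class and function names, the ±20% budget jitter, and the assumption that the miss penalty is linear in the number of unfound defects are illustrative choices, not the environment's actual parameters. The key point for soft reset is that fill_ and zero_ write into existing tensor storage, so no new GPU memory is allocated between episodes.

```python
import torch


class SoftResetBuffers:
    """Hypothetical sketch: preallocated belief/visited buffers reused across episodes."""

    def __init__(self, height: int, width: int, device: str = "cpu", seed: int = 0):
        self.belief = torch.empty(height, width, device=device)
        self.visited = torch.zeros(height, width, dtype=torch.bool, device=device)
        self.gen = torch.Generator().manual_seed(seed)
        self.budget = 0

    def soft_reset(self, prior: float, base_budget: int, budget_jitter: float = 0.2) -> None:
        # In-place writes reuse existing storage instead of reallocating memory.
        self.belief.fill_(prior)
        self.visited.zero_()
        # Economic randomization (assumed form): scale budget by a factor in [1 - j, 1 + j].
        u = torch.rand(1, generator=self.gen).item()
        self.budget = max(1, round(base_budget * (1 - budget_jitter + 2 * budget_jitter * u)))


def terminate_penalty(num_defects: int, num_found: int, miss_cost: float) -> float:
    """Assumed early-termination penalty: linear in the number of defects left unfound."""
    return -miss_cost * (num_defects - num_found)
```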

fair_adversarial_validation_framework.py — Adversarial test suite

15 scenarios across five categories (baseline, sensor robustness, distribution robustness, efficiency, economic) at three severity tiers
Per-scenario calibrated pass thresholds and full statistics (mean, std, min, max) across independent episodes
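Per-scenario summarization and the pass/fail check might look like the sketch below. The dataclass, the use of the population standard deviation, and the rule that a scenario passes when its mean catch rate meets its threshold are assumptions. The last one is consistent with the 90% budget scenario failing at 0.865 against 0.88, but the framework's exact criterion is not stated here.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScenarioResult:
    name: str
    mean: float
    std: float
    min_rate: float
    max_rate: float
    threshold: float
    passed: bool


def evaluate_scenario(name: str, catch_rates: Sequence[float], threshold: float) -> ScenarioResult:
    """Summarize per-episode catch rates and compare the mean to a calibrated threshold."""
    mean = statistics.fmean(catch_rates)
    return ScenarioResult(
        name=name,
        mean=mean,
        std=statistics.pstdev(catch_rates),  # population std; works for a single episode too
        min_rate=min(catch_rates),
        max_rate=max(catch_rates),
        threshold=threshold,
        passed=mean >= threshold,
    )
```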

run_fair_adversarial_validation.py — Validation runner

Loads a trained checkpoint and runs it against the full adversarial suite
Produces a results JSON and summary plots (individual results, performance by category, performance by difficulty tier)
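The per-category and per-tier views in the summary plots reduce to grouping per-scenario means. A sketch of that aggregation and the JSON dump, assuming a hypothetical record schema with category, tier, mean_catch_rate, and passed keys (the actual results JSON layout of run_fair_adversarial_validation.py may differ):

```python
import json
from collections import defaultdict
from statistics import fmean


def summarize(results: list) -> dict:
    """Group per-scenario mean catch rates by category and by severity tier."""
    by_category = defaultdict(list)
    by_tier = defaultdict(list)
    for r in results:
        by_category[r["category"]].append(r["mean_catch_rate"])
        by_tier[r["tier"]].append(r["mean_catch_rate"])
    return {
        "n_scenarios": len(results),
        "n_passed": sum(bool(r["passed"]) for r in results),
        "mean_catch_rate": fmean([r["mean_catch_rate"] for r in results]),
        "by_category": {k: fmean(v) for k, v in by_category.items()},
        "by_tier": {k: fmean(v) for k, v in by_tier.items()},
    }


def write_report(results: list, path: str) -> None:
    """Write raw per-scenario records plus the grouped summary to one JSON file."""
    with open(path, "w") as f:
        json.dump({"scenarios": results, "summary": summarize(results)}, f, indent=2)
```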

gru_phase2_train.py — Training script

Masked DQN with GRU-augmented MultiInputPolicy
Economic randomization enabled during training
Checkpointing with automatic retention of the most recent checkpoints
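One way to retain only the most recent checkpoints is to prune after each save. This hypothetical helper assumes files follow the {prefix}_{steps}_steps.zip naming used by Stable-Baselines3's CheckpointCallback; whether gru_phase2_train.py uses that callback, and how many checkpoints it keeps, is not stated here.

```python
import re
from pathlib import Path

# Matches e.g. "model_250000_steps.zip" and captures the step count.
_STEP_RE = re.compile(r"_(\d+)_steps\.zip$")


def prune_checkpoints(directory: str, keep_last: int = 3) -> list:
    """Delete all but the `keep_last` checkpoints with the highest step counts."""
    ckpts = []
    for path in Path(directory).glob("*_steps.zip"):
        match = _STEP_RE.search(path.name)
        if match:
            ckpts.append((int(match.group(1)), path))
    ckpts.sort(reverse=True)  # newest (highest step count) first
    for _, old in ckpts[keep_last:]:
        old.unlink()
    return [p for _, p in ckpts[:keep_last]]
```

Sorting on the parsed step count rather than on the file name avoids lexical-ordering mistakes (a string sort would treat 9000_steps as newer than 10000_steps), and unlike file modification times it survives copying checkpoints between machines.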

Validation results

Full results (15 scenarios, corrected adversarial validation suite):

Category	Scenarios	Catch rate (value or range)
Baseline (sanity check)	1	0.972
Sensor robustness (noise, drift, quantization, local degradation)	6	0.963–0.973
Distribution robustness (fab variation, defect-rate shifts)	3	0.960–0.969
Economic (cost pressure)	1	0.964
Efficiency (budget reduction)	3	0.677–0.865
Combined (compound stress)	1	0.769

Overall: 14/15 pass, mean catch rate 0.915 across all scenarios.

Catch rate is consistently in the 0.96–0.97 range across every non-budget perturbation, including the most severe sensor-noise and defect-rate-shift scenarios tested, and is statistically indistinguishable from the unperturbed baseline. Under inspection-budget reduction, catch rate degrades approximately linearly with available budget; the lone failing scenario (90% budget, catch rate 0.865 against a 0.88 threshold) falls on this same continuous trend rather than indicating an isolated failure mode. See the companion paper for the full discussion, including why this finding does not indicate unpredictable behavior under budget pressure, and for a proposed architectural extension that targets it directly.
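As a quick arithmetic check of the "approximately linear" claim, a least-squares line can be fitted to the three budget levels whose catch rates are stated in this README. Assumptions: the unperturbed baseline (0.972) represents full budget, the failing scenario (0.865) is at 90% budget, and the lowest efficiency result (0.677) is at 70% budget. The middle efficiency scenario is omitted because its individual value is not listed.

```python
import numpy as np

# Budget fraction (assumed mapping, see above) and catch rates reported in this README.
budget = np.array([1.0, 0.9, 0.7])
catch = np.array([0.972, 0.865, 0.677])

# Degree-1 least-squares fit and Pearson correlation.
slope, intercept = np.polyfit(budget, catch, deg=1)
r = np.corrcoef(budget, catch)[0, 1]
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")
# slope ≈ 0.98, intercept ≈ -0.01, r ≈ 0.999
```

A slope close to 1 with an intercept near 0 means catch rate is roughly proportional to available budget: removing 10% of the budget costs about 0.1 in catch rate, which puts the 90% budget scenario just under its 0.88 threshold. With only three points this is a sanity check, not a statistical test.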

Dependencies

torch >= 2.0
stable-baselines3 >= 2.7
gymnasium >= 0.29
numpy

GPU (CUDA) recommended for training and validation; CPU fallback available in the environment and wrapper implementations.

Notes

This repository accompanies the related paper. Results reported here and in the paper are obtained on wafers generated on the fly by a synthetic wafer defect generator (process type, defect rate, and spatial clustering configurable); see the paper for a discussion of how this relates to real fabrication-line deployment.
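A generator with a configurable defect rate and spatial clustering can be sketched as a mix of uniformly scattered defects and a Gaussian cluster around a random center. This is a hypothetical illustration only: the function, its parameters, and the single-cluster model are assumptions, process type is not modeled, and the repository's generator may work quite differently.

```python
import numpy as np


def synthetic_defect_map(size: int, defect_rate: float, cluster_fraction: float,
                         cluster_radius: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical generator: uniformly scattered plus clustered defects on a size x size grid."""
    n_defects = rng.binomial(size * size, defect_rate)
    n_clustered = int(round(n_defects * cluster_fraction))
    defects = np.zeros((size, size), dtype=bool)
    # Uniformly scattered defects (duplicates simply overlap).
    idx = rng.integers(0, size, size=(n_defects - n_clustered, 2))
    defects[idx[:, 0], idx[:, 1]] = True
    # Clustered defects: Gaussian scatter around one random center, clipped to the wafer grid.
    if n_clustered:
        center = rng.integers(0, size, size=2)
        pts = np.rint(center + rng.normal(0.0, cluster_radius, size=(n_clustered, 2))).astype(int)
        pts = np.clip(pts, 0, size - 1)
        defects[pts[:, 0], pts[:, 1]] = True
    return defects
```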

The current environment implementation is optimized for a single fast GPU-resident instance via soft reset rather than for multi-process parallelism.
