FOXES: A Framework For Operational X-ray Emission Synthesis

This repository contains the code and resources for FOXES, a project developed as part of the Frontier Development Lab's Heliolab 2025!

Model / Data:

https://huggingface.co/spaces/griffingoodwin04/FOXES-model

https://huggingface.co/datasets/griffingoodwin04/FOXES-Data

Abstract

The solar soft X-ray (SXR) irradiance is a long-standing proxy of solar activity, used for the classification of flare strength. As a result, the flare class and the SXR light curve are routinely used as primary inputs to many forecasting methods, from coronal mass ejection speed to energetic particle output. However, the SXR irradiance lacks spatial information, leading to ambiguous classification during periods of high activity, and is available only for observations from Earth orbit, hindering forecasting elsewhere in the heliosphere. This work introduces the Framework for Operational X-ray Emission Synthesis (FOXES), a Vision Transformer-based approach for translating spatially resolved Extreme Ultraviolet (EUV) observations into SXR irradiance predictions. The model produces two outputs: (1) a global SXR flux prediction and (2) per-patch flux contributions, which offer a spatially resolved interpretation of where the model attributes SXR emission. This paves the way for EUV-based flare detection to be extended beyond Earth's line of sight, allowing a more comprehensive and reliable flare catalog to support robust, scalable, real-time forecasting and extending our monitoring into a true multi-viewpoint system.
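The two outputs are linked by a simple constraint: the global SXR flux can be read as the sum of the per-patch contributions. A toy numpy illustration of that decomposition (not the trained FOXES model; the patch size, image, and per-patch "heads" below are placeholders):

```python
import numpy as np

def patchify(img, patch):
    """Split a square image into non-overlapping (patch x patch) tiles."""
    h, w = img.shape
    tiles = img.reshape(h // patch, patch, w // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch, patch)

# Stand-in for the model's learned per-patch flux heads: here, just the
# mean EUV intensity per patch times a dummy scale factor.
img = np.arange(16.0).reshape(4, 4)
patches = patchify(img, 2)                  # 4 patches of shape (2, 2)
patch_flux = patches.mean(axis=(1, 2)) * 1e-6

global_flux = patch_flux.sum()              # global prediction = sum of patch fluxes
```

The per-patch vector is what gets rendered as a flux map, while the sum is compared against the GOES SXR target.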

Team: Griffin Goodwin, Alison March, Jayant Biradar, Christopher Schirninger, Robert Jarolim, Angelos Vourlidas, Viacheslav Sadykov, Lorien Pratt


Repository Structure

FOXES
β”œβ”€β”€ analysis                     # Post-inference analysis scripts
β”‚   β”œβ”€β”€ flux_map_analysis.py     # Detect, track, and visualize active regions from flux maps
β”‚   β”œβ”€β”€ flux_map_config.yaml     # Config for flux_map_analysis.py
β”‚   └── spatial_performance.py   # Flux-weighted spatial error heatmap on the solar disk
β”œβ”€β”€ data                         # Data cleaning and preprocessing
β”‚   β”œβ”€β”€ align_data.py            # Align AIA and SXR timestamps; save matched pairs
β”‚   β”œβ”€β”€ euv_data_cleaning.py     # EUV image quality filtering and cleaning
β”‚   β”œβ”€β”€ iti_data_processing.py   # ITI (image-to-image translation) preprocessing
β”‚   β”œβ”€β”€ process_data_pipeline.py # End-to-end preprocessing orchestrator
β”‚   β”œβ”€β”€ split_data.py            # Split processed data into train/val/test by date
β”‚   β”œβ”€β”€ sxr_data_processing.py   # Combine raw GOES .nc files into per-satellite CSVs
β”‚   β”œβ”€β”€ sxr_normalization.py     # Compute log-normalization stats (mean/std) on SXR
β”‚   β”œβ”€β”€ pipeline_config.py       # Dataclass config for process_data_pipeline.py
β”‚   └── pipeline_config.yaml     # YAML config for process_data_pipeline.py
β”œβ”€β”€ download                     # Dataset download utilities
β”‚   β”œβ”€β”€ download_sdo.py          # Download SDO/AIA EUV images from JSOC
β”‚   β”œβ”€β”€ sxr_downloader.py        # Download GOES SXR flux data
β”‚   β”œβ”€β”€ hugging_face_data_download.py  # Download pre-processed data from HuggingFace Hub
β”‚   β”œβ”€β”€ parquet_to_npy.py        # Convert locally-downloaded HF parquet files to .npy
β”‚   └── hf_download_config.yaml  # Config for HuggingFace downloader and parquet_to_npy
β”œβ”€β”€ forecasting                  # Model training and inference
β”‚   β”œβ”€β”€ data_loaders
β”‚   β”‚   └── SDOAIA_dataloader.py # PyTorch Lightning DataModule for AIA+SXR
β”‚   β”œβ”€β”€ inference
β”‚   β”‚   β”œβ”€β”€ inference.py         # Batch inference; writes predictions.csv
β”‚   β”‚   β”œβ”€β”€ evaluation.py        # Compute metrics and generate evaluation plots
β”‚   β”‚   β”œβ”€β”€ local_config.yaml    # Config for inference.py
β”‚   β”‚   └── evaluation_config.yaml  # Config for evaluation.py
β”‚   β”œβ”€β”€ models
β”‚   β”‚   └── vit_patch_model_local.py   # ViTLocal: Vision Transformer with patch flux heads
β”‚   └── training
β”‚       β”œβ”€β”€ train.py             # Train the ViTLocal model
β”‚       └── train_config.yaml    # Training hyperparameters and data paths
β”œβ”€β”€ pipeline_config.yaml         # Top-level pipeline orchestration config
β”œβ”€β”€ run_pipeline.py              # End-to-end pipeline orchestrator
└── requirements.txt             # Python dependencies

Setup

1) Clone

git clone https://github.com/griffin-goodwin/FOXES.git
cd FOXES

2) Create an environment

Option A β€” pip:

conda create -n foxes python=3.14 -y
conda activate foxes
pip install -r requirements.txt

Option B β€” conda (full environment):

conda env create -f foxes.yml
conda activate foxes

Running the Pipeline

FOXES uses a single orchestrator script (run_pipeline.py) and a top-level config (pipeline_config.yaml) to run any combination of pipeline steps in order.

Pipeline Steps

#    Step                 Description
0    hf_download          Download pre-processed, pre-split data from HuggingFace (replaces steps 1–6)
0b   parquet_to_npy       Convert already-downloaded HF parquet files to .npy (skips network download)
1    download_aia         Download SDO/AIA EUV images from JSOC
2    download_sxr         Download GOES SXR flux data
3    combine_sxr          Combine raw GOES .nc files into per-satellite CSVs
4    preprocess           EUV cleaning, ITI processing, and AIA/SXR alignment
5    split                Split AIA and SXR data into train/val/test by date range
6    normalize            Compute SXR log-normalization statistics (mean/std)
7    train                Train the ViTLocal solar flare forecasting model
8    inference            Run batch inference and save a predictions CSV
9    evaluate             Compute metrics and generate evaluation plots
10   ablation             Run channel-masking ablation study on a pretrained model
11   spatial_performance  Generate flux-weighted spatial error heatmap on the solar disk
12   flux_map_analysis    Detect and track active regions from flux maps; render frames and a movie
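Step 6 (normalize) produces the statistics later used to standardize SXR targets. A minimal sketch of what log-normalization stats look like, assuming the pipeline standardizes log10 flux (the function name is illustrative, not the script's actual API):

```python
import numpy as np

def compute_log_norm_stats(fluxes):
    """Mean/std of log10(flux), skipping non-positive values.

    The returned pair is what a stats file like normalized_sxr.npy
    would hold: [mean, std].
    """
    f = np.asarray(fluxes, dtype=float)
    logf = np.log10(f[f > 0])
    return float(logf.mean()), float(logf.std())

mean, std = compute_log_norm_stats([1e-6, 1e-4])
# A target y is then standardized as (log10(y) - mean) / std.
```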

Usage

# List all available steps
python run_pipeline.py --list

# Run the full pipeline (from raw data)
python run_pipeline.py --config pipeline_config.yaml --steps all

# Quick-start: download pre-processed data from HuggingFace, then train
python run_pipeline.py --config pipeline_config.yaml --steps hf_download,train,inference,evaluate

# Already have parquet files locally? Convert them to .npy, then train
python run_pipeline.py --config pipeline_config.yaml --steps parquet_to_npy,train,inference,evaluate

# Run specific steps
python run_pipeline.py --config pipeline_config.yaml --steps train,inference,evaluate

# Force re-run of preprocessing even if outputs already exist
python run_pipeline.py --config pipeline_config.yaml --steps preprocess --force
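Internally, an orchestrator like this typically maps step names to callables and runs the requested subset in fixed pipeline order, regardless of the order given on the command line. A minimal sketch of that dispatch pattern (step names follow the table above; the real run_pipeline.py may differ):

```python
# Ordered registry: step name -> callable. Stubs stand in for real steps
# and record their execution order in `log`.
def make_registry(log):
    return {
        "hf_download": lambda: log.append("hf_download"),
        "train":       lambda: log.append("train"),
        "inference":   lambda: log.append("inference"),
        "evaluate":    lambda: log.append("evaluate"),
    }

def run_steps(steps_arg, registry):
    """Run '--steps all' or a comma-separated subset, in registry order."""
    wanted = list(registry) if steps_arg == "all" else steps_arg.split(",")
    unknown = [s for s in wanted if s not in registry]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")
    for name in registry:  # dict insertion order = pipeline order
        if name in wanted:
            registry[name]()
```

With this pattern, `--steps evaluate,train` still runs train before evaluate, matching the numbered step order.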

Downloading Data from HuggingFace

The hf_download step pulls pre-processed, pre-split AIA and SXR data directly from the FOXES HuggingFace dataset, skipping the raw download, preprocessing, and split steps entirely. Configure it via download/hf_download_config.yaml:

# Source
repo_id: "griffingoodwin04/FOXES"

# Output β€” AIA and SXR .npy files are saved under these directories
# "validation" maps to a local "val/" folder to match the rest of the pipeline
aia_dir: "/Volumes/T9/AIA_hg_processed"
sxr_dir: "/Volumes/T9/SXR_hg_processed"

# Splits to download (any subset of: train, validation, test)
splits:
  - train
  - validation
  - test

# Subsampling β€” set subsample: true to download a random subset
subsample: false
subsample_seed: 42
subsample_n: 1000    # exact count per split; set to null to use subsample_frac instead
subsample_frac: 0.1  # fraction per split, used only when subsample_n is null

# Shuffle buffer: rows held in memory before sampling begins.
# Larger = better randomness but more data pre-fetched before the first file is saved.
# Rule of thumb: ~3x subsample_n, or ~500 for frac-based sampling.
shuffle_buffer_size: 500

# Parallel disk-write threads (I/O bound, so > CPU count is fine)
num_workers: 8

# Log progress every N rows submitted
print_every: 500

Run the downloader standalone:

python download/hugging_face_data_download.py --config download/hf_download_config.yaml
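The shuffle_buffer_size knob behaves like a streaming shuffle buffer: rows are held in a fixed-size pool and emitted in a locally randomized order, which is why a larger buffer gives better randomness at the cost of pre-fetching more rows before the first file is saved. A self-contained sketch of the idea (not the downloader's actual code):

```python
import random

def shuffle_buffer(stream, buffer_size, seed=42):
    """Yield items from `stream` in a locally shuffled order."""
    rng = random.Random(seed)
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) >= buffer_size:
            # Once the buffer is full, emit a random element per new item.
            yield buf.pop(rng.randrange(len(buf)))
    rng.shuffle(buf)  # drain the remainder
    yield from buf
```

Taking the first subsample_n items from such a stream then approximates a uniform random subset of the split.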

Converting Local Parquet Files to .npy

If you've already downloaded the HuggingFace parquet files (e.g., via huggingface-cli or the HF web UI), use parquet_to_npy.py to convert them directly β€” no network connection needed. The output is identical to what hf_download produces.

# All splits at once β€” parquet_root should contain train/, validation/, test/ subdirs
python download/parquet_to_npy.py \
    --parquet_root /path/to/parquet \
    --config download/hf_download_config.yaml

# Single split
python download/parquet_to_npy.py \
    --parquet_dir /path/to/parquet/train \
    --split train \
    --aia_dir /Volumes/T9/AIA_hg_processed \
    --sxr_dir /Volumes/T9/SXR_hg_processed
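Per row, the conversion itself is small: each record carries an AIA image array and a scalar flux, saved as matching .npy files keyed by timestamp in the two output directories. A sketch of that inner loop with the parquet reading elided (the column names timestamp, aia, and xrsb_flux are assumptions, not the script's guaranteed schema):

```python
import numpy as np
from pathlib import Path

def save_row(row, aia_dir, sxr_dir):
    """Write one record's image and flux as <timestamp>.npy in each dir."""
    name = f"{row['timestamp']}.npy"
    aia_dir, sxr_dir = Path(aia_dir), Path(sxr_dir)
    aia_dir.mkdir(parents=True, exist_ok=True)
    sxr_dir.mkdir(parents=True, exist_ok=True)
    np.save(aia_dir / name, np.asarray(row["aia"], dtype=np.float32))
    np.save(sxr_dir / name, np.float32(row["xrsb_flux"]))
```

Because AIA and SXR files share a timestamp-based filename, the dataloader can pair them by name alone.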

Configure it via pipeline_config.yaml to use it as a pipeline step:

parquet_to_npy:
  config: "download/hf_download_config.yaml"  # provides aia_dir, sxr_dir, num_workers
  parquet_root: "/path/to/your/parquet"        # dir with train/, validation/, test/ subdirs

Configuration

Edit pipeline_config.yaml to set data paths, date ranges, and hyperparameters. Each step has its own section, and an overrides block lets you override values from the step's base config without editing it directly.

# Example: override training hyperparameters from the top-level config
train:
  config: "forecasting/training/train_config.yaml"
  overrides:
    epochs: 150
    batch_size: 6

# Example: override inference data paths
inference:
  config: "forecasting/inference/local_config.yaml"
  overrides:
    data:
      checkpoint_path: "/path/to/your/checkpoint.ckpt"
    output_path: "/path/to/predictions.csv"

Steps can also be run individually by calling their scripts directly:

python forecasting/training/train.py -config forecasting/training/train_config.yaml
python forecasting/inference/inference.py -config forecasting/inference/local_config.yaml
python forecasting/inference/evaluation.py -config forecasting/inference/evaluation_config.yaml
python analysis/flux_map_analysis.py --config analysis/flux_map_config.yaml

Data Directory Layout

After preprocessing and splitting, data should be organized as follows:

/your/data/dir/FOXES/
β”œβ”€β”€ AIA_raw/                    # Raw downloaded AIA FITS files
β”œβ”€β”€ AIA_processed/              # ITI-processed AIA .npy arrays
β”‚   β”œβ”€β”€ train/
β”‚   β”œβ”€β”€ val/
β”‚   └── test/
β”œβ”€β”€ SXR_raw/                    # Raw GOES .nc files
β”‚   └── combined/               # Per-satellite combined CSVs (from combine_sxr step)
β”œβ”€β”€ SXR_processed/              # Aligned SXR .npy scalars (xrsb_flux, one per timestamp)
β”‚   β”œβ”€β”€ train/
β”‚   β”œβ”€β”€ val/
β”‚   β”œβ”€β”€ test/
β”‚   └── normalized_sxr.npy      # Log-normalization stats [mean, std]
└── inference/
    β”œβ”€β”€ predictions.csv          # Model output from inference step
    β”œβ”€β”€ weights/                 # Per-image attention maps (optional)
    β”œβ”€β”€ flux/                    # Map of flux contributions from each patch (optional)
    └── evaluation/              # Metrics and plots from evaluation step
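With this layout, a model output in normalized log space can be mapped back to physical flux using the stats in normalized_sxr.npy. A sketch, assuming that file stores [mean, std] of log10 flux (consistent with the normalize step above):

```python
import numpy as np

def denormalize_sxr(y_norm, stats_path):
    """Invert the log-normalization: flux = 10 ** (y_norm * std + mean)."""
    mean, std = np.load(stats_path)
    return 10.0 ** (np.asarray(y_norm) * std + mean)
```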

Citation

If you use this code or data in your work, please cite:

@software{FOXES,
    title           = {{FOXES: A Framework For Operational X-ray Emission Synthesis}},
    institution     = {Frontier Development Lab (FDL)},
    repository-code = {https://github.com/griffin-goodwin/FOXES},
    version         = {v1.0},
    year            = {2026}
}

Acknowledgement

This work is a research product of Heliolab (heliolab.ai), an initiative of the Frontier Development Lab (FDL.ai). FDL is a public–private partnership between NASA, Trillium Technologies (trillium.tech), and commercial AI partners including Google Cloud and NVIDIA. Heliolab was designed, delivered, and managed by Trillium Technologies Inc., a research and development company focused on intelligent systems and collaborative communities for Heliophysics, planetary stewardship, and space exploration. We gratefully acknowledge Google Cloud for extensive computational resources and NVIDIA Corporation. This material is based upon work supported by NASA under award No. 80GSFC23CA040. Any opinions, findings, conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Large language models were used as brainstorming tools to discuss possible training strategies and methodological considerations. The authors retained full responsibility for all research decisions, interpretations, and conclusions presented in this work.
