Suppression or Deletion: Pretrained Models
This repository contains pretrained models and SAE (Sparse Autoencoder) assets for testing SAE-based restoration on machine-unlearned models.
Main GitHub Repository: suppression-or-deletion
Overview
These assets enable researchers to test whether unlearned models can be restored during inference using Sparse Autoencoder (SAE) features. The repository includes:
- Original ViT-Base/16 models trained on CIFAR-10 and Imagenette
- SAE models trained on layers 8, 9, 10 with TopK sparsity
- Activation statistics for normalization
- Expert features identified for each class
Repository Contents
pretrained/
├── cifar10/
│   ├── vit_base_16_original.pth        # Original ViT-Base model
│   ├── sae_layer8_k16.pt               # SAE for layer 8, k=16
│   ├── sae_layer9_k16.pt               # SAE for layer 9, k=16
│   ├── sae_layer10_k16.pt              # SAE for layer 10, k=16
│   ├── activations_layer8_stats.npy    # Normalization stats for layer 8
│   ├── activations_layer9_stats.npy    # Normalization stats for layer 9
│   ├── activations_layer10_stats.npy   # Normalization stats for layer 10
│   ├── expert_features_layer8_k16.pt   # Expert features for layer 8
│   ├── expert_features_layer9_k16.pt   # Expert features for layer 9
│   └── expert_features_layer10_k16.pt  # Expert features for layer 10
└── imagenette/
    ├── vit_base_16_original.pth        # Original ViT-Base model
    ├── sae_layer8_k32.pt               # SAE for layer 8, k=32
    ├── sae_layer9_k32.pt               # SAE for layer 9, k=32
    ├── sae_layer10_k32.pt              # SAE for layer 10, k=32
    ├── activations_layer8_stats.npy    # Normalization stats for layer 8
    ├── activations_layer9_stats.npy    # Normalization stats for layer 9
    ├── activations_layer10_stats.npy   # Normalization stats for layer 10
    ├── expert_features_layer8_k32.pt   # Expert features for layer 8
    ├── expert_features_layer9_k32.pt   # Expert features for layer 9
    └── expert_features_layer10_k32.pt  # Expert features for layer 10
Total size: ~860 MB
Dataset-Specific Configurations
| Dataset | Classes | TopK (k) | Expert Features per Class |
|---|---|---|---|
| CIFAR-10 | 10 | 16 | 20 (k×5/4) |
| Imagenette | 10 | 32 | 40 (k×5/4) |
Quick Start
Download All Files
# Using Hugging Face CLI (recommended)
pip install huggingface_hub
huggingface-cli download Yurim0507/suppression-or-deletion --local-dir ./pretrained --repo-type=model
# Using Python
from huggingface_hub import snapshot_download
snapshot_download(
repo_id="Yurim0507/suppression-or-deletion",
local_dir="./pretrained",
repo_type="model"
)
Download Specific Dataset
# CIFAR-10 only
huggingface-cli download Yurim0507/suppression-or-deletion --include "cifar10/*" --local-dir ./pretrained --repo-type=model
# Imagenette only
huggingface-cli download Yurim0507/suppression-or-deletion --include "imagenette/*" --local-dir ./pretrained --repo-type=model
Usage with Main Repository
Clone the main repository:
git clone https://github.com/Yurim0507/suppression-or-deletion.git
cd suppression-or-deletion
Download pretrained assets (using the commands above).
Prepare your unlearned model:
- Train an unlearned model using any unlearning method (CF-k, SALUN, SCRUB, etc.)
- Save the checkpoint in .pth format
Run restoration test:
# Test restoration on CIFAR-10 class 0 (airplane)
python recovery_test.py \
    --dataset cifar10 \
    --unlearned_model path/to/your/unlearned_model.pth \
    --target_class 0 \
    --layer 9 \
    --alpha 1.0 2.0 5.0 10.0
Simple demo script:
python demo.py \
    --dataset cifar10 \
    --unlearned_model path/to/your/unlearned_model.pth \
    --target_class 0
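Conceptually, the restoration test intervenes on SAE latents at inference time: activations from the unlearned model are encoded by the SAE, the target class's expert features are amplified by `--alpha`, and the result is decoded back into the residual stream. A minimal sketch of the amplification step (the function name and exact intervention are illustrative assumptions, not the repository's code):

```python
import torch

def boost_expert_latents(latents, expert_ids, alpha):
    """Amplify a target class's expert SAE features by alpha.

    Hypothetical helper illustrating the --alpha intervention; the
    repository's actual restoration logic may differ in detail.
    """
    boosted = latents.clone()
    boosted[..., expert_ids] = boosted[..., expert_ids] * alpha
    return boosted

latents = torch.ones(2, 768)  # dummy SAE latents: (batch, hidden_dim)
boosted = boost_expert_latents(latents, expert_ids=[3, 7], alpha=5.0)
```

Cloning before scaling keeps the original latents intact, so several `--alpha` values can be swept over the same batch.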
File Formats
Original Model (vit_base_16_original.pth)
PyTorch checkpoint containing ViT-Base/16 model trained on CIFAR-10 or Imagenette:
{
'model_state_dict': <OrderedDict>, # Model weights
'epoch': <int>, # Training epoch
# ... other training metadata
}
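A minimal loading sketch; the stand-in checkpoint below only mirrors the documented structure, and with the real file you would pass its path (e.g. `pretrained/cifar10/vit_base_16_original.pth`) instead:

```python
import torch
from collections import OrderedDict

# Stand-in checkpoint with the documented keys (real file not required here)
torch.save({'model_state_dict': OrderedDict({'head.weight': torch.zeros(10, 768)}),
            'epoch': 20}, 'vit_ckpt.pth')

ckpt = torch.load('vit_ckpt.pth', map_location='cpu')
state_dict = ckpt['model_state_dict']  # pass this to model.load_state_dict(...)
```

`map_location='cpu'` lets the checkpoint load on machines without a GPU; move the model to the target device afterwards.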
SAE Model (sae_layer{8,9,10}_k{16,32}.pt)
Sparse Autoencoder checkpoint:
{
'model_state_dict': <OrderedDict>, # SAE weights
'model_config': {
'input_dim': 768, # ViT hidden dimension
  'hidden_dim': 768, # SAE latent dimension (768×1)
'k': 16, # TopK sparsity (16 for CIFAR-10, 32 for Imagenette)
'activation': 'topk' # Activation type
},
'pre_bias': <Tensor>, # Pre-bias parameter
}
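The `model_config` fields map onto a TopK SAE like the following sketch, which is a generic illustration of the architecture (TopK activation over a linear encoder/decoder with a pre-bias), not necessarily the repository's exact class:

```python
import torch

class TopKSAE(torch.nn.Module):
    """Minimal TopK sparse autoencoder matching the documented config
    (input_dim=768, hidden_dim=768, k=16); illustrative, not the repo's code."""
    def __init__(self, input_dim=768, hidden_dim=768, k=16):
        super().__init__()
        self.k = k
        self.pre_bias = torch.nn.Parameter(torch.zeros(input_dim))
        self.encoder = torch.nn.Linear(input_dim, hidden_dim)
        self.decoder = torch.nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        z = self.encoder(x - self.pre_bias)
        # TopK activation: keep only the k largest latents per sample
        top = torch.topk(z, self.k, dim=-1)
        sparse = torch.zeros_like(z).scatter_(-1, top.indices, top.values)
        return self.decoder(sparse) + self.pre_bias, sparse

sae = TopKSAE()
recon, latents = sae(torch.randn(4, 768))  # batch of 4 patch-token activations
```

At most k latents are nonzero per sample, which is what "only top 16 features active per sample" means in the training details below.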
Activation Statistics (activations_layer{8,9,10}_stats.npy)
Normalization statistics:
{
'patch_mean': <ndarray>, # Mean of patch token activations
'patch_std': <ndarray>, # Std of patch token activations
}
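Assuming the `.npy` file stores this dict as a 0-d object array (the usual way NumPy saves a dict, which requires `allow_pickle=True` on load), a loading and normalization sketch:

```python
import numpy as np

# Stand-in stats dict with the documented keys; the real
# activations_layer9_stats.npy file is loaded the same way.
np.save('stats.npy', {'patch_mean': np.zeros(768), 'patch_std': np.ones(768)})

stats = np.load('stats.npy', allow_pickle=True).item()
acts = np.random.randn(196, 768)  # one image's patch-token activations
normed = (acts - stats['patch_mean']) / (stats['patch_std'] + 1e-8)
```

The small epsilon guards against division by zero for dead dimensions; the repository's exact constant, if any, is not documented here.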
Expert Features (expert_features_layer{8,9,10}_k{16,32}.pt)
Class-specific expert features:
{
'class_experts_details': {
0: [feature_id_1, feature_id_2, ...], # Expert features for class 0
1: [...], # Expert features for class 1
...
9: [...] # Expert features for class 9
}
}
Expert feature selection criteria:
- Top k×5/4 features per class (20 for CIFAR-10, 40 for Imagenette)
- Sorted by F1 score
- Common features (active in 7+ classes) excluded
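The selection criteria above can be sketched as follows. The per-class F1 matrix, the activity threshold, and the function name are illustrative assumptions; the repository derives these scores from SAE activations on the training set:

```python
import numpy as np

def select_expert_features(f1, k, active_thresh=0.5, common_classes=7):
    """Pick the top k*5/4 features per class by F1, dropping 'common' features.

    f1: (num_features, num_classes) per-class F1 score of each SAE feature.
    Thresholds are illustrative; the repository's exact values may differ.
    """
    n_experts = k * 5 // 4
    # A feature counts as "common" if it is active in 7+ classes
    common = (f1 > active_thresh).sum(axis=1) >= common_classes
    experts = {}
    for c in range(f1.shape[1]):
        scores = np.where(common, -np.inf, f1[:, c])
        experts[c] = np.argsort(-scores)[:n_experts].tolist()
    return experts

rng = np.random.default_rng(0)
f1 = rng.random((64, 10)) * 0.4  # no feature exceeds the activity threshold
f1[0, :] = 0.9                   # feature 0 is "common": active in all 10 classes
experts = select_expert_features(f1, k=16)
```

With k=16 this yields 20 experts per class, matching the CIFAR-10 configuration in the table above.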
Training Details
Original Models
- Architecture: ViT-Base/16 (google/vit-base-patch16-224)
- Training: Fine-tuned on CIFAR-10/Imagenette from pretrained ImageNet weights
- Optimizer: AdamW
- Learning rate: 1e-4
- Epochs: 20
- Data augmentation: RandomCrop, RandomHorizontalFlip
SAE Models
- Layers: 8, 9, 10 (out of 12 ViT layers)
- Sparsity: TopK activation
- CIFAR-10: k=16 (only top 16 features active per sample)
- Imagenette: k=32 (only top 32 features active per sample)
- Training loss: MSE reconstruction + L1 regularization
- Training samples: All training set activations (patch tokens only)
Expert Features
- Selection criteria: Top k×5/4 features per class
- CIFAR-10: 20 features per class (16×5/4)
- Imagenette: 40 features per class (32×5/4)
- Metrics: F1 score, precision, recall
- Filtering: Common features (active in 7+ classes) excluded
License
MIT License - See main repository for details.