Suppression or Deletion: Pretrained Models
This repository contains pretrained models and SAE (Sparse Autoencoder) assets for testing SAE-based restoration on machine-unlearned models.
Main GitHub Repository: suppression-or-deletion
Overview
These assets enable researchers to test whether unlearned models can be restored during inference using Sparse Autoencoder (SAE) features. The repository includes:
- Original ViT-Base/16 models trained on CIFAR-10 and Imagenette
- SAE models trained on layers 8, 9, 10 with TopK sparsity
- Activation statistics for normalization
- Expert features identified for each class
Repository Contents
pretrained/
├── cifar10/
│   ├── vit_base_16_original.pth        # Original ViT-Base model
│   ├── sae_layer8_k16.pt               # SAE for layer 8, k=16
│   ├── sae_layer9_k16.pt               # SAE for layer 9, k=16
│   ├── sae_layer10_k16.pt              # SAE for layer 10, k=16
│   ├── activations_layer8_stats.npy    # Normalization stats for layer 8
│   ├── activations_layer9_stats.npy    # Normalization stats for layer 9
│   ├── activations_layer10_stats.npy   # Normalization stats for layer 10
│   ├── expert_features_layer8_k16.pt   # Expert features for layer 8
│   ├── expert_features_layer9_k16.pt   # Expert features for layer 9
│   └── expert_features_layer10_k16.pt  # Expert features for layer 10
└── imagenette/
    ├── vit_base_16_original.pth        # Original ViT-Base model
    ├── sae_layer8_k32.pt               # SAE for layer 8, k=32
    ├── sae_layer9_k32.pt               # SAE for layer 9, k=32
    ├── sae_layer10_k32.pt              # SAE for layer 10, k=32
    ├── activations_layer8_stats.npy    # Normalization stats for layer 8
    ├── activations_layer9_stats.npy    # Normalization stats for layer 9
    ├── activations_layer10_stats.npy   # Normalization stats for layer 10
    ├── expert_features_layer8_k32.pt   # Expert features for layer 8
    ├── expert_features_layer9_k32.pt   # Expert features for layer 9
    └── expert_features_layer10_k32.pt  # Expert features for layer 10
Total size: ~860 MB
Dataset-Specific Configurations
| Dataset | Classes | TopK (k) | Expert Features per Class |
|---|---|---|---|
| CIFAR-10 | 10 | 16 | 20 (k×5/4) |
| Imagenette | 10 | 32 | 40 (k×5/4) |
Quick Start
Download All Files
# Using Hugging Face CLI (recommended)
pip install huggingface_hub
huggingface-cli download Yurim0507/suppression-or-deletion --local-dir ./pretrained --repo-type=model
# Using Python
from huggingface_hub import snapshot_download
snapshot_download(
repo_id="Yurim0507/suppression-or-deletion",
local_dir="./pretrained",
repo_type="model"
)
Download Specific Dataset
# CIFAR-10 only
huggingface-cli download Yurim0507/suppression-or-deletion --include "cifar10/*" --local-dir ./pretrained --repo-type=model
# Imagenette only
huggingface-cli download Yurim0507/suppression-or-deletion --include "imagenette/*" --local-dir ./pretrained --repo-type=model
Usage with Main Repository
Clone the main repository:
git clone https://github.com/Yurim0507/suppression-or-deletion.git
cd suppression-or-deletion
Download pretrained assets (using the commands above).
Prepare your unlearned model:
- Train an unlearned model using any unlearning method (CF-k, SALUN, SCRUB, etc.)
- Save the checkpoint in .pth format
Run restoration test:
# Test restoration on CIFAR-10 class 0 (airplane)
python recovery_test.py \
    --dataset cifar10 \
    --unlearned_model path/to/your/unlearned_model.pth \
    --target_class 0 \
    --layer 9 \
    --alpha 1.0 2.0 5.0 10.0
Simple demo script:
python demo.py \
    --dataset cifar10 \
    --unlearned_model path/to/your/unlearned_model.pth \
    --target_class 0
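Conceptually, the restoration test intervenes on SAE latents at inference time: activations from the unlearned model are encoded by the SAE, the target class's expert features are amplified by `--alpha`, and the result is decoded back into the residual stream. A minimal sketch of the amplification step (the function name and exact intervention are illustrative assumptions, not the repository's code):

```python
import torch

def boost_expert_latents(latents, expert_ids, alpha):
    """Amplify a target class's expert SAE features by alpha.

    Hypothetical helper illustrating the --alpha intervention; the
    repository's actual restoration logic may differ in detail.
    """
    boosted = latents.clone()
    boosted[..., expert_ids] = boosted[..., expert_ids] * alpha
    return boosted

latents = torch.ones(2, 768)  # dummy SAE latents: (batch, hidden_dim)
boosted = boost_expert_latents(latents, expert_ids=[3, 7], alpha=5.0)
```

Cloning before scaling keeps the original latents intact, so several `--alpha` values can be swept over the same batch.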
File Formats
Original Model (vit_base_16_original.pth)
PyTorch checkpoint containing ViT-Base/16 model trained on CIFAR-10 or Imagenette:
{
'model_state_dict': <OrderedDict>, # Model weights
'epoch': <int>, # Training epoch
# ... other training metadata
}
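A minimal loading sketch; the stand-in checkpoint below only mirrors the documented structure, and with the real file you would pass its path (e.g. `pretrained/cifar10/vit_base_16_original.pth`) instead:

```python
import torch
from collections import OrderedDict

# Stand-in checkpoint with the documented keys (real file not required here)
torch.save({'model_state_dict': OrderedDict({'head.weight': torch.zeros(10, 768)}),
            'epoch': 20}, 'vit_ckpt.pth')

ckpt = torch.load('vit_ckpt.pth', map_location='cpu')
state_dict = ckpt['model_state_dict']  # pass this to model.load_state_dict(...)
```

`map_location='cpu'` lets the checkpoint load on machines without a GPU; move the model to the target device afterwards.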
SAE Model (sae_layer{8,9,10}_k{16,32}.pt)
Sparse Autoencoder checkpoint:
{
'model_state_dict': <OrderedDict>, # SAE weights
'model_config': {
'input_dim': 768, # ViT hidden dimension
  'hidden_dim': 768, # SAE latent dimension (768×1)
'k': 16, # TopK sparsity (16 for CIFAR-10, 32 for Imagenette)
'activation': 'topk' # Activation type
},
'pre_bias': <Tensor>, # Pre-bias parameter
}
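The `model_config` fields map onto a TopK SAE like the following sketch, which is a generic illustration of the architecture (TopK activation over a linear encoder/decoder with a pre-bias), not necessarily the repository's exact class:

```python
import torch

class TopKSAE(torch.nn.Module):
    """Minimal TopK sparse autoencoder matching the documented config
    (input_dim=768, hidden_dim=768, k=16); illustrative, not the repo's code."""
    def __init__(self, input_dim=768, hidden_dim=768, k=16):
        super().__init__()
        self.k = k
        self.pre_bias = torch.nn.Parameter(torch.zeros(input_dim))
        self.encoder = torch.nn.Linear(input_dim, hidden_dim)
        self.decoder = torch.nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        z = self.encoder(x - self.pre_bias)
        # TopK activation: keep only the k largest latents per sample
        top = torch.topk(z, self.k, dim=-1)
        sparse = torch.zeros_like(z).scatter_(-1, top.indices, top.values)
        return self.decoder(sparse) + self.pre_bias, sparse

sae = TopKSAE()
recon, latents = sae(torch.randn(4, 768))  # batch of 4 patch-token activations
```

At most k latents are nonzero per sample, which is what "only top 16 features active per sample" means in the training details below.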
Activation Statistics (activations_layer{8,9,10}_stats.npy)
Normalization statistics:
{
'patch_mean': <ndarray>, # Mean of patch token activations
'patch_std': <ndarray>, # Std of patch token activations
}
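Assuming the `.npy` file stores this dict as a 0-d object array (the usual way NumPy saves a dict, which requires `allow_pickle=True` on load), a loading and normalization sketch:

```python
import numpy as np

# Stand-in stats dict with the documented keys; the real
# activations_layer9_stats.npy file is loaded the same way.
np.save('stats.npy', {'patch_mean': np.zeros(768), 'patch_std': np.ones(768)})

stats = np.load('stats.npy', allow_pickle=True).item()
acts = np.random.randn(196, 768)  # one image's patch-token activations
normed = (acts - stats['patch_mean']) / (stats['patch_std'] + 1e-8)
```

The small epsilon guards against division by zero for dead dimensions; the repository's exact constant, if any, is not documented here.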
Expert Features (expert_features_layer{8,9,10}_k{16,32}.pt)
Class-specific expert features:
{
'class_experts_details': {
0: [feature_id_1, feature_id_2, ...], # Expert features for class 0
1: [...], # Expert features for class 1
...
9: [...] # Expert features for class 9
}
}
Expert feature selection criteria:
- Top k×5/4 features per class (20 for CIFAR-10, 40 for Imagenette)
- Sorted by F1 score
- Common features (active in 7+ classes) excluded
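The selection criteria above can be sketched as follows. The per-class F1 matrix, the activity threshold, and the function name are illustrative assumptions; the repository derives these scores from SAE activations on the training set:

```python
import numpy as np

def select_expert_features(f1, k, active_thresh=0.5, common_classes=7):
    """Pick the top k*5/4 features per class by F1, dropping 'common' features.

    f1: (num_features, num_classes) per-class F1 score of each SAE feature.
    Thresholds are illustrative; the repository's exact values may differ.
    """
    n_experts = k * 5 // 4
    # A feature counts as "common" if it is active in 7+ classes
    common = (f1 > active_thresh).sum(axis=1) >= common_classes
    experts = {}
    for c in range(f1.shape[1]):
        scores = np.where(common, -np.inf, f1[:, c])
        experts[c] = np.argsort(-scores)[:n_experts].tolist()
    return experts

rng = np.random.default_rng(0)
f1 = rng.random((64, 10)) * 0.4  # no feature exceeds the activity threshold
f1[0, :] = 0.9                   # feature 0 is "common": active in all 10 classes
experts = select_expert_features(f1, k=16)
```

With k=16 this yields 20 experts per class, matching the CIFAR-10 configuration in the table above.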
Training Details
Original Models
- Architecture: ViT-Base/16 (google/vit-base-patch16-224)
- Training: Fine-tuned on CIFAR-10/Imagenette from pretrained ImageNet weights
- Optimizer: AdamW
- Learning rate: 1e-4
- Epochs: 20
- Data augmentation: RandomCrop, RandomHorizontalFlip
SAE Models
- Layers: 8, 9, 10 (out of 12 ViT layers)
- Sparsity: TopK activation
- CIFAR-10: k=16 (only top 16 features active per sample)
- Imagenette: k=32 (only top 32 features active per sample)
- Training loss: MSE reconstruction + L1 regularization
- Training samples: All training set activations (patch tokens only)
Expert Features
- Selection criteria: Top k×5/4 features per class
- CIFAR-10: 20 features per class (16×5/4)
- Imagenette: 40 features per class (32×5/4)
- Metrics: F1 score, precision, recall
- Filtering: Common features (active in 7+ classes) excluded
License
MIT License - See main repository for details.