Paper: LoRA: Low-Rank Adaptation of Large Language Models (arXiv: 2106.09685)
A lightweight 3B parameter PII (Personally Identifiable Information) classifier, distilled from the Roblox PII Classifier (560M XLM-RoBERTa) using teacher-student distillation with Fireworks AI's LoRA fine-tuning.
| Attribute | Value |
|---|---|
| Base Model | Llama-3.2-3B-Instruct |
| Teacher Model | Roblox/roblox-pii-classifier |
| Training Method | LoRA (Low-Rank Adaptation) via Fireworks AI SFT |
| LoRA Rank | 16 |
| Training Examples | 4,000 (5,000 generated, 80/10/10 split) |
| Test Accuracy | 87.4% on held-out test set |
| Labels | none, asking, giving |
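The 80/10/10 split from the table above can be sketched as follows. This is a hypothetical reconstruction: the shuffle seed and record format are assumptions, not taken from the card.

```python
import random

# Hypothetical sketch: split the 5,000 generated examples into
# train/val/test with the 80/10/10 ratio reported in the table.
examples = [{"id": i} for i in range(5000)]  # placeholder records

rng = random.Random(42)  # seed is an assumption, not from the card
rng.shuffle(examples)

n = len(examples)
train = examples[: int(0.8 * n)]               # 4,000 training examples
val = examples[int(0.8 * n): int(0.9 * n)]     # 500 validation examples
test = examples[int(0.9 * n):]                 # 500 held-out test examples
```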
This model uses a novel "LLM Head as Classifier" approach inspired by Fireworks AI's blog post:
```
┌─────────────────────────────────────────────────────────────────┐
│                  TEACHER-STUDENT DISTILLATION                   │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────────┐   Labels   ┌─────────────────────┐        │
│  │   DeepSeek API   │ ─────────► │  Synthetic Dataset  │        │
│  │ (Data Generator) │            │   5,000 examples    │        │
│  └──────────────────┘            └──────────┬──────────┘        │
│                                             │                   │
│                                             ▼                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │              ROBLOX PII CLASSIFIER (Teacher)              │  │
│  │                 XLM-RoBERTa-Large (560M)                  │  │
│  │                                                           │  │
│  │  Thresholds:                                              │  │
│  │  • asking >= 0.2        → "asking"                        │  │
│  │  • giving >= 0.3        → "giving"                        │  │
│  │  • max >= 0.2691        → most confident class            │  │
│  │  • else                 → "none"                          │  │
│  └───────────────────────────────────────────────────────────┘  │
│                              │                                  │
│                              │  Soft Labels                     │
│                              ▼                                  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │               LLAMA 3.2 3B + LoRA (Student)               │  │
│  │                                                           │  │
│  │  System: "Classify if this message involves PII.          │  │
│  │           Reply with: none, asking, or giving."           │  │
│  │                                                           │  │
│  │  User: Message: "whats ur snap?"                          │  │
│  │  Assistant: asking                                        │  │
│  │                                                           │  │
│  │  LoRA Config:                                             │  │
│  │  • Rank: 16, Alpha: 32                                    │  │
│  │  • Target: q,k,v,o,gate,up,down_proj                      │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
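The student's prompt format shown in the diagram maps naturally onto one supervised fine-tuning record. Below is a hypothetical sketch of such a record; the JSONL `messages` schema is assumed (the common OpenAI-style chat format used by SFT pipelines), not quoted from the card.

```python
import json

# Hypothetical sketch of one SFT training record in the chat format
# shown in the diagram. The "messages" JSONL schema is an assumption.
def make_record(message: str, label: str) -> dict:
    return {
        "messages": [
            {"role": "system",
             "content": "Classify if this message involves PII. "
                        "Reply with: none, asking, or giving."},
            {"role": "user", "content": f'Message: "{message}"'},
            {"role": "assistant", "content": label},
        ]
    }

record = make_record("whats ur snap?", "asking")
print(json.dumps(record, indent=2))
```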
| Metric | Value |
|---|---|
| Accuracy | 87.4% |
| None F1 | 0.93 |
| Asking F1 | 0.70 |
| Giving F1 | 0.80 |
| | Pred: none | Pred: asking | Pred: giving |
|---|---|---|---|
| Actual: none | 316 | 16 | 8 |
| Actual: asking | 15 | 57 | 15 |
| Actual: giving | 6 | 3 | 64 |
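As a sanity check, the reported accuracy and per-class F1 scores can be recomputed directly from the confusion matrix above (rows = actual, columns = predicted):

```python
# Recompute the reported metrics from the confusion matrix above.
labels = ["none", "asking", "giving"]
cm = [
    [316, 16,  8],   # actual: none
    [ 15, 57, 15],   # actual: asking
    [  6,  3, 64],   # actual: giving
]

total = sum(sum(row) for row in cm)                 # 500 test examples
correct = sum(cm[i][i] for i in range(3))           # 437 on the diagonal
accuracy = correct / total                          # 437 / 500 = 0.874

f1 = {}
for i, label in enumerate(labels):
    tp = cm[i][i]
    fp = sum(cm[r][i] for r in range(3)) - tp       # column sum minus diagonal
    fn = sum(cm[i]) - tp                            # row sum minus diagonal
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1[label] = 2 * precision * recall / (precision + recall)

print(f"accuracy = {accuracy:.3f}")                 # 0.874
for label in labels:
    print(f"{label} F1 = {f1[label]:.2f}")          # 0.93 / 0.70 / 0.80
```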
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "sugiv/sugiv-pii-classifier")

def classify_pii(message: str) -> str:
    """Classify a message for PII content."""
    messages = [
        {"role": "system", "content": 'Classify if this chat message involves PII (personal info). Reply with exactly one word: "none", "asking", or "giving".'},
        {"role": "user", "content": f'Message: "{message}"'},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        # Greedy decoding: do_sample=False makes temperature irrelevant
        outputs = model.generate(inputs, max_new_tokens=10, do_sample=False)
    response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    return response.strip().lower().split()[0]

# Examples
print(classify_pii("whats ur snap?"))               # → asking
print(classify_pii("my email is [email protected]"))  # → giving
print(classify_pii("this game is so fun"))          # → none
```
```python
import requests

API_KEY = "your-fireworks-api-key"
MODEL = "accounts/sugi205-8d1850/models/pii-classifier-llama3b-5k"

def classify_pii(message: str) -> str:
    """Classify a message via the Fireworks AI chat completions API."""
    response = requests.post(
        "https://api.fireworks.ai/inference/v1/chat/completions",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": MODEL,
            "messages": [
                {"role": "system", "content": "Classify if this message involves PII. Reply: none, asking, or giving."},
                {"role": "user", "content": f'Message: "{message}"'},
            ],
            "max_tokens": 10,
            "temperature": 0,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"].strip().lower()
```
| Label | Description | Examples |
|---|---|---|
| none | No PII request or disclosure | "this game is fun", "lol nice shot" |
| asking | Requesting personal information | "what's your phone number?", "where do you live?" |
| giving | Sharing personal information | "my email is [email protected]", "I live at 123 Main St" |
The teacher's decision thresholds:

- `privacy_asking_for_pii >= 0.2` → "asking"
- `privacy_giving_pii >= 0.3` → "giving"
- `max(asking, giving) >= 0.2691` → most confident class
- otherwise → "none"

License: Apache 2.0
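The teacher's thresholding can be sketched as a small function. Note the rule precedence here (asking checked first, then giving, then the max threshold) is an assumption about how the listed rules combine, not something the card states explicitly.

```python
# Sketch of the teacher's thresholding rules. The precedence order
# (asking first, then giving, then the max-score fallback) is assumed.
def teacher_label(asking: float, giving: float) -> str:
    """Map the teacher's two PII scores to a hard label."""
    if asking >= 0.2:
        return "asking"
    if giving >= 0.3:
        return "giving"
    if max(asking, giving) >= 0.2691:
        return "asking" if asking >= giving else "giving"
    return "none"

print(teacher_label(0.25, 0.10))  # asking
print(teacher_label(0.10, 0.12))  # none
```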
Created by @sugiv
Base model: meta-llama/Llama-3.2-3B-Instruct