Deep Learning Project: Spam Detection with DistilBERT

This repository contains the code and resources for a deep learning project that fine-tunes DistilBERT for spam detection.

Project Structure

  • mail_data.csv: The dataset used for training and evaluation.
  • eda_script.py: Script for Exploratory Data Analysis and visualization.
  • train_model_hf.py: Main training script using Hugging Face Trainer and DistilBERT.
  • evaluate_final.py: Script for final evaluation from the best model checkpoint.
  • eda_plots.png: Visualizations generated during EDA.
  • results.txt: Detailed evaluation metrics and confusion matrix.
  • Deep_Learning_Project_Report.pdf: The final project report (15-17 pages equivalent).

Requirements

  • Python 3.11+
  • PyTorch
  • Transformers
  • Datasets
  • Scikit-learn
  • Pandas
  • Matplotlib
  • Seaborn
  • Accelerate
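
A quick way to check the list above before running anything is to ask Python which of the packages it can find. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the import names (for example `sklearn` for Scikit-learn and `torch` for PyTorch) are the usual ones for these packages.

```python
from importlib.util import find_spec

# Import names for the packages listed above. Note that Scikit-learn is
# imported as "sklearn" and PyTorch as "torch".
REQUIRED_MODULES = [
    "torch", "transformers", "datasets", "sklearn",
    "pandas", "matplotlib", "seaborn", "accelerate",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment.

    Hypothetical helper for illustration; it only checks that a module is
    locatable, not that its version is compatible.
    """
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This check does not verify versions, so a clean result does not guarantee that the scripts will run.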

How to Run

  1. Setup: Install all of the requirements listed above. If you hit errors while running the code, install the dependencies from requirements.txt in a fresh environment.
  2. EDA: Run python3 eda_script.py to see the data distribution.
  3. Training: Run python3 train_model_hf.py to fine-tune the DistilBERT model.
  4. Evaluation: Run python3 evaluate_final.py to get the final performance metrics.
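
The three script steps above can also be chained from a single Python entry point. This is a sketch under assumptions rather than something shipped with the repository: `pipeline_commands` and `run_pipeline` are hypothetical names, the scripts are assumed to sit in the current working directory, and `sys.executable` is used in place of `python3` so that the scripts run in the same environment as the caller.

```python
import subprocess
import sys

# Script names as listed under "Project Structure", in the order given above.
PIPELINE = ["eda_script.py", "train_model_hf.py", "evaluate_final.py"]


def pipeline_commands(scripts=PIPELINE):
    """Build one command per script, using the current Python interpreter."""
    return [[sys.executable, script] for script in scripts]


def run_pipeline(scripts=PIPELINE):
    """Run each script in order and stop at the first one that fails."""
    for command in pipeline_commands(scripts):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so evaluation never runs after a failed training step.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the repository root would run EDA, training, and evaluation in sequence. To re-run only the evaluation, pass `["evaluate_final.py"]`.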

Results

The model achieves 99.10% accuracy on the test set with an F1-score of 96.58% for the spam class.
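
Accuracy and spam-class F1 can differ noticeably because F1 ignores correctly classified non-spam messages. The sketch below shows how the two numbers are derived from the same predictions. It uses plain Python and made-up toy labels, not the project's data or its evaluation code. The `ham`/`spam` label names and the `binary_metrics` helper are assumptions for illustration.

```python
def binary_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1 for one positive class.

    Hypothetical helper for illustration; the repository's own evaluation
    lives in evaluate_final.py and may be implemented differently.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (not real results): one spam message is missed.
y_true = ["ham", "ham", "spam", "spam", "ham", "spam"]
y_pred = ["ham", "ham", "spam", "ham", "ham", "spam"]

metrics = binary_metrics(y_true, y_pred)
print(f"accuracy={metrics['accuracy']:.4f}  spam F1={metrics['f1']:.4f}")
```

In this toy case a single missed spam message lowers the spam F1 more than it lowers accuracy. A gap in the same direction appears in the reported results.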
