Deep Learning Project: Spam Detection with DistilBERT
This repository contains the code and resources for the Deep Learning project on Spam Detection.
Project Structure
- mail_data.csv: The dataset used for training and evaluation.
- eda_script.py: Script for Exploratory Data Analysis and visualization.
- train_model_hf.py: Main training script using Hugging Face Trainer and DistilBERT.
- evaluate_final.py: Script for final evaluation from the best model checkpoint.
- eda_plots.png: Visualizations generated during EDA.
- results.txt: Detailed evaluation metrics and confusion matrix.
- Deep_Learning_Project_Report.pdf: The final project report (15-17 pages equivalent).
Requirements
- Python 3.11+
- PyTorch
- Transformers
- Datasets
- Scikit-learn
- Pandas
- Matplotlib
- Seaborn
- Accelerate
How to Run
- Make sure all requirements are installed. If you hit errors while running the code, try installing the dependencies from requirements.txt in a fresh environment.
- EDA: Run python3 eda_script.py to see the data distribution.
- Training: Run python3 train_model_hf.py to fine-tune the DistilBERT model.
- Evaluation: Run python3 evaluate_final.py to get the final performance metrics.
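Before running the scripts, it can help to confirm that the packages listed under Requirements are importable. The snippet below is a minimal sketch and not part of the repository: the helper name missing_packages is hypothetical, and the import names (for example sklearn for Scikit-learn and torch for PyTorch) are the usual ones for these libraries rather than something this README states.

```python
import importlib.util

# Import names assumed for the packages listed under Requirements.
REQUIRED = [
    "torch",         # PyTorch
    "transformers",
    "datasets",
    "sklearn",       # Scikit-learn
    "pandas",
    "matplotlib",
    "seaborn",
    "accelerate",
]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported missing, installing from requirements.txt in a fresh environment, as suggested above, is the simplest fix.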
Results
The model achieves 99.10% accuracy on the test set with an F1-score of 96.58% for the spam class.
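To make the two reported numbers concrete, the sketch below shows how accuracy and a per-class F1-score are derived from true and predicted labels. It is an illustration only, not the code in evaluate_final.py: the function name spam_metrics, the "ham"/"spam" label strings, and the toy labels are assumptions, and the toy values are unrelated to the results above.

```python
def spam_metrics(y_true, y_pred, positive="spam"):
    """Compute accuracy plus precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only (hypothetical, not from mail_data.csv).
y_true = ["ham", "ham", "ham", "spam", "spam", "ham", "spam", "ham"]
y_pred = ["ham", "ham", "ham", "spam", "ham", "ham", "spam", "spam"]

metrics = spam_metrics(y_true, y_pred)
print(metrics)
```

Because spam is usually the minority class in such datasets, the spam-class F1 is a stricter measure than overall accuracy, which is why the two figures can differ.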
Model tree for picket-cliff/deepl-project
Base model
distilbert/distilbert-base-uncased