+---
+license: mit
+datasets:
+- Yelp/yelp_review_full
+metrics:
+- accuracy
+base_model:
+- distilbert/distilbert-base-uncased
+library_name: transformers
+tags:
+- Sentiment Analysis
+- Text Classification
+- BERT
+- Yelp Reviews
+- Fine-tuned
+---
+# Yelp Review Classifier
+This model classifies the sentiment of Yelp reviews, predicting whether a review is **positive** or **negative**. It was fine-tuned from [`distilbert-base-uncased`](https://huggingface.co/distilbert/distilbert-base-uncased) on a Yelp reviews dataset.
+## Model Details
+- **Model Type**: DistilBERT-based model for sequence classification
+- **Model Architecture**: `distilbert-base-uncased`
+- **Number of Parameters**: Approximately 66M
+- **Training Dataset**: The model was trained on a curated Yelp reviews dataset, labeled for sentiment (positive/negative).
+- **Fine-Tuning Task**: Sentiment analysis for Yelp reviews (positive or negative sentiment)
+## Training Data
+- **Dataset**: Custom Yelp reviews dataset
+- **Data Description**: The dataset consists of Yelp reviews, each labeled with a sentiment (positive/negative).
+- **Preprocessing**: Reviews were cleaned to remove URLs and unwanted characters.
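The card does not specify the cleaning step in detail. A minimal sketch of what it might look like is given here, assuming URLs are matched by a simple regular expression and that "unwanted characters" means control characters and repeated whitespace. Both are assumptions, and `clean_review` is a hypothetical helper, not the author's code.

```python
import re

# Assumption: a URL starts with http://, https://, or www. and runs to the next whitespace.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")
# Assumption: "unwanted characters" are ASCII control characters (newlines, tabs, etc.).
CONTROL_PATTERN = re.compile(r"[\x00-\x1f\x7f]")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Remove URLs and control characters, then collapse runs of whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = CONTROL_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()


print(clean_review("Great tacos!\n\nMenu: https://example.com/menu  Will return."))
# Great tacos! Menu: Will return.
```

A pattern this simple also swallows punctuation attached to the end of a URL, which may or may not matter for sentiment classification.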
+## Training Details
+- **Training Framework**: Hugging Face Transformers and PyTorch
+- **Learning Rate**: 2e-5
+- **Epochs**: 6
+- **Batch Size**: 16
+- **Optimizer**: AdamW
+- **Training Time**: Approximately 2 hours on a GPU
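The hyperparameters listed above can be collected as follows. This is a sketch rather than the author's training script: the key names mirror the corresponding `transformers.TrainingArguments` fields, `total_optimizer_steps` is a hypothetical helper, and the dataset size in the example is a placeholder because the card does not state one. It also assumes a single device and no gradient accumulation.

```python
import math

# Hyperparameters as listed in the model card.
# Keys follow the matching transformers.TrainingArguments field names.
HPARAMS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 6,
    "per_device_train_batch_size": 16,
    "optim": "adamw_torch",  # AdamW
}


def total_optimizer_steps(num_examples: int, hparams: dict = HPARAMS) -> int:
    """Number of optimizer updates over the whole run (single device, no accumulation)."""
    steps_per_epoch = math.ceil(num_examples / hparams["per_device_train_batch_size"])
    return steps_per_epoch * hparams["num_train_epochs"]


# Placeholder dataset size, chosen only for illustration.
print(total_optimizer_steps(10_000))  # 3750
```

With `transformers` installed, the same dictionary could be unpacked into `TrainingArguments(output_dir=..., **HPARAMS)`.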
+## Usage
+Run inference with the following code:
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+# Load the fine-tuned model and tokenizer from Hugging Face
+model_name = "kmack/YELP-Review_Classifier"  # Replace with your model name if different
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# List of reviews for prediction
+reviews = [
+    "The food was absolutely delicious, and the atmosphere was perfect for a family gathering. The staff was friendly, and we had a great time. Definitely coming back!",
+    "It was decent, but nothing special. The food was okay, but the service was a bit slow. I think there are better places around.",
+    "I had a terrible experience. The waiter was rude, and the food was cold when it arrived. I won't be returning anytime soon."
+]
+# Map predicted class indices to star ratings (assumes five output classes, 0 to 4)
+label_map = {
+    0: "1 Star",
+    1: "2 Stars",
+    2: "3 Stars",
+    3: "4 Stars",
+    4: "5 Stars"
+}
+# Iterate over each review and get the prediction
+for review in reviews:
+    # Tokenize the input text
+    inputs = tokenizer(review, return_tensors="pt", padding=True, truncation=True)
+    # Get predictions
+    with torch.no_grad():
+        outputs = model(**inputs)
+    # Get the predicted label (0 to 4 for star ratings)
+    prediction = torch.argmax(outputs.logits, dim=-1).item()
+    # Map prediction to star rating
+    predicted_rating = label_map[prediction]
+    print(f"Rating: {predicted_rating}\n")
+```
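`torch.argmax` in the loop above returns only the top class. To also report how confident the model is, the logits can be turned into probabilities with a softmax. The sketch below does this in plain Python on made-up logits (the values are illustrative, not model output). With the tensors from the example, `torch.softmax(outputs.logits, dim=-1)` computes the same quantity.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one review, one value per class index 0..4
example_logits = [-1.2, -0.3, 0.4, 2.1, 1.0]
probs = softmax(example_logits)

# Index of the most probable class, equivalent to argmax over the logits
best = max(range(len(probs)), key=probs.__getitem__)
print(f"class {best} with probability {probs[best]:.2f}")
```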
+## Citation
+If you use this model in your research, please cite the following:
+```bibtex
+@misc{YELP-Review_Classifier,
+  author = {Kmack},
+  title = {YELP-Review_Classifier},
+  year = {2024},
+  url = {https://huggingface.co/kmack/YELP-Review_Classifier}
+}
+```