# Email Processing ModernBERT Model
A fine-tuned ModernBERT model for email processing tasks.
## Model Capabilities
This model can compute semantic similarity between questions and answers related to:
- Email addresses
- Subject lines
## Recommended Thresholds
Based on extensive testing, the following cosine-similarity thresholds are recommended:
- Email-address questions: 0.85
- Subject-line questions: 0.70
- Other questions: 0.80
Additional content-aware checks are recommended for best results.
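As a minimal sketch, these thresholds could be combined with a simple content-aware check like the one below. The question-type labels, the `is_relevant` helper, and the email regex are illustrative assumptions for this example, not part of the model:
```python
import re

# Cosine-similarity thresholds from this card, keyed by question type.
# The question-type labels are example names; classify questions however suits your pipeline.
THRESHOLDS = {"email": 0.85, "subject": 0.70, "other": 0.80}

# Simple pattern used only as an example content-aware check.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def is_relevant(similarity: float, answer: str, question_type: str = "other") -> bool:
    """Combine the recommended threshold with a basic content-aware check."""
    threshold = THRESHOLDS.get(question_type, THRESHOLDS["other"])
    if similarity < threshold:
        return False
    # Example check: an email-address question should be answered with text
    # that actually contains an email address.
    if question_type == "email" and not EMAIL_RE.search(answer):
        return False
    return True

print(is_relevant(0.91, "My email is [email protected]", "email"))  # True
print(is_relevant(0.91, "I will send it later", "email"))           # False
```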
## Usage
```python
from sentence_transformers import SentenceTransformer
import torch
# Load the model
model = SentenceTransformer('sugiv/email-processing-modernbert')
# Encode questions and answers
q_embed = model.encode("What's your email address?", convert_to_tensor=True)
a1_embed = model.encode("My email is [email protected]", convert_to_tensor=True)
a2_embed = model.encode("The weather is nice today", convert_to_tensor=True)
# Calculate similarity
similarity1 = torch.nn.functional.cosine_similarity(q_embed.unsqueeze(0), a1_embed.unsqueeze(0)).item()
similarity2 = torch.nn.functional.cosine_similarity(q_embed.unsqueeze(0), a2_embed.unsqueeze(0)).item()
print(f'Similarity with relevant answer: {similarity1:.4f}')
print(f'Similarity with irrelevant answer: {similarity2:.4f}')
# Apply threshold
threshold = 0.85 # For email questions
print(f'Is relevant: {similarity1 >= threshold}')
print(f'Is irrelevant: {similarity2 < threshold}')
```
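To score several candidate answers at once, `sentence_transformers.util.cos_sim` can be used instead of calling `cosine_similarity` pairwise. A short sketch (the candidate answers below are made-up examples):
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('sugiv/email-processing-modernbert')

question = "What's your email address?"
# Made-up candidate answers for illustration
answers = [
    "My email is [email protected]",
    "You can reach me at [email protected]",
    "The weather is nice today",
]

q_embed = model.encode(question, convert_to_tensor=True)
a_embeds = model.encode(answers, convert_to_tensor=True)

# util.cos_sim returns a (1, len(answers)) matrix of cosine similarities
scores = util.cos_sim(q_embed, a_embeds)[0]

threshold = 0.85  # recommended threshold for email questions
for answer, score in zip(answers, scores):
    print(f"{score.item():.4f}  relevant={score.item() >= threshold}  {answer}")
```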
## Training Information
- Base model: [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)
- Published: 2025-04-24
- Training approach: Fine-tuned on a balanced dataset of email-address and subject-line questions
- Framework: sentence-transformers with PyTorch