---
language:
- en
license: mit
tags:
- audio
- speech
- emotion-recognition
- tensorflow
- keras
- audio-classification
- ravdess
datasets:
- ravdess
metrics:
- accuracy
- f1
model-index:
- name: Speech Emotion Recognition
  results:
  - task:
      type: audio-classification
      name: Audio Classification
    dataset:
      type: ravdess
      name: RAVDESS
    metrics:
    - type: accuracy
      name: Accuracy
      value: "See confusion matrix"
pipeline_tag: audio-classification
library_name: tensorflow
---

# Speech Emotion Recognition Model

This model classifies speech audio into one of eight emotional states.

## Model Description

This is a deep learning model, built with TensorFlow/Keras, that recognizes emotions from speech audio. It assigns each clip to one of the following classes:

- 😐 Neutral
- 😌 Calm
- 😊 Happy
- 😢 Sad
- 😠 Angry
- 😨 Fearful
- 🤢 Disgust
- 😲 Surprised

## Model Architecture

The model takes the following audio features as input, each averaged over time and concatenated into a single fixed-length vector:
- MFCC (Mel-frequency cepstral coefficients)
- Chroma features
- Mel-spectrogram features
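
As a rough illustration of how these three feature blocks combine, the sketch below lays out the concatenated vector. It assumes the 40 MFCCs used in the Usage snippet plus librosa's default 12 chroma bins and 128 mel bands; `FEATURE_SIZES` and `feature_slices` are hypothetical names, not part of the released code.

```python
# Sizes of each time-averaged feature block, in concatenation order.
# Assumption: n_mfcc=40 plus librosa defaults (n_chroma=12, n_mels=128).
FEATURE_SIZES = {"mfcc": 40, "chroma": 12, "mel": 128}


def feature_slices(sizes):
    """Return {name: slice} locating each block inside the flat vector."""
    slices, start = {}, 0
    for name, size in sizes.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices


SLICES = feature_slices(FEATURE_SIZES)
TOTAL_DIM = sum(FEATURE_SIZES.values())

print(TOTAL_DIM)         # 180
print(SLICES["chroma"])  # slice(40, 52, None)
```

Under these assumptions, each clip is represented by 180 values, regardless of its duration.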

## Usage

```python
import librosa
import numpy as np
from tensorflow.keras.models import load_model

# Load the model
model = load_model('trained_model.h5')

# Build a fixed-length feature vector by averaging each feature over time
def extract_feature(data, sr, mfcc=True, chroma=True, mel=True):
    """Return concatenated time-averaged MFCC, chroma, and mel features."""
    result = np.array([])
    if mfcc:
        # 40 MFCCs, averaged across frames
        mfccs = np.mean(librosa.feature.mfcc(y=data, sr=sr, n_mfcc=40).T, axis=0)
        result = np.hstack((result, mfccs))
    if chroma:
        # Chroma from the magnitude STFT, averaged across frames
        stft = np.abs(librosa.stft(data))
        chroma_feat = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)
        result = np.hstack((result, chroma_feat))
    if mel:
        # Mel-spectrogram bands, averaged across frames
        mel_feat = np.mean(librosa.feature.melspectrogram(y=data, sr=sr).T, axis=0)
        result = np.hstack((result, mel_feat))
    return result

# Load audio file
audio_path = "your_audio_file.wav"
data, sr = librosa.load(audio_path, sr=22050)

# Extract features and reshape to (batch, features, channels)
feature = extract_feature(data, sr, mfcc=True, chroma=True, mel=True)
feature = np.expand_dims(feature, axis=0)  # add batch dimension
feature = np.expand_dims(feature, axis=2)  # add trailing channel dimension

# Make prediction
prediction = model.predict(feature)
predicted_class = np.argmax(prediction, axis=1)

# Map to emotion labels
emotions = {
    0: 'Neutral',
    1: 'Calm',
    2: 'Happy',
    3: 'Sad',
    4: 'Angry',
    5: 'Fearful',
    6: 'Disgust',
    7: 'Surprised'
}

predicted_emotion = emotions[predicted_class[0]]
print(f"Predicted emotion: {predicted_emotion}")
```
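
To inspect more than the single top class, the output scores can be ranked. The sketch below assumes the model's final layer is a softmax, so that each row of `prediction` holds one score per class; the card does not confirm this. `top_k_emotions` is a hypothetical helper, and the `scores` list is a made-up stand-in for `prediction[0]`.

```python
import numpy as np

# Same index order as the `emotions` mapping in the usage snippet.
EMOTIONS = ['Neutral', 'Calm', 'Happy', 'Sad',
            'Angry', 'Fearful', 'Disgust', 'Surprised']


def top_k_emotions(scores, k=3):
    """Return the k highest-scoring (label, score) pairs, best first."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1][:k]  # indices sorted by descending score
    return [(EMOTIONS[i], float(scores[i])) for i in order]


# Made-up scores for illustration only, not real model output.
scores = [0.05, 0.10, 0.50, 0.05, 0.20, 0.04, 0.03, 0.03]
for label, score in top_k_emotions(scores):
    print(f"{label}: {score:.2f}")
```

With a real model, pass `prediction[0]` instead of the placeholder list.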

## Requirements

```
librosa
tensorflow
numpy
scikit-learn
```

## Training Data

The model was trained on the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset, whose recordings are labeled with the following eight emotion categories:

- Neutral
- Calm
- Happy
- Sad
- Angry
- Fearful
- Disgust
- Surprised

RAVDESS contains recordings from 24 professional actors (12 female, 12 male), so the model is exposed to a range of voices and speaking styles during training.
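
RAVDESS encodes the emotion in each file name as the third of seven hyphen-separated numeric fields (`01` to `08`, in the category order listed above). The sketch below shows one way to turn a file name into a class index. It assumes labels are the emotion code minus one, which matches the index order in the Usage section, but the card does not state how the training labels were actually built. `label_from_filename` is a hypothetical helper.

```python
from pathlib import Path

# Index order follows the RAVDESS emotion codes 01..08.
EMOTIONS = ['Neutral', 'Calm', 'Happy', 'Sad',
            'Angry', 'Fearful', 'Disgust', 'Surprised']


def label_from_filename(path):
    """Map a RAVDESS file name to (class_index, emotion_name)."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"not a RAVDESS file name: {path}")
    code = int(fields[2])  # third field is the emotion code
    if not 1 <= code <= 8:
        raise ValueError(f"unknown emotion code {code} in {path}")
    # Assumption: zero-based class index = emotion code - 1.
    return code - 1, EMOTIONS[code - 1]


print(label_from_filename("Actor_12/03-01-06-01-02-01-12.wav"))  # (5, 'Fearful')
```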

## Model Performance

Performance is reported through the training curves and the confusion matrix below; this card does not state a single accuracy figure.

### Training Progress
![Loss and Accuracy](loss%20and%20accuracy.png)

The training curves plot loss and accuracy per epoch and are intended to show that training converged without severe overfitting.

### Confusion Matrix
![Confusion Matrix](Confusion-matrix-of-speaker-dependent-emotions-prediction-on-RAVDESS-corpus-with-8202.png)

The confusion matrix shows per-class results on the RAVDESS dataset: diagonal cells count correct predictions, and off-diagonal cells show which emotions the model confuses with one another.
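
The accuracy and F1 metrics listed in the card metadata can be derived from a confusion matrix. The sketch below is a minimal version, assuming rows are true classes and columns are predicted classes; `metrics_from_confusion` is a hypothetical helper, and the 3-class `toy` matrix uses invented counts, not this model's results.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Compute (accuracy, macro-F1) from a confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm).copy()       # correct predictions per class
    support = cm.sum(axis=1)      # true count per class
    predicted = cm.sum(axis=0)    # predicted count per class
    zeros = np.zeros(tp.shape)
    # Guard against division by zero for classes never seen or never predicted.
    recall = np.divide(tp, support, out=zeros.copy(), where=support > 0)
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    return float(tp.sum() / cm.sum()), float(f1.mean())


# Toy 3-class matrix with invented counts, for illustration only.
toy = [[8, 1, 1],
       [2, 7, 1],
       [0, 2, 8]]
accuracy, macro_f1 = metrics_from_confusion(toy)
print(f"accuracy={accuracy:.3f}, macro_f1={macro_f1:.3f}")
```

Macro-F1 averages the per-class F1 scores with equal weight, which is one common choice for multi-class problems; the card does not say which F1 variant was used.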

## License

This model is released under the MIT license, as declared in the model card metadata.

## Citation

If you use this model, please cite:

```
@misc{speech-emotion-recognition,
  author = {JagjeevanAK},
  title = {Speech Emotion Recognition Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/JagjeevanAK/Speech-emotion-detection}
}
```