Multimodal Speech Emotion Recognition
Model Description
This model performs speech emotion recognition using a multimodal approach and is built on:
- Audio Model: Wav2Vec2 Base
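The card does not describe the classification head on top of Wav2Vec2 Base. The sketch below assumes it follows the standard Hugging Face `Wav2Vec2ForSequenceClassification` design, which is an assumption and not something the card confirms. In that design, wav2vec2-base turns 16 kHz audio into 768-dimensional frame vectors. Each frame is projected (to 256 dimensions by default) and mean-pooled over time, and a linear layer then scores each emotion. The pure-Python version below uses toy dimensions and hand-picked weights. The label order, taken from the evaluation report, and every helper name are hypothetical.

```python
import math

# Hypothetical label order: the classes as listed in the evaluation report.
# The checkpoint's real id2label mapping may differ.
LABELS = ["ANG", "CAL", "DIS", "FEA", "HAP", "NEU", "SAD", "SUR"]


def linear(x, weight, bias):
    """Compute W x + b for one vector x; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def mean_pool(frames):
    """Average frame-level vectors over the time axis."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(hidden_states, proj_w, proj_b, cls_w, cls_b):
    """Project each frame, mean-pool over time, then score each emotion."""
    projected = [linear(h, proj_w, proj_b) for h in hidden_states]
    pooled = mean_pool(projected)
    probs = softmax(linear(pooled, cls_w, cls_b))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Toy input: 3 frames of 4-dim features (the real encoder emits 768-dim frames).
frames = [[1.0, 0.0, 0.5, 0.5], [3.0, 0.0, 0.1, 0.2], [2.0, 0.0, 0.0, 0.0]]
proj_w, proj_b = [[1, 0, 0, 0], [0, 1, 0, 0]], [0.0, 0.0]
cls_w = [[1, 0] if i == 5 else [0, 0] for i in range(8)]  # only NEU responds
cls_b = [0.0] * 8
label, probs = classify(frames, proj_w, proj_b, cls_w, cls_b)
print(label)  # NEU
```

In the real model the weights come from fine-tuning. Mean pooling lets clips of any length map to a single fixed-size vector before classification.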
Dataset
- Dataset Name: stapesai/ssi-speech-emotion-recognition
Evaluation Results
Classification Report
       class  precision    recall  f1-score   support

         ANG       0.97      0.93      0.95        30
         CAL       0.00      0.00      0.00         0
         DIS       0.95      0.90      0.92        20
         FEA       0.76      0.70      0.73        27
         HAP       0.87      0.82      0.84        33
         NEU       0.96      0.96      0.96        25
         SAD       0.73      1.00      0.84        19
         SUR       0.88      0.78      0.82         9

    accuracy                           0.87       163
   macro avg       0.76      0.76      0.76       163
weighted avg       0.88      0.87      0.87       163

Class codes: ANG = angry, CAL = calm, DIS = disgusted, FEA = fearful, HAP = happy, NEU = neutral, SAD = sad, SUR = surprised.
Overall Accuracy: 87%
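The macro average (0.76) sits well below the weighted average (0.87) mainly because of the CAL row. CAL has zero support in this evaluation split, yet its 0.00 scores still count as one of eight equal terms in the unweighted macro mean. The snippet below recomputes the recall averages from the rounded per-class values above. It assumes the report follows scikit-learn's `classification_report` conventions: macro is the plain mean over classes, and weighted is the support-weighted mean. Support-weighted recall always equals overall accuracy for single-label classification, which is why it matches the 87% figure.

```python
# Per-class rows from the report above: (precision, recall, f1, support).
report = {
    "ANG": (0.97, 0.93, 0.95, 30),
    "CAL": (0.00, 0.00, 0.00, 0),
    "DIS": (0.95, 0.90, 0.92, 20),
    "FEA": (0.76, 0.70, 0.73, 27),
    "HAP": (0.87, 0.82, 0.84, 33),
    "NEU": (0.96, 0.96, 0.96, 25),
    "SAD": (0.73, 1.00, 0.84, 19),
    "SUR": (0.88, 0.78, 0.82, 9),
}


def macro(rows, idx):
    """Unweighted mean of one metric column over the given rows."""
    return sum(r[idx] for r in rows) / len(rows)


def weighted(rows, idx):
    """Support-weighted mean of one metric column."""
    total = sum(r[3] for r in rows)
    return sum(r[idx] * r[3] for r in rows) / total


rows = list(report.values())
with_support = [r for r in rows if r[3] > 0]  # drops CAL

print(sum(r[3] for r in rows))           # 163
print(round(macro(rows, 1), 2))          # 0.76 (all 8 classes)
print(round(macro(with_support, 1), 2))  # 0.87 (CAL excluded)
print(round(weighted(rows, 1), 2))       # 0.87 (equals accuracy)
```

Because the inputs are already rounded to two decimals, the recomputed averages can differ from the exact report values in the last digit.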
Model Tree
- Model: dynann/emotion-classification
- Base model: facebook/wav2vec2-base