mms-300m-fongbe
This model is a fine-tuned version of facebook/mms-300m specifically for Fongbe (Fon), a tonal language primarily spoken in Benin.
It was developed to preserve linguistic integrity by retaining tonal diacritics and Fon-specific orthographic characters (e.g., ɖ, ɛ, ɔ, è, é). On the ALFFA test benchmark it reaches 9.48% WER, which the author reports as state-of-the-art (SOTA) for Fongbe automatic speech recognition (ASR).
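Tone marks such as è can be stored either as one precomposed code point or as a base letter plus a combining accent, so transcripts should be brought to one Unicode normalization form before they are compared or tokenized. The sketch below uses only the standard library; the helper name `normalize_transcript` and the NFC default are assumptions for illustration, since this card does not state which form the training text used.

```python
import unicodedata

def normalize_transcript(text: str, form: str = "NFC") -> str:
    """Return text in a single Unicode normalization form."""
    return unicodedata.normalize(form, text)

precomposed = "\u00e8"   # è as one code point
decomposed = "e\u0300"   # e followed by COMBINING GRAVE ACCENT

# The two spellings render identically but compare unequal until normalized.
assert precomposed != decomposed
assert normalize_transcript(decomposed, "NFC") == precomposed
assert normalize_transcript(precomposed, "NFD") == decomposed

# ɔ (U+0254) has no precomposed form with a grave accent, so ɔ̀ stays
# two code points even under NFC.
open_o_low = "\u0254\u0300"
assert len(normalize_transcript(open_o_low, "NFC")) == 2
```

Applying the same form to both references and predictions avoids counting visually identical strings as errors.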
📊 Evaluation Results
The model was evaluated on the held-out ALFFA test set (2,168 utterances):
| Metric | Score |
|---|---|
| WER (Word Error Rate) | 0.0948 (9.48%) |
| CER (Character Error Rate) | 0.0396 (3.96%) |
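For readers unfamiliar with the two metrics, the following is a minimal sketch of how WER and CER are commonly defined: the Levenshtein distance between reference and prediction, divided by the reference length in words or characters. The function names are illustrative, and the card does not say which evaluation tool produced the scores above, so this is not necessarily the exact scoring code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance (spaces count as characters)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# Third row of the Inference Examples table in this card.
reference = "ama e gbɔ mɔ ɖo nɔ ɔ nu e wε e nɔ ɖu"
prediction = "ama e gbɔ mɔ ɖo nɔ ɔ nu ɔ e nɔ ɖu"
print(f"WER: {wer(reference, prediction):.4f}")
print(f"CER: {cer(reference, prediction):.4f}")
```

Corpus-level scores are usually computed by summing edits and reference lengths over all utterances, not by averaging per-utterance rates.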
Benchmark Comparison (with diacritics)
| Model | WER | CER | Year |
|---|---|---|---|
| Laleye et al. (Baseline) | 44.04% | — | 2016 |
| MMS-300m-Fongbe (Ours) | 9.48% | 3.96% | 2026 |
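The gap between the two rows can be expressed as an absolute and a relative WER reduction. The short calculation below uses only the two WER values from the table; the variable names are illustrative.

```python
baseline_wer = 44.04  # Laleye et al. (2016), from the table above
model_wer = 9.48      # this model, from the table above

absolute_reduction = baseline_wer - model_wer
relative_reduction = absolute_reduction / baseline_wer

print(f"absolute: {absolute_reduction:.2f} points")  # absolute: 34.56 points
print(f"relative: {relative_reduction:.1%}")         # relative: 78.5%
```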
Inference Examples
| Reference | Prediction | Result |
|---|---|---|
| gannu elɔ kpɔ hu ɖe ɔ | gannu elɔ kpɔ hu ɖe ɔ | ✅ Perfect |
| ɖɔla tεnwe | ɖɔla tεnwe | ✅ Perfect |
| ama e gbɔ mɔ ɖo nɔ ɔ nu e wε e nɔ ɖu | ama e gbɔ mɔ ɖo nɔ ɔ nu ɔ e nɔ ɖu | ⚠️ Minor error |
📖 Model Description
- Architecture: MMS (Massively Multilingual Speech), 300M-parameter model.
- Methodology: Fine-tuned with Connectionist Temporal Classification (CTC) loss.
- Language: Fongbe (fon).
- Phonetic Representation: Tone-preserving orthography with Unicode normalization (NFD/NFC).
- Special Features: Full support for Fon-specific characters (ɖ, ɛ, ɔ) and tone markers.
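Because the model is trained with CTC loss, its per-frame outputs have to be collapsed into text at inference time. The sketch below shows plain greedy CTC decoding: take the most likely label per frame, merge consecutive repeats, then drop the blank label. The toy vocabulary and `blank_id=0` are hypothetical; the real mapping ships with the model's tokenizer, which the Transformers pipeline applies for you.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame labels, then drop CTC blanks."""
    chars = []
    previous = None
    for idx in frame_ids:
        # Emit a character only when the label changes and is not the blank.
        if idx != previous and idx != blank_id:
            chars.append(id_to_char[idx])
        previous = idx
    return "".join(chars)

# Hypothetical toy vocabulary for illustration only.
vocab = {0: "<blank>", 1: "ɖ", 2: "ɔ", 3: "l", 4: "a"}
frames = [1, 1, 0, 2, 2, 0, 3, 0, 4, 4]  # per-frame argmax label ids
print(ctc_greedy_decode(frames, vocab))  # ɖɔla
```

A blank between two identical labels is what lets CTC emit a genuinely doubled character, for example `[4, 0, 4]` decodes to `aa`.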
🚀 How to Use
from transformers import pipeline
asr = pipeline("automatic-speech-recognition", model="Professor/mms-300m-fongbe")
# Ensure your audio is 16kHz
transcription = asr("path_to_audio.wav")
print(transcription["text"])
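Since the snippet above asks for 16 kHz audio, it can help to check a file's sampling rate before transcribing it. The following is a minimal sketch using only the standard library `wave` module, which handles uncompressed PCM WAV files; the helper names `wav_sample_rate` and `needs_resampling` are illustrative and not part of Transformers.

```python
import wave

TARGET_RATE = 16_000  # sampling rate stated in this model card

def wav_sample_rate(path: str) -> int:
    """Read the sampling rate from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()

def needs_resampling(path: str, target_rate: int = TARGET_RATE) -> bool:
    """True when the file's rate differs from the model's expected rate."""
    return wav_sample_rate(path) != target_rate

# Example (path is a placeholder):
# if needs_resampling("path_to_audio.wav"):
#     print("Resample to 16 kHz before transcription")
```

Files that fail the check can be converted with any audio tool that supports resampling.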
🎯 Intended Uses & Limitations
Intended Uses
- High-accuracy transcription of Fongbe speech.
- Research in low-resource and tonal language modeling.
- Base model for downstream Fongbe NLP tasks (NLP4Fon).
Limitations
- Performance may degrade in noisy environments or with heavy background music.
- Primarily trained on continuous speech; may require further fine-tuning for specific dialects or extremely fast colloquial speech.
📁 Training and Evaluation Data
The model was trained on a consolidated dataset that merges data from the ALFFA project (African Languages in the Field: speech Fundamentals and Automation) with the Zenodo Fongbe Speech Dataset:
- Train + Validation Set: ~10.85 hours (merged, then re-split 90/10 into train and validation).
- Test Set: ~1.45 hours (the standard 2,168 ALFFA test utterances, kept unchanged for benchmark consistency).
- Sampling Rate: 16,000 Hz.
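A 90/10 re-split like the one described above can be sketched as a deterministic shuffle followed by a cut. The seed, the utterance IDs, and the choice to split by utterance (not by speaker or by duration) are all assumptions here; the card does not document how the actual split was made.

```python
import random

def split_train_validation(items, validation_fraction=0.10, seed=42):
    """Shuffle deterministically, then cut off a validation slice."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

utterance_ids = [f"utt_{i:05d}" for i in range(1000)]  # hypothetical IDs
train_ids, validation_ids = split_train_validation(utterance_ids)
print(len(train_ids), len(validation_ids))  # 900 100
```

Fixing the seed keeps the split reproducible across runs, which matters when validation WER is used to compare checkpoints.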
⚙️ Training Procedure
Hyperparameters
- Learning Rate: 1e-4
- Effective Batch Size: 64 (per-device batch size 16 × 4 gradient accumulation steps)
- Optimizer: AdamW (Fused)
- Epochs: 30
- Precision: Mixed Precision (FP16)
- Hardware: NVIDIA H100 GPU
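The effective batch size follows directly from the per-device batch size and the gradient accumulation steps. The sketch below also derives a rough steps-per-epoch figure from the last row of the training log in this card (step 4500 at epoch 27.95); the single-GPU setting and the resulting utterance count are estimates, not figures stated by the author.

```python
per_device_batch_size = 16
gradient_accumulation_steps = 4
num_devices = 1  # assumption: a single GPU, as the hardware line suggests

effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 64

# Rough estimate from the training log: step 4500 was reached at epoch 27.95.
steps_per_epoch = 4500 / 27.95
approx_train_utterances = round(steps_per_epoch) * effective_batch_size
print(round(steps_per_epoch), approx_train_utterances)  # 161 10304
```

The utterance figure is an upper-bound style estimate, because the last accumulation step of each epoch may cover fewer than 64 examples.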
Training Logs
| Training Loss | Epoch | Step | Validation Loss | WER |
|---|---|---|---|---|
| 26.3861 | 3.11 | 500 | 1.0171 | 0.6021 |
| 2.5796 | 6.21 | 1000 | 0.3366 | 0.2600 |
| 1.3316 | 9.32 | 1500 | 0.2312 | 0.1799 |
| 0.9087 | 12.42 | 2000 | 0.2031 | 0.1557 |
| 0.6678 | 15.53 | 2500 | 0.1752 | 0.1397 |
| 0.5069 | 18.64 | 3000 | 0.1747 | 0.1325 |
| 0.4034 | 21.74 | 3500 | 0.1583 | 0.1137 |
| 0.3142 | 24.85 | 4000 | 0.1618 | 0.1147 |
| 0.2622 | 27.95 | 4500 | 0.1656 | 0.1085 |
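The log shows that the lowest validation loss and the lowest WER occur at different steps. A small sketch that picks the best row under each criterion, with the values copied from the table above, makes this explicit; which criterion was used to select the released checkpoint is not stated in this card.

```python
# (step, validation_loss, wer) rows copied from the training log above
log = [
    (500, 1.0171, 0.6021),
    (1000, 0.3366, 0.2600),
    (1500, 0.2312, 0.1799),
    (2000, 0.2031, 0.1557),
    (2500, 0.1752, 0.1397),
    (3000, 0.1747, 0.1325),
    (3500, 0.1583, 0.1137),
    (4000, 0.1618, 0.1147),
    (4500, 0.1656, 0.1085),
]

best_by_wer = min(log, key=lambda row: row[2])
best_by_loss = min(log, key=lambda row: row[1])

print(best_by_wer)   # (4500, 0.1656, 0.1085)
print(best_by_loss)  # (3500, 0.1583, 0.1137)
```

For ASR, selecting on WER is a common choice because CTC loss and transcription accuracy do not always move together.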
📜 Citation & Credits
If you use this model in your research, please cite the following:
Dataset Contributors: Laleye, Fréjus A. A., et al. (ALFFA Project & Zenodo release).
Model Developer: Victor Olufemi (Professor).
@dataset{laleye_frejus_2022_6604637,
author = {Laleye, Fréjus A. A.},
title = {Fongbe Speech Dataset},
year = 2022,
publisher = {Zenodo},
doi = {10.5281/zenodo.6604637}
}
@inproceedings{laleye2016FongbeASR,
title={First Automatic Fongbe Continuous Speech Recognition System},
author={Laleye, Fréjus A. A. and Besacier, Laurent and Ezin, Eugène C. and Motamed, Cina},
year={2016},
organization={FedCSIS}
}