kde4-en-fr

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-fr on the KDE4 dataset (English to French).

Model description

This model has been adapted to the domain of technical software documentation and user interface localization.

It was fine-tuned on the KDE4 dataset, which consists of human translations of KDE applications. Unlike a general-purpose translation model, it follows localization conventions common in the French tech community, rendering terms such as "threads", "plugin", or "email" the way technical writers use them rather than translating them literally.

Intended uses & limitations

  • Intended Use: Translation of technical texts, software strings, and documentation from English to French.
  • Limitations: The model is specialized for computer science and software terminology. On general conversational or literary text, its output quality may differ from that of the base model.
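
As a rough illustration of the intended use, the sketch below translates a few software strings with the Transformers `pipeline` API. The model ID is assumed from this model's repository name, and `load_translator` and `translate_strings` are hypothetical helper names, not part of any library.

```python
from typing import Callable, List


def load_translator(model_id: str = "mariadelcarmenramirez/kde4-en-fr"):
    """Build a translation pipeline (downloads the model on first use)."""
    # Imported lazily so translate_strings can be used without transformers installed.
    from transformers import pipeline

    return pipeline("translation", model=model_id)


def translate_strings(strings: List[str], translator: Callable) -> List[str]:
    """Translate English strings to French, preserving input order."""
    if not strings:
        return []
    # A translation pipeline returns one dict per input, keyed by "translation_text".
    outputs = translator(strings)
    return [item["translation_text"] for item in outputs]


if __name__ == "__main__":
    translator = load_translator()
    samples = [
        "Default to expanded threads",
        "Unable to import %1 using the OFX importer plugin.",
    ]
    for line in translate_strings(samples, translator):
        print(line)
```

Passing the translator in as an argument keeps the model load, which is slow, separate from the per-batch call.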

Training and evaluation data

The model was trained on the KDE4 dataset, specifically the English-French subset.

  • Dataset: kde4
  • Language Pair: English (en) -> French (fr)
  • Preprocessing: Sentences were truncated to a maximum length of 128 tokens.
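
The preprocessing step could look roughly like the sketch below. It assumes each record has the layout `{"translation": {"en": ..., "fr": ...}}` and that the 128-token limit was applied to both the source and the target side; neither detail is stated in this card. `preprocess` is a hypothetical helper name.

```python
from typing import Any, Callable, Dict, List

MAX_LENGTH = 128  # truncation limit stated in this card


def preprocess(batch: Dict[str, List[Dict[str, str]]], tokenizer: Callable[..., Any]) -> Any:
    """Tokenize a batch of {"translation": {"en": ..., "fr": ...}} records."""
    sources = [pair["en"] for pair in batch["translation"]]
    targets = [pair["fr"] for pair in batch["translation"]]
    # text_target makes the tokenizer encode the French side as labels.
    # Assumption: the same 128-token limit applies to both sides.
    return tokenizer(
        sources,
        text_target=targets,
        max_length=MAX_LENGTH,
        truncation=True,
    )
```

With a `datasets` split and a tokenizer loaded from the base model, this would typically be applied as `split.map(lambda b: preprocess(b, tokenizer), batched=True)`.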

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (betas=(0.9,0.999), epsilon=1e-08)
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
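
For anyone trying to reproduce the run, the values above map onto `Seq2SeqTrainingArguments` roughly as follows. This is a sketch under assumptions: training on a single device (so the batch sizes are per-device values), `fp16=True` as the equivalent of "Native AMP", and a hypothetical `output_dir` name. The actual training script is not part of this card.

```python
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes a single device
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
    "fp16": True,  # assumed mapping for "Native AMP"; needs a CUDA device
}


def build_training_args(output_dir: str = "kde4-en-fr-run"):
    """Create training arguments from the hyperparameters listed in this card."""
    # Imported lazily so the dictionary can be inspected without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```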

Training results

Training ran for 3 epochs (approximately 35,000 steps). Training loss fell from 1.4339 at step 500 to about 0.77 by step 25,000, then leveled off.

Training Loss   Step
1.4339          500
1.0881          5000
1.0051          10000
0.8804          15000
0.8459          20000
0.7696          25000
0.7650          30000
0.7772          35000
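
The step count also gives a rough idea of the training set size. The arithmetic below assumes one device and no gradient accumulation, so that each optimizer step consumes exactly one batch of 16. Because the 35,000 figure is itself approximate, the result is only an estimate.

```python
def implied_examples_per_epoch(total_steps: int, batch_size: int, epochs: int) -> float:
    """Estimate training examples per epoch from the optimizer step count."""
    # Each step consumes one batch, so total examples seen = steps * batch size.
    return total_steps * batch_size / epochs


estimate = implied_examples_per_epoch(total_steps=35_000, batch_size=16, epochs=3)
print(f"about {estimate:,.0f} examples per epoch")
```

Under these assumptions the run saw roughly 187,000 sentence pairs per epoch.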

Evaluation Results

BLEU Score

  • Fine-tuned model: 0.5216
  • Pretrained model: 0.3817

Note: The fine-tuned model demonstrates improved adherence to domain-specific terminology (e.g., preserving English technical terms like "email" or "plugin" where appropriate for French technical context) compared to the base model.
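
To make the metric concrete, here is a simplified corpus-level BLEU in pure Python. It is not necessarily the implementation used for the scores above: it splits on whitespace, supports a single reference per sentence, and applies no smoothing, so its values will differ from those of standard tools. It returns a value between 0 and 1, the range the reported scores fall in. `simple_corpus_bleu` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List


def _ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Simplified corpus BLEU in [0, 1]: whitespace tokens, one reference each, no smoothing."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = _ngrams(hyp_tokens, n)
            ref_counts = _ngrams(ref_tokens, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum((hyp_counts & ref_counts).values())
            total[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, any n-gram order with no match zeroes the score
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # Penalize hypotheses that are shorter than their references.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

For scores comparable with other published results, a maintained implementation with standardized tokenization is the better choice.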

Framework versions

  • Transformers 4.57.3
  • PyTorch 2.9.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.22.1

Model size

  • Parameters: 74.7M
  • Tensor type: F32 (Safetensors)

Model repository: mariadelcarmenramirez/kde4-en-fr