Norwegian NER Model (nb-bert-base fine-tuned)

Model Description

This model is NbAiLab/nb-bert-base fine-tuned on the thivy/norwegian-ner-combined dataset for Named Entity Recognition in Norwegian (Bokmål and Nynorsk).

Model Performance

Metric      Score
F1          0.9329
Precision   0.9300
Recall      0.9358

Best Epoch: 5 out of 20 (early stopped at epoch 11)

Supported Entity Types

Label   Description     Examples
PER     Person names    Erna Solberg, Ibsen
ORG     Organizations   Stortinget, NATO
LOC     Locations       Oslo, Norge, Europa
MISC    Miscellaneous   Nobels fredspris

Training Data

Dataset: thivy/norwegian-ner-combined

  • Training samples: 49,870
  • Validation samples: 14,289
  • Test samples: 13,450
  • Sources: NorNE + WikiANN Norwegian
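
The split sizes above can be checked directly; a minimal sketch, assuming the dataset exposes standard train/validation/test splits with the conventional tokens/ner_tags columns:

from datasets import load_dataset

# Assumed split names; see the dataset card for the authoritative layout.
ds = load_dataset("thivy/norwegian-ner-combined")

for split in ds:
    print(split, len(ds[split]))  # expected: train 49870, validation 14289, test 13450

# Token-classification datasets conventionally carry "tokens" and "ner_tags" columns.
print(ds["train"][0])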

The dataset combines:

  • NorNE (Norwegian Named Entities) - Bokmål and Nynorsk
  • WikiANN (Wikipedia-based NER) - Norwegian subset

Quality improvements:

  • 12 problematic samples filtered
  • Entity type remapping (9 → 4 types)
  • Combined evaluation sets for general Norwegian NER
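
The 9 → 4 remapping above is defined in the dataset repository; purely as an illustration of the mechanism, with a hypothetical fine-to-coarse mapping that is not necessarily the one actually used:

# Hypothetical mapping for illustration only; the real mapping lives in the
# thivy/norwegian-ner-combined preprocessing and may differ.
FINE_TO_COARSE = {
    "GPE_LOC": "LOC",
    "GPE_ORG": "ORG",
    "PROD": "MISC",
    "EVT": "MISC",
    "DRV": "MISC",
}

def remap_tag(tag: str) -> str:
    # Map a BIO tag such as "B-GPE_LOC" to its coarse equivalent, e.g. "B-LOC".
    if tag == "O":
        return tag
    prefix, _, etype = tag.partition("-")
    return f"{prefix}-{FINE_TO_COARSE.get(etype, etype)}"

print(remap_tag("B-GPE_LOC"))  # B-LOC
print(remap_tag("I-PROD"))     # I-MISC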

Training Procedure

Training Hyperparameters

{
    "learning_rate": 3.5e-5,
    "num_train_epochs": 20,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "weight_decay": 0.15,
    "warmup_ratio": 0.05,
    "lr_scheduler_type": "cosine_with_restarts",
    "num_cycles": 4,
    "early_stopping_patience": 6,
    "metric_for_best_model": "f1",
}
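
For reference, these values map onto the HuggingFace Trainer roughly as follows. This is a sketch only, assuming a recent transformers release; the output directory is a placeholder, and early stopping is attached as a callback rather than a TrainingArguments field:

from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="nb-bert-norwegian-ner",      # placeholder path
    learning_rate=3.5e-5,
    num_train_epochs=20,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.15,
    warmup_ratio=0.05,
    lr_scheduler_type="cosine_with_restarts",
    lr_scheduler_kwargs={"num_cycles": 4},   # needs a transformers version with lr_scheduler_kwargs (>= ~4.38)
    eval_strategy="epoch",                   # called evaluation_strategy in older releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

early_stopping = EarlyStoppingCallback(early_stopping_patience=6)
# trainer = Trainer(model=..., args=args, ..., callbacks=[early_stopping])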

Training Strategy

Phase 5: Gentle LR Restarts

The model was trained using a cosine learning rate schedule with gentle restarts:

  • Max LR: 3.5e-5 (identified as "sweet spot" from Phase 3 analysis)
  • Restarts: 4 restarts over 20 epochs (5 cycles of ~4 epochs each)
  • Warmup: 5% (1 epoch)
  • Early Stopping: Patience of 6 epochs
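
This corresponds to the cosine-with-hard-restarts schedule in transformers; a small sketch of the resulting learning-rate curve (step counts are illustrative, not the actual training steps):

import torch
from transformers import get_cosine_with_hard_restarts_schedule_with_warmup

total_steps, warmup_steps = 1000, 50  # illustrative only; real values depend on dataset and batch size

optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=3.5e-5)
scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps, num_cycles=4
)

for step in range(total_steps):
    if step % 100 == 0:
        print(step, optimizer.param_groups[0]["lr"])  # warms up, decays, then periodically restarts
    optimizer.step()
    scheduler.step()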

Why this worked:

  • Conservative LR avoided catastrophic forgetting (Phase 4 failed with 1.0e-4)
  • Gentle restarts let the optimizer escape shallow local minima
  • Achieved 10.4% lower loss than Phase 3
  • Stable gradients (1.3% spike rate)

Training Phases History

Phase     F1       Strategy                     Result
Phase 1   -        Initial baseline             Established pipeline
Phase 2   -        Data filtering               Improved quality
Phase 3   0.9298   OneCycleLR                   Good, but plateaued at epoch 6
Phase 4   0.9142   Aggressive restarts (1e-4)   ❌ Catastrophic forgetting
Phase 5   0.9329   Gentle restarts (3.5e-5)     Best model

Usage

Using Pipeline (Recommended)

from transformers import pipeline

# Load pipeline
ner = pipeline(
    "ner",
    model="thivy/nb-bert-norwegian-ner",
    aggregation_strategy="simple"
)

# Predict
text = "Erna Solberg er statsminister i Norge."
entities = ner(text)

print(entities)

Output:

[
    {'entity_group': 'PER', 'score': 0.99, 'word': 'Erna Solberg', 'start': 0, 'end': 12},
    {'entity_group': 'LOC', 'score': 0.99, 'word': 'Norge', 'start': 32, 'end': 37}
]

Using Transformers Directly

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "thivy/nb-bert-norwegian-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize input
text = "Oslo er hovedstaden i Norge."
inputs = tokenizer(text, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

# Decode predictions
labels = model.config.id2label
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for token, pred in zip(tokens, predictions[0]):
    if token not in ["[CLS]", "[SEP]", "[PAD]"]:
        print(f"{token}: {labels[pred.item()]}")

Label Mapping

id2label = {
    0: "O",
    1: "B-LOC",
    2: "B-MISC",
    3: "B-ORG",
    4: "B-PER",
    5: "I-LOC",
    6: "I-MISC",
    7: "I-ORG",
    8: "I-PER",
}

Evaluation Results

Test Set Performance

F1:        0.9329
Precision: 0.9300
Recall:    0.9358

Per-Entity Performance (Approximate)

Entity   Precision   Recall   F1
PER      0.95        0.96     0.95
ORG      0.91        0.90     0.90
LOC      0.94        0.95     0.94
MISC     0.88        0.86     0.87
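
Per-entity scores like these can be reproduced with seqeval; a minimal sketch, assuming gold and predicted BIO tag sequences for the test set are available as parallel lists:

from seqeval.metrics import classification_report

# Toy sequences standing in for the real test-set annotations and model predictions.
y_true = [["B-PER", "I-PER", "O", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O", "B-LOC"]]

print(classification_report(y_true, y_pred, digits=4))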

Limitations

  • Domain: Trained primarily on news and Wikipedia text; may not generalize well to informal Norwegian or specialized domains
  • Formality: Better on formal Norwegian (Bokmål and Nynorsk) than conversational text
  • Entity coverage: MISC category is underrepresented in training data
  • Temporal: May not recognize very recent entities (people, organizations) not in training data
  • Code-switching: Not optimized for texts mixing Norwegian with other languages

Ethical Considerations

  • The model may reflect biases present in news articles and Wikipedia
  • Person names in the training data are from public figures
  • Some entity recognitions may be politically or culturally sensitive

Training Infrastructure

  • Hardware: Apple M4 Mac (MPS)
  • Training time: ~2.5 hours for 11 epochs
  • Framework: PyTorch + HuggingFace Transformers
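
For local inference on Apple Silicon, the same MPS backend can be used; a short sketch, reusing the model and inputs from the usage section above:

import torch

# Prefer Apple's MPS backend when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    logits = model(**inputs).logits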

Citation

@misc{norwegian-ner-2024,
  author = {Thivyesh Ahilathasan},
  title = {Norwegian NER Model (nb-bert-base fine-tuned)},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/thivy/nb-bert-norwegian-ner}},
}

Base Model

@misc{kummervold2021operationalizing,
    title={Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model},
    author={Per E Kummervold and Javier de la Rosa and Freddy Wetjen and Svein Arne Brygfjeld},
    year={2021},
    eprint={2104.09617},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Training Dataset

See thivy/norwegian-ner-combined for NorNE and WikiANN citations.

License

CC-BY 4.0 (same as base model and dataset)

Acknowledgments

  • NbAiLab for the nb-bert-base model
  • Language Technology Group (LTG) at University of Oslo for the NorNE dataset
  • HuggingFace for the infrastructure and tools

Contact
