Norwegian NER Model (nb-bert-base fine-tuned)

Model Description

This model is NbAiLab/nb-bert-base fine-tuned on the thivy/norwegian-ner-combined dataset for Named Entity Recognition in Norwegian (Bokmål and Nynorsk).

Model Performance

Metric      Score
F1          0.9329
Precision   0.9300
Recall      0.9358

Best Epoch: 5 out of 20 (early stopped at epoch 11)

Supported Entity Types

Label   Description     Examples
PER     Person names    Erna Solberg, Ibsen
ORG     Organizations   Stortinget, NATO
LOC     Locations       Oslo, Norge, Europa
MISC    Miscellaneous   Nobels fredspris

Training Data

Dataset: thivy/norwegian-ner-combined

  • Training samples: 49,870
  • Validation samples: 14,289
  • Test samples: 13,450
  • Sources: NorNE + WikiANN Norwegian
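
The split sizes above can be checked directly; a minimal sketch, assuming the dataset exposes standard train/validation/test splits with the conventional tokens/ner_tags columns:

from datasets import load_dataset

# Assumed split names; see the dataset card for the authoritative layout.
ds = load_dataset("thivy/norwegian-ner-combined")

for split in ds:
    print(split, len(ds[split]))  # expected: train 49870, validation 14289, test 13450

# Token-classification datasets conventionally carry "tokens" and "ner_tags" columns.
print(ds["train"][0])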

The dataset combines:

  • NorNE (Norwegian Named Entities) - Bokmål and Nynorsk
  • WikiANN (Wikipedia-based NER) - Norwegian subset

Quality improvements:

  • 12 problematic samples filtered
  • Entity type remapping (9 → 4 types)
  • Combined evaluation sets for general Norwegian NER
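
The 9 → 4 remapping above is defined in the dataset repository; purely as an illustration of the mechanism, with a hypothetical fine-to-coarse mapping that is not necessarily the one actually used:

# Hypothetical mapping for illustration only; the real mapping lives in the
# thivy/norwegian-ner-combined preprocessing and may differ.
FINE_TO_COARSE = {
    "GPE_LOC": "LOC",
    "GPE_ORG": "ORG",
    "PROD": "MISC",
    "EVT": "MISC",
    "DRV": "MISC",
}

def remap_tag(tag: str) -> str:
    # Map a BIO tag such as "B-GPE_LOC" to its coarse equivalent, e.g. "B-LOC".
    if tag == "O":
        return tag
    prefix, _, etype = tag.partition("-")
    return f"{prefix}-{FINE_TO_COARSE.get(etype, etype)}"

print(remap_tag("B-GPE_LOC"))  # B-LOC
print(remap_tag("I-PROD"))     # I-MISC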

Training Procedure

Training Hyperparameters

{
    "learning_rate": 3.5e-5,
    "num_train_epochs": 20,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "weight_decay": 0.15,
    "warmup_ratio": 0.05,
    "lr_scheduler_type": "cosine_with_restarts",
    "num_cycles": 4,
    "early_stopping_patience": 6,
    "metric_for_best_model": "f1",
}
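
For reference, these values map onto the HuggingFace Trainer roughly as follows. This is a sketch only, assuming a recent transformers release; the output directory is a placeholder, and early stopping is attached as a callback rather than a TrainingArguments field:

from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="nb-bert-norwegian-ner",      # placeholder path
    learning_rate=3.5e-5,
    num_train_epochs=20,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.15,
    warmup_ratio=0.05,
    lr_scheduler_type="cosine_with_restarts",
    lr_scheduler_kwargs={"num_cycles": 4},   # needs a transformers version with lr_scheduler_kwargs (>= ~4.38)
    eval_strategy="epoch",                   # called evaluation_strategy in older releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
)

early_stopping = EarlyStoppingCallback(early_stopping_patience=6)
# trainer = Trainer(model=..., args=args, ..., callbacks=[early_stopping])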

Training Strategy

Phase 5: Gentle LR Restarts

The model was trained using a cosine learning rate schedule with gentle restarts:

  • Max LR: 3.5e-5 (identified as "sweet spot" from Phase 3 analysis)
  • Restarts: 4 restarts over 20 epochs (5 cycles of ~4 epochs each)
  • Warmup: 5% (1 epoch)
  • Early Stopping: Patience of 6 epochs
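
This corresponds to the cosine-with-hard-restarts schedule in transformers; a small sketch of the resulting learning-rate curve (step counts are illustrative, not the actual training steps):

import torch
from transformers import get_cosine_with_hard_restarts_schedule_with_warmup

total_steps, warmup_steps = 1000, 50  # illustrative only; real values depend on dataset and batch size

optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=3.5e-5)
scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps, num_cycles=4
)

for step in range(total_steps):
    if step % 100 == 0:
        print(step, optimizer.param_groups[0]["lr"])  # warms up, decays, then periodically restarts
    optimizer.step()
    scheduler.step()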

Why this worked:

  • Conservative LR avoided catastrophic forgetting (Phase 4 failed with 1.0e-4)
  • Gentle restarts let the optimizer escape shallow local minima
  • Achieved 10.4% lower loss than Phase 3
  • Stable gradients (1.3% spike rate)

Training Phases History

Phase     F1       Strategy                     Result
Phase 1   -        Initial baseline             Established pipeline
Phase 2   -        Data filtering               Improved quality
Phase 3   0.9298   OneCycleLR                   Good, but plateaued at epoch 6
Phase 4   0.9142   Aggressive restarts (1e-4)   ❌ Catastrophic forgetting
Phase 5   0.9329   Gentle restarts (3.5e-5)     Best model

Usage

Using Pipeline (Recommended)

from transformers import pipeline

# Load pipeline
ner = pipeline(
    "ner",
    model="thivy/nb-bert-norwegian-ner",
    aggregation_strategy="simple"
)

# Predict
text = "Erna Solberg er statsminister i Norge."
entities = ner(text)

print(entities)

Output:

[
    {'entity_group': 'PER', 'score': 0.99, 'word': 'Erna Solberg', 'start': 0, 'end': 12},
    {'entity_group': 'LOC', 'score': 0.99, 'word': 'Norge', 'start': 32, 'end': 37}
]

Using Transformers Directly

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "thivy/nb-bert-norwegian-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Tokenize input
text = "Oslo er hovedstaden i Norge."
inputs = tokenizer(text, return_tensors="pt")

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.argmax(outputs.logits, dim=-1)

# Decode predictions
labels = model.config.id2label
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

for token, pred in zip(tokens, predictions[0]):
    if token not in ["[CLS]", "[SEP]", "[PAD]"]:
        print(f"{token}: {labels[pred.item()]}")

Label Mapping

id2label = {
    0: "O",
    1: "B-LOC",
    2: "B-MISC",
    3: "B-ORG",
    4: "B-PER",
    5: "I-LOC",
    6: "I-MISC",
    7: "I-ORG",
    8: "I-PER",
}

Evaluation Results

Test Set Performance

F1:        0.9329
Precision: 0.9300
Recall:    0.9358

Per-Entity Performance (Approximate)

Entity   Precision   Recall   F1
PER      0.95        0.96     0.95
ORG      0.91        0.90     0.90
LOC      0.94        0.95     0.94
MISC     0.88        0.86     0.87
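
Per-entity scores like these can be reproduced with seqeval; a minimal sketch, assuming gold and predicted BIO tag sequences for the test set are available as parallel lists:

from seqeval.metrics import classification_report

# Toy sequences standing in for the real test-set annotations and model predictions.
y_true = [["B-PER", "I-PER", "O", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O", "B-LOC"]]

print(classification_report(y_true, y_pred, digits=4))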

Limitations

  • Domain: Trained primarily on news and Wikipedia text; may not generalize well to informal Norwegian or specialized domains
  • Formality: Better on formal Norwegian (Bokmål and Nynorsk) than conversational text
  • Entity coverage: MISC category is underrepresented in training data
  • Temporal: May not recognize very recent entities (people, organizations) not in training data
  • Code-switching: Not optimized for texts mixing Norwegian with other languages

Ethical Considerations

  • The model may reflect biases present in news articles and Wikipedia
  • Person names in the training data are from public figures
  • Some entity recognitions may be politically or culturally sensitive

Training Infrastructure

  • Hardware: Apple M4 Mac (MPS)
  • Training time: ~2.5 hours for 11 epochs
  • Framework: PyTorch + HuggingFace Transformers
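
For local inference on Apple Silicon, the same MPS backend can be used; a short sketch, reusing the model and inputs from the usage section above:

import torch

# Prefer Apple's MPS backend when available, otherwise fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    logits = model(**inputs).logits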

Citation

@misc{norwegian-ner-2024,
  author = {Thivyesh Ahilathasan},
  title = {Norwegian NER Model (nb-bert-base fine-tuned)},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/thivy/nb-bert-norwegian-ner}},
}

Base Model

@misc{kummervold2021operationalizing,
    title={Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model},
    author={Per E Kummervold and Javier de la Rosa and Freddy Wetjen and Svein Arne Brygfjeld},
    year={2021},
    eprint={2104.09617},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Training Dataset

See thivy/norwegian-ner-combined for NorNE and WikiANN citations.

License

CC-BY 4.0 (same as base model and dataset)

Acknowledgments

  • NbAiLab for the nb-bert-base model
  • Language Technology Group (LTG) at University of Oslo for the NorNE dataset
  • HuggingFace for the infrastructure and tools

Contact
