---
language:
- de
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
- samheym/ger-dpr-collection
base_model:
- deepset/gbert-base
---

# Model Overview

GerColBERT is a ColBERT-based retrieval model trained on German text. It is designed for efficient late-interaction retrieval while maintaining high-quality ranking performance.

## Training Configuration

- Base Model: [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
- Training Dataset: samheym/ger-dpr-collection
- Training Subset: 10% of randomly selected triples from the full dataset
- Vector Length: 128
- Maximum Document Length: 256 tokens
- Batch Size: 50
- Training Steps: 80,000
- Gradient Accumulation: 1 step
- Learning Rate: 5 × 10⁻⁶
- Optimizer: AdamW
- In-Batch Negatives: Included

## Usage

First, install the PyLate library:

```bash
pip install -U pylate
```

### Retrieval

PyLate provides a streamlined interface for indexing and retrieving documents with ColBERT models. The index uses the Voyager HNSW index to handle document embeddings efficiently and enable fast retrieval.

```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
```
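After loading the model, documents can be indexed and queried. The sketch below follows PyLate's Voyager indexing and retrieval workflow; the document texts and identifiers are placeholders:

```python
from pylate import indexes, models, retrieve

model = models.ColBERT(model_name_or_path="samheym/GerColBERT")

# Step 2: Initialize the Voyager index (override=True replaces any existing index).
index = indexes.Voyager(
    index_folder="pylate-index",
    index_name="index",
    override=True,
)

# Step 3: Encode and index a few placeholder documents.
documents_ids = ["1", "2"]
documents = ["Erstes Beispieldokument.", "Zweites Beispieldokument."]
documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # document-side encoding
)
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

# Step 4: Encode a query and retrieve the top-k documents.
retriever = retrieve.ColBERT(index=index)
queries_embeddings = model.encode(
    ["Beispielanfrage"],
    batch_size=32,
    is_query=True,  # query-side encoding
)
scores = retriever.retrieve(queries_embeddings=queries_embeddings, k=2)
```

`scores` holds, per query, the retrieved document ids with their late-interaction scores.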
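For intuition on what "late interaction" means, ColBERT scores a query-document pair with MaxSim: each query token embedding is matched against its most similar document token embedding, and these maxima are summed. A minimal NumPy sketch (the embeddings here are random stand-ins, not real model output):

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score: for each query token, take the
    maximum cosine similarity over all document tokens, then sum."""
    # Normalize token embeddings so dot products equal cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T  # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # best match per query token, summed

# Random stand-ins: 8 query tokens, 256 document tokens, 128-dim vectors
# (matching this model's vector length of 128).
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))
doc = rng.normal(size=(256, 128))
print(maxsim_score(query, doc))
```

Because the per-token maxima are computed only after both sides are encoded independently, document embeddings can be precomputed and indexed, which is what makes this scheme efficient at retrieval time.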