---
language:
- de
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
- samheym/ger-dpr-collection
base_model:
- deepset/gbert-base
---

# Model Overview

GerColBERT is a ColBERT-based retrieval model trained on German text. It is designed for efficient late-interaction retrieval while maintaining high-quality ranking performance.

## Training Configuration

- Base Model: [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
- Training Dataset: samheym/ger-dpr-collection
- Training Subset: 10% of randomly selected triples from the full dataset
- Vector Length: 128
- Maximum Document Length: 256 tokens
- Batch Size: 50
- Training Steps: 80,000
- Gradient Accumulation: 1 step
- Learning Rate: 5 × 10⁻⁶
- Optimizer: AdamW
- In-Batch Negatives: Included

## Usage

First, install the PyLate library:

```bash
pip install -U pylate
```

### Retrieval

PyLate provides a streamlined interface for indexing and retrieving documents with ColBERT models. The index uses the Voyager HNSW index to handle document embeddings efficiently and enable fast retrieval.

```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
```
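After loading the model, documents can be indexed and queried. The sketch below follows PyLate's Voyager indexing and retrieval workflow; the document texts and identifiers are placeholders:

```python
from pylate import indexes, models, retrieve

model = models.ColBERT(model_name_or_path="samheym/GerColBERT")

# Step 2: Initialize the Voyager index (override=True replaces any existing index).
index = indexes.Voyager(
    index_folder="pylate-index",
    index_name="index",
    override=True,
)

# Step 3: Encode and index a few placeholder documents.
documents_ids = ["1", "2"]
documents = ["Erstes Beispieldokument.", "Zweites Beispieldokument."]
documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # document-side encoding
)
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

# Step 4: Encode a query and retrieve the top-k documents.
retriever = retrieve.ColBERT(index=index)
queries_embeddings = model.encode(
    ["Beispielanfrage"],
    batch_size=32,
    is_query=True,  # query-side encoding
)
scores = retriever.retrieve(queries_embeddings=queries_embeddings, k=2)
```

`scores` holds, per query, the retrieved document ids with their late-interaction scores.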
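For intuition on what "late interaction" means, ColBERT scores a query-document pair with MaxSim: each query token embedding is matched against its most similar document token embedding, and these maxima are summed. A minimal NumPy sketch (the embeddings here are random stand-ins, not real model output):

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) score: for each query token, take the
    maximum cosine similarity over all document tokens, then sum."""
    # Normalize token embeddings so dot products equal cosine similarities.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T  # shape: (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # best match per query token, summed

# Random stand-ins: 8 query tokens, 256 document tokens, 128-dim vectors
# (matching this model's vector length of 128).
rng = np.random.default_rng(0)
query = rng.normal(size=(8, 128))
doc = rng.normal(size=(256, 128))
print(maxsim_score(query, doc))
```

Because the per-token maxima are computed only after both sides are encoded independently, document embeddings can be precomputed and indexed, which is what makes this scheme efficient at retrieval time.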