# SentenceTransformer based on gerulata/slovakbert
This is a [sentence-transformers](https://www.sbert.net) model finetuned from gerulata/slovakbert. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: gerulata/slovakbert
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
### Model Sources

### Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
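The Pooling layer above uses mean pooling (`pooling_mode_mean_tokens: True`): the transformer's per-token embeddings are averaged into one sentence vector, counting only real tokens via the attention mask. A minimal pure-Python sketch of that step, using toy 3-dimensional vectors rather than actual model outputs:

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:
            for i in range(dim):
                total[i] += vec[i]
            count += 1
    return [t / count for t in total]

# Two real tokens followed by one padding token: the pad vector is ignored.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0, 4.0]
```

In the real model the same averaging runs over 768-dimensional token embeddings, which is why the output dimensionality matches the word embedding dimension.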
## Usage

### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("mrshu/sturovec-base-sk-v0")

# Run inference
sentences = [
    'Veľká časť malých detí na Ukrajine trpí rôznymi zdravotnými problémami, ktoré boli doteraz pravdepodobne nediagnostikované, no odhaľujú sa v súvislosti s černobyľskou katastrofou. To by mohlo poukazovať na slabý systém zdravotnej starostlivosti a drsné životné podmienky.',
    'Černobyľská katastrofa mala dôsledky mimo bývalého ZSSR.',
    'Blair patrí k anglikánskej cirkvi.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)  # 3x3 similarity matrix
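By default `model.similarity` scores pairs with cosine similarity (the similarity function listed in the model description). For clarity, here is a self-contained sketch of that formula, independent of the library:

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Because cosine similarity depends only on direction, not magnitude, embedding vectors do not need to be normalized before comparison.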
## Evaluation

### Metrics

#### Semantic Similarity
| Metric | Value |
|:--------------------|:-------|
| pearson_cosine | 0.8347 |
| spearman_cosine | 0.8298 |
#### Binary Classification
| Metric | validation_nli | validation_rte |
|:---------------------------|:---------------|:---------------|
| cosine_accuracy | 0.6663 | 0.5235 |
| cosine_accuracy_threshold | 0.9708 | 0.9567 |
| cosine_f1 | 0.4995 | 0.6388 |
| cosine_f1_threshold | -0.0148 | 0.2217 |
| cosine_precision | 0.3331 | 0.471 |
| cosine_recall | 0.9988 | 0.9924 |
| cosine_ap | 0.2731 | 0.3712 |
| cosine_mcc | -0.0283 | -0.0635 |
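The threshold metrics above turn a continuous similarity score into a binary prediction: a pair is labeled positive when its cosine similarity meets the threshold (e.g. `cosine_accuracy_threshold` ≈ 0.9708 maximizes accuracy on `validation_nli`, while the much lower `cosine_f1_threshold` maximizes F1). A sketch of that decision rule, with the threshold value taken from the table and the helper name purely illustrative:

```python
def classify_pair(similarity, threshold):
    """Predict 1 (positive pair) when similarity reaches the threshold."""
    return 1 if similarity >= threshold else 0

NLI_ACCURACY_THRESHOLD = 0.9708  # cosine_accuracy_threshold for validation_nli

print(classify_pair(0.98, NLI_ACCURACY_THRESHOLD))  # 1
print(classify_pair(0.50, NLI_ACCURACY_THRESHOLD))  # 0
```

The very high recall paired with low precision in the table suggests the F1-optimal thresholds accept almost every pair as positive on these datasets.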
#### Multi Task Dev

- Evaluated with `slovak_embeddings_v1.train.MultiTaskDevEvaluator`
| Metric | Value |
|:------------------------------------------|:--------|
| validation_sts_pearson_cosine | 0.8347 |
| validation_sts_spearman_cosine | 0.8298 |
| validation_nli_cosine_accuracy | 0.6663 |
| validation_nli_cosine_accuracy_threshold | 0.9708 |
| validation_nli_cosine_f1 | 0.4995 |
| validation_nli_cosine_f1_threshold | -0.0148 |
| validation_nli_cosine_precision | 0.3331 |
| validation_nli_cosine_recall | 0.9988 |
| validation_nli_cosine_ap | 0.2731 |
| validation_nli_cosine_mcc | -0.0283 |
| validation_rte_cosine_accuracy | 0.5235 |
| validation_rte_cosine_accuracy_threshold | 0.9567 |
| validation_rte_cosine_f1 | 0.6388 |
| validation_rte_cosine_f1_threshold | 0.2217 |
| validation_rte_cosine_precision | 0.471 |
| validation_rte_cosine_recall | 0.9924 |
| validation_rte_cosine_ap | 0.3712 |
| validation_rte_cosine_mcc | -0.0635 |
| validation_dev_overall | 0.4914 |
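The reported numbers suggest (this is an inference from the table, not documented behavior of the evaluator) that `validation_dev_overall` is the arithmetic mean of the STS Spearman correlation and the two binary-classification AP scores:

```python
sts_spearman = 0.8298  # validation_sts_spearman_cosine
nli_ap = 0.2731        # validation_nli_cosine_ap
rte_ap = 0.3712        # validation_rte_cosine_ap

overall = (sts_spearman + nli_ap + rte_ap) / 3
print(round(overall, 4))  # 0.4914, matching validation_dev_overall
```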
## Training Details

### Training Datasets
The model was finetuned on three unnamed datasets.
### Training Hyperparameters

#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 32
- `num_train_epochs`: 20
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters
<details><summary>Click to expand</summary>

```
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 32
per_device_eval_batch_size: 32
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 5e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1
num_train_epochs: 20
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.0
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: round_robin
router_mapping: {}
learning_rate_mapping: {}
```

</details>
### Training Logs
| Epoch | Step | Training Loss | validation_sts_spearman_cosine | validation_nli_cosine_ap | validation_rte_cosine_ap | validation_dev_overall |
|:--------|:-----|:--------------|:-------------------------------|:-------------------------|:-------------------------|:-----------------------|
| 0.3333 | 39 | - | 0.7072 | 0.2996 | 0.4315 | 0.4794 |
| 0.6667 | 78 | - | 0.7111 | 0.2988 | 0.4297 | 0.4799 |
| 1.0 | 117 | - | 0.7182 | 0.2968 | 0.4279 | 0.4810 |
| 1.3333 | 156 | - | 0.7272 | 0.2940 | 0.4256 | 0.4823 |
| 1.6667 | 195 | - | 0.7371 | 0.2908 | 0.4195 | 0.4825 |
| 2.0 | 234 | - | 0.7460 | 0.2862 | 0.4106 | 0.4809 |
| 2.3333 | 273 | - | 0.7457 | 0.2824 | 0.4026 | 0.4769 |
| 2.6667 | 312 | - | 0.7377 | 0.2822 | 0.3962 | 0.4721 |
| 3.0 | 351 | - | 0.7415 | 0.2815 | 0.3943 | 0.4724 |
| 3.3333 | 390 | - | 0.7471 | 0.2809 | 0.3916 | 0.4732 |
| 3.6667 | 429 | - | 0.7522 | 0.2800 | 0.3903 | 0.4742 |
| 4.0 | 468 | - | 0.7542 | 0.2796 | 0.3888 | 0.4742 |
| 4.2735 | 500 | 1.2693 | - | - | - | - |
| 4.3333 | 507 | - | 0.7587 | 0.2788 | 0.3887 | 0.4754 |
| 4.6667 | 546 | - | 0.7613 | 0.2780 | 0.3879 | 0.4757 |
| 5.0 | 585 | - | 0.7642 | 0.2777 | 0.3867 | 0.4762 |
| 5.3333 | 624 | - | 0.7673 | 0.2769 | 0.3865 | 0.4769 |
| 5.6667 | 663 | - | 0.7674 | 0.2781 | 0.3861 | 0.4772 |
| 6.0 | 702 | - | 0.7731 | 0.2769 | 0.3845 | 0.4782 |
| 6.3333 | 741 | - | 0.7776 | 0.2764 | 0.3853 | 0.4797 |
| 6.6667 | 780 | - | 0.7799 | 0.2758 | 0.3843 | 0.4800 |
| 7.0 | 819 | - | 0.7825 | 0.2762 | 0.3842 | 0.4810 |
| 7.3333 | 858 | - | 0.7856 | 0.2756 | 0.3830 | 0.4814 |
| 7.6667 | 897 | - | 0.7866 | 0.2754 | 0.3824 | 0.4814 |
| 8.0 | 936 | - | 0.7913 | 0.2748 | 0.3803 | 0.4821 |
| 8.3333 | 975 | - | 0.7915 | 0.2746 | 0.3803 | 0.4821 |
| 8.5470 | 1000 | 0.4279 | - | - | - | - |
| 8.6667 | 1014 | - | 0.7925 | 0.2746 | 0.3789 | 0.4820 |
| 9.0 | 1053 | - | 0.7959 | 0.2739 | 0.3803 | 0.4834 |
| 9.3333 | 1092 | - | 0.7974 | 0.2739 | 0.3762 | 0.4825 |
| 9.6667 | 1131 | - | 0.7980 | 0.2740 | 0.3796 | 0.4839 |
| 10.0 | 1170 | - | 0.8002 | 0.2738 | 0.3800 | 0.4847 |
| 10.3333 | 1209 | - | 0.7971 | 0.2743 | 0.3770 | 0.4828 |
| 10.6667 | 1248 | - | 0.8002 | 0.2741 | 0.3760 | 0.4835 |
| 11.0 | 1287 | - | 0.8026 | 0.2737 | 0.3763 | 0.4842 |
| 11.3333 | 1326 | - | 0.8017 | 0.2740 | 0.3744 | 0.4834 |
| 11.6667 | 1365 | - | 0.8037 | 0.2741 | 0.3730 | 0.4836 |
| 12.0 | 1404 | - | 0.8074 | 0.2737 | 0.3729 | 0.4847 |
| 12.3333 | 1443 | - | 0.8062 | 0.2736 | 0.3747 | 0.4848 |
| 12.6667 | 1482 | - | 0.8094 | 0.2735 | 0.3732 | 0.4854 |
| 12.8205 | 1500 | 0.2922 | - | - | - | - |
| 13.0 | 1521 | - | 0.8102 | 0.2739 | 0.3706 | 0.4849 |
| 13.3333 | 1560 | - | 0.8148 | 0.2723 | 0.3739 | 0.4870 |
| 13.6667 | 1599 | - | 0.8136 | 0.2726 | 0.3729 | 0.4864 |
| 14.0 | 1638 | - | 0.8140 | 0.2740 | 0.3688 | 0.4856 |
| 14.3333 | 1677 | - | 0.8120 | 0.2738 | 0.3699 | 0.4852 |
| 14.6667 | 1716 | - | 0.8153 | 0.2733 | 0.3693 | 0.4859 |
| 15.0 | 1755 | - | 0.8211 | 0.2726 | 0.3692 | 0.4876 |
| 15.3333 | 1794 | - | 0.8212 | 0.2726 | 0.3711 | 0.4883 |
| 15.6667 | 1833 | - | 0.8189 | 0.2740 | 0.3711 | 0.4880 |
| 16.0 | 1872 | - | 0.8224 | 0.2736 | 0.3696 | 0.4885 |
| 16.3333 | 1911 | - | 0.8234 | 0.2726 | 0.3692 | 0.4884 |
| 16.6667 | 1950 | - | 0.8248 | 0.2733 | 0.3677 | 0.4886 |
| 17.0 | 1989 | - | 0.8276 | 0.2728 | 0.3662 | 0.4889 |
| 17.0940 | 2000 | 0.2114 | - | - | - | - |
| 17.3333 | 2028 | - | 0.8264 | 0.2710 | 0.3714 | 0.4896 |
| 17.6667 | 2067 | - | 0.8283 | 0.2713 | 0.3721 | 0.4906 |
| 18.0 | 2106 | - | 0.8269 | 0.2724 | 0.3699 | 0.4897 |
| 18.3333 | 2145 | - | 0.8291 | 0.2723 | 0.3718 | 0.4911 |
| 18.6667 | 2184 | - | 0.8302 | 0.2720 | 0.3719 | 0.4914 |
| 19.0 | 2223 | - | 0.8298 | 0.2731 | 0.3712 | 0.4914 |
## Framework Versions
- Python: 3.13.0
- Sentence Transformers: 5.2.0
- Transformers: 4.57.3
- PyTorch: 2.9.1+cu128
- Accelerate: 1.12.0
- Datasets: 4.4.1
- Tokenizers: 0.22.1
## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```