SentenceTransformer based on google-bert/bert-base-cased

This is a sentence-transformers model finetuned from google-bert/bert-base-cased on the csv dataset described below, which consists of pairs of SMILES strings. It maps sentences and paragraphs (here, SMILES representations of molecules) to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • csv

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
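
For illustration, an equivalent (untrained) module stack can be assembled by hand with the sentence-transformers models API. This is a minimal sketch only; loading the published checkpoint as shown in the Usage section already restores the trained weights.

from sentence_transformers import SentenceTransformer, models

# Sketch of the same architecture built from its two modules:
# a BERT encoder truncating inputs at 512 tokens, followed by mean pooling.
word_embedding_model = models.Transformer("google-bert/bert-base-cased", max_seq_length=512)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),  # 768
    pooling_mode="mean",
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])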

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("cafierom/905_Statin_Contrastive")
# Run inference
sentences = [
    'CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c1C(=O)NCc1cccc(c1)C(N)=O)-c1ccccc1)-c1ccc(F)cc1',
    'CC(C)c1nc(c(-c2ccc(F)cc2)n1\\C=C\\[C@@H](O)C[C@@H](O)CC(O)=O)-c1ccc(F)cc1',
    'CCn1nnc(n1)C(\\C=C\\[C@@H](O)C[C@@H](O)CC([O-])=O)=C(c1ccc(F)cc1)c1ccc(F)cc1',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.9994, -0.0483],
#         [ 0.9994,  1.0000, -0.0453],
#         [-0.0483, -0.0453,  1.0000]])
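
Beyond pairwise similarity, the same embeddings can drive semantic search over a molecule corpus. A minimal sketch, reusing the three example SMILES above as a stand-in corpus (a real corpus and query would come from your own data):

import torch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("cafierom/905_Statin_Contrastive")

# Stand-in corpus: the three example SMILES from above
corpus = [
    'CC(C)n1c(CC[C@@H](O)C[C@@H](O)CC([O-])=O)c(c(c1C(=O)NCc1cccc(c1)C(N)=O)-c1ccccc1)-c1ccc(F)cc1',
    'CC(C)c1nc(c(-c2ccc(F)cc2)n1\\C=C\\[C@@H](O)C[C@@H](O)CC(O)=O)-c1ccc(F)cc1',
    'CCn1nnc(n1)C(\\C=C\\[C@@H](O)C[C@@H](O)CC([O-])=O)=C(c1ccc(F)cc1)c1ccc(F)cc1',
]
query = corpus[1]  # in practice, a new SMILES string

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity of the query against every corpus entry, shape [1, 3]
scores = model.similarity(query_embedding, corpus_embeddings)
best_idx = int(torch.argmax(scores))
print(best_idx, float(scores[0, best_idx]))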

Training Details

Training Dataset

csv

  • Dataset: csv
  • Size: 116,941 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string; min: 17 tokens, mean: 70.84 tokens, max: 147 tokens
    • hypothesis: string; min: 16 tokens, mean: 62.37 tokens, max: 141 tokens
    • label: int; 0: ~66.30%, 2: ~33.70%
  • Samples:
    • premise: CCC@HC(=O)O[C@H]1CC@@HC[C@@H]2C=CC@HC@HC12
      hypothesis: CCCCCCCCCCCCCCCC1(O)CCOC(O)C1
      label: 2
    • premise: OC@HCC([O-])=O
      hypothesis: C[C@@]1(O)CC@H\C=C\c1ccc(Cl)cc1Cl
      label: 2
    • premise: CC(C)c1nc(nc(-c2ccc(F)cc2)c1\C=C[C@@H]1CC@@HCC(=O)O1)-c1ccc(F)cc1
      hypothesis: CC(C)CC@HC(=O)N1CCC[C@H]1C(=O)NC@@HC(=O)NCC(=O)NCC(O)=O
      label: 2
  • Loss: SoftmaxLoss
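
As a rough sketch of how a premise/hypothesis/label dataset is paired with SoftmaxLoss in sentence-transformers (the file name and num_labels below are assumptions, not the exact values used for this run):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import SoftmaxLoss

model = SentenceTransformer("google-bert/bert-base-cased")

# Illustrative path; the actual CSV used for this model is not published here.
train_dataset = load_dataset("csv", data_files="train.csv", split="train")

# SoftmaxLoss embeds both sentences and trains a small classifier on
# (u, v, |u - v|) to predict the integer label column.
# num_labels=3 assumes NLI-style labels 0/1/2; only 0 and 2 appear in the
# statistics above.
loss = SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),
    num_labels=3,
)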

Evaluation Dataset

csv

  • Dataset: csv
  • Size: 20,637 evaluation samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise: string; min: 17 tokens, mean: 69.69 tokens, max: 147 tokens
    • hypothesis: string; min: 16 tokens, mean: 59.63 tokens, max: 141 tokens
    • label: int; 0: ~67.40%, 2: ~32.60%
  • Samples:
    • premise: COC(=O)CC@HCC@H\C=C\n1c(C(C)C)c(Br)c(c1-c1ccc(F)cc1)-c1ccc(F)cc1
      hypothesis: CC@H[C@H]1CC[C@H]2[C@@H]3C@@HOC(C)=O
      label: 2
    • premise: CC(C)n1c(CCC@@HCC@@HCC([O-])=O)c(c(c1C(=O)Nc1ccc(O)cc1)-c1ccccc1)-c1ccc(F)cc1
      hypothesis: CCC@HC(=O)O[C@H]1CC@HC=C2C=CC@HC@H[C@@H]12
      label: 0
    • premise: CC(C)C(=O)O[C@H]1CC@@HC=C2C=CC@HC@HC12
      hypothesis: CC(C)c1c(nc(-c2ccc(F)cc2)n1\C=C[C@@H](O)CC@@HCC([O-])=O)-c1ccc(F)cc1
      label: 0
  • Loss: SoftmaxLoss

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • weight_decay: 0.01
  • num_train_epochs: 10
  • warmup_steps: 100
  • fp16: True
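
Taken together, these non-default values correspond to a trainer configuration along the following lines. This is a hedged sketch (output_dir, file names, and num_labels are assumptions), not the exact training script.

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import SoftmaxLoss

model = SentenceTransformer("google-bert/bert-base-cased")
train_dataset = load_dataset("csv", data_files="train.csv", split="train")  # illustrative paths
eval_dataset = load_dataset("csv", data_files="eval.csv", split="train")
loss = SoftmaxLoss(model, model.get_sentence_embedding_dimension(), num_labels=3)

args = SentenceTransformerTrainingArguments(
    output_dir="905_Statin_Contrastive",  # assumption, not the actual output path
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    weight_decay=0.01,
    num_train_epochs=10,
    warmup_steps=100,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()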

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 128
  • per_device_eval_batch_size: 128
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 100
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.1094 100 0.4346
0.2188 200 0.0656
0.3282 300 0.0082
0.4376 400 0.007
0.5470 500 0.0056
0.6565 600 0.0054
0.7659 700 0.0006
0.8753 800 0.0005
0.9847 900 0.0004
1.0941 1000 0.0004
1.2035 1100 0.0003
1.3129 1200 0.0003
1.4223 1300 0.0003
1.5317 1400 0.0003
1.6411 1500 0.0002
1.7505 1600 0.0002
1.8600 1700 0.0002
1.9694 1800 0.0002
2.0788 1900 0.0002
2.1882 2000 0.0002
2.2976 2100 0.0001
2.4070 2200 0.0001
2.5164 2300 0.0001
2.6258 2400 0.0001
2.7352 2500 0.0001
2.8446 2600 0.0001
2.9540 2700 0.0001
3.0635 2800 0.0001
3.1729 2900 0.0001
3.2823 3000 0.0001
3.3917 3100 0.0001
3.5011 3200 0.0001
3.6105 3300 0.0001
3.7199 3400 0.0001
3.8293 3500 0.0001
3.9387 3600 0.0001
4.0481 3700 0.0001
4.1575 3800 0.0001
4.2670 3900 0.0001
4.3764 4000 0.0
4.4858 4100 0.0
4.5952 4200 0.0
4.7046 4300 0.0
4.8140 4400 0.0
4.9234 4500 0.0
5.0328 4600 0.0
5.1422 4700 0.0
5.2516 4800 0.0
5.3611 4900 0.0
5.4705 5000 0.0
5.5799 5100 0.0
5.6893 5200 0.0
5.7987 5300 0.0
5.9081 5400 0.0
6.0175 5500 0.0002
6.1269 5600 0.0
6.2363 5700 0.0
6.3457 5800 0.0
6.4551 5900 0.0
6.5646 6000 0.0
6.6740 6100 0.0
6.7834 6200 0.0
6.8928 6300 0.0
7.0022 6400 0.0
7.1116 6500 0.0
7.2210 6600 0.0
7.3304 6700 0.0
7.4398 6800 0.0
7.5492 6900 0.0
7.6586 7000 0.0
7.7681 7100 0.0
7.8775 7200 0.0
7.9869 7300 0.0
8.0963 7400 0.0
8.2057 7500 0.0
8.3151 7600 0.0
8.4245 7700 0.0
8.5339 7800 0.0
8.6433 7900 0.0
8.7527 8000 0.0
8.8621 8100 0.0
8.9716 8200 0.0
9.0810 8300 0.0022
9.1904 8400 0.0019
9.2998 8500 0.0001
9.4092 8600 0.0
9.5186 8700 0.0
9.6280 8800 0.0
9.7374 8900 0.0
9.8468 9000 0.0
9.9562 9100 0.0

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.0
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}