SentenceTransformer based on microsoft/deberta-v3-small

This is a sentence-transformers model finetuned from microsoft/deberta-v3-small on the nli-pairs, sts-label, vitaminc-pairs, qnli-contrastive, scitail-pairs-qa, scitail-pairs-pos, xsum-pairs and compression-pairs datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: microsoft/deberta-v3-small
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 tokens
Similarity Function: Cosine Similarity
Training Datasets:
Language: en

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DebertaV2Model 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bobox/DeBERTaV3-small-GeneralSentenceTransformer-v2")
# Run inference
sentences = [
    'All the members of one particular species in a give area are called a population.',
    'All the members of a species that live in the same area form a population.',
    'A(n) anaerobic organism does not need oxygen for growth and dies in its presence.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Training Details

Training Datasets

nli-pairs

Dataset: nli-pairs at d482672
Size: 7,500 training samples
Columns: sentence1 and sentence2
Approximate statistics based on the first 1000 samples:
sentence1 sentence2
type string string
details
min: 5 tokens
mean: 16.62 tokens
max: 62 tokens

min: 4 tokens
mean: 9.46 tokens
max: 29 tokens

	sentence1	sentence2
type	string	string
details	min: 5 tokens mean: 16.62 tokens max: 62 tokens	min: 4 tokens mean: 9.46 tokens max: 29 tokens

Samples:

sentence1	sentence2
`A person on a horse jumps over a broken down airplane.`	`A person is outdoors, on a horse.`
`Children smiling and waving at camera`	`There are children present`
`A boy is jumping on skateboard in the middle of a red bridge.`	`The boy does a skateboarding trick.`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

sts-label

Dataset: sts-label at ab7a5ac
Size: 5,749 training samples
Columns: sentence1, sentence2, and score
Approximate statistics based on the first 1000 samples:
sentence1 sentence2 score
type string string float
details
min: 6 tokens
mean: 9.81 tokens
max: 27 tokens

min: 5 tokens
mean: 9.74 tokens
max: 25 tokens

min: 0.0
mean: 0.54
max: 1.0

	sentence1	sentence2	score
type	string	string	float
details	min: 6 tokens mean: 9.81 tokens max: 27 tokens	min: 5 tokens mean: 9.74 tokens max: 25 tokens	min: 0.0 mean: 0.54 max: 1.0

Samples:

sentence1	sentence2	score
`A plane is taking off.`	`An air plane is taking off.`	`1.0`
`A man is playing a large flute.`	`A man is playing a flute.`	`0.76`
`A man is spreading shreded cheese on a pizza.`	`A man is spreading shredded cheese on an uncooked pizza.`	`0.76`

Loss: AnglELoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "pairwise_angle_sim"
}

vitaminc-pairs

Dataset: vitaminc-pairs at be6febb
Size: 3,695 training samples
Columns: label, sentence1, and sentence2
Approximate statistics based on the first 1000 samples:
label sentence1 sentence2
type int string string
details
1: 100.00%

min: 6 tokens
mean: 16.02 tokens
max: 56 tokens

min: 8 tokens
mean: 38.57 tokens
max: 502 tokens

	label	sentence1	sentence2
type	int	string	string
details	1: 100.00%	min: 6 tokens mean: 16.02 tokens max: 56 tokens	min: 8 tokens mean: 38.57 tokens max: 502 tokens

Samples:

label	sentence1	sentence2
`1`	`The movie Yevadu grossed more than 390 million globally .`	`It also took the second spot in the list of the top 10 films with highest first week shares from AP.The film collected 390.5 million in 9 days , and more than 60 million from other areas , including Karnataka , the rest of India , and overseas territories , enabling it to cross the 400 million mark at the worldwide Box office , becoming Ram Charan 's fourth film to cross that mark .`
`1`	`The film 's score is based on 33 critics .`	`Metacritic gave the film a score of 44 out of 100 , based on 33 critics , indicating '' mixed or average reviews '' '' . ''`
`1`	`Back to Black ( album ) sold less than 15 million copies .`	`Worldwide , the album has sold over 12 million copies .`

Loss: GISTEmbedLoss with these parameters:

{'guide': SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), 'temperature': 0.05}

qnli-contrastive

Dataset: qnli-contrastive at bcdcba7
Size: 7,500 training samples
Columns: sentence1, sentence2, and label
Approximate statistics based on the first 1000 samples:
sentence1 sentence2 label
type string string int
details
min: 6 tokens
mean: 13.92 tokens
max: 40 tokens

min: 6 tokens
mean: 35.87 tokens
max: 499 tokens

0: 100.00%

	sentence1	sentence2	label
type	string	string	int
details	min: 6 tokens mean: 13.92 tokens max: 40 tokens	min: 6 tokens mean: 35.87 tokens max: 499 tokens	0: 100.00%

Samples:

sentence1	sentence2	label
`Who was the biggest artist that CBS had?`	`CBS Inc., now CBS Corporation, retained the rights to the CBS name for music recordings but granted Sony a temporary license to use the CBS name.`	`0`
`What does a video-conference use that allows communication in live situations?`	`This is often accomplished by the use of a multipoint control unit (a centralized distribution and call management system) or by a similar non-centralized multipoint capability embedded in each videoconferencing unit.`	`0`
`What is the population of Saint Helena?`	`It is part of the British Overseas Territory of Saint Helena, Ascension and Tristan da Cunha.`	`0`

Loss: OnlineContrastiveLoss

scitail-pairs-qa

Dataset: scitail-pairs-qa at 0cc4353
Size: 14,987 training samples
Columns: sentence2 and sentence1
Approximate statistics based on the first 1000 samples:
sentence2 sentence1
type string string
details
min: 7 tokens
mean: 15.86 tokens
max: 41 tokens

min: 7 tokens
mean: 15.1 tokens
max: 41 tokens

	sentence2	sentence1
type	string	string
details	min: 7 tokens mean: 15.86 tokens max: 41 tokens	min: 7 tokens mean: 15.1 tokens max: 41 tokens

Samples:

sentence2	sentence1
`The largest known proteins are titins.`	`What are the largest known proteins?`
`Remote-control vehicles are able to go to the deepest ocean floor.`	`What type of vehicles is able to go to the deepest ocean floor?`
`Vaccine is a preventative measure that is often delivered by injection into the arm.`	`What preventative measure is often delivered by injection into the arm?`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

scitail-pairs-pos

Dataset: scitail-pairs-pos at 0cc4353
Size: 8,600 training samples
Columns: sentence1 and sentence2
Approximate statistics based on the first 1000 samples:
sentence1 sentence2
type string string
details
min: 7 tokens
mean: 23.75 tokens
max: 67 tokens

min: 7 tokens
mean: 15.47 tokens
max: 41 tokens

	sentence1	sentence2
type	string	string
details	min: 7 tokens mean: 23.75 tokens max: 67 tokens	min: 7 tokens mean: 15.47 tokens max: 41 tokens

Samples:

sentence1	sentence2
`The movement of molecules from a location where they are in a high concentration to an area where they are in a lower concentration is called diffusion .`	`You call the movement of a substance from an area of a higher amount toward an area of lower amount diffusion.`
`Climate is the average weather of an area over a long period of time.`	`Climate is the long-term average of weather in a particular spot.`
`Sunlight is captured by green plants during the process of photosynthesis to produce glucose, a carbohydrate from water and carbon dioxide.`	`Photosynthesis converts carbon dioxide and water into glucose.`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

xsum-pairs

Dataset: xsum-pairs at 788ddaf
Size: 3,750 training samples
Columns: sentence1 and sentence2
Approximate statistics based on the first 1000 samples:
sentence1 sentence2
type string string
details
min: 28 tokens
mean: 355.39 tokens
max: 512 tokens

min: 8 tokens
mean: 27.3 tokens
max: 61 tokens

	sentence1	sentence2
type	string	string
details	min: 28 tokens mean: 355.39 tokens max: 512 tokens	min: 8 tokens mean: 27.3 tokens max: 61 tokens

Samples:

sentence1	sentence2
Prices rose in all council areas and across all property types, but there were wide variations. In Derry City and Strabane prices were up by 11% but by less than 2% in Fermanagh and Omagh. The figures are from the NI Residential Property Price Index, which analyses almost all sales, including cash deals. The average standardised price, across all property types, is now £125,480. That compares to £97,428 at the bottom of the market in 2012, but is still far below the bubble-era peak of £224,670. Over the year the largest rise was in the apartment sector with prices up by 11%. For all other property types, the increase was about 5%. The council area with the highest average price is Lisburn and Castlereagh (£149,600) and the lowest is Derry City and Strabane (£108,464). The number of properties sold in 2016 was 21,669, down slightly on the 2015 figure. Northern Ireland experienced a huge house price bubble in the years leading up to 2007 before the market crashed. Prices more than halved between 2007 and early 2013 but have been increasing gradually since then.	`House prices in Northern Ireland rose by almost 6% in 2016, according to official figures.`
English and French clubs intend to break away from the Heineken Cup and create their own tournament. "It could well be the end of professional rugby in Scotland if the competition wasn't to go ahead," Nicol told BBC Scotland. "I don't think you can fill a hole of that amount with anything else." Let's get qualification sorted out and based on a meritocracy and then the distribution of revenues is for the boardrooms The Scottish Rugby Union currently receives about £5m per year for Glasgow Warriors and Edinburgh's participation in the Heineken Cup. European Rugby Cup (ERC), which has run the Heineken Cup since it began in 1995, wants to re-open negotiations about the tournament's future but English Premiership and French Top 14 clubs insist they will not attend talks planned by the organising body next month. They will quit the competition at the end of the season, citing factors such as their view that the Heineken Cup structure favours teams from the Pro12, which is made up of sides from Wales, Scotland, Ireland and Italy, and distribution of revenue. Nicol, who won the Heineken Cup with Bath in 1998, insists that arguments over the tournament format is a repetitive issue and he hopes "common sense" will prevail for the good of the game in Scotland. "It happens every few years," he told BBC Scotland. "The English and the French flex their collective muscles when the contract is coming to an end. "But this year, it's very different, because they've got a television deal on the table and it's a real clear and present danger. "I think there's an acceptance that the current format of the Heineken Cup will cease and there will be a new competition. Media playback is not supported on this device "Then we just need to ensure and hope that Scotland are heavily involved in it." Nicol conceded that the main stumbling block for advancing discussions was the perception that Celtic nations are favoured in the qualification process. At present, Ireland and Wales each have three sides guaranteed a place, while Scotland and Italy have two apiece. Nicol believes the English and French unions want to put a stop to automatic qualification, which could bring about the end of lucrative revenue for Glasgow and Edinburgh, although ending guaranteed entry may be necessary to ensure the future of a pan-European competition. The former Scotland captain said if the tournament comes to an end it would be "a sporting disaster" adding that "the Heineken Cup has been a fantastic competition". He added: "Where it's flawed is in the qualification. I don't think the two Scottish sides and the Italian sides or the Irish sides should qualify automatically. "So let's get qualification sorted out and based on a meritocracy and then the distribution of revenues is for the boardrooms. "There's a bit of posturing from both sides, but I just hope it's a bit of brinksmanship and they get around the table and sort something out - and we get a competition. "It might not be the Heineken Cup as we call it now, but hopefully we'll get something like it."	`Professional rugby union in Scotland could end if there is no European competition next season, fears former national captain Andy Nicol.`
The German was 0.203 seconds quicker than Hamilton, with Ferrari's Kimi Raikkonen third, a second off the pace. Mercedes set their times on the super-soft tyre, while Ferrari used the soft, which would account for about half the gap between the two cars. Ferrari's Sebastian Vettel was fourth, ahead of Force India's Sergio Perez. Hamilton enters the race nine points ahead of Rosberg in the championship after recovering from 21st on the grid to finish third at the Belgian Grand Prix last weekend, as Rosberg won. Ferrari have used the last of their remaining engine development 'tokens' ahead of their home race in an attempt to boost their competitiveness after a slump in form that has seen them lose second place in the constructors' championship to Red Bull. The fastest Red Bull was Max Verstappen in eighth, behind Haas driver Romain Grosjean and Williams' Valtteri Bottas, whose team-mate Felipe Massa announced on Thursday that he would retire at the end of the year. Verstappen remains the focus of attention following his controversial battle with Raikkonen in Belgium. Raikkonen has criticised Verstappen for being too dangerous, while the Dutchman said he would not change his driving because others were not happy. The stewards took no action against Verstappen in Spa, but BBC Sport has learned that Charlie Whiting, the F1 director of governing body the FIA, felt that Verstappen's late move in defence at 200mph as Raikkonen attacked was on the edge of acceptability. Whiting told the teams in a meeting on Thursday that he felt Verstappen could have received a black-and-white warning flag for his driving. The black-and-white flag is an indication of unsportsmanlike behaviour and is only shown once. If the driver commits the same offence again he can be disqualified from the race. Whiting's intervention raised the stakes in the debate ahead of the drivers' briefing after practice on Friday afternoon, where the incident is expected to be discussed. It was a relatively low-key session on track, despite a number of drivers running off the track at the tricky Monza chicanes in the warm sunshine. McLaren's session came to an unfortunate end as Fernando Alonso was forced to pit with a gearshift problem. He was 13th, with team-mate Jenson Button 11th, the drivers expecting their most difficult weekend of the year because of the lack of power of the Honda engine, which still lags despite recent updates. Button and Verstappen ran the halo head protection system in the first part of the session as trials continue ahead of the planned introduction of the device in 2018. Italian Grand Prix first practice results Italian Grand Prix coverage details	`Nico Rosberg headed team-mate Lewis Hamilton as Mercedes dominated first practice at the Italian Grand Prix.`

Loss: MultipleNegativesSymmetricRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

compression-pairs

Dataset: compression-pairs at 605bc91
Size: 45,000 training samples
Columns: sentence1 and sentence2
Approximate statistics based on the first 1000 samples:
sentence1 sentence2
type string string
details
min: 10 tokens
mean: 31.78 tokens
max: 170 tokens

min: 5 tokens
mean: 10.14 tokens
max: 29 tokens

	sentence1	sentence2
type	string	string
details	min: 10 tokens mean: 31.78 tokens max: 170 tokens	min: 5 tokens mean: 10.14 tokens max: 29 tokens

Samples:

sentence1	sentence2
`The USHL completed an expansion draft on Monday as 10 players who were on the rosters of USHL teams during the 2009-10 season were selected by the League's two newest entries, the Muskegon Lumberjacks and Dubuque Fighting Saints.`	`USHL completes expansion draft`
`NRT LLC, one of the nation's largest residential real estate brokerage companies, announced several executive appointments within its Coldwell Banker Residential Brokerage operations in Southern California.`	`NRT announces executive appointments at its Coldwell Banker operations in Southern California`
`A new survey shows 30 percent of Californians use Twitter, and more and more of us are using our smart phones to go online.`	`Survey: 30 percent of Californians use Twitter`

Loss: MultipleNegativesSymmetricRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

Evaluation Datasets

nli-pairs

Dataset: nli-pairs at d482672
Size: 2,000 evaluation samples
Columns: sentence1 and sentence2
Approximate statistics based on the first 1000 samples:
sentence1 sentence2
type string string
details
min: 5 tokens
mean: 17.64 tokens
max: 63 tokens

min: 4 tokens
mean: 9.67 tokens
max: 29 tokens

	sentence1	sentence2
type	string	string
details	min: 5 tokens mean: 17.64 tokens max: 63 tokens	min: 4 tokens mean: 9.67 tokens max: 29 tokens

Samples:

sentence1	sentence2
`Two women are embracing while holding to go packages.`	`Two woman are holding packages.`
`Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.`	`Two kids in numbered jerseys wash their hands.`
`A man selling donuts to a customer during a world exhibition event held in the city of Angeles`	`A man selling donuts to a customer.`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

scitail-pairs-pos

Dataset: scitail-pairs-pos at 0cc4353
Size: 1,304 evaluation samples
Columns: sentence1, sentence2, and label
Approximate statistics based on the first 1000 samples:
sentence1 sentence2 label
type string string int
details
min: 5 tokens
mean: 22.52 tokens
max: 67 tokens

min: 8 tokens
mean: 15.34 tokens
max: 36 tokens

0: ~47.50%
1: ~52.50%

	sentence1	sentence2	label
type	string	string	int
details	min: 5 tokens mean: 22.52 tokens max: 67 tokens	min: 8 tokens mean: 15.34 tokens max: 36 tokens	0: ~47.50% 1: ~52.50%

Samples:

sentence1	sentence2	label
`An introduction to atoms and elements, compounds, atomic structure and bonding, the molecule and chemical reactions.`	`Replace another in a molecule happens to atoms during a substitution reaction.`	`0`
`Wavelength The distance between two consecutive points on a sinusoidal wave that are in phase;`	`Wavelength is the distance between two corresponding points of adjacent waves called.`	`1`
`humans normally have 23 pairs of chromosomes.`	`Humans typically have 23 pairs pairs of chromosomes.`	`1`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}

qnli-contrastive

Dataset: qnli-contrastive at bcdcba7
Size: 2,000 evaluation samples
Columns: sentence1, sentence2, and label
Approximate statistics based on the first 1000 samples:
sentence1 sentence2 label
type string string int
details
min: 6 tokens
mean: 14.13 tokens
max: 36 tokens

min: 4 tokens
mean: 36.58 tokens
max: 225 tokens

0: 100.00%

	sentence1	sentence2	label
type	string	string	int
details	min: 6 tokens mean: 14.13 tokens max: 36 tokens	min: 4 tokens mean: 36.58 tokens max: 225 tokens	0: 100.00%

Samples:

sentence1	sentence2	label
`What came into force after the new constitution was herald?`	`As of that day, the new constitution heralding the Second Republic came into force.`	`0`
`What is the first major city in the stream of the Rhine?`	`The most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz.`	`0`
`What is the minimum required if you want to teach in Canada?`	`In most provinces a second Bachelor's Degree such as a Bachelor of Education is required to become a qualified teacher.`	`0`

Loss: OnlineContrastiveLoss

sts-label

Dataset: sts-label at ab7a5ac
Size: 1,500 evaluation samples
Columns: sentence1, sentence2, and score
Approximate statistics based on the first 1000 samples:
sentence1 sentence2 score
type string string float
details
min: 5 tokens
mean: 14.77 tokens
max: 45 tokens

min: 6 tokens
mean: 14.74 tokens
max: 49 tokens

min: 0.0
mean: 0.47
max: 1.0

	sentence1	sentence2	score
type	string	string	float
details	min: 5 tokens mean: 14.77 tokens max: 45 tokens	min: 6 tokens mean: 14.74 tokens max: 49 tokens	min: 0.0 mean: 0.47 max: 1.0

Samples:

sentence1	sentence2	score
`A man with a hard hat is dancing.`	`A man wearing a hard hat is dancing.`	`1.0`
`A young child is riding a horse.`	`A child is riding a horse.`	`0.95`
`A man is feeding a mouse to a snake.`	`The man is feeding a mouse to the snake.`	`1.0`

Loss: AnglELoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "pairwise_angle_sim"
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 28
per_device_eval_batch_size: 16
learning_rate: 3e-06
weight_decay: 1e-10
num_train_epochs: 5
max_steps: 5000
lr_scheduler_type: cosine
warmup_ratio: 0.33
save_safetensors: False
fp16: True
hub_model_id: bobox/DeBERTaV3-small-ST-checkpoints-tmp
hub_strategy: checkpoint
batch_sampler: no_duplicates

All Hyperparameters

Click to expand

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 28
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 3e-06
weight_decay: 1e-10
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: 5000
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.33
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: False
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: True
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: bobox/DeBERTaV3-small-ST-checkpoints-tmp
hub_strategy: checkpoint
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	nli-pairs loss	sts-label loss	scitail-pairs-pos loss	qnli-contrastive loss
None	0	-	3.3906	6.4037	2.3949	2.6789
0.0723	250	3.2471	3.2669	6.3326	2.3286	2.6008
0.1445	500	3.051	3.0717	6.5578	2.0277	2.0795
0.2168	750	2.3717	2.8445	7.5564	1.5729	1.1601
0.2890	1000	1.5228	2.5520	8.3864	1.1221	0.7480
0.3613	1250	1.5747	2.1439	8.7993	0.9512	0.5071
0.4335	1500	1.2114	1.7986	9.0748	0.8195	0.3715
0.5058	1750	1.1832	1.5665	9.1778	0.6956	0.2920
0.5780	2000	0.9078	1.4173	9.3829	0.6840	0.2488
0.6503	2250	0.8436	1.3196	9.4585	0.6831	0.1584
0.7225	2500	0.8744	1.2192	9.5395	0.6232	0.1527
0.7948	2750	1.1809	1.1600	9.4297	0.5681	0.1369
0.8671	3000	0.7233	1.1149	9.4893	0.5523	0.1614
0.9393	3250	0.7862	1.0738	9.5408	0.5372	0.1291
1.0116	3500	1.0888	1.0328	9.5612	0.5286	0.1281
1.0838	3750	0.8116	1.0304	9.4794	0.5239	0.1144
1.1561	4000	1.0436	1.0215	9.4184	0.5278	0.0973
1.2283	4250	0.9298	1.0107	9.4322	0.5221	0.0970
1.3006	4500	0.682	1.0093	9.4643	0.5186	0.0951
1.3728	4750	0.9863	1.0080	9.4627	0.5176	0.0948
1.4451	5000	1.0022	1.0076	9.4645	0.5179	0.0945

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.0.1
Transformers: 4.41.2
PyTorch: 2.3.0+cu121
Accelerate: 0.31.0
Datasets: 2.20.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

AnglELoss

@misc{li2023angleoptimized,
    title={AnglE-optimized Text Embeddings}, 
    author={Xianming Li and Jing Li},
    year={2023},
    eprint={2309.12871},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

GISTEmbedLoss

@misc{solatorio2024gistembed,
    title={GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning}, 
    author={Aivin V. Solatorio},
    year={2024},
    eprint={2402.16829},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Downloads last month: 1

Model tree for bobox/DeBERTaV3-small-GeneralSentenceTransformer-v2

Base model

microsoft/deberta-v3-small

Finetuned

(166)

this model

Datasets used to train bobox/DeBERTaV3-small-GeneralSentenceTransformer-v2

Papers for bobox/DeBERTaV3-small-GeneralSentenceTransformer-v2

GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embedding Fine-tuning

Paper • 2402.16829 • Published Feb 26, 2024