---
base_model: BAAI/bge-small-en
datasets: []
language: []
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:1010
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
How does Prompt-RAG differ from traditional vector embedding-based
methodologies?
sentences:
- >-
Prompt-RAG differs from traditional vector embedding-based methodologies
by adopting a more direct and flexible retrieval process based on
natural language prompts, eliminating the need for a vector database or
an algorithm for indexing and selecting vectors.
- >-
By introducing a pre-aligned phrase prior to the standard SFT stage,
LLMs are guided to concentrate on the aligned knowledge, thereby
unlocking their internal alignment abilities and improving their
performance.
- >-
The accuracy of GPT 3.5 on 2500 overall TeleQnA questions related to
3GPP documents is 60.1, while the accuracy of GPT 3.5 + Telco-RAG is 6.9
points higher.
- source_sentence: >-
Explain the concept of in-context learning as described in the paper 'An
explanation of in-context learning as implicit Bayesian inference'.
sentences:
- >-
The main theme of the paper is that language models can learn to perform
many tasks in a zero-shot setting, without any explicit supervision.
- >-
In-context learning, as explained in the paper, is a process where a
language model uses the context provided in the input to make
predictions or generate outputs without explicit training on the
specific task. The paper argues that this process can be understood as
an implicit form of Bayesian inference.
- >-
The paper was presented in the 55th Annual Meeting of the Association
for Computational Linguistics.
- source_sentence: What is the purpose of the survey conducted by Huang et al. (2023)?
sentences:
- >-
The purpose of the survey conducted by Huang et al. (2023) is to provide
a comprehensive overview of hallucination in large language models,
including its principles, taxonomy, challenges, and open questions.
- >-
The study of Human and American Translation Learning contributes to
language development by understanding the cognitive processes involved
in translating between languages, which can lead to improved teaching
methods and translation technology.
- >-
Using profile data, triplet examples are constructed in the format of
(xᵢ, xᵢ⁻, xᵢ⁺). The anchor example xᵢ is constructed as the
combination of the content and the corresponding label.
- source_sentence: Who is the first author of the paper and what is their last name?
sentences:
- >-
The key findings are that Vul-RAG achieves the highest accuracy and
pairwise accuracy among all baselines, substantially outperforming the
best baseline LLMAO. It also achieves the best trade-off between recall
and precision.
- >-
The first author of the paper is Nandan Thakur. Their last name is
Thakur.
- >-
The paper was presented at the 2022 Conference on Empirical Methods in
Natural Language Processing (EMNLP).
- source_sentence: >-
Compare the top-5 retrieval accuracy of BM25 + MQ and SERM + BF for the NQ
Dataset and HotpotQA.
sentences:
- >-
For the NQ Dataset, SERM + BF has a top-5 retrieval accuracy of 88.22,
which is significantly higher than BM25 + MQ's accuracy of 25.19. For
HotpotQA, SERM + BF was not tested, but BM25 + MQ has a top-5 retrieval
accuracy of 49.52.
- >-
The paper was presented at the 17th Annual International ACM-SIGIR
Conference on Research and Development in Information Retrieval.
- >-
The proof for Equation 5 progresses from Equation 20 to Equation 22 by
applying the transformation motivated by Xie et al. [2021] and
introducing the term p(R, x1:i−1|z) to the equation.
model-index:
- name: SentenceTransformer based on BAAI/bge-small-en
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.01782178217821782
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.04356435643564356
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.06534653465346535
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.12475247524752475
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.01782178217821782
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.015841584158415842
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.016039603960396043
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.015841584158415842
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.00001839902956558168
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.00004498766525563503
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.00007262670252004521
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.00015079859335392304
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.016300874257683427
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.04234598459845988
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.0018766020656866668
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.01782178217821782
name: Dot Accuracy@1
- type: dot_accuracy@3
value: 0.04356435643564356
name: Dot Accuracy@3
- type: dot_accuracy@5
value: 0.06534653465346535
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 0.12475247524752475
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.01782178217821782
name: Dot Precision@1
- type: dot_precision@3
value: 0.015841584158415842
name: Dot Precision@3
- type: dot_precision@5
value: 0.016039603960396043
name: Dot Precision@5
- type: dot_precision@10
value: 0.015841584158415842
name: Dot Precision@10
- type: dot_recall@1
value: 0.00001839902956558168
name: Dot Recall@1
- type: dot_recall@3
value: 0.00004498766525563503
name: Dot Recall@3
- type: dot_recall@5
value: 0.00007262670252004521
name: Dot Recall@5
- type: dot_recall@10
value: 0.00015079859335392304
name: Dot Recall@10
- type: dot_ndcg@10
value: 0.016300874257683427
name: Dot Ndcg@10
- type: dot_mrr@10
value: 0.04234598459845988
name: Dot Mrr@10
- type: dot_map@100
value: 0.0018766020656866668
name: Dot Map@100
- type: cosine_accuracy@1
value: 0.019801980198019802
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.040594059405940595
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.06534653465346535
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.12673267326732673
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.019801980198019802
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.01485148514851485
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.014851485148514853
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.016831683168316833
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.000019670857914229207
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.00003554268094376118
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.0000667664165823309
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.0001670844654494185
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.01679069935920913
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.04252396668238257
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.002057887757857092
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.019801980198019802
name: Dot Accuracy@1
- type: dot_accuracy@3
value: 0.040594059405940595
name: Dot Accuracy@3
- type: dot_accuracy@5
value: 0.06534653465346535
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 0.12673267326732673
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.019801980198019802
name: Dot Precision@1
- type: dot_precision@3
value: 0.01485148514851485
name: Dot Precision@3
- type: dot_precision@5
value: 0.014851485148514853
name: Dot Precision@5
- type: dot_precision@10
value: 0.016831683168316833
name: Dot Precision@10
- type: dot_recall@1
value: 0.000019670857914229207
name: Dot Recall@1
- type: dot_recall@3
value: 0.00003554268094376118
name: Dot Recall@3
- type: dot_recall@5
value: 0.0000667664165823309
name: Dot Recall@5
- type: dot_recall@10
value: 0.0001670844654494185
name: Dot Recall@10
- type: dot_ndcg@10
value: 0.01679069935920913
name: Dot Ndcg@10
- type: dot_mrr@10
value: 0.04252396668238257
name: Dot Mrr@10
- type: dot_map@100
value: 0.002057887757857092
name: Dot Map@100
- type: cosine_accuracy@1
value: 0.01881188118811881
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.03762376237623762
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.06435643564356436
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.1306930693069307
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.01881188118811881
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.013861386138613862
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.015841584158415842
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.01722772277227723
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.000018836739119030395
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.00003852282962664283
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.00007907232140954174
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.00018073758516299118
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.01704492626324548
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.04188786735816444
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.002251865468050825
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.01881188118811881
name: Dot Accuracy@1
- type: dot_accuracy@3
value: 0.03762376237623762
name: Dot Accuracy@3
- type: dot_accuracy@5
value: 0.06435643564356436
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 0.1306930693069307
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.01881188118811881
name: Dot Precision@1
- type: dot_precision@3
value: 0.013861386138613862
name: Dot Precision@3
- type: dot_precision@5
value: 0.015841584158415842
name: Dot Precision@5
- type: dot_precision@10
value: 0.01722772277227723
name: Dot Precision@10
- type: dot_recall@1
value: 0.000018836739119030395
name: Dot Recall@1
- type: dot_recall@3
value: 0.00003852282962664283
name: Dot Recall@3
- type: dot_recall@5
value: 0.00007907232140954174
name: Dot Recall@5
- type: dot_recall@10
value: 0.00018073758516299118
name: Dot Recall@10
- type: dot_ndcg@10
value: 0.01704492626324548
name: Dot Ndcg@10
- type: dot_mrr@10
value: 0.04188786735816444
name: Dot Mrr@10
- type: dot_map@100
value: 0.002251865468050825
name: Dot Map@100
- type: cosine_accuracy@1
value: 0.01881188118811881
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.03663366336633663
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.06435643564356436
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.1306930693069307
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.01881188118811881
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.013531353135313529
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.015643564356435644
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.01722772277227723
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.000018836739119030395
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.00003715905688573237
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.00007929088142504806
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.0001757722267344924
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.01701867523723249
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.0418477919220494
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.0022453604762727357
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.01881188118811881
name: Dot Accuracy@1
- type: dot_accuracy@3
value: 0.03663366336633663
name: Dot Accuracy@3
- type: dot_accuracy@5
value: 0.06435643564356436
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 0.1306930693069307
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.01881188118811881
name: Dot Precision@1
- type: dot_precision@3
value: 0.013531353135313529
name: Dot Precision@3
- type: dot_precision@5
value: 0.015643564356435644
name: Dot Precision@5
- type: dot_precision@10
value: 0.01722772277227723
name: Dot Precision@10
- type: dot_recall@1
value: 0.000018836739119030395
name: Dot Recall@1
- type: dot_recall@3
value: 0.00003715905688573237
name: Dot Recall@3
- type: dot_recall@5
value: 0.00007929088142504806
name: Dot Recall@5
- type: dot_recall@10
value: 0.0001757722267344924
name: Dot Recall@10
- type: dot_ndcg@10
value: 0.01701867523723249
name: Dot Ndcg@10
- type: dot_mrr@10
value: 0.0418477919220494
name: Dot Mrr@10
- type: dot_map@100
value: 0.0022453604762727357
name: Dot Map@100
---
SentenceTransformer based on BAAI/bge-small-en
This is a sentence-transformers model finetuned from BAAI/bge-small-en. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
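Because the final Normalize() module L2-normalizes every embedding, dot-product and cosine scores from this model coincide, which is why the dot_* and cosine_* metrics reported below are identical. A minimal NumPy sketch of that equivalence (random vectors standing in for real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 384))                     # stand-ins for pooled embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # what the Normalize() module does

dot = emb @ emb.T                                   # dot-product similarity matrix
norms = np.linalg.norm(emb, axis=1)
cos = dot / np.outer(norms, norms)                  # explicit cosine similarity

assert np.allclose(dot, cos)                        # identical once vectors are unit-length
```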
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("Areeb-02/bge-small-en-MultiplrRankingLoss-30-Rag-paper-dataset")
# Run inference
sentences = [
    'Compare the top-5 retrieval accuracy of BM25 + MQ and SERM + BF for the NQ Dataset and HotpotQA.',
    "For the NQ Dataset, SERM + BF has a top-5 retrieval accuracy of 88.22, which is significantly higher than BM25 + MQ's accuracy of 25.19. For HotpotQA, SERM + BF was not tested, but BM25 + MQ has a top-5 retrieval accuracy of 49.52.",
    'The proof for Equation 5 progresses from Equation 20 to Equation 22 by applying the transformation motivated by Xie et al. [2021] and introducing the term p(R, x1:i−1|z) to the equation.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Evaluation
Metrics
Information Retrieval
| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.0178 |
| cosine_accuracy@3   | 0.0436 |
| cosine_accuracy@5   | 0.0653 |
| cosine_accuracy@10  | 0.1248 |
| cosine_precision@1  | 0.0178 |
| cosine_precision@3  | 0.0158 |
| cosine_precision@5  | 0.016  |
| cosine_precision@10 | 0.0158 |
| cosine_recall@1     | 0.0    |
| cosine_recall@3     | 0.0    |
| cosine_recall@5     | 0.0001 |
| cosine_recall@10    | 0.0002 |
| cosine_ndcg@10      | 0.0163 |
| cosine_mrr@10       | 0.0423 |
| cosine_map@100      | 0.0019 |
| dot_accuracy@1      | 0.0178 |
| dot_accuracy@3      | 0.0436 |
| dot_accuracy@5      | 0.0653 |
| dot_accuracy@10     | 0.1248 |
| dot_precision@1     | 0.0178 |
| dot_precision@3     | 0.0158 |
| dot_precision@5     | 0.016  |
| dot_precision@10    | 0.0158 |
| dot_recall@1        | 0.0    |
| dot_recall@3        | 0.0    |
| dot_recall@5        | 0.0001 |
| dot_recall@10       | 0.0002 |
| dot_ndcg@10         | 0.0163 |
| dot_mrr@10          | 0.0423 |
| dot_map@100         | 0.0019 |
Information Retrieval
| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.0198 |
| cosine_accuracy@3   | 0.0406 |
| cosine_accuracy@5   | 0.0653 |
| cosine_accuracy@10  | 0.1267 |
| cosine_precision@1  | 0.0198 |
| cosine_precision@3  | 0.0149 |
| cosine_precision@5  | 0.0149 |
| cosine_precision@10 | 0.0168 |
| cosine_recall@1     | 0.0    |
| cosine_recall@3     | 0.0    |
| cosine_recall@5     | 0.0001 |
| cosine_recall@10    | 0.0002 |
| cosine_ndcg@10      | 0.0168 |
| cosine_mrr@10       | 0.0425 |
| cosine_map@100      | 0.0021 |
| dot_accuracy@1      | 0.0198 |
| dot_accuracy@3      | 0.0406 |
| dot_accuracy@5      | 0.0653 |
| dot_accuracy@10     | 0.1267 |
| dot_precision@1     | 0.0198 |
| dot_precision@3     | 0.0149 |
| dot_precision@5     | 0.0149 |
| dot_precision@10    | 0.0168 |
| dot_recall@1        | 0.0    |
| dot_recall@3        | 0.0    |
| dot_recall@5        | 0.0001 |
| dot_recall@10       | 0.0002 |
| dot_ndcg@10         | 0.0168 |
| dot_mrr@10          | 0.0425 |
| dot_map@100         | 0.0021 |
Information Retrieval
| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.0188 |
| cosine_accuracy@3   | 0.0376 |
| cosine_accuracy@5   | 0.0644 |
| cosine_accuracy@10  | 0.1307 |
| cosine_precision@1  | 0.0188 |
| cosine_precision@3  | 0.0139 |
| cosine_precision@5  | 0.0158 |
| cosine_precision@10 | 0.0172 |
| cosine_recall@1     | 0.0    |
| cosine_recall@3     | 0.0    |
| cosine_recall@5     | 0.0001 |
| cosine_recall@10    | 0.0002 |
| cosine_ndcg@10      | 0.017  |
| cosine_mrr@10       | 0.0419 |
| cosine_map@100      | 0.0023 |
| dot_accuracy@1      | 0.0188 |
| dot_accuracy@3      | 0.0376 |
| dot_accuracy@5      | 0.0644 |
| dot_accuracy@10     | 0.1307 |
| dot_precision@1     | 0.0188 |
| dot_precision@3     | 0.0139 |
| dot_precision@5     | 0.0158 |
| dot_precision@10    | 0.0172 |
| dot_recall@1        | 0.0    |
| dot_recall@3        | 0.0    |
| dot_recall@5        | 0.0001 |
| dot_recall@10       | 0.0002 |
| dot_ndcg@10         | 0.017  |
| dot_mrr@10          | 0.0419 |
| dot_map@100         | 0.0023 |
Information Retrieval
| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.0188 |
| cosine_accuracy@3   | 0.0366 |
| cosine_accuracy@5   | 0.0644 |
| cosine_accuracy@10  | 0.1307 |
| cosine_precision@1  | 0.0188 |
| cosine_precision@3  | 0.0135 |
| cosine_precision@5  | 0.0156 |
| cosine_precision@10 | 0.0172 |
| cosine_recall@1     | 0.0    |
| cosine_recall@3     | 0.0    |
| cosine_recall@5     | 0.0001 |
| cosine_recall@10    | 0.0002 |
| cosine_ndcg@10      | 0.017  |
| cosine_mrr@10       | 0.0418 |
| cosine_map@100      | 0.0022 |
| dot_accuracy@1      | 0.0188 |
| dot_accuracy@3      | 0.0366 |
| dot_accuracy@5      | 0.0644 |
| dot_accuracy@10     | 0.1307 |
| dot_precision@1     | 0.0188 |
| dot_precision@3     | 0.0135 |
| dot_precision@5     | 0.0156 |
| dot_precision@10    | 0.0172 |
| dot_recall@1        | 0.0    |
| dot_recall@3        | 0.0    |
| dot_recall@5        | 0.0001 |
| dot_recall@10       | 0.0002 |
| dot_ndcg@10         | 0.017  |
| dot_mrr@10          | 0.0418 |
| dot_map@100         | 0.0022 |
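For reference, the @k metrics in these tables follow the standard information-retrieval definitions: accuracy@k is the fraction of queries with at least one relevant document in the top k, precision@k the fraction of the top k that is relevant, recall@k the fraction of a query's relevant documents retrieved in the top k, and MRR@k the mean reciprocal rank of the first relevant hit. A small illustrative sketch (hypothetical ranked lists, not this model's output):

```python
def ir_metrics_at_k(ranked, relevant, k):
    """Compute mean accuracy@k, precision@k, recall@k, and mrr@k over queries.

    ranked:   per-query list of doc ids, best first
    relevant: per-query set of relevant doc ids
    """
    n = len(ranked)
    acc = prec = rec = mrr = 0.0
    for docs, rel in zip(ranked, relevant):
        top = docs[:k]
        hits = [d for d in top if d in rel]
        acc += 1.0 if hits else 0.0
        prec += len(hits) / k
        rec += len(hits) / len(rel)
        for rank, d in enumerate(top, start=1):
            if d in rel:
                mrr += 1.0 / rank
                break
    return acc / n, prec / n, rec / n, mrr / n

# Two hypothetical queries, one relevant doc each; only the first query has a hit (at rank 2)
ranked = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
relevant = [{"d1"}, {"d8"}]
print(ir_metrics_at_k(ranked, relevant, k=3))
# accuracy@3 = 0.5, precision@3 ≈ 0.167, recall@3 = 0.5, mrr@3 = 0.25
```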
Training Details
Training Dataset
Unnamed Dataset
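The model was trained with MultipleNegativesRankingLoss, which scores each anchor against every positive in the batch and applies cross-entropy so that each anchor ranks its own positive above the other in-batch examples. A minimal NumPy sketch of the loss on hypothetical pre-computed embeddings (the scale of 20 matches the library's default for cosine similarity):

```python
import numpy as np

def mnrl(anchors, positives, scale=20.0):
    """Multiple-negatives ranking loss on L2-normalized embeddings.

    Row i of the score matrix compares anchor i with every positive in
    the batch; the diagonal holds the true (anchor, positive) pair.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (batch, batch) scaled cosine scores
    # cross-entropy with target label i for row i (the diagonal)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(16, 384))                     # hypothetical question embeddings
positives = anchors + 0.1 * rng.normal(size=(16, 384))   # matching answers, slightly perturbed
print(mnrl(anchors, positives))  # near zero: each anchor is closest to its own positive
```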
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 10
- warmup_ratio: 0.1
- fp16: True
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters: 
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch  | Step | Training Loss | cosine_map@100 |
|:-------|:-----|:--------------|:---------------|
| 0      | 0    | -             | 0.0018 |
| 1.5625 | 100  | -             | 0.0019 |
| 3.0    | 192  | -             | 0.0020 |
| 1.5625 | 100  | -             | 0.0021 |
| 3.125  | 200  | -             | 0.0020 |
| 4.6875 | 300  | -             | 0.0021 |
| 5.0    | 320  | -             | 0.0020 |
| 1.5625 | 100  | -             | 0.0020 |
| 3.125  | 200  | -             | 0.0021 |
| 4.6875 | 300  | -             | 0.0022 |
| 1.5625 | 100  | -             | 0.0021 |
| 3.125  | 200  | -             | 0.0019 |
| 4.6875 | 300  | -             | 0.0022 |
| 6.25   | 400  | -             | 0.0022 |
| 7.8125 | 500  | 0.0021        | 0.0022 |
| 9.375  | 600  | -             | 0.0023 |
| 10.0   | 640  | -             | 0.0022 |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.3
- PyTorch: 2.3.0+cu121
- Accelerate: 0.32.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}