---
base_model: BAAI/bge-small-en
datasets: []
language: []
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
- dot_accuracy@1
- dot_accuracy@3
- dot_accuracy@5
- dot_accuracy@10
- dot_precision@1
- dot_precision@3
- dot_precision@5
- dot_precision@10
- dot_recall@1
- dot_recall@3
- dot_recall@5
- dot_recall@10
- dot_ndcg@10
- dot_mrr@10
- dot_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:1010
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: >-
How does Prompt-RAG differ from traditional vector embedding-based
methodologies?
sentences:
- >-
Prompt-RAG differs from traditional vector embedding-based methodologies
by adopting a more direct and flexible retrieval process based on
natural language prompts, eliminating the need for a vector database or
an algorithm for indexing and selecting vectors.
- >-
By introducing a pre-aligned phrase prior to the standard SFT stage,
LLMs are guided to concentrate on the aligned knowledge, thereby
unlocking their internal alignment abilities and improving their
performance.
- >-
The accuracy of GPT 3.5 on 2500 overall TeleQnA questions related to
3GPP documents is 60.1, while the accuracy of GPT 3.5 + Telco-RAG is 6.9
points higher.
- source_sentence: >-
Explain the concept of in-context learning as described in the paper 'An
explanation of in-context learning as implicit Bayesian inference'.
sentences:
- >-
The main theme of the paper is that language models can learn to perform
many tasks in a zero-shot setting, without any explicit supervision.
- >-
In-context learning, as explained in the paper, is a process where a
language model uses the context provided in the input to make
predictions or generate outputs without explicit training on the
specific task. The paper argues that this process can be understood as
an implicit form of Bayesian inference.
- >-
The paper was presented in the 55th Annual Meeting of the Association
for Computational Linguistics.
- source_sentence: What is the purpose of the survey conducted by Huang et al. (2023)?
sentences:
- >-
The purpose of the survey conducted by Huang et al. (2023) is to provide
a comprehensive overview of hallucination in large language models,
including its principles, taxonomy, challenges, and open questions.
- >-
The study of Human and American Translation Learning contributes to
language development by understanding the cognitive processes involved
in translating between languages, which can lead to improved teaching
methods and translation technology.
- >-
Using profile data, triplet examples are constructed in the format of
(xᵢ, xᵢ⁻, xᵢ⁺). The anchor example xᵢ is constructed as the
combination of the content and the corresponding label.
- source_sentence: Who is the first author of the paper and what is their last name?
sentences:
- >-
The key findings are that Vul-RAG achieves the highest accuracy and
pairwise accuracy among all baselines, substantially outperforming the
best baseline LLMAO. It also achieves the best trade-off between recall
and precision.
- >-
The first author of the paper is Nandan Thakur. Their last name is
Thakur.
- >-
The paper was presented at the 2022 Conference on Empirical Methods in
Natural Language Processing (EMNLP).
- source_sentence: >-
Compare the top-5 retrieval accuracy of BM25 + MQ and SERM + BF for the NQ
Dataset and HotpotQA.
sentences:
- >-
For the NQ Dataset, SERM + BF has a top-5 retrieval accuracy of 88.22,
which is significantly higher than BM25 + MQ's accuracy of 25.19. For
HotpotQA, SERM + BF was not tested, but BM25 + MQ has a top-5 retrieval
accuracy of 49.52.
- >-
The paper was presented at the 17th Annual International ACM-SIGIR
Conference on Research and Development in Information Retrieval.
- >-
The proof for Equation 5 progresses from Equation 20 to Equation 22 by
applying the transformation motivated by Xie et al. [2021] and
introducing the term p(R, x1:i−1|z) to the equation.
model-index:
- name: SentenceTransformer based on BAAI/bge-small-en
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.01782178217821782
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.04356435643564356
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.06534653465346535
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.12475247524752475
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.01782178217821782
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.015841584158415842
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.016039603960396043
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.015841584158415842
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.00001839902956558168
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.00004498766525563503
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.00007262670252004521
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.00015079859335392304
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.016300874257683427
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.04234598459845988
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.0018766020656866668
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.01782178217821782
name: Dot Accuracy@1
- type: dot_accuracy@3
value: 0.04356435643564356
name: Dot Accuracy@3
- type: dot_accuracy@5
value: 0.06534653465346535
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 0.12475247524752475
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.01782178217821782
name: Dot Precision@1
- type: dot_precision@3
value: 0.015841584158415842
name: Dot Precision@3
- type: dot_precision@5
value: 0.016039603960396043
name: Dot Precision@5
- type: dot_precision@10
value: 0.015841584158415842
name: Dot Precision@10
- type: dot_recall@1
value: 0.00001839902956558168
name: Dot Recall@1
- type: dot_recall@3
value: 0.00004498766525563503
name: Dot Recall@3
- type: dot_recall@5
value: 0.00007262670252004521
name: Dot Recall@5
- type: dot_recall@10
value: 0.00015079859335392304
name: Dot Recall@10
- type: dot_ndcg@10
value: 0.016300874257683427
name: Dot Ndcg@10
- type: dot_mrr@10
value: 0.04234598459845988
name: Dot Mrr@10
- type: dot_map@100
value: 0.0018766020656866668
name: Dot Map@100
- type: cosine_accuracy@1
value: 0.019801980198019802
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.040594059405940595
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.06534653465346535
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.12673267326732673
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.019801980198019802
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.01485148514851485
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.014851485148514853
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.016831683168316833
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.000019670857914229207
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.00003554268094376118
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.0000667664165823309
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.0001670844654494185
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.01679069935920913
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.04252396668238257
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.002057887757857092
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.019801980198019802
name: Dot Accuracy@1
- type: dot_accuracy@3
value: 0.040594059405940595
name: Dot Accuracy@3
- type: dot_accuracy@5
value: 0.06534653465346535
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 0.12673267326732673
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.019801980198019802
name: Dot Precision@1
- type: dot_precision@3
value: 0.01485148514851485
name: Dot Precision@3
- type: dot_precision@5
value: 0.014851485148514853
name: Dot Precision@5
- type: dot_precision@10
value: 0.016831683168316833
name: Dot Precision@10
- type: dot_recall@1
value: 0.000019670857914229207
name: Dot Recall@1
- type: dot_recall@3
value: 0.00003554268094376118
name: Dot Recall@3
- type: dot_recall@5
value: 0.0000667664165823309
name: Dot Recall@5
- type: dot_recall@10
value: 0.0001670844654494185
name: Dot Recall@10
- type: dot_ndcg@10
value: 0.01679069935920913
name: Dot Ndcg@10
- type: dot_mrr@10
value: 0.04252396668238257
name: Dot Mrr@10
- type: dot_map@100
value: 0.002057887757857092
name: Dot Map@100
- type: cosine_accuracy@1
value: 0.01881188118811881
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.03762376237623762
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.06435643564356436
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.1306930693069307
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.01881188118811881
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.013861386138613862
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.015841584158415842
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.01722772277227723
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.000018836739119030395
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.00003852282962664283
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.00007907232140954174
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.00018073758516299118
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.01704492626324548
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.04188786735816444
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.002251865468050825
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.01881188118811881
name: Dot Accuracy@1
- type: dot_accuracy@3
value: 0.03762376237623762
name: Dot Accuracy@3
- type: dot_accuracy@5
value: 0.06435643564356436
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 0.1306930693069307
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.01881188118811881
name: Dot Precision@1
- type: dot_precision@3
value: 0.013861386138613862
name: Dot Precision@3
- type: dot_precision@5
value: 0.015841584158415842
name: Dot Precision@5
- type: dot_precision@10
value: 0.01722772277227723
name: Dot Precision@10
- type: dot_recall@1
value: 0.000018836739119030395
name: Dot Recall@1
- type: dot_recall@3
value: 0.00003852282962664283
name: Dot Recall@3
- type: dot_recall@5
value: 0.00007907232140954174
name: Dot Recall@5
- type: dot_recall@10
value: 0.00018073758516299118
name: Dot Recall@10
- type: dot_ndcg@10
value: 0.01704492626324548
name: Dot Ndcg@10
- type: dot_mrr@10
value: 0.04188786735816444
name: Dot Mrr@10
- type: dot_map@100
value: 0.002251865468050825
name: Dot Map@100
- type: cosine_accuracy@1
value: 0.01881188118811881
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 0.03663366336633663
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 0.06435643564356436
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 0.1306930693069307
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.01881188118811881
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.013531353135313529
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.015643564356435644
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.01722772277227723
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.000018836739119030395
name: Cosine Recall@1
- type: cosine_recall@3
value: 0.00003715905688573237
name: Cosine Recall@3
- type: cosine_recall@5
value: 0.00007929088142504806
name: Cosine Recall@5
- type: cosine_recall@10
value: 0.0001757722267344924
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.01701867523723249
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.0418477919220494
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.0022453604762727357
name: Cosine Map@100
- type: dot_accuracy@1
value: 0.01881188118811881
name: Dot Accuracy@1
- type: dot_accuracy@3
value: 0.03663366336633663
name: Dot Accuracy@3
- type: dot_accuracy@5
value: 0.06435643564356436
name: Dot Accuracy@5
- type: dot_accuracy@10
value: 0.1306930693069307
name: Dot Accuracy@10
- type: dot_precision@1
value: 0.01881188118811881
name: Dot Precision@1
- type: dot_precision@3
value: 0.013531353135313529
name: Dot Precision@3
- type: dot_precision@5
value: 0.015643564356435644
name: Dot Precision@5
- type: dot_precision@10
value: 0.01722772277227723
name: Dot Precision@10
- type: dot_recall@1
value: 0.000018836739119030395
name: Dot Recall@1
- type: dot_recall@3
value: 0.00003715905688573237
name: Dot Recall@3
- type: dot_recall@5
value: 0.00007929088142504806
name: Dot Recall@5
- type: dot_recall@10
value: 0.0001757722267344924
name: Dot Recall@10
- type: dot_ndcg@10
value: 0.01701867523723249
name: Dot Ndcg@10
- type: dot_mrr@10
value: 0.0418477919220494
name: Dot Mrr@10
- type: dot_map@100
value: 0.0022453604762727357
name: Dot Map@100
---
SentenceTransformer based on BAAI/bge-small-en
This is a sentence-transformers model finetuned from BAAI/bge-small-en. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-small-en
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
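Because the final Normalize() module L2-normalizes every embedding, dot-product and cosine scores from this model coincide, which is why the dot_* and cosine_* metrics reported below are identical. A minimal NumPy sketch of that equivalence (random vectors standing in for real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 384))                     # stand-ins for pooled embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)   # what the Normalize() module does

dot = emb @ emb.T                                   # dot-product similarity matrix
norms = np.linalg.norm(emb, axis=1)
cos = dot / np.outer(norms, norms)                  # explicit cosine similarity

assert np.allclose(dot, cos)                        # identical once vectors are unit-length
```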
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer

# Download the model from the Hugging Face Hub
model = SentenceTransformer("Areeb-02/bge-small-en-MultiplrRankingLoss-30-Rag-paper-dataset")
# Run inference
sentences = [
    'Compare the top-5 retrieval accuracy of BM25 + MQ and SERM + BF for the NQ Dataset and HotpotQA.',
    "For the NQ Dataset, SERM + BF has a top-5 retrieval accuracy of 88.22, which is significantly higher than BM25 + MQ's accuracy of 25.19. For HotpotQA, SERM + BF was not tested, but BM25 + MQ has a top-5 retrieval accuracy of 49.52.",
    'The proof for Equation 5 progresses from Equation 20 to Equation 22 by applying the transformation motivated by Xie et al. [2021] and introducing the term p(R, x1:i−1|z) to the equation.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Evaluation
Metrics
Information Retrieval
| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.0178 |
| cosine_accuracy@3   | 0.0436 |
| cosine_accuracy@5   | 0.0653 |
| cosine_accuracy@10  | 0.1248 |
| cosine_precision@1  | 0.0178 |
| cosine_precision@3  | 0.0158 |
| cosine_precision@5  | 0.016  |
| cosine_precision@10 | 0.0158 |
| cosine_recall@1     | 0.0    |
| cosine_recall@3     | 0.0    |
| cosine_recall@5     | 0.0001 |
| cosine_recall@10    | 0.0002 |
| cosine_ndcg@10      | 0.0163 |
| cosine_mrr@10       | 0.0423 |
| cosine_map@100      | 0.0019 |
| dot_accuracy@1      | 0.0178 |
| dot_accuracy@3      | 0.0436 |
| dot_accuracy@5      | 0.0653 |
| dot_accuracy@10     | 0.1248 |
| dot_precision@1     | 0.0178 |
| dot_precision@3     | 0.0158 |
| dot_precision@5     | 0.016  |
| dot_precision@10    | 0.0158 |
| dot_recall@1        | 0.0    |
| dot_recall@3        | 0.0    |
| dot_recall@5        | 0.0001 |
| dot_recall@10       | 0.0002 |
| dot_ndcg@10         | 0.0163 |
| dot_mrr@10          | 0.0423 |
| dot_map@100         | 0.0019 |
Information Retrieval
| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.0198 |
| cosine_accuracy@3   | 0.0406 |
| cosine_accuracy@5   | 0.0653 |
| cosine_accuracy@10  | 0.1267 |
| cosine_precision@1  | 0.0198 |
| cosine_precision@3  | 0.0149 |
| cosine_precision@5  | 0.0149 |
| cosine_precision@10 | 0.0168 |
| cosine_recall@1     | 0.0    |
| cosine_recall@3     | 0.0    |
| cosine_recall@5     | 0.0001 |
| cosine_recall@10    | 0.0002 |
| cosine_ndcg@10      | 0.0168 |
| cosine_mrr@10       | 0.0425 |
| cosine_map@100      | 0.0021 |
| dot_accuracy@1      | 0.0198 |
| dot_accuracy@3      | 0.0406 |
| dot_accuracy@5      | 0.0653 |
| dot_accuracy@10     | 0.1267 |
| dot_precision@1     | 0.0198 |
| dot_precision@3     | 0.0149 |
| dot_precision@5     | 0.0149 |
| dot_precision@10    | 0.0168 |
| dot_recall@1        | 0.0    |
| dot_recall@3        | 0.0    |
| dot_recall@5        | 0.0001 |
| dot_recall@10       | 0.0002 |
| dot_ndcg@10         | 0.0168 |
| dot_mrr@10          | 0.0425 |
| dot_map@100         | 0.0021 |
Information Retrieval
| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.0188 |
| cosine_accuracy@3   | 0.0376 |
| cosine_accuracy@5   | 0.0644 |
| cosine_accuracy@10  | 0.1307 |
| cosine_precision@1  | 0.0188 |
| cosine_precision@3  | 0.0139 |
| cosine_precision@5  | 0.0158 |
| cosine_precision@10 | 0.0172 |
| cosine_recall@1     | 0.0    |
| cosine_recall@3     | 0.0    |
| cosine_recall@5     | 0.0001 |
| cosine_recall@10    | 0.0002 |
| cosine_ndcg@10      | 0.017  |
| cosine_mrr@10       | 0.0419 |
| cosine_map@100      | 0.0023 |
| dot_accuracy@1      | 0.0188 |
| dot_accuracy@3      | 0.0376 |
| dot_accuracy@5      | 0.0644 |
| dot_accuracy@10     | 0.1307 |
| dot_precision@1     | 0.0188 |
| dot_precision@3     | 0.0139 |
| dot_precision@5     | 0.0158 |
| dot_precision@10    | 0.0172 |
| dot_recall@1        | 0.0    |
| dot_recall@3        | 0.0    |
| dot_recall@5        | 0.0001 |
| dot_recall@10       | 0.0002 |
| dot_ndcg@10         | 0.017  |
| dot_mrr@10          | 0.0419 |
| dot_map@100         | 0.0023 |
Information Retrieval
| Metric              | Value  |
|:--------------------|:-------|
| cosine_accuracy@1   | 0.0188 |
| cosine_accuracy@3   | 0.0366 |
| cosine_accuracy@5   | 0.0644 |
| cosine_accuracy@10  | 0.1307 |
| cosine_precision@1  | 0.0188 |
| cosine_precision@3  | 0.0135 |
| cosine_precision@5  | 0.0156 |
| cosine_precision@10 | 0.0172 |
| cosine_recall@1     | 0.0    |
| cosine_recall@3     | 0.0    |
| cosine_recall@5     | 0.0001 |
| cosine_recall@10    | 0.0002 |
| cosine_ndcg@10      | 0.017  |
| cosine_mrr@10       | 0.0418 |
| cosine_map@100      | 0.0022 |
| dot_accuracy@1      | 0.0188 |
| dot_accuracy@3      | 0.0366 |
| dot_accuracy@5      | 0.0644 |
| dot_accuracy@10     | 0.1307 |
| dot_precision@1     | 0.0188 |
| dot_precision@3     | 0.0135 |
| dot_precision@5     | 0.0156 |
| dot_precision@10    | 0.0172 |
| dot_recall@1        | 0.0    |
| dot_recall@3        | 0.0    |
| dot_recall@5        | 0.0001 |
| dot_recall@10       | 0.0002 |
| dot_ndcg@10         | 0.017  |
| dot_mrr@10          | 0.0418 |
| dot_map@100         | 0.0022 |
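For reference, the @k metrics in these tables follow the standard information-retrieval definitions: accuracy@k is the fraction of queries with at least one relevant document in the top k, precision@k the fraction of the top k that is relevant, recall@k the fraction of a query's relevant documents retrieved in the top k, and MRR@k the mean reciprocal rank of the first relevant hit. A small illustrative sketch (hypothetical ranked lists, not this model's output):

```python
def ir_metrics_at_k(ranked, relevant, k):
    """Compute mean accuracy@k, precision@k, recall@k, and mrr@k over queries.

    ranked:   per-query list of doc ids, best first
    relevant: per-query set of relevant doc ids
    """
    n = len(ranked)
    acc = prec = rec = mrr = 0.0
    for docs, rel in zip(ranked, relevant):
        top = docs[:k]
        hits = [d for d in top if d in rel]
        acc += 1.0 if hits else 0.0
        prec += len(hits) / k
        rec += len(hits) / len(rel)
        for rank, d in enumerate(top, start=1):
            if d in rel:
                mrr += 1.0 / rank
                break
    return acc / n, prec / n, rec / n, mrr / n

# Two hypothetical queries, one relevant doc each; only the first query has a hit (at rank 2)
ranked = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
relevant = [{"d1"}, {"d8"}]
print(ir_metrics_at_k(ranked, relevant, k=3))
# accuracy@3 = 0.5, precision@3 ≈ 0.167, recall@3 = 0.5, mrr@3 = 0.25
```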
Training Details
Training Dataset
Unnamed Dataset
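The model was trained with MultipleNegativesRankingLoss, which scores each anchor against every positive in the batch and applies cross-entropy so that each anchor ranks its own positive above the other in-batch examples. A minimal NumPy sketch of the loss on hypothetical pre-computed embeddings (the scale of 20 matches the library's default for cosine similarity):

```python
import numpy as np

def mnrl(anchors, positives, scale=20.0):
    """Multiple-negatives ranking loss on L2-normalized embeddings.

    Row i of the score matrix compares anchor i with every positive in
    the batch; the diagonal holds the true (anchor, positive) pair.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)  # (batch, batch) scaled cosine scores
    # cross-entropy with target label i for row i (the diagonal)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(16, 384))                     # hypothetical question embeddings
positives = anchors + 0.1 * rng.normal(size=(16, 384))   # matching answers, slightly perturbed
print(mnrl(anchors, positives))  # near zero: each anchor is closest to its own positive
```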
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- num_train_epochs: 10
- warmup_ratio: 0.1
- fp16: True
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 10
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters: 
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
| Epoch  | Step | Training Loss | cosine_map@100 |
|:-------|:-----|:--------------|:---------------|
| 0      | 0    | -             | 0.0018 |
| 1.5625 | 100  | -             | 0.0019 |
| 3.0    | 192  | -             | 0.0020 |
| 1.5625 | 100  | -             | 0.0021 |
| 3.125  | 200  | -             | 0.0020 |
| 4.6875 | 300  | -             | 0.0021 |
| 5.0    | 320  | -             | 0.0020 |
| 1.5625 | 100  | -             | 0.0020 |
| 3.125  | 200  | -             | 0.0021 |
| 4.6875 | 300  | -             | 0.0022 |
| 1.5625 | 100  | -             | 0.0021 |
| 3.125  | 200  | -             | 0.0019 |
| 4.6875 | 300  | -             | 0.0022 |
| 6.25   | 400  | -             | 0.0022 |
| 7.8125 | 500  | 0.0021        | 0.0022 |
| 9.375  | 600  | -             | 0.0023 |
| 10.0   | 640  | -             | 0.0022 |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.3
- PyTorch: 2.3.0+cu121
- Accelerate: 0.32.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}