metadata
base_model: BAAI/bge-small-en
datasets: []
language: []
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
  - dot_accuracy@1
  - dot_accuracy@3
  - dot_accuracy@5
  - dot_accuracy@10
  - dot_precision@1
  - dot_precision@3
  - dot_precision@5
  - dot_precision@10
  - dot_recall@1
  - dot_recall@3
  - dot_recall@5
  - dot_recall@10
  - dot_ndcg@10
  - dot_mrr@10
  - dot_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:1010
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      How does Prompt-RAG differ from traditional vector embedding-based
      methodologies?
    sentences:
      - >-
        Prompt-RAG differs from traditional vector embedding-based methodologies
        by adopting a more direct and flexible retrieval process based on
        natural language prompts, eliminating the need for a vector database or
        an algorithm for indexing and selecting vectors.
      - >-
        By introducing a pre-aligned phrase prior to the standard SFT stage,
        LLMs are guided to concentrate on the aligned knowledge, thereby
        unlocking their internal alignment abilities and improving their
        performance.
      - >-
        The accuracy of GPT 3.5 on 2500 overall TeleQnA questions related to
        3GPP documents is 60.1, while the accuracy of GPT 3.5 + Telco-RAG is 6.9
        points higher.
  - source_sentence: >-
      Explain the concept of in-context learning as described in the paper 'An
      explanation of in-context learning as implicit Bayesian inference'.
    sentences:
      - >-
        The main theme of the paper is that language models can learn to perform
        many tasks in a zero-shot setting, without any explicit supervision.
      - >-
        In-context learning, as explained in the paper, is a process where a
        language model uses the context provided in the input to make
        predictions or generate outputs without explicit training on the
        specific task. The paper argues that this process can be understood as
        an implicit form of Bayesian inference.
      - >-
        The paper was presented in the 55th Annual Meeting of the Association
        for Computational Linguistics.
  - source_sentence: What is the purpose of the survey conducted by Huang et al. (2023)?
    sentences:
      - >-
        The purpose of the survey conducted by Huang et al. (2023) is to provide
        a comprehensive overview of hallucination in large language models,
        including its principles, taxonomy, challenges, and open questions.
      - >-
        The study of Human and American Translation Learning contributes to
        language development by understanding the cognitive processes involved
        in translating between languages, which can lead to improved teaching
        methods and translation technology.
      - >-
        Using profile data, triplet examples are constructed in the format of
        (π‘₯𝑖, π‘₯ π‘–βˆ’, π‘₯ 𝑖+). The anchor example π‘₯𝑖 is constructed as the
        combination of the content 𝑐𝑖 and the corresponding label 𝑙𝑖.
  - source_sentence: Who is the first author of the paper and what is their last name?
    sentences:
      - >-
        The key findings are that Vul-RAG achieves the highest accuracy and
        pairwise accuracy among all baselines, substantially outperforming the
        best baseline LLMAO. It also achieves the best trade-off between recall
        and precision.
      - >-
        The first author of the paper is Nandan Thakur. Their last name is
        Thakur.
      - >-
        The paper was presented at the 2022 Conference on Empirical Methods in
        Natural Language Processing (EMNLP).
  - source_sentence: >-
      Compare the top-5 retrieval accuracy of BM25 + MQ and SERM + BF for the NQ
      Dataset and HotpotQA.
    sentences:
      - >-
        For the NQ Dataset, SERM + BF has a top-5 retrieval accuracy of 88.22,
        which is significantly higher than BM25 + MQ's accuracy of 25.19. For
        HotpotQA, SERM + BF was not tested, but BM25 + MQ has a top-5 retrieval
        accuracy of 49.52.
      - >-
        The paper was presented at the 17th Annual International ACM-SIGIR
        Conference on Research and Development in Information Retrieval.
      - >-
        The proof for Equation 5 progresses from Equation 20 to Equation 22 by
        applying the transformation motivated by Xie et al. [2021] and
        introducing the term p(R, x1:i−1|z) to the equation.
model-index:
  - name: SentenceTransformer based on BAAI/bge-small-en
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: Unknown
          type: unknown
        metrics:
          - type: cosine_accuracy@1
            value: 0.01782178217821782
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.04356435643564356
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.06534653465346535
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.12475247524752475
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.01782178217821782
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.015841584158415842
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.016039603960396043
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.015841584158415842
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.00001839902956558168
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.00004498766525563503
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.00007262670252004521
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.00015079859335392304
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.016300874257683427
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.04234598459845988
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.0018766020656866668
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.01782178217821782
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.04356435643564356
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.06534653465346535
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 0.12475247524752475
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.01782178217821782
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.015841584158415842
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.016039603960396043
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.015841584158415842
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.00001839902956558168
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.00004498766525563503
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.00007262670252004521
            name: Dot Recall@5
          - type: dot_recall@10
            value: 0.00015079859335392304
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.016300874257683427
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.04234598459845988
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.0018766020656866668
            name: Dot Map@100
          - type: cosine_accuracy@1
            value: 0.019801980198019802
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.040594059405940595
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.06534653465346535
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.12673267326732673
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.019801980198019802
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.01485148514851485
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.014851485148514853
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.016831683168316833
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.000019670857914229207
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.00003554268094376118
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.0000667664165823309
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.0001670844654494185
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.01679069935920913
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.04252396668238257
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.002057887757857092
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.019801980198019802
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.040594059405940595
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.06534653465346535
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 0.12673267326732673
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.019801980198019802
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.01485148514851485
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.014851485148514853
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.016831683168316833
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.000019670857914229207
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.00003554268094376118
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.0000667664165823309
            name: Dot Recall@5
          - type: dot_recall@10
            value: 0.0001670844654494185
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.01679069935920913
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.04252396668238257
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.002057887757857092
            name: Dot Map@100
          - type: cosine_accuracy@1
            value: 0.01881188118811881
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.03762376237623762
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.06435643564356436
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.1306930693069307
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.01881188118811881
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.013861386138613862
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.015841584158415842
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.01722772277227723
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.000018836739119030395
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.00003852282962664283
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.00007907232140954174
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.00018073758516299118
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.01704492626324548
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.04188786735816444
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.002251865468050825
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.01881188118811881
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.03762376237623762
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.06435643564356436
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 0.1306930693069307
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.01881188118811881
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.013861386138613862
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.015841584158415842
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.01722772277227723
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.000018836739119030395
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.00003852282962664283
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.00007907232140954174
            name: Dot Recall@5
          - type: dot_recall@10
            value: 0.00018073758516299118
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.01704492626324548
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.04188786735816444
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.002251865468050825
            name: Dot Map@100
          - type: cosine_accuracy@1
            value: 0.01881188118811881
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.03663366336633663
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.06435643564356436
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.1306930693069307
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.01881188118811881
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.013531353135313529
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.015643564356435644
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.01722772277227723
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.000018836739119030395
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.00003715905688573237
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.00007929088142504806
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.0001757722267344924
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.01701867523723249
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.0418477919220494
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.0022453604762727357
            name: Cosine Map@100
          - type: dot_accuracy@1
            value: 0.01881188118811881
            name: Dot Accuracy@1
          - type: dot_accuracy@3
            value: 0.03663366336633663
            name: Dot Accuracy@3
          - type: dot_accuracy@5
            value: 0.06435643564356436
            name: Dot Accuracy@5
          - type: dot_accuracy@10
            value: 0.1306930693069307
            name: Dot Accuracy@10
          - type: dot_precision@1
            value: 0.01881188118811881
            name: Dot Precision@1
          - type: dot_precision@3
            value: 0.013531353135313529
            name: Dot Precision@3
          - type: dot_precision@5
            value: 0.015643564356435644
            name: Dot Precision@5
          - type: dot_precision@10
            value: 0.01722772277227723
            name: Dot Precision@10
          - type: dot_recall@1
            value: 0.000018836739119030395
            name: Dot Recall@1
          - type: dot_recall@3
            value: 0.00003715905688573237
            name: Dot Recall@3
          - type: dot_recall@5
            value: 0.00007929088142504806
            name: Dot Recall@5
          - type: dot_recall@10
            value: 0.0001757722267344924
            name: Dot Recall@10
          - type: dot_ndcg@10
            value: 0.01701867523723249
            name: Dot Ndcg@10
          - type: dot_mrr@10
            value: 0.0418477919220494
            name: Dot Mrr@10
          - type: dot_map@100
            value: 0.0022453604762727357
            name: Dot Map@100

SentenceTransformer based on BAAI/bge-small-en

This is a sentence-transformers model finetuned from BAAI/bge-small-en. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
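
The pooling and normalization stages above can be sketched in plain NumPy. This is a toy illustration of what the (1) Pooling and (2) Normalize modules compute, not the library's implementation; the function name and the random token embeddings are made up for the example.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic stages (1) and (2): CLS pooling keeps only the first token's
    vector (pooling_mode_cls_token=True), then Normalize rescales it to
    unit L2 norm."""
    cls_vec = token_embeddings[0]
    return cls_vec / np.linalg.norm(cls_vec)

# Toy token embeddings: 5 tokens, 384 dimensions each
tokens = np.random.default_rng(0).normal(size=(5, 384))
sentence_vec = cls_pool_and_normalize(tokens)
print(sentence_vec.shape)  # (384,)
```

Because of the final Normalize step, every sentence embedding the model emits has unit length.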

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Areeb-02/bge-small-en-MultiplrRankingLoss-30-Rag-paper-dataset")
# Run inference
sentences = [
    'Compare the top-5 retrieval accuracy of BM25 + MQ and SERM + BF for the NQ Dataset and HotpotQA.',
    "For the NQ Dataset, SERM + BF has a top-5 retrieval accuracy of 88.22, which is significantly higher than BM25 + MQ's accuracy of 25.19. For HotpotQA, SERM + BF was not tested, but BM25 + MQ has a top-5 retrieval accuracy of 49.52.",
    'The proof for Equation 5 progresses from Equation 20 to Equation 22 by applying the transformation motivated by Xie et al. [2021] and introducing the term p(R, x1:i−1|z) to the equation.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.0178
cosine_accuracy@3 0.0436
cosine_accuracy@5 0.0653
cosine_accuracy@10 0.1248
cosine_precision@1 0.0178
cosine_precision@3 0.0158
cosine_precision@5 0.016
cosine_precision@10 0.0158
cosine_recall@1 0.0
cosine_recall@3 0.0
cosine_recall@5 0.0001
cosine_recall@10 0.0002
cosine_ndcg@10 0.0163
cosine_mrr@10 0.0423
cosine_map@100 0.0019
dot_accuracy@1 0.0178
dot_accuracy@3 0.0436
dot_accuracy@5 0.0653
dot_accuracy@10 0.1248
dot_precision@1 0.0178
dot_precision@3 0.0158
dot_precision@5 0.016
dot_precision@10 0.0158
dot_recall@1 0.0
dot_recall@3 0.0
dot_recall@5 0.0001
dot_recall@10 0.0002
dot_ndcg@10 0.0163
dot_mrr@10 0.0423
dot_map@100 0.0019
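
The cosine_* and dot_* rows in these tables are identical because the model's final Normalize module makes every embedding unit-length, and on unit vectors the dot product equals cosine similarity. A quick NumPy check of that identity (the vectors here are random stand-ins, not model outputs):

```python
import numpy as np

# Two random vectors, L2-normalized as the model's Normalize module does
rng = np.random.default_rng(0)
a = rng.normal(size=384); a /= np.linalg.norm(a)
b = rng.normal(size=384); b /= np.linalg.norm(b)

cosine = float(a @ b) / float(np.linalg.norm(a) * np.linalg.norm(b))
dot = float(a @ b)
print(np.isclose(cosine, dot))  # True
```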

Information Retrieval

Metric Value
cosine_accuracy@1 0.0198
cosine_accuracy@3 0.0406
cosine_accuracy@5 0.0653
cosine_accuracy@10 0.1267
cosine_precision@1 0.0198
cosine_precision@3 0.0149
cosine_precision@5 0.0149
cosine_precision@10 0.0168
cosine_recall@1 0.0
cosine_recall@3 0.0
cosine_recall@5 0.0001
cosine_recall@10 0.0002
cosine_ndcg@10 0.0168
cosine_mrr@10 0.0425
cosine_map@100 0.0021
dot_accuracy@1 0.0198
dot_accuracy@3 0.0406
dot_accuracy@5 0.0653
dot_accuracy@10 0.1267
dot_precision@1 0.0198
dot_precision@3 0.0149
dot_precision@5 0.0149
dot_precision@10 0.0168
dot_recall@1 0.0
dot_recall@3 0.0
dot_recall@5 0.0001
dot_recall@10 0.0002
dot_ndcg@10 0.0168
dot_mrr@10 0.0425
dot_map@100 0.0021

Information Retrieval

Metric Value
cosine_accuracy@1 0.0188
cosine_accuracy@3 0.0376
cosine_accuracy@5 0.0644
cosine_accuracy@10 0.1307
cosine_precision@1 0.0188
cosine_precision@3 0.0139
cosine_precision@5 0.0158
cosine_precision@10 0.0172
cosine_recall@1 0.0
cosine_recall@3 0.0
cosine_recall@5 0.0001
cosine_recall@10 0.0002
cosine_ndcg@10 0.017
cosine_mrr@10 0.0419
cosine_map@100 0.0023
dot_accuracy@1 0.0188
dot_accuracy@3 0.0376
dot_accuracy@5 0.0644
dot_accuracy@10 0.1307
dot_precision@1 0.0188
dot_precision@3 0.0139
dot_precision@5 0.0158
dot_precision@10 0.0172
dot_recall@1 0.0
dot_recall@3 0.0
dot_recall@5 0.0001
dot_recall@10 0.0002
dot_ndcg@10 0.017
dot_mrr@10 0.0419
dot_map@100 0.0023

Information Retrieval

Metric Value
cosine_accuracy@1 0.0188
cosine_accuracy@3 0.0366
cosine_accuracy@5 0.0644
cosine_accuracy@10 0.1307
cosine_precision@1 0.0188
cosine_precision@3 0.0135
cosine_precision@5 0.0156
cosine_precision@10 0.0172
cosine_recall@1 0.0
cosine_recall@3 0.0
cosine_recall@5 0.0001
cosine_recall@10 0.0002
cosine_ndcg@10 0.017
cosine_mrr@10 0.0418
cosine_map@100 0.0022
dot_accuracy@1 0.0188
dot_accuracy@3 0.0366
dot_accuracy@5 0.0644
dot_accuracy@10 0.1307
dot_precision@1 0.0188
dot_precision@3 0.0135
dot_precision@5 0.0156
dot_precision@10 0.0172
dot_recall@1 0.0
dot_recall@3 0.0
dot_recall@5 0.0001
dot_recall@10 0.0002
dot_ndcg@10 0.017
dot_mrr@10 0.0418
dot_map@100 0.0022

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,010 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 2 tokens, mean 21.28 tokens, max 59 tokens
    • positive (string): min 2 tokens, mean 40.15 tokens, max 129 tokens
  • Samples:
    • anchor: What is the purpose of the MultiHop-RAG dataset and what does it consist of?
      positive: The MultiHop-RAG dataset is developed to benchmark Retrieval-Augmented Generation (RAG) for multi-hop queries. It consists of a knowledge base, a large collection of multi-hop queries, their ground-truth answers, and the associated supporting evidence. The dataset is built using an English news article dataset as the underlying RAG knowledge base.
    • anchor: Among Google, Apple, and Nvidia, which company reported the largest profit margins in their third-quarter reports for the fiscal year 2023?
      positive: Apple reported the largest profit margins in their third-quarter reports for the fiscal year 2023.
    • anchor: Under what circumstances should the LLM answer the questions?
      positive: The LLM should answer the questions based solely on the information provided in the paragraphs, and it should not use any other information.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
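
The loss above can be illustrated with a minimal NumPy sketch. This shows the formula MultipleNegativesRankingLoss optimizes with the listed parameters (scale 20.0, cosine similarity); the function name and toy batch are invented for the example, and the real library computes this on model outputs inside the training loop.

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Each anchor treats its own positive as the target class and every
    other in-batch positive as a negative. scores[i, j] is
    scale * cos_sim(anchor_i, positive_j); the loss is the mean
    cross-entropy with target class i for row i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)             # "scale": 20.0, "similarity_fct": cos_sim
    m = scores.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = scores - m - np.log(np.exp(scores - m).sum(axis=1, keepdims=True))
    return float(-np.diagonal(log_probs).mean())

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 64))
# When each anchor matches its own positive exactly, the loss is near zero
loss = mnr_loss(anchors, anchors.copy())
print(loss)
```

With a batch size of 16, each anchor is ranked against its positive and 15 in-batch negatives, which is why this loss pairs naturally with (anchor, positive) training data like the dataset above.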
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • fp16: True
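
For reference, the non-default hyperparameters above map onto sentence-transformers v3 training arguments roughly as follows. This is a sketch, not the original training script; `output_dir="output"` is a placeholder, not a path from the actual run.

```python
from sentence_transformers.training_args import SentenceTransformerTrainingArguments

# Sketch of the non-default hyperparameters listed above.
# "output" is a placeholder output directory.
args = SentenceTransformerTrainingArguments(
    output_dir="output",
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=10,
    warmup_ratio=0.1,
    fp16=True,
)
```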

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss cosine_map@100
0 0 - 0.0018
1.5625 100 - 0.0019
3.0 192 - 0.0020
1.5625 100 - 0.0021
3.125 200 - 0.0020
4.6875 300 - 0.0021
5.0 320 - 0.0020
1.5625 100 - 0.0020
3.125 200 - 0.0021
4.6875 300 - 0.0022
1.5625 100 - 0.0021
3.125 200 - 0.0019
4.6875 300 - 0.0022
6.25 400 - 0.0022
7.8125 500 0.0021 0.0022
9.375 600 - 0.0023
10.0 640 - 0.0022

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.20.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}