tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:69227
  - loss:CosineSimilarityLoss
base_model: BAAI/bge-small-en-v1.5
widget:
  - source_sentence: >-
      Gliss Hair Repair Conditioner Color Protect & Shine. Description :This
      conditioner is designed for long-lasting colour protection for your
      coloured hair. The ultimate colour conditioner gives up to 10 weeks of
      colour protection and intense luminosity. The effective formula with the
      repair serum and the UV filter repairs the hair, seals and protects the
      colour perfectly from washing out and fading. It provides optimal colour
      protection for coloured hair up to 10 weeks with regular use.Hate hair
      drama? Then try Gliss Hair Repair products for beautiful, restored and
      healthy-looking hair. GLISS Hair Repair products with breakthrough
      patented Hair-Identical Keratin leverage technology to up to 10 layers
      deep. Combined with essential benefits like colour protection, intense
      hydration, long-lasting volume, and weightless nourishment, you get the
      repair you need without having to compromise.!
    sentences:
      - >-
        Dairy Milk Honeycomb & Nuts - Imported. Description :Cadbury dairy milk
        is all about regaling in the richness and creaminess of these classic
        chocolate bars. These chocolate bars are available in a number of
        diverse flavours that offer you a reason to celebrate every small and
        big occasion of happiness.!
      - >-
        product

        Bio Flame Of The Forest - Fresh Shine Expertise Oil    Bio Flame Of The
        Forest - Fresh Shine Expertis...

        Bio Flame Of The Forest - Fresh Shine Expertise Oil    Bio Flame Of The
        Forest - Fresh Shine Expertis...

        Name: combined, dtype: object
      - >-
        Hygiene Hand Wipes With Anti-bacterial Actives- Skin-Friendly.
        Description :Have you stepped out of your house and wondered if the door
        that you just pushed open, was clean? Are there germs lurking around
        you, that you wish you could see better? Have you wondered if you have
        been careful in ensuring the best protection against bacteria and germs?
        Is your personal hygiene standard good enough? Personal Hygiene is in
        your hands. Literally. KeepSafe by Marico takes care of your Hygiene
        needs through its range of premium quality sanitizer, disinfectants,
        wipes, hand wash and personal hygiene products.KeepSafe Hygiene Hand
        Wipes are rich in anti-bacterial actives that sanitise and effectively
        fight germs. These wipes are rich in Aloe Vera and Glycerine and are
        mild and soothing on the skin. These hygiene wipes are so soft, that you
        can use them every day, as many times as you want. Like a true Marico
        product, KeepSafe believes in transparency, superior quality and
        complete essential care. Try out the Multi-purpose Disinfectant and the
        Instant Hand Sanitiser from KeepSafe by Marico range for complete
        out-of-home hygiene. Take No Chances. Keep Safe.!
  - source_sentence: >-
      Fragrance Body Spray For Men (1000 sprays) - Forever. Description
      :Soothing experience throughout the day, Consists of refreshing & Long
      lasting fragrance.  For Beauty tips, tricks & more visit
      https://bigbasket.blog/!
    sentences:
      - "M2 Perfume Spray - for Men. Description :Engage Perfume Sprays created by International Experts  For Beauty tips, tricks & more visit http://lookbeautiful.in/  For Beauty tips, tricks & more visit\_https://bigbasket.blog/!"
      - >-
        product

        Dazzle Opalware Noodle Bowl Set - Tropical Lagoon    Dazzle Opalware
        Noodle Bowl Set - Tropical Lag...

        Dazzle Opalware Noodle Bowl Set - Tropical Lagoon    Dazzle Opalware
        Noodle Bowl Set - Tropical Lag...

        Name: combined, dtype: object
      - |-
        product
        Hakka Noodles - Veg    Hakka Noodles - Veg. Description :Ching's Secr...
        Hakka Noodles - Veg    Hakka Noodles - Veg. Description :It is ready ...
        Name: combined, dtype: object
  - source_sentence: Amlant Ayurvedic Medicine For Acidity
    sentences:
      - Grapes - Thompson Seedless
      - >-
        Ice Cream Bowl. Description :Excellent quality crystal clear glass

        Easy to handle

        Ideal for gifting

        Dishwasher safe

        This glass is made from high-quality material & crafted in a new design
        for easy handling.!
      - "Melamine Snack Set - Red. Description :Made of 100% food-grade melamine and food contact grade colour, this snack set is heat resistant up to a temp of 140º C. It is resistant to breaking, cracking & chipping. Stain-proof, it comes with long-lasting designs.\_It has a glazed finish that makes it aesthetically pleasing. This snack set is safe for dishwasher use.!"
  - source_sentence: >-
      Wonder Pants - Small, Combo. Description :Your baby spends a good part of
      their day in a diaper. Therefore, choosing the right diaper for their
      tender and delicate skin is extremely important. And this is where, we
      introduce our next-generation product, India's 1st diaper pants with the
      unique Bubble-Bed technology. There are 3 areas where a diaper surrounds
      the baby's skin-the baby's bottom, the baby's waist, and the baby's thigh.
      The skin of the baby in all these areas is extremely delicate and
      sensitiveHuggies Wonder Pants Diapers Small Size pack with 3D Bubble bed
      technology with a Cushiony Waistband.!
    sentences:
      - >-
        Home Mate Garbage Bag - Green, Oxo-Bio-Degradable Roll, 30X37, 50
        Micron. Description :These garbage bags are designed to ease the task of
        garbage disposal, and the bio-degradable material, makes it environment
        friendly and helps spread the word of hygiene and cleanliness. They are
        strong enough to carry waste neatly without causing a mess, and large
        enough to carry it all at once. They also offer great flexibility,
        convenience and ensure a high degree of hygiene, whether at home or in
        office.!
      - >-
        product

        Baby Diapers & Sanitary Disposal Bag    Baby Diapers & Sanitary Disposal
        Bag. Descript...

        Baby Diapers & Sanitary Disposal Bag    Baby Diapers & Sanitary Disposal
        Bag. Descript...

        Name: combined, dtype: object
      - >-
        product

        Organic - Sugar/Sakkare Brown    Organic - Sugar/Sakkare Brown.
        Description :Pu...

        Organic - Sugar/Sakkare Brown    Organic - Sugar/Sakkare Brown.
        Description :Tu...

        Name: combined, dtype: object
  - source_sentence: >-
      Coffee Filter Papers - Size 02, White. Description :Hario brings in
      Cone-shaped natural paper filter for Pour-over brewing experience for a
      great cup of Coffee. Hario's V60, size 02 White, give you a perfect brew
      in comparison to mesh filters. These paper filters are of great quality
      and they produce a clean, flavorful, sediment-free cup. They are
      disposable, and thus it makes it convenient and easier to use for brewing
      and cleanup. Perfect choice for coffee enthusiasts who like to grind their
      coffee at home. These papers are safe to use and eco-friendly. The Box
      comes with 100 disposable 02 paper filters.!
    sentences:
      - Tomato Disc 70 g + Cheese Balls 70 g
      - >-
        product

        4mm Aluminium Induction Base Chapati Roti Tawa - Silver    4mm Aluminium
        Induction Base Chapati Roti Tawa...

        4mm Aluminium Induction Base Chapati Roti Tawa - Silver    4mm Aluminium
        Induction Base Chapati Roti Tawa...

        Name: combined, dtype: object
      - >-
        Steel Rice Serving Spoon - Medium, Classic Diana Series, BBST37.
        Description :BB Home provides fine and classy cooking and serving tools
        that can make difference to your kitchen experience. These
        cooking/serving tools are made from 100% food grade stainless steel. The
        handle is designed in a way so it does not feel heavy while
        cooking/serving. It is easy to store as it has a bottom hole on the
        handle to hang it on the wall.!
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
model-index:
  - name: SentenceTransformer based on BAAI/bge-small-en-v1.5
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: bge eval
          type: bge-eval
        metrics:
          - type: pearson_cosine
            value: 0.9791486195203369
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.15795715637146185
            name: Spearman Cosine
          - type: pearson_cosine
            value: 0.9798210832808076
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.1632937701650559
            name: Spearman Cosine

SentenceTransformer based on BAAI/bge-small-en-v1.5

This is a sentence-transformers model fine-tuned from BAAI/bge-small-en-v1.5. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-small-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
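
The three modules above can be read as a pipeline: the Transformer produces one vector per token, the Pooling module keeps only the first ([CLS]) token's vector because pooling_mode_cls_token is True, and Normalize scales that vector to unit length. A minimal pure-Python sketch of the last two steps follows. The 2-token, 4-dimensional input is made up for illustration (real output is 384-dimensional), and cls_pool and l2_normalize are hypothetical helper names, not library functions.

```python
import math

def cls_pool(token_embeddings):
    """Keep only the first token's vector (the [CLS] position)."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

# Toy stand-in for the Transformer output: 2 tokens x 4 dimensions.
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],  # an ordinary word-piece token (ignored by CLS pooling)
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)  # prints [0.6, 0.0, 0.8, 0.0]
```

Because the final vector has unit length, the cosine similarity of two such embeddings equals their dot product.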

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mavihsrr/bgeEmbeddingsRetailedFT")
# Run inference
sentences = [
    "Coffee Filter Papers - Size 02, White. Description :Hario brings in Cone-shaped natural paper filter for Pour-over brewing experience for a great cup of Coffee. Hario's V60, size 02 White, give you a perfect brew in comparison to mesh filters. These paper filters are of great quality and they produce a clean, flavorful, sediment-free cup. They are disposable, and thus it makes it convenient and easier to use for brewing and cleanup. Perfect choice for coffee enthusiasts who like to grind their coffee at home. These papers are safe to use and eco-friendly. The Box comes with 100 disposable 02 paper filters.!",
    'Steel Rice Serving Spoon - Medium, Classic Diana Series, BBST37. Description :BB Home provides fine and classy cooking and serving tools that can make difference to your kitchen experience. These cooking/serving tools are made from 100% food grade stainless steel. The handle is designed in a way so it does not feel heavy while cooking/serving. It is easy to store as it has a bottom hole on the handle to hang it on the wall.!',
    'Tomato Disc 70 g + Cheese Balls 70 g',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
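
For semantic search, the similarity scores are usually reduced to a ranking of candidates for one query. The sketch below shows that step in plain Python. The 3-dimensional vectors are made up and stand in for rows of the model.encode output, and rank_by_cosine is a hypothetical helper, not part of the library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_by_cosine(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(candidate_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for embeddings of a query and three products.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 0.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # 45 degrees from the query
]

ranking = rank_by_cosine(query, candidates)
for index, score in ranking:
    print(index, round(score, 4))
```

With real embeddings, the same ranking can be read off one row of the matrix returned by model.similarity.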

Evaluation

Metrics

Semantic Similarity

Metric Value
pearson_cosine 0.9791
spearman_cosine 0.1580

Semantic Similarity

Metric Value
pearson_cosine 0.9798
spearman_cosine 0.1633
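
pearson_cosine and spearman_cosine both compare the model's cosine similarities with the gold scores, but in different ways: Pearson measures linear agreement of the values, while Spearman measures agreement of their rank order. The two can diverge widely. The pure-Python sketch below illustrates one way this can happen, using made-up numbers (not the evaluation data): a single low-scoring pair dominates Pearson, while the order among the tightly clustered high scores is scrambled. It is not a diagnosis of this model's results.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up gold scores and predicted cosine similarities.
gold = [0.20, 0.90, 0.91, 0.92, 0.93]
predicted = [0.25, 0.93, 0.92, 0.91, 0.90]

print(round(pearson(gold, predicted), 4))
print(round(spearman(gold, predicted), 4))
```

In this toy case Pearson is above 0.99 while Spearman is 0, because the four high-scoring pairs are ranked in exactly the reverse order.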

Training Details

Training Dataset

Unnamed Dataset

  • Size: 69,227 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 4 tokens, mean 114.97 tokens, max 512 tokens
    • sentence2 (string): min 4 tokens, mean 101.87 tokens, max 512 tokens
    • score (float): min 0.18, mean 0.88, max 0.96
  • Samples:
    • sentence1: Breakfast Mix - Masala Idli. Description :Established in 1924, MTR is the contemporary way to authentic tasting food, Our products are backed by culinary expertise honed, over 8 decades of serving wholesome, tasty and high quality vegetarian food, Using authentic Indian recipes, the purest and best quality natural ingredients and traditional methods of preparation, We brings you a range of products of unmatched flavour and taste, to delight your family at every meal and every occasion, MTR Daily Favourites is your dependable partner in the Kitchen that helps you make your family's everyday meals tasty and wholesome, So bring home the confidence of great tasting food everyday with MTR..!
      sentence2: Quinoa Flakes. Description :Keep a good balance of satisfying your taste buds and satiating your hunger pangs. Nutriwish Quinoa Flakes are a “complete” protein containing all eight essential amino acids. The perfect antidote to all that sugar, Nutriwish Quinoa Flakes are delicious cold in a salad, served warm as a side dish or even combined with vegetables and dairy to make a spectacular and filling vegetarian main course. Curb food cravings and start your day yummy with the starchy Nutriwish Quinoa Flakes.!
      score: 0.9524586385560029
    • sentence1: 1 To 1 Baking Flour - Gluten Free. Description :Bob Red Mill gluten-free 1-to-1 baking flour makes it easy to transform traditional recipes to gluten-free. Simply follow your favourite baking recipe, replacing the wheat flour with this blend. It is formulated for baked goods with terrific taste and texture, no additional speciality ingredients or recipes required. It is suitable for cookies, cakes, brownies, muffins, and more.!
      sentence2: Chocolate - Drink Powder. Description :Hintz cocoa powder is not just ideal for making biscuits, ice cream and deserts. It is also dissolved in hot milk - a delicious chocolate beverage.!
      score: 0.8764388983469142
    • sentence1: Joy Round Kids Glass. Description :This glass, made of plastic material, is specially designed for your kid. It is lightweight and easy to use. This glass is ideal for drinking water, milk, juices, health drinks etc.!
      sentence2: Plastic Lunch Box/Tiffin Box - Disney Mickey Mouse, BPA Free, HMHILB 199-MK. Description :HMI brings this 4 side lock and lock style. This is airtight, leak-proof and microwave safe. It comes with a small container, fork & spoon.!
      score: 0.9289614489097255
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
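
With MSELoss as loss_fct, CosineSimilarityLoss penalises the squared difference between the cosine similarity of the two sentence embeddings and the gold score, averaged over the batch. A minimal pure-Python sketch of that computation follows. The 2-dimensional embeddings and scores are made up, and cosine_similarity_mse is a hypothetical name for illustration, not the library implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_mse(embeddings1, embeddings2, scores):
    """Mean squared error between cos(u_i, v_i) and the gold score_i."""
    squared_errors = [
        (cosine(u, v) - score) ** 2
        for u, v, score in zip(embeddings1, embeddings2, scores)
    ]
    return sum(squared_errors) / len(squared_errors)

# Toy batch of two pairs (made-up embeddings and scores).
embeddings1 = [[1.0, 0.0], [1.0, 0.0]]
embeddings2 = [[1.0, 0.0], [0.0, 1.0]]
scores = [1.0, 0.5]

loss = cosine_similarity_mse(embeddings1, embeddings2, scores)
print(loss)  # prints 0.125
```

The first pair is predicted perfectly (cosine 1.0 against score 1.0); the second contributes (0.0 - 0.5)^2 = 0.25, giving a batch mean of 0.125.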
    

Evaluation Dataset

Unnamed Dataset

  • Size: 8,654 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min 4 tokens, mean 110.58 tokens, max 512 tokens
    • sentence2 (string): min 4 tokens, mean 97.13 tokens, max 512 tokens
    • score (float): min 0.19, mean 0.87, max 0.96
  • Samples:
    • sentence1: 1947 Flora Natural Aroma Incense Sticks - Economy Pack. Description :A Traditional formula that is handed over by the founder, incense sticks is made the traditional way with a ‘masala’ or mixture of 100% natural aromatic botanicals. During your rituals, these incense sticks will bring about a fresh and fragrant breath of conscious soothing bliss.!
      sentence2: Designer Jyot - Green. Description :This is made in India Initiative and create a meditative and peaceful ambience in your puja room with the handmade Brass Mandir Jyot. It extremely durable and crack-resistant, which allows you to use it with ease on a daily basis. This Jyot is very attractive and worth purchasing for personal use or for gifting purpose. Easy to Use and Clean. This Glass brass diya is designed for ease in inserting whip, refilling oil and cleaning. It emits brighter light due to the increased clarity provided by the superior quality glass. The flame of this brass diya does not go off or cause any danger even when the fan is on as the diya comes with a lid.!
      score: 0.9030882765047124
    • sentence1: Mexican Seasoning. Description :The rich tapestry of sweet and spicy flavours that Mexican cuisine is loved for - now captured in a magic blend. This international seasoning product is inbuilt with unique 2-way flip cap to sprinkle it or scoop it. On1y is a new way of rediscovering the power of herbs and spices. On1y can conveniently become a part of your daily diet for the irresistible benefits that it brings.!
      sentence2: Rainbow Strands. Description :Colourful jimmies/sprinkles make decorating your cakes, cupcakes and cookies fun and easy. Great as an ice cream topping too.!
      score: 0.9584305870004965
    • sentence1: Intense 75% Dark Chocolate. Description :This pack has 100gm 75% Luxury Intense Dark Chocolate. With meticulous culinary skills the exotic intense bitterness of cacao beans emerges in this bar. Chocolate was invented in 1900 BC by the Aztecs in Central America. We at Didier & Frank bring you those exotic flavours and hand crafted chocolates that the Aztecs enjoyed secretly. Today, Didier & Frank makes the best chocolates in the world.!
      sentence2: Puff Pastry Sticks With Butter. Description :The unique and timeless original Classic Millefoglie by Matilde Vicenzi: crumbly sticks of delicate pastry typical of the Italian tradition, with all the flavour of butter. With 192 crispy and delicate layers of puff pastry and just a light layer of premium butter, our inimitable Millefoglie d’Italia are among the most popular desserts in Italy.!
      score: 0.9553127949715517
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss bge-eval_spearman_cosine
0 0 - - 0.0923
0.0231 100 0.0657 0.0386 0.1450
0.0462 200 0.0248 0.0133 0.1661
0.0693 300 0.0118 - -
0.0231 100 0.0069 0.0070 0.1644
0.0462 200 0.0037 0.0040 0.1634
0.0693 300 0.0016 0.0038 0.1619
0.0924 400 0.0013 0.0042 0.1603
0.1156 500 0.0011 0.0049 0.1579
0.1387 600 0.0012 0.0052 0.1593
0.1618 700 0.0011 0.0053 0.1608
0.1849 800 0.0011 0.0055 0.1612
0.2080 900 0.0011 0.0063 0.1606
0.2311 1000 0.0011 0.0061 0.1585
0.2542 1100 0.0012 0.0061 0.1566
0.2773 1200 0.0011 0.0062 0.1557
0.3004 1300 0.0012 0.0062 0.1570
0.3235 1400 0.001 0.0058 0.1557
0.3467 1500 0.001 0.0063 0.1554
0.3698 1600 0.0011 0.0062 0.1572
0.3929 1700 0.0011 0.0061 0.1580
0.4160 1800 0.001 - 0.1598
0.2311 1000 0.0008 0.0063 0.1532
0.4622 2000 0.0008 0.0064 0.1651
0.6933 3000 0.001 0.0067 0.1627
0.9244 4000 0.001 0.0067 0.1633

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.47.1
  • PyTorch: 2.1.0+cu118
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}