Tom Aarsen committed
Commit e30d07f · 1 parent: 08c012b

Add MRL results and usage

Files changed (1): README.md (+31 −1)
README.md CHANGED
@@ -997,7 +997,11 @@ model-index:
 
  # Static Embeddings with BERT uncased tokenizer finetuned on various datasets
 
- This is a [sentence-transformers](https://www.SBERT.net) model trained on the [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq), [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1), [squad](https://huggingface.co/datasets/sentence-transformers/squad), [s2orc](https://huggingface.co/datasets/sentence-transformers/s2orc), [allnli](https://huggingface.co/datasets/sentence-transformers/all-nli), [paq](https://huggingface.co/datasets/sentence-transformers/paq), [trivia_qa](https://huggingface.co/datasets/sentence-transformers/trivia-qa), [msmarco_10m](https://huggingface.co/datasets/bclavie/msmarco-10m-triplets), [swim_ir](https://huggingface.co/datasets/nthakur/swim-ir-monolingual), [pubmedqa](https://huggingface.co/datasets/sentence-transformers/pubmedqa), [miracl](https://huggingface.co/datasets/sentence-transformers/miracl), [mldr](https://huggingface.co/datasets/sentence-transformers/mldr) and [mr_tydi](https://huggingface.co/datasets/sentence-transformers/mr-tydi) datasets. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
  ## Model Details
@@ -1072,6 +1076,21 @@ print(similarities.shape)
  # [3, 3]
  ```
 
  <!--
  ### Direct Usage (Transformers)
@@ -1146,6 +1165,17 @@ You can finetune this model on your own dataset.
  | cosine_mrr@10 | 0.5482 |
  | cosine_map@100 | 0.4203 |
 
  <!--
  ## Bias, Risks and Limitations
 
 
  # Static Embeddings with BERT uncased tokenizer finetuned on various datasets
 
+ This is a [sentence-transformers](https://www.SBERT.net) model trained on the [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq), [msmarco](https://huggingface.co/datasets/sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1), [squad](https://huggingface.co/datasets/sentence-transformers/squad), [s2orc](https://huggingface.co/datasets/sentence-transformers/s2orc), [allnli](https://huggingface.co/datasets/sentence-transformers/all-nli), [paq](https://huggingface.co/datasets/sentence-transformers/paq), [trivia_qa](https://huggingface.co/datasets/sentence-transformers/trivia-qa), [msmarco_10m](https://huggingface.co/datasets/bclavie/msmarco-10m-triplets), [swim_ir](https://huggingface.co/datasets/nthakur/swim-ir-monolingual), [pubmedqa](https://huggingface.co/datasets/sentence-transformers/pubmedqa), [miracl](https://huggingface.co/datasets/sentence-transformers/miracl), [mldr](https://huggingface.co/datasets/sentence-transformers/mldr) and [mr_tydi](https://huggingface.co/datasets/sentence-transformers/mr-tydi) datasets. It maps sentences & paragraphs to a 1024-dimensional dense vector space and is designed to be used for semantic search.
+
+ This model was trained with a [Matryoshka loss](https://huggingface.co/blog/matryoshka), allowing you to truncate the embeddings for faster retrieval at minimal performance cost (see [Matryoshka Evaluations](#matryoshka-evaluations)).
+
 
  ## Model Details
 
  # [3, 3]
  ```
 
+ This model was trained with a Matryoshka loss, so it can also be used at lower dimensionalities with minimal performance loss (see [Matryoshka Evaluations](#matryoshka-evaluations)).
+ Notably, a lower dimensionality makes information retrieval much faster and cheaper. You can specify a lower dimensionality with the `truncate_dim` argument when initializing the Sentence Transformer model:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("tomaarsen/static-retrieval-mrl-en-v1", truncate_dim=256)
+ embeddings = model.encode([
+     "what is the difference between chronological order and spatial order?",
+     "can lavender grow indoors?",
+ ])
+ print(embeddings.shape)
+ # => (2, 256)
+ ```
+
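The `truncate_dim` behavior in the snippet above can also be reproduced post-hoc: Matryoshka truncation amounts to keeping the leading dimensions of the full embeddings and re-normalizing each row. A minimal NumPy sketch, where the random vectors merely stand in for real `model.encode` output and the helper name is illustrative:

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and re-normalize rows to unit length."""
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

# Stand-ins for full 1024-dimensional embeddings from model.encode:
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 1024))

small = truncate_and_normalize(full, 256)
print(small.shape)                 # (2, 256)
print(float(small[0] @ small[1]))  # cosine similarity of the two truncated, unit-norm vectors
```

Because the truncated rows are re-normalized, the dot product directly gives cosine similarity, just as with the full-dimensional embeddings.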
  <!--
  ### Direct Usage (Transformers)
 
 
  | cosine_mrr@10 | 0.5482 |
  | cosine_map@100 | 0.4203 |
 
+ ##### Matryoshka Evaluations
+
+ | Dimensionality | NanoBEIR_mean | NanoArguAna | NanoClimateFEVER | NanoDBPedia | NanoFEVER | NanoFiQA2018 | NanoHotpotQA | NanoMSMARCO | NanoNFCorpus | NanoNQ | NanoQuoraRetrieval | NanoSCIDOCS | NanoSciFact | NanoTouche2020 |
+ |----------------|---------------|-------------|------------------|-------------|-----------|--------------|--------------|-------------|--------------|--------|--------------------|-------------|-------------|----------------|
+ | 1024 | **0.5031** | 0.4077 | 0.3308 | 0.5681 | 0.6921 | 0.3651 | 0.6547 | 0.4040 | 0.3241 | 0.4533 | 0.8950 | 0.2642 | 0.6111 | 0.5702 |
+ | 512 | **0.4957** | 0.3878 | 0.3360 | 0.5626 | 0.6945 | 0.3517 | 0.6280 | 0.3892 | 0.3206 | 0.4505 | 0.8986 | 0.2657 | 0.5953 | 0.5635 |
+ | 256 | **0.4819** | 0.3855 | 0.3203 | 0.5407 | 0.6734 | 0.3518 | 0.6027 | 0.4144 | 0.2860 | 0.4254 | 0.8948 | 0.2466 | 0.5620 | 0.5605 |
+ | 128 | **0.4622** | 0.4001 | 0.2982 | 0.5266 | 0.6273 | 0.3188 | 0.5606 | 0.4025 | 0.2693 | 0.4021 | 0.8930 | 0.2283 | 0.5447 | 0.5368 |
+ | 64 | **0.4176** | 0.3424 | 0.2809 | 0.5022 | 0.5480 | 0.2831 | 0.4680 | 0.3739 | 0.2153 | 0.3845 | 0.8525 | 0.1680 | 0.5045 | 0.5050 |
+ | 32 | **0.3532** | 0.2866 | 0.1870 | 0.4292 | 0.4193 | 0.2292 | 0.3602 | 0.3587 | 0.1444 | 0.3525 | 0.8325 | 0.1525 | 0.3983 | 0.4408 |
+
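One quick way to read the added table is as relative retention of NanoBEIR_mean versus the full 1024 dimensions. A small sketch over the values copied from the table:

```python
# NanoBEIR_mean scores per dimensionality, copied from the Matryoshka Evaluations table.
nanobeir_mean = {1024: 0.5031, 512: 0.4957, 256: 0.4819, 128: 0.4622, 64: 0.4176, 32: 0.3532}

full = nanobeir_mean[1024]
for dim, score in nanobeir_mean.items():
    # e.g. 256 dims retain about 95.8% of full NanoBEIR_mean at a quarter of the storage.
    print(f"{dim:>4} dims: {score / full:.1%} of full NanoBEIR_mean")
```

Even at 64 dimensions (16x smaller vectors) the model keeps roughly 83% of its full-dimensional NanoBEIR_mean, which is the trade-off the Matryoshka training is designed to enable.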
  <!--
  ## Bias, Risks and Limitations