Marqo Stella v2
This model is similar to the original Dunzhang stella 400m model, but with a fused Matryoshka layer. The hierarchical structure of the Matryoshka layer reduces the computational overhead of generating embeddings while leaving relevance metrics unchanged.
Transformers
import torch
from transformers import AutoModel, AutoTokenizer, AutoConfig
from sklearn.preprocessing import normalize
query_prompt = "Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: "
queries = [
"What are some ways to reduce stress?",
"What are the benefits of drinking green tea?",
]
queries = [query_prompt + query for query in queries]
# docs do not need any prompts
docs = [
"There are many effective ways to reduce stress. Some common techniques include deep breathing, meditation, and physical activity. Engaging in hobbies, spending time in nature, and connecting with loved ones can also help alleviate stress. Additionally, setting boundaries, practicing self-care, and learning to say no can prevent stress from building up.",
"Green tea has been consumed for centuries and is known for its potential health benefits. It contains antioxidants that may help protect the body against damage caused by free radicals. Regular consumption of green tea has been associated with improved heart health, enhanced cognitive function, and a reduced risk of certain types of cancer. The polyphenols in green tea may also have anti-inflammatory and weight loss properties.",
]
# Model repo ID, or the local path of your model after cloning it
model_dir = "Marqo/dunzhang-stella_en_400M_v5"
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).cuda().eval()
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
# Embed the queries
with torch.no_grad():
    input_data = tokenizer(queries, padding="longest", truncation=True, max_length=512, return_tensors="pt")
    input_data = {k: v.cuda() for k, v in input_data.items()}
    attention_mask = input_data["attention_mask"]
    last_hidden_state = model(**input_data)[0]
    # Zero out padding positions, then average over the real tokens (mean pooling)
    last_hidden = last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
    query_vectors = last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
    query_vectors = normalize(query_vectors.cpu().numpy())
# Embed the documents
with torch.no_grad():
    input_data = tokenizer(docs, padding="longest", truncation=True, max_length=512, return_tensors="pt")
    input_data = {k: v.cuda() for k, v in input_data.items()}
    attention_mask = input_data["attention_mask"]
    last_hidden_state = model(**input_data)[0]
    # Same masked mean pooling as for the queries
    last_hidden = last_hidden_state.masked_fill(~attention_mask[..., None].bool(), 0.0)
    docs_vectors = last_hidden.sum(dim=1) / attention_mask.sum(dim=1)[..., None]
    docs_vectors = normalize(docs_vectors.cpu().numpy())
print(query_vectors.shape, docs_vectors.shape)
# (2, 1024) (2, 1024)
similarities = query_vectors @ docs_vectors.T
print(similarities)
# [[0.8397531 0.29900077]
# [0.32818374 0.80954516]]
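The pooling and scoring steps in the example above can be tried without downloading the model. The following is a minimal sketch on toy NumPy arrays, not the model's own code: `mean_pool` is a hypothetical helper name, the toy `hidden` and `mask` values are made up for illustration, and the `similarities` matrix is copied from the printed output above.

```python
import numpy as np


def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over non-padding positions, then L2-normalize each row."""
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(axis=1)  # padding positions contribute 0
    counts = mask.sum(axis=1)                        # number of real tokens per sequence
    pooled = summed / counts
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)


# Toy batch (illustrative values): 2 sequences, 3 token positions, hidden size 2.
# The last position of the second sequence is padding (mask 0) and must be ignored.
hidden = np.array([
    [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]],
    [[0.0, 2.0], [0.0, 4.0], [9.0, 9.0]],
])
mask = np.array([[1, 1, 1], [1, 1, 0]])
vectors = mean_pool(hidden, mask)

# Similarity matrix copied from the printed output above (rows: queries, columns: docs).
similarities = np.array([[0.8397531, 0.29900077],
                         [0.32818374, 0.80954516]])
# Index of the highest-scoring document for each query.
best_doc = similarities.argmax(axis=1)
print(vectors)
print(best_doc)
```

Because the vectors are L2-normalized, the dot product in `query_vectors @ docs_vectors.T` is a cosine similarity, so taking the row-wise `argmax` picks the closest document for each query.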
Evaluation results
- accuracy on MTEB AmazonCounterfactualClassification (en), test set (self-reported): 92.358
- ap on MTEB AmazonCounterfactualClassification (en), test set (self-reported): 70.813
- ap_weighted on MTEB AmazonCounterfactualClassification (en), test set (self-reported): 70.813
- f1 on MTEB AmazonCounterfactualClassification (en), test set (self-reported): 88.951
- f1_weighted on MTEB AmazonCounterfactualClassification (en), test set (self-reported): 92.686
- main_score on MTEB AmazonCounterfactualClassification (en), test set (self-reported): 92.358
- accuracy on MTEB AmazonPolarityClassification, test set (self-reported): 97.195
- ap on MTEB AmazonPolarityClassification, test set (self-reported): 96.082
- ap_weighted on MTEB AmazonPolarityClassification, test set (self-reported): 96.082
- f1 on MTEB AmazonPolarityClassification, test set (self-reported): 97.194