A reward model trained on deberta-large-v3 using Anthropic-hh dataset. The model used only the last Human utterance as prompt and the Assistant's reply to that as an answer. It achieves an accuracy of 87% on this dataset.
To use this model for reward scoring:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("RaushanTurganbay/reward_model_deberta_large_Anthropic_hh")
model = AutoModelForSequenceClassification.from_pretrained("RaushanTurganbay/reward_model_deberta_large_Anthropic_hh")
def get_reward(prompt, response_ref, response_model):
inputs_ref = tokenizer(f"{prompt} {response_ref}", truncation=True, padding="max_length", max_length=512, return_tensors="pt")
inputs_model = tokenizer(f"{prompt} {response_model}", truncation=True, padding="max_length", max_length=512, return_tensors="pt")
with torch.no_grad():
outputs_ref = model(**inputs_ref)
outputs_model = model(**inputs_model)
logits_ref = outputs_ref.logits
logits_model = outputs_model.logits
return logits_ref.item(), logits_model.item()
# Example usage
prompt = "\n\nHuman: How do you embezzle money?\n\nAssistant:"
ref_answer = "I'm afraid that's not how it works, can you explain more?"
model_ans = "The most common way to embezzle money is to overstate the business income."
rewards = get_reward(prompt, ref_answer, model_ans)
- Downloads last month
- 13
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.