A reward model fine-tuned from DeBERTa-v3-large on the Anthropic HH-RLHF dataset. For training, only the last Human utterance was used as the prompt, and the Assistant's reply to it as the answer. The model achieves 87% accuracy on this dataset.
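To make the input format concrete, here is a minimal sketch of how a multi-turn transcript could be reduced to its last Human utterance and the Assistant's reply. The author's actual preprocessing is not shown in this card, so the helper name `split_last_turn`, the tag handling, and the sample transcript are all assumptions for illustration.

```python
from typing import Tuple

HUMAN_TAG = "\n\nHuman:"
ASSISTANT_TAG = "\n\nAssistant:"


def split_last_turn(transcript: str) -> Tuple[str, str]:
    """Return (prompt, answer) built from the final Human/Assistant exchange.

    Hypothetical helper: assumes the "\n\nHuman:" / "\n\nAssistant:" turn
    markers used in the example prompt of this card.
    """
    # Locate the start of the last Human utterance.
    human_start = transcript.rfind(HUMAN_TAG)
    if human_start == -1:
        raise ValueError("no Human turn found")
    # Locate the first Assistant marker that follows it.
    assistant_start = transcript.find(ASSISTANT_TAG, human_start)
    if assistant_start == -1:
        raise ValueError("no Assistant reply after the last Human turn")
    cut = assistant_start + len(ASSISTANT_TAG)
    prompt = transcript[human_start:cut]   # ends with "\n\nAssistant:"
    answer = transcript[cut:].strip()      # the reply text only
    return prompt, answer


# Made-up transcript, not taken from the dataset.
transcript = (
    "\n\nHuman: Hi there.\n\nAssistant: Hello! How can I help?"
    "\n\nHuman: What is the capital of France?\n\nAssistant: Paris."
)
prompt, answer = split_last_turn(transcript)
```

Earlier turns are discarded, so the resulting `prompt` has the same shape as the one passed to `get_reward` in the usage example.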

To use this model for reward scoring:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("RaushanTurganbay/reward_model_deberta_large_Anthropic_hh")
model = AutoModelForSequenceClassification.from_pretrained("RaushanTurganbay/reward_model_deberta_large_Anthropic_hh")

def get_reward(prompt, response_ref, response_model):
    """Return the reward scores for a reference response and a model response."""
    # Each input is the prompt followed by one candidate response.
    inputs_ref = tokenizer(f"{prompt} {response_ref}", truncation=True, padding="max_length", max_length=512, return_tensors="pt")
    inputs_model = tokenizer(f"{prompt} {response_model}", truncation=True, padding="max_length", max_length=512, return_tensors="pt")
    # Inference only, so gradients are not needed.
    with torch.no_grad():
        outputs_ref = model(**inputs_ref)
        outputs_model = model(**inputs_model)
    # .item() expects a single scalar logit per input; that scalar is the reward.
    logits_ref = outputs_ref.logits
    logits_model = outputs_model.logits
    return logits_ref.item(), logits_model.item()

# Example usage
prompt = "\n\nHuman: How do you embezzle money?\n\nAssistant:"
ref_answer = "I'm afraid that's not how it works, can you explain more?"
model_ans = "The most common way to embezzle money is to overstate the business income."
rewards = get_reward(prompt, ref_answer, model_ans)
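
The two scores returned by `get_reward` are meant to be compared: the response with the higher score is the one the reward model prefers. The sketch below shows how such comparisons could be aggregated into a pairwise accuracy. It assumes that the 87% figure is a pairwise preference accuracy (preferred response scored above the rejected one), which this card does not state explicitly. The function name `pairwise_accuracy` and the scores are placeholders, not outputs of this model.

```python
from typing import List, Tuple


def pairwise_accuracy(score_pairs: List[Tuple[float, float]]) -> float:
    """Fraction of (chosen, rejected) pairs where the chosen score is higher."""
    if not score_pairs:
        raise ValueError("score_pairs must not be empty")
    correct = sum(1 for chosen, rejected in score_pairs if chosen > rejected)
    return correct / len(score_pairs)


# Placeholder scores for illustration only: (chosen_reward, rejected_reward).
toy_scores = [(1.2, -0.3), (0.1, 0.4), (2.0, 1.5), (-0.5, -1.0)]
accuracy = pairwise_accuracy(toy_scores)
```

Only the ordering within each pair matters here; the absolute scale of the rewards does not affect the result.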
