dfurman/Qwen2-72B-Orpo-v0.1

Model

This model is a fine-tune of Qwen/Qwen2-72B-Instruct on 1.5k rows of mlabonne/orpo-dpo-mix-40k. It was trained as a generalist language model for a variety of text generation use cases, including agentic capabilities, roleplaying, reasoning, multi-turn conversations, long-context coherence, and more.

Thanks go out to mlabonne, Qwen, et al. for the source dataset and base model.

You can find the experiment on W&B at this address.

💻 Usage

Setup
!pip install -qU transformers accelerate bitsandbytes
!huggingface-cli download dfurman/Qwen2-72B-Orpo-v0.1
from transformers import AutoTokenizer, BitsAndBytesConfig
import transformers
import torch


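# Ampere or newer GPUs (compute capability >= 8) support bfloat16 and FlashAttention-2
if torch.cuda.get_device_capability()[0] >= 8:
    !pip install -qqq flash-attn
    attn_implementation = "flash_attention_2"
    torch_dtype = torch.bfloat16
else:
    # older GPUs fall back to eager attention in float16
    attn_implementation = "eager"
    torch_dtype = torch.float16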

# quantize if necessary
# bnb_config = BitsAndBytesConfig(
#    load_in_4bit=True,
#    bnb_4bit_quant_type="nf4",
#    bnb_4bit_compute_dtype=torch_dtype,
#    bnb_4bit_use_double_quant=True,
# )

model = "dfurman/Qwen2-72B-Orpo-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={
        "torch_dtype": torch_dtype,
        # "quantization_config": bnb_config,
        "device_map": "auto",
        "attn_implementation": attn_implementation,
    }
)
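
As a rough guide to when the commented-out quantization config is worth enabling, the sketch below estimates weight memory alone for a 72.7B-parameter model. It assumes 2 bytes per parameter for BF16/FP16 and 0.5 bytes per parameter for 4-bit NF4, and it ignores quantization constants, activations, and the KV cache, so real usage will be higher. The helper name `weight_memory_gb` is hypothetical and not part of any library.

```python
# Rough weight-only memory estimate (assumption: ignores activations,
# the KV cache, and the small overhead of quantization constants).
N_PARAMS = 72.7e9  # parameter count reported for this model

BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16 bits per weight; the same figure applies to fp16
    "nf4": 0.5,   # 4 bits per weight
}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return the approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(N_PARAMS, dtype):.1f} GB")
```

Under these assumptions the BF16 weights need roughly 145 GB and the 4-bit weights roughly 36 GB, which is why 4-bit loading may be the only option on a single-GPU setup.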

Run

question = """The bakers at the Beverly Hills Bakery baked 200 loaves of bread on Monday morning. 
They sold 93 loaves in the morning and 39 loaves in the afternoon. 
A grocery store then returned 6 unsold loaves back to the bakery. 
How many loaves of bread did the bakery have left?
Respond as succinctly as possible. Format the response as a completion of this table:
|step|subquestion|procedure|result|
|:---|:----------|:--------|:-----:|"""


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": question},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# print("***Prompt:\n", prompt)

outputs = pipeline(prompt, max_new_tokens=1000, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print("***Generation:")
print(outputs[0]["generated_text"][len(prompt):])
***Generation:
|1|Initial loaves|Start with total loaves|200|
|2|Sold in morning|Subtract morning sales|200 - 93 = 107|
|3|Sold in afternoon|Subtract afternoon sales|107 - 39 = 68|
|4|Returned loaves|Add returned loaves|68 + 6 = 74|
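
Because sampling is enabled, the output can differ between runs, so it may be worth checking the arithmetic in the result column programmatically. The following is a minimal sketch that parses the four rows shown above and verifies each `a op b = c` cell; the helper name `check_rows` and the regular expression are illustrative assumptions, and the parser only handles single-step addition and subtraction.

```python
import re

# The four table rows from the sample generation above.
generation = """\
|1|Initial loaves|Start with total loaves|200|
|2|Sold in morning|Subtract morning sales|200 - 93 = 107|
|3|Sold in afternoon|Subtract afternoon sales|107 - 39 = 68|
|4|Returned loaves|Add returned loaves|68 + 6 = 74|"""

# Matches result cells of the form "a + b = c" or "a - b = c".
STEP = re.compile(r"^\s*(\d+)\s*([+-])\s*(\d+)\s*=\s*(\d+)\s*$")


def check_rows(table: str) -> int:
    """Verify each 'a op b = c' result cell and return the final value."""
    final = None
    for line in table.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        result = cells[-1]
        match = STEP.match(result)
        if match is None:
            final = int(result)  # a bare number, such as the starting total
            continue
        a, op, b, c = match.groups()
        computed = int(a) + int(b) if op == "+" else int(a) - int(b)
        if computed != int(c):
            raise ValueError(f"row {cells[0]}: {result!r} does not hold")
        final = computed
    return final


print(check_rows(generation))  # 74
```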

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

|Metric|Value|
|:---|:----:|
|Avg.|43.32|
|IFEval (0-Shot)|78.80|
|BBH (3-Shot)|57.41|
|MATH Lvl 5 (4-Shot)|35.42|
|GPQA (0-shot)|17.90|
|MuSR (0-shot)|20.87|
|MMLU-PRO (5-shot)|49.50|
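
The reported average can be reproduced from the six benchmark scores. The short sketch below assumes the leaderboard average is a plain unweighted mean of those scores; that assumption is not stated in the results themselves, but it matches the figure listed.

```python
# Benchmark scores as listed in the results.
scores = {
    "IFEval (0-Shot)": 78.80,
    "BBH (3-Shot)": 57.41,
    "MATH Lvl 5 (4-Shot)": 35.42,
    "GPQA (0-shot)": 17.90,
    "MuSR (0-shot)": 20.87,
    "MMLU-PRO (5-shot)": 49.50,
}

# Assumption: the average is an unweighted mean over all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # 43.32
```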

Model size: 72.7B params (Safetensors, tensor type BF16)