AtakanTekparmak committed on
Commit 9d0791a · verified · 1 Parent(s): eff7ed5

fix: Updated README

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -130,7 +130,7 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 
 generated_ids = model.generate(
     **model_inputs,
-    max_new_tokens=512
+    max_new_tokens=2048
 )
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
@@ -178,10 +178,9 @@ This code first calculates tomorrow's date, then checks if the time slot from 10
 
 We evaluate the model on the following benchmarks:
 
-1. Benchmark 1
-2. Benchmark 2
-3. ...
-4. **Dria-Pythonic-Agent-Benchmark (DPAB):** The benchmark we curated with a synthetic data generation +model-based validation + filtering and manual selection to evaluate LLMs on their Pythonic function calling ability, spanning multiple scenarios and tasks. More detailed information about the benchmark can be found on the [Github repo](https://github.com/firstbatchxyz/function-calling-eval) and in our [blog post](blog-link)
+1. Berkeley Function Calling Leaderboard (BFCL)
+2. MMLU-Pro
+3. **Dria-Pythonic-Agent-Benchmark (DPAB):** The benchmark we curated with a synthetic data generation +model-based validation + filtering and manual selection to evaluate LLMs on their Pythonic function calling ability, spanning multiple scenarios and tasks. More detailed information about the benchmark and the Github repo will be released soon.
 
 Below are the evaluation results for Qwen2.5-Coder-3B-Instruct and Dria-Agent-α-3B
 
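
Note on the first hunk: the unchanged context lines slice each generated sequence at `len(input_ids)`, which drops the echoed prompt and keeps only the newly generated tokens, and `max_new_tokens` caps the length of that new part (raised here from 512 to 2048). The following is a minimal sketch of the slicing step using plain Python lists. The token IDs and the helper name `strip_prompt` are made up for illustration and do not appear in the README, and real batched inputs would normally be padded to a common length.

```python
# Toy stand-ins for token ID sequences (hypothetical values).
# In the README these would come from the tokenizer and model.generate().
prompt_ids = [[101, 7592, 2088], [101, 2129]]

# A decoder-only model returns the prompt tokens followed by the new tokens.
full_outputs = [[101, 7592, 2088, 11, 12, 13], [101, 2129, 21, 22]]


def strip_prompt(input_batch, output_batch):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_batch, output_batch)]


new_tokens = strip_prompt(prompt_ids, full_outputs)
print(new_tokens)  # [[11, 12, 13], [21, 22]]
```

Because the slice starts at the prompt length, the number of tokens left after stripping is bounded by `max_new_tokens`, so a larger limit allows longer completions without changing this step.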