AtakanTekparmak
committed on
fix: Updated README
README.md
CHANGED
@@ -130,7 +130,7 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 
 generated_ids = model.generate(
     **model_inputs,
-    max_new_tokens=
+    max_new_tokens=2048
 )
 generated_ids = [
     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
@@ -178,10 +178,9 @@ This code first calculates tomorrow's date, then checks if the time slot from 10
 
 We evaluate the model on the following benchmarks:
 
-1.
-2.
-3.
-4. **Dria-Pythonic-Agent-Benchmark (DPAB):** The benchmark we curated with a synthetic data generation +model-based validation + filtering and manual selection to evaluate LLMs on their Pythonic function calling ability, spanning multiple scenarios and tasks. More detailed information about the benchmark can be found on the [Github repo](https://github.com/firstbatchxyz/function-calling-eval) and in our [blog post](blog-link)
+1. Berkeley Function Calling Leaderboard (BFCL)
+2. MMLU-Pro
+3. **Dria-Pythonic-Agent-Benchmark (DPAB):** The benchmark we curated with a synthetic data generation +model-based validation + filtering and manual selection to evaluate LLMs on their Pythonic function calling ability, spanning multiple scenarios and tasks. More detailed information about the benchmark and the Github repo will be released soon.
 
 Below are the evaluation results for Qwen2.5-Coder-3B-Instruct and Dria-Agent-α-3B
 
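Reviewer note on the first hunk: `max_new_tokens` caps how many tokens are generated beyond the prompt, and the `generated_ids` list comprehension then slices the echoed prompt off each output sequence. The sketch below illustrates that slicing with plain Python lists rather than tensors. `fake_generate` and `strip_prompt` are hypothetical helpers written for this note, not part of the README or of `transformers`, and the token IDs are made up.

```python
def fake_generate(input_ids_batch, max_new_tokens):
    # Hypothetical stand-in for model.generate: echoes each prompt, then appends
    # exactly max_new_tokens dummy token IDs. A real model may stop earlier.
    return [ids + list(range(1000, 1000 + max_new_tokens)) for ids in input_ids_batch]


def strip_prompt(input_ids_batch, output_ids_batch):
    # Same slicing as the README snippet: drop the first len(input_ids) tokens
    # of each output so only the newly generated tokens remain.
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


# Made-up token IDs for two prompts of different lengths.
prompt_batch = [[101, 7592, 2088], [101, 2129]]
generated_batch = fake_generate(prompt_batch, max_new_tokens=4)
new_tokens = strip_prompt(prompt_batch, generated_batch)
print(new_tokens)
```

With the real tokenizer the inputs in a batch are padded to a common length, so the slice offset is the padded prompt length rather than each prompt's raw length.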