andthattoo committed: Update README.md
README.md CHANGED
@@ -202,7 +202,7 @@ We evaluate the model on the following benchmarks:
 2. MMLU-Pro
 3. **Dria-Pythonic-Agent-Benchmark (DPAB):** The benchmark we curated with a synthetic data generation +model-based validation + filtering and manual selection to evaluate LLMs on their Pythonic function calling ability, spanning multiple scenarios and tasks. More detailed information about the benchmark and the Github repo will be released soon.
 
-Below are the BFCL results: evaluation results for ***Qwen2.5-Coder-3B-Instruct***, ***Dria-Agent-α-3B*** and ***gpt-4o-2024-11-20***
+Below are the BFCL results: evaluation results for ***Qwen2.5-Coder-3B-Instruct***, ***Dria-Agent-α-3B***, ***Dria-Agent-α-7B***, and ***gpt-4o-2024-11-20***
 
 | Metric | Qwen/Qwen2.5-3B-Instruct | Dria-Agent-a-3B | Dria-Agent-a-7B | gpt-4o-2024-11-20 (Prompt) |
 |---------------------------------------|-----------|-----------|-----------|-----------|