andthattoo committed (verified)
Commit f8e0a19 · 1 Parent(s): 267652f

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -202,7 +202,7 @@ We evaluate the model on the following benchmarks:
 2. MMLU-Pro
 3. **Dria-Pythonic-Agent-Benchmark (DPAB):** The benchmark we curated with a synthetic data generation +model-based validation + filtering and manual selection to evaluate LLMs on their Pythonic function calling ability, spanning multiple scenarios and tasks. More detailed information about the benchmark and the Github repo will be released soon.
 
- Below are the BFCL results: evaluation results for ***Qwen2.5-Coder-3B-Instruct***, ***Dria-Agent-α-3B*** and ***gpt-4o-2024-11-20***
+ Below are the BFCL results: evaluation results for ***Qwen2.5-Coder-3B-Instruct***, ***Dria-Agent-α-3B***, ***Dria-Agent-α-7B***, and ***gpt-4o-2024-11-20***
 
 | Metric | Qwen/Qwen2.5-3B-Instruct | Dria-Agent-a-3B | Dria-Agent-a-7B | gpt-4o-2024-11-20 (Prompt) |
 |---------------------------------------|-----------|-----------|-----------|-----------|