A team from Tencent AI wanted to evaluate agentic systems on data science (DS) tasks, but they noticed that existing agentic benchmarks were severely limited in several aspects: they were restricted to text and did not include tables or images, were specific to certain packages, only performed exact-match evaluation…
➡️ So they set out to build a much more exhaustive approach, to finally make the definitive DS agent benchmark.
𝗧𝗵𝗲 𝗗𝗦𝗕𝗲𝗻𝗰𝗵 𝗱𝗮𝘁𝗮𝘀𝗲𝘁
▪️ DSBench has 466 data analysis tasks and 74 data modelling tasks
▪️ The tasks are sourced from ModelOff and Kaggle, the platforms hosting the most popular data science competitions
▪️ Differences with previous DS benchmarks:
❶ This benchmark leverages various modalities on top of text: images, Excel files, tables
❷ Complex tables: sometimes several tables must be leveraged to answer one question
❸ The context is richer, with longer descriptions.
▪️ Evaluation metrics: the benchmark is scored with an LLM as a judge, using a specific prompt (a minimal sketch of the idea follows below).
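To make the LLM-as-a-judge idea concrete, here is a minimal sketch of how such scoring could be wired up. The prompt wording, the `call_llm` helper, and the scoring rule are illustrative assumptions, not DSBench's actual judge prompt or code.

```python
# Minimal sketch of LLM-as-a-judge scoring for a data analysis task.
# NOTE: `call_llm` is a hypothetical helper standing in for any chat-completion
# API; the prompt below is illustrative, not DSBench's actual judge prompt.

JUDGE_PROMPT = """You are grading a data analysis answer.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Reply with a single word: CORRECT or INCORRECT."""


def call_llm(prompt: str) -> str:
    """Placeholder: send `prompt` to your judge model and return its text reply."""
    raise NotImplementedError


def judge(question: str, reference: str, candidate: str) -> bool:
    """Return True if the judge model deems the candidate answer correct."""
    reply = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate
    ))
    return reply.strip().upper().startswith("CORRECT")


def accuracy(samples: list[tuple[str, str, str]]) -> float:
    """Fraction of (question, reference, candidate) triples judged correct."""
    return sum(judge(*s) for s in samples) / len(samples)
```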
𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀 𝗳𝗿𝗼𝗺 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗻𝗴 𝗮𝗴𝗲𝗻𝘁𝘀
▪️ Their evaluation confirms that using LLMs in an agent setup, for instance by allowing them to run a single step of code execution, is more costly (especially with multi-turn frameworks like AutoGen) but also much more performant than the vanilla LLM (see the sketch of such a single-step setup after this list).
▪️ The sets of tasks solved by different models (like GPT-3.5 vs Llama-3-8B) have quite low overlap, which suggests that different models tend to try very different approaches.
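For intuition, here is a minimal sketch of what "a single step of code execution" can look like in an agent loop. The prompts, the `call_llm` placeholder, and the in-process `exec` are my own illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a single-step code-execution agent.
# NOTE: `call_llm` is a hypothetical chat-completion helper; the prompts and the
# lack of sandboxing are illustrative assumptions, not the paper's exact setup.
import contextlib
import io


def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion call."""
    raise NotImplementedError


def run_python(code: str) -> str:
    """Execute generated code once and capture stdout (use a real sandbox in practice)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {})  # no isolation here!
    except Exception as exc:
        return f"Error: {exc}"
    return buffer.getvalue()


def single_step_agent(task: str) -> str:
    # 1) Ask the model for code that solves the task.
    code = call_llm(f"Write Python code that solves this task and prints the answer:\n{task}")
    # 2) Run it once and collect the observation.
    observation = run_python(code)
    # 3) Ask the model for its final answer, given the execution output.
    return call_llm(f"Task: {task}\nCode output: {observation}\nGive the final answer.")
```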
This new benchmark is really welcome, can't wait to try transformers agents on it! 🤗