4 datasets found
  1. DSEval-Exercise Dataset
    Source: paperswithcode.com
    Updated: Feb 26, 2024
    Cite: Yuge Zhang; Qiyang Jiang; Xingyu Han; Nan Chen; Yuqing Yang; Kan Ren (2024). DSEval-Exercise Dataset [Dataset]. https://paperswithcode.com/dataset/dseval-exercise
    Description

    In this paper, we introduce a novel benchmarking framework designed specifically for evaluating data science agents. Our contributions are three-fold. First, we propose DSEval, an evaluation paradigm that enlarges the evaluation scope to the full lifecycle of LLM-based data science agents. It covers aspects including, but not limited to, the quality of the derived analytical solutions or machine learning models, as well as potential side effects such as unintentional changes to the original data. Second, we incorporate a novel bootstrapped annotation process that lets LLMs themselves generate and annotate the benchmarks with a "human in the loop". A novel language (DSEAL) has been proposed, and the four derived benchmarks significantly improve benchmark scalability and coverage with greatly reduced human labor. Third, based on DSEval and the four benchmarks, we conduct a comprehensive evaluation of various data science agents from different aspects. Our findings reveal common challenges and limitations of current work, providing useful insights and shedding light on future research on LLM-based data science agents.

    This is one of the DSEval benchmarks.
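The evaluation idea the abstract describes, scoring not only answer correctness but also side effects such as unintentional changes to the original data, can be sketched in plain Python. This is a minimal illustration of the concept only; `evaluate_problem`, `good`, and `careless` are hypothetical names invented here, not part of the actual DSEval API.

```python
import copy

def evaluate_problem(agent_fn, data, expected):
    """Score one benchmark problem on two axes: answer correctness
    and side effects (unintentional mutation of the input data)."""
    snapshot = copy.deepcopy(data)   # record the pristine input
    answer = agent_fn(data)          # let the agent work on the data
    return {
        "correct": answer == expected,
        "no_side_effects": data == snapshot,
    }

# A well-behaved agent computes its answer without mutating the input.
def good(rows):
    return sum(r["price"] for r in rows)

# A careless agent gets the right answer but destroys the caller's data.
def careless(rows):
    total = 0
    while rows:
        total += rows.pop()["price"]  # empties the list in place
    return total

rows = [{"price": 3}, {"price": 4}]
print(evaluate_problem(good, copy.deepcopy(rows), 7))
print(evaluate_problem(careless, copy.deepcopy(rows), 7))
```

Under this sketch, both agents return the correct total, but only the second trips the side-effect check, which is exactly the kind of failure an answer-only benchmark would miss.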

  2. DSEval-Kaggle Dataset
    Source: paperswithcode.com
    Updated: Feb 26, 2024
    Cite: Yuge Zhang; Qiyang Jiang; Xingyu Han; Nan Chen; Yuqing Yang; Kan Ren (2024). DSEval-Kaggle Dataset [Dataset]. https://paperswithcode.com/dataset/dseval
    Description: same as the DSEval-Exercise entry above. This is one of the DSEval benchmarks.

  3. DSEval-SO Dataset
    Source: paperswithcode.com
    Updated: Feb 26, 2024
    Cite: Yuge Zhang; Qiyang Jiang; Xingyu Han; Nan Chen; Yuqing Yang; Kan Ren (2024). DSEval-SO Dataset [Dataset]. https://paperswithcode.com/dataset/dseval-so
    Description: same as the DSEval-Exercise entry above. This is one of the DSEval benchmarks.

  4. DSEval-LeetCode Dataset
    Source: paperswithcode.com
    Updated: Feb 26, 2024
    Cite: Yuge Zhang; Qiyang Jiang; Xingyu Han; Nan Chen; Yuqing Yang; Kan Ren (2024). DSEval-LeetCode Dataset [Dataset]. https://paperswithcode.com/dataset/dseval-leetcode
    Description: same as the DSEval-Exercise entry above. This is one of the DSEval benchmarks.


DSEval-Exercise Dataset: 2 scholarly articles cite this dataset (View in Google Scholar).