BigCodeBench is an easy-to-use benchmark for code generation with practical and challenging programming tasks¹. It aims to evaluate the true programming capabilities of large language models (LLMs) in a more realistic setting¹. The benchmark is designed for HumanEval-like function-level code generation tasks, but with much more complex instructions and diverse function calls¹.
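To make the task style concrete, here is a hypothetical example in the spirit of a BigCodeBench problem. It is not drawn from the dataset; the function name `task_func` and the instruction are invented for illustration. The point is that the prompt is a function signature plus a natural-language instruction, and a correct solution typically has to combine calls from more than one library.

```python
# Hypothetical example in the style of a BigCodeBench task (not taken from
# the dataset): the prompt is a signature plus an instruction, and the
# solution combines calls from several standard libraries.
import collections
import re

def task_func(text):
    """
    Split `text` into words (alphanumeric sequences, case-insensitive),
    count how often each word occurs, and return the counts as a
    collections.Counter.
    """
    words = re.findall(r"[a-zA-Z0-9]+", text.lower())
    return collections.Counter(words)

if __name__ == "__main__":
    print(task_func("The cat saw the dog; the dog ran."))
    # Counter({'the': 3, 'dog': 2, 'cat': 1, 'saw': 1, 'ran': 1})
```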
Here are some key features of BigCodeBench:
- Precise evaluation & ranking: it provides a leaderboard of the latest LLM rankings before and after rigorous evaluation¹.
- Pre-generated samples: BigCodeBench accelerates code-intelligence research by open-sourcing LLM-generated samples for a wide range of models¹.
- Execution environment: the execution environment in BigCodeBench is less constrained than EvalPlus, so it can support tasks with diverse library dependencies¹.
- Test evaluation: BigCodeBench relies on unittest to evaluate the generated code¹ (a simplified sketch of this mechanism follows the list).
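The sketch below shows how unittest-based scoring can work in principle. It is not the actual BigCodeBench harness, which additionally sandboxes execution, manages library dependencies, and enforces timeouts; the candidate solution and test class here are invented for illustration.

```python
# Simplified sketch of unittest-based scoring (not the real BigCodeBench
# harness): a model-generated solution is exec'd into a namespace, the
# task's unittest.TestCase is loaded into the same namespace, and the
# pass/fail outcome is read from the unittest result object.
import unittest

candidate_solution = """
def task_func(numbers):
    return sorted(set(numbers))
"""

test_code = """
import unittest

class TestTaskFunc(unittest.TestCase):
    def test_dedup_and_sort(self):
        self.assertEqual(task_func([3, 1, 2, 3]), [1, 2, 3])
"""

namespace = {}
exec(candidate_solution, namespace)   # load the generated code
exec(test_code, namespace)            # load the task's test suite

suite = unittest.TestLoader().loadTestsFromTestCase(namespace["TestTaskFunc"])
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("PASS" if result.wasSuccessful() else "FAIL")
```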
(1) GitHub - bigcode-project/bigcodebench: BigCodeBench: The Next .... https://github.com/bigcode-project/bigcodebench/.
Related datasets hosted on the Hugging Face Hub under the bigcode organization (a minimal loading sketch follows this list):
- bigcode/bigcodebench-hard
- bigcode/bigcodebench-hard-results
- bigcode/bigcodebench-perf
- bigcode/bigcodebench-hard-perf
- bigcode/bigcodebench-hard-solve-rate
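As a minimal sketch, any of the dataset names above can be pulled with the Hugging Face `datasets` library. The exact split and column names vary by release, so the code inspects the returned object rather than assuming a particular split.

```python
# Minimal sketch: load one of the datasets listed above from the Hub.
# Split/config names are version-dependent, so inspect the DatasetDict
# instead of hard-coding a split name.
from datasets import load_dataset

ds = load_dataset("bigcode/bigcodebench-hard")  # repo name as listed above
print(ds)                     # shows the available splits and their columns
first_split = next(iter(ds))
print(ds[first_split][0])     # peek at one record
```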
Evaluation dataset for HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task (arxiv.org/abs/2412.21199).