7 datasets found
  1. BigCodeBench Dataset

    • paperswithcode.com
    Cite
    BigCodeBench Dataset [Dataset]. https://paperswithcode.com/dataset/bigcodebench
    Explore at:
    102 scholarly articles cite this dataset (View in Google Scholar)
    Authors
    Terry Yue Zhuo; Minh Chien Vu; Jenny Chim; Han Hu; Wenhao Yu; Ratnadira Widyasari; Imam Nur Bani Yusuf; Haolan Zhan; Junda He; Indraneil Paul; Simon Brunner; Chen Gong; Thong Hoang; Armel Randy Zebaze; Xiaoheng Hong; Wen-Ding Li; Jean Kaddour; Ming Xu; Zhihan Zhang; Prateek Yadav; Naman Jain; Alex Gu; Zhoujun Cheng; Jiawei Liu; Qian Liu; Zijian Wang; David Lo; Binyuan Hui; Niklas Muennighoff; Daniel Fried; Xiaoning Du; Harm de Vries; Leandro von Werra
    Description

    BigCodeBench is an easy-to-use benchmark for code generation with practical and challenging programming tasks¹. It aims to evaluate the true programming capabilities of large language models (LLMs) in a more realistic setting¹. The benchmark is designed for HumanEval-like function-level code generation tasks, but with much more complex instructions and diverse function calls¹.

    Here are some key features of BigCodeBench:
    - Precise evaluation & ranking: a leaderboard tracks the latest LLM rankings before and after rigorous evaluation¹.
    - Pre-generated samples: BigCodeBench accelerates code intelligence research by open-sourcing LLM-generated samples for various models¹.
    - Execution environment: the execution environment in BigCodeBench is less restricted than EvalPlus's, to support tasks with diverse library dependencies¹.
    - Test evaluation: BigCodeBench relies on unittest for evaluating the generated code¹ (see the sketch after this entry).

    (1) GitHub - bigcode-project/bigcodebench: BigCodeBench: The Next .... https://github.com/bigcode-project/bigcodebench/.
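Since scoring rests on unittest, evaluating a completion amounts to executing the model's code and then running the task's tests against it. Below is a minimal sketch of that flow, not BigCodeBench's actual harness; the entry-point name task_func and the test body are illustrative assumptions.

```python
# Minimal sketch of unittest-style evaluation (illustrative only; not
# BigCodeBench's actual harness). `task_func` and the test are made up.
import unittest

# Pretend this string is an LLM's completion for some task prompt.
generated_code = """
def task_func(numbers):
    return sorted(set(numbers))
"""

# Execute the candidate solution in an isolated namespace.
namespace = {}
exec(generated_code, namespace)

class TestGenerated(unittest.TestCase):
    def test_dedup_and_sort(self):
        self.assertEqual(namespace["task_func"]([3, 1, 3, 2]), [1, 2, 3])

if __name__ == "__main__":
    # A task counts as solved only if every test passes.
    unittest.main()
```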

  2. bigcodebench-hard

    • huggingface.co
    Updated Nov 12, 2024
    + more versions
    Cite
    bigcodebench-hard [Dataset]. https://huggingface.co/datasets/bigcode/bigcodebench-hard
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 12, 2024
    Dataset authored and provided by
    BigCode
    Description

    bigcode/bigcodebench-hard dataset hosted on Hugging Face and contributed by the HF Datasets community
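As a Hugging Face dataset, it can be pulled with the `datasets` library. A minimal sketch follows; split and column names vary by release and are not stated here, so the code inspects the returned object rather than assuming them.

```python
# Minimal sketch: load bigcodebench-hard with the HF `datasets` library.
# Split and column names are versioned on the dataset card, so inspect
# the DatasetDict instead of hard-coding them.
from datasets import load_dataset

ds_dict = load_dataset("bigcode/bigcodebench-hard")
print(ds_dict)  # shows the available splits and their columns

split_name = next(iter(ds_dict))   # take the first split, whatever it is
example = ds_dict[split_name][0]
print(sorted(example.keys()))      # field names per the dataset card
```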

  3. bigcodebench-hard-results

    • huggingface.co
    Cite
    BigCode, bigcodebench-hard-results [Dataset]. https://huggingface.co/datasets/bigcode/bigcodebench-hard-results
    Dataset authored and provided by
    BigCode
    Description

    bigcode/bigcodebench-hard-results dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. bigcodebench-perf

    • huggingface.co
    Updated Sep 13, 2024
    + more versions
    Cite
    BigCode (2024). bigcodebench-perf [Dataset]. https://huggingface.co/datasets/bigcode/bigcodebench-perf
    Dataset updated
    Sep 13, 2024
    Dataset authored and provided by
    BigCode
    Description

    bigcode/bigcodebench-perf dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. bigcodebench-hard-perf

    • huggingface.co
    Updated Jul 26, 2024
    Cite
    bigcodebench-hard-perf [Dataset]. https://huggingface.co/datasets/bigcode/bigcodebench-hard-perf
    Dataset updated
    Jul 26, 2024
    Dataset authored and provided by
    BigCode
    Description

    Dataset Card for "bigcodebench-hard-perf"

    More Information needed

  6. bigcodebench-hard-solve-rate

    • huggingface.co
    Updated Jul 12, 2024
    + more versions
    Cite
    BigCode (2024). bigcodebench-hard-solve-rate [Dataset]. https://huggingface.co/datasets/bigcode/bigcodebench-hard-solve-rate
    Dataset updated
    Jul 12, 2024
    Dataset authored and provided by
    BigCode
    Description

    bigcode/bigcodebench-hard-solve-rate dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. bigcodebench-lite-pro

    • huggingface.co
    Updated Jan 3, 2023
    Cite
    bigcodebench-lite-pro [Dataset]. https://huggingface.co/datasets/CodeEval-Pro/bigcodebench-lite-pro
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 3, 2023
    Dataset provided by
    CodeEval, Inc.
    Authors
    CodeEval-Pro
    Description

    Evaluation dataset for HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task (arxiv.org/abs/2412.21199).
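For context, a "self-invoking" task (the paper's terminology) pairs a base problem with a harder follow-up whose solution calls the base solution. A minimal illustration; the concrete functions here are made up, not drawn from this dataset.

```python
# Illustrative sketch of a self-invoking task pair (made-up example,
# not taken from bigcodebench-lite-pro itself).

def count_vowels(s: str) -> int:
    """Base problem: count the vowels in a string."""
    return sum(ch in "aeiouAEIOU" for ch in s)

def count_vowels_total(sentences: list[str]) -> int:
    """Self-invoking follow-up: solve a harder problem by calling
    the base solution on each element."""
    return sum(count_vowels(s) for s in sentences)

assert count_vowels_total(["hello", "world"]) == 3  # 2 + 1 vowels
```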
