MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for OpenAI HumanEval
Dataset Summary
The HumanEval dataset released by OpenAI includes 164 programming problems, each with a function signature, docstring, body, and several unit tests. The problems were handwritten to ensure they would not appear in the training sets of code generation models.
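A minimal sketch of loading the dataset with the Hugging Face `datasets` library and inspecting one problem; the field names (`task_id`, `prompt`, `entry_point`, `test`) follow the dataset card and are assumed here to be unchanged:

```python
# Sketch: load OpenAI HumanEval and look at one of the 164 problems.
from datasets import load_dataset

ds = load_dataset("openai/openai_humaneval", split="test")
print(len(ds))  # 164

sample = ds[0]
print(sample["task_id"])      # e.g. "HumanEval/0"
print(sample["prompt"])       # function signature + docstring
print(sample["entry_point"])  # name of the function the tests call
print(sample["test"])         # unit tests used to check a completion
```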
Supported Tasks and Leaderboards
Languages
The programming problems are written in Python and contain English natural text in comments and docstrings.… See the full description on the dataset page: https://huggingface.co/datasets/openai/openai_humaneval.
This is an evaluation harness for the HumanEval problem-solving dataset described in the paper "Evaluating Large Language Models Trained on Code". It is used to measure the functional correctness of programs synthesized from docstrings. It consists of 164 original programming problems assessing language comprehension, algorithms, and simple mathematics, some of which are comparable to simple software interview questions.
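Functional correctness in this setting is reported with the pass@k metric. Below is a short sketch of the unbiased estimator described in the paper, written from the published formula rather than copied from the released harness; `n` is the number of samples generated per problem and `c` the number that pass all unit tests.

```python
# Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k), computed in a
# numerically stable product form, then averaged across problems.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example: 200 samples for one problem, 37 of them pass all tests.
print(pass_at_k(200, 37, 1))   # ~0.185 (equals c / n for k = 1)
print(pass_at_k(200, 37, 10))  # higher, since any of 10 tries may succeed
```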
HumanEval-X is a benchmark for evaluating the multilingual ability of code generative models. It consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks, such as code generation and translation.
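A hedged sketch of loading one language split, assuming the benchmark is hosted on the Hub as `THUDM/humaneval-x` with per-language configurations such as `"python"`, `"cpp"`, `"java"`, `"js"`, and `"go"`; the repository id, configuration names, and split name are assumptions, not taken from this description:

```python
# Sketch: load the Python split of HumanEval-X (identifiers assumed, see above).
# Newer versions of `datasets` may require trust_remote_code=True for
# script-based datasets.
from datasets import load_dataset

ds_py = load_dataset("THUDM/humaneval-x", "python", split="test")
print(len(ds_py))
print(ds_py[0].keys())
```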
Instruct HumanEval
Summary
InstructHumanEval is a modified version of OpenAI HumanEval. For a given prompt, we extracted its signature, its docstring, and its header to create a flexible setting that allows the evaluation of instruction-tuned LLMs. The delimiters used in the instruction-tuning procedure can be used to build an instruction that lets the model elicit its best capabilities. Here is an example of use. The prompt can be built as follows… See the full description on the dataset page: https://huggingface.co/datasets/codeparrot/instructhumaneval.
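A minimal sketch of assembling such a prompt, assuming the columns `instruction` and `context` described on the dataset card and placeholder chat delimiters (`<user>`, `<assistant>`) standing in for a specific model's instruction-tuning tokens:

```python
# Sketch: build an instruction prompt from one InstructHumanEval example.
# Column names and delimiters are assumptions; substitute the delimiters your
# instruction-tuned model was trained with.
from datasets import load_dataset

ds = load_dataset("codeparrot/instructhumaneval", split="test")
ex = ds[0]

user_token, assistant_token = "<user>", "<assistant>"
prompt = f"{user_token}\n{ex['instruction']}\n{assistant_token}\n{ex['context']}"
print(prompt)
```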
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Evaluation dataset for HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Task (arxiv.org/abs/2412.21199).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The HumanEval dataset released by OpenAI includes 164 programming problems, each with a function signature, docstring, body, and several unit tests. The problems were handwritten to ensure they would not appear in the training sets of code generation models.
Claude 2, developed by the rising startup Anthropic, is the most capable large language model generative AI on the current market. It reached a success ratio of ** percent on the HumanEval benchmark. This is particularly noteworthy because the evaluation is 0-shot, meaning none of the benchmarked AI programs had seen data of this sort or been trained on the tasks beforehand. In other words, Claude 2 was the quickest at absorbing and understanding the tasks given to it.
HeyixInn0/Reorganized-humaneval dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
HumanEval-V: Benchmarking High-Level Visual Reasoning with Complex Diagrams in Coding Tasks
📄 Paper • 🏠 Home Page • 💻 GitHub Repository • 🏆 Leaderboard • 🤗 Dataset Viewer
HumanEval-V is a novel benchmark designed to evaluate the diagram understanding and reasoning capabilities of Large Multimodal Models (LMMs) in programming contexts. Unlike existing benchmarks, HumanEval-V focuses on coding tasks that require sophisticated visual reasoning over… See the full description on the dataset page: https://huggingface.co/datasets/HumanEval-V/HumanEval-V-Benchmark.
Dataset Card for "humaneval-mbpp-testgen-qa"
This dataset contains prompt-reply (question-answer) pairs in which the prompt asks for Python unit tests that test the functionality described in a specific docstring. The responses are the generated unit tests.
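A hypothetical illustration of what one such pair might look like; the field names and wording below are invented for illustration and are not drawn from the dataset itself:

```python
# Illustrative (made-up) prompt-reply pair in the style described above.
pair = {
    "prompt": (
        "Write a Python unit test for the functionality described by this "
        "docstring:\n\"\"\"Return the sum of two integers a and b.\"\"\""
    ),
    "response": (
        "import unittest\n\n"
        "class TestAdd(unittest.TestCase):\n"
        "    def test_add(self):\n"
        "        self.assertEqual(add(2, 3), 5)\n"
    ),
}
print(pair["response"])
```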
pierreqi/HumanEval-r dataset hosted on Hugging Face and contributed by the HF Datasets community
braindao/humaneval-for-solidity-25 dataset hosted on Hugging Face and contributed by the HF Datasets community
smoorsmith/humaneval dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for reproduction purposes (tabular data is stored using Apache's Parquet format).
Coding Benchmark Results
The coding benchmark results were obtained with the EvalPlus library.
| Model | HumanEval pass@1 | HumanEval+ pass@1 |
| --- | --- | --- |
| meta-llama_Meta-Llama-3.1-405B-Instruct | 67.3 | 67.5 |
| neuralmagic_Meta-Llama-3.1-405B-Instruct-W8A8-FP8 | 66.7 | 66.6 |
| neuralmagic_Meta-Llama-3.1-405B-Instruct-W4A16 | 66.5 | 66.4 |
| neuralmagic_Meta-Llama-3.1-405B-Instruct-W8A8-INT8 | 64.3 | 64.8 |
| neuralmagic_Meta-Llama-3.1-70B-Instruct-W8A8-FP8 | 58.1 | 57.7 |
| neuralmagic_Meta-Llama-3.1-70B-Instruct-W4A16 | 57.1 | … |

… See the full description on the dataset page: https://huggingface.co/datasets/neuralmagic/quantized-llama-3.1-humaneval-evals.
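A minimal sketch of loading results like these from a Parquet file (as used for the reproduction data above) and comparing the two pass@1 columns; the file name and column names are assumptions chosen for illustration:

```python
# Sketch: read assumed Parquet results and compute the HumanEval+ vs HumanEval
# pass@1 delta per model. The schema here is hypothetical.
import pandas as pd

df = pd.read_parquet("coding_benchmark_results.parquet")
df["pass1_delta"] = df["humaneval_plus_pass@1"] - df["humaneval_pass@1"]
print(df.sort_values("humaneval_pass@1", ascending=False).head())
```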
macabdul9/HumanEval-Multimodal dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🔥 Mojo-Coder 🔥 State-of-the-art Language Model for Mojo Programming
🎯 Background and Motivation
Mojo programming language, developed by Modular, has emerged as a game-changing technology in high-performance computing and AI development. Despite its growing popularity and impressive capabilities (up to 68,000x faster than Python!), existing LLMs struggle with Mojo code generation. Mojo-Coder addresses this gap by providing specialized support for Mojo programming, built upon… See the full description on the dataset page: https://huggingface.co/datasets/md-nishat-008/HumanEval-Mojo.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Large Language Models (LLMs) have shown impressive capabilities in generating code, yet they often produce hallucinations—unfounded or incorrect outputs—that compromise the functionality of the generated code. This study investigates the application of local uncertainty quantification methods to detect hallucinations at the line level in code generated by LLMs. We focus on evaluating these methods in the context of two prominent code generation tasks, HumanEval and MBPP. We experiment with both open-source and black-box models. For each model, we generate code, calculate line-level uncertainty scores using various uncertainty quantification methods, and assess the correlation of these scores with the presence of hallucinations as identified by test case failures. Our empirical results are evaluated using metrics such as AUROC and AUPR to determine the effectiveness of these methods in detecting hallucinations, providing insights into their reliability and practical utility in enhancing the accuracy of code generation by LLMs.
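As a concrete illustration of the kind of method the study evaluates, here is a minimal sketch of one local uncertainty baseline: score each generated line by its mean token negative log-likelihood and measure with AUROC how well those scores separate lines implicated in test failures. The inputs (token log-probabilities and hallucination labels) are assumed to be available; this is not the paper's exact pipeline.

```python
# Sketch of a line-level uncertainty baseline and its AUROC evaluation.
import numpy as np
from sklearn.metrics import roc_auc_score

def line_uncertainty(token_logprobs_per_line):
    """Mean negative log-likelihood per line; higher means more uncertain."""
    return np.array([-np.mean(lps) for lps in token_logprobs_per_line])

# Toy example: three generated lines, the last one flagged by failing tests.
logprobs = [[-0.1, -0.2], [-0.05, -0.1, -0.3], [-2.0, -1.5]]
labels = np.array([0, 0, 1])          # 1 = line associated with a test failure
scores = line_uncertainty(logprobs)
print(roc_auc_score(labels, scores))  # 1.0 on this toy example
```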