100+ datasets found
  1. openai_humaneval

    • huggingface.co
    Updated Jan 1, 2022
    Cite
    OpenAI (2022). openai_humaneval [Dataset]. https://huggingface.co/datasets/openai/openai_humaneval
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 1, 2022
    Dataset authored and provided by
    OpenAI (http://openai.com/)
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for OpenAI HumanEval

      Dataset Summary
    

    The HumanEval dataset released by OpenAI includes 164 programming problems, each with a function signature, docstring, body, and several unit tests. The problems were handwritten to ensure they are not included in the training sets of code generation models.

      Supported Tasks and Leaderboards

      Languages

    The programming problems are written in Python and contain English natural text in comments and docstrings.… See the full description on the dataset page: https://huggingface.co/datasets/openai/openai_humaneval.
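    As a minimal sketch, the dataset can be loaded with the Hugging Face datasets library; the single "test" split and field names (task_id, prompt, canonical_solution) follow the dataset card, so verify them against the page above:

      from datasets import load_dataset

      # Load the 164 HumanEval problems; the card lists a single "test" split.
      humaneval = load_dataset("openai/openai_humaneval", split="test")

      example = humaneval[0]
      print(example["task_id"])             # e.g. "HumanEval/0"
      print(example["prompt"])              # function signature + docstring
      print(example["canonical_solution"])  # reference solution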

  2. Human-Eval

    • huggingface.co
    Updated Jan 1, 2025
    + more versions
    Cite
    RefineBench (2025). Human-Eval [Dataset]. https://huggingface.co/datasets/RefineBench/Human-Eval
    Explore at:
    Dataset updated
    Jan 1, 2025
    Dataset authored and provided by
    RefineBench
    Description

    RefineBench/Human-Eval dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. proofdb-human-eval

    • huggingface.co
    Updated May 4, 2024
    + more versions
    Cite
    Tom Reichel (2024). proofdb-human-eval [Dataset]. https://huggingface.co/datasets/tomreichel/proofdb-human-eval
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 4, 2024
    Authors
    Tom Reichel
    Description

    tomreichel/proofdb-human-eval dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. OpenAI HumanEval (Coding Challenges & Unit-tests)

    • kaggle.com
    Updated Nov 21, 2022
    Cite
    The Devastator (2022). OpenAI HumanEval (Coding Challenges & Unit-tests) [Dataset]. https://www.kaggle.com/datasets/thedevastator/handcrafted-dataset-for-code-generation-models
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Nov 21, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    OpenAI HumanEval (Coding Challenges & Unit-tests)

    164 programming problems with a function signature, docstring, body, unittests

    Source

    Huggingface Hub: link

    About this dataset

    The OpenAI HumanEval dataset is a handcrafted set of 164 programming problems designed to challenge code generation models. The problems include a function signature, docstring, body, and several unit tests, all handwritten to ensure they're not included in the training sets of code generation models. The entry point for each problem is the prompt, making it an ideal dataset for testing natural language processing and machine learning models' ability to generate Python programs from scratch.

    How to use the dataset

    To use this dataset, simply download the zip file and extract it. The resulting directory will contain the following files:

    • canonical_solution.py: The solution to the problem. (String)
    • entry_point.py: The entry point for the problem. (String)
    • prompt.txt: The prompt for the problem. (String)
    • test.py: The unit tests for the problem. (String)

    Research Ideas

    • The dataset could be used to develop a model that generates programs from natural language.
    • The dataset could be used to develop a model that completes or debugs programs.
    • The dataset could be used to develop a model that writes unit tests for programs.

    Acknowledgements

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: test.csv

    | Column name        | Description                                                                                        |
    |:-------------------|:---------------------------------------------------------------------------------------------------|
    | prompt             | A natural language description of the programming problem. (String)                                 |
    | canonical_solution | The correct Python code solution to the problem. (String)                                           |
    | test               | A set of unit tests that the generated code must pass in order to be considered correct. (String)   |
    | entry_point        | The starting point for the generated code. (String)                                                 |
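    As a hedged sketch, once the archive is extracted, test.csv can be inspected with pandas; the file name and column names come from the table above:

      import pandas as pd

      # Columns per the table above: prompt, canonical_solution, test, entry_point.
      df = pd.read_csv("test.csv")
      print(df.columns.tolist())
      print(df.loc[0, "prompt"])  # natural language description of the first problem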

  5. humaneval-x

    • huggingface.co
    Updated Oct 9, 2022
    + more versions
    Cite
    Z.ai (2022). humaneval-x [Dataset]. https://huggingface.co/datasets/zai-org/humaneval-x
    Explore at:
    Dataset updated
    Oct 9, 2022
    Dataset provided by
    Z.ai
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    HumanEval-X is a benchmark for the evaluation of the multilingual ability of code generative models. It consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks.
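    A minimal loading sketch with the Hugging Face datasets library; the per-language configuration name ("python") and the "test" split are assumptions based on the listed languages, so check the dataset page for the exact names:

      from datasets import load_dataset

      # "python" config and "test" split are assumed; verify on the dataset page.
      heval_x = load_dataset("zai-org/humaneval-x", "python", split="test")
      print(len(heval_x))  # the Python portion of the 820 samples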

  6. humanevalpack

    • huggingface.co
    Updated Apr 15, 2024
    Cite
    BigCode (2024). humanevalpack [Dataset]. https://huggingface.co/datasets/bigcode/humanevalpack
    Explore at:
    Dataset updated
    Apr 15, 2024
    Dataset authored and provided by
    BigCode
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Card for HumanEvalPack

      Dataset Summary
    

    HumanEvalPack is an extension of OpenAI's HumanEval to cover 6 total languages across 3 tasks. The Python split is exactly the same as OpenAI's Python HumanEval. The other splits are translated by humans (similar to HumanEval-X but with additional cleaning, see here). Refer to the OctoPack paper for more details.

    Languages: Python, JavaScript, Java, Go, C++, Rust

    OctoPack🐙🎒:

    Data: CommitPack, 4TB of GitHub commits… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/humanevalpack.
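    A minimal loading sketch; the language configuration string ("python") and the "test" split are assumptions inferred from the listed languages, so verify them on the dataset page:

      from datasets import load_dataset

      # The Python split is stated to match OpenAI's HumanEval exactly.
      pack = load_dataset("bigcode/humanevalpack", "python", split="test")
      print(pack[0]["prompt"][:200])  # the "prompt" field name is an assumption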

  7. Human_Eval_5_Split

    • kaggle.com
    Updated May 25, 2024
    Cite
    The citation is currently not available for this dataset.
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Imtiaz Sajid
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Imtiaz Sajid

    Released under MIT

    Contents

  8. wmt-da-human-evaluation-long-context

    • huggingface.co
    Updated Jan 29, 2025
    Cite
    Yasmin Moslem (2025). wmt-da-human-evaluation-long-context [Dataset]. https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation-long-context
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    Jan 29, 2025
    Authors
    Yasmin Moslem
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Summary

    Long-context / document-level dataset for Quality Estimation of Machine Translation. It is an augmented variant of the sentence-level WMT DA Human Evaluation dataset. In addition to individual sentences, it contains augmentations of 2, 4, 8, 16, and 32 sentences within each language pair (lp) and domain. The raw column represents a weighted average of the scores of the augmented sentences, using the character lengths of src and mt as weights. The code used to apply the augmentation… See the full description on the dataset page: https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation-long-context.
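    As a sketch of the weighting just described (not the dataset's actual code), a document-level raw score can be computed as a character-length-weighted average of sentence-level scores:

      # Weighted average of sentence-level scores, weighted by the character
      # lengths of each sentence's source (src) and translation (mt).
      def weighted_raw(scores, srcs, mts):
          weights = [len(s) + len(m) for s, m in zip(srcs, mts)]
          return sum(w * x for w, x in zip(weights, scores)) / sum(weights)

      # Illustrative values only.
      print(weighted_raw([80.0, 60.0],
                         ["a short sentence", "a much longer source sentence"],
                         ["kurzer Satz", "ein deutlich laengerer Satz"]))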

  9. Data from: First-in-human evaluation of [11C]PS13, a novel PET radioligand,...

    • openneuro.org
    Updated Aug 3, 2022
    Cite
    Min-Jeong Kim; Jae-Hoon Lee; Fernanda Juarez Anaya; Jinsoo Hong; William Miller; Sanjay Telu; Prachi Singh; Michelle Y Cortes; Katharine Henry; George L Tye; Michael P Frankland; Jose A Montero Santamaria; Jeih-San Liow; Sami S Zoghbi; Masahiro Fujita; Victor W Pike; Robert B Innis (2022). First-in-human evaluation of [11C]PS13, a novel PET radioligand, to quantify cyclooxygenase-1 in the brain [Dataset]. http://doi.org/10.18112/openneuro.ds004230.v2.1.0
    Explore at:
    Dataset updated
    Aug 3, 2022
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Min-Jeong Kim; Jae-Hoon Lee; Fernanda Juarez Anaya; Jinsoo Hong; William Miller; Sanjay Telu; Prachi Singh; Michelle Y Cortes; Katharine Henry; George L Tye; Michael P Frankland; Jose A Montero Santamaria; Jeih-San Liow; Sami S Zoghbi; Masahiro Fujita; Victor W Pike; Robert B Innis
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset was gathered between 2017 and 2018 as part of the "First-in-human evaluation of [11C]PS13, a novel PET radioligand, to quantify cyclooxygenase-1 in the brain". This dataset consists of 17 subjects and their imaging data as a first release. The second release of this dataset will include blood data for each of the 17 subjects involved in this study.

    For more details about the paper, authors, or dataset see the attached dataset_description.json or the participants.tsv.

  10. Evaluation Brief: Building Evaluation Capacity in Human Service...

    • data.virginia.gov
    • catalog.data.gov
    Updated Sep 6, 2025
    Cite
    Administration for Children and Families (2025). Evaluation Brief: Building Evaluation Capacity in Human Service Organizations [Dataset]. https://data.virginia.gov/dataset/evaluation-brief-building-evaluation-capacity-in-human-service-organizations
    Explore at:
    Available download formats: html
    Dataset updated
    Sep 6, 2025
    Dataset provided by
    Administration for Children and Families
    Description

    This brief provides information about what evaluation capacity is, its importance, and how human service organizations can develop their own evaluation capacity.

    Metadata-only record linking to the original dataset.

  11. Human evaluation results of the summaries from DS-SS, BART, as well as the...

    • plos.figshare.com
    Updated Apr 16, 2024
    Cite
    Mingkai Zhang; Dan You; Shouguang Wang (2024). Human evaluation results of the summaries from DS-SS, BART, as well as the Ground Truth summary. [Dataset]. http://doi.org/10.1371/journal.pone.0302104.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Apr 16, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Mingkai Zhang; Dan You; Shouguang Wang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Human evaluation results of the summaries from DS-SS, BART, as well as the Ground Truth summary.

  12. Generated examples from our baseline, ablation and proposed models on the...

    • figshare.com
    Updated Jun 19, 2023
    Cite
    Deeksha Varshney; Asif Ekbal; Mrigank Tiwari; Ganesh Prasad Nagaraja (2023). Generated examples from our baseline, ablation and proposed models on the test set. [Dataset]. http://doi.org/10.1371/journal.pone.0280458.t008
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 19, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Deeksha Varshney; Asif Ekbal; Mrigank Tiwari; Ganesh Prasad Nagaraja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generated examples from our baseline, ablation and proposed models on the test set.

  13. LLM and Human Generated Ideas (dataset)

    • ore.exeter.ac.uk
    Updated Aug 19, 2025
    Cite
    WY Li; J Han; S Ahmed-Kristensen (2025). LLM and Human Generated Ideas (dataset) [Dataset]. http://doi.org/10.24378/exe.5746
    Explore at:
    Available download formats: zip
    Dataset updated
    Aug 19, 2025
    Dataset provided by
    University of Exeter
    Authors
    WY Li; J Han; S Ahmed-Kristensen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Each of the folders contains 6 CSV files, with file names made up of [GPT/human]_[TASK_NUMBER]. Each CSV file has at least two columns (task, response_list). The texts under the response_list column are the ideas that we used for our experiments.
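    A hedged reading sketch with pandas; the file name GPT_1.csv is a hypothetical instance of the [GPT/human]_[TASK_NUMBER] pattern described above:

      import pandas as pd

      # "GPT_1.csv" is an illustrative file name following the stated pattern.
      ideas = pd.read_csv("GPT_1.csv")
      print(ideas[["task", "response_list"]].head())  # columns per the description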

  14. Data from: First-in-human evaluation of [11C]PS13, a novel PET radioligand,...

    • openneuro.org
    Updated Mar 22, 2024
    + more versions
    Cite
    Min-Jeong Kim; Jae-Hoon Lee; Fernanda Juarez Anaya; Jinsoo Hong; William Miller; Sanjay Telu; Prachi Singh; Michelle Y Cortes; Katharine Henry; George L Tye; Michael P Frankland; Jose A Montero Santamaria; Jeih-San Liow; Sami S Zoghbi; Masahiro Fujita; Victor W Pike; Robert B Innis (2024). First-in-human evaluation of [11C]PS13, a novel PET radioligand, to quantify cyclooxygenase-1 in the brain [Dataset]. http://doi.org/10.18112/openneuro.ds004230.v3.0.0
    Explore at:
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    OpenNeuro (https://openneuro.org/)
    Authors
    Min-Jeong Kim; Jae-Hoon Lee; Fernanda Juarez Anaya; Jinsoo Hong; William Miller; Sanjay Telu; Prachi Singh; Michelle Y Cortes; Katharine Henry; George L Tye; Michael P Frankland; Jose A Montero Santamaria; Jeih-San Liow; Sami S Zoghbi; Masahiro Fujita; Victor W Pike; Robert B Innis
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Description

    This dataset was gathered between 2017 and 2018 as part of the "First-in-human evaluation of [11C]PS13, a novel PET radioligand, to quantify cyclooxygenase-1 in the brain". This dataset consists of 16 subjects and their imaging data, including blood data.

    For more details about the paper, authors, or dataset see the attached dataset_description.json or the participants.tsv.

  15. mt-human-evaluation-da

    • huggingface.co
    Updated Mar 28, 2024
    Cite
    Zhiwei He (2024). mt-human-evaluation-da [Dataset]. https://huggingface.co/datasets/zwhe99/mt-human-evaluation-da
    Explore at:
    Dataset updated
    Mar 28, 2024
    Authors
    Zhiwei He
    Description

    zwhe99/mt-human-evaluation-da dataset hosted on Hugging Face and contributed by the HF Datasets community

  16. Data from: ExpliCA Dataset

    • zenodo.org
    • data.niaid.nih.gov
    Updated Nov 11, 2024
    Cite
    Miruna-Adriana Clinciu; Miruna-Adriana Clinciu; Helen Hastie; Arash Eshghi; Arash Eshghi; Helen Hastie (2024). ExpliCA Dataset [Dataset]. http://doi.org/10.5281/zenodo.14066155
    Explore at:
    Dataset updated
    Nov 11, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Miruna-Adriana Clinciu; Miruna-Adriana Clinciu; Helen Hastie; Arash Eshghi; Arash Eshghi; Helen Hastie
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Time period covered
    Nov 4, 2024
    Description

    The ExpliCA dataset comprises 100 causal natural language explanations (NLEs), each meticulously paired with a set of causal triples. This dataset was developed to advance research in explainable artificial intelligence, with a focus on understanding and modeling causal relationships in text.

    The dataset has been structured to enable comprehensive analysis of both original, human-curated explanations and AI-generated explanations, allowing researchers to make direct comparisons between the two. This setup supports a deeper investigation into how causal reasoning is represented in AI-generated content versus human explanations.

    Original Explanations and Causal Triples

    Each of the 100 curated explanations is linked with a corresponding set of causal triples, designed to capture the key components of the causal relationship (see the sketch after this list):

    • T1: The subject or initiator of the causal relationship.
    • T2: The causal verb or predicate describing the cause-effect connection.
    • T3: The object or effect, representing the outcome of the causal relationship.
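
    A hypothetical encoding of one causal triple under the T1/T2/T3 scheme above; the field names and example values are illustrative, not the dataset's actual column names:

      from typing import NamedTuple

      class CausalTriple(NamedTuple):
          subject: str    # T1: subject or initiator of the causal relationship
          predicate: str  # T2: causal verb or predicate
          effect: str     # T3: object or effect, the outcome

      triple = CausalTriple("heavy rainfall", "causes", "flooding")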

    Generated Explanations

    In addition to original explanations, the dataset includes several types of generated explanations:

    • Explanations Generated from Triples: Explanations generated directly from the causal triples to assess the potential of automated explanation generation.
    • Explanations Generated from Triples with Reference: Explanations generated from triples that also reference the original explanations, providing additional context and coherence.

    Human, Automated, and LLM Evaluation Using the REFLEX Framework

    To evaluate the quality and reliability of the explanations, both human and automated evaluations were conducted, along with evaluations using LLMs as evaluators.

    • Human Evaluation of Original Explanations: Human evaluators assessed the original explanations to establish baseline quality metrics.
    • Human Evaluation of Generated Explanations: Human evaluators reviewed the generated explanations (both from triples alone and with reference to original explanations) for clarity, accuracy, and consistency with causal relationships.

    The REFLEX framework, as presented in the PhD thesis of Miruna Clinciu, was applied to evaluate both original and generated explanations.

  17. Data from: Evaluation of Services to Domestic Minor Victims of Human...

    • catalog.data.gov
    • icpsr.umich.edu
    Updated Mar 12, 2025
    Cite
    National Institute of Justice (2025). Evaluation of Services to Domestic Minor Victims of Human Trafficking; 2011-2013 [Dataset]. https://catalog.data.gov/dataset/evaluation-of-services-to-domestic-minor-victims-of-human-trafficking-2011-2013-65df2
    Explore at:
    Dataset updated
    Mar 12, 2025
    Dataset provided by
    National Institute of Justice (http://nij.ojp.gov/)
    Description

    These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.

    This study was a process evaluation of three programs funded by the U.S. Department of Justice (DOJ) Office for Victims of Crime (OVC) to identify and provide services to victims of sex and labor trafficking who are U.S. citizens and lawful permanent residents (LPR) under the age of 18. The three programs evaluated in this study were:

    • The Standing Against Global Exploitation Everywhere (SAGE) Project
    • The Salvation Army Trafficking Outreach Program and Intervention Techniques (STOP-IT) program
    • The Streetwork Project at Safe Horizon

    The goals of the evaluation were to document program implementation in the three programs, identify promising practices for service delivery programs, and inform delivery of current and future efforts by the programs to serve this population. The evaluation examined young people served by the programs, their service needs and services delivered by the programs, the experiences of young people and staff with the programs, and programs' efforts to strengthen community response to trafficked youth.

  18. A snippet of dialogue from the topical chat conversation.

    • figshare.com
    Updated Jun 21, 2023
    Cite
    Deeksha Varshney; Asif Ekbal; Mrigank Tiwari; Ganesh Prasad Nagaraja (2023). A snippet of dialogue from the topical chat conversation. [Dataset]. http://doi.org/10.1371/journal.pone.0280458.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Deeksha Varshney; Asif Ekbal; Mrigank Tiwari; Ganesh Prasad Nagaraja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A snippet of dialogue from the topical chat conversation.

  19. Data from: Technical evaluation and standardization of the human thyroid...

    • catalog.data.gov
    Updated Dec 23, 2024
    + more versions
    Cite
    U.S. EPA Office of Research and Development (ORD) (2024). Technical evaluation and standardization of the human thyroid microtissue assay [Dataset]. https://catalog.data.gov/dataset/technical-evaluation-and-standardization-of-the-human-thyroid-microtissue-assay
    Explore at:
    Dataset updated
    Dec 23, 2024
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    Dataset for Foley et al. 'Technical evaluation and standardization of the human thyroid microtissue assay'; Toxicological Sciences, Vol 199, Issue 1, pg 89-107, kfae014 May 2024. DOI https://doi.org/10.1093/toxsci/kfae014. This dataset is associated with the following publication: Foley, B., K. Breaux, J. Gamble, S. Lynn, R. Thomas, and C. Deisenroth. Technical Evaluation and Standardization of the Human Thyroid Microtissue Assay. TOXICOLOGICAL SCIENCES. Society of Toxicology, RESTON, VA, 199(1): 89-107, (2024).

  20. MPII_Human_Pose_Dataset

    • huggingface.co
    Updated May 7, 2024
    + more versions
    Cite
    Voxel51 (2024). MPII_Human_Pose_Dataset [Dataset]. https://huggingface.co/datasets/Voxel51/MPII_Human_Pose_Dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more at mlcommons.org/croissant.
    Dataset updated
    May 7, 2024
    Dataset authored and provided by
    Voxel51
    License

    BSD 2-Clause License (https://choosealicense.com/licenses/bsd-2-clause/)

    Description

    Dataset Card for MPII Human Pose

    The MPII Human Pose dataset is a state-of-the-art benchmark for the evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of everyday human activities. Overall, the dataset covers 410 human activities, and each image is provided with an activity label. Each image was extracted from a YouTube video… See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/MPII_Human_Pose_Dataset.
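    A hedged loading sketch: Voxel51 datasets on the Hub are typically loaded with FiftyOne rather than the datasets library; this assumes a recent FiftyOne version that provides fiftyone.utils.huggingface.load_from_hub, so verify against the dataset page:

      import fiftyone as fo
      from fiftyone.utils.huggingface import load_from_hub

      # Assumes FiftyOne's Hugging Face integration; check the dataset card.
      dataset = load_from_hub("Voxel51/MPII_Human_Pose_Dataset")
      session = fo.launch_app(dataset)  # browse images and annotated keypoints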
