MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for OpenAI HumanEval
Dataset Summary
The HumanEval dataset released by OpenAI includes 164 programming problems, each with a function signature, docstring, body, and several unit tests. The problems were handwritten to ensure they are not included in the training sets of code generation models.
Supported Tasks and Leaderboards
Languages
The programming problems are written in Python and contain English natural text in comments and docstrings.… See the full description on the dataset page: https://huggingface.co/datasets/openai/openai_humaneval.
RefineBench/Human-Eval dataset hosted on Hugging Face and contributed by the HF Datasets community
tomreichel/proofdb-human-eval dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Huggingface Hub: link
The OpenAI HumanEval dataset is a handcrafted set of 164 programming problems designed to challenge code generation models. Each problem includes a function signature, docstring, body, and several unit tests, all handwritten to ensure they are not included in the training sets of code generation models. Each problem also specifies an entry point (the function the generated code must implement) alongside its prompt, making the dataset well suited for testing the ability of natural language processing and machine learning models to generate Python programs from scratch.
To use this dataset, download the zip file and extract it. The resulting directory will contain the following files:

- canonical_solution.py: The solution to the problem. (String)
- entry_point.py: The entry point for the problem. (String)
- prompt.txt: The prompt for the problem. (String)
- test.py: The unit tests for the problem. (String)
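As a minimal sketch of reading one extracted problem (the directory name below is hypothetical; adjust it to the actual layout of the unzipped archive):

```python
from pathlib import Path

# Hypothetical path to one extracted problem directory.
problem_dir = Path("HumanEval_0")

prompt = (problem_dir / "prompt.txt").read_text()                         # signature + docstring
canonical_solution = (problem_dir / "canonical_solution.py").read_text()  # reference solution
entry_point = (problem_dir / "entry_point.py").read_text().strip()        # function the tests call
test_code = (problem_dir / "test.py").read_text()                         # unit tests
```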
- The dataset could be used to develop a model that generates programs from natural language.
- The dataset could be used to develop a model that completes or debugs programs.
- The dataset could be used to develop a model that writes unit tests for programs.
License
License: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication. No copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: test.csv

| Column name        | Description                                                                                        |
|:-------------------|:---------------------------------------------------------------------------------------------------|
| prompt             | A natural language description of the programming problem. (String)                                 |
| canonical_solution | The correct Python code solution to the problem. (String)                                           |
| test               | A set of unit tests that the generated code must pass in order to be considered correct. (String)   |
| entry_point        | The starting point for the generated code. (String)                                                 |
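To make the roles of these columns concrete, here is a minimal sketch of checking one generated completion against its tests. It assumes, as in the original HumanEval harness, that `test` defines a `check(candidate)` function; a real evaluation runs this in a sandboxed subprocess with a timeout rather than a bare `exec()`.

```python
def passes_tests(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if the completed function passes the problem's unit tests."""
    # Assemble a full program: signature + docstring, model completion,
    # test code, and a final call to check() on the entry-point function.
    program = prompt + completion + "\n" + test + f"\ncheck({entry_point})\n"
    try:
        # WARNING: exec() on generated code is unsafe; use only inside a sandbox.
        exec(program, {})
        return True
    except Exception:
        return False
```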
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
HumanEval-X is a benchmark for the evaluation of the multilingual ability of code generative models. It consists of 820 high-quality human-crafted data samples (each with test cases) in Python, C++, Java, JavaScript, and Go, and can be used for various tasks.
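A loading sketch under the assumption that the benchmark is hosted on the Hugging Face Hub as THUDM/humaneval-x with one configuration per language (verify the Hub id and config names on the project page):

```python
from datasets import load_dataset

# Assumed Hub id and config names ("python", "cpp", "java", "js", "go").
humaneval_x = load_dataset("THUDM/humaneval-x", "python")["test"]
print(len(humaneval_x), humaneval_x[0]["task_id"])
```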
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for HumanEvalPack
Dataset Summary
HumanEvalPack is an extension of OpenAI's HumanEval to cover 6 total languages across 3 tasks. The Python split is exactly the same as OpenAI's Python HumanEval. The other splits are translated by humans (similar to HumanEval-X but with additional cleaning, see here). Refer to the OctoPack paper for more details.
Languages: Python, JavaScript, Java, Go, C++, Rust

OctoPack🐙🎒:
Data: CommitPack, 4TB of GitHub commits… See the full description on the dataset page: https://huggingface.co/datasets/bigcode/humanevalpack.
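A minimal loading sketch with the datasets library; the per-language configuration names below are assumptions that should be checked against the dataset page:

```python
from datasets import load_dataset

# Assumed config names mirroring the listed languages.
for lang in ["python", "js", "java", "go", "cpp", "rust"]:
    split = load_dataset("bigcode/humanevalpack", lang)["test"]
    print(lang, len(split))
```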
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Imtiaz Sajid
Released under MIT
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Summary
Long-context / document-level dataset for Quality Estimation of Machine Translation. It is an augmented variant of the sentence-level WMT DA Human Evaluation dataset. In addition to individual sentences, it contains augmentations of 2, 4, 16, and 32 sentences within each language pair (lp) and domain. The raw column is a weighted average of the scores of the augmented sentences, using the character lengths of src and mt as weights. The code used to apply the augmentation… See the full description on the dataset page: https://huggingface.co/datasets/ymoslem/wmt-da-human-evaluation-long-context.
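As a hedged reconstruction of how such a document-level raw score could be computed (summing the src and mt character lengths per sentence is an assumption; the linked augmentation code is authoritative):

```python
def document_raw_score(sentence_scores, src_sentences, mt_sentences):
    """Weighted average of sentence-level DA scores for one augmented sample,
    weighted by the character lengths of the source and translated sentences."""
    weights = [len(src) + len(mt) for src, mt in zip(src_sentences, mt_sentences)]
    return sum(w * s for w, s in zip(weights, sentence_scores)) / sum(weights)
```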
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset was gathered between 2017 and 2018 as part of the study "First-in-human evaluation of [11C]PS13, a novel PET radioligand, to quantify cyclooxygenase-1 in the brain". This dataset consists of 17 subjects and their imaging data as a first release. The second release of this dataset will include blood data for each of the 17 subjects involved in this study.
For more details about the paper, authors, or dataset see the attached dataset_description.json or the participants.tsv.
This brief provides information about what evaluation capacity is, its importance, and how human service organizations can develop their own evaluation capacity.
Metadata-only record linking to the original dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Human evaluation results of the summaries from DS-SS and BART, as well as the ground-truth summary.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generated examples from our baseline, ablation and proposed models on the test set.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Each of the folders contains six CSV files, named [GPT/human]_[TASK_NUMBER]. Each CSV file has at least two columns (task, response_list); the texts under the response_list column are the ideas we used for our experiments.
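A small pandas sketch of collecting the responses (the ideas/ folder name is hypothetical; the file-name convention follows the description above):

```python
from pathlib import Path

import pandas as pd

frames = []
# File names follow [GPT/human]_[TASK_NUMBER].csv, e.g. GPT_3.csv or human_1.csv.
for path in Path("ideas").glob("*_*.csv"):
    source, task_number = path.stem.split("_", 1)
    df = pd.read_csv(path)[["task", "response_list"]]
    df["source"] = source              # "GPT" or "human"
    df["task_number"] = int(task_number)
    frames.append(df)

all_ideas = pd.concat(frames, ignore_index=True)
```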
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset was gathered between 2017 and 2018 as part of the study "First-in-human evaluation of [11C]PS13, a novel PET radioligand, to quantify cyclooxygenase-1 in the brain". This dataset consists of 16 subjects and their imaging data, including blood data.
For more details about the paper, authors, or dataset see the attached dataset_description.json or the participants.tsv.
zwhe99/mt-human-evaluation-da dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The ExpliCA dataset comprises 100 causal natural language explanations (NLEs), each meticulously paired with a set of causal triples. This dataset was developed to advance research in explainable artificial intelligence, with a focus on understanding and modeling causal relationships in text.
The dataset has been structured to enable comprehensive analysis of both original, human-curated explanations and AI-generated explanations, allowing researchers to make direct comparisons between the two. This setup supports a deeper investigation into how causal reasoning is represented in AI-generated content versus human explanations.
Each of the 100 curated explanations is linked with a corresponding set of causal triples designed to capture the key components of the causal relationship.
In addition to the original explanations, the dataset includes several types of generated explanations.
To evaluate the quality and reliability of the explanations, both human and automated evaluations were conducted, along with evaluations using LLMs as evaluators.
The REFLEX framework, as presented in the PhD thesis of Miruna Clinciu, was applied to evaluate both original and generated explanations.
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed.

This study was a process evaluation of three programs funded by the U.S. Department of Justice (DOJ) Office for Victims of Crime (OVC) to identify and provide services to victims of sex and labor trafficking who are U.S. citizens and lawful permanent residents (LPR) under the age of 18. The three programs evaluated in this study were:

- The Standing Against Global Exploitation Everywhere (SAGE) Project
- The Salvation Army Trafficking Outreach Program and Intervention Techniques (STOP-IT) program
- The Streetwork Project at Safe Horizon

The goals of the evaluation were to document program implementation in the three programs, identify promising practices for service delivery programs, and inform delivery of current and future efforts by the programs to serve this population. The evaluation examined young people served by the programs, their service needs and services delivered by the programs, the experiences of young people and staff with the programs, and the programs' efforts to strengthen community response to trafficked youth.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A snippet of dialogue from the topical chat conversation.
Dataset for: Foley, B., K. Breaux, J. Gamble, S. Lynn, R. Thomas, and C. Deisenroth. "Technical Evaluation and Standardization of the Human Thyroid Microtissue Assay." Toxicological Sciences 199(1): 89-107, kfae014 (May 2024). DOI: https://doi.org/10.1093/toxsci/kfae014.
BSD 2-Clause License: https://choosealicense.com/licenses/bsd-2-clause/
Dataset Card for MPII Human Pose
The MPII Human Pose dataset is a state-of-the-art benchmark for the evaluation of articulated human pose estimation. The dataset includes around 25K images containing over 40K people with annotated body joints. The images were systematically collected using an established taxonomy of everyday human activities. Overall, the dataset covers 410 human activities, and each image is provided with an activity label. Each image was extracted from a YouTube video… See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/MPII_Human_Pose_Dataset.