Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for GPQA
GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending more than 30 minutes per question with full access to Google. We request that you do not reveal examples from this dataset in plain text or images online, to reduce the risk of leakage into foundation model… See the full description on the dataset page: https://huggingface.co/datasets/Wanfq/gpqa.
Dataset Card for GPQA
Formatted version of the original GPQA dataset. This removes most columns and adds two columns, options and answer, containing the list of possible answers and the index of the correct one. GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy… See the full description on the dataset page: https://huggingface.co/datasets/jeggers/gpqa_formatted.
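A minimal sketch of how the options/answer schema described above can be consumed. The field names options and answer come from the card; the record itself is invented, since the card asks that real GPQA items not be reproduced online.

```python
# Illustrative record following the `options`/`answer` schema described
# above. The content is a placeholder, not a real GPQA question.
record = {
    "options": ["alpha", "beta", "gamma", "delta"],
    "answer": 2,  # index of the correct entry in `options`
}

def correct_option(rec):
    """Return the text of the correct choice for a formatted record."""
    return rec["options"][rec["answer"]]

print(correct_option(record))  # gamma
```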
lihaoxin2020/llama-synthetic_1-math_only-gpqa-questions-r1 dataset hosted on Hugging Face and contributed by the HF Datasets community
lihaoxin2020/qwen-synthetic_1-math_only-gpqa-questions-r1 dataset hosted on Hugging Face and contributed by the HF Datasets community
ko-gpqa
ko-gpqa is a Korean-translated version of the GPQA (Graduate-Level Google‑Proof Q&A) benchmark dataset, which consists of high-difficulty science questions. Introduced in this paper, GPQA is designed to go beyond simple fact retrieval and instead test an AI system’s ability to perform deep understanding and logical reasoning. It is particularly useful for evaluating true comprehension and inference capabilities in language models. The Korean translation was performed using… See the full description on the dataset page: https://huggingface.co/datasets/davidkim205/ko-gpqa.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
gpqa_formatted Dataset (TR)
Dataset Overview
This is the TR version of the jeggers/gpqa_formatted dataset, which is a formatted version of the original GPQA dataset. GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending more than 30 minutes per question with full access to Google… See the full description on the dataset page: https://huggingface.co/datasets/FurkyT/gpqa_formatted_TR.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains solutions to GPQA, provided by Gemini-2.0-Thinking (pass@1), with prompts engineered to better align the model with solving scientific questions.
Accuracy: 64.65
Correct questions: 353
Verified using Gemini-2.0
Rated (max 10) by Gemini-2.0
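The two reported figures are mutually consistent if the denominator is the 546-question GPQA Extended split; this total is an assumption, as the card does not state it.

```python
# Consistency check on the card's reported numbers.
correct = 353
total = 546  # assumed: size of the GPQA Extended split (not stated on the card)
accuracy = round(100 * correct / total, 2)
print(accuracy)  # 64.65
```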
 and LM judges (boolean verdicts).
Dataset Structure
Split: Single… See the full description on the dataset page: https://huggingface.co/datasets/hazyresearch/GPQA_with_Llama_3.1_70B_Instruct_v1.
ReasonSet Dataset
This dataset is sourced from the paper "REL: Working Out Is All You Need".
Dataset Description
ReasonSet is a dataset of problems and their worked solutions, specifically designed to help improve models' reasoning abilities. Questions are sourced from AIME, GPQA, MATH, and some hand-created ones.
Question: the question
Working out: in-depth solution with reasoning steps
provided_solution: the solution provided by the benchmark
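An invented ReasonSet-style record using the three field names listed above, with a simple sanity check; the content is illustrative only, not drawn from the dataset.

```python
# Invented record following the ReasonSet field names described above.
example = {
    "Question": "What is 12 * 11?",
    "Working out": "12 * 11 = 12 * 10 + 12 = 120 + 12 = 132.",
    "provided_solution": "132",
}
# Sanity check: the benchmark's answer should appear somewhere in the
# worked-out reasoning.
print(example["provided_solution"] in example["Working out"])  # True
```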
Citation
If you use… See the full description on the dataset page: https://huggingface.co/datasets/TamasSimonds/ReasonSet.