Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for GPQA
GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending more than 30 minutes per question with full access to Google. We request that you do not reveal examples from this dataset in plain text or images online, to reduce the risk of leakage into foundation model… See the full description on the dataset page: https://huggingface.co/datasets/Wanfq/gpqa.
Dataset Card for GPQA
Formatted version of the original GPQA dataset. This removes most columns and adds two columns, options and answer, containing the list of possible answers and the index of the correct one. GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy… See the full description on the dataset page: https://huggingface.co/datasets/jeggers/gpqa_formatted.
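A minimal sketch of how the options/answer schema described above can be consumed. The field names options and answer come from the card; the record itself is invented, since the card asks that real GPQA items not be reproduced online.

```python
# Illustrative record following the `options`/`answer` schema described
# above. The content is a placeholder, not a real GPQA question.
record = {
    "options": ["alpha", "beta", "gamma", "delta"],
    "answer": 2,  # index of the correct entry in `options`
}

def correct_option(rec):
    """Return the text of the correct choice for a formatted record."""
    return rec["options"][rec["answer"]]

print(correct_option(record))  # gamma
```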
lihaoxin2020/llama-synthetic_1-math_only-gpqa-questions-r1 dataset hosted on Hugging Face and contributed by the HF Datasets community
lihaoxin2020/qwen-synthetic_1-math_only-gpqa-questions-r1 dataset hosted on Hugging Face and contributed by the HF Datasets community
ko-gpqa
ko-gpqa is a Korean-translated version of the GPQA (Graduate-Level Google‑Proof Q&A) benchmark dataset, which consists of high-difficulty science questions. Introduced in this paper, GPQA is designed to go beyond simple fact retrieval and instead test an AI system’s ability to perform deep understanding and logical reasoning. It is particularly useful for evaluating true comprehension and inference capabilities in language models. The Korean translation was performed using… See the full description on the dataset page: https://huggingface.co/datasets/davidkim205/ko-gpqa.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
gpqa_formatted Dataset (TR)
Dataset Overview
This is the TR version of the jeggers/gpqa_formatted dataset, which is a formatted version of the original GPQA dataset. GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending more than 30 minutes per question with full access to Google… See the full description on the dataset page: https://huggingface.co/datasets/FurkyT/gpqa_formatted_TR.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset contains solutions to GPQA, provided by Gemini-2.0-Thinking (pass@1), with prompts engineered to better align the model with solving scientific questions.
Accuracy: 64.65
Correct questions: 353
Verified using Gemini-2.0
Rated (max 10) by Gemini-2.0
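The two reported figures are mutually consistent if the denominator is the 546-question GPQA Extended split; this total is an assumption, as the card does not state it.

```python
# Consistency check on the card's reported numbers.
correct = 353
total = 546  # assumed: size of the GPQA Extended split (not stated on the card)
accuracy = round(100 * correct / total, 2)
print(accuracy)  # 64.65
```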
 and LM judges (boolean verdicts).
Dataset Structure
Split: Single… See the full description on the dataset page: https://huggingface.co/datasets/hazyresearch/GPQA_with_Llama_3.1_70B_Instruct_v1.
ReasonSet Dataset
This dataset is sourced from the paper "REL: Working Out Is All You Need".
Dataset Description
ReasonSet is a dataset of problems and their worked solutions, specifically designed to help improve models' reasoning abilities. Questions are sourced from AIME, GPQA, MATH, and some hand-created ones.
Question: the question
Working out: in-depth solution with reasoning steps
provided_solution: the solution provided by the benchmark
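An invented ReasonSet-style record using the three field names listed above, with a simple sanity check; the content is illustrative only, not drawn from the dataset.

```python
# Invented record following the ReasonSet field names described above.
example = {
    "Question": "What is 12 * 11?",
    "Working out": "12 * 11 = 12 * 10 + 12 = 120 + 12 = 132.",
    "provided_solution": "132",
}
# Sanity check: the benchmark's answer should appear somewhere in the
# worked-out reasoning.
print(example["provided_solution"] in example["Working out"])  # True
```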
Citation
If you use… See the full description on the dataset page: https://huggingface.co/datasets/TamasSimonds/ReasonSet.