fingertap/GPQA-Diamond dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for GPQA
GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending more than 30 minutes with full access to Google. We request that you do not reveal examples from this dataset in plain text or images online, to reduce the risk of leakage into foundation model… See the full description on the dataset page: https://huggingface.co/datasets/Idavidrein/gpqa.
GPQA Diamond Dataset
This dataset contains filtered JSONL files of human annotations for the GPQA Diamond dataset, covering question specificity, answer uniqueness, and answer matching to the ground truth for different models.
The dataset was annotated by two human graders. It contains 198 (original size) * 2 = 396 rows, as each row is repeated twice (once for each grader). Given the question, the actual answer, and the model response, a human grader has to answer whether the response matches the… See the full description on the dataset page: https://huggingface.co/datasets/nikhilchandak/gpqa-diamond-annotations.
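Because every question appears once per grader, a natural first step is to pair the two annotations for each question and check how often the graders agree. The sketch below illustrates this on a few in-memory toy rows. The field names `question_id`, `grader`, and `matches` are hypothetical and are not taken from the dataset card, so they would need to be mapped to the actual columns.

```python
from collections import defaultdict

# Toy stand-in rows. The field names are assumptions for illustration only;
# the real annotation files may use different column names.
rows = [
    {"question_id": 0, "grader": "A", "matches": True},
    {"question_id": 0, "grader": "B", "matches": True},
    {"question_id": 1, "grader": "A", "matches": True},
    {"question_id": 1, "grader": "B", "matches": False},
]


def pair_by_question(rows):
    """Group the per-grader 'matches' labels under their question id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["question_id"]].append(row["matches"])
    return dict(grouped)


def agreement_rate(paired):
    """Fraction of questions on which all graders gave the same label."""
    agree = sum(1 for labels in paired.values() if len(set(labels)) == 1)
    return agree / len(paired)


paired = pair_by_question(rows)
rate = agreement_rate(paired)
```

With the full annotation set, `paired` would be expected to hold 198 entries of two labels each, matching the 396 rows described above.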
GPQA-Diamond Dataset
Generated by Llama-3.1-8B-Instruct
Dataset: GPQA-Diamond
Usage
from datasets import load_dataset
# Load the dataset from the Hugging Face Hub and select its "train" split
dataset = load_dataset("dongboklee/GPQA-diamond")["train"]
YukinoshitaYukino/GPQA-Diamond dataset hosted on Hugging Face and contributed by the HF Datasets community
shash42/GPQA-Diamond-Verify dataset hosted on Hugging Face and contributed by the HF Datasets community
Wilson-Lee/tts-embed-dataset-gpqa-diamond dataset hosted on Hugging Face and contributed by the HF Datasets community
yulia-volkova/gpqa-diamond-base-summary-8192-mt dataset hosted on Hugging Face and contributed by the HF Datasets community
yulia-volkova/gpqa-diamond-cue-long-8192-mt dataset hosted on Hugging Face and contributed by the HF Datasets community
adrilmanurung/gpqa-diamond-scrambled-high dataset hosted on Hugging Face and contributed by the HF Datasets community
drproduck/r1-qwen7b-gpqa-diamond-n128 dataset hosted on Hugging Face and contributed by the HF Datasets community
lvogel123/gpqa-diamond-gpt-5-high dataset hosted on Hugging Face and contributed by the HF Datasets community
dongboklee/GPQA-diamond-SmolLM3-3B dataset hosted on Hugging Face and contributed by the HF Datasets community
dongboklee/GPQA-diamond-Llama-3.1-8B-Instruct dataset hosted on Hugging Face and contributed by the HF Datasets community
adrilmanurung/gpqa-diamond-scrambled-low dataset hosted on Hugging Face and contributed by the HF Datasets community
dongboklee/GPQA-diamond-gemma-2-9b-it dataset hosted on Hugging Face and contributed by the HF Datasets community
elichen-skymizer/qwen3-4b-thinking-2507-gpqa-diamond-sampling dataset hosted on Hugging Face and contributed by the HF Datasets community
lvogel123/gpqa-diamond-qwen3-235b-a22b-2507 dataset hosted on Hugging Face and contributed by the HF Datasets community
dongboklee/GPQA-diamond-Qwen2.5-7B-Instruct dataset hosted on Hugging Face and contributed by the HF Datasets community
lvogel123/gpqa-diamond-glm-4.6 dataset hosted on Hugging Face and contributed by the HF Datasets community