Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for GPQA
GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending more than 30 minutes per question with full access to Google. The authors request that examples from this dataset not be revealed in plain text or images online, to reduce the risk of leakage into foundation model… See the full description on the dataset page: https://huggingface.co/datasets/Idavidrein/gpqa.
GPQA stands for Graduate-Level Google-Proof Q&A Benchmark. It is a challenging dataset designed to evaluate the capabilities of Large Language Models (LLMs) and of scalable oversight mechanisms. More details:
Description: GPQA consists of 448 multiple-choice questions meticulously crafted by domain experts in biology, physics, and chemistry. These questions are intentionally designed to be high-quality and extremely difficult.
Expert Accuracy: Even experts who hold or are pursuing PhDs in the corresponding domains achieve only 65% accuracy on these questions (74% when clear mistakes identified in retrospect are excluded).
Google-Proof: The questions are "Google-proof": even with unrestricted access to the web, highly skilled non-expert validators reach only 34% accuracy despite spending over 30 minutes searching for answers.
AI Systems Difficulty: State-of-the-art AI systems, including the strongest GPT-4-based baseline reported in the paper, achieve only 39% accuracy on this dataset.
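Each GPQA item pairs one correct answer with three expert-written distractors. A minimal sketch of turning such a record into a labelled multiple-choice prompt; the field layout here is an assumption for illustration, not the dataset's exact schema:

```python
import random

def format_question(question, correct, incorrect, seed=0):
    """Shuffle one correct and three incorrect answers into labelled
    choices (A-D); return the prompt text and the correct letter."""
    rng = random.Random(seed)  # seeded so the shuffle is reproducible
    options = [correct] + list(incorrect)
    rng.shuffle(options)
    letters = "ABCD"
    lines = [question] + [f"({letters[i]}) {opt}" for i, opt in enumerate(options)]
    answer_letter = letters[options.index(correct)]
    return "\n".join(lines), answer_letter
```

Because the shuffle is seeded per question, the same record always yields the same choice order, which keeps evaluations comparable across runs.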
The difficulty of GPQA for both skilled non-experts and cutting-edge AI systems makes it an excellent resource for conducting realistic scalable oversight experiments. These experiments aim to explore ways for human experts to reliably obtain truthful information from AI systems that surpass human capabilities (1)(3).
In summary, GPQA serves as a valuable benchmark for assessing the robustness and limitations of language models, especially when faced with complex and nuanced questions. Its difficulty level encourages research into effective oversight methods, bridging the gap between AI and human expertise.
(1) GPQA: A Graduate-Level Google-Proof Q&A Benchmark - arXiv. https://arxiv.org/abs/2311.12022
(2) GPQA: A Graduate-Level Google-Proof Q&A Benchmark - Klu. https://klu.ai/glossary/gpqa-eval
(3) GPA Dataset (Spring 2010 through Spring 2020) - Data Science Discovery. https://discovery.cs.illinois.edu/dataset/gpa/
(4) GPQA: A Graduate-Level Google-Proof Q&A Benchmark - GitHub. https://github.com/idavidrein/gpqa
(5) Data Sets - OpenIntro. https://www.openintro.org/data/index.php?data=satgpa
math-ai/gpqa dataset hosted on Hugging Face and contributed by the HF Datasets community
Overview
This contains the GPQA correctness preference evaluation set for Preference Proxy Evaluations (PPE). The prompts are sampled from GPQA. This dataset is meant for benchmarking and evaluation, not for training.
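In a best-of-K setup, a reward model scores K candidate responses per prompt and the highest-scoring one is kept; the benchmark then measures how often that selection is correct. A minimal sketch of the selection step, with illustrative function names and data layout rather than the PPE codebase's actual API:

```python
def best_of_k(candidates, scores):
    """Return the candidate with the highest reward-model score."""
    best_idx = max(range(len(scores)), key=scores.__getitem__)
    return candidates[best_idx]

def best_of_k_accuracy(samples):
    """samples: list of (correctness_flags, scores) pairs, one per prompt.
    Counts how often the top-scored candidate is a correct answer."""
    hits = 0
    for flags, scores in samples:
        idx = max(range(len(scores)), key=scores.__getitem__)
        hits += flags[idx]
    return hits / len(samples)
```

A stronger reward model ranks correct candidates above incorrect ones more often, so this accuracy rises with reward-model quality.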
License
User prompts are licensed under CC BY 4.0, and model outputs are governed by the terms of use set by the respective model providers.
Citation
@misc{frick2024evaluaterewardmodelsrlhf, title={How to Evaluate Reward Models for… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/PPE-GPQA-Best-of-K.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a reformatted version of the original GPQA dataset from Idavidrein/gpqa. It includes only the main question, four shuffled answer choices, the correct answer index, subdomain, and a unique id for each entry. Please cite the GPQA paper if you use this data: GPQA: A Graduate-Level Google-Proof Q&A Benchmark.
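A sketch of such a reformatting step, assuming original-style field names like "Correct Answer" and "Incorrect Answer 1"; the actual column names and id scheme of the source dataset may differ:

```python
import hashlib
import random

def reformat_record(rec, seed=0):
    """Convert an original-style GPQA record into question / shuffled
    choices / answer index / subdomain / id (field names are assumptions)."""
    rng = random.Random(seed)
    choices = [rec["Correct Answer"], rec["Incorrect Answer 1"],
               rec["Incorrect Answer 2"], rec["Incorrect Answer 3"]]
    rng.shuffle(choices)
    return {
        "question": rec["Question"],
        "choices": choices,
        "answer": choices.index(rec["Correct Answer"]),  # index after shuffling
        "subdomain": rec.get("Subdomain", ""),
        # illustrative id: a short hash of the question text
        "id": hashlib.md5(rec["Question"].encode()).hexdigest()[:12],
    }
```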
dogtooth/gpqa dataset hosted on Hugging Face and contributed by the HF Datasets community
jerome-white/leaderboard-documents-gpqa dataset hosted on Hugging Face and contributed by the HF Datasets community
Multi-view imagery of people interacting with a variety of rich 3D environments.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Data on 78 students including GPA, IQ, and gender.
A data frame with 78 observations representing students on the following 5 variables.
macabdul9/gpqa dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for GPQA
Formatted version of the original GPQA dataset. This removes most columns and adds two columns, options and answer, containing the list of possible answers and the index of the correct one. GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts get only 34% accuracy… See the full description on the dataset page: https://huggingface.co/datasets/jeggers/gpqa_formatted.
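With options stored as a list and answer as an index, grading a model's letter predictions reduces to an index comparison. A minimal, illustrative scorer, not the dataset's official evaluation code:

```python
def score_predictions(records, predicted_letters):
    """records: dicts with 'options' (list of 4 strings) and 'answer'
    (int index of the correct option). predicted_letters: model outputs
    such as 'A'..'D'. Returns the fraction answered correctly."""
    correct = 0
    for rec, letter in zip(records, predicted_letters):
        correct += ("ABCD".index(letter) == rec["answer"])
    return correct / len(records)
```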
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GPA reported a P/E (price-to-earnings) ratio of 2.64 for its fiscal quarter ending in September 2024. Data for GPA | PCAR3 - PE Price to Earnings, including historical tables and charts, was last updated by Trading Economics in July 2025.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GPA stock price, live market quote, shares value, historical data, intraday chart, earnings per share and news.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GPA reported a dividend yield of 9.85 for its fiscal quarter ending in March 2024. Data for GPA | PCAR3 - Dividend Yield, including historical tables and charts, was last updated by Trading Economics in July 2025.
Comparison by model of the Artificial Analysis Intelligence Index, which incorporates 7 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, and MATH-500.
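One simple way to aggregate seven evaluation scores is an unweighted mean. This is only an illustration: Artificial Analysis may weight or normalize the component benchmarks differently.

```python
def intelligence_index(scores):
    """Unweighted mean of the seven benchmark scores (0-100 scale).
    Illustrative aggregation only; the real index's weighting is not
    specified here."""
    benchmarks = ["MMLU-Pro", "GPQA Diamond", "Humanity's Last Exam",
                  "LiveCodeBench", "SciCode", "AIME", "MATH-500"]
    missing = [b for b in benchmarks if b not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return sum(scores[b] for b in benchmarks) / len(benchmarks)
```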
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Protein-Protein, Genetic, and Chemical Interactions for GPA-13 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein GPA-13
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dimers of transmembrane (TM) peptides based on the Glycophorin A (GpA) dimer are simulated in different membrane environments. Three different homodimers with varying TM domain lengths and one heterodimer are considered. The homodimers are each formed of two identical peptides, while the heterodimer consists of one 17L peptide and one 29L peptide. In the sequences, the bold letters denote the amino acids involved in the GpA dimerization motif. The dimers are simulated in DLPC (12:0 PC), DOPC (18:1 PC), or DEPC (22:1 PC) bilayers. Additionally, a polyleucine dimer is simulated in a DOPC bilayer. Bilayers consist of 400 lipids and are adequately hydrated with 24,000 water molecules and 134 mM NaCl. The simulations are 100 ns long, with trajectories written every 100 ps.
The files are named as XXX-YYYY.ZZZ, where XXX denotes the peptide type ('het' for the heterodimer and 'polyl' for the polyleucine), YYYY denotes the bilayer type, and ZZZ denotes the file type. Files are in Gromacs format: .xtc for trajectories, .edr for energy data, .cpt for checkpoint files, .ndx for index files, .top for topology files, and .tpr for run input files (Gromacs 5.1). The simulation parameter file (md.mdp) is common to all systems. The CHARMM36 force field is used; topologies are obtained from CHARMM-GUI, and those of the peptides are included in Gromacs format (.itp).
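The XXX-YYYY.ZZZ convention can be split mechanically when iterating over the archive. A small helper sketch, assuming exactly one hyphen separates the peptide and bilayer codes (shared files like md.mdp do not follow the pattern and are rejected):

```python
import re

def parse_sim_filename(name):
    """Split names of the form XXX-YYYY.ZZZ into peptide type,
    bilayer type, and file type."""
    m = re.fullmatch(r"([^-]+)-([^.]+)\.(\w+)", name)
    if m is None:
        raise ValueError(f"unexpected filename: {name}")
    peptide, bilayer, ext = m.groups()
    return {"peptide": peptide, "bilayer": bilayer, "filetype": ext}
```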
More information on the systems is available in the publication, available here: (TO BE INCLUDED!)
Note that the data for the heterodimer and for the polyleucine are in part 2/2, available at https://doi.org/10.5281/zenodo.573274
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Protein-Protein, Genetic, and Chemical Interactions for GPA-16 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein GPA-16