68 datasets found
  1. gpqa

    • huggingface.co
    • opendatalab.com
    Updated Nov 21, 2023
    David Rein (2023). gpqa [Dataset]. https://huggingface.co/datasets/Idavidrein/gpqa
    Dataset updated: Nov 21, 2023
    Authors: David Rein
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Dataset Card for GPQA

    GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending more than 30 minutes with full access to Google. We request that you do not reveal examples from this dataset in plain text or images online, to reduce the risk of leakage into foundation model… See the full description on the dataset page: https://huggingface.co/datasets/Idavidrein/gpqa.

  2. GPQA Dataset

    • paperswithcode.com
    Updated Jan 30, 2025
    David Rein; Betty Li Hou; Asa Cooper Stickland; Jackson Petty; Richard Yuanzhe Pang; Julien Dirani; Julian Michael; Samuel R. Bowman (2025). GPQA Dataset [Dataset]. https://paperswithcode.com/dataset/gpqa
    Dataset updated: Jan 30, 2025
    Authors: David Rein; Betty Li Hou; Asa Cooper Stickland; Jackson Petty; Richard Yuanzhe Pang; Julien Dirani; Julian Michael; Samuel R. Bowman
    Description

    GPQA stands for Graduate-Level Google-Proof Q&A Benchmark. It is a challenging dataset designed to evaluate the capabilities of Large Language Models (LLMs) and scalable oversight mechanisms.

    • Description: GPQA consists of 448 multiple-choice questions carefully crafted by domain experts in biology, physics, and chemistry. The questions are intentionally designed to be high-quality and extremely difficult.
    • Expert accuracy: Even experts who hold or are pursuing PhDs in the corresponding domains achieve only 65% accuracy on these questions (74% when excluding clear mistakes identified in retrospect).
    • Google-proof: Highly skilled non-expert validators reach only 34% accuracy, even with unrestricted web access and despite spending over 30 minutes searching for answers.
    • Difficulty for AI systems: State-of-the-art AI systems, including the authors' strongest GPT-4-based baseline, achieve only 39% accuracy.
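Because every GPQA question offers four answer choices, accuracy figures such as 34%, 39%, and 65% are best read against the 25% expected from random guessing. A minimal sketch of the scoring, assuming predictions and the answer key are both stored as letters A to D (a hypothetical representation, not the dataset's own schema):

```python
from typing import Sequence

NUM_CHOICES = 4  # each GPQA question offers four answer options


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Return the fraction of questions whose predicted letter matches the key."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty question set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def chance_accuracy(num_choices: int = NUM_CHOICES) -> float:
    """Expected accuracy of guessing uniformly at random."""
    return 1.0 / num_choices


# Hypothetical answers for four questions; not real GPQA data.
preds = ["A", "C", "B", "D"]
gold = ["A", "B", "B", "D"]
print(f"accuracy: {accuracy(preds, gold):.2f}")  # → accuracy: 0.75
print(f"chance:   {chance_accuracy():.2f}")       # → chance:   0.25
```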

    The difficulty of GPQA for both skilled non-experts and cutting-edge AI systems makes it an excellent resource for conducting realistic scalable oversight experiments. These experiments aim to explore ways for human experts to reliably obtain truthful information from AI systems that surpass human capabilities (1)(3).

    In summary, GPQA serves as a valuable benchmark for assessing the robustness and limitations of language models, especially when faced with complex and nuanced questions. Its difficulty level encourages research into effective oversight methods, bridging the gap between AI and human expertise.

    (1) [2311.12022] GPQA: A Graduate-Level Google-Proof Q&A Benchmark. arXiv.org. https://arxiv.org/abs/2311.12022
    (2) GPQA: A Graduate-Level Google-Proof Q&A Benchmark. Klu. https://klu.ai/glossary/gpqa-eval
    (3) GPA Dataset (Spring 2010 through Spring 2020). Data Science Discovery. https://discovery.cs.illinois.edu/dataset/gpa/
    (4) GPQA: A Graduate-Level Google-Proof Q&A Benchmark. GitHub. https://github.com/idavidrein/gpqa
    (5) Data Sets. OpenIntro. https://www.openintro.org/data/index.php?data=satgpa
    (6) https://doi.org/10.48550/arXiv.2311.12022
    (7) https://arxiv.org/abs/2311.12022

  3. gpqa

    • huggingface.co
    Updated Jan 15, 2025
    math-ai (2025). gpqa [Dataset]. https://huggingface.co/datasets/math-ai/gpqa
    Dataset updated: Jan 15, 2025
    Dataset authored and provided by: math-ai
    Description

    math-ai/gpqa dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. PPE-GPQA-Best-of-K

    • huggingface.co
    Updated Oct 25, 2024
    LMArena (2024). PPE-GPQA-Best-of-K [Dataset]. https://huggingface.co/datasets/lmarena-ai/PPE-GPQA-Best-of-K
    Dataset updated: Oct 25, 2024
    Dataset authored and provided by: LMArena
    Description

    Overview

    This dataset contains the GPQA correctness preference evaluation set for Preference Proxy Evaluations (PPE). The prompts are sampled from GPQA. The dataset is meant for benchmarking and evaluation, not for training.

    License

    User prompts are licensed under CC BY 4.0, and model outputs are governed by the terms of use set by the respective model providers.

    Citation

    @misc{frick2024evaluaterewardmodelsrlhf, title={How to Evaluate Reward Models for… See the full description on the dataset page: https://huggingface.co/datasets/lmarena-ai/PPE-GPQA-Best-of-K.
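The listing says only that the prompts come from GPQA and that the set measures correctness preferences. One plausible way such a set can be scored, sketched under assumptions: each prompt has K model responses labeled correct or incorrect, a reward model scores them, and the metric is how often the top-scored response is correct. The `ScoredResponse` fields and the metric are hypothetical, not PPE's actual schema or code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScoredResponse:
    score: float   # reward-model score for this response (hypothetical field)
    correct: bool  # True if the response's final answer matches the GPQA key


def best_of_k_correct(responses: Sequence[ScoredResponse]) -> bool:
    """Select the top-scored response (first one wins ties) and return its correctness."""
    if not responses:
        raise ValueError("need at least one response per prompt")
    best = max(responses, key=lambda r: r.score)
    return best.correct


def best_of_k_accuracy(prompts: Sequence[Sequence[ScoredResponse]]) -> float:
    """Fraction of prompts where the reward model's top pick is correct."""
    if not prompts:
        raise ValueError("need at least one prompt")
    return sum(best_of_k_correct(rs) for rs in prompts) / len(prompts)


# Two hypothetical prompts with K = 2 responses each.
prompts = [
    [ScoredResponse(0.9, True), ScoredResponse(0.2, False)],  # top pick is correct
    [ScoredResponse(0.1, True), ScoredResponse(0.7, False)],  # top pick is wrong
]
print(best_of_k_accuracy(prompts))  # → 0.5
```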

  5. gpqa

    • huggingface.co
    Updated Jun 2, 2025
    Casimir Nuesperling (2025). gpqa [Dataset]. https://huggingface.co/datasets/casimiir/gpqa
    Dataset updated: Jun 2, 2025
    Authors: Casimir Nuesperling
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    This dataset is a reformatted version of the original GPQA dataset from Idavidrein/gpqa. It includes only the main question, four shuffled answer choices, the index of the correct answer, the subdomain, and a unique id for each entry. Please cite the GPQA paper if you use this data: GPQA: A Graduate-Level Google-Proof Q&A Benchmark.
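Shuffling the four choices while keeping track of the correct one is the core of this kind of reformatting. A minimal sketch, assuming the source rows name the key "Correct Answer" and the distractors "Incorrect Answer 1" to "Incorrect Answer 3" (an assumption about the upstream schema; this listing does not show the column names, and this is not the reformatter's actual script):

```python
import random
from typing import Dict, List, Tuple


def shuffle_choices(row: Dict[str, str], rng: random.Random) -> Tuple[List[str], int]:
    """Return the four answers in random order plus the index of the correct one.

    Assumes the key is stored under "Correct Answer" and the distractors
    under "Incorrect Answer 1".."Incorrect Answer 3".
    """
    options = [row["Correct Answer"]] + [row[f"Incorrect Answer {i}"] for i in range(1, 4)]
    order = list(range(len(options)))
    rng.shuffle(order)                   # permute positions, not strings, so duplicate texts are safe
    shuffled = [options[i] for i in order]
    answer_index = order.index(0)        # where the original correct option (position 0) landed
    return shuffled, answer_index


row = {  # hypothetical row, not real GPQA content
    "Correct Answer": "right",
    "Incorrect Answer 1": "wrong 1",
    "Incorrect Answer 2": "wrong 2",
    "Incorrect Answer 3": "wrong 3",
}
choices, idx = shuffle_choices(row, random.Random(0))  # fixed seed for reproducibility
print(choices, idx, choices[idx])
```

Passing an explicit `random.Random` instance keeps the shuffle reproducible without touching global random state.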

  6. gpqa

    • huggingface.co
    Updated Mar 7, 2025
    Tianjian Li (2025). gpqa [Dataset]. https://huggingface.co/datasets/dogtooth/gpqa
    Dataset updated: Mar 7, 2025
    Authors: Tianjian Li
    Description

    dogtooth/gpqa dataset hosted on Hugging Face and contributed by the HF Datasets community

  7. leaderboard-documents-gpqa

    • huggingface.co
    Updated Feb 6, 2025
    Jerome White (2025). leaderboard-documents-gpqa [Dataset]. https://huggingface.co/datasets/jerome-white/leaderboard-documents-gpqa
    Dataset updated: Feb 6, 2025
    Authors: Jerome White
    Description

    jerome-white/leaderboard-documents-gpqa dataset hosted on Hugging Face and contributed by the HF Datasets community

  8. GPA Dataset

    • paperswithcode.com
    Updated Jul 11, 2022
    Zhe Wang; Liyan Chen; Shaurya Rathore; Daeyun Shin; Charless Fowlkes (2022). GPA Dataset [Dataset]. https://paperswithcode.com/dataset/gpa
    Dataset updated: Jul 11, 2022
    Authors: Zhe Wang; Liyan Chen; Shaurya Rathore; Daeyun Shin; Charless Fowlkes
    Description

    Multi-view imagery of people interacting with a variety of rich 3D environments.

  9. GPA & IQ

    • kaggle.com
    Updated Aug 9, 2023
    Joakim Arvidsson (2023). GPA & IQ [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/gpa-and-iq
    Dataset updated: Aug 9, 2023
    Dataset provided by: Kaggle
    Authors: Joakim Arvidsson
    License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0), https://creativecommons.org/licenses/by-sa/3.0/ (license information was derived automatically)

    Description

    Data on 78 students including GPA, IQ, and gender.

    A data frame with 78 observations representing students on the following 5 variables.

    • obs: a numeric vector
    • gpa: Grade point average (GPA).
    • iq: IQ.
    • gender: Gender.
    • concept: a numeric vector
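The five variables listed above map naturally onto a record type. A minimal sketch, assuming each row has been parsed into these fields, that computes the Pearson correlation between gpa and iq; the example rows are made up, and storing `gender` as a string is an assumption about the source encoding.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Student:
    obs: int        # observation number
    gpa: float      # grade point average
    iq: float       # IQ score
    gender: str     # encoding in the source file is an assumption
    concept: float  # numeric variable named "concept" in the listing


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Three hypothetical rows; the real data frame has 78.
students = [
    Student(1, 2.9, 105.0, "female", 55.0),
    Student(2, 3.4, 118.0, "male", 62.0),
    Student(3, 3.1, 110.0, "female", 58.0),
]
r = pearson([s.gpa for s in students], [s.iq for s in students])
print(f"r(gpa, iq) = {r:.3f}")
```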
  10. gpqa

    • huggingface.co
    Updated Apr 10, 2025
    Abdul Waheed (2025). gpqa [Dataset]. https://huggingface.co/datasets/macabdul9/gpqa
    Dataset updated: Apr 10, 2025
    Authors: Abdul Waheed
    Description

    macabdul9/gpqa dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. Pricing by Model

    • artificialanalysis.ai
    The citation is currently not available for this dataset.
    Dataset authored and provided by: Artificial Analysis
    Description

    Comparison of the cost (USD) to run the evaluation, by model.

  12. gpqa_formatted

    • huggingface.co
    Updated Nov 21, 2023
    Jorin Eggers (2023). gpqa_formatted [Dataset]. https://huggingface.co/datasets/jeggers/gpqa_formatted
    Dataset updated: Nov 21, 2023
    Authors: Jorin Eggers
    Description

    Dataset Card for GPQA

    Formatted version of the original GPQA dataset. It removes most columns and adds two columns, options and answer, holding the list of possible answers and the index of the correct one. GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy… See the full description on the dataset page: https://huggingface.co/datasets/jeggers/gpqa_formatted.
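With the choices in an options list and the key as an index in answer, turning a row into a lettered prompt is straightforward. A minimal sketch, assuming answer is a 0-based index (the listing does not say which convention is used); the prompt wording and the example row are hypothetical.

```python
from typing import List

LETTERS = "ABCD"


def build_prompt(question: str, options: List[str]) -> str:
    """Render a question and its options as a lettered multiple-choice prompt."""
    if not 2 <= len(options) <= len(LETTERS):
        raise ValueError("expected between two and four options")
    lines = [question]
    lines += [f"{LETTERS[i]}) {opt}" for i, opt in enumerate(options)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def answer_letter(answer: int) -> str:
    """Map a 0-based answer index (assumed convention) to its letter."""
    if not 0 <= answer < len(LETTERS):
        raise IndexError("answer index out of range")
    return LETTERS[answer]


# Hypothetical row in the formatted layout; not real GPQA content.
question = "Which option is correct?"
options = ["w", "x", "y", "z"]
answer = 2
print(build_prompt(question, options))
print("gold:", answer_letter(answer))  # → gold: C
```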

  13. GPA | PCAR3 - PE Price to Earnings

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Sep 15, 2024
    TRADING ECONOMICS (2024). GPA | PCAR3 - PE Price to Earnings [Dataset]. https://tradingeconomics.com/pcar3:bz:pe
    Available download formats: csv, json, xml, excel
    Dataset updated: Sep 15, 2024
    Dataset authored and provided by: TRADING ECONOMICS
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
    Time period covered: Jan 1, 2000 - Jul 1, 2025
    Description

    GPA reported a PE (price-to-earnings) ratio of 2.64 for its fiscal quarter ending in September 2024. Data for GPA | PCAR3 - PE Price to Earnings, including historical values, tables, and charts, was last updated by Trading Economics in July 2025.
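A price-to-earnings ratio is the share price divided by earnings per share. The sketch below shows only that arithmetic; which EPS definition Trading Economics uses (for example, trailing twelve months) is not stated here, so the inputs are hypothetical rather than GPA's actual figures.

```python
def price_to_earnings(price: float, eps: float) -> float:
    """P/E ratio: share price divided by earnings per share (EPS)."""
    if eps == 0:
        raise ZeroDivisionError("P/E is undefined when EPS is zero")
    return price / eps


# Hypothetical inputs, not GPA's actual price or EPS.
print(price_to_earnings(10.0, 4.0))  # → 2.5
```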

  14. GPA | PCAR3 - Stock Price | Live Quote | Historical Chart

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Sep 27, 2020
    TRADING ECONOMICS (2020). GPA | PCAR3 - Stock Price | Live Quote | Historical Chart [Dataset]. https://tradingeconomics.com/pcar3:bz
    Available download formats: xml, excel, json, csv
    Dataset updated: Sep 27, 2020
    Dataset authored and provided by: TRADING ECONOMICS
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
    Time period covered: Jan 1, 2000 - Jul 1, 2025
    Description

    GPA stock price, live market quote, share value, historical data, intraday chart, earnings per share, and news.

  15. GPA | PCAR3 - Dividend Yield

    • tradingeconomics.com
    csv, excel, json, xml
    TRADING ECONOMICS, GPA | PCAR3 - Dividend Yield [Dataset]. https://tradingeconomics.com/pcar3:bz:dy
    Available download formats: csv, xml, json, excel
    Dataset authored and provided by: TRADING ECONOMICS
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)
    Time period covered: Jan 1, 2000 - Jul 1, 2025
    Description

    GPA reported a dividend yield of 9.85 for its fiscal quarter ending in March 2024. Data for GPA | PCAR3 - Dividend Yield, including historical values, tables, and charts, was last updated by Trading Economics in July 2025.
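Dividend yield is conventionally the annual dividends per share divided by the share price, expressed as a percentage. A minimal sketch of that formula; the listing does not say which dividend window Trading Economics uses, so the inputs below are hypothetical.

```python
def dividend_yield_pct(annual_dividend_per_share: float, price: float) -> float:
    """Dividend yield in percent: annual dividends per share over share price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return 100.0 * annual_dividend_per_share / price


# Hypothetical inputs, not GPA's actual dividend or price.
print(dividend_yield_pct(0.5, 10.0))  # → 5.0
```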

  16. Intelligence Index by Claude Endpoint

    • artificialanalysis.ai
    Updated May 15, 2025
    Artificial Analysis (2025). Intelligence Index by Claude Endpoint [Dataset]. https://artificialanalysis.ai/models/claude-2
    Dataset updated: May 15, 2025
    Dataset authored and provided by: Artificial Analysis
    Description

    Comparison of models on the Artificial Analysis Intelligence Index, which incorporates 7 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, and MATH-500.
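The listing names the seven evaluations but not how they are combined. A minimal sketch of one composite, assuming an equal-weighted mean of per-evaluation scores on a common 0-100 scale; Artificial Analysis's actual weighting and normalization may differ, and the scores below are hypothetical.

```python
from statistics import mean
from typing import Mapping

# The seven evaluations named in the listing.
EVALUATIONS = (
    "MMLU-Pro",
    "GPQA Diamond",
    "Humanity's Last Exam",
    "LiveCodeBench",
    "SciCode",
    "AIME",
    "MATH-500",
)


def composite_index(scores: Mapping[str, float]) -> float:
    """Equal-weighted mean of per-evaluation scores (an assumed aggregation rule)."""
    missing = [name for name in EVALUATIONS if name not in scores]
    if missing:
        raise KeyError(f"missing scores for: {', '.join(missing)}")
    return mean(scores[name] for name in EVALUATIONS)


# Hypothetical scores on a 0-100 scale.
example = dict(zip(EVALUATIONS, [70.0, 60.0, 10.0, 50.0, 30.0, 40.0, 90.0]))
print(composite_index(example))  # → 50.0
```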

  17. Intelligence Index by GPT-4 Endpoint

    • artificialanalysis.ai
    Artificial Analysis (2025). Intelligence Index by GPT-4 Endpoint [Dataset]. https://artificialanalysis.ai/models/gpt-4
    Dataset authored and provided by: Artificial Analysis
    Description

    Comparison of models on the Artificial Analysis Intelligence Index, which incorporates 7 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, and MATH-500.

  18. BIOGRID CURATED DATA FOR GPA-13 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated Jun 11, 2024
    BioGRID Project (2024). BIOGRID CURATED DATA FOR GPA-13 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/44766/table/caenorhabditis-elegans/gpa-13.html
    Available download formats: zip
    Dataset updated: Jun 11, 2024
    Dataset authored and provided by: BioGRID Project
    License: MIT License, https://opensource.org/licenses/MIT (license information was derived automatically)

    Description

    Protein-Protein, Genetic, and Chemical Interactions for GPA-13 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein GPA-13

  19. Simulations of GpA-based dimers of various lengths in DEPC, DOPC, and DLPC bilayers, part 1/2

    • zenodo.org
    bin
    Updated Jan 24, 2020
    Matti Javanainen; Waldemar Kulig; Ilpo Vattulainen (2020). Simulations of GpA-based dimers of various lengths in DEPC, DOPC, and DLPC bilayers, part 1/2 [Dataset]. http://doi.org/10.5281/zenodo.573257
    Available download formats: bin
    Dataset updated: Jan 24, 2020
    Dataset provided by: Zenodo (http://zenodo.org/)
    Authors: Matti Javanainen; Waldemar Kulig; Ilpo Vattulainen
    License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/ (license information was derived automatically)

    Description

    Dimers of transmembrane (TM) peptides based on the Glycophorin A (GpA) dimer are simulated in different membrane environments. Three different homodimers with varying TM domain lengths and one heterodimer are considered. The homodimers are formed of either

    • 17L (GRPNLKLLLGVLLGVLLTLLLLEYP)
    • 23L (GRPNLKLLLLLLGVLLGVLLTLLLLLLLEYP)
    • 29L (GRPNLKLLLLLLLLLGVLLGVLLTLLLLLLLLLLEYP)

    peptides, while the heterodimer consists of one 17L peptide and one 29L peptide. In the sequences, the bold letters denote the amino acids involved in the GpA dimerization motif. The dimers are simulated in DLPC (12:0 PC), DOPC (18:1 PC), or DEPC (22:1 PC) bilayers. Additionally, a polyleucine dimer is simulated in a DOPC bilayer. Bilayers consist of 400 lipids and they are adequately hydrated with 24000 water molecules and 134 mM NaCl. The simulations are 100 ns long with trajectories written every 100 ps.
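The run parameters above support a quick back-of-the-envelope check. A sketch assuming a frame is also written at t = 0 and that the salt concentration is counted relative to water at roughly 55.5 mol/L; the authors' actual ion count depends on how they built the box (for example, from box volume), so the ion figure is only an estimate, not the systems' reported composition.

```python
WATER_MOLARITY = 55.5  # mol/L of pure water, an approximation used here


def frame_count(length_ns: float, interval_ps: float, include_initial: bool = True) -> int:
    """Number of saved frames for a run of length_ns written every interval_ps."""
    intervals = round(length_ns * 1000.0 / interval_ps)  # 1 ns = 1000 ps
    return intervals + (1 if include_initial else 0)


def approx_ion_pairs(n_water: int, conc_molar: float) -> int:
    """Rough NaCl pair count for a salt concentration, treating the solvent as pure water."""
    return round(n_water * conc_molar / WATER_MOLARITY)


print(frame_count(100, 100))           # → 1001 (1000 intervals plus the frame at t = 0)
print(approx_ion_pairs(24000, 0.134))  # → 58
```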

    The files are named XXX-YYYY.ZZZ, where XXX denotes the peptide type ('het' for the heterodimer and 'polyl' for the polyleucine), YYYY the bilayer type, and ZZZ the file type. Files are in Gromacs format: .xtc for trajectories, .edr for energy data, .cpt for checkpoints, .ndx for index files, .top for topology files, and .tpr for run input files (Gromacs 5.1). The simulation parameter file (md.mdp) is common to all systems. The CHARMM36 force field is used; topologies are obtained from CHARMM-GUI, and those of the peptides are included in Gromacs format (.itp).
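Because the peptide tag can be longer than three characters ('polyl'), a parser for the XXX-YYYY.ZZZ convention should split on the first hyphen rather than on fixed widths. A minimal sketch; the `SimFile` type and the example names are hypothetical, and the exact case of bilayer names in the real files is an assumption.

```python
from pathlib import PurePath
from typing import NamedTuple


class SimFile(NamedTuple):
    peptide: str   # e.g. 'het' or 'polyl'
    bilayer: str   # e.g. a lipid name such as 'DOPC'
    filetype: str  # extension without the dot, e.g. 'xtc'


def parse_sim_filename(name: str) -> SimFile:
    """Split a name of the form XXX-YYYY.ZZZ into its three parts."""
    path = PurePath(name)
    if not path.suffix:
        raise ValueError(f"no file extension in {name!r}")
    peptide, sep, bilayer = path.stem.partition("-")  # split on the first hyphen only
    if not sep or not peptide or not bilayer:
        raise ValueError(f"{name!r} does not follow the XXX-YYYY.ZZZ pattern")
    return SimFile(peptide, bilayer, path.suffix[1:])


# Hypothetical file names following the described convention.
for name in ["het-DOPC.xtc", "polyl-DOPC.tpr"]:
    print(parse_sim_filename(name))
```

The shared md.mdp file has no hyphen, so this parser rejects it with a ValueError rather than guessing.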

    More information on the systems is available in the accompanying publication (link to be included).

    Note that the data for the heterodimer and for the polyleucine are in part 2/2, available at https://doi.org/10.5281/zenodo.573274

  20. BIOGRID CURATED DATA FOR GPA-16 (Caenorhabditis elegans)

    • thebiogrid.org
    zip
    Updated Jun 2, 2021
    BioGRID Project (2021). BIOGRID CURATED DATA FOR GPA-16 (Caenorhabditis elegans) [Dataset]. https://thebiogrid.org/37175/table/caenorhabditis-elegans/gpa-16.html
    Available download formats: zip
    Dataset updated: Jun 2, 2021
    Dataset authored and provided by: BioGRID Project
    License: MIT License, https://opensource.org/licenses/MIT (license information was derived automatically)

    Description

    Protein-Protein, Genetic, and Chemical Interactions for GPA-16 (Caenorhabditis elegans) curated by BioGRID (https://thebiogrid.org); DEFINITION: Protein GPA-16
