10 datasets found
  1. gpqa

    • huggingface.co
    Updated Jan 15, 2025
    Cite
    Fanqi Wan (2025). gpqa [Dataset]. https://huggingface.co/datasets/Wanfq/gpqa
    Dataset updated
    Jan 15, 2025
    Authors
    Fanqi Wan
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for GPQA

    GPQA is a multiple-choice, Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions out of their own domain (e.g., a physicist answers a chemistry question), these experts get only 34% accuracy, despite spending >30m with full access to Google. We request that you do not reveal examples from this dataset in plain text or images online, to reduce the risk of leakage into foundation model… See the full description on the dataset page: https://huggingface.co/datasets/Wanfq/gpqa.

  2. gpqa_formatted

    • huggingface.co
    Updated Nov 21, 2023
    Cite
    Jorin Eggers (2023). gpqa_formatted [Dataset]. https://huggingface.co/datasets/jeggers/gpqa_formatted
    Dataset updated
    Nov 21, 2023
    Authors
    Jorin Eggers
    Description

    Dataset Card for GPQA

    A formatted version of the original GPQA dataset. It removes most columns and adds two columns, options and answer, which hold the list of possible answers and the index of the correct one. GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions out of their own domain (e.g., a physicist answers a chemistry question), these experts get only 34% accuracy… See the full description on the dataset page: https://huggingface.co/datasets/jeggers/gpqa_formatted.
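    A minimal loading sketch (the options/answer field names come from the description above; the config names and the "train" split are assumptions the card does not state):

    ```python
    from datasets import get_dataset_config_names, load_dataset

    # Discover the config names rather than hard-coding them (assumption:
    # the repo mirrors GPQA's main/diamond/extended subsets as configs).
    configs = get_dataset_config_names("jeggers/gpqa_formatted")
    print(configs)

    ds = load_dataset("jeggers/gpqa_formatted", configs[0], split="train")  # "train" split is an assumption
    row = ds[0]
    # Per the card: `options` is the list of candidate answers,
    # `answer` is the index of the correct one.
    print(row["options"])
    print("correct:", row["options"][row["answer"]])
    ```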

  3. GPQA

    • opendatalab.com
    • huggingface.co
    zip
    Updated Nov 20, 2023
    Cite
    Cohere (2023). GPQA [Dataset]. https://opendatalab.com/OpenDataLab/GPQA
    Available download formats: zip
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    Cohere
    New York University
    Anthropic AI
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GPQA is a multiple-choice, Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions out of their own domain (e.g., a physicist answers a chemistry question), these experts get only 34% accuracy, despite spending >30m with full access to Google.

  4. llama-synthetic_1-math_only-gpqa-questions-r1

    • huggingface.co
    Cite
    Alan Li, llama-synthetic_1-math_only-gpqa-questions-r1 [Dataset]. https://huggingface.co/datasets/lihaoxin2020/llama-synthetic_1-math_only-gpqa-questions-r1
    Authors
    Alan Li
    Description

    lihaoxin2020/llama-synthetic_1-math_only-gpqa-questions-r1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. qwen-synthetic_1-math_only-gpqa-questions-r1

    • huggingface.co
    Cite
    Alan Li, qwen-synthetic_1-math_only-gpqa-questions-r1 [Dataset]. https://huggingface.co/datasets/lihaoxin2020/qwen-synthetic_1-math_only-gpqa-questions-r1
    Authors
    Alan Li
    Description

    lihaoxin2020/qwen-synthetic_1-math_only-gpqa-questions-r1 dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. ko-gpqa

    • huggingface.co
    Updated Jul 22, 2025
    Cite
    davidkim205 (2025). ko-gpqa [Dataset]. https://huggingface.co/datasets/davidkim205/ko-gpqa
    Dataset updated
    Jul 22, 2025
    Authors
    davidkim205
    Description

    ko-gpqa

    ko-gpqa is a Korean-translated version of the GPQA (Graduate-Level Google‑Proof Q&A) benchmark dataset, which consists of high-difficulty science questions. Introduced in this paper, GPQA is designed to go beyond simple fact retrieval and instead test an AI system’s ability to perform deep understanding and logical reasoning. It is particularly useful for evaluating true comprehension and inference capabilities in language models. The Korean translation was performed using… See the full description on the dataset page: https://huggingface.co/datasets/davidkim205/ko-gpqa.

  7. gpqa_formatted_TR

    • huggingface.co
    Updated Jul 23, 2024
    Cite
    Furkan Burhan Türkay (2024). gpqa_formatted_TR [Dataset]. https://huggingface.co/datasets/FurkyT/gpqa_formatted_TR
    Dataset updated
    Jul 23, 2024
    Authors
    Furkan Burhan Türkay
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    gpqa_formatted Dataset (TR)

      Dataset Overview
    

    This is the Turkish (TR) version of the jeggers/gpqa_formatted dataset, which is a formatted version of the original GPQA dataset. GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions out of their own domain (e.g., a physicist answers a chemistry question), these experts get only 34% accuracy, despite spending >30m with full access to Google.… See the full description on the dataset page: https://huggingface.co/datasets/FurkyT/gpqa_formatted_TR.

  8. GPQA-Verified-Thinking-O1-Rated

    • huggingface.co
    Updated Jan 19, 2025
    Cite
    Low IQ Gen AI (2025). GPQA-Verified-Thinking-O1-Rated [Dataset]. https://huggingface.co/datasets/fhai50032/GPQA-Verified-Thinking-O1-Rated
    Dataset updated
    Jan 19, 2025
    Authors
    Low IQ Gen AI
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset contains solutions to GPQA provided by Gemini-2.0-Thinking (pass@1), with prompts engineered to better handle scientific questions.

      Accuracy: 64.65%
      Correct questions: 353
    

    Verified Using Gemini-2.0

    Rated (max 10) by Gemini-2.0

    (Results image omitted.) See the full description on the dataset page: https://huggingface.co/datasets/fhai50032/GPQA-Verified-Thinking-O1-Rated.
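    The reported accuracy is consistent with the correct-question count if the run covered a 546-question set; 546 is the size of the GPQA Extended split and is an assumption here, since the card does not state the total:

    ```python
    # Sanity check of the card's numbers. The 546 total is an assumption
    # (GPQA Extended has 546 questions); the card only reports 353 correct.
    correct = 353
    total = 546
    print(f"accuracy = {100 * correct / total:.2f}%")  # -> accuracy = 64.65%
    ```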

  9. GPQA_with_Llama_3.1_70B_Instruct_v1

    • huggingface.co
    Updated Jun 24, 2025
    Cite
    HazyResearch (2025). GPQA_with_Llama_3.1_70B_Instruct_v1 [Dataset]. https://huggingface.co/datasets/hazyresearch/GPQA_with_Llama_3.1_70B_Instruct_v1
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    HazyResearch
    License

    MIT License https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    GPQA with Llama-3.1-70B-Instruct

    This dataset contains 646 graduate-level science questions from the GPQA benchmark with 100 candidate responses generated by Llama-3.1-70B-Instruct for each problem. Each response has been evaluated for correctness using a mixture of GPT-4o-mini and procedural Python code to robustly parse different answer formats, and scored by multiple reward models (scalar values) and LM judges (boolean verdicts).

      Dataset Structure
    

    Split: Single… See the full description on the dataset page: https://huggingface.co/datasets/hazyresearch/GPQA_with_Llama_3.1_70B_Instruct_v1.
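    A minimal inspection sketch; the split and column names are not spelled out above, so the code discovers them instead of hard-coding them:

    ```python
    from datasets import load_dataset

    # Load every available split (the card only says "Split: Single...").
    ds_dict = load_dataset("hazyresearch/GPQA_with_Llama_3.1_70B_Instruct_v1")
    print(ds_dict)              # split name(s) and row counts

    split_name = next(iter(ds_dict))
    ds = ds_dict[split_name]
    # Per the description, each row should carry the question, 100 candidate
    # responses, correctness labels, reward-model scores, and judge verdicts.
    print(ds.column_names)
    ```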

  10. ReasonSet

    • huggingface.co
    Cite
    Toby Simonds, ReasonSet [Dataset]. https://huggingface.co/datasets/TamasSimonds/ReasonSet
    Authors
    Toby Simonds
    Description

    ReasonSet Dataset

    This dataset is sourced from the paper "REL: Working Out Is All You Need".

      Dataset Description
    

    ReasonSet is a dataset of problems and their worked solutions, designed to help improve models' reasoning abilities. Questions are sourced from AIME, GPQA, MATH, and some hand-created ones.

    Question: the question
    Working out: in-depth solution with reasoning steps
    provided_solution: provided solution by the benchmark
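    A small usage sketch pairing each question with its worked solution, e.g. for supervised fine-tuning; the column names are hypothetical guesses taken from the field list above, so check ds.column_names before relying on them:

    ```python
    from datasets import load_dataset

    ds = load_dataset("TamasSimonds/ReasonSet")["train"]   # a "train" split is an assumption
    print(ds.column_names)                                  # verify the real field names first

    def to_sft_pair(row, question_col="Question", solution_col="Working out"):
        # Column names are guesses from the card's field list; adjust them to
        # whatever ds.column_names actually reports.
        return {"prompt": row[question_col], "completion": row[solution_col]}

    pairs = [to_sft_pair(r) for r in ds.select(range(3))]
    print(pairs[0]["prompt"][:200])
    ```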

      Citation
    

    If you use… See the full description on the dataset page: https://huggingface.co/datasets/TamasSimonds/ReasonSet.
