55 datasets found
  1.

    GPQA-Diamond

    • huggingface.co
    Updated May 28, 2025
    Cite
    Han (2025). GPQA-Diamond [Dataset]. https://huggingface.co/datasets/fingertap/GPQA-Diamond
    Dataset updated
    May 28, 2025
    Authors
    Han
    Description

    fingertap/GPQA-Diamond dataset hosted on Hugging Face and contributed by the HF Datasets community

  2.

    gpqa

    • huggingface.co
    • opendatalab.com
    Updated Nov 21, 2023
    Cite
    David Rein (2023). gpqa [Dataset]. https://huggingface.co/datasets/Idavidrein/gpqa
    Dataset updated
    Nov 21, 2023
    Authors
    David Rein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset Card for GPQA

    GPQA is a multiple-choice Q&A dataset of very hard questions written and validated by experts in biology, physics, and chemistry. When attempting questions outside their own domain (e.g., a physicist answering a chemistry question), these experts reach only 34% accuracy, despite spending more than 30 minutes with full access to Google. We request that you do not reveal examples from this dataset in plain text or images online, to reduce the risk of leakage into foundation model… See the full description on the dataset page: https://huggingface.co/datasets/Idavidrein/gpqa.

  3.

    Gpqa by Model

    • artificialanalysis.ai
    Updated Jun 28, 2025
    Cite
    Artificial Analysis (2025). Gpqa by Model [Dataset]. https://artificialanalysis.ai/evaluations/gpqa-diamond
    Dataset updated
    Jun 28, 2025
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of GPQA Diamond results by model, from evaluations independently conducted by Artificial Analysis

  4.

    gpqa-diamond

    • huggingface.co
    Updated Jan 15, 2025
    Cite
    Jinu Lee (2025). gpqa-diamond [Dataset]. https://huggingface.co/datasets/jinulee-v/gpqa-diamond
    Dataset updated
    Jan 15, 2025
    Authors
    Jinu Lee
    Description

    jinulee-v/gpqa-diamond dataset hosted on Hugging Face and contributed by the HF Datasets community

  5.

    gpqa-diamond-annotations

    • huggingface.co
    Cite
    Nikhil Chandak, gpqa-diamond-annotations [Dataset]. https://huggingface.co/datasets/nikhilchandak/gpqa-diamond-annotations
    Authors
    Nikhil Chandak
    Description

    GPQA Diamond Dataset

    This dataset contains filtered JSONL files of human annotations on question specificity, answer uniqueness, and whether answers match the ground truth, for different models on the GPQA Diamond dataset.

    The dataset was annotated by two human graders. It contains 198 (original size) * 2 = 396 rows, as each row appears twice (once for each grader). Given the question, the actual answer, and the model response, a human grader has to answer whether the response matches the… See the full description on the dataset page: https://huggingface.co/datasets/nikhilchandak/gpqa-diamond-annotations.

  6.

    Gpqa by Model

    • artificialanalysis.ai
    Updated Jul 15, 2025
    Cite
    Artificial Analysis (2025). Gpqa by Model [Dataset]. https://artificialanalysis.ai/models/comparisons/gpt-4-5-vs-grok-3-reasoning
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of GPQA Diamond (Scientific Reasoning) by Model

  7.

    tts-embed-dataset-gpqa-diamond

    • huggingface.co
    Updated Aug 31, 2025
    Cite
    Ruosen Li (2025). tts-embed-dataset-gpqa-diamond [Dataset]. https://huggingface.co/datasets/Wilson-Lee/tts-embed-dataset-gpqa-diamond
    Dataset updated
    Aug 31, 2025
    Authors
    Ruosen Li
    Description

    Wilson-Lee/tts-embed-dataset-gpqa-diamond dataset hosted on Hugging Face and contributed by the HF Datasets community

  8.

    GPQA-diamond-free

    • huggingface.co
    Updated Jun 26, 2025
    Cite
    Nikhil Chandak (2025). GPQA-diamond-free [Dataset]. https://huggingface.co/datasets/nikhilchandak/GPQA-diamond-free
    Dataset updated
    Jun 26, 2025
    Authors
    Nikhil Chandak
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    nikhilchandak/GPQA-diamond-free dataset hosted on Hugging Face and contributed by the HF Datasets community

  9.

    Intelligence Index by GLM-4.5-Air Endpoint

    • artificialanalysis.ai
    Updated Jul 30, 2025
    Cite
    Artificial Analysis (2025). Intelligence Index by GLM-4.5-Air Endpoint [Dataset]. https://artificialanalysis.ai/models/glm-4-5-air
    Dataset updated
    Jul 30, 2025
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Artificial Analysis Intelligence Index by model. Intelligence Index v2.2 incorporates 8 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, IFBench, and AA-LCR.

  10.

    gpqa-diamond-test2

    • huggingface.co
    Updated Jun 12, 2025
    Cite
    Nikhil Chandak (2025). gpqa-diamond-test2 [Dataset]. https://huggingface.co/datasets/nikhilchandak/gpqa-diamond-test2
    Dataset updated
    Jun 12, 2025
    Authors
    Nikhil Chandak
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    nikhilchandak/gpqa-diamond-test2 dataset hosted on Hugging Face and contributed by the HF Datasets community

  11.

    Intelligence Index by GPT-5 Endpoint

    • artificialanalysis.ai
    Updated Aug 11, 2025
    Cite
    Artificial Analysis (2025). Intelligence Index by GPT-5 Endpoint [Dataset]. https://artificialanalysis.ai/models/gpt-5
    Dataset updated
    Aug 11, 2025
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Artificial Analysis Intelligence Index by model. Intelligence Index v2.2 incorporates 8 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, IFBench, and AA-LCR.

  12.

    Intelligence Index by o3 Endpoint

    • artificialanalysis.ai
    Updated Jul 15, 2025
    Cite
    Artificial Analysis (2025). Intelligence Index by o3 Endpoint [Dataset]. https://artificialanalysis.ai/models/o3
    Dataset updated
    Jul 15, 2025
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Artificial Analysis Intelligence Index by model. Intelligence Index v2.2 incorporates 8 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, IFBench, and AA-LCR.

  13.

    gpqa-diamond-physics

    • huggingface.co
    Updated Jan 15, 2025
    Cite
    Muhammad Khalifa (2025). gpqa-diamond-physics [Dataset]. https://huggingface.co/datasets/mkhalifa/gpqa-diamond-physics
    Dataset updated
    Jan 15, 2025
    Authors
    Muhammad Khalifa
    Description

    mkhalifa/gpqa-diamond-physics dataset hosted on Hugging Face and contributed by the HF Datasets community

  14.

    Intelligence Index by Model

    • artificialanalysis.ai
    Updated May 15, 2025
    Cite
    Artificial Analysis (2025). Intelligence Index by Model [Dataset]. https://artificialanalysis.ai/models/comparisons/deepseek-r1-vs-qwen2.5-32b-instruct
    Dataset updated
    May 15, 2025
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Artificial Analysis Intelligence Index by model. The Intelligence Index incorporates 7 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, and MATH-500.

  15.

    r1-qwen7b-gpqa-diamond-n128

    • huggingface.co
    Updated Aug 31, 2025
    Cite
    Khiem Pham (2025). r1-qwen7b-gpqa-diamond-n128 [Dataset]. https://huggingface.co/datasets/drproduck/r1-qwen7b-gpqa-diamond-n128
    Dataset updated
    Aug 31, 2025
    Authors
    Khiem Pham
    Description

    drproduck/r1-qwen7b-gpqa-diamond-n128 dataset hosted on Hugging Face and contributed by the HF Datasets community

  16.

    Intelligence Index by Model

    • artificialanalysis.ai
    Updated Mar 12, 2025
    Cite
    Artificial Analysis (2025). Intelligence Index by Model [Dataset]. https://artificialanalysis.ai/models/comparisons/gemma-3-1b-vs-gpt-4-5
    Dataset updated
    Mar 12, 2025
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Artificial Analysis Intelligence Index by model. Intelligence Index v2.2 incorporates 8 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, IFBench, and AA-LCR.

  17.

    verified-reasoning-o1-gpqa-mmlu-pro

    • huggingface.co
    Updated Dec 15, 2024
    Cite
    Aria A. (2024). verified-reasoning-o1-gpqa-mmlu-pro [Dataset]. https://huggingface.co/datasets/ariaattarml/verified-reasoning-o1-gpqa-mmlu-pro
    Dataset updated
    Dec 15, 2024
    Authors
    Aria A.
    Description

    Reasoning PRM Preference Dataset

    This dataset contains reasoning traces from multiple sources (GPQA Diamond and MMLU Pro), labeled with preference information based on correctness verification.

      Dataset Description

      Overview

    The dataset consists of reasoning problems and their solutions, where each example has been verified for correctness and labeled with a preference score. It combines data from two main sources:

    • GPQA Diamond
    • MMLU Pro

      Data Fields… See the full description on the dataset page: https://huggingface.co/datasets/ariaattarml/verified-reasoning-o1-gpqa-mmlu-pro.
    
  18.

    Intelligence Index by Magistral Endpoint

    • artificialanalysis.ai
    Updated Jun 18, 2025
    Cite
    Artificial Analysis, Intelligence Index by Magistral Endpoint [Dataset]. https://artificialanalysis.ai/models/magistral-medium
    Dataset updated
    Jun 18, 2025
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Artificial Analysis Intelligence Index by model. Intelligence Index v2.2 incorporates 8 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, IFBench, and AA-LCR.

  19.

    HLE_SFT_GPQA_Diamond

    • huggingface.co
    Cite
    Neko, HLE_SFT_GPQA_Diamond [Dataset]. https://huggingface.co/datasets/neko-llm/HLE_SFT_GPQA_Diamond
    Dataset authored and provided by
    Neko
    Description

    HLE SFT GPQA Diamond Dataset

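      Overview

    This dataset is a Supervised Fine-Tuning (SFT) dataset generated from the GPQA (Graduate-level Google-proof Q&A) Diamond dataset by adding Chain of Thought (CoT) reasoning. It provides answers that include a step-by-step reasoning process for advanced questions in specialized scientific fields (physics, chemistry, biology).

      Dataset statistics

    Total problems: 198. Successfully generated: 61. Success rate: 30.8%.

      File formats

    This dataset is provided in the following three formats:

      1. CSV format (gpqa_diamond_cot_dataset.csv)

    General tabular data. Can be opened in Excel or other spreadsheet software. Easy to load with Pandas.

      2. Parquet format… See the full description on the dataset page: https://huggingface.co/datasets/neko-llm/HLE_SFT_GPQA_Diamond.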
    
  20.

    Intelligence Index by Devstral Endpoint

    • artificialanalysis.ai
    Updated Jul 10, 2025
    Cite
    Artificial Analysis (2025). Intelligence Index by Devstral Endpoint [Dataset]. https://artificialanalysis.ai/models/devstral-small
    Dataset updated
    Jul 10, 2025
    Dataset authored and provided by
    Artificial Analysis
    Description

    Comparison of Artificial Analysis Intelligence Index by model. Intelligence Index v2.2 incorporates 8 evaluations: MMLU-Pro, GPQA Diamond, Humanity's Last Exam, LiveCodeBench, SciCode, AIME, IFBench, and AA-LCR.

Cite
Han (2025). GPQA-Diamond [Dataset]. https://huggingface.co/datasets/fingertap/GPQA-Diamond

GPQA-Diamond

fingertap/GPQA-Diamond

344 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
May 28, 2025
Authors
Han
Description

fingertap/GPQA-Diamond dataset hosted on Hugging Face and contributed by the HF Datasets community
