29 datasets found

Data from: sciq
huggingface.co
paperswithcode.com
Updated Mar 1, 2024
Cite
Ai2 (2024). sciq [Dataset]. https://huggingface.co/datasets/allenai/sciq
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 1, 2024
Dataset provided by
Allen Institute for AI (http://allenai.org/)
Authors
Ai2
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
Dataset Card for "sciq"

Dataset Summary

The SciQ dataset contains 13,679 crowdsourced science exam questions about Physics, Chemistry and Biology, among others. The questions are in multiple-choice format with 4 answer options each. For the majority of the questions, an additional paragraph with supporting evidence for the correct answer is provided.

Supported Tasks and Leaderboards

More Information Needed

Languages

More Information Needed… See the full description on the dataset page: https://huggingface.co/datasets/allenai/sciq.
Data from: SciQ
huggingface.co
Updated Jun 8, 2023
Cite
Dhruv C (2023). SciQ [Dataset]. https://huggingface.co/datasets/dhruvjwc/SciQ
Explore at:
Croissant
Dataset updated
Jun 8, 2023
Authors
Dhruv C
Description
dhruvjwc/SciQ dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: sciq
huggingface.co
Updated May 29, 2025
Cite
llm-uncertainty-head (2025). sciq [Dataset]. https://huggingface.co/datasets/llm-uncertainty-head/sciq
Dataset updated
May 29, 2025
Dataset authored and provided by
llm-uncertainty-head
Description
llm-uncertainty-head/sciq dataset hosted on Hugging Face and contributed by the HF Datasets community
eval-sciq
huggingface.co
Cite
Yonathan, eval-sciq [Dataset]. https://huggingface.co/datasets/c0ntrolZ/eval-sciq
Authors
Yonathan
Description
c0ntrolZ/eval-sciq dataset hosted on Hugging Face and contributed by the HF Datasets community
SCIQ for NLP
kaggle.com
Updated Mar 13, 2024
Cite
Ziad Ayman (2024). SCIQ for NLP [Dataset]. https://www.kaggle.com/datasets/ziadaymantesla/filtered/suggestions
Explore at:
Croissant
Dataset updated
Mar 13, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ziad Ayman
Description
Dataset

This dataset was created by Ziad Ayman

Contents
Data from: sciq
huggingface.co
Updated Jun 1, 2025
Cite
Youssef Boughizane (2025). sciq [Dataset]. https://huggingface.co/datasets/Youssefbou62/sciq
Dataset updated
Jun 1, 2025
Authors
Youssef Boughizane
Description
Youssefbou62/sciq dataset hosted on Hugging Face and contributed by the HF Datasets community
Scientific Knowledge Evaluation Dataset
opendatabay.com
Updated Jul 4, 2025
Cite
Datasimple (2025). Scientific Knowledge Evaluation Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/606f5704-f0a7-4949-810d-443d020dd438
Dataset updated
Jul 4, 2025
Dataset authored and provided by
Datasimple
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Education & Learning Analytics
Description
This dataset contains a collection of 13,679 crowdsourced science exam questions, primarily focusing on Physics, Chemistry, and Biology. The questions are presented in a multiple-choice format, each with four answer options. For the majority of the questions, an additional paragraph providing supporting evidence for the correct answer is also included. The dataset is designed to evaluate a person's knowledge of science and can be used for various research and application purposes.

Columns

question: The main text of the scientific question. (String)

distractor3: One of the incorrect answer options designed to distract the test taker. (String)

distractor1: Another incorrect answer option. (String)

distractor2: A third incorrect answer option. (String)

correct_answer: The accurate answer to the question. (String)

support: Supplementary text that provides evidence or further context for the correct answer, helping users understand the question. (String)
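The column layout above maps naturally onto a small record type. The following is a minimal sketch, assuming the six column names listed above; the `SciQRecord` class, the `build_choices` helper, and the example row are hypothetical illustrations and are not part of the dataset or any library.

```python
import random
from dataclasses import dataclass


@dataclass
class SciQRecord:
    """One row, using the column names listed above (hypothetical type)."""
    question: str
    distractor3: str
    distractor1: str
    distractor2: str
    correct_answer: str
    support: str = ""  # supporting paragraph; may be empty for some questions


def build_choices(record, seed=0):
    """Return (shuffled answer options, index of the correct answer)."""
    options = [
        record.correct_answer,
        record.distractor1,
        record.distractor2,
        record.distractor3,
    ]
    rng = random.Random(seed)  # fixed seed keeps the ordering reproducible
    rng.shuffle(options)
    return options, options.index(record.correct_answer)


# Made-up example row for illustration only; not taken from the dataset.
example = SciQRecord(
    question="What gas do plants absorb during photosynthesis?",
    distractor3="helium",
    distractor1="oxygen",
    distractor2="nitrogen",
    correct_answer="carbon dioxide",
    support="Plants take in carbon dioxide and release oxygen.",
)
options, answer_index = build_choices(example)
```

Shuffling with a seeded generator avoids always placing the correct answer in the same position while keeping runs repeatable.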

Distribution

The dataset is primarily available as a CSV file, specifically test.csv, which is used for evaluation. It comprises 13,679 records or individual science exam questions. The exact file size is not detailed in the provided information, but its structure is consistent with a tabular format where each row represents a question and its associated data.
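A tabular file of this shape can be read with Python's standard `csv` module. The sketch below assumes the header uses the column names listed in the Columns section; `read_rows`, `EXPECTED_COLUMNS`, and the in-memory sample row are hypothetical and stand in for the real test.csv, which would instead be opened with `open("test.csv", newline="", encoding="utf-8")`.

```python
import csv
import io

# Column names as listed in the Columns section (assumed to match the CSV header).
EXPECTED_COLUMNS = {
    "question",
    "distractor3",
    "distractor1",
    "distractor2",
    "correct_answer",
    "support",
}


def read_rows(handle):
    """Read all rows as dicts, failing early if an expected column is absent."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


# Made-up sample content for illustration only; not taken from the dataset.
SAMPLE_CSV = (
    "question,distractor3,distractor1,distractor2,correct_answer,support\n"
    '"What is the chemical formula for water?",CO2,O2,NaCl,H2O,'
    '"Water is H2O, two hydrogen atoms and one oxygen atom."\n'
)
rows = read_rows(io.StringIO(SAMPLE_CSV))
```

Checking the header up front gives a clear error if a copy of the file uses different column names.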

Usage

This dataset is ideally suited for evaluating scientific knowledge and for research in natural language processing (NLP). It can be particularly useful for:
* Developing and training models to answer scientific questions.
* Creating AI-powered educational tools for science learning.
* Assessing human or AI performance on science examinations.
* Generating insights into common distractors and improving question design.
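Assessing performance on multiple-choice questions usually comes down to comparing each chosen answer with the `correct_answer` column. The following is a minimal sketch of such a score, assuming exact string matching after trimming whitespace and lowercasing; the `accuracy` function and the sample answers are hypothetical, and a real evaluation may need a more forgiving comparison.

```python
def accuracy(predictions, gold_answers):
    """Fraction of predictions that match the gold answer (case-insensitive)."""
    if len(predictions) != len(gold_answers):
        raise ValueError("predictions and gold_answers must have the same length")
    if not gold_answers:
        return 0.0  # convention chosen here for an empty evaluation set
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, gold_answers)
    )
    return correct / len(gold_answers)


# Made-up answers for illustration only.
score = accuracy(["H2O", "oxygen "], ["h2o", "carbon dioxide"])
```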

Coverage

The dataset offers global relevance as the scientific questions are not tied to a specific geographical region. It covers core science subjects including Physics, Chemistry, and Biology. No specific time range is indicated for the origin of the questions, suggesting they are general science concepts. There are no particular notes on data availability for specific demographic groups, as the focus is on subject matter knowledge.

License

CC0

Who Can Use It

The dataset is intended for a variety of users, including:
* Researchers in AI, machine learning, and natural language processing to develop and test question-answering systems.
* Educators and educational technology developers to create assessment tools or learning platforms.
* Data scientists and analysts interested in text data analysis and knowledge representation.
* Students undertaking projects related to scientific reasoning and AI.

Dataset Name Suggestions

Scientific Knowledge Evaluation Dataset

Science Exam Questions Collection

Multi-Choice Science Questions

SciQ Science Questions and Answers

AI Science Question-Answering Corpus

Attributes

Original Data Source: SciQ (Scientific Question Answering)
sciq-qa-dataset_llama_template
huggingface.co
Updated May 2, 2024
Cite
Jeena A Thankachan (2024). sciq-qa-dataset_llama_template [Dataset]. https://huggingface.co/datasets/JeenaAT/sciq-qa-dataset_llama_template
Explore at:
Croissant
Dataset updated
May 2, 2024
Authors
Jeena A Thankachan
Description
JeenaAT/sciq-qa-dataset_llama_template dataset hosted on Hugging Face and contributed by the HF Datasets community
10K rewritten texts dataset/LLM Prompt Recovery
kaggle.com
zip
Updated Apr 8, 2024
Cite
Aisha AL Mahmoud (2024). 10K rewritten texts dataset/LLM Prompt Recovery [Dataset]. https://www.kaggle.com/datasets/aishaalmahmoud/10k-rewritten-texts-datasetllm-prompt-recovery
Available download formats: zip (0 bytes)
Dataset updated
Apr 8, 2024
Authors
Aisha AL Mahmoud
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
About 10,000 texts rewritten with Gemma 7b-it. The original texts come from the "Support" column of train.csv in the SciQ (Scientific Question Answering) dataset.

If you find it useful, upvote it.
sandbagging-sciq
huggingface.co
Updated Mar 7, 2024
Cite
Rob G (2024). sandbagging-sciq [Dataset]. https://huggingface.co/datasets/themachinefan/sandbagging-sciq
Dataset updated
Mar 7, 2024
Authors
Rob G
Description
themachinefan/sandbagging-sciq dataset hosted on Hugging Face and contributed by the HF Datasets community
sandbagging-sciq
huggingface.co
Cite
Joseph Bloom, sandbagging-sciq [Dataset]. https://huggingface.co/datasets/jbloom-aisi/sandbagging-sciq
Explore at:
Croissant
Authors
Joseph Bloom
Description
jbloom-aisi/sandbagging-sciq dataset hosted on Hugging Face and contributed by the HF Datasets community
gemma-2-2b-it-sciq
huggingface.co
Cite
Joseph Bloom, gemma-2-2b-it-sciq [Dataset]. https://huggingface.co/datasets/jbloom-aisi/gemma-2-2b-it-sciq
Explore at:
Croissant
Authors
Joseph Bloom
Description
jbloom-aisi/gemma-2-2b-it-sciq dataset hosted on Hugging Face and contributed by the HF Datasets community
sciq-with-generated-questions
huggingface.co
Updated Feb 24, 2013
Cite
NLP Group 6 (2013). sciq-with-generated-questions [Dataset]. https://huggingface.co/datasets/nlp-group-6/sciq-with-generated-questions
Explore at:
Croissant
Dataset updated
Feb 24, 2013
Dataset authored and provided by
NLP Group 6
Description
nlp-group-6/sciq-with-generated-questions dataset hosted on Hugging Face and contributed by the HF Datasets community
sciq-qa1
huggingface.co
Updated May 2, 2024
Cite
B (2024). sciq-qa1 [Dataset]. https://huggingface.co/datasets/Nandini82/sciq-qa1
Explore at:
Croissant
Dataset updated
May 2, 2024
Authors
B
Description
Nandini82/sciq-qa1 dataset hosted on Hugging Face and contributed by the HF Datasets community
sciq-text-only
huggingface.co
Updated Jun 8, 2025
Cite
Paul (2025). sciq-text-only [Dataset]. https://huggingface.co/datasets/pmdlt/sciq-text-only
Dataset updated
Jun 8, 2025
Authors
Paul
Description
pmdlt/sciq-text-only dataset hosted on Hugging Face and contributed by the HF Datasets community
sciq-mcqa
huggingface.co
Updated Jun 1, 2025
Cite
Viola (2025). sciq-mcqa [Dataset]. https://huggingface.co/datasets/viols/sciq-mcqa
Dataset updated
Jun 1, 2025
Authors
Viola
Description
viols/sciq-mcqa dataset hosted on Hugging Face and contributed by the HF Datasets community
sciq-dpo-stem
huggingface.co
Updated Jun 1, 2025
Cite
Reza (2025). sciq-dpo-stem [Dataset]. https://huggingface.co/datasets/reza-rgb/sciq-dpo-stem
Dataset updated
Jun 1, 2025
Authors
Reza
Description
reza-rgb/sciq-dpo-stem dataset hosted on Hugging Face and contributed by the HF Datasets community
sandbagging-sciq-emulate-gemma-2-2b-it
huggingface.co
Cite
Joseph Bloom, sandbagging-sciq-emulate-gemma-2-2b-it [Dataset]. https://huggingface.co/datasets/jbloom-aisi/sandbagging-sciq-emulate-gemma-2-2b-it
Explore at:
Croissant
Authors
Joseph Bloom
Description
jbloom-aisi/sandbagging-sciq-emulate-gemma-2-2b-it dataset hosted on Hugging Face and contributed by the HF Datasets community
quirky_sciq_raw
huggingface.co
Updated Apr 20, 2024
Cite
Erik Jenner (2024). quirky_sciq_raw [Dataset]. https://huggingface.co/datasets/ejenner/quirky_sciq_raw
Explore at:
Croissant
Dataset updated
Apr 20, 2024
Authors
Erik Jenner
Description
ejenner/quirky_sciq_raw dataset hosted on Hugging Face and contributed by the HF Datasets community
sciq_italian
huggingface.co
Updated Dec 4, 2024
Cite
Sapienza NLP, Sapienza University of Rome (2024). sciq_italian [Dataset]. https://huggingface.co/datasets/sapienzanlp/sciq_italian
Dataset updated
Dec 4, 2024
Dataset authored and provided by
Sapienza NLP, Sapienza University of Rome
License
Attribution-NonCommercial 3.0 (CC BY-NC 3.0), https://creativecommons.org/licenses/by-nc/3.0/
License information was derived automatically
Description
SciQ - Italian (IT)

This dataset is an Italian translation of SciQ. SciQ is a dataset for scientific questions, which were semi-automatically generated from an existing set of questions. The dataset is designed to test the ability of models to answer questions that require scientific knowledge.

Dataset Details

The dataset consists of science-related questions, where each question is associated with a correct answer and three possible distractors. The task is to predict… See the full description on the dataset page: https://huggingface.co/datasets/sapienzanlp/sciq_italian.
