BoolQ is a question answering dataset for yes/no questions containing 15,942 examples. These questions are naturally occurring – they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context.
Questions are gathered from anonymized, aggregated queries to the Google search engine. Queries that are likely to be yes/no questions are heuristically identified and questions are only kept if a Wikipedia page is returned as one of the first five results, in which case the question and Wikipedia page are given to a human annotator for further processing. Annotators label question/article pairs in a three-step process. First, they decide if the question is good, meaning it is comprehensible, unambiguous, and requesting factual information. This judgment is made before the annotator sees the Wikipedia page. Next, for good questions, annotators find a passage within the document that contains enough information to answer the question. Annotators can mark questions as “not answerable” if the Wikipedia article does not contain the requested information. Finally, annotators mark whether the question’s answer is “yes” or “no”. Only questions that were marked as having a yes/no answer are used, and each question is paired with the selected passage instead of the entire document.
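For orientation, here is a minimal sketch of loading BoolQ and inspecting one (question, passage, answer) triplet with the Hugging Face `datasets` library; the field and split names follow the description above and the public release of the dataset.

```python
# A minimal sketch, assuming the public "boolq" dataset on the Hugging Face Hub.
from datasets import load_dataset

boolq = load_dataset("boolq")            # splits: "train" and "validation"
example = boolq["train"][0]

print(example["question"])               # a naturally occurring yes/no question
print(example["passage"][:200], "...")   # the supporting Wikipedia passage
print(example["answer"])                 # boolean answer: True or False
```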
Dataset Card for AutoTrain Evaluator
This repository contains model predictions generated by AutoTrain for the following task and dataset:
Task: Natural Language Inference
Model: andi611/distilbert-base-uncased-qa-boolq
Dataset: boolq
Config: default
Split: validation
To run new evaluation jobs, visit Hugging Face's automatic model evaluator.
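For reference, a hedged sketch of querying this model locally is shown below; it assumes andi611/distilbert-base-uncased-qa-boolq is a binary sequence-classification model over (question, passage) pairs, which is how BoolQ is typically framed, and the example inputs are illustrative.

```python
# A hedged sketch, assuming the evaluated checkpoint is a text-classification
# model that scores (question, passage) pairs; inputs are illustrative.
from transformers import pipeline

clf = pipeline("text-classification",
               model="andi611/distilbert-base-uncased-qa-boolq")

result = clf({"text": "is the sky blue",
              "text_pair": "The sky appears blue because of Rayleigh scattering."})
print(result)  # e.g. [{'label': ..., 'score': ...}], depending on the model's label map
```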
Contributions
Thanks to @lewtun for evaluating this model.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The BoolQ dataset is a resource for question answering tasks, organised into two main splits: a validation split and a training split. Its primary aim is to facilitate research in natural language processing (NLP) and machine learning (ML), particularly tasks that involve answering questions based on provided text. It offers a collection of user-posed questions, their corresponding answers, and the passages from which those answers are derived, enabling researchers to develop and evaluate models for real-world scenarios where information must be retrieved or understood from textual sources.
The answer labels are imbalanced, with true appearing 5,874 times (62%) and false appearing 3,553 times (38%). The BoolQ dataset consists of two main parts: a validation split and a training split. Both splits share the same data fields: question, answer, and passage. The train.csv file, for example, is part of the training data. While specific row or record counts are not detailed for the entire dataset, the 'answer' column contains 9,427 boolean values (5,874 + 3,553).
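As a sanity check, the distribution above can be recomputed directly; this is a sketch assuming the train split and boolean answer field of the public boolq release (counts quoted on the card: 5,874 true, 3,553 false, 9,427 total).

```python
# A small sketch that tallies the boolean `answer` field of the train split;
# the split/field names assume the public "boolq" release described above.
from collections import Counter
from datasets import load_dataset

train = load_dataset("boolq", split="train")
counts = Counter(train["answer"])
total = sum(counts.values())

for label, n in sorted(counts.items(), reverse=True):
    print(f"{label}: {n} ({n / total:.0%})")
```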
This dataset is ideally suited for:
* Question Answering Systems: training models to answer yes/no questions given a question and a supporting passage (see the fine-tuning sketch below).
* Machine Reading Comprehension: developing models that can understand and interpret written text effectively.
* Information Retrieval: enabling models to retrieve relevant passages or documents that contain answers to a given query or question.
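To make the first use case concrete, here is a minimal fine-tuning sketch using the standard `transformers`/`datasets` APIs; the base checkpoint, output directory, and hyperparameters are illustrative, not tuned.

```python
# A minimal fine-tuning sketch for yes/no QA on BoolQ; checkpoint and
# hyperparameters are illustrative assumptions, not the card's recipe.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

dataset = load_dataset("boolq")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess(batch):
    # Encode (question, passage) pairs; map boolean answers to 0/1 labels.
    enc = tokenizer(batch["question"], batch["passage"],
                    truncation=True, max_length=512)
    enc["labels"] = [int(a) for a in batch["answer"]]
    return enc

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="boolq-distilbert",
                           per_device_train_batch_size=16,
                           num_train_epochs=2),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```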
The sources do not specify the geographic, time range, or demographic scope of the data.
License: CC0
The BoolQ dataset is primarily intended for researchers and developers working in artificial intelligence fields such as Natural Language Processing (NLP) and Machine Learning (ML). It is particularly useful for those building or evaluating:
* Question answering algorithms
* Information retrieval systems
* Machine reading comprehension models
Original Data Source: BoolQ - Question-Answer-Passage Consistency
License: MIT License (https://opensource.org/licenses/MIT)
boolq
Dataset Description
This dataset contains evaluation results for boolq with label column N_A, with various model performance metrics and samples.
Dataset Summary
The dataset contains original samples from the evaluation process, along with metadata like model names, input columns, and scores. This helps with understanding model performance across different tasks and datasets.
Features
id: Unique identifier for the sample.
user: User… See the full description on the dataset page: https://huggingface.co/datasets/gallifantjack/boolq_N_A.
License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
Dataset Card for NO-BoolQ
NO-BoolQ is machine-translated from the Google BoolQ dataset. It is a question answering dataset with the same train, test, and validation splits as the original dataset. This dataset belongs to NLEBench, a collection of Norwegian benchmarks for evaluation on Norwegian Natural Language Understanding (NLU) tasks.
Licensing Information
This dataset is built upon an existing dataset. We therefore follow its original license information.
Citation… See the full description on the dataset page: https://huggingface.co/datasets/NorGLM/NO-BoolQ.
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs
Overview
The AraDiCE dataset is designed to evaluate dialectal and cultural capabilities in large language models (LLMs). The dataset consists of post-edited versions of various benchmark datasets, curated for validation in cultural and dialectal contexts relevant to Arabic. In this repository, we present the BoolQ split of the data.
Evaluation
We have used the lm-harness evaluation framework for the… See the full description on the dataset page: https://huggingface.co/datasets/QCRI/AraDiCE-BoolQ.
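For context, a hedged sketch of such an evaluation with the lm-evaluation-harness Python API (v0.4+) is shown below; the checkpoint named here is illustrative, not the one used by the authors.

```python
# A hedged sketch, assuming lm-evaluation-harness v0.4+ exposes
# `simple_evaluate`; "gpt2" stands in for whichever model is evaluated.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                       # Hugging Face causal-LM backend
    model_args="pretrained=gpt2",     # illustrative checkpoint
    tasks=["boolq"],
)
print(results["results"]["boolq"])    # accuracy and related metrics
```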
License: https://choosealicense.com/licenses/llama3.1/
Dataset Card for Llama-3.1-405B Evaluation Result Details
This dataset contains the Meta evaluation result details for Llama-3.1-405B. The dataset has been created from 12 evaluation tasks. These tasks are triviaqa_wiki, mmlu_pro, commonsenseqa, winogrande, mmlu, boolq, squad, quac, drop, bbh, arc_challenge, agieval_english. Each task detail can be found as a specific subset in each configuration and each subset is named using the task name plus the timestamp of the upload time… See the full description on the dataset page: https://huggingface.co/datasets/meta-llama/Llama-3.1-405B-evals.
License: https://choosealicense.com/licenses/llama3.1/
Dataset Card for Llama-3.1-8B Evaluation Result Details
This dataset contains the Meta evaluation result details for Llama-3.1-8B. The dataset has been created from 12 evaluation tasks. These tasks are triviaqa_wiki, mmlu_pro, commonsenseqa, winogrande, mmlu, boolq, squad, quac, drop, bbh, arc_challenge, agieval_english. Each task detail can be found as a specific subset in each configuration and each subset is named using the task name plus the timestamp of the upload time and… See the full description on the dataset page: https://huggingface.co/datasets/meta-llama/Llama-3.1-8B-evals.