8 datasets found
  1. BoolQ Dataset

    • paperswithcode.com
    Updated Dec 13, 2023
    Cite
    Christopher Clark; Kenton Lee; Ming-Wei Chang; Tom Kwiatkowski; Michael Collins; Kristina Toutanova (2023). BoolQ Dataset [Dataset]. https://paperswithcode.com/dataset/boolq
    Explore at:
    Dataset updated
    Dec 13, 2023
    Authors
    Christopher Clark; Kenton Lee; Ming-Wei Chang; Tom Kwiatkowski; Michael Collins; Kristina Toutanova
    Description

    BoolQ is a question answering dataset for yes/no questions containing 15942 examples. These questions are naturally occurring – they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context.

    Questions are gathered from anonymized, aggregated queries to the Google search engine. Queries that are likely to be yes/no questions are heuristically identified and questions are only kept if a Wikipedia page is returned as one of the first five results, in which case the question and Wikipedia page are given to a human annotator for further processing. Annotators label question/article pairs in a three-step process. First, they decide if the question is good, meaning it is comprehensible, unambiguous, and requesting factual information. This judgment is made before the annotator sees the Wikipedia page. Next, for good questions, annotators find a passage within the document that contains enough information to answer the question. Annotators can mark questions as “not answerable” if the Wikipedia article does not contain the requested information. Finally, annotators mark whether the question’s answer is “yes” or “no”. Only questions that were marked as having a yes/no answer are used, and each question is paired with the selected passage instead of the entire document.
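    The filtering and three-step annotation pipeline above can be sketched as a small decision function. This is a hypothetical illustration of the logic, assuming made-up field names; it is not code from the BoolQ release:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Annotation:
    """One annotator's judgment of a (question, Wikipedia article) pair."""
    is_good: bool                  # step 1: comprehensible, unambiguous, factual
    passage: Optional[str] = None  # step 2: supporting passage, or None if "not answerable"
    answer: Optional[bool] = None  # step 3: True for "yes", False for "no"

def keep_example(question: str, ann: Annotation) -> Optional[Tuple[str, str, bool]]:
    """Return a (question, passage, answer) triplet, or None if the pair is dropped."""
    if not ann.is_good:       # rejected before the annotator reads the page
        return None
    if ann.passage is None:   # marked "not answerable"
        return None
    if ann.answer is None:    # no yes/no label was assigned
        return None
    return (question, ann.passage, ann.answer)

print(keep_example("is the sky blue", Annotation(True, "The sky appears blue.", True)))
print(keep_example("why is the sky blue", Annotation(is_good=False)))  # None
```

    Only pairs that survive all three checks become dataset examples, which is why each retained question carries a selected passage rather than the whole article.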

  2. autoeval-staging-eval-boolq-default-049b58-14205948

    • huggingface.co
    Cite
    Evaluation on the Hub, autoeval-staging-eval-boolq-default-049b58-14205948 [Dataset]. https://huggingface.co/datasets/autoevaluate/autoeval-staging-eval-boolq-default-049b58-14205948
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset authored and provided by
    Evaluation on the Hub
    Description

    Dataset Card for AutoTrain Evaluator

    This repository contains model predictions generated by AutoTrain for the following task and dataset:

    • Task: Natural Language Inference
    • Model: andi611/distilbert-base-uncased-qa-boolq
    • Dataset: boolq
    • Config: default
    • Split: validation

    To run new evaluation jobs, visit Hugging Face's automatic model evaluator.

      Contributions
    

    Thanks to @lewtun for evaluating this model.

  3. BoolQ: Question Answering Dataset

    • opendatabay.com
    Updated Jul 6, 2025
    Cite
    Datasimple (2025). BoolQ: Question Answering Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/0aa8f4c4-227b-48ab-8294-fafde5cb3afe
    Explore at:
    Available download formats
    Dataset updated
    Jul 6, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Data Science and Analytics
    Description

    The BoolQ dataset is a valuable resource crafted for question answering tasks. It is organised into two main splits: a validation split and a training split. The primary aim of this dataset is to facilitate research in natural language processing (NLP) and machine learning (ML), particularly in tasks involving the answering of questions based on provided text. It offers a rich collection of user-posed questions, their corresponding answers, and the passages from which these answers are derived. This enables researchers to develop and evaluate models for real-world scenarios where information needs to be retrieved or understood from textual sources.

    Columns

    • question: This column contains the specific questions posed by users. It provides insight into the information that needs to be extracted from the given passage.
    • answer: This column holds the correct answers to each corresponding question in the dataset. The objective is to build models that can accurately predict these answers. The 'answer' column includes Boolean values, with true appearing 5,874 times (62%) and false appearing 3,553 times (38%).
    • passage: This column serves as the context or background information from which questions are formulated and answers must be located.

    Distribution

    The BoolQ dataset consists of two main parts: a validation split and a training split. Both splits feature consistent data fields: question, answer, and passage. The train.csv file, for example, is part of the training data. While specific row or record counts are not detailed for the entire dataset, the 'answer' column contains 9,427 boolean values in total.
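    The class balance reported for the 'answer' column can be checked with a few lines of arithmetic, using the counts given in the column description above:

```python
# 'answer' value counts from the dataset description
true_count, false_count = 5_874, 3_553

total = true_count + false_count
true_pct = round(100 * true_count / total)
false_pct = round(100 * false_count / total)

print(total, true_pct, false_pct)  # 9427 62 38
```

    The 9,427 total matches the number of boolean values the distribution section reports for the 'answer' column, and the 62%/38% split confirms the dataset is moderately skewed toward "true" answers.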

    Usage

    This dataset is ideally suited for:

    • Question Answering Systems: Training models to predict the correct yes/no answer, given a question and a passage.
    • Machine Reading Comprehension: Developing models that can understand and interpret written text effectively.
    • Information Retrieval: Enabling models to retrieve relevant passages or documents that contain answers to a given query or question.

    Coverage

    The sources do not specify the geographic, time range, or demographic scope of the data.

    License

    CC0

    Who Can Use It

    The BoolQ dataset is primarily intended for researchers and developers working in artificial intelligence fields such as Natural Language Processing (NLP) and Machine Learning (ML). It is particularly useful for those building or evaluating:

    • Question answering algorithms
    • Information retrieval systems
    • Machine reading comprehension models

    Dataset Name Suggestions

    • BoolQ: Question Answering Dataset
    • Text-Based Question Answering Corpus
    • NLP Question-Answer-Passage Data
    • Machine Reading Comprehension BoolQ
    • Boolean Question Answering Data

    Attributes

    Original Data Source: BoolQ - Question-Answer-Passage Consistency

  4. boolq_N_A

    • huggingface.co
    Updated Dec 14, 2024
    Cite
    jack gallifant (2024). boolq_N_A [Dataset]. https://huggingface.co/datasets/gallifantjack/boolq_N_A
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 14, 2024
    Authors
    jack gallifant
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    boolq

      Dataset Description
    

    This dataset contains evaluation results for boolq with label column N_A, with various model performance metrics and samples.

      Dataset Summary
    

    The dataset contains original samples from the evaluation process, along with metadata like model names, input columns, and scores. This helps with understanding model performance across different tasks and datasets.

      Features
    

    id: Unique identifier for the sample. user: User… See the full description on the dataset page: https://huggingface.co/datasets/gallifantjack/boolq_N_A.

  5. NO-BoolQ

    • huggingface.co
    Cite
    Norwegian Generative Language Models, NO-BoolQ [Dataset]. https://huggingface.co/datasets/NorGLM/NO-BoolQ
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Norwegian Generative Language Models
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Dataset Card for NO-BoolQ

    NO-BoolQ is machine-translated from the Google BoolQ dataset. It is a question answering dataset with the same train, test, and validation splits as the original. This dataset belongs to NLEBench, a suite of Norwegian benchmarks for evaluation on Norwegian Natural Language Understanding (NLU) tasks.

      Licensing Information
    

    This dataset is built upon an existing dataset; we therefore follow its original license information.

      Citation… See the full description on the dataset page: https://huggingface.co/datasets/NorGLM/NO-BoolQ.
    
  6. AraDiCE-BoolQ

    • huggingface.co
    Updated May 18, 2025
    Cite
    Qatar Computing Research Institute (2025). AraDiCE-BoolQ [Dataset]. https://huggingface.co/datasets/QCRI/AraDiCE-BoolQ
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 18, 2025
    Dataset authored and provided by
    Qatar Computing Research Institute
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs

      Overview
    

    The AraDiCE dataset is designed to evaluate dialectal and cultural capabilities in large language models (LLMs). The dataset consists of post-edited versions of various benchmark datasets, curated for validation in cultural and dialectal contexts relevant to Arabic. In this repository, we present the BoolQ split of the data.

      Evaluation
    

    We have used the lm-harness eval framework for the… See the full description on the dataset page: https://huggingface.co/datasets/QCRI/AraDiCE-BoolQ.
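    The dataset card does not show the exact command it used; as a rough sketch only, a typical lm-evaluation-harness invocation for a BoolQ-style task looks like the following. The model name is a placeholder, and the real AraDiCE task names are defined in the dataset's repository, not here:

```shell
# Hypothetical sketch of an lm-evaluation-harness run; substitute the
# actual pretrained model and the AraDiCE task identifiers.
lm_eval \
  --model hf \
  --model_args pretrained=some-org/some-arabic-model \
  --tasks boolq \
  --batch_size 8 \
  --output_path results/
```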

  7. Llama-3.1-405B-evals

    • huggingface.co
    Updated Jul 23, 2024
    Cite
    Meta Llama (2024). Llama-3.1-405B-evals [Dataset]. https://huggingface.co/datasets/meta-llama/Llama-3.1-405B-evals
    Explore at:
    Dataset updated
    Jul 23, 2024
    Dataset provided by
    Meta: http://meta.com/
    Authors
    Meta Llama
    License

    https://choosealicense.com/licenses/llama3.1/

    Description

    Dataset Card for Llama-3.1-405B Evaluation Result Details

    This dataset contains the Meta evaluation result details for Llama-3.1-405B. The dataset has been created from 12 evaluation tasks. These tasks are triviaqa_wiki, mmlu_pro, commonsenseqa, winogrande, mmlu, boolq, squad, quac, drop, bbh, arc_challenge, agieval_english. Each task detail can be found as a specific subset in each configuration and each subset is named using the task name plus the timestamp of the upload time… See the full description on the dataset page: https://huggingface.co/datasets/meta-llama/Llama-3.1-405B-evals.

  8. Llama-3.1-8B-evals

    • huggingface.co
    Updated Jul 23, 2024
    Cite
    Meta Llama (2024). Llama-3.1-8B-evals [Dataset]. https://huggingface.co/datasets/meta-llama/Llama-3.1-8B-evals
    Explore at:
    Dataset updated
    Jul 23, 2024
    Dataset provided by
    Meta: http://meta.com/
    Authors
    Meta Llama
    License

    https://choosealicense.com/licenses/llama3.1/

    Description

    Dataset Card for Llama-3.1-8B Evaluation Result Details

    This dataset contains the Meta evaluation result details for Llama-3.1-8B. The dataset has been created from 12 evaluation tasks. These tasks are triviaqa_wiki, mmlu_pro, commonsenseqa, winogrande, mmlu, boolq, squad, quac, drop, bbh, arc_challenge, agieval_english. Each task detail can be found as a specific subset in each configuration and each subset is named using the task name plus the timestamp of the upload time and… See the full description on the dataset page: https://huggingface.co/datasets/meta-llama/Llama-3.1-8B-evals.
