BoolQ is a question answering dataset for yes/no questions containing 15,942 examples. These questions are naturally occurring – they are generated in unprompted and unconstrained settings. Each example is a triplet of (question, passage, answer), with the title of the page as optional additional context.
Questions are gathered from anonymized, aggregated queries to the Google search engine. Queries that are likely to be yes/no questions are heuristically identified and questions are only kept if a Wikipedia page is returned as one of the first five results, in which case the question and Wikipedia page are given to a human annotator for further processing. Annotators label question/article pairs in a three-step process. First, they decide if the question is good, meaning it is comprehensible, unambiguous, and requesting factual information. This judgment is made before the annotator sees the Wikipedia page. Next, for good questions, annotators find a passage within the document that contains enough information to answer the question. Annotators can mark questions as “not answerable” if the Wikipedia article does not contain the requested information. Finally, annotators mark whether the question’s answer is “yes” or “no”. Only questions that were marked as having a yes/no answer are used, and each question is paired with the selected passage instead of the entire document.
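For orientation, here is a minimal sketch of loading BoolQ and inspecting one (question, passage, answer) triplet with the Hugging Face `datasets` library; the field and split names follow the description above and the public release of the dataset.

```python
# A minimal sketch, assuming the public "boolq" dataset on the Hugging Face Hub.
from datasets import load_dataset

boolq = load_dataset("boolq")            # splits: "train" and "validation"
example = boolq["train"][0]

print(example["question"])               # a naturally occurring yes/no question
print(example["passage"][:200], "...")   # the supporting Wikipedia passage
print(example["answer"])                 # boolean answer: True or False
```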
Dataset Card for AutoTrain Evaluator
This repository contains model predictions generated by AutoTrain for the following task and dataset:
Task: Natural Language Inference
Model: andi611/distilbert-base-uncased-qa-boolq
Dataset: boolq
Config: default
Split: validation
To run new evaluation jobs, visit Hugging Face's automatic model evaluator.
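For reference, a hedged sketch of querying this model locally is shown below; it assumes andi611/distilbert-base-uncased-qa-boolq is a binary sequence-classification model over (question, passage) pairs, which is how BoolQ is typically framed, and the example inputs are illustrative.

```python
# A hedged sketch, assuming the evaluated checkpoint is a text-classification
# model that scores (question, passage) pairs; inputs are illustrative.
from transformers import pipeline

clf = pipeline("text-classification",
               model="andi611/distilbert-base-uncased-qa-boolq")

result = clf({"text": "is the sky blue",
              "text_pair": "The sky appears blue because of Rayleigh scattering."})
print(result)  # e.g. [{'label': ..., 'score': ...}], depending on the model's label map
```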
Contributions
Thanks to @lewtun for evaluating this model.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The BoolQ dataset is a resource for question answering tasks, organised into two main splits: a validation split and a training split. Its primary aim is to facilitate research in natural language processing (NLP) and machine learning (ML), particularly tasks that involve answering questions based on provided text. It offers a collection of user-posed questions, their corresponding answers, and the passages from which those answers are derived, enabling researchers to develop and evaluate models for real-world scenarios where information must be retrieved or understood from textual sources.
The answer labels are imbalanced, with true appearing 5,874 times (62%) and false appearing 3,553 times (38%). The BoolQ dataset consists of two main parts: a validation split and a training split. Both splits share the same data fields: question, answer, and passage. The train.csv file, for example, is part of the training data. While specific row or record counts are not detailed for the entire dataset, the 'answer' column contains 9,427 boolean values (5,874 + 3,553).
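As a sanity check, the distribution above can be recomputed directly; this is a sketch assuming the train split and boolean answer field of the public boolq release (counts quoted on the card: 5,874 true, 3,553 false, 9,427 total).

```python
# A small sketch that tallies the boolean `answer` field of the train split;
# the split/field names assume the public "boolq" release described above.
from collections import Counter
from datasets import load_dataset

train = load_dataset("boolq", split="train")
counts = Counter(train["answer"])
total = sum(counts.values())

for label, n in sorted(counts.items(), reverse=True):
    print(f"{label}: {n} ({n / total:.0%})")
```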
This dataset is ideally suited for:
* Question Answering Systems: training models to answer yes/no questions given a question and a supporting passage (see the fine-tuning sketch below).
* Machine Reading Comprehension: developing models that can understand and interpret written text effectively.
* Information Retrieval: enabling models to retrieve relevant passages or documents that contain answers to a given query or question.
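To make the first use case concrete, here is a minimal fine-tuning sketch using the standard `transformers`/`datasets` APIs; the base checkpoint, output directory, and hyperparameters are illustrative, not tuned.

```python
# A minimal fine-tuning sketch for yes/no QA on BoolQ; checkpoint and
# hyperparameters are illustrative assumptions, not the card's recipe.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

dataset = load_dataset("boolq")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess(batch):
    # Encode (question, passage) pairs; map boolean answers to 0/1 labels.
    enc = tokenizer(batch["question"], batch["passage"],
                    truncation=True, max_length=512)
    enc["labels"] = [int(a) for a in batch["answer"]]
    return enc

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset["train"].column_names)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="boolq-distilbert",
                           per_device_train_batch_size=16,
                           num_train_epochs=2),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```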
The sources do not specify the geographic, time range, or demographic scope of the data.
License: CC0
The BoolQ dataset is primarily intended for researchers and developers working in artificial intelligence fields such as Natural Language Processing (NLP) and Machine Learning (ML). It is particularly useful for those building or evaluating:
* Question answering algorithms
* Information retrieval systems
* Machine reading comprehension models
Original Data Source: BoolQ - Question-Answer-Passage Consistency
License: MIT License (https://opensource.org/licenses/MIT)
boolq
Dataset Description
This dataset contains evaluation results for boolq with label column N_A, with various model performance metrics and samples.
Dataset Summary
The dataset contains original samples from the evaluation process, along with metadata like model names, input columns, and scores. This helps with understanding model performance across different tasks and datasets.
Features
id: Unique identifier for the sample.
user: User… See the full description on the dataset page: https://huggingface.co/datasets/gallifantjack/boolq_N_A.
License: Attribution-ShareAlike 3.0 (CC BY-SA 3.0) (https://creativecommons.org/licenses/by-sa/3.0/)
Dataset Card for NO-BoolQ
NO-BoolQ is machine-translated from the Google BoolQ dataset. It is a question answering dataset with the same train, test, and validation splits as the original dataset. This dataset belongs to NLEBench, a collection of Norwegian benchmarks for evaluation on Norwegian Natural Language Understanding (NLU) tasks.
Licensing Information
This dataset is built upon an existing dataset. We therefore follow its original license information.
Citation… See the full description on the dataset page: https://huggingface.co/datasets/NorGLM/NO-BoolQ.
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs
Overview
The AraDiCE dataset is designed to evaluate dialectal and cultural capabilities in large language models (LLMs). The dataset consists of post-edited versions of various benchmark datasets, curated for validation in cultural and dialectal contexts relevant to Arabic. In this repository, we present the BoolQ split of the data.
Evaluation
We have used the lm-harness evaluation framework for the… See the full description on the dataset page: https://huggingface.co/datasets/QCRI/AraDiCE-BoolQ.
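For context, a hedged sketch of such an evaluation with the lm-evaluation-harness Python API (v0.4+) is shown below; the checkpoint named here is illustrative, not the one used by the authors.

```python
# A hedged sketch, assuming lm-evaluation-harness v0.4+ exposes
# `simple_evaluate`; "gpt2" stands in for whichever model is evaluated.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                       # Hugging Face causal-LM backend
    model_args="pretrained=gpt2",     # illustrative checkpoint
    tasks=["boolq"],
)
print(results["results"]["boolq"])    # accuracy and related metrics
```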
License: https://choosealicense.com/licenses/llama3.1/
Dataset Card for Llama-3.1-405B Evaluation Result Details
This dataset contains the Meta evaluation result details for Llama-3.1-405B. The dataset has been created from 12 evaluation tasks. These tasks are triviaqa_wiki, mmlu_pro, commonsenseqa, winogrande, mmlu, boolq, squad, quac, drop, bbh, arc_challenge, agieval_english. Each task detail can be found as a specific subset in each configuration and each subset is named using the task name plus the timestamp of the upload time… See the full description on the dataset page: https://huggingface.co/datasets/meta-llama/Llama-3.1-405B-evals.
License: https://choosealicense.com/licenses/llama3.1/
Dataset Card for Llama-3.1-8B Evaluation Result Details
This dataset contains the Meta evaluation result details for Llama-3.1-8B. The dataset has been created from 12 evaluation tasks. These tasks are triviaqa_wiki, mmlu_pro, commonsenseqa, winogrande, mmlu, boolq, squad, quac, drop, bbh, arc_challenge, agieval_english. Each task detail can be found as a specific subset in each configuration and each subset is named using the task name plus the timestamp of the upload time and… See the full description on the dataset page: https://huggingface.co/datasets/meta-llama/Llama-3.1-8B-evals.