Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for SQuAD 2.0
Dataset Summary
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 2.0 combines the 100,000 questions in SQuAD 1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad_v2.
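For orientation, the dataset can be pulled straight from the Hub with the datasets library; this is a minimal sketch assuming the standard rajpurkar/squad_v2 splits and the SQuAD-style fields described above.
```
from datasets import load_dataset

# Load SQuAD 2.0 from the Hugging Face Hub; it ships with train and validation splits.
squad_v2 = load_dataset("rajpurkar/squad_v2")

example = squad_v2["train"][0]
print(example["question"])
print(example["context"][:200])
# Unanswerable questions carry an empty answers["text"] list.
print(example["answers"])
```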
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
There are two files to help you get started with the dataset and evaluate your models:
The original datasets can be found here.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Card for SQuAD
Dataset Summary
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 1.1 contains 100,000+ question-answer pairs on 500+ articles.
Supported Tasks and Leaderboards
Question Answering.… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad.
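For the question-answering task above, scoring is conventionally reported as exact match and F1; a minimal sketch using the evaluate library's squad metric (the example id and texts are illustrative) might look like this.
```
import evaluate

# SQuAD-style exact match / F1 scoring.
squad_metric = evaluate.load("squad")

predictions = [{"id": "q1", "prediction_text": "Denver Broncos"}]
references = [{"id": "q1", "answers": {"text": ["Denver Broncos"], "answer_start": [177]}}]

print(squad_metric.compute(predictions=predictions, references=references))
# Expected output: {'exact_match': 100.0, 'f1': 100.0}
```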
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Rust QA is a dataset for training and evaluating QA systems. The dataset consists of 1068 questions to "The Rust Programming Language" book (https://doc.rust-lang.org/stable/book/) with the answers provided as text spans from the book. The dataset is released in SQuAD 2.0 format.
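Since the entry only says the data follows the SQuAD 2.0 format, here is a small sketch of that layout; the field names are the standard SQuAD 2.0 ones, while the example content and id are invented for illustration.
```
# Illustrative sketch of a SQuAD 2.0-format record, as used by Rust QA.
example = {
    "version": "v2.0",
    "data": [{
        "title": "The Rust Programming Language",
        "paragraphs": [{
            "context": "Ownership is a set of rules that govern how a Rust program manages memory.",
            "qas": [{
                "id": "q-0001",
                "question": "What do ownership rules govern?",
                # For unanswerable questions, is_impossible is True and answers is empty.
                "is_impossible": False,
                "answers": [{"text": "how a Rust program manages memory", "answer_start": 40}],
            }],
        }],
    }],
}
```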
bayes-group-diffusion/squad-2.0 dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Asrst
CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/
SQuAD 2.0 train dataset converted from JSON to CSV data. The dataset can be used to build complex open QA systems.
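As a rough idea of what such a conversion involves, here is a minimal sketch that flattens SQuAD 2.0 JSON into one CSV row per question; the input/output file names and column names are assumptions, not necessarily those used in this particular dataset.
```
import csv
import json

# File names and CSV columns below are illustrative assumptions.
with open("train-v2.0.json", encoding="utf-8") as f:
    squad = json.load(f)

with open("squad_v2_train.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["id", "title", "context", "question",
                     "answer_text", "answer_start", "is_impossible"])
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                text = answers[0]["text"] if answers else ""
                start = answers[0]["answer_start"] if answers else -1
                writer.writerow([qa["id"], article["title"], paragraph["context"],
                                 qa["question"], text, start, qa.get("is_impossible", False)])
```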
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Czech translation of the SQuAD 2.0 and SQuAD 1.1 datasets contains automatically translated texts, questions and answers from the training set and the development set of the respective datasets. The test set is missing because it is not publicly available. The data is released under the CC BY-NC-SA 4.0 license. If you use the dataset, please cite the following paper (the exact format was not available during the submission of the dataset): Kateřina Macková and Milan Straka: Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer, presented at TSD 2020, Brno, Czech Republic, September 8-11, 2020.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Automatic translation of the Stanford Question Answering Dataset (SQuAD) v2 into Spanish.
I(n)dontKnow-MRC (IDK-MRC) is an Indonesian Machine Reading Comprehension dataset that covers answerable and unanswerable questions. Based on the existing answerable questions in TyDiQA, the new unanswerable questions in IDK-MRC are generated using a question generation model and human-written questions. Each paragraph in the dataset has a set of answerable and unanswerable questions with the corresponding answers.
Besides the IDK-MRC (idk_mrc) dataset, several baseline datasets are also provided (a loading sketch follows the list):
1. Trans SQuAD (trans_squad): machine-translated SQuAD 2.0 (Muis and Purwarianti, 2020)
2. TyDiQA (tydiqa): the Indonesian answerable question set from TyDiQA-GoldP (Clark et al., 2020)
3. Model Gen (model_gen): TyDiQA plus the unanswerable questions produced by the question generation model
4. Human Filt (human_filt): the Model Gen dataset after filtering by human annotators
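If the IDK-MRC data is consumed through the datasets library, the configuration names above would typically be passed as the second argument to load_dataset; the Hub path below is a placeholder, and whether every baseline is exposed as a separate config depends on how the repository is organized.
```
from datasets import load_dataset

# "user/idk-mrc" is a placeholder Hub path; adjust it to the actual repository.
for config in ["idk_mrc", "trans_squad", "tydiqa", "model_gen", "human_filt"]:
    ds = load_dataset("user/idk-mrc", config)
    print(config, {split: len(ds[split]) for split in ds})
```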
GNU General Public License v3.0 (GPL-3.0): https://choosealicense.com/licenses/gpl-3.0/
Dataset Card for "squad-v2-fi"
Dataset Summary
Machine translated and normalized Finnish version of the SQuAD-v2.0 dataset. Details about the translation and normalization processes can be found here. Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the… See the full description on the dataset page: https://huggingface.co/datasets/ilmariky/SQuAD_v2_fi.
This dataset was created by Vijender Singh
CDLA-Sharing-1.0: https://cdla.io/sharing-1-0/
This BERT baseline model was trained from scratch on the SQuAD 2.0 dataset for 3 epochs. Due to computational resource limitations, further regularization and optimizations weren't added.
Overview
This dataset contains the data for the paper "Deep learning based question answering system in Bengali". It is a translation of the SQuAD 2.0 dataset into Bengali. Preprocessing details can be found in the paper.
This model checkpoint was trained using the Hugging Face Transformers library. To reproduce, use the run_squad.py script from the provided examples with the following command:
python run_squad.py \
--model_type bert \
--model_name_or_path monologg/biobert_v1.1_pubmed \
--do_train \
--do_eval \
--train_file $SQUAD_DIR/train-v2.0.json \
--predict_file $SQUAD_DIR/dev-v2.0.json \
--per_gpu_train_batch_size 8 \
--learning_rate 3e-5 \
--num_train_epochs 4 \
--max_seq_length 384 \
--doc_stride 128 \
--output_dir /tmp/biobert_squad2_cased/ \
--version_2_with_negative
Load the model checkpoint in just a few lines of code:
```
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

model_path = '/kaggle/input/biobert-qa/biobert_squad2_cased'
model = AutoModelForQuestionAnswering.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```
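As a usage sketch, the loaded checkpoint can be wrapped in the transformers question-answering pipeline; the question and context below are illustrative, and handle_impossible_answer lets the pipeline return an empty span for SQuAD 2.0-style unanswerable questions.
```
from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline

model_path = '/kaggle/input/biobert-qa/biobert_squad2_cased'
model = AutoModelForQuestionAnswering.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Span-extraction QA with the fine-tuned BioBERT checkpoint.
qa = pipeline("question-answering", model=model, tokenizer=tokenizer)

result = qa(
    question="What dataset was the model fine-tuned on?",  # illustrative question
    context="This BioBERT checkpoint was fine-tuned on SQuAD 2.0, which adds unanswerable questions.",
    handle_impossible_answer=True,
)
print(result["answer"], result["score"])
```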
This model was used to power a Q&A engine for the CORD-19 challenge in the following submissions: Transmission, Risk factors, Vaccines and therapeutics, and Medical care.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
SQuAD-NL v2.0 [translated SQuAD / XQuAD]
SQuAD-NL v2.0 is a translation of The Stanford Question Answering Dataset (SQuAD) v2.0. Since the original English SQuAD test data is not public, we reserve the same documents that were used for XQuAD for testing purposes. These documents are sampled from the original dev data split. The English data is automatically translated using Google Translate (February 2023) and the test data is manually post-edited. This version of SQuAD-NL also… See the full description on the dataset page: https://huggingface.co/datasets/GroNLP/squad-nl-v2.0.
eanderson/dutch-squad-v2.0 dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Shengxiang Lin
Released under MIT
Dataset Card for squad-v2-mod
This dataset has been created with distilabel.
Dataset Summary
This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI:
distilabel pipeline run --config "https://huggingface.co/datasets/fshala/squad-v2-mod/raw/main/pipeline.yaml"
or explore the configuration:
distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/fshala/squad-v2-mod.
M2QA (Multi-domain Multilingual Question Answering) is an extractive question answering benchmark for evaluating joint language and domain transfer. M2QA includes 13,500 SQuAD 2.0-style question-answer instances in German, Turkish, and Chinese for the domains of product reviews, news, and creative writing.