68 datasets found
  1. squad_v2

    • huggingface.co
    Updated Jun 15, 2005
    Cite
    Pranav R (2005). squad_v2 [Dataset]. https://huggingface.co/datasets/rajpurkar/squad_v2
    Dataset updated
    Jun 15, 2005
    Authors
    Pranav R
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for SQuAD 2.0

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad_v2.
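
The span-or-unanswerable structure described above can be illustrated with a minimal sketch. It assumes the field names used in the published SQuAD 2.0 JSON files (`context`, `qas`, `answers`, `answer_start`, `is_impossible`); the sample paragraph and the helper `extract_answer` are made up for illustration and are not taken from the dataset.

```python
from typing import Optional

# A hand-written paragraph in SQuAD 2.0 style (illustrative only).
paragraph = {
    "context": "The Normans gave their name to Normandy, a region in France.",
    "qas": [
        {
            "id": "q1",
            "question": "Which region is named after the Normans?",
            "is_impossible": False,
            "answers": [{"text": "Normandy", "answer_start": 31}],
        },
        {
            "id": "q2",
            "question": "Who conquered Normandy in 1500?",
            "is_impossible": True,
            "answers": [],
        },
    ],
}


def extract_answer(context: str, qa: dict) -> Optional[str]:
    """Return the answer span for a question, or None if it is unanswerable."""
    if qa["is_impossible"] or not qa["answers"]:
        return None
    answer = qa["answers"][0]
    start = answer["answer_start"]
    span = context[start:start + len(answer["text"])]
    # The answer text must match the characters at its offset in the passage.
    if span != answer["text"]:
        raise ValueError(f"answer offset mismatch for question {qa['id']}")
    return span


results = {qa["id"]: extract_answer(paragraph["context"], qa) for qa in paragraph["qas"]}
print(results)
```

A system evaluated on SQuAD 2.0 has to handle both cases: return the span when one exists and abstain when the passage does not support an answer.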

  2. The Stanford Question Answering Dataset

    • kaggle.com
    zip
    Updated Nov 25, 2020
    + more versions
    Cite
    Le Viet Thang (2020). The Stanford Question Answering Dataset [Dataset]. https://www.kaggle.com/toreleon/squad-20-the-stanford-question-answering-dataset
    Explore at:
    zip (10281338 bytes)
    Dataset updated
    Nov 25, 2020
    Authors
    Le Viet Thang
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

    Content

    There are two files to help you get started with the dataset and evaluate your models:

    • train-v2.0.json
    • dev-v2.0.json
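
As a rough sketch of how either file could be inspected, the snippet below walks the nested `data` / `paragraphs` / `qas` layout and counts answerable and unanswerable questions. The helper name `count_questions` and the tiny inline sample are assumptions made for illustration; the sample stands in for the real file contents.

```python
import json


def count_questions(squad: dict) -> dict:
    """Count answerable and unanswerable questions in a SQuAD 2.0-style dict."""
    counts = {"answerable": 0, "unanswerable": 0}
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                # SQuAD 2.0 marks unanswerable questions with is_impossible = true.
                key = "unanswerable" if qa.get("is_impossible", False) else "answerable"
                counts[key] += 1
    return counts


# Tiny hand-written stand-in for the contents of train-v2.0.json or dev-v2.0.json.
sample_text = """
{
  "version": "v2.0",
  "data": [
    {
      "title": "Example",
      "paragraphs": [
        {
          "context": "Paris is the capital of France.",
          "qas": [
            {"id": "a1", "question": "What is the capital of France?",
             "is_impossible": false,
             "answers": [{"text": "Paris", "answer_start": 0}]},
            {"id": "a2", "question": "What is the capital of Atlantis?",
             "is_impossible": true,
             "answers": []}
          ]
        }
      ]
    }
  ]
}
"""

counts = count_questions(json.loads(sample_text))
print(counts)
```

For the downloaded files, the same function can be applied to the result of `json.load` on an opened `train-v2.0.json` or `dev-v2.0.json`.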

    Acknowledgements

    The original datasets can be found here.

    Inspiration

    • Can you build a prediction model that can accurately predict answers to different types of questions?
    • You can also explore SQuAD here
  3. squad

    • huggingface.co
    • tensorflow.org
    • +1 more
    Updated Mar 5, 2024
    + more versions
    Cite
    Pranav R (2024). squad [Dataset]. https://huggingface.co/datasets/rajpurkar/squad
    Dataset updated
    Mar 5, 2024
    Authors
    Pranav R
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for SQuAD

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 1.1 contains 100,000+ question-answer pairs on 500+ articles.

      Supported Tasks and Leaderboards
    

    Question Answering.… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad.

  4. Rust QA: question answering dataset for "The Rust Programming Language" in...

    • mostwiedzy.pl
    zip
    Updated Feb 28, 2024
    Cite
    Michał Maciszka; Kamil Paluszewski; Grzegorz Pozorski; Wojciech Rosenthal; Łukasz Zaleski (2024). Rust QA: question answering dataset for "The Rust Programming Language" in SQuAD 2.0 format [Dataset]. http://doi.org/10.34808/c05c-9542
    Explore at:
    zip (9911246)
    Dataset updated
    Feb 28, 2024
    Authors
    Michał Maciszka; Kamil Paluszewski; Grzegorz Pozorski; Wojciech Rosenthal; Łukasz Zaleski
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Rust QA is a dataset for training and evaluating QA systems. It consists of 1,068 questions about the book "The Rust Programming Language" (https://doc.rust-lang.org/stable/book/), with the answers provided as text spans from the book. The dataset is released in SQuAD 2.0 format.

  5. Data from: squad-2.0

    • huggingface.co
    + more versions
    Cite
    Bayes Group, squad-2.0 [Dataset]. https://huggingface.co/datasets/bayes-group-diffusion/squad-2.0
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Bayes Group
    Description

    bayes-group-diffusion/squad-2.0 dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. Data from: SQuAD 2.0

    • kaggle.com
    Updated Mar 23, 2019
    Cite
    Asrst (2019). SQuAD 2.0 [Dataset]. https://www.kaggle.com/asrsaiteja/squad-2/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 23, 2019
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Asrst
    Description

    Dataset

    This dataset was created by Asrst


  7. Question Answering Dataset

    • kaggle.com
    Updated Oct 28, 2020
    Cite
    ARES (2020). Question Answering Dataset [Dataset]. https://www.kaggle.com/ananthu017/squad-csv-format/tasks
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 28, 2020
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    ARES
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    SQuAD 2.0 train dataset converted from JSON to CSV. The dataset can be used to build complex open QA systems.

    Source - https://rajpurkar.github.io/SQuAD-explorer/
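
A JSON-to-CSV conversion of this kind might look roughly like the sketch below. The column names and the helper `squad_to_rows` are assumptions chosen for illustration, not the layout of the CSV published on Kaggle; the passage text is left out of the rows to keep the example short.

```python
import csv
import io

FIELDS = ["id", "title", "question", "answer_text", "answer_start", "is_impossible"]


def squad_to_rows(squad: dict):
    """Yield one flat row per question from a SQuAD 2.0-style dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                answers = qa.get("answers", [])
                # Unanswerable questions have no answers; use placeholder values.
                first = answers[0] if answers else {"text": "", "answer_start": -1}
                yield {
                    "id": qa["id"],
                    "title": article["title"],
                    "question": qa["question"],
                    "answer_text": first["text"],
                    "answer_start": first["answer_start"],
                    "is_impossible": qa.get("is_impossible", False),
                }


# Hand-written sample in SQuAD 2.0 layout (illustrative only).
sample = {
    "version": "v2.0",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "Paris is the capital of France.",
            "qas": [
                {"id": "a1", "question": "What is the capital of France?",
                 "is_impossible": False,
                 "answers": [{"text": "Paris", "answer_start": 0}]},
                {"id": "a2", "question": "What is the capital of Atlantis?",
                 "is_impossible": True, "answers": []},
            ],
        }],
    }],
}

# Write to an in-memory buffer; a real conversion would open a .csv file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(squad_to_rows(sample))
csv_text = buffer.getvalue()
print(csv_text)
```

Flattening to one row per question discards the nesting of questions under paragraphs, so a context column or paragraph identifier would be needed to recover it.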

  8. Czech Translation of SQuAD 2.0 and 1.1 - Dataset - B2FIND

    • b2find.eudat.eu
    Updated May 4, 2023
    + more versions
    Cite
    (2023). Czech Translation of SQuAD 2.0 and 1.1 - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/475e998a-f796-55a4-8114-9b63f477ca8c
    Dataset updated
    May 4, 2023
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Czech translation of the SQuAD 2.0 and SQuAD 1.1 datasets contains automatically translated texts, questions and answers from the training set and the development set of the respective datasets. The test set is missing because it is not publicly available. The data is released under the CC BY-NC-SA 4.0 license. If you use the dataset, please cite the following paper (the exact format was not available during the submission of the dataset): Kateřina Macková and Milan Straka: Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer, presented at TSD 2020, Brno, Czech Republic, September 8-11, 2020.

  9. squad_es

    • huggingface.co
    • opendatalab.com
    Updated May 24, 2024
    + more versions
    Cite
    Casimiro Pio Carrino (2024). squad_es [Dataset]. https://huggingface.co/datasets/ccasimiro/squad_es
    Dataset updated
    May 24, 2024
    Authors
    Casimiro Pio Carrino
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Automatic translation of the Stanford Question Answering Dataset (SQuAD) v2 into Spanish.

  10. idk_mrc

    • huggingface.co
    Updated Oct 13, 2023
    + more versions
    Cite
    SEACrowd (2023). idk_mrc [Dataset]. https://huggingface.co/datasets/SEACrowd/idk_mrc
    Dataset updated
    Oct 13, 2023
    Dataset authored and provided by
    SEACrowd
    Description

    I(n)dontKnow-MRC (IDK-MRC) is an Indonesian machine reading comprehension dataset that covers answerable and unanswerable questions. It builds on the existing answerable questions in TyDiQA; the new unanswerable questions in IDK-MRC come from a question generation model and from human-written questions. Each paragraph in the dataset has a set of answerable and unanswerable questions with the corresponding answers.

    Besides the IDK-MRC (idk_mrc) dataset, several baseline datasets are also provided:

    1. Trans SQuAD (trans_squad): machine-translated SQuAD 2.0 (Muis and Purwarianti, 2020)
    2. TyDiQA (tydiqa): Indonesian answerable question set from TyDiQA-GoldP (Clark et al., 2020)
    3. Model Gen (model_gen): TyDiQA plus the unanswerable questions output by the question generation model
    4. Human Filt (human_filt): the Model Gen dataset after filtering by human annotators

  11. SQuAD_v2_fi

    • huggingface.co
    Updated Sep 23, 2022
    Cite
    Ilmari Kylliäinen (2022). SQuAD_v2_fi [Dataset]. https://huggingface.co/datasets/ilmariky/SQuAD_v2_fi
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 23, 2022
    Authors
    Ilmari Kylliäinen
    License

    https://choosealicense.com/licenses/gpl-3.0/

    Description

    Dataset Card for "squad-v2-fi"

      Dataset Summary
    

    Machine translated and normalized Finnish version of the SQuAD-v2.0 dataset. Details about the translation and normalization processes can be found here. Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the… See the full description on the dataset page: https://huggingface.co/datasets/ilmariky/SQuAD_v2_fi.

  12. BERT_with_SQUAD2

    • kaggle.com
    Updated Feb 10, 2020
    Cite
    Vijender Singh (2020). BERT_with_SQUAD2 [Dataset]. https://www.kaggle.com/vijendersingh412/bert-with-squad2/metadata
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 10, 2020
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Vijender Singh
    Description

    Dataset

    This dataset was created by Vijender Singh


  13. bert_squad2_3epochs

    • kaggle.com
    Updated Apr 28, 2023
    Cite
    Rahul Padhy (2023). bert_squad2_3epochs [Dataset]. https://www.kaggle.com/datasets/jimhalpert26/bert-squad2-3epochs
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 28, 2023
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Rahul Padhy
    License

    https://cdla.io/sharing-1-0/

    Description

    This BERT baseline model was trained from scratch on the SQuAD 2.0 dataset for 3 epochs. Due to computational resource limitations, further regularization and optimizations weren't added.

  14. Bengali-SQuAD

    • huggingface.co
    Updated Sep 18, 2022
    Cite
    Tahsin Mayeesha (2022). Bengali-SQuAD [Dataset]. https://huggingface.co/datasets/Tahsin-Mayeesha/Bengali-SQuAD
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 18, 2022
    Authors
    Tahsin Mayeesha
    Description

    Overview

    This dataset contains the data for the paper "Deep learning based question answering system in Bengali". It is a translation of the SQuAD 2.0 dataset into Bengali. Preprocessing details can be found in the paper.

  15. BioBERT QA Model

    • kaggle.com
    Updated Apr 16, 2020
    Cite
    Jonas Kemp (2020). BioBERT QA Model [Dataset]. https://www.kaggle.com/jonasbkemp/biobert-qa/notebooks
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 16, 2020
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Jonas Kemp
    Description

    This model checkpoint was trained using the Huggingface Transformers library. To reproduce, use the script run_squad.py from the provided examples with the following command:

    python run_squad.py \
     --model_type bert \
     --model_name_or_path monologg/biobert_v1.1_pubmed \
     --do_train \
     --do_eval \
     --train_file $SQUAD_DIR/train-v2.0.json \
     --predict_file $SQUAD_DIR/dev-v2.0.json \
     --per_gpu_train_batch_size 8 \
     --learning_rate 3e-5 \
     --num_train_epochs 4 \
     --max_seq_length 384 \
     --doc_stride 128 \
     --output_dir /tmp/biobert_squad2_cased/ \
     --version_2_with_negative
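
The `--max_seq_length 384` and `--doc_stride 128` flags in the command above govern how passages longer than the model input are split into overlapping windows. The sketch below illustrates that idea under the assumption that each window starts `doc_stride` tokens after the previous one; the helper `sliding_windows` is hypothetical and ignores the question and special tokens that the real script also has to fit into each sequence.

```python
from typing import List, Tuple


def sliding_windows(num_tokens: int, max_len: int, doc_stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) token ranges that cover a document with overlapping windows."""
    if max_len <= 0 or doc_stride <= 0:
        raise ValueError("max_len and doc_stride must be positive")
    windows = []
    start = 0
    while True:
        end = min(start + max_len, num_tokens)
        windows.append((start, end))
        if end == num_tokens:
            break  # the last window reaches the end of the document
        start += doc_stride  # assumed: advance the window start by doc_stride tokens
    return windows


# A 600-token passage with the window size and stride from the command above.
windows = sliding_windows(num_tokens=600, max_len=384, doc_stride=128)
print(windows)
```

Overlapping windows mean an answer span that straddles one window boundary is usually contained whole in a neighbouring window.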
    

    Load the model checkpoint in just a few lines of code:

    ```
    from transformers import AutoTokenizer, AutoModelForQuestionAnswering

    model_path = '/kaggle/input/biobert-qa/biobert_squad2_cased'
    model = AutoModelForQuestionAnswering.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    ```

    This model was used to power a Q&A engine for the CORD-19 challenge in the following submissions:

    • Transmission
    • Risk factors
    • Vaccines and therapeutics
    • Medical care

  16. squad-nl-v2.0

    • huggingface.co
    Updated Jun 15, 2005
    + more versions
    Cite
    GroNLP (2005). squad-nl-v2.0 [Dataset]. https://huggingface.co/datasets/GroNLP/squad-nl-v2.0
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jun 15, 2005
    Dataset authored and provided by
    GroNLP
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    SQuAD-NL v2.0 [translated SQuAD / XQuAD]

    SQuAD-NL v2.0 is a translation of The Stanford Question Answering Dataset (SQuAD) v2.0. Since the original English SQuAD test data is not public, we reserve the same documents that were used for XQuAD for testing purposes. These documents are sampled from the original dev data split. The English data is automatically translated using Google Translate (February 2023) and the test data is manually post-edited. This version of SQuAD-NL also… See the full description on the dataset page: https://huggingface.co/datasets/GroNLP/squad-nl-v2.0.

  17. dutch-squad-v2.0

    • huggingface.co
    Updated Jul 18, 2024
    Cite
    Eric Anderson (2024). dutch-squad-v2.0 [Dataset]. https://huggingface.co/datasets/eanderson/dutch-squad-v2.0
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jul 18, 2024
    Authors
    Eric Anderson
    Description

    eanderson/dutch-squad-v2.0 dataset hosted on Hugging Face and contributed by the HF Datasets community

  18. SQuAD-v2.0

    • kaggle.com
    Updated Apr 22, 2025
    Cite
    Shengxiang Lin (2025). SQuAD-v2.0 [Dataset]. https://www.kaggle.com/datasets/linshengxiang/squad-v2-0
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Apr 22, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Shengxiang Lin
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Shengxiang Lin

    Released under MIT


  19. squad-v2-mod

    • huggingface.co
    Updated Jan 29, 2025
    + more versions
    Cite
    Fidan Shala (2025). squad-v2-mod [Dataset]. https://huggingface.co/datasets/fshala/squad-v2-mod
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 29, 2025
    Authors
    Fidan Shala
    Description

    Dataset Card for squad-v2-mod

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI:

    distilabel pipeline run --config "https://huggingface.co/datasets/fshala/squad-v2-mod/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/fshala/squad-v2-mod.

  20. M2QA: A Multi-domain Multilingual Question Answering Benchmark Dataset -...

    • b2find.eudat.eu
    Updated Jul 28, 2025
    Cite
    (2025). M2QA: A Multi-domain Multilingual Question Answering Benchmark Dataset - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/5d2530e3-5423-5797-aca1-5dbee9cb669f
    Dataset updated
    Jul 28, 2025
    Description

    M2QA (Multi-domain Multilingual Question Answering) is an extractive question answering benchmark for evaluating joint language and domain transfer. M2QA includes 13,500 SQuAD 2.0-style question-answer instances in German, Turkish, and Chinese for the domains of product reviews, news, and creative writing.

Cite
Pranav R (2005). squad_v2 [Dataset]. https://huggingface.co/datasets/rajpurkar/squad_v2

squad_v2

SQuAD2.0

rajpurkar/squad_v2

79 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
Jun 15, 2005
Authors
Pranav R
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

