34 datasets found
  1. SQuAD Dataset

    • paperswithcode.com
    Updated May 16, 2021
    Cite
    Pranav Rajpurkar; Jian Zhang; Konstantin Lopyrev; Percy Liang (2021). SQuAD Dataset [Dataset]. https://paperswithcode.com/dataset/squad
    Dataset updated
    May 16, 2021
    Authors
    Pranav Rajpurkar; Jian Zhang; Konstantin Lopyrev; Percy Liang
    Description

    The Stanford Question Answering Dataset (SQuAD) is a collection of question-answer pairs derived from Wikipedia articles. In SQuAD, the correct answer to a question can be any sequence of tokens in the given text. Because the questions and answers are produced by humans through crowdsourcing, it is more diverse than some other question-answering datasets. SQuAD 1.1 contains 107,785 question-answer pairs on 536 articles. SQuAD 2.0, the latest version, combines the 100,000 questions in SQuAD 1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers in forms that are similar to the answerable ones.

  2. squad

    • huggingface.co
    • tensorflow.org
    • +1 more
    Updated Jun 12, 2020
    Cite
    Pranav R (2020). squad [Dataset]. https://huggingface.co/datasets/rajpurkar/squad
    Dataset updated
    Jun 12, 2020
    Authors
    Pranav R
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for SQuAD

      Dataset Summary
    

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable. SQuAD 1.1 contains 100,000+ question-answer pairs on 500+ articles.
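
    The span-or-unanswerable structure described above can be sketched in a few lines of Python. This is an illustration only: the field names (`context`, `question`, `answers` with `text` and `answer_start`) are assumed to follow the layout commonly used for SQuAD-style records, the passage and questions are invented rather than taken from the dataset, and `extract_span` is a hypothetical helper.

```python
# Invented passage and questions, laid out like SQuAD-style records.
# The field names are an assumption about the record layout, not a
# guarantee about any particular copy of the dataset.
context = "The Amazon rainforest covers much of the Amazon basin of South America."

answerable = {
    "context": context,
    "question": "What does the Amazon rainforest cover much of?",
    "answers": {
        "text": ["the Amazon basin"],
        # Character offset of the answer inside the passage.
        "answer_start": [context.index("the Amazon basin")],
    },
}

# An unanswerable question is assumed to carry empty answer lists.
unanswerable = {
    "context": context,
    "question": "Who first mapped the Amazon rainforest?",
    "answers": {"text": [], "answer_start": []},
}


def extract_span(record):
    """Return the first gold answer span, or None if there is no answer."""
    answers = record["answers"]
    if not answers["text"]:
        return None
    start = answers["answer_start"][0]
    text = answers["text"][0]
    span = record["context"][start:start + len(text)]
    # The answer must be a literal segment of the passage.
    if span != text:
        raise ValueError("answer_start does not point at the answer text")
    return span


print(extract_span(answerable))
print(extract_span(unanswerable))
```

    Checking the offset against the passage is a cheap sanity test that is worth running on translated or modified variants, where character offsets can drift.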

      Supported Tasks and Leaderboards
    

    Question… See the full description on the dataset page: https://huggingface.co/datasets/rajpurkar/squad.

  3. Data from: squad-2.0

    • huggingface.co
    Cite
    Bayes Group, squad-2.0 [Dataset]. https://huggingface.co/datasets/bayes-group-diffusion/squad-2.0
    Authors
    Bayes Group
    Description

    bayes-group-diffusion/squad-2.0 dataset hosted on Hugging Face and contributed by the HF Datasets community

  4. Czech Translation of SQuAD 2.0 and 1.1

    • lindat.cz
    • live.european-language-grid.eu
    • +1 more
    Updated Sep 17, 2020
    Cite
    Kateřina Macková; Milan Straka (2020). Czech Translation of SQuAD 2.0 and 1.1 [Dataset]. https://lindat.cz/repository/xmlui/handle/11234/1-3249?show=full
    Dataset updated
    Sep 17, 2020
    Authors
    Kateřina Macková; Milan Straka
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Czech translation of SQuAD 2.0 and SQuAD 1.1 datasets contains automatically translated texts, questions and answers from the training set and the development set of the respective datasets.

    The test set is missing, because it is not publicly available.

    The data is released under the CC BY-NC-SA 4.0 license.

    If you use the dataset, please cite the following paper (the exact format was not available during the submission of the dataset): Kateřina Macková and Milan Straka: Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer, presented at TSD 2020, Brno, Czech Republic, September 8-11, 2020.

  5. squad-es

    • opendatalab.com
    • huggingface.co
    zip
    Updated Dec 18, 2023
    Cite
    Universitat Politècnica de Catalunya (2023). squad-es [Dataset]. https://opendatalab.com/OpenDataLab/squad-es
    Available download formats: zip
    Dataset updated
    Dec 18, 2023
    Dataset provided by
    Universitat Politècnica de Catalunya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Automatic translation of the Stanford Question Answering Dataset (SQuAD) v2 into Spanish

  6. br-quad-2.0

    • kaggle.com
    zip
    Updated Nov 18, 2020
    Cite
    Pi Esposito (2020). br-quad-2.0 [Dataset]. https://www.kaggle.com/piesposito/br-quad-20
    Available download formats: zip (11791749 bytes)
    Dataset updated
    Nov 18, 2020
    Authors
    Pi Esposito
    Description

    Dataset

    This dataset was created by Pi Esposito

    Released under Data files © Original Authors


  7. squad-nl-v2.0

    • huggingface.co
    Updated Jun 15, 2005
    Cite
    squad-nl-v2.0 [Dataset]. https://huggingface.co/datasets/GroNLP/squad-nl-v2.0
    Dataset updated
    Jun 15, 2005
    Dataset authored and provided by
    GroNLP
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    SQuAD-NL v2.0 [translated SQuAD / XQuAD]

    SQuAD-NL v2.0 is a translation of The Stanford Question Answering Dataset (SQuAD) v2.0. Since the original English SQuAD test data is not public, we reserve the same documents that were used for XQuAD for testing purposes. These documents are sampled from the original dev data split. The English data is automatically translated using Google Translate (February 2023) and the test data is manually post-edited. This version of SQuAD-NL also… See the full description on the dataset page: https://huggingface.co/datasets/GroNLP/squad-nl-v2.0.

  8. E-Commerce Question Answering Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 12, 2022
    Cite
    Arthur Baia (2022). E-Commerce Question Answering Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7314394
    Dataset updated
    Nov 12, 2022
    Dataset provided by
    Arthur Baia
    Rodrigo Caus
    Victor Hochgreb
    Victor Ávila
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The E-Commerce Question Answering Dataset (ECQuAD) is a reading comprehension dataset for question answering on Brazilian e-commerce platforms. It consists of questions annotated by crowdworkers on a set of product descriptions. It follows the SQuAD-v2 format, so questions might be unanswerable.

    This is a development set, for public usage, powered by GoBots.

  9. NewsQA-to-SQuAD

    • kaggle.com
    zip
    Updated Mar 2, 2022
    Cite
    Vissarion Moutafis (2022). NewsQA-to-SQuAD [Dataset]. https://www.kaggle.com/vissarionmoutafis/newsqatosquad
    Available download formats: zip (149981024 bytes)
    Dataset updated
    Mar 2, 2022
    Authors
    Vissarion Moutafis
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Vissarion Moutafis

    Released under CC0: Public Domain


  10. xlm-roberta-large-squad-v2

    • kaggle.com
    zip
    Updated Nov 5, 2021
    Cite
    sy8200 (2021). xlm-roberta-large-squad-v2 [Dataset]. https://www.kaggle.com/shooota/xlmrobertalargesquadv2
    Available download formats: zip (1851291692 bytes)
    Dataset updated
    Nov 5, 2021
    Authors
    sy8200
    Description

    Dataset

    This dataset was created by sy8200


  11. squad-v2-modified

    • huggingface.co
    Updated Jan 27, 2025
    Cite
    Fidan Shala (2025). squad-v2-modified [Dataset]. https://huggingface.co/datasets/fshala/squad-v2-modified
    Dataset updated
    Jan 27, 2025
    Authors
    Fidan Shala
    Description

    Dataset Card for squad-v2-modified

    This dataset has been created with distilabel.

      Dataset Summary
    

    This dataset contains a pipeline.yaml which can be used to reproduce the pipeline that generated it in distilabel using the distilabel CLI: distilabel pipeline run --config "https://huggingface.co/datasets/fshala/squad-v2-modified/raw/main/pipeline.yaml"

    or explore the configuration: distilabel pipeline info --config… See the full description on the dataset page: https://huggingface.co/datasets/fshala/squad-v2-modified.

  12. xlm-roberta-large-squad-v2-backup

    • kaggle.com
    zip
    Updated Oct 31, 2021
    Cite
    KungKaching (2021). xlm-roberta-large-squad-v2-backup [Dataset]. https://www.kaggle.com/kungkaching/xlm-roberta-large-squad-v2-backup
    Available download formats: zip (1850980945 bytes)
    Dataset updated
    Oct 31, 2021
    Authors
    KungKaching
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by KungKaching

    Released under CC0: Public Domain


  13. squad-v2-reference-task

    • huggingface.co
    Updated Mar 28, 2024
    Cite
    Jason Caro (2024). squad-v2-reference-task [Dataset]. https://huggingface.co/datasets/PageTurnIO/squad-v2-reference-task
    Dataset updated
    Mar 28, 2024
    Authors
    Jason Caro
    Description

    PageTurnIO/squad-v2-reference-task dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. dev-v2-w_context_word_embs_substitute-original

    • huggingface.co
    Updated May 14, 2024
    Cite
    dev-v2-w_context_word_embs_substitute-original [Dataset]. https://huggingface.co/datasets/PAD6/dev-v2-w_context_word_embs_substitute-original
    Dataset updated
    May 14, 2024
    Authors
    PAD
    Description

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

  15. Disfl-QA Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Nov 15, 2022
    Cite
    Aditya Gupta; Jiacheng Xu; Shyam Upadhyay; Diyi Yang; Manaal Faruqui (2022). Disfl-QA Dataset [Dataset]. https://paperswithcode.com/dataset/disfl-qa
    Dataset updated
    Nov 15, 2022
    Authors
    Aditya Gupta; Jiacheng Xu; Shyam Upadhyay; Diyi Yang; Manaal Faruqui
    Description

    Disfl-QA is a targeted dataset for contextual disfluencies in an information seeking setting, namely question answering over Wikipedia passages. Disfl-QA builds upon the SQuAD-v2 dataset, where each question in the dev set is annotated to add a contextual disfluency using the paragraph as a source of distractors.

    The final dataset consists of ~12k (disfluent question, answer) pairs. Over 90% of the disfluencies are corrections or restarts, making it a much harder test set for disfluency correction. Disfl-QA aims to fill a major gap between the speech and NLP research communities. We hope the dataset can serve as a benchmark for testing the robustness of models against disfluent inputs.

  16. Vraag-en-antwoord dataset Rijksportaal Personeel

    • open.staging.dexspace.nl
    • staging.dexes.eu
    • +1 more
    json, pdf
    Updated Mar 16, 2025
    Cite
    P-Direkt (2025). Vraag-en-antwoord dataset Rijksportaal Personeel [Dataset]. https://open.staging.dexspace.nl/en/dataset/vraag-en-antwoord-dataset-rijksportaal-personeel
    Available download formats: pdf, json
    Dataset updated
    Mar 16, 2025
    Dataset authored and provided by
    P-Direkt
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    https://data.overheid.nl/dataset/vraag-en-antwoord-dataset-rijksportaal-personeel

    Description

    The dataset stores questions, answers, and documents. Each question has an answer, and the answer comes from a page of Rijksportaal Personeel (the Dutch central government intranet). The dataset can be used to train a question-answering model, so that the computer learns to answer questions in the context of P-Direkt. In total, 322 questions were used that had at some point been sent by e-mail to the P-Direkt contact center. The questions are very general and never ask about personal circumstances. The aim of the dataset was to test whether question-answering models could potentially be used in a P-Direkt environment. The structure of the dataset matches that of the SQuAD 2.0 dataset.

    Example:

    Question: Is it true that my IKB hours from 2020 expire if I do not take them?

    Answer: You can save your IKB hours in your IKB savings leave. IKB hours that you have not taken as leave and have not had paid out are added to your IKB savings leave at the end of December. Your IKB savings leave cannot expire.

    Source*: You can save your IKB hours in your IKB savings leave. IKB hours that you have not taken as leave and have not had paid out are added to your IKB savings leave at the end of December. Your IKB savings leave cannot expire. You cannot have your IKB savings leave paid out. Payout takes place only when you leave employment or in the event of death. You can save a maximum of 1800 hours. Do you work part-time, or more than 36 hours per week on average? Then the maximum number of hours you can save is calculated proportionally and rounded down to whole hours. Any remaining vacation hours from 2015 and the extra-statutory vacation hours you had left over from 2016 through 2019 were converted into IKB hours on 1 January 2020 and added to your IKB savings leave.

    * Note: the source is a snapshot of Rijksportaal Personeel from April 2021. Go to Rijksportaal Personeel on the intranet for current information on personnel matters.

  17. dev-v2-c_substitute-modified

    • huggingface.co
    Updated Apr 21, 2024
    Cite
    PAD (2024). dev-v2-c_substitute-modified [Dataset]. https://huggingface.co/datasets/PAD6/dev-v2-c_substitute-modified
    Dataset updated
    Apr 21, 2024
    Authors
    PAD
    Description

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

  18. dev-v2-addsent-ori

    • huggingface.co
    Updated May 1, 2024
    Cite
    PAD (2024). dev-v2-addsent-ori [Dataset]. https://huggingface.co/datasets/PAD6/dev-v2-addsent-ori
    Dataset updated
    May 1, 2024
    Authors
    PAD
    Description

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

  19. dev-v2-c_swap_middle-original

    • huggingface.co
    Updated May 14, 2024
    Cite
    dev-v2-c_swap_middle-original [Dataset]. https://huggingface.co/datasets/PAD6/dev-v2-c_swap_middle-original
    Dataset updated
    May 14, 2024
    Authors
    PAD
    Description

    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

  20. clean_squad_v2

    • huggingface.co
    Updated Jan 30, 2025
    Cite
    clean_squad_v2 [Dataset]. https://huggingface.co/datasets/decodingchris/clean_squad_v2
    Dataset updated
    Jan 30, 2025
    Authors
    Chris
    Description

    Clean SQuAD v2

    This is a refined version of the SQuAD v2 dataset. It has been preprocessed to ensure higher data quality and usability for NLP tasks such as Question Answering.

      Description
    

    The Clean SQuAD v2 dataset was created by applying preprocessing steps to the original SQuAD v2 dataset, including:

    Trimming whitespace: All leading and trailing spaces have been removed from the question field. Minimum question length: Questions with fewer than 12 characters were… See the full description on the dataset page: https://huggingface.co/datasets/decodingchris/clean_squad_v2.
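
    The two preprocessing steps named above can be sketched as follows. The description is truncated, so what was actually done with questions shorter than 12 characters is not stated here; this sketch assumes they were dropped. The function name `clean_questions` and the sample records are hypothetical.

```python
MIN_QUESTION_CHARS = 12  # threshold quoted in the description above


def clean_questions(records, min_chars=MIN_QUESTION_CHARS):
    """Strip whitespace from each question and filter out very short ones."""
    cleaned = []
    for record in records:
        # Step 1: remove leading and trailing spaces from the question field.
        question = record["question"].strip()
        # Step 2 (assumption): drop questions shorter than the minimum length.
        if len(question) < min_chars:
            continue
        # Copy the record so the input list is left unmodified.
        cleaned.append({**record, "question": question})
    return cleaned


sample = [
    {"question": "  What is SQuAD?  "},
    {"question": "Why?"},
]
result = clean_questions(sample)
print(result)
```

    Measuring the length after stripping keeps padded questions from slipping past the minimum-length check.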
