19 datasets found
  1. narrativeqa

    • huggingface.co
    Updated Jun 3, 2024
    Cite
    Deepmind (2024). narrativeqa [Dataset]. https://huggingface.co/datasets/deepmind/narrativeqa
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 3, 2024
    Dataset provided by
    DeepMind (http://deepmind.com/)
    Authors
    Deepmind
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset Card for Narrative QA

      Dataset Summary
    

    NarrativeQA is an English-language dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents.

      Supported Tasks and Leaderboards
    

    The dataset is used to test reading comprehension. There are 2 tasks proposed in the paper: "summaries only" and "stories only", depending on whether the human-generated summary or the full story text is used to answer the question.… See the full description on the dataset page: https://huggingface.co/datasets/deepmind/narrativeqa.
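
The two tasks differ only in which text is given to the reader as context. A minimal sketch of that choice follows; the nested field names (`document`, `summary`, `text`) and the task labels are assumptions made for illustration and should be checked against the dataset page before use.

```python
def select_context(record: dict, task: str) -> str:
    """Return the context text for one of the two NarrativeQA tasks.

    The field layout used here is an assumption for illustration only.
    """
    document = record["document"]
    if task == "summaries_only":
        # Answer from the human-generated summary.
        return document["summary"]["text"]
    if task == "stories_only":
        # Answer from the full story text.
        return document["text"]
    raise ValueError(f"unknown task: {task!r}")


# Toy record with the assumed layout (not real dataset content).
toy_record = {
    "document": {
        "summary": {"text": "A short summary."},
        "text": "The full story text, usually much longer.",
    },
    "question": {"text": "Who is the main character?"},
}

summary_context = select_context(toy_record, "summaries_only")
story_context = select_context(toy_record, "stories_only")
```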

  2. Data from: The NarrativeQA Reading Comprehension Challenge Dataset

    • github.com
    Updated Dec 21, 2017
    Cite
    DeepMind (2017). The NarrativeQA Reading Comprehension Challenge Dataset [Dataset]. https://github.com/google-deepmind/narrativeqa
    Explore at:
    Dataset updated
    Dec 21, 2017
    Dataset provided by
    DeepMind (http://deepmind.com/)
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.

  3. narrativeqa_manual

    • huggingface.co
    Updated May 29, 2024
    Cite
    Deepmind (2024). narrativeqa_manual [Dataset]. https://huggingface.co/datasets/deepmind/narrativeqa_manual
    Explore at:
    Dataset updated
    May 29, 2024
    Dataset provided by
    DeepMind (http://deepmind.com/)
    Authors
    Deepmind
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The Narrative QA Manual dataset is a reading comprehension dataset, in which the reader must answer questions about stories by reading entire books or movie scripts. The QA tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. THIS DATASET REQUIRES A MANUALLY DOWNLOADED FILE! Because a script in the original repository downloads the stories from their original URLs every time, the links are sometimes broken or invalid. Therefore, you need to manually download the stories for this dataset using the script provided by the authors (https://github.com/deepmind/narrativeqa/blob/master/download_stories.sh). Running the shell script creates a folder named "tmp" in the root directory and downloads the stories there. This folder containing the stories can be used to load the dataset via datasets.load_dataset("narrativeqa_manual", data_dir="<path/to/folder>").
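
The loading step described above can be sketched as a small wrapper that checks for the manually downloaded folder before calling the loader. The helper name `load_narrativeqa_manual` is hypothetical; only the `datasets.load_dataset("narrativeqa_manual", data_dir=...)` call comes from the description, and whether it still works may depend on the installed version of the `datasets` library.

```python
from pathlib import Path


def load_narrativeqa_manual(stories_dir: str):
    """Load narrativeqa_manual from a manually downloaded stories folder.

    Hypothetical helper: it fails early with a clear error when the
    folder created by download_stories.sh is missing.
    """
    path = Path(stories_dir).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(
            f"Stories folder not found: {path}. "
            "Run download_stories.sh first and pass the resulting folder."
        )
    # Imported lazily so the folder check works without the library installed.
    from datasets import load_dataset

    return load_dataset("narrativeqa_manual", data_dir=str(path))
```

A call such as `load_narrativeqa_manual("tmp")` would then be made from the directory where the shell script was run.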

  4. narrativeqa

    • huggingface.co
    Cite
    Sapienza NLP, Sapienza University of Rome, narrativeqa [Dataset]. https://huggingface.co/datasets/sapienzanlp/narrativeqa
    Explore at:
    Dataset authored and provided by
    Sapienza NLP, Sapienza University of Rome
    Description

    sapienzanlp/narrativeqa dataset hosted on Hugging Face and contributed by the HF Datasets community

  5. narrativeqa

    • huggingface.co
    Updated Apr 24, 2024
    + more versions
    Cite
    teste1 (2024). narrativeqa [Dataset]. https://huggingface.co/datasets/testzin/narrativeqa
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 24, 2024
    Dataset authored and provided by
    teste1
    Description

    Dataset Card for Dataset Name

    This dataset card aims to be a base template for new datasets. It has been generated using this raw template.

      Dataset Details
    
    
    
    
    
      Dataset Description
    

    Curated by: [More Information Needed]; Funded by [optional]: [More Information Needed]; Shared by [optional]: [More Information Needed]; Language(s) (NLP): [More Information Needed]; License: [More Information Needed]

      Dataset Sources [optional]
    

    Repository: [More… See the full description on the dataset page: https://huggingface.co/datasets/testzin/narrativeqa.

  6. NarrativeQA

    • live.european-language-grid.eu
    csv
    Updated Dec 30, 2017
    Cite
    (2017). NarrativeQA [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/5044
    Explore at:
    Available download formats: csv
    Dataset updated
    Dec 30, 2017
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset contains the list of documents with Wikipedia summaries, links to full stories, and questions and answers.

  7. narrative-qa

    • huggingface.co
    Updated Jun 2, 2025
    Cite
    Contextualized Document Embedding Benchmark (2025). narrative-qa [Dataset]. https://huggingface.co/datasets/illuin-conteb/narrative-qa
    Explore at:
    Dataset updated
    Jun 2, 2025
    Dataset authored and provided by
    Contextualized Document Embedding Benchmark
    Description

    ConTEB - NarrativeQA

    This dataset is part of ConTEB (Context-aware Text Embedding Benchmark), designed for evaluating contextual embedding model capabilities. It stems from the widely used NarrativeQA dataset.

      Dataset Summary
    

    NarrativeQA (literature) consists of long documents associated with existing sets of question-answer pairs. To build the corpus, we start from the pre-existing collection of documents, extract the text, and chunk them (using LangChain's… See the full description on the dataset page: https://huggingface.co/datasets/illuin-conteb/narrative-qa.

  8. narrativeqa-test-raft

    • huggingface.co
    Updated Aug 19, 2024
    + more versions
    Cite
    phat (2024). narrativeqa-test-raft [Dataset]. https://huggingface.co/datasets/phatvo/narrativeqa-test-raft
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 19, 2024
    Authors
    phat
    Description

    phatvo/narrativeqa-test-raft dataset hosted on Hugging Face and contributed by the HF Datasets community

  9. dataset

    • data.mendeley.com
    Updated Oct 4, 2023
    Cite
    Vignesh A (2023). dataset [Dataset]. http://doi.org/10.17632/cpp3bx8ghd.1
    Explore at:
    Dataset updated
    Oct 4, 2023
    Authors
    Vignesh A
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset contains SQuAD and NarrativeQA dataset files.

  10. copyrightBooks

    • huggingface.co
    Updated Nov 25, 2024
    + more versions
    Cite
    Kangqi Wang (2024). copyrightBooks [Dataset]. https://huggingface.co/datasets/kqwang/copyrightBooks
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 25, 2024
    Authors
    Kangqi Wang
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    ForgetRetainBooks

    This dataset is derived from the NarrativeQA dataset, created by Kocisky et al. (2018). NarrativeQA is a dataset for evaluating reading comprehension and narrative understanding. This dataset is an extraction of the book content from the original NarrativeQA dataset.

      Citation
    

    If you want to use this dataset, please also cite the original NarrativeQA dataset. @article{narrativeqa, author = {Tom\'a\v s Ko\v cisk\'y and Jonathan Schwarz and Phil Blunsom and… See the full description on the dataset page: https://huggingface.co/datasets/kqwang/copyrightBooks.

  11. copyrightQA

    • huggingface.co
    + more versions
    Cite
    Wilson Wei, copyrightQA [Dataset]. https://huggingface.co/datasets/WARSO46/copyrightQA
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    Wilson Wei
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    CopyrightQA

    This dataset is derived from the NarrativeQA dataset, created by Kocisky et al. (2018). NarrativeQA is a dataset for evaluating reading comprehension and narrative understanding. This dataset is an extraction of the question-answer pairs from the original NarrativeQA dataset. Its original use is to evaluate LLMs' forgetting ability using TOFU, created by Maini et al. (2024). TOFU is a benchmark for evaluating unlearning performance of LLMs on realistic tasks.… See the full description on the dataset page: https://huggingface.co/datasets/WARSO46/copyrightQA.

  12. Question-Generation

    • hf.qhduan.com
    Updated Jul 13, 2024
    Cite
    AI Box (2024). Question-Generation [Dataset]. https://hf.qhduan.com/datasets/RUCAIBox/Question-Generation
    Explore at:
    Dataset updated
    Jul 13, 2024
    Dataset authored and provided by
    AI Box
    Description

    These are the question generation datasets collected by TextBox, including:

    SQuAD (squadqg), CoQA (coqaqg), NewsQA (newsqa), HotpotQA (hotpotqa), MS MARCO (marco), MSQG (msqg), NarrativeQA (nqa), and QuAC (quac).

    Details and leaderboards for each dataset can be found on the TextBox page.

  13. narrativeqa-raft-50-p0.9

    • huggingface.co
    Cite
    phat, narrativeqa-raft-50-p0.9 [Dataset]. https://huggingface.co/datasets/phatvo/narrativeqa-raft-50-p0.9
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Authors
    phat
    Description

    phatvo/narrativeqa-raft-50-p0.9 dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. LEMBNarrativeQARetrieval

    • huggingface.co
    Updated May 10, 2025
    Cite
    Massive Text Embedding Benchmark (2025). LEMBNarrativeQARetrieval [Dataset]. https://huggingface.co/datasets/mteb/LEMBNarrativeQARetrieval
    Explore at:
    Dataset updated
    May 10, 2025
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    License

    https://choosealicense.com/licenses/unknown/

    Description

    LEMBNarrativeQARetrieval An MTEB dataset Massive Text Embedding Benchmark

    narrativeqa subset of dwzhu/LongEmbed dataset.

    Task category t2t

    Domains Fiction, Non-fiction, Written

    Reference https://huggingface.co/datasets/dwzhu/LongEmbed

      How to evaluate on this task
    

    You can evaluate an embedding model on this dataset using the following code:

        import mteb

        task = mteb.get_tasks(["LEMBNarrativeQARetrieval"])
        evaluator = mteb.MTEB(task)

    model =… See the full description on the dataset page: https://huggingface.co/datasets/mteb/LEMBNarrativeQARetrieval.

  15. NarrativeQARetrieval

    • huggingface.co
    Updated Feb 1, 2001
    Cite
    Massive Text Embedding Benchmark (2001). NarrativeQARetrieval [Dataset]. https://huggingface.co/datasets/mteb/NarrativeQARetrieval
    Explore at:
    Dataset updated
    Feb 1, 2001
    Dataset authored and provided by
    Massive Text Embedding Benchmark
    Description

    NarrativeQARetrieval An MTEB dataset Massive Text Embedding Benchmark

    NarrativeQA is a dataset for the task of question answering on long narratives. It consists of realistic QA instances collected from literature (fiction and non-fiction) and movie scripts.

    Task category t2t

    Domains None

    Reference https://metatext.io/datasets/narrativeqa

      How to evaluate on this task
    

    You can evaluate an embedding model on this dataset using the following code: import… See the full description on the dataset page: https://huggingface.co/datasets/mteb/NarrativeQARetrieval.

  16. doc-qa-rl-datasets

    • huggingface.co
    Cite
    Shreya Shankar, doc-qa-rl-datasets [Dataset]. https://huggingface.co/datasets/shreyashankar/doc-qa-rl-datasets
    Explore at:
    Authors
    Shreya Shankar
    Description

    Document Question-Answering Dataset

    This dataset combines and transforms the QASPER and NarrativeQA datasets into a unified format for document-based question answering tasks.

      Dataset Description
    

    This dataset is designed for training and evaluating models on document-level question answering with source attribution. Each entry contains:

    A question about a document; a corresponding answer; source text passages from the document that support the answer; position information… See the full description on the dataset page: https://huggingface.co/datasets/shreyashankar/doc-qa-rl-datasets.

  17. ChatQA2-Long-SFT-data

    • huggingface.co
    Updated Jul 21, 2025
    Cite
    NVIDIA (2025). ChatQA2-Long-SFT-data [Dataset]. https://huggingface.co/datasets/nvidia/ChatQA2-Long-SFT-data
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 21, 2025
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    Attribution-NonCommercial 2.0 (CC BY-NC 2.0) (https://creativecommons.org/licenses/by-nc/2.0/)
    License information was derived automatically

    Description

    Data Description

    Here, we release the full long SFT training dataset of ChatQA2. It consists of two parts: long_sft and NarrativeQA_131072. The long_sft dataset is built and derived from existing datasets: LongAlpaca12k, GPT-4 samples from Open Orca, and Long Data Collections. The NarrativeQA_131072 dataset is synthetically generated from NarrativeQA by adding related paragraphs to the given ground truth summary. For the first two training steps of ChatQA-2, we follow ChatQA1.5. For… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA2-Long-SFT-data.

  18. ChatQA-Training-Data

    • huggingface.co
    Updated Jun 30, 2023
    Cite
    NVIDIA (2023). ChatQA-Training-Data [Dataset]. https://huggingface.co/datasets/nvidia/ChatQA-Training-Data
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 30, 2023
    Dataset provided by
    Nvidia (http://nvidia.com/)
    Authors
    NVIDIA
    License

    https://choosealicense.com/licenses/other/

    Description

    Data Description

    We release the training dataset of ChatQA. It is built and derived from existing datasets: DROP, NarrativeQA, NewsQA, Quoref, ROPES, SQuAD1.1, SQuAD2.0, TAT-QA, an SFT dataset, as well as our synthetic conversational QA dataset generated by GPT-3.5-turbo-0613. The SFT dataset is built and derived from: Soda, ELI5, FLAN, the FLAN collection, Self-Instruct, Unnatural Instructions, OpenAssistant, and Dolly. For more information about ChatQA, check the website!

      Other… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA-Training-Data.
    
  19. pg19

    • huggingface.co
    • tensorflow.org
    • +1 more
    Updated May 25, 2024
    Cite
    Deepmind (2024). pg19 [Dataset]. https://huggingface.co/datasets/deepmind/pg19
    Explore at:
    Dataset updated
    May 25, 2024
    Dataset provided by
    DeepMind (http://deepmind.com/)
    Authors
    Deepmind
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This repository contains the PG-19 language modeling benchmark. It includes a set of books extracted from the Project Gutenberg books library, that were published before 1919. It also contains metadata of book titles and publication dates.

    PG-19 is over double the size of the Billion Word benchmark and contains documents that are 20X longer, on average, than the WikiText long-range language modelling benchmark. Books are partitioned into a train, validation, and test set. Book metadata is stored in metadata.csv which contains (book_id, short_book_title, publication_date).

    Unlike prior benchmarks, we do not constrain the vocabulary size --- i.e. mapping rare words to an UNK token --- but instead release the data as an open-vocabulary benchmark. The only processing of the text that has been applied is the removal of boilerplate license text, and the mapping of offensive discriminatory words as specified by Ofcom to placeholder tokens. Users are free to model the data at the character-level, subword-level, or via any mechanism that can model an arbitrary string of text. To compare models we propose to continue measuring the word-level perplexity, by calculating the total likelihood of the dataset (via any chosen subword vocabulary or character-based scheme) divided by the number of tokens --- specified below in the dataset statistics table. One could use this dataset for benchmarking long-range language models, or use it to pre-train for other natural language processing tasks which require long-range reasoning, such as LAMBADA or NarrativeQA. We would not recommend using this dataset to train a general-purpose language model, e.g. for applications to a production-system dialogue agent, due to the dated linguistic style of old texts and the inherent biases present in historical writing.
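
The word-level perplexity described above can be sketched as follows: sum the log-likelihood over whatever units the model predicts (characters or subwords), then normalize by the number of word-level tokens rather than by the number of model units. The function name and the use of natural logarithms are assumptions made for illustration; the official word counts are the ones given in the dataset statistics table.

```python
import math
from typing import Iterable


def word_level_perplexity(unit_log_probs: Iterable[float], num_words: int) -> float:
    """Perplexity normalized by word count.

    unit_log_probs: natural-log probabilities the model assigned to each
        predicted unit (character, subword, ...), covering the whole text.
    num_words: number of word-level tokens in the same text.
    """
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    total_negative_log_likelihood = -sum(unit_log_probs)
    return math.exp(total_negative_log_likelihood / num_words)


# Toy example: 4 subword units, each predicted with probability 0.5,
# spanning 2 words.
toy_perplexity = word_level_perplexity([math.log(0.5)] * 4, num_words=2)
```

Because the normalizer is the word count, models with different subword vocabularies stay comparable: splitting a word into more pieces changes how many terms are summed but not the denominator.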
