Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Dataset Card for Narrative QA
Dataset Summary
NarrativeQA is an English-language dataset of stories and corresponding questions designed to test reading comprehension, especially on long documents.
Supported Tasks and Leaderboards
The dataset is used to test reading comprehension. There are 2 tasks proposed in the paper: "summaries only" and "stories only", depending on whether the human-generated summary or the full story text is used to answer the question.… See the full description on the dataset page: https://huggingface.co/datasets/deepmind/narrativeqa.
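As a minimal sketch of the two settings, a small helper can select the context passage for a record. The field names (document["summary"]["text"] for summaries, document["text"] for the full story) are assumptions based on the deepmind/narrativeqa schema and should be verified against the dataset card:

```python
def pick_context(example: dict, task: str = "summaries") -> str:
    """Select the context for the "summaries only" or "stories only" task.

    Field names are assumed from the deepmind/narrativeqa schema.
    """
    doc = example["document"]
    if task == "summaries":
        # Human-written Wikipedia summary of the story.
        return doc["summary"]["text"]
    # Full story text (book or movie script).
    return doc["text"]

if __name__ == "__main__":
    # Requires `pip install datasets` and network access.
    from datasets import load_dataset

    ds = load_dataset("deepmind/narrativeqa", split="validation")
    context = pick_context(ds[0], task="summaries")
```

The same record can then be scored against its reference answers under either setting.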
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This repository contains the NarrativeQA dataset. It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
The Narrative QA Manual dataset is a reading comprehension dataset in which the reader must answer questions about stories by reading entire books or movie scripts. The QA tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. THIS DATASET REQUIRES A MANUALLY DOWNLOADED FILE! Because the script in the original repository downloads the stories from their original URLs every time, the links are sometimes broken or invalid. Therefore, you need to manually download the stories for this dataset using the script provided by the authors (https://github.com/deepmind/narrativeqa/blob/master/download_stories.sh). Running the shell script creates a folder named "tmp" in the root directory and downloads the stories there. The folder containing the stories can then be used to load the dataset via datasets.load_dataset("narrativeqa_manual", data_dir="<path/to/folder>").
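Assuming the "tmp" folder produced by download_stories.sh and the Hugging Face datasets package, the loading step can be sketched with a sanity check first; the folder path is a placeholder for wherever the script put the stories:

```python
from pathlib import Path


def stories_present(data_dir: Path) -> bool:
    """Check that the manually downloaded stories folder exists and is non-empty."""
    return data_dir.is_dir() and any(data_dir.iterdir())


if __name__ == "__main__":
    stories_dir = Path("tmp")  # created by download_stories.sh
    if not stories_present(stories_dir):
        raise SystemExit("Run download_stories.sh first; see the NarrativeQA repository.")
    # Requires `pip install datasets`.
    from datasets import load_dataset

    ds = load_dataset("narrativeqa_manual", data_dir=str(stories_dir))
```

Checking the folder up front gives a clearer error than letting the loading script fail midway through.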
sapienzanlp/narrativeqa dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Dataset Name
This dataset card aims to be a base template for new datasets. It has been generated using this raw template.
Dataset Details
Dataset Description
Curated by: [More Information Needed] Funded by [optional]: [More Information Needed] Shared by [optional]: [More Information Needed] Language(s) (NLP): [More Information Needed] License: [More Information Needed]
Dataset Sources [optional]
Repository: [More… See the full description on the dataset page: https://huggingface.co/datasets/testzin/narrativeqa.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
The dataset contains the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
ConTEB - NarrativeQA
This dataset is part of ConTEB (Context-aware Text Embedding Benchmark), designed for evaluating contextual embedding model capabilities. It stems from the widely used NarrativeQA dataset.
Dataset Summary
NarrativeQA (literature) consists of long documents, each associated with existing sets of question-answer pairs. To build the corpus, we start from the pre-existing collection of documents, extract the text, and chunk them (using LangChain's… See the full description on the dataset page: https://huggingface.co/datasets/illuin-conteb/narrative-qa.
phatvo/narrativeqa-test-raft dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains SQuAD and NarrativeQA dataset files.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
ForgetRetainBooks
This dataset is derived from the NarrativeQA dataset, created by Kocisky et al. (2018). NarrativeQA is a dataset for evaluating reading comprehension and narrative understanding. This dataset is an extraction of the book content from the original NarrativeQA dataset.
Citation
If you want to use this dataset, please also cite the original NarrativeQA dataset. @article{narrativeqa, author = {Tom\'a\v s Ko\v cisk\'y and Jonathan Schwarz and Phil Blunsom and… See the full description on the dataset page: https://huggingface.co/datasets/kqwang/copyrightBooks.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
CopyrightQA
This dataset is derived from the NarrativeQA dataset, created by Kocisky et al. (2018). NarrativeQA is a dataset for evaluating reading comprehension and narrative understanding. This dataset is an extraction of the question-answer pairs from the original NarrativeQA dataset. Its original use is to evaluate LLMs' forgetting ability using TOFU, created by Maini et al. (2024). TOFU is a benchmark for evaluating unlearning performance of LLMs on realistic tasks.… See the full description on the dataset page: https://huggingface.co/datasets/WARSO46/copyrightQA.
These are the question generation datasets collected by TextBox, including:
SQuAD (squadqg) CoQA (coqaqg) NewsQA (newsqa) HotpotQA (hotpotqa) MS MARCO (marco) MSQG (msqg) NarrativeQA (nqa) QuAC (quac).
The details and leaderboard for each dataset can be found on the TextBox page.
phatvo/narrativeqa-raft-50-p0.9 dataset hosted on Hugging Face and contributed by the HF Datasets community
Unknown license (https://choosealicense.com/licenses/unknown/)
LEMBNarrativeQARetrieval, an MTEB (Massive Text Embedding Benchmark) dataset
narrativeqa subset of dwzhu/LongEmbed dataset.
Task category t2t
Domains Fiction, Non-fiction, Written
Reference https://huggingface.co/datasets/dwzhu/LongEmbed
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code:

import mteb

task = mteb.get_tasks(["LEMBNarrativeQARetrieval"])
evaluator = mteb.MTEB(task)
model =… See the full description on the dataset page: https://huggingface.co/datasets/mteb/LEMBNarrativeQARetrieval.
NarrativeQARetrieval, an MTEB (Massive Text Embedding Benchmark) dataset
NarrativeQA is a dataset for the task of question answering on long narratives. It consists of realistic QA instances collected from literature (fiction and non-fiction) and movie scripts.
Task category t2t
Domains None
Reference https://metatext.io/datasets/narrativeqa
How to evaluate on this task
You can evaluate an embedding model on this dataset using the following code: import… See the full description on the dataset page: https://huggingface.co/datasets/mteb/NarrativeQARetrieval.
Document Question-Answering Dataset
This dataset combines and transforms the QASPER and NarrativeQA datasets into a unified format for document-based question answering tasks.
Dataset Description
This dataset is designed for training and evaluating models on document-level question answering with source attribution. Each entry contains:
A question about a document A corresponding answer Source text passages from the document that support the answer Position information… See the full description on the dataset page: https://huggingface.co/datasets/shreyashankar/doc-qa-rl-datasets.
Attribution-NonCommercial 2.0 (CC BY-NC 2.0) (https://creativecommons.org/licenses/by-nc/2.0/)
License information was derived automatically
Data Description
Here, we release the full long SFT training dataset of ChatQA2. It consists of two parts: long_sft and NarrativeQA_131072. The long_sft dataset is built and derived from existing datasets: LongAlpaca12k, GPT-4 samples from Open Orca, and Long Data Collections. The NarrativeQA_131072 dataset is synthetically generated from NarrativeQA by adding related paragraphs to the given ground-truth summary. For the first two training stages of ChatQA-2, we follow ChatQA-1.5. For… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA2-Long-SFT-data.
Other license (https://choosealicense.com/licenses/other/)
Data Description
We release the training dataset of ChatQA. It is built and derived from existing datasets: DROP, NarrativeQA, NewsQA, Quoref, ROPES, SQuAD1.1, SQuAD2.0, TAT-QA, an SFT dataset, as well as our synthetic conversational QA dataset generated by GPT-3.5-turbo-0613. The SFT dataset is built and derived from: Soda, ELI5, FLAN, the FLAN collection, Self-Instruct, Unnatural Instructions, OpenAssistant, and Dolly. For more information about ChatQA, check the website!
Other… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatQA-Training-Data.
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
This repository contains the PG-19 language modeling benchmark. It includes a set of books extracted from the Project Gutenberg books library that were published before 1919. It also contains metadata of book titles and publication dates.
PG-19 is over double the size of the Billion Word benchmark and contains documents that are 20X longer, on average, than the WikiText long-range language modelling benchmark. Books are partitioned into a train, validation, and test set. Book metadata is stored in metadata.csv which contains (book_id, short_book_title, publication_date).
Unlike prior benchmarks, we do not constrain the vocabulary size --- i.e. mapping rare words to an UNK token --- but instead release the data as an open-vocabulary benchmark. The only processing applied to the text is the removal of boilerplate license text and the mapping of offensive discriminatory words, as specified by Ofcom, to placeholder tokens. Users are free to model the data at the character level, subword level, or via any mechanism that can model an arbitrary string of text.

To compare models, we propose to continue measuring word-level perplexity, by calculating the total likelihood of the dataset (via any chosen subword vocabulary or character-based scheme) divided by the number of tokens --- specified below in the dataset statistics table.

One could use this dataset for benchmarking long-range language models, or use it to pre-train for other natural language processing tasks that require long-range reasoning, such as LAMBADA or NarrativeQA. We would not recommend using this dataset to train a general-purpose language model, e.g. for applications to a production-system dialogue agent, due to the dated linguistic style of old texts and the inherent biases present in historical writing.
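The proposed metric can be sketched as a small function: exponentiate the total negative log-likelihood (accumulated under any chosen subword or character scheme) divided by the reference word count from the statistics table. The function and variable names here are illustrative, not part of the benchmark's code:

```python
import math


def word_level_perplexity(token_log_likelihoods, num_words):
    """Word-level perplexity from per-token log-likelihoods.

    `token_log_likelihoods` may come from any tokenization; dividing by
    the number of *words* (rather than tokens) keeps scores comparable
    across different subword or character vocabularies.
    """
    total_ll = sum(token_log_likelihoods)
    return math.exp(-total_ll / num_words)
```

For example, four tokens each with probability 0.5 spanning two words give a word-level perplexity of exp(2 ln 2) = 4, regardless of how those words were tokenized.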