    WikiSum Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Apr 9, 2023
    Cite
    Peter J. Liu; Mohammad Saleh; Etienne Pot; Ben Goodrich; Ryan Sepassi; Lukasz Kaiser; Noam Shazeer (2023). WikiSum Dataset [Dataset]. https://paperswithcode.com/dataset/wikisum
    Authors
    Peter J. Liu; Mohammad Saleh; Etienne Pot; Ben Goodrich; Ryan Sepassi; Lukasz Kaiser; Noam Shazeer
    Description

    WikiSum is a dataset based on English Wikipedia, suited to multi-document abstractive summarization. In each instance, the input consists of a Wikipedia topic (the article title) and a collection of non-Wikipedia reference documents, and the target is the Wikipedia article text. The dataset is restricted to articles with at least one crawlable citation. The official split divides the articles roughly 80/10/10 into train/development/test subsets, yielding 1,865,750, 233,252, and 232,998 examples respectively.
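
As a quick sanity check on the "roughly 80/10/10" figure, the split proportions can be recomputed from the three example counts quoted in the description. This is only an illustrative sketch: the names `SPLIT_SIZES` and `split_fractions` are made up for this example and are not part of any WikiSum tooling.

```python
# Example counts exactly as quoted in the dataset description.
# SPLIT_SIZES and split_fractions are hypothetical names used for illustration.
SPLIT_SIZES = {"train": 1865750, "development": 233252, "test": 232998}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


total = sum(SPLIT_SIZES.values())
fractions = split_fractions(SPLIT_SIZES)

# Print one row per split: name, example count, share of the total.
for name, frac in fractions.items():
    print(f"{name:<12}{SPLIT_SIZES[name]:>10,}  {frac:6.2%}")
print(f"{'total':<12}{total:>10,}")
```

The counts sum to 2,332,000 examples, and each share rounds to the stated 80%, 10%, and 10%.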


    280 scholarly articles cite this dataset (per Google Scholar).