7 datasets found
  1. WikiBio Dataset (Wikipedia Biography Dataset)

    • paperswithcode.com
    Updated Nov 16, 2021
    Cite
    Remi Lebret; David Grangier; Michael Auli (2021). WikiBio Dataset [Dataset]. https://paperswithcode.com/dataset/wikibio
    Cited by
    405 scholarly articles (View in Google Scholar)
    Authors
    Remi Lebret; David Grangier; Michael Auli
    Description

    This dataset gathers 728,321 biographies from English Wikipedia and is intended for evaluating text-generation algorithms. For each article, the first paragraph and the infobox are provided, both tokenized.
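
    A minimal loading sketch follows. It assumes the dataset is mirrored on the Hugging Face Hub under the ID "wiki_bio", with "input_text" (infobox plus context) and "target_text" (first paragraph) fields; these identifiers are not confirmed on this page, so check the dataset card before relying on them.

    # Minimal sketch: load WikiBio with the Hugging Face `datasets` library.
    # The dataset ID and field names are assumptions (see note above).
    from datasets import load_dataset

    ds = load_dataset("wiki_bio", split="train")

    example = ds[0]
    print(example["input_text"])   # tokenized infobox table and context
    print(example["target_text"])  # tokenized first paragraph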

  2. wiki_bio

    • tensorflow.org
    • opendatalab.com
    Updated Dec 6, 2022
    Cite
    (2022). wiki_bio [Dataset]. https://www.tensorflow.org/datasets/catalog/wiki_bio
    Description

    WikiBio is constructed from Wikipedia biography pages; it contains the first paragraph and the infobox, both tokenized. The dataset follows a standardized table format.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the training split of WikiBio from the TFDS catalog.
    ds = tfds.load('wiki_bio', split='train')

    # Inspect the first few examples.
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  3. wiki-bio-gpt3-hallucination

    • opendatalab.com
    • huggingface.co
    Available download formats: zip
    Updated Jul 2, 2023
    Cite
    University of Cambridge (2023). wiki-bio-gpt3-hallucination [Dataset]. https://opendatalab.com/OpenDataLab/wiki-bio-gpt3-hallucination
    Dataset provided by
    University of Cambridge
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    We generate Wikipedia-like passages with GPT-3 (text-davinci-003) using the prompt "This is a Wikipedia passage about {concept}", where concept is an individual from the WikiBio dataset. We split the generated passages into sentences and annotate each sentence with one of three labels: (1) accurate, (2) minor_inaccurate, (3) major_inaccurate. We report the data statistics, annotation process, and inter-annotator agreement in our paper.
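
    As a usage sketch, the snippet below tallies the sentence-level labels. It assumes the Hugging Face mirror listed above is published under the repo ID "potsawee/wiki_bio_gpt3_hallucination" with an "evaluation" split and an "annotation" field holding one label per sentence; none of these names are confirmed on this page, so verify them against the dataset card.

    # Minimal sketch: count sentence-level hallucination labels.
    # Repo ID, split name, and field name are assumptions (see note above).
    from collections import Counter
    from datasets import load_dataset

    ds = load_dataset("potsawee/wiki_bio_gpt3_hallucination", split="evaluation")

    counts = Counter()
    for passage in ds:
        counts.update(passage["annotation"])  # one label per generated sentence
    print(counts)  # totals for accurate / minor_inaccurate / major_inaccurate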

  4. WikiBioCTE

    • opendatalab.com
    • paperswithcode.com
    Available download formats: zip
    Updated Apr 19, 2023
    Cite
    University of Oxford (2023). WikiBioCTE [Dataset]. https://opendatalab.com/OpenDataLab/WikiBioCTE
    Dataset provided by
    University of Oxford
    Description

    WikiBioCTE is a dataset for controllable text edition, built on the existing WikiBio dataset (originally created for table-to-text generation). In this task, the input is a long text, a question, and a target answer, and the output is a minimally modified text that fits the target answer. The task matters in many situations, such as changing conditions, consequences, or properties in a legal document, or changing key information about an event in a news text.
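
    To make the input/output contract concrete, here is a purely illustrative record; the field names and values are hypothetical and do not reflect the actual WikiBioCTE file schema, which should be checked in the downloaded data.

    # Purely illustrative: one controllable-text-edition example as described
    # above. Field names and values are hypothetical, not the real schema.
    example = {
        "source_text": "John Smith (born 1970) is an English footballer ...",
        "question": "When was John Smith born?",
        "target_answer": "1975",
        # Expected output: the source text, minimally edited to fit the answer.
        "edited_text": "John Smith (born 1975) is an English footballer ...",
    }
    print(example["edited_text"])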

  5. KnowEdit

    • huggingface.co
    Updated Jun 17, 2025
    Cite
    ZJUNLP (2025). KnowEdit [Dataset]. https://huggingface.co/datasets/zjunlp/KnowEdit
    Available download formats
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset authored and provided by
    ZJUNLP
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    KnowEdit: A Benchmark of Knowledge Editing for LLMs

    This README covers reproducing the paper A Comprehensive Study of Knowledge Editing for Large Language Models. You can use EasyEdit to load and use this benchmark.

    ❗️❗️ Note that KnowEdit is constructed by re-organizing and extending existing datasets, including WikiBio, ZsRE, WikiDataCounterfact, WikiDataRecent, ConvSent, and Sanitation, to provide a comprehensive evaluation of knowledge editing. Special thanks to the builders and… See the full description on the dataset page: https://huggingface.co/datasets/zjunlp/KnowEdit.
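
    The card recommends EasyEdit for loading and evaluation; if you only need the raw files, a minimal sketch follows. It assumes the benchmark is distributed as plain data files in the zjunlp/KnowEdit dataset repo linked above rather than through a standard datasets loading script.

    # Minimal sketch: fetch the KnowEdit files from the Hugging Face Hub.
    # Assumes the benchmark ships as raw data files in the dataset repo;
    # actual loading and editing are then done with EasyEdit, as the card suggests.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(repo_id="zjunlp/KnowEdit", repo_type="dataset")
    print("KnowEdit files downloaded to:", local_dir)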

  6. KnowEdit Dataset

    • paperswithcode.com
    Updated Mar 2, 2025
    Cite
    Ningyu Zhang; Yunzhi Yao; Bozhong Tian; Peng Wang; Shumin Deng; Mengru Wang; Zekun Xi; Shengyu Mao; Jintian Zhang; Yuansheng Ni; Siyuan Cheng; Ziwen Xu; Xin Xu; Jia-Chen Gu; Yong Jiang; Pengjun Xie; Fei Huang; Lei Liang; Zhiqiang Zhang; Xiaowei Zhu; Jun Zhou; Huajun Chen (2025). KnowEdit Dataset [Dataset]. https://paperswithcode.com/dataset/knowedit
    Authors
    Ningyu Zhang; Yunzhi Yao; Bozhong Tian; Peng Wang; Shumin Deng; Mengru Wang; Zekun Xi; Shengyu Mao; Jintian Zhang; Yuansheng Ni; Siyuan Cheng; Ziwen Xu; Xin Xu; Jia-Chen Gu; Yong Jiang; Pengjun Xie; Fei Huang; Lei Liang; Zhiqiang Zhang; Xiaowei Zhu; Jun Zhou; Huajun Chen
    Description

    This is the dataset for knowledge editing. It contains six tasks: ZsRE, Wiki_recent, Wiki_counterfact, WikiBio, ConvSent, and Sanitation. This repo provides the first four tasks; the data for ConvSent and Sanitation can be obtained from their original papers.

  7. raw-wiki-bio-filtered

    • huggingface.co
    Cite
    Najma Dwiharani, raw-wiki-bio-filtered [Dataset]. https://huggingface.co/datasets/najmharani/raw-wiki-bio-filtered
    Authors
    Najma Dwiharani
    Description

    Dataset Card for "raw-wiki-bio-filtered"

    More Information needed

