This dataset gathers 728,321 biographies from English Wikipedia and is intended for evaluating text generation algorithms. For each article, we provide the first paragraph and the infobox (both tokenized).
WikiBio is constructed from Wikipedia biography pages; for each page it contains the tokenized first paragraph and infobox. The dataset follows a standardized table format.
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split of WikiBio.
ds = tfds.load('wiki_bio', split='train')

# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
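As a rough illustration of the table format, the sketch below walks one example's infobox rows and first paragraph. It assumes the feature names used by the TFDS wiki_bio builder (input_text with a table of column_header/content pairs, and target_text); check the dataset's feature spec if they differ.

import tensorflow_datasets as tfds

ds = tfds.load('wiki_bio', split='train')
for ex in ds.take(1):
    # Assumed layout: 'input_text' holds the infobox table, 'target_text'
    # holds the tokenized first paragraph.
    table = ex['input_text']['table']
    for header, content in zip(table['column_header'].numpy(),
                               table['content'].numpy()):
        # Each table row is a (column_header, content) pair from the infobox.
        print(header.decode(), '->', content.decode())
    print('First paragraph:', ex['target_text'].numpy().decode())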
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We generate Wikipedia-like passages with GPT-3 (text-davinci-003) using the prompt "This is a Wikipedia passage about {concept}", where {concept} is an individual from the WikiBio dataset. We split the generated passages into sentences and annotate each sentence with one of three labels: (1) accurate, (2) minor_inaccurate, (3) major_inaccurate. We report the data statistics, annotation process, and inter-annotator agreement in our paper.
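As an illustration of how such sentence-level labels might be aggregated into per-passage statistics, here is a minimal sketch; the list of labels is a made-up example, not a record from the released data.

from collections import Counter

# Hypothetical sentence-level labels for one generated passage.
labels = ["accurate", "minor_inaccurate", "major_inaccurate", "accurate"]

counts = Counter(labels)
total = len(labels)
for label in ("accurate", "minor_inaccurate", "major_inaccurate"):
    # Fraction of sentences in this passage carrying each label.
    print(f"{label}: {counts[label] / total:.2f}")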
WikiBioCTE is a dataset for controllable text edition built on the existing WikiBio dataset (originally created for table-to-text generation). In the controllable text edition task, the input is a long text, a question, and a target answer, and the output is a minimally modified text that fits the target answer. The task matters in many situations, such as changing conditions, consequences, or properties in a legal document, or updating key information about an event in a news article.
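To make the input/output format concrete, below is a minimal sketch of what a single WikiBioCTE-style record could look like; the field names and values are illustrative assumptions, not the dataset's actual schema.

# Hypothetical record; the real WikiBioCTE fields may be named differently.
example = {
    "text": "John Smith (born 1950) is a retired American footballer ...",
    "question": "When was John Smith born?",
    "target_answer": "1952",
    # Expected output: the same text, minimally edited to fit the answer.
    "edited_text": "John Smith (born 1952) is a retired American footballer ...",
}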
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
KnowEdit: A Benchmark of Knowledge Editing for LLMs
This README covers reproducing the paper A Comprehensive Study of Knowledge Editing for Large Language Models. You can use EasyEdit to load and work with this benchmark.
❗️❗️ Note that KnowEdit is constructed by re-organizing and extending existing datasets including WikiBio, ZsRE, WikiDataCounterfact, WikiDataRecent, ConvSent, and Sanitation to provide a comprehensive evaluation of knowledge editing. Special thanks to the builders and… See the full description on the dataset page: https://huggingface.co/datasets/zjunlp/KnowEdit.
This is the dataset for knowledge editing. It contains six tasks: ZsRE, $Wiki_{recent}$, $Wiki_{counterfact}$, WikiBio, ConvSent, and Sanitation. This repo covers the first four tasks; the data for ConvSent and Sanitation can be obtained from their original papers.
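As a minimal sketch of how a released task file might be inspected outside EasyEdit, assuming each task ships as a JSON list of edit records (the file path and record layout here are assumptions; consult the dataset card and EasyEdit docs for the actual format):

import json

# Hypothetical file name for the WikiBio task split.
with open("WikiBio/wikibio-test.json") as f:
    records = json.load(f)

for rec in records[:3]:
    # Print the keys of the first few edit records to see the schema.
    print(sorted(rec.keys()))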