Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Dataset Card for WikiBio GPT-3 Hallucination Dataset
GitHub repository: https://github.com/potsawee/selfcheckgpt
Paper: SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Dataset Summary
We generate Wikipedia-like passages with GPT-3 (text-davinci-003) using the prompt "This is a Wikipedia passage about {concept}", where concept is an individual from the WikiBio dataset. We split the generated passages into… See the full description on the dataset page: https://huggingface.co/datasets/potsawee/wiki_bio_gpt3_hallucination.
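The prompt construction described above can be sketched as follows (the `build_prompt` helper is illustrative, not taken from the SelfCheckGPT codebase):

```python
# Illustrative sketch of the GPT-3 generation prompt described above;
# build_prompt is a hypothetical helper, not part of the SelfCheckGPT code.
def build_prompt(concept: str) -> str:
    """Format the generation prompt for one WikiBio individual."""
    return f"This is a Wikipedia passage about {concept}"

print(build_prompt("Marie Curie"))
# This is a Wikipedia passage about Marie Curie
```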
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
The dataset contains 728,321 biographies from Wikipedia. For each article, it provides the first paragraph and the infobox (both tokenized).
WikiBio is constructed from Wikipedia biography pages; it contains the first paragraph and the infobox, both tokenized. The dataset follows a standardized table format.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('wiki_bio', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
These are the data-to-text generation datasets collected by TextBox, including:
- WebNLG v2.1 (webnlg)
- WebNLG v3.0 (webnlg2)
- WikiBio (wikibio)
- E2E (e2e)
- DART (dart)
- ToTTo (totto)
- ENT-DESC (ent)
- AGENDA (agenda)
- GenWiki (genwiki)
- TEKGEN (tekgen)
- LogicNLG (logicnlg)
- WikiTableT (wikit)
- WEATHERGOV (wg)
The details and leaderboard of each dataset can be found on the TextBox page.
WikiBioCTE is a dataset for controllable text editing based on the existing WikiBio dataset (originally created for table-to-text generation). In controllable text editing, the input is a long text, a question, and a target answer, and the output is a minimally modified text that fits the target answer. This task is important in many situations, such as changing conditions, consequences, or properties in a legal document, or changing key information about an event in a news text.
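As a rough sketch, one instance of this task might look like the following (the field names and values are hypothetical, not the actual WikiBioCTE schema):

```python
# Hypothetical example of the controllable text editing task: the output
# changes only what is needed to make the text fit the target answer.
instance = {
    "text": "John Smith (born 1970) is an American novelist.",
    "question": "When was John Smith born?",
    "target_answer": "1975",
}
expected_output = "John Smith (born 1975) is an American novelist."

# The edit is minimal: only the answer span in the text changes.
assert instance["text"].replace("1970", "1975") == expected_output
```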
Dataset Card for ICE
This dataset is used for our work In-Context Editing: Learning Knowledge from Self-Induced Distributions; our code has been released on GitHub here.
Dataset Sources
Our dataset is constructed from KnowEdit, and we generate contexts for each example using GPT-4o.
Dataset Structure
We evaluate our method on four datasets: WikiData-Recent, ZsRE, WikiBio, and WikiData-Counterfact. These datasets encompass two knowledge-editing tasks… See the full description on the dataset page: https://huggingface.co/datasets/Yofuria/ICE.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This is the WikiBio dataset released as part of the IndicNLG Suite. Each example has four fields: id, infobox, serialized infobox, and summary. We created this dataset in nine languages: as, bn, hi, kn, ml, or, pa, ta, and te. The total size of the dataset is 57,426 examples.
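For illustration, a single example could be represented as follows (the values are placeholders, and the exact field spellings may differ on the dataset page):

```python
# Hypothetical sketch of one IndicNLG WikiBio example with the four
# fields named above; the values here are placeholders, not real data.
example = {
    "id": "bn-000001",
    "infobox": "name_1:rabindranath name_2:tagore occupation_1:poet",
    "serialized_infobox": "<name> rabindranath tagore <occupation> poet",
    "summary": "(placeholder summary in the target language)",
}
assert list(example) == ["id", "infobox", "serialized_infobox", "summary"]
```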
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
KnowEdit: A Benchmark of Knowledge Editing for LLMs
This README describes reproducing the paper A Comprehensive Study of Knowledge Editing for Large Language Models. You can use EasyEdit to load and use this benchmark.
❗️❗️ Note that KnowEdit is constructed by re-organizing and extending existing datasets, including WikiBio, ZsRE, WikiData-Counterfact, WikiData-Recent, ConvSent, and Sanitation, to enable a comprehensive evaluation of knowledge editing. Special thanks to the builders and… See the full description on the dataset page: https://huggingface.co/datasets/zjunlp/KnowEdit.