This dataset gathers 728,321 biographies from English Wikipedia and is intended for evaluating text generation algorithms. For each article, we provide the first paragraph and the infobox (both tokenized).
WikiBio is constructed from Wikipedia biography pages; for each page it contains the tokenized first paragraph and infobox. The dataset follows a standardized table format.
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split of WikiBio.
ds = tfds.load('wiki_bio', split='train')

# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
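As a rough illustration of the table format, the sketch below walks one example's infobox rows and first paragraph. It assumes the feature names used by the TFDS wiki_bio builder (input_text with a table of column_header/content pairs, and target_text); check the dataset's feature spec if they differ.

import tensorflow_datasets as tfds

ds = tfds.load('wiki_bio', split='train')
for ex in ds.take(1):
    # Assumed layout: 'input_text' holds the infobox table, 'target_text'
    # holds the tokenized first paragraph.
    table = ex['input_text']['table']
    for header, content in zip(table['column_header'].numpy(),
                               table['content'].numpy()):
        # Each table row is a (column_header, content) pair from the infobox.
        print(header.decode(), '->', content.decode())
    print('First paragraph:', ex['target_text'].numpy().decode())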
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We generate Wikipedia-like passages with GPT-3 (text-davinci-003) using the prompt "This is a Wikipedia passage about {concept}", where {concept} is an individual from the WikiBio dataset. We split the generated passages into sentences and annotate each sentence with one of three labels: (1) accurate, (2) minor_inaccurate, (3) major_inaccurate. We report the data statistics, annotation process, and inter-annotator agreement in our paper.
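As an illustration of how such sentence-level labels might be aggregated into per-passage statistics, here is a minimal sketch; the list of labels is a made-up example, not a record from the released data.

from collections import Counter

# Hypothetical sentence-level labels for one generated passage.
labels = ["accurate", "minor_inaccurate", "major_inaccurate", "accurate"]

counts = Counter(labels)
total = len(labels)
for label in ("accurate", "minor_inaccurate", "major_inaccurate"):
    # Fraction of sentences in this passage carrying each label.
    print(f"{label}: {counts[label] / total:.2f}")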
WikiBioCTE is a dataset for controllable text edition built on the existing WikiBio dataset (originally created for table-to-text generation). In the controllable text edition task, the input is a long text, a question, and a target answer, and the output is a minimally modified text that fits the target answer. The task matters in many situations, such as changing conditions, consequences, or properties in a legal document, or updating key information about an event in a news article.
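To make the input/output format concrete, below is a minimal sketch of what a single WikiBioCTE-style record could look like; the field names and values are illustrative assumptions, not the dataset's actual schema.

# Hypothetical record; the real WikiBioCTE fields may be named differently.
example = {
    "text": "John Smith (born 1950) is a retired American footballer ...",
    "question": "When was John Smith born?",
    "target_answer": "1952",
    # Expected output: the same text, minimally edited to fit the answer.
    "edited_text": "John Smith (born 1952) is a retired American footballer ...",
}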
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
KnowEdit: A Benchmark of Knowledge Editing for LLMs
This README covers reproducing the paper A Comprehensive Study of Knowledge Editing for Large Language Models. You can use EasyEdit to load and work with this benchmark.
❗️❗️ Note that KnowEdit is constructed by re-organizing and extending existing datasets including WikiBio, ZsRE, WikiDataCounterfact, WikiDataRecent, ConvSent, and Sanitation to provide a comprehensive evaluation of knowledge editing. Special thanks to the builders and… See the full description on the dataset page: https://huggingface.co/datasets/zjunlp/KnowEdit.
This is the dataset for knowledge editing. It contains six tasks: ZsRE, $Wiki_{recent}$, $Wiki_{counterfact}$, WikiBio, ConvSent, and Sanitation. This repo covers the first four tasks; the data for ConvSent and Sanitation can be obtained from their original papers.
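As a minimal sketch of how a released task file might be inspected outside EasyEdit, assuming each task ships as a JSON list of edit records (the file path and record layout here are assumptions; consult the dataset card and EasyEdit docs for the actual format):

import json

# Hypothetical file name for the WikiBio task split.
with open("WikiBio/wikibio-test.json") as f:
    records = json.load(f)

for rec in records[:3]:
    # Print the keys of the first few edit records to see the schema.
    print(sorted(rec.keys()))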