10 datasets found
  1. wiki_bio_gpt3_hallucination

    • huggingface.co
    • opendatalab.com
    Cite
    Potsawee Manakul, wiki_bio_gpt3_hallucination [Dataset]. https://huggingface.co/datasets/potsawee/wiki_bio_gpt3_hallucination
    Croissant metadata available (Croissant is a format for machine-learning datasets; see mlcommons.org/croissant).
    Authors
    Potsawee Manakul
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    Dataset Card for WikiBio GPT-3 Hallucination Dataset

    GitHub repository: https://github.com/potsawee/selfcheckgpt
    Paper: SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

      Dataset Summary
    

    We generate Wikipedia-like passages with GPT-3 (text-davinci-003) using the prompt "This is a Wikipedia passage about {concept}", where concept represents an individual from the WikiBio dataset. We split the generated passages into… See the full description on the dataset page: https://huggingface.co/datasets/potsawee/wiki_bio_gpt3_hallucination.
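
The prompt template quoted above can be filled programmatically. This is a minimal sketch, assuming each concept is a person's name given as a plain string and that nothing is appended to the template (the card does not say). `build_prompt` and `build_prompts` are hypothetical helpers, and the GPT-3 call itself is omitted.

```python
# Hypothetical helpers: fill the prompt template quoted on the dataset card.
PROMPT_TEMPLATE = "This is a Wikipedia passage about {concept}"


def build_prompt(concept: str) -> str:
    """Return the generation prompt for one WikiBio individual (concept)."""
    return PROMPT_TEMPLATE.format(concept=concept)


def build_prompts(concepts: list[str]) -> list[str]:
    """Build one prompt per concept, skipping blank names."""
    return [build_prompt(c.strip()) for c in concepts if c.strip()]


if __name__ == "__main__":
    for p in build_prompts(["Albert Einstein", "  ", "Ada Lovelace"]):
        print(p)
```

Keeping the template in one constant makes it easy to check that every generated passage was produced from the same prompt wording.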

  2. WikiBio

    • live.european-language-grid.eu
    Updated Dec 30, 2016
    Cite
    (2016). WikiBio [Dataset]. https://live.european-language-grid.eu/catalogue/corpus/5121
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description

    The dataset contains 728,321 biographies from Wikipedia. For each article, it provides the first paragraph and the infobox, both tokenized.
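
Each WikiBio example pairs an infobox with the first paragraph of the article, which makes it a table-to-text task. The sketch below shows one way to flatten an infobox into a source string for a sequence-to-sequence model. The field names, the `field : value ; ...` serialization, and the toy example are assumptions for illustration, not the dataset's official schema.

```python
from dataclasses import dataclass


@dataclass
class BioExample:
    """One WikiBio-style example (field names are illustrative, not the official schema)."""
    infobox: dict[str, list[str]]  # infobox field name -> tokenized value
    first_paragraph: list[str]     # tokenized first paragraph (the generation target)


def linearize_infobox(infobox: dict[str, list[str]]) -> str:
    """Flatten an infobox into 'field : value ; field : value' (a hypothetical format)."""
    parts = []
    for field, tokens in infobox.items():
        if tokens:  # skip fields with no value
            parts.append(f"{field} : {' '.join(tokens)}")
    return " ; ".join(parts)


# Toy example, not taken from the dataset.
ex = BioExample(
    infobox={"name": ["ada", "lovelace"], "occupation": ["mathematician"], "spouse": []},
    first_paragraph=["ada", "lovelace", "was", "an", "english", "mathematician", "."],
)
source = linearize_infobox(ex.infobox)
target = " ".join(ex.first_paragraph)
print(source)  # name : ada lovelace ; occupation : mathematician
print(target)
```

Dropping empty fields keeps the source string short; whether to keep them as explicit placeholders is a modeling choice.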

  3. wiki_bio

    • tensorflow.org
    • opendatalab.com
    Updated Dec 6, 2022
    Cite
    (2022). wiki_bio [Dataset]. https://www.tensorflow.org/datasets/catalog/wiki_bio
    Description

    WikiBio is constructed from Wikipedia biography pages; it contains the first paragraph and the infobox, both tokenized. The dataset follows a standardized table format.

    To use this dataset:

    import tensorflow_datasets as tfds

    # Load the WikiBio training split as a tf.data.Dataset.
    ds = tfds.load('wiki_bio', split='train')
    # Print the first four examples.
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  4. Wikibio Dataset - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). Wikibio Dataset - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/wikibio-dataset
    Description

    Text summarization and data-to-text generation datasets

  5. Data-to-text-Generation

    • huggingface.co
    Updated Oct 6, 2023
    Cite
    AI Box (2023). Data-to-text-Generation [Dataset]. https://huggingface.co/datasets/RUCAIBox/Data-to-text-Generation
    Dataset authored and provided by
    AI Box
    Description

    These are the data-to-text generation datasets collected by TextBox, including:

    WebNLG v2.1 (webnlg), WebNLG v3.0 (webnlg2), WikiBio (wikibio), E2E (e2e), DART (dart), ToTTo (totto), ENT-DESC (ent), AGENDA (agenda), GenWiki (genwiki), TEKGEN (tekgen), LogicNLG (logicnlg), WikiTableT (wikit), and WEATHERGOV (wg).

    Details and leaderboards for each dataset can be found on the TextBox page.
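
The short names in parentheses identify each dataset within the collection. The sketch below simply records that mapping so scripts can translate a display name into its identifier. How TextBox uses these identifiers (for example, as directory or configuration names) is not stated here, so `identifier_for` is a hypothetical helper and not part of any TextBox API.

```python
# Display name -> short identifier, as listed on the dataset card.
TEXTBOX_DATA_TO_TEXT = {
    "WebNLG v2.1": "webnlg",
    "WebNLG v3.0": "webnlg2",
    "WikiBio": "wikibio",
    "E2E": "e2e",
    "DART": "dart",
    "ToTTo": "totto",
    "ENT-DESC": "ent",
    "AGENDA": "agenda",
    "GenWiki": "genwiki",
    "TEKGEN": "tekgen",
    "LogicNLG": "logicnlg",
    "WikiTableT": "wikit",
    "WEATHERGOV": "wg",
}


def identifier_for(name: str) -> str:
    """Look up a dataset's short identifier, ignoring case; raise KeyError if unknown."""
    lookup = {k.lower(): v for k, v in TEXTBOX_DATA_TO_TEXT.items()}
    return lookup[name.lower()]


print(len(TEXTBOX_DATA_TO_TEXT))     # 13
print(identifier_for("wikibio"))     # wikibio
print(identifier_for("WikiTableT"))  # wikit
```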

  6. WikiBioCTE

    • opendatalab.com
    Updated Apr 19, 2023
    Cite
    University of Oxford (2023). WikiBioCTE [Dataset]. https://opendatalab.com/OpenDataLab/WikiBioCTE
    Available download formats: zip
    Dataset provided by
    University of Oxford
    Description

    WikiBioCTE is a dataset for controllable text edition based on the existing dataset WikiBio (originally created for table-to-text generation). In controllable text edition, the input is a long text, a question, and a target answer; the output is a minimally modified version of the text that fits the target answer. The task is useful in many settings, such as changing conditions, consequences, or properties in a legal document, or changing key information about an event in a news article.
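
"Minimally modified" can be made concrete by measuring how far the output drifts from the input. The sketch below uses word-level Levenshtein distance for that purpose; this is an assumed measure for illustration, not the dataset's official evaluation metric. `CTEExample` and its field names are hypothetical, and the example is a toy one in the spirit of the legal-document use case.

```python
from dataclasses import dataclass


@dataclass
class CTEExample:
    """Illustrative controllable-text-edition example (field names are hypothetical)."""
    text: str           # long input text
    question: str       # question about the text
    target_answer: str  # answer the edited text should support
    edited_text: str    # minimally modified output text


def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over whitespace-separated tokens."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))  # distances from an empty prefix of x
    for i, xi in enumerate(x, start=1):
        curr = [i] + [0] * len(y)
        for j, yj in enumerate(y, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete xi
                curr[j - 1] + 1,           # insert yj
                prev[j - 1] + (xi != yj),  # substitute (free if tokens match)
            )
        prev = curr
    return prev[-1]


ex = CTEExample(
    text="The contract ends in 2020 and renews automatically .",
    question="When does the contract end ?",
    target_answer="2025",
    edited_text="The contract ends in 2025 and renews automatically .",
)
print(word_edit_distance(ex.text, ex.edited_text))  # 1
```

A lower distance means a more conservative edit; in practice it would be reported alongside a check that the edited text actually supports the target answer.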

  7. ICE

    • huggingface.co
    Cite
    Xiaobo Wang, ICE [Dataset]. https://huggingface.co/datasets/Yofuria/ICE
    Croissant metadata available.
    Authors
    Xiaobo Wang
    Description

    Dataset Card for ICE

    This dataset is used for our work In-Context Editing: Learning Knowledge from Self-Induced Distributions; our code has been released on GitHub.

      Dataset Sources
    

    Our dataset is constructed from KnowEdit, and we generate contexts for each data sample using GPT-4o.

      Dataset Structure
    

    We evaluate our method on four datasets: WikiData_recent, ZsRE, WikiBio, and WikiData_counterfact. These datasets encompass two knowledge editing tasks… See the full description on the dataset page: https://huggingface.co/datasets/Yofuria/ICE.

  8. IndicWikiBio

    • huggingface.co
    Updated Dec 14, 2022
    Cite
    AI4Bharat (2022). IndicWikiBio [Dataset]. https://huggingface.co/datasets/ai4bharat/IndicWikiBio
    Croissant metadata available.
    Dataset authored and provided by
    AI4Bharat
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    This is the WikiBio dataset released as part of the IndicNLG Suite. Each example has four fields: id, infobox, serialized infobox, and summary. The dataset covers nine languages: as, bn, hi, kn, ml, or, pa, ta, te. The total size of the dataset is 57,426 examples.
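
A quick sanity check when working with a multilingual release like this is to count examples per language and reject unexpected language codes. The sketch below does that for the nine codes listed above. It assumes each example carries a `lang` tag and that all fields are strings; neither assumption is confirmed by the card, and `IndicWikiBioExample` and `count_by_language` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass

# The nine language codes listed on the dataset card.
INDIC_LANGS = {"as", "bn", "hi", "kn", "ml", "or", "pa", "ta", "te"}


@dataclass
class IndicWikiBioExample:
    """The four fields named on the card; field types and `lang` are assumptions."""
    id: str
    infobox: str
    serialized_infobox: str
    summary: str
    lang: str  # hypothetical: language code stored alongside each example


def count_by_language(examples: list[IndicWikiBioExample]) -> Counter:
    """Count examples per language, rejecting codes outside the nine listed."""
    counts = Counter()
    for ex in examples:
        if ex.lang not in INDIC_LANGS:
            raise ValueError(f"unexpected language code: {ex.lang!r}")
        counts[ex.lang] += 1
    return counts


demo = [
    IndicWikiBioExample("1", "...", "...", "...", "hi"),
    IndicWikiBioExample("2", "...", "...", "...", "hi"),
    IndicWikiBioExample("3", "...", "...", "...", "ta"),
]
print(count_by_language(demo))  # Counter({'hi': 2, 'ta': 1})
```

Summing the counter over a full load is one way to compare against the stated total of 57,426.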

  9. KnowEdit

    • huggingface.co
    Updated Jul 21, 2025
    Cite
    ZJUNLP (2025). KnowEdit [Dataset]. https://huggingface.co/datasets/zjunlp/KnowEdit
    Croissant metadata available.
    Dataset authored and provided by
    ZJUNLP
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    KnowEdit: A Benchmark of Knowledge Editing for LLMs

    This README is about reproducing the paper A Comprehensive Study of Knowledge Editing for Large Language Models. You can use EasyEdit to load and use this benchmark.

    ❗️❗️ Note: KnowEdit is constructed by reorganizing and extending existing datasets, including WikiBio, ZsRE, WikiDataCounterfact, WikiDataRecent, convsent, and Sanitation, to provide a comprehensive evaluation of knowledge editing. Special thanks to the builders and… See the full description on the dataset page: https://huggingface.co/datasets/zjunlp/KnowEdit.

  10. raw-wiki-bio-filtered

    • huggingface.co
    Cite
    Najma Dwiharani, raw-wiki-bio-filtered [Dataset]. https://huggingface.co/datasets/najmharani/raw-wiki-bio-filtered
    Authors
    Najma Dwiharani
    Description

    Dataset Card for "raw-wiki-bio-filtered"

    More Information needed



wiki_bio_gpt3_hallucination (potsawee/wiki_bio_gpt3_hallucination): 6 scholarly articles cite this dataset (per Google Scholar).