17 datasets found
  1. h

    TruthfulQA

    • huggingface.co
    • opendatalab.com
    Updated Apr 18, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Domenic Rosati (2023). TruthfulQA [Dataset]. https://huggingface.co/datasets/domenicrosati/TruthfulQA
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 18, 2023
    Authors
    Domenic Rosati
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for TruthfulQA

      Dataset Summary
    

    TruthfulQA: Measuring How Models Mimic Human Falsehoods We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We crafted questions that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers… See the full description on the dataset page: https://huggingface.co/datasets/domenicrosati/TruthfulQA.

  2. h

    ro_truthfulqa

    • huggingface.co
    Updated Oct 11, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    OpenLLM-Ro (2024). ro_truthfulqa [Dataset]. https://huggingface.co/datasets/OpenLLM-Ro/ro_truthfulqa
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 11, 2024
    Dataset authored and provided by
    OpenLLM-Ro
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0)https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Dataset Description

    TruthfulQA is a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. Questions are crafted so that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. Here we provide the Romanian translation of the… See the full description on the dataset page: https://huggingface.co/datasets/OpenLLM-Ro/ro_truthfulqa.

  3. h

    truthful_qa_context

    • huggingface.co
    Updated Feb 15, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Portkey AI (2023). truthful_qa_context [Dataset]. https://huggingface.co/datasets/portkey/truthful_qa_context
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 15, 2023
    Dataset authored and provided by
    Portkey AI
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for truthful_qa_context

      Dataset Summary
    

    TruthfulQA Context is an extension of the TruthfulQA benchmark, specifically designed to enhance its utility for models that rely on Retrieval-Augmented Generation (RAG). This version includes the original questions and answers from TruthfulQA, along with the added context text directly associated with each question. This additional context aims to provide immediate reference material for models, making it particularly… See the full description on the dataset page: https://huggingface.co/datasets/portkey/truthful_qa_context.

  4. h

    uhura-truthfulqa

    • huggingface.co
    Updated Nov 24, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Masakhane NLP (2024). uhura-truthfulqa [Dataset]. https://huggingface.co/datasets/masakhane/uhura-truthfulqa
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 24, 2024
    Dataset authored and provided by
    Masakhane NLP
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for Uhura-TruthfulQA

      Dataset Summary
    

    TruthfulQA is a widely recognized safety benchmark designed to measure the truthfulness of language model outputs across 38 categories, including health, law, finance, and politics. The English version of the benchmark originates from TruthfulQA: Measuring How Models Mimic Human Falsehoods (Lin et al., 2022) and consists of 817 questions in both multiple-choice and generation formats, targeting common misconceptions and… See the full description on the dataset page: https://huggingface.co/datasets/masakhane/uhura-truthfulqa.

  5. h

    m_truthfulqa

    • huggingface.co
    Updated Jan 3, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    malteos (2024). m_truthfulqa [Dataset]. https://huggingface.co/datasets/malteos/m_truthfulqa
    Explore at:
    Dataset updated
    Jan 3, 2024
    Authors
    malteos
    Description

    TruthfulQA is a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. Questions are crafted so that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts.

  6. h

    tiny-truthful-qa

    • huggingface.co
    Updated Jan 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Hossein A. (Saeed) Rahmani (2025). tiny-truthful-qa [Dataset]. https://huggingface.co/datasets/rahmanidashti/tiny-truthful-qa
    Explore at:
    Dataset updated
    Jan 15, 2025
    Authors
    Hossein A. (Saeed) Rahmani
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for TruthfulQA

      Dataset Details
    
    
    
    
    
      Dataset Description
    

    TruthfulQA is a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 790 questions that span 38 categories, including health, law, finance and politics. Questions are crafted so that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from… See the full description on the dataset page: https://huggingface.co/datasets/rahmanidashti/tiny-truthful-qa.

  7. h

    truthfulqa-multi

    • huggingface.co
    Updated Jun 16, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    HiTZ zentroa (2025). truthfulqa-multi [Dataset]. https://huggingface.co/datasets/HiTZ/truthfulqa-multi
    Explore at:
    Dataset updated
    Jun 16, 2025
    Dataset authored and provided by
    HiTZ zentroa
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Card for TruthfulQA-multi

    TruthfulQA-multi is a professionally translated extension of the original TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. The dataset enables evaluating the ability of Large Language Models (LLMs) to maintain truthfulness across multiple languages.

      Dataset Details
    
    
    
    
    
      Dataset Description
    

    TruthfulQA-multi extends the original English TruthfulQA dataset to four additional languages… See the full description on the dataset page: https://huggingface.co/datasets/HiTZ/truthfulqa-multi.

  8. h

    truthful_qa

    • huggingface.co
    Updated Jul 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Calvin Ku (2025). truthful_qa [Dataset]. https://huggingface.co/datasets/onionmonster/truthful_qa
    Explore at:
    Dataset updated
    Jul 1, 2025
    Authors
    Calvin Ku
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    TruthfulQA‑CFB · Measuring How Models Mimic Human Falsehoods (Conversation Fact Benchmark Format)

    TruthfulQA‑CFB is a 817 example benchmark derived from the original TruthfulQA dataset, transformed and adapted for the Conversation Fact Benchmark framework. Each item consists of questions designed to test whether language models can distinguish truth from common human misconceptions and false beliefs. The dataset focuses on truthfulness evaluation: questions target areas where humans… See the full description on the dataset page: https://huggingface.co/datasets/onionmonster/truthful_qa.

  9. h

    Simbolo_data

    • huggingface.co
    Updated Jun 19, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ye Bhone Lin (2024). Simbolo_data [Dataset]. https://huggingface.co/datasets/YeBhoneLin10/Simbolo_data
    Explore at:
    Dataset updated
    Jun 19, 2024
    Authors
    Ye Bhone Lin
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Summary TruthfulQA is a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. Questions are crafted so that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. Supported Tasks and Leaderboards [Needs More Information]… See the full description on the dataset page: https://huggingface.co/datasets/YeBhoneLin10/Simbolo_data.

  10. truthfulqa

    • huggingface.co
    Updated Jul 15, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    LM-Polygraph Framework (2025). truthfulqa [Dataset]. https://huggingface.co/datasets/LM-Polygraph/truthfulqa
    Explore at:
    Dataset updated
    Jul 15, 2025
    Dataset provided by
    Framework Computerhttps://frame.work/
    Authors
    LM-Polygraph Framework
    Description

    Dataset Card for truthfulqa

    This is a preprocessed version of truthfulqa dataset for benchmarks in LM-Polygraph.

      Dataset Details
    
    
    
    
    
      Dataset Description
    

    Curated by: https://huggingface.co/LM-Polygraph License: https://github.com/IINemo/lm-polygraph/blob/main/LICENSE.md

      Dataset Sources [optional]
    

    Repository: https://github.com/IINemo/lm-polygraph

      Uses
    
    
    
    
    
    
    
      Direct Use
    

    This dataset should be used for performing benchmarks on… See the full description on the dataset page: https://huggingface.co/datasets/LM-Polygraph/truthfulqa.

  11. h

    TruthfulQA

    • huggingface.co
    Updated Aug 9, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    zrchen (2022). TruthfulQA [Dataset]. https://huggingface.co/datasets/studymakesmehappyyyyy/TruthfulQA
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 9, 2022
    Authors
    zrchen
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Qwen2.5 TruthfulQA 推理代码

    model_truthfulqa.py 是针对 Qwen2.5 模型的相关推理代码,用于运行 TruthfulQA 基准测试。该基准测试的重点是评估生成的答案在真实度和信息量上的表现,或评估模型在多选题任务上的准确率。

      TruthfulQA Benchmark
    

    TruthfulQA 基准测试包括两个任务,使用相同的问题集和参考答案:

      1. 生成类任务 (Generation Task)
    

    任务描述: 给定一个问题,生成 1-2 句的答案。 评估目标: 主要目标: 答案的整体真实性 (% true),即模型生成的答案中真实的比例。 次要目标: 答案的信息量 (% info),避免模型通过回答诸如“我不评论”等无信息量的内容来“投机取巧”。

    评估指标: 使用微调的 GPT-3 模型(GPT-judge 和 GPT-info)来预测答案的真实性和信息量。 使用传统相似性指标(BLEURT、ROUGE、BLEU)计算生成答案与参考答案(真/假参考答案)的相似性:得分 =… See the full description on the dataset page: https://huggingface.co/datasets/studymakesmehappyyyyy/TruthfulQA.

  12. h

    X-TruthfulQA_en_zh_ko_it_es

    • huggingface.co
    Updated Jan 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zhihan Zhang (2024). X-TruthfulQA_en_zh_ko_it_es [Dataset]. https://huggingface.co/datasets/zhihz0535/X-TruthfulQA_en_zh_ko_it_es
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 28, 2024
    Authors
    Zhihan Zhang
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    X-TruthfulQA

    🤗 Paper | 📖 arXiv

      Dataset Description
    

    X-TruthfulQA is an evaluation benchmark for multilingual large language models (LLMs), including questions and answers in 5 languages (English, Chinese, Korean, Italian and Spanish). It is intended to evaluate the truthfulness of LLMs. The dataset is translated by GPT-4 from the original English-version TruthfulQA. In our paper, we evaluate LLMs in a zero-shot generative setting: prompt the instruction-tuned LLM with… See the full description on the dataset page: https://huggingface.co/datasets/zhihz0535/X-TruthfulQA_en_zh_ko_it_es.

  13. h

    AraDiCE-TruthfulQA

    • huggingface.co
    Updated May 18, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Qatar Computing Research Institute (2025). AraDiCE-TruthfulQA [Dataset]. https://huggingface.co/datasets/QCRI/AraDiCE-TruthfulQA
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 18, 2025
    Dataset authored and provided by
    Qatar Computing Research Institute
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs

      Overview
    

    The AraDiCE dataset is designed to evaluate dialectal and cultural capabilities in large language models (LLMs). The dataset consists of post-edited versions of various benchmark datasets, curated for validation in cultural and dialectal contexts relevant to Arabic. In this repository, we present the TruthfulQA split of the data

      Evaluation
    

    We have used lm-harness eval framework to… See the full description on the dataset page: https://huggingface.co/datasets/QCRI/AraDiCE-TruthfulQA.

  14. h

    truthfull_qa-tr

    • huggingface.co
    Updated Apr 26, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Mohamad Alhajar (2024). truthfull_qa-tr [Dataset]. https://huggingface.co/datasets/malhajar/truthfull_qa-tr
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 26, 2024
    Authors
    Mohamad Alhajar
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This Dataset is part of a series of datasets aimed at advancing Turkish LLM Developments by establishing rigid Turkish benchmarks to evaluate the performance of LLM's Produced in the Turkish Language.

      Dataset Card for truthful_qa-tr
    

    malhajar/truthful_qa-tr is a translated version of truthful_qa aimed specifically to be used in the OpenLLMTurkishLeaderboard Developed by: Mohamad Alhajar

      Dataset Summary
    

    TruthfulQA is a benchmark to measure whether a language model is… See the full description on the dataset page: https://huggingface.co/datasets/malhajar/truthfull_qa-tr.

  15. h

    medical-qa-datasets

    • huggingface.co
    Updated Nov 14, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lavita AI (2023). medical-qa-datasets [Dataset]. https://huggingface.co/datasets/lavita/medical-qa-datasets
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 14, 2023
    Dataset authored and provided by
    Lavita AI
    Description

    all-processed dataset is a concatenation of of medical-meadow-* and chatdoctor_healthcaremagic datasets The Chat Doctor term is replaced by the chatbot term in the chatdoctor_healthcaremagic dataset Similar to the literature the medical_meadow_cord19 dataset is subsampled to 50,000 samples truthful-qa-* is a benchmark dataset for evaluating the truthfulness of models in text generation, which is used in Llama 2 paper. Within this dataset, there are 55 and 16 questions related to Health and… See the full description on the dataset page: https://huggingface.co/datasets/lavita/medical-qa-datasets.

  16. h

    truthful_qa_indic_gen

    • huggingface.co
    Updated Feb 10, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    samrat saha (2024). truthful_qa_indic_gen [Dataset]. https://huggingface.co/datasets/iitrsamrat/truthful_qa_indic_gen
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 10, 2024
    Authors
    samrat saha
    Description

    Dataset Card for truthful_qa_indic

      Dataset Description
    
    
    
    
    
      Dataset Summary
    

    truthful_qa_indic is an extension of the TruthfulQA dataset, focusing on generating truthful answers in Indic languages. The benchmark comprises 817 questions spanning 38 categories, challenging models to avoid generating false answers learned from imitating human texts.

      Creation Process
    

    It's a high-quality translation of TruthfulQA, meticulously crafted with a beam width of 5… See the full description on the dataset page: https://huggingface.co/datasets/iitrsamrat/truthful_qa_indic_gen.

  17. h

    jtruthful_qa

    • huggingface.co
    Updated Apr 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Andrij David (2024). jtruthful_qa [Dataset]. https://huggingface.co/datasets/andrijdavid/jtruthful_qa
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 9, 2024
    Authors
    Andrij David
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset Card for jtruthful_qa

      Dataset Summary
    

    JTruthfulQA is a Japanese iteration of TruthfulQA (Lin+, 2022). This particular dataset isn't a translation of the original TruthfulQA, but rather, it's been constructed from the ground up. The purpose of this benchmark is to gauge the truthfulness of a language model in its generation of responses to various questions. The benchmark encompasses a total of 604 questions, which are distributed across three categories: Fact… See the full description on the dataset page: https://huggingface.co/datasets/andrijdavid/jtruthful_qa.

  18. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Domenic Rosati (2023). TruthfulQA [Dataset]. https://huggingface.co/datasets/domenicrosati/TruthfulQA

TruthfulQA

TruthfulQA

domenicrosati/TruthfulQA

Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 18, 2023
Authors
Domenic Rosati
License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

Dataset Card for TruthfulQA

  Dataset Summary

TruthfulQA: Measuring How Models Mimic Human Falsehoods We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We crafted questions that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers… See the full description on the dataset page: https://huggingface.co/datasets/domenicrosati/TruthfulQA.

Search
Clear search
Close search
Google apps
Main menu