Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for TruthfulQA
Dataset Summary
TruthfulQA: Measuring How Models Mimic Human Falsehoods We propose a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. We crafted questions that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers… See the full description on the dataset page: https://huggingface.co/datasets/domenicrosati/TruthfulQA.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dataset Description
TruthfulQA is a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. Questions are crafted so that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. Here we provide the Romanian translation of the… See the full description on the dataset page: https://huggingface.co/datasets/OpenLLM-Ro/ro_truthfulqa.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for truthful_qa_context
Dataset Summary
TruthfulQA Context is an extension of the TruthfulQA benchmark, specifically designed to enhance its utility for models that rely on Retrieval-Augmented Generation (RAG). This version includes the original questions and answers from TruthfulQA, along with the added context text directly associated with each question. This additional context aims to provide immediate reference material for models, making it particularly… See the full description on the dataset page: https://huggingface.co/datasets/portkey/truthful_qa_context.
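As a hedged illustration of the intended RAG-style use, the snippet below loads the dataset and assembles a prompt from the added context. The column names ("question", "context") and the available split are assumptions about the portkey/truthful_qa_context schema, not details confirmed by its card.

```python
# A minimal sketch, assuming the repo loads directly from the Hub and exposes
# "question" and "context" columns (assumed names, not confirmed by the card).
from datasets import load_dataset

ds = load_dataset("portkey/truthful_qa_context")
split = ds[next(iter(ds))]  # take whichever split the repo ships
row = split[0]

# Assemble a RAG-style prompt that grounds the answer in the provided context.
prompt = (
    "Answer the question using only the reference material.\n\n"
    f"Reference: {row['context']}\n\n"
    f"Question: {row['question']}\nAnswer:"
)
print(prompt)
```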
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Uhura-TruthfulQA
Dataset Summary
TruthfulQA is a widely recognized safety benchmark designed to measure the truthfulness of language model outputs across 38 categories, including health, law, finance, and politics. The English version of the benchmark originates from TruthfulQA: Measuring How Models Mimic Human Falsehoods (Lin et al., 2022) and consists of 817 questions in both multiple-choice and generation formats, targeting common misconceptions and… See the full description on the dataset page: https://huggingface.co/datasets/masakhane/uhura-truthfulqa.
TruthfulQA is a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. Questions are crafted so that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts.
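For orientation, the original benchmark is straightforward to load with the `datasets` library; the sketch below uses the stock `truthful_qa` Hub repository, which ships "generation" and "multiple_choice" configurations with a single "validation" split.

```python
# Load the canonical TruthfulQA release from the Hugging Face Hub.
from datasets import load_dataset

gen = load_dataset("truthful_qa", "generation", split="validation")

# Each row pairs a question with best/correct/incorrect reference answers.
print(gen[0]["question"])
print(gen[0]["best_answer"])
```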
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for TruthfulQA
Dataset Details
Dataset Description
TruthfulQA is a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 790 questions that span 38 categories, including health, law, finance and politics. Questions are crafted so that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from… See the full description on the dataset page: https://huggingface.co/datasets/rahmanidashti/tiny-truthful-qa.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Card for TruthfulQA-multi
TruthfulQA-multi is a professionally translated extension of the original TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. The dataset enables evaluating the ability of Large Language Models (LLMs) to maintain truthfulness across multiple languages.
Dataset Details
Dataset Description
TruthfulQA-multi extends the original English TruthfulQA dataset to four additional languages… See the full description on the dataset page: https://huggingface.co/datasets/HiTZ/truthfulqa-multi.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
TruthfulQA‑CFB · Measuring How Models Mimic Human Falsehoods (Conversation Fact Benchmark Format)
TruthfulQA‑CFB is an 817-example benchmark derived from the original TruthfulQA dataset, transformed and adapted for the Conversation Fact Benchmark framework. Each item is a question designed to test whether language models can distinguish truth from common human misconceptions and false beliefs. The dataset focuses on truthfulness evaluation: questions target areas where humans… See the full description on the dataset page: https://huggingface.co/datasets/onionmonster/truthful_qa.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Summary TruthfulQA is a benchmark to measure whether a language model is truthful in generating answers to questions. The benchmark comprises 817 questions that span 38 categories, including health, law, finance and politics. Questions are crafted so that some humans would answer falsely due to a false belief or misconception. To perform well, models must avoid generating false answers learned from imitating human texts. Supported Tasks and Leaderboards [Needs More Information]… See the full description on the dataset page: https://huggingface.co/datasets/YeBhoneLin10/Simbolo_data.
Dataset Card for truthfulqa
This is a preprocessed version of the truthfulqa dataset for running benchmarks in LM-Polygraph.
Dataset Details
Dataset Description
Curated by: https://huggingface.co/LM-Polygraph
License: https://github.com/IINemo/lm-polygraph/blob/main/LICENSE.md
Dataset Sources
Repository: https://github.com/IINemo/lm-polygraph
Uses
Direct Use
This dataset should be used for performing benchmarks on… See the full description on the dataset page: https://huggingface.co/datasets/LM-Polygraph/truthfulqa.
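As a minimal sketch of direct use, the preprocessed data should load straight from the Hub; split and column names are best inspected at runtime rather than assumed here.

```python
# A minimal sketch: load the preprocessed benchmark data and inspect its
# structure before wiring it into an LM-Polygraph run.
from datasets import load_dataset

ds = load_dataset("LM-Polygraph/truthfulqa")
print(ds)  # shows the available splits and their columns
```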
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Qwen2.5 TruthfulQA Inference Code
model_truthfulqa.py contains the inference code for running the TruthfulQA benchmark with the Qwen2.5 model. The benchmark focuses on evaluating generated answers for truthfulness and informativeness, or on measuring the model's accuracy on the multiple-choice task.
TruthfulQA Benchmark
The TruthfulQA benchmark comprises two tasks that share the same set of questions and reference answers:
1. Generation Task
Task description: given a question, generate a 1-2 sentence answer. Evaluation objectives: Primary: the overall truthfulness of answers (% true), i.e., the proportion of generated answers that are true. Secondary: the informativeness of answers (% info), which prevents a model from gaming the metric with uninformative answers such as "I have no comment."
Evaluation metrics: fine-tuned GPT-3 models (GPT-judge and GPT-info) predict the truthfulness and informativeness of each answer. Traditional similarity metrics (BLEURT, ROUGE, BLEU) compute the similarity of a generated answer to the true/false reference answers: score =… See the full description on the dataset page: https://huggingface.co/datasets/studymakesmehappyyyyy/TruthfulQA.
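As a hedged sketch of the generation task described above (not the repository's model_truthfulqa.py itself), the snippet below prompts a Qwen2.5 checkpoint with one TruthfulQA question and decodes a short answer; the model name, greedy decoding, and token budget are illustrative assumptions.

```python
# A sketch of the generation task, assuming the Qwen/Qwen2.5-7B-Instruct
# checkpoint; greedy decoding and the 64-token budget are illustrative choices.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Take one benchmark question and format it as a chat turn.
q = load_dataset("truthful_qa", "generation", split="validation")[0]["question"]
inputs = tok.apply_chat_template(
    [{"role": "user", "content": q}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate a short (1-2 sentence) answer and strip the prompt tokens.
out = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```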
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
X-TruthfulQA
🤗 Paper | 📖 arXiv
Dataset Description
X-TruthfulQA is an evaluation benchmark for multilingual large language models (LLMs), including questions and answers in 5 languages (English, Chinese, Korean, Italian and Spanish). It is intended to evaluate the truthfulness of LLMs. The dataset is translated by GPT-4 from the original English-version TruthfulQA. In our paper, we evaluate LLMs in a zero-shot generative setting: prompt the instruction-tuned LLM with… See the full description on the dataset page: https://huggingface.co/datasets/zhihz0535/X-TruthfulQA_en_zh_ko_it_es.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs
Overview
The AraDiCE dataset is designed to evaluate dialectal and cultural capabilities in large language models (LLMs). The dataset consists of post-edited versions of various benchmark datasets, curated for validation in cultural and dialectal contexts relevant to Arabic. In this repository, we present the TruthfulQA split of the data.
Evaluation
We have used the lm-harness eval framework to… See the full description on the dataset page: https://huggingface.co/datasets/QCRI/AraDiCE-TruthfulQA.
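As a hedged sketch of such a run, lm-evaluation-harness exposes a Python entry point. The task shown below is the stock English "truthfulqa_mc2"; the AraDiCE-specific task names are defined in the QCRI repository and are not reproduced here, and the checkpoint is illustrative.

```python
# A minimal sketch of an lm-evaluation-harness run; "truthfulqa_mc2" is the
# stock English task (standing in for the AraDiCE-specific task names), and
# the checkpoint is an illustrative assumption.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Qwen/Qwen2.5-7B-Instruct",
    tasks=["truthfulqa_mc2"],
)
print(results["results"])
```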
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is part of a series aimed at advancing Turkish LLM development by establishing rigorous Turkish benchmarks to evaluate the performance of LLMs produced in the Turkish language.
Dataset Card for truthful_qa-tr
malhajar/truthful_qa-tr is a translated version of truthful_qa, intended specifically for use in the OpenLLMTurkishLeaderboard. Developed by: Mohamad Alhajar
Dataset Summary
TruthfulQA is a benchmark to measure whether a language model is… See the full description on the dataset page: https://huggingface.co/datasets/malhajar/truthfull_qa-tr.
The all-processed dataset is a concatenation of the medical-meadow-* and chatdoctor_healthcaremagic datasets. The term "Chat Doctor" is replaced with "chatbot" in the chatdoctor_healthcaremagic dataset. Following the literature, the medical_meadow_cord19 dataset is subsampled to 50,000 samples. truthful-qa-* is a benchmark dataset for evaluating the truthfulness of models in text generation, used in the Llama 2 paper. Within this dataset, there are 55 and 16 questions related to Health and… See the full description on the dataset page: https://huggingface.co/datasets/lavita/medical-qa-datasets.
Dataset Card for truthful_qa_indic
Dataset Description
Dataset Summary
truthful_qa_indic is an extension of the TruthfulQA dataset, focusing on generating truthful answers in Indic languages. The benchmark comprises 817 questions spanning 38 categories, challenging models to avoid generating false answers learned from imitating human texts.
Creation Process
It's a high-quality translation of TruthfulQA, meticulously crafted with a beam width of 5… See the full description on the dataset page: https://huggingface.co/datasets/iitrsamrat/truthful_qa_indic_gen.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset Card for jtruthful_qa
Dataset Summary
JTruthfulQA is a Japanese counterpart to TruthfulQA (Lin et al., 2022). This dataset is not a translation of the original TruthfulQA; rather, it was constructed from the ground up. The purpose of this benchmark is to gauge the truthfulness of a language model's generated responses to various questions. The benchmark encompasses a total of 604 questions, distributed across three categories: Fact… See the full description on the dataset page: https://huggingface.co/datasets/andrijdavid/jtruthful_qa.