RAGTruth Dataset
Dataset Description
Dataset Summary
The RAGTruth dataset is designed for evaluating hallucinations in text generation models, particularly in retrieval-augmented generation (RAG) contexts. It contains examples of model outputs along with expert annotations indicating whether the outputs contain hallucinations.
Dataset Structure
Each example contains:
A query/question Context passages Model output Hallucination labels (evident… See the full description on the dataset page: https://huggingface.co/datasets/wandb/RAGTruth-processed.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
RAGTruth test set
Dataset
Test split of RAGTruth dataset by ParticleMedia available from https://github.com/ParticleMedia/RAGTruth/tree/main/dataset The dataset was published in RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models
Preprocessing
We kept only the test split of the original dataset Joined response and source info files Created the response level hallucination labels as described in the paper using binary… See the full description on the dataset page: https://huggingface.co/datasets/flowaicom/RAGTruth_test.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
nimitkalra/RAGTruth dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
The dataset is created from the RAGTruth dataset by translating it to German. We've used Mistral Small 3.1 for the translation. The translation was done on a single A100 machine using VLLM as a server.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
The dataset is created from the RAGTruth dataset by translating it to Italian. We've used Gemma 3 27B for the translation. The translation was done on a single A100 machine using VLLM as a server.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
The dataset is created from the RAGTruth dataset by translating it to Hungarian. We've used Gemma 3 27B for the translation. The translation was done on a single A100 machine using VLLM as a server.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
The dataset is created from the RAGTruth dataset by translating it to Chinese. We've used Gemma 3 27B for the translation. The translation was done on a single A100 machine using VLLM as a server.
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
KRLabsOrg/ragtruth-de-translated-manual-300 dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Card for Dataset Name
ragtruth-qa 데이터셋을 gpt-4o를 이용하여 한글로 번역 한 데이터셋.
Dataset Details
Dataset Description
Curated by: [More Information Needed] Language(s) (NLP): [한국어] License: [미정]
Dataset Sources [optional]
Repository: [https://huggingface.co/datasets/flowaicom/formatted-ragtruth-qa]
Uses
Direct Use
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Dataset Structure… See the full description on the dataset page: https://huggingface.co/datasets/Yettiesoft/ragtruth-qa-ko.
Attribution-NoDerivs 4.0 (CC BY-ND 4.0)https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
Important Update 08.09.2024
We announce the LLM-AggreFact leaderboard with 35 latest fact-checking models being evaluated.
We include one additional dataset RAGTruth to our benchmark. We convert the dataset to the same format as in our benchmark and removed those non-checkworthy claims. We include a randomly sampled subset of the training set from RAGTruth into the validation set of the benchmark since the original training set is too large after conversion.… See the full description on the dataset page: https://huggingface.co/datasets/lytang/LLM-AggreFact.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
RAGTruth Dataset
Dataset Description
Dataset Summary
The RAGTruth dataset is designed for evaluating hallucinations in text generation models, particularly in retrieval-augmented generation (RAG) contexts. It contains examples of model outputs along with expert annotations indicating whether the outputs contain hallucinations.
Dataset Structure
Each example contains:
A query/question Context passages Model output Hallucination labels (evident… See the full description on the dataset page: https://huggingface.co/datasets/wandb/RAGTruth-processed.