Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RAGBench
Dataset Overview
RAGBench is a large-scale RAG benchmark dataset of 100k RAG examples. It covers five unique industry-specific domains and various RAG task types. RAGBench examples are sourced from industry corpora such as user manuals, which makes the benchmark particularly relevant for industry applications. RAGBench comprises 12 sub-component datasets, each divided into train/validation/test splits.
Usage
from datasets import load_dataset
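The usage snippet above is cut off after the import. A minimal sketch of how loading one sub-component dataset and checking its splits might look follows. The helper names `load_ragbench_subset` and `missing_splits` are hypothetical, and the repository id and subset name are left as placeholders because this listing does not state them.

```python
from typing import Iterable, List

# Split names taken from the overview above (train/validation/test).
EXPECTED_SPLITS = ("train", "validation", "test")


def load_ragbench_subset(repo_id: str, subset: str):
    """Load one sub-component dataset from the Hugging Face Hub.

    Hypothetical helper: needs the `datasets` package and network access.
    `repo_id` and `subset` must come from the dataset page; they are not
    given in this listing.
    """
    # Imported lazily so the offline helper below works without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo_id, subset)


def missing_splits(available: Iterable[str]) -> List[str]:
    """Return the expected split names that are absent from `available`."""
    present = set(available)
    return [name for name in EXPECTED_SPLITS if name not in present]


# Example usage (placeholders, not real identifiers):
# ds = load_ragbench_subset("<repo-id>", "<subset-name>")
# print(missing_splits(ds.keys()))
```

`load_dataset` returns a dictionary-like object keyed by split name when no `split` argument is given, so passing its keys to `missing_splits` is one way to confirm that a subset ships all three splits.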
param-bharat/ragbench-dual-clf-preprocessed dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
MBPP dataset annotated with ground-truth programming solutions, to enable evaluations for retrieval and retrieval-augmented code generation. Please refer to code-rag-bench for more details.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The StackOverflow posts retrieval source for code-rag-bench.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
DS-1000 dataset annotated with the ground-truth library documentation, to enable evaluations for retrieval and retrieval-augmented code generation. Please refer to code-rag-bench for more details.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
ODEX dataset annotated with the ground-truth library documentation, to enable evaluations for retrieval and retrieval-augmented code generation. Please refer to code-rag-bench for more details.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The entire dump of GitHub repositories.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Public RAG bench dataset with texts
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Summary
This dataset is a domain-specific benchmark for Question Answering, using the Jeep 2023 Gladiator Car manual as its knowledge base. It combines the corpus from the original DelucionQA project by Bosch Research with questions sourced from the RAGBench dataset. The result is a challenging dataset designed to evaluate a system's ability to answer specific, technical questions based on a complex, real-world document.
Supported Tasks
Question Answering:… See the full description on the dataset page: https://huggingface.co/datasets/corvicai/delucionqa.
code-rag-bench/code-retrieval-stackoverflow-small dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/cdla-permissive-2.0/
REAL-MM-RAG-Bench: A Real-World Multi-Modal Retrieval Benchmark
We introduced REAL-MM-RAG-Bench, a real-world multi-modal retrieval benchmark designed to evaluate retrieval models in reliable, challenging, and realistic settings. The benchmark was constructed using an automated pipeline, where queries were generated by a vision-language model (VLM), filtered by a large language model (LLM), and rephrased by an LLM to ensure high-quality retrieval evaluation. To simulate real-world… See the full description on the dataset page: https://huggingface.co/datasets/ibm-research/REAL-MM-RAG_FinReport.
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
DRAGON bench history public texts. Date: 2025.07.10
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
RAG bench private QA dataset. Test version
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
DRAGON bench history private QA dataset. Date: 2025.07.10
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
DRAGON bench history public questions. Date: 2025.07.10
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
DRAGON bench history private texts (mappings). Date: 2025.07.10
https://choosealicense.com/licenses/other/
ChatRAG Bench
ChatRAG Bench is a benchmark for evaluating a model's conversational QA capability over documents or retrieved context. ChatRAG Bench is built on and derived from 10 existing datasets: Doc2Dial, QuAC, QReCC, TopiOCQA, INSCIT, CoQA, HybriDialogue, DoQA, SQA, ConvFinQA. ChatRAG Bench covers a wide range of documents and question types, which require models to generate responses from long context, comprehend and reason over tables, conduct arithmetic calculations, and… See the full description on the dataset page: https://huggingface.co/datasets/nvidia/ChatRAG-Bench.