Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This contains the manual execution results of the evaluation scenarios, the list of RAG repositories, the automatically generated question-answer pairs and the results of executing the evaluation scenarios across 5 open-source RAG pipelines using both our approach and the RAGAS approach, as well as automation scripts for generating question-answer pairs and running the generated questions against the selected RAG pipelines.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Allganize RAG Leaderboard
The Allganize RAG Leaderboard evaluates the performance of Korean RAG across five domains (finance, public sector, healthcare, legal, and commerce). A typical RAG answers simple questions well, but struggles with questions about tables and images in documents.
Many companies that want to adopt RAG are looking for a Korean RAG performance table that reflects their own domain, document types, and question formats. Evaluation requires a dataset of published documents, questions, and answers, but building one in-house takes considerable time and cost. Allganize is now releasing all of its RAG evaluation data.
RAG consists of three main parts: Parser, Retrieval, and Generation. Among the RAG leaderboards published so far, there is no Korean leaderboard that evaluates all three parts as a whole.
On the Allganize RAG Leaderboard, documents are… See the full description on the dataset page: https://huggingface.co/datasets/allganize/RAG-Evaluation-Dataset-KO.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides a collection of Acquired Podcast Transcripts, specifically curated for evaluating Retrieval-Augmented Generation (RAG) systems. It includes human-verified answers and AI model responses both with and without access to the transcripts, along with correctness ratings and quality assessments. The dataset's core purpose is to facilitate the development and testing of AI models, particularly in the domain of natural language processing and question-answering.
The dataset contains several key columns designed for RAG evaluation:
* question: The query posed for evaluation.
* human_answer: The reference answer provided by a human.
* ai_answer_without_the_transcript: The answer generated by an AI model when it does not have access to the transcript.
* ai_answer_without_the_transcript_correctness: A human-verified assessment of the factual accuracy of the AI answer without the transcript (e.g., CORRECT, INCORRECT, Other).
* ai_answer_with_the_transcript: The answer generated by an AI model when it does have access to the transcript.
* ai_answer_with_the_transcript_correctness: A human-verified assessment of the factual accuracy of the AI answer with the transcript (e.g., CORRECT, INCORRECT, Other).
* quality_rating_for_answer_with_transcript: A human rating of the quality of the AI answer when the model had access to the transcript.
* post_url: The URL of the specific Acquired Podcast episode related to the question.
* file_name: The name of the transcript file corresponding to the episode.
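As a rough illustration of how these columns might be used, the following sketch loads the QA table with pandas and compares the share of CORRECT labels with and without transcript access; the file name qa_dataset.csv is a placeholder for the actual CSV shipped with the dataset.

```python
import pandas as pd

# Minimal sketch: load the RAG-evaluation QA table and compare correctness
# with and without transcript access. "qa_dataset.csv" is a placeholder name.
df = pd.read_csv("qa_dataset.csv")

def correct_rate(column: str) -> float:
    """Share of rows whose human-verified label is CORRECT."""
    return (df[column].str.upper() == "CORRECT").mean()

print("Without transcript:", correct_rate("ai_answer_without_the_transcript_correctness"))
print("With transcript:   ", correct_rate("ai_answer_with_the_transcript_correctness"))
```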
The dataset comprises 200 Acquired Podcast Transcripts, totalling approximately 3.5 million words. This is roughly equivalent to 5,500 pages when formatted into a Word document. It also includes a dedicated QA dataset for RAG evaluation, structured as a CSV file.
This dataset is ideal for:
* Evaluating the factual accuracy and quality of AI models, particularly those employing RAG techniques.
* Developing and refining natural language processing (NLP) models.
* Training and testing question-answering systems.
* Benchmarking the performance of different AI models in information retrieval tasks.
* Conducting research in artificial intelligence and machine learning, focusing on generative AI.
The dataset's content is derived from 200 episodes of the Acquired Podcast, collected from its official website. It covers a range of topics typically discussed on the podcast, including business, technology, and finance. The data collection focused on transcripts available at the time of sourcing.
CC0
Original Data Source: Acquired Podcast Transcripts and RAG Evaluation
emirMb/RAG-EVALUATION-QA dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
German RAG LLM Evaluation Dataset
This dataset is intended for evaluating the German RAG (retrieval-augmented generation) capabilities of LLMs. It is based on the test set of the deutsche-telekom/wikipedia-22-12-de-dpr dataset (also see wikipedia-22-12-de-dpr on GitHub) and consists of 4 subsets, or tasks.
Task Description
The dataset consists of 4 subsets for the following 4 tasks (each task with 1000 prompts):
choose_context_by_question (subset… See the full description on the dataset page: https://huggingface.co/datasets/deutsche-telekom/Ger-RAG-eval.
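For orientation, here is a minimal sketch of loading one of the four task subsets with the Hugging Face datasets library; the config name choose_context_by_question is taken from the description above, while the split name "test" is an assumption and may differ on the Hub.

```python
from datasets import load_dataset

# Minimal sketch: load one of the four Ger-RAG-eval task subsets.
# The config name comes from the description above; the split name "test"
# is an assumption and may differ on the Hub.
task = load_dataset("deutsche-telekom/Ger-RAG-eval", "choose_context_by_question", split="test")

print(len(task))   # each task is described as containing 1000 prompts
print(task[0])     # inspect a single prompt
```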
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file provides the evaluation metrics used to assess the performance of RAG pipelines in the various papers.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
LEANN-RAG Evaluation Data
This repository contains the necessary data to run the recall evaluation scripts for the LEANN-RAG project.
Dataset Components
This dataset is structured into three main parts:
Pre-built LEANN Indices:
dpr/: A pre-built index for the DPR dataset.
rpj_wiki/: A pre-built index for the RPJ-Wiki dataset.
These indices were created using the leann-core library and are required by the LeannSearcher.
Ground Truth Data:
ground_truth/: Contains the… See the full description on the dataset page: https://huggingface.co/datasets/LEANN-RAG/leann-rag-evaluation-data.
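Since the pre-built indices are meant to be consumed by the project's own evaluation scripts, the hedged sketch below only fetches the repository contents with huggingface_hub; the repo id is taken from this page, and the local directory name is an arbitrary choice.

```python
from huggingface_hub import snapshot_download

# Sketch: download the pre-built indices and ground-truth files so the
# LEANN-RAG evaluation scripts (and their LeannSearcher) can point at them.
# Only the repo id comes from this page; local_dir is an arbitrary choice.
local_path = snapshot_download(
    repo_id="LEANN-RAG/leann-rag-evaluation-data",
    repo_type="dataset",
    local_dir="leann_rag_eval_data",
)
print(local_path)
```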
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A classic retrieval augmented generation (RAG) Q&A bot that answers questions about the GVHD medical condition.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset provides a summary of the experimental results obtained from an HCAI system implemented using a RAG framework and the Llama 3.0 model. A total of 125 hyperparameter configurations were defined by aggregating metrics based on the median of the results from 91 questions and their corresponding answers. These configurations represent the alternatives evaluated through Multi-Criteria Decision-Making (MCDM) methods.
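The aggregation step described above (reducing per-question metric values to one value per configuration via the median) could look roughly like the following pandas sketch; the column names config_id, metric, and value are hypothetical placeholders, not the schema of the published file.

```python
import pandas as pd

# Illustrative sketch of median aggregation per hyperparameter configuration.
# Column names ("config_id", "metric", "value") are hypothetical placeholders.
raw = pd.DataFrame({
    "config_id": [1, 1, 2, 2],
    "metric":    ["faithfulness"] * 4,
    "value":     [0.81, 0.77, 0.62, 0.70],
})

aggregated = (
    raw.groupby(["config_id", "metric"])["value"]
       .median()
       .reset_index(name="median_value")
)
print(aggregated)  # one median value per (configuration, metric) pair
```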
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the raw data of all collected papers.
Custom license: https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4605
Integrating multiple (sub-)systems is essential to create advanced Information Systems (ISs). Difficulties mainly arise when integrating dynamic environments across the IS lifecycle, e.g., services not yet existent at design time. A traditional approach is a registry that provides the API documentation of the systems' endpoints. Large Language Models (LLMs) have been shown to be capable of automatically creating system integrations (e.g., as service composition) based on this documentation, but they require concise input due to input token limitations, especially regarding comprehensive API descriptions. Currently, it is unknown how best to preprocess these API descriptions. Within this work, we (i) analyze the usage of Retrieval Augmented Generation (RAG) for endpoint discovery and the chunking, i.e., preprocessing, of state-of-practice OpenAPIs to reduce the input token length while preserving the most relevant information. To further reduce the input token length for the composition prompt and improve endpoint retrieval, we propose (ii) a Discovery Agent that only receives a summary of the most relevant endpoints and retrieves specification details on demand. We evaluate RAG for endpoint discovery using the RestBench benchmark, first for the different chunking possibilities and parameters, measuring endpoint retrieval recall, precision, and F1 score. Then, we assess the Discovery Agent using the same test set. With our prototype, we demonstrate how to successfully employ RAG for endpoint discovery to reduce the token count. While revealing high values for recall, precision, and F1, further research is necessary to retrieve all requisite endpoints. Our experiments show that, for preprocessing, LLM-based and format-specific approaches outperform naïve chunking methods. Relying on an agent further enhances these results, as the agent splits the task into multiple fine-granular subtasks, improving the overall RAG performance in token count, precision, and F1 score.
Content:
code.zip: Python source code to perform the experiments.
evaluate.py: Script to execute the experiments (uncomment lines to select the embedding model).
socrag/*: Source code for the RAG.
benchmark/*: RestBench specification.
results.zip: Results of the RAG experiments (in the folder /results/data/ inside the zip file).
Experiment results for the RAG: results_{embedding_model}_{top-k}.json.
Experiment results for the Discovery Agent: results_{embedding_model}_{agent}_{refinement}_{llm}.json.
FAISS store (intermediate data required for exact reproduction of results; one folder for each embedding model): bge_small, nvidia, and oai.
Intermediate data of the LLM-based refinement methods required for the exact reproduction of results: *_parser.json.
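The endpoint-retrieval metrics named above (recall, precision, and F1 over retrieved versus required endpoints) can be computed as in the following sketch; the endpoint names are illustrative only and not taken from RestBench.

```python
# Sketch of the endpoint-retrieval metrics: compare the endpoints returned by
# the RAG step against the ground-truth endpoints of a task. The endpoint
# names below are illustrative only.
def retrieval_scores(retrieved: set[str], relevant: set[str]) -> tuple[float, float, float]:
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

retrieved = {"GET /movies", "GET /movies/{id}", "POST /tickets"}
relevant = {"GET /movies/{id}", "POST /tickets"}
print(retrieval_scores(retrieved, relevant))  # roughly (0.67, 1.0, 0.8)
```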
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Open RAG Benchmark
The Open RAG Benchmark is a unique, high-quality Retrieval-Augmented Generation (RAG) dataset constructed directly from arXiv PDF documents, specifically designed for evaluating RAG systems with a focus on multimodal PDF understanding. Unlike other datasets, Open RAG Benchmark emphasizes pure PDF content, meticulously extracting and generating queries on diverse modalities including text, tables, and images, even when they are intricately interwoven within a… See the full description on the dataset page: https://huggingface.co/datasets/vectara/open_ragbench.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A detailed list of different RAG methods used in the surveyed studies.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset card for RAG-BENCH
Data Summary
RAG-bench aims to provide results for many commonly used RAG datasets. All results in this dataset were produced with the RAG evaluation tool Rageval and can easily be reproduced with it. Currently, we provide results for the ASQA, ELI5, and HotpotQA datasets.
Data Instance
ASQA
{ "ambiguous_question":"Who is the original artist of sound of silence?", "qa_pairs":[{… See the full description on the dataset page: https://huggingface.co/datasets/golaxy/rag-bench.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
What is the Allganize RAG Leaderboard?
The Allganize RAG Leaderboard evaluates the performance of Japanese RAG across five industry domains (finance, information and communications, manufacturing, public sector, and distribution/retail). A typical RAG can answer simple questions, but in many cases it cannot answer questions about information contained in figures and tables. Many companies considering RAG adoption want a Japanese RAG performance evaluation that reflects their own industry domain, document types, and question formats. Evaluating RAG performance requires validation documents, question-and-answer datasets, and a validation environment; to help companies considering RAG adoption, Allganize has released the data needed for Japanese RAG performance evaluation. A RAG solution consists of three parts: Parser, Retrieval, and Generation. As of publication, no Japanese RAG leaderboard comprehensively evaluates these three parts. Allganize RAG… See the full description on the dataset page: https://huggingface.co/datasets/allganize/RAG-Evaluation-Dataset-JA.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset Information
We introduce an omnidirectional and automatic RAG benchmark in the financial domain, OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain. Our benchmark is characterized by its multi-dimensional evaluation framework, including:
a matrix-based RAG scenario evaluation system that categorizes queries into five task classes and 16 financial topics, leading to a structured assessment of diverse query scenarios; a… See the full description on the dataset page: https://huggingface.co/datasets/RUC-NLPIR/OmniEval-Human-Questions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Replication Package for the research paper "Conversing with business process-aware Large Language Models: the BPLLM framework".
The package includes the process models, the questions (and expected answers), the results of the qualitative evaluation, and the Hugging Face links to the fine-tuned versions of Llama 3.1 8B employed in the quantitative evaluation of the framework.
In particular, the process models are:
The natural language Directly-follows graph (DFG) of the Food Delivery process: food_delivery_activities.txt for the definition of the activities and food_delivery_flow.txt for the sequence flow.
The BPMN model of the Food Delivery, E-commerce, and Reimbursement processes: ecommerce.bpmn, food_delivery.bpmn, and reimbursement.bpmn.
The datasets with the questions and the expected answers are:
1_questions_answers_not_refined_for_DFG.csv;
1.1_questions_answers_refined_for_DFG.csv;
2_questions_answers_not_refined.csv;
3_questions_answers_refined.csv;
4_questions_answers_different_processes.csv;
5_questions_answers_similar_processes.csv;
6_questions_answers_refined_ft.csv.
The complete results of the qualitative evaluation are contained in the file qualitative_experiments_results.pdf.
The Hugging Face links to the fine-tuned versions of Llama 3.1 8B are reported in hf_links_finetuned_models.pdf.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The data reflect the results of the experimentation with an HCAI system implemented using a RAG framework and the Llama 3.0 model. During the experimentation, 91 questions were utilized in the domain of legal advice and migrant rights. Metrics assessed included contextual enrichment, textual quality, discourse analysis, and sentiment evaluation. This allows for the analysis of sentiments and emotions, bias detection, content and toxicity classification, as well as an analysis of inclusion and diversity.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
German-RAG-LLM-EASY-BENCHMARK
German-RAG - German Retrieval Augmented Generation
Dataset Summary
This German-RAG-LLM-BENCHMARK is a specialized collection for evaluating language models, with a focus on source citation and stating time differences in RAG-specific tasks. To evaluate models compatible with OpenAI endpoints, you can refer to our GitHub repo: https://github.com/avemio-digital/German-RAG-LLM-EASY-BENCHMARK/ Most of the subsets are synthetically… See the full description on the dataset page: https://huggingface.co/datasets/avemio/German-RAG-LLM-EASY-BENCHMARK.
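As a generic, non-authoritative sketch of what querying an OpenAI-compatible endpoint with a citation-focused RAG prompt might look like (this is not the benchmark's official harness; base_url, api_key, and the model name are placeholders):

```python
from openai import OpenAI

# Generic sketch: call an OpenAI-compatible endpoint with a German RAG-style
# prompt that asks for a source citation. base_url, api_key, and model name
# are placeholders; see the GitHub repo above for the actual evaluation setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

context = "Dok. A (2023-05-01): Die Antragsfrist endet am 30. Juni 2023."
question = "Wann endet die Antragsfrist? Bitte mit Quellenangabe antworten."

response = client.chat.completions.create(
    model="my-german-rag-model",
    messages=[
        {"role": "system", "content": "Beantworte nur anhand des Kontexts und nenne die Quelle."},
        {"role": "user", "content": f"Kontext:\n{context}\n\nFrage: {question}"},
    ],
)
print(response.choices[0].message.content)
```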