98 datasets found
  1. RAGProbe: An Automated Approach for Evaluating RAG Pipelines

    • figshare.com
    pdf
    Updated Nov 14, 2024
    Cite
    Shangeetha Sivasothy (2024). RAGProbe: An Automated Approach for Evaluating RAG Pipelines [Dataset]. http://doi.org/10.6084/m9.figshare.25940956.v11
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 14, 2024
    Dataset provided by
    figshare
    Authors
    Shangeetha Sivasothy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the manual execution results of the evaluation scenarios; the list of RAG repositories; the automatically generated question-answer pairs and the results of executing the evaluation scenarios across 5 open-source RAG pipelines using our approach and the RAGAS approach; and the automation scripts for generating question-answer pairs and executing the generated questions across the selected RAG pipelines.

  2. RAG-Evaluation-Dataset-KO

    • huggingface.co
    Updated Aug 9, 2024
    Cite
    allganize (2024). RAG-Evaluation-Dataset-KO [Dataset]. https://huggingface.co/datasets/allganize/RAG-Evaluation-Dataset-KO
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 9, 2024
    Dataset provided by
    Allganize, Inc.
    Authors
    allganize
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Allganize RAG Leaderboard

    The Allganize RAG Leaderboard evaluates the performance of Korean RAG across 5 domains (finance, public sector, medical, legal, and commerce). Typical RAG systems answer simple questions well, but they often fail on questions about tables and images in documents.
    Many companies that want to adopt RAG are looking for a Korean RAG performance report that reflects their own domain, document types, and question styles. Evaluation requires a dataset of public documents, questions, and answers, but building one in-house takes significant time and cost. Allganize is therefore releasing all of its RAG evaluation data. RAG consists of three main parts: Parser, Retrieval, and Generation. Among the currently public RAG leaderboards, none in Korean evaluates all three parts as a whole. On the Allganize RAG Leaderboard, documents are… See the full description on the dataset page: https://huggingface.co/datasets/allganize/RAG-Evaluation-Dataset-KO.

  3. Acquired Podcast RAG Evaluation Dataset

    • opendatabay.com
    Updated Jul 5, 2025
    Cite
    Datasimple (2025). Acquired Podcast RAG Evaluation Dataset [Dataset]. https://www.opendatabay.com/data/ai-ml/711f5e5e-b873-46d9-a9ec-70f78ed57d50
    Explore at:
    Dataset updated
    Jul 5, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Entertainment & Media Consumption
    Description

    This dataset provides a collection of Acquired Podcast Transcripts, specifically curated for evaluating Retrieval-Augmented Generation (RAG) systems. It includes human-verified answers and AI model responses both with and without access to the transcripts, along with correctness ratings and quality assessments. The dataset's core purpose is to facilitate the development and testing of AI models, particularly in the domain of natural language processing and question-answering.

    Columns

    The dataset contains several key columns designed for RAG evaluation:

    • question: The query posed for evaluation.
    • human_answer: The reference answer provided by a human.
    • ai_answer_without_the_transcript: The answer generated by an AI model when it does not have access to the transcript.
    • ai_answer_without_the_transcript_correctness: A human-verified assessment of the factual accuracy of the AI answer without the transcript (e.g., CORRECT, INCORRECT, Other).
    • ai_answer_with_the_transcript: The answer generated by an AI model when it does have access to the transcript.
    • ai_answer_with_the_transcript_correctness: A human-verified assessment of the factual accuracy of the AI answer with the transcript (e.g., CORRECT, INCORRECT, Other).
    • quality_rating_for_answer_with_transcript: A human rating of the quality of the AI answer when the model had access to the transcript.
    • post_url: The URL of the specific Acquired Podcast episode related to the question.
    • file_name: The name of the transcript file corresponding to the episode.
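    As a rough illustration of how these columns can be used (a sketch only: the CSV file name below is hypothetical, not taken from the dataset), the correctness labels can be turned into simple accuracy figures with pandas:

        import pandas as pd

        # Hypothetical file name; use the actual CSV from the dataset download.
        df = pd.read_csv("acquired_rag_eval.csv")

        # Share of answers rated CORRECT, with and without transcript access.
        for col in ("ai_answer_without_the_transcript_correctness",
                    "ai_answer_with_the_transcript_correctness"):
            rate = (df[col].str.upper() == "CORRECT").mean()
            print(f"{col}: {rate:.1%} rated CORRECT")

    Comparing the two rates gives a quick estimate of how much transcript access improves factual accuracy.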

    Distribution

    The dataset comprises 200 Acquired Podcast Transcripts, totalling approximately 3.5 million words. This is roughly equivalent to 5,500 pages when formatted into a Word document. It also includes a dedicated QA dataset for RAG evaluation, structured as a CSV file.

    Usage

    This dataset is ideal for:

    • Evaluating the factual accuracy and quality of AI models, particularly those employing RAG techniques.
    • Developing and refining natural language processing (NLP) models.
    • Training and testing question-answering systems.
    • Benchmarking the performance of different AI models in information retrieval tasks.
    • Conducting research in artificial intelligence and machine learning, focusing on generative AI.

    Coverage

    The dataset's content is derived from 200 episodes of the Acquired Podcast, collected from its official website. It covers a range of topics typically discussed on the podcast, including business, technology, and finance. The data collection focused on transcripts available at the time of sourcing.

    License

    CC0

    Who Can Use It

    • AI/ML Researchers: For developing and testing new RAG models and NLP techniques.
    • Data Scientists: For analysing and extracting insights from large text datasets and evaluating model performance.
    • NLP Developers: For building and improving question-answering systems and conversational AI.
    • Students and Academics: For educational projects and academic research in generative AI and data analytics.
    • Data Providers: To understand best practices for creating and structuring evaluation datasets.

    Dataset Name Suggestions

    • Acquired Podcast RAG Evaluation Dataset
    • Podcast Transcripts for AI QA
    • Acquired QA Dataset for Generative AI
    • Podcast RAG Performance Benchmark
    • Acquired Transcripts & QA Evaluation

    Attributes

    Original Data Source: Acquired Podcast Transcripts and RAG Evaluation

  4. RAG-EVALUATION-QA

    • huggingface.co
    Cite
    Mohamed Emir Bouhamar, RAG-EVALUATION-QA [Dataset]. https://huggingface.co/datasets/emirMb/RAG-EVALUATION-QA
    Explore at:
    Authors
    Mohamed Emir Bouhamar
    Description

    The emirMb/RAG-EVALUATION-QA dataset, hosted on Hugging Face and contributed by the HF Datasets community.

  5. Ger-RAG-eval

    • huggingface.co
    Cite
    Deutsche Telekom AG, Ger-RAG-eval [Dataset]. https://huggingface.co/datasets/deutsche-telekom/Ger-RAG-eval
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Deutsche Telekom: http://www.telekom.de/
    Authors
    Deutsche Telekom AG
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    German RAG LLM Evaluation Dataset

    This dataset is intended for the evaluation of the German RAG (retrieval augmented generation) capabilities of LLM models. It is based on the test set of the deutsche-telekom/wikipedia-22-12-de-dpr dataset (also see wikipedia-22-12-de-dpr on GitHub) and consists of 4 subsets or tasks.

      Task Description
    

    The dataset consists of 4 subsets for the following 4 tasks (each task with 1000 prompts):

      choose_context_by_question (subset… See the full description on the dataset page: https://huggingface.co/datasets/deutsche-telekom/Ger-RAG-eval.
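    A minimal loading sketch, assuming the Hugging Face datasets library; the subset name choose_context_by_question is taken from the task list above, while the other three subset names must be looked up on the dataset page:

        from datasets import load_dataset

        # Load one of the four task subsets (1000 prompts each).
        ds = load_dataset("deutsche-telekom/Ger-RAG-eval", "choose_context_by_question")
        print(ds)  # inspect splits and features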
    
  6. Data from: RAGProbe: Breaking RAG Pipelines with Evaluation Scenarios

    • figshare.com
    pdf
    Updated Nov 17, 2024
    Cite
    Shangeetha Sivasothy (2024). RAGProbe: Breaking RAG Pipelines with Evaluation Scenarios [Dataset]. http://doi.org/10.6084/m9.figshare.27740646.v3
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 17, 2024
    Dataset provided by
    figshare
    Authors
    Shangeetha Sivasothy
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the manual execution results of the evaluation scenarios; the list of RAG repositories; the automatically generated question-answer pairs and the results of executing the evaluation scenarios across 5 open-source RAG pipelines using our approach and the RAGAS approach; and the automation scripts for generating question-answer pairs and executing the generated questions across the selected RAG pipelines.

  7. This file provides the evaluation metrics used to assess the performance of...

    • figshare.com
    xlsx
    Updated Jun 11, 2025
    Cite
    Lameck Mbangula Amugongo; Pietro Mascheroni; Steven Brooks; Stefan Doering; Jan Seidel (2025). This file provides the evaluation metrics used to assess the performance of RAG pipelines in the various papers. [Dataset]. http://doi.org/10.1371/journal.pdig.0000877.s003
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    PLOS Digital Health
    Authors
    Lameck Mbangula Amugongo; Pietro Mascheroni; Steven Brooks; Stefan Doering; Jan Seidel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file provides the evaluation metrics used to assess the performance of RAG pipelines in the various papers.

  8. leann-rag-evaluation-data

    • huggingface.co
    Updated Jul 13, 2025
    Cite
    LEANN-RAG (2025). leann-rag-evaluation-data [Dataset]. https://huggingface.co/datasets/LEANN-RAG/leann-rag-evaluation-data
    Explore at:
    Dataset updated
    Jul 13, 2025
    Dataset authored and provided by
    LEANN-RAG
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    LEANN-RAG Evaluation Data

    This repository contains the necessary data to run the recall evaluation scripts for the LEANN-RAG project.

      Dataset Components
    

    This dataset is structured into three main parts:

    Pre-built LEANN Indices:

    • dpr/: A pre-built index for the DPR dataset.
    • rpj_wiki/: A pre-built index for the RPJ-Wiki dataset.

    These indices were created using the leann-core library and are required by the LeannSearcher.

    Ground Truth Data:

    ground_truth/: Contains the… See the full description on the dataset page: https://huggingface.co/datasets/LEANN-RAG/leann-rag-evaluation-data.
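    Since the data exists to support recall evaluation, here is a rough sketch of recall@k against ground-truth passage IDs; this is not the project's actual evaluation script, and it deliberately avoids assuming anything about the LeannSearcher API:

        def recall_at_k(retrieved_ids, relevant_ids, k=10):
            # Fraction of ground-truth IDs found in the top-k retrieved IDs.
            top_k = set(retrieved_ids[:k])
            return len(top_k & set(relevant_ids)) / max(len(relevant_ids), 1)

        # Hypothetical IDs: 2 of 3 relevant passages appear in the top k.
        print(recall_at_k(["p1", "p7", "p9"], ["p1", "p9", "p4"], k=10))  # ~0.67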

  9. GVHD RAG Data with Production and Interaction Type

    • figshare.com
    csv
    Updated May 28, 2025
    Cite
    Deepchecks Data (2025). GVHD RAG Data with Production and Interaction Type [Dataset]. http://doi.org/10.6084/m9.figshare.28045040.v4
    Explore at:
    csvAvailable download formats
    Dataset updated
    May 28, 2025
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Deepchecks Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A classic retrieval augmented generation (RAG) Q&A bot that answers questions about the GVHD medical condition.

  10. Data from: Summary of an Evaluation Dataset: RAG System with LLM for Migrant...

    • research-data.ull.es
    • portalciencia.ull.es
    Updated Oct 18, 2024
    Cite
    Luis Garcia-Forte (2024). Summary of an Evaluation Dataset: RAG System with LLM for Migrant Integration in an HCAI [Dataset]. http://doi.org/10.17632/rjdt5nmm88.1
    Explore at:
    Dataset updated
    Oct 18, 2024
    Authors
    Luis Garcia-Forte
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset provides a summary of the experimental results obtained from an HCAI system implemented using a RAG framework and the Llama 3.0 model. A total of 125 hyperparameter configurations were defined, with the metrics for each configuration aggregated as the median over the results of 91 questions and their corresponding answers. These configurations represent the alternatives evaluated through Multi-Criteria Decision-Making (MCDM) methods.
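    As an illustration of that aggregation step (a sketch only: the file and column names below are hypothetical, not taken from the dataset), per-question metrics can be reduced to one median row per configuration with pandas:

        import pandas as pd

        # Hypothetical layout: one row per (configuration, question) pair.
        df = pd.read_csv("results.csv")
        summary = df.groupby("config_id").median(numeric_only=True)
        print(summary.shape)  # expected: 125 rows, one per configuration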

  11. This file contains the raw data of all papers collected

    • plos.figshare.com
    • figshare.com
    xlsx
    Updated Jun 11, 2025
    Cite
    Lameck Mbangula Amugongo; Pietro Mascheroni; Steven Brooks; Stefan Doering; Jan Seidel (2025). This file contains the raw data of all papers collected [Dataset]. http://doi.org/10.1371/journal.pdig.0000877.s001
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    PLOS Digital Health
    Authors
    Lameck Mbangula Amugongo; Pietro Mascheroni; Steven Brooks; Stefan Doering; Jan Seidel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the raw data of all papers collected.

  12. Replication Data for: Advanced System Integration: Analyzing OpenAPI...

    • darus.uni-stuttgart.de
    Updated Dec 9, 2024
    Cite
    Robin D. Pesl; Jerin George Mathew; Massimo Mecella; Marco Aiello (2024). Replication Data for: Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation [Dataset]. http://doi.org/10.18419/DARUS-4605
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    DaRUS
    Authors
    Robin D. Pesl; Jerin George Mathew; Massimo Mecella; Marco Aiello
    License

    Custom license: https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4605

    Dataset funded by
    BMWK
    MWK
    Description

    Integrating multiple (sub-)systems is essential to create advanced Information Systems (ISs). Difficulties mainly arise when integrating dynamic environments across the IS lifecycle, e.g., services not yet existent at design time. A traditional approach is a registry that provides the API documentation of the systems' endpoints. Large Language Models (LLMs) have shown themselves capable of automatically creating system integrations (e.g., as service composition) based on this documentation, but they require concise input due to input token limitations, especially for comprehensive API descriptions. Currently, it is unknown how best to preprocess these API descriptions.

    Within this work, we (i) analyze the usage of Retrieval Augmented Generation (RAG) for endpoint discovery and the chunking, i.e., preprocessing, of state-of-practice OpenAPIs to reduce the input token length while preserving the most relevant information. To further reduce the input token length for the composition prompt and improve endpoint retrieval, we propose (ii) a Discovery Agent that only receives a summary of the most relevant endpoints and retrieves specification details on demand.

    We evaluate RAG for endpoint discovery using the RestBench benchmark: first for the different chunking possibilities and parameters, measuring the endpoint retrieval recall, precision, and F1 score; then we assess the Discovery Agent using the same test set. With our prototype, we demonstrate how to successfully employ RAG for endpoint discovery to reduce the token count. While the results reveal high values for recall, precision, and F1, further research is necessary to retrieve all requisite endpoints. Our experiments show that, for preprocessing, LLM-based and format-specific approaches outperform naïve chunking methods. Relying on an agent further enhances these results, as the agent splits the task into multiple fine-granular subtasks, improving the overall RAG performance in token count, precision, and F1 score.

    Content:

    code.zip (Python source code to perform the experiments):

    • evaluate.py: Script to execute the experiments (uncomment lines to select the embedding model).
    • socrag/*: Source code for the RAG.
    • benchmark/*: RestBench specification.

    results.zip (results of the RAG experiments, in the folder /results/data/ inside the zip file):

    • Experiment results for the RAG: results_{embedding_model}_{top-k}.json.
    • Experiment results for the Discovery Agent: results_{embedding_model}_{agent}_{refinement}_{llm}.json.
    • FAISS store (intermediate data required for exact reproduction of results; one folder for each embedding model): bge_small, nvidia, and oai.
    • Intermediate data of the LLM-based refinement methods, required for the exact reproduction of results: *_parser.json.
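    The experiments measure endpoint retrieval with set-based recall, precision, and F1. As a rough illustration only (a sketch, not code from code.zip; the endpoint names in the example are hypothetical), these metrics can be computed as:

        def retrieval_metrics(retrieved, relevant):
            # Set-based precision, recall, and F1 for endpoint discovery.
            retrieved, relevant = set(retrieved), set(relevant)
            tp = len(retrieved & relevant)
            precision = tp / len(retrieved) if retrieved else 0.0
            recall = tp / len(relevant) if relevant else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            return precision, recall, f1

        # Hypothetical endpoints: one of two retrieved is relevant.
        print(retrieval_metrics(["GET /pets", "POST /orders"], ["GET /pets"]))
        # (0.5, 1.0, 0.666...)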

  13. open_ragbench

    • huggingface.co
    Cite
    Vectara, open_ragbench [Dataset]. https://huggingface.co/datasets/vectara/open_ragbench
    Explore at:
    Dataset authored and provided by
    Vectara
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Open RAG Benchmark

    The Open RAG Benchmark is a unique, high-quality Retrieval-Augmented Generation (RAG) dataset constructed directly from arXiv PDF documents, specifically designed for evaluating RAG systems with a focus on multimodal PDF understanding. Unlike other datasets, Open RAG Benchmark emphasizes pure PDF content, meticulously extracting and generating queries on diverse modalities including text, tables, and images, even when they are intricately interwoven within a… See the full description on the dataset page: https://huggingface.co/datasets/vectara/open_ragbench.

  14. A detailed list of different RAG methods used in the surveyed studies.

    • plos.figshare.com
    xls
    Updated Jun 11, 2025
    Cite
    Lameck Mbangula Amugongo; Pietro Mascheroni; Steven Brooks; Stefan Doering; Jan Seidel (2025). A detailed list of different RAG methods used in the surveyed studies. [Dataset]. http://doi.org/10.1371/journal.pdig.0000877.t003
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 11, 2025
    Dataset provided by
    PLOS Digital Health
    Authors
    Lameck Mbangula Amugongo; Pietro Mascheroni; Steven Brooks; Stefan Doering; Jan Seidel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A detailed list of different RAG methods used in the surveyed studies.

  15. rag-bench

    • huggingface.co
    Updated May 6, 2024
    Cite
    ICT-Golaxy (2024). rag-bench [Dataset]. https://huggingface.co/datasets/golaxy/rag-bench
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 6, 2024
    Dataset authored and provided by
    ICT-Golaxy
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset card for RAG-BENCH

      Data Summary
    

    RAG-bench aims to provide results for many commonly used RAG datasets. All results in this dataset are produced by the RAG evaluation tool Rageval and can be easily reproduced with that tool. Currently, we provide results for the ASQA, ELI5, and HotPotQA datasets.

  Data Instance

  ASQA

    { "ambiguous_question":"Who is the original artist of sound of silence?", "qa_pairs":[{… See the full description on the dataset page: https://huggingface.co/datasets/golaxy/rag-bench.

  16. RAG-Evaluation-Dataset-JA

    • huggingface.co
    Cite
    allganize, RAG-Evaluation-Dataset-JA [Dataset]. https://huggingface.co/datasets/allganize/RAG-Evaluation-Dataset-JA
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset provided by
    Allganize, Inc.
    Authors
    allganize
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    What the Allganize RAG Leaderboard is

    The Allganize RAG Leaderboard evaluates the performance of Japanese RAG across five industry domains (finance, telecommunications, manufacturing, public sector, and retail/distribution). Typical RAG systems can answer simple questions, but in many cases they cannot answer questions about information contained in figures and tables. Many companies that want to adopt RAG are looking for a Japanese RAG performance evaluation that reflects their own industry domain, document types, and question styles. Evaluating RAG requires validation documents, datasets of questions and answers, and an evaluation environment; to serve as a reference for companies considering RAG adoption, Allganize has released the data needed for Japanese RAG performance evaluation. A RAG solution consists of three parts: Parser, Retrieval, and Generation. At present (as of publication), no Japanese RAG leaderboard comprehensively evaluates all three of these parts. Allganize RAG… See the full description on the dataset page: https://huggingface.co/datasets/allganize/RAG-Evaluation-Dataset-JA.

  17. OmniEval-Human-Questions

    • huggingface.co
    Updated Jan 2, 2025
    Cite
    NLPIR Lab @ RUC (2025). OmniEval-Human-Questions [Dataset]. https://huggingface.co/datasets/RUC-NLPIR/OmniEval-Human-Questions
    Explore at:
    Dataset updated
    Jan 2, 2025
    Dataset authored and provided by
    NLPIR Lab @ RUC
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dataset Information

    We introduce OmniEval, an omnidirectional and automatic RAG evaluation benchmark in the financial domain. Our benchmark is characterized by its multi-dimensional evaluation framework, including:

    a matrix-based RAG scenario evaluation system that categorizes queries into five task classes and 16 financial topics, leading to a structured assessment of diverse query scenarios; a… See the full description on the dataset page: https://huggingface.co/datasets/RUC-NLPIR/OmniEval-Human-Questions.

  18. Replication Package for the paper "Conversing with business process-aware...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 19, 2024
    Cite
    Casciani, Angelo (2024). Replication Package for the paper "Conversing with business process-aware Large Language Models: the BPLLM framework" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13342039
    Explore at:
    Dataset updated
    Aug 19, 2024
    Dataset authored and provided by
    Casciani, Angelo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Replication Package for the research paper "Conversing with business process-aware Large Language Models: the BPLLM framework".

    The package includes the process models, the questions (and expected answers), the results of the qualitative evaluation, and the Hugging Face links to the fine-tuned versions of Llama 3.1 8B employed in the quantitative evaluation of the framework.

    In particular, the process models are:

    • The natural-language directly-follows graph (DFG) of the Food Delivery process: food_delivery_activities.txt for the definition of the activities and food_delivery_flow.txt for the sequence flow.
    • The BPMN models of the Food Delivery, E-commerce, and Reimbursement processes: ecommerce.bpmn, food_delivery.bpmn, and reimbursement.bpmn.

    The datasets with the questions and the expected answers are:

    • 1_questions_answers_not_refined_for_DFG.csv
    • 1.1_questions_answers_refined_for_DFG.csv
    • 2_questions_answers_not_refined.csv
    • 3_questions_answers_refined.csv
    • 4_questions_answers_different_processes.csv
    • 5_questions_answers_similar_processes.csv
    • 6_questions_answers_refined_ft.csv

    The complete results of the qualitative evaluation are contained in the file qualitative_experiments_results.pdf.

    The Hugging Face links to the fine-tuned versions of Llama 3.1 8B are reported in hf_links_finetuned_models.pdf.

  19. Data from: Evaluation Dataset: RAG System with LLM for Migrant Integration...

    • research-data.ull.es
    Updated Oct 14, 2024
    Cite
    Dagoberto Castellanos Nieves (2024). Evaluation Dataset: RAG System with LLM for Migrant Integration in an HCAI [Dataset]. http://doi.org/10.17632/x4x86r6tzd.1
    Explore at:
    Dataset updated
    Oct 14, 2024
    Authors
    Dagoberto Castellanos Nieves
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data reflect the results of experimentation with an HCAI system implemented using a RAG framework and the Llama 3.0 model. During the experimentation, 91 questions were used in the domain of legal advice and migrant rights. The metrics assessed include contextual enrichment, textual quality, discourse analysis, and sentiment evaluation, enabling the analysis of sentiments and emotions, bias detection, content and toxicity classification, and an analysis of inclusion and diversity.

  20. German-RAG-LLM-EASY-BENCHMARK

    • huggingface.co
    Updated Feb 6, 2025
    Cite
    Avemio AG (2025). German-RAG-LLM-EASY-BENCHMARK [Dataset]. https://huggingface.co/datasets/avemio/German-RAG-LLM-EASY-BENCHMARK
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 6, 2025
    Dataset authored and provided by
    Avemio AG
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    German-RAG-LLM-EASY-BENCHMARK

      German-RAG - German Retrieval Augmented Generation
    
    
    
    
    
      Dataset Summary
    

    This German-RAG-LLM-BENCHMARK is a specialized collection for evaluating language models, with a focus on source citation and on stating time differences in RAG-specific tasks. To evaluate models compatible with OpenAI endpoints, you can refer to our GitHub repo: https://github.com/avemio-digital/German-RAG-LLM-EASY-BENCHMARK/ Most of the subsets are synthetically… See the full description on the dataset page: https://huggingface.co/datasets/avemio/German-RAG-LLM-EASY-BENCHMARK.
