7 datasets found
  1. multihopqa

    • huggingface.co
    Cite
    CoRAG, multihopqa [Dataset]. https://huggingface.co/datasets/corag/multihopqa
    Dataset authored and provided by
    CoRAG
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    MultiHopQA

    This dataset contains the MultiHopQA data along with intermediate retrieval and generation steps, as well as final predictions generated in the paper Chain-of-Retrieval Augmented Generation.

      Fields
    

    The dataset includes the following fields for each data point:

    query: The multi-hop question.
    query_id: A unique identifier for the query.
    answers: A list of correct answer(s) to the multi-hop question.
    context_doc_ids: A list of document IDs retrieved by the…

    See the full description on the dataset page: https://huggingface.co/datasets/corag/multihopqa.
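The listed fields suggest a simple record shape. The sketch below is illustrative only: the `MultiHopRecord` class, the string types chosen for `query_id` and `context_doc_ids`, the `normalize` and `exact_match` helpers, and the sample values are all assumptions, not part of the dataset or of the paper's evaluation code.

```python
from dataclasses import dataclass, field


@dataclass
class MultiHopRecord:
    """One data point, using only the field names listed in the description."""
    query: str
    query_id: str  # assumed to be a string; the real type is not stated
    answers: list[str]
    context_doc_ids: list[str] = field(default_factory=list)  # assumed string IDs


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (a simplified, assumed normalization)."""
    return " ".join(text.lower().split())


def exact_match(prediction: str, record: MultiHopRecord) -> bool:
    """Return True if the prediction equals any reference answer after normalization."""
    return normalize(prediction) in {normalize(a) for a in record.answers}


# Hypothetical record: every value here is invented for illustration.
record = MultiHopRecord(
    query="Which country is the birthplace of the director of Film X?",
    query_id="q-0001",
    answers=["France", "French Republic"],
    context_doc_ids=["doc-12", "doc-98"],
)
print(exact_match("  france ", record))
```

Because `answers` is a list, the check accepts a match against any one of the reference answers rather than requiring a single canonical string.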

  2. kilt-corpus

    • huggingface.co
    Updated Jan 1, 1977
    Cite
    CoRAG (1977). kilt-corpus [Dataset]. https://huggingface.co/datasets/corag/kilt-corpus
    Dataset updated
    Jan 1, 1977
    Dataset authored and provided by
    CoRAG
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    KILT Corpus

    This dataset contains approximately 36 million Wikipedia passages from the "Multi-task retrieval for knowledge-intensive tasks" paper. It is also the retrieval corpus used in the paper Chain-of-Retrieval Augmented Generation.

      Fields
    

    id: A unique identifier for each passage.
    title: The title of the Wikipedia page from which the passage originates.
    contents: The textual content of the passage.
    wikipedia_id: The unique identifier for the Wikipedia page, used for…

    See the full description on the dataset page: https://huggingface.co/datasets/corag/kilt-corpus.
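Since each passage carries both its own `id` and the `wikipedia_id` of its source page, one plausible way to work with the corpus is to index passages by `id` and group them by page. This is a minimal sketch under stated assumptions: the `Passage` class, the string types, the helper names `index_by_id` and `group_by_page`, and the sample passages are hypothetical and do not come from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """One corpus row, using only the field names listed in the description."""
    id: str            # assumed string type
    title: str
    contents: str
    wikipedia_id: str  # assumed string type


def index_by_id(passages: list[Passage]) -> dict[str, Passage]:
    """Map each passage id to its passage for direct lookup."""
    return {p.id: p for p in passages}


def group_by_page(passages: list[Passage]) -> dict[str, list[Passage]]:
    """Collect all passages that originate from the same Wikipedia page."""
    pages: dict[str, list[Passage]] = defaultdict(list)
    for p in passages:
        pages[p.wikipedia_id].append(p)
    return dict(pages)


# Hypothetical passages: ids, titles, and text are invented for illustration.
corpus = [
    Passage("p1", "Example Page", "First passage of the page.", "w100"),
    Passage("p2", "Example Page", "Second passage of the page.", "w100"),
    Passage("p3", "Other Page", "Only passage of another page.", "w200"),
]
by_id = index_by_id(corpus)
by_page = group_by_page(corpus)
print(by_id["p2"].title, len(by_page["w100"]))
```

An in-memory dictionary is only practical for a small sample; at the stated scale of roughly 36 million passages, a disk-backed or streaming approach would be needed.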

  3. AeroEngQA

    • zenodo.org
    bin, json, txt
    Updated Jun 24, 2025
    Cite
    Stuart E. Middleton (2025). AeroEngQA [Dataset]. http://doi.org/10.5281/zenodo.14215677
    Available download formats: json, txt, bin
    Dataset updated
    Jun 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Stuart E. Middleton
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Dataset name:

    AeroEngQA

    Description:

    AeroEngQA is a low-volume, high-quality benchmark aircraft design Question Answering (QA) dataset to support qualitative evaluation of Large Language Models (LLMs).

    Dataset DOI:

    10.5281/zenodo.14215677

    Paper citation:

    Silva, E.A., Marsh, R., Yong, H.K., Middleton, S.E., Sóbester, A. Retrieval-Augmented Generation and In-Context Prompted Large Language Models in Aircraft Engineering, AIAA-2025, AIAA, doi:10.2514/6.2025-0700

    Abstract:

    With the aerospace industry taking its first steps towards exploiting the rapidly evolving technology of Large Language Models (LLMs), this study explores the potential of the latest generation of LLMs to become an effective link in the aircraft design tool chain of the future. Our focus is on the task of Question Answering (QA) in engineering, which has the potential to augment future aircraft design team meetings with an intelligent LLM-based agent able to engage with the team via a chatbot interface. We compare three of the most effective and popular classes of LLM QA prompting today: LLM zero-shot prompting, LLM in-context prompting and LLM-based Retrieval-Augmented Generation (RAG). We describe a new, low-volume, high-quality benchmark aircraft design QA dataset (AeroEngQA) and use it to qualitatively evaluate each class of LLM, exploring properties including answer accuracy and answer simplicity. We provide domain-specific insights into the usefulness of today’s LLMs for engineering design tasks such as aircraft design, and a view on how this might evolve in the future as the next generation of LLMs emerges.

    Acknowledgements:

    The DAWS 2 (Development of Advanced Wing Solutions 2) project is supported by the ATI Programme, a joint Government and industry investment to maintain and grow the UK’s competitive position in civil aerospace design and manufacture. The programme, delivered through a partnership between the Aerospace Technology Institute (ATI), Department for Business, Energy & Industrial Strategy (BEIS) and Innovate UK, addresses technology, capability and supply chain challenges.

  4. Data analysis V5 for python.xlsx

    • figshare.com
    xlsx
    Updated May 8, 2025
    Cite
    Pingfei Jiang (2025). Data analysis V5 for python.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.28956233.v1
    Available download formats: xlsx
    Dataset updated
    May 8, 2025
    Dataset provided by
    figshare
    Authors
    Pingfei Jiang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This is the original data processed for the manuscript "A Comparative Study on Retrieval-Augmented Generation and Chain-of-Thought Applications for LLM-Assisted Engineering Design Ideation".

  5. rag-dataset-12000

    • huggingface.co
    Updated Feb 19, 2025
    + more versions
    Cite
    Q (2025). rag-dataset-12000 [Dataset]. https://huggingface.co/datasets/chloedh0228/rag-dataset-12000
    Available in Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
    Dataset updated
    Feb 19, 2025
    Authors
    Q
    Description

    Retrieval-Augmented Generation (RAG) Dataset 12000

    Retrieval-Augmented Generation (RAG) Dataset 12000 is an English dataset designed for RAG-optimized models, built by Neural Bridge AI, and released under Apache license 2.0.

      Dataset Description

      Dataset Summary

    Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by allowing them to consult an external authoritative knowledge base before generating responses. This approach significantly… See the full description on the dataset page: https://huggingface.co/datasets/chloedh0228/rag-dataset-12000.

  6. GraphKV

    • huggingface.co
    Updated Jul 1, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    GraphKV [Dataset]. https://huggingface.co/datasets/Graph-COM/GraphKV
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Graph Computation and Machine Learning (GCOM) Group
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Datahub for Graph-KV

    This directory contains processed datasets for retrieval-augmented generation (RAG) and Arxiv-QA tasks, used in the paper Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models. It is organized into two main folders, rag and arxiv, plus results.

      📁 rag/
    

    This folder includes preprocessed data for several commonly used RAG datasets. Each subdirectory corresponds to a different dataset split or benchmark:

    2wiki_dev:… See the full description on the dataset page: https://huggingface.co/datasets/Graph-COM/GraphKV.

  7. LongInterSample

    • huggingface.co
    + more versions
    Cite
    LongInterDataset, LongInterSample [Dataset]. https://huggingface.co/datasets/LongInterDataset/LongInterSample
    Authors
    LongInterDataset
    Description

    LongInter Dataset

      Introduction
    

    LongInter: the first large-scale dataset focused on long-term human-human interactions. We collect high-quality 3D motion sequences by retrieving and transitioning existing short motions using retrieval-augmented generation and transition inference strategies. We apply rigorous filtering criteria to ensure motion realism and consistency. Additionally, we provide rich, extended textual annotations by summarizing short-sequence captions using… See the full description on the dataset page: https://huggingface.co/datasets/LongInterDataset/LongInterSample.


Cite
CoRAG, multihopqa [Dataset]. https://huggingface.co/datasets/corag/multihopqa

multihopqa

corag/multihopqa

388 scholarly articles cite this dataset (View in Google Scholar)
