7 datasets found

multihopqa
huggingface.co
CoRAG, multihopqa [Dataset]. https://huggingface.co/datasets/corag/multihopqa
Dataset authored and provided by
CoRAG
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
MultiHopQA

This dataset contains the MultiHopQA data along with intermediate retrieval and generation steps, as well as final predictions generated in the paper Chain-of-Retrieval Augmented Generation.

Fields

The dataset includes the following fields for each data point:

query: The multi-hop question.
query_id: A unique identifier for the query.
answers: A list of correct answer(s) to the multi-hop question.
context_doc_ids: A list of document IDs retrieved by the… See the full description on the dataset page: https://huggingface.co/datasets/corag/multihopqa.
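As a minimal sketch, assuming each data point arrives as a Python dict keyed by the field names above, a record could be modelled as follows. Only the four named fields are typed; because the field list is truncated, any other keys are kept in `extra` rather than guessed. The class `MultiHopRecord` and its `matches` helper (a simple case-insensitive exact match against `answers`) are hypothetical illustrations, not part of the dataset or its paper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MultiHopRecord:
    """Hypothetical typed view of one MultiHopQA data point (named fields only)."""

    query: str                  # the multi-hop question
    query_id: str               # unique identifier for the query
    answers: List[str]          # correct answer(s)
    context_doc_ids: List[str]  # IDs of retrieved documents
    extra: Dict[str, Any] = field(default_factory=dict)  # any fields not listed above

    @classmethod
    def from_dict(cls, row: Dict[str, Any]) -> "MultiHopRecord":
        known = {"query", "query_id", "answers", "context_doc_ids"}
        return cls(
            query=row["query"],
            query_id=str(row["query_id"]),
            answers=list(row["answers"]),
            context_doc_ids=[str(d) for d in row["context_doc_ids"]],
            extra={k: v for k, v in row.items() if k not in known},
        )

    def matches(self, prediction: str) -> bool:
        """Assumed scoring rule: case-insensitive exact match with any gold answer."""
        norm = prediction.strip().lower()
        return any(norm == a.strip().lower() for a in self.answers)


# Placeholder row for illustration only; not taken from the dataset.
record = MultiHopRecord.from_dict(
    {"query": "example question", "query_id": "q1",
     "answers": ["Paris"], "context_doc_ids": ["d1", "d2"], "note": "demo"}
)
print(record.matches(" paris "))  # True
print(record.extra)               # {'note': 'demo'}
```

Keeping unknown keys in `extra` lets the sketch load rows even if the full card defines more fields than the four shown here.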
kilt-corpus
huggingface.co
Updated Jan 1, 1977
CoRAG (1977). kilt-corpus [Dataset]. https://huggingface.co/datasets/corag/kilt-corpus
Dataset updated
Jan 1, 1977
Dataset authored and provided by
CoRAG
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
KILT Corpus

This dataset contains approximately 36 million Wikipedia passages from the "Multi-task retrieval for knowledge-intensive tasks" paper. It is also the retrieval corpus used in the paper Chain-of-Retrieval Augmented Generation.

Fields

id: A unique identifier for each passage.
title: The title of the Wikipedia page from which the passage originates.
contents: The textual content of the passage.
wikipedia_id: The unique identifier for the Wikipedia page, used for… See the full description on the dataset page: https://huggingface.co/datasets/corag/kilt-corpus.
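A minimal sketch of one passage record, and of collecting passages that come from the same Wikipedia page, assuming rows are available with the four fields above. `Passage` and `group_by_page` are hypothetical names, and grouping by `wikipedia_id` is only an illustrative use, not the documented purpose of that field (its description is truncated).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Passage(NamedTuple):
    """Hypothetical typed view of one KILT corpus row."""

    id: str            # unique passage identifier
    title: str         # title of the source Wikipedia page
    contents: str      # passage text
    wikipedia_id: str  # identifier of the source Wikipedia page


def group_by_page(passages: Iterable[Passage]) -> Dict[str, List[Passage]]:
    """Collect passages sharing a wikipedia_id, preserving input order."""
    pages: Dict[str, List[Passage]] = defaultdict(list)
    for p in passages:
        pages[p.wikipedia_id].append(p)
    return dict(pages)


# Placeholder passages for illustration only.
sample = [
    Passage("1", "Page A", "first chunk", "100"),
    Passage("2", "Page A", "second chunk", "100"),
    Passage("3", "Page B", "only chunk", "200"),
]
pages = group_by_page(sample)
print({pid: [p.id for p in ps] for pid, ps in pages.items()})
# {'100': ['1', '2'], '200': ['3']}
```

Because the corpus holds roughly 36 million passages, the function accepts any iterable so that rows can be streamed rather than held in one in-memory list.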
AeroEngQA
zenodo.org
bin, json, txt
Updated Jun 24, 2025
Stuart E. Middleton (2025). AeroEngQA [Dataset]. http://doi.org/10.5281/zenodo.14215677
Available download formats: json, txt, bin
Unique identifier
https://doi.org/10.5281/zenodo.14215677
Dataset updated
Jun 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Stuart E. Middleton
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset name:

AeroEngQA

Description:

AeroEngQA is a low-volume, high-quality benchmark Question Answering (QA) dataset for aircraft design, intended to support qualitative evaluation of Large Language Models (LLMs).

Dataset DOI:

10.5281/zenodo.14215677

Paper citation:

Silva, E.A., Marsh, R., Yong, H.K., Middleton, S.E., Sóbester, A. Retrieval-Augmented Generation and In-Context Prompted Large Language Models in Aircraft Engineering. AIAA-2025, AIAA. doi:10.2514/6.2025-0700

Abstract:

With the aerospace industry taking its first steps towards exploiting the rapidly evolving technology of Large Language Models (LLMs), this study explores the potential of the latest generation of LLMs to become an effective link in the aircraft design tool chain of the future. Our focus is on the task of Question Answering (QA) in engineering, which has the potential to augment future aircraft design team meetings with an intelligent LLM-based agent able to engage with the team via a chatbot interface. We compare three of the most effective and popular classes of LLM QA prompting today: LLM zero-shot prompting, LLM in-context prompting, and LLM-based Retrieval-Augmented Generation (RAG). We describe a new, low-volume, high-quality benchmark aircraft design QA dataset (AeroEngQA) and use it to qualitatively evaluate each class of LLM, exploring properties including answer accuracy and answer simplicity. We provide domain-specific insights into the usefulness of today’s LLMs for engineering design tasks such as aircraft design, and a view on how this might evolve in the future as the next generation of LLMs emerges.

Acknowledgements:

The DAWS 2 (Development of Advanced Wing Solutions 2) project is supported by the ATI Programme, a joint Government and industry investment to maintain and grow the UK’s competitive position in civil aerospace design and manufacture. The programme, delivered through a partnership between the Aerospace Technology Institute (ATI), Department for Business, Energy & Industrial Strategy (BEIS) and Innovate UK, addresses technology, capability and supply chain challenges.
Data analysis V5 for python.xlsx
figshare.com
xlsx
Updated May 8, 2025
Pingfei Jiang (2025). Data analysis V5 for python.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.28956233.v1
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.28956233.v1
Dataset updated
May 8, 2025
Dataset provided by
figshare
Authors
Pingfei Jiang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is the original data used in the analysis for the manuscript "A Comparative Study on Retrieval-Augmented Generation and Chain-of-Thought Applications for LLM-Assisted Engineering Design Ideation".
rag-dataset-12000
huggingface.co
Updated Feb 19, 2025
Q (2025). rag-dataset-12000 [Dataset]. https://huggingface.co/datasets/chloedh0228/rag-dataset-12000
Metadata format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
Feb 19, 2025
Authors
Q
Description
Retrieval-Augmented Generation (RAG) Dataset 12000

Retrieval-Augmented Generation (RAG) Dataset 12000 is an English dataset designed for RAG-optimized models, built by Neural Bridge AI, and released under Apache license 2.0.

Dataset Description

Dataset Summary

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by allowing them to consult an external authoritative knowledge base before generating responses. This approach significantly… See the full description on the dataset page: https://huggingface.co/datasets/chloedh0228/rag-dataset-12000.
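The summary above describes RAG only at a high level. Below is a minimal, self-contained sketch of the retrieve-then-generate pattern, assuming a toy keyword-overlap retriever in place of a real index and stopping at prompt construction (no model is called). The functions `tokenize`, `retrieve`, and `build_prompt` are hypothetical names, not part of this dataset or any library.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lower-case alphanumeric terms; a deliberately crude tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, knowledge_base: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank documents by the number of terms shared with the question."""
    q_terms = tokenize(question)
    scored: List[Tuple[int, int, str]] = [
        (len(q_terms & tokenize(doc)), -i, doc) for i, doc in enumerate(knowledge_base)
    ]
    scored.sort(reverse=True)  # highest overlap first; ties keep original order
    return [doc for score, _, doc in scored[:k] if score > 0]


def build_prompt(question: str, context: List[str]) -> str:
    """Put the retrieved passages ahead of the question for the generator to consult."""
    numbered = "\n".join(f"[{n}] {doc}" for n, doc in enumerate(context, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


kb = [
    "The Eiffel Tower is in Paris.",
    "Bananas are yellow.",
    "Paris is the capital of France.",
]
question = "Where is the Eiffel Tower?"
context = retrieve(question, kb)
print(context)
# ['The Eiffel Tower is in Paris.', 'Paris is the capital of France.']
print(build_prompt(question, context))  # this string would be sent to an LLM
```

A practical system would typically swap the overlap count for a sparse (for example BM25) or dense vector index and pass the prompt to a language model; both steps are outside this sketch.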
GraphKV
huggingface.co
Updated Jul 1, 2025
GraphKV [Dataset]. https://huggingface.co/datasets/Graph-COM/GraphKV
Dataset updated
Jul 1, 2025
Dataset authored and provided by
Graph Computation and Machine Learning (GCOM) Group
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Datahub for Graph-KV

This directory contains processed datasets for retrieval-augmented generation (RAG) and Arxiv-QA tasks, used in the paper Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models. It is organized into two main folders, rag and arxiv, along with results.

📁 rag/

This folder includes preprocessed data for several commonly used RAG datasets. Each subdirectory corresponds to a different dataset split or benchmark:

2wiki_dev:… See the full description on the dataset page: https://huggingface.co/datasets/Graph-COM/GraphKV.
LongInterSample
huggingface.co
LongInterDataset, LongInterSample [Dataset]. https://huggingface.co/datasets/LongInterDataset/LongInterSample
Authors
LongInterDataset
Description
LongInter Dataset

Introduction

LongInter: the first large-scale dataset focused on long-term human-human interactions. We collect high-quality 3D motion sequences by retrieving and transitioning existing short motions using retrieval-augmented generation and transition inference strategies. We apply rigorous filtering criteria to ensure motion realism and consistency. Additionally, we provide rich, extended textual annotations by summarizing short-sequence captions using… See the full description on the dataset page: https://huggingface.co/datasets/LongInterDataset/LongInterSample.
Citations: according to Google Scholar, 388 scholarly articles cite the multihopqa dataset (corag/multihopqa).
