3 datasets found

Data from: GAIA-modified
huggingface.co
Updated Jun 25, 2025
Cite
Yue Tan (2025). GAIA-modified [Dataset]. https://huggingface.co/datasets/evatan/GAIA-modified
Authors
Yue Tan
Description
GAIA dataset

GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

Data and leaderboard

GAIA is made of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/evatan/GAIA-modified.
GAIA
huggingface.co
Updated Jun 25, 2025
Cite
Yue Tan (2025). GAIA [Dataset]. https://huggingface.co/datasets/evatan/GAIA
Authors
Yue Tan
Description
GAIA dataset

GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

Data and leaderboard

GAIA is made of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/evatan/GAIA.
GAIA
huggingface.co
Updated Nov 23, 2023
Cite
GAIA (2023). GAIA [Dataset]. https://huggingface.co/datasets/gaia-benchmark/GAIA
Dataset authored and provided by
GAIA
Description
GAIA dataset

GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

Data and leaderboard

GAIA is made of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/gaia-benchmark/GAIA.
Data from: GAIA-modified

evatan/GAIA-modified

General AI Assistants Benchmark
