9 datasets found

GAIA
huggingface.co
Updated Nov 23, 2023
Cite
GAIA (2023). GAIA [Dataset]. https://huggingface.co/datasets/gaia-benchmark/GAIA
Dataset authored and provided by
GAIA
Description
GAIA dataset

GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

Data and leaderboard

GAIA is made of more than 450 non-trivial questions with unambiguous answers, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/gaia-benchmark/GAIA.
GAIA Dataset
paperswithcode.com
Updated Jun 13, 2025
Cite
Grégoire Mialon; Clémentine Fourrier; Craig Swift; Thomas Wolf; Yann LeCun; Thomas Scialom (2025). GAIA Dataset [Dataset]. https://paperswithcode.com/dataset/gaia
Authors
Grégoire Mialon; Clémentine Fourrier; Craig Swift; Thomas Wolf; Yann LeCun; Thomas Scialom
Description
We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and, more generally, tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. This notable performance disparity contrasts with the recent trend of LLMs outperforming humans on tasks requiring professional skills in, e.g., law or chemistry. GAIA’s philosophy departs from the current trend in AI benchmarks of targeting tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system’s capability to exhibit robustness similar to the average human’s on such questions. Using GAIA’s methodology, we devise 466 questions and their answers. We release our questions while retaining answers to 300 of them to power a leaderboard accessible at https://huggingface.co/gaia-benchmark.
results_public
huggingface.co
Updated Nov 23, 2023
Cite
GAIA (2023). results_public [Dataset]. https://huggingface.co/datasets/gaia-benchmark/results_public
Dataset authored and provided by
GAIA
Description
Dataset Card for "results_public"

More Information needed
submissions_public
huggingface.co
Updated Feb 13, 2025
Cite
GAIA (2025). submissions_public [Dataset]. https://huggingface.co/datasets/gaia-benchmark/submissions_public
Dataset authored and provided by
GAIA
Description
Contains the scored submissions on the GAIA benchmark, validation set only. To avoid contamination of GAIA and leakage of the answers, please do not reshare the current dataset without any gating system or outside of the HF hub.
GAIA-Subset-Benchmark
huggingface.co
Updated Mar 28, 2025
Cite
II (2025). GAIA-Subset-Benchmark [Dataset]. https://huggingface.co/datasets/Intelligent-Internet/GAIA-Subset-Benchmark
Dataset authored and provided by
II
Description
GAIA Benchmark Subset Model Card

This dataset is a subset of the GAIA benchmark, containing 44 web-search-based questions from the validation set. It evaluates multiple AI models on their ability to retrieve and process real-time information using web search and browser tools. Performance metrics include success indicators and detailed reports for each model. A comparative chart summarizing the results will be provided separately.

Benchmark Results
Gaia RVS benchmark stars. II.
cdsarc.cds.unistra.fr
Updated Mar 8, 2024
Cite
CDS (2024). Gaia RVS benchmark stars. II. [Dataset]. http://doi.org/10.26093/cds/vizier.36830072
Unique identifier
https://doi.org/10.26093/cds/vizier.36830072
Dataset provided by
CDS
Description
VizieR Online Data Catalog: Gaia RVS benchmark stars. II. (Caffau E.+, 2024)
Gaia Inc. SWOT and Financial Analysis
quaintel.com
Updated Jun 29, 2025
Cite
Quaintel Research Solutions (2025). Gaia Inc. SWOT and Financial Analysis [Dataset]. https://quaintel.com/public/store/report/gaia-inc-company-profile-swot-pestle-value-chain-analysis
Dataset provided by
Quaintel Research
Authors
Quaintel Research Solutions
License
https://quaintel.com/privacy-policy
Area covered
Global
Description
Gaia Inc. Company Profile, Opportunities, Challenges and Risk (SWOT, PESTLE and Value Chain); Corporate and ESG Strategies; Competitive Intelligence; Financial KPIs; Operational KPIs; Recent Trends: …
Gaia FGK benchmark stars v3
cdsarc.cds.unistra.fr
Updated Feb 14, 2024
Cite
CDS (2024). Gaia FGK benchmark stars v3 [Dataset]. http://doi.org/10.26093/cds/vizier.36820145
Unique identifier
https://doi.org/10.26093/cds/vizier.36820145
Dataset provided by
CDS
Description
VizieR Online Data Catalog: Gaia FGK benchmark stars v3 (Soubiran C.+, 2024)
Replication data for: Rationalizing the Penn World Table: True Multilateral...
openicpsr.org
datasearch.gesis.org
Updated Dec 6, 2019
Cite
J. Peter Neary (2019). Replication data for: Rationalizing the Penn World Table: True Multilateral Indices for International Comparisons of Real Income [Dataset]. http://doi.org/10.3886/E116031V1
Unique identifier
https://doi.org/10.3886/E116031V1
Dataset provided by
American Economic Association
Authors
J. Peter Neary
Description
Real incomes are routinely compared internationally using methods that "correct" for deviations from purchasing power parity. The most widely used of these is the Geary method which, though theoretically suspect, underlies the Penn World Table. This paper provides a theoretical foundation for the Geary method which I call the GAIA ("Geary-Allen International Accounts") system. I show that the Geary method is exact when preferences are non-homothetic Leontief and, more generally, gives a (possibly poor) approximation to the GAIA benchmark. An empirical application suggests that both it and other widely used methods underestimate the degree of international inequality.