9 datasets found
  1. GAIA

    • huggingface.co
    Updated Nov 23, 2023
    Cite
    GAIA (2023). GAIA [Dataset]. https://huggingface.co/datasets/gaia-benchmark/GAIA
    Dataset authored and provided by
    GAIA
    Description

    GAIA dataset

    GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

      Data and leaderboard
    

    GAIA is made up of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/gaia-benchmark/GAIA.

  2. GAIA Dataset

    • paperswithcode.com
    Updated Jun 13, 2025
    Cite
    Grégoire Mialon; Clémentine Fourrier; Craig Swift; Thomas Wolf; Yann LeCun; Thomas Scialom (2025). GAIA Dataset [Dataset]. https://paperswithcode.com/dataset/gaia
    Authors
    Grégoire Mialon; Clémentine Fourrier; Craig Swift; Thomas Wolf; Yann LeCun; Thomas Scialom
    Description

    We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. This notable performance disparity contrasts with the recent trend of LLMs outperforming humans on tasks requiring professional skills in e.g. law or chemistry. GAIA’s philosophy departs from the current trend in AI benchmarks suggesting to target tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system’s capability to exhibit similar robustness as the average human does on such questions. Using GAIA’s methodology, we devise 466 questions and their answers. We release our questions while retaining answers to 300 of them to power a leaderboard accessible at https://huggingface.co/gaia-benchmark.

  3. results_public

    • huggingface.co
    Updated Nov 23, 2023
    Cite
    GAIA (2023). results_public [Dataset]. https://huggingface.co/datasets/gaia-benchmark/results_public
    Dataset authored and provided by
    GAIA
    Description

    Dataset Card for "results_public"

    More Information needed

  4. submissions_public

    • huggingface.co
    Updated Feb 13, 2025
    Cite
    GAIA (2025). submissions_public [Dataset]. https://huggingface.co/datasets/gaia-benchmark/submissions_public
    Dataset authored and provided by
    GAIA
    Description

    Contains the scored submissions on the GAIA benchmark, validation set only. To avoid contamination of GAIA and leakage of the answers, please do not reshare the current dataset without any gating system or outside of the HF hub.

  5. GAIA-Subset-Benchmark

    • huggingface.co
    Updated Mar 28, 2025
    Cite
    II (2025). GAIA-Subset-Benchmark [Dataset]. https://huggingface.co/datasets/Intelligent-Internet/GAIA-Subset-Benchmark
    Dataset authored and provided by
    II
    Description

    GAIA Benchmark Subset Model Card

    This dataset is a subset of the GAIA benchmark, containing 44 web-search-based questions from the validation set. It evaluates multiple AI models on their ability to retrieve and process real-time information using web search and browser tools. Performance metrics include success indicators and detailed reports for each model. A comparative chart summarizing the results will be provided separately.

      Benchmark Results
    
  6. Gaia Inc. SWOT and Financial Analysis

    • quaintel.com
    Updated Jun 29, 2025
    Cite
    Quaintel Research Solutions (2025). Gaia Inc. SWOT and Financial Analysis [Dataset]. https://quaintel.com/public/store/report/gaia-inc-company-profile-swot-pestle-value-chain-analysis
    Dataset provided by
    Quaintel Research
    Authors
    Quaintel Research Solutions
    License

    https://quaintel.com/privacy-policy

    Area covered
    Global
    Description

    Gaia Inc. Company Profile, Opportunities, Challenges and Risk (SWOT, PESTLE and Value Chain); Corporate and ESG Strategies; Competitive Intelligence; Financial KPIs; Operational KPIs; Recent Trends: …

  7. Gaia RVS benchmark stars. II.

    • cdsarc.cds.unistra.fr
    Updated Mar 8, 2024
    Cite
    CDS (2024). Gaia RVS benchmark stars. II. [Dataset]. http://doi.org/10.26093/cds/vizier.36830072
    Dataset provided by
    CDS
    Description

    VizieR Online Data Catalog: Gaia RVS benchmark stars. II. (Caffau E.+, 2024)

  8. Gaia FGK benchmark stars v3

    • cdsarc.cds.unistra.fr
    Updated Feb 14, 2024
    Cite
    CDS (2024). Gaia FGK benchmark stars v3 [Dataset]. http://doi.org/10.26093/cds/vizier.36820145
    Dataset provided by
    CDS
    Description

    VizieR Online Data Catalog: Gaia FGK benchmark stars v3 (Soubiran C.+, 2024)

  9. Replication data for: Rationalizing the Penn World Table: True Multilateral...

    • datasearch.gesis.org
    • openicpsr.org
    Updated Dec 6, 2019
    Cite
    Neary, J. Peter (2019). Replication data for: Rationalizing the Penn World Table: True Multilateral Indices for International Comparisons of Real Income [Dataset]. http://doi.org/10.3886/E116031
    Dataset provided by
    da|ra (Registration agency for social science and economic data)
    Authors
    Neary, J. Peter
    Description

    Real incomes are routinely compared internationally using methods that "correct" for deviations from purchasing power parity. The most widely used of these is the Geary method which, though theoretically suspect, underlies the Penn World Table. This paper provides a theoretical foundation for the Geary method, which I call the GAIA ("Geary-Allen International Accounts") system. I show that the Geary method is exact when preferences are non-homothetic Leontief and, more generally, gives a (possibly poor) approximation to the GAIA benchmark. An empirical application suggests that both it and other widely used methods underestimate the degree of international inequality.
