3 datasets found
  1. Data from: GAIA-modified

    • huggingface.co
    Updated Jun 25, 2025
    Cite
    Yue Tan (2025). GAIA-modified [Dataset]. https://huggingface.co/datasets/evatan/GAIA-modified
    Authors
    Yue Tan
    Description

    GAIA dataset

    GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

      Data and leaderboard
    

    GAIA consists of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/evatan/GAIA-modified.

  2. GAIA

    • huggingface.co
    Updated Jun 25, 2025
    Cite
    Yue Tan (2025). GAIA [Dataset]. https://huggingface.co/datasets/evatan/GAIA
    Authors
    Yue Tan
    Description

    GAIA dataset

    GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

      Data and leaderboard
    

    GAIA consists of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/evatan/GAIA.

  3. GAIA

    • huggingface.co
    Updated Nov 23, 2023
    Cite
    GAIA (2023). GAIA [Dataset]. https://huggingface.co/datasets/gaia-benchmark/GAIA
    Dataset authored and provided by
    GAIA
    Description

    GAIA dataset

    GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

      Data and leaderboard
    

    GAIA consists of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/gaia-benchmark/GAIA.


Cite
Yue Tan (2025). GAIA-modified [Dataset]. https://huggingface.co/datasets/evatan/GAIA-modified

Data from: GAIA-modified

evatan/GAIA-modified

General AI Assistants Benchmark

Dataset updated
Jun 25, 2025
Authors
Yue Tan
Description

GAIA dataset

GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

  Data and leaderboard

GAIA consists of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It… See the full description on the dataset page: https://huggingface.co/datasets/evatan/GAIA-modified.
