3 datasets found
  1. HellaSwag Dataset

    • paperswithcode.com
    • tensorflow.org
    • +2 more
    Updated Jan 5, 2025
    Cite
    Rowan Zellers; Ari Holtzman; Yonatan Bisk; Ali Farhadi; Yejin Choi (2025). HellaSwag Dataset [Dataset]. https://paperswithcode.com/dataset/hellaswag
    Authors
    Rowan Zellers; Ari Holtzman; Yonatan Bisk; Ali Farhadi; Yejin Choi
    Description

    HellaSwag is a challenge dataset for evaluating commonsense natural language inference (NLI). It is especially hard for state-of-the-art models, though its questions are trivial for humans (>95% accuracy).

  2. bouncerbench-lite

    • huggingface.co
    Updated Jun 26, 2025
    + more versions
    Cite
    University of Waterloo SWAG (2025). bouncerbench-lite [Dataset]. https://huggingface.co/datasets/uw-swag/bouncerbench-lite
    Dataset authored and provided by
    University of Waterloo SWAG
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Dataset Summary

    Existing LLM-based tools and coding agents respond to every issue and generate a patch for every case, even when the input is vague or their own output is incorrect. There are no mechanisms in place to abstain when confidence is low. BouncerBench checks if AI agents know when not to act.
    This is one of 3 datasets released as part of the paper "Is Your Automated Software Engineer Trustworthy?"

    input_bouncer: Tasks on bug-report text. The model decides if a report is… See the full description on the dataset page: https://huggingface.co/datasets/uw-swag/bouncerbench-lite.

  3. swag

    • huggingface.co
    • opendatalab.com
    • +1 more
    Updated May 24, 2024
    Cite
    Ai2 (2024). swag [Dataset]. https://huggingface.co/datasets/allenai/swag
    Dataset provided by
    Allen Institute for AI (http://allenai.org/)
    Authors
    Ai2
    License

    https://choosealicense.com/licenses/unknown/

    Description

    Dataset Card for Situations With Adversarial Generations

    Dataset Summary

    Given a partial description like "she opened the hood of the car," humans can reason about the situation and anticipate what might come next ("then, she examined the engine"). SWAG (Situations With Adversarial Generations) is a large-scale dataset for this task of grounded commonsense inference, unifying natural language inference and physically grounded reasoning. The dataset consists of 113k… See the full description on the dataset page: https://huggingface.co/datasets/allenai/swag.

