HellaSwag is a challenge dataset for evaluating commonsense NLI that is especially hard for state-of-the-art models, though its questions are trivial for humans (>95% accuracy).
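A minimal sketch of loading the dataset with the Hugging Face `datasets` library; the hub ID "Rowan/hellaswag" and the field names shown are assumptions based on the commonly published copy, not confirmed by this card.

```python
from datasets import load_dataset

# Load the validation split (hub ID assumed)
hellaswag = load_dataset("Rowan/hellaswag", split="validation")

ex = hellaswag[0]
print(ex["ctx"])      # context to be completed (assumed field name)
print(ex["endings"])  # candidate endings; "label" marks the correct one (assumed)
```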
MIT License: https://opensource.org/licenses/MIT
Dataset Summary
Existing LLM-based tools and coding agents respond to every issue and generate a patch for every case, even when the input is vague or their own output is incorrect; they have no mechanism to abstain when confidence is low. BouncerBench tests whether AI agents know when not to act.
This is one of three datasets released as part of the paper "Is Your Automated Software Engineer Trustworthy?".
input_bouncer: tasks on bug-report text. The model decides if a report is… See the full description on the dataset page: https://huggingface.co/datasets/uw-swag/bouncerbench-lite.
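A minimal sketch of loading this dataset with the `datasets` library, assuming "input_bouncer" (taken from the task description above) is a valid config name; the field access is illustrative only.

```python
from datasets import load_dataset

# Hub ID from the card above; the config name is an assumption
bouncer = load_dataset("uw-swag/bouncerbench-lite", "input_bouncer")
print(bouncer)  # inspect available splits and columns before use
```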
License: unknown (https://choosealicense.com/licenses/unknown/)
Dataset Card for Situations With Adversarial Generations
Dataset Summary
Given a partial description like "she opened the hood of the car," humans can reason about the situation and anticipate what might come next ("then, she examined the engine"). SWAG (Situations With Adversarial Generations) is a large-scale dataset for this task of grounded commonsense inference, unifying natural language inference and physically grounded reasoning. The dataset consists of 113k… See the full description on the dataset page: https://huggingface.co/datasets/allenai/swag.
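A minimal sketch of loading SWAG with the `datasets` library; the "regular" config name and the field names (startphrase, ending0–ending3, label) are assumptions based on the commonly published hub copy.

```python
from datasets import load_dataset

# Load the validation split of the assumed "regular" config
swag = load_dataset("allenai/swag", "regular", split="validation")

ex = swag[0]
print(ex["startphrase"])                      # partial description of a situation
print([ex[f"ending{i}"] for i in range(4)])   # four candidate continuations
print(ex["label"])                            # index of the gold ending (assumed)
```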