Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Want to run inference now?
This dataset only contains the problem_statement… See the full description on the dataset page: https://huggingface.co/datasets/Luwayy/SWE-bench.
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Want to run inference now?
This dataset only contains the problem_statement… See the full description on the dataset page: https://huggingface.co/datasets/SWE-bench/SWE-bench.
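The unit-test verification described above can be sketched in a few lines. The field names `FAIL_TO_PASS` and `PASS_TO_PASS` match the published SWE-bench instance schema, but the result dictionary below is a hypothetical stand-in for the output of a real test harness, not part of the dataset itself.

```python
def is_resolved(test_results: dict, fail_to_pass: list, pass_to_pass: list) -> bool:
    """An instance counts as resolved when every FAIL_TO_PASS test now
    passes after applying the model's patch, and every PASS_TO_PASS test
    (which passed before the patch) still passes."""
    return (all(test_results.get(t, False) for t in fail_to_pass)
            and all(test_results.get(t, False) for t in pass_to_pass))


# Hypothetical harness output for one instance:
results = {"test_fix": True, "test_regression": True}
print(is_resolved(results, ["test_fix"], ["test_regression"]))  # True
print(is_resolved({"test_fix": False}, ["test_fix"], []))       # False
```

The reference ("gold") solution here is the post-PR behavior: the tests listed in `FAIL_TO_PASS` are exactly those that flip from failing to passing when the original pull request is applied.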
Dataset Card for "SWE-bench_bm25_40K"
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_bm25_40K, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_40K.
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Want to run inference now?
This dataset only contains the problem_statement… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench.
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 300 test Issue-Pull Request pairs from 11 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_Lite_oracle, includes a formatting of each instance using the "Oracle" retrieval… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite_oracle.
Dataset Card for "SWE-bench_bm25_27K"
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_bm25_27K, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_27K.
Dataset Card for "SWE-bench_bm25_13K"
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_bm25_13K, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_13K.
Dataset Card for "SWE-bench_oracle"
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_oracle, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_oracle.