Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Want to run inference now?
This dataset only contains the problem_statement… See the full description on the dataset page: https://huggingface.co/datasets/Luwayy/SWE-bench.
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Want to run inference now?
This dataset only contains the problem_statement… See the full description on the dataset page: https://huggingface.co/datasets/SWE-bench/SWE-bench.
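The unit-test verification described above can be sketched in a few lines. The field names `FAIL_TO_PASS` and `PASS_TO_PASS` match the published SWE-bench instance schema, but the result dictionary below is a hypothetical stand-in for the output of a real test harness, not part of the dataset itself.

```python
def is_resolved(test_results: dict, fail_to_pass: list, pass_to_pass: list) -> bool:
    """An instance counts as resolved when every FAIL_TO_PASS test now
    passes after applying the model's patch, and every PASS_TO_PASS test
    (which passed before the patch) still passes."""
    return (all(test_results.get(t, False) for t in fail_to_pass)
            and all(test_results.get(t, False) for t in pass_to_pass))


# Hypothetical harness output for one instance:
results = {"test_fix": True, "test_regression": True}
print(is_resolved(results, ["test_fix"], ["test_regression"]))  # True
print(is_resolved({"test_fix": False}, ["test_fix"], []))       # False
```

The reference ("gold") solution here is the post-PR behavior: the tests listed in `FAIL_TO_PASS` are exactly those that flip from failing to passing when the original pull request is applied.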
Dataset Card for "SWE-bench_bm25_40K"
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_bm25_40K, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_40K.
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Want to run inference now?
This dataset only contains the problem_statement… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench.
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 300 test Issue-Pull Request pairs from 11 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_Lite_oracle, includes a formatting of each instance using the "Oracle" retrieval… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite_oracle.
Dataset Card for "SWE-bench_bm25_27K"
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_bm25_27K, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_27K.
Dataset Card for "SWE-bench_bm25_13K"
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_bm25_13K, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_13K.
Dataset Card for "SWE-bench_oracle"
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_oracle, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_oracle.