Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Want to run inference now?
This dataset only contains the problem_statement… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench.
Dataset Summary
SWE-bench Verified is a subset of 500 samples from the SWE-bench test set, which have been human-validated for quality. SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. See this post for more details on the human-validation process. The dataset collects 500 test Issue-Pull Request pairs from popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The original… See the full description on the dataset page: https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
See details from OpenAI: https://openai.com/index/introducing-swe-bench-verified/
Converted from Parquet to CSV; source: https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified
Dataset Summary from Hugging Face:
SWE-bench Verified is a subset of 500 samples from the SWE-bench test set, which have been human-validated for quality. SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. See this post for more details on the human-validation process.
The dataset collects 500 test Issue-Pull Request pairs from popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution.
The original SWE-bench dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
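To make the unit-test verification criterion concrete, here is a minimal Python sketch of the resolution check, with hypothetical helper names (the official harness is the swebench package): each instance lists FAIL_TO_PASS tests, which fail before the fix and must pass after it, and PASS_TO_PASS tests, which must not regress.

import subprocess

def run_tests(test_ids):
    # Run each test with pytest inside the prepared repo checkout
    # (after applying the model's patch and the instance's test patch).
    return {t: subprocess.run(["python", "-m", "pytest", t],
                              capture_output=True).returncode == 0
            for t in test_ids}

def is_resolved(fail_to_pass, pass_to_pass):
    # Resolved only if every previously failing test now passes
    # and no previously passing test regressed.
    return (all(run_tests(fail_to_pass).values())
            and all(run_tests(pass_to_pass).values()))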
Want to run inference now? This dataset only contains the problem_statement (i.e. issue text) and the base_commit which represents the state of the codebase before the issue has been resolved. If you want to run inference using the "Oracle" or BM25 retrieval settings mentioned in the paper, consider the following datasets.
princeton-nlp/SWE-bench_Lite_oracle
princeton-nlp/SWE-bench_Lite_bm25_13K
princeton-nlp/SWE-bench_Lite_bm25_27K
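Any of these variants can be loaded with the Hugging Face datasets library; a minimal sketch (assuming the standard test split):

from datasets import load_dataset

# Oracle-retrieval variant of SWE-bench Lite; each row is one task instance.
ds = load_dataset("princeton-nlp/SWE-bench_Lite_oracle", split="test")
print(len(ds), ds[0]["instance_id"])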
Supported Tasks and Leaderboards
SWE-bench proposes a new task: issue resolution, given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com.
Languages
The text of the dataset is primarily English, but we make no effort to filter or otherwise clean based on language type.
Dataset Structure
An example of a SWE-bench datum is as follows:
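The example itself did not survive this extract; as a sketch, the fields of an instance look roughly like the following Python dict (all values are hypothetical placeholders; field names follow the published schema):

{
  "repo": "owner/project",                # hypothetical source repository
  "instance_id": "owner__project-1234",   # unique task identifier
  "base_commit": "<SHA of the repo state before the fix>",
  "problem_statement": "<the GitHub issue text>",
  "hints_text": "<comments from the issue thread>",
  "created_at": "2023-01-01T00:00:00Z",
  "patch": "<gold diff from the reference pull request>",
  "test_patch": "<diff that adds the pull request's tests>",
  "version": "<repo version used to build the install environment>",
  "FAIL_TO_PASS": "<tests that fail before the fix and must pass after>",
  "PASS_TO_PASS": "<tests that must keep passing>",
  "environment_setup_commit": "<commit used to set up the evaluation environment>",
}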
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
SWE-bench/SWE-bench_Multilingual dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Summary
SWE-bench Lite is a subset of SWE-bench, a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 300 test Issue-Pull Request pairs from 11 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Want to run inference now?
This dataset only contains the… See the full description on the dataset page: https://huggingface.co/datasets/SWE-bench/SWE-bench_Lite.
Other license: https://choosealicense.com/licenses/other/
👋 Overview
This repository contains the Multi-SWE-bench dataset, introduced in Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving, to address the lack of multilingual benchmarks for evaluating LLMs in real-world code issue resolution. Unlike existing Python-centric benchmarks (e.g., SWE-bench), this framework spans 7 languages (Java, TypeScript, JavaScript, Go, Rust, C, and C++) with 1,632 high-quality instances, curated from 2,456 candidates by 68 expert annotators… See the full description on the dataset page: https://huggingface.co/datasets/ByteDance-Seed/Multi-SWE-bench.
SWE-bench Multimodal
SWE-bench Multimodal is a dataset of 617 task instances that evaluates Language Models and AI Systems on their ability to resolve real-world GitHub issues. To learn more about the dataset, please visit our website. You can find the leaderboard at SWE-bench's home page.
Dataset Summary
SWE-Bench Pro is a challenging, enterprise-level dataset for testing agent ability on long-horizon software engineering tasks.
Paper: https://static.scale.com/uploads/654197dc94d34f66c0f5184e/SWEAP_Eval_Scale%20(9).pdf
See the related evaluation repository on GitHub: https://github.com/scaleapi/SWE-bench_Pro-os
Dataset Structure
We follow SWE-Bench Verified (https://huggingface.co/datasets/SWE-bench/SWE-bench_Verified) in terms of dataset structure, with several… See the full description on the dataset page: https://huggingface.co/datasets/ScaleAI/SWE-bench_Pro.
R2E-Gym/SWE-Bench-Verified dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: This dataset has an improved and significantly larger successor: SWE-rebench.
Dataset Summary
SWE-bench Extra is a dataset that can be used to train or evaluate agentic systems specializing in resolving GitHub issues. It is based on the methodology used to build the SWE-bench benchmark and includes 6,415 Issue-Pull Request pairs sourced from 1,988 Python repositories.
Dataset Description
The SWE-bench Extra dataset supports the development of software engineering agents… See the full description on the dataset page: https://huggingface.co/datasets/nebius/SWE-bench-extra.
SWE-bench/SWE-bench_Not_Verified dataset hosted on Hugging Face and contributed by the HF Datasets community
yilche/SWE-bench-verified-scikit-learn dataset hosted on Hugging Face and contributed by the HF Datasets community
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution.
Supported Tasks and Leaderboards
SWE-bench proposes a new task: issue resolution, given a full repository and a GitHub issue. The leaderboard can be found at www.swebench.com.
Languages… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_oracle_cl100k.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
OpenHands/Devin-SWE-bench-output dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A brand-new, continuously updated SWE-bench-like dataset powered by an automated curation pipeline.
For the official data release page, please see microsoft/SWE-bench-Live.
Dataset Summary
SWE-bench-Live is a live benchmark for issue resolving, designed to evaluate an AI system’s ability to complete real-world software engineering tasks. Thanks to our automated dataset curation pipeline, we plan to update SWE-bench-Live on a monthly basis to provide the… See the full description on the dataset page: https://huggingface.co/datasets/SWE-bench-Live/SWE-bench-Live.
nejumi/swe-bench-verified-ja dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
SWE-Bench Dataset - 8,712 files
The dataset comprises 8,712 files across 6 programming languages, featuring verified tasks and benchmarks for evaluating coding agents and language models. It supports coding agents, language models, and developer tools with verified benchmark scores and multi-language test sets.
Dataset characteristics:
Description: An extended benchmark of real-world software engineering tasks with enhanced… See the full description on the dataset page: https://huggingface.co/datasets/ud-nlp/swe-bench-coding-tasks.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
SWE-smith Trajectories
This dataset contains the 5,017 trajectories we fine-tuned Qwen 2.5 Coder Instruct on, leading to SWE-agent-LM-32B, a coding LM agent that achieves 40.2% on SWE-bench Verified (no verifiers or multiple rollouts, just one attempt per instance). Trajectories were generated by running SWE-agent + Claude 3.7 Sonnet on task instances from the SWE-smith dataset.
ibragim-bad/swe-bench-verified-50 dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
klieret/swe-bench-dummy-test-dataset dataset hosted on Hugging Face and contributed by the HF Datasets community