MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
SWE-Gym contains 2438 instances sourced from 11 Python repos, following the SWE-Bench data collection procedure. Get started at the project page: github.com/SWE-Gym/SWE-Gym
swesynth/SWE-Gym-logs dataset hosted on Hugging Face and contributed by the HF Datasets community
SWE-Gym/OpenHands-Sampled-Trajectories dataset hosted on Hugging Face and contributed by the HF Datasets community
FundamentalResearchLabs/leader-training-swe-gym-rest dataset hosted on Hugging Face and contributed by the HF Datasets community
R2E-Gym/SWE-Bench-Verified dataset hosted on Hugging Face and contributed by the HF Datasets community
MultiturnRL/SWE-Gym-Small dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🔧 Selected SWE-Gym Subset
A curated subset of 100 program repair instances from the SWE-Gym dataset, selected for lightweight evaluation and rapid prototyping.
📦 Dataset Description
This dataset contains 100 program repair tasks selected from the full SWE-Gym benchmark. Each instance represents a realistic software bug scenario, including the following fields:
instance_id: Unique identifier
repo: GitHub repository
commit: Bug-inducing commit hash
test_setup: Test setup…
See the full description on the dataset page: https://huggingface.co/datasets/dcloud347/Selected_SWE-Gym.
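As a rough illustration of how such records might be handled, the sketch below models only the four fields visible in this listing (the full field list is truncated). The string types, the `RepairInstance` and `group_ids_by_repo` names, and all sample values are assumptions made for the example, not part of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepairInstance:
    """One record, limited to the four fields visible in the listing."""
    instance_id: str  # unique identifier
    repo: str         # GitHub repository
    commit: str       # bug-inducing commit hash
    test_setup: str   # test setup (the listing truncates this description)


def group_ids_by_repo(instances):
    """Map each repository to the instance IDs drawn from it."""
    grouped = {}
    for inst in instances:
        grouped.setdefault(inst.repo, []).append(inst.instance_id)
    return grouped


# Hypothetical sample values, not taken from the dataset.
sample = [
    RepairInstance("example__proj-1", "example/proj", "abc123", "pip install -e ."),
    RepairInstance("example__proj-2", "example/proj", "def456", "pip install -e ."),
    RepairInstance("other__lib-7", "other/lib", "0a1b2c", "pip install -e ."),
]
by_repo = group_ids_by_repo(sample)
```

Grouping by `repo` is one simple way to check how the 100 instances are spread across repositories before using the subset for evaluation.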
ryanhoangt/threshold-calib-sonnet-4-swe-gym-lite-13k dataset hosted on Hugging Face and contributed by the HF Datasets community
ASSERT-KTH/Nano-SFT-SWE-Gym-gemini-2.5-flash dataset hosted on Hugging Face and contributed by the HF Datasets community
rasdani/SWE-Bench-Verified-R2E-Gym-100 dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Overview
RL dataset for training SWE-Swiss models on the repair task. The prompts are based on issues from SWE-Gym and SWE-smith. To create a challenging task, the code content in each prompt consists of two components: "oracle" files, which are the ground-truth files requiring a patch, and "distractor" files, which are plausible but incorrect files predicted by an LLM.
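The oracle-plus-distractor prompt layout described above could be sketched roughly as follows. This is not the SWE-Swiss authors' code: the function name `build_repair_prompt`, the prompt template, the seeded shuffle, and the sample issue and files are all hypothetical, chosen only to show how the two kinds of files might be mixed.

```python
import random


def build_repair_prompt(issue, oracle_files, distractor_files, seed=0):
    """Combine an issue with oracle and distractor files into one prompt.

    Both file arguments map a path to its source text. Shuffling is an
    assumption made here so that file order does not reveal the oracle files.
    """
    files = list(oracle_files.items()) + list(distractor_files.items())
    random.Random(seed).shuffle(files)
    sections = [f"### {path}\n{source}" for path, source in files]
    return f"Issue:\n{issue}\n\nFiles:\n" + "\n\n".join(sections)


# Hypothetical issue and file contents, for illustration only.
prompt = build_repair_prompt(
    issue="divide() crashes when the divisor is zero",
    oracle_files={"calc/ops.py": "def divide(a, b):\n    return a / b\n"},
    distractor_files={"calc/cli.py": "def main():\n    pass\n"},
)
```

The model then has to decide which of the presented files actually needs the patch, which is what makes the task harder than repair with oracle files alone.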
Citation
@misc{SWESwiss2025, title = {SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for…
See the full description on the dataset page: https://huggingface.co/datasets/SWE-Swiss/SWESwiss-Repair-RL-SWEGym-SWESmith-12K.
AxT-dev/swe-agent-lm-32b-r2e-gym-trajectories dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Overview
SFT dataset for training SWE-Swiss models on the repair task. The prompts are based on issues from SWE-Gym and SWE-smith. To create a challenging task, the code content in each prompt consists of two components: 'oracle' files, which are the ground-truth files requiring a patch, and 'distractor' files, which are plausible but incorrect files predicted by an LLM. The responses are generated by DeepSeek-R1-0528, and we filter out any data where the generated patch cannot pass… See the full description on the dataset page: https://huggingface.co/datasets/SWE-Swiss/SWESwiss-SFT-Repair-4K.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Overview
SFT dataset for training SWE-Swiss models on the localization task. Prompts are constructed from a subset of issues in SWE-Gym-Raw and the SWE-bench training set. To prevent data leakage, we've filtered out any repositories that also appear in the SWE-bench test set. The responses are generated by DeepSeek-R1-0528. An instance is included in the final dataset only if the model's prediction meets two conditions: the number of predicted files is five or fewer, and the recall… See the full description on the dataset page: https://huggingface.co/datasets/SWE-Swiss/SWESwiss-SFT-Localization-5K.
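A minimal sketch of the two-condition filter follows. The limit of five files comes from the description above; the recall condition is cut off in this listing, so `min_recall` is left as a required, caller-supplied parameter, and treating the condition as "recall at or above a threshold" is an assumption. The function names are hypothetical.

```python
def file_recall(predicted, ground_truth):
    """Fraction of ground-truth files that appear among the predictions."""
    truth = set(ground_truth)
    if not truth:
        return 0.0
    return len(truth & set(predicted)) / len(truth)


def keep_localization(predicted, ground_truth, min_recall, max_files=5):
    """Keep an instance if it predicts few enough files with enough recall.

    max_files=5 follows the dataset description. min_recall has no default
    because the description is truncated before stating the recall condition.
    """
    few_enough = len(set(predicted)) <= max_files
    return few_enough and file_recall(predicted, ground_truth) >= min_recall
```

For example, `keep_localization(["a.py"], ["a.py", "b.py"], min_recall=1.0)` rejects the instance because only one of the two ground-truth files was predicted.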
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Overview
SFT dataset for training SWE-Swiss models on the unit test generation task. The prompts contain issues sourced from SWE-Gym and SWE-smith, while the responses are generated by DeepSeek-R1-0528. To ensure quality, we filter out data where the generated unit tests do not perform as expected. A generated test is kept only if its execution results correctly distinguish between a set of correct and incorrect patches, mirroring the behavior of the repository's own test suite.… See the full description on the dataset page: https://huggingface.co/datasets/SWE-Swiss/SWESwiss-SFT-Unittest-1K.
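The keep-or-drop rule for generated tests might look something like the sketch below. It assumes that "correctly distinguish" means the test passes on every correct patch and fails on every incorrect one; the function name `is_discriminative` and the shape of the `results` mapping are hypothetical, not taken from the SWE-Swiss pipeline.

```python
def is_discriminative(results, correct_patches, incorrect_patches):
    """Decide whether a generated test separates correct from incorrect patches.

    results maps a patch ID to True if the generated test passed with that
    patch applied. A test is kept only if it passes on all correct patches
    and fails on all incorrect ones (an assumed reading of "distinguish").
    """
    if not correct_patches or not incorrect_patches:
        return False  # nothing to separate, so the test proves nothing
    passes_correct = all(results[p] for p in correct_patches)
    fails_incorrect = all(not results[p] for p in incorrect_patches)
    return passes_correct and fails_incorrect
```

A test that passes regardless of the patch applied is rejected by this rule, since it carries no signal about whether a patch is right.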
SWE-rebench-R2E (Filtered Dataset)
Dataset Description
This is a filtered version of the nebius/SWE-rebench dataset. The filtering process removes instances whose repositories overlap with other established SWE-bench datasets, to ensure uniqueness and reduce data contamination. It can therefore be used directly as training data alongside SWE-smith/R2E-Gym-Subset, with evaluation on SWE-bench_Verified/Lite.
Filtering Criteria
The dataset was filtered using the following… See the full description on the dataset page: https://huggingface.co/datasets/hubert233/SWE-rebench-filtered.
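Since the exact criteria are truncated here, the following is only a sketch of the general idea of repository-level filtering described above. The `repo` key, the case-insensitive comparison, the function name, and the sample repositories are assumptions for illustration.

```python
def drop_overlapping_repos(instances, excluded_repos):
    """Remove instances whose repository appears in another dataset.

    instances is a list of dicts with a "repo" key (assumed field name).
    Repository names are compared case-insensitively, which is an assumption.
    """
    excluded = {name.lower() for name in excluded_repos}
    return [inst for inst in instances if inst["repo"].lower() not in excluded]


# Hypothetical records and exclusion list.
records = [
    {"instance_id": "a-1", "repo": "Example/Proj"},
    {"instance_id": "b-2", "repo": "other/lib"},
]
kept = drop_overlapping_repos(records, excluded_repos=["example/proj"])
```

Filtering at the repository level, rather than by instance ID, also removes instances that are distinct issues but share a codebase with the evaluation set.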