8 datasets found
  1. SWE-bench

    • huggingface.co
    Cite
    Muhammad Luay, SWE-bench [Dataset]. https://huggingface.co/datasets/Luwayy/SWE-bench
    Authors
    Muhammad Luay
    Description

    Dataset Summary

    SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

      Want to run inference now?
    

    This dataset only contains the problem_statement… See the full description on the dataset page: https://huggingface.co/datasets/Luwayy/SWE-bench.

  2. SWE-bench

    • huggingface.co
    Updated Apr 29, 2025
    Cite
    SWE-bench (2025). SWE-bench [Dataset]. https://huggingface.co/datasets/SWE-bench/SWE-bench
    Dataset updated
    Apr 29, 2025
    Dataset authored and provided by
    SWE-bench
    Description

    Dataset Summary

    SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

      Want to run inference now?
    

    This dataset only contains the problem_statement… See the full description on the dataset page: https://huggingface.co/datasets/SWE-bench/SWE-bench.
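The summaries above state that evaluation is performed by unit-test verification, using post-PR behavior as the reference solution. As an illustrative sketch (not the official SWE-bench harness), the decision rule can be expressed in a few lines: an instance counts as resolved only if every test the fix was supposed to repair now passes, and no previously passing test regresses. The `FAIL_TO_PASS` / `PASS_TO_PASS` field names follow the published SWE-bench schema; the `outcomes` dict is a hypothetical stand-in for a real test runner.

```python
# Illustrative sketch of SWE-bench-style resolution checking.
# `outcomes` maps test identifiers to pass/fail (True = passed); a real
# harness would populate it by running the repository's test suite.

def is_resolved(instance: dict, outcomes: dict) -> bool:
    """An instance is resolved iff all FAIL_TO_PASS tests now pass
    and all PASS_TO_PASS tests still pass."""
    fixed = all(outcomes.get(t, False) for t in instance["FAIL_TO_PASS"])
    no_regressions = all(outcomes.get(t, False) for t in instance["PASS_TO_PASS"])
    return fixed and no_regressions

# Hypothetical instance with one repro test and one guard test.
instance = {
    "FAIL_TO_PASS": ["test_issue_repro"],
    "PASS_TO_PASS": ["test_existing_behavior"],
}

print(is_resolved(instance, {"test_issue_repro": True,
                             "test_existing_behavior": True}))   # resolved
print(is_resolved(instance, {"test_issue_repro": True,
                             "test_existing_behavior": False}))  # regression
```

A missing outcome defaults to a failure, which mirrors the conservative convention that an unreported test cannot count toward resolution.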

  3. SWE-bench_bm25_40K

    • huggingface.co
    Cite
    Princeton NLP group, SWE-bench_bm25_40K [Dataset]. https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_40K
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Princeton NLP group
    Description

    Dataset Card for "SWE-bench_bm25_40K"

      Dataset Summary
    

    SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_bm25_40K, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_40K.

  4. SWE-bench

    • huggingface.co
    • opendatalab.com
    Cite
    Princeton NLP group, SWE-bench [Dataset]. https://huggingface.co/datasets/princeton-nlp/SWE-bench
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Princeton NLP group
    Description

    Dataset Summary

    SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

      Want to run inference now?
    

    This dataset only contains the problem_statement… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench.

  5. SWE-bench_Lite_oracle

    • huggingface.co
    Cite
    Princeton NLP group, SWE-bench_Lite_oracle [Dataset]. https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite_oracle
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Princeton NLP group
    Description

    Dataset Summary

    SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 300 test Issue-Pull Request pairs from 11 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_Lite_oracle, includes a formatting of each instance using the "Oracle" retrieval… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_Lite_oracle.

  6. SWE-bench_bm25_27K

    • huggingface.co
    Cite
    Princeton NLP group, SWE-bench_bm25_27K [Dataset]. https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_27K
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Princeton NLP group
    Description

    Dataset Card for "SWE-bench_bm25_27K"

      Dataset Summary
    

    SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_bm25_27K, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_27K.

  7. SWE-bench_bm25_13K

    • huggingface.co
    Cite
    Princeton NLP group, SWE-bench_bm25_13K [Dataset]. https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_13K
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Princeton NLP group
    Description

    Dataset Card for "SWE-bench_bm25_13K"

      Dataset Summary
    

    SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_bm25_13K, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_bm25_13K.

  8. SWE-bench_oracle

    • huggingface.co
    Cite
    Princeton NLP group, SWE-bench_oracle [Dataset]. https://huggingface.co/datasets/princeton-nlp/SWE-bench_oracle
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Authors
    Princeton NLP group
    Description

    Dataset Card for "SWE-bench_oracle"

      Dataset Summary
    

    SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification using post-PR behavior as the reference solution. The dataset was released as part of SWE-bench: Can Language Models Resolve Real-World GitHub Issues? This dataset, SWE-bench_oracle, includes a formatting of… See the full description on the dataset page: https://huggingface.co/datasets/princeton-nlp/SWE-bench_oracle.
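The `_oracle` and `_bm25_*` entries above are retrieval-formatted variants: each packages an instance's issue together with retrieved (or gold, "Oracle") file contents into a single model input. As a hypothetical sketch only, the assembly might look like the following; the delimiter strings and function name are illustrative assumptions, not the exact template used by the princeton-nlp datasets.

```python
# Hypothetical sketch of assembling one retrieval-formatted input:
# the issue text plus the contents of retrieved files, concatenated
# into a single prompt string. Delimiters are illustrative.

def format_instance(problem_statement: str, retrieved_files: dict) -> str:
    """Join an issue with retrieved source files into one prompt string."""
    parts = ["<issue>", problem_statement.strip(), "</issue>"]
    for path, contents in retrieved_files.items():
        parts += [f"[start of {path}]", contents.rstrip(), f"[end of {path}]"]
    return "\n".join(parts)

# Hypothetical issue and retrieved file, for illustration only.
prompt = format_instance(
    "TypeError raised when calling frobnicate() with no arguments.",
    {"src/frobnicate.py": "def frobnicate(x):\n    return x * 2\n"},
)
print(prompt)
```

The token-count suffixes in the dataset names (13K, 27K, 40K) reflect how much retrieved file content fits in the formatted input; a larger budget admits more files per instance.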
