2 datasets found
  1. github-code

    • huggingface.co
    Cite
    CodeParrot, github-code [Dataset]. https://huggingface.co/datasets/codeparrot/github-code
    Dataset provided by
    Good Engineering, Inc
    Authors
    CodeParrot
    License

    https://choosealicense.com/licenses/other/

    Description

    The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling 1 TB of text data. The dataset was created from the GitHub dataset on BigQuery.

  2. Patho-Bench

    • huggingface.co
    Updated Feb 12, 2025
    Cite
    AI for Pathology Image Analysis Lab @ HMS / BWH (2025). Patho-Bench [Dataset]. https://huggingface.co/datasets/MahmoodLab/Patho-Bench
    Dataset updated
    Feb 12, 2025
    Dataset authored and provided by
    AI for Pathology Image Analysis Lab @ HMS / BWH
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    ♆ Patho-Bench

    📄 Preprint | Code

    Patho-Bench is designed to evaluate patch and slide encoder foundation models for whole-slide images (WSIs). This Hugging Face repository contains the data splits for the public Patho-Bench tasks. Please visit our GitHub repository for the full codebase and benchmark implementation. This project was developed by the Mahmood Lab at Harvard Medical School and Brigham and Women's Hospital. This work was funded by NIH NIGMS R35GM138216.

    [!NOTE]… See the full description on the dataset page: https://huggingface.co/datasets/MahmoodLab/Patho-Bench.



github-code (codeparrot/github-code)

55 scholarly articles cite this dataset (View in Google Scholar)
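As a rough sanity check on scale, the github-code figures quoted in its description (115M files, about 1 TB of text) imply a mean file size of a few kilobytes. The sketch below is only a back-of-envelope calculation. It assumes "1TB" means a decimal terabyte (10^12 bytes), which the description does not specify, and the helper name `average_file_size` is hypothetical.

```python
# Back-of-envelope estimate from the figures quoted in the github-code description.
# Assumption: "1TB" is read as a decimal terabyte (10**12 bytes); the description
# does not say whether it means TB or TiB.
TOTAL_BYTES = 10**12      # ~1 TB of text data
NUM_FILES = 115_000_000   # 115M code files


def average_file_size(total_bytes: int, num_files: int) -> float:
    """Hypothetical helper: return the mean size per file in bytes."""
    return total_bytes / num_files


avg = average_file_size(TOTAL_BYTES, NUM_FILES)
print(f"~{avg:,.0f} bytes (~{avg / 1000:.1f} kB) per file on average")
```

The result is a mean only; it says nothing about how file sizes are distributed across the 32 languages.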
