2 datasets found

github-code
huggingface.co
Cite
CodeParrot, github-code [Dataset]. https://huggingface.co/datasets/codeparrot/github-code
Dataset provided by
Good Engineering, Inc
Authors
CodeParrot
License
https://choosealicense.com/licenses/other/
Description
The GitHub Code dataset consists of 115M code files from GitHub in 32 programming languages with 60 extensions, totaling 1 TB of text data. The dataset was created from the GitHub dataset on BigQuery.
Patho-Bench
huggingface.co
Updated Feb 12, 2025
Cite
AI for Pathology Image Analysis Lab @ HMS / BWH (2025). Patho-Bench [Dataset]. https://huggingface.co/datasets/MahmoodLab/Patho-Bench
Dataset updated
Feb 12, 2025
Dataset authored and provided by
AI for Pathology Image Analysis Lab @ HMS / BWH
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
♆ Patho-Bench

📄 Preprint | Code

Patho-Bench is designed to evaluate patch and slide encoder foundation models for whole-slide images (WSIs). This HuggingFace repository contains the data splits for the public Patho-Bench tasks. Please visit our GitHub repository for the full codebase and benchmark implementation. This project was developed by the Mahmood Lab at Harvard Medical School and Brigham and Women's Hospital. This work was funded by NIH NIGMS R35GM138216.

[!NOTE]… See the full description on the dataset page: https://huggingface.co/datasets/MahmoodLab/Patho-Bench.
github-code

codeparrot/github-code

55 scholarly articles cite this dataset.
