4 datasets found

NCT-CRC-HE-100K Dataset
paperswithcode.com
Updated Sep 30, 2021
Cite
(2021). NCT-CRC-HE-100K Dataset [Dataset]. https://paperswithcode.com/dataset/nct-crc-he-100k
Description
The NCT-CRC-HE-100K dataset is a set of 100,000 non-overlapping image patches extracted from 86 H&E-stained human cancer tissue slides and normal tissue from the NCT biobank (National Center for Tumor Diseases) and the UMM pathology archive (University Medical Center Mannheim). The companion dataset Colorectal Cancer-Validation-Histology-7K (CRC-VAL-HE-7K) consists of 7,180 images extracted from 50 patients with colorectal adenocarcinoma, none of whom overlap with the patients in the NCT-CRC-HE-100K dataset. The data was created by pathologists who manually delineated tissue regions in whole slide images into the following nine tissue classes: adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), colorectal adenocarcinoma epithelium (TUM).
100,000 histological images of human colorectal cancer and healthy tissue
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
+ more versions
Cite
Jakob Nikolas Kather; Niels Halama; Alexander Marx (2020). 100,000 histological images of human colorectal cancer and healthy tissue [Dataset]. http://doi.org/10.5281/zenodo.1214456
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.1214456
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jakob Nikolas Kather; Niels Halama; Alexander Marx
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Data Description "NCT-CRC-HE-100K"

This is a set of 100,000 non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.

All images are 224x224 pixels (px) at 0.5 microns per pixel (MPP). All images are color-normalized using Macenko's method (http://ieeexplore.ieee.org/abstract/document/5193250/, DOI 10.1109/ISBI.2009.5193250).
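As a quick sanity check on scale, the physical field of view of one patch follows directly from the pixel size and resolution stated above. The sketch below is illustrative only; the function name is hypothetical and not part of the dataset.

```python
def patch_field_of_view_um(side_px: int = 224, mpp: float = 0.5) -> float:
    """Return the physical side length of a square patch in microns.

    Defaults are the values stated in the dataset description:
    224 px per side at 0.5 microns per pixel (MPP).
    """
    return side_px * mpp


side_um = patch_field_of_view_um()
# Convert microns to millimetres before squaring to get the area in mm^2.
area_mm2 = (side_um / 1000.0) ** 2
print(f"{side_um} um per side, {area_mm2:.6f} mm^2 per patch")
```

Each patch therefore covers a square of tissue roughly 112 microns on a side.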

Tissue classes are: Adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), colorectal adenocarcinoma epithelium (TUM).
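For downstream code it can help to keep the nine abbreviations and their meanings in one place. The following is a minimal sketch; the dictionary names and the alphabetical index order are assumptions made here for illustration, since the description does not prescribe any integer label encoding.

```python
# Abbreviation -> tissue class name, as listed in the dataset description.
TISSUE_CLASSES = {
    "ADI": "adipose",
    "BACK": "background",
    "DEB": "debris",
    "LYM": "lymphocytes",
    "MUC": "mucus",
    "MUS": "smooth muscle",
    "NORM": "normal colon mucosa",
    "STR": "cancer-associated stroma",
    "TUM": "colorectal adenocarcinoma epithelium",
}

# Assumption: integer labels are assigned in alphabetical order of the
# abbreviation. Any fixed, documented order would work equally well.
CLASS_TO_INDEX = {abbr: i for i, abbr in enumerate(sorted(TISSUE_CLASSES))}
INDEX_TO_CLASS = {i: abbr for abbr, i in CLASS_TO_INDEX.items()}

print(CLASS_TO_INDEX)
```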

These images were manually extracted from N=86 H&E stained human cancer tissue slides from formalin-fixed paraffin-embedded (FFPE) samples from the NCT Biobank (National Center for Tumor Diseases, Heidelberg, Germany) and the UMM pathology archive (University Medical Center Mannheim, Mannheim, Germany). Tissue samples contained CRC primary tumor slides and tumor tissue from CRC liver metastases; normal tissue classes were augmented with non-tumorous regions from gastrectomy specimens to increase variability.

Ethics statement "NCT-CRC-HE-100K"

All experiments were conducted in accordance with the Declaration of Helsinki, the International Ethical Guidelines for Biomedical Research Involving Human Subjects (CIOMS), the Belmont Report and the U.S. Common Rule. Anonymized archival tissue samples were retrieved from the tissue bank of the National Center for Tumor Diseases (NCT, Heidelberg, Germany) in accordance with the regulations of the tissue bank and the approval of the ethics committee of Heidelberg University (tissue bank decision numbers 2152 and 2154, granted to Niels Halama and Jakob Nikolas Kather; informed consent was obtained from all patients as part of the NCT tissue bank protocol, ethics board approval S-207/2005, renewed on 20 Dec 2017). Another set of tissue samples was provided by the pathology archive at UMM (University Medical Center Mannheim, Heidelberg University, Mannheim, Germany) after approval by the institutional ethics board (Ethics Board II at University Medical Center Mannheim, decision number 2017-806R-MA, granted to Alexander Marx and waiving the need for informed consent for this retrospective and fully anonymized analysis of archival samples).

Data set "CRC-VAL-HE-7K"

This is a set of 7,180 image patches from N=50 patients with colorectal adenocarcinoma (no overlap with patients in NCT-CRC-HE-100K). It can be used as a validation set for models trained on the larger data set. As in the larger data set, images are 224x224 px at 0.5 MPP. All tissue samples were provided by the NCT tissue bank; see above for further details and the ethics statement.

Data set "NCT-CRC-HE-100K-NONORM"

This is a slightly different version of the "NCT-CRC-HE-100K" image set. It contains 100,000 images in 9 tissue classes at 0.5 MPP and was created from the same raw data as "NCT-CRC-HE-100K". However, no color normalization was applied to these images. Consequently, staining intensity and color vary slightly between the images. Please note that although this image set was created from the same data as "NCT-CRC-HE-100K", the image regions are not completely identical because the selection of non-overlapping tiles from raw images was a stochastic process.

General comments

Please note that the classes are only roughly balanced. Classifiers should never be evaluated based on accuracy in the full set alone. Also, if a high risk of training bias is expected, balancing the number of cases per class is recommended.
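To illustrate why plain accuracy can mislead on roughly balanced classes, here is a minimal sketch of per-class recall, balanced accuracy, and one simple way to balance class counts. These are assumptions chosen for illustration: the description does not name a specific metric or balancing method, random undersampling is only one option, and all function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def per_class_recall(y_true, y_pred):
    """Fraction of correctly predicted samples for each class in y_true."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / n for c, n in total.items()}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def undersample(samples, labels, seed=0):
    """Randomly keep as many samples per class as the smallest class has."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    n_keep = min(len(items) for items in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_class):
        for sample in rng.sample(by_class[label], n_keep):
            balanced.append((sample, label))
    return balanced


# Toy example: a classifier that always predicts TUM.
y_true = ["TUM"] * 8 + ["MUC"] * 2
y_pred = ["TUM"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5

# Balance the toy set: two samples are kept for each class.
balanced = undersample(list(range(10)), y_true)
print(Counter(label for _, label in balanced))
```

In the toy example the always-TUM classifier reaches 80% accuracy while missing every MUC patch, which the balanced score of 0.5 exposes.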
NCT-CRC-HE-100K-NONORM
kaggle.com
Updated Dec 7, 2021
Cite
Imran Khan (2021). NCT-CRC-HE-100K-NONORM [Dataset]. https://www.kaggle.com/imrankhan77/nct-crc-he-100k-nonorm/discussion
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Imran Khan
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Imran Khan

Released under CC0: Public Domain

Native Medical CNN Model Training Records before and after Optimization
dataverse.harvard.edu
Updated Apr 11, 2025
Cite
Jie Li (2025). Native Medical CNN Model Training Records before and after Optimization [Dataset]. http://doi.org/10.7910/DVN/5T5TXE
Unique identifier
https://doi.org/10.7910/DVN/5T5TXE
Dataset provided by
Harvard Dataverse
Authors
Jie Li
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
The model performs colon cancer tissue classification. To increase the complexity of the classification task, this research combined NCT-CRC-HE-100K (n = 100,000) and CRC-VAL-HE-7K (n = 7,180), which increases the volume and variety of the dataset.
280 scholarly articles cite the NCT-CRC-HE-100K Dataset (count from Google Scholar).
