100+ datasets found

100,000 histological images of human colorectal cancer and healthy tissue
zenodo.org
zip
Updated Jan 24, 2020
Cite: Jakob Nikolas Kather; Niels Halama; Alexander Marx (2020). 100,000 histological images of human colorectal cancer and healthy tissue [Dataset]. http://doi.org/10.5281/zenodo.1214456
Available download formats: zip
Unique identifier: https://doi.org/10.5281/zenodo.1214456
Dataset updated: Jan 24, 2020
Dataset provided by: Zenodo (http://zenodo.org/)
Authors: Jakob Nikolas Kather; Niels Halama; Alexander Marx
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Description "NCT-CRC-HE-100K"

This is a set of 100,000 non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.

All images are 224x224 pixels (px) at 0.5 microns per pixel (MPP). All images are color-normalized using Macenko's method (http://ieeexplore.ieee.org/abstract/document/5193250/, DOI 10.1109/ISBI.2009.5193250).

Tissue classes are: Adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), colorectal adenocarcinoma epithelium (TUM).

These images were manually extracted from N=86 H&E stained human cancer tissue slides from formalin-fixed paraffin-embedded (FFPE) samples from the NCT Biobank (National Center for Tumor Diseases, Heidelberg, Germany) and the UMM pathology archive (University Medical Center Mannheim, Mannheim, Germany). Tissue samples contained CRC primary tumor slides and tumor tissue from CRC liver metastases; normal tissue classes were augmented with non-tumorous regions from gastrectomy specimens to increase variability.

Ethics statement "NCT-CRC-HE-100K"

All experiments were conducted in accordance with the Declaration of Helsinki, the International Ethical Guidelines for Biomedical Research Involving Human Subjects (CIOMS), the Belmont Report and the U.S. Common Rule. Anonymized archival tissue samples were retrieved from the tissue bank of the National Center for Tumor diseases (NCT, Heidelberg, Germany) in accordance with the regulations of the tissue bank and the approval of the ethics committee of Heidelberg University (tissue bank decision numbers 2152 and 2154, granted to Niels Halama and Jakob Nikolas Kather; informed consent was obtained from all patients as part of the NCT tissue bank protocol, ethics board approval S-207/2005, renewed on 20 Dec 2017). Another set of tissue samples was provided by the pathology archive at UMM (University Medical Center Mannheim, Heidelberg University, Mannheim, Germany) after approval by the institutional ethics board (Ethics Board II at University Medical Center Mannheim, decision number 2017-806R-MA, granted to Alexander Marx and waiving the need for informed consent for this retrospective and fully anonymized analysis of archival samples).

Data set "CRC-VAL-HE-7K"

This is a set of 7180 image patches from N=50 patients with colorectal adenocarcinoma (no overlap with patients in NCT-CRC-HE-100K). It can be used as a validation set for models trained on the larger data set. As in the larger data set, images are 224x224 px at 0.5 MPP. All tissue samples were provided by the NCT tissue bank; see above for further details and the ethics statement.

Data set "NCT-CRC-HE-100K-NONORM"

This is a slightly different version of the "NCT-CRC-HE-100K" image set: it contains 100,000 images in 9 tissue classes at 0.5 MPP and was created from the same raw data as "NCT-CRC-HE-100K". However, no color normalization was applied to these images. Consequently, staining intensity and color vary slightly between the images. Please note that although this image set was created from the same data as "NCT-CRC-HE-100K", the image regions are not completely identical because the selection of non-overlapping tiles from raw images was a stochastic process.

General comments

Please note that the classes are only roughly balanced. Classifiers should never be evaluated based on accuracy in the full set alone. Also, if a high risk of training bias is expected, balancing the number of cases per class is recommended.
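One way to act on this advice is to report per-class recall and its unweighted mean (balanced accuracy) next to overall accuracy. The following is a minimal pure-Python sketch; the function names and the toy labels are hypothetical and are not part of the data set or of any code published with it.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Return {class: recall}, computed over the classes present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class recalls."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

# Toy case (made-up labels): a classifier that always answers "TUM".
y_true = ["TUM"] * 8 + ["MUC"] * 2
y_pred = ["TUM"] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

In this toy case plain accuracy looks acceptable while the balanced score exposes that one class is never recognized, which is the failure mode the note above warns about.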
2 million histological images of breast cancer tumors with her2 labels
zenodo.org
data.niaid.nih.gov
zip
Updated Aug 20, 2024
Cite: Renan Valieris; Luan Martins; Alexandre Defelicibus; Cynthia Aparecida Bueno de Toledo Osorio; Adriana Passos Bueno (2024). 2 million histological images of breast cancer tumors with her2 labels [Dataset]. http://doi.org/10.5281/zenodo.8383580
Available download formats: zip
Unique identifier: https://doi.org/10.5281/zenodo.8383580
Dataset updated: Aug 20, 2024
Dataset provided by: Zenodo (http://zenodo.org/)
Authors: Renan Valieris; Luan Martins; Alexandre Defelicibus; Cynthia Aparecida Bueno de Toledo Osorio; Adriana Passos Bueno
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Description
This is a set of 2 million non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human breast cancer tumor tissue.

The anonymized dataset comes from a cohort of BC patients from the A. C. Camargo Cancer Center (ACCCC, N = 504). All patients were treated for breast cancer at the ACCCC between 2019 and 2021. As part of their diagnosis, in HER2 IHC score 2+ cases, patients' HER2 status was determined following the ASCO guidelines updated in 2018, with visual evaluation of IHC assay and either a FISH or DDISH test. All cases with metastasis or neoadjuvant treatment were excluded.

A total of 426 H&E stained high resolution images (40x magnification) were scanned from biopsy and resection tissue samples with a Leica Aperio AT2 scanner. Ethical approval of the ACCCC study was given by the ethics committee of the Fundação Antônio Prudente. We divided the cases into the following 3 groups according to the results of the IHC and ISH tests: HER2-negative, HER2-low and HER2-high.

The slides were divided into 256 px x 256 px tiles at a resolution of 0.5 µm/pixel. Then, we used a custom-trained ConvNext-tiny neural network to include only tiles from the tumor region and its environment, generating a total of 2,051,877 image patches.

A sample is considered HER2-negative with an IHC score of 0; HER2-low with an IHC score of 1+, or with an IHC score of 2+ and a negative ISH-based test result; and HER2-high with an IHC score of 2+ and a positive ISH-based test result, or with an IHC score of 3+.
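The grouping rule can be written down as a small function. This is only an illustrative sketch of the rule as stated in this description: the function name, the string encoding of the IHC scores, and the boolean `ish_positive` argument are assumptions, and the code is not taken from the accompanying repository.

```python
def her2_group(ihc_score, ish_positive=None):
    """Map an IHC score ("0", "1+", "2+", "3+") to a HER2 group.

    ish_positive is only consulted for IHC 2+ cases, where an
    ISH-based test (FISH or DDISH) decides between low and high.
    """
    if ihc_score == "0":
        return "HER2-negative"
    if ihc_score == "1+":
        return "HER2-low"
    if ihc_score == "2+":
        if ish_positive is None:
            raise ValueError("IHC 2+ requires an ISH-based test result")
        return "HER2-high" if ish_positive else "HER2-low"
    if ihc_score == "3+":
        return "HER2-high"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")
```

For example, `her2_group("2+", ish_positive=False)` yields the HER2-low group, while the same score with a positive ISH result yields HER2-high.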

The accompanying code used for training the models is available at https://github.com/tojallab/wsi-mil
Colon Histology Dataset
universe.roboflow.com
zip
Updated Jul 13, 2024
Cite: Colon Histology (2024). Colon Histology Dataset [Dataset]. https://universe.roboflow.com/colon-histology/colon-histology
Available download formats: zip
Dataset updated: Jul 13, 2024
Dataset authored and provided by: Colon Histology
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured: Cells
Description
Colon Histology

## Overview
Colon Histology is a dataset for classification tasks - it contains Cells annotations for 560 images.

## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
colorectal_histology
tensorflow.org
Updated Jun 1, 2024
Cite: (2024). colorectal_histology [Dataset]. https://www.tensorflow.org/datasets/catalog/colorectal_histology
Dataset updated: Jun 1, 2024
Description
Classification of textures in colorectal cancer histology. Each example is a 150 x 150 x 3 RGB image of one of 8 classes.

To use this dataset:

import tensorflow_datasets as tfds

# Load the training split and print the first four examples.
ds = tfds.load('colorectal_histology', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.

Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/colorectal_histology-2.0.0.png
BACH Dataset : Grand Challenge on Breast Cancer Histology images
data.niaid.nih.gov
Updated Jan 31, 2020
Cite: Polónia, António; Eloy, Catarina; Aguiar, Paulo (2020). BACH Dataset : Grand Challenge on Breast Cancer Histology images [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3632034
Dataset updated: Jan 31, 2020
Dataset provided by: Institute of Molecular Pathology and Immunology of the University of Porto; INEB/i3S
Authors: Polónia, António; Eloy, Catarina; Aguiar, Paulo
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
i3S Annotated Datasets on Digital Pathology

WELCOME

In an effort to contribute and push forward the field of Digital Pathology, Ipatimup and INEB, two major research institutions in Portugal, have joined forces in the construction of histology datasets to support grand Challenges on automatic classification of tissue malignancy. The researchers/pathologists responsible for the datasets are:

António Polónia (MD), Ipatimup/i3S

Catarina Eloy (MD, PhD), Ipatimup/i3S

Paulo Aguiar (PhD), INEB/i3S

This specific page refers to the Grand Challenge on Breast Cancer Histology images, or BACH Challenge

THE BACH CHALLENGE DATASET

ICIAR 2018 - Grand Challenge on Breast Cancer Histology images [Challenge organized by Teresa Araújo, Guilherme Aresta, António Polónia, Catarina Eloy and Paulo Aguiar]

For detailed information visit: https://iciar2018-challenge.grand-challenge.org/home/

THIS DATASET IS PUBLICLY AVAILABLE UNDER A CREATIVE COMMONS CC BY-NC-ND LICENSE (ATTRIBUTION-NONCOMMERCIAL-NODERIVS). ESSENTIALLY, YOU ARE GRANTED ACCESS TO THE DATASET FOR USE IN YOUR RESEARCH AS LONG AS YOU CREDIT OUR WORK/PUBLICATIONS(*), BUT YOU CANNOT CHANGE THEM IN ANY WAY OR USE THEM COMMERCIALLY.

(*) Aresta, Guilherme, et al. "BACH: Grand challenge on breast cancer histology images." Medical image analysis (2019).

(*) Araújo, Teresa, et al. "Classification of breast cancer histology images using convolutional neural networks." PloS one 12.6 (2017): e0177544.

(*) Fondón, Irene, et al. "Automatic classification of tissue malignancy for breast carcinoma diagnosis." Computers in biology and medicine 96 (2018): 41-51.
Hyperspectral Histological Images for Diagnosis of Human Glioblastoma
cancerimagingarchive.net
stage.cancerimagingarchive.net
n/a, png and envi
Updated May 24, 2024
Cite: The Cancer Imaging Archive (2024). Hyperspectral Histological Images for Diagnosis of Human Glioblastoma [Dataset]. http://doi.org/10.7937/z1k6-vd17
Available download formats: png and envi, n/a
Unique identifier: https://doi.org/10.7937/z1k6-vd17
Dataset updated: May 24, 2024
Dataset authored and provided by: The Cancer Imaging Archive
License: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered: May 24, 2024
Dataset funded by: National Cancer Institute (http://www.cancer.gov/)
Description
Hyperspectral imaging technology combines the main features of two existing technologies: conventional imaging and spectroscopy. Thus, hyperspectral cameras make it possible to analyze, at the same time and in a non-contact way, the morphological features and chemical composition of the objects captured. The information provided by hyperspectral imaging can be used to detect patterns, cells, or biomarkers to identify diseases. There are different alternatives for processing them and there is a lack of publicly available datasets of medical hyperspectral images. To the best of our knowledge, this is the first open access dataset containing histological hyperspectral images of glioblastoma brain tumors, which can be set as a benchmark for researchers to compare their approaches.
This dataset includes 13 subjects. Each subject has a single histological slide with multiple hyperspectral images captured from each slide where deemed relevant by the pathologists (this number varies for each slide). The database is composed of 469 annotated hyperspectral images from 13 histological slides (482 total images), having a spatial dimension of 800 × 1004 pixels and a spectral dimension of 826 spectral channels. The format of the hyperspectral images is ENVI, the standard format for the storage of hyperspectral images. The ENVI format consists of a flat-binary raster file which may or may not have a file extension, accompanied by an ASCII header file (denoted as *.hdr). The data are stored in band-interleaved-by-line format. In addition, dark and white references were captured to perform a calibration of the raw image, which is a standard procedure in hyperspectral image processing.
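To make the band-interleaved-by-line (BIL) layout and the calibration step concrete, here is a minimal standard-library sketch. It rests on several assumptions not stated in the description: the function names are hypothetical, the pixel type is taken to be little-endian unsigned 16-bit (in practice the data type, byte order, and dimensions should be read from the `.hdr` file), and the calibration uses the common flat-field form (raw - dark) / (white - dark), which may differ from the authors' exact procedure.

```python
import struct

def bil_offset(line, band, sample, bands, samples, itemsize):
    """Byte offset of one value in a BIL raster.

    In BIL order, each image line stores all of its bands one after
    another, and each band row holds `samples` consecutive values.
    """
    return ((line * bands + band) * samples + sample) * itemsize

def read_value(buf, line, band, sample, bands, samples, fmt="<H"):
    """Read one value from a bytes-like BIL buffer (fmt is an assumed type)."""
    size = struct.calcsize(fmt)
    offset = bil_offset(line, band, sample, bands, samples, size)
    return struct.unpack_from(fmt, buf, offset)[0]

def calibrate(raw, white, dark):
    """Flat-field calibration of one value: (raw - dark) / (white - dark)."""
    span = white - dark
    if span == 0:
        raise ValueError("white and dark references are identical")
    return (raw - dark) / span
```

With 826 bands and 1004 samples per line (assuming the 1004-pixel axis is the sample axis), the second band of the first line would start at byte `bil_offset(0, 1, 0, 826, 1004, 2)`. For whole-cube access, a memory-mapped array such as `numpy.memmap` with shape `(lines, bands, samples)` is a common alternative to per-value reads.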
The slides were stained with hematoxylin and eosin and captured using a custom hyperspectral microscopic system at 20× magnification. The ground-truth annotation for this dataset is the diagnosis of the slides (tumor, T, or not tumor, NT) performed by skilled histopathologists after the visual examination of the stained slides, according to the World Health Organization classification of tumors of the nervous system. To the best of our knowledge, there are no commercial hyperspectral whole slide scanners. Also, the availability of hyperspectral microscopes is still limited in the market.
The microscope is an Olympus BX-53 (Olympus, Tokyo, Japan). The hyperspectral camera is a Hyperspec® VNIR A-Series from HeadWall Photonics (Fitchburg, MA, USA), which is based on an imaging spectrometer coupled to a charge-coupled device sensor, the Adimec-1000m (Adimec, Eindhoven, Netherlands). This hyperspectral system works in the visual and near-infrared spectral range from 400 to 1000 nm with a spectral resolution of 2.8 nm, sampling 826 spectral channels, and 1004 spatial pixels. The push-broom camera performs a spatial scanning to acquire a hyperspectral cube with a mechanical stage (SCAN, Märzhäuser, Wetzlar, Germany) attached to the microscope, which provides an accurate movement of the slides. The objective lenses are from the LMPLFLN family (Olympus, Tokyo, Japan), optimized for infrared observations.
More information about the dataset can be found in this manuscript.
Invasive Ductal Carcinoma (IDC) Histology Image Dataset
academictorrents.com
bittorrent
Updated Feb 22, 2019
Cite: None (2019). Invasive Ductal Carcinoma (IDC) Histology Image Dataset [Dataset]. https://academictorrents.com/details/e40bd59ab08861329ce3c418be191651f35e2ffa
Available download formats: bittorrent (1644892042)
Dataset updated: Feb 22, 2019
Authors: None
License: https://academictorrents.com/nolicensespecified
Description
Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. To assign an aggressiveness grade to a whole mount sample, pathologists typically focus on the regions which contain the IDC. As a result, one of the common pre-processing steps for automatic aggressiveness grading is to delineate the exact regions of IDC inside of a whole mount slide.

Dataset Description

The original dataset consisted of 162 whole mount slide images of Breast Cancer (BCa) specimens scanned at 40x. From that, 277,524 patches of size 50 x 50 were extracted (198,738 IDC negative and 78,786 IDC positive). Each patch's file name is of the format u_xX_yY_classC.png, for example 10253_idx5_x1351_y1101_class0.png, where u is the patient ID (10253_idx5), X is the x-coordinate of where this patch was cropped from, Y is the y-coordinate of where this patch was cropped from, and C indicates the class, where 0 is non-IDC and 1 is IDC.
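The file-name convention can be parsed with a regular expression. The sketch below is one possible way to do it; the pattern, the function name, and the returned field names are illustrative choices, and only the example file name comes from the description.

```python
import re

# u_xX_yY_classC.png, where u itself may contain underscores.
PATCH_NAME = re.compile(
    r"^(?P<patient>.+)_x(?P<x>\d+)_y(?P<y>\d+)_class(?P<label>[01])\.png$"
)

def parse_patch_name(name):
    """Split a patch file name into patient ID, crop coordinates, and class."""
    match = PATCH_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected patch file name: {name!r}")
    return {
        "patient": match["patient"],
        "x": int(match["x"]),
        "y": int(match["y"]),
        "is_idc": match["label"] == "1",
    }

info = parse_patch_name("10253_idx5_x1351_y1101_class0.png")
```

Grouping patches by the parsed patient ID before splitting into training and test sets is one way to keep patches from the same slide out of both sets.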
CAnine CuTaneous Cancer Histology Dataset
stage.cancerimagingarchive.net
cancerimagingarchive.net
json, n/a, svs +1
Updated Jan 12, 2022
Cite: The Cancer Imaging Archive (2022). CAnine CuTaneous Cancer Histology Dataset [Dataset]. http://doi.org/10.7937/TCIA.2M93-FX66
Available download formats: n/a, zip and sqlite, json, svs
Unique identifier: https://doi.org/10.7937/TCIA.2M93-FX66
Dataset updated: Jan 12, 2022
Dataset authored and provided by: The Cancer Imaging Archive
License: https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered: Jan 12, 2022
Dataset funded by: National Cancer Institute (http://www.cancer.gov/)
Description
We present a large-scale dataset of 350 histologic samples of seven different canine cutaneous tumors. All samples were obtained through surgical resection due to neoplastic indicators and were selected retrospectively from the biopsy archive of the Institute for Veterinary Pathology of the Freie Universität Berlin according to sufficient tissue preservation and presence of characteristic histologic features for the corresponding tumor subtypes. Samples were stained with a routine Hematoxylin & Eosin dye and digitized with two Leica linear scanning systems at a resolution of 0.25 um/pixel. Together with the 350 whole slide images, we provide a database consisting of 12,424 polygon annotations for six non-neoplastic tissue classes (epidermis, dermis, subcutis, bone, cartilage, and a joint class of inflammation and necrosis) and seven tumor classes (melanoma, mast cell tumor, squamous cell carcinoma, peripheral nerve sheath tumor, plasmacytoma, trichoblastoma, and histiocytoma).
The polygon annotations were generated using the open source software SlideRunner (https://github.com/DeepPathology/SlideRunner). Within SlideRunner, users can view whole slide images (WSIs) and zoom through their magnification levels. Using multiple clicks or click-and-drag, the pathologist annotated polygons for 13 classes (epidermis, dermis, subcutis, bone, cartilage, a joint class of inflammation and necrosis, melanoma, mast cell tumor, squamous cell carcinoma, peripheral nerve sheath tumor, plasmacytoma, trichoblastoma, and histiocytoma) on 287 WSIs. The remaining WSIs were annotated by three medical students in their 8th semester supervised by the leading pathologist who later reviewed these annotations for correctness and completeness.
Due to the large size of the dataset and the extensive annotations, it provides a good basis for segmentation and classification algorithms based on supervised learning. Previous work [1-4] has shown that, due to various homologies between the species, canine cutaneous tissue can serve as a model for human samples. Prouteau et al. have published an extensive comparison of the two species especially for cutaneous tumors and include homologies between canine and human oncology regarding "clinical and histological appearance, biological behavior, tumor genetics, molecular pathways and targets, and response to therapies" [1]. Ranieri et al. highlight that pet dogs and humans share many environmental risk factors and show the highest risk for cancer development at similar points of time respective to their life spans [2]. Both Ranieri et al. and Pinho et al. highlight the potential of using insights from experiments on canine samples for developing human cancer treatments [2,3]. From a technical perspective, Aubreville et al. have shown that canine samples can be used to aid human cancer research through the use of transfer learning methods [4].
Potential users of the dataset can load the SQLite database into their custom installation of SlideRunner and adapt or extend the database with custom annotations. Furthermore, we converted the annotations to the COCO JSON format, which is commonly used by computer scientists for training neural networks. Its pixel-level annotations can be used for supervised segmentation algorithms as opposed to datasets that only provide clinical data on slide level.
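As a rough illustration of working with the COCO JSON annotations, the sketch below counts annotations per category. It relies only on the standard COCO keys `categories`, `annotations`, and `category_id`; the function names, the file path passed to `load_coco`, and the tiny stand-in dictionary are hypothetical and do not reflect the actual file shipped with the dataset.

```python
import json
from collections import Counter

def load_coco(path):
    """Load a COCO-style annotation file (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def annotations_per_category(coco):
    """Return {category name: number of annotations} for a COCO-style dict."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return dict(Counter(names[ann["category_id"]] for ann in coco["annotations"]))

# Tiny made-up stand-in with the same structure as a COCO file.
example = {
    "categories": [{"id": 1, "name": "melanoma"}, {"id": 2, "name": "dermis"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 2},
    ],
}
counts = annotations_per_category(example)
```

A per-category count like this is a quick check of class balance before training a segmentation model on the polygons.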
References
[1] Prouteau, Anaïs, and Catherine André. "Canine melanomas as models for human melanomas: Clinical, histological, and genetic comparison." Genes 10.7 (2019): 501. https://doi.org/10.3390/genes10070501
[2] Ranieri, G., et al. "A model of study for human cancer: Spontaneous occurring tumors in dogs. Biological features and translation for new anticancer therapies." Critical reviews in oncology/hematology 88.1 (2013): 187-197. https://doi.org/10.1016/j.critrevonc.2013.03.005
[3] Pinho, Salomé S., et al. "Canine tumors: a spontaneous animal model of human carcinogenesis." Translational Research 159.3 (2012): 165-172. https://doi.org/10.1016/j.trsl.2011.11.005
[4] Aubreville, Marc, et al. "A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research." Scientific data 7.1 (2020): 1-10. https://doi.org/10.1038/s41597-020-00756-z
Collection of textures in colorectal cancer histology
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
Cite: Jakob Nikolas Kather; Frank Gerrit Zöllner; Francesco Bianconi; Susanne M Melchers; Lothar R Schad; Timo Gaiser; Alexander Marx; Cleo-Aron Weis (2020). Collection of textures in colorectal cancer histology [Dataset]. http://doi.org/10.5281/zenodo.53169
Available download formats: zip
Unique identifier: https://doi.org/10.5281/zenodo.53169
Dataset updated: Jan 24, 2020
Dataset provided by: Zenodo (http://zenodo.org/)
Authors: Jakob Nikolas Kather; Frank Gerrit Zöllner; Francesco Bianconi; Susanne M Melchers; Lothar R Schad; Timo Gaiser; Alexander Marx; Cleo-Aron Weis
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Content

This data set represents a collection of textures in histological images of human colorectal cancer. It contains two files:

"Kather_texture_2016_image_tiles_5000.zip": a zipped folder containing 5000 histological images of 150 * 150 px each (74 * 74 µm). Each image belongs to exactly one of eight tissue categories (specified by the folder name).

"Kather_texture_2016_larger_images_10.zip": a zipped folder containing 10 larger histological images of 5000 x 5000 px each. These images contain more than one tissue type.

Image format

All images are RGB, 0.495 µm per pixel, digitized with an Aperio ScanScope (Aperio/Leica biosystems), magnification 20x. Histological samples are fully anonymized images of formalin-fixed paraffin-embedded human colorectal adenocarcinomas (primary tumors) from our pathology archive (Institute of Pathology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany).
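The stated physical tile size follows directly from the pixel count and the pixel pitch. The short check below is only an arithmetic illustration; the function name is hypothetical.

```python
def edge_length_um(edge_px, um_per_px):
    """Physical edge length in micrometres of a square image."""
    return edge_px * um_per_px

tile_um = edge_length_um(150, 0.495)    # small texture tiles
large_um = edge_length_um(5000, 0.495)  # larger multi-tissue images
```

The tile edge comes out at about 74.25 µm, consistent with the rounded 74 * 74 µm given for the tiles, and the larger images span roughly 2.5 mm per side.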

Ethics statement

All experiments were approved by the institutional ethics board (medical ethics board II, University Medical Center Mannheim, Heidelberg University, Germany; approval 2015-868R-MA). The institutional ethics board waived the need for informed consent for this retrospective analysis of anonymized samples. All experiments were carried out in accordance with the approved guidelines and with the Declaration of Helsinki.

More information / data usage

For more information, please refer to the following article. Please cite this article when using the data set.

Kather JN, Weis CA, Bianconi F, Melchers SM, Schad LR, Gaiser T, Marx A, Zollner F: Multi-class texture analysis in colorectal cancer histology (2016), Scientific Reports (in press)

Contact

For questions, please contact:
Dr. Jakob Nikolas Kather
http://orcid.org/0000-0002-3730-5348
ResearcherID: D-4279-2015
Histopathology images for end-to-end AI, based on TCGA-BRCA
zenodo.org
bin, zip
Updated Sep 1, 2021
Cite: Jakob Nikolas Kather (2021). Histopathology images for end-to-end AI, based on TCGA-BRCA [Dataset]. http://doi.org/10.5281/zenodo.5337009
Available download formats: zip, bin
Unique identifier: https://doi.org/10.5281/zenodo.5337009
Dataset updated: Sep 1, 2021
Dataset provided by: Zenodo (http://zenodo.org/)
Authors: Jakob Nikolas Kather
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are histopathological images which are derived from the TCGA-BRCA breast cancer histology dataset at https://portal.gdc.cancer.gov/ (please check this website for the original data license). They can be used for end-to-end artificial intelligence (AI) workflows such as DeepMed (https://github.com/KatherLab/deepmed) which aim to predict high-level features directly from digital images with weakly supervised transfer learning. Here, we use two subsets of these digitized images:

1) TCGA-BRCA-A2, these are all images from Walter Reed National Military Medical Center (tissue source site code A2, N=100 images) in the TCGA-BRCA database (tcga-brca-a2-deepmed-tiles.zip)

2) TCGA-BRCA-E2, these are all images from Roswell Park Comprehensive Cancer Center (tissue source site code E2, N=90 images) in the TCGA-BRCA database (tcga-brca-e2-deepmed-tiles.zip)

see also https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/tissue-source-site-codes

The images were preprocessed according to the Aachen Protocol for Deep Learning Histopathology, which is available at https://zenodo.org/record/3694994. Specifically, digital whole slide images (SVS format) of hematoxylin & eosin (H&E) stained slides were tessellated (without manual annotations) into tiles of 256x256 px edge length at 1 µm/px. Then, images were color-normalized using the Macenko method as described before (https://www.nature.com/articles/s43018-020-0087-6) and saved as JPEG files. For the A2 cohort, an additional ZIP archive is provided in which only 100 random image tiles are saved for each patient (tcga-brca-a2-deepmed-tiles_100.zip). In addition, we provide a CLINI and a SLIDE table as defined in the "Aachen Protocol". The CLINI table contains clinico-pathological data for all included patients and is derived from clinical information on www.cbioportal.org as well as from Thorsson et al. (https://pubmed.ncbi.nlm.nih.gov/29628290/). We recommend using the A2 dataset for training and the E2 dataset for testing. Please cite the relevant papers if you re-use this dataset. More information is available on www.kather.ai
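Drawing a fixed number of random tiles per patient, as in the reduced A2 archive, could be sketched as follows. This is not the procedure used to build that archive: the function names, the fixed seed, and the assumption that each tile file name begins with a TCGA barcode (whose first 12 characters identify the patient) are all illustrative.

```python
import random
from collections import defaultdict

def patient_id(tile_name):
    """First 12 characters of a TCGA barcode (project-site-participant)."""
    return tile_name[:12]

def sample_tiles(tile_names, per_patient=100, seed=0):
    """Pick up to `per_patient` random tiles for each patient."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_patient = defaultdict(list)
    for name in tile_names:
        by_patient[patient_id(name)].append(name)
    return {
        pid: rng.sample(tiles, min(per_patient, len(tiles)))
        for pid, tiles in by_patient.items()
    }

# Hypothetical tile names, for illustration only.
names = [f"TCGA-A2-0001-tile_{i}.jpg" for i in range(5)]
names += [f"TCGA-A2-0002-tile_{i}.jpg" for i in range(2)]
picked = sample_tiles(names, per_patient=3)
```

Patients with fewer tiles than the requested number simply contribute all of their tiles.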
Histology and Cytology Market - Size & Growth
mordorintelligence.com
pdf,excel,csv,ppt
Updated Aug 29, 2025
Cite: Mordor Intelligence (2025). Histology and Cytology Market - Size & Growth [Dataset]. https://www.mordorintelligence.com/industry-reports/histology-and-cytology-market
Available download formats: pdf, excel, csv, ppt
Dataset updated: Aug 29, 2025
Authors: Mordor Intelligence
License: https://www.mordorintelligence.com/privacy-policy
Time period covered: 2019 - 2030
Area covered: Global
Description
The Histology and Cytology Market Report segments the industry by type of examination (histology, cytology), by test type (microscopy tests, molecular genetics tests, flow cytometry), by end user (hospitals and clinics, academic and research institutes, other end users), and by geography (North America, Europe, Asia-Pacific, Middle East and Africa, South America). It provides five years of historic data and five-year forecasts.
Colorectal Histology MNIST
kaggle.com
Updated Sep 19, 2018
Cite: K Scott Mader (2018). Colorectal Histology MNIST [Dataset]. https://www.kaggle.com/datasets/kmader/colorectal-histology-mnist/discussion
Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
Dataset updated: Sep 19, 2018
Dataset provided by: Kaggle (http://kaggle.com/)
Authors: K Scott Mader
Description
Overview

The dataset serves as a much more interesting MNIST or CIFAR10 problem for biologists by focusing on histology tiles from patients with colorectal cancer. In particular, the data has 8 different classes of tissue (but Cancer/Not Cancer can also be an interesting problem).

Challenge

Classify tiles correctly into one of the eight classes

Which classes are most frequently confused?

What features can be used (like texture, see scikit-image) to improve classification?

How can these models be applied to the much larger 5000x5000 images? How can this be done efficiently?

Acknowledgements

The dataset has been copied from Zenodo (https://zenodo.org/record/53169#.W6HwwP4zbOQ) and was made by: Kather, Jakob Nikolas; Zöllner, Frank Gerrit; Bianconi, Francesco; Melchers, Susanne M; Schad, Lothar R; Gaiser, Timo; Marx, Alexander; Weis, Cleo-Aron

The copy here is to make it more accessible to Kaggle users and allow kernels providing basic analysis of the data

Content

This data set represents a collection of textures in histological images of human colorectal cancer. It contains two files:

"Kather_texture_2016_image_tiles_5000.zip": a zipped folder containing 5000 histological images of 150 * 150 px each (74 * 74 µm). Each image belongs to exactly one of eight tissue categories (specified by the folder name).

"Kather_texture_2016_larger_images_10.zip": a zipped folder containing 10 larger histological images of 5000 x 5000 px each. These images contain more than one tissue type.

Image format

All images are RGB, 0.495 µm per pixel, digitized with an Aperio ScanScope (Aperio/Leica biosystems), magnification 20x. Histological samples are fully anonymized images of formalin-fixed paraffin-embedded human colorectal adenocarcinomas (primary tumors) from our pathology archive (Institute of Pathology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany).

Ethics statement

All experiments were approved by the institutional ethics board (medical ethics board II, University Medical Center Mannheim, Heidelberg University, Germany; approval 2015-868R-MA). The institutional ethics board waived the need for informed consent for this retrospective analysis of anonymized samples. All experiments were carried out in accordance with the approved guidelines and with the Declaration of Helsinki.

More information / data usage

For more information, please refer to the following article. Please cite this article when using the data set.

Kather JN, Weis CA, Bianconi F, Melchers SM, Schad LR, Gaiser T, Marx A, Zollner F: Multi-class texture analysis in colorectal cancer histology (2016), Scientific Reports (in press)

Contact

For questions, please contact: Dr. Jakob Nikolas Kather http://orcid.org/0000-0002-3730-5348 ResearcherID: D-4279-2015
Histology Images
dataverse.tdl.org
bin, tiff
Updated Jul 29, 2025
Manuel Rausch (2025). Histology Images [Dataset]. http://doi.org/10.18738/T8/2TRJGV
Available download formats
tiff, bin
Unique identifier
https://doi.org/10.18738/T8/2TRJGV
Dataset updated
Jul 29, 2025
Dataset provided by
Texas Data Repository
Authors
Manuel Rausch
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
This data set contains the histological images as reported in our manuscript "Tricuspid valve maladaptation in sheep with biventricular heart failure: The posterior and septal leaflets"
Registered histology, MRI, and manual annotations of over 300 brain regions...
rdr.ucl.ac.uk
txt
Updated Oct 6, 2023
Eugenio Iglesias Gonzalez; Adria Casamitjana; Alessia Atzeni; Benjamin Billot; David Thomas; Emily Blackburn; James Hughes; Juri Althonayan; Loic Peter; Matteo Mancini; Nellie Robinson; Peter Schmidt; Shauna Crampsie (2023). Registered histology, MRI, and manual annotations of over 300 brain regions in 5 human hemispheres (data from ERC Starting Grant 677697 "BUNGEE-TOOLS") [Dataset]. http://doi.org/10.5522/04/24243835.v1
Available download formats
txt
Unique identifier
https://doi.org/10.5522/04/24243835.v1
Dataset updated
Oct 6, 2023
Dataset provided by
University College London
Authors
Eugenio Iglesias Gonzalez; Adria Casamitjana; Alessia Atzeni; Benjamin Billot; David Thomas; Emily Blackburn; James Hughes; Juri Althonayan; Loic Peter; Matteo Mancini; Nellie Robinson; Peter Schmidt; Shauna Crampsie
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Summary:

This repository includes data related to the ERC Starting Grant project 677697: "Building Next-Generation Computational Tools for High Resolution Neuroimaging Studies" (BUNGEE-TOOLS). It includes: (a) Dense histological sections from five human hemispheres with manual delineations of >300 brain regions; (b) Corresponding ex vivo MRI scans; (c) Dissection photographs; (d) A spatially aligned version of the dataset; (e) A probabilistic atlas built from the hemispheres; and (f) Code to apply the atlas to automated segmentation of in vivo MRI scans.

More detailed description on what this dataset includes:

Data files and Python code for Bayesian segmentation of human brain MRI based on a next-generation, high-resolution histological atlas: "Next-Generation histological atlas for high-resolution segmentation of human brain MRI" A Casamitjana et al., in preparation. This repository contains a set of zip files, each corresponding to one directory. Once decompressed, each directory has a readme.txt file explaining its contents. The list of zip files / compressed directories is:

3dAtlas.zip: nifti files with summary imaging volumes of the probabilistic atlas.

BlockFacePhotoBlocks.zip: nifti files with the blockface photographs acquired during tissue sectioning, reconstructed into 3D volumes (in RGB).

Histology.zip: jpg files with the LFB and H&E stained sections.

HistologySegmentations.zip: 2D nifti files with the segmentations of the histological sections.

MRI.zip: ex vivo T2-weighted MRI scans and corresponding FreeSurfer processing files

SegmentationCode.zip: contains the Python code and data files that we used to segment brain MRI scans and obtain the results presented in the article (for reproducibility purposes). Note that it requires an installation of FreeSurfer. Note also that the code is maintained in FreeSurfer (but may not produce exactly the same results): https://surfer.nmr.mgh.harvard.edu/fswiki/HistoAtlasSegmentation

WholeHemispherePhotos.zip: photographs of the specimens prior to dissection

WholeSlicePhotos.zip: photographs of the tissue slabs prior to blocking.

We also note that the registered images for the five cases can be found in GitHub:
https://github.com/UCL/BrainAtlas-P41-16
https://github.com/UCL/BrainAtlas-P57-16
https://github.com/UCL/BrainAtlas-P58-16
https://github.com/UCL/BrainAtlas-P85-18
https://github.com/UCL/BrainAtlas-EX9-19
These registered images can be interactively explored with the following web interface: https://github-pages.ucl.ac.uk/BrainAtlas/#/atlas
Tumor histology and multiplicity.
figshare.com
xls
Updated May 31, 2023
Claire B. Pollock; Yuzhi Yin; Hongyan Yuan; Xiao Zeng; Sruthi King; Xin Li; Levy Kopelovich; Chris Albanese; Robert I. Glazer (2023). Tumor histology and multiplicity. [Dataset]. http://doi.org/10.1371/journal.pone.0016215.t001
Available download formats
xls
Unique identifier
https://doi.org/10.1371/journal.pone.0016215.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Claire B. Pollock; Yuzhi Yin; Hongyan Yuan; Xiao Zeng; Sruthi King; Xin Li; Levy Kopelovich; Chris Albanese; Robert I. Glazer
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
P = 0.0125 vs. untreated WT mice for histological differences. *P = 0.0205 vs. untreated PDK1 mice for histological differences. Wild-type (WT) and MMTV-PDK1 transgenic mice (PDK1) were fed either standard rodent chow or chow supplemented with 0.005% (w/w) GW501516 (GW). GW501516 treatment produced a significant change in the percentage of adenosquamous/squamous carcinomas. There were no significant differences in tumor multiplicity between groups.
Identifying Cell Nuclei from Histology Images
kaggle.com
zip
Updated Jul 16, 2019
Sandhaya (2019). Identifying Cell Nuclei from Histology Images [Dataset]. https://www.kaggle.com/sandhaya4u/histology-image-dataset
Available download formats
zip (0 bytes)
Dataset updated
Jul 16, 2019
Authors
Sandhaya
Description
Machine Learning Model for identifying Cell Nuclei from Histology Images

Machine learning model for identifying cell nuclei from histology images. The model is meant to generalize across a variety of lighting conditions, cell types, magnifications, and imaging modalities. Imagine speeding up research for almost every disease, from lung cancer and heart disease to rare disorders. The Data Science Bowl sets data scientists and practitioners an ambitious mission: create an algorithm that automates nucleus detection and detects all non-overlapping nuclei in the given test data, i.e., it should be capable of instance segmentation. We’ve all seen people suffer from diseases like cancer, heart disease, chronic obstructive pulmonary disease, Alzheimer’s, and diabetes. Many have seen their loved ones pass away. Think how many lives would be transformed if cures came faster. By automating nucleus detection, you could help unlock cures faster, from rare disorders to the common cold.

Why nuclei?

Identifying the cells’ nuclei is the starting point for most analyses because most of the human body’s 30 trillion cells contain a nucleus full of DNA, the genetic code that programs each cell. Identifying nuclei allows researchers to identify each individual cell in a sample, and by measuring how cells react to various treatments, the researcher can understand the underlying biological processes at work. By participating, teams will work to automate the process of identifying nuclei, which will allow for more efficient drug testing, shortening the 10 years it takes for each new drug to come to market.

Acknowledgements

The success and final outcome of this project required a lot of guidance and assistance from many people, and I am extremely privileged to have received it throughout the project. All that I have done is due to that supervision and assistance, and I would like to thank them. I owe my deep gratitude to our project guide, C-DAC Noida, who took a keen interest in my project work and guided me throughout, until its completion, by providing all the necessary information for developing a good system.

Inspiration

The Data Science Bowl, presented by Booz Allen and Kaggle, is the world’s premier data science for social good competition. The Data Science Bowl brings together data scientists, technologists, domain experts, and organizations to take on the world’s challenges with data and technology. It’s a platform through which people can harness their passion, unleash their curiosity, and amplify their impact to effect change on a global scale.
AI-Enhanced Histology Scanner Market Research Report 2033
growthmarketreports.com
csv, pdf, pptx
Updated Aug 4, 2025
Growth Market Reports (2025). AI-Enhanced Histology Scanner Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/ai-enhanced-histology-scanner-market
Available download formats
pptx, csv, pdf
Dataset updated
Aug 4, 2025
Dataset authored and provided by
Growth Market Reports
Time period covered
2024 - 2032
Area covered
Global
Description
AI-Enhanced Histology Scanner Market Outlook

According to our latest research, the global AI-Enhanced Histology Scanner market size reached USD 1.32 billion in 2024, demonstrating robust momentum driven by technological advancements and increasing adoption in clinical and research settings. The market is forecasted to reach USD 4.68 billion by 2033, expanding at a compelling CAGR of 15.2% from 2025 to 2033. This growth is primarily fueled by the rising demand for precision diagnostics, the integration of artificial intelligence in pathology workflows, and the global surge in cancer prevalence, which necessitates rapid and accurate histological analysis.
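The growth figures in this paragraph can be cross-checked with the standard compound annual growth rate formula. The sketch below assumes nine compounding years (from the 2024 base value to the 2033 forecast); that period count is an assumption, since the report does not state how it counts years, and the function names are illustrative.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start and end value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


# Figures quoted in the report (USD billion); YEARS = 9 is an assumption.
START_2024, FORECAST_2033, STATED_CAGR, YEARS = 1.32, 4.68, 0.152, 9

rate = implied_cagr(START_2024, FORECAST_2033, YEARS)
projected = project(START_2024, STATED_CAGR, YEARS)

print(f"CAGR implied by the two market sizes: {rate:.1%}")
print(f"2033 size implied by the stated CAGR: USD {projected:.2f} billion")
```

Under that assumption the implied rate and the stated rate agree to within a fraction of a percentage point, so the quoted numbers are roughly self-consistent.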

One of the primary growth drivers for the AI-Enhanced Histology Scanner market is the escalating burden of chronic diseases, particularly cancer, worldwide. The increasing incidence of cancer cases has prompted healthcare providers and research organizations to adopt advanced diagnostic tools that offer high throughput, accuracy, and reproducibility. AI-powered histology scanners are revolutionizing traditional pathology by automating image analysis, reducing human error, and enabling pathologists to deliver faster and more accurate diagnoses. These innovations not only improve patient outcomes but also streamline laboratory workflows, making them indispensable in modern healthcare infrastructure. Furthermore, the growing awareness about early disease detection and preventive healthcare is spurring the adoption of these scanners in both developed and emerging economies.

Another significant factor contributing to market expansion is the rapid evolution of artificial intelligence and machine learning technologies. With improvements in deep learning algorithms, computer vision, and data analytics, AI-enhanced histology scanners are now capable of processing vast quantities of high-resolution images, detecting subtle morphological changes, and providing actionable insights for clinicians and researchers. The integration of cloud-based platforms and interoperability with laboratory information systems further enhances the scalability and accessibility of these solutions. Additionally, the increasing focus on personalized medicine and targeted therapies is driving demand for advanced histopathological tools capable of precise tissue characterization and molecular profiling, further cementing the role of AI-enhanced scanners in the diagnostic and research continuum.

The market is also benefiting from favorable regulatory policies and increased investments in healthcare infrastructure, particularly in emerging economies. Governments and private stakeholders are allocating substantial resources towards upgrading laboratory capabilities, fostering public-private partnerships, and supporting the development of innovative diagnostic technologies. The proliferation of digital pathology, combined with the adoption of AI-driven solutions, is enabling healthcare systems to overcome challenges related to pathologist shortages and rising caseloads. Moreover, collaborations between technology providers, academic institutions, and pharmaceutical companies are accelerating the commercialization of next-generation histology scanners, expanding their reach across diverse end-user segments.

From a regional perspective, North America continues to dominate the AI-Enhanced Histology Scanner market, accounting for the largest revenue share in 2024, followed closely by Europe and the rapidly growing Asia Pacific region. North America's leadership position is underpinned by advanced healthcare infrastructure, a high concentration of leading technology companies, and robust investment in R&D. Europe benefits from strong government support for digital health initiatives and a well-established network of research institutions. Meanwhile, Asia Pacific is witnessing the fastest growth, driven by increasing healthcare expenditure, rising awareness of digital pathology, and expanding collaborations between local and global market players. The regional landscape is further shaped by evolving regulatory frameworks, reimbursement policies, and the pace of technological adoption, making it a dynamic and competitive market for stakeholders.
SE Marine Mammal Histology/Tissue data
fisheries.noaa.gov
s.cnmilf.com
Updated Jan 9, 2025
Southeast Fisheries Science Center (2025). SE Marine Mammal Histology/Tissue data [Dataset]. https://www.fisheries.noaa.gov/inport/item/26507
Dataset updated
Jan 9, 2025
Dataset provided by
Southeast Fisheries Science Center
Time period covered
1992 - Oct 19, 2125
Description
Tissue samples are collected from stranded marine mammals in the Southeastern United States. These tissue samples are examined histologically and evaluated to identify diseases, parasites, and other factors that may result in morbidity and mortality of marine mammals. These data document the different types of diseases or other health effects seen in stranded marine mammals.
Histology Service Report
datainsightsmarket.com
doc, pdf, ppt
Updated Aug 13, 2025
Data Insights Market (2025). Histology Service Report [Dataset]. https://www.datainsightsmarket.com/reports/histology-service-525252
Available download formats
ppt, pdf, doc
Dataset updated
Aug 13, 2025
Dataset authored and provided by
Data Insights Market
License
https://www.datainsightsmarket.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global histology services market is experiencing robust growth, driven by the increasing prevalence of chronic diseases necessitating extensive diagnostic testing, the burgeoning pharmaceutical and biotechnology industries demanding preclinical and clinical research support, and the rising adoption of advanced imaging techniques. The market's expansion is further fueled by technological advancements in automated tissue processing, digital pathology, and AI-powered image analysis, leading to improved efficiency, accuracy, and throughput in histological examinations. While challenges exist, such as stringent regulatory approvals and high operational costs associated with specialized equipment and skilled personnel, these are being mitigated by strategic partnerships and outsourcing strategies among market players. The forecast period (2025-2033) anticipates continued strong growth, with a projected CAGR exceeding 5% (a conservative estimate based on industry averages for related medical services). This growth will be largely influenced by the ongoing development and adoption of novel histological techniques, expansion into emerging markets, and the increasing demand for personalized medicine, which relies heavily on detailed histological assessments. The market is segmented by service type (routine histology, immunohistochemistry, special stains, etc.), application (oncology, pathology, drug discovery, etc.), and end-user (pharmaceutical and biotechnology companies, research institutions, hospitals, etc.). Key players, including Zyagen, Inc., Scantox, HistologiX, Charles River Laboratories, and others, are actively engaged in expanding their service portfolios, investing in R&D, and pursuing strategic acquisitions to maintain a competitive edge. Regional analysis reveals strong growth in North America and Europe, driven by established healthcare infrastructures and high research activity. 
However, Asia-Pacific is expected to show significant growth potential in the coming years due to rapid economic development, rising healthcare expenditure, and increasing awareness about advanced diagnostic techniques. The competitive landscape is characterized by a mix of large multinational corporations and specialized niche players, each catering to specific segments within the market. The market's future hinges on continuous technological innovation, regulatory compliance, and the ability to meet the evolving needs of its diverse clientele.
Histology findings.
datasetcatalog.nlm.nih.gov
figshare.com
Updated Jun 28, 2021
van Dijk, M. R.; Kappelle, L. J.; Frijns, C. J. M.; Zoetemeyer, S.; Starmans, N. L. P. (2021). Histology findings. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000882360
Dataset updated
Jun 28, 2021
Authors
van Dijk, M. R.; Kappelle, L. J.; Frijns, C. J. M.; Zoetemeyer, S.; Starmans, N. L. P.
Description
Histology findings.

Jakob Nikolas Kather; Niels Halama; Alexander Marx (2020). 100,000 histological images of human colorectal cancer and healthy tissue [Dataset]. http://doi.org/10.5281/zenodo.1214456

100,000 histological images of human colorectal cancer and healthy tissue

175 scholarly articles cite this dataset (View in Google Scholar)

Available download formats

zip

Unique identifier

https://doi.org/10.5281/zenodo.1214456

Dataset updated

Jan 24, 2020

Dataset provided by

Zenodo http://zenodo.org/

Authors

Jakob Nikolas Kather; Niels Halama; Alexander Marx

License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Data Description "NCT-CRC-HE-100K"

This is a set of 100,000 non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.
All images are 224x224 pixels (px) at 0.5 microns per pixel (MPP). All images are color-normalized using Macenko's method (http://ieeexplore.ieee.org/abstract/document/5193250/, DOI 10.1109/ISBI.2009.5193250).
Tissue classes are: Adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), colorectal adenocarcinoma epithelium (TUM).
These images were manually extracted from N=86 H&E stained human cancer tissue slides from formalin-fixed paraffin-embedded (FFPE) samples from the NCT Biobank (National Center for Tumor Diseases, Heidelberg, Germany) and the UMM pathology archive (University Medical Center Mannheim, Mannheim, Germany). Tissue samples contained CRC primary tumor slides and tumor tissue from CRC liver metastases; normal tissue classes were augmented with non-tumorous regions from gastrectomy specimen to increase variability.

Ethics statement "NCT-CRC-HE-100K"

All experiments were conducted in accordance with the Declaration of Helsinki, the International Ethical Guidelines for Biomedical Research Involving Human Subjects (CIOMS), the Belmont Report and the U.S. Common Rule. Anonymized archival tissue samples were retrieved from the tissue bank of the National Center for Tumor diseases (NCT, Heidelberg, Germany) in accordance with the regulations of the tissue bank and the approval of the ethics committee of Heidelberg University (tissue bank decision numbers 2152 and 2154, granted to Niels Halama and Jakob Nikolas Kather; informed consent was obtained from all patients as part of the NCT tissue bank protocol, ethics board approval S-207/2005, renewed on 20 Dec 2017). Another set of tissue samples was provided by the pathology archive at UMM (University Medical Center Mannheim, Heidelberg University, Mannheim, Germany) after approval by the institutional ethics board (Ethics Board II at University Medical Center Mannheim, decision number 2017-806R-MA, granted to Alexander Marx and waiving the need for informed consent for this retrospective and fully anonymized analysis of archival samples).

Data set "CRC-VAL-HE-7K"

This is a set of 7180 image patches from N=50 patients with colorectal adenocarcinoma (no overlap with patients in NCT-CRC-HE-100K). It can be used as a validation set for models trained on the larger data set. As in the larger data set, images are 224x224 px at 0.5 MPP. All tissue samples were provided by the NCT tissue bank; see above for further details and the ethics statement.

Data set "NCT-CRC-HE-100K-NONORM"

This is a slightly different version of the "NCT-CRC-HE-100K" image set: it contains 100,000 images in 9 tissue classes at 0.5 MPP and was created from the same raw data as "NCT-CRC-HE-100K". However, no color normalization was applied to these images. Consequently, staining intensity and color vary slightly between the images. Please note that although this image set was created from the same data as "NCT-CRC-HE-100K", the image regions are not completely identical because the selection of non-overlapping tiles from raw images was a stochastic process.

General comments

Please note that the classes are only roughly balanced. Classifiers should never be evaluated based on accuracy in the full set alone. Also, if a high risk of training bias is expected, balancing the number of cases per class is recommended.
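The two recommendations above can be illustrated with a small sketch. It uses made-up toy labels rather than the real image set, reports balanced accuracy (the mean of per-class recall) next to plain accuracy, and balances classes by random undersampling. Both are one common way to follow this advice, not a method prescribed by the data set authors, and all function names are illustrative.

```python
import random
from collections import Counter, defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    hits, totals = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def undersample_indices(labels, seed=0):
    """Pick the same number of sample indices from every class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    n = min(len(idx) for idx in by_class.values())
    rng = random.Random(seed)
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n))
    return sorted(chosen)


# Toy, made-up labels: 90 TUM tiles and 10 MUC tiles.
y_true = ["TUM"] * 90 + ["MUC"] * 10
y_pred = ["TUM"] * 100  # a degenerate classifier that always predicts TUM

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
balanced = undersample_indices(y_true)
per_class = Counter(y_true[i] for i in balanced)

print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")
print(dict(per_class))
```

In this toy case the always-TUM classifier scores 0.90 on plain accuracy but only 0.50 on balanced accuracy, which is why accuracy on the full set alone can be misleading when classes are unevenly sized.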
