100+ datasets found

Hyperspectral Histological Images for Diagnosis of Human Glioblastoma
cancerimagingarchive.net
n/a, png and envi
Updated May 24, 2024
Cite
The Cancer Imaging Archive (2024). Hyperspectral Histological Images for Diagnosis of Human Glioblastoma [Dataset]. http://doi.org/10.7937/z1k6-vd17
Available download formats: png and envi, n/a
Unique identifier
https://doi.org/10.7937/z1k6-vd17
Dataset updated
May 24, 2024
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
May 24, 2024
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
Hyperspectral imaging technology combines the main features of two existing technologies: conventional imaging and spectroscopy. Hyperspectral cameras therefore make it possible to analyze, simultaneously and without contact, the morphological features and chemical composition of the objects captured. The information provided by hyperspectral imaging can be used to detect patterns, cells, or biomarkers to identify diseases. Several approaches exist for processing hyperspectral images, but publicly available datasets of medical hyperspectral images are scarce. To the best of our knowledge, this is the first open access dataset containing histological hyperspectral images of glioblastoma brain tumors, which can be set as a benchmark for researchers to compare their approaches.
This dataset includes 13 subjects. Each subject has a single histological slide with multiple hyperspectral images captured from each slide where deemed relevant by the pathologists (this number varies for each slide). The database is composed of 469 annotated hyperspectral images from 13 histological slides (482 total images), having a spatial dimension of 800 × 1004 pixels and a spectral dimension of 826 spectral channels. The format of the hyperspectral images is ENVI, the standard format for the storage of hyperspectral images. The ENVI format consists of a flat-binary raster file which may or may not have a file extension, accompanied by an ASCII header file (denoted as *.hdr). The data are stored in band-interleaved-by-line format. In addition, dark and white references were captured to perform a calibration of the raw image, which is a standard procedure in hyperspectral image processing.
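As a rough illustration of the storage layout and calibration described above, the sketch below shows how one value is addressed in a band-interleaved-by-line (BIL) raster and how a raw value can be normalized with dark and white references. The formula (raw - dark) / (white - dark) is the commonly used flat-field correction and is an assumption here, not something confirmed for this dataset. The function names and the reading of the image as 800 lines by 1004 samples are also illustrative; the *.hdr header remains the authoritative source for dimensions and data type.

```python
def bil_offset(line: int, band: int, sample: int, bands: int, samples: int) -> int:
    """Return the element index (not the byte offset) of one value in a BIL raster.

    In BIL order, each image line stores all of its bands one after another,
    and each band holds one full row of samples.
    """
    return (line * bands + band) * samples + sample


def calibrate(raw: float, dark: float, white: float) -> float:
    """Normalize a raw value with dark and white references (assumed formula)."""
    span = white - dark
    if span <= 0:
        raise ValueError("white reference must exceed dark reference")
    return (raw - dark) / span


# Assumed dimensions: 800 lines, 1004 samples per line, 826 bands.
LINES, SAMPLES, BANDS = 800, 1004, 826
total_values = LINES * SAMPLES * BANDS
last_index = bil_offset(LINES - 1, BANDS - 1, SAMPLES - 1, BANDS, SAMPLES)
```

To get a byte offset, multiply the element index by the size of the data type declared in the header.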
The slides were stained with hematoxylin and eosin and captured using a custom hyperspectral microscopic system at 20× magnification. The ground-truth annotation for this dataset is the diagnosis of the slides (tumor, T, or non-tumor, NT) performed by skilled histopathologists after visual examination of the stained slides, according to the World Health Organization classification of tumors of the nervous system. To the best of our knowledge, there are no commercial hyperspectral whole slide scanners, and the availability of hyperspectral microscopes on the market is still limited.
The microscope is an Olympus BX-53 (Olympus, Tokyo, Japan). The hyperspectral camera is a Hyperspec® VNIR A-Series from HeadWall Photonics (Fitchburg, MA, USA), which is based on an imaging spectrometer coupled to a charge-coupled device sensor, the Adimec-1000m (Adimec, Eindhoven, Netherlands). This hyperspectral system works in the visible and near-infrared spectral range from 400 to 1000 nm with a spectral resolution of 2.8 nm, sampling 826 spectral channels and 1004 spatial pixels. The push-broom camera performs spatial scanning to acquire a hyperspectral cube, using a mechanical stage (SCAN, Märzhäuser, Wetzlar, Germany) attached to the microscope that moves the slides accurately. The objective lenses are from the LMPLFLN family (Olympus, Tokyo, Japan), optimized for infrared observations.
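For orientation only, the sketch below estimates a center wavelength for each of the 826 channels by assuming they are evenly spaced between 400 and 1000 nm. That even spacing is an assumption, as is the function name; the real per-band wavelengths, if provided, would be listed in the ENVI header. Under this assumption the sampling interval is about 0.73 nm, which is finer than the stated 2.8 nm spectral resolution, so adjacent channels overlap optically.

```python
def approximate_band_centers(start_nm: float = 400.0,
                             stop_nm: float = 1000.0,
                             channels: int = 826) -> list:
    """Evenly spaced channel centers (an assumption, not the camera's calibration)."""
    step = (stop_nm - start_nm) / (channels - 1)
    return [start_nm + i * step for i in range(channels)]


centers = approximate_band_centers()
sampling_interval_nm = centers[1] - centers[0]
```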
More information about the dataset can be found in this manuscript.
100,000 histological images of human colorectal cancer and healthy tissue
zenodo.org
data.niaid.nih.gov
zip
Updated Jan 24, 2020
Cite
Jakob Nikolas Kather; Niels Halama; Alexander Marx (2020). 100,000 histological images of human colorectal cancer and healthy tissue [Dataset]. http://doi.org/10.5281/zenodo.1214456
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.1214456
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jakob Nikolas Kather; Niels Halama; Alexander Marx
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Description "NCT-CRC-HE-100K"

This is a set of 100,000 non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human colorectal cancer (CRC) and normal tissue.

All images are 224x224 pixels (px) at 0.5 microns per pixel (MPP). All images are color-normalized using Macenko's method (http://ieeexplore.ieee.org/abstract/document/5193250/, DOI 10.1109/ISBI.2009.5193250).

Tissue classes are: Adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), colorectal adenocarcinoma epithelium (TUM).

These images were manually extracted from N=86 H&E stained human cancer tissue slides from formalin-fixed paraffin-embedded (FFPE) samples from the NCT Biobank (National Center for Tumor Diseases, Heidelberg, Germany) and the UMM pathology archive (University Medical Center Mannheim, Mannheim, Germany). Tissue samples contained CRC primary tumor slides and tumor tissue from CRC liver metastases; normal tissue classes were augmented with non-tumorous regions from gastrectomy specimens to increase variability.

Ethics statement "NCT-CRC-HE-100K"

All experiments were conducted in accordance with the Declaration of Helsinki, the International Ethical Guidelines for Biomedical Research Involving Human Subjects (CIOMS), the Belmont Report and the U.S. Common Rule. Anonymized archival tissue samples were retrieved from the tissue bank of the National Center for Tumor diseases (NCT, Heidelberg, Germany) in accordance with the regulations of the tissue bank and the approval of the ethics committee of Heidelberg University (tissue bank decision numbers 2152 and 2154, granted to Niels Halama and Jakob Nikolas Kather; informed consent was obtained from all patients as part of the NCT tissue bank protocol, ethics board approval S-207/2005, renewed on 20 Dec 2017). Another set of tissue samples was provided by the pathology archive at UMM (University Medical Center Mannheim, Heidelberg University, Mannheim, Germany) after approval by the institutional ethics board (Ethics Board II at University Medical Center Mannheim, decision number 2017-806R-MA, granted to Alexander Marx and waiving the need for informed consent for this retrospective and fully anonymized analysis of archival samples).

Data set "CRC-VAL-HE-7K"

This is a set of 7180 image patches from N=50 patients with colorectal adenocarcinoma (no overlap with patients in NCT-CRC-HE-100K). It can be used as a validation set for models trained on the larger data set. As in the larger data set, images are 224x224 px at 0.5 MPP. All tissue samples were provided by the NCT tissue bank; see above for further details and the ethics statement.

Data set "NCT-CRC-HE-100K-NONORM"

This is a slightly different version of the "NCT-CRC-HE-100K" image set: it contains 100,000 images in 9 tissue classes at 0.5 MPP and was created from the same raw data as "NCT-CRC-HE-100K". However, no color normalization was applied to these images. Consequently, staining intensity and color vary slightly between the images. Please note that although this image set was created from the same data as "NCT-CRC-HE-100K", the image regions are not completely identical because the selection of non-overlapping tiles from raw images was a stochastic process.

General comments

Please note that the classes are only roughly balanced. Classifiers should never be evaluated based on accuracy in the full set alone. Also, if a high risk of training bias is expected, balancing the number of cases per class is recommended.
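To illustrate why overall accuracy alone can mislead on imbalanced classes, the sketch below compares plain accuracy with balanced accuracy (the mean of per-class recall). Balanced accuracy is one possible complementary metric chosen here for illustration, not one prescribed by the dataset authors; the toy labels are made up and only reuse the TUM and NORM class codes.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy example: a classifier that always predicts the majority class.
y_true = ["TUM", "TUM", "TUM", "NORM"]
y_pred = ["TUM", "TUM", "TUM", "TUM"]
plain = accuracy(y_true, y_pred)
balanced = balanced_accuracy(y_true, y_pred)
```

Here the always-TUM classifier reaches 75% plain accuracy but only 50% balanced accuracy, because it never recovers the minority class.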
2 million histological images of breast cancer tumors with her2 labels
data.niaid.nih.gov
zenodo.org
Updated Aug 20, 2024
Cite
Renan Valieris (2024). 2 million histological images of breast cancer tumors with her2 labels [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8383579
Dataset updated
Aug 20, 2024
Dataset provided by
Adriana Passos Bueno
Renan Valieris
Cynthia Aparecida Bueno de Toledo Osorio
Alexandre Defelicibus
Luan Martins
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Data Description

This is a set of 2 million non-overlapping image patches from hematoxylin & eosin (H&E) stained histological images of human breast cancer tumor tissue.

The anonymized dataset comes from a cohort of BC patients from the A. C. Camargo Cancer Center (ACCCC, N = 504). All patients were treated for breast cancer at the ACCCC between 2019 and 2021. As part of their diagnosis, in HER2 IHC score 2+ cases, patients' HER2 status was determined following the ASCO guidelines updated in 2018, with visual evaluation of IHC assay and either a FISH or DDISH test. All cases with metastasis or neoadjuvant treatment were excluded.

A total of 426 H&E stained high resolution images (40x magnification) were scanned from biopsy and resection tissue samples with a Leica Aperio AT2 scanner. Ethical approval of the ACCCC study was given by the ethics committee of the Fundação Antônio Prudente. We divided the cases into the following 3 groups according to the results of the IHC and ISH tests: HER2-negative, HER2-low and HER2-high.

The slides were divided into 256 px x 256 px tiles at a resolution of 0.5 µm/pixel. We then used a custom-trained ConvNeXt-Tiny neural network to include only tiles from the tumor region and its environment, generating a total of 2,051,877 image patches.
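The tiling step can be pictured with the minimal sketch below, which enumerates the top-left corners of non-overlapping 256 px tiles over a slide of a given size. The function name and the choice to drop partial tiles at the right and bottom edges are assumptions made for illustration; the authors' actual tiling code and the tumor-region filter are not reproduced here.

```python
from typing import Iterator, Tuple


def tile_origins(width: int, height: int, tile: int = 256) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) top-left corners of non-overlapping tiles, row by row.

    Partial tiles at the right and bottom borders are skipped (an assumption).
    """
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            yield x, y


# A small hypothetical 600 x 520 px region yields a 2 x 2 grid of tiles.
origins = list(tile_origins(600, 520))
tile_edge_um = 256 * 0.5  # physical edge length of one tile in micrometers
```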

A sample is considered HER2-negative with an IHC score of 0; HER2-low with an IHC score of 1+, or with an IHC score of 2+ and a negative ISH-based test result; and HER2-high with an IHC score of 2+ and a positive ISH-based test result, or with an IHC score of 3+.
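The labeling rule above can be written as a small lookup function. This is a sketch of the stated rule only; the function name, the string encoding of IHC scores, and the error handling for a missing ISH result are illustrative assumptions.

```python
from typing import Optional


def her2_group(ihc_score: str, ish_positive: Optional[bool] = None) -> str:
    """Map an IHC score (and ISH result for 2+ cases) to a HER2 group."""
    if ihc_score == "0":
        return "HER2-negative"
    if ihc_score == "1+":
        return "HER2-low"
    if ihc_score == "2+":
        # Equivocal IHC: the ISH-based test decides between low and high.
        if ish_positive is None:
            raise ValueError("IHC 2+ requires an ISH-based test result")
        return "HER2-high" if ish_positive else "HER2-low"
    if ihc_score == "3+":
        return "HER2-high"
    raise ValueError(f"unknown IHC score: {ihc_score!r}")
```

For example, `her2_group("2+", ish_positive=False)` gives the HER2-low group, while `her2_group("3+")` gives HER2-high regardless of ISH.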

The accompanying code used for training the models is available at https://github.com/tojallab/wsi-mil
BACH Dataset : Grand Challenge on Breast Cancer Histology images
data.niaid.nih.gov
zenodo.org
Updated Jan 31, 2020
Cite
Aguiar, Paulo (2020). BACH Dataset : Grand Challenge on Breast Cancer Histology images [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3632034
Dataset updated
Jan 31, 2020
Dataset provided by
Polónia, António
Aguiar, Paulo
Eloy, Catarina
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
i3S Annotated Datasets on Digital Pathology

WELCOME

In an effort to contribute and push forward the field of Digital Pathology, Ipatimup and INEB, two major research institutions in Portugal, have joined forces in the construction of histology datasets to support grand Challenges on automatic classification of tissue malignancy. The researchers/pathologists responsible for the datasets are:

António Polónia (MD), Ipatimup/i3S

Catarina Eloy (MD, PhD), Ipatimup/i3S

Paulo Aguiar (PhD), INEB/i3S

This specific page refers to the Grand Challenge on Breast Cancer Histology images, or BACH Challenge

THE BACH CHALLENGE DATASET

ICIAR 2018 - Grand Challenge on Breast Cancer Histology images [Challenge organized by Teresa Araújo, Guilherme Aresta, António Polónia, Catarina Eloy and Paulo Aguiar]

For detailed information visit: https://iciar2018-challenge.grand-challenge.org/home/

THIS DATASET IS PUBLICLY AVAILABLE UNDER A CREATIVE COMMONS CC BY-NC-ND LICENSE (ATTRIBUTION-NONCOMMERCIAL-NODERIVS). ESSENTIALLY, YOU ARE GRANTED ACCESS TO THE DATASET FOR USE IN YOUR RESEARCH AS LONG AS YOU CREDIT OUR WORK/PUBLICATIONS(*), BUT YOU CANNOT CHANGE THEM IN ANY WAY OR USE THEM COMMERCIALLY.

(*) Aresta, Guilherme, et al. "BACH: Grand challenge on breast cancer histology images." Medical image analysis (2019).

(*) Araújo, Teresa, et al. "Classification of breast cancer histology images using convolutional neural networks." PloS one 12.6 (2017): e0177544.

(*) Fondón, Irene, et al. "Automatic classification of tissue malignancy for breast carcinoma diagnosis." Computers in biology and medicine 96 (2018): 41-51.
Data from: BreCaHAD: A Dataset for Breast Cancer Histopathological...
figshare.com
png
Updated Jan 28, 2019
Cite
Alper Aksac; Douglas J. Demetrick; Tansel Özyer; Reda Alhajj (2019). BreCaHAD: A Dataset for Breast Cancer Histopathological Annotation and Diagnosis [Dataset]. http://doi.org/10.6084/m9.figshare.7379186.v3
Available download formats: png
Unique identifier
https://doi.org/10.6084/m9.figshare.7379186.v3
Dataset updated
Jan 28, 2019
Dataset provided by
figshare
Authors
Alper Aksac; Douglas J. Demetrick; Tansel Özyer; Reda Alhajj
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset consists of 1 .xlsx file, 2 .png files, 1 .json file and 1 .zip file:
annotation_details.xlsx: The distribution of annotations in the previously mentioned six classes (mitosis, apoptosis, tumor nuclei, non-tumor nuclei, tubule, and non-tubule) is presented in an Excel spreadsheet.
original.png: The input image.
annotated.png: An example from the dataset. In the annotated image, blue circles indicate tumor nuclei; pink circles show non-tumor nuclei such as blood cells, stroma nuclei, and lymphocytes; orange and green circles are mitosis and apoptosis, respectively; light blue circles are true lumen for tubules; and yellow circles represent white regions (non-lumen) such as fat, blood vessels, and broken tissue.
data.json: The annotations for the BreCaHAD dataset are provided in JSON (JavaScript Object Notation) format. In the given example, the JSON file (ground truth) contains two mitosis annotations and only one tumor nuclei annotation. Here, x and y are the coordinates of the centroid of the annotated object, and the values are between 0 and 1.
BreCaHAD.zip: An archive file containing the dataset. Three folders are included: images (original images), groundTruth (JSON files), and groundTruth_display (ground truth applied on the original images).
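Because the centroid coordinates are normalized to the range 0 to 1, they need to be scaled by the image size before they can be drawn on an image. The sketch below shows that conversion. The JSON layout (class names as keys, each holding a list of x/y objects), the sample values, and the 1360 x 1024 image size are assumptions for illustration and should be checked against the actual data.json and images.

```python
import json

# Hypothetical sample mirroring the description: two mitosis, one tumor annotation.
SAMPLE = (
    '{"mitosis": [{"x": 0.25, "y": 0.5}, {"x": 0.75, "y": 0.125}],'
    ' "tumor": [{"x": 0.5, "y": 0.25}]}'
)


def to_pixels(annotations: dict, width: int, height: int) -> dict:
    """Scale normalized centroids to integer pixel coordinates per class."""
    return {
        label: [(round(p["x"] * width), round(p["y"] * height)) for p in points]
        for label, points in annotations.items()
    }


# The image size is a placeholder; read it from the actual image in practice.
pixels = to_pixels(json.loads(SAMPLE), width=1360, height=1024)
```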
Invasive Ductal Carcinoma (IDC) Histology Image Dataset
academictorrents.com
bittorrent
Updated Feb 22, 2019
Cite
None (2019). Invasive Ductal Carcinoma (IDC) Histology Image Dataset [Dataset]. https://academictorrents.com/details/e40bd59ab08861329ce3c418be191651f35e2ffa
Available download formats: bittorrent (1644892042)
Dataset updated
Feb 22, 2019
Authors
None
License
https://academictorrents.com/nolicensespecified
Description
Invasive Ductal Carcinoma (IDC) is the most common subtype of all breast cancers. To assign an aggressiveness grade to a whole mount sample, pathologists typically focus on the regions which contain the IDC. As a result, one of the common pre-processing steps for automatic aggressiveness grading is to delineate the exact regions of IDC inside of a whole mount slide.

Dataset Description

The original dataset consisted of 162 whole mount slide images of Breast Cancer (BCa) specimens scanned at 40x. From that, 277,524 patches of size 50 x 50 were extracted (198,738 IDC negative and 78,786 IDC positive). Each patch's file name is of the format u_xX_yY_classC.png, for example 10253_idx5_x1351_y1101_class0.png, where u is the patient ID (10253_idx5), X is the x-coordinate of where this patch was cropped from, Y is the y-coordinate of where this patch was cropped from, and C indicates the class, where 0 is non-IDC and 1 is IDC.
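The file-name convention described above can be parsed with a regular expression, as in the minimal sketch below. The class and function names are illustrative; only the pattern itself follows the stated format.

```python
import re
from typing import NamedTuple


class PatchInfo(NamedTuple):
    patient_id: str
    x: int
    y: int
    is_idc: bool


# u_xX_yY_classC.png, where u may itself contain underscores (e.g. 10253_idx5).
_PATTERN = re.compile(r"^(?P<u>.+)_x(?P<x>\d+)_y(?P<y>\d+)_class(?P<c>[01])\.png$")


def parse_patch_name(name: str) -> PatchInfo:
    """Extract patient ID, crop coordinates, and class from a patch file name."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected patch file name: {name!r}")
    return PatchInfo(match["u"], int(match["x"]), int(match["y"]), match["c"] == "1")


info = parse_patch_name("10253_idx5_x1351_y1101_class0.png")
```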
Collection of textures in colorectal cancer histology
data.niaid.nih.gov
zenodo.org
Updated Jan 24, 2020
Cite
Zöllner, Frank Gerrit (2020). Collection of textures in colorectal cancer histology [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_53169
Dataset updated
Jan 24, 2020
Dataset provided by
Melchers, Susanne M
Weis, Cleo-Aron
Gaiser, Timo
Marx, Alexander
Zöllner, Frank Gerrit
Schad, Lothar R
Bianconi, Francesco
Kather, Jakob Nikolas
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Content

This data set represents a collection of textures in histological images of human colorectal cancer. It contains two files:

"Kather_texture_2016_image_tiles_5000.zip": a zipped folder containing 5000 histological images of 150 * 150 px each (74 * 74 µm). Each image belongs to exactly one of eight tissue categories (specified by the folder name).

"Kather_texture_2016_larger_images_10.zip": a zipped folder containing 10 larger histological images of 5000 x 5000 px each. These images contain more than one tissue type.

Image format

All images are RGB, 0.495 µm per pixel, digitized with an Aperio ScanScope (Aperio/Leica biosystems), magnification 20x. Histological samples are fully anonymized images of formalin-fixed paraffin-embedded human colorectal adenocarcinomas (primary tumors) from our pathology archive (Institute of Pathology, University Medical Center Mannheim, Heidelberg University, Mannheim, Germany).
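As a quick consistency check of the figures above, the sketch below converts pixel extents to micrometers using the stated 0.495 µm per pixel. The helper name is illustrative. A 150 px tile comes out at roughly 74 µm, matching the tile size quoted in the file description, and a 5000 px image spans roughly 2.5 mm.

```python
MICRONS_PER_PIXEL = 0.495  # stated pixel size of the scans


def extent_um(pixels: int, mpp: float = MICRONS_PER_PIXEL) -> float:
    """Physical edge length in micrometers for a given number of pixels."""
    return pixels * mpp


tile_um = extent_um(150)    # small texture tiles
large_um = extent_um(5000)  # larger multi-tissue images
```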

Ethics statement

All experiments were approved by the institutional ethics board (medical ethics board II, University Medical Center Mannheim, Heidelberg University, Germany; approval 2015-868R-MA). The institutional ethics board waived the need for informed consent for this retrospective analysis of anonymized samples. All experiments were carried out in accordance with the approved guidelines and with the Declaration of Helsinki.

More information / data usage

For more information, please refer to the following article. Please cite this article when using the data set.

Kather JN, Weis CA, Bianconi F, Melchers SM, Schad LR, Gaiser T, Marx A, Zollner F: Multi-class texture analysis in colorectal cancer histology (2016), Scientific Reports (in press)

Contact

For questions, please contact: Dr. Jakob Nikolas Kather http://orcid.org/0000-0002-3730-5348 ResearcherID: D-4279-2015
Gland Segmentation in Histology Images Challenge (GlaS) Dataset
academictorrents.com
bittorrent
Updated Sep 21, 2016
Cite
Korsuk Sirinukunwattana (2016). Gland Segmentation in Histology Images Challenge (GlaS) Dataset [Dataset]. https://academictorrents.com/details/208814dd113c2b0a242e74e832ccac28fcff74e5
Available download formats: bittorrent (180902609)
Dataset updated
Sep 21, 2016
Dataset authored and provided by
Korsuk Sirinukunwattana
License
https://academictorrents.com/nolicensespecified
Description
"We aim to bring together researchers who are interested in the gland segmentation problem, to validate the performance of their existing or newly invented algorithms on the same standard dataset. In this challenge, we will provide the participants with images of Haematoxylin and Eosin (H&E) stained slides, consisting of a wide range of histologic grades."

Introduction

Glands are important histological structures which are present in most organ systems as the main mechanism for secreting proteins and carbohydrates. It has been shown that malignant tumours arising from glandular epithelium, also known as adenocarcinomas, are the most prevalent form of cancer. The morphology of glands has been used routinely by pathologists to assess the degree of malignancy of several adenocarcinomas, including prostate, breast, lung, and colon. Accura
Identifying Cell Nuclei from Histology Images
kaggle.com
zip
Updated Jul 16, 2019
Cite
Sandhaya (2019). Identifying Cell Nuclei from Histology Images [Dataset]. https://www.kaggle.com/sandhaya4u/histology-image-dataset
Available download formats: zip (0 bytes)
Dataset updated
Jul 16, 2019
Authors
Sandhaya
Description
Machine Learning Model for identifying Cell Nuclei from Histology Images

A machine learning model for identifying cell nuclei from histology images. The model should be able to generalize across a variety of lighting conditions, cell types, magnifications, and imaging modalities. Imagine speeding up research for almost every disease, from lung cancer and heart disease to rare disorders. The Data Science Bowl offers data scientists and practitioners an ambitious mission: create an algorithm that automates nucleus detection by finding all non-overlapping nuclei in the given test data, that is, an algorithm capable of instance segmentation. We've all seen people suffer from diseases like cancer, heart disease, chronic obstructive pulmonary disease, Alzheimer's, and diabetes. Many have seen their loved ones pass away. Think how many lives would be transformed if cures came faster. By automating nucleus detection, you could help unlock cures faster, from rare disorders to the common cold.

Why nuclei?

Identifying the cells' nuclei is the starting point for most analyses because most of the human body's 30 trillion cells contain a nucleus full of DNA, the genetic code that programs each cell. Identifying nuclei allows researchers to identify each individual cell in a sample, and by measuring how cells react to various treatments, the researcher can understand the underlying biological processes at work. By participating, teams will work to automate the process of identifying nuclei, which will allow for more efficient drug testing, shortening the 10 years it takes for each new drug to come to market.

Acknowledgements

The success and final outcome of this project required a lot of guidance and assistance from many people, and I am extremely privileged to have received it throughout the project. All that I have done is due to such supervision and assistance, and I will not forget to thank them. I owe my deep gratitude to our project guide, C-DAC Noida, who took a keen interest in my project work and guided me throughout by providing all the necessary information for developing a good system.

Inspiration

The Data Science Bowl, presented by Booz Allen and Kaggle, is the world’s premier data science for social good competition. The Data Science Bowl brings together data scientists, technologists, domain experts, and organizations to take on the world’s challenges with data and technology. It’s a platform through which people can harness their passion, unleash their curiosity, and amplify their impact to effect change on a global scale
BACH
huggingface.co
Updated May 31, 2019
Cite
Laureηt Fainsin (2019). BACH [Dataset]. https://huggingface.co/datasets/1aurent/BACH
Available format: Croissant, a format for machine-learning datasets (see mlcommons.org/croissant).
Dataset updated
May 31, 2019
Authors
Laureηt Fainsin
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
BreAst Cancer Histology (BACH) Dataset: Grand Challenge on Breast Cancer Histology images

Description

The dataset is composed of Hematoxylin and eosin (H&E) stained breast histology microscopy images. Microscopy images are labelled as normal, benign, in situ carcinoma or invasive carcinoma according to the predominant cancer type in each image. The annotation was performed by two medical experts and images where there was disagreement were discarded. Images have the… See the full description on the dataset page: https://huggingface.co/datasets/1aurent/BACH.
Structure of the BreaKHis dataset.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Yun Jiang; Li Chen; Hai Zhang; Xiao Xiao (2023). Structure of the BreaKHis dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0214587.t004
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0214587.t004
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Yun Jiang; Li Chen; Hai Zhang; Xiao Xiao
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Structure of the BreaKHis dataset.
HistoArtifacts
zenodo.org
data.niaid.nih.gov
zip
Updated Apr 24, 2025
Cite
Neel Kanwal (2025). HistoArtifacts [Dataset]. http://doi.org/10.5281/zenodo.10809442
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10809442
Dataset updated
Apr 24, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Neel Kanwal
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Mar 12, 2024
Description
This dataset contains five notable histological artifacts: blur, blood (hemorrhage), air bubbles, folded tissue, and damaged tissue. This dataset is used in the following works, and a description of the dataset can be found at https://arxiv.org/abs/2403.07743.

The full dataset is explained and used in the article, "Equipping Computational Pathology Systems with Artifact Processing Pipelines: A Showcase for Computation and Performance Trade-offs.". https://arxiv.org/abs/2403.07743

For a detailed explanation of the motivation behind artifact detection in computational pathology, see the video paper: "Extract, detect, eliminate: Enhancing reliability and performance of computational pathology through artifact processing pipelines" https://www.sciencetalks-journal.com/article/S2772-5693(24)00013-6/fulltext

Please cite the following papers when using the dataset, in full or in part:

A sub-dataset contains folded tissues extracted at 20x and blur class used in the paper "Are you sure it’s an artifact? Artifact detection and uncertainty quantification in histological images". https://www.sciencedirect.com/science/article/pii/S0895611123001398

A sub-dataset using air bubbles is used in the paper: "Vision transformers for small histological datasets learned through knowledge distillation" https://link.springer.com/chapter/10.1007/978-3-031-33380-4_13
https://arxiv.org/abs/2305.17370

A sub-dataset using blood and damaged tissue is used in the paper: "Quantifying the effect of color processing on blood and damaged tissue detection in whole slide images" https://ieeexplore.ieee.org/abstract/document/9816283

"Equipping Computational Pathology Systems with Artifact Processing Pipelines: A Showcase for Computation and Performance Trade-offs.". https://arxiv.org/abs/2403.07743

"The Devil is in the Details: Whole Slide Image Acquisition and Processing for Artifacts Detection, Color Variation, and Data Augmentation: A Review" https://ieeexplore.ieee.org/document/9777677
CAnine CuTaneous Cancer Histology Dataset
dev.cancerimagingarchive.net
cancerimagingarchive.net
json, n/a, svs +1
Updated Jan 12, 2022
Cite
The Cancer Imaging Archive (2022). CAnine CuTaneous Cancer Histology Dataset [Dataset]. http://doi.org/10.7937/TCIA.2M93-FX66
Available download formats: zip and sqlite, svs, json, n/a
Unique identifier
https://doi.org/10.7937/TCIA.2M93-FX66
Dataset updated
Jan 12, 2022
Dataset authored and provided by
The Cancer Imaging Archive
License
https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/
Time period covered
Jan 12, 2022
Dataset funded by
National Cancer Institute (http://www.cancer.gov/)
Description
We present a large-scale dataset of 350 histologic samples of seven different canine cutaneous tumors. All samples were obtained through surgical resection due to neoplastic indicators and were selected retrospectively from the biopsy archive of the Institute for Veterinary Pathology of the Freie Universität Berlin according to sufficient tissue preservation and presence of characteristic histologic features for the corresponding tumor subtypes. Samples were stained with a routine Hematoxylin & Eosin dye and digitized with two Leica linear scanning systems at a resolution of 0.25 um/pixel. Together with the 350 whole slide images, we provide a database consisting of 12,424 polygon annotations for six non-neoplastic tissue classes (epidermis, dermis, subcutis, bone, cartilage, and a joint class of inflammation and necrosis) and seven tumor classes (melanoma, mast cell tumor, squamous cell carcinoma, peripheral nerve sheath tumor, plasmacytoma, trichoblastoma, and histiocytoma).
The polygon annotations were generated using the open source software SlideRunner (https://github.com/DeepPathology/SlideRunner). Within SlideRunner, users can view whole slide images (WSIs) and zoom through their magnification levels. Using multiple clicks or click-and-drag, the pathologist annotated polygons for 13 classes (epidermis, dermis, subcutis, bone, cartilage, a joint class of inflammation and necrosis, melanoma, mast cell tumor, squamous cell carcinoma, peripheral nerve sheath tumor, plasmacytoma, trichoblastoma, and histiocytoma) on 287 WSIs. The remaining WSIs were annotated by three medical students in their 8th semester supervised by the leading pathologist who later reviewed these annotations for correctness and completeness.
Due to the large size of the dataset and the extensive annotations, it provides a good basis for segmentation and classification algorithms based on supervised learning. Previous work [1-4] has shown that, due to various homologies between the species, canine cutaneous tissue can serve as a model for human samples. Prouteau et al. have published an extensive comparison of the two species, especially for cutaneous tumors, and include homologies between canine and human oncology regarding "clinical and histological appearance, biological behavior, tumor genetics, molecular pathways and targets, and response to therapies" [1]. Ranieri et al. highlight that pet dogs and humans share many environmental risk factors and show the highest risk for cancer development at similar points in time relative to their life spans [2]. Both Ranieri et al. and Pinho et al. highlight the potential of using insights from experiments on canine samples for developing human cancer treatments [2,3]. From a technical perspective, Aubreville et al. have shown that canine samples can be used to aid human cancer research through the use of transfer learning methods [4].
Potential users of the dataset can load the SQLite database into their custom installation of SlideRunner and adapt or extend the database with custom annotations. Furthermore, we converted the annotations to the COCO JSON format, which is commonly used by computer scientists for training neural networks. Its pixel-level annotations can be used for supervised segmentation algorithms as opposed to datasets that only provide clinical data on slide level.
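As a starting point for working with the COCO JSON annotations, the sketch below counts polygon annotations per class using only the standard top-level COCO keys ("categories" and "annotations"). The tiny in-memory dictionary stands in for the real file and its values are made up; whether the dataset's file contains additional fields is not covered here.

```python
from collections import Counter

# Made-up stand-in for a loaded COCO file (in practice: json.load on the dataset's JSON).
coco = {
    "categories": [{"id": 1, "name": "melanoma"}, {"id": 2, "name": "dermis"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "segmentation": [[0, 0, 10, 0, 10, 10]]},
        {"id": 11, "image_id": 1, "category_id": 2, "segmentation": [[5, 5, 20, 5, 20, 20]]},
        {"id": 12, "image_id": 2, "category_id": 2, "segmentation": [[1, 1, 4, 1, 4, 4]]},
    ],
}


def polygons_per_class(coco_dict: dict) -> Counter:
    """Count annotations per category name in a COCO-style dictionary."""
    names = {c["id"]: c["name"] for c in coco_dict["categories"]}
    return Counter(names[a["category_id"]] for a in coco_dict["annotations"])


counts = polygons_per_class(coco)
```

Per-class counts like these are also a quick way to check class balance before training a segmentation model.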
References
[1] Prouteau, Anaïs, and Catherine André. "Canine melanomas as models for human melanomas: Clinical, histological, and genetic comparison." Genes 10.7 (2019): 501. https://doi.org/10.3390/genes10070501
[2] Ranieri, G., et al. "A model of study for human cancer: Spontaneous occurring tumors in dogs. Biological features and translation for new anticancer therapies." Critical Reviews in Oncology/Hematology 88.1 (2013): 187-197. https://doi.org/10.1016/j.critrevonc.2013.03.005
[3] Pinho, Salomé S., et al. "Canine tumors: a spontaneous animal model of human carcinogenesis." Translational Research 159.3 (2012): 165-172. https://doi.org/10.1016/j.trsl.2011.11.005
[4] Aubreville, Marc, et al. "A completely annotated whole slide image dataset of canine breast cancer to aid human breast cancer research." Scientific Data 7.1 (2020): 1-10. https://doi.org/10.1038/s41597-020-00756-z
Data from: A histopathological image dataset for grading breast invasive...
data.mendeley.com
Updated Feb 3, 2020
Hamidreza Bolhasani (2020). A histopathological image dataset for grading breast invasive ductal carcinomas [Dataset]. http://doi.org/10.17632/w7jjcx7gj6.1
Unique identifier
https://doi.org/10.17632/w7jjcx7gj6.1
Dataset updated
Feb 3, 2020
Authors
Hamidreza Bolhasani
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
This dataset comprises 922 histopathological microscopy images from 124 patients with IDC. It is published and accessible on the web at http://databiox.com. What distinguishes this dataset from similar ones is that it contains an equal number of specimens from each of the three grades of IDC, which led to around 50 specimens for each grade.
Test Dataset for Whole Slide Image Registration
zenodo.org
data.niaid.nih.gov
bin, tiff
Updated Jul 19, 2024
Romain Guiet; Nicolas Chiaruttini (2024). Test Dataset for Whole Slide Image Registration [Dataset]. http://doi.org/10.5281/zenodo.4680455
Available download formats: tiff, bin
Unique identifier
https://doi.org/10.5281/zenodo.4680455
Dataset updated
Jul 19, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Romain Guiet; Nicolas Chiaruttini
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Dataset associated with these documents:

https://c4science.ch/w/bioimaging_and_optics_platform_biop/teaching/dab-intensity/

https://c4science.ch/w/bioimaging_and_optics_platform_biop/image-processing/wsi_registration_fjii_qupath/

Samples processed by the EPFL histology core facility

More documentation to come...
Breast Cancer Histology Images
omicsdi.org
ega-archive.org
Breast Cancer Histology Images [Dataset]. https://www.omicsdi.org/dataset/ega/EGAD00010001911
Variables measured
Genomics
Description
Fresh frozen breast cancer H&E tissue images collected and annotated by the International Cancer Genome Consortium (ICGC), which included the BASIS collaboration. The images are associated with whole genome sequence data as originally described by Nik-Zainal et al., Nature, 2016 (DOI: 10.1038/nature17676) and deposited with ID EGAS00001001178.
Histopathology data of bone marrow biopsies (HistBMP or HistMNIST)
zenodo.org
application/gzip
Updated Jan 24, 2020
Jakub Tomczak (2020). Histopathology data of bone marrow biopsies (HistBMP or HistMNIST) [Dataset]. http://doi.org/10.5281/zenodo.1205024
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.1205024
Dataset updated
Jan 24, 2020
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Jakub Tomczak
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
Data information

We prepared a dataset based on histopathological images freely available online (http://www.enjoypath.com/). We selected 16 patients (patient IDs: 272, 274, 283, 289, 290, 291, 292, 295, 297, 298, 299). Each histopathological image represents a bone marrow biopsy. Diagnoses of the chosen cases were associated with different kinds of cancer (e.g., lymphoma, leukemia) or anemia. All original images were taken using HE staining at 40× magnification, and each image was of size 336 × 448.

Data preparation

The original RGB representation was transformed to gray scale. Further, we divided each image into small patches of size 28 × 28. Eventually, we picked 10 patients for training, 3 patients for validation and 3 patients for testing, which resulted in 6,800 training images, 2,000 validation images and 2,000 test images. The selection of patients was performed in such a fashion that each dataset contained representative images with different diagnoses and amount of fat.

Since the small patches resemble MNIST, a widely used benchmark in the machine learning/AI community, the dataset is referred to as HistMNIST.
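The preparation steps above (grayscale conversion followed by division into 28 × 28 patches) can be sketched in Python with NumPy as follows. This is only an illustration under stated assumptions: the source does not say which grayscale conversion was used (ITU-R BT.601 luma weights are assumed here), whether the patches overlap (non-overlapping tiles are assumed), or how patches were selected afterwards. The random array stands in for a real 336 × 448 RGB image.

```python
import numpy as np

PATCH = 28  # patch side length stated in the description


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) RGB array to (H, W) grayscale.

    Assumption: BT.601 luma weights; the source does not state the conversion.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights


def split_into_patches(gray: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Tile an (H, W) image into non-overlapping (patch, patch) blocks, row by row."""
    height, width = gray.shape
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    rows, cols = height // patch, width // patch
    return (
        gray.reshape(rows, patch, cols, patch)
        .swapaxes(1, 2)
        .reshape(rows * cols, patch, patch)
    )


# Synthetic stand-in for one 336 x 448 RGB image.
image = np.random.default_rng(0).integers(0, 256, size=(336, 448, 3))
gray = to_grayscale(image)
patches = split_into_patches(gray)
print(patches.shape)  # (192, 28, 28)
```

Under these assumptions, each image yields 12 × 16 = 192 patches, since 336 / 28 = 12 and 448 / 28 = 16.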

First usage

The dataset was used to train deep generative models (VAEs):

Tomczak, J. M., & Welling, M. (2016). Improving variational auto-encoders using householder flow. arXiv preprint arXiv:1611.09630.
Data from: ACROBAT - a multi-stain breast cancer histological...
researchdata.se
demo.researchdata.se
Updated Oct 20, 2023
Mattias Rantalainen; Johan Hartman (2023). ACROBAT - a multi-stain breast cancer histological whole-slide-image data set from routine diagnostics for computational pathology [Dataset]. http://doi.org/10.48723/w728-p041
Unique identifier
https://doi.org/10.48723/w728-p041
Dataset updated
Oct 20, 2023
Dataset provided by
Karolinska Institutet
Authors
Mattias Rantalainen; Johan Hartman
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
2012 - 2018
Area covered
Stockholm County
Description
The ACROBAT data set consists of 4,212 whole slide images (WSIs) from 1,153 female primary breast cancer patients. The WSIs in the data set are available at 10X magnification and show tissue sections from breast cancer resection specimens stained with hematoxylin and eosin (H&E) or immunohistochemistry (IHC). For each patient, one WSI of H&E stained tissue and at least one, and up to four, WSIs of corresponding tissue stained with the routine diagnostic stains ER, PGR, HER2 and KI67 are available. The data set was acquired as part of the CHIME study (chimestudy.se) and its primary purpose was to facilitate the ACROBAT WSI registration challenge (acrobat.grand-challenge.org). The histopathology slides originate from routine diagnostic pathology workflows and were digitised for research purposes at Karolinska Institutet (Stockholm, Sweden). The image acquisition process resembles the routine digital pathology image digitisation workflow, using three different Hamamatsu WSI scanners, specifically one NanoZoomer S360 and two NanoZoomer XR. The WSIs in this data set are accompanied by a data table with one row for each WSI, specifying an anonymised patient ID, the stain or IHC antibody type of each WSI, as well as the magnification and microns per pixel at each available resolution level. Automated registration algorithm performance evaluation is possible through the ACROBAT challenge website based on over 37,000 landmark pair annotations from 13 annotators. While the primary purpose of this data set was the development and evaluation of WSI registration methods, this data set has the potential to facilitate further research in the context of computational pathology, for example in the areas of stain-guided learning, virtual staining, unsupervised learning and stain-independent models.

The data set consists of three subsets, the training, validation and test set, based on the ACROBAT WSI registration challenge. There are 750 cases in the training set, for each of which one H&E WSI and one to four IHC WSIs are available, with 3406 WSIs in total. The validation set consists of 100 cases with 200 WSIs in total and the test set of 303 cases with 606 WSIs in total. Both for the validation and test set, one H&E WSI as well as one randomly selected IHC WSI is available.

WSIs were anonymised by deleting the associated macro images, by generating filenames with random case IDs and by overwriting meta data fields with potentially personal information. Hamamatsu NDPI files were then converted using libvips (libvips.org/). WSIs are available as generic tiled TIFF WSIs (openslide.org/formats/generic-tiff/) at 10X magnification and lower image levels.

The data set is available for download in seven separate ZIP archives, five for the training data (train_part1.zip (71.47 GB), train_part2.zip (70.59 GB), train_part3.zip (75.91 GB), train_part4.zip (71.63 GB) and train_part5.zip (69.09 GB)), one for the validation data (valid.zip (21.79 GB)) and one for the test data (test.zip (68.11 GB)).

File listings and checksums in SHA1 format are available for checking archive/data integrity when downloading.
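A downloaded archive could be checked against its published SHA1 checksum along the following lines. This is a minimal Python sketch using only the standard library; the function names are illustrative, and the expected digest has to be taken from the checksum listing provided with the data set, whose exact file layout is not described here.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA1 hex digest of a file, reading it in chunks.

    Chunked reading keeps memory use low for archives of tens of gigabytes.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA1 digest with an expected value, ignoring case and whitespace."""
    return sha1_of_file(path) == expected_hex.strip().lower()
```

For example, `verify("valid.zip", expected)` returns `True` only if the archive matches the digest string `expected` copied from the checksum listing.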

While it would be helpful to notify SND of any publications using this data set by sending an email to request@snd.gu.se, please note that this is not required to use the data.
1-segment radii in high anti-pimonidazole images.
plos.figshare.com
xls
Updated Jun 6, 2023
Andrew Sundstrom; Elda Grabocka; Dafna Bar-Sagi; Bud Mishra (2023). 1-segment radii in high anti-pimonidazole images. [Dataset]. http://doi.org/10.1371/journal.pone.0153623.t002
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0153623.t002
Dataset updated
Jun 6, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Andrew Sundstrom; Elda Grabocka; Dafna Bar-Sagi; Bud Mishra
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
The values of ng and ni report that the statistics are from a sample of 2 gradients found in 1 image.
breast-histopathology-images
huggingface.co
Updated Aug 19, 2024
adnen (2024). breast-histopathology-images [Dataset]. https://huggingface.co/datasets/dbzadnen/breast-histopathology-images
Dataset updated
Aug 19, 2024
Authors
adnen
License
https://choosealicense.com/licenses/cc0-1.0/
Description
Dataset Card for Breast Histopathology Images

Dataset Overview

Breast Histopathology Images is a dataset containing high-resolution images of breast cancer specimens, specifically focusing on Invasive Ductal Carcinoma (IDC). The dataset is used for developing models to automatically detect and grade the aggressiveness of breast cancer based on histopathological images.

Context

Invasive Ductal Carcinoma (IDC) is the most common subtype of breast cancer.… See the full description on the dataset page: https://huggingface.co/datasets/dbzadnen/breast-histopathology-images.

The Cancer Imaging Archive (2024). Hyperspectral Histological Images for Diagnosis of Human Glioblastoma [Dataset]. http://doi.org/10.7937/z1k6-vd17

Hyperspectral Histological Images for Diagnosis of Human Glioblastoma

HistologyHSI-GB

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: png and envi, n/a

Unique identifier

https://doi.org/10.7937/z1k6-vd17

Dataset updated

May 24, 2024

Dataset authored and provided by

The Cancer Imaging Archive

License

https://www.cancerimagingarchive.net/data-usage-policies-and-restrictions/

Time period covered

May 24, 2024

Dataset funded by

National Cancer Institute (http://www.cancer.gov/)

Description

Hyperspectral imaging technology combines the main features of two existing technologies: conventional imaging and spectroscopy. Thus, hyperspectral cameras make it possible to analyze, at the same time and in a non-contact way, the morphological features and chemical composition of the objects captured. The information provided by hyperspectral imaging can be used to detect patterns, cells, or biomarkers to identify diseases. There are different alternatives for processing them and there is a lack of publicly available datasets of medical hyperspectral images. To the best of our knowledge, this is the first open access dataset containing histological hyperspectral images of glioblastoma brain tumors, which can be set as a benchmark for researchers to compare their approaches.

This dataset includes 13 subjects. Each subject has a single histological slide with multiple hyperspectral images captured from each slide where deemed relevant by the pathologists (this number varies for each slide). The database is composed of 469 annotated hyperspectral images from 13 histological slides (482 total images), having a spatial dimension of 800 × 1004 pixels and a spectral dimension of 826 spectral channels. The format of the hyperspectral images is ENVI, the standard format for the storage of hyperspectral images. The ENVI format consists of a flat-binary raster file which may or may not have a file extension, accompanied by an ASCII header file (denoted as *.hdr). The data are stored in band-interleaved-by-line format. In addition, dark and white references were captured to perform a calibration of the raw image, which is a standard procedure in hyperspectral image processing.
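To make the storage layout and the calibration step more concrete, the following Python sketch reads a band-interleaved-by-line (BIL) raster and applies a flat-field correction of the form (raw - dark) / (white - dark). Both parts rest on assumptions: the description does not give the calibration formula, so the common white/dark reference normalization is assumed, and in practice the dimensions, data type, byte order and any header offset should be taken from the accompanying *.hdr file rather than hard-coded. The function names are illustrative.

```python
import numpy as np


def read_bil(path, lines, samples, bands, dtype):
    """Read a flat-binary BIL raster and return it as (lines, samples, bands).

    BIL stores, for each image line, all samples of band 0, then band 1, etc.,
    so the on-disk order is (lines, bands, samples).
    Assumes no header offset and native byte order.
    """
    flat = np.fromfile(path, dtype=dtype)
    return flat.reshape(lines, bands, samples).transpose(0, 2, 1)


def calibrate(raw, white, dark, eps=1e-9):
    """Flat-field correction (assumed formula): (raw - dark) / (white - dark)."""
    raw = raw.astype(np.float64)
    white = white.astype(np.float64)
    dark = dark.astype(np.float64)
    # Guard against division by zero where white and dark references coincide.
    return (raw - dark) / np.maximum(white - dark, eps)
```

With the dimensions given above, a full cube would presumably be read with 826 bands and spatial dimensions of 800 and 1004; which of the two spatial values is the line count and which the sample count should be confirmed from the header file.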

The slides were stained with hematoxylin and eosin and captured using a custom hyperspectral microscopic system at 20× magnification. The ground-truth annotation for this dataset is the diagnosis of the slides (tumor, T, or not tumor, NT) performed by skilled histopathologists after visual examination of the stained slides, according to the World Health Organization classification of tumors of the nervous system. As far as we know, there are no commercial hyperspectral whole slide scanners. Also, the availability of hyperspectral microscopes on the market is still limited.

The microscope is an Olympus BX-53 (Olympus, Tokyo, Japan). The hyperspectral camera is a Hyperspec® VNIR A-Series from HeadWall Photonics (Fitchburg, MA, USA), which is based on an imaging spectrometer coupled to a charge-coupled device sensor, the Adimec-1000m (Adimec, Eindhoven, Netherlands). This hyperspectral system works in the visible and near-infrared spectral range from 400 to 1000 nm with a spectral resolution of 2.8 nm, sampling 826 spectral channels and 1004 spatial pixels. The push-broom camera performs spatial scanning to acquire a hyperspectral cube, using a mechanical stage (SCAN, Märzhäuser, Wetzlar, Germany) attached to the microscope, which provides accurate movement of the slides. The objective lenses are from the LMPLFLN family (Olympus, Tokyo, Japan), optimized for infrared observations.

More information about the dataset can be found in this manuscript.
