Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ACROBAT data set consists of 4,212 whole slide images (WSIs) from 1,153 female primary breast cancer patients. The WSIs in the data set are available at 10X magnification and show tissue sections from breast cancer resection specimens stained with hematoxylin and eosin (H&E) or immunohistochemistry (IHC). For each patient, one WSI of H&E stained tissue and at least one, and up to four, WSIs of corresponding tissue stained with the routine diagnostic stains ER, PGR, HER2 and KI67 are available. The data set was acquired as part of the CHIME study (chimestudy.se) and its primary purpose was to facilitate the ACROBAT WSI registration challenge (acrobat.grand-challenge.org). The histopathology slides originate from routine diagnostic pathology workflows and were digitised for research purposes at Karolinska Institutet (Stockholm, Sweden). The image acquisition process resembles the routine digital pathology image digitisation workflow, using three different Hamamatsu WSI scanners, specifically one NanoZoomer S360 and two NanoZoomer XR. The WSIs in this data set are accompanied by a data table with one row for each WSI, specifying an anonymised patient ID, the stain or IHC antibody type of each WSI, as well as the magnification and microns per pixel at each available resolution level. Automated registration algorithm performance evaluation is possible through the ACROBAT challenge website based on over 37,000 landmark pair annotations from 13 annotators. While the primary purpose of this data set was the development and evaluation of WSI registration methods, this data set has the potential to facilitate further research in the context of computational pathology, for example in the areas of stain-guided learning, virtual staining, unsupervised learning and stain-independent models.
The data set consists of three subsets, the training, validation and test set, based on the ACROBAT WSI registration challenge. There are 750 cases in the training set, for each of which one H&E WSI and one to four IHC WSIs are available, with 3406 WSIs in total. The validation set consists of 100 cases with 200 WSIs in total and the test set of 303 cases with 606 WSIs in total. Both for the validation and test set, one H&E WSI as well as one randomly selected IHC WSI is available.
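The subset sizes quoted above can be cross-checked against the overall totals of 1,153 patients and 4,212 WSIs. The following is a minimal sketch that uses only the numbers stated in this description; the variable names are illustrative and not part of the data set.

```python
# Case and WSI counts per subset, copied from the description above.
subsets = {
    "training": {"cases": 750, "wsis": 3406},
    "validation": {"cases": 100, "wsis": 200},
    "test": {"cases": 303, "wsis": 606},
}

total_cases = sum(s["cases"] for s in subsets.values())
total_wsis = sum(s["wsis"] for s in subsets.values())

# Each training case has exactly one H&E WSI, so the remainder are IHC WSIs.
training_ihc = subsets["training"]["wsis"] - subsets["training"]["cases"]
mean_ihc_per_training_case = training_ihc / subsets["training"]["cases"]

print(total_cases)  # 1153
print(total_wsis)   # 4212
print(round(mean_ihc_per_training_case, 2))  # 3.54
```

The validation and test counts are consistent with exactly two WSIs per case (one H&E and one IHC), while training cases carry about 3.5 IHC WSIs on average.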
WSIs were anonymised by deleting the associated macro images, by generating filenames with random case IDs and by overwriting metadata fields that could contain personal information. Hamamatsu NDPI files were then converted using libvips (libvips.org/). WSIs are available as generic tiled TIFF WSIs (openslide.org/formats/generic-tiff/) at 10X magnification and lower image levels.
The data set is available for download in seven separate ZIP archives: five for the training data, train_part1.zip (71.47 GB), train_part2.zip (70.59 GB), train_part3.zip (75.91 GB), train_part4.zip (71.63 GB) and train_part5.zip (69.09 GB); one for the validation data, valid.zip (21.79 GB); and one for the test data, test.zip (68.11 GB).
File listings and checksums in SHA1 format are available for checking archive/data integrity when downloading.
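A downloaded archive can be checked against its published SHA1 value with the Python standard library. The sketch below is one possible way to do this; the function names are illustrative, and the layout of the published checksum listing is not specified here, so the expected digest is simply passed in as a hexadecimal string.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA1 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's SHA1 digest with an expected hex string (case-insensitive)."""
    return sha1_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("valid.zip", expected)` would return True only if the local copy of valid.zip hashes to the value `expected` taken from the checksum listing. Reading in chunks matters here because the archives are tens of gigabytes each.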
While it would be helpful to notify SND of any publications using this data set by sending an email to request@snd.gu.se, please note that this is not required to use the data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises 38 chemically stained whole slide image samples along with their corresponding ground truth, annotated by histopathologists for 12 classes indicating skin layers (Epidermis, Reticular dermis, Papillary dermis, Dermis, Keratin), skin tissues (Inflammation, Hair follicles, Glands), skin cancer (Basal cell carcinoma, Squamous cell carcinoma, Intraepidermal carcinoma) and background (BKG).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Pathology Images of Scanners and Mobilephones (PLISM) dataset was created for the evaluation of AI models’ robustness to domain shifts. PLISM is the first group-wised pathological image dataset that encompasses diverse tissue types stained under 13 H&E conditions, with multiple imaging media, including smartphones (7 scanners and 6 smartphones).
The PLISM-wsi subset consists of image groups for all staining conditions between WSIs for each tile image. The PLISM-wsi subset contains a total of 310,947 images.
Color and texture in digital pathology images are affected by H&E stain conditions (e.g. Harris or Carazzi) and digitalization devices (e.g. slide scanners or smartphones), which cause inter-institutional domain shifts.
Please see the files 'stain_condition.png' and 'counterpart.png' for H&E staining conditions and devices used.
Each tar.gz file in this dataset contains a collection of files labeled via the following file naming convention:
(stain_name)_(device_name)/(stain_name)_(device_name)_(top_left_x)_(top_left_y).png
The csv file included with this dataset contains the following information:
Tissue Type: The specific type of human tissue represented in the image, chosen from among 46 possible tissue types.
Stain Type: The specific staining condition applied to the image, chosen from among 13 possible conditions.
Device Type: The specific type of imaging device used to capture the image, chosen from among 13 possible device types.
Coordinate: The xy coordinates of the top left and bottom right corners of each image (e.g., 1000_500_0_0).
Image Path: The relative path to each image.
See the smartphones subset of the PLISM dataset in the Collection at https://doi.org/10.25452/figshare.plus.c.6773925
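The file naming convention described above can be parsed programmatically. The sketch below is a hedged illustration and not an official PLISM tool: the function name and the example path are hypothetical, and because stain or device names might themselves contain underscores, the combined "(stain_name)_(device_name)" folder name is kept as a single field and only the two trailing coordinates are split off.

```python
from pathlib import PurePosixPath


def parse_plism_path(rel_path):
    """Split a tile path into its stain/device folder and top-left coordinates."""
    path = PurePosixPath(rel_path)
    folder = path.parent.name  # "(stain_name)_(device_name)"
    # The last two underscore-separated fields are top_left_x and top_left_y.
    prefix, x_str, y_str = path.stem.rsplit("_", 2)
    if prefix != folder:
        raise ValueError(f"file prefix {prefix!r} does not match folder {folder!r}")
    return {
        "stain_device": folder,
        "top_left_x": int(x_str),
        "top_left_y": int(y_str),
    }


# Hypothetical path; real stain and device names will differ.
print(parse_plism_path("stainA_deviceB/stainA_deviceB_1000_500.png"))
# {'stain_device': 'stainA_deviceB', 'top_left_x': 1000, 'top_left_y': 500}
```

Separating the stain name from the device name is more reliably done with the Stain Type and Device Type columns of the accompanying csv file than by splitting the folder name.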
https://creativecommons.org/publicdomain/zero/1.0/
A large annotated dataset, composed of both microscopy (classification task) and whole-slide images (segmentation task), was specifically compiled and made publicly available for the BACH challenge. Following a positive response from the scientific community, a total of 64 submissions, out of 677 registrations, effectively entered the competition. From the submitted algorithms it was possible to push forward the state-of-the-art in terms of accuracy (87%) in automatic classification of breast cancer with histopathological images.
For the classification task there are two main folders: train and test. The Photos folder contains four classes: benign, in situ, invasive, and normal. A ground truth csv file provides the labels. Images are in TIFF (.tif) format.
Paper: https://arxiv.org/abs/1808.04277
Citation: Aresta, G., Araújo, T., Kwok, S., Chennamsetty, S. S., Safwan, M., Alex, V., ... & Aguiar, P. (2019). Bach: Grand challenge on breast cancer histology images. Medical image analysis, 56, 122-139.
Dataset: https://zenodo.org/record/3632035
Computational histopathology has made significant strides in the past few years, slowly getting closer to clinical adoption. One area of benefit would be the automatic generation of diagnostic reports from H&E-stained whole slide images, which would further increase the efficiency of the pathologists' routine diagnostic workflows.
In this study, we compiled a dataset (PatchGastricADC22) of histopathological captions of stomach adenocarcinoma endoscopic biopsy specimens, which we extracted from diagnostic reports and paired with patches extracted from the associated whole slide images. The dataset contains a variety of gastric adenocarcinoma subtypes.
We trained a baseline attention-based model to predict the captions from features extracted from the patches and obtained promising results. We make the captioned dataset of 262K patches publicly available.
Purpose
The dataset was created to support research in medical image captioning — specifically, to automatically generate diagnostic text descriptions from histopathological image patches. It helps train and evaluate models that can interpret tissue morphology and produce human-like pathology reports.
Domain & Source
Dataset Structure (PatchGastricADC22)
📁 Folder: patches_captions/patches_captions/ Contains all patch-level histopathology image files (in .jpg format). Each patch represents a cropped region (300×300 pixels) from a Whole Slide Image (WSI).
🧾 File: captions.csv Provides the mapping between image IDs and their corresponding diagnostic captions. Each row represents one unique image patch and its textual description.
🧩 CSV Columns:
id – Base ID identifying the parent WSI or case from which the patch was extracted.
subtype – Indicates the histological subtype (e.g., tubular adenocarcinoma, poorly differentiated).
text – Expert-written caption describing the morphological and diagnostic features visible in the patch.
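The columns listed above can be read with Python's csv module. This is a minimal sketch, assuming captions.csv has a header row with exactly the names id, subtype and text; the sample rows, IDs and helper names below are placeholders and not taken from the dataset.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows; real IDs, subtypes and captions will differ.
SAMPLE_CSV = """id,subtype,text
case_a,subtype_x,placeholder caption for case a
case_b,subtype_y,placeholder caption for case b
case_c,subtype_x,placeholder caption for case c
"""


def load_captions(handle):
    """Read rows with the columns id, subtype and text into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = {"id", "subtype", "text"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"captions file lacks columns: {sorted(missing)}")
    return list(reader)


def captions_by_id(rows):
    """Group caption texts by the base ID of their parent WSI or case."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["id"]].append(row["text"])
    return dict(grouped)


rows = load_captions(io.StringIO(SAMPLE_CSV))
subtype_counts = Counter(row["subtype"] for row in rows)
print(subtype_counts["subtype_x"])     # 2
print(captions_by_id(rows)["case_b"])  # ['placeholder caption for case b']
```

To read the real file, the in-memory sample would be replaced by something like `open("captions.csv", newline="", encoding="utf-8")`, with the path adjusted to wherever the file was extracted.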
Dataset Statistics
🧩 Total images (patches): ~262,777
🧪 Total WSIs (slides): 1305
🖼️ Patch size: 300 × 300 pixels
🔬 Magnification: 20×
✍️ Captions: One per patch
🔠 Vocabulary size: 344 unique words
📏 Max caption length: 47 words
⚖️ Split: 70% train / 10% validation / 20% test
Creation Process
1. Whole Slide Images (WSIs) were collected from gastric cancer pathology archives.
2. Each slide was divided into 300×300 patches (non-overlapping).
3. Expert pathologists annotated each patch with a short caption describing diagnostic features (cellular and structural morphology).
4. Data were consolidated into image files + a master captions.csv.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset with examples of Artefacts in Digital Pathology.
The dataset contains 22 Whole-Slide Images, with H&E or IHC staining, showing various types and levels of defect to the slides. Annotations were made by a biomedical engineer based on examples given by an expert.
The dataset is split in different folders:
train
18 whole-slide images (extracted at 1.25x & 2.5x magnification)
All from the same Block (colorectal cancer tissue)
1/2 with H&E & 1/2 with anti-pan-cytokeratin IHC staining.
validation
3 whole-slide images (1.25x + 2.5x mag)
2 from the same Block as the training set (1 IHC, 1 H&E)
1 from another Block (IHC anti-pan-cytokeratin, gastroesophageal junction lesion)
validation_tiles
patches of varying sizes taken from the 3 validation whole-slide images @1.25x magnification.
7 patches from each slide.
test
1 whole-slide image (1.25x + 2.5x mag)
From another block: IHC staining (anti-NR2F2), mouth cancer
For the train, validation and test whole-slide images, each slide has:
- The RGB images @1.25x & 2.5x mag
- The corresponding background/tissue masks
- The corresponding annotation masks containing examples of artefacts (note that a majority of artefacts are not annotated. In total, 918 artefacts are in the train set)
For the validation tiles, the following table gives the "patch-level" supervision:
tile#  Artefact(s)
00     None/Few
01     Tear&Fold
02     Ink
03     None/Few
04     None/Few
05     Tear&Fold
06     Tear&Fold + Blur
07     Knife damage
08     Knife damage
09     Ink
10     None/Few
11     Tear&Fold
12     Tear&Fold
13     None/Few
14     None/Few
15     Knife damage
16     Tear&Fold
17     None/Few
18     None/Few
19     Blur
20     Knife damage
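The patch-level supervision above can be turned into multi-label targets, for instance to score a tile classifier. The sketch below copies the 21 labels from the table. Treating "None/Few" as an empty artefact set and splitting combined entries on "+" are assumptions of this example, not rules stated by the dataset authors, and all names are illustrative.

```python
from collections import Counter

# Patch-level labels for validation tiles 00-20, copied from the table above.
TILE_LABELS = [
    "None/Few", "Tear&Fold", "Ink", "None/Few", "None/Few", "Tear&Fold",
    "Tear&Fold + Blur", "Knife damage", "Knife damage", "Ink", "None/Few",
    "Tear&Fold", "Tear&Fold", "None/Few", "None/Few", "Knife damage",
    "Tear&Fold", "None/Few", "None/Few", "Blur", "Knife damage",
]


def artefact_set(label):
    """Convert one table entry into a set of artefact names.

    Assumption: "None/Few" maps to the empty set, and "+" separates
    several artefacts present in the same tile.
    """
    if label == "None/Few":
        return frozenset()
    return frozenset(part.strip() for part in label.split("+"))


tile_artefacts = {f"{i:02d}": artefact_set(lab) for i, lab in enumerate(TILE_LABELS)}
artefact_counts = Counter(a for s in tile_artefacts.values() for a in s)
clean_tiles = sum(1 for s in tile_artefacts.values() if not s)

print(sorted(tile_artefacts["06"]))  # ['Blur', 'Tear&Fold']
print(artefact_counts["Tear&Fold"])  # 6
print(clean_tiles)                   # 8
```

Under these assumptions the 21 tiles contain 6 with tears or folds, 4 with knife damage, 2 with ink and 2 with blur, while 8 tiles have no or few artefacts.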
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
000 pixels width and 200
According to our latest research, the global Whole Slide Image Analysis market size reached USD 1.42 billion in 2024, and it is expected to grow at a CAGR of 14.7% from 2025 to 2033. By the end of the forecast period in 2033, the market is projected to reach USD 4.35 billion. The surging adoption of digital pathology solutions, particularly driven by advancements in artificial intelligence and machine learning, is a key growth factor fueling the expansion of the Whole Slide Image Analysis market globally.
The principal growth driver for the Whole Slide Image Analysis market is the increasing prevalence of chronic diseases, notably cancer, which necessitates rapid, accurate, and scalable diagnostic solutions. As healthcare systems worldwide face mounting pressure to deliver timely diagnoses, the digitization of pathology workflows through whole slide imaging has become essential. This technology enables pathologists to analyze high-resolution images remotely, reducing diagnostic turnaround times and minimizing human error. Furthermore, the integration of AI-powered algorithms significantly enhances the accuracy of image interpretation, supporting personalized medicine and improving patient outcomes. The convergence of these factors is leading to widespread adoption of Whole Slide Image Analysis across hospitals, research institutes, and diagnostic laboratories.
Another significant growth factor is the rising investment in healthcare IT infrastructure, particularly in developed economies such as the United States, Germany, and Japan. Governments and private players are increasingly funding initiatives aimed at modernizing pathology laboratories, which includes transitioning from traditional glass slides to digital platforms. The adoption of cloud-based solutions has further accelerated this trend, enabling seamless data storage, sharing, and collaboration among healthcare professionals. Additionally, the COVID-19 pandemic has acted as a catalyst, underscoring the need for remote diagnostics and virtual consultations. This shift has prompted healthcare organizations to embrace digital pathology and whole slide image analysis to ensure continuity in patient care and research activities.
The market is also benefiting from the expansion of applications beyond conventional cancer diagnostics. Whole Slide Image Analysis is now being leveraged in drug discovery, pathology research, and medical education, broadening its scope and utility. Pharmaceutical and biotechnology companies are utilizing these platforms to streamline drug development processes, while academic institutions are incorporating digital slides into their curricula for enhanced learning experiences. The growing awareness about the benefits of digital pathology, coupled with the increasing availability of user-friendly and interoperable software solutions, is expected to further drive market growth over the coming years.
Whole Slide Imaging is a transformative technology that has revolutionized the field of pathology by enabling the digitization of entire tissue slides. This advancement allows pathologists to view and analyze high-resolution images of tissue samples on a computer screen, rather than relying on traditional glass slides and microscopes. The ability to capture and store digital images of entire slides facilitates remote consultations and second opinions, enhancing collaboration among pathologists across different locations. Moreover, Whole Slide Imaging supports the integration of artificial intelligence and machine learning algorithms, which can assist in the automated detection and classification of pathological features, further improving diagnostic accuracy and efficiency.
From a regional perspective, North America currently dominates the Whole Slide Image Analysis market, accounting for the largest revenue share in 2024. This leadership is attributed to the presence of advanced healthcare infrastructure, high adoption rates of digital technologies, and strong government support for healthcare innovation. However, the Asia Pacific region is anticipated to exhibit the fastest growth over the forecast period, fueled by rising healthcare expenditure, increasing awareness about digital pathology, and expanding research and development activities. Europe also holds
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Explore the TCGA Whole Slide Image (WSI) SVS files available on Kaggle, offering detailed visual representations of tissue samples from various cancer types. These high-resolution images provide valuable insights into tumor morphology and tissue architecture, facilitating cancer diagnosis, prognosis, and treatment research. Delve into the rich landscape of cancer biology, leveraging the wealth of information contained within these SVS files to drive innovative advancements in oncology. This is a dataset of WSI images downloaded from the TCGA portal.
https://spdx.org/licenses/CC0-1.0.html
We describe Orbit Image Analysis, an open-source whole slide image analysis tool. The tool consists of a generic tile-processing engine which allows the execution of various image analysis algorithms provided by either Orbit itself or from other open-source platforms using a tile-based map-reduce execution framework. Orbit Image Analysis is capable of sophisticated whole slide imaging analyses due to several key features. First, Orbit has machine-learning capabilities. This deep learning segmentation can be integrated with complex object detection for analysis of intricate tissues. In addition, Orbit can run locally as standalone or connect to the open-source image server OMERO. Another important characteristic is its scale-out functionality, using the Apache Spark framework for distributed computing. In this paper, we describe the use of Orbit in three different real-world applications: quantification of idiopathic lung fibrosis, nerve fibre density quantification, and glomeruli detection in the kidney.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mouse duodenum sections (fixed in 4% PFA overnight at 4°C, processed for paraffin infiltration using a standard histology procedure and cut at 4 microns) were dewaxed, rehydrated, permeabilized with 0.5% Triton X-100 in PBS 1x and stained with Azide - Alexa Fluor 555 (Thermo Fisher) to detect EdU and with DAPI for nuclei. The images were taken using a Leica DM5500 microscope with a 40X N.A.1 objective (black&white camera: DFC350FXR2, pixel dimension: 0.161 microns). Next, the slide was unmounted and stained using the fully automated Ventana Discovery xT autostainer (Roche Diagnostics, Rotkreuz, Switzerland). All steps were performed on the autostainer with Ventana solutions. Sections were pretreated with heat using the CC1 solution under mild conditions. The primary rat anti BrDU (clone: BU1/75 (ICR1), Serotec, diluted 1:300) was incubated for 1 hour at 37°C. After incubation with a donkey anti rat biotin diluted 1:200 (Jackson ImmunoResearch Laboratories), chromogenic revelation was performed with the DabMap kit. The section was counterstained with Harris hematoxylin (J.T. Baker) before a second round of imaging on the DM5500 with a PL Fluotar 40X N.A.1.0 oil objective (color camera: DFC 320 R2, pixel dimension: 0.1725 microns). Before acquisition, a white balance and a shading correction were performed according to the Leica LAS software wizard. The fluorescence and DAB images were converted to multiresolution ome.tiff files with the kheops Fiji Plugin.
Samples prepared in the EPFL histology core facility by Nathalie Müller and Gian-Filippo Mancini.
Associated documents:
https://c4science.ch/w/bioimaging_and_optics_platform_biop/teaching/dab-intensity/
https://imagej.net/plugins/bdv/warpy/warpy
This document contains a full QuPath project with an example of a registered image.
Attribution-NonCommercial-NoDerivs 3.0 (CC BY-NC-ND 3.0) https://creativecommons.org/licenses/by-nc-nd/3.0/
License information was derived automatically
The MCO study whole slide image collection consists of 1500 digitised tissue slides of colorectal cancers. From 1994 to 2010, the Molecular and Cellular Oncology (MCO) Study group conducted a study of individuals undergoing treatment for colorectal cancer. For the study, they systematically collected tissue samples and clinical and pathological information from more than 1500 people who had tumours surgically removed from their large bowel. This collection represents one typical section from each tumour case, stained with Hematoxylin and eosin, and scanned using a x40 objective. The resolution of the digitised images approaches that visible under an optical microscope - more than 100,000 dpi. At this resolution, each image is around 2 Gigabytes, bringing the size of the 1500 images in the MCO Whole Slide Image Collection to 3 Terabytes. The MCO whole slide image collection is now available on the Intersect Australia Research Data Storage Infrastructure (RDSI) Node. Originating source(s): MCO research group, UNSW (1993-2011)
According to our latest research, the global Whole Slide Imaging System market size reached USD 1.12 billion in 2024, demonstrating robust expansion driven by advancements in digital pathology and increasing adoption of telemedicine solutions. The market is expected to grow at a CAGR of 14.7% from 2025 to 2033, with the forecasted market size projected to reach USD 3.45 billion by 2033. This substantial growth is fueled by technological innovations, rising prevalence of chronic diseases, and the increasing demand for efficient diagnostic solutions across healthcare settings. The Whole Slide Imaging System market continues to transform the landscape of pathology and diagnostics globally.
One of the primary growth factors propelling the Whole Slide Imaging System market is the rapid digital transformation occurring within pathology departments worldwide. The migration from traditional glass slides to digital slide scanning and analysis has proven to significantly enhance workflow efficiency, accuracy, and collaboration among pathologists. The integration of artificial intelligence (AI) and machine learning algorithms with whole slide imaging systems has further improved diagnostic precision, enabling faster and more reliable detection of complex diseases such as cancer. Additionally, the growing need for remote consultations and second opinions has accelerated the adoption of telepathology, allowing experts to review and interpret slides from any location. This paradigm shift is not only streamlining diagnostic processes but also addressing the shortage of skilled pathologists, particularly in remote and underserved regions.
Another key driver of market growth is the rising incidence of chronic and infectious diseases, which has led to an increased volume of biopsies and histopathological examinations. Whole Slide Imaging Systems provide a scalable solution to manage this growing workload, offering high-throughput scanning capabilities and advanced image management tools. Hospitals, diagnostic laboratories, and academic research institutes are investing heavily in these systems to improve turnaround times and enhance patient outcomes. Furthermore, regulatory approvals and standardizations from organizations such as the FDA and CE have bolstered confidence in the reliability and clinical utility of digital pathology, encouraging further adoption across both developed and emerging markets. The ongoing push towards value-based healthcare and the emphasis on precision medicine are also contributing to the sustained demand for Whole Slide Imaging Systems.
The expanding application of Whole Slide Imaging Systems in education and research represents another significant growth avenue for the market. Medical schools and research institutions are leveraging digital slides for teaching, training, and collaborative studies, enabling students and researchers to access high-resolution images anytime and anywhere. This digital approach not only enhances the learning experience but also facilitates global collaboration on research projects and clinical trials. The ability to store, share, and annotate digital slides has accelerated the pace of scientific discovery and innovation in the field of pathology. As funding for biomedical research continues to increase, especially in oncology and rare diseases, the demand for advanced imaging solutions is expected to rise correspondingly.
From a regional perspective, North America currently dominates the Whole Slide Imaging System market, accounting for the largest share in 2024 due to its advanced healthcare infrastructure, high adoption of digital technologies, and strong presence of leading market players. Europe follows closely, supported by favorable government initiatives and a growing focus on precision diagnostics. Meanwhile, the Asia Pacific region is poised for the fastest growth during the forecast period, driven by increasing healthcare investments, rising awareness about digital pathology, and a burgeoning patient population. Latin America and the Middle East & Africa are also witnessing gradual adoption, albeit at a slower pace, as healthcare systems in these regions continue to modernize and embrace digital solutions.
The emergence of Whole Slide Scanner technology has been a
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset represents a collection of tissue types in histological images of human gastric cancer, containing 31,096 non-overlapping images of 224x224 pixels each, extracted from H&E-stained pathological slides at Harbin Medical University Cancer Hospital. The dataset was generated by predicting tissue components in gastric cancer using annotations from a publicly available colorectal cancer dataset to create tissue heatmaps. Professional pathologists then selected 300 whole slide images with high prediction accuracy. Finally, a substantial number of images, each belonging to one of eight tissue categories (Adipose (ADI), Background (BACK), Debris (DEB), Lymphocytes (LYM), Mucus (MUC), Smooth Muscle (MUS), Normal Colon Mucosa (NORM), Cancer-associated Stroma (STR), Tumor (TUM)), were extracted from these slides.
Image analysis workflows for histology increasingly require the correlation and combination of measurements across several whole slide images. Indeed, for multiplexing, as well as multimodal imaging, it is indispensable that the same sample is imaged multiple times, either through various systems for multimodal imaging, or using the same system but throughout rounds of sample manipulation (e.g. multiple staining sessions). In both cases slight deformations from one image to another are unavoidable, leading to an imperfect superimposition and thus a loss of accuracy, making it difficult to link measurements, in particular at the cellular level. Using pre-existing software components and developing missing ones, we propose a user-friendly workflow which facilitates the nonlinear registration of whole slide images in order to reach sub-cellular resolution level. The set of whole slide images to register and analyze is at first defined as a QuPath project. Fiji is then used to open the QuPath project and perform the registrations. Each registration is automated by using an elastix backend, or semi-automated by using BigWarp in order to interactively correct the results of the automated registration. These transformations can then be retrieved in QuPath to transfer any regions of interest from an image to the corresponding registered images. In addition, the transformations can be applied in QuPath to produce on-the-fly transformed images that can be displayed on top of the reference image. Thus, relevant data can be combined and analyzed throughout all registered slides, facilitating the analysis of correlative results for multiplexed and multimodal imaging.
The dataset consists of 99 H&E-stained whole slide skin images (WSI) - 49 abnormal and 50 normal cases. All significant abnormal findings identified are outlined and categorized into 13 types such as actinic keratosis, basal cell carcinoma and dermatofibroma. Other tissue components, such as epidermis, adnexal structures, as well as the surgical margin are delineated to create a complete histological map. In total, 16741 separate annotations have been made to segment the different tissue structures and link them to ontological information.
The TCGA-UT dataset is a large-scale collection of histopathological image patches from human cancer tissues. It contains 1,608,060 image patches extracted from hematoxylin & eosin (H&E) stained histological samples across 32 different types of solid cancers.
Files are organized using the following format:
[cancer_type]/[resolution]/[TCGA Barcode]/[region]-[number]-[pixel resolution].jpg
If you use this dataset in your research, please cite:
Komura, D., et al. (2022). Universal encoding of pan-cancer histology by deep texture representations. Cell Reports 38, 110424. https://doi.org/10.1016/j.celrep.2022.110424
If you're interested in using this dataset for benchmarking foundation models or feature extractors, we recommend accessing the dataset through the Hugging Face Hub at dakomura/tcga-ut. The Hugging Face version provides:
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Large set of whole-slide-images (WSI) of prostatectomy specimens with various grades of prostate cancer (PCa). More information can be found in the corresponding paper: https://doi.org/10.1038/s41598-018-37257-4
The WSIs in this dataset can be viewed using the open-source software ASAP or OpenSlide.
Due to the large size of the complete dataset, the data has been split up into multiple archives.
The data from the training set:
The data from the test set:
This study was financed by a grant from the Dutch Cancer Society (KWF), grant number KUN 2015-7970.
If you make use of this dataset please cite both the dataset itself and the corresponding paper: https://doi.org/10.1038/s41598-018-37257-4
https://www.archivemarketresearch.com/privacy-policy
The global whole slide imaging (WSI) scanner market is experiencing robust growth, projected to reach $202.1 million in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 4.2% from 2025 to 2033. This expansion is fueled by several key factors. The increasing adoption of digital pathology in healthcare settings is a major driver, offering benefits such as improved efficiency, enhanced diagnostic accuracy, remote consultations, and streamlined workflow management. Advancements in WSI technology, including higher resolution scanners and sophisticated image analysis software, further contribute to market growth. The rising prevalence of chronic diseases necessitating extensive histological analysis also boosts demand. Leading players like Leica Biosystems, Hamamatsu Photonics, Zeiss, and Roche are driving innovation and expanding their market presence through strategic partnerships and technological advancements. The market's segmentation likely includes variations based on scanner type (e.g., brightfield, fluorescence), application (e.g., oncology, pathology), and end-user (e.g., hospitals, research institutions). Despite the strong growth trajectory, market expansion faces some challenges. High initial investment costs for WSI scanners can be a barrier to entry for smaller healthcare facilities and research labs. The need for extensive training and skilled personnel to operate and interpret WSI data also presents a hurdle. However, ongoing technological improvements are likely to drive down costs and improve user-friendliness, mitigating these challenges over time. The increasing integration of artificial intelligence (AI) and machine learning (ML) in WSI analysis promises to further enhance diagnostic capabilities and streamline workflow, creating new opportunities for growth. The development of standardized protocols and regulations in digital pathology will also play a significant role in market expansion.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Whole slide images for testing automatic analysis.