Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The ACROBAT data set consists of 4,212 whole slide images (WSIs) from 1,153 female primary breast cancer patients. The WSIs in the data set are available at 10X magnification and show tissue sections from breast cancer resection specimens stained with hematoxylin and eosin (H&E) or immunohistochemistry (IHC). For each patient, one WSI of H&E stained tissue and at least one, and up to four, WSIs of corresponding tissue stained with the routine diagnostic stains ER, PGR, HER2 and KI67 are available. The data set was acquired as part of the CHIME study (chimestudy.se) and its primary purpose was to facilitate the ACROBAT WSI registration challenge (acrobat.grand-challenge.org). The histopathology slides originate from routine diagnostic pathology workflows and were digitised for research purposes at Karolinska Institutet (Stockholm, Sweden). The image acquisition process resembles the routine digital pathology digitisation workflow, using three different Hamamatsu WSI scanners, specifically one NanoZoomer S360 and two NanoZoomer XR. The WSIs in this data set are accompanied by a data table with one row for each WSI, specifying an anonymised patient ID, the stain or IHC antibody type of each WSI, as well as the magnification and microns per pixel at each available resolution level. Automated registration algorithm performance evaluation is possible through the ACROBAT challenge website based on over 37,000 landmark pair annotations from 13 annotators. While the primary purpose of this data set was the development and evaluation of WSI registration methods, this data set has the potential to facilitate further research in the context of computational pathology, for example in the areas of stain-guided learning, virtual staining, unsupervised learning and stain-independent models.
The data set consists of three subsets, the training, validation and test set, based on the ACROBAT WSI registration challenge. There are 750 cases in the training set, for each of which one H&E WSI and one to four IHC WSIs are available, with 3406 WSIs in total. The validation set consists of 100 cases with 200 WSIs in total and the test set of 303 cases with 606 WSIs in total. For both the validation and test sets, one H&E WSI and one randomly selected IHC WSI are available per case.
WSIs were anonymised by deleting the associated macro images, by generating filenames with random case IDs and by overwriting metadata fields containing potentially personal information. Hamamatsu NDPI files were then converted using libvips (libvips.org/). WSIs are available as generic tiled TIFF WSIs (openslide.org/formats/generic-tiff/) at 10X magnification and lower image levels.
The data set is available for download in seven separate ZIP archives, five for the training data (train_part1.zip (71.47 GB), train_part2.zip (70.59 GB), train_part3.zip (75.91 GB), train_part4.zip (71.63 GB) and train_part5.zip (69.09 GB)), one for the validation data (valid.zip 21.79 GB) and one for the test data (test.zip 68.11 GB).
File listings and checksums in SHA1 format are available for checking archive/data integrity when downloading.
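Verifying the SHA1 checksums takes only a few lines of Python; a minimal sketch (file paths here are placeholders, not the actual listing format):

```python
import hashlib

def sha1_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA1 so large ZIP archives need not fit in memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected_hex):
    """Return True when the file's SHA1 digest matches the published checksum."""
    return sha1_of_file(path) == expected_hex.lower()
```

The digest can then be compared against the corresponding line of the published checksum file after each archive finishes downloading.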
While it would be helpful to notify SND of any publications using this data set by sending an email to request@snd.gu.se, please note that this is not required to use the data.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Explore the TCGA Whole Slide Image (WSI) SVS files available on Kaggle, offering detailed visual representations of tissue samples from various cancer types. These high-resolution images provide valuable insights into tumor morphology and tissue architecture, facilitating cancer diagnosis, prognosis, and treatment research. Delve into the rich landscape of cancer biology, leveraging the wealth of information contained within these SVS files to drive innovative advancements in oncology. This is a dataset of WSIs downloaded from the TCGA portal.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Pathology Images of Scanners and Mobilephones (PLISM) dataset was created for evaluating the robustness of AI models to domain shifts. PLISM is the first group-wise pathological image dataset that encompasses diverse tissue types stained under 13 H&E conditions and captured with multiple imaging devices, including smartphones (7 scanners and 6 smartphones). The PLISM-original subset consists of 91 original WSIs before image registration. Color and texture in digital pathology images are affected by H&E stain conditions (e.g. Harris or Carrazi) and digitization devices (e.g. slide scanners or smartphones), which cause inter-institutional domain shifts. The extension of each WSI file is .svs, .ndpi, or .tiff. See the other subsets of the PLISM dataset in the Collection at https://doi.org/10.25452/figshare.plus.c.6773925
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises 38 chemically stained whole slide image samples along with their corresponding ground truth, annotated by histopathologists for 12 classes: skin layers (Epidermis, Reticular dermis, Papillary dermis, Dermis, Keratin), skin tissues (Inflammation, Hair follicles, Glands), skin cancer (Basal cell carcinoma, Squamous cell carcinoma, Intraepidermal carcinoma) and background (BKG).
Computational histopathology has made significant strides in the past few years, slowly getting closer to clinical adoption. One area of benefit would be the automatic generation of diagnostic reports from H&E-stained whole slide images, which would further increase the efficiency of the pathologists' routine diagnostic workflows.
In this study, we compiled a dataset (PatchGastricADC22) of histopathological captions of stomach adenocarcinoma endoscopic biopsy specimens, which we extracted from diagnostic reports and paired with patches extracted from the associated whole slide images. The dataset contains a variety of gastric adenocarcinoma subtypes.
We trained a baseline attention-based model to predict the captions from features extracted from the patches and obtained promising results. We make the captioned dataset of 262K patches publicly available.
Purpose
The dataset was created to support research in medical image captioning — specifically, to automatically generate diagnostic text descriptions from histopathological image patches. It helps train and evaluate models that can interpret tissue morphology and produce human-like pathology reports.
Dataset Structure (PatchGastricADC22)
📁 Folder: patches_captions/patches_captions/ Contains all patch-level histopathology image files (in .jpg format). Each patch represents a cropped region (300×300 pixels) from a Whole Slide Image (WSI).
🧾 File: captions.csv Provides the mapping between image IDs and their corresponding diagnostic captions. Each row represents one unique image patch and its textual description.
🧩 CSV Columns:
id – Base ID identifying the parent WSI or case from which the patch was extracted.
subtype – Indicates the histological subtype (e.g., tubular adenocarcinoma, poorly differentiated).
text – Expert-written caption describing the morphological and diagnostic features visible in the patch.
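Assuming captions.csv has a header row with these three columns, the mapping can be loaded with the standard csv module; the sample rows below are invented for illustration:

```python
import csv
import io

# Invented sample rows mimicking the described columns;
# the real captions.csv ships with the dataset.
sample = """id,subtype,text
case_001,tubular adenocarcinoma,Tubular structures lined by atypical columnar cells.
case_002,poorly differentiated,Discohesive tumor cells without gland formation.
"""

with io.StringIO(sample) as f:
    captions = {row["id"]: (row["subtype"], row["text"])
                for row in csv.DictReader(f)}
```

For the real file, replace the StringIO wrapper with `open("captions.csv", newline="")`.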
Dataset Statistics
🧩 Total images (patches): ~262,777
🧪 Total WSIs (slides): 1305
🖼️ Patch size: 300 × 300 pixels
🔬 Magnification: 20×
✍️ Captions: One per patch
🔠 Vocabulary size: 344 unique words
📏 Max caption length: 47 words
⚖️ Split: 70% train / 10% validation / 20% test
Creation Process
1. Whole Slide Images (WSIs) were collected from gastric cancer pathology archives.
2. Each slide was divided into 300×300 patches (non-overlapping).
3. Expert pathologists annotated each patch with a short caption describing diagnostic features (cellular and structural morphology).
4. Data were consolidated into image files + a master captions.csv.
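The non-overlapping tiling in step 2 reduces to enumerating a coordinate grid; a sketch (slide dimensions here are illustrative, and real pipelines usually also filter out background tiles):

```python
def patch_grid(width, height, patch=300):
    """Top-left corners of non-overlapping patch x patch tiles
    that fit fully inside a width x height slide."""
    return [(x, y)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]
```

Each returned corner (x, y) defines the crop box (x, y, x + 300, y + 300); tiles that would overhang the slide edge are simply dropped.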
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Links to code and bioRxiv pre-print:
Multi-lens Neural Machine (MLNM) Code
An AI-assisted Tool For Efficient Prostate Cancer Diagnosis (bioRxiv Pre-print)
Digitized hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of 40 prostatectomy and 59 core needle biopsy specimens were collected from 99 prostate cancer patients at Tan Tock Seng Hospital, Singapore. There were 99 WSIs in total, one per specimen. H&E-stained slides were scanned at 40× magnification (specimen-level pixel size 0.25 μm × 0.25 μm) using an Aperio AT2 Slide Scanner (Leica Biosystems). Institutional board review from the hospital was obtained for this study, and all the data were de-identified.
Prostate glandular structures in core needle biopsy slides were manually annotated and classified using the ASAP annotation tool. A senior pathologist reviewed 10% of the annotations in each slide, ensuring that reference annotations were provided to the researcher at different regions of the core. Note that partial glands appearing at the edges of the biopsy cores were not annotated.
Patches of size 512 × 512 pixels were cropped from whole slide images at resolutions 5×, 10×, 20×, and 40× with an annotated gland centered at each patch. This dataset contains these cropped images.
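Centering a fixed-size patch on an annotated gland requires clamping near the slide borders; a sketch of that bookkeeping (not the authors' code, and coordinates are illustrative):

```python
def centered_box(cx, cy, width, height, size=512):
    """Return (x0, y0, x1, y1) of a size x size box centered on (cx, cy),
    shifted as needed to stay fully inside a width x height image."""
    x0 = min(max(cx - size // 2, 0), width - size)
    y0 = min(max(cy - size // 2, 0), height - size)
    return (x0, y0, x0 + size, y0 + size)
```

A gland near the slide edge gets a box that is shifted inward rather than padded, so every patch stays 512 × 512.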
This dataset was used to train two AI models, for Gland Segmentation (99 patients) and Gland Classification (46 patients). Tables 1 and 2 summarize the gland segmentation and gland classification datasets. The two corresponding sub-datasets are provided as two ZIP files:
gland_segmentation_dataset.zip
gland_classification_dataset.zip
Table 1: The number of slides and patches in training, validation, and test sets for the gland segmentation task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen.
#Slides        Train  Valid  Test  Total
Prostatectomy  17     8      15    40
Biopsy         26     13     20    59
Total          43     21     35    99
#Patches       Train   Valid  Test   Total
Prostatectomy  7795    3753   7224   18772
Biopsy         5559    4028   5981   15568
Total          13354   7781   13205  34340
Table 2: The number of slides and patches in training, validation, and test sets for the gland classification task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen. The gland classification datasets are subsets of the gland segmentation datasets. GS: Gleason Score. B: Benign. M: Malignant.
#Slides (GS 3+3:3+4:4+3)  Train   Valid  Test    Total
Biopsy                    10:9:1  3:7:0  6:10:0  19:26:1
#Patches (B:M)  Train      Valid      Test       Total
Biopsy          1557:2277  1216:1341  1543:2718  4316:6336
NB: The gland classification folder (gland_classification_dataset.zip) may contain extra patches whose labels could not be identified from the H&E slides. These were not used in the machine learning study.
The dataset consists of 99 H&E-stained whole slide skin images (WSIs): 49 abnormal and 50 normal cases. All significant abnormal findings identified are outlined and categorized into 13 types, such as actinic keratosis, basal cell carcinoma and dermatofibroma. Other tissue components, such as the epidermis and adnexal structures, as well as the surgical margin, are delineated to create a complete histological map. In total, 16,741 separate annotations have been made to segment the different tissue structures and link them to ontological information.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Lakeshprabhu Thangadurai
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Pathology Images of Scanners and Mobilephones (PLISM) dataset was created for evaluating the robustness of AI models to domain shifts. PLISM is the first group-wise pathological image dataset that encompasses diverse tissue types stained under 13 H&E conditions and captured with multiple imaging devices, including smartphones (7 scanners and 6 smartphones). In PLISM-sm, smartphone images were used as queries to create image groups for each staining condition corresponding to each tile image. The PLISM-sm subset contains a total of 57,902 images. Color and texture in digital pathology images are affected by H&E stain conditions (e.g. Harris or Carrazi) and digitization devices (e.g. slide scanners or smartphones), which cause inter-institutional domain shifts. Please see the files 'stain_condition.png' and 'counterpart.png' for the H&E staining conditions and devices used.

This tar.gz file contains a collection of files labeled via the following naming convention: (stain_name)/(device_name)/(top_left_x)_(top_left_y)_(right_lower_x)_(right_lower_y).png

The CSV file included with this dataset contains the following information:
Tissue Type: The specific type of human tissue represented in the image, chosen from among 46 possible tissue types.
Stain Type: The specific staining condition applied to the image, chosen from among 13 possible conditions.
Device Type: The specific type of imaging device used to capture the image, chosen from among 13 possible device types.
Coordinate: The xy coordinates of the top left and bottom right corners of each image (e.g., 1000_500_0_0).
Image Path: The relative path to each image.

See the whole slide images (WSIs) subset of the PLISM dataset in the Collection at https://doi.org/10.25452/figshare.plus.c.6773925
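File paths following the stated naming convention can be decoded mechanically; a sketch (the stain and device names in the example are hypothetical):

```python
def parse_plism_path(path):
    """Split '(stain)/(device)/(x0)_(y0)_(x1)_(y1).png' into its fields."""
    stain, device, name = path.split("/")
    x0, y0, x1, y1 = (int(v) for v in name.rsplit(".", 1)[0].split("_"))
    return {"stain": stain, "device": device, "box": (x0, y0, x1, y1)}
```

This recovers the same fields the accompanying CSV lists, directly from a relative image path.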
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
An anonymous whole slide image in Philips iSyntax format for running software tests on OpenPhi - Open Pathology Interface (https://zenodo.org/record/4680748#.YNnBxDqxXJU). See the repository (https://gitlab.com/BioimageInformaticsGroup/openphi/) for up-to-date information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mouse duodenum was fixed in 4% PFA overnight at 4°C, processed for paraffin infiltration using a standard histology procedure, cut at 4 microns, dewaxed, rehydrated, permeabilized with 0.5% Triton X-100 in 1x PBS, and stained with Azide - Alexa Fluor 555 (Thermo Fisher) to detect EdU and with DAPI for nuclei. The images were taken using a Leica DM5500 microscope with a 40X N.A. 1 objective (black-and-white camera: DFC350FXR2, pixel dimension: 0.161 microns).

Next, the slide was unmounted and stained using the fully automated Ventana Discovery xT autostainer (Roche Diagnostics, Rotkreuz, Switzerland). All steps were performed on the autostainer with Ventana solutions. Sections were pretreated with heat using the CC1 solution under mild conditions. The primary rat anti-BrdU antibody (clone BU1/75 (ICR1), Serotec, diluted 1:300) was incubated for 1 hour at 37°C. After incubation with a donkey anti-rat biotin antibody diluted 1:200 (Jackson ImmunoResearch Laboratories), chromogenic revelation was performed with the DABMap kit. The section was counterstained with Harris hematoxylin (J.T. Baker) before a second round of imaging on the DM5500 with a PL Fluotar 40X N.A. 1.0 oil objective (color camera: DFC 320 R2, pixel dimension: 0.1725 microns). Before acquisition, white balance and shading correction were performed according to the Leica LAS software wizard. The fluorescence and DAB images were converted to multiresolution ome.tiff files with the Kheops Fiji plugin.
Samples were prepared in the EPFL histology core facility by Nathalie Müller and Gian-Filippo Mancini.
Associated documents:
https://c4science.ch/w/bioimaging_and_optics_platform_biop/teaching/dab-intensity/
https://imagej.net/plugins/bdv/warpy/warpy
This record includes a full QuPath project with an example of a registered image.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a representative sample from the dataset that was used to develop resolution-agnostic convolutional neural networks for tissue segmentation in whole-slide histopathology images.
The dataset is composed of two parts: development set and dissimilar set.
Sample images from the development set:
breast_hne_00.tif
breast_lymph_node_hne_00.tif
tongue_ae1ae3_00.tif
tongue_hne_00.tif
tongue_ki67_00.tif
Sample images from the dissimilar set:
brain_alcianblue_00.tif
cornea_grocott_00.tif
kidney_cab_00.tif
skin_perls_00.tif
uterus_vonkossa_00.tif
Links to code and Patterns paper:
Multi-lens Neural Machine (MLNM) Code
An AI-assisted Tool For Efficient Prostate Cancer Diagnosis in Low-grade and Low-volume Cases
Digitized hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of 40 prostatectomy and 59 core needle biopsy specimens were collected from 99 prostate cancer patients at Tan Tock Seng Hospital, Singapore. There were 99 WSIs in total, one per specimen. H&E-stained slides were scanned at 40× magnification (specimen-level pixel size 0.25 μm × 0.25 μm) using an Aperio AT2 Slide Scanner (Leica Biosystems). Institutional board review from the hospital was obtained for this study, and all the data were de-identified.
Prostate glandular structures in core needle biopsy slides were manually annotated and classified using the ASAP annotation tool. A senior pathologist reviewed 10% of the annotations in each slide, ensuring that reference annotations were provided to the researcher at different regions of the core. Note that partial glands appearing at the edges of the biopsy cores were not annotated.
Whole Slide Image Dataset
A Whole Slide Image dataset containing 99 images in SVS format, with corresponding annotations in XML format, is provided in WSI.zip. Available patient grading for the WSIs is provided in gleason_score_mapped.txt. These XML annotations can be parsed using the code in the official repository.
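ASAP typically stores annotations as XML with Annotation elements holding Coordinate children; assuming that layout, a minimal parsing sketch with xml.etree (the inline XML is a fabricated two-point example, not taken from this dataset):

```python
import xml.etree.ElementTree as ET

# Fabricated miniature ASAP-style annotation file for illustration.
xml_text = """<ASAP_Annotations>
  <Annotations>
    <Annotation Name="gland_1" PartOfGroup="benign">
      <Coordinates>
        <Coordinate Order="0" X="100.0" Y="200.0"/>
        <Coordinate Order="1" X="150.0" Y="250.0"/>
      </Coordinates>
    </Annotation>
  </Annotations>
</ASAP_Annotations>"""

root = ET.fromstring(xml_text)
# Map each annotation name to its polygon vertices in slide coordinates.
glands = {
    ann.get("Name"): [(float(c.get("X")), float(c.get("Y")))
                      for c in ann.iter("Coordinate")]
    for ann in root.iter("Annotation")
}
```

For the real files, swap `ET.fromstring(xml_text)` for `ET.parse(path).getroot()`; the official repository's parser remains the authoritative reference.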
Cropped Image Dataset
Patches of size 512 × 512 pixels were cropped from the WSIs (Whole Slide Image Dataset) at resolutions 5×, 10×, 20×, and 40×, with an annotated gland centered at each patch. This dataset contains these cropped images.
This dataset was used to train two AI models, for Gland Segmentation (99 patients) and Gland Classification (46 patients). Tables 1 and 2 summarize the gland segmentation and gland classification datasets. The two corresponding sub-datasets are provided as two ZIP files:
gland_segmentation_dataset.zip
gland_classification_dataset.zip
Table 1: The number of slides and patches in training, validation, and test sets for the gland segmentation task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen.
#Slides        Train  Valid  Test  Total
Prostatectomy  17     8      15    40
Biopsy         26     13     20    59
Total          43     21     35    99
#Patches       Train   Valid  Test   Total
Prostatectomy  7795    3753   7224   18772
Biopsy         5559    4028   5981   15568
Total          13354   7781   13205  34340
Table 2: The number of slides and patches in training, validation, and test sets for the gland classification task. There is one H&E-stained WSI for each prostatectomy or core needle biopsy specimen. The gland classification datasets are subsets of the gland segmentation datasets. GS: Gleason Score. B: Benign. M: Malignant.
#Slides (GS 3+3:3+4:4+3)  Train   Valid  Test    Total
Biopsy                    10:9:1  3:7:0  6:10:0  19:26:1
#Patches (B:M)  Train      Valid      Test       Total
Biopsy          1557:2277  1216:1341  1543:2718  4316:6336
NB: The gland classification folder (gland_classification_dataset.zip) may contain extra patches whose labels could not be identified from the H&E slides. These were not used in the machine learning study.
https://www.mordorintelligence.com/privacy-policy
The Whole Slide Imaging Market Report is Segmented by Component (Hardware, Software), Scanner Type (Brightfield Scanners, Fluorescence Scanners, and More), Application (Telepathology, Cytopathology, and More), End User (Hospitals & Clinical Laboratories, Academic & Research Institutes, and More), and Geography (North America, Europe, and More). The Market Forecasts are Provided in Terms of Value (USD), Based On Availability.
The data set consists of 81 registered whole slide image pairs; each pair comprises an unstained and an H&E-stained image of the same tissue sample. In addition, it contains a tissue mask for each whole slide image pair. The samples are used for studying the histological feasibility of AI-driven virtual histopathology staining.
Imaging was performed using a Thunder Imager 3D Tissue slide scanner (Leica Microsystems, Wetzlar, Germany) equipped with a DMC2900 camera and an HC PL APO 40x/0.95 DRY objective, with an isotropic pixel resolution of 0.353 µm.
The TCGA-UT dataset is a large-scale collection of histopathological image patches from human cancer tissues. It contains 1,608,060 image patches extracted from hematoxylin & eosin (H&E) stained histological samples across 32 different types of solid cancers.
Files are organized using the following format:
[cancer_type]/[resolution]/[TCGA Barcode]/[region]-[number]-[pixel resolution].jpg

If you use this dataset in your research, please cite:
Komura, D., et al. (2022). Universal encoding of pan-cancer histology by deep texture representations.
Cell Reports 38, 110424. https://doi.org/10.1016/j.celrep.2022.110424

If you're interested in using this dataset for benchmarking foundation models or feature extractors, we recommend accessing the dataset through the Hugging Face Hub at dakomura/tcga-ut. The Hugging Face version provides:
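Given that path layout, per-image metadata can be recovered by splitting the path; a sketch (the example values in the test are invented, and real region names must not contain hyphens for this simple split to hold):

```python
def parse_tcga_ut_path(path):
    """Decode '[cancer_type]/[resolution]/[barcode]/[region]-[number]-[mpp].jpg'."""
    cancer_type, resolution, barcode, fname = path.split("/")
    region, number, mpp = fname.rsplit(".", 1)[0].split("-")
    return {"cancer_type": cancer_type, "resolution": resolution,
            "barcode": barcode, "region": region,
            "number": int(number), "mpp": mpp}
```

Grouping the returned dictionaries by cancer_type or barcode gives a quick inventory of the 32 cancer types without opening a single image.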
https://dataintelo.com/privacy-and-policy
According to our latest research, the global whole slide image analysis market size reached USD 1.18 billion in 2024, reflecting the sector’s robust expansion fueled by technological advancements and rising adoption in healthcare. With a compound annual growth rate (CAGR) of 13.7% from 2025 to 2033, the market is projected to attain a value of USD 3.66 billion by 2033. The primary growth factor driving this market is the increasing integration of artificial intelligence and machine learning in digital pathology, which is significantly enhancing diagnostic accuracy and operational efficiency.
One of the most significant growth drivers for the whole slide image analysis market is the accelerating adoption of digital pathology across hospitals, diagnostic laboratories, and research institutions. The shift from traditional microscopy to digital platforms is enabling pathologists to analyze high-resolution images with greater precision and speed. This transformation is particularly critical in cancer diagnostics, where early and accurate detection can directly impact patient outcomes. Furthermore, the growing prevalence of chronic diseases and the rising demand for personalized medicine are compelling healthcare providers to invest in advanced image analysis solutions. The ability to store, retrieve, and share digital slides seamlessly is also facilitating collaborative research and remote consultations, which is especially valuable in regions with limited access to subspecialty expertise.
Another key factor propelling market growth is the ongoing advancements in artificial intelligence and machine learning algorithms. These technologies are revolutionizing the way whole slide images are processed and interpreted, allowing for the automation of complex tasks such as tumor detection, grading, and quantification. AI-powered image analysis not only reduces human error but also enhances throughput, making it possible to handle large volumes of slides efficiently. Pharmaceutical and biotechnology companies are leveraging these capabilities for drug discovery and development, as automated image analysis accelerates the identification of biomarkers and the assessment of therapeutic efficacy. The continuous improvement in computational power and the growing availability of annotated datasets are expected to further drive innovation in this space.
Moreover, the increasing focus on workflow optimization and cost reduction is encouraging healthcare facilities to adopt whole slide image analysis solutions. Digital platforms offer significant advantages over conventional methods, including reduced storage space, lower risk of specimen loss, and streamlined case management. The integration of these solutions with laboratory information systems (LIS) and electronic health records (EHR) is facilitating end-to-end digital workflows, from slide scanning to diagnosis and reporting. Additionally, regulatory approvals and standardization efforts are paving the way for broader implementation, particularly in regions where digital pathology is still in its nascent stage. The market is also witnessing growing investments from public and private sectors to modernize healthcare infrastructure and expand access to advanced diagnostic tools.
From a regional perspective, North America currently dominates the whole slide image analysis market, accounting for the largest share in 2024. This leadership is attributed to the presence of a well-established healthcare infrastructure, high adoption of digital pathology solutions, and strong investments in research and development. Europe follows closely, driven by supportive regulatory frameworks and increasing collaborations between academic institutions and industry players. The Asia Pacific region is emerging as a lucrative market, with countries such as China, Japan, and India witnessing rapid digital transformation in healthcare. Latin America and the Middle East & Africa are also experiencing steady growth, supported by government initiatives to improve diagnostic capabilities and address the rising burden of chronic diseases.
The whole slide image analysis market, when segmented by product type, primarily includes software and services. Software solutions are at the forefront of this segment, accounting for the majority of the market share in 2024. These platforms are designed to facilitate the acquisition, storage, management, and analysis of high-resolution whole sli
According to our latest research, the global Whole Slide Imaging System market size reached USD 1.12 billion in 2024, demonstrating robust expansion driven by advancements in digital pathology and increasing adoption of telemedicine solutions. The market is expected to grow at a CAGR of 14.7% from 2025 to 2033, with the forecasted market size projected to reach USD 3.45 billion by 2033. This substantial growth is fueled by technological innovations, rising prevalence of chronic diseases, and the increasing demand for efficient diagnostic solutions across healthcare settings. As per our latest research, the Whole Slide Imaging System market continues to transform the landscape of pathology and diagnostics globally.
One of the primary growth factors propelling the Whole Slide Imaging System market is the rapid digital transformation occurring within pathology departments worldwide. The migration from traditional glass slides to digital slide scanning and analysis has proven to significantly enhance workflow efficiency, accuracy, and collaboration among pathologists. The integration of artificial intelligence (AI) and machine learning algorithms with whole slide imaging systems has further improved diagnostic precision, enabling faster and more reliable detection of complex diseases such as cancer. Additionally, the growing need for remote consultations and second opinions has accelerated the adoption of telepathology, allowing experts to review and interpret slides from any location. This paradigm shift is not only streamlining diagnostic processes but also addressing the shortage of skilled pathologists, particularly in remote and underserved regions.
Another key driver of market growth is the rising incidence of chronic and infectious diseases, which has led to an increased volume of biopsies and histopathological examinations. Whole Slide Imaging Systems provide a scalable solution to manage this growing workload, offering high-throughput scanning capabilities and advanced image management tools. Hospitals, diagnostic laboratories, and academic research institutes are investing heavily in these systems to improve turnaround times and enhance patient outcomes. Furthermore, regulatory approvals and standardizations from organizations such as the FDA and CE have bolstered confidence in the reliability and clinical utility of digital pathology, encouraging further adoption across both developed and emerging markets. The ongoing push towards value-based healthcare and the emphasis on precision medicine are also contributing to the sustained demand for Whole Slide Imaging Systems.
The expanding application of Whole Slide Imaging Systems in education and research represents another significant growth avenue for the market. Medical schools and research institutions are leveraging digital slides for teaching, training, and collaborative studies, enabling students and researchers to access high-resolution images anytime and anywhere. This digital approach not only enhances the learning experience but also facilitates global collaboration on research projects and clinical trials. The ability to store, share, and annotate digital slides has accelerated the pace of scientific discovery and innovation in the field of pathology. As funding for biomedical research continues to increase, especially in oncology and rare diseases, the demand for advanced imaging solutions is expected to rise correspondingly.
From a regional perspective, North America currently dominates the Whole Slide Imaging System market, accounting for the largest share in 2024 due to its advanced healthcare infrastructure, high adoption of digital technologies, and strong presence of leading market players. Europe follows closely, supported by favorable government initiatives and a growing focus on precision diagnostics. Meanwhile, the Asia Pacific region is poised for the fastest growth during the forecast period, driven by increasing healthcare investments, rising awareness about digital pathology, and a burgeoning patient population. Latin America and the Middle East & Africa are also witnessing gradual adoption, albeit at a slower pace, as healthcare systems in these regions continue to modernize and embrace digital solutions.
The emergence of Whole Slide Scanner technology has been a
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains UNI patch embeddings derived from the SurGen cohort's whole slide images (WSIs), focused on colorectal cancer cases. Each WSI was processed into 224x224 pixel tissue patches, extracted at a scale of 1.0 microns per pixel (MPP). A 1024-dimensional embedding was computed for each patch using the UNI foundation model[1]. This dataset allows for rapid downstream analysis of tasks such as biomarker prediction, survival analysis, tumour grading, and prognostic modelling. The SurGen dataset, comprising both primary colorectal and metastatic cases, offers a valuable resource for computational pathology research.
Each Zarr file within the dataset contains an array of patch-level features and a corresponding array of coordinates, enabling the retrieval of specific feature locations as needed.
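Once a Zarr file's feature and coordinate arrays are loaded, retrieving the embedding for a patch at a given location is an index lookup; a numpy sketch with toy arrays standing in for the Zarr contents (shapes assumed from the description above):

```python
import numpy as np

# Toy stand-ins: 4 patches with 1024-d embeddings and (x, y) patch coordinates,
# mimicking the per-patch features and coords arrays described for each Zarr file.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 1024))
coords = np.array([[0, 0], [224, 0], [0, 224], [224, 224]])

def feature_at(x, y):
    """Return the embedding row whose stored coordinate equals (x, y)."""
    idx = np.flatnonzero((coords[:, 0] == x) & (coords[:, 1] == y))
    if idx.size == 0:
        raise KeyError((x, y))
    return features[idx[0]]
```

With the real data, the two arrays would instead be read from the Zarr file (e.g. via the zarr package) before applying the same lookup.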
Embeddings are provided in a zip archive and intended for reuse in research focused on digital pathology, tumour genomics, and oncology. For cohort ground truth labels please see the link below.
Access the original dataset: https://doi.org/10.6019/S-BIAD1285
GitHub for more info: https://github.com/CraigMyles/SurGen-Dataset
[1] Chen, R.J., Ding, T., Lu, M.Y., Williamson, D.F.K., et al. Towards a general-purpose foundation model for computational pathology. Nat Med (2024). https://doi.org/10.1038/s41591-024-02857-3