CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
CAMELYON16 challenge dataset. The goal of the CAMELYON16 challenge is to evaluate new and existing algorithms for automated detection of metastases in hematoxylin and eosin (H&E) stained whole-slide images (WSIs) of lymph node sections. The dataset contains 270 WSIs for training (159 normal slides and 111 slides with tumor) and 129 WSIs for testing. It is a slightly updated version of the one available on GigaScience. The changes are: (1) the test_114.tiff WSI was exhaustively annotated; (2) generated mask files were added for each WSI, with value 1 for normal tissue and 2 for tumor areas in the corresponding WSI.
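As a rough illustration of the mask encoding described above (1 for normal tissue, 2 for tumor), the sketch below computes the tumor fraction of a small in-memory mask. The function name, the toy mask, and the use of 0 for background are assumptions made for the example; they are not taken from the dataset documentation, and real masks would first have to be read with a WSI-capable reader.

```python
from collections import Counter

# Values 1 and 2 follow the dataset description; 0 as background is an assumption.
BACKGROUND, NORMAL, TUMOR = 0, 1, 2


def tumor_fraction(mask):
    """Return the share of tissue pixels (normal + tumor) that are labeled tumor."""
    counts = Counter(value for row in mask for value in row)
    tissue = counts[NORMAL] + counts[TUMOR]
    return counts[TUMOR] / tissue if tissue else 0.0


# Hypothetical 3x4 mask standing in for a (much larger) generated mask file.
toy_mask = [
    [0, 1, 1, 0],
    [1, 2, 2, 1],
    [0, 1, 2, 0],
]

print(tumor_fraction(toy_mask))
```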
https://choosealicense.com/licenses/other/
Dataset Card for Camelyon16-features
Dataset Summary
The Camelyon16 dataset is a widely used benchmark for cancer classification.
The dataset we've uploaded here is the result of features extracted from the Camelyon16 dataset using the Phikon model, which is also openly available on Hugging Face.
Dataset Creation
Initial Data Collection and Normalization
The initial collection of the Camelyon16 Whole Slide Images… See the full description on the dataset page: https://huggingface.co/datasets/owkin/camelyon16-features.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Detection performance comparison with Camelyon16.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
CAMELYON16 - Multiple Instance Learning (MIL)
Important. This dataset is part of the torchmil library. This repository provides an adapted version of the CAMELYON16 dataset tailored for Multiple Instance Learning (MIL). It is designed for use with the CAMELYON16Dataset class from the torchmil library. CAMELYON16 is a widely used benchmark in MIL research, making this adaptation particularly valuable for developing and evaluating MIL models.
About the Original CAMELYON16… See the full description on the dataset page: https://huggingface.co/datasets/torchmil/Camelyon16_MIL.
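For readers new to Multiple Instance Learning, the following minimal sketch shows the basic idea: a slide is treated as a bag of patch-level scores, and the bag label is derived from its instances. It uses plain max pooling and hypothetical scores. It is not the torchmil API and does not use the CAMELYON16Dataset class; both the function name and the threshold are assumptions for illustration.

```python
def bag_prediction(instance_scores, threshold=0.5):
    """Max-pooling MIL: a bag (slide) is positive if any instance (patch) is.

    Returns the bag score and the resulting binary decision.
    """
    bag_score = max(instance_scores) if instance_scores else 0.0
    return bag_score, bag_score >= threshold


# Hypothetical patch-level tumor scores for two slides.
normal_slide = [0.02, 0.10, 0.07]
tumor_slide = [0.03, 0.91, 0.12]

print(bag_prediction(normal_slide))
print(bag_prediction(tumor_slide))
```

Max pooling is only one aggregation choice; attention-based pooling is a common alternative in MIL research.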
https://creativecommons.org/publicdomain/zero/1.0/
CAMELYON16 contains 270 WSIs for training and 129 WSIs for testing. This dataset is only a small part of the whole CAMELYON16. Please check the following links for the other parts.
@buttermint has uploaded the test set of CAMELYON16: 1-20 21-40 41-60 61-80 81-100 101-130
p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12
The authors of CAMELYON16 manually annotated the cancer regions in high quality. The order of the slides in the normal part is somewhat disordered. All of this information is included in this dataset.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The Camelyon+ dataset is accessible through ScienceDB. The original WSI data is available from the official Camelyon-16 and Camelyon-17 websites, so it has not been uploaded to the database. Slide-level labels are included in XLSX files. We provide corrected versions of the Camelyon-16 and Camelyon-17 datasets, as well as a combined version, Camelyon+, with a four-class labeling (negative, micro, macro, ITC) and a two-class labeling (negative, tumor) to support different downstream tasks.
To ensure unbiased data correction by pathologists, the slides of the original Camelyon-16 training set, originally named with "tumor" or "normal" plus an ID, have been renamed. The mapping to the original names is recorded and shared in an XLSX file. For positive WSIs, pixel-level annotations are provided in XML format.
To enable future comparative experiments with various feature extractors on the Camelyon+ dataset, feature files extracted at 20X magnification using ResNet-50, ViT-S, PLIP, CONCH, UNI, and Gigapath are also available. These feature files are provided in PT format for easy use.
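The relationship between the two label sets can be sketched as a simple mapping. The description does not state how each of the four classes maps to the two-class scheme; the sketch below assumes that every non-negative class (micro, macro, ITC) counts as "tumor", which should be checked against the shipped XLSX files. The function name is hypothetical.

```python
FOUR_CLASS_LABELS = ("negative", "micro", "macro", "ITC")


def to_two_class(label):
    """Collapse a four-class slide label to negative/tumor.

    Assumption: micro, macro, and ITC are all treated as tumor.
    """
    if label not in FOUR_CLASS_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return "negative" if label == "negative" else "tumor"


# Hypothetical slide-level labels, as they might be read from a label sheet.
slide_labels = ["negative", "macro", "ITC", "micro"]
print([to_two_class(label) for label in slide_labels])
```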
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for Histopathology Dataset
Dataset Summary
This dataset contains 1024x1024 patches of a group of histopathology images taken from the CAMELYON16 dataset and embedding vectors extracted from these patches using the Google Path Foundation model.
Thumbnail of Main Slide
Usage
CAMELYON16: List of images taken from CAMELYON16 dataset: test_001.tif test_002.tif test_003.tif test_004.tif test_005.tif test_006.tif test_007.tif… See the full description on the dataset page: https://huggingface.co/datasets/Cilem/histopathology-1024.
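To make the notion of fixed-size patches concrete, here is a minimal sketch that enumerates the top-left corners of 1024x1024 tiles over a slide of a given size. The dataset card does not say how patches were laid out, so the non-overlapping grid and the dropping of partial edge tiles are assumptions, as are the function name and the example dimensions.

```python
def patch_origins(width, height, patch=1024):
    """Yield (x, y) top-left corners of non-overlapping patches.

    Only patches that fit entirely inside the image are produced;
    partial tiles at the right and bottom edges are skipped.
    """
    for y in range(0, height - patch + 1, patch):
        for x in range(0, width - patch + 1, patch):
            yield (x, y)


# Hypothetical slide region of 2048 x 3000 pixels.
origins = list(patch_origins(2048, 3000))
print(origins)
```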
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by ForcewithMe
Released under Apache 2.0
Dataset Card for Mixed Histopathology Dataset
Dataset Summary
This dataset contains 512x512 patches of a group of histopathology images taken from the CAMELYON16, CANCER IMAGING ARCHIVE-KIDNEY, CANCER IMAGING ARCHIVE-COLON, and CANCER IMAGING ARCHIVE-LUNG datasets, together with embedding vectors extracted from these patches using the Google Path Foundation model.
Thumbnail of Main Slide
Usage
CAMELYON16: List of images taken from CAMELYON16 dataset:… See the full description on the dataset page: https://huggingface.co/datasets/Cilem/mixed-histopathology-512.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Recent advancements in deep learning have shown promise in enhancing the performance of medical image analysis. In pathology, automated whole slide imaging has transformed clinical workflows by streamlining routine tasks and supporting diagnosis and prognosis. However, the lack of transparency of deep learning models, often described as black boxes, poses a significant barrier to their clinical adoption. This study evaluates various explainability methods for Vision Transformers, assessing how well they explain the rationale behind classification predictions on histopathological images. Using a Vision Transformer trained on the publicly available CAMELYON16 dataset, comprising 399 whole slide images of lymph node metastases from patients with breast cancer, we conducted a comparative analysis of a diverse range of state-of-the-art techniques for generating explanations through heatmaps, including Attention Rollout, Integrated Gradients, RISE, and ViT-Shapley. Our findings reveal that Attention Rollout and Integrated Gradients are prone to artifacts, while RISE and particularly ViT-Shapley generate more reliable and interpretable heatmaps. ViT-Shapley also demonstrated faster runtime and superior performance in insertion and deletion metrics. These results suggest that integrating ViT-Shapley-based heatmaps into pathology reports could enhance trust and scalability in clinical workflows, facilitating the adoption of explainable artificial intelligence in pathology.
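The deletion metric mentioned above can be illustrated with a generic sketch: features are removed in order of decreasing attribution, the model score is recorded after each removal, and the area under the resulting curve is computed (lower is better for deletion). This is the commonly used form of the metric, not the study's implementation; the toy model, feature values, attribution scores, and function names are all assumptions.

```python
def deletion_curve(model, features, attributions, baseline=0.0):
    """Record model scores while removing features from most to least important."""
    order = sorted(range(len(features)), key=lambda i: attributions[i], reverse=True)
    current = list(features)
    scores = [model(current)]
    for i in order:
        current[i] = baseline  # "delete" the feature by replacing it with a baseline
        scores.append(model(current))
    return scores


def area_under_curve(scores):
    """Trapezoidal area with the x-axis normalized to [0, 1]."""
    steps = len(scores) - 1
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(steps)) / steps


def toy_model(x):
    """Stand-in classifier: the score is the mean of the inputs."""
    return sum(x) / len(x)


features = [1.0, 0.0, 1.0, 0.0]
faithful = [0.9, 0.1, 0.8, 0.2]    # ranks the truly influential features first
unfaithful = [0.1, 0.9, 0.2, 0.8]  # ranks irrelevant features first

auc_faithful = area_under_curve(deletion_curve(toy_model, features, faithful))
auc_unfaithful = area_under_curve(deletion_curve(toy_model, features, unfaithful))
print(auc_faithful, auc_unfaithful)
```

A faithful attribution makes the score drop quickly, giving a smaller area. The insertion metric is the mirror image: features are added to a baseline input in the same order, and a larger area is better.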
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The classifier detection performance.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
The presence of lymph node metastases is one of the most important factors in breast cancer prognosis. The most common strategy to assess the regional lymph node status is the sentinel lymph node procedure. The sentinel lymph node is the most likely lymph node to contain metastasized cancer cells and is excised, histopathologically processed and examined by the pathologist. This tedious examination process is time-consuming and can lead to small metastases being missed. However, recent advances in whole-slide imaging and machine learning have opened an avenue for analysis of digitized lymph node sections with computer algorithms. For example, convolutional neural networks, a type of machine learning algorithm, are able to automatically detect cancer metastases in lymph nodes with high accuracy. To train machine learning models, large, well-curated datasets are needed. We released a dataset of 1399 annotated whole-slide images of lymph nodes, both with and without metastases, in total three terabytes of data in the context of the CAMELYON16 and CAMELYON17 Grand Challenges. Slides were collected from five different medical centers to cover a broad range of image appearance and staining variations. Each whole-slide image has a slide-level label indicating whether it contains no metastases, macro-metastases, micro-metastases or isolated tumor cells. Furthermore, for 209 whole-slide images, detailed hand-drawn contours for all metastases are provided. Last, open-source software tools to visualize and interact with the data have been made available. A unique dataset of annotated, whole-slide digital histopathology images has been provided with high potential for re-use.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a prototype implementation of a mechanism for linking provenance information and its metadata, also called provenance of provenance or meta-provenance. This dataset is an RO-Crate that bundles artifacts of an AI-based computational pipeline. The resulting RO-Crate contains (directly or by a reference) artifacts of the pipeline execution, such as input dataset, intermediate and final results, configuration files, pipeline implementation, log files, or provenance files. The RO-Crate is based on the CPM RO-Crate profile, which integrates the Common Provenance Model (CPM) and Process Run Crate profile. The description of the AI pipeline and an explanation of how the CPM RO-Crate profile is applied to bundle the pipeline execution artifacts is provided in our previous work.
As this dataset aims to demonstrate the mechanism for linking provenance and meta-provenance, the input dataset used for the AI model training and testing is reduced only to a few images, as the size of the input dataset does not affect the mechanism. The images used in the input are from the Camelyon16 dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The pixel-level detection performance on different sampling algorithms with DMC classifier.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background: Metastatic detection in sentinel lymph nodes remains a crucial prognostic factor in breast cancer management, with accurate and timely diagnosis directly impacting treatment decisions. While traditional histopathological assessment relies on microscopic examination of stained tissues, the digitization of slides as whole-slide images (WSIs) has enabled the development of computer-aided diagnostic systems. These automated approaches offer potential improvements in detection consistency and efficiency compared to conventional methods.
Results: This study leverages transfer learning on hematoxylin and eosin (H&E) WSIs to achieve computationally efficient metastasis detection without compromising accuracy. We propose an approach for generating segmentation masks by transferring spatial annotations from immunohistochemistry (IHC) WSIs to corresponding H&E slides. Using these masks, four distinct datasets were constructed to fine-tune a pretrained ResNet50 model across eight different configurations, incorporating varied dataset combinations and data augmentation techniques. To enhance interpretability, we developed a visualization tool that employs color-coded probability maps to highlight tumor regions alongside their prediction confidence. Our experiments demonstrated that integrating an external dataset (Camelyon16) during training significantly improved model performance, surpassing the benefits of data augmentation alone. The optimal model, trained on both external and local data, achieved an accuracy and F1-score of 0.98, outperforming existing state-of-the-art methods.
Conclusions: This study demonstrates that transfer learning architectures, when enhanced with multi-source data integration and interpretability frameworks, can significantly improve metastatic detection in whole slide imaging. Our methodology achieves diagnostic performance comparable to gold-standard techniques while dramatically accelerating analytical workflows. The synergistic combination of external dataset incorporation and probabilistic visualization outputs provides a clinically actionable solution that maintains both computational efficiency and pathological interpretability.
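For reference, accuracy and F1-score as reported above follow from the standard confusion-matrix definitions. The sketch below computes both from binary counts; the counts are made-up toy values and do not reproduce the study's data, and the function names are assumptions.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy counts for illustration only (not taken from the study).
tp, fp, fn, tn = 8, 2, 2, 8
print(accuracy(tp, fp, fn, tn), f1_score(tp, fp, fn))
```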