The PatchCamelyon benchmark is a new and challenging image classification dataset. It consists of 327,680 color images (96 x 96 px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating the presence of metastatic tissue. PCam provides a new benchmark for machine learning models: bigger than CIFAR-10, smaller than ImageNet, and trainable on a single GPU.
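For a rough sense of scale, the sketch below computes the uncompressed size of PCam if every patch is held as a raw uint8 RGB array. The split sizes used (262,144 train, 32,768 validation, 32,768 test) are the standard PCam splits, which are not listed above. The 1-byte-per-channel layout is an assumption; the actual downloads are compressed, so their size on disk differs.

```python
# Back-of-the-envelope size of PCam held as raw uint8 RGB arrays.
# Split sizes are the standard PCam train/validation/test counts.
SPLITS = {"train": 262_144, "validation": 32_768, "test": 32_768}
HEIGHT, WIDTH, CHANNELS = 96, 96, 3  # 96 x 96 px color patches


def raw_bytes(num_images: int) -> int:
    """Uncompressed size in bytes, assuming 1 byte per channel (uint8)."""
    return num_images * HEIGHT * WIDTH * CHANNELS


total_images = sum(SPLITS.values())
total_bytes = raw_bytes(total_images)

print(f"{total_images:,} images")
print(f"{total_bytes:,} bytes = {total_bytes / 2**30:.4f} GiB")
```

The splits add up to the 327,680 images quoted above, about 8.44 GiB of raw pixels in total; the training split alone is 6.75 GiB.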
To use this dataset:
```python
import tensorflow_datasets as tfds

ds = tfds.load('patch_camelyon', split='train')  # tf.data.Dataset of feature dicts
for ex in ds.take(4):  # first four examples
    print(ex)
```
See the guide for more information on tensorflow_datasets.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset is a subset of the Camelyon-17 Breast Cancer Challenge. It contains 224x224 H&E histological image patches where blood has been detected. It was originally sampled to validate the blood detection capabilities of the method presented in [1]. Blood was manually identified by a trained technician.
If you use this dataset, please cite:
Pérez-Bueno, F., Engan, K., Molina, R. (2024). Robust blind color deconvolution and blood detection on histological images using Bayesian K-SVD. In: Journal of Artificial Intelligence in Medicine. https://doi.org/10.1016/j.artmed.2024.102969
Pérez-Bueno, F., Engan, K., Molina, R. (2023). A Robust BKSVD Method for Blind Color Deconvolution and Blood Detection on H&E Histological Images. In: Artificial Intelligence in Medicine. AIME 2023, vol 13897. https://doi.org/10.1007/978-3-031-34344-5_25
and the original publication for the Camelyon-17 Challenge (see details on the challenge website)
The folder structure is as follows:
center/image_id/pathology_label/patch_label/
pathology_label can take the following values:
patch_label can take the following values:
Patches are sampled at the maximum available resolution (40x), and each filename includes the starting pixel in the x and y dimensions. For the original high-quality .tiff images, please refer to the Camelyon-17 Challenge.
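The folder layout above can be parsed mechanically. The sketch below splits a patch path into its four directory fields. The label values and the exact filename scheme are not spelled out here, so the example values are hypothetical, and reading the last two integers in the file stem as the starting x and y pixel is an assumption to check against the actual filenames.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional, Tuple


class PatchInfo(NamedTuple):
    center: str
    image_id: str
    pathology_label: str
    patch_label: str
    xy: Optional[Tuple[int, int]]  # starting pixel (x, y), if parseable


def parse_patch_path(path: str) -> PatchInfo:
    """Split .../center/image_id/pathology_label/patch_label/<file> into fields."""
    p = PurePosixPath(path)
    if len(p.parts) < 5:
        raise ValueError(f"expected at least 5 path components, got {path!r}")
    center, image_id, pathology_label, patch_label = p.parts[-5:-1]
    # Assumption: the last two integers in the file stem are the x and y start pixel.
    numbers = re.findall(r"\d+", p.stem)
    xy = (int(numbers[-2]), int(numbers[-1])) if len(numbers) >= 2 else None
    return PatchInfo(center, image_id, pathology_label, patch_label, xy)


# Hypothetical path; real directory and file names may differ.
info = parse_patch_path("center_0/patient_004_node_4/tumor/blood/patch_1024_2048.png")
print(info)
```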
The license for this dataset is CC0 following the Camelyon-17 license.
The dataset consists of 400 whole-slide images (WSIs) of lymph node sections stained with hematoxylin and eosin (H&E), collected from two medical centers in the Netherlands. The WSIs are stored in a multi-resolution pyramid format, allowing for efficient retrieval of image subregions at different magnification levels. The training set includes two subsets:
- 170 WSIs (100 normal, 70 with metastases) from Radboud University Medical Center
- 100 WSIs (60 normal, 40 with metastases) from University Medical Center Utrecht
The test set consists of 130 WSIs from both institutions. Ground truth data for metastases is provided as XML files with annotated contours and WSI binary masks.
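Because the WSIs are stored as multi-resolution pyramids, a common chore is converting coordinates between pyramid levels. In OpenSlide, `read_region(location, level, size)` takes `location` in level-0 (full-resolution) coordinates, and `level_downsamples` gives each level's downsample factor. The helpers below are a minimal sketch of that conversion. OpenSlide appears only in a comment, and the power-of-two factors in the example are hypothetical, because real slides report their own values.

```python
from typing import Sequence, Tuple


def to_level0(xy: Tuple[int, int], level: int,
              downsamples: Sequence[float]) -> Tuple[int, int]:
    """Map a pixel coordinate on `level` to level-0 coordinates."""
    if not 0 <= level < len(downsamples):
        raise IndexError(f"level {level} out of range")
    factor = downsamples[level]
    x, y = xy
    return round(x * factor), round(y * factor)


def from_level0(xy: Tuple[int, int], level: int,
                downsamples: Sequence[float]) -> Tuple[int, int]:
    """Map a level-0 coordinate onto the pixel grid of `level`."""
    if not 0 <= level < len(downsamples):
        raise IndexError(f"level {level} out of range")
    factor = downsamples[level]
    x, y = xy
    return round(x / factor), round(y / factor)


# Hypothetical downsample factors for a 4-level pyramid.
DOWNSAMPLES = (1.0, 2.0, 4.0, 8.0)
print(to_level0((100, 50), 2, DOWNSAMPLES))

# With OpenSlide (not imported here), the result would feed read_region, e.g.:
#   slide = openslide.OpenSlide("path/to/slide.tif")
#   loc = to_level0((x, y), 2, slide.level_downsamples)
#   region = slide.read_region(loc, 2, (512, 512))
```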
The Camelyon16 dataset aims to reduce the workload and subjectivity in cancer diagnosis by pathologists. It serves as a benchmark for evaluating algorithms that can automatically detect metastases in histopathological images, focusing on breast cancer in sentinel lymph nodes.
Researchers can develop and refine machine learning models for automated detection of metastases. The dataset allows for performance comparisons of different detection algorithms. Automated systems can be integrated into clinical workflows to enhance diagnostic accuracy and efficiency. The dataset is valuable for training medical professionals in digital pathology and AI applications in diagnostics.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The Camelyon+ dataset is accessible through ScienceDB. The original WSI data is available from the official Camelyon16 and Camelyon-17 websites, so it has not been uploaded to the database. Slide-level labels are included in XLSX files. We provide corrected versions of the Camelyon-16 and Camelyon-17 datasets, as well as a combined Camelyon+ version with four-class labels (negative, micro, macro, ITC) and two-class labels (negative, tumor) to support different downstream tasks. To ensure unbiased data correction by pathologists, the slides of the original Camelyon-16 training set, which were named "tumor" or "normal" followed by an ID, have been renamed. The mapping to the original names will be recorded and shared in an XLSX file. For positive WSIs, pixel-level annotations are provided in XML format. To enable future comparative experiments with different feature extractors on the Camelyon+ dataset, feature files extracted at 20X magnification using ResNet-50, ViT-S, PLIP, CONCH, UNI, and Gigapath are also available. These feature files are provided in PT format for ease of use.
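If only the four-class labels are at hand, a two-class view can be derived from them. The sketch below assumes that micro, macro, and ITC slides all count as "tumor" in the two-class scheme. That mapping is an assumption, not something stated above, so compare it with the released XLSX files before relying on it.

```python
from collections import Counter

# Four-class slide labels listed for Camelyon+.
FOUR_CLASS_LABELS = ("negative", "micro", "macro", "ITC")


def to_binary(label: str) -> str:
    """Collapse a four-class label into the two-class (negative/tumor) scheme.

    Assumption: micro, macro and ITC slides are all treated as "tumor".
    """
    if label not in FOUR_CLASS_LABELS:
        raise ValueError(f"unknown four-class label: {label!r}")
    return "negative" if label == "negative" else "tumor"


# Hypothetical slide labels, e.g. read from one column of the XLSX file.
labels = ["negative", "macro", "ITC", "negative", "micro"]
binary_counts = Counter(to_binary(label) for label in labels)
print(binary_counts)
```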
U.S. Government Works: https://www.usa.gov/government-works
The presence of lymph node metastases is one of the most important factors in breast cancer prognosis. The most common strategy to assess the regional lymph node status is the sentinel lymph node procedure. The sentinel lymph node is the most likely lymph node to contain metastasized cancer cells and is excised, histopathologically processed and examined by the pathologist. This tedious examination process is time-consuming and can lead to small metastases being missed. However, recent advances in whole-slide imaging and machine learning have opened an avenue for analysis of digitized lymph node sections with computer algorithms. For example, convolutional neural networks, a type of machine learning algorithm, are able to automatically detect cancer metastases in lymph nodes with high accuracy. To train machine learning models, large, well-curated datasets are needed. We released a dataset of 1399 annotated whole-slide images of lymph nodes, both with and without metastases, in total three terabytes of data in the context of the CAMELYON16 and CAMELYON17 Grand Challenges. Slides were collected from five different medical centers to cover a broad range of image appearance and staining variations. Each whole-slide image has a slide-level label indicating whether it contains no metastases, macro-metastases, micro-metastases or isolated tumor cells. Furthermore, for 209 whole-slide images, detailed hand-drawn contours for all metastases are provided. Last, open-source software tools to visualize and interact with the data have been made available. A unique dataset of annotated, whole-slide digital histopathology images has been provided with high potential for re-use.
CAMELYON-17 consists of 145 positive slides and 353 negative slides; in the positive slides, positive patches occupy less than 10% of the tissue area.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Detection performance comparison with Camelyon16.
https://creativecommons.org/publicdomain/zero/1.0/
CAMELYON16 contains 270 WSIs for training and 129 WSIs for testing. This dataset is only a small part of the full CAMELYON16 dataset. Please check the following links for the other parts.
@buttermint has uploaded the test set of CAMELYON16 in parts: 1-20, 21-40, 41-60, 61-80, 81-100, 101-130.
Other parts: p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12.
The authors of CAMELYON16 manually annotated the cancer regions with high quality. The ordering of the slides in the normal part is somewhat messy. All of this information is included in this dataset.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
CAMELYON17 challenge dataset. The goal of this challenge is to evaluate new and existing algorithms for automated detection and classification of breast cancer metastases in whole-slide images of histological lymph node sections. The dataset contains 1000 WSIs of 200 artificial patients from 5 different medical centers, with exhaustive annotations for 10 WSIs from each center. The dataset is a slightly updated version of the one available on GigaScience at . The changes are:
1. Generated mask files were added for each annotated WSI and for 50 additional WSIs without tumor; mask values are 1 for normal tissue and 2 for tumor areas in the corresponding WSI.
2. The images are shared individually instead of being zipped together per patient.
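The mask encoding above (1 for normal tissue, 2 for tumor) makes it easy to measure how much of the tissue in a slide is tumor. The sketch below works on a small in-memory mask given as rows of integers. Treating every other value (such as 0) as background is an assumption, and real masks are far too large to handle as Python lists, so in practice they would be read tile by tile.

```python
from collections import Counter
from typing import Iterable, Sequence

NORMAL = 1  # mask value for normal tissue, per the dataset description
TUMOR = 2   # mask value for tumor areas, per the dataset description


def tumor_fraction(mask_rows: Iterable[Sequence[int]]) -> float:
    """Fraction of tissue pixels labeled tumor.

    Assumption: any value other than NORMAL or TUMOR is background.
    """
    counts = Counter(value for row in mask_rows for value in row)
    tissue = counts[NORMAL] + counts[TUMOR]
    return counts[TUMOR] / tissue if tissue else 0.0


# Toy 2 x 4 mask: 0 = background, 1 = normal, 2 = tumor.
mask = [[0, 1, 1, 2],
        [0, 1, 2, 2]]
print(tumor_fraction(mask))
```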
Features extracted from the Camelyon17 dataset (https://camelyon17.grand-challenge.org/Data/) using the tile-level encoder of the Prov-GigaPath model (10.1038/s41586-024-07441-w) and the trident project (https://github.com/mahmoodlab/trident/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Assessment of the sufficiency of information provided for reproducibility.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Features for 4 slides of the Camelyon dataset, produced as described by the FLamby project: https://github.com/owkin/FLamby/tree/1e8023c05814852c23c0b2acb10abba0f7c2c4ee/flamby/datasets/fed_camelyon16
kzorluoglu/chameleon-dataset-v2 dataset hosted on Hugging Face and contributed by the HF Datasets community
https://choosealicense.com/licenses/cc0-1.0/
PatchCamelyon (PCam)
Description
The PatchCamelyon benchmark is a new and challenging image classification dataset. It consists of 327,680 color images (96 x 96 px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating the presence of metastatic tissue. PCam provides a new benchmark for machine learning models: bigger than CIFAR-10, smaller than ImageNet, and trainable on a single GPU.
Why PCam
Fundamental… See the full description on the dataset page: https://huggingface.co/datasets/1aurent/PatchCamelyon.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
## Overview
Chameleon Project is an object detection dataset; it contains Chameleon annotations for 237 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The classifier detection performance.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Comparison of the original and reimplemented results of the Lee paper.
https://webtechsurvey.com/terms
A complete list of live websites using the Chameleon technology, compiled through global website indexing conducted by WebTechSurvey.