CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
A public-access archive of skin lesion images, supporting teaching, research, and the development and evaluation of diagnostic algorithms.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Skin lesion datasets provide essential information for understanding various skin conditions and developing effective diagnostic tools. They aid the artificial intelligence-based early detection of skin cancer, facilitate treatment planning, and contribute to medical education and research. Published large datasets only partially cover the subclasses of skin lesions. This limitation highlights the need for more expansive and varied datasets to reduce false predictions and help improve failure analysis for skin lesions. This study presents a diverse dataset comprising 12,345 dermatoscopic images with 40 subclasses of skin lesions, collected in Turkiye, a country in the transition zone between Europe and Asia whose population covers different skin types. Each subgroup contains high-resolution images and expert annotations, providing a strong and reliable basis for future research. The detailed analysis of each subgroup provided in this study facilitates targeted research and deepens understanding of skin lesions. This dataset distinguishes itself through a diverse structure with its 5 super classes, 15 main classes, 40 subclasses and 12,345 high-resolution dermatoscopic images.
Yilmaz, A., Yasar, S.P., Gencoglan, G. et al. DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 40 Subclasses. Sci Data 11, 1302 (2024). https://doi.org/10.1038/s41597-024-04104-3
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
MILK10k consists of 10,480 images: a paired clinical close-up and dermatoscopic image for each of 5,240 lesions. The dataset’s metadata include age (in 5-year intervals), sex, anatomic site, skin tone, diagnosis, method of ground truth establishment (histopathology or other means), and, if a dermatoscopic image of the same lesion was previously included in ISIC, its corresponding ISIC identifier. Skin tone is categorized into six levels, ranging from very dark (0) to very light (5), intentionally distinct from the Fitzpatrick skin types to avoid confusion. Most patients had skin tones in the middle ranges. Of the 5,240 lesions, 95.7% were biopsied or excised, with histopathology serving as the gold standard for diagnosis. Diagnoses were mapped to both the ISIC-Dx diagnostic scheme and a simplified classification based on the ISIC2018/2019 challenge and HAM10000 diagnostic categories. The dataset includes 11 broad diagnostic categories:
Additionally, we provide the most specific ISIC-Dx diagnosis and its parent branch in the ISIC-Dx diagnostic tree. In cases where a dermatoscopic image of the same lesion was already included in the ISIC archive, its ISIC identifier is reported in the metadata. Furthermore, all images have been annotated using the MONET framework, with probabilities for the following concept term groups included in the metadata:
In addition to MILK10k, we have curated a smaller benchmark dataset, called MILK10k Benchmark, derived from the same sources and covering the same diagnostic categories. This dataset is available as part of a live challenge within the ISIC framework and can be accessed on ISIC.
Images were provided by the following institutions:
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
To address the challenges of training neural networks for automated diagnosis of pigmented skin lesions, the authors introduced the HAM10000 ("Human Against Machine with 10000 training images") dataset. This dataset aimed to overcome the limitations of small-sized and homogeneous dermatoscopic image datasets by providing a diverse and extensive collection. To achieve this, they collected dermatoscopic images from various populations using different modalities, which necessitated employing distinct acquisition and cleaning methods. The authors also designed semi-automatic workflows that incorporated specialized neural networks to enhance the dataset's quality. The resulting HAM10000 dataset comprised 10,015 dermatoscopic images, which were made available for academic machine learning applications through the ISIC archive. This dataset served as a benchmark for machine learning experiments and comparisons with human experts.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
This dataset is a cleaned and preprocessed combination of the HAM10000 and ISIC Archive dermoscopic image datasets, intended for training and evaluating deep learning models for skin lesion classification.
It is structured to support multi-class image classification and has been carefully processed to maintain high image quality and class balance.
Classes included (canonical class name: raw labels mapped to it):
Melanocytic Nevus: "melanocytic nevi", "nv"
Melanoma: "melanoma", "mel"
Benign Keratosis: "benign keratosis", "bkl"
Basal Cell Carcinoma: "basal cell carcinoma", "bcc"
Actinic Keratosis: "actinic keratosis", "akiec"
Dermatofibroma: "dermatofibroma", "df"
Vascular Lesion: "vascular lesions", "vasc"
Warts/Molluscum: "warts/molluscum"
Preprocessing Notes
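If you want to apply the same label normalization in your own pipeline, the class mapping above can be used as a plain lookup table. This is only a sketch: the names LABEL_ALIASES, normalize_label and CLASS_TO_INDEX, and the alphabetical ordering of class indices, are our own illustrative assumptions and not part of the dataset's documented preprocessing.

```python
# Raw label -> canonical class name, copied from the class list above.
LABEL_ALIASES = {
    "melanocytic nevi": "Melanocytic Nevus", "nv": "Melanocytic Nevus",
    "melanoma": "Melanoma", "mel": "Melanoma",
    "benign keratosis": "Benign Keratosis", "bkl": "Benign Keratosis",
    "basal cell carcinoma": "Basal Cell Carcinoma", "bcc": "Basal Cell Carcinoma",
    "actinic keratosis": "Actinic Keratosis", "akiec": "Actinic Keratosis",
    "dermatofibroma": "Dermatofibroma", "df": "Dermatofibroma",
    "vascular lesions": "Vascular Lesion", "vasc": "Vascular Lesion",
    "warts/molluscum": "Warts/Molluscum",
}

# Hypothetical choice: number the canonical classes alphabetically.
CLASS_NAMES = sorted(set(LABEL_ALIASES.values()))
CLASS_TO_INDEX = {name: i for i, name in enumerate(CLASS_NAMES)}


def normalize_label(raw: str) -> str:
    """Map a raw label (case and surrounding spaces ignored) to its canonical class."""
    key = raw.strip().lower()
    if key not in LABEL_ALIASES:
        raise ValueError(f"unknown label: {raw!r}")
    return LABEL_ALIASES[key]


print(normalize_label("MEL"))                   # Melanoma
print(CLASS_TO_INDEX[normalize_label("bcc")])   # 1
```

Keeping the alias table separate from the index table means a new alias can be added without changing any class indices.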
MIT License (https://opensource.org/licenses/MIT)
Melanoma
Melanocytic nevus
Basal cell carcinoma
Actinic keratosis
Benign keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis)
Dermatofibroma
Vascular lesion
Squamous cell carcinoma
Original Challenge: https://challenge.isic-archive.com/data/#2019
[1] Philipp Tschandl, Cliff Rosendahl, Harald Kittler: "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions", Sci. Data 5, 180161 (2018); doi:10.1038/sdata.2018.161.
[2] Noel C. F. Codella, David Gutman, M. Emre Celebi, Brian Helba, Michael A. Marchetti, Stephen W. Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin Mishra, Harald Kittler, Allan Halpern: "Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC)", 2017; arXiv:1710.05006.
[3] Marc Combalia, Noel C. F. Codella, Veronica Rotemberg, Brian Helba, Veronica Vilaplana, Ofer Reiter, Allan C. Halpern, Susana Puig, Josep Malvehy: "BCN20000: Dermoscopic Lesions in the Wild", 2019; arXiv:1908.02288.
If you use this dataset in your research, please credit the authors.
https://www.everydayhealth.com/skin-cancer/what-are-the-different-types-of-skin-cancer/
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The iToBoS dataset: skin region images extracted from 3D total body photographs for lesion detection
The early detection of skin cancer is critical for improving patient outcomes. Traditionally, dermatologists rely on dermoscopy to examine pigmented skin lesions. While this non-invasive technique enhances diagnostic accuracy, its effectiveness is highly dependent on the clinician’s expertise. Additionally, capturing dermoscopic images for every suspicious lesion is a labor-intensive process. Given these challenges, there is an increasing need for computer-aided diagnosis (CAD) systems that use conventional cameras. Such systems can help general physicians and other non-specialist practitioners identify potentially malignant lesions, improving early detection and intervention. They also facilitate longitudinal tracking of lesions, helping researchers study disease progression and treatment efficacy.
This dataset provides high-resolution skin patch images extracted from 3D total body photographs to support the development of machine learning models for lesion detection. It serves as a resource for researchers working on automated skin lesion analysis, particularly in the context of total body photography (TBP).
Dataset Description:
The iToBoS dataset consists of 16,954 high-resolution images of skin regions obtained from anonymized 3D avatars of patients. These avatars were generated using the Canfield VECTRA WB360 system, which captures full-body skin images using 92 fixed cameras arranged in 46 stereo pairs with xenon flash lighting. The images were collected from patients at two clinical sites: the Clinical Hospital of Barcelona (Spain) and the University of Queensland (Australia). The dataset covers diverse anatomical locations, including the torso, arms, and legs; each image has an average resolution of 1012x827 pixels, with a 45-pixel overlap between adjacent images.
The images are extracted from the 3D avatars with patient facial features removed automatically, to comply with the GDPR. Each image is accompanied by metadata, including patient age range, body location, and sun damage score, allowing for in-depth analysis and stratification.
Significance of the Dataset:
Facilitates Automated Skin Lesion Detection: The dataset supports the development of AI-based lesion detection models that can improve early diagnosis of skin cancer, particularly in regions with limited access to dermatological expertise.
Supports Total Body Photography Research: Leveraging 3D TBP for lesion detection is an emerging field, and this dataset provides a benchmark for further exploration.
Enhances Machine Learning Applications: The dataset serves as a benchmark for developing computer vision and deep learning models for skin lesion detection.
https://creativecommons.org/publicdomain/zero/1.0/
These TFRecords contain both the image data and the tabular data (metadata) for the 2019 ISIC Melanoma Classification Competition. The images are 256x256x3 JPEGs. The original JPEGs were center square cropped and then resized using cv2.resize with interpolation = cv2.INTER_AREA.
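The crop-and-resize step can be sketched as follows, assuming the crop is the largest centered square and that an odd size difference is rounded down; the exact rounding used to build these TFRecords is not stated. The helper names are hypothetical. center_crop_resize expects an HxWxC NumPy image (for example from cv2.imread) and needs OpenCV installed, while the crop-box helper is pure Python.

```python
def center_square_crop_box(width: int, height: int) -> tuple[int, int, int]:
    """Return (left, top, side) of the largest centered square in a width x height image."""
    side = min(width, height)
    left = (width - side) // 2   # assumption: round down when the difference is odd
    top = (height - side) // 2
    return left, top, side


def center_crop_resize(img, size: int = 256):
    """Center-square-crop an HxWxC NumPy image, then resize it with INTER_AREA."""
    import cv2  # imported here so the pure-Python helper above works without OpenCV

    height, width = img.shape[:2]
    left, top, side = center_square_crop_box(width, height)
    crop = img[top:top + side, left:left + side]
    # cv2.resize takes the target size as (width, height).
    return cv2.resize(crop, (size, size), interpolation=cv2.INTER_AREA)


print(center_square_crop_box(1024, 768))  # (128, 0, 768)
```

INTER_AREA is the usual OpenCV choice for downscaling, since it averages over source pixels instead of sampling them.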
The odd-numbered TFRecords (1, 3, 5, 7, ...) contain images whose original size was 1024x1024 before the crop and resize. The even-numbered TFRecords (0, 2, 4, 6, ...) contain images whose original size was not 1024x1024. They are split this way in case you want to exclude the odd-numbered TFRecords, whose images some say look different from the 2020 competition data.
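To exclude the odd-numbered files, filter the file list by index before building the input pipeline. The sketch below assumes file names that start with train followed by the record index (for example train03.tfrec); that naming pattern and the function name select_tfrecords are hypothetical, so adjust the regular expression to the real file names.

```python
import os
import re

# Hypothetical naming pattern: "train<index>...", e.g. "train03.tfrec".
TFREC_INDEX = re.compile(r"^train(\d+)")


def select_tfrecords(paths, include_odd=True):
    """Keep even-numbered files always; keep odd-numbered (originally 1024x1024) ones only if include_odd."""
    selected = []
    for path in paths:
        match = TFREC_INDEX.match(os.path.basename(path))
        if match is None:
            raise ValueError(f"no TFRecord index found in {path!r}")
        index = int(match.group(1))
        if index % 2 == 0 or include_odd:
            selected.append(path)
    return selected


files = ["data/train00.tfrec", "data/train01.tfrec", "data/train02.tfrec", "data/train03.tfrec"]
print(select_tfrecords(files, include_odd=False))  # ['data/train00.tfrec', 'data/train02.tfrec']
```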
The train TFRecords have the following fields
feature = {
'image': _bytes_feature,
'image_name': _bytes_feature,
'patient_id': _int64_feature,
'sex': _int64_feature,
'age_approx': _int64_feature,
'anatom_site_general_challenge': _int64_feature,
'diagnosis': _int64_feature,
'target': _int64_feature,
'width': _int64_feature,
'height': _int64_feature
}
The features width and height are the original image width and height before the center square crop and resize.
The feature target is 1 if diagnosis is MEL (melanoma) and 0 otherwise. The image_name is a string. The patient_id is set to -1 because it is unknown. The sex has been label encoded to int as:
0: 'male'
1: 'female'
The age_approx originally had 437 NaNs, which have been imputed with the mean. The anatom_site_general_challenge has been label encoded as:
-1: NaN
0: 'head/neck'
1: 'upper extremity'
2: 'lower extremity'
3: 'torso'
4: 'palms/soles'
5: 'oral/genital'
The diagnosis has been label encoded as:
9: 'MEL'
10: 'NV'
11: 'BCC'
12: 'AK'
13: 'BKL'
14: 'DF'
15: 'VASC'
16: 'SCC'
17: 'UNK'
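The encodings above can be inverted with plain lookup tables, which helps when inspecting or plotting the metadata. This sketch assumes each parsed example has already been converted to a dict of Python values (for instance after parsing a TFRecord example and converting each scalar). decode_record is a hypothetical helper; codes not listed above decode to None, and the helper also cross-checks the stated rule that target is 1 exactly when diagnosis is MEL (code 9).

```python
# Integer code -> original value, copied from the encodings listed above.
SEX = {0: "male", 1: "female"}
ANATOM_SITE = {
    -1: None,  # NaN in the original metadata
    0: "head/neck",
    1: "upper extremity",
    2: "lower extremity",
    3: "torso",
    4: "palms/soles",
    5: "oral/genital",
}
DIAGNOSIS = {
    9: "MEL", 10: "NV", 11: "BCC", 12: "AK", 13: "BKL",
    14: "DF", 15: "VASC", 16: "SCC", 17: "UNK",
}
MEL_CODE = 9


def decode_record(record: dict) -> dict:
    """Return a copy of record with sex, site and diagnosis decoded; other fields are copied unchanged."""
    # target should be 1 exactly when the diagnosis is melanoma.
    if record["target"] != int(record["diagnosis"] == MEL_CODE):
        raise ValueError("target does not match diagnosis")
    decoded = dict(record)
    decoded["sex"] = SEX.get(record["sex"])
    decoded["anatom_site_general_challenge"] = ANATOM_SITE.get(
        record["anatom_site_general_challenge"]
    )
    decoded["diagnosis"] = DIAGNOSIS.get(record["diagnosis"])
    return decoded


# Made-up record for illustration only.
example = {"sex": 1, "age_approx": 55, "anatom_site_general_challenge": 3,
           "diagnosis": 9, "target": 1, "width": 1024, "height": 1024}
print(decode_record(example))
```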
Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images, which can serve as a training set for academic machine learning purposes. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions: actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses, bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv) and vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc).
More than 50% of lesions are confirmed through histopathology (histo); the ground truth for the rest of the cases is either follow-up examination (follow_up), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal). The dataset includes lesions with multiple images, which can be tracked by the lesion_id column within the HAM10000_metadata file.
Due to upload size limitations, images are stored in two files:
HAM10000_images_part1.zip (5000 JPEG files)
HAM10000_images_part2.zip (5015 JPEG files)
Additional data for evaluation purposes
The HAM10000 dataset served as the training set for the ISIC 2018 challenge (Task 3). The test-set images are available herein as ISIC2018_Task3_Test_Images.zip (1511 images); the official validation set is available through the challenge website https://challenge2018.isic-archive.com/. The ISIC-Archive also provides a "Live challenge" submission site for continuous evaluation of automated classifiers on the official validation and test sets.
Comparison to physicians
Test-set evaluations of the ISIC 2018 challenge were compared to physicians on an international scale, where the majority of challenge participants outperformed expert readers: Tschandl P. et al., Lancet Oncol 2019
Human-computer collaboration
The test-set images were also used in a study comparing different methods and scenarios of human-computer collaboration: Tschandl P. et al., Nature Medicine 2020
The following corresponding metadata are available herein:
ISIC2018_Task3_Test_NatureMedicine_AI_Interaction_Benefit.csv: Human ratings for test images with and without interaction with a ResNet34 CNN (malignancy probability, multi-class probability, CBIR) or human-crowd multi-class probabilities. These data were collected for and analyzed in Tschandl P. et al., Nature Medicine 2020; please refer to this publication when using the data.
HAM10000_segmentations_lesion_tschandl.zip: To evaluate regions of CNN activations in Tschandl P. et al., Nature Medicine 2020 (please refer to this publication when using the data), a single dermatologist (Tschandl P) created binary segmentation masks for all 10015 images from the HAM10000 dataset. Masks were initialized with the segmentation network described by Tschandl et al., Computers in Biology and Medicine 2019, and subsequently verified, corrected, or replaced via the free-hand selection tool in FIJI.
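Because some lesions have several images, splitting HAM10000 at the image level can place views of the same lesion in both the training and validation data. Below is a minimal sketch of a lesion-level split on the lesion_id column; it assumes the metadata rows have already been read into dicts (for example with csv.DictReader), and the function name, the default 20% validation fraction and the sample rows are illustrative assumptions.

```python
import random
from collections import defaultdict


def split_by_lesion(rows, val_fraction=0.2, seed=0):
    """Split metadata rows into (train, val) lists so that no lesion_id appears in both."""
    by_lesion = defaultdict(list)
    for row in rows:
        by_lesion[row["lesion_id"]].append(row)

    # Sort before shuffling so which lesions go to validation depends only on the seed.
    lesion_ids = sorted(by_lesion)
    random.Random(seed).shuffle(lesion_ids)
    val_ids = set(lesion_ids[: round(len(lesion_ids) * val_fraction)])

    train, val = [], []
    for lesion_id, lesion_rows in by_lesion.items():
        (val if lesion_id in val_ids else train).extend(lesion_rows)
    return train, val


# Made-up rows standing in for the HAM10000_metadata file.
rows = [
    {"lesion_id": "L1", "image_id": "img_a"},
    {"lesion_id": "L1", "image_id": "img_b"},
    {"lesion_id": "L2", "image_id": "img_c"},
    {"lesion_id": "L3", "image_id": "img_d"},
    {"lesion_id": "L4", "image_id": "img_e"},
]
train, val = split_by_lesion(rows, val_fraction=0.5)
print(len(train), len(val))
```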
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) (https://creativecommons.org/licenses/by-nc/4.0/)
Dermatoscopic images usually depict a single skin lesion, but large-scale datasets with segmentations of the affected areas have not been available until now. Challenge segmentation data often suffered from being either too coarse or too noisy. This dataset provides 10015 binary segmentation masks based on FCN-created segmentations and hand-drawn lines, which, together with the HAM10000 diagnosis metadata, can be used for object detection or semantic segmentation.
This dataset contains binary segmentation masks as PNG files for all HAM10000 dataset images. Each mask segments the lesion area as evaluated by a single dermatologist (me). The masks were initialized with an FCN lesion segmentation model; afterwards, I went through all of them and either approved them or corrected/redrew them with the free-hand selection tool in FIJI.
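For object detection, each mask is typically reduced to a bounding box. The sketch below assumes the mask PNG has already been loaded into a 2-D array (for example a list of lists, or a NumPy array read with an image library, not shown) in which nonzero pixels mark the lesion; mask_to_bbox is a hypothetical helper and returns inclusive pixel coordinates.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None for an empty mask.

    Coordinates are inclusive; mask is any row-major sequence of rows.
    """
    ys = [y for y, row in enumerate(mask) if any(v != 0 for v in row)]
    if not ys:
        return None
    xs = [x for row in mask for x, v in enumerate(row) if v != 0]
    return min(xs), min(ys), max(xs), max(ys)


mask = [
    [0, 0, 0, 0],
    [0, 255, 255, 0],
    [0, 0, 255, 0],
]
print(mask_to_bbox(mask))  # (1, 1, 2, 2)
```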
You can find the HAM10000 dataset images at the following places:
- Harvard Dataverse: https://doi.org/10.7910/DVN/DBW86T
- ISIC Archive Gallery: https://www.isic-archive.com
- Kaggle Dataset Kernel (downsampled): https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000
If you use this data, please cite/refer to the publication I made these segmentation masks for...
...and the original source of the images:
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
We conducted a prospective, observational clinical validation study to assess the diagnostic accuracy of the AI algorithm (ADAE) in predicting melanoma from dermoscopy skin lesion images. Patients who had consented to a skin biopsy to exclude melanoma were eligible. All lesions underwent biopsy.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The ISIC_WSM dataset provides pixel-level supervision for a subset of images (43885) from the ISIC archive; the original images can be downloaded separately from the ISIC website. The supervision is obtained from the available bounding boxes of the COCO-Text dataset using a weakly supervised algorithm. See the paper for more details.
https://creativecommons.org/publicdomain/zero/1.0/
📄 Context
Skin diseases are among the most common health concerns worldwide, ranging from benign lesions like keratosis to serious conditions such as melanoma. Early and accurate diagnosis plays a vital role in preventing disease progression and improving patient outcomes. This dataset aims to assist in developing AI-driven dermatology tools by providing structured information on various skin diseases, their definitions, patient-described symptoms, and associated clinical images.
🔍 Sources
The dataset is compiled from a combination of:
Publicly available dermatological image repositories, such as the ISIC (International Skin Imaging Collaboration) archive, which contains labeled dermoscopic images of skin lesions.
Clinical literature and dermatology textbooks, used to write concise disease definitions.
Simulated patient statements, reflecting typical ways in which patients describe their skin conditions during clinical consultations. These were generated based on clinical case studies and patient interviews found in dermatology research papers.
Synthetic aggregation: File names refer to images associated with each disease class, meant for easy integration with machine learning pipelines.
🌟 Inspiration
This dataset was inspired by the growing need for:
Explainable AI (XAI) in dermatology: Making machine learning models more understandable to clinicians and patients.
Bridging the gap between clinical terminology and patient language: Helping AI models learn how real patients describe their symptoms, enhancing the usability of teledermatology tools.
Supporting education and research: Assisting medical students, researchers, and AI developers in understanding skin diseases in both clinical and layman contexts.
Enabling multi-modal learning: Combining text descriptions, disease definitions, and images to train more robust models that can reason across data types.
📄 Column Descriptions
Disease Class - The name of the skin disease type (e.g., Actinic Keratosis, Melanoma, Benign Keratosis, etc.). There are 9 unique classes.
Disease Definition - A clinical description explaining the nature and characteristics of the disease.
Major Statement - Simulated patient descriptions or questions that reflect how individuals typically describe their symptoms.
File Name - The corresponding image file name related to the disease case.
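A minimal sketch of loading a table with these columns using Python's csv module and grouping the patient statements and image file names by disease class, for example to pair text with images for multi-modal training. The header spellings are taken from the column descriptions above and are assumptions about the actual file; index_by_class is a hypothetical helper and the sample row is made up.

```python
import csv
import io
from collections import defaultdict

# Header names as described above; the exact spelling in the file is an assumption.
CLASS_COL = "Disease Class"
STATEMENT_COL = "Major Statement"
FILE_COL = "File Name"


def index_by_class(csv_file):
    """Group (patient statement, image file name) pairs by disease class."""
    index = defaultdict(list)
    for row in csv.DictReader(csv_file):
        index[row[CLASS_COL].strip()].append((row[STATEMENT_COL], row[FILE_COL]))
    return dict(index)


# Made-up sample in the described column layout.
sample = io.StringIO(
    "Disease Class,Disease Definition,Major Statement,File Name\n"
    "Melanoma,A malignant tumor of melanocytes.,This mole has changed color.,img_001.jpg\n"
)
print(index_by_class(sample))
```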