14 datasets found

International Skin Imaging Collaboration (ISIC) Archive
registry.opendata.aws
Updated Aug 12, 2025
Cite
International Skin Imaging Collaboration (ISIC) (2025). International Skin Imaging Collaboration (ISIC) Archive [Dataset]. https://registry.opendata.aws/isic-archive/
Explore at:
Dataset updated
Aug 12, 2025
Dataset provided by
International Skin Imaging Collaboration (ISIC)
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
A public-access archive of skin lesion images, supporting teaching, research, and the development and evaluation of diagnostic algorithms.
DERM12345
api.isic-archive.com
Updated 2025
Cite
Imperial College London (2025). DERM12345 [Dataset]. http://doi.org/10.34970/705541
Explore at:
Unique identifier
https://doi.org/10.34970/705541
Dataset updated
2025
Dataset provided by
ISIC Archive
datacite
Authors
Imperial College London
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Skin lesion datasets provide essential information for understanding various skin conditions and developing effective diagnostic tools. They aid the artificial intelligence-based early detection of skin cancer, facilitate treatment planning, and contribute to medical education and research. Published large datasets only partially cover the subclassifications of skin lesions. This limitation highlights the need for more expansive and varied datasets to reduce false predictions and help improve the failure analysis for skin lesions. This study presents a diverse dataset comprising 12,345 dermatoscopic images with 40 subclasses of skin lesions, collected in Turkiye, which comprises different skin types in the transition zone between Europe and Asia. Each subgroup contains high-resolution images and expert annotations, providing a strong and reliable basis for future research. The detailed analysis of each subgroup provided in this study facilitates targeted research endeavors and deepens the understanding of skin lesions. This dataset distinguishes itself through a diverse structure with its 5 super classes, 15 main classes, 40 subclasses and 12,345 high-resolution dermatoscopic images.

Yilmaz, A., Yasar, S.P., Gencoglan, G. et al. DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 40 Subclasses. Sci Data 11, 1302 (2024). https://doi.org/10.1038/s41597-024-04104-3
MILK10k
api.isic-archive.com
Updated 2025
Cite
MILK study team (2025). MILK10k [Dataset]. http://doi.org/10.34970/648456
Explore at:
Unique identifier
https://doi.org/10.34970/648456
Dataset updated
2025
Dataset provided by
ISIC Archive
datacite
Authors
MILK study team
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
MILK10k consists of 10480 images, each representing a paired clinical close-up and dermatoscopic image for 5240 lesions. The dataset’s metadata include age (in 5-year intervals), sex, anatomic site, skin tone, diagnosis, method of ground truth establishment (histopathology or other means), and, if a dermatoscopic image of the same lesion was previously included in ISIC, its corresponding ISIC identifier. Skin tone is categorized into six levels, ranging from very dark (0) to very light (5), intentionally distinct from the Fitzpatrick skin types to avoid confusion. Most patients had skin tones in the middle ranges. Of the 5240 lesions, 95.7% were biopsied or excised, with histopathology serving as the gold standard for diagnosis. Diagnoses were mapped to both the ISIC-Dx diagnostic scheme and a simplified classification based on the ISIC2018/2019 challenge and HAM10000 diagnostic categories. The dataset includes 11 broad diagnostic categories:

Basal cell carcinoma (bcc)

Melanocytic nevus (nv)

Benign keratinocytic lesion (bkl)

Squamous cell carcinoma/keratoacanthoma (sccka)

Melanoma (mel)

Actinic keratosis/intraepidermal carcinoma (akiec)

Dermatofibroma (df)

Inflammatory and infectious conditions (inf)

Vascular lesions and hemorrhage (vasc)

Other benign proliferations including collision tumors (ben_oth)

Other malignant proliferations including collision tumors (mal_oth)

Additionally, we provide the most specific ISIC-Dx diagnosis and its parent branch in the ISIC-Dx diagnostic tree. In cases where a dermatoscopic image of the same lesion was already included in the ISIC archive, its ISIC identifier is reported in the metadata. Furthermore, all images have been annotated using the MONET framework, with probabilities for the following concept term groups included in the metadata:

Ulceration, crust

Hair

Vasculature, vessels

Erythema

Pigmentation

Gel, water drop, fluid, dermoscopy liquid

Skin markings, pen ink, purple pen

In addition to MILK10k, we have curated a smaller benchmark dataset, called MILK10k Benchmark, derived from the same sources and covering the same diagnostic categories. This dataset is available as part of a live challenge within the ISIC framework and can be accessed on ISIC.

Images were provided by the following institutions:

Department of Dermatology, Medical University of Vienna, Vienna, Austria

Medicine Faculty Department of Dermatology, Ankara University, Ankara, Turkey

Mayne Academy of General Practice, Medical School, The University of Queensland, Australia

Dermatology Service, Memorial Sloan Kettering Cancer Center, New York, USA

Independent Researcher, 1000 Skopje, North Macedonia
Skin Cancer: HAM10000 Dataset
datasetninja.com
Updated Jan 21, 2024
Cite
Tschandl, Philipp; Cliff Rosendahl; Harald Kittler (2024). Skin Cancer: HAM10000 Dataset [Dataset]. https://datasetninja.com/skin-cancer-ham10000
Explore at:
Dataset updated
Jan 21, 2024
Dataset provided by
Dataset Ninja
Authors
Tschandl, Philipp; Cliff Rosendahl; Harald Kittler
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
To address the challenges of training neural networks for automated diagnosis of pigmented skin lesions, the authors introduced the HAM10000 ("Human Against Machine with 10000 training images") dataset. This dataset aimed to overcome the limitations of small-sized and homogeneous dermatoscopic image datasets by providing a diverse and extensive collection. To achieve this, they collected dermatoscopic images from various populations using different modalities, which necessitated employing distinct acquisition and cleaning methods. The authors also designed semi-automatic workflows that incorporated specialized neural networks to enhance the dataset's quality. The resulting HAM10000 dataset comprised 10,015 dermatoscopic images, which were made available for academic machine learning applications through the ISIC archive. This dataset served as a benchmark for machine learning experiments and comparisons with human experts.
Skin Disease Detection Dataset (HAM10000 + ISIC)
kaggle.com
Updated May 20, 2025
Cite
nour12347653 (2025). Skin Disease Detection Dataset (HAM10000 + ISIC) [Dataset]. https://www.kaggle.com/datasets/nour12347653/skin-disease-detection-dataset-ham10000-isic
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 20, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
nour12347653
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is a cleaned and preprocessed combination of the HAM10000 and ISIC Archive dermoscopic image datasets, intended for training and evaluating deep learning models for skin lesion classification.

It is structured to support multi-class image classification and has been carefully processed to maintain high quality and class balance.

Classes Included:

"melanocytic nevi": "Melanocytic Nevus",
"nv": "Melanocytic Nevus",
"melanoma": "Melanoma",
"mel": "Melanoma",
"benign keratosis": "Benign Keratosis",
"bkl": "Benign Keratosis",
"basal cell carcinoma": "Basal Cell Carcinoma",
"bcc": "Basal Cell Carcinoma",
"actinic keratosis": "Actinic Keratosis",
"akiec": "Actinic Keratosis",
"dermatofibroma": "Dermatofibroma",
"df": "Dermatofibroma",
"vascular lesions": "Vascular Lesion",
"vasc": "Vascular Lesion",
"warts/molluscum": "Warts/Molluscum"
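The class list above reads as a lookup from raw label spellings to unified class names. A minimal Python sketch of how such a lookup could be applied is shown here; the mapping entries are copied from the list, while the names LABEL_MAP and normalize_label, and the choice to lowercase and trim input, are assumptions made for illustration and not the dataset author's code.

```python
# Illustrative sketch only. The entries come from the class list above;
# LABEL_MAP and normalize_label are hypothetical names.
LABEL_MAP = {
    "melanocytic nevi": "Melanocytic Nevus",
    "nv": "Melanocytic Nevus",
    "melanoma": "Melanoma",
    "mel": "Melanoma",
    "benign keratosis": "Benign Keratosis",
    "bkl": "Benign Keratosis",
    "basal cell carcinoma": "Basal Cell Carcinoma",
    "bcc": "Basal Cell Carcinoma",
    "actinic keratosis": "Actinic Keratosis",
    "akiec": "Actinic Keratosis",
    "dermatofibroma": "Dermatofibroma",
    "df": "Dermatofibroma",
    "vascular lesions": "Vascular Lesion",
    "vasc": "Vascular Lesion",
    "warts/molluscum": "Warts/Molluscum",
}


def normalize_label(raw: str) -> str:
    """Map a raw label spelling to its unified class name.

    Lowercasing and trimming are assumptions; unknown labels raise KeyError
    so that unmapped spellings are noticed instead of silently passed on.
    """
    key = raw.strip().lower()
    if key not in LABEL_MAP:
        raise KeyError(f"unknown label: {raw!r}")
    return LABEL_MAP[key]
```

For example, normalize_label("NV") and normalize_label("melanocytic nevi") both resolve to the same unified class, which is what lets the HAM10000 short codes and the longer ISIC-style names share one label space.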

Preprocessing Notes

All images resized to 224x224 for CNN compatibility

Labels unified and cleaned across both datasets

Invalid or corrupted entries removed
Skin Lesions
kaggle.com
Updated Nov 11, 2023
Cite
Anwar Hawash (2023). Skin Lesions [Dataset]. https://www.kaggle.com/datasets/anwarhawash/skin-lesions
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 11, 2023
Dataset provided by
Kaggle
Authors
Anwar Hawash
License
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Description
Diagnostic Categories:

Melanoma

Melanocytic nevus

Basal cell carcinoma

Actinic keratosis

Benign keratosis (solar lentigo / seborrheic keratosis / lichen planus-like keratosis)

Dermatofibroma

Vascular lesion

Squamous cell carcinoma

Original Data Source

Original Challenge: https://challenge.isic-archive.com/data/#2019

[1] Tschandl P., Rosendahl C. & Kittler H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 doi:10.1038/sdata.2018.161 (2018)

[2] Noel C. F. Codella, David Gutman, M. Emre Celebi, Brian Helba, Michael A. Marchetti, Stephen W. Dusza, Aadi Kalloo, Konstantinos Liopyris, Nabin Mishra, Harald Kittler, Allan Halpern: "Skin Lesion Analysis Toward Melanoma Detection: A Challenge at the 2017 International Symposium on Biomedical Imaging (ISBI), Hosted by the International Skin Imaging Collaboration (ISIC)", 2017; arXiv:1710.05006.

[3] Marc Combalia, Noel C. F. Codella, Veronica Rotemberg, Brian Helba, Veronica Vilaplana, Ofer Reiter, Allan C. Halpern, Susana Puig, Josep Malvehy: "BCN20000: Dermoscopic Lesions in the Wild", 2019; arXiv:1908.02288.

Copyright and Attribution

If you use this dataset in your research, please credit the authors

what-are-the-different-types-of-skin-cancer?

https://www.everydayhealth.com/skin-cancer/what-are-the-different-types-of-skin-cancer/
iToBoS 2024 - Skin Lesion Detection with 3D-TBP
figshare.com
api.isic-archive.com
png
Updated May 12, 2025
Cite
Anup Saha (2025). iToBoS 2024 - Skin Lesion Detection with 3D-TBP [Dataset]. http://doi.org/10.6084/m9.figshare.28452545.v6
Explore at:
Available download formats: png
Unique identifier
https://doi.org/10.6084/m9.figshare.28452545.v6
Dataset updated
May 12, 2025
Dataset provided by
figshare
Authors
Anup Saha
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The iToBoS dataset: skin region images extracted from 3D total body photographs for lesion detection

The early detection of skin cancer is critical for improving patient outcomes. Traditionally, dermatologists rely on dermoscopy to examine pigmented skin lesions. While this non-invasive technique enhances diagnostic accuracy, its effectiveness is highly dependent on the clinician’s expertise. Additionally, capturing dermoscopic images for every suspicious lesion is a labor-intensive process. Given these challenges, there is an increasing need for computer-aided diagnosis (CAD) systems that utilize conventional cameras. Such systems can support general physicians and other non-specialist practitioners in identifying potentially malignant lesions, improving early detection and intervention. Moreover, they facilitate longitudinal tracking of lesions, aiding researchers in studying disease progression and treatment efficacy.

This dataset provides high-resolution skin patch images extracted from 3D total body photographs to support the development of advanced machine learning models for lesion detection. It serves as a valuable resource for researchers working on automated skin lesion analysis, particularly in the context of total body photography (TBP).

Dataset Description:

The iToBoS dataset consists of 16,954 high-resolution images of skin regions obtained from anonymized 3D avatars of patients. These avatars were generated using the Canfield VECTRA WB360 system, a cutting-edge imaging technology that captures comprehensive, full-body skin images using 92 fixed cameras arranged in 46 stereo pairs with xenon flash lighting. The images were collected from patients at two clinical sites: the Clinical Hospital of Barcelona (Spain) and the University of Queensland (Australia).

The dataset provides diverse anatomical locations, including the torso, arms, and legs, with each image having an average resolution of 1012x827 pixels and a 45-pixel overlap between adjacent images. The images are extracted from 3D avatars while ensuring compliance with GDPR regulations by automatically removing patient facial features. Each image is accompanied by metadata, including patient age range, body location, and sun damage score, allowing for in-depth analysis and stratification.

Significance of the Dataset:

Facilitates Automated Skin Lesion Detection: The dataset supports the development of AI-based lesion detection models that can improve early diagnosis of skin cancer, particularly in regions with limited access to dermatological expertise.

Supports Total Body Photography Research: Leveraging 3D TBP for lesion detection is an emerging field, and this dataset provides a benchmark for further exploration.

Enhances Machine Learning Applications: The dataset serves as a benchmark for developing state-of-the-art computer vision and deep learning models for detection of skin lesions.
ISIC 2019 TFRecords 256x256
kaggle.com
zip
Updated Jul 10, 2020
Cite
Chris Deotte (2020). ISIC 2019 TFRecords 256x256 [Dataset]. https://www.kaggle.com/cdeotte/ISIC2019-256x256
Explore at:
Available download formats: zip (440785630 bytes)
Dataset updated
Jul 10, 2020
Authors
Chris Deotte
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
These TFRecords contain both the image data and tabular data (metadata) for the 2019 ISIC Melanoma Classification Competition described here. The images are 256x256x3 JPEGs. The original JPEGs have been center square cropped and then resized using cv2.resize with interpolation = cv2.INTER_AREA.
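The center square crop mentioned above can be illustrated with a small geometry helper. This is a sketch under assumptions, not the dataset author's code: the function name center_square_box is hypothetical, and how the original pipeline rounded an odd pixel remainder is not stated in the description, so floor division is assumed here. The subsequent resize to 256x256 with cv2.resize and cv2.INTER_AREA is described in the text and is left out to keep the sketch dependency-free.

```python
from __future__ import annotations


def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square.

    right and bottom are exclusive, so an image array could be cropped as
    image[top:bottom, left:right]. Floor division for an odd remainder is an
    assumption, not something the dataset description specifies.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side
```

A 1024x768 image, for instance, would lose 128 columns on each side and keep all 768 rows before being resized.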

The odd numbered TFRecords (1,3,5,7,...) contain images that had an original image size of 1024x1024 before crop resize. And the even numbered TFRecords (0,2,4,6,...) did not have size 1024x1024 before crop resize. They are split like this in case you do not want to include the odd numbered TFRecords which some say have images that look different than 2020 competition data.

TFRecords with Image and Tabular Data

The train TFRecords have the following fields

feature = {
    'image': _bytes_feature,
    'image_name': _bytes_feature,
    'patient_id': _int64_feature,
    'sex': _int64_feature,
    'age_approx': _int64_feature,
    'anatom_site_general_challenge': _int64_feature,
    'diagnosis': _int64_feature,
    'target': _int64_feature,
    'width': _int64_feature,
    'height': _int64_feature
}

The features width and height are the original image width and height before the center square crop and resize.

The feature target=1 if diagnosis=MEL, which is melanoma, and target=0 otherwise. The image_name is a string. The patient_id is set to -1 because we don't know it. The sex has been label encoded to int with

0: 'male'
1: 'female'

The age_approx originally had 437 NaNs, but these have been imputed with the mean. The anatom_site_general_challenge has been label encoded to

-1: NaN
0: 'head/neck'
1: 'upper extremity'
2: 'lower extremity'
3: 'torso'
4: 'palms/soles'
5: 'oral/genital'

The diagnosis has been label encoded to

9: 'MEL'
10: 'NV'
11: 'BCC'
12: 'AK'
13: 'BKL'
14: 'DF'
15: 'VASC'
16: 'SCC'
17: 'UNK'
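The label encodings listed above can be inverted with plain dictionaries once a record has been parsed into Python values. The following is a minimal sketch under assumptions: the record is taken to be an ordinary dict keyed by the field names in the description, and decode_record is a hypothetical helper name. Reading the TFRecord files themselves is out of scope here.

```python
# Illustrative sketch only. The code tables are copied from the description;
# decode_record and the dict-shaped record are assumptions.
SEX = {0: "male", 1: "female"}
ANATOM_SITE = {
    -1: None,  # encoded NaN (site unknown)
    0: "head/neck",
    1: "upper extremity",
    2: "lower extremity",
    3: "torso",
    4: "palms/soles",
    5: "oral/genital",
}
DIAGNOSIS = {
    9: "MEL", 10: "NV", 11: "BCC", 12: "AK", 13: "BKL",
    14: "DF", 15: "VASC", 16: "SCC", 17: "UNK",
}


def decode_record(record: dict) -> dict:
    """Return a copy of record with integer codes replaced by their labels."""
    decoded = dict(record)
    decoded["sex"] = SEX.get(record["sex"])  # None for codes not listed
    decoded["anatom_site_general_challenge"] = ANATOM_SITE.get(
        record["anatom_site_general_challenge"]
    )
    decoded["diagnosis"] = DIAGNOSIS[record["diagnosis"]]
    # The description says target is 1 exactly when the diagnosis is MEL.
    decoded["target_consistent"] = (
        (record["target"] == 1) == (decoded["diagnosis"] == "MEL")
    )
    return decoded
```

The target_consistent flag is an extra sanity check added for illustration; it simply restates the rule that target=1 corresponds to melanoma.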
Data from: The HAM10000 dataset, a large collection of multi-source...
dataverse.harvard.edu
tsv, zip
Updated Jan 29, 2021
Cite
Harvard Dataverse (2021). The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions [Dataset]. http://doi.org/10.7910/DVN/DBW86T
Explore at:
Available download formats: tsv (830369), zip (10808743)
Unique identifier
https://doi.org/10.7910/DVN/DBW86T
Dataset updated
Jan 29, 2021
Dataset provided by
Harvard Dataverse
Description
Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset. We collected dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images which can serve as a training set for academic machine learning purposes. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions: Actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses, bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv) and vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc). More than 50% of lesions are confirmed through histopathology (histo); the ground truth for the rest of the cases is either follow-up examination (follow_up), expert consensus (consensus), or confirmation by in-vivo confocal microscopy (confocal). The dataset includes lesions with multiple images, which can be tracked by the lesion_id column within the HAM10000_metadata file.

Due to upload size limitations, images are stored in two files:

HAM10000_images_part1.zip (5000 JPEG files)

HAM10000_images_part2.zip (5015 JPEG files)

Additional data for evaluation purposes

The HAM10000 dataset served as the training set for the ISIC 2018 challenge (Task 3). The test-set images are available herein as ISIC2018_Task3_Test_Images.zip (1511 images); the official validation set is available through the challenge website https://challenge2018.isic-archive.com/. The ISIC-Archive also provides a "Live challenge" submission site for continuous evaluation of automated classifiers on the official validation and test sets.

Comparison to physicians

Test-set evaluations of the ISIC 2018 challenge were compared to physicians on an international scale, where the majority of challenge participants outperformed expert readers: Tschandl P. et al., Lancet Oncol 2019

Human-computer collaboration

The test-set images were also used in a study comparing different methods and scenarios of human-computer collaboration: Tschandl P. et al., Nature Medicine 2020

The following corresponding metadata is available herein:

ISIC2018_Task3_Test_NatureMedicine_AI_Interaction_Benefit.csv: Human ratings for Test images with and without interaction with a ResNet34 CNN (Malignancy Probability, Multi-Class probability, CBIR) or Human-Crowd Multi-Class probabilities. This data was collected for and analyzed in Tschandl P. et al., Nature Medicine 2020; therefore, please refer to this publication when using the data.

HAM10000_segmentations_lesion_tschandl.zip: To evaluate regions of CNN activations in Tschandl P. et al., Nature Medicine 2020 (please refer to this publication when using the data), a single dermatologist (Tschandl P) created binary segmentation masks for all 10015 images from the HAM10000 dataset. Masks were initialized with the segmentation network as described by Tschandl et al., Computers in Biology and Medicine 2019, and subsequently verified, corrected or replaced via the free-hand selection tool in FIJI.
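Because the description notes that one lesion can have several images tracked by lesion_id, a train/validation split is usually made per lesion so that images of the same lesion do not land on both sides. The sketch below shows one possible way to do that with a stable hash; it is an illustration, not part of the dataset. The function names, the 20% validation fraction, and the assumption that each metadata row is a dict with a "lesion_id" key are all hypothetical choices.

```python
import hashlib


def assign_split(lesion_id: str, val_fraction: float = 0.2) -> str:
    """Deterministically assign a lesion to 'train' or 'val'.

    Hashing the lesion_id means every image of the same lesion gets the
    same split, independent of row order.
    """
    digest = hashlib.sha256(lesion_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "val" if bucket < val_fraction else "train"


def split_rows(rows, val_fraction: float = 0.2) -> dict:
    """Group metadata rows (dicts with a 'lesion_id' key) by split."""
    splits = {"train": [], "val": []}
    for row in rows:
        splits[assign_split(row["lesion_id"], val_fraction)].append(row)
    return splits
```

The realized validation share only approximates val_fraction, since whole lesions are assigned at once; a shuffled group split over the unique lesion_id values would be an equally reasonable alternative.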
HAM10000 Lesion Segmentations
kaggle.com
Updated Jul 2, 2020
Cite
chdlr (2020). HAM10000 Lesion Segmentations [Dataset]. https://www.kaggle.com/datasets/tschandl/ham10000-lesion-segmentations/
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 2, 2020
Dataset provided by
Kaggle http://kaggle.com/
Authors
chdlr
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Context

Dermatoscopic images usually depict a single skin lesion, but large-scale datasets with segmentations of the affected areas have not been available until now. Challenge segmentation data often suffered from being either too coarse or too noisy. This dataset provides 10015 binary segmentation masks based on FCN-created segmentations and hand-drawn lines, which together with the HAM10000 diagnosis metadata can be used for object detection or semantic segmentation.

Content

This dataset contains binary segmentation masks as PNG files for all HAM10000 dataset images. The masks delineate the lesion area as evaluated by a single dermatologist (me). They were initialized with an FCN lesion segmentation model, after which I went through all of them and either approved them or corrected / redrew them with the free-hand selection tool in FIJI.
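For the object detection use mentioned in the context, a binary mask can be reduced to a bounding box. The sketch below assumes the mask has already been loaded into a 2D sequence of pixel values in which any non-zero value marks the lesion; the actual foreground value in the PNG files is not stated in the description, and mask_to_bbox is a hypothetical helper name.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max), inclusive, of non-zero pixels.

    mask is a 2D sequence (rows of pixel values). Returns None if the mask
    contains no foreground pixel. Treating non-zero as foreground is an
    assumption about how the masks are encoded.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)
```

For full-size masks a vectorized version over an image array would be much faster; the nested loops are kept here only so the example has no dependencies.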

You can find the HAM10000 dataset images at the following places:

Harvard Dataverse: https://doi.org/10.7910/DVN/DBW86T

ISIC Archive Gallery: https://www.isic-archive.com

Kaggle Dataset Kernel (downsampled): https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000

Acknowledgements

If you use this data, please cite/refer to the publication I made these segmentation masks for...

Tschandl, P., Rinner, C., Apalla, Z. et al. Human–computer collaboration for skin cancer recognition. Nat Med (2020). https://doi.org/10.1038/s41591-020-0942-0

...and the original source of the images:

Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci Data 5, 180161 (2018). https://doi.org/10.1038/sdata.2018.161
PROVe-AI
api.isic-archive.com
Updated 2022
Cite
Memorial Sloan Kettering Cancer Center (2022). PROVe-AI [Dataset]. http://doi.org/10.34970/576276
Explore at:
Unique identifier
https://doi.org/10.34970/576276
Dataset updated
2022
Dataset provided by
DataCite https://www.datacite.org/
ISIC Archive
Authors
Memorial Sloan Kettering Cancer Center
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
We conducted a prospective, observational clinical validation study to assess the diagnostic accuracy of the AI algorithm (ADAE) in predicting melanoma from dermoscopy skin lesion images. Patients who had consented for a skin biopsy to exclude melanoma were eligible. All lesions underwent biopsy.
ISIC_WSM
opendatalab.com
zip
Updated Jun 13, 2024
Cite
University of Siena (2024). ISIC_WSM [Dataset]. https://opendatalab.com/OpenDataLab/ISIC_WSM
Explore at:
Available download formats: zip
Dataset updated
Jun 13, 2024
Dataset provided by
University of Siena
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The ISIC_WSM dataset provides pixel-level supervision for a subset of images (43885) from the ISIC archive, while the original images can be downloaded separately at the ISIC website. The supervision is obtained from the available bounding boxes of the COCO-Text dataset by exploiting a weakly supervised algorithm. See the paper for more details.
NLP_SKIN_DATA_PS_DD
kaggle.com
Updated Jul 4, 2025
Cite
HARINI SHREE R (2025). NLP_SKIN_DATA_PS_DD [Dataset]. http://doi.org/10.34740/kaggle/dsv/12368953
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.34740/kaggle/dsv/12368953
Dataset updated
Jul 4, 2025
Dataset provided by
Kaggle http://kaggle.com/
Authors
HARINI SHREE R
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
📄 Context

Skin diseases are among the most common health concerns worldwide, ranging from benign lesions like keratosis to serious conditions such as melanoma. Early and accurate diagnosis plays a vital role in preventing disease progression and improving patient outcomes. This dataset aims to assist in developing AI-driven dermatology tools by providing structured information on various skin diseases, their definitions, patient-described symptoms, and associated clinical images.

🔍 Sources

The dataset is compiled from a combination of:

Publicly available dermatological image repositories, such as the ISIC (International Skin Imaging Collaboration) archive, which contains labeled dermoscopic images of skin lesions.

Clinical literature and dermatology textbooks, used to write concise disease definitions.

Simulated patient statements, reflecting typical ways in which patients describe their skin conditions during clinical consultations. These were generated based on clinical case studies and patient interviews found in dermatology research papers.

Synthetic aggregation: File names refer to images associated with each disease class, meant for easy integration with machine learning pipelines.

🌟 Inspiration

This dataset was inspired by the growing need for:

Explainable AI (XAI) in dermatology: Making machine learning models more understandable to clinicians and patients.

Bridging the gap between clinical terminology and patient language: Helping AI models learn how real patients describe their symptoms, enhancing the usability of teledermatology tools.

Supporting education and research: Assisting medical students, researchers, and AI developers in understanding skin diseases in both clinical and layman contexts.

Enabling multi-modal learning: Combining text descriptions, disease definitions, and images to train more robust models that can reason across data types.

📄 Column Descriptions

Disease Class - The name of the skin disease type (e.g., Actinic Keratosis, Melanoma, Benign Keratosis, etc.). There are 9 unique classes.

Disease Definition - A clinical description explaining the nature and characteristics of the disease.

Major Statement - Simulated patient descriptions or questions that reflect how individuals typically describe their symptoms.

File Name - The corresponding image file name related to the disease case
Dermatology Image and Text Dataset for AI-Powered Diagnosis and RAG-Based...
ieee-dataport.org
Updated May 1, 2025
Cite
Emre Olca (2025). Dermatology Image and Text Dataset for AI-Powered Diagnosis and RAG-Based Medical Support [Dataset]. https://ieee-dataport.org/documents/dermatology-image-and-text-dataset-ai-powered-diagnosis-and-rag-based-medical-support
Explore at:
Dataset updated
May 1, 2025
Authors
Emre Olca
Description
100 high-resolution
International Skin Imaging Collaboration (ISIC) Archive

267 scholarly articles cite this dataset (View in Google Scholar)
