Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The source dataset and its full description may be accessed through the Harvard Dataverse, and should be cited as
Tschandl, Philipp, 2018, "The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions", https://doi.org/10.7910/DVN/DBW86T, Harvard Dataverse, V4, UNF:6:KCZFcBLiFE5ObWcTc2ZBOA== [fileUNF]
Note that the dataset uploaded here does not contain all of the source material. It omits the file `ISIC2018_Task3_Test_NatureMedicine_AI_Interaction_Benefit.tab`, which contains data on a study involving human-computer collaboration, and the folder `HAM10000_segmentations_lesion_tschandl`, which contains binary segmentation masks of the training images. Still, in contrast to most of the HAM10000 datasets published on Kaggle, this one includes the test dataset that was curated for the ISIC 2018 challenge (Task 3).
The uploaded dataset comprises 3 folders and 2 files, described in the table below.
Content | Type | Description |
---|---|---|
HAM10000_images_part_1 | folder | Part 1 of a set of training pictures |
HAM10000_images_part_2 | folder | Part 2 of a set of training pictures |
ISIC2018_Task3_Test_Images | folder | Set of test pictures |
HAM10000_metadata.csv | file | Metadata associated with the training data |
ISIC2018_Task3_Test_GroundTruth.csv | file | Metadata associated with the test data |
The training dataset (`HAM10000_images_part_1` and `HAM10000_images_part_2`) is called "HAM10000", meaning "Human Against Machine with 10000 training images" (actually 10015 images). It is a large collection of multi-source dermatoscopic RGB images (JPG) of common pigmented skin lesions. The test dataset (`ISIC2018_Task3_Test_Images`) contains 511 images. The files `HAM10000_metadata.csv` and `ISIC2018_Task3_Test_GroundTruth.csv` contain the respective metadata (data about the data), which include the labels and other features.
The structure of the metadata files follows the template presented in the table below.
Column | Type | Description |
---|---|---|
lesion_id | String | ID of the lesion case |
image_id | String | ID of an image (also the name of the respective JPG file) associated with that case |
dx | String | Label of that case |
dx_type | String | Method used for diagnosing that case |
age | Float | Age of the person associated with that case |
sex | String | Sex of the person associated with that case |
localization | String | Location of the lesion on the person's body |
dataset | String | Reference from which the data was taken |
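The column template above can be read with the standard library alone. The following is a minimal sketch, not part of the original dataset description: the function name `load_metadata` is hypothetical, and it assumes that image files are named `<image_id>.jpg` (lowercase extension) and that a missing age appears as an empty cell.

```python
import csv
from pathlib import Path


def load_metadata(csv_path, image_dirs):
    """Read a metadata CSV and attach the path of each JPG, if it is found."""
    records = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # age is documented as a float; an empty cell is treated as missing
            # (assumption: missing ages are stored as empty strings).
            age_text = (row.get("age") or "").strip()
            row["age"] = float(age_text) if age_text else None

            # image_id is also the JPG file name; the training images are split
            # across two folders, so look in each candidate folder in turn.
            candidates = (Path(d) / f"{row['image_id']}.jpg" for d in image_dirs)
            row["image_path"] = next((p for p in candidates if p.is_file()), None)
            records.append(row)
    return records
```

For the training data, this could be called as `load_metadata("HAM10000_metadata.csv", ["HAM10000_images_part_1", "HAM10000_images_part_2"])`. Rows whose image cannot be located keep `image_path` set to `None` instead of raising, so missing files can be inspected afterwards.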
`dx` column (the classes)
The values that the `dx` column may take are tabulated below.
Value | Description |
---|---|
akiec | Actinic keratoses and intraepithelial carcinoma (also called "Bowen's disease") - an early form of skin cancer |
bcc | Basal cell carcinoma - the most common type of skin cancer |
bkl | Benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen-planus like keratoses) - common and benign |
df | Dermatofibroma - common and benign |
mel | Melanoma - a type of skin cancer arising from melanocytes (the pigment-producing cells) |
nv | Melanocytic nevus - the medical term for a mole (benign) |
vasc | Vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage) (benign) |
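For training a classifier, the seven `dx` codes usually need to be turned into integer indices. The sketch below is one possible encoding and is not prescribed by the dataset: the alphabetical order and the helper names `encode_labels` and `class_counts` are assumptions made for illustration.

```python
from collections import Counter

# The seven dx codes from the table above, in a fixed (alphabetical) order.
DX_CLASSES = ("akiec", "bcc", "bkl", "df", "mel", "nv", "vasc")
DX_TO_INDEX = {code: index for index, code in enumerate(DX_CLASSES)}


def encode_labels(dx_values):
    """Map dx codes to integer indices, rejecting codes not in the table."""
    dx_values = list(dx_values)
    unknown = sorted(set(dx_values) - set(DX_CLASSES))
    if unknown:
        raise ValueError(f"unexpected dx codes: {unknown}")
    return [DX_TO_INDEX[code] for code in dx_values]


def class_counts(dx_values):
    """Count images per class, listing classes with zero images as well."""
    counts = Counter(dx_values)
    return {code: counts.get(code, 0) for code in DX_CLASSES}
```

Checking `class_counts` before training is a cheap way to see how unevenly the classes are represented in a given split.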
`dx_type` column (the diagnosis methods)
The table below presents the values of the `dx_type` column.
Value | Description |
---|---|
histo | Histopathology |
follow_up | Follow-up examination |
consensus | Expert consensus |
confocal | In-vivo confocal microscopy |
This is the Skin Cancer MNIST: HAM10000 dataset, but resized and pickled. The images are resized from 600x450 pixels down to 40x30 pixels and then pickled using Python's `pickle` library. `pickle5` is required to unpickle the dataset, so make sure to add `!pip3 install pickle5` and `import pickle5 as pickle` to your notebook.
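A minimal loading sketch follows. It relies on one fact the description does not mention: `pickle5` is a backport for Python 3.7 and older, while the standard `pickle` module reads protocol 5 natively from Python 3.8 on, so a fallback import covers both cases. The function name `load_pickled_dataset` is hypothetical, and the structure of the unpickled object is not documented here, so inspect it after loading.

```python
try:
    import pickle5 as pickle  # backport, only needed on Python 3.7 and older
except ImportError:
    import pickle  # the standard library reads protocol 5 from Python 3.8 on


def load_pickled_dataset(path):
    """Unpickle a file; only do this with files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Unpickling can execute arbitrary code, which is why the docstring restricts this to trusted files.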
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is an adapted version of "Skin Cancer MNIST" (the original HAM10000), converted from a classification task to a skin lesion detection task. It contains:
- 10,000 images of human skin lesions, manually annotated with bounding boxes.
- A division into 7 main classes of skin lesions:
1. Actinic keratoses and intraepithelial carcinoma/Bowen disease (akiec): pre-malignant lesions.
2. Basal cell carcinoma (bcc): a type of skin cancer with a good prognosis.
3. Benign lesions of the keratosis type (bkl): include solar lentigo, seborrheic keratosis, and lichenoid keratosis.
4. Dermatofibroma (df): common benign lesions.
5. Melanoma (mel): a malignant lesion with high clinical priority.
6. Melanocytic nevi (nv): very common benign melanocytic lesions.
7. Vascular lesions (vasc): include angiomas, angiokeratomas, pyogenic granulomas, and hemorrhages.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Dermatoscopic images usually depict a single skin lesion, but large-scale datasets with segmentations of the affected areas have not been available until now. Challenge segmentation data often suffered from being either too coarse or too noisy. This dataset provides 10015 binary segmentation masks based on FCN-created segmentations and hand-drawn lines, which together with the HAM10000 diagnosis metadata can be used for object detection or semantic segmentation.
This dataset contains binary segmentation masks, as PNG files, for all HAM10000 dataset images. Each mask delineates the lesion area as evaluated by a single dermatologist (me). The masks were initialized with an FCN lesion segmentation model; afterwards I went through all of them and either approved them or corrected / redrew them with the free-hand selection tool in FIJI.
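Since the masks are suggested for object detection as well, one common step is to derive a bounding box from each binary mask. The following is a minimal sketch under stated assumptions: the mask has already been decoded from its PNG into a 2D sequence of pixel rows (decoding with an imaging library is not shown), and any nonzero pixel counts as lesion, because the foreground value used in the files is not stated here. The function name `mask_to_bbox` is hypothetical.

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None.

    `mask` is a 2D sequence of rows; any nonzero value counts as lesion.
    The returned coordinates are inclusive pixel indices.
    """
    # Row indices that contain at least one lesion pixel (already ascending).
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask: no lesion pixels at all
    # Column indices of every lesion pixel, across all rows.
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return min(cols), rows[0], max(cols), rows[-1]
```

The box is the tightest axis-aligned rectangle around all foreground pixels, so a mask with several disconnected regions still yields a single box.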
You can find the HAM10000 dataset images at the following places:
- Harvard Dataverse: https://doi.org/10.7910/DVN/DBW86T
- ISIC Archive Gallery: https://www.isic-archive.com
- Kaggle Dataset Kernel (downsampled): https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000
If you use this data, please cite/refer to the publication I made these segmentation masks for...
...and the original source of the images:
The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions
Original Paper and Dataset here Kaggle dataset here
Introduction to datasets
Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 ("Human Against Machine with 10000 training images") dataset.… See the full description on the dataset page: https://huggingface.co/datasets/karoladelk/ham1ok.
This dataset was created by Vu Ngoc Binh
This dataset was created by Hijibiji_Hijibiji
This dataset was created by gowtham_mandla
This dataset was created by Iulia-Georgiana Talpalariu
This dataset was created by OUAHABI Benhenni
This dataset was created by Dai Nguyen 2
This dataset was created by AntonRL124c
This dataset was created by VIVEK NARAYAN 21114108
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Aranya Saha
Released under MIT
This dataset was created by Jihaad Pangestu
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Skin Disease GAN-Generated and Original Images Lightweight Dataset
This dataset is a collection of skin disease images generated using a Generative Adversarial Network (GAN) approach. Specifically, a GAN was utilized with Stable Diffusion as the generator and a transformer-based discriminator to create realistic images of various skin diseases. The GAN approach enhances the accuracy and realism of the generated images, making this dataset a valuable resource for machine learning and computer vision applications in dermatology.
To create this dataset, a series of Low-Rank Adaptations (LoRAs) were generated for each disease category. These LoRAs were trained on the base dataset with 60 epochs and 30,000 steps using OneTrainer. Images were then generated for the following disease categories:
Due to the availability of ample public images, Melanoma was excluded from the generation process. The Fooocus API served as the generator within the GAN framework, creating images based on the LoRAs.
To ensure quality and accuracy, a transformer-based discriminator was employed to verify the generated images, classifying them into the correct disease categories.
The original base dataset used to create this GAN-based dataset includes reputable sources such as:
- 2019 HAM10000 Challenge
- Kaggle
- Google Images
- Dermnet NZ
- Bing Images
- Yandex
- Hellenic Atlas
- Dermatological Atlas
The LoRAs and their recommended weights for generating images are available for download on our CivitAi profile. You can refer to this profile for detailed instructions and access to the LoRAs used in this dataset.
Generated Images: High-quality images of skin diseases generated via GAN with Stable Diffusion, using transformer-based discrimination for accurate classification.
This dataset is suitable for:
Garcia-Espinosa, E., Ruiz-Castilla, J. S., & Garcia-Lamont, F. (2025). Generative AI and Transformers in Advanced Skin Lesion Classification applied on a mobile device. International Journal of Combinatorial Optimization Problems and Informatics, 16(2), 158–175. https://doi.org/10.61467/2007.1558.2025.v16i2.1078
Espinosa, E.G., Castilla, J.S.R., Lamont, F.G. (2025). Skin Disease Pre-diagnosis with Novel Visual Transformers. In: Figueroa-García, J.C., Hernández, G., Suero Pérez, D.F., Gaona García, E.E. (eds) Applied Computer Sciences in Engineering. WEA 2024. Communications in Computer and Information Science, vol 2222. Springer, Cham. https://doi.org/10.1007/978-3-031-74595-9_10
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Manodeep Ray
Released under CC0: Public Domain