5 datasets found

Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological...
data.niaid.nih.gov
zenodo.org
Updated Jul 14, 2024
Cite
Abhishek, Kumar (2024). Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11101337
Dataset updated
Jul 14, 2024
Dataset provided by
Jain, Aditi
Hamarneh, Ghassan
Abhishek, Kumar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

The remarkable progress of deep learning in dermatological tasks has brought us closer to achieving diagnostic accuracies comparable to those of human experts. However, while large datasets play a crucial role in the development of reliable deep neural network models, the quality of data therein and their correct usage are of paramount importance. Several factors can impact data quality, such as the presence of duplicates, data leakage across train-test partitions, mislabeled images, and the absence of a well-defined test partition. In this paper, we conduct meticulous analyses of three popular dermatological image datasets (DermaMNIST, its source HAM10000, and Fitzpatrick17k), uncover these data quality issues, measure the effects of these problems on the benchmark results, and propose corrections to the datasets. Besides ensuring the reproducibility of our analysis, by making our analysis pipeline and the accompanying code publicly available, we aim to encourage similar explorations and to facilitate the identification and addressing of potential data quality issues in other large datasets.
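
One of the issues named above, data leakage across train-test partitions, can be illustrated with a minimal sketch. It flags only byte-identical items by hashing their contents; it is not the analysis pipeline from the paper, and near-duplicates (resized or re-encoded copies) would need a similarity-based method instead. The function name `find_exact_leaks` and the toy byte strings are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_exact_leaks(train_items, test_items):
    """Return {digest: (train_ids, test_ids)} for content present in both splits.

    Each item is an (identifier, raw_bytes) pair. Only byte-identical
    content is detected; resized or re-encoded copies are missed.
    """
    train_index = defaultdict(list)
    for item_id, raw in train_items:
        train_index[hashlib.sha256(raw).hexdigest()].append(item_id)

    leaks = {}
    for item_id, raw in test_items:
        digest = hashlib.sha256(raw).hexdigest()
        if digest in train_index:
            _train_ids, test_ids = leaks.setdefault(
                digest, (train_index[digest], [])
            )
            test_ids.append(item_id)
    return leaks


# Toy data standing in for image bytes (hypothetical identifiers).
train = [("train_0", b"lesion-A"), ("train_1", b"lesion-B")]
test = [("test_0", b"lesion-B"), ("test_1", b"lesion-C")]

leaks = find_exact_leaks(train, test)
leaked_pairs = sorted(
    (tr, te)
    for tr_ids, te_ids in leaks.values()
    for tr in tr_ids
    for te in te_ids
)
print(leaked_pairs)
```

In practice the byte strings would come from reading each image file, and any pair reported here would be a candidate for removal from one of the partitions.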

Citation

If you find this project useful or if you use our newly proposed datasets and/or our analyses, please cite our paper.

Kumar Abhishek, Aditi Jain, Ghassan Hamarneh. "Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets". arXiv preprint arXiv:2401.14497, 2024. DOI: 10.48550/ARXIV.2401.14497.

The corresponding BibTeX entry is:

@article{abhishek2024investigating, title={Investigating the Quality of {DermaMNIST} and {Fitzpatrick17k} Dermatological Image Datasets}, author={Abhishek, Kumar and Jain, Aditi and Hamarneh, Ghassan}, journal={arXiv preprint arXiv:2401.14497}, doi = {10.48550/ARXIV.2401.14497}, url = {https://arxiv.org/abs/2401.14497}, year={2024}}

Project Website

The results of the analysis, including the visualizations, are available on the project website: https://derm.cs.sfu.ca/critique/.

Code

The accompanying code for this project is hosted on GitHub at https://github.com/kakumarabhishek/Corrected-Skin-Image-Datasets.

License

The metadata files (DermaMNIST-C.csv, DermaMNIST-E.csv, Fitzpatrick17k_DiagnosisMapping.xlsx, Fitzpatrick17k-C.csv) contained in this repository are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

The NPZ files associated with DermaMNIST-C (dermamnist_corrected_28.npz, dermamnist_corrected_224.npz) and DermaMNIST-E (dermamnist_extended_28.npz, dermamnist_extended_224.npz) contained in this repository are licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) License.

The code hosted on GitHub is licensed under the Apache License 2.0.
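
An NPZ file is a ZIP archive in which each array is stored as a `.npy` member, so the arrays in a file such as dermamnist_corrected_28.npz can be listed without loading them. The sketch below uses only the Python standard library; the helper name `list_npz_arrays` is hypothetical, and the member names in the demo (`train_images`, `train_labels`) are an assumption about the file layout, not something stated in this record.

```python
import io
import zipfile


def list_npz_arrays(npz_source):
    """Return the sorted array names stored in an NPZ archive.

    `npz_source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(npz_source) as archive:
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )


# Demo on an in-memory stand-in archive. The member payloads are placeholders,
# not real arrays, because only the member names are inspected.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("train_images.npy", b"placeholder")
    archive.writestr("train_labels.npy", b"placeholder")
buffer.seek(0)

array_names = list_npz_arrays(buffer)
print(array_names)
```

With a downloaded file, `list_npz_arrays("dermamnist_corrected_28.npz")` would report whatever arrays that file actually contains.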
DermaMNIST-C
huggingface.co
Updated Jun 21, 2025
Cite
Kumar Abhishek (2025). DermaMNIST-C [Dataset]. https://huggingface.co/datasets/kabhishe/DermaMNIST-C
Dataset updated
Jun 21, 2025
Authors
Kumar Abhishek
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
kabhishe/DermaMNIST-C dataset hosted on Hugging Face and contributed by the HF Datasets community
Data from: MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark...
data.niaid.nih.gov
explore.openaire.eu
Updated Apr 19, 2023
Cite
Jiancheng Yang (2023). MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4269851
Dataset updated
Apr 19, 2023
Dataset provided by
Rui Shi
Bingbing Ni
Jiancheng Yang
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data repository for MedMNIST v1 is out of date! Please check the latest version of MedMNIST v2.

Abstract

We present MedMNIST, a collection of 10 pre-processed medical open datasets. MedMNIST is standardized to perform classification tasks on lightweight 28x28 images, which requires no background knowledge. Covering the primary data modalities in medical image analysis, it is diverse in data scale (from 100 to 100,000) and tasks (binary/multi-class, ordinal regression and multi-label). MedMNIST could be used for educational purposes, rapid prototyping, multi-modal machine learning or AutoML in medical image analysis. Moreover, MedMNIST Classification Decathlon is designed to benchmark AutoML algorithms on all 10 datasets; we have compared several baseline methods, including open-source or commercial AutoML tools. The datasets, evaluation code and baseline methods for MedMNIST are publicly available at https://medmnist.github.io/.

Please note that this dataset is NOT intended for clinical use.

We recommend our official code to download, parse and use the MedMNIST dataset:

pip install medmnist

Citation and Licenses

If you find this project useful, please cite our ISBI'21 paper as: Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis," arXiv preprint arXiv:2010.14925, 2020.

or using bibtex: @article{medmnist, title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis}, author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing}, journal={arXiv preprint arXiv:2010.14925}, year={2020} }

Besides, please cite the corresponding paper if you use any subset of MedMNIST. Each subset uses the same license as that of the source dataset.

PathMNIST

Jakob Nikolas Kather, Johannes Krisam, et al., "Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study," PLOS Medicine, vol. 16, no. 1, pp. 1–22, 01 2019.

License: CC BY 4.0

ChestMNIST

Xiaosong Wang, Yifan Peng, et al., "Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in CVPR, 2017, pp. 3462–3471.

License: CC0 1.0

DermaMNIST

Philipp Tschandl, Cliff Rosendahl, and Harald Kittler, "The ham10000 dataset, a large collection of multisource dermatoscopic images of common pigmented skin lesions," Scientific data, vol. 5, pp. 180161, 2018.

Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, and Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018; arXiv:1902.03368.

License: CC BY-NC 4.0

OCTMNIST/PneumoniaMNIST

Daniel S. Kermany, Michael Goldbaum, et al., "Identifying medical diagnoses and treatable diseases by image-based deep learning," Cell, vol. 172, no. 5, pp. 1122 – 1131.e9, 2018.

License: CC BY 4.0

RetinaMNIST

DeepDR Diabetic Retinopathy Image Dataset (DeepDRiD), "The 2nd diabetic retinopathy – grading and image quality estimation challenge," https://isbi.deepdr.org/data.html, 2020.

License: CC BY 4.0

BreastMNIST

Walid Al-Dhabyani, Mohammed Gomaa, Hussien Khaled, and Aly Fahmy, "Dataset of breast ultrasound images," Data in Brief, vol. 28, pp. 104863, 2020.

License: CC BY 4.0

OrganMNIST_{Axial,Coronal,Sagittal}

Patrick Bilic, Patrick Ferdinand Christ, et al., "The liver tumor segmentation benchmark (lits)," arXiv preprint arXiv:1901.04056, 2019.

Xuanang Xu, Fugen Zhou, et al., "Efficient multiple organ localization in ct image using 3d region proposal network," IEEE Transactions on Medical Imaging, vol. 38, no. 8, pp. 1885–1898, 2019.

License: CC BY 4.0
Data from: MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D...
explore.openaire.eu
zenodo.org
Updated Aug 16, 2021
Cite
Jiancheng Yang; Rui Shi; Donglai Wei; Zequan Liu; Lin Zhao; Bilian Ke; Hanspeter Pfister; Bingbing Ni (2021). MedMNIST v2: A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification [Dataset]. http://doi.org/10.5281/zenodo.6496656
Unique identifier
https://doi.org/10.5281/zenodo.6496656
Dataset updated
Aug 16, 2021
Authors
Jiancheng Yang; Rui Shi; Donglai Wei; Zequan Liu; Lin Zhao; Bilian Ke; Hanspeter Pfister; Bingbing Ni
Description
Abstract

We introduce MedMNIST v2, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST v2 is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of 708,069 2D images and 10,214 3D images in total, could support numerous research / educational purposes in biomedical image analysis, computer vision and machine learning. We benchmark several baseline methods on MedMNIST v2, including 2D / 3D neural networks and open-source / commercial AutoML tools. The data and code are publicly available at https://medmnist.com/.

Note: This dataset is NOT intended for clinical use.

We recommend our official code to download, parse and use the MedMNIST dataset:

pip install medmnist

Citation

If you find this project useful, please cite both the v1 and v2 papers as:

Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni. "MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification." Scientific Data, 2023.

Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis". IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021.

or using bibtex:

@article{medmnistv2, title={MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification}, author={Yang, Jiancheng and Shi, Rui and Wei, Donglai and Liu, Zequan and Zhao, Lin and Ke, Bilian and Pfister, Hanspeter and Ni, Bingbing}, journal={Scientific Data}, volume={10}, number={1}, pages={41}, year={2023}, publisher={Nature Publishing Group UK London} }

@inproceedings{medmnistv1, title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis}, author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing}, booktitle={IEEE 18th International Symposium on Biomedical Imaging (ISBI)}, pages={191--195}, year={2021} }

Please also cite the corresponding paper(s) of source data if you use any subset of MedMNIST as per the description on the project website.

License

The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0), except DermaMNIST under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

The code is under Apache-2.0 License.

Changelog

v2.1 (this repository): We have fixed the mistake in the file of NoduleMNIST3D (i.e., nodulemnist3d.npz). More details in this issue.

v2.0: Initial repository of MedMNIST v2, add 6 datasets for 3D and 2 for 2D.

v1.0: Initial repository of MedMNIST v1, 10 datasets for 2D.
[MedMNIST+] 18x Standardized Datasets for 2D and 3D Biomedical Image...
data.niaid.nih.gov
Updated Nov 28, 2024
Cite
Rui Shi (2024). [MedMNIST+] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification with Multiple Size Options: 28 (MNIST-Like), 64, 128, and 224 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5208229
Dataset updated
Nov 28, 2024
Dataset provided by
Donglai Wei
Rui Shi
Hanspeter Pfister
Bingbing Ni
Bilian Ke
Jiancheng Yang
Lin Zhao
Zequan Liu
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Code [GitHub] | Publication [Nature Scientific Data'23 / ISBI'21] | Preprint [arXiv]

Abstract

We introduce MedMNIST, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision and machine learning. We benchmark several baseline methods on MedMNIST, including 2D / 3D neural networks and open-source / commercial AutoML tools. The data and code are publicly available at https://medmnist.com/.

Disclaimer: The only official distribution link for the MedMNIST dataset is Zenodo. We kindly request users to refer to this original dataset link for accurate and up-to-date data.

Update: We are thrilled to release MedMNIST+ with larger sizes: 64x64, 128x128, and 224x224 for 2D, and 64x64x64 for 3D. As a complement to the previous 28-size MedMNIST, the large-size version could serve as a standardized benchmark for medical foundation models. Install the latest API to try it out!

Python Usage

We recommend our official code to download, parse and use the MedMNIST dataset:

% pip install medmnist
% python

To use the standard 28-size (MNIST-like) version utilizing the downloaded files:

from medmnist import PathMNIST

train_dataset = PathMNIST(split="train")

To enable automatic downloading by setting download=True:

from medmnist import NoduleMNIST3D

val_dataset = NoduleMNIST3D(split="val", download=True)

Alternatively, you can access MedMNIST+ with larger image sizes by specifying the size parameter:

from medmnist import ChestMNIST

test_dataset = ChestMNIST(split="test", download=True, size=224)
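
Once a split is loaded, a quick sanity check is to count samples per label. The sketch below assumes the dataset behaves like a sequence of (image, label) pairs in which each label is a short sequence of integers; that interface is an assumption here rather than something this description states, and `count_labels` is a hypothetical helper. A plain list stands in for a real dataset so the example runs without downloading anything.

```python
from collections import Counter


def count_labels(dataset):
    """Count samples per label for a sequence of (image, label) pairs."""
    counts = Counter()
    for index in range(len(dataset)):
        _image, label = dataset[index]
        # Normalise the label to a hashable tuple of ints.
        counts[tuple(int(value) for value in label)] += 1
    return counts


# Stand-in data: image placeholders paired with single-element labels.
toy_dataset = [(None, [0]), (None, [1]), (None, [1]), (None, [2])]
label_counts = count_labels(toy_dataset)
print(sorted(label_counts.items()))
```

If the assumption holds, passing a loaded split such as `train_dataset` in place of `toy_dataset` would give the per-class sample counts for that split.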

Citation

If you find this project useful, please cite both the v1 and v2 papers as:

Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni. "MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification." Scientific Data, 2023.

Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis". IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021.

or using bibtex:

@article{medmnistv2, title={MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification}, author={Yang, Jiancheng and Shi, Rui and Wei, Donglai and Liu, Zequan and Zhao, Lin and Ke, Bilian and Pfister, Hanspeter and Ni, Bingbing}, journal={Scientific Data}, volume={10}, number={1}, pages={41}, year={2023}, publisher={Nature Publishing Group UK London} }

@inproceedings{medmnistv1, title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis}, author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing}, booktitle={IEEE 18th International Symposium on Biomedical Imaging (ISBI)}, pages={191--195}, year={2021} }

Please also cite the corresponding paper(s) of source data if you use any subset of MedMNIST as per the description on the project website.

License

The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0), except DermaMNIST under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

The code is under Apache-2.0 License.

Changelog

v3.0 (this repository): Released MedMNIST+ featuring larger sizes: 64x64, 128x128, and 224x224 for 2D, and 64x64x64 for 3D.

v2.2: Removed a small number of mistakenly included blank samples in OrganAMNIST, OrganCMNIST, OrganSMNIST, OrganMNIST3D, and VesselMNIST3D.

v2.1: Addressed an issue in the NoduleMNIST3D file (i.e., nodulemnist3d.npz). Further details can be found in this issue.

v2.0: Launched the initial repository of MedMNIST v2, adding 6 datasets for 3D and 2 for 2D.

v1.0: Established the initial repository (in a separate repository) of MedMNIST v1, featuring 10 datasets for 2D.

Note: This dataset is NOT intended for clinical use.