6 datasets found

MedMNIST: Standardized Biomedical Images
kaggle.com
Updated Feb 2, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Möbius (2024). MedMNIST: Standardized Biomedical Images [Dataset]. https://www.kaggle.com/datasets/arashnic/standardized-biomedical-images-medmnist
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Feb 2, 2024
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Möbius
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
"'https://www.nature.com/articles/s41597-022-01721-8'">MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification https://www.nature.com/articles/s41597-022-01721-8

A large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision and machine learning.Providers benchmark several baseline methods on MedMNIST, including 2D / 3D neural networks and open-source / commercial AutoML tools.

MedMNIST Landscape :

https://storage.googleapis.com/kagglesdsdata/datasets/4390240/7539891/medmnistlandscape.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240202%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240202T132716Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=479c8d80a4c6f28bf9532fea037969292a4f963662b022484a79c139297cfa1afc82db06c9b5275d6c52d5555d7fb178701d3ad7ebb036c9cf3d076fcf41014c05a6230d293f39dd320303efaa81d18e9c5888c23fe19884148a3be618e3e7c041383119a4c5547f0fa6cb1ddb5f3bf4dc1330a6fd5c693f32280e90fde5735e02052f2fc5b0003085d9ea70039903439814154dc39980dce3bace422d0672a69c4f4cefbe6bcebaacd2c5192a60172143667b14ba050a8383d0a7c6c639526c820ae58bbad99b4afc84e97bc87b2da6002d6faf181d4138e2a33961514370578892409b1e1a662424051573a3392273b00132a4f39becff877dff16a594848f" alt="medmnistlandscape">

About MedMNIST Landscape figure: The horizontal axis denotes the base-10 logarithm of the dataset scale, and the vertical axis denotes base-10 logarithm of imaging resolution. The upward and downward triangles are used to distinguish between 2D datasets and 3D datasets, and the 4 different colors represent different tasks

Key Features

###

Diverse: It covers diverse data modalities, dataset scales (from 100 to 100,000), and tasks (binary/multi-class, multi-label, and ordinal regression). It is as diverse as the VDD and MSD to fairly evaluate the generalizable performance of machine learning algorithms in different settings, but both 2D and 3D biomedical images are provided.

Standardized: Each sub-dataset is pre-processed into the same format, which requires no background knowledge for users. As an MNIST-like dataset collection to perform classification tasks on small images, it primarily focuses on the machine learning part rather than the end-to-end system. Furthermore, we provide standard train-validation-test splits for all datasets in MedMNIST, therefore algorithms could be easily compared.

User-Friendly: The small size of 28×28 (2D) or 28×28×28 (3D) is lightweight and ideal for evaluating machine learning algorithms. We also offer a larger-size version, MedMNIST+: 64x64 (2D), 128x128 (2D), 224x224 (2D), and 64x64x64 (3D). Serving as a complement to the 28-size MedMNIST, this could be a standardized resource for developing medical foundation models. All these datasets are accessible via the same API.

Educational: As an interdisciplinary research area, biomedical image analysis is difficult to hand on for researchers from other communities, as it requires background knowledge from computer vision, machine learning, biomedical imaging, and clinical science. Our data with the Creative Commons (CC) License is easy to use for educational purposes.

Refer to the paper to learn more about data : https://www.nature.com/articles/s41597-022-01721-8

Starter Code: download more data and training

Github Page: https://github.com/MedMNIST/MedMNIST

My Kaggle Starter Notebook: https://www.kaggle.com/code/arashnic/medmnist-download-and-use-data?scriptVersionId=161421937

Acknowledgements

Jiancheng Yang,Rui Shi,Donglai Wei,Zequan Liu,Lin Zhao,Bilian Ke,Hanspeter Pfister,Bingbing Ni Shanghai Jiao Tong University, Shanghai, China, Boston College, Chestnut Hill, MA RWTH Aachen University, Aachen, Germany, Fudan Institute of Metabolic Diseases, Zhongshan Hospital, Fudan University, Shanghai, China, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China, Harvard University, Cambridge, MA

License and Citation

The code is under Apache-2.0 License.

The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)...
MNIST dataset for Outliers Detection - [ MNIST4OD ]
figshare.com
application/gzip
Updated May 17, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Giovanni Stilo; Bardh Prenkaj (2024). MNIST dataset for Outliers Detection - [ MNIST4OD ] [Dataset]. http://doi.org/10.6084/m9.figshare.9954986.v2
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.9954986.v2
Dataset updated
May 17, 2024
Dataset provided by
Figsharehttp://figshare.com/
Authors
Giovanni Stilo; Bardh Prenkaj
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Here we present a dataset, MNIST4OD, of large size (number of dimensions and number of instances) suitable for Outliers Detection task.The dataset is based on the famous MNIST dataset (http://yann.lecun.com/exdb/mnist/).We build MNIST4OD in the following way:To distinguish between outliers and inliers, we choose the images belonging to a digit as inliers (e.g. digit 1) and we sample with uniform probability on the remaining images as outliers such as their number is equal to 10% of that of inliers. We repeat this dataset generation process for all digits. For implementation simplicity we then flatten the images (28 X 28) into vectors.Each file MNIST_x.csv.gz contains the corresponding dataset where the inlier class is equal to x.The data contains one instance (vector) in each line where the last column represents the outlier label (yes/no) of the data point. The data contains also a column which indicates the original image class (0-9).See the following numbers for a complete list of the statistics of each datasets ( Name | Instances | Dimensions | Number of Outliers in % ):MNIST_0 | 7594 | 784 | 10MNIST_1 | 8665 | 784 | 10MNIST_2 | 7689 | 784 | 10MNIST_3 | 7856 | 784 | 10MNIST_4 | 7507 | 784 | 10MNIST_5 | 6945 | 784 | 10MNIST_6 | 7564 | 784 | 10MNIST_7 | 8023 | 784 | 10MNIST_8 | 7508 | 784 | 10MNIST_9 | 7654 | 784 | 10
h
svhn
huggingface.co
Updated Mar 28, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Genius Society (2025). svhn [Dataset]. https://huggingface.co/datasets/Genius-Society/svhn
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 28, 2025
Dataset authored and provided by
Genius Society
License
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset card for SVHN

The Street View House Numbers (SVHN) dataset is a real-world image dataset developed and designed for machine learning and object recognition algorithms, and is characterized by low data preprocessing and formatting requirements. Similar to MNIST, SVHN contains images of small cropped numbers, but in terms of labeled data, SVHN is an order of magnitude larger than MNIST, comprising over 600,000 digital images. Unlike MNIST, SVHN deals with a much more… See the full description on the dataset page: https://huggingface.co/datasets/Genius-Society/svhn.
a
The PatchCamelyon benchmark dataset (PCAM)
academictorrents.com
bittorrent
Updated Nov 13, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bas Veeling (2018). The PatchCamelyon benchmark dataset (PCAM) [Dataset]. https://academictorrents.com/details/1561a180b11d4b746273b5ce46772ad36f1229b6
Explore at:
bittorrent(8061211742)Available download formats
Dataset updated
Nov 13, 2018
Dataset authored and provided by
Bas Veeling
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
The PatchCamelyon benchmark is a new and challenging image classification dataset. It consists of 327.680 color images (96 x 96px) extracted from histopathologic scans of lymph node sections. Each image is annoted with a binary label indicating presence of metastatic tissue. PCam provides a new benchmark for machine learning models: bigger than CIFAR10, smaller than imagenet, trainable on a single GPU. ## Why PCam Fundamental machine learning advancements are predominantly evaluated on straight-forward natural-image classification datasets. Think MNIST, CIFAR, SVHN. Medical imaging is becoming one of the major applications of ML and we believe it deserves a spot on the list of go-to ML datasets. Both to challenge future work, and to steer developments into directions that are beneficial for this domain. We think PCam can play a role in this. It packs the clinically-relevant task of metastasis detection into a straight-forward binary image classification task, akin to CIFAR-10 and MNIST
d
Data from: Label-noise reduction with support vector machines
search.dataone.org
dataone.org
+1more
Updated Feb 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daly, Kendra (2025). Label-noise reduction with support vector machines [Dataset]. http://doi.org/10.7266/N71834DN
Explore at:
Unique identifier
https://doi.org/10.7266/N71834DN
Dataset updated
Feb 5, 2025
Dataset provided by
GRIIDC
Authors
Daly, Kendra
Description
This dataset reports data generated through the investigation of the problem of detection of label-noise in large pattern-recognition datasets. Specifically, results of label-noise reduction on two datasets are reported. The University of California, Irvine (UCI) Letter Recognition dataset and the Mixed National Institute of Standards and Technology (MNIST) Digit Recognition dataset was used to train an algorithm. The algorithm was then tested a Plankton dataset collected by the SIPPER (Shadow Imaging Particle Profiler and Evaluation Recorder) camera imaging system during trips to the site of the Deepwater Horizon Oil Spill. The experiment with the Plankton dataset represented a more practical application of data cleansing, because the label-noise naturally occurred in the data. Data presented in this database include the noise detected for four classes, total noise detected, and the percent accumulative noise detected for both the UCI and MNIST datasets. SIPPER data are not reported in this dataset. SIPPER data can be found in dataset R1.x130.000:0002 "Application of image processing and machine learning techniques to distinguish suspected oil droplets from plankton and other particles for the SIPPER imaging system". On a dataset that contained images of plankton with inadvertent noise, the new algorithm was able to detect all incorrect samples in the class of interest by reviewing only 5% of the data. Thus, the described approach helps to significantly reduce the effort needed to remove label-noise from data. Data published in: Felatyev, S., M. Shreve, K. Kramer, L. Hall, D. Goldgof, R. Kasturi, K. Daly, A. Remsen, H. Bunke. 2012. Label-Noise Reduction with Support Vector Machines. International Conference on Pattern Recognition (ICPR), November 2012, Tsukuba Science City, Japan.
Fruits-360 dataset
kaggle.com
paperswithcode.com
+1more
Updated Jun 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mihai Oltean (2025). Fruits-360 dataset [Dataset]. https://www.kaggle.com/datasets/moltean/fruits
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 7, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Mihai Oltean
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Fruits-360 dataset: A dataset of images containing fruits, vegetables, nuts and seeds

Version: 2025.06.07.0

Content

The following fruits, vegetables and nuts and are included: Apples (different varieties: Crimson Snow, Golden, Golden-Red, Granny Smith, Pink Lady, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Beans, Beetroot Red, Blackberry, Blueberry, Cabbage, Caju seed, Cactus fruit, Cantaloupe (2 varieties), Carambula, Carrot, Cauliflower, Cherimoya, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Corn (with husk), Cucumber (ripened, regular), Dates, Eggplant, Fig, Ginger Root, Goosberry, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kohlrabi, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango (Green, Red), Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine (Regular, Flat), Nut (Forest, Pecan), Onion (Red, White), Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Forelle, Kaiser, Monster, Red, Stone, Williams), Pepper (Red, Green, Orange, Yellow), Physalis (normal, with Husk), Pineapple (normal, Mini), Pistachio, Pitahaya Red, Plum (different varieties), Pomegranate, Pomelo Sweetie, Potato (Red, Sweet, White), Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red, Yellow, not ripened, Heart), Walnut, Watermelon, Zucchini (green and dark).

Branches

The dataset has 5 major branches:

-The 100x100 branch, where all images have 100x100 pixels. See _fruits-360_100x100_ folder.

-The original-size branch, where all images are at their original (captured) size. See _fruits-360_original-size_ folder.

-The meta branch, which contains additional information about the objects in the Fruits-360 dataset. See _fruits-360_dataset_meta_ folder.

-The multi branch, which contains images with multiple fruits, vegetables, nuts and seeds. These images are not labeled. See _fruits-360_multi_ folder.

-The _3_body_problem_ branch where the Training and Test folders contain different (varieties of) the 3 fruits and vegetables (Apples, Cherries and Tomatoes). See _fruits-360_3-body-problem_ folder.

How to cite

Mihai Oltean, Fruits-360 dataset, 2017-

Dataset properties

For the 100x100 branch

Total number of images: 138704.

Training set size: 103993 images.

Test set size: 34711 images.

Number of classes: 206 (fruits, vegetables, nuts and seeds).

Image size: 100x100 pixels.

For the original-size branch

Total number of images: 58363.

Training set size: 29222 images.

Validation set size: 14614 images

Test set size: 14527 images.

Number of classes: 90 (fruits, vegetables, nuts and seeds).

Image size: various (original, captured, size) pixels.

For the 3-body-problem branch

Total number of images: 47033.

Training set size: 34800 images.

Test set size: 12233 images.

Number of classes: 3 (Apples, Cherries, Tomatoes).

Number of varieties: Apples = 29; Cherries = 12; Tomatoes = 19.

Image size: 100x100 pixels.

For the meta branch

Number of classes: 26 (fruits, vegetables, nuts and seeds).

For the multi branch

Number of images: 150.

Filename format:

For the 100x100 branch

image_index_100.jpg (e.g. 31_100.jpg) or

r_image_index_100.jpg (e.g. r_31_100.jpg) or

r?_image_index_100.jpg (e.g. r2_31_100.jpg)

where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis. "100" comes from image size (100x100 pixels).

Different varieties of the same fruit (apple, for instance) are stored as belonging to different classes.

For the original-size branch

r?_image_index.jpg (e.g. r2_31.jpg)

where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis.

The name of the image files in the new version does NOT contain the "_100" suffix anymore. This will help you to make the distinction between the original-size branch and the 100x100 branch.

For the multi branch

The file's name is the concatenation of the names of the fruits inside that picture.

Alternate download

The Fruits-360 dataset can be downloaded from:

Kaggle https://www.kaggle.com/moltean/fruits

GitHub https://github.com/fruits-360

How fruits were filmed

Fruits and vegetables were planted in the shaft of a low-speed motor (3 rpm) and a short movie of 20 seconds was recorded.

A Logitech C920 camera was used for filming the fruits. This is one of the best webcams available.

Behind the fruits, we placed a white sheet of paper as a background.

Here i...
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Möbius (2024). MedMNIST: Standardized Biomedical Images [Dataset]. https://www.kaggle.com/datasets/arashnic/standardized-biomedical-images-medmnist

MedMNIST: Standardized Biomedical Images

A large-scale MNIST-like collection

Explore at:

CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Feb 2, 2024

Dataset provided by

Kagglehttp://kaggle.com/

Authors

Möbius

License

Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

"'https://www.nature.com/articles/s41597-022-01721-8'">MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification https://www.nature.com/articles/s41597-022-01721-8

A large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision and machine learning.Providers benchmark several baseline methods on MedMNIST, including 2D / 3D neural networks and open-source / commercial AutoML tools.

MedMNIST Landscape :

https://storage.googleapis.com/kagglesdsdata/datasets/4390240/7539891/medmnistlandscape.png?X-Goog-Algorithm=GOOG4-RSA-SHA256&X-Goog-Credential=databundle-worker-v2%40kaggle-161607.iam.gserviceaccount.com%2F20240202%2Fauto%2Fstorage%2Fgoog4_request&X-Goog-Date=20240202T132716Z&X-Goog-Expires=345600&X-Goog-SignedHeaders=host&X-Goog-Signature=479c8d80a4c6f28bf9532fea037969292a4f963662b022484a79c139297cfa1afc82db06c9b5275d6c52d5555d7fb178701d3ad7ebb036c9cf3d076fcf41014c05a6230d293f39dd320303efaa81d18e9c5888c23fe19884148a3be618e3e7c041383119a4c5547f0fa6cb1ddb5f3bf4dc1330a6fd5c693f32280e90fde5735e02052f2fc5b0003085d9ea70039903439814154dc39980dce3bace422d0672a69c4f4cefbe6bcebaacd2c5192a60172143667b14ba050a8383d0a7c6c639526c820ae58bbad99b4afc84e97bc87b2da6002d6faf181d4138e2a33961514370578892409b1e1a662424051573a3392273b00132a4f39becff877dff16a594848f" alt="medmnistlandscape">

About MedMNIST Landscape figure: The horizontal axis denotes the base-10 logarithm of the dataset scale, and the vertical axis denotes base-10 logarithm of imaging resolution. The upward and downward triangles are used to distinguish between 2D datasets and 3D datasets, and the 4 different colors represent different tasks

Key Features

###

Diverse: It covers diverse data modalities, dataset scales (from 100 to 100,000), and tasks (binary/multi-class, multi-label, and ordinal regression). It is as diverse as the VDD and MSD to fairly evaluate the generalizable performance of machine learning algorithms in different settings, but both 2D and 3D biomedical images are provided.

Standardized: Each sub-dataset is pre-processed into the same format, which requires no background knowledge for users. As an MNIST-like dataset collection to perform classification tasks on small images, it primarily focuses on the machine learning part rather than the end-to-end system. Furthermore, we provide standard train-validation-test splits for all datasets in MedMNIST, therefore algorithms could be easily compared.

User-Friendly: The small size of 28×28 (2D) or 28×28×28 (3D) is lightweight and ideal for evaluating machine learning algorithms. We also offer a larger-size version, MedMNIST+: 64x64 (2D), 128x128 (2D), 224x224 (2D), and 64x64x64 (3D). Serving as a complement to the 28-size MedMNIST, this could be a standardized resource for developing medical foundation models. All these datasets are accessible via the same API.

Educational: As an interdisciplinary research area, biomedical image analysis is difficult to hand on for researchers from other communities, as it requires background knowledge from computer vision, machine learning, biomedical imaging, and clinical science. Our data with the Creative Commons (CC) License is easy to use for educational purposes.

Refer to the paper to learn more about data : https://www.nature.com/articles/s41597-022-01721-8

Starter Code: download more data and training

Github Page: https://github.com/MedMNIST/MedMNIST

My Kaggle Starter Notebook: https://www.kaggle.com/code/arashnic/medmnist-download-and-use-data?scriptVersionId=161421937

Acknowledgements

Jiancheng Yang,Rui Shi,Donglai Wei,Zequan Liu,Lin Zhao,Bilian Ke,Hanspeter Pfister,Bingbing Ni Shanghai Jiao Tong University, Shanghai, China, Boston College, Chestnut Hill, MA RWTH Aachen University, Aachen, Germany, Fudan Institute of Metabolic Diseases, Zhongshan Hospital, Fudan University, Shanghai, China, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China, Harvard University, Cambridge, MA

License and Citation

The code is under Apache-2.0 License.

The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)...

Clear search

Close search

Google apps

Main menu

MedMNIST: Standardized Biomedical Images

Key Features

Starter Code: download more data and training

Acknowledgements

License and Citation

MNIST dataset for Outliers Detection - [ MNIST4OD ]

svhn

The PatchCamelyon benchmark dataset (PCAM)

Data from: Label-noise reduction with support vector machines

Fruits-360 dataset

Fruits-360 dataset: A dataset of images containing fruits, vegetables, nuts and seeds

Version: 2025.06.07.0

Content

Branches

How to cite

Dataset properties

For the 100x100 branch

For the original-size branch

For the 3-body-problem branch

For the meta branch

For the multi branch

Filename format:

For the 100x100 branch

For the original-size branch

For the multi branch

Alternate download

How fruits were filmed

MedMNIST: Standardized Biomedical Images

A large-scale MNIST-like collection

Key Features

Starter Code: download more data and training

Acknowledgements

License and Citation