13 datasets found

mnist
tensorflow.org
universe.roboflow.com
+4more
Updated Jun 1, 2024
(2024). mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/mnist
Dataset updated
Jun 1, 2024
Description
The MNIST database of handwritten digits.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('mnist', split='train')
for ex in ds.take(4):
  print(ex)

See the guide for more information on tensorflow_datasets.

MNIST
huggingface.co
Updated Mar 2, 2023
Graph Datasets (2023). MNIST [Dataset]. https://huggingface.co/datasets/graphs-datasets/MNIST
Dataset updated
Mar 2, 2023
Dataset authored and provided by
Graph Datasets
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Dataset Card for MNIST

Dataset Summary

The MNIST dataset consists of 55000 images in 10 classes, represented as graphs. It comes from a computer vision dataset.

Supported Tasks and Leaderboards

MNIST should be used for multiclass graph classification.

External Use PyGeometric

To load in PyGeometric, do the following:

from datasets import load_dataset
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader
…

See the full description on the dataset page: https://huggingface.co/datasets/graphs-datasets/MNIST.

MNIST as PNG

kaggle.com

zip

Updated Jul 17, 2024

Ben Gorman (2024). MNIST as PNG [Dataset]. https://www.kaggle.com/datasets/ben519/mnist-as-png

zip (32971223 bytes)

Dataset updated

Jul 17, 2024

Authors

Ben Gorman

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

[MNIST](https://en.wikipedia.org/wiki/MNIST_database) data in PNG format, derived directly from MNIST in CSV.

The data contains 60,000 labelled train samples and 10,000 labelled test samples. Each sample is a 28x28 grayscale PNG image.

Unzipped directory structure 👇

test/
 0/
  test_image_3.png
  test_image_10.png
  test_image_13.png
  ...
 1/
  test_image_2.png
  test_image_5.png
  test_image_14.png
  ...
 ...
 9/

train/
 0/
  train_image_1.png
  train_image_21.png
  train_image_34.png
  ...
 1/
 ...
 9/
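
Because the class label is encoded in the folder name, the layout above can be turned into (file, label) pairs without opening any image. The following is a minimal sketch using only the Python standard library; the helper name list_labelled_images and the placeholder files in the demo are illustrative assumptions, not part of the dataset.

```python
from pathlib import Path
import tempfile


def list_labelled_images(split_dir):
    """Return (path, label) pairs for a folder laid out as <split_dir>/<digit>/<name>.png."""
    pairs = []
    for class_dir in sorted(Path(split_dir).iterdir()):
        # Only folders whose name is a digit are treated as classes.
        if class_dir.is_dir() and class_dir.name.isdigit():
            label = int(class_dir.name)
            for png in sorted(class_dir.glob("*.png")):
                pairs.append((png, label))
    return pairs


# Demo on a throwaway directory holding empty placeholder files (not real images).
with tempfile.TemporaryDirectory() as tmp:
    for digit, name in [(0, "test_image_3.png"), (1, "test_image_2.png"), (1, "test_image_5.png")]:
        folder = Path(tmp) / "test" / str(digit)
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).touch()
    demo = [(path.name, label) for path, label in list_labelled_images(Path(tmp) / "test")]

print(demo)
```

On the real download, the same helper would be pointed at the unzipped test/ or train/ folder.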

Data collection script

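import os

import pandas as pd
from PIL import Image

mnist_train = pd.read_csv("mnist-csv/mnist_train.csv")
mnist_test = pd.read_csv("mnist-csv/mnist_test.csv")

for i in range(10):

  # Make sure the output folders for digit i exist before saving
  os.makedirs(f"mnist-png/train/{i}", exist_ok=True)
  os.makedirs(f"mnist-png/test/{i}", exist_ok=True)

  # Convert the training data to png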
  train_i = mnist_train.loc[mnist_train.label == i]
  for index, row in train_i.iterrows():
    X = row[1:].to_numpy().reshape(28, 28)
    filepath = (
      f"mnist-png/train/{i}/train_image_{index}.png"
    )
    img = Image.fromarray(X.astype("uint8"), mode="L")
    img.save(filepath)

  # Convert the test data to png
  test_i = mnist_test.loc[mnist_test.label == i]
  for index, row in test_i.iterrows():
    X = row[1:].to_numpy().reshape(28, 28)
    filepath = f"mnist-png/test/{i}/test_image_{index}.png"
    img = Image.fromarray(X.astype("uint8"), mode="L")
    img.save(filepath)

fashion_mnist
tensorflow.org
opendatalab.com
+3more
Updated Jun 1, 2024
(2024). fashion_mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/fashion_mnist
Dataset updated
Jun 1, 2024
Description
Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('fashion_mnist', split='train')
for ex in ds.take(4):
  print(ex)

See the guide for more information on tensorflow_datasets.

moving_mnist
tensorflow.org
opendatalab.com
Updated Nov 23, 2022
(2022). moving_mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/moving_mnist
Dataset updated
Nov 23, 2022
Description
Moving variant of MNIST database of handwritten digits. This is the data used by the authors for reporting model performance. See tfds.video.moving_mnist.image_as_moving_sequence for generating training/validation data from the MNIST dataset.

To use this dataset:

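import tensorflow_datasets as tfds

# moving_mnist provides only a 'test' split; use image_as_moving_sequence (see above) for training data
ds = tfds.load('moving_mnist', split='test')
for ex in ds.take(4):
  print(ex)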

See the guide for more information on tensorflow_datasets.
The "mnist" dataset in csv format
kaggle.com
zip
Updated Apr 6, 2023
Aditya Anil Kashyap (2023). The "mnist" dataset in csv format [Dataset]. https://www.kaggle.com/datasets/adityaanilkashyap/the-mnist-dataset-in-csv-format/code
zip (15966440 bytes)
Dataset updated
Apr 6, 2023
Authors
Aditya Anil Kashyap
Description
Contents of the dataset

The dataset contains 70000 images (train + test), each with 784 pixels, and 70000 labels. The dimensions of the csv file are 70000 x 785, with the first column being the target variable.

How to read the dataset

import numpy as np
import pandas as pd

df = pd.read_csv("mnist.csv", header=None)
y = np.array(df.iloc[:, 0])  # The 0th column is the target variable; y.shape yields (70000,)
X = np.array(df.iloc[:, 1:])  # The rest of the columns are the input data (pixel values)
side = int(np.sqrt(X.shape[1]))  # 784 pixel columns give a side length of 28
X = X.reshape((X.shape[0], side, side))  # X.shape yields (70000, 28, 28)
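
For readers who want to see the 785-column layout without pandas, here is a rough standard-library sketch that splits one row into its label and a 28 x 28 grid. The function name parse_row and the synthetic row are assumptions made for illustration; the real file would be read line by line in the same way.

```python
import csv
import io


def parse_row(row, side=28):
    """Split one CSV row into (label, image), where image is a side x side list of ints."""
    values = [int(v) for v in row]
    label, pixels = values[0], values[1:]
    if len(pixels) != side * side:
        raise ValueError(f"expected {side * side} pixel values, got {len(pixels)}")
    # Cut the flat pixel list into consecutive rows of length `side`.
    image = [pixels[r * side:(r + 1) * side] for r in range(side)]
    return label, image


# Synthetic stand-in for one line of the file: label 7 followed by 784 pixel values.
fake_line = ",".join(["7"] + [str(i % 256) for i in range(784)])
label, image = next(parse_row(row) for row in csv.reader(io.StringIO(fake_line)))
print(label, len(image), len(image[0]))
```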
QMNIST - The Extended MNIST Dataset (120k images)
kaggle.com
zip
Updated Jul 24, 2021
fedesoriano (2021). QMNIST - The Extended MNIST Dataset (120k images) [Dataset]. https://www.kaggle.com/fedesoriano/qmnist-the-extended-mnist-dataset-120k-images
zip (19858130 bytes)
Dataset updated
Jul 24, 2021
Authors
fedesoriano
Description
Context

The exact preprocessing steps used to construct the MNIST dataset have long been lost. This leaves us with no reliable way to associate its characters with the ID of the writer and little hope to recover the full MNIST testing set that had 60K images but was never released. The official MNIST testing set only contains 10K randomly sampled images and is often considered too small to provide meaningful confidence intervals.
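
To put the test-set size concern in numbers, the sketch below compares the width of a confidence interval on a measured error rate for 10K and 60K test images. It assumes the normal approximation to the binomial, a hypothetical error rate of 1%, and z = 1.96 (roughly 95% confidence); none of these choices come from the QMNIST authors.

```python
import math


def ci_half_width(error_rate, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(error_rate * (1.0 - error_rate) / n)


# Hypothetical 1% error rate measured on test sets of two different sizes.
for n in (10_000, 60_000):
    half_width = ci_half_width(0.01, n)
    print(f"n={n}: 1% +/- {100 * half_width:.3f} percentage points")
```

Under these assumptions the interval is about ±0.2 percentage points at 10K images and about ±0.08 at 60K, so two models whose error rates differ by a tenth of a point are hard to separate on the smaller set.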

The QMNIST dataset was generated from the original data found in the NIST Special Database 19 with the goal to match the MNIST preprocessing as closely as possible.

Content

The simplest way to use the QMNIST extended dataset is to download the unique file below (MNIST-120k). This pickle file has the same format as the standard MNIST data files but contains 120000 examples.

You can use the following lines of code to load the data:

def unpickle(file):
  import pickle
  with open(file, 'rb') as fo:
    data = pickle.load(fo, encoding='bytes')
  return data

qmnist = unpickle("MNIST-120k")

The data comes in a dictionary format; you can get the data and the labels separately by extracting the content from the dictionary:

data = qmnist['data']
labels = qmnist['labels']

Source

The original QMNIST dataset was uploaded by Chhavi Yadav and Léon Bottou. Citation:

Yadav, C. and Bottou, L., “Cold Case: The Lost MNIST Digits”, arXiv e-prints, 2019.

Link to the original paper: https://arxiv.org/pdf/1905.10498.pdf Link to the GitHub repository: https://github.com/facebookresearch/qmnist

My contribution was to collect all the images and labels into the same file and convert it into a pickle file so it is easier to load. Please consider mentioning the author if you use this dataset instead of the original version.
kmnist
tensorflow.org
datasets.activeloop.ai
Updated Jun 1, 2024
(2024). kmnist [Dataset]. https://www.tensorflow.org/datasets/catalog/kmnist
Dataset updated
Jun 1, 2024
Description
Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset (28x28 grayscale, 70,000 images), provided in the original MNIST format as well as a NumPy format. Since MNIST restricts us to 10 classes, we chose one character to represent each of the 10 rows of Hiragana when creating Kuzushiji-MNIST.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('kmnist', split='train')
for ex in ds.take(4):
  print(ex)

See the guide for more information on tensorflow_datasets.

[MedMNIST+] 18x Standardized Datasets for 2D and 3D Biomedical Image...
data.niaid.nih.gov
Updated Nov 28, 2024
Jiancheng Yang; Rui Shi; Donglai Wei; Zequan Liu; Lin Zhao; Bilian Ke; Hanspeter Pfister; Bingbing Ni (2024). [MedMNIST+] 18x Standardized Datasets for 2D and 3D Biomedical Image Classification with Multiple Size Options: 28 (MNIST-Like), 64, 128, and 224 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5208229
Dataset updated
Nov 28, 2024
Dataset provided by
Shanghai Jiao Tong University
Zhongshan Hospital Affiliated to Fudan University
Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine
RWTH Aachen University
Harvard University
Authors
Jiancheng Yang; Rui Shi; Donglai Wei; Zequan Liu; Lin Zhao; Bilian Ke; Hanspeter Pfister; Bingbing Ni
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Code [GitHub] | Publication [Nature Scientific Data'23 / ISBI'21] | Preprint [arXiv]

Abstract

We introduce MedMNIST, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision and machine learning. We benchmark several baseline methods on MedMNIST, including 2D / 3D neural networks and open-source / commercial AutoML tools. The data and code are publicly available at https://medmnist.com/.

Disclaimer: The only official distribution link for the MedMNIST dataset is Zenodo. We kindly request users to refer to this original dataset link for accurate and up-to-date data.

Update: We are thrilled to release MedMNIST+ with larger sizes: 64x64, 128x128, and 224x224 for 2D, and 64x64x64 for 3D. As a complement to the previous 28-size MedMNIST, the large-size version could serve as a standardized benchmark for medical foundation models. Install the latest API to try it out!
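
As a rough sense of scale for the larger sizes, the following sketch counts pixel or voxel positions per image relative to the 28-size version. It counts spatial positions only and ignores channels and storage format, which is a simplifying assumption; the helper name values_per_image is illustrative and not part of the medmnist API.

```python
def values_per_image(size, dims=2):
    """Number of pixel (dims=2) or voxel (dims=3) positions in an image with equal sides."""
    return size ** dims


base_2d = values_per_image(28, dims=2)
for size in (64, 128, 224):
    n = values_per_image(size, dims=2)
    print(f"2D {size}x{size}: {n} positions, {n / base_2d:.1f}x the 28x28 version")

base_3d = values_per_image(28, dims=3)
n_3d = values_per_image(64, dims=3)
print(f"3D 64x64x64: {n_3d} positions, {n_3d / base_3d:.1f}x the 28x28x28 version")
```

A 224x224 image holds 64 times as many positions as a 28x28 one, which is worth keeping in mind when budgeting download size and memory.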

Python Usage

We recommend our official code to download, parse and use the MedMNIST dataset:

% pip install medmnist
% python

To use the standard 28-size (MNIST-like) version utilizing the downloaded files:

from medmnist import PathMNIST

train_dataset = PathMNIST(split="train")

To enable automatic downloading by setting download=True:

from medmnist import NoduleMNIST3D

val_dataset = NoduleMNIST3D(split="val", download=True)

Alternatively, you can access MedMNIST+ with larger image sizes by specifying the size parameter:

from medmnist import ChestMNIST

test_dataset = ChestMNIST(split="test", download=True, size=224)

Citation

If you find this project useful, please cite both the v1 and v2 papers as:

Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni. "MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification." Scientific Data, 2023.

Jiancheng Yang, Rui Shi, Bingbing Ni. "MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis". IEEE 18th International Symposium on Biomedical Imaging (ISBI), 2021.

or using bibtex:

@article{medmnistv2,
  title={MedMNIST v2-A large-scale lightweight benchmark for 2D and 3D biomedical image classification},
  author={Yang, Jiancheng and Shi, Rui and Wei, Donglai and Liu, Zequan and Zhao, Lin and Ke, Bilian and Pfister, Hanspeter and Ni, Bingbing},
  journal={Scientific Data},
  volume={10},
  number={1},
  pages={41},
  year={2023},
  publisher={Nature Publishing Group UK London}
}

@inproceedings{medmnistv1,
  title={MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis},
  author={Yang, Jiancheng and Shi, Rui and Ni, Bingbing},
  booktitle={IEEE 18th International Symposium on Biomedical Imaging (ISBI)},
  pages={191--195},
  year={2021}
}

Please also cite the corresponding paper(s) of source data if you use any subset of MedMNIST as per the description on the project website.

License

The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0), except DermaMNIST under Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0).

The code is under Apache-2.0 License.

Changelog

v3.0 (this repository): Released MedMNIST+ featuring larger sizes: 64x64, 128x128, and 224x224 for 2D, and 64x64x64 for 3D.

v2.2: Removed a small number of mistakenly included blank samples in OrganAMNIST, OrganCMNIST, OrganSMNIST, OrganMNIST3D, and VesselMNIST3D.

v2.1: Addressed an issue in the NoduleMNIST3D file (i.e., nodulemnist3d.npz). Further details can be found in this issue.

v2.0: Launched the initial repository of MedMNIST v2, adding 6 datasets for 3D and 2 for 2D.

v1.0: Established the initial repository (in a separate repository) of MedMNIST v1, featuring 10 datasets for 2D.

Note: This dataset is NOT intended for clinical use.
emnist
tensorflow.org
datasets.activeloop.ai
Updated Jun 1, 2024
(2024). emnist [Dataset]. https://www.tensorflow.org/datasets/catalog/emnist
Dataset updated
Jun 1, 2024
Description
The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 and converted to a 28x28 pixel image format and dataset structure that directly matches the MNIST dataset.

Note: Like the original EMNIST data, images provided here are inverted horizontally and rotated 90 degrees anti-clockwise. You can use tf.transpose within ds.map to convert the images to a human-friendlier format.
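
The reason a plain transpose is enough can be checked on a tiny grid: a horizontal flip followed by a 90-degree anti-clockwise rotation is the same operation as a transpose, and a transpose undoes itself. The sketch below shows this with nested lists in plain Python; it assumes the stored orientation is exactly "flip, then rotate" as described in the note, and the helper names are illustrative.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate_90_anticlockwise(img):
    """Transposing and then reversing the row order rotates a grid 90 degrees anti-clockwise."""
    return [list(col) for col in zip(*img)][::-1]


def transpose(img):
    """Swap rows and columns."""
    return [list(col) for col in zip(*img)]


upright = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]

# Orientation as described in the note: flipped horizontally, then rotated.
stored = rotate_90_anticlockwise(flip_horizontal(upright))
restored = transpose(stored)

print(stored)
print(restored == upright)
```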

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('emnist', split='train')
for ex in ds.take(4):
  print(ex)

See the guide for more information on tensorflow_datasets.

Gisette Dataset (MNIST digits 4 and 9)
kaggle.com
zip
Updated Aug 7, 2022
fedesoriano (2022). Gisette Dataset (MNIST digits 4 and 9) [Dataset]. https://www.kaggle.com/fedesoriano/gisette-dataset-mnist-digits-4-and-9
zip (28070012 bytes)
Dataset updated
Aug 7, 2022
Authors
fedesoriano
Description
Context

GISETTE is a handwritten digit recognition problem. The problem is to separate the highly confusible digits '4' and '9'. This dataset is one of five datasets of the NIPS 2003 feature selection challenge.

The digits have been size-normalized and centered in a fixed-size image of dimension 28x28. The original data were modified for the purpose of the feature selection challenge. In particular, pixels were sampled at random in the middle top part of the feature containing the information necessary to disambiguate 4 from 9, and higher-order features were created as products of these pixels to plunge the problem into a higher-dimensional feature space.

Content

The dataset consists of three sets: training, validation, and testing, with 6000, 1000, and 6500 observations respectively. The dataset includes a total of 5000 features, 2500 of them with no predictive power. The order of the features and patterns was randomized. The task is to build a machine learning algorithm that is also capable of selecting the appropriate features.

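You can use the following lines of code to load the data:

# pickle5 backports pickle protocol 5 to Python 3.5-3.7 (install it with "pip3 install pickle5").
# On Python 3.8 or newer the standard pickle module already supports protocol 5.
try:
  import pickle5 as pickle
except ImportError:
  import pickle

def unpickle(file):
  with open(file, 'rb') as fo:
    data = pickle.load(fo)
  return data

gisette_data = unpickle("gisette.pickle")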

The data comes in a dictionary format; you can get the data and the labels for each set (training, validation, testing) separately by extracting the content from the dictionary (there are no testing labels):

train_data = gisette_data['training']['data']
train_labels = gisette_data['training']['labels']

Note: label = 1 indicates the digit '4', and label = -1 indicates the digit '9'.
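
If the digit itself is more convenient than the +1/-1 encoding, the labels can be translated with a small lookup. This is a minimal sketch; the names LABEL_TO_DIGIT and labels_to_digits and the example label list are hypothetical and not part of the dataset.

```python
# Mapping taken from the note above: 1 means the digit 4, -1 means the digit 9.
LABEL_TO_DIGIT = {1: 4, -1: 9}


def labels_to_digits(labels):
    """Translate +1/-1 labels into the digits they stand for."""
    # int() lets the lookup also accept values such as 1.0 or -1.0.
    return [LABEL_TO_DIGIT[int(label)] for label in labels]


example_labels = [1, -1, -1, 1]  # hypothetical labels, not taken from the dataset
print(labels_to_digits(example_labels))
```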

Source

a. Original owners: The data set was constructed from the MNIST data that is made available by Yann LeCun and Corinna Cortes at http://yann.lecun.com/exdb/mnist/.

b. Donor of database: This version of the database was prepared for the NIPS 2003 variable and feature selection benchmark by Isabelle Guyon, 955 Creston Road, Berkeley, CA 94708, USA (isabelle@clopinet.com).

My contribution was to collect all the images and labels into the same file and convert it into a pickle file so it is easier to load. Please consider mentioning the author if you use this dataset instead of the original version.
MIEDT dataset
kaggle.com
Updated Jan 12, 2025
Share
机关鸢鸟 (2025). MIEDT dataset [Dataset]. https://www.kaggle.com/datasets/lidang78/miedt-dataset
Dataset updated
Jan 12, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
机关鸢鸟
Description
Dataset Overview

This dataset is organized around the edge detection task. It provides images and corresponding edge annotations for related research and applications, and can be used to test edge detection algorithms. To evaluate the performance of edge detection methods comprehensively, we created the Medical Image Edge Detection Test (MIEDT) dataset. MIEDT contains 100 medical images, randomly selected from three publicly available datasets: Head CT-hemorrhage, Coronary Artery Diseases DataSet, and Skin Cancer MNIST: HAM10000.

Data Set Structure

Original image: This folder stores the original image data. It contains 15 head CT images in PNG format with varying resolutions, 25 coronary heart disease images in JPG format at a resolution of 1024 x 1024, and 60 skin images in JPG format at a resolution of 600 x 450. The images vary in imaging and contrast, providing diverse input data for edge detection algorithms.

Ground truth: This folder holds the edge detection annotation images corresponding to the images in the "Original image" folder. They are in PNG format. White pixels represent edges and black pixels represent non-edge areas. This annotation information outlines the object contours and edge features in the original images.

Usage Instructions

Users working in Python can read the image data with the cv2 (OpenCV) library. The sample code is as follows:

import cv2

original_image = cv2.imread('Original image/IMG-001.png')  # Read original image
ground_truth_image = cv2.imread('Ground truth/GT-001.png', cv2.IMREAD_GRAYSCALE)  # Read the corresponding Ground Truth image

When training a model with a deep learning framework (such as TensorFlow or PyTorch), configure the dataset path in the framework's dataset loading class so that the model can correctly read and process each image and its annotation data.
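
Since the ground truth marks edges in white and everything else in black, a predicted edge map can be compared with it pixel by pixel. The sketch below computes precision, recall, and F1 on small nested lists in plain Python. The choice of these metrics, the function name edge_scores, and the assumption that white is stored as 255 are illustrative; they are not the evaluation protocol of the dataset authors.

```python
def edge_scores(predicted, ground_truth, edge_value=255):
    """Pixel-wise precision, recall, and F1 for two binary edge maps of equal shape."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(predicted, ground_truth):
        for p, g in zip(pred_row, gt_row):
            pred_edge, true_edge = p == edge_value, g == edge_value
            if pred_edge and true_edge:
                tp += 1  # edge found where the annotation has one
            elif pred_edge:
                fp += 1  # edge reported on a non-edge pixel
            elif true_edge:
                fn += 1  # annotated edge that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Tiny hypothetical maps: a vertical edge in the annotation, partly matched by the prediction.
ground_truth = [[0, 255, 0],
                [0, 255, 0],
                [0, 255, 0]]
predicted = [[0, 255, 0],
             [0, 255, 255],
             [0, 0, 0]]

precision, recall, f1 = edge_scores(predicted, ground_truth)
print(precision, recall, f1)
```

An image read with cv2.IMREAD_GRAYSCALE can be passed to the same function after calling .tolist() on the array.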

Data Sources and References

Data Sources: The original images are collected from the public image datasets Head CT-hemorrhage, Coronary Artery Diseases DataSet, and Skin Cancer MNIST: HAM10000 to ensure the quality and diversity of the images. If you use this dataset in academic research, please cite the following literature.

References: [1] Noel Codella, Veronica Rotemberg, Philipp Tschandl, M. Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, Harald Kittler, Allan Halpern: “Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)”, 2018; https://arxiv.org/abs/1902.03368

[2] Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 doi:10.1038/sdata.2018.161 (2018).

[3] Classification of Brain Hemorrhage Using Deep Learning from CT Scan Images - https://link.springer.com/chapter/10.1007/978-981-19-7528-8_15
cats_vs_dogs
tensorflow.org
universe.roboflow.com
+1more
Updated Dec 19, 2023
(2023). cats_vs_dogs [Dataset]. https://www.tensorflow.org/datasets/catalog/cats_vs_dogs
Dataset updated
Dec 19, 2023
Description
A large set of images of cats and dogs. There are 1738 corrupted images that are dropped.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('cats_vs_dogs', split='train')
for ex in ds.take(4):
  print(ex)

See the guide for more information on tensorflow_datasets.

89 scholarly articles cite the mnist dataset (https://www.tensorflow.org/datasets/catalog/mnist).
