The MNIST database of handwritten digits.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('mnist', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of the larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students), which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
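The centering step described above can be sketched in plain Python. This is only an illustration, not the original NIST/MNIST code: the function name center_by_mass is hypothetical, and rounding the shift to whole pixels is an assumption made here for simplicity.

```python
def center_by_mass(glyph, size=28):
    """Place a small grey-level glyph on a size x size canvas so that its
    intensity-weighted center of mass lands near the canvas center.

    Assumption: the shift is rounded to whole pixels; the original
    procedure may have handled sub-pixel offsets differently.
    """
    total = sum(sum(row) for row in glyph)
    if total == 0:
        raise ValueError("glyph has no ink")
    # Intensity-weighted mean row and column of the glyph.
    cy = sum(r * v for r, row in enumerate(glyph) for v in row) / total
    cx = sum(c * v for row in glyph for c, v in enumerate(row)) / total
    # Integer shift that moves the center of mass to the canvas center.
    center = (size - 1) / 2
    dy, dx = round(center - cy), round(center - cx)
    canvas = [[0] * size for _ in range(size)]
    for r, row in enumerate(glyph):
        for c, v in enumerate(row):
            rr, cc = r + dy, c + dx
            if 0 <= rr < size and 0 <= cc < size:
                canvas[rr][cc] = v
    return canvas
```

Passing a 20x20 glyph returns a 28x28 image; pixels that would fall outside the canvas after the shift are dropped.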
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
The MNIST dataset is a dataset of handwritten digits. It is a popular dataset for machine learning and artificial intelligence research. The dataset consists of 60,000 training images and 10,000 test images. Each image is a 28x28 pixel grayscale image of a handwritten digit. The digits are labeled from 0 to 9.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.
Yann LeCun, Courant Institute, NYU; Corinna Cortes, Google Labs, New York; Christopher J.C. Burges, Microsoft Research, Redmond
Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('fashion_mnist', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/fashion_mnist-3.0.1.png
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Card for MNIST
Dataset Summary
The MNIST dataset consists of 55000 images in 10 classes, represented as graphs. It comes from a computer vision dataset.
Supported Tasks and Leaderboards
MNIST should be used for multiclass graph classification.
External Use
PyGeometric
To load in PyGeometric, do the following:
from datasets import load_dataset
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader
… See the full description on the dataset page: https://huggingface.co/datasets/graphs-datasets/MNIST.
Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 (https://www.nist.gov/srd/nist-special-database-19) and converted to a 28x28 pixel image format and dataset structure that directly matches the MNIST dataset (http://yann.lecun.com/exdb/mnist/). Further information on the dataset contents and conversion process can be found in the paper available at https://arxiv.org/abs/1702.05373v2
The MNIST dataset has become a standard benchmark for learning, classification and computer vision systems. Contributing to its widespread adoption are the understandable and intuitive nature of the task, its relatively small size and storage requirements, and the accessibility and ease of use of the database itself. The MNIST database was derived from a larger dataset known as the NIST Special Database 19, which contains digits and uppercase and lowercase handwritten letters. This paper introduces a variant of the full NIST dataset, which we have called Extended MNIST (EMNIST), which follows the same conversion paradigm used to create the MNIST dataset. The result is a set of datasets that constitute more challenging classification tasks involving letters and digits, and that share the same image structure and parameters as the original MNIST task, allowing for direct compatibility with all existing classifiers and systems. Benchmark results are presented along with a validation of the conversion process through the comparison of the classification results on converted NIST digits and the MNIST digits.
The database is made available in original MNIST format and Matlab format.
The Moving MNIST dataset contains 10,000 video sequences, each consisting of 20 frames. In each video sequence, two digits move independently around the frame, which has a spatial resolution of 64×64 pixels. The digits frequently intersect with each other and bounce off the edges of the frame.
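The bouncing motion can be illustrated with a small sketch. The function names, the 28-pixel digit size and the reflection rule are assumptions chosen for illustration; this is not the code that generated the dataset, and it assumes each velocity component is smaller than the free space in the frame.

```python
def step(pos, vel, digit_size=28, frame_size=64):
    """Advance one digit by one frame, reflecting it at the frame borders.

    pos is the (x, y) top-left corner of the digit, vel the per-frame shift.
    Assumption: digits are digit_size pixels wide and tall.
    """
    limit = frame_size - digit_size  # largest valid top-left coordinate
    new_pos, new_vel = [], []
    for p, v in zip(pos, vel):
        p += v
        if p < 0:          # crossed the left/top edge: mirror back inside
            p, v = -p, -v
        elif p > limit:    # crossed the right/bottom edge: mirror back inside
            p, v = 2 * limit - p, -v
        new_pos.append(p)
        new_vel.append(v)
    return tuple(new_pos), tuple(new_vel)


def trajectory(pos, vel, n_frames=20):
    """Top-left positions of one digit over a sequence of n_frames frames."""
    points = [tuple(pos)]
    for _ in range(n_frames - 1):
        pos, vel = step(pos, vel)
        points.append(pos)
    return points
```

Running trajectory once per digit with independent start positions and velocities gives two paths that can cross each other, which matches the overlapping digits described above.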
The aim of this dataset is to provide a simple way to get started with 3D computer vision problems such as 3D shape recognition.
Accurate 3D point clouds can nowadays be acquired easily and cheaply from different sources.
However, there is a lack of large 3D datasets (you can find a good one here based on triangular meshes); it's especially hard to find datasets based on point clouds (which are the raw output of every 3D sensing device).
This dataset contains 3D point clouds generated from the original images of the MNIST dataset, to offer a familiar introduction to 3D for people used to working with 2D datasets (images).
In the 3D_from_2D notebook you can find the code used to generate the dataset.
You can use the code in the notebook to generate a bigger 3D dataset from the original.
The entire dataset is stored as 4096-D vectors obtained from the voxelization (x:16, y:16, z:16) of all the 3D point clouds.
In addition to the original point clouds, it contains randomly rotated copies with noise.
The full dataset is split into arrays:
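To make the 16 x 16 x 16 = 4096 relationship concrete, here is a minimal voxelization sketch. It is not the VoxelGrid class shipped with the dataset: the function name voxelize, the per-axis bounding box and the use of point counts as voxel values are all assumptions.

```python
def voxelize(points, n=16):
    """Return a flat list of n**3 point counts for a 3D point cloud.

    Assumptions: the grid spans the cloud's own bounding box on each axis,
    and each voxel stores the number of points that fall inside it.
    """
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    grid = [0] * (n ** 3)
    for p in points:
        idx = []
        for i in range(3):
            span = maxs[i] - mins[i]
            # Map the coordinate to a cell index in [0, n - 1].
            cell = int((p[i] - mins[i]) / span * n) if span > 0 else 0
            idx.append(min(cell, n - 1))
        x, y, z = idx
        grid[(x * n + y) * n + z] += 1  # row-major flattening of (x, y, z)
    return grid
```

With n=16 the returned list has 4096 entries, the same length as the vectors described above.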
Example Python code reading the full dataset:
import h5py

with h5py.File("../input/train_point_clouds.h5", "r") as hf:
    # [:] reads each HDF5 dataset fully into memory as a NumPy array.
    X_train = hf["X_train"][:]
    y_train = hf["y_train"][:]
    X_test = hf["X_test"][:]
    y_test = hf["y_test"][:]
5000 (train), and 1000 (test) 3D point clouds stored in HDF5 file format. The point clouds have zero mean and a maximum dimension range of 1.
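The normalization mentioned above (zero mean, maximum dimension range of 1) could look like the sketch below. The function name normalize is hypothetical, and reading "maximum dimension range" as the largest per-axis extent is an assumption.

```python
def normalize(points):
    """Shift a point cloud to zero mean and scale it so that its largest
    per-axis extent equals 1 (assumed reading of 'maximum dimension range')."""
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - mean[i] for i in range(3)] for p in points]
    # Largest extent over the three axes; one common scale keeps the shape.
    extent = max(
        max(p[i] for p in centered) - min(p[i] for p in centered)
        for i in range(3)
    )
    if extent == 0:
        return centered
    return [[c / extent for c in p] for p in centered]
```

Using a single scale factor for all three axes preserves the aspect ratio of the digit.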
Each file is divided into HDF5 groups.
Each group is named after its corresponding array index in the original MNIST dataset and contains:
x, y, z: coordinates of each 3D point in the point cloud.
nx, ny, nz: components of the unit normal associated with each point.
Example Python code reading 2 digits and storing some of the group content in tuples:
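import h5py

with h5py.File("../input/train_point_clouds.h5", "r") as hf:
    # Groups are keyed by the digit's index in the original MNIST arrays.
    a = hf["0"]
    b = hf["1"]
    # Each tuple holds (image, points, label) for one digit.
    digit_a = (a["img"][:], a["points"][:], a.attrs["label"])
    digit_b = (b["img"][:], b["points"][:], b.attrs["label"])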
Simple Python class that generates a grid of voxels from the 3D point cloud. Check kernel for use.
Module with functions to plot point clouds and voxel grids inside a Jupyter notebook. You have to run this locally because Kaggle's notebooks do not support rendering iframes. See the GitHub issue here.
Functions included:
array_to_color: converts a 1D array to RGB values to use as the color kwarg in plot_points()
plot_points(xyz, colors=None, size=0.1, axis=False)
plot_voxelgrid(v_grid, cmap="Oranges", axis=False)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset for magnetic particle imaging based on the MNIST dataset.
This dataset contains simulated MPI measurements along with ground truth phantoms selected from the MNIST database of handwritten digits (https://yann.lecun.com/exdb/mnist/). A state-of-the-art model-based system matrix is used to simulate the MPI measurements of the MNIST phantoms. These measurements are equipped with noise perturbations captured by the preclinical MPI system (Bruker, Ettlingen, Germany). The dataset can be utilized in its provided form, while additional data is included to offer flexibility for creating customized versions.
MPI-MNIST features four different system matrices, each available in three spatial resolutions. The provided data is generated using a specified system matrix at highest spatial resolution. Reconstruction operations can be performed by using any of the provided system matrices at a lower resolution. This setup allows for simulating reconstructions from either an exact or an inexact forward operator. To cover further operator deviation setups, we provide additional noise data for the application of pixelwise noise to the reconstruction system matrix.
For supporting the development of learning-based methods, a large amount of further noise samples, captured by the Bruker scanner, is provided.
For a detailed description of the dataset, see arxiv.org/abs/2501.05583.
The Python-based GitHub repository available at https://github.com/meiraiske/MPI-MNIST can be used for downloading the data from this website and preparing it for project use, which includes an integration into PyTorch or PyTorch Lightning modules.
File Structure
All data, except for the phantoms, is provided in the MDF file format. This format is specifically tailored to store MPI data and contains metadata corresponding to the experimental setup. The ground truth phantoms are provided as HDF5 files since they do not require any metadata.
SM: Contains twelve system matrices named SM_{physical model}_{resolution}.mdf. It covers four physical models given in three resolutions ('coarse', 'int' and 'fine'). The highest resolution ('fine') is used for data generation.
large_noise: Contains large_NoiseMeas.mdf with 390060 noise measurements. Each noise measurement has been averaged over ten empty scanner measurements. This can be used e.g. for learning-based methods.
For dataset in ['train', 'test']:
{dataset}_noise: Contains four noise matrices, where each noise measurement has been averaged over ten empty scanner measurements:
1. NoiseMeas_phantom_{dataset}.mdf: Additive measurement noise for simulated measurements.
2. NoiseMeas_phantom_bg_{dataset}.mdf: Unused noise reserved for background correction of 1.
3. NoiseMeas_SM_{dataset}.mdf: System matrix noise that can be applied to each pixel of the reconstruction system matrix.
4. NoiseMeas_SM_bg_{dataset}.mdf: Unused noise reserved for background correction of 3.
{dataset}_gt: Contains {dataset}_gt.hdf5 with flattened and preprocessed ground truth MNIST phantoms given in coarse resolution (15x17=255 pixels) with pixel values in [0, 10].
{dataset}_obs: Contains {dataset}_obs.mdf with noise-free simulated measurements (observations) of {dataset}_gt.hdf5 using the system matrix stored in SM_fluid_opt_fine.mdf.
{dataset}_obsnoisy: Contains {dataset}_obsnoisy.mdf with noisy simulated measurements, resulting from {dataset}_obs.mdf and {dataset}_phantom_noise.mdf.
In line with MNIST, each MDF/HDF5 file in {dataset}_gt, {dataset}_obs, {dataset}_obsnoisy for dataset in ['train', 'test'] contains 60000 samples for 'train' and 10000 samples for 'test'. The data can be manually reproduced in the intermediate resolution (45x51=2295 pixels) from the files in this dataset using the system matrices in intermediate ('int') resolution for reconstruction and upsampling the ground truth phantoms by 3 pixels per dimension. This case is also implemented in the GitHub repository.
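As a rough illustration of the upsampling step, the sketch below expands a flattened 15x17 phantom (255 values) to 45x51 (2295 values) by repeating every pixel three times along each dimension. The function name upsample, the row-major 15-rows-by-17-columns layout and the use of plain pixel repetition are assumptions; the repository may interpolate or order the axes differently.

```python
def upsample(flat, shape=(15, 17), factor=3):
    """Upsample a flattened 2D image by repeating each pixel `factor` times
    along both dimensions and return the result flattened again.

    Assumptions: row-major storage and nearest-neighbour (repeat) upsampling.
    """
    rows, cols = shape
    if len(flat) != rows * cols:
        raise ValueError("length of flat does not match shape")
    out = []
    for r in range(rows):
        row = flat[r * cols:(r + 1) * cols]
        # Repeat every pixel `factor` times along the row ...
        wide = [v for v in row for _ in range(factor)]
        # ... then emit the widened row `factor` times.
        out.extend(wide * factor)
    return out
```

For the default shape the output has 45 * 51 = 2295 entries, which matches the intermediate resolution quoted above.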
The PDF file MPI-MNIST_Metadata.pdf contains a list of meta information for each of the MDF files of this dataset.
https://choosealicense.com/licenses/gpl/
The MNIST-100 dataset is a variation of the original MNIST dataset, consisting of 100 handwritten numbers extracted from the MNIST dataset. Unlike the traditional MNIST dataset, which contains 60,000 training images of digits from 0 to 9, the MNIST-100 dataset focuses on 100 numbers.
Dataset Overview:
Dataset Name: MNIST-100
Total Number of Images: train: 60000, test: 1000
Classes: 100 (Numbers from 00 to 99)
Image Size: 28x56 pixels (grayscale)
Data Collection: The MNIST-100 dataset… See the full description on the dataset page: https://huggingface.co/datasets/marcin119a/mnist100.
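The 28x56 image size and the 00 to 99 label range suggest two 28x28 digits placed side by side, although the truncated description above does not confirm how the images were built. Under that assumption, a two-digit example could be assembled as follows (make_pair is a hypothetical helper, not part of the dataset):

```python
def make_pair(left, right, left_label, right_label):
    """Join two 28x28 digit images (lists of rows) into one 28x56 image.

    Assumption: the combined label is the two-digit number formed by the
    left and right digit labels, e.g. 4 and 2 give 42.
    """
    if len(left) != len(right):
        raise ValueError("images must have the same height")
    # Concatenate matching rows so the right digit sits next to the left one.
    image = [list(l_row) + list(r_row) for l_row, r_row in zip(left, right)]
    label = 10 * left_label + right_label
    return image, label
```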
https://creativecommons.org/publicdomain/zero/1.0/
adna, ba, bha, cha, chha, chhya, da, daa, dha, dhaa, ga, gha, gya, ha, ja, jha, ka, kha, kna, la, ma, motosaw, na, pa, patalosaw, petchiryakha, pha, ra, taamatar, tabala, tha, thaa, tra, waw, yaw, yna
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
EMNIST MNIST Dataset
Authors
Gregory Cohen, Saeed Afshar, Jonathan Tapson, Andre van Schaik
The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, Australia 2751. Email: g.cohen@westernsydney.edu.au
What is it?
The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 (NIST Special Database 19) and converted to a 28x28 pixel image format and dataset structure that directly… See the full description on the dataset page: https://huggingface.co/datasets/Royc30ne/emnist-mnist.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset containing images of handwritten English numerals (0-9) obtained from the National Institute of Standards and Technology. It consists of 60,000 greyscale training images of size 28x28 and 10,000 test images.
Dataset Card for "notMNIST"
Overview
The notMNIST dataset is a collection of images of letters from A to J in various fonts. It is designed as a more challenging alternative to the traditional MNIST dataset, which consists of handwritten digits. The notMNIST dataset is commonly used in machine learning and computer vision tasks for character recognition.
Dataset Information
Number of Classes: 10 (A to J)
Number of Samples: 187,24
Image Size: 28 x 28 pixels… See the full description on the dataset page: https://huggingface.co/datasets/anubhavmaity/notMNIST.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The following dataset contains the MNIST dataset in stroke/point form. The data in this repository was based on the data obtained from the following project: https://github.com/edwin-de-jong/mnist-digits-stroke-sequence-data
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
🖼️ MNIST (Extracted from PyTorch Vision)
MNIST is a classic dataset of handwritten digits, widely used for image classification tasks in machine learning.
ℹ️ Dataset Details
📖 Dataset Description
The MNIST database of handwritten digits is a commonly used benchmark dataset in machine learning. It consists of 70,000 grayscale images of handwritten digits (0-9), each with a size of 28x28 pixels. The dataset is split into 60,000 training images and 10,000… See the full description on the dataset page: https://huggingface.co/datasets/p2pfl/MNIST.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset contains a simplified version of the famous MNIST handwritten digits dataset. This version involves distinguishing between digits 3 and 5 rather than the full range 0-9.
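A subset like this can be derived from the full dataset with a simple filter. The sketch below is only illustrative: the function name three_vs_five and the 0/1 encoding (3 maps to 0, 5 maps to 1) are assumptions, not necessarily how this dataset stores its labels.

```python
def three_vs_five(images, labels):
    """Keep only the examples labelled 3 or 5 and return binary targets.

    Assumption: digit 3 is encoded as 0 and digit 5 as 1.
    """
    keep = [(img, lab) for img, lab in zip(images, labels) if lab in (3, 5)]
    xs = [img for img, _ in keep]
    ys = [1 if lab == 5 else 0 for _, lab in keep]
    return xs, ys
```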
EMNIST (extended MNIST) has 4 times more data than MNIST. It is a set of handwritten digits with a 28 x 28 format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here we present a dataset, MNIST4OD, of large size (number of dimensions and number of instances) suitable for the outlier detection task. The dataset is based on the famous MNIST dataset (http://yann.lecun.com/exdb/mnist/).
We build MNIST4OD in the following way: To distinguish between outliers and inliers, we choose the images belonging to a digit as inliers (e.g. digit 1) and we sample with uniform probability on the remaining images as outliers such that their number is equal to 10% of that of inliers. We repeat this dataset generation process for all digits. For implementation simplicity we then flatten the images (28 x 28) into vectors.
Each file MNIST_x.csv.gz contains the corresponding dataset where the inlier class is equal to x. The data contains one instance (vector) in each line where the last column represents the outlier label (yes/no) of the data point. The data also contains a column which indicates the original image class (0-9).
See the following numbers for a complete list of the statistics of each dataset (Name | Instances | Dimensions | Number of Outliers in %):
MNIST_0 | 7594 | 784 | 10
MNIST_1 | 8665 | 784 | 10
MNIST_2 | 7689 | 784 | 10
MNIST_3 | 7856 | 784 | 10
MNIST_4 | 7507 | 784 | 10
MNIST_5 | 6945 | 784 | 10
MNIST_6 | 7564 | 784 | 10
MNIST_7 | 8023 | 784 | 10
MNIST_8 | 7508 | 784 | 10
MNIST_9 | 7654 | 784 | 10
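The construction described above can be sketched as follows. This is a hedged illustration rather than the authors' script: the function name build_outlier_set, the fixed seed and the position of the class column (second to last) are assumptions; only the yes/no outlier label in the last column is stated in the description.

```python
import random


def build_outlier_set(vectors, labels, inlier_digit, outlier_ratio=0.10, seed=0):
    """Build one outlier-detection table for a chosen inlier digit.

    All images of inlier_digit become inliers; outliers are drawn uniformly
    without replacement from the remaining images until they number
    outlier_ratio times the inlier count. Each returned row is
    pixels + [original class, "yes"/"no"] (class column position assumed).
    """
    rng = random.Random(seed)
    inliers = [(v, l) for v, l in zip(vectors, labels) if l == inlier_digit]
    others = [(v, l) for v, l in zip(vectors, labels) if l != inlier_digit]
    n_out = min(round(outlier_ratio * len(inliers)), len(others))
    outliers = rng.sample(others, n_out)
    rows = [list(v) + [l, "no"] for v, l in inliers]
    rows += [list(v) + [l, "yes"] for v, l in outliers]
    return rows
```

Calling the function once per digit from 0 to 9 would yield ten tables analogous to the MNIST_x files listed above.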