100+ datasets found

MNIST Dataset
kaggle.com
opendatalab.com
+4more
zip
Updated Jan 8, 2019
Cite
Hojjat Khodabakhsh (2019). MNIST Dataset [Dataset]. https://www.kaggle.com/datasets/hojjatk/mnist-dataset
Explore at:
zip(23112702 bytes)Available download formats
Dataset updated
Jan 8, 2019
Authors
Hojjat Khodabakhsh
Description
Context

MNIST is a subset of a larger set available from NIST (it's copied from http://yann.lecun.com/exdb/mnist/)

Content

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. Four files are available:

train-images-idx3-ubyte.gz: training set images (9912422 bytes)

train-labels-idx1-ubyte.gz: training set labels (28881 bytes)

t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)

t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

How to read

See sample MNIST reader
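
The linked reader is not reproduced on this page. As a rough sketch of what such a reader has to do, assuming the standard IDX layout of the four files (two zero bytes, a data-type byte that is 0x08 for unsigned bytes, a byte giving the number of dimensions, then one big-endian 32-bit size per dimension, then the raw bytes), the parsing step might look like this. The names parse_idx and read_idx_gz are made up for the example.

```python
import gzip
import struct


def parse_idx(raw: bytes):
    """Parse an uncompressed IDX buffer into (shape, data bytes).

    Assumes unsigned-byte data (type code 0x08), which is what the
    MNIST image and label files use.
    """
    zero, dtype_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or dtype_code != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    shape = struct.unpack(">" + "I" * ndim, raw[4:4 + 4 * ndim])
    data = raw[4 + 4 * ndim:]
    expected = 1
    for size in shape:
        expected *= size
    if len(data) != expected:
        raise ValueError("payload size does not match header")
    return shape, data


def read_idx_gz(path: str):
    """Read a gzip-compressed IDX file such as train-labels-idx1-ubyte.gz."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())


# Tiny synthetic example: two 2x3 "images" in idx3 layout.
demo = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 3) + bytes(range(12))
demo_shape, demo_pixels = parse_idx(demo)
```

For an image file with shape (n, rows, cols), image i occupies bytes i*rows*cols up to (i+1)*rows*cols of the returned data; numpy.frombuffer(data, dtype=numpy.uint8).reshape(shape) gives the same content as an array.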

Acknowledgements

Yann LeCun, Courant Institute, NYU

Corinna Cortes, Google Labs, New York

Christopher J.C. Burges, Microsoft Research, Redmond

Inspiration

Many methods have been tested with this training set and test set (see http://yann.lecun.com/exdb/mnist/ for more details)

MNIST-100

kaggle.com

zip

Updated Jul 25, 2023

+ more versions

Cite

Marcin Wierzbiński (2023). MNIST-100 [Dataset]. https://www.kaggle.com/datasets/martininf1n1ty/mnist100

Explore at:

zip(23452456 bytes)Available download formats

Dataset updated

Jul 25, 2023

Authors

Marcin Wierzbiński

License

http://www.gnu.org/licenses/lgpl-3.0.html

Description

The MNIST-100 dataset is a variation of the original MNIST dataset, consisting of 100 handwritten numbers extracted from the MNIST dataset. Unlike the traditional MNIST dataset, which contains 60,000 training images of digits from 0 to 9, the Modified MNIST-100 dataset focuses on 100 numbers.

Dataset Overview:
- Dataset Name: MNIST-100
- Total Number of Images: train: 60000, test: 1000
- Classes: 100 (Numbers from 00 to 99)
- Image Size: 28x56 pixels (grayscale)

Data Collection: The MNIST-100 dataset was created by randomly selecting 10 unique digits from the original MNIST dataset. For each selected digit, 10 representative images were extracted, resulting in a total of 100 images. These images were carefully chosen to represent a diverse range of handwriting styles for each digit.

Each image in the dataset is labeled with its corresponding numbers, ranging from 00 to 99, making it suitable for classification tasks. Researchers and practitioners can use this dataset to train and evaluate machine learning algorithms and neural networks for digit recognition and classification.

Please note that the Modified MNIST-100 dataset is not intended to replace the original MNIST dataset but serves as a complementary resource for specific applications requiring a smaller and more focused subset of the MNIST data.

Overall, the MNIST-100 dataset offers a compact and representative collection of 100 handwritten numbers, providing a convenient tool for experimentation and learning in computer vision and pattern recognition.

Label Distribution for training set:

Label	Occurrences	Label	Occurrences	Label	Occurrences
0	561	34	629	68	606
1	687	35	540	69	582
2	582	36	588	70	566
3	633	37	619	71	659
4	588	38	584	72	572
5	544	39	609	73	682
6	582	40	570	74	627
7	615	41	679	75	598
8	584	42	544	76	605
9	567	43	567	77	602
10	641	44	574	78	595
11	780	45	555	79	586
12	720	46	550	80	569
13	699	47	614	81	628
14	630	48	614	82	578
15	627	49	595	83	622
16	684	50	505	84	569
17	713	51	583	85	540
18	743	52	512	86	557
19	706	53	555	87	628
20	527	54	504	88	562
21	710	55	488	89	625
22	586	56	531	90	600
23	584	57	556	91	700
24	568	58	497	92	622
25	530	59	520	93	622
26	612	60	556	94	591
27	627	61	682	95	557
28	618	62	594	96	580
29	619	63	539	97	640
30	622	64	610	98	577
31	684	65	514	99	563
32	606	66	587
33	592	67	655

Test data:


Label	Occurrences	Label	Occurrences	Label	Occurrences
00	96	34	100	68	90
01	108	35	91	69	92
02	91	36	107	70	102
03	96	37	112	71	116
04	75	38	97	72	101
05	85	39	96	73	106
06	88	40	103	74	98
07	96	41	123	75 ...

Moving MNIST
kaggle.com
zip
Updated Jun 17, 2024
Cite
Huy Phan (2024). Moving MNIST [Dataset]. https://www.kaggle.com/datasets/hughiephan/moving-mnist
Explore at:
zip(22299997 bytes)Available download formats
Dataset updated
Jun 17, 2024
Authors
Huy Phan
Description
Dataset

This dataset was created by Huy Phan

Contents
Mnist Dataset
kaggle.com
zip
Updated Jun 10, 2025
+ more versions
Cite
Talon Guardian (2025). Mnist Dataset [Dataset]. https://www.kaggle.com/datasets/talonguardian/mnist-dataset/suggestions
Explore at:
zip(33306881 bytes)Available download formats
Dataset updated
Jun 10, 2025
Authors
Talon Guardian
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
The MNIST dataset consists of 70,000 28x28 black-and-white images of handwritten digits extracted from two NIST databases. There are 60,000 images in the training dataset and 10,000 images in the validation dataset, with one class per digit for a total of 10 classes and roughly 7,000 images (about 6,000 train images and 1,000 test images) per class.
mnistdata
kaggle.com
zip
Updated Nov 10, 2020
Cite
colormap (2020). mnistdata [Dataset]. https://www.kaggle.com/datasets/colormap/mnistdata
Explore at:
zip(15964211 bytes)Available download formats
Dataset updated
Nov 10, 2020
Authors
colormap
Description
How to load?

import numpy as np

train_data = np.loadtxt('/kaggle/input/mnistdata/mnist_train_images', dtype=np.uint16)
train_labels = np.loadtxt('/kaggle/input/mnistdata/mnist_train_labels', dtype=np.uint8)
test_data = np.loadtxt('/kaggle/input/mnistdata/mnist_test_images', dtype=np.uint16)
test_labels = np.loadtxt('/kaggle/input/mnistdata/mnist_test_labels', dtype=np.uint8)
MNIST Dataset
kaggle.com
zip
Updated Feb 6, 2024
Cite
Marvin Luckianto (2024). MNIST Dataset [Dataset]. https://www.kaggle.com/datasets/marvinluckianto/mnist-dataset
Explore at:
zip(11494011 bytes)Available download formats
Dataset updated
Feb 6, 2024
Authors
Marvin Luckianto
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It is drawn from two larger NIST collections of monochrome handwritten-digit images: Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students). The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
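
The centering step described above can be sketched as follows. This is only an illustration of the idea, not the original preprocessing code: it assumes NumPy, takes an already size-normalized 20x20 array as input, rounds the shift to whole pixels, and the function name center_by_mass is made up for the example.

```python
import numpy as np


def center_by_mass(digit: np.ndarray, size: int = 28) -> np.ndarray:
    """Place a small grey-level digit on a size x size canvas so that its
    intensity center of mass lands (to the nearest pixel) at the center."""
    h, w = digit.shape
    total = digit.sum()
    if total == 0:
        raise ValueError("empty image has no center of mass")
    rows, cols = np.indices(digit.shape)
    cy = (rows * digit).sum() / total
    cx = (cols * digit).sum() / total
    # Canvas position of the digit's top-left corner after translation.
    center = (size - 1) / 2.0
    top = int(round(center - cy))
    left = int(round(center - cx))
    # Copy only the part of the shifted digit that overlaps the canvas.
    r0, r1 = max(top, 0), min(top + h, size)
    c0, c1 = max(left, 0), min(left + w, size)
    canvas = np.zeros((size, size), dtype=digit.dtype)
    canvas[r0:r1, c0:c1] = digit[r0 - top:r1 - top, c0 - left:c1 - left]
    return canvas


# Synthetic 20x20 "digit": a bright block in the top-left corner.
digit = np.zeros((20, 20), dtype=np.float64)
digit[0:4, 0:6] = 1.0
centered = center_by_mass(digit)
```

In this sketch, any part of the 20x20 box that would fall outside the 28x28 field after the shift is cropped; for real digits that part is normally empty background.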

License: Yann LeCun and Corinna Cortes hold the copyright of MNIST dataset, which is a derivative work from original NIST datasets. MNIST dataset is made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.
Hindi-MNIST
kaggle.com
zip
Updated Aug 7, 2022
Cite
Bikram Saha (2022). Hindi-MNIST [Dataset]. https://www.kaggle.com/datasets/imbikramsaha/hindi-mnist
Explore at:
zip(15529194 bytes)Available download formats
Dataset updated
Aug 7, 2022
Authors
Bikram Saha
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This is a Hindi digit dataset in the style of the original MNIST.

This dataset contains a total of 20,000 images in 10 categories: 17,000 in the train folder and 3,000 in the test folder.

Category names: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
original MNIST Dataset
kaggle.com
zip
Updated Mar 31, 2025
Cite
Donald Trump (2025). original MNIST Dataset [Dataset]. https://www.kaggle.com/datasets/donaldtrump2025/original-mnist-dataset
Explore at:
zip(11556268 bytes)Available download formats
Dataset updated
Mar 31, 2025
Authors
Donald Trump
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This dataset is copied from https://www.kaggle.com/datasets/hojjatk/mnist-dataset, including the introduction and usage instructions.
Corrupted MNIST
kaggle.com
zip
Updated Nov 24, 2023
Cite
Shreyasi Mandal (2023). Corrupted MNIST [Dataset]. https://www.kaggle.com/datasets/shreyasi2002/corrupted-mnist/code
Explore at:
zip(55618716 bytes)Available download formats
Dataset updated
Nov 24, 2023
Authors
Shreyasi Mandal
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
This dataset consists of 60,000 images with dimensions 32x32. The images are the same as the MNIST database of handwritten digits - http://yann.lecun.com/exdb/mnist/

CHALLENGE
1. The notebook provided gets a very low test accuracy (45%) on this data, while the training accuracy was 99%. Can you get a higher accuracy?
2. Train models on the original MNIST dataset and test them on this dataset.


Notebook to get started - https://www.kaggle.com/code/shreyasi2002/testing-vgg16-on-corrupted-mnist/notebook

So, how are the images corrupted?
The MNIST images are perturbed using Projected Gradient Descent Attack (https://www.kaggle.com/code/shreyasi2002/pgd-attack-on-mnist-and-fashion-mnist)
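
The linked notebook is not reproduced here. For orientation, a generic L-infinity projected gradient descent attack repeatedly steps along the sign of the loss gradient and then projects the result back into an epsilon-ball around the clean image. The sketch below shows that loop on a toy linear softmax classifier with NumPy; the model, epsilon, step size, and iteration count are all illustrative assumptions, not the settings used to build this dataset.

```python
import numpy as np


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def loss_and_grad(W, b, x, y):
    """Cross-entropy loss of a linear softmax model and its gradient w.r.t. x."""
    p = softmax(W @ x + b)
    loss = -np.log(p[y] + 1e-12)
    p[y] -= 1.0            # dL/dlogits = p - onehot(y)
    return loss, W.T @ p   # chain rule through logits = W x + b


def pgd_attack(W, b, x, y, eps=0.1, step=0.02, iters=20):
    """L-infinity PGD: ascend the loss, then project back into the eps-ball."""
    x_adv = x.copy()
    for _ in range(iters):
        _, grad = loss_and_grad(W, b, x_adv, y)
        x_adv = x_adv + step * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay a valid image
    return x_adv


rng = np.random.default_rng(0)
W = 0.05 * rng.normal(size=(10, 784))  # toy weights, not a trained model
b = np.zeros(10)
x = rng.uniform(size=784)              # stand-in for a flattened 28x28 image
y = 3
x_adv = pgd_attack(W, b, x, y)
clean_loss, _ = loss_and_grad(W, b, x, y)
adv_loss, _ = loss_and_grad(W, b, x_adv, y)
```

With a real network the gradient would come from automatic differentiation instead of the closed form used here, but the step-then-project structure stays the same.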
Enhanced Sign Language MNIST Dataset
kaggle.com
zip
Updated May 12, 2024
Cite
Oladayo Luke (2024). Enhanced Sign Language MNIST Dataset [Dataset]. https://www.kaggle.com/datasets/oladayoluke/enhanced-sign-language-mnist-dataset
Explore at:
zip(48352794 bytes)Available download formats
Dataset updated
May 12, 2024
Authors
Oladayo Luke
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
The Enhanced Sign Language MNIST dataset is a comprehensive collection of grayscale images representing American Sign Language (ASL) gestures. This dataset serves as an enhancement to the original Sign Language MNIST dataset, providing a more diverse and extensive set of hand gesture samples for machine learning tasks.

Inspired by the need for more challenging benchmarks in image-based machine learning, this dataset follows the format of the original Sign Language MNIST dataset and adds a self-generated dataset, resulting in a more robust and varied collection of hand gesture images. The original Sign Language MNIST dataset, available on Kaggle, provided a solid foundation with 27,455 training cases and 7,172 test cases, each representing a label (0-25) mapped to an alphabetic letter A-Z (excluding J and Z).
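
The label scheme can be illustrated with a few lines of Python. This is a sketch that assumes label i simply stands for the i-th letter of the alphabet, so that 9 (J) and 25 (Z) are never used; the helper name label_to_letter is made up for the example.

```python
import string

# J (index 9) and Z (index 25) are excluded, per the description above.
EXCLUDED = {"J", "Z"}


def label_to_letter(label: int) -> str:
    """Map an integer label in 0-25 to its letter, assuming label i is the
    i-th letter of the alphabet (an assumption about the encoding)."""
    if not 0 <= label <= 25:
        raise ValueError(f"label out of range: {label}")
    letter = string.ascii_uppercase[label]
    if letter in EXCLUDED:
        raise ValueError(f"no samples exist for letter {letter}")
    return letter


valid_letters = [c for c in string.ascii_uppercase if c not in EXCLUDED]
```

A classifier head can either keep all 26 outputs and leave two unused, or remap the 24 valid letters to a dense 0-23 range before training.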

The Enhanced Sign Language MNIST dataset builds upon this foundation by incorporating additional images generated through a process involving various image manipulation techniques. These techniques include hand tracking using MediaPipe, cropping, grayscale conversion, and resizing, to create approximately 1400 samples of each alphabetic letter. The enhanced dataset contains 69,252 samples in total, with 55,402 samples for training and validation, and 13,850 samples for testing.

This dataset is invaluable for researchers and developers working on sign language recognition, hand gesture detection, and related computer vision tasks. It offers a challenging benchmark for evaluating the performance of machine learning models, particularly Convolutional Neural Networks (CNNs), in recognizing ASL gestures.

The dataset is divided into training and testing sets following the methodology outlined in Oladayo's research (2024), ensuring the consistency and reproducibility of experimental setups. The experimentation framework incorporated four distinct Convolutional Neural Network (CNN) models: CNN1, CNN2, CNN3, and CNN4. Additionally, four diverse data augmentation techniques were employed, denoted as DAM1, DAM2, DAM3, and DAM4. Notably, DAM1 represents the scenario where no data augmentation is applied.

CNN2 achieved a remarkable 99.89% validation accuracy on the enhanced test samples and 99.78% on the generated test samples. Training the model on a GPU/TPU took approximately 209 seconds (3.5 minutes), which is close to the results reported in the research report. This success underscores the effectiveness of sample generation in enhancing the model's performance, showcasing its superiority over traditional data augmentation methods.

With the Enhanced Sign Language MNIST dataset, researchers can explore new approaches to sign language recognition, develop more robust machine learning models, and ultimately contribute to the advancement of assistive technologies for the deaf and hard-of-hearing community.

If you use this code or the datasets in your research, please cite the following dissertation: Oladayo Luke. (2024). Enhancing Sign Language Recognition and Hand Gesture Detection using Convolutional Neural Networks and Data Augmentation Techniques. (Doctoral dissertation, Nova Southeastern University).
MNIST Dataset
kaggle.com
zip
Updated Mar 26, 2023
+ more versions
Cite
Saba Hesaraki (2023). MNIST Dataset [Dataset]. https://www.kaggle.com/datasets/sabahesaraki/mnist-dataset
Explore at:
zip(11556456 bytes)Available download formats
Dataset updated
Mar 26, 2023
Authors
Saba Hesaraki
Description
The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

Four files are available on this site:
train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
3D MNIST
kaggle.com
zip
Updated Oct 18, 2019
Cite
David de la Iglesia Castro (2019). 3D MNIST [Dataset]. https://www.kaggle.com/daavoo/3d-mnist
Explore at:
zip(160210751 bytes)Available download formats
Dataset updated
Oct 18, 2019
Authors
David de la Iglesia Castro
Description
Context

The aim of this dataset is to provide a simple way to get started with 3D computer vision problems such as 3D shape recognition.

Accurate 3D point clouds can (easily and cheaply) be acquired nowadays from different sources:

RGB-D devices: Google Tango, Microsoft Kinect, etc.

Lidar.

3D reconstruction from multiple images.

However, there is a lack of large 3D datasets (you can find a good one here based on triangular meshes); it's especially hard to find datasets based on point clouds (which is the raw output from every 3D sensing device).

This dataset contains 3D point clouds generated from the original images of the MNIST dataset to bring a familiar introduction to 3D to people used to working with 2D datasets (images).

In the 3D_from_2D notebook you can find the code used to generate the dataset.

You can use the code in the notebook to generate a bigger 3D dataset from the original.

Content

full_dataset_vectors.h5

The entire dataset stored as 4096-D vectors obtained from the voxelization (x:16, y:16, z:16) of all the 3D point clouds.

In addition to the original point clouds, it contains randomly rotated copies with noise.

The full dataset is split into arrays:

X_train (10000, 4096)

y_train (10000)

X_test (2000, 4096)

y_test (2000)

Example python code reading the full dataset:

import h5py

# The X_/y_ arrays live in full_dataset_vectors.h5, not in the point-cloud files.
with h5py.File("../input/full_dataset_vectors.h5", "r") as hf:
    X_train = hf["X_train"][:]
    y_train = hf["y_train"][:]
    X_test = hf["X_test"][:]
    y_test = hf["y_test"][:]
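
Since each row is a flattened 16x16x16 voxel grid, it can be reshaped back into a 3D volume before being passed to, say, a 3D convolution. A small sketch with NumPy on a stand-in array follows. The real arrays come from the HDF5 file, and the axis order of the flattening is an assumption that should be checked against the 3D_from_2D notebook.

```python
import numpy as np

# Stand-in for X_train with shape (n_samples, 4096); random values only.
rng = np.random.default_rng(0)
X_demo = rng.random((5, 4096))

# 4096 = 16 * 16 * 16, so each row unrolls one voxel grid.
voxels = X_demo.reshape(-1, 16, 16, 16)

# Optional trailing channel axis, as many 3D-conv layers expect.
voxels_with_channel = voxels[..., np.newaxis]
```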

train_point_clouds.h5 & test_point_clouds.h5

5000 (train), and 1000 (test) 3D point clouds stored in HDF5 file format. The point clouds have zero mean and a maximum dimension range of 1.

Each file is divided into HDF5 groups

Each group is named as its corresponding array index in the original mnist dataset and it contains:

"points" dataset: x, y, z coordinates of each 3D point in the point cloud.

"normals" dataset: nx, ny, nz components of the unit normal associated with each point.

"img" dataset: the original mnist image.

"label" attribute: the original mnist label.

Example python code reading 2 digits and storing some of the group content in tuples:

import h5py

with h5py.File("../input/train_point_clouds.h5", "r") as hf:
    # Groups are named after the index of the digit in the original MNIST arrays.
    a = hf["0"]
    b = hf["1"]
    digit_a = (a["img"][:], a["points"][:], a.attrs["label"])
    digit_b = (b["img"][:], b["points"][:], b.attrs["label"])

voxelgrid.py

Simple Python class that generates a grid of voxels from the 3D point cloud. Check kernel for use.

plot3D.py

Module with functions to plot point clouds and voxel grids inside a Jupyter notebook. You have to run this locally because Kaggle's notebooks do not support rendering iframes. See GitHub issue here

Functions included:

array_to_color: converts a 1D array to RGB values to use as the color kwarg in plot_points()

plot_points(xyz, colors=None, size=0.1, axis=False)

plot_voxelgrid(v_grid, cmap="Oranges", axis=False)

Acknowledgements

Website of the original MNIST dataset

Website of the 3D MNIST dataset

Have fun!
MNIST handwritten digits 0 to 9 dataset
kaggle.com
zip
Updated Oct 9, 2025
+ more versions
Cite
kanzari achref (2025). MNIST handwritten digits 0 to 9 dataset [Dataset]. https://www.kaggle.com/datasets/kanzariachref/mnist-handwritten-digits-0-to-9-dataset
Explore at:
zip(1731810 bytes)Available download formats
Dataset updated
Oct 9, 2025
Authors
kanzari achref
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This is a subset of the MNIST handwritten digit dataset (http://yann.lecun.com/exdb/mnist/). The data set contains 5000 training examples of handwritten digits, 0 to 9. Each training example is a 20-pixel x 20-pixel grayscale image of the digit. Each pixel is represented by a floating-point number indicating the grayscale intensity at that location. The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector. Each training example becomes a single row in our data set. This gives us a 5000 x 400 dataset where every row is a training example of a handwritten digit image.

The second part of the training set is a 5000 x 1 column vector y that contains the labels for the training set: y = 0 if the image is of the digit 0, y = 7 if the image is of the digit 7, and so on.
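
To view a row as an image it has to be folded back into a 20x20 grid. A sketch with NumPy on a synthetic stand-in matrix follows. Whether the pixels were unrolled row by row or column by column is not stated here, so both orders are shown, and the right one should be picked by looking at a rendered digit.

```python
import numpy as np

# Stand-in for the 5000 x 400 data matrix (not the real data).
X_demo = np.arange(3 * 400, dtype=np.float64).reshape(3, 400)

row = X_demo[1]
img_row_major = row.reshape(20, 20)             # unrolled row by row
img_col_major = row.reshape(20, 20, order="F")  # unrolled column by column
```

The two results are transposes of each other, so a digit that appears mirrored and rotated under one order will look upright under the other.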
MNIST-fashion-png
kaggle.com
zip
Updated Feb 19, 2022
Cite
PedroStu (2022). MNIST-fashion-png [Dataset]. https://www.kaggle.com/datasets/prashantdandriyal/mnistfashionpng
Explore at:
zip(52473305 bytes)Available download formats
Dataset updated
Feb 19, 2022
Authors
PedroStu
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by PedroStu

Released under CC0: Public Domain

Contents
Mnist dataset
kaggle.com
zip
Updated Sep 20, 2020
Cite
Shivam Baldha (2020). Mnist dataset [Dataset]. https://www.kaggle.com/shivambaldha/mnist-dataset
Explore at:
zip(9606023 bytes)Available download formats
Dataset updated
Sep 20, 2020
Authors
Shivam Baldha
Description
Dataset

This dataset was created by Shivam Baldha

Contents
MNIST-Pytorch
kaggle.com
zip
Updated Aug 18, 2017
Cite
mlagunas (2017). MNIST-Pytorch [Dataset]. https://www.kaggle.com/mlagunas/mnist-pytorch
Explore at:
zip(23134518 bytes)Available download formats
Dataset updated
Aug 18, 2017
Authors
mlagunas
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

The MNIST dataset as downloaded by the PyTorch libraries.
Hindi/Devanagari MNIST Data
kaggle.com
zip
Updated Mar 18, 2025
Cite
Anurag Shenoy (2025). Hindi/Devanagari MNIST Data [Dataset]. https://www.kaggle.com/datasets/anurags397/hindi-mnist-data
Explore at:
zip(18064821 bytes)Available download formats
Dataset updated
Mar 18, 2025
Authors
Anurag Shenoy
Description
Context

Handwritten image data is easy to find in languages such as English and Japanese, but not for many Indian languages including Hindi. While trying to create an MNIST like personal project, I stumbled upon a Hindi Handwritten characters dataset by Shailesh Acharya and Prashnna Kumar Gyawali, which is uploaded to the UCI Machine Learning Repository.

This dataset, however, has only the digits from 0 to 9; all other characters have been removed.

Content

Data Type: Grayscale image
Image Format: PNG
Resolution: 32 by 32 pixels
The actual character is centered within 28 by 28 pixels; a padding of 2 pixels is added on all four sides of the character.

There are ~1700 images per class in the Train set and ~300 images per class in the Test set.

Acknowledgements

The Dataset is ©️ Original Authors.

Original Authors:
- Shailesh Acharya
- Prashnna Kumar Gyawali

Citation: S. Acharya, A.K. Pant and P.K. Gyawali, “Deep Learning Based Large Scale Handwritten Devanagari Character Recognition”, In Proceedings of the 9th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), pp. 121-126, 2015.

The full Dataset is available here: https://archive.ics.uci.edu/ml/datasets/Devanagari+Handwritten+Character+Dataset
Fashion MNIST Image Dataset
kaggle.com
Updated May 15, 2025
+ more versions
Cite
Ghanshyam Saini (2025). Fashion MNIST Image Dataset [Dataset]. https://www.kaggle.com/datasets/ghnshymsaini/fashion-mnist-image-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 15, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Ghanshyam Saini
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Fashion-MNIST Dataset (Image Files and CSV Labels)

This dataset contains images of Zalando's article categories, intended for fashion image classification. It serves as a direct drop-in replacement for the original MNIST dataset, often used as a benchmark for machine learning algorithms. Fashion-MNIST is slightly more challenging than regular MNIST.

Dataset Structure:

The dataset is organized into the following files and folders:

train/: This folder contains the training set images. It holds 60,000 grayscale images, each with dimensions 28x28 pixels. The images are in PNG format. The filenames within this folder are not explicitly labeled with the class, so you will need to refer to the train.csv file for the corresponding labels.

test/: This folder contains the testing set images. It holds 10,000 grayscale images, each with dimensions 28x28 pixels and in PNG format. Similar to the training set, the filenames here are not directly labeled, and the test.csv file provides the corresponding labels.

train.csv: This CSV (Comma Separated Values) file contains the labels for the images in the train/ folder. Each row in this file corresponds to an image. It contains two kinds of columns:

pixel1, pixel2, ..., pixel784: These columns represent the flattened pixel values of the 28x28 grayscale images. The pixel values are integers ranging from 0 to 255.

label: This column contains the corresponding class label (an integer from 0 to 9) for the image. You will need to refer to the class mapping (provided below) to understand the meaning of these numerical labels.

test.csv: This CSV file contains the labels for the images in the test/ folder, following the same format as train.csv with pixel1 to pixel784 columns and a label column.

Content of the Data:

Each image in the Fashion-MNIST dataset belongs to one of the following 10 classes:

Label Description
0 T-shirt/top
1 Trouser
2 Pullover
3 Dress
4 Coat
5 Sandal
6 Shirt
7 Sneaker
8 Bag
9 Ankle boot

The images are grayscale, meaning each pixel has a single intensity value.

How to Use This Dataset:

Download the entire dataset, including the train/ and test/ folders and the train.csv and test.csv files.

The image files in the train/ and test/ folders contain the visual data. You can load these images using libraries that handle image formats (like PIL, OpenCV).

The train.csv and test.csv files provide the ground truth labels for the corresponding images. You can read these CSV files using libraries like Pandas. The pixel values in the CSV can be reshaped into a 28x28 matrix to represent the image. The label column provides the class of the fashion item.

You can train your image classification models using the train/ images and train.csv labels.

Evaluate the performance of your trained models using the test/ images and test.csv labels.
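
As a sketch of the CSV part of these steps, the example below reads one row and turns it into a 28x28 grid plus a class name, using only the standard library. A tiny in-memory CSV stands in for train.csv, the header names label and pixel1 to pixel784 are taken from the description above, and CLASS_NAMES is copied from the class table. The helper name row_to_image is made up for the example.

```python
import csv
import io

CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def row_to_image(row: dict):
    """Turn one CSV row into (class name, 28x28 list of pixel rows)."""
    label = int(row["label"])
    flat = [int(row[f"pixel{i}"]) for i in range(1, 785)]
    image = [flat[r * 28:(r + 1) * 28] for r in range(28)]
    return CLASS_NAMES[label], image


# In-memory stand-in for train.csv with a single synthetic row.
header = ["label"] + [f"pixel{i}" for i in range(1, 785)]
values = ["9"] + [str(i % 256) for i in range(784)]
fake_csv = io.StringIO(",".join(header) + "\n" + ",".join(values) + "\n")

reader = csv.DictReader(fake_csv)
name, image = row_to_image(next(reader))
```

Looking columns up by name means the code does not depend on whether the label column comes first or last in the file.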

Citation:

When using the Fashion-MNIST dataset, please cite the original paper:

Xiao, Han, Kashif Rasul, and Roland Vollgraf. "Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms." arXiv preprint arXiv:1708.07747 (2017).

Data Contribution:

Thank you for providing this well-structured Fashion-MNIST dataset with separate image folders and CSV label files. This organization makes it convenient for users to work with both the raw image data and the corresponding labels for training and evaluation of their fashion classification models.

If you find this dataset structure clear, well-organized, and useful for your projects, please consider giving it an upvote after downloading. Your feedback and appreciation are valuable!
Augmented Kaggle MNSIT Dataset
kaggle.com
zip
Updated Oct 21, 2019
Cite
Nic Ollis (2019). Augmented Kaggle MNSIT Dataset [Dataset]. https://www.kaggle.com/nicollis/augmented-kaggle-mnsit-dataset
Explore at:
zip(140776547 bytes)Available download formats
Dataset updated
Oct 21, 2019
Authors
Nic Ollis
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Kaggle's MNIST dataset with augmentation

Content

The MNIST dataset has been augmented with rotations and shifting. While you can use a transformer for this, I found better results by using this fixed dataset.

Acknowledgements

Kaggle for their MNIST dataset that was the basis for this one.
400k Augmented MNIST: Extended Handwritten Digits
kaggle.com
zip
Updated Mar 26, 2025
Cite
Alexandre Le Mercier (2025). 400k Augmented MNIST: Extended Handwritten Digits [Dataset]. https://www.kaggle.com/datasets/alexandrelemercier/400k-augmented-mnist-extended-handwritten-digits
Explore at:
zip(359213486 bytes)Available download formats
Dataset updated
Mar 26, 2025
Authors
Alexandre Le Mercier
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Overview

The 400k Augmented MNIST dataset is an extended version of the classic MNIST handwritten digits dataset. By applying a variety of augmentation techniques, I have increased the number of training images to 400,000 - roughly 40,000 per digit label. This large and diverse training set is designed to significantly improve the robustness and generalization of models trained on it, making them less susceptible to overfitting and more resilient against adversarial perturbations.

Dataset Structure

The dataset is organized into two main directories:

Augmented MNIST Training Set (400k):
This directory contains 10 subdirectories, one for each digit label ("Label 0" through "Label 9"). Each subdirectory holds the corresponding JPEG images generated via augmentation. These images have been produced using techniques such as random rotation, shear, translation, scaling, reflection, spatial padding, Ben Graham transformation, Gaussian noise, salt-and-pepper noise, and random text overlay.

MNIST Validation Set (4k):
This directory also contains subdirectories "Label 0" to "Label 9". However, the validation set consists solely of the original MNIST images (approximately 400 per label) that were not used for augmentation. This allows you to evaluate model performance on natural, unaltered digit images, providing a clear benchmark for generalization.

How to Use This Dataset

Training:
Use the augmented training set to train your deep learning models. The 400k images offer a wide variety of conditions, helping your model learn robust features that generalize well.

Validation:
Evaluate your models on the validation set, which contains only the original MNIST images. This will help you measure performance on “natural” digits, ensuring that improvements in robustness do not come at the expense of real-world accuracy.

Flexibility:
You can also experiment with mixed training (combining augmented and original images) to study how different training strategies affect model robustness and accuracy.

Augmentation Techniques Applied

The following augmentation functions were used to generate the extended dataset:

Random Rotation: Randomly rotates images within a specified angle range.

Random Shear: Applies slight shearing transformations.

Random Translation: Shifts images horizontally and vertically.

Random Scale: Zooms in or out on the images.

Ben Graham Transform: Enhances image contrast and clarity using a weighted Gaussian blur.

Random Gaussian Noise: Adds Gaussian noise to simulate sensor or environmental disturbances.

Random Salt-and-Pepper Noise: Introduces random pixel-level corruption.

A random number of transformations (between 1 and 6, in a random order) is applied to each image, with the goal of creating a diverse and challenging training set.
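
The selection scheme in the sentence above can be sketched as below. The three transforms are simplified stand-ins written with NumPy, not the author's implementations (for instance, the translation here wraps around the image edges). The sketch also assumes that each transform is applied at most once per image, and with only three transforms available the upper bound of 6 is capped at 3.

```python
import random

import numpy as np


def translate(img, rng):
    """Shift by up to 2 pixels in each direction (wraps at the borders)."""
    dy, dx = rng.randint(-2, 2), rng.randint(-2, 2)
    return np.roll(img, (dy, dx), axis=(0, 1))


def gaussian_noise(img, rng):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    noise = np.array([rng.gauss(0.0, 0.05) for _ in range(img.size)])
    return np.clip(img + noise.reshape(img.shape), 0.0, 1.0)


def salt_and_pepper(img, rng, amount=0.02):
    """Set a small random fraction of pixels to 0 or 1."""
    out = img.copy()
    for idx in rng.sample(range(img.size), int(amount * img.size)):
        out.flat[idx] = rng.choice((0.0, 1.0))
    return out


TRANSFORMS = [translate, gaussian_noise, salt_and_pepper]


def augment(img, rng, max_ops=6):
    """Apply between 1 and max_ops distinct transforms in random order."""
    k = rng.randint(1, min(max_ops, len(TRANSFORMS)))
    chosen = rng.sample(TRANSFORMS, k)   # sample() also randomizes the order
    for op in chosen:
        img = op(img, rng)
    return img, [op.__name__ for op in chosen]


rng = random.Random(0)
image = np.zeros((28, 28))
image[10:18, 12:16] = 1.0
augmented, applied = augment(image, rng)
```

Passing a seeded random.Random instance keeps the augmentation reproducible, which helps when comparing models trained on the same generated set.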

Citation

If you use this dataset in your research, please cite it as follows:

@misc{alexandre_le_mercier_2025,
  title={400k Augmented MNIST: Extended Handwritten Digits},
  url={https://www.kaggle.com/ds/6967763},
  DOI={10.34740/KAGGLE/DS/6967763},
  publisher={Kaggle},
  author={Alexandre Le Mercier},
  year={2025}
}

License

This dataset is under the Apache 2.0 license.

Contact

For any questions or issues regarding this dataset, please send a message in the "Discussions" or "Suggestions" sections of the Kaggle dataset page.

Good luck and happy coding! 🚀


Hojjat Khodabakhsh (2019). MNIST Dataset [Dataset]. https://www.kaggle.com/datasets/hojjatk/mnist-dataset

MNIST Dataset

The MNIST database of handwritten digits (http://yann.lecun.com)


124 scholarly articles cite this dataset (View in Google Scholar)

Available download formats: zip (23112702 bytes)

Dataset updated

Jan 8, 2019

Authors

Hojjat Khodabakhsh

Description

Context

MNIST is a subset of a larger set available from NIST (it's copied from http://yann.lecun.com/exdb/mnist/)

Content

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. Four files are available:

train-images-idx3-ubyte.gz: training set images (9912422 bytes)
train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

How to read

See sample MNIST reader
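The linked sample reader is not reproduced here. As a minimal sketch, the four files can be read with the standard library alone, assuming they follow the IDX layout documented on the original MNIST page: two zero bytes, a data-type byte (0x08 for unsigned bytes), a dimension-count byte, one big-endian 32-bit size per dimension, then the raw data. The function name `read_idx` is hypothetical.

```python
import gzip
import struct

def read_idx(path):
    """Read a gzip-compressed IDX file and return (dims, raw_bytes)."""
    with gzip.open(path, "rb") as f:
        zero, dtype, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype != 0x08:
            raise ValueError("not an unsigned-byte IDX file")
        # One big-endian unsigned 32-bit size per dimension.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = f.read()
    expected = 1
    for d in dims:
        expected *= d
    if len(data) != expected:
        raise ValueError("payload size does not match header")
    return dims, data
```

For an images file, `dims` is `(count, rows, cols)` and image `i` occupies `data[i * rows * cols : (i + 1) * rows * cols]`. For a labels file, `dims` is `(count,)` and `data[i]` is the label of example `i`.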

Acknowledgements

Yann LeCun, Courant Institute, NYU
Corinna Cortes, Google Labs, New York
Christopher J.C. Burges, Microsoft Research, Redmond

Inspiration

Many methods have been tested with this training set and test set (see http://yann.lecun.com/exdb/mnist/ for more details)


Label	Occurrences	Label	Occurrences	Label	Occurrences
0	561	34	629	68	606
1	687	35	540	69	582
2	582	36	588	70	566
3	633	37	619	71	659
4	588	38	584	72	572
5	544	39	609	73	682
6	582	40	570	74	627
7	615	41	679	75	598
8	584	42	544	76	605
9	567	43	567	77	602
10	641	44	574	78	595
11	780	45	555	79	586
12	720	46	550	80	569
13	699	47	614	81	628
14	630	48	614	82	578
15	627	49	595	83	622
16	684	50	505	84	569
17	713	51	583	85	540
18	743	52	512	86	557
19	706	53	555	87	628
20	527	54	504	88	562
21	710	55	488	89	625
22	586	56	531	90	600
23	584	57	556	91	700
24	568	58	497	92	622
25	530	59	520	93	622
26	612	60	556	94	591
27	627	61	682	95	557
28	618	62	594	96	580
29	619	63	539	97	640
30	622	64	610	98	577
31	684	65	514	99	563
32	606	66	587
33	592	67	655
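Because the table packs three label/count pairs into each row, it helps to flatten it before checking class balance. The sketch below assumes the rows are available as whitespace-separated strings. The helper names `parse_occurrences` and `imbalance_ratio` are hypothetical.

```python
def parse_occurrences(rows):
    """Flatten rows of alternating label/count fields into {label: count}."""
    counts = {}
    for row in rows:
        fields = row.split()
        # Fields alternate label, count, label, count, ...
        for label, count in zip(fields[0::2], fields[1::2]):
            counts[int(label)] = int(count)
    return counts

def imbalance_ratio(counts):
    """Ratio of the most frequent class to the least frequent class."""
    return max(counts.values()) / min(counts.values())
```

In the table above, the most frequent label is 11 (780 occurrences) and the least frequent is 55 (488 occurrences), so the classes are noticeably but not severely imbalanced.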


Hindi-MNIST

This is an original MNIST-type dataset, Hindi-MNIST.

This dataset contains a total of 20,000 images in 10 categories: 17,000 in the train folder and 3,000 in the test folder.

Category names: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9


