51 datasets found
  1. MNIST Dataset

    • kaggle.com
    • opendatalab.com
    • +4 more
    zip
    Updated Jan 8, 2019
    Cite
    Hojjat Khodabakhsh (2019). MNIST Dataset [Dataset]. https://www.kaggle.com/datasets/hojjatk/mnist-dataset
    Explore at:
    Available download formats: zip (23112702 bytes)
    Dataset updated
    Jan 8, 2019
    Authors
    Hojjat Khodabakhsh
    Description

    Context

    MNIST is a subset of a larger set available from NIST (it's copied from http://yann.lecun.com/exdb/mnist/)

    Content

    The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. Four files are available:

    • train-images-idx3-ubyte.gz: training set images (9912422 bytes)
    • train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
    • t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
    • t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

    How to read

    See sample MNIST reader
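
    The four files above use the IDX layout: a 4-byte header (two zero bytes, a data-type code, the number of dimensions), then one big-endian 32-bit size per dimension, then the raw bytes. The following is a minimal sketch of a reader, not the sample reader referred to above; the function name read_idx is made up for illustration, and it handles only the unsigned-byte type used by these files.

```python
import gzip
import struct

import numpy as np


def read_idx(path):
    """Read a gzip-compressed IDX file into a NumPy array of unsigned bytes."""
    with gzip.open(path, "rb") as f:
        # Header: two zero bytes, a data-type code, and the number of dimensions.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError("not an unsigned-byte IDX file")
        # Each dimension size is a big-endian 32-bit unsigned integer.
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)
```

    Calling read_idx("train-images-idx3-ubyte.gz") should give an array with one 28x28 image per training example, and read_idx("train-labels-idx1-ubyte.gz") a one-dimensional array of labels.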

    Acknowledgements

    • Yann LeCun, Courant Institute, NYU
    • Corinna Cortes, Google Labs, New York
    • Christopher J.C. Burges, Microsoft Research, Redmond

    Inspiration

    Many methods have been tested with this training set and test set (see http://yann.lecun.com/exdb/mnist/ for more details)

  2. MNIST-100

    • kaggle.com
    zip
    Updated Jul 25, 2023
    Cite
    Marcin Wierzbiński (2023). MNIST-100 [Dataset]. https://www.kaggle.com/datasets/martininf1n1ty/mnist100
    Explore at:
    Available download formats: zip (23452456 bytes)
    Dataset updated
    Jul 25, 2023
    Authors
    Marcin Wierzbiński
    License

    http://www.gnu.org/licenses/lgpl-3.0.html

    Description

    The MNIST-100 dataset is a variation of the original MNIST dataset, consisting of 100 handwritten numbers extracted from the MNIST dataset. Unlike the traditional MNIST dataset, which contains 60,000 training images of digits from 0 to 9, the MNIST-100 dataset focuses on 100 numbers.

    Dataset Overview:

    • Dataset Name: MNIST-100
    • Total Number of Images: train: 60000, test: 1000
    • Classes: 100 (Numbers from 00 to 99)
    • Image Size: 28x56 pixels (grayscale)

    Data Collection: The MNIST-100 dataset was created by randomly selecting 10 unique digits from the original MNIST dataset. For each selected digit, 10 representative images were extracted, resulting in a total of 100 images. These images were carefully chosen to represent a diverse range of handwriting styles for each digit.

    Each image in the dataset is labeled with its corresponding numbers, ranging from 00 to 99, making it suitable for classification tasks. Researchers and practitioners can use this dataset to train and evaluate machine learning algorithms and neural networks for digit recognition and classification.

    Please note that the Modified MNIST-100 dataset is not intended to replace the original MNIST dataset but serves as a complementary resource for specific applications requiring a smaller and more focused subset of the MNIST data.

    Overall, the MNIST-100 dataset offers a compact and representative collection of 100 handwritten numbers, providing a convenient tool for experimentation and learning in computer vision and pattern recognition.
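
    The description does not say how the 28x56 images are produced. One plausible reading, which is an assumption and not the author's documented method, is that two 28x28 MNIST digits are placed side by side and labelled with the two-digit number they form. A minimal sketch under that assumption (make_pair is a hypothetical helper):

```python
import numpy as np


def make_pair(left_img, left_digit, right_img, right_digit):
    """Join two 28x28 digit images into one 28x56 image labelled 0-99."""
    pair = np.hstack([left_img, right_img])   # shape (28, 56)
    label = 10 * left_digit + right_digit     # digits 4 and 2 give 42
    return pair, label


# Stand-in arrays; real use would pass MNIST digit images instead.
left = np.zeros((28, 28), dtype=np.uint8)
right = np.full((28, 28), 255, dtype=np.uint8)
pair, label = make_pair(left, 4, right, 2)
print(pair.shape, f"{label:02d}")
```

    Formatting the label with two digits matches the 00 to 99 naming used in the overview.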

    Label Distribution for training set:

    Label  Occurrences    Label  Occurrences    Label  Occurrences
    0      561            34     629            68     606
    1      687            35     540            69     582
    2      582            36     588            70     566
    3      633            37     619            71     659
    4      588            38     584            72     572
    5      544            39     609            73     682
    6      582            40     570            74     627
    7      615            41     679            75     598
    8      584            42     544            76     605
    9      567            43     567            77     602
    10     641            44     574            78     595
    11     780            45     555            79     586
    12     720            46     550            80     569
    13     699            47     614            81     628
    14     630            48     614            82     578
    15     627            49     595            83     622
    16     684            50     505            84     569
    17     713            51     583            85     540
    18     743            52     512            86     557
    19     706            53     555            87     628
    20     527            54     504            88     562
    21     710            55     488            89     625
    22     586            56     531            90     600
    23     584            57     556            91     700
    24     568            58     497            92     622
    25     530            59     520            93     622
    26     612            60     556            94     591
    27     627            61     682            95     557
    28     618            62     594            96     580
    29     619            63     539            97     640
    30     622            64     610            98     577
    31     684            65     514            99     563
    32     606            66     587
    33     592            67     655

    Test data:

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F7193292%2Fac688f2526851734cb50be10f0a7bd7d%2Fpobrane%20(16).png?generation=1690276359580027&alt=media

    Label  Occurrences    Label  Occurrences    Label  Occurrences
    00     96             34     100            68     90
    01     108            35     91             69     92
    02     91             36     107            70     102
    03     96             37     112            71     116
    04     75             38     97             72     101
    05     85             39     96             73     106
    06     88             40     103            74     98
    07     96             41     123            75     ...
  3. MNIST

    • datasets.activeloop.ai
    deeplake
    Cite
    Yann LeCun, MNIST [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/mnist/
    Explore at:
    Available download formats: deeplake
    Authors
    Yann LeCun
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1998 - Dec 31, 2000
    Area covered
    Earth
    Dataset funded by
    AT&T Bell Labs
    Description

    The MNIST dataset is a dataset of handwritten digits. It is a popular dataset for machine learning and artificial intelligence research. The dataset consists of 60,000 training images and 10,000 test images. Each image is a 28x28 pixel grayscale image of a handwritten digit. The digits are labeled from 0 to 9.

  4. MNIST Dataset

    • kaggle.com
    zip
    Updated Mar 26, 2023
    Cite
    Saba Hesaraki (2023). MNIST Dataset [Dataset]. https://www.kaggle.com/datasets/sabahesaraki/mnist-dataset
    Explore at:
    Available download formats: zip (11556456 bytes)
    Dataset updated
    Mar 26, 2023
    Authors
    Saba Hesaraki
    Description

    The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

    Four files are available on this site:

    • train-images-idx3-ubyte.gz: training set images (9912422 bytes)
    • train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
    • t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
    • t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

  5. MNIST Dataset

    • scidb.cn
    Updated Feb 16, 2023
    Cite
    Xuyu Zhang; Jingjing Gao; Yu Gan; Chunyuan Song; Dawei Zhang; Songlin Zhuang; Shensheng Han; Puxiang Lai; Honglin Liu (2023). MNIST Dataset [Dataset]. http://doi.org/10.57760/sciencedb.07421
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 16, 2023
    Dataset provided by
    Science Data Bank
    Authors
    Xuyu Zhang; Jingjing Gao; Yu Gan; Chunyuan Song; Dawei Zhang; Songlin Zhuang; Shensheng Han; Puxiang Lai; Honglin Liu
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    MNIST is an image dataset of handwritten digits, built from samples collected by the National Institute of Standards and Technology (NIST) of the United States. The handwriting comes from about 250 writers, 50% of them high school students and 50% staff of the Census Bureau. The dataset was collected to support the recognition of handwritten digits by algorithms. The training set contains 60,000 images and labels, and the test set contains 10,000 images and labels. The first 5,000 test examples come from the original NIST training set, and the last 5,000 from the original NIST test set. The first 5,000 are more regular than the last 5,000, because they were written by employees of the US Census Bureau, whereas the last 5,000 were written by students.

  6. Rescaled Fashion-MNIST dataset

    • zenodo.org
    Updated Jun 27, 2025
    Cite
    Andrzej Perzanowski; Tony Lindeberg (2025). Rescaled Fashion-MNIST dataset [Dataset]. http://doi.org/10.5281/zenodo.15187793
    Explore at:
    Dataset updated
    Jun 27, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Andrzej Perzanowski; Tony Lindeberg
    Time period covered
    Apr 10, 2025
    Description

    Motivation

    The goal of introducing the Rescaled Fashion-MNIST dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.

    The Rescaled Fashion-MNIST dataset was introduced in the paper:

    [1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations", Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

    with a pre-print available at arXiv:

    [2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations", arXiv preprint arXiv:2409.11140.

    Importantly, the Rescaled Fashion-MNIST dataset is more challenging than the MNIST Large Scale dataset, introduced in:

    [3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2.

    Access and rights

    The Rescaled Fashion-MNIST dataset is provided on the condition that you provide proper citation for the original Fashion-MNIST dataset:

    [4] Xiao, H., Rasul, K., and Vollgraf, R. (2017) “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms”, arXiv preprint arXiv:1708.07747

    and also for this new rescaled version, using the reference [1] above.

    The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.

    The dataset

    The Rescaled FashionMNIST dataset is generated by rescaling 28×28 gray-scale images of clothes from the original FashionMNIST dataset [4]. The scale variations are up to a factor of 4, and the images are embedded within black images of size 72x72, with the object in the frame always centred. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

    There are 10 different classes in the dataset: “T-shirt/top”, “trouser”, “pullover”, “dress”, “coat”, “sandal”, “shirt”, “sneaker”, “bag” and “ankle boot”. In the dataset, these are represented by integer labels in the range [0, 9].

    The dataset is split into 50 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 50 000 samples from the original Fashion-MNIST training set. The validation dataset, on the other hand, is formed from the final 10 000 images of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original Fashion-MNIST test set.

    The h5 files containing the dataset

    The training dataset file (~2.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

    fashionmnist_with_scale_variations_tr50000_vl10000_te10000_outsize72-72_scte1p000_scte1p000.h5

    Additionally, for the Rescaled FashionMNIST dataset, there are 9 datasets (~415 MB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^(k/4), with k being integers in the range [-4, 4]:

    fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p500.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p595.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p707.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p841.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p000.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p189.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p414.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p682.h5
    fashionmnist_with_scale_variations_te10000_outsize72-72_scte2p000.h5

    These dataset files were used for the experiments presented in Figures 6, 7, 14, 16, 19 and 23 in [1].

    Instructions for loading the data set

    The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
    ('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

    The training dataset can be loaded in Python as:

    import h5py
    import numpy as np

    # train_file is a placeholder: set it to the path of the training h5 file named above
    with h5py.File(train_file, "r") as f:
        x_train = np.array(f["/x_train"], dtype=np.float32)
        x_val = np.array(f["/x_val"], dtype=np.float32)
        x_test = np.array(f["/x_test"], dtype=np.float32)
        y_train = np.array(f["/y_train"], dtype=np.int32)
        y_val = np.array(f["/y_val"], dtype=np.int32)
        y_test = np.array(f["/y_test"], dtype=np.int32)

    We also need to permute the data, since PyTorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

    x_train = np.transpose(x_train, (0, 3, 1, 2))
    x_val = np.transpose(x_val, (0, 3, 1, 2))
    x_test = np.transpose(x_test, (0, 3, 1, 2))

    The test datasets can be loaded in Python as:

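    # test_file is a placeholder: set it to the path of one of the test h5 files listed above
    # (requires h5py, and numpy imported as np)
    with h5py.File(test_file, "r") as f:
        x_test = np.array(f["/x_test"], dtype=np.float32)
        y_test = np.array(f["/y_test"], dtype=np.int32)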

    The test datasets can be loaded in Matlab as:

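    % test_file is a placeholder: set it to the path of one of the test h5 files listed above
    x_test = h5read(test_file, '/x_test');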

    The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.
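
    Because the stored intensities are in the [0, 255] range, a typical preprocessing step is to move the channel axis and rescale to [0, 1] in one pass. The following is only a sketch: the helper name to_channels_first, the single channel, and the division by 255 are assumptions for illustration, not part of the dataset's instructions.

```python
import numpy as np


def to_channels_first(x):
    """Reorder [num_samples, x_dim, y_dim, channels] to channels-first and scale to [0, 1]."""
    x = np.transpose(x.astype(np.float32), (0, 3, 1, 2))
    return x / 255.0


# Random stand-in batch using the 72x72 image size, assuming one grayscale channel.
batch = np.random.randint(0, 256, size=(8, 72, 72, 1))
out = to_channels_first(batch)
print(out.shape, out.dtype)
```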

    There is also a closely related Fashion-MNIST with translations dataset, which in addition to scaling variations also comprises spatial translations of the objects.

  7. Data from: Fashion-MNIST

    • datasets.activeloop.ai
    • tensorflow.org
    • +3 more
    deeplake
    Updated Feb 8, 2022
    Cite
    Han Xiao, Kashif Rasul, Roland Vollgraf (2022). Fashion-MNIST [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/fashion-mnist-dataset/
    Explore at:
    Available download formats: deeplake
    Dataset updated
    Feb 8, 2022
    Authors
    Han Xiao, Kashif Rasul, Roland Vollgraf
    License

    https://github.com/zalandoresearch/fashion-mnist/blob/master/LICENSE

    Description

    A dataset of 70,000 fashion images with labels for 10 classes. The dataset was created by researchers at Zalando Research and is used for research in machine learning and computer vision tasks such as image classification.

  8. Fashion MNIST Image Dataset

    • kaggle.com
    Updated May 15, 2025
    Cite
    Ghanshyam Saini (2025). Fashion MNIST Image Dataset [Dataset]. https://www.kaggle.com/datasets/ghnshymsaini/fashion-mnist-image-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Ghanshyam Saini
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Fashion-MNIST Dataset (Image Files and CSV Labels)

    This dataset contains images of Zalando's article categories, intended for fashion image classification. It serves as a direct drop-in replacement for the original MNIST dataset, often used as a benchmark for machine learning algorithms. Fashion-MNIST is slightly more challenging than regular MNIST.

    Dataset Structure:

    The dataset is organized into the following files and folders:

    • train/: This folder contains the training set images. It holds 60,000 grayscale images, each with dimensions 28x28 pixels. The images are in PNG format. The filenames within this folder are not explicitly labeled with the class, so you will need to refer to the train.csv file for the corresponding labels.

    • test/: This folder contains the testing set images. It holds 10,000 grayscale images, each with dimensions 28x28 pixels and in PNG format. Similar to the training set, the filenames here are not directly labeled, and the test.csv file provides the corresponding labels.

    • train.csv: This CSV (Comma Separated Values) file contains the labels for the images in the train/ folder. Each row in this file corresponds to an image. It contains the following columns:

      • pixel1, pixel2, ..., pixel784: These columns represent the flattened pixel values of the 28x28 grayscale images. The pixel values are integers ranging from 0 to 255.
      • label: This column contains the corresponding class label (an integer from 0 to 9) for the image. You will need to refer to the class mapping (provided below) to understand the meaning of these numerical labels.
    • test.csv: This CSV file contains the labels for the images in the test/ folder, following the same format as train.csv with pixel1 to pixel784 columns and a label column.

    Content of the Data:

    Each image in the Fashion-MNIST dataset belongs to one of the following 10 classes:

    Label  Description
    0      T-shirt/top
    1      Trouser
    2      Pullover
    3      Dress
    4      Coat
    5      Sandal
    6      Shirt
    7      Sneaker
    8      Bag
    9      Ankle boot

    The images are grayscale, meaning each pixel has a single intensity value.

    How to Use This Dataset:

    1. Download the entire dataset, including the train/ and test/ folders and the train.csv and test.csv files.
    2. The image files in the train/ and test/ folders contain the visual data. You can load these images using libraries that handle image formats (like PIL, OpenCV).
    3. The train.csv and test.csv files provide the ground truth labels for the corresponding images. You can read these CSV files using libraries like Pandas. The pixel values in the CSV can be reshaped into a 28x28 matrix to represent the image. The label column provides the class of the fashion item.
    4. You can train your image classification models using the train/ images and train.csv labels.
    5. Evaluate the performance of your trained models using the test/ images and test.csv labels.
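
    Step 3 above can be sketched as follows. This version uses the standard-library csv module instead of Pandas so that it has no extra dependencies beyond NumPy; the column names label and pixel1 to pixel784 are taken from the description, while rows_to_images is a hypothetical helper and the in-memory sample stands in for the real train.csv.

```python
import csv
import io

import numpy as np

CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def rows_to_images(csv_file):
    """Yield (28x28 uint8 image, class name) for each row of the CSV."""
    for row in csv.DictReader(csv_file):
        pixels = [int(row[f"pixel{i}"]) for i in range(1, 785)]
        image = np.array(pixels, dtype=np.uint8).reshape(28, 28)
        yield image, CLASS_NAMES[int(row["label"])]


# Tiny in-memory stand-in for train.csv: one all-zero image with label 9.
header = ",".join(["label"] + [f"pixel{i}" for i in range(1, 785)])
sample = io.StringIO(header + "\n" + ",".join(["9"] + ["0"] * 784) + "\n")
image, name = next(rows_to_images(sample))
print(image.shape, name)
```

    With the downloaded files, the same generator could be fed from open("train.csv", newline="").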

    Citation:

    When using the Fashion-MNIST dataset, please cite the original paper:

    Xiao, Han, Kashif Rasul, and Roland Vollgraf. "Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms." arXiv preprint arXiv:1708.07747 (2017).

    Data Contribution:

    Thank you for providing this well-structured Fashion-MNIST dataset with separate image folders and CSV label files. This organization makes it convenient for users to work with both the raw image data and the corresponding labels for training and evaluation of their fashion classification models.

    If you find this dataset structure clear, well-organized, and useful for your projects, please consider giving it an upvote after downloading. Your feedback and appreciation are valuable!

  9. Binarized MNIST data for quantum computing

    • pennylane.ai
    Updated Apr 15, 2025
    Cite
    Joseph Bowles (2025). Binarized MNIST data for quantum computing [Dataset]. https://pennylane.ai/datasets/binarized-mnist
    Explore at:
    Dataset updated
    Apr 15, 2025
    Authors
    Joseph Bowles
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Measurement technique
    Simulation
    Dataset funded by
    Xanadu Quantum Technologies
    Description

    Binarized version of the MNIST handwritten digits dataset

  10. Data from: MNIST Handwritten Digits Dataset

    • kaggle.com
    zip
    Updated May 15, 2025
    Cite
    Ghanshyam Saini (2025). MNIST Handwritten Digits Dataset [Dataset]. https://www.kaggle.com/datasets/ghnshymsaini/mnist-handwritten-digits-dataset/versions/1
    Explore at:
    Available download formats: zip (29605861 bytes)
    Dataset updated
    May 15, 2025
    Authors
    Ghanshyam Saini
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    MNIST Handwritten Digits Dataset (Organized by Folder)

    This dataset provides the classic MNIST handwritten digits dataset, a foundational resource for image classification in machine learning. It contains a training set of 60,000 examples and a test set of 10,000 examples of grayscale images of handwritten digits (0 through 9).

    Dataset Structure:

    The uploaded data is organized within a main folder named mnist_png, which contains the following subfolders:

    • train: This folder contains the training set images. Upon navigating into the train folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders corresponds to a digit class (e.g., the folder named 0 contains images of the digit zero, the folder named 1 contains images of the digit one, and so on). The images within these subfolders are grayscale handwritten digit images in a common image format (e.g., PNG).

    • test: This folder contains the test set images. Similar to the train folder, upon navigating into the test folder, you will find 10 subfolders, named 0 through 9. Each of these subfolders contains the corresponding test images for that digit class.

    Content of the Data:

    Each image in the MNIST dataset is a 28x28 pixel grayscale image of a handwritten digit (0-9). The pixel values typically range from 0 (black) to 255 (white).

    How to Use This Dataset:

    1. Download the main MNIST folder (or the archive containing it) and extract its contents.
    2. Navigate into the mnist_png folder.
    3. The train and test subfolders contain the image data, organized by digit class. You can directly use this folder structure with image data loaders that support directory-based organization. The name of the subfolder will correspond to the digit label.
    4. The train folder provides the images you can use to train your machine learning models.
    5. The test folder provides a separate set of images that you can use to evaluate the performance of your trained models on unseen data.
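
    Step 3 above relies on the subfolder name being the digit label. A minimal standard-library sketch of that idea follows; list_labelled_images is a hypothetical helper, and the .png extension is an assumption based on the "e.g., PNG" note in the structure description.

```python
from pathlib import Path


def list_labelled_images(split_dir):
    """Return (path, label) pairs for a folder laid out as <split_dir>/<digit>/<image>.png."""
    pairs = []
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir() and class_dir.name.isdigit():
            label = int(class_dir.name)  # the subfolder name is the digit label
            pairs.extend((p, label) for p in sorted(class_dir.glob("*.png")))
    return pairs
```

    For example, list_labelled_images("mnist_png/train") would pair every training image path with its digit, ready to be passed to an image loader.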

    Citation:

    The MNIST dataset is a well-established resource. There is no single definitive paper for the dataset in this image format; the original MNIST database was assembled by Yann LeCun, Corinna Cortes and Christopher J.C. Burges, and it is commonly cited via LeCun, Bottou, Bengio and Haffner, "Gradient-based learning applied to document recognition", Proceedings of the IEEE, 1998. You can also cite the specific papers or implementations you are referencing that use it.

    Data Contribution:

    Thank you for downloading this image-based organization of the MNIST dataset. By structuring the images into class-specific folders within the train and test directories, I aim to provide a user-friendly format for those working on handwritten digit recognition tasks. This structure aligns well with many image data loading utilities and workflows.

    If you find this folder structure clear, well-organized, and useful for your projects, please consider giving it an upvote after downloading. Your feedback and appreciation are valuable and encourage further contributions to the Kaggle community. Thank you!

  11. mnist+context

    • figshare.com
    application/x-gzip
    Updated May 30, 2021
    Cite
    Doris Voina; Eric Shea-Brown; Stefan Mihalas (2021). mnist+context [Dataset]. http://doi.org/10.6084/m9.figshare.14703639.v1
    Explore at:
    Available download formats: application/x-gzip
    Dataset updated
    May 30, 2021
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Doris Voina; Eric Shea-Brown; Stefan Mihalas
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The dataset consists of MNIST digits set on either noisy backgrounds, with noise a Gaussian random variable, or MNIST digits set on a more naturalistic background from the CIFAR-10 dataset. A parameter that makes the task more difficult is digit transparency; as we increase transparency, the background interferes with the digit and the identity of the digit becomes more ambiguous. The goal is to perform image classification so that neural networks (NN) correctly identify the MNIST digit despite the different backgrounds.
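
    The description does not give the formula used to combine digit and background. As an illustration only, a linear blend is one common way to model transparency: at transparency 0 the digit is unchanged, and at 1 only the background remains. The function name blend and the noise mean and standard deviation below are assumptions.

```python
import numpy as np


def blend(digit, background, transparency):
    """Mix a digit with a background; transparency 0 keeps the digit, 1 shows only the background."""
    digit = digit.astype(np.float32)
    background = background.astype(np.float32)
    mixed = (1.0 - transparency) * digit + transparency * background
    return np.clip(mixed, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
digit = np.zeros((28, 28), dtype=np.uint8)
digit[8:20, 12:16] = 255                                        # crude stand-in for a digit stroke
noise = np.clip(rng.normal(128, 40, size=(28, 28)), 0, 255)     # Gaussian background
image = blend(digit, noise, transparency=0.5)
```

    Raising the transparency argument makes the stroke fade into the noise, which is the ambiguity the description refers to.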

  12. Hindi/Devanagari MNIST Data

    • kaggle.com
    zip
    Updated Mar 18, 2025
    Cite
    Anurag Shenoy (2025). Hindi/Devanagari MNIST Data [Dataset]. https://www.kaggle.com/datasets/anurags397/hindi-mnist-data
    Explore at:
    Available download formats: zip (18064821 bytes)
    Dataset updated
    Mar 18, 2025
    Authors
    Anurag Shenoy
    Description

    Context

    Handwritten image data is easy to find in languages such as English and Japanese, but not for many Indian languages including Hindi. While trying to create an MNIST-like personal project, I stumbled upon a Hindi Handwritten characters dataset by Shailesh Acharya and Prashnna Kumar Gyawali, which is uploaded to the UCI Machine Learning Repository.

    This dataset, however, contains only the digits from 0 to 9; all other characters have been removed.

    Content

    • Data Type: Grayscale image
    • Image Format: PNG
    • Resolution: 32 by 32 pixels
    • The actual character is centered within 28 by 28 pixels; a padding of 2 pixels is added on all four sides of the character.

    There are ~1700 images per class in the Train set, and ~300 images per class in the Test set.
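
    Since the character occupies the central 28 by 28 pixels, the 2-pixel padding can be cropped away to obtain MNIST-sized inputs. A minimal sketch, with crop_padding as a hypothetical helper and a zero array standing in for a loaded PNG:

```python
import numpy as np


def crop_padding(image, pad=2):
    """Remove `pad` pixels from every side, so 32x32 becomes 28x28 for pad=2."""
    return image[pad:-pad, pad:-pad]


padded = np.zeros((32, 32), dtype=np.uint8)   # stand-in for one loaded image
core = crop_padding(padded)
print(core.shape)
```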

    Acknowledgements

    The Dataset is ©️ Original Authors.

    Original Authors:

    • Shailesh Acharya
    • Prashnna Kumar Gyawali

    Citation: S. Acharya, A.K. Pant and P.K. Gyawali “Deep Learning Based Large Scale Handwritten Devanagari Character Recognition”, In Proceedings of the 9th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), pp. 121-126, 2015.

    The full Dataset is available here: https://archive.ics.uci.edu/ml/datasets/Devanagari+Handwritten+Character+Dataset

  13. moving_mnist

    • tensorflow.org
    • opendatalab.com
    Updated Nov 23, 2022
    Cite
    (2022). moving_mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/moving_mnist
    Explore at:
    Dataset updated
    Nov 23, 2022
    Description

    Moving variant of MNIST database of handwritten digits. This is the data used by the authors for reporting model performance. See tfds.video.moving_mnist.image_as_moving_sequence for generating training/validation data from the MNIST dataset.

    To use this dataset:

    import tensorflow_datasets as tfds

    # moving_mnist is published as evaluation data, so load its 'test' split
    ds = tfds.load('moving_mnist', split='test')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

  14. mnist.pkl.gz

    • figshare.com
    application/gzip
    Updated May 31, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yann LeCun (2023). mnist.pkl.gz [Dataset]. http://doi.org/10.6084/m9.figshare.13303457.v1
    Explore at:
    Available download formats: application/gzip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yann LeCun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MNIST dataset originally hosted on https://deeplearning.net, re-hosted here because deeplearning.net is currently inaccessible.

  15. MNIST as PNG

    • kaggle.com
    zip
    Updated Jul 17, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ben Gorman (2024). MNIST as PNG [Dataset]. https://www.kaggle.com/datasets/ben519/mnist-as-png
    Explore at:
    Available download formats: zip (32971223 bytes)
    Dataset updated
    Jul 17, 2024
    Authors
    Ben Gorman
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    [MNIST](https://en.wikipedia.org/wiki/MNIST_database) data in PNG format, derived directly from MNIST in CSV.

    The data contains 60,000 labelled train samples and 10,000 labelled test samples. Each sample is a 28x28 grayscale PNG image.

    Unzipped directory structure 👇

    test/
     0/
      test_image_3.png
      test_image_10.png
      test_image_13.png
      ...
     1/
      test_image_2.png
      test_image_5.png
      test_image_14.png
      ...
     ...
     9/
    
    train/
     0/
      train_image_1.png
      train_image_21.png
      train_image_34.png
      ...
     1/
     ...
     9/
    

    Data collection script

    import os

    import pandas as pd
    from PIL import Image

    mnist_train = pd.read_csv("mnist-csv/mnist_train.csv")
    mnist_test = pd.read_csv("mnist-csv/mnist_test.csv")

    for i in range(10):

      # Create the per-digit output folders if they do not exist yet
      os.makedirs(f"mnist-png/train/{i}", exist_ok=True)
      os.makedirs(f"mnist-png/test/{i}", exist_ok=True)

      # Convert the training data to png
      train_i = mnist_train.loc[mnist_train.label == i]
      for index, row in train_i.iterrows():
        # The first column is the label; the remaining 784 are the pixel values
        X = row.iloc[1:].to_numpy().reshape(28, 28)
        filepath = (
          f"mnist-png/train/{i}/train_image_{index}.png"
        )
        img = Image.fromarray(X.astype("uint8"), mode="L")
        img.save(filepath)

      # Convert the test data to png
      test_i = mnist_test.loc[mnist_test.label == i]
      for index, row in test_i.iterrows():
        X = row.iloc[1:].to_numpy().reshape(28, 28)
        filepath = f"mnist-png/test/{i}/test_image_{index}.png"
        img = Image.fromarray(X.astype("uint8"), mode="L")
        img.save(filepath)
    
  16. MNIST Original

    • kaggle.com
    zip
    Updated Aug 12, 2018
    Cite
    Avnish (2018). MNIST Original [Dataset]. https://www.kaggle.com/datasets/avnishnish/mnist-original/data
    Explore at:
    Available download formats: zip (11408921 bytes)
    Dataset updated
    Aug 12, 2018
    Authors
    Avnish
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Content

    The MNIST dataset is a set of 70,000 small images of digits handwritten by high school students and employees of the US Census Bureau. Each image is labeled with the digit it represents. This set has been studied so much that it is often called the “Hello World” of machine learning.

    Inspiration

    Test classification algorithms on this dataset and find out which one predicts best.

  17. kmnist

    • tensorflow.org
    • datasets.activeloop.ai
    Updated Jun 1, 2024
    Cite
    (2024). kmnist [Dataset]. https://www.tensorflow.org/datasets/catalog/kmnist
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    Kuzushiji-MNIST is a drop-in replacement for the MNIST dataset (28x28 grayscale, 70,000 images), provided in the original MNIST format as well as a NumPy format. Since MNIST restricts us to 10 classes, we chose one character to represent each of the 10 rows of Hiragana when creating Kuzushiji-MNIST.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('kmnist', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/kmnist-3.0.1.png

  18. Data_Sheet_1_Is Neuromorphic MNIST Neuromorphic? Analyzing the...

    • frontiersin.figshare.com
    • figshare.com
    pdf
    Updated Jun 6, 2023
    Cite
    Laxmi R. Iyer; Yansong Chua; Haizhou Li (2023). Data_Sheet_1_Is Neuromorphic MNIST Neuromorphic? Analyzing the Discriminative Power of Neuromorphic Datasets in the Time Domain.PDF [Dataset]. http://doi.org/10.3389/fnins.2021.608567.s001
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 6, 2023
    Dataset provided by
    Frontiers
    Authors
    Laxmi R. Iyer; Yansong Chua; Haizhou Li
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A major characteristic of spiking neural networks (SNNs) over conventional artificial neural networks (ANNs) is their ability to spike, enabling them to use spike timing for coding and efficient computing. In this paper, we assess whether neuromorphic datasets recorded from static images are able to evaluate the ability of SNNs to use spike timings in their calculations. We have analyzed N-MNIST, N-Caltech101 and DvsGesture along these lines, but focus our study on N-MNIST. First, we evaluate whether additional information is encoded in the time domain in a neuromorphic dataset. We show that an ANN trained with backpropagation on frame-based versions of N-MNIST and N-Caltech101 images achieves 99.23% and 78.01% accuracy, respectively. These results are comparable to the state of the art, showing that an algorithm that works purely on spatial data can classify these datasets. Second, we compare N-MNIST and DvsGesture on two STDP algorithms: RD-STDP, which can classify only spatial data, and STDP-tempotron, which classifies spatiotemporal data. We demonstrate that RD-STDP performs very well on N-MNIST, while STDP-tempotron performs better on DvsGesture. Since DvsGesture has a temporal dimension, it requires STDP-tempotron, while N-MNIST can be adequately classified by an algorithm that works on spatial data alone. This shows that precise spike timings are not important in N-MNIST. N-MNIST does not, therefore, highlight the ability of SNNs to classify temporal data. The conclusions of this paper open the question: what dataset can evaluate SNN ability to classify temporal data?

  19. MNISQ data for quantum computing

    • pennylane.ai
    Updated Aug 30, 2023
    Cite
    Leonardo Placidi; Ryuichiro Hataya; Toshio Mori; Koki Aoyama; Hayata Morisaki; Kosuke Mitarai; Keisuke Fujii (2023). MNISQ data for quantum computing [Dataset]. https://pennylane.ai/datasets/mnisq
    Explore at:
    Dataset updated
    Aug 30, 2023
    Authors
    Leonardo Placidi; Ryuichiro Hataya; Toshio Mori; Koki Aoyama; Hayata Morisaki; Kosuke Mitarai; Keisuke Fujii
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Measurement technique
    Simulation
    Dataset funded by
    Xanadu Quantum Technologies
    Description

    This dataset contains a portion of MNISQ: a dataset of quantum circuits that encode data from MNIST, Fashion-MNIST, and Kuzushiji-MNIST.

  20. Model Zoo: A Dataset of Diverse Populations of Neural Network Models -...

    • zenodo.org
    • data.niaid.nih.gov
    bin, json, zip
    Updated Jun 13, 2022
    + more versions
    Cite
    Konstantin Schürholt; Diyar Taskiran; Boris Knyazev; Xavier Giró-i-Nieto; Damian Borth (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - Fashion-MNIST [Dataset]. http://doi.org/10.5281/zenodo.6632105
    Explore at:
    Available download formats: bin, zip, json
    Dataset updated
    Jun 13, 2022
    Dataset provided by
    Zenodo http://zenodo.org/
    Authors
    Konstantin Schürholt; Diyar Taskiran; Boris Knyazev; Xavier Giró-i-Nieto; Damian Borth
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    In recent years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. It follows that a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches to (i) analyze models, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research about populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47’360 unique neural network models, resulting in over 2’415’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.

    Dataset

    This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from Fashion-MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

    This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "fmnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.

    For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.

Cite
Hojjat Khodabakhsh (2019). MNIST Dataset [Dataset]. https://www.kaggle.com/datasets/hojjatk/mnist-dataset

MNIST Dataset

The MNIST database of handwritten digits (http://yann.lecun.com)

Explore at:
124 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (23112702 bytes)
Dataset updated
Jan 8, 2019
Authors
Hojjat Khodabakhsh
Description

Context

MNIST is a subset of a larger set available from NIST (it's copied from http://yann.lecun.com/exdb/mnist/)

Content

The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. Four files are available:

  • train-images-idx3-ubyte.gz: training set images (9912422 bytes)
  • train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
  • t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
  • t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

How to read

See sample MNIST reader
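
As a rough illustration of what such a reader does, here is a minimal sketch that parses one of the four gzip-compressed files using only the standard library. It assumes the standard IDX layout (a 4-byte magic number whose third byte is the type code `0x08` for unsigned bytes and whose fourth byte is the number of dimensions, followed by one big-endian 32-bit size per dimension). The function name `read_idx` is illustrative and is not taken from the linked reader.

```python
import gzip
import struct


def read_idx(path):
    """Parse a gzip-compressed IDX file into (dims, raw_bytes)."""
    with gzip.open(path, "rb") as f:
        # Magic number: two zero bytes, a type code, the number of dimensions.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code != 0x08:
            raise ValueError("not an unsigned-byte IDX file")
        # One big-endian unsigned 32-bit size per dimension.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        data = f.read()
    expected = 1
    for size in dims:
        expected *= size
    if len(data) != expected:
        raise ValueError("payload size does not match the header")
    return dims, data
```

For the image files the returned `dims` would have three entries (count, rows, columns) and for the label files one entry (count); the raw bytes can then be reshaped with an array library if desired.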

Acknowledgements

  • Yann LeCun, Courant Institute, NYU
  • Corinna Cortes, Google Labs, New York
  • Christopher J.C. Burges, Microsoft Research, Redmond

Inspiration

Many methods have been tested with this training set and test set (see http://yann.lecun.com/exdb/mnist/ for more details)
