93 datasets found
  1. P

    MNIST Dataset

    • paperswithcode.com
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Y. LeCun; L. Bottou; Y. Bengio; P. Haffner, MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/mnist
    Explore at:
    Authors
    Y. LeCun; L. Bottou; Y. Bengio; P. Haffner
    Description

    The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students) which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

  2. T

    mnist

    • tensorflow.org
    • universe.roboflow.com
    • +3more
    Updated Jun 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/mnist
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    The MNIST database of handwritten digits.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('mnist', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more informations on tensorflow_datasets.

    https://storage.googleapis.com/tfds-data/visualization/fig/mnist-3.0.1.png" alt="Visualization" width="500px">

  3. MNIST Digits dataset

    • kaggle.com
    Updated Apr 24, 2018
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abhi (2018). MNIST Digits dataset [Dataset]. https://www.kaggle.com/datasets/abhi10aug2017/mnist-digits-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Apr 24, 2018
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Abhi
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Abhi

    Released under CC0: Public Domain

    Contents

  4. t

    MNIST dataset for handwritten digits - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). MNIST dataset for handwritten digits - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/mnist-dataset-for-handwritten-digits
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The MNIST dataset is a collection of images of handwritten digits, with size n = 70,000 and D = 784.

  5. a

    not-MNIST

    • datasets.activeloop.ai
    • opendatalab.com
    • +3more
    deeplake
    Updated Mar 11, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yaroslav Bulatov (2022). not-MNIST [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/not-mnist-dataset/
    Explore at:
    deeplakeAvailable download formats
    Dataset updated
    Mar 11, 2022
    Authors
    Yaroslav Bulatov
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The not-MNIST dataset is a dataset of handwritten digits. It is a challenging dataset that can be used for machine learning and artificial intelligence research. The dataset consists of 100,000 images of handwritten digits. The images are divided into a training set of 60,000 images and a test set of 40,000 images. The images are drawn from a variety of fonts and styles, making them more challenging than the MNIST dataset. The images are 28x28 pixels in size and are grayscale. The dataset is available under the Creative Commons Zero Public Domain Dedication license.

  6. r

    Extended MNIST (EMNIST) dataset

    • researchdata.edu.au
    Updated May 16, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    van Schaik Andre; Tapson Jonathan; Afshar Saeed; Cohen Gregory (2023). Extended MNIST (EMNIST) dataset [Dataset]. http://doi.org/10.26183/ZN7S-GH79
    Explore at:
    Dataset updated
    May 16, 2023
    Dataset provided by
    Western Sydney University
    Authors
    van Schaik Andre; Tapson Jonathan; Afshar Saeed; Cohen Gregory
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0)https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    The EMNIST dataset is a set of handwritten character digits derived from the NIST Special Database 19 (https://www.nist.gov/srd/nist-special-database-19) and converted to a 28x28 pixel image format and dataset structure that directly matches the MNIST dataset (http://yann.lecun.com/exdb/mnist/). Further information on the dataset contents and conversion process can be found in the paper available at https://arxiv.org/abs/1702.05373v2

    The MNIST dataset has become a standard benchmark for learning, classification and computer vision systems. Contributing to its widespread adoption are the understandable and intuitive nature of the task, its relatively small size and storage requirements and the accessibility and ease-of-use of the database itself. The MNIST database was derived from a larger dataset known as the NIST Special Database 19 which contains digits, uppercase and lowercase handwritten letters. This paper introduces a variant of the full NIST dataset, which we have called Extended MNIST (EMNIST), which follows the same conversion paradigm used to create the MNIST dataset. The result is a set of datasets that constitute a more challenging classification tasks involving letters and digits, and that shares the same image structure and parameters as the original MNIST task, allowing for direct compatibility with all existing classifiers and systems. Benchmark results are presented along with a validation of the conversion process through the comparison of the classification results on converted NIST digits and the MNIST digits.

    The database is made available in original MNIST format and Matlab format.

  7. Z

    Image classification in Galaxy with MNIST handwritten digits dataset

    • data.niaid.nih.gov
    Updated Aug 4, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Kaivan Kamali (2022). Image classification in Galaxy with MNIST handwritten digits dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4697905
    Explore at:
    Dataset updated
    Aug 4, 2022
    Dataset authored and provided by
    Kaivan Kamali
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Credit: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998

    This is a subset of MNIST handwritten digits dataset (http://yann.lecun.com/exdb/mnist/). Training data of composed of 12,000 images of digits 0 to 9. Test data is composed of 6,000 images of digits 0 to 9 (Original dataset has 60,000 training and 10,000 testing images. We are using a subset for a Galaxy tutorial, so the training is not too computationally intensive). Images are grayscale and 28 by 28 pixels. Each pixel has a value between 0 and 255 (0 for color black, 255 for color white, and all other values for different shades of gray).

  8. Hindi/Devanagari MNIST Data

    • kaggle.com
    Updated Mar 18, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anurag S (2025). Hindi/Devanagari MNIST Data [Dataset]. https://www.kaggle.com/datasets/anurags397/hindi-mnist-data
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 18, 2025
    Dataset provided by
    Kaggle
    Authors
    Anurag S
    Description

    Context

    Handwritten image data is easy to find in languages such as English and Japanese, but not for many Indian languages including Hindi. While trying to create an MNIST like personal project, I stumbled upon a Hindi Handwritten characters dataset by Shailesh Acharya and Prashnna Kumar Gyawali, which is uploaded to the UCI Machine Learning Repository.

    This dataset however, only has the digits from 0 to 9, and all other characters have been removed.

    Content

    Data Type: GrayScale Image Image Format: PNG Resolution: 32 by 32 pixels Actual character is centered within 28 by 28 pixel, padding of 2 pixel is added on all four sides of actual character.

    There are ~1700 images per class in the Train set, and around ~300 images per class in the Test set.

    Acknowledgements

    The Dataset is ©️ Original Authors.

    Original Authors: - Shailesh Acharya - Prashnna Kumar Gyawali

    Citation: S. Acharya, A.K. Pant and P.K. Gyawali “**Deep Learning Based Large Scale Handwritten Devanagari Character Recognition**”, In Proceedings of the 9th International Conference on Software, Knowledge, Information Management and Applications (SKIMA), pp. 121-126, 2015.

    The full Dataset is available here: https://archive.ics.uci.edu/ml/datasets/Devanagari+Handwritten+Character+Dataset

  9. Telegram digits dataset

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Apr 24, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Franco Lianza; Franco Lianza (2025). Telegram digits dataset [Dataset]. http://doi.org/10.5281/zenodo.8204338
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Franco Lianza; Franco Lianza
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is MNIST-like, containing digitized handwritten characters extracted from electoral telegrams during the General Elections of Santa Fe, Argentina, in the year 2021. The dataset offers a valuable resource for researchers and practitioners in the field of character recognition, particularly in the context of electoral data analysis. Each sample in the dataset represents a single digit, ranging from 0 to 9, handwritten by different individuals participating in the electoral process. The dataset aims to facilitate the development and evaluation of machine learning and computer vision algorithms for character recognition tasks.

    It contains 170718 images, splitted in train (119502), validation (25608) and test (25608).

    This dataset is part of master's thesis in data science which aims to build an Optical Character Recognition (OCR) system using domain adaptation techniques titled "Classification of digits written in the telegrams of legislative elections in Santa Fe using Domain Adaptation techniques".

  10. Data from: Written and spoken digits database for multimodal learning

    • zenodo.org
    • explore.openaire.eu
    • +1more
    bin
    Updated Jan 21, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lyes Khacef; Lyes Khacef; Laurent Rodriguez; Benoit Miramond; Laurent Rodriguez; Benoit Miramond (2021). Written and spoken digits database for multimodal learning [Dataset]. http://doi.org/10.5281/zenodo.4452953
    Explore at:
    binAvailable download formats
    Dataset updated
    Jan 21, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Lyes Khacef; Lyes Khacef; Laurent Rodriguez; Benoit Miramond; Laurent Rodriguez; Benoit Miramond
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database description:

    The written and spoken digits database is not a new database but a constructed database from existing ones, in order to provide a ready-to-use database for multimodal fusion [1].

    The written digits database is the original MNIST handwritten digits database [2] with no additional processing. It consists of 70000 images (60000 for training and 10000 for test) of 28 x 28 = 784 dimensions.

    The spoken digits database was extracted from Google Speech Commands [3], an audio dataset of spoken words that was proposed to train and evaluate keyword spotting systems. It consists of 105829 utterances of 35 words, amongst which 38908 utterances of the ten digits (34801 for training and 4107 for test). A pre-processing was done via the extraction of the Mel Frequency Cepstral Coefficients (MFCC) with a framing window size of 50 ms and frame shift size of 25 ms. Since the speech samples are approximately 1 s long, we end up with 39 time slots. For each one, we extract 12 MFCC coefficients with an additional energy coefficient. Thus, we have a final vector of 39 x 13 = 507 dimensions. Standardization and normalization were applied on the MFCC features.

    To construct the multimodal digits dataset, we associated written and spoken digits of the same class respecting the initial partitioning in [2] and [3] for the training and test subsets. Since we have less samples for the spoken digits, we duplicated some random samples to match the number of written digits and have a multimodal digits database of 70000 samples (60000 for training and 10000 for test).

    The dataset is provided in six files as described below. Therefore, if a shuffle is performed on the training or test subsets, it must be performed in unison with the same order for the written digits, spoken digits and labels.

    Files:

    • data_wr_train.npy: 60000 samples of 784-dimentional written digits for training;
    • data_sp_train.npy: 60000 samples of 507-dimentional spoken digits for training;
    • labels_train.npy: 60000 labels for the training subset;
    • data_wr_test.npy: 10000 samples of 784-dimentional written digits for test;
    • data_sp_test.npy: 10000 samples of 507-dimentional spoken digits for test;
    • labels_test.npy: 10000 labels for the test subset.

    References:

    1. Khacef, L. et al. (2020), "Brain-Inspired Self-Organization with Cellular Neuromorphic Computing for Multimodal Unsupervised Learning".
    2. LeCun, Y. & Cortes, C. (1998), “MNIST handwritten digit database”.
    3. Warden, P. (2018), “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition”.
  11. t

    MNIST Fashion and MNIST Handwritten digits - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). MNIST Fashion and MNIST Handwritten digits - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/mnist-fashion-and-mnist-handwritten-digits
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The dataset used in the paper is MNIST Fashion and MNIST Handwritten digits.

  12. MNIST Dataset

    • kaggle.com
    Updated Mar 26, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Saba Hesaraki (2023). MNIST Dataset [Dataset]. https://www.kaggle.com/datasets/sabahesaraki/mnist-dataset
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 26, 2023
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Saba Hesaraki
    Description

    The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

    Four files are available on this site: train-images-idx3-ubyte.gz: training set images (9912422 bytes) train-labels-idx1-ubyte.gz: training set labels (28881 bytes) t10k-images-idx3-ubyte.gz: test set images (1648877 bytes) t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)

  13. Mnist 42000 Images Dataset

    • universe.roboflow.com
    zip
    Updated Apr 25, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Roboflow (2023). Mnist 42000 Images Dataset [Dataset]. https://universe.roboflow.com/roboflow-jvuqo/mnist-42000-images-u0qdg
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 25, 2023
    Dataset authored and provided by
    Roboflowhttps://roboflow.com/
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Numbers
    Description

    The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.

    Yann LeCun, Courant Institute, NYU Corinna Cortes, Google Labs, New York Christopher J.C. Burges, Microsoft Research, Redmond

  14. Wildlife MNIST

    • zenodo.org
    • data.niaid.nih.gov
    bin, png
    Updated Jul 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Vít Škvára; Vít Škvára (2024). Wildlife MNIST [Dataset]. http://doi.org/10.5281/zenodo.7602025
    Explore at:
    png, binAvailable download formats
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Vít Škvára; Vít Škvára
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Wildlife MNIST dataset contains MNIST digits with colored backgrounds and foregrounds with annotations, suitable for benchmarking disentangling or factor identification. Originally used for the project https://github.com/vitskvara/sgad. There are two versions - non-mixed and mixed. In the non-mixed version (data.npy and label.npy), the background and foreground textures are the same for all digits of a single MNIST class, therefore only a single label describes each sample. In the mixed version (data_test.npy and labels_test.npy), each sample image has a random digit, background and foreground (out of 10 classes for each factor of variation). Then, the label is a tuple of three numbers, describing the individual (digit,background,foreground) labels. Note that the data is scaled to the interval [-1,1], so rescaling them by computing "x*0.5 + 0.5" is necessary for some applications that require them to be in the interval [0,1]. Example images from both versions of the dataset are included. Note that the dataset was originally used in "Sauer, Axel, and Andreas Geiger. Counterfactual generative networks. 2021."

  15. r

    Data from: EMNIST: an extension of MNIST to handwritten letters

    • researchdata.edu.au
    Updated May 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    van Schaik Andre; Tapson Jonathan; Afshar Saeed; Cohen Gregory (2023). EMNIST: an extension of MNIST to handwritten letters [Dataset]. http://doi.org/10.26183/M9K1-ZR06
    Explore at:
    Dataset updated
    May 16, 2023
    Dataset provided by
    Western Sydney University
    Authors
    van Schaik Andre; Tapson Jonathan; Afshar Saeed; Cohen Gregory
    Description

    The MNIST dataset has become a standard benchmark for learning, classification and computer vision systems. Contributing to its widespread adoption are the understandable and intuitive nature of the task, its relatively small size and storage requirements and the accessibility and ease-of-use of the database itself. The MNIST database was derived from a larger dataset known as the NIST Special Database 19 which contains digits, uppercase and lowercase handwritten letters. This paper introduces a variant of the full NIST dataset, which we have called Extended MNIST (EMNIST), which follows the same conversion paradigm used to create the MNIST dataset. The result is a set of datasets that constitute a more challenging classification tasks involving letters and digits, and that shares the same image structure and parameters as the original MNIST task, allowing for direct compatibility with all existing classifiers and systems. Benchmark results are presented along with a validation of the conversion process through the comparison of the classification results on converted NIST digits and the MNIST digits.

    A Read Me file describing the database is included in the available attachments.
    Note: The available zip files are each > 500MB in size. Should these files become unavailable from the website provided, please contact Western Sydney University Library about this record.

  16. h

    mnist100

    • huggingface.co
    Updated Aug 16, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Marcin Wierzbiński (2023). mnist100 [Dataset]. https://huggingface.co/datasets/marcin119a/mnist100
    Explore at:
    Dataset updated
    Aug 16, 2023
    Authors
    Marcin Wierzbiński
    License

    https://choosealicense.com/licenses/gpl/https://choosealicense.com/licenses/gpl/

    Description

    The MNIST-100 dataset is a variation of the original MNIST dataset, consisting of 100 handwritten numbers extracted from the MNIST dataset. Unlike the traditional MNIST dataset, which contains 60,000 training images of digits from 0 to 9, the Modified MNIST-10 dataset focuses on 100 numbers. Dataset Overview:

    Dataset Name: MNIST-100 Total Number of Images: train: 60000 test: 1000 Classes: 100 (Numbers from 00 to 99) Image Size: 28x56 pixels (grayscale)

    Data Collection: The MNIST-100 dataset… See the full description on the dataset page: https://huggingface.co/datasets/marcin119a/mnist100.

  17. t

    MNIST digits

    • service.tib.eu
    Updated Dec 16, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). MNIST digits [Dataset]. https://service.tib.eu/ldmservice/dataset/mnist-digits
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The MNIST digits dataset used in the one-shot learning experiments.

  18. mnistdata

    • kaggle.com
    Updated Nov 10, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    colormap (2020). mnistdata [Dataset]. https://www.kaggle.com/datasets/colormap/mnistdata
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 10, 2020
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    colormap
    Description

    How to load? train_data = np.loadtxt('/kaggle/input/mnistdata/mnist_train_images', dtype=np.uint16) train_labels = np.loadtxt('/kaggle/input/mnistdata/mnist_train_labels', dtype=np.uint8) test_data = np.loadtxt('/kaggle/input/mnistdata/mnist_test_images', dtype=np.uint16) test_labels = np.loadtxt('/kaggle/input/mnistdata/mnist_test_labels', dtype=np.uint8)

  19. p

    Downscaled MNIST data for quantum computing

    • pennylane.ai
    Updated Sep 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Joseph Bowles; Shahnawaz Ahmed; Maria Schuld (2024). Downscaled MNIST data for quantum computing [Dataset]. https://pennylane.ai/datasets/downscaled-mnist
    Explore at:
    Dataset updated
    Sep 8, 2024
    Authors
    Joseph Bowles; Shahnawaz Ahmed; Maria Schuld
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Measurement technique
    Simulation
    Dataset funded by
    Xanadu Quantum Technologieshttps://xanadu.ai/
    Description

    This dataset contains a simplified version of the famous MNIST handwritten digits dataset. This version involves distinguishing between digits 3 and 5 rather than the full range 0-9.

  20. MNIST-Federated-Learning

    • zenodo.org
    csv, zip
    Updated Jul 3, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ferraguig Lynda; Ferraguig Lynda; Benoit Alexandre; Benoit Alexandre; Bettinelli Mickael; Bettinelli Mickael; Lin-Kwong-Chon Christophe; Lin-Kwong-Chon Christophe (2023). MNIST-Federated-Learning [Dataset]. http://doi.org/10.5281/zenodo.8104408
    Explore at:
    csv, zipAvailable download formats
    Dataset updated
    Jul 3, 2023
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Ferraguig Lynda; Ferraguig Lynda; Benoit Alexandre; Benoit Alexandre; Bettinelli Mickael; Bettinelli Mickael; Lin-Kwong-Chon Christophe; Lin-Kwong-Chon Christophe
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please find below the descriptions of the three configurations for partitioning the MNIST Train dataset into 10 clients and the MNIST Train data:

    1. Balanced Distribution: In the first configuration, the MNIST dataset is partitioned among 10 clients in a balanced manner. This means that the data samples from each class are evenly distributed among the clients. Each client receives a roughly equal number of images from each digit class, ensuring that the distribution of samples across clients is proportional and representative of the overall dataset. [ Config 1]
    2. Heterogeneous Distribution (One Class per Client): In the second configuration, the MNIST dataset is partitioned in a heterogeneous manner, where each client is assigned a single digit class exclusively. This means that one client will only receive images of the digit '0', another client will receive images of the digit '1', and so on. In this setup, each client becomes an expert in classifying a specific digit, allowing for specialized training and evaluation. [ Config 2]
    3. Mixed Distribution: In the third configuration, the MNIST dataset is partitioned using a mixed distribution approach. This means that the data samples from all digit classes are distributed among the 10 clients, but the distribution is not necessarily balanced. The number of samples assigned to each client may vary for different digit classes, resulting in an uneven distribution across the clients. This configuration aims to capture both the overall diversity of the dataset and the varying difficulty levels of classifying different digits. [ Config 3 ]

    Mnist-dataset/
    ├── config1/
    │ ├── client-1/
    │ │ └── data.csv
    │ ├── client-2/
    │ │ └── data.csv
    │ ├── client-3/
    │ │ └── data.csv
    │ └── ...
    ├── config2/
    │ ├── client-1/
    │ │ └── data.csv
    │ ├── client-2/
    │ │ └── data.csv
    │ ├── client-3/
    │ │ └── data.csv
    │ └── ...
    ├── config3/
    │ ├── client-1/
    │ │ └── data.csv
    │ ├── client-2/
    │ │ └── data.csv
    │ ├── client-3/
    │ │ └── data.csv
    │ └── ...
    └── mnist_test.csv

    ***

    License: Yann LeCun and Corinna Cortes hold the copyright of MNIST dataset, which is a derivative work from original NIST datasets. MNIST dataset is made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.

    ***

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Y. LeCun; L. Bottou; Y. Bengio; P. Haffner, MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/mnist

MNIST Dataset

Explore at:
21 scholarly articles cite this dataset (View in Google Scholar)
Authors
Y. LeCun; L. Bottou; Y. Bengio; P. Haffner
Description

The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students) which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

Search
Clear search
Close search
Google apps
Main menu