100+ datasets found
  1. h

    mnist

    • huggingface.co
    • opendatalab.com
    • +5more
    Updated Jan 13, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yann LeCun (2021). mnist [Dataset]. https://huggingface.co/datasets/ylecun/mnist
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 13, 2021
    Authors
    Yann LeCun
    License

    MIT Licensehttps://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset Card for MNIST

      Dataset Summary
    

    The MNIST dataset consists of 70,000 28x28 black-and-white images of handwritten digits extracted from two NIST databases. There are 60,000 images in the training dataset and 10,000 images in the validation dataset, one class per digit so a total of 10 classes, with 7,000 images (6,000 train images and 1,000 test images) per class. Half of the image were drawn by Census Bureau employees and the other half by high school… See the full description on the dataset page: https://huggingface.co/datasets/ylecun/mnist.

  2. Mnist 42000 Images Dataset

    • universe.roboflow.com
    zip
    Updated Apr 25, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Roboflow (2023). Mnist 42000 Images Dataset [Dataset]. https://universe.roboflow.com/roboflow-jvuqo/mnist-42000-images-u0qdg
    Explore at:
    zipAvailable download formats
    Dataset updated
    Apr 25, 2023
    Dataset authored and provided by
    Roboflow
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Numbers
    Description

    The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.

    Yann LeCun, Courant Institute, NYU Corinna Cortes, Google Labs, New York Christopher J.C. Burges, Microsoft Research, Redmond

  3. h

    mnist-text-default

    • huggingface.co
    Updated Feb 22, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Fraser Greenlee (2021). mnist-text-default [Dataset]. https://huggingface.co/datasets/Fraser/mnist-text-default
    Explore at:
    Dataset updated
    Feb 22, 2021
    Authors
    Fraser Greenlee
    Description

    MNIST dataset adapted to a text-based representation.

    This allows testing interpolation quality for Transformer-VAEs.

    System is heavily inspired by Matthew Rayfield's work https://youtu.be/Z9K3cwSL6uM

    Works by quantising each MNIST pixel into one of 64 characters. Every sample has an up & down version to encourage the model to learn rotation invarient features.

    Use .array_to_text( and .text_to_array( methods to test your generated data.

    Data format: - text: (30 x 28 tokens, 840 tokens total): Textual representation of MNIST digit, for example: 00 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 01 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 02 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 03 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 04 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 05 down ! ! ! ! ! ! ! ! ! ! ! ! ! % % % @ C L ' J a ^ @ ! ! ! ! 06 down ! ! ! ! ! ! ! ! ( * 8 G K ` ` ` ` ` Y L ` ] Q 1 ! ! ! ! 07 down ! ! ! ! ! ! ! - \ ` ` ` ` ` ` ` ` _ 8 5 5 / * ! ! ! ! ! 08 down ! ! ! ! ! ! ! % W ` ` ` ` ` R N ^ ] ! ! ! ! ! ! ! ! ! ! 09 down ! ! ! ! ! ! ! ! 5 H ; ` ` T # ! + G ! ! ! ! ! ! ! ! ! ! 10 down ! ! ! ! ! ! ! ! ! $ ! G ` 7 ! ! ! ! ! ! ! ! ! ! ! ! ! ! 11 down ! ! ! ! ! ! ! ! ! ! ! C ` P ! ! ! ! ! ! ! ! ! ! ! ! ! ! 12 down ! ! ! ! ! ! ! ! ! ! ! # P ` 2 ! ! ! ! ! ! ! ! ! ! ! ! ! 13 down ! ! ! ! ! ! ! ! ! ! ! ! ) ] Y I < ! ! ! ! ! ! ! ! ! ! ! 14 down ! ! ! ! ! ! ! ! ! ! ! ! ! 5 ] ` ` > ' ! ! ! ! ! ! ! ! ! 15 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! , O ` ` F ' ! ! ! ! ! ! ! ! 16 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! % 8 ` ` O ! ! ! ! ! ! ! ! 17 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! _ ` _ 1 ! ! ! ! ! ! ! 18 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! , A N ` ` T ! ! ! ! ! ! ! ! 19 down ! ! ! ! ! ! ! ! ! ! ! ! * F Z ` ` ` _ N ! ! ! ! ! ! ! ! 20 down ! ! ! ! ! ! ! ! ! ! ' = X ` ` ` ` S 4 ! ! ! ! ! ! ! ! ! 21 down ! ! ! ! ! ! ! ! & 1 V ` ` ` ` R 5 ! ! ! ! ! ! ! ! ! ! ! 22 down ! ! ! ! ! ! % K W ` ` ` ` Q 5 # ! ! ! ! ! ! ! ! ! ! ! ! 23 down ! ! ! ! . L Y ` ` ` ` ^ B # ! ! ! ! ! ! ! ! ! ! ! ! ! ! 24 down ! ! ! ! C ` ` ` V B B % ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 25 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 26 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! 27 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! - label: Just a number with the texts matching label.

  4. R

    Data from: Fashion Mnist Dataset

    • universe.roboflow.com
    • opendatalab.com
    • +4more
    zip
    Updated Aug 10, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Popular Benchmarks (2022). Fashion Mnist Dataset [Dataset]. https://universe.roboflow.com/popular-benchmarks/fashion-mnist-ztryt/model/3
    Explore at:
    zipAvailable download formats
    Dataset updated
    Aug 10, 2022
    Dataset authored and provided by
    Popular Benchmarks
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Clothing
    Description

    Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

    Authors:

    Dataset Obtained From: https://github.com/zalandoresearch/fashion-mnist

    All images were sized 28x28 in the original dataset

    Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. * Source

    Here's an example of how the data looks (each class takes three-rows): https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png" alt="Visualized Fashion MNIST dataset">

    Version 1 (original-images_Original-FashionMNIST-Splits):

    • Original images, with the original splits for MNIST: train (86% of images - 60,000 images) set and test (14% of images - 10,000 images) set only.
    • This version was not trained

    Version 3 (original-images_trainSetSplitBy80_20):

    • Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set
    • https://blog.roboflow.com/train-test-split/ https://i.imgur.com/angfheJ.png" alt="Train/Valid/Test Split Rebalancing">

    Citation:

    @online{xiao2017/online,
     author    = {Han Xiao and Kashif Rasul and Roland Vollgraf},
     title    = {Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},
     date     = {2017-08-28},
     year     = {2017},
     eprintclass = {cs.LG},
     eprinttype  = {arXiv},
     eprint    = {cs.LG/1708.07747},
    }
    
  5. Z

    Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 13, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Borth, Damian (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6632086
    Explore at:
    Dataset updated
    Jun 13, 2022
    Dataset provided by
    Schürholt, Konstantin
    Borth, Damian
    Giró-i-Nieto, Xavier
    Knyazev, Boris
    Taskiran, Diyar
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    In the last years, neural networks have evolved from laboratory environments to the state-of-the-art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Following, a population of such neural network models (refereed to as “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can be reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research about populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total the proposed model zoo dataset is based on six image datasets, consist of 24 model zoos with varying hyperparameter combinations are generated and includes 47’360 unique neural network models resulting in over 2’415’360 collected model states. Additionally, to the model zoo data we provide an in-depth analysis of the zoos and provide benchmarks for multiple downstream tasks as mentioned before.

    Dataset

    This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

    This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "mnist_"), as well as preprocessed model zoos wrapped in a custom pytorch dataset class (filenames beginning with "dataset"). Zoos are trained in three configurations varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix) or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.

    For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.

  6. a

    not-MNIST

    • datasets.activeloop.ai
    • opendatalab.com
    • +3more
    deeplake
    Updated Mar 11, 2022
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yaroslav Bulatov (2022). not-MNIST [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/not-mnist-dataset/
    Explore at:
    deeplakeAvailable download formats
    Dataset updated
    Mar 11, 2022
    Authors
    Yaroslav Bulatov
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The not-MNIST dataset is a dataset of handwritten digits. It is a challenging dataset that can be used for machine learning and artificial intelligence research. The dataset consists of 100,000 images of handwritten digits. The images are divided into a training set of 60,000 images and a test set of 40,000 images. The images are drawn from a variety of fonts and styles, making them more challenging than the MNIST dataset. The images are 28x28 pixels in size and are grayscale. The dataset is available under the Creative Commons Zero Public Domain Dedication license.

  7. o

    mnist_784

    • openml.org
    Updated Sep 29, 2014
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yann LeCun; Corinna Cortes; Christopher J.C. Burges (2014). mnist_784 [Dataset]. https://www.openml.org/d/554
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 29, 2014
    Authors
    Yann LeCun; Corinna Cortes; Christopher J.C. Burges
    Description

    Author: Yann LeCun, Corinna Cortes, Christopher J.C. Burges
    Source: MNIST Website - Date unknown
    Please cite:

    The MNIST database of handwritten digits with 784 features, raw data available at: http://yann.lecun.com/exdb/mnist/. It can be split in a training set of the first 60,000 examples, and a test set of 10,000 examples

    It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting. The original black and white (bilevel) images from NIST were size normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. the images were centered in a 28x28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28x28 field.

    With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications. The MNIST database was constructed from NIST's NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. The reason for this can be found on the fact that SD-3 was collected among Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test among the complete set of samples. Therefore it was necessary to build a new database by mixing NIST's datasets.

    The MNIST training set is composed of 30,000 patterns from SD-3 and 30,000 patterns from SD-1. Our test set was composed of 5,000 patterns from SD-3 and 5,000 patterns from SD-1. The 60,000 pattern training set contained examples from approximately 250 writers. We made sure that the sets of writers of the training set and test set were disjoint. SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 is available and we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set. The remaining 250 writers were placed in our test set. Thus we had two sets with nearly 30,000 examples each. The new training set was completed with enough examples from SD-3, starting at pattern # 0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern # 35,000 to make a full set with 60,000 test patterns. Only a subset of 10,000 test images (5,000 from SD-1 and 5,000 from SD-3) is available on this site. The full 60,000 sample training set is available.

  8. P

    Data from: MNIST Large Scale dataset Dataset

    • paperswithcode.com
    Updated Jun 10, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ylva Jansson; Tony Lindeberg (2021). MNIST Large Scale dataset Dataset [Dataset]. https://paperswithcode.com/dataset/mnist-large-scale-dataset
    Explore at:
    Dataset updated
    Jun 10, 2021
    Authors
    Ylva Jansson; Tony Lindeberg
    Description

    The MNIST Large Scale dataset is based on the classic MNIST dataset, but contains large scale variations up to a factor of 16. The motivation behind creating this dataset was to enable testing the ability of different algorithms to learn in the presence of large scale variability and specifically the ability to generalise to new scales not present in the training set over wide scale ranges.

    The dataset contains training data for each one of the relative size factors 1, 2 and 4 relative to the original MNIST dataset and testing data for relative scaling factors between 1/2 and 8, with a ratio of $\sqrt[4]{2}$ between adjacent scales.

  9. P

    MNIST-MIX Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Apr 7, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    MNIST-MIX Dataset [Dataset]. https://paperswithcode.com/dataset/mnist-mix
    Explore at:
    Dataset updated
    Apr 7, 2020
    Authors
    Weiwei Jiang
    Description

    MNIST-MIX is a multi-language handwritten digit recognition dataset. It contains digits from 10 different languages.

  10. P

    MNIST-1D Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Nov 26, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sam Greydanus; Dmitry Kobak (2024). MNIST-1D Dataset [Dataset]. https://paperswithcode.com/dataset/mnist-1d
    Explore at:
    Dataset updated
    Nov 26, 2024
    Authors
    Sam Greydanus; Dmitry Kobak
    Description

    A minimalist, low-memory, and low-compute alternative to classic deep learning benchmarks. The training examples are 20 times smaller than MNIST examples yet they differentiate more clearly between linear, nonlinear, and convolutional models which attain 32, 68, and 94% accuracy respectively (these models obtain 94, 99+, and 99+% on MNIST).

  11. h

    mnist

    • huggingface.co
    Updated Dec 20, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Katsuhiko Yamamoto (2024). mnist [Dataset]. https://huggingface.co/datasets/kyama0321/mnist
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 20, 2024
    Authors
    Katsuhiko Yamamoto
    Description

    kyama0321/mnist dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. T

    binarized_mnist

    • tensorflow.org
    Updated Jun 1, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). binarized_mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/binarized_mnist
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    A specific binarization of the MNIST images originally used in (Salakhutdinov & Murray, 2008). This dataset is frequently used to evaluate generative models of images, so labels are not provided.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('binarized_mnist', split='train')
    for ex in ds.take(4):
     print(ex)
    

    See the guide for more informations on tensorflow_datasets.

    https://storage.googleapis.com/tfds-data/visualization/fig/binarized_mnist-1.0.0.png" alt="Visualization" width="500px">

  13. h

    mnist1d

    • huggingface.co
    Updated Oct 9, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    mnist1d [Dataset]. https://huggingface.co/datasets/christopher/mnist1d
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Oct 9, 2024
    Authors
    Christopher Akiki
    License

    Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset card is based on the README file of the authors' GitHub repository: https://github.com/greydanus/mnist1d

      The MNIST-1D Dataset
    

    Most machine learning models get around the same ~99% test accuracy on MNIST. The MNIST-1D dataset is 100x smaller (default sample size: 4000+1000; dimensionality: 40) and does a better job of separating between models with/without nonlinearity and models with/without spatial inductive biases. MNIST-1D is a core teaching dataset in Simon… See the full description on the dataset page: https://huggingface.co/datasets/christopher/mnist1d.

  14. t

    MNIST database of handwritten digits - Dataset - LDM

    • service.tib.eu
    Updated Dec 16, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    (2024). MNIST database of handwritten digits - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/mnist-database-of-handwritten-digits
    Explore at:
    Dataset updated
    Dec 16, 2024
    Description

    The MNIST handwritten digit database is a dataset of 60,000 training and 10,000 test examples of handwritten digit images.

  15. P

    Dirty-MNIST Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Jan 8, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jishnu Mukhoti; Andreas Kirsch; Joost van Amersfoort; Philip H. S. Torr; Yarin Gal (2024). Dirty-MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/dirty-mnist
    Explore at:
    Dataset updated
    Jan 8, 2024
    Authors
    Jishnu Mukhoti; Andreas Kirsch; Joost van Amersfoort; Philip H. S. Torr; Yarin Gal
    Description

    DirtyMNIST is a concatenation of MNIST + AmbiguousMNIST, with 60k samples each in the training set. AmbiguousMNIST contains additional ambiguous digits with varying ambiguity. The AmbiguousMNIST test set contains 60k ambiguous samples as well.

    Additional Guidance

    DirtyMNIST is a concatenation of MNIST + AmbiguousMNIST, with 60k samples each in the training set. The current AmbiguousMNIST contains 6k unique samples with 10 labels each. This multi-label dataset gets flattened to 60k samples. The assumption is that ambiguous samples have multiple "valid" labels as they are ambiguous. MNIST samples are intentionally undersampled (in comparison), which benefits AL acquisition functions that can select unambiguous samples. Pick your initial training samples (for warm starting Active Learning) from the MNIST half of DirtyMNIST to avoid starting training with potentially very ambiguous samples, which might add a lot of variance to your experiments. Make sure to pick your validation set from the MNIST half as well, for the same reason as above. Make sure that your batch acquisition size is >= 10 (probably) given that there are 10 multi-labels per samples in Ambiguous-MNIST. By default, Gaussian noise with stddev 0.05 is added to each sample to prevent acquisition functions (in Active Learning) from cheating by disgarding "duplicates". If you want to split Ambiguous-MNIST into subsets (or Dirty-MNIST within the second ambiguous half), make sure to split by multiples of 10 to avoid splits within a flattened multi-label sample.

  16. Data from: Written and spoken digits database for multimodal learning

    • zenodo.org
    • data.niaid.nih.gov
    bin
    Updated Jan 21, 2021
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Lyes Khacef; Lyes Khacef; Laurent Rodriguez; Benoit Miramond; Laurent Rodriguez; Benoit Miramond (2021). Written and spoken digits database for multimodal learning [Dataset]. http://doi.org/10.5281/zenodo.4452953
    Explore at:
    binAvailable download formats
    Dataset updated
    Jan 21, 2021
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Lyes Khacef; Lyes Khacef; Laurent Rodriguez; Benoit Miramond; Laurent Rodriguez; Benoit Miramond
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Database description:

    The written and spoken digits database is not a new database but a constructed database from existing ones, in order to provide a ready-to-use database for multimodal fusion [1].

    The written digits database is the original MNIST handwritten digits database [2] with no additional processing. It consists of 70000 images (60000 for training and 10000 for test) of 28 x 28 = 784 dimensions.

    The spoken digits database was extracted from Google Speech Commands [3], an audio dataset of spoken words that was proposed to train and evaluate keyword spotting systems. It consists of 105829 utterances of 35 words, amongst which 38908 utterances of the ten digits (34801 for training and 4107 for test). A pre-processing was done via the extraction of the Mel Frequency Cepstral Coefficients (MFCC) with a framing window size of 50 ms and frame shift size of 25 ms. Since the speech samples are approximately 1 s long, we end up with 39 time slots. For each one, we extract 12 MFCC coefficients with an additional energy coefficient. Thus, we have a final vector of 39 x 13 = 507 dimensions. Standardization and normalization were applied on the MFCC features.

    To construct the multimodal digits dataset, we associated written and spoken digits of the same class respecting the initial partitioning in [2] and [3] for the training and test subsets. Since we have less samples for the spoken digits, we duplicated some random samples to match the number of written digits and have a multimodal digits database of 70000 samples (60000 for training and 10000 for test).

    The dataset is provided in six files as described below. Therefore, if a shuffle is performed on the training or test subsets, it must be performed in unison with the same order for the written digits, spoken digits and labels.

    Files:

    • data_wr_train.npy: 60000 samples of 784-dimentional written digits for training;
    • data_sp_train.npy: 60000 samples of 507-dimentional spoken digits for training;
    • labels_train.npy: 60000 labels for the training subset;
    • data_wr_test.npy: 10000 samples of 784-dimentional written digits for test;
    • data_sp_test.npy: 10000 samples of 507-dimentional spoken digits for test;
    • labels_test.npy: 10000 labels for the test subset.

    References:

    1. Khacef, L. et al. (2020), "Brain-Inspired Self-Organization with Cellular Neuromorphic Computing for Multimodal Unsupervised Learning".
    2. LeCun, Y. & Cortes, C. (1998), “MNIST handwritten digit database”.
    3. Warden, P. (2018), “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition”.
  17. R

    Mnist Project Dataset

    • universe.roboflow.com
    zip
    Updated May 2, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Nabilah K S (2024). Mnist Project Dataset [Dataset]. https://universe.roboflow.com/nabilah-k-s/mnist-project/dataset/2
    Explore at:
    zipAvailable download formats
    Dataset updated
    May 2, 2024
    Dataset authored and provided by
    Nabilah K S
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Variables measured
    OBJECT DETECTION Bounding Boxes
    Description

    MNIST PROJECT

    ## Overview
    
    MNIST PROJECT is a dataset for object detection tasks - it contains OBJECT DETECTION annotations for 2,550 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
    
  18. MNIST32

    • kaggle.com
    Updated Feb 27, 2024
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    sbordt (2024). MNIST32 [Dataset]. https://www.kaggle.com/datasets/sbordt/mnist32
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Feb 27, 2024
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    sbordt
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Description
  19. MNIST dataset

    • kaggle.com
    zip
    Updated Feb 20, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    JoyceWW (2023). MNIST dataset [Dataset]. https://www.kaggle.com/datasets/joyceww/mnist-dataset
    Explore at:
    zip(23112702 bytes)Available download formats
    Dataset updated
    Feb 20, 2023
    Authors
    JoyceWW
    Description

    Dataset

    This dataset was created by JoyceWW

    Contents

  20. Fashion-mnist_train

    • kaggle.com
    Updated Jul 11, 2018
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sanjeet Kumar Yadav (2018). Fashion-mnist_train [Dataset]. https://www.kaggle.com/sanjeet41/fashionmnist-train/code
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jul 11, 2018
    Dataset provided by
    Kagglehttp://kaggle.com/
    Authors
    Sanjeet Kumar Yadav
    License

    https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. Zalando intends Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. The original MNIST dataset contains a lot of handwritten digits. Members of the AI/ML/Data Science community love this dataset and use it as a benchmark to validate their algorithms. In fact, MNIST is often the first dataset researchers try. "If it doesn't work on MNIST, it won't work at all", they said. "Well, if it does work on MNIST, it may still fail on others." Zalando seeks to replace the original MNIST dataset

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Yann LeCun (2021). mnist [Dataset]. https://huggingface.co/datasets/ylecun/mnist

mnist

MNIST

ylecun/mnist

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 13, 2021
Authors
Yann LeCun
License

MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically

Description

Dataset Card for MNIST

  Dataset Summary

The MNIST dataset consists of 70,000 28x28 black-and-white images of handwritten digits extracted from two NIST databases. There are 60,000 images in the training dataset and 10,000 images in the validation dataset, one class per digit so a total of 10 classes, with 7,000 images (6,000 train images and 1,000 test images) per class. Half of the image were drawn by Census Bureau employees and the other half by high school… See the full description on the dataset page: https://huggingface.co/datasets/ylecun/mnist.

Search
Clear search
Close search
Google apps
Main menu