100+ datasets found

mnist
huggingface.co
opendatalab.com
+5more
Updated Jan 13, 2021
Cite
Yann LeCun (2021). mnist [Dataset]. https://huggingface.co/datasets/ylecun/mnist
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 13, 2021
Authors
Yann LeCun
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset Card for MNIST

Dataset Summary

The MNIST dataset consists of 70,000 28x28 black-and-white images of handwritten digits extracted from two NIST databases. There are 60,000 images in the training dataset and 10,000 images in the test dataset, one class per digit for a total of 10 classes, with about 7,000 images (about 6,000 train images and 1,000 test images) per class. Half of the images were drawn by Census Bureau employees and the other half by high school… See the full description on the dataset page: https://huggingface.co/datasets/ylecun/mnist.
Mnist 42000 Images Dataset
universe.roboflow.com
zip
Updated Apr 25, 2023
Cite
Roboflow (2023). Mnist 42000 Images Dataset [Dataset]. https://universe.roboflow.com/roboflow-jvuqo/mnist-42000-images-u0qdg
Available download formats: zip
Dataset updated
Apr 25, 2023
Dataset authored and provided by
Roboflow
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Numbers
Description
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.

Yann LeCun, Courant Institute, NYU; Corinna Cortes, Google Labs, New York; Christopher J.C. Burges, Microsoft Research, Redmond
mnist-text-default
huggingface.co
Updated Feb 22, 2021
Cite
Fraser Greenlee (2021). mnist-text-default [Dataset]. https://huggingface.co/datasets/Fraser/mnist-text-default
Dataset updated
Feb 22, 2021
Authors
Fraser Greenlee
Description
MNIST dataset adapted to a text-based representation.

This allows testing interpolation quality for Transformer-VAEs.

System is heavily inspired by Matthew Rayfield's work https://youtu.be/Z9K3cwSL6uM

Works by quantising each MNIST pixel into one of 64 characters. Every sample has an up and a down version to encourage the model to learn rotation-invariant features.

Use the .array_to_text() and .text_to_array() methods to test your generated data.

Data format:

- text (30 x 28 tokens, 840 tokens total): textual representation of an MNIST digit, for example:

00 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
01 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
02 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
03 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
04 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
05 down ! ! ! ! ! ! ! ! ! ! ! ! ! % % % @ C L ' J a ^ @ ! ! ! !
06 down ! ! ! ! ! ! ! ! ( * 8 G K ` ` ` ` ` Y L ` ] Q 1 ! ! ! !
07 down ! ! ! ! ! ! ! - \ ` ` ` ` ` ` ` ` _ 8 5 5 / * ! ! ! ! !
08 down ! ! ! ! ! ! ! % W ` ` ` ` ` R N ^ ] ! ! ! ! ! ! ! ! ! !
09 down ! ! ! ! ! ! ! ! 5 H ; ` ` T # ! + G ! ! ! ! ! ! ! ! ! !
10 down ! ! ! ! ! ! ! ! ! $ ! G ` 7 ! ! ! ! ! ! ! ! ! ! ! ! ! !
11 down ! ! ! ! ! ! ! ! ! ! ! C ` P ! ! ! ! ! ! ! ! ! ! ! ! ! !
12 down ! ! ! ! ! ! ! ! ! ! ! # P ` 2 ! ! ! ! ! ! ! ! ! ! ! ! !
13 down ! ! ! ! ! ! ! ! ! ! ! ! ) ] Y I < ! ! ! ! ! ! ! ! ! ! !
14 down ! ! ! ! ! ! ! ! ! ! ! ! ! 5 ] ` ` > ' ! ! ! ! ! ! ! ! !
15 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! , O ` ` F ' ! ! ! ! ! ! ! !
16 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! % 8 ` ` O ! ! ! ! ! ! ! !
17 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! _ ` _ 1 ! ! ! ! ! ! !
18 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! , A N ` ` T ! ! ! ! ! ! ! !
19 down ! ! ! ! ! ! ! ! ! ! ! ! * F Z ` ` ` _ N ! ! ! ! ! ! ! !
20 down ! ! ! ! ! ! ! ! ! ! ' = X ` ` ` ` S 4 ! ! ! ! ! ! ! ! !
21 down ! ! ! ! ! ! ! ! & 1 V ` ` ` ` R 5 ! ! ! ! ! ! ! ! ! ! !
22 down ! ! ! ! ! ! % K W ` ` ` ` Q 5 # ! ! ! ! ! ! ! ! ! ! ! !
23 down ! ! ! ! . L Y ` ` ` ` ^ B # ! ! ! ! ! ! ! ! ! ! ! ! ! !
24 down ! ! ! ! C ` ` ` V B B % ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
25 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
26 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !
27 down ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !

- label: just a number, the label that the text matches.
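The quantisation step can be sketched as follows. This is a minimal illustration, not the dataset author's code: the function names only mirror the methods mentioned above, and the mapping of 0-255 pixel values onto the 64 consecutive ASCII characters from "!" to "`" is an assumption inferred from the example rows. The "up" variant is not modelled here.

```python
# Hypothetical helpers; NOT the implementation shipped with the dataset.
FIRST_CHAR = ord("!")  # assumed: background pixels (value 0) map to "!"
LEVELS = 64            # 64 characters, "!" (ASCII 33) through "`" (ASCII 96)


def array_to_text(image, direction="down"):
    """Turn a 28x28 list of 0-255 pixel values into 28 rows of 30 tokens."""
    rows = []
    for index, pixels in enumerate(image):
        chars = [chr(FIRST_CHAR + value * LEVELS // 256) for value in pixels]
        rows.append(f"{index:02d} {direction} " + " ".join(chars))
    return "\n".join(rows)


def text_to_array(text):
    """Recover the quantised levels (0-63) from the text representation."""
    image = []
    for line in text.splitlines():
        tokens = line.split(" ")
        # tokens[0] is the row number, tokens[1] the direction marker
        image.append([ord(ch) - FIRST_CHAR for ch in tokens[2:]])
    return image


# A blank image with one fully bright pixel in row 5
blank = [[0] * 28 for _ in range(28)]
blank[5][13] = 255
text = array_to_text(blank)
print(text.splitlines()[5])
```

Each row carries a row number, a direction marker, and 28 characters, which gives the 30 x 28 = 840 tokens stated above.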
Data from: Fashion Mnist Dataset
universe.roboflow.com
opendatalab.com
+4more
zip
Updated Aug 10, 2022
Cite
Popular Benchmarks (2022). Fashion Mnist Dataset [Dataset]. https://universe.roboflow.com/popular-benchmarks/fashion-mnist-ztryt/model/3
Available download formats: zip
Dataset updated
Aug 10, 2022
Dataset authored and provided by
Popular Benchmarks
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Variables measured
Clothing
Description
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Authors:

Han Xiao, Kashif Rasul and Roland Vollgraf

https://arxiv.org/abs/1708.07747

Dataset Obtained From: https://github.com/zalandoresearch/fashion-mnist

All images were sized 28x28 in the original dataset

Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.

Here's an example of how the data looks (each class takes three rows): https://github.com/zalandoresearch/fashion-mnist/raw/master/doc/img/fashion-mnist-sprite.png (image: Visualized Fashion MNIST dataset)

Version 1 (original-images_Original-FashionMNIST-Splits):

Original images, with the original Fashion-MNIST splits: a train set (86% of images, 60,000 images) and a test set (14% of images, 10,000 images) only.

This version was not trained

Version 3 (original-images_trainSetSplitBy80_20):

Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set
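The 80/20 re-split of the training set can be illustrated with a short sketch. How Roboflow actually performs the split is not described here; the function name split_train_valid, the fixed seed, and the use of a random shuffle are assumptions made for illustration.

```python
import random


def split_train_valid(indices, train_fraction=0.8, seed=0):
    """Shuffle the indices and cut them into train and validation parts."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# The original train set has 60,000 images; the 10,000 test images stay untouched.
train_idx, valid_idx = split_train_valid(range(60_000))
print(len(train_idx), len(valid_idx))  # 48000 12000
```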

https://blog.roboflow.com/train-test-split/ (image: Train/Valid/Test Split Rebalancing, https://i.imgur.com/angfheJ.png)

Citation:

@online{xiao2017/online,
  author = {Han Xiao and Kashif Rasul and Roland Vollgraf},
  title = {Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},
  date = {2017-08-28},
  year = {2017},
  eprintclass = {cs.LG},
  eprinttype = {arXiv},
  eprint = {cs.LG/1708.07747},
}
Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST...
data.niaid.nih.gov
zenodo.org
Updated Jun 13, 2022
Cite
Borth, Damian (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6632086
Dataset updated
Jun 13, 2022
Dataset provided by
Schürholt, Konstantin
Borth, Damian
Giró-i-Nieto, Xavier
Knyazev, Boris
Taskiran, Diyar
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

In recent years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. It follows that a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could (i) investigate novel approaches for model analysis, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research about populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47’360 unique neural network models, resulting in over 2’415’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.

Dataset

This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "mnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.

For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
not-MNIST
datasets.activeloop.ai
opendatalab.com
+3more
deeplake
Updated Mar 11, 2022
Cite
Yaroslav Bulatov (2022). not-MNIST [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/not-mnist-dataset/
Available download formats: deeplake
Dataset updated
Mar 11, 2022
Authors
Yaroslav Bulatov
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The not-MNIST dataset is a dataset of glyphs of the letters A through J rendered in a wide variety of fonts, rather than handwritten digits. It is a challenging dataset that can be used for machine learning and artificial intelligence research. As listed here, the dataset consists of 100,000 images, divided into a training set of 60,000 images and a test set of 40,000 images. Because the images are drawn from a variety of fonts and styles, they are more challenging than the MNIST dataset. The images are 28x28 pixels in size and are grayscale. The dataset is available under the Creative Commons Zero Public Domain Dedication license.
mnist_784
openml.org
Updated Sep 29, 2014
Cite
Yann LeCun; Corinna Cortes; Christopher J.C. Burges (2014). mnist_784 [Dataset]. https://www.openml.org/d/554
Explore at: Croissant
Dataset updated
Sep 29, 2014
Authors
Yann LeCun; Corinna Cortes; Christopher J.C. Burges
Description
Author: Yann LeCun, Corinna Cortes, Christopher J.C. Burges
Source: MNIST Website - Date unknown
Please cite:

The MNIST database of handwritten digits with 784 features; raw data available at: http://yann.lecun.com/exdb/mnist/. It can be split into a training set of the first 60,000 examples and a test set of the remaining 10,000 examples.

It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
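The centering step can be sketched in plain Python. This is an illustration under assumptions, not the original preprocessing code: the function name center_by_mass is hypothetical, the canvas center is taken to be (size - 1) / 2, and the shift is rounded to whole pixels.

```python
def center_by_mass(digit, size=28):
    """Place a small grayscale digit (a list of rows) on a size x size canvas
    so that its pixel center of mass lands near the canvas center."""
    total = sum(sum(row) for row in digit)
    if total == 0:
        raise ValueError("digit has no ink, center of mass is undefined")
    # Intensity-weighted mean row and column
    row_mass = sum(r * v for r, row in enumerate(digit) for v in row) / total
    col_mass = sum(c * v for row in digit for c, v in enumerate(row)) / total
    center = (size - 1) / 2  # assumed definition of the field's center
    row_shift = round(center - row_mass)
    col_shift = round(center - col_mass)
    canvas = [[0] * size for _ in range(size)]
    for r, row in enumerate(digit):
        for c, value in enumerate(row):
            rr, cc = r + row_shift, c + col_shift
            if 0 <= rr < size and 0 <= cc < size:
                canvas[rr][cc] = value
    return canvas


# A 20x20 box whose only ink is a 2x2 block away from the middle
digit = [[0] * 20 for _ in range(20)]
for r in (3, 4):
    for c in (5, 6):
        digit[r][c] = 255
canvas = center_by_mass(digit)
```

In this toy case the 2x2 block ends up on rows and columns 13 and 14, the four pixels around the middle of the 28x28 field.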

With some classification methods (particularly template-based methods, such as SVM and K-nearest neighbors), the error rate improves when the digits are centered by bounding box rather than center of mass. If you do this kind of pre-processing, you should report it in your publications. The MNIST database was constructed from NIST's Special Database 3 (SD-3) and Special Database 1 (SD-1). NIST originally designated SD-3 as their training set and SD-1 as their test set. However, SD-3 is much cleaner and easier to recognize than SD-1. The reason for this lies in the fact that SD-3 was collected among Census Bureau employees, while SD-1 was collected among high-school students. Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test set among the complete set of samples. Therefore it was necessary to build a new database by mixing NIST's datasets.

The MNIST training set is composed of 30,000 patterns from SD-3 and 30,000 patterns from SD-1. Our test set was composed of 5,000 patterns from SD-3 and 5,000 patterns from SD-1. The 60,000 pattern training set contained examples from approximately 250 writers. We made sure that the sets of writers of the training set and test set were disjoint. SD-1 contains 58,527 digit images written by 500 different writers. In contrast to SD-3, where blocks of data from each writer appeared in sequence, the data in SD-1 is scrambled. Writer identities for SD-1 are available and we used this information to unscramble the writers. We then split SD-1 in two: characters written by the first 250 writers went into our new training set. The remaining 250 writers were placed in our test set. Thus we had two sets with nearly 30,000 examples each. The new training set was completed with enough examples from SD-3, starting at pattern # 0, to make a full set of 60,000 training patterns. Similarly, the new test set was completed with SD-3 examples starting at pattern # 35,000 to make a full set with 60,000 test patterns. Only a subset of 10,000 test images (5,000 from SD-1 and 5,000 from SD-3) is available on this site. The full 60,000 sample training set is available.
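The writer-disjoint split of SD-1 can be sketched as below. This is a toy illustration only: the (writer_id, pattern) tuple layout and the function name split_by_writer are assumptions, and "first" writers is interpreted here as the lowest writer ids.

```python
from collections import defaultdict


def split_by_writer(samples, n_train_writers):
    """Group scrambled (writer_id, pattern) pairs by writer, then send the
    first n_train_writers writers to training and the rest to test."""
    by_writer = defaultdict(list)
    for writer_id, pattern in samples:
        by_writer[writer_id].append(pattern)  # "unscramble" by writer
    writers = sorted(by_writer)
    train = [p for w in writers[:n_train_writers] for p in by_writer[w]]
    test = [p for w in writers[n_train_writers:] for p in by_writer[w]]
    return train, test


# Toy data: 4 writers with 3 patterns each, interleaved (scrambled) by writer
samples = [(w, f"w{w}-p{i}") for i in range(3) for w in (3, 0, 2, 1)]
train, test = split_by_writer(samples, n_train_writers=2)
print(train)  # patterns from writers 0 and 1 only
print(test)   # patterns from writers 2 and 3 only
```

Because every pattern from a given writer lands on one side of the cut, no writer contributes to both sets, which is the property the description asks for.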
Data from: MNIST Large Scale dataset Dataset
paperswithcode.com
Updated Jun 10, 2021
Cite
Ylva Jansson; Tony Lindeberg (2021). MNIST Large Scale dataset Dataset [Dataset]. https://paperswithcode.com/dataset/mnist-large-scale-dataset
Dataset updated
Jun 10, 2021
Authors
Ylva Jansson; Tony Lindeberg
Description
The MNIST Large Scale dataset is based on the classic MNIST dataset, but contains large scale variations up to a factor of 16. The motivation behind creating this dataset was to enable testing the ability of different algorithms to learn in the presence of large scale variability and specifically the ability to generalise to new scales not present in the training set over wide scale ranges.

The dataset contains training data for each one of the relative size factors 1, 2 and 4 relative to the original MNIST dataset and testing data for relative scaling factors between 1/2 and 8, with a ratio of $\sqrt[4]{2}$ between adjacent scales.
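The test scales implied by this description can be enumerated with a few lines of arithmetic. The sketch assumes both endpoints (1/2 and 8) are included, which would give 17 scales; the variable names are illustrative.

```python
# Scales from 1/2 to 8 with a ratio of 2**(1/4) between neighbours.
# 8 / (1/2) = 16 = 2**4, so there are 4 * 4 = 16 steps, hence 17 scales
# (assuming both endpoints are part of the test set).
test_scales = [2 ** (k / 4 - 1) for k in range(17)]
train_scales = [1, 2, 4]

for scale in test_scales:
    print(f"{scale:.4f}")
```

Under this assumption each training scale (1, 2 and 4) coincides with one of the test scales, and the remaining test scales probe sizes never seen during training.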
MNIST-MIX Dataset
paperswithcode.com
opendatalab.com
Updated Apr 7, 2020
Cite
MNIST-MIX Dataset [Dataset]. https://paperswithcode.com/dataset/mnist-mix
Dataset updated
Apr 7, 2020
Authors
Weiwei Jiang
Description
MNIST-MIX is a multi-language handwritten digit recognition dataset. It contains digits from 10 different languages.
MNIST-1D Dataset
paperswithcode.com
opendatalab.com
Updated Nov 26, 2024
Cite
Sam Greydanus; Dmitry Kobak (2024). MNIST-1D Dataset [Dataset]. https://paperswithcode.com/dataset/mnist-1d
Dataset updated
Nov 26, 2024
Authors
Sam Greydanus; Dmitry Kobak
Description
A minimalist, low-memory, and low-compute alternative to classic deep learning benchmarks. The training examples are 20 times smaller than MNIST examples yet they differentiate more clearly between linear, nonlinear, and convolutional models which attain 32, 68, and 94% accuracy respectively (these models obtain 94, 99+, and 99+% on MNIST).
mnist
huggingface.co
Updated Dec 20, 2024
Cite
Katsuhiko Yamamoto (2024). mnist [Dataset]. https://huggingface.co/datasets/kyama0321/mnist
Explore at: Croissant
Dataset updated
Dec 20, 2024
Authors
Katsuhiko Yamamoto
Description
kyama0321/mnist dataset hosted on Hugging Face and contributed by the HF Datasets community
binarized_mnist
tensorflow.org
Updated Jun 1, 2024
Cite
(2024). binarized_mnist [Dataset]. https://www.tensorflow.org/datasets/catalog/binarized_mnist
Dataset updated
Jun 1, 2024
Description
A specific binarization of the MNIST images originally used in (Salakhutdinov & Murray, 2008). This dataset is frequently used to evaluate generative models of images, so labels are not provided.

To use this dataset:

import tensorflow_datasets as tfds

ds = tfds.load('binarized_mnist', split='train')
for ex in ds.take(4):
    print(ex)

See the guide for more information on tensorflow_datasets.

Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/binarized_mnist-1.0.0.png
mnist1d
huggingface.co
Updated Oct 9, 2024
Cite
mnist1d [Dataset]. https://huggingface.co/datasets/christopher/mnist1d
Explore at: Croissant
Dataset updated
Oct 9, 2024
Authors
Christopher Akiki
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This dataset card is based on the README file of the authors' GitHub repository: https://github.com/greydanus/mnist1d

The MNIST-1D Dataset

Most machine learning models get around the same ~99% test accuracy on MNIST. The MNIST-1D dataset is 100x smaller (default sample size: 4000+1000; dimensionality: 40) and does a better job of separating between models with/without nonlinearity and models with/without spatial inductive biases. MNIST-1D is a core teaching dataset in Simon… See the full description on the dataset page: https://huggingface.co/datasets/christopher/mnist1d.
MNIST database of handwritten digits - Dataset - LDM
service.tib.eu
Updated Dec 16, 2024
Cite
(2024). MNIST database of handwritten digits - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/mnist-database-of-handwritten-digits
Dataset updated
Dec 16, 2024
Description
The MNIST handwritten digit database is a dataset of 60,000 training and 10,000 test examples of handwritten digit images.
Dirty-MNIST Dataset
paperswithcode.com
opendatalab.com
Updated Jan 8, 2024
Cite
Jishnu Mukhoti; Andreas Kirsch; Joost van Amersfoort; Philip H. S. Torr; Yarin Gal (2024). Dirty-MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/dirty-mnist
Dataset updated
Jan 8, 2024
Authors
Jishnu Mukhoti; Andreas Kirsch; Joost van Amersfoort; Philip H. S. Torr; Yarin Gal
Description
DirtyMNIST is a concatenation of MNIST + AmbiguousMNIST, with 60k samples each in the training set. AmbiguousMNIST contains additional ambiguous digits with varying ambiguity. The AmbiguousMNIST test set contains 60k ambiguous samples as well.

Additional Guidance

The current AmbiguousMNIST contains 6k unique samples with 10 labels each. This multi-label dataset gets flattened to 60k samples. The assumption is that ambiguous samples have multiple "valid" labels as they are ambiguous. MNIST samples are intentionally undersampled (in comparison), which benefits active learning (AL) acquisition functions that can select unambiguous samples.

Pick your initial training samples (for warm starting Active Learning) from the MNIST half of DirtyMNIST to avoid starting training with potentially very ambiguous samples, which might add a lot of variance to your experiments.

Make sure to pick your validation set from the MNIST half as well, for the same reason as above.

Make sure that your batch acquisition size is >= 10 (probably), given that there are 10 multi-labels per sample in Ambiguous-MNIST.

By default, Gaussian noise with stddev 0.05 is added to each sample to prevent acquisition functions (in Active Learning) from cheating by discarding "duplicates".

If you want to split Ambiguous-MNIST into subsets (or Dirty-MNIST within the second ambiguous half), make sure to split by multiples of 10 to avoid splits within a flattened multi-label sample.
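The advice about splitting by multiples of 10 can be sketched as follows. This is not code from the DirtyMNIST authors: the function name split_flattened is hypothetical, and the sketch assumes that the 10 flattened entries of each unique sample are stored contiguously.

```python
import random

LABELS_PER_SAMPLE = 10  # from the description: 10 labels per unique sample


def split_flattened(flat, first_fraction, seed=0):
    """Split a flattened multi-label list without cutting through a group."""
    if len(flat) % LABELS_PER_SAMPLE:
        raise ValueError("length must be a multiple of 10")
    # Assumption: entries i*10 .. i*10+9 belong to the same unique sample
    groups = [flat[i:i + LABELS_PER_SAMPLE]
              for i in range(0, len(flat), LABELS_PER_SAMPLE)]
    random.Random(seed).shuffle(groups)  # shuffle whole groups, never entries
    cut = round(len(groups) * first_fraction)
    first = [item for group in groups[:cut] for item in group]
    second = [item for group in groups[cut:] for item in group]
    return first, second


# Toy stand-in: 6 unique samples, each flattened into 10 (sample_id, slot) entries
flat = [(sample_id, slot) for sample_id in range(6) for slot in range(10)]
first, second = split_flattened(flat, first_fraction=0.5)
print(len(first), len(second))  # 30 30
```

Shuffling and cutting at the group level keeps both halves at a multiple of 10 entries and prevents the same ambiguous image from appearing on both sides of the split.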
Data from: Written and spoken digits database for multimodal learning
zenodo.org
data.niaid.nih.gov
bin
Updated Jan 21, 2021
Cite
Lyes Khacef; Laurent Rodriguez; Benoit Miramond (2021). Written and spoken digits database for multimodal learning [Dataset]. http://doi.org/10.5281/zenodo.4452953
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.4452953
Dataset updated
Jan 21, 2021
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Lyes Khacef; Laurent Rodriguez; Benoit Miramond
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Database description:

The written and spoken digits database is not a new database but a constructed database from existing ones, in order to provide a ready-to-use database for multimodal fusion [1].

The written digits database is the original MNIST handwritten digits database [2] with no additional processing. It consists of 70000 images (60000 for training and 10000 for test) of 28 x 28 = 784 dimensions.

The spoken digits database was extracted from Google Speech Commands [3], an audio dataset of spoken words that was proposed to train and evaluate keyword spotting systems. It consists of 105829 utterances of 35 words, amongst which 38908 utterances of the ten digits (34801 for training and 4107 for test). A pre-processing was done via the extraction of the Mel Frequency Cepstral Coefficients (MFCC) with a framing window size of 50 ms and frame shift size of 25 ms. Since the speech samples are approximately 1 s long, we end up with 39 time slots. For each one, we extract 12 MFCC coefficients with an additional energy coefficient. Thus, we have a final vector of 39 x 13 = 507 dimensions. Standardization and normalization were applied on the MFCC features.
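The dimensions quoted above follow from simple framing arithmetic, sketched here. The helper name frame_count is illustrative, and the calculation assumes exactly 1 s of audio with no padding at the edges.

```python
def frame_count(duration_ms, window_ms, shift_ms):
    """Number of full analysis windows that fit into the signal (no padding)."""
    return (duration_ms - window_ms) // shift_ms + 1


frames = frame_count(duration_ms=1000, window_ms=50, shift_ms=25)
features_per_frame = 12 + 1  # 12 MFCC coefficients plus one energy coefficient
vector_length = frames * features_per_frame

print(frames)         # 39
print(vector_length)  # 507
```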

To construct the multimodal digits dataset, we associated written and spoken digits of the same class, respecting the initial partitioning in [2] and [3] for the training and test subsets. Since we have fewer samples for the spoken digits, we duplicated some random samples to match the number of written digits and obtain a multimodal digits database of 70000 samples (60000 for training and 10000 for test).

The dataset is provided in six files as described below. Therefore, if a shuffle is performed on the training or test subsets, it must be performed in unison with the same order for the written digits, spoken digits and labels.

Files:

data_wr_train.npy: 60000 samples of 784-dimensional written digits for training;

data_sp_train.npy: 60000 samples of 507-dimensional spoken digits for training;

labels_train.npy: 60000 labels for the training subset;

data_wr_test.npy: 10000 samples of 784-dimensional written digits for test;

data_sp_test.npy: 10000 samples of 507-dimensional spoken digits for test;

labels_test.npy: 10000 labels for the test subset.
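Shuffling the three training (or test) files in unison can be sketched with a single shared permutation. The example uses small Python lists as stand-ins for the arrays; the function name shuffle_in_unison and the fixed seed are assumptions. With NumPy arrays loaded from the .npy files, the same index permutation could be applied to each array by indexing.

```python
import random


def shuffle_in_unison(written, spoken, labels, seed=0):
    """Apply one shared random permutation to all three sequences."""
    if not (len(written) == len(spoken) == len(labels)):
        raise ValueError("all three subsets must have the same length")
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    return ([written[i] for i in order],
            [spoken[i] for i in order],
            [labels[i] for i in order])


# Toy stand-ins for the written digits, spoken digits and labels
written = [("wr", i) for i in range(8)]
spoken = [("sp", i) for i in range(8)]
labels = list(range(8))
written_s, spoken_s, labels_s = shuffle_in_unison(written, spoken, labels)
```

After the shuffle, position k in each of the three outputs still refers to the same original sample, so written digits, spoken digits and labels stay aligned.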

References:

[1] Khacef, L. et al. (2020), "Brain-Inspired Self-Organization with Cellular Neuromorphic Computing for Multimodal Unsupervised Learning".

[2] LeCun, Y. & Cortes, C. (1998), “MNIST handwritten digit database”.

[3] Warden, P. (2018), “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition”.
Mnist Project Dataset
universe.roboflow.com
zip
Updated May 2, 2024
Cite
Nabilah K S (2024). Mnist Project Dataset [Dataset]. https://universe.roboflow.com/nabilah-k-s/mnist-project/dataset/2
Available download formats: zip
Dataset updated
May 2, 2024
Dataset authored and provided by
Nabilah K S
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Variables measured
OBJECT DETECTION Bounding Boxes
Description
MNIST PROJECT

## Overview

MNIST PROJECT is a dataset for object detection tasks - it contains OBJECT DETECTION annotations for 2,550 images.

## Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

## License

This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
MNIST32
kaggle.com
Updated Feb 27, 2024
Cite
sbordt (2024). MNIST32 [Dataset]. https://www.kaggle.com/datasets/sbordt/mnist32
Explore at: Croissant
Dataset updated
Feb 27, 2024
Dataset provided by
Kaggle: http://kaggle.com/
Authors
sbordt
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Description
This dataset is derived from the MNIST dataset (http://yann.lecun.com/exdb/mnist/).

The dataset is described in the research paper https://openaccess.thecvf.com/content/CVPR2023W/XAI4CV/html/Bordt_The_Manifold_Hypothesis_for_Gradient-Based_Explanations_CVPRW_2023_paper.html

The usage of the dataset is described in this example notebook https://github.com/tml-tuebingen/explanations-manifold/blob/main/examples/mnist32.ipynb
MNIST dataset
kaggle.com
zip
Updated Feb 20, 2023
Cite
JoyceWW (2023). MNIST dataset [Dataset]. https://www.kaggle.com/datasets/joyceww/mnist-dataset
Available download formats: zip (23112702 bytes)
Dataset updated
Feb 20, 2023
Authors
JoyceWW
Description
Dataset

This dataset was created by JoyceWW

Contents
Fashion-mnist_train
kaggle.com
Updated Jul 11, 2018
Cite
Sanjeet Kumar Yadav (2018). Fashion-mnist_train [Dataset]. https://www.kaggle.com/sanjeet41/fashionmnist-train/code
Explore at: Croissant
Dataset updated
Jul 11, 2018
Dataset provided by
Kaggle: http://kaggle.com/
Authors
Sanjeet Kumar Yadav
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. Zalando intends Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. The original MNIST dataset contains a lot of handwritten digits. Members of the AI/ML/Data Science community love this dataset and use it as a benchmark to validate their algorithms. In fact, MNIST is often the first dataset researchers try. "If it doesn't work on MNIST, it won't work at all", they said. "Well, if it does work on MNIST, it may still fail on others." Zalando seeks to replace the original MNIST dataset

ylecun/mnist: 2 scholarly articles cite this dataset (View in Google Scholar)