17 datasets found

MNIST R RDS
kaggle.com
zip
Updated Mar 27, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Laurae (2018). MNIST R RDS [Dataset]. https://www.kaggle.com/laurae2/mnistrds
Explore at:
zip(20508114 bytes)Available download formats
Dataset updated
Mar 27, 2018
Authors
Laurae
Description
Dataset

This dataset was created by Laurae

Contents

It contains the following files:
P
Data from: Mechanical MNIST – Distribution Shift Dataset
paperswithcode.com
Updated Jun 28, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). Mechanical MNIST – Distribution Shift Dataset [Dataset]. https://paperswithcode.com/dataset/mechanical-mnist-distribution-shift
Explore at:
Dataset updated
Jun 28, 2022
Description
The Mechanical MNIST – Distribution Shift dataset contains the results of finite element simulation of heterogeneous material subject to large deformation due to equibiaxial extension at a fixed boundary displacement of d = 7.0. The result provided in this dataset is the change in strain energy after this equibiaxial extension. The Mechanical MNIST dataset is generated by converting the MNIST bitmap images (28x28 pixels) with range 0 - 255 to 2D heterogeneous blocks of material (28x28 unit square) with varying modulus in range 1- s. The original bitmap images are sourced from the MNIST Digits dataset, (http://www.pymvpa.org/datadb/mnist.html) which corresponds to Mechanical MNIST – MNIST, and the EMNIST Letters dataset (https://www.nist.gov/itl/products-and-services/emnist-dataset) which correspond to Mechanical MNIST – EMNIST Letters. The Mechanical MNIST – Distribution Shift dataset is specifically designed to demonstrate three types of data distribution shift: (1) covariate shift, (2) mechanism shift, and (3) sampling bias, for all of which the training and testing environments are drawn from different distributions. For each type of data distribution shift, we have one dataset generated from the Mechanical MNIST bitmaps and one from the Mechanical MNIST – EMNIST Letters bitmaps. For the covariate shift dataset, the training dataset is collected from two environments (2500 samples from s = 100, and 2500 samples from s = 90), and the test data is collected from two additional environments (2000 samples from s = 75, and 2000 samples from s = 50). For the mechanism shift dataset, the training data is identical to the training data in the covariate shift dataset (i.e., 2500 samples from s = 100, and 2500 samples from s = 90), and the test datasets are from two additional environments (2000 samples from s = 25, and 2000 samples from s = 10). For the sampling bias dataset, datasets are collected such that each datapoint is selected from the broader MNIST and EMNIST inputs bitmap selection by a probability which is controlled by a parameter r. The training data is collected from two environments (9800 from r = 15, and 200 from r = -2), and the test data is collected from three different environments (2000 from r = -5, 2000 from r = -10, and 2000 from r = 1). Thus, in the end we have 6 benchmark datasets with multiple training and testing environments in each. The enclosed document “folder_description.pdf'” shows the organization of each zipped folder provided on this page. The code to reproduce these simulations is available on GitHub (https://github.com/elejeune11/Mechanical-MNIST/blob/master/generate_dataset/Equibiaxial_Extension_FEA_test_FEniCS.py).
h
MNIST-DIA-R
huggingface.co
Updated Oct 23, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ricardo Flores (2024). MNIST-DIA-R [Dataset]. https://huggingface.co/datasets/RicardoFlores/MNIST-DIA-R
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Oct 23, 2024
Authors
Ricardo Flores
Description
RicardoFlores/MNIST-DIA-R dataset hosted on Hugging Face and contributed by the HF Datasets community
P
MNIST Multiview Datasets Dataset
paperswithcode.com
Updated Sep 17, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Anil Goyal; Emilie Morvant; Pascal Germain; Massih-Reza Amini (2022). MNIST Multiview Datasets Dataset [Dataset]. https://paperswithcode.com/dataset/mnist-multiview-datasets
Explore at:
Dataset updated
Sep 17, 2022
Authors
Anil Goyal; Emilie Morvant; Pascal Germain; Massih-Reza Amini
Description
MNIST Multiview Datasets MNIST is a publicly available dataset consisting of 70, 000 images of handwritten digits distributed over ten classes. We generated 2 four-view datasets where each view is a vector of R^{14 x 14}:

MNIST₁: It is generated by considering 4 quarters of image as 4 views. MNIST₂: It is generated by considering 4 overlapping views around the centre of images: this dataset brings redundancy between the views.

Related Papers: ``` Goyal, Anil, Emilie Morvant, Pascal Germain, and Massih-Reza Amini. "Multiview Boosting by Controlling the Diversity and the Accuracy of View-specific Voters." Neurocomputing, 358, 2019, pp. 81-92.

Link to the ArXiv version: https://arxiv.org/abs/1808.05784

Published Version: https://doi.org/10.1016/j.neucom.2019.04.072 ```

Goyal, Anil, Emilie Morvant, and Massih-Reza Amini. "Multiview Learning of Weighted Majority Vote by Bregman Divergence Minimization." In International Symposium on Intelligent Data Analysis, pp. 124-136. Springer, Cham, 2018. Link to the ArXiv version: https://arxiv.org/abs/1805.10212 Published Version: https://doi.org/10.1007/978-3-030-01768-2_11
Rescaled Fashion-MNIST dataset
zenodo.org
Updated Jun 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg (2025). Rescaled Fashion-MNIST dataset [Dataset]. http://doi.org/10.5281/zenodo.15187793
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15187793
Dataset updated
Jun 27, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg
Time period covered
Apr 10, 2025
Description
Motivation

The goal of introducing the Rescaled Fashion-MNIST dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.

The Rescaled Fashion-MNIST dataset was introduced in the paper:

[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

with a pre-print available at arXiv:

[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.

Importantly, the Rescaled Fashion-MNIST dataset is more challenging than the MNIST Large Scale dataset, introduced in:

[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2.

Access and rights

The Rescaled Fashion-MNIST dataset is provided on the condition that you provide proper citation for the original Fashion-MNIST dataset:

[4] Xiao, H., Rasul, K., and Vollgraf, R. (2017) “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms”, arXiv preprint arXiv:1708.07747

and also for this new rescaled version, using the reference [1] above.

The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.

The dataset

The Rescaled FashionMNIST dataset is generated by rescaling 28×28 gray-scale images of clothes from the original FashionMNIST dataset [4]. The scale variations are up to a factor of 4, and the images are embedded within black images of size 72x72, with the object in the frame always centred. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

There are 10 different classes in the dataset: “T-shirt/top”, “trouser”, “pullover”, “dress”, “coat”, “sandal”, “shirt”, “sneaker”, “bag” and “ankle boot”. In the dataset, these are represented by integer labels in the range [0, 9].

The dataset is split into 50 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 50 000 samples from the original Fashion-MNIST training set. The validation dataset, on the other hand, is formed from the final 10 000 images of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original Fashion-MNIST test set.

The h5 files containing the dataset

The training dataset file (~2.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

fashionmnist_with_scale_variations_tr50000_vl10000_te10000_outsize72-72_scte1p000_scte1p000.h5

Additionally, for the Rescaled FashionMNIST dataset, there are 9 datasets (~415 MB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^k/4, with k being integers in the range [-4, 4]:

fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p500.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p595.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p707.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte0p841.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p000.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p189.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p414.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte1p682.h5
fashionmnist_with_scale_variations_te10000_outsize72-72_scte2p000.h5

These dataset files were used for the experiments presented in Figures 6, 7, 14, 16, 19 and 23 in [1].

Instructions for loading the data set

The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

The training dataset can be loaded in Python as:

with h5py.File(`

x_train = np.array( f["/x_train"], dtype=np.float32)
x_val = np.array( f["/x_val"], dtype=np.float32)
x_test = np.array( f["/x_test"], dtype=np.float32)
y_train = np.array( f["/y_train"], dtype=np.int32)
y_val = np.array( f["/y_val"], dtype=np.int32)
y_test = np.array( f["/y_test"], dtype=np.int32)

We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))

The test datasets can be loaded in Python as:

with h5py.File(`

x_test = np.array( f["/x_test"], dtype=np.float32)
y_test = np.array( f["/y_test"], dtype=np.int32)

The test datasets can be loaded in Matlab as:

x_test = h5read(`

The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.

There is also a closely related Fashion-MNIST with translations dataset, which in addition to scaling variations also comprises spatial translations of the objects.
Free Spoken Digit Dataset (FSDD)
kaggle.com
Updated Sep 4, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Jose R. Zapata (2020). Free Spoken Digit Dataset (FSDD) [Dataset]. https://www.kaggle.com/joserzapata/free-spoken-digit-dataset-fsdd/metadata
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 4, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Jose R. Zapata
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Context

A free audio dataset of spoken digits. Think MNIST for audio. (3,000 recordings, 6 speakers ) A simple audio/speech dataset consisting of recordings of spoken digits in wav files at 8kHz. The recordings are trimmed so that they have near minimal silence at the beginnings and ends.

FSDD is an open dataset, which means it will grow over time as data is contributed. In order to enable reproducibility and accurate citation the dataset is versioned using Zenodo DOI as well as git tags.

Current status 6 speakers 3,000 recordings (50 of each digit per speaker) English pronunciations

Created by: Zohar Jackson, César Souza, Jason Flaks, Yuxin Pan, Hereman Nicolas, & Adhish Thite.

Link: https://github.com/Jakobovski/free-spoken-digit-dataset

Content

What's inside is more than just rows and columns. Make it easy for others to get started by describing how you acquired the data and what time period it represents, too.

Acknowledgements

Zohar Jackson, César Souza, Jason Flaks, Yuxin Pan, Hereman Nicolas, & Adhish Thite. (2018, August 9). Jakobovski/free-spoken-digit-dataset: v1.0.8 (Version v1.0.8). Zenodo. http://doi.org/10.5281/zenodo.1342401

Inspiration

A free audio dataset of spoken digits. Think MNIST for audio.
Safran-MNIST-DLS
zenodo.org
application/gzip, csv
Updated Dec 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sofia Marino; Jennifer Vandoni; Ichraq Lemghari; Basile Musquer; Thierry Arsaut; Sofia Marino; Jennifer Vandoni; Ichraq Lemghari; Basile Musquer; Thierry Arsaut (2025). Safran-MNIST-DLS [Dataset]. http://doi.org/10.5281/zenodo.13321202
Explore at:
csv, application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.13321202
Dataset updated
Dec 5, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Sofia Marino; Jennifer Vandoni; Ichraq Lemghari; Basile Musquer; Thierry Arsaut; Sofia Marino; Jennifer Vandoni; Ichraq Lemghari; Basile Musquer; Thierry Arsaut
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Time period covered
Apr 30, 2024
Description
This dataset contains images of serial numbers extracted from diverse avionic parts manufactured by SAFRAN, the international high-technology group and world leader operating in the aviation (propulsion, equipment and interiors), defense and space markets. This dataset resembles the well-known MNIST dataset, but with a focus to industrial contexts, encompassing variations in lighting conditions, orientations, writing styles and surface textures.

The dataset contains 32 classes depicting numbers, alphabetic characters, and symbols, namely: [0, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, A, B, C, D, E, F, G, H, J, K, L, M, N, P, R, S, T, U, W, Y, /, .]

April 30th, 2024 : Training dataset containing 9314 images without labels is released.

December 5th, 2024 : Testing and validation datasets released, ground-truth labels for training, validation and testing released.

This dataset has been proposed in the context of https://dagecc-challenge.github.io/icpr2024/" href="https://dagecc-challenge.github.io/icpr2024/" target="_blank" rel="noreferrer noopener">ICPR24 DAGECC Competition
Rescaled Fashion-MNIST with translations dataset
zenodo.org
Updated Jun 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg (2025). Rescaled Fashion-MNIST with translations dataset [Dataset]. http://doi.org/10.5281/zenodo.15188439
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15188439
Dataset updated
Jun 27, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg
Time period covered
Apr 10, 2025
Description
Motivation

The goal of introducing the Rescaled Fashion-MNIST with translations dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data, and to additionally provide a way to test network object detection and object localisation abilities on image data where the objects are not centred.

The Rescaled Fashion-MNIST with translations dataset was introduced in the paper:

[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

with a pre-print available at arXiv:

[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.

Importantly, the Rescaled Fashion-MNIST with translations dataset is more challenging than the MNIST Large Scale dataset, introduced in:

[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2.

Access and rights

The Rescaled Fashion-MNIST with translations dataset is provided on the condition that you provide proper citation for the original Fashion-MNIST dataset:

[4] Xiao, H., Rasul, K., and Vollgraf, R. (2017) “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms”, arXiv preprint arXiv:1708.07747

and also for this new rescaled version, using the reference [1] above.

The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.

The dataset

The Rescaled FashionMNIST with translations dataset is generated by rescaling 28×28 gray-scale images of clothes from the original FashionMNIST dataset [4]. The scale variations are up to a factor of 4, and the images are embedded within black images of size 72x72. The objects within the images have also been randomly shifted in the spatial domain, with the object always at least 4 pixels away from the image boundary. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

There are 10 different classes in the dataset: “T-shirt/top”, “trouser”, “pullover”, “dress”, “coat”, “sandal”, “shirt”, “sneaker”, “bag” and “ankle boot”. In the dataset, these are represented by integer labels in the range [0, 9].

The dataset is split into 50 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 50 000 samples from the original Fashion-MNIST training set. The validation dataset, on the other hand, is formed from the final 10 000 images of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original Fashion-MNIST test set.

The h5 files containing the dataset

The training dataset file (~2.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

fashionmnist_with_scale_variations_and_translations_tr50000_vl10000_te10000_outsize72-72_scte1p000_scte1p000.h5

Additionally, for the Rescaled FashionMNIST with translations dataset, there are 9 datasets (~415 MB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^k/4, with k being integers in the range [-4, 4]:

fashionmnist_with_scale_variations_and_translations_te10000_outsize72-72_scte0p500.h5
fashionmnist_with_scale_variations_and_translations_te10000_outsize72-72_scte0p595.h5
fashionmnist_with_scale_variations_and_translations_te10000_outsize72-72_scte0p707.h5
fashionmnist_with_scale_variations_and_translations_te10000_outsize72-72_scte0p841.h5
fashionmnist_with_scale_variations_and_translations_te10000_outsize72-72_scte1p000.h5
fashionmnist_with_scale_variations_and_translations_te10000_outsize72-72_scte1p189.h5
fashionmnist_with_scale_variations_and_translations_te10000_outsize72-72_scte1p414.h5
fashionmnist_with_scale_variations_and_translations_te10000_outsize72-72_scte1p682.h5
fashionmnist_with_scale_variations_and_translations_te10000_outsize72-72_scte2p000.h5

These dataset files were used for the experiments presented in Figure 8 in [1].

Instructions for loading the data set

The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

The training dataset can be loaded in Python as:

with h5py.File(`

x_train = np.array( f["/x_train"], dtype=np.float32)
x_val = np.array( f["/x_val"], dtype=np.float32)
x_test = np.array( f["/x_test"], dtype=np.float32)
y_train = np.array( f["/y_train"], dtype=np.int32)
y_val = np.array( f["/y_val"], dtype=np.int32)
y_test = np.array( f["/y_test"], dtype=np.int32)

We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))

The test datasets can be loaded in Python as:

with h5py.File(`

x_test = np.array( f["/x_test"], dtype=np.float32)
y_test = np.array( f["/y_test"], dtype=np.int32)

The test datasets can be loaded in Matlab as:

x_test = h5read(`

The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.

There is also a closely related Fashion-MNIST dataset, which in addition to scaling variations keeps the objects in the frame centred, meaning no spatial translations are used.
n
MultNIST Dataset
data.ncl.ac.uk
json
Updated Nov 30, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Towers; Rob Geada; Amir Atapour Abarghouei; Andrew Stephen McGough (2023). MultNIST Dataset [Dataset]. http://doi.org/10.25405/data.ncl.24574678.v1
Explore at:
jsonAvailable download formats
Unique identifier
https://doi.org/10.25405/data.ncl.24574678.v1
Dataset updated
Nov 30, 2023
Dataset provided by
Newcastle University
Authors
David Towers; Rob Geada; Amir Atapour Abarghouei; Andrew Stephen McGough
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset containing the images and labels for the MultNIST data used in the CVPR NAS workshop Unseen-data challenge under the codename "Mateo"The MultNIST dataset is a constructed dataset from MNIST Images. The intention of this dataset is to require machine learning models to do more than just image classification but also perform a calculation, in this case multiplaction followed by a mod operation. For each image, three MNIST Images were randomly chosen and combined together through the colour channels, resulting in a three colour-channel image so each MNIST image represents one colour channel. The data is in a channels-first format with a shape of (n, 3, 28, 28) where n is the number of samples in the corresponding set (50,000 for training, 10,000 for validation, and 10,000 for testing).There are ten classes in the dataset, with 7,000 examples of each, distributed evenly between the three subsets.The label of each image is generated using the formula "(r * b * g) % 10" where r, g, and b are the red, green, and blue colour channels respectively. An example of a MultNIST Image would be a rgb configuation of 3, 7, and 4 respectively, which would result in a label of 4 ((3 * 7 * 4) % 10).
P
Data from: Mechanical MNIST Crack Path Dataset
paperswithcode.com
opendatalab.com
Updated Jul 23, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
S. Mohammadzadeh; E. Lejeune (2021). Mechanical MNIST Crack Path Dataset [Dataset]. https://paperswithcode.com/dataset/mechanical-mnist-crack-path
Explore at:
Dataset updated
Jul 23, 2021
Authors
S. Mohammadzadeh; E. Lejeune
Description
The Mechanical MNIST Crack Path dataset contains Finite Element simulation results from phase-field models of quasi-static brittle fracture in heterogeneous material domains subjected to prescribed loading and boundary conditions. For all samples, the material domain is a square with a side length of $1$. There is an initial crack of fixed length ($0.25$) on the left edge of each domain. The bottom edge of the domain is fixed in $x$ (horizontal) and $y$ (vertical), the right edge of the domain is fixed in $x$ and free in $y$, and the left edge is free in both $x$ and $y$. The top edge is free in $x$, and in $y$ it is displaced such that, at each step, the displacement increases linearly from zero at the top right corner to the maximum displacement on the top left corner. Maximum displacement starts at $0.0$ and increases to $0.02$ by increments of $0.0001$ ($200$ simulation steps in total). The heterogeneous material distribution is obtained by adding rigid circular inclusions to the domain using the Fashion MNIST bitmaps as the reference location for the center of the inclusions. Specifically, each center point location is generated randomly inside a square region defined by the corresponding Fashion MNIST pixel when the pixel has an intensity value higher than $10$. In addition, a minimum center-to-center distance limit of $0.0525$ is applied while generating these center points for each sample. The values of Young’s Modulus $(E)$, Fracture Toughness $(G_f)$, and Failure Strength $(f_t)$ near each inclusion are increased with respect to the background domain by a variable rigidity ratio $r$. The background value for $E$ is $210000$, the background value for $G_f$ is $2.7$, and the background value for $f_t$ is $2445.42$. The rigidity ratio throughout the domain depends on position with respect to all inclusion centers such that the closer a point is to the inclusion center the higher the rigidity ratio will be. We note that the full algorithm for constructing the heterogeneous material property distribution is included in the simulations scripts shared on GitHub. The following information is included in our dataset: (1) A rigidity ratio array to capture heterogeneous material distribution reported over a uniform $64\times64$ grid, (2) the damage field at the final level of applied displacement reported over a uniform $256\times256$ grid, and (3) the force-displacement curves for each simulation. All simulations are conducted with the FEniCS computing platform (https://fenicsproject.org). The code to reproduce these simulations is hosted on GitHub (https://github.com/saeedmhz/phase-field).
Dataset for training the Surrogate Model of microlaser neurons on the...
zenodo.org
csv
Updated Nov 15, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Gibaek Kim; Laurie Calvet; Laurie Calvet; Gibaek Kim (2024). Dataset for training the Surrogate Model of microlaser neurons on the reduced MNIST classification task [Dataset]. http://doi.org/10.5281/zenodo.14169304
Explore at:
csvAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14169304
Dataset updated
Nov 15, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Gibaek Kim; Laurie Calvet; Laurie Calvet; Gibaek Kim
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 31, 2024
Description
This dataset was used to train a surrogate multilayer perceptron surrogate model of microlaser neurons.

It is in csv format. It was generated using the Yamada Model as found in

Selmi F, Braive R, Beaudoin G, Sagnes I, Kuszelewicz R and Barbay S 2014 Relative Refractory Period in an Excitable Semiconductor Laser Phys. Rev. Lett. 112 183902.
d
Data from: Label-noise reduction with support vector machines
search.dataone.org
dataone.org
+1more
Updated Feb 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Daly, Kendra (2025). Label-noise reduction with support vector machines [Dataset]. http://doi.org/10.7266/N71834DN
Explore at:
Unique identifier
https://doi.org/10.7266/N71834DN
Dataset updated
Feb 5, 2025
Dataset provided by
GRIIDC
Authors
Daly, Kendra
Description
This dataset reports data generated through the investigation of the problem of detection of label-noise in large pattern-recognition datasets. Specifically, results of label-noise reduction on two datasets are reported. The University of California, Irvine (UCI) Letter Recognition dataset and the Mixed National Institute of Standards and Technology (MNIST) Digit Recognition dataset was used to train an algorithm. The algorithm was then tested a Plankton dataset collected by the SIPPER (Shadow Imaging Particle Profiler and Evaluation Recorder) camera imaging system during trips to the site of the Deepwater Horizon Oil Spill. The experiment with the Plankton dataset represented a more practical application of data cleansing, because the label-noise naturally occurred in the data. Data presented in this database include the noise detected for four classes, total noise detected, and the percent accumulative noise detected for both the UCI and MNIST datasets. SIPPER data are not reported in this dataset. SIPPER data can be found in dataset R1.x130.000:0002 "Application of image processing and machine learning techniques to distinguish suspected oil droplets from plankton and other particles for the SIPPER imaging system". On a dataset that contained images of plankton with inadvertent noise, the new algorithm was able to detect all incorrect samples in the class of interest by reviewing only 5% of the data. Thus, the described approach helps to significantly reduce the effort needed to remove label-noise from data. Data published in: Felatyev, S., M. Shreve, K. Kramer, L. Hall, D. Goldgof, R. Kasturi, K. Daly, A. Remsen, H. Bunke. 2012. Label-Noise Reduction with Support Vector Machines. International Conference on Pattern Recognition (ICPR), November 2012, Tsukuba Science City, Japan.
German Sign Language (DGS) Alphabet
kaggle.com
Updated Jan 20, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Moritz Kronberger (2022). German Sign Language (DGS) Alphabet [Dataset]. https://www.kaggle.com/datasets/moritzkronberger/german-sign-language
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jan 20, 2022
Dataset provided by
Kaggle
Authors
Moritz Kronberger
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This dataset was created to train a neural network for real time sign detection, which would be used as automated feedback for a learning application. The dataset ist based on the normalized hand landmark vectors provided by mediapipe's handpose in order to make the trained NN invariant to lighting situations or skin colors, which could not be represented in a diverse enough fashion in the dataset.

The dataset is therefore designed to train a NN which categorizes the MULTI_HAND_LANDMARK output of the handpose solution.

Content

The dataset contains 64 columns with the first column being the sample's label. All static signs (meaning signs not involving movement) of the German Sign language alphabet are represented as 24 classes ('a'-'y', excluding 'j').

All other columns represent the 21 linearized, three-dimensional hand landmarks provided by handpose in their normalized ([0.0, 1.0]) state.

In total the dataset contains ca. 7300 samples with at least 250 samples per class, recorded by 7 different non-native signers.

The dataset is purely made up of recorded samples and does not make use of data augmentation.

Acknowledgements

This dataset was inspired by the desire to create a German version of the Sign Language MNIST dataset with a stronger focus on practical applicability.

Inspiration

Our team is interested in providing a foundation for all kinds of practical applications involving sign language recognition. As with our own work, we appreciate a focus on applications challenging non-signers to engage with sign language in a way that promotes inclusion.

Ethical considerations

We are aware of the ethical implications of such a dataset and encourage developers to seriously consider research on the ethics of machine learning and sign language to avoid harmful outcomes of well intended projects. For more information on this topic we recommend Bragg, D., Caselli, N., Hochgesang, J. A., Huenerfauth, M., Katz-Hernandez, L., Koller, O., Kushalnagar, R., Vogler, C., & Ladner, R. E. (2021). The FATE Landscape of Sign Language AI Datasets: An Interdisciplinary Perspective. In ACM Transactions on Accessible Computing (14th ed., Vol. 2, pp. 1-45). Association for Computing Machinery. 10.1145/3436996 as a starting point.
Fruits-360 dataset
kaggle.com
paperswithcode.com
+1more
Updated Jun 7, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Mihai Oltean (2025). Fruits-360 dataset [Dataset]. https://www.kaggle.com/datasets/moltean/fruits
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 7, 2025
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Mihai Oltean
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Fruits-360 dataset: A dataset of images containing fruits, vegetables, nuts and seeds

Version: 2025.06.07.0

Content

The following fruits, vegetables and nuts and are included: Apples (different varieties: Crimson Snow, Golden, Golden-Red, Granny Smith, Pink Lady, Red, Red Delicious), Apricot, Avocado, Avocado ripe, Banana (Yellow, Red, Lady Finger), Beans, Beetroot Red, Blackberry, Blueberry, Cabbage, Caju seed, Cactus fruit, Cantaloupe (2 varieties), Carambula, Carrot, Cauliflower, Cherimoya, Cherry (different varieties, Rainier), Cherry Wax (Yellow, Red, Black), Chestnut, Clementine, Cocos, Corn (with husk), Cucumber (ripened, regular), Dates, Eggplant, Fig, Ginger Root, Goosberry, Granadilla, Grape (Blue, Pink, White (different varieties)), Grapefruit (Pink, White), Guava, Hazelnut, Huckleberry, Kiwi, Kaki, Kohlrabi, Kumsquats, Lemon (normal, Meyer), Lime, Lychee, Mandarine, Mango (Green, Red), Mangostan, Maracuja, Melon Piel de Sapo, Mulberry, Nectarine (Regular, Flat), Nut (Forest, Pecan), Onion (Red, White), Orange, Papaya, Passion fruit, Peach (different varieties), Pepino, Pear (different varieties, Abate, Forelle, Kaiser, Monster, Red, Stone, Williams), Pepper (Red, Green, Orange, Yellow), Physalis (normal, with Husk), Pineapple (normal, Mini), Pistachio, Pitahaya Red, Plum (different varieties), Pomegranate, Pomelo Sweetie, Potato (Red, Sweet, White), Quince, Rambutan, Raspberry, Redcurrant, Salak, Strawberry (normal, Wedge), Tamarillo, Tangelo, Tomato (different varieties, Maroon, Cherry Red, Yellow, not ripened, Heart), Walnut, Watermelon, Zucchini (green and dark).

Branches

The dataset has 5 major branches:

-The 100x100 branch, where all images have 100x100 pixels. See _fruits-360_100x100_ folder.

-The original-size branch, where all images are at their original (captured) size. See _fruits-360_original-size_ folder.

-The meta branch, which contains additional information about the objects in the Fruits-360 dataset. See _fruits-360_dataset_meta_ folder.

-The multi branch, which contains images with multiple fruits, vegetables, nuts and seeds. These images are not labeled. See _fruits-360_multi_ folder.

-The _3_body_problem_ branch where the Training and Test folders contain different (varieties of) the 3 fruits and vegetables (Apples, Cherries and Tomatoes). See _fruits-360_3-body-problem_ folder.

How to cite

Mihai Oltean, Fruits-360 dataset, 2017-

Dataset properties

For the 100x100 branch

Total number of images: 138704.

Training set size: 103993 images.

Test set size: 34711 images.

Number of classes: 206 (fruits, vegetables, nuts and seeds).

Image size: 100x100 pixels.

For the original-size branch

Total number of images: 58363.

Training set size: 29222 images.

Validation set size: 14614 images

Test set size: 14527 images.

Number of classes: 90 (fruits, vegetables, nuts and seeds).

Image size: various (original, captured, size) pixels.

For the 3-body-problem branch

Total number of images: 47033.

Training set size: 34800 images.

Test set size: 12233 images.

Number of classes: 3 (Apples, Cherries, Tomatoes).

Number of varieties: Apples = 29; Cherries = 12; Tomatoes = 19.

Image size: 100x100 pixels.

For the meta branch

Number of classes: 26 (fruits, vegetables, nuts and seeds).

For the multi branch

Number of images: 150.

Filename format:

For the 100x100 branch

image_index_100.jpg (e.g. 31_100.jpg) or

r_image_index_100.jpg (e.g. r_31_100.jpg) or

r?_image_index_100.jpg (e.g. r2_31_100.jpg)

where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis. "100" comes from image size (100x100 pixels).

Different varieties of the same fruit (apple, for instance) are stored as belonging to different classes.

For the original-size branch

r?_image_index.jpg (e.g. r2_31.jpg)

where "r" stands for rotated fruit. "r2" means that the fruit was rotated around the 3rd axis.

The name of the image files in the new version does NOT contain the "_100" suffix anymore. This will help you to make the distinction between the original-size branch and the 100x100 branch.

For the multi branch

The file's name is the concatenation of the names of the fruits inside that picture.

Alternate download

The Fruits-360 dataset can be downloaded from:

Kaggle https://www.kaggle.com/moltean/fruits

GitHub https://github.com/fruits-360

How fruits were filmed

Fruits and vegetables were planted in the shaft of a low-speed motor (3 rpm) and a short movie of 20 seconds was recorded.

A Logitech C920 camera was used for filming the fruits. This is one of the best webcams available.

Behind the fruits, we placed a white sheet of paper as a background.

Here i...
f
Data_Sheet_1_Back-Propagation Learning in Deep Spike-By-Spike Networks.pdf
frontiersin.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David Rotermund; Klaus R. Pawelzik (2023). Data_Sheet_1_Back-Propagation Learning in Deep Spike-By-Spike Networks.pdf [Dataset]. http://doi.org/10.3389/fncom.2019.00055.s001
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.3389/fncom.2019.00055.s001
Dataset updated
Jun 1, 2023
Dataset provided by
Frontiers
Authors
David Rotermund; Klaus R. Pawelzik
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Artificial neural networks (ANNs) are important building blocks in technical applications. They rely on noiseless continuous signals in stark contrast to the discrete action potentials stochastically exchanged among the neurons in real brains. We propose to bridge this gap with Spike-by-Spike (SbS) networks which represent a compromise between non-spiking and spiking versions of generative models. What is missing, however, are algorithms for finding weight sets that would optimize the output performances of deep SbS networks with many layers. Here, a learning rule for feed-forward SbS networks is derived. The properties of this approach are investigated and its functionality is demonstrated by simulations. In particular, a Deep Convolutional SbS network for classifying handwritten digits achieves a classification performance of roughly 99.3% on the MNIST test data when the learning rule is applied together with an optimizer. Thereby it approaches the benchmark results of ANNs without extensive parameter optimization. We envision this learning rule for SBS networks to provide a new basis for research in neuroscience and for technical applications, especially when they become implemented on specialized computational hardware.
Scaled and Translated Image Recognition (STIR) Source Data
zenodo.org
Updated Nov 24, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Thomas Altstidl; Thomas Altstidl; An Nguyen; An Nguyen; Leo Schwinn; Leo Schwinn; Franz Köferl; Franz Köferl; Christopher Mutschler; Christopher Mutschler; Björn Eskofier; Björn Eskofier; Dario Zanca; Dario Zanca (2022). Scaled and Translated Image Recognition (STIR) Source Data [Dataset]. http://doi.org/10.5281/zenodo.7351726
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.7351726
Dataset updated
Nov 24, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Thomas Altstidl; Thomas Altstidl; An Nguyen; An Nguyen; Leo Schwinn; Leo Schwinn; Franz Köferl; Franz Köferl; Christopher Mutschler; Christopher Mutschler; Björn Eskofier; Björn Eskofier; Dario Zanca; Dario Zanca
Description
While convolutions are known to be invariant to (discrete) translations, scaling continues to be a challenge and most image recognition networks are not invariant to them. To explore these effects, we have created the Scaled and Translated Image Recognition (STIR) dataset. This dataset contains objects of size $s \in [17,64]$, each randomly placed in a $64 \times 64$ pixel image.

Original Source Data

dota/ (from DOTA v1.5 Google Drive website)

train/

DOTA-v1.5_train.zip not unzipped

part1.zip not unzipped

part2.zip not unzipped

part3.zip not unzipped

val/

DOTA-v1.5_val.zip not unzipped

part1.zip not unzipped

fontawesome/ (from Font Awesome 5.15.3 "Free for Desktop")

svgs/ unzipped from archive

mapillary/ (from Mapillary Traffic Sign Dataset)

mtsd_v2_fully_annotated unzipped from archive

train.0.zip not unzipped

train.1.zip not unzipped

train.2.zip not unzipped

val.zip not unzipped

mnist/ (from Yann LeCun website)

t10k-images-idx3-ubyte.gz

t10k-labels-idx1-ubyte.gz

train-images-idx3-ubyte.gz

train-labels-idx1-ubyte.gz

License and Attribution

When using the original source data for your own research, please respect the individual licenses. For attribution in papers, we recommend the following citations which introduce the respective datasets.

D. Gandy, J. Otero, E. Emanuel, F. Botsford, J. Lundien, K. Jackson, M. Wilkerson, R. Madole, J. Raphael, T. Chase, G. Taglialatela, B. Talbot, and T. Chase. Font Awesome. https://fontawesome.com/v5/download, Nov. 2022.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86(11):2278–2324, Nov. 1998.

C. Ertler, J. Mislej, T. Ollmann, L. Porzi, G. Neuhold, and Y. Kuang. The Mapillary Traffic Sign Dataset for Detection and Classification on a Global Scale. In 2020 16th Eur. Conf. Comput. Vision (ECCV), Glasgow, UK, Aug. 2020.

G.-S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, M. Datcu, M. Pelillo, and L. Zhang. DOTA: A Large-Scale Dataset for Object Detection in Aerial Images. In 2018 IEEE/CVF Conf. Comput. Vision and Pattern Recognition (CVPR), pages 3974–3983, Salt Lake City, UT, USA, June 2018.
Rescaled CIFAR-10 dataset
zenodo.org
Updated Jun 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg (2025). Rescaled CIFAR-10 dataset [Dataset]. http://doi.org/10.5281/zenodo.15188748
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15188748
Dataset updated
Jun 27, 2025
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Andrzej Perzanowski; Andrzej Perzanowski; Tony Lindeberg; Tony Lindeberg
Description
Motivation

The goal of introducing the Rescaled CIFAR-10 dataset is to provide a dataset that contains scale variations (up to a factor of 4), to evaluate the ability of networks to generalise to scales not present in the training data.

The Rescaled CIFAR-10 dataset was introduced in the paper:

[1] A. Perzanowski and T. Lindeberg (2025) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, Journal of Mathematical Imaging and Vision, 67(29), https://doi.org/10.1007/s10851-025-01245-x.

with a pre-print available at arXiv:

[2] Perzanowski and Lindeberg (2024) "Scale generalisation properties of extended scale-covariant and scale-invariant Gaussian derivative networks on image datasets with spatial scaling variations”, arXiv preprint arXiv:2409.11140.

Importantly, the Rescaled CIFAR-10 dataset contains substantially more natural textures and patterns than the MNIST Large Scale dataset, introduced in:

[3] Y. Jansson and T. Lindeberg (2022) "Scale-invariant scale-channel networks: Deep networks that generalise to previously unseen scales", Journal of Mathematical Imaging and Vision, 64(5): 506-536, https://doi.org/10.1007/s10851-022-01082-2

and is therefore significantly more challenging.

Access and rights

The Rescaled CIFAR-10 dataset is provided on the condition that you provide proper citation for the original CIFAR-10 dataset:

[4] Krizhevsky, A. and Hinton, G. (2009). Learning multiple layers of features from tiny images. Tech. rep., University of Toronto.

and also for this new rescaled version, using the reference [1] above.

The data set is made available on request. If you would be interested in trying out this data set, please make a request in the system below, and we will grant you access as soon as possible.

The dataset

The Rescaled CIFAR-10 dataset is generated by rescaling 32×32 RGB images of animals and vehicles from the original CIFAR-10 dataset [4]. The scale variations are up to a factor of 4. In order to have all test images have the same resolution, mirror extension is used to extend the images to size 64x64. The imresize() function in Matlab was used for the rescaling, with default anti-aliasing turned on, and bicubic interpolation overshoot removed by clipping to the [0, 255] range. The details of how the dataset was created can be found in [1].

There are 10 distinct classes in the dataset: “airplane”, “automobile”, “bird”, “cat”, “deer”, “dog”, “frog”, “horse”, “ship” and “truck”. In the dataset, these are represented by integer labels in the range [0, 9].

The dataset is split into 40 000 training samples, 10 000 validation samples and 10 000 testing samples. The training dataset is generated using the initial 40 000 samples from the original CIFAR-10 training set. The validation dataset, on the other hand, is formed from the final 10 000 image batch of that same training set. For testing, all test datasets are built from the 10 000 images contained in the original CIFAR-10 test set.

The h5 files containing the dataset

The training dataset file (~5.9 GB) for scale 1, which also contains the corresponding validation and test data for the same scale, is:

cifar10_with_scale_variations_tr40000_vl10000_te10000_outsize64-64_scte1p000_scte1p000.h5

Additionally, for the Rescaled CIFAR-10 dataset, there are 9 datasets (~1 GB each) for testing scale generalisation at scales not present in the training set. Each of these datasets is rescaled using a different image scaling factor, 2^k/4, with k being integers in the range [-4, 4]:

cifar10_with_scale_variations_te10000_outsize64-64_scte0p500.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p595.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p707.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte0p841.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p000.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p189.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p414.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte1p682.h5
cifar10_with_scale_variations_te10000_outsize64-64_scte2p000.h5

These dataset files were used for the experiments presented in Figures 9, 10, 15, 16, 20 and 24 in [1].

Instructions for loading the data set

The datasets are saved in HDF5 format, with the partitions in the respective h5 files named as
('/x_train', '/x_val', '/x_test', '/y_train', '/y_test', '/y_val'); which ones exist depends on which data split is used.

The training dataset can be loaded in Python as:

with h5py.File(`

x_train = np.array( f["/x_train"], dtype=np.float32)
x_val = np.array( f["/x_val"], dtype=np.float32)
x_test = np.array( f["/x_test"], dtype=np.float32)
y_train = np.array( f["/y_train"], dtype=np.int32)
y_val = np.array( f["/y_val"], dtype=np.int32)
y_test = np.array( f["/y_test"], dtype=np.int32)

We also need to permute the data, since Pytorch uses the format [num_samples, channels, width, height], while the data is saved as [num_samples, width, height, channels]:

x_train = np.transpose(x_train, (0, 3, 1, 2))
x_val = np.transpose(x_val, (0, 3, 1, 2))
x_test = np.transpose(x_test, (0, 3, 1, 2))

The test datasets can be loaded in Python as:

with h5py.File(`

x_test = np.array( f["/x_test"], dtype=np.float32)
y_test = np.array( f["/y_test"], dtype=np.int32)

The test datasets can be loaded in Matlab as:

x_test = h5read(`

The images are stored as [num_samples, x_dim, y_dim, channels] in HDF5 files. The pixel intensity values are not normalised, and are in a [0, 255] range.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

Laurae (2018). MNIST R RDS [Dataset]. https://www.kaggle.com/laurae2/mnistrds

MNIST R RDS

Explore at:

zip(20508114 bytes)Available download formats

Dataset updated

Mar 27, 2018

Authors

Laurae

Description

Dataset

This dataset was created by Laurae

It contains the following files:

Clear search

Close search

Google apps

Main menu

MNIST R RDS

Dataset

Contents

Data from: Mechanical MNIST – Distribution Shift Dataset

MNIST-DIA-R

MNIST Multiview Datasets Dataset

Rescaled Fashion-MNIST dataset

Motivation

Access and rights

The dataset

The h5 files containing the dataset

Instructions for loading the data set

Free Spoken Digit Dataset (FSDD)

Context

Content

Acknowledgements

Inspiration

Safran-MNIST-DLS

Rescaled Fashion-MNIST with translations dataset

Motivation

Access and rights

The dataset

The h5 files containing the dataset

Instructions for loading the data set

MultNIST Dataset

Data from: Mechanical MNIST Crack Path Dataset

Dataset for training the Surrogate Model of microlaser neurons on the...

Data from: Label-noise reduction with support vector machines

German Sign Language (DGS) Alphabet

Context

Content

Acknowledgements

Inspiration

Ethical considerations

Fruits-360 dataset

Fruits-360 dataset: A dataset of images containing fruits, vegetables, nuts and seeds

Version: 2025.06.07.0

Content

Branches

How to cite

Dataset properties

For the 100x100 branch

For the original-size branch

For the 3-body-problem branch

For the meta branch

For the multi branch

Filename format:

For the 100x100 branch

For the original-size branch

For the multi branch

Alternate download

How fruits were filmed

Data_Sheet_1_Back-Propagation Learning in Deep Spike-By-Spike Networks.pdf

Scaled and Translated Image Recognition (STIR) Source Data

Rescaled CIFAR-10 dataset

Motivation

Access and rights

The dataset

The h5 files containing the dataset

Instructions for loading the data set

MNIST R RDS

Dataset

Contents