The MNIST database (Modified National Institute of Standards and Technology database) is a large collection of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of the larger NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students), which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image. The original black and white (bilevel) images from NIST were size-normalized to fit in a 20x20 pixel box while preserving their aspect ratio. The resulting images contain grey levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were then centered in a 28x28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28x28 field.
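The centering step described above can be sketched as follows. This is a minimal illustration, not the original NIST/MNIST preprocessing code: the function name `center_by_mass`, the whole-pixel rounding, and the clamping that keeps the digit inside the field are all assumptions made for this example.

```python
import numpy as np

def center_by_mass(digit, field=28):
    """Place a small grayscale digit in a field x field image so that its
    center of mass lands (to the nearest pixel) at the center of the field."""
    digit = np.asarray(digit, dtype=float)
    total = digit.sum()
    if total == 0:
        raise ValueError("an empty image has no center of mass")

    # Intensity-weighted mean row and column index of the digit.
    rows, cols = np.indices(digit.shape)
    com_r = (rows * digit).sum() / total
    com_c = (cols * digit).sum() / total

    # Center of the field in pixel-index coordinates (13.5 for 28 pixels).
    center = (field - 1) / 2.0

    # Top-left offset that moves the center of mass onto the field center,
    # rounded to whole pixels and clamped so the digit stays inside the field.
    top = int(round(center - com_r))
    left = int(round(center - com_c))
    top = min(max(top, 0), field - digit.shape[0])
    left = min(max(left, 0), field - digit.shape[1])

    out = np.zeros((field, field))
    out[top:top + digit.shape[0], left:left + digit.shape[1]] = digit
    return out

# Usage: a 20x20 box whose mass sits off-center.
box = np.zeros((20, 20))
box[6:10, 10:14] = 1.0
centered = center_by_mass(box)
```

Because the shift is rounded to whole pixels, the center of mass ends up within half a pixel of the field center; a sub-pixel translation would need interpolation.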
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
In recent years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. It follows that a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches to (i) analyze models, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47,360 unique neural network models, resulting in over 2,415,360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned above.
Dataset
This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from Fashion-MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.
This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "fmnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.
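To illustrate what "reading a vectorized model" can involve, the sketch below splits a flat weight vector back into named parameter tensors using an index. The actual layout of index_dict.json is not described here, so the keys ("start", "shape"), the layer names, and the helper `unflatten` are hypothetical stand-ins; consult the code at www.modelzoos.cc for the real format.

```python
import json
import numpy as np

# Hypothetical index (NOT the real index_dict.json schema): for each
# parameter tensor, the start offset in the flat vector and its shape.
index_json = json.dumps({
    "conv1.weight": {"start": 0, "shape": [2, 1, 3, 3]},
    "conv1.bias": {"start": 18, "shape": [2]},
    "fc.weight": {"start": 20, "shape": [10, 4]},
    "fc.bias": {"start": 60, "shape": [10]},
})

def unflatten(vector, index):
    """Slice a flat weight vector into named, correctly shaped arrays."""
    params = {}
    for name, entry in index.items():
        size = int(np.prod(entry["shape"]))
        start = entry["start"]
        params[name] = vector[start:start + size].reshape(entry["shape"])
    return params

index = json.loads(index_json)
vector = np.arange(70, dtype=np.float32)  # stand-in for one vectorized model
params = unflatten(vector, index)
```

Storing offsets and shapes alongside the vectors lets a whole zoo be kept as one 2-D array (models by weights) while still allowing per-layer access.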
For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.
This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "mnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.
For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset for magnetic particle imaging based on the MNIST dataset.
This dataset contains simulated MPI measurements along with ground truth phantoms selected from the MNIST database of handwritten digits (https://yann.lecun.com/exdb/mnist/). A state-of-the-art model-based system matrix is used to simulate the MPI measurements of the MNIST phantoms. These measurements are equipped with noise perturbations captured by the preclinical MPI system (Bruker, Ettlingen, Germany). The dataset can be utilized in its provided form, while additional data is included to offer flexibility for creating customized versions.
MPI-MNIST features four different system matrices, each available in three spatial resolutions. The provided data is generated using a specified system matrix at the highest spatial resolution. Reconstruction operations can be performed by using any of the provided system matrices at a lower resolution. This setup allows for simulating reconstructions from either an exact or an inexact forward operator. To cover further operator deviation setups, we provide additional noise data for the application of pixelwise noise to the reconstruction system matrix.
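The difference between reconstructing with an exact and an inexact forward operator can be sketched on synthetic data. Everything below is an assumption made for illustration: random real-valued Gaussian matrices stand in for the (complex-valued) MPI system matrices, the inexact operator is an entrywise perturbation of the exact one, and Tikhonov regularization is chosen as a simple reconstruction method; it is not claimed to be the method used for the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_meas, n_pix = 400, 255  # 255 = 15 x 17 coarse grid; n_meas is arbitrary

# Stand-ins: the operator that generated the data, and a perturbed copy.
A_exact = rng.standard_normal((n_meas, n_pix))
A_inexact = A_exact + 0.05 * rng.standard_normal((n_meas, n_pix))

# Synthetic phantom with values in [0, 10] and a noisy measurement of it.
x_true = rng.uniform(0.0, 10.0, size=n_pix)
y = A_exact @ x_true + 0.1 * rng.standard_normal(n_meas)

def tikhonov(A, y, lam):
    """Solve min_x ||A x - y||^2 + lam * ||x||^2 via the normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

def rel_err(x):
    """Relative reconstruction error with respect to the true phantom."""
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

err_exact = rel_err(tikhonov(A_exact, y, lam=1.0))
err_inexact = rel_err(tikhonov(A_inexact, y, lam=1.0))
print(f"exact operator:   {err_exact:.4f}")
print(f"inexact operator: {err_inexact:.4f}")
```

In this toy setting the operator mismatch acts like additional measurement noise, so the reconstruction with the perturbed operator is noticeably worse.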
To support the development of learning-based methods, a large number of additional noise samples, captured by the Bruker scanner, are provided.
For a detailed description of the dataset, see arxiv.org/abs/2501.05583.
The Python-based GitHub repository available at https://github.com/meiraiske/MPI-MNIST can be used for downloading the data from this website and preparing it for project use, which includes an integration to PyTorch or PyTorch Lightning modules.
File Structure
All data, except for the phantoms, is provided in the MDF file format. This format is specifically tailored to store MPI data and contains metadata corresponding to the experimental setup. The ground truth phantoms are provided as HDF5 files since they do not require any metadata.
SM: Contains twelve system matrices named SM_{physical model}_{resolution}.mdf. It covers four physical models given in three resolutions ('coarse', 'int' and 'fine'). The highest resolution ('fine') is used for data generation.
large_noise: Contains large_NoiseMeas.mdf with 390060 noise measurements. Each noise measurement has been averaged over ten empty scanner measurements. This can be used, e.g., for learning-based methods.
For dataset in ['train', 'test']:
{dataset}_noise: Contains four noise matrices, where each noise measurement has been averaged over ten empty scanner measurements:
  1. NoiseMeas_phantom_{dataset}.mdf: Additive measurement noise for simulated measurements.
  2. NoiseMeas_phantom_bg_{dataset}.mdf: Unused noise reserved for background correction of 1.
  3. NoiseMeas_SM_{dataset}.mdf: System matrix noise that can be applied to each pixel of the reconstruction system matrix.
  4. NoiseMeas_SM_bg_{dataset}.mdf: Unused noise reserved for background correction of 3.
{dataset}_gt: Contains {dataset}_gt.hdf5 with flattened and preprocessed ground truth MNIST phantoms given in coarse resolution (15x17=255 pixels) with pixel values in [0, 10].
{dataset}_obs: Contains {dataset}_obs.mdf with noise-free simulated measurements (observations) of {dataset}_gt.hdf5 using the system matrix stored in SM_fluid_opt_fine.mdf.
{dataset}_obsnoisy: Contains {dataset}_obsnoisy.mdf with noisy simulated measurements, resulting from {dataset}_obs.mdf and {dataset}_phantom_noise.mdf.
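The relation between the noise-free observations, the averaged noise measurements, and the noisy observations can be sketched on synthetic arrays. The sizes, the i.i.d. Gaussian noise model, and the unscaled addition are assumptions made for this illustration; real scanner noise need not be Gaussian, and reading the MDF files themselves is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_freq = 2000, 64  # hypothetical sizes, not those of the dataset

# Ten empty-scanner measurements per sample, averaged into one noise sample.
empty_scans = rng.standard_normal((n_samples, 10, n_freq))
noise = empty_scans.mean(axis=1)

# Stand-in for noise-free observations; noisy ones add the averaged noise.
obs = rng.standard_normal((n_samples, n_freq))
obsnoisy = obs + noise

# Averaging ten independent scans lowers the noise standard deviation
# by a factor of sqrt(10) under the i.i.d. assumption.
print(empty_scans.std(), noise.std())
```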
In line with MNIST, each MDF/HDF5 file in {dataset}_gt, {dataset}_obs, {dataset}_obsnoisy for dataset in ['train', 'test'] contains 60000 samples for 'train' and 10000 samples for 'test'. The data can be manually reproduced in the intermediate resolution (45x51=2295 pixels) from the files in this dataset by using the system matrices in intermediate ('int') resolution for reconstruction and upsampling the ground truth phantoms by a factor of 3 per dimension. This case is also implemented in the GitHub repository.
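Upsampling a coarse phantom from 15x17 to 45x51 pixels can be sketched as below. The row-major reshape to (15, 17) and the nearest-neighbour replication via `np.kron` are assumptions made for this example; the repository may order the pixels differently or use another interpolation scheme.

```python
import numpy as np

def upsample_phantom(flat, shape=(15, 17), factor=3):
    """Reshape a flattened phantom, repeat every pixel factor x factor
    times (nearest-neighbour upsampling), and flatten the result again."""
    img = np.asarray(flat).reshape(shape)
    up = np.kron(img, np.ones((factor, factor), dtype=img.dtype))
    return up.reshape(-1)

# Usage: 255 coarse pixels become 2295 intermediate-resolution pixels.
coarse = np.arange(255)
fine = upsample_phantom(coarse)
print(fine.size)
```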
The PDF file MPI-MNIST_Metadata.pdf contains a list of meta information for each of the MDF files of this dataset.