47 datasets found

Model Zoo: A Dataset of Diverse Populations of Neural Network Models -...
data.niaid.nih.gov
zenodo.org
Updated Jun 13, 2022
Cite
Borth, Damian (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - Fashion-MNIST [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6632104
Dataset updated
Jun 13, 2022
Dataset provided by
Taskiran, Diyar
Borth, Damian
Giró-i-Nieto, Xavier
Knyazev, Boris
Schürholt, Konstantin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

In recent years, neural networks have evolved from laboratory environments to the state-of-the-art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches to (i) analyze models, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research about populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47’360 unique neural network models, resulting in over 2’415’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.

Dataset

This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from Fashion-MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "fmnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.

For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
MNIST Data for Digit Recognition
kaggle.com
Updated Dec 22, 2017
Cite
Sylvia Mittal (2017). MNIST Data for Digit Recognition [Dataset]. https://www.kaggle.com/sylvia23/mnist-data-for-digit-recognation/metadata
Dataset updated
Dec 22, 2017
Dataset provided by
Kaggle http://kaggle.com/
Authors
Sylvia Mittal
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
This dataset contains training and testing data for digit recognition, consisting of images of handwritten digits.

It contains four zip files which you can easily include in your neural network. Download all four of them by clicking the "Download all" button.

This is the MNIST dataset, used worldwide to check the performance of neural networks on digit recognition.

It also contains training and testing labels.
N-MNIST Dataset
paperswithcode.com
Updated Mar 31, 2023
Cite
(2023). N-MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/n-mnist
Dataset updated
Mar 31, 2023
Description
Brief Description

The Neuromorphic-MNIST (N-MNIST) dataset is a spiking version of the original frame-based MNIST dataset. It consists of the same 60 000 training and 10 000 testing samples as the original MNIST dataset, and is captured at the same visual scale as the original MNIST dataset (28x28 pixels). The N-MNIST dataset was captured by mounting the ATIS sensor on a motorized pan-tilt unit and having the sensor move while it views MNIST examples on an LCD monitor as shown in this video. A full description of the dataset and how it was created can be found in the paper below. Please cite this paper if you make use of the dataset.

Orchard, G.; Cohen, G.; Jayawant, A.; and Thakor, N. "Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades", Frontiers in Neuroscience, vol. 9, no. 437, Oct. 2015
MNIST
datasets.activeloop.ai
deeplake
Cite
Yann LeCun, MNIST [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/mnist/
Available download formats: deeplake
Authors
Yann LeCun
License
Attribution-ShareAlike 3.0 (CC BY-SA 3.0) https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Time period covered
Jan 1, 1998 - Dec 31, 2000
Area covered
Earth
Dataset funded by
AT&T Bell Labs
Description
The MNIST dataset is a dataset of handwritten digits. It is a popular dataset for machine learning and artificial intelligence research. The dataset consists of 60,000 training images and 10,000 test images. Each image is a 28x28 pixel grayscale image of a handwritten digit. The digits are labeled from 0 to 9.
Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST...
zenodo.org
data.niaid.nih.gov
bin, json, zip
Updated Jun 13, 2022
Cite
Konstantin Schürholt; Diyar Taskiran; Boris Knyazev; Xavier Giró-i-Nieto; Damian Borth (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST [Dataset]. http://doi.org/10.5281/zenodo.6632087
Available download formats: zip, json, bin
Unique identifier
https://doi.org/10.5281/zenodo.6632087
Dataset updated
Jun 13, 2022
Dataset provided by
Zenodo http://zenodo.org/
Authors
Konstantin Schürholt; Diyar Taskiran; Boris Knyazev; Xavier Giró-i-Nieto; Damian Borth
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

In recent years, neural networks have evolved from laboratory environments to the state-of-the-art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches to (i) analyze models, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research about populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47’360 unique neural network models, resulting in over 2’415’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.

Dataset

This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "mnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.

For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.

MNIST IDX Dataset- Fasion

kaggle.com

Updated May 21, 2025

Cite

ShreyaSuresh (2025). MNIST IDX Dataset- Fasion [Dataset]. https://www.kaggle.com/datasets/shreyasuresh0407/mnist-idx-dataset-fasion

Dataset updated

May 21, 2025

Dataset provided by

Kaggle http://kaggle.com/

Authors

ShreyaSuresh

Description

📦 About the Dataset

This project uses a classic machine learning dataset of handwritten digits — the MNIST dataset — stored in IDX format.

🧠 Each image is a 28x28 pixel grayscale picture of a handwritten number from 0 to 9. Your task is to teach a simple neural network (your "brain") to recognize these digits.

🔍 What’s Inside?

File Name	Description
`train-images-idx3-ubyte`	🖼️ 60,000 training images (28x28 pixels each)
`train-labels-idx1-ubyte`	🔢 Labels (0–9) for each training image
`t10k-images-idx3-ubyte`	🖼️ 10,000 test images
`t10k-labels-idx1-ubyte`	🔢 Labels (0–9) for test images

All files are in the IDX binary format, which is compact and fast for loading, but needs to be parsed using a small Python function (see below 👇).
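
The parsing function mentioned above is not reproduced on this page. A minimal sketch of one is given here, assuming the standard IDX layout: two zero bytes, a one-byte type code, a one-byte dimension count, then one big-endian 32-bit size per dimension, followed by the raw data. The names `parse_idx` and `load_idx` are illustrative and not part of the dataset.

```python
import struct

import numpy as np

# IDX type codes (third header byte) mapped to big-endian NumPy dtypes.
_IDX_DTYPES = {
    0x08: ">u1",  # unsigned byte (used by the MNIST image and label files)
    0x09: ">i1",  # signed byte
    0x0B: ">i2",  # 16-bit integer
    0x0C: ">i4",  # 32-bit integer
    0x0D: ">f4",  # 32-bit float
    0x0E: ">f8",  # 64-bit float
}


def parse_idx(raw: bytes) -> np.ndarray:
    """Decode the full contents of an IDX file into a NumPy array."""
    zeros, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zeros != 0 or type_code not in _IDX_DTYPES:
        raise ValueError("not a valid IDX header")
    header_size = 4 + 4 * ndim
    shape = struct.unpack(f">{ndim}I", raw[4:header_size])
    data = np.frombuffer(raw, dtype=_IDX_DTYPES[type_code], offset=header_size)
    return data.reshape(shape)


def load_idx(path: str) -> np.ndarray:
    """Read an uncompressed IDX file from disk."""
    with open(path, "rb") as handle:
        return parse_idx(handle.read())
```

Under these assumptions, `load_idx("train-images-idx3-ubyte")` should yield an array of shape (60000, 28, 28) and `load_idx("train-labels-idx1-ubyte")` an array of shape (60000,). Adding a channel axis with `images[:, None]` gives a (N, 1, 28, 28) layout, which would match the `train_images[i][0]` indexing used in the plotting cell.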

✨ Why This Dataset Is Awesome

🎯 It's the “Hello World” of machine learning — perfect for beginners
📊 Ideal for testing image classification algorithms
🧠 Helps you learn how neural networks "see" numbers
💥 Small enough to train quickly, powerful enough to learn real skills

🧩 Sample Image

(Add this cell below in your notebook to visualize a few images)

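import matplotlib.pyplot as plt

# Show the first 10 images.
# Assumes train_images has shape (N, 1, 28, 28) and train_labels has shape (N,),
# both loaded beforehand from the IDX files (NumPy arrays or PyTorch tensors).
fig, axes = plt.subplots(1, 10, figsize=(15, 2))
for i in range(10):
    axes[i].imshow(train_images[i][0], cmap="gray")
    axes[i].set_title(f"Label: {train_labels[i].item()}")
    axes[i].axis("off")
plt.show()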

Discretized MNIST for Digital Circuits and Neural Networks based on...
figshare.com
zip
Updated Jan 13, 2023
Cite
Ahmed Agiza (2023). Discretized MNIST for Digital Circuits and Neural Networks based on Acid-Base Chemistry implemented by Robotic Fluid Handling [Dataset]. http://doi.org/10.6084/m9.figshare.21753545.v4
Available download formats: zip
Unique identifier
https://doi.org/10.6084/m9.figshare.21753545.v4
Dataset updated
Jan 13, 2023
Dataset provided by
Figshare http://figshare.com/
Authors
Ahmed Agiza
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
A discretized (binary and 3-bit) version of the MNIST dataset for the Nature Communications paper "Digital Circuits and Neural Networks based on Acid-Base Chemistry implemented by Robotic Fluid Handling".
MNIST-224by224-train-test-dataset
kaggle.com
Updated Nov 26, 2021
Cite
DHRUV Desh (2021). MNIST-224by224-train-test-dataset [Dataset]. https://www.kaggle.com/dhruvdesh/mnist224by224testdataset/code
Dataset updated
Nov 26, 2021
Dataset provided by
Kaggle http://kaggle.com/
Authors
DHRUV Desh
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

I needed a 224 by 224 version of the MNIST dataset for one of my projects, so I made this.

Content

The dataset has 35 files in idx3-ubyte format, each holding 2000 images of dimension 224x224. Five of these are test data files and 30 are train data files.
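
The file counts are consistent with the usual MNIST split, and 224 is exactly 8 times 28. The description does not say how the images were resized, so the sketch below uses nearest-neighbour repetition by a factor of 8 purely as an assumption; `upscale_nearest` is a hypothetical helper, not the author's code.

```python
import numpy as np

IMAGES_PER_FILE = 2000
TRAIN_FILES, TEST_FILES = 30, 5

# Totals implied by the file layout described above.
train_total = TRAIN_FILES * IMAGES_PER_FILE  # 60000
test_total = TEST_FILES * IMAGES_PER_FILE    # 10000


def upscale_nearest(images: np.ndarray, factor: int = 8) -> np.ndarray:
    """Repeat every pixel `factor` times along both spatial axes of (N, H, W)."""
    return images.repeat(factor, axis=1).repeat(factor, axis=2)


# A dummy batch of 28x28 images becomes a batch of 224x224 images.
batch = np.zeros((4, 28, 28), dtype=np.uint8)
big = upscale_nearest(batch)
```

An interpolating resize (for example bilinear) would give smoother strokes; which variant the dataset actually used is not stated.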

Acknowledgements

The actual MNIST dataset creators
Data_Sheet_1_Supervised Learning With First-to-Spike Decoding in Multilayer...
frontiersin.figshare.com
pdf
Updated May 30, 2023
Cite
Brian Gardner; André Grüning (2023). Data_Sheet_1_Supervised Learning With First-to-Spike Decoding in Multilayer Spiking Neural Networks.PDF [Dataset]. http://doi.org/10.3389/fncom.2021.617862.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fncom.2021.617862.s001
Dataset updated
May 30, 2023
Dataset provided by
Frontiers
Authors
Brian Gardner; André Grüning
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Experimental studies support the notion of spike-based neuronal information processing in the brain, with neural circuits exhibiting a wide range of temporally-based coding strategies to rapidly and efficiently represent sensory stimuli. Accordingly, it would be desirable to apply spike-based computation to tackling real-world challenges, and in particular transferring such theory to neuromorphic systems for low-power embedded applications. Motivated by this, we propose a new supervised learning method that can train multilayer spiking neural networks to solve classification problems based on a rapid, first-to-spike decoding strategy. The proposed learning rule supports multiple spikes fired by stochastic hidden neurons, and yet is stable by relying on first-spike responses generated by a deterministic output layer. In addition to this, we also explore several distinct, spike-based encoding strategies in order to form compact representations of presented input data. We demonstrate the classification performance of the learning rule as applied to several benchmark datasets, including MNIST. The learning rule is capable of generalizing from the data, and is successful even when used with constrained network architectures containing few input and hidden layer neurons. Furthermore, we highlight a novel encoding strategy, termed “scanline encoding,” that can transform image data into compact spatiotemporal patterns for subsequent network processing. Designing constrained, but optimized, network structures and performing input dimensionality reduction has strong implications for neuromorphic applications.
Model comparison results using MNIST-C and MNIST-C-shape datasets.
plos.figshare.com
xls
Updated Jun 13, 2024
Cite
Seoyoung Ahn; Hossein Adeli; Gregory J. Zelinsky (2024). Model comparison results using MNIST-C and MNIST-C-shape datasets. [Dataset]. http://doi.org/10.1371/journal.pcbi.1012159.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pcbi.1012159.t001
Dataset updated
Jun 13, 2024
Dataset provided by
PLOS Computational Biology
Authors
Seoyoung Ahn; Hossein Adeli; Gregory J. Zelinsky
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Recognition accuracy (means and standard deviations from 5 trained models, hereafter referred to as model “runs”) from ORA and two CNN baselines, both of which were trained using identical CNN encoders (one a 2-layer CNN and the other a Resnet-18), and a CapsNet model following the implementation in [51].
MNIST Self Drawn Test Numbers
kaggle.com
Updated Mar 7, 2023
Cite
Hilkar (2023). MNIST Self Drawn Test Numbers [Dataset]. https://www.kaggle.com/datasets/hilkar/mnist-self-drawn-test-numbers
Dataset updated
Mar 7, 2023
Dataset provided by
Kaggle http://kaggle.com/
Authors
Hilkar
Description
Self-drawn numbers for testing the performance of convolutional neural networks trained with the MNIST dataset.
Words MNIST
kaggle.com
Updated Jun 6, 2018
Cite
TusharPawar (2018). Words MNIST [Dataset]. https://www.kaggle.com/backalla/words-mnist/code
Dataset updated
Jun 6, 2018
Dataset provided by
Kaggle http://kaggle.com/
Authors
TusharPawar
License
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
Description
Context

This dataset was used to train the neural network during the development of OCR software. The original dataset has 1.2M images, and this dataset is a random sample of 10k images from it.

Content

This dataset is a mixed bag of images gathered from multiple sources:
1. Manually cropped and labelled images from natural scanned documents.
2. Synthetically generated images which look very similar to natural images to boost infrequent characters.
3. Data labelled using tesseract OCR software and manually checked for OCR errors.

Preview

https://image.ibb.co/g3ZLTT/12.jpg (label: "IT")
https://image.ibb.co/gnN78T/23.jpg (label: "oftener")
https://image.ibb.co/kHVS8T/49.png (label: "check")
https://image.ibb.co/iFpLTT/75.jpg (label: "Spor")
https://image.ibb.co/gMcUNo/104.jpg (label: "she>")
https://image.ibb.co/mrSUNo/116.jpg (label: "smirking")
https://image.ibb.co/eKUyF8/135.jpg (label: "for")
https://image.ibb.co/g8MOho/188.png (label: "(2)")

Details

Images are in raw format and do not have a specific size, so they may need to be resized for training. Only JPEG and PNG images are included. Character vocabulary: lowercase and uppercase English letters and special symbols.

Acknowledgements

This dataset would not have been possible without the contributions of all the manual labellers, data contributors and also the developers of tesseract OCR software which was used to label a portion of this dataset.

Inspiration

This data is portrayed as a successor of the very famous MNIST dataset. This is done to give a more challenging task to the beginners who have solved the MNIST dataset and are looking for a level 2.
Sparsified Model Zoo Twins: A Dataset of Sparsified Populations of Neural...
data.niaid.nih.gov
Updated Aug 28, 2022
Cite
Giró-i-Nieto, Xavier (2022). Sparsified Model Zoo Twins: A Dataset of Sparsified Populations of Neural Network Models - MNIST [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7023335
Dataset updated
Aug 28, 2022
Dataset provided by
Taskiran, Diyar
Borth, Damian
Giró-i-Nieto, Xavier
Knyazev, Boris
Schürholt, Konstantin
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Abstract

In recent years, neural networks have evolved from laboratory environments to the state-of-the-art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches to (i) analyze models, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research about populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 27 model zoos generated with varying hyperparameter combinations, and includes 50’360 unique neural network models, resulting in over 2’585’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.

Dataset

This dataset is part of a larger collection of model zoos and contains the sparsified twins of models trained on MNIST. The original population is made available at https://doi.org/10.5281/zenodo.6632086. Sparsification is done using Variational Dropout, starting from the last epoch of the original population. The zip file contains the sparsification trajectory for 25 epochs for all 1000 models. All zoos with extensive information and code can be found at www.modelzoos.cc.
Federated EMNIST Dataset
figshare.com
xz
Updated Jul 16, 2024
Cite
Saroj Mali (2024). Federated EMNIST Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.26308777.v1
Available download formats: xz
Unique identifier
https://doi.org/10.6084/m9.figshare.26308777.v1
Dataset updated
Jul 16, 2024
Dataset provided by
figshare
Authors
Saroj Mali
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset is derived from the Leaf repository (https://github.com/TalwalkarLab/leaf) pre-processing of the Extended MNIST dataset, grouping examples by writer. Details about Leaf were published in "LEAF: A Benchmark for Federated Settings" https://arxiv.org/abs/1812.01097

Note: This dataset does not include some additional preprocessing that MNIST includes, such as size-normalization and centering. In the Federated EMNIST data, the value of 1.0 corresponds to the background, and 0.0 corresponds to the color of the digits themselves; this is the inverse of some MNIST representations, e.g. in tensorflow_datasets, where 0 corresponds to the background color, and 255 represents the color of the digit.

Data set sizes:
only_digits=True: 3,383 users, 10 label classes
train: 341,873 examples
test: 40,832 examples
only_digits=False: 3,400 users, 62 label classes
train: 671,585 examples
test: 77,483 examples

Rather than holding out specific users, each user's examples are split across train and test so that all users have at least one example in train and one example in test. Writers that had less than 2 examples are excluded from the data set.

The tf.data.Datasets returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration, with the following keys and values, in lexicographic order by key:
'label': a tf.Tensor with dtype=tf.int32 and shape [1], the class label of the corresponding pixels. Labels [0-9] correspond to the digits classes, labels [10-35] correspond to the uppercase classes (e.g., label 11 is 'B'), and labels [36-61] correspond to the lowercase classes (e.g., label 37 is 'b').
'pixels': a tf.Tensor with dtype=tf.float32 and shape [28, 28], containing the pixels of the handwritten digit, with values in the range [0.0, 1.0].

Args
only_digits: (Optional) whether to only include examples that are from the digits [0-9] classes. If False, includes lower and upper case characters, for a total of 62 class labels.
cache_dir: (Optional) directory to cache the downloaded file. If None, caches in Keras' default cache directory.

Returns
Tuple of (train, test) where the tuple elements are tff.simulation.datasets.ClientData objects.
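
The label layout and the inverted pixel convention described above can be sketched in a few lines. This is an illustration only: `label_to_char` and `to_mnist_convention` are hypothetical helper names, and the code works on plain NumPy arrays rather than on the tf.Tensor objects the loader returns.

```python
import string

import numpy as np

# Class order as described: digits 0-9, then uppercase A-Z, then lowercase a-z.
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase


def label_to_char(label: int) -> str:
    """Map a class label in [0, 61] to the character it represents."""
    if not 0 <= label < len(CLASSES):
        raise ValueError(f"label out of range: {label}")
    return CLASSES[label]


def to_mnist_convention(pixels: np.ndarray) -> np.ndarray:
    """Convert floats in [0, 1] with 1.0 = background to uint8 with 0 = background."""
    return np.rint((1.0 - pixels) * 255).astype(np.uint8)
```

For example, `label_to_char(11)` gives 'B' and `label_to_char(37)` gives 'b', in line with the examples in the description.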
Neural Field Arena - Classification Dataset
paperswithcode.com
data.niaid.nih.gov
Updated Dec 15, 2023
Cite
Samuele Papa; Riccardo Valperga; David Knigge; Miltiadis Kofinas; Phillip Lippe; Jan-Jakob Sonke; Efstratios Gavves (2023). Neural Field Arena - Classification Dataset [Dataset]. https://paperswithcode.com/dataset/neural-field-arena-classification
Dataset updated
Dec 15, 2023
Authors
Samuele Papa; Riccardo Valperga; David Knigge; Miltiadis Kofinas; Phillip Lippe; Jan-Jakob Sonke; Efstratios Gavves
Description
Neural fields (NeFs) have recently emerged as a versatile method for modeling signals of various modalities, including images, shapes, and scenes. Subsequently, many works have explored the use of NeFs as representations for downstream tasks, e.g. classifying an image based on the parameters of a NeF that has been fit to it. However, the impact of the NeF hyperparameters on their quality as downstream representation is scarcely understood and remains largely unexplored. This is partly caused by the large amount of time required to fit datasets of neural fields.

Thanks to fit-a-nef, a JAX-based library that leverages parallelization to enable fast optimization of large-scale NeF datasets, we performed a comprehensive study that investigates the effects of different hyperparameters --including initialization, network architecture, and optimization strategies-- on fitting NeFs for downstream tasks. Based on the proposed library and our analysis, we propose Neural Field Arena, a benchmark consisting of neural field variants of popular vision datasets, including MNIST, CIFAR, variants of ImageNet, and ShapeNetv2. Our library and the Neural Field Arena will be open-sourced to introduce standardized benchmarking and promote further research on neural fields.

The datasets that are currently available are the following:

MNIST, SIREN
CIFAR10, SIREN
MicroImageNet, SIREN
ShapeNet, SIREN

More datasets will be added in the future.
Robustness assessment of a C++ implementation of a quantized (int8) version...
zenodo.org
data.niaid.nih.gov
zip
Updated Nov 22, 2023
Cite
David de Andrés; Juan Carlos Ruiz (2023). Robustness assessment of a C++ implementation of a quantized (int8) version of the LeNet-5 convolutional neural network [Dataset]. http://doi.org/10.5281/zenodo.10196616
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10196616
Dataset updated
Nov 22, 2023
Dataset provided by
Zenodo http://zenodo.org/
Authors
David de Andrés; Juan Carlos Ruiz
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jun 24, 2023 - Jun 26, 2023
Description
The architecture of the LeNet-5 convolutional neural network (CNN) was defined by LeCun in the paper "Gradient-based learning applied to document recognition" (https://ieeexplore.ieee.org/document/726791) to classify images of handwritten digits (MNIST dataset).
This architecture has been customized to use Rectified Linear Unit (ReLU) as activation functions instead of Sigmoid, and 8-bit integers for weights and activations instead of floating-point.
It consists of the following layers:
conv1: Convolution 2D, 1 input channel (28x28), 3 output channels (28x28), kernel size 5, stride 1, padding 2.
relu1: Rectified Linear Unit (3@28x28).
max1: Subsampling by max pooling (3@14x14).
conv2: Convolution 2D, 3 input channels (14x14), 6 output channels (14x14), kernel size 5, stride 1, padding 2.
relu2: Rectified Linear Unit (6@14x14).
max2: Subsampling by max pooling (6@7x7).
fc1: Fully connected (294, 147)
fc2: Fully connected (147, 10)
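
The spatial sizes in the layer list can be checked with the usual convolution output formula. The sketch below assumes a 2x2 pooling window with stride 2, which the description does not state explicitly but which is implied by the 28 to 14 to 7 reduction; `conv_out` and `pool_out` are illustrative helper names.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a 2D convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, window: int = 2) -> int:
    """Output size of non-overlapping max pooling (window size is an assumption)."""
    return size // window


side = 28
side = conv_out(side, kernel=5, stride=1, padding=2)  # conv1: stays 28
side = pool_out(side)                                 # max1: 14
side = conv_out(side, kernel=5, stride=1, padding=2)  # conv2: stays 14
side = pool_out(side)                                 # max2: 7

fc1_inputs = 6 * side * side  # 6 channels of 7x7 -> 294, matching fc1 (294, 147)
```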
The fault hypotheses for this work include the occurrence of:
BF: single, double-adjacent and triple-adjacent bit-flip faults
S0: single, double-adjacent and triple-adjacent stuck-at-0 faults
S1: single, double-adjacent and triple-adjacent stuck-at-1 faults
In the memory cells containing all the parameters of the CNN:
w: weights (int8)
zw: zero point of the weights (int8)
b: biases (int32)
z: zero point (int8)
m: m (int32)
Images 200 to 249 from the MNIST dataset have been used as workload.
This dataset contains the raw data obtained from running exhaustive fault injection campaigns for all considered fault models, targeting all considered locations and for all the images in the workload.
In addition, the raw data have been lightly processed to obtain global data related to the particular bits and parameters affected by the faults, and the obtained failure modes.
Files information
golden_run.csv: Prediction obtained for all the images considered in the workload in the absence of faults (Golden Run). This is intended to act as oracle to determine the impact of injected faults.
single_faults/bit_flip folder: Prediction obtained for all the images considered in the workload in presence of single bit-flip faults. There is one file for each parameter of each layer.
single_faults/stuck_at_0 folder: Prediction obtained for all the images considered in the workload in presence of single stuck-at-0 faults. There is one file for each parameter of each layer.
single_faults/stuck_at_1 folder: Prediction obtained for all the images considered in the workload in presence of single stuck-at-1 faults. There is one file for each parameter of each layer.
double_adjacent_faults/bit_flip folder: Prediction obtained for all the images considered in the workload in presence of double adjacent bit-flip faults. There is one file for each parameter of each layer.
double_adjacent_faults/stuck_at_0 folder: Prediction obtained for all the images considered in the workload in presence of double adjacent stuck-at-0 faults. There is one file for each parameter of each layer.
double_adjacent_faults/stuck_at_1 folder: Prediction obtained for all the images considered in the workload in presence of double adjacent stuck-at-1 faults. There is one file for each parameter of each layer.
triple_adjacent_faults/bit_flip folder: Prediction obtained for all the images considered in the workload in presence of triple adjacent bit-flip faults. There is one file for each parameter of each layer.
triple_adjacent_faults/stuck_at_0 folder: Prediction obtained for all the images considered in the workload in presence of triple adjacent stuck-at-0 faults. There is one file for each parameter of each layer.
triple_adjacent_faults/stuck_at_1 folder: Prediction obtained for all the images considered in the workload in presence of triple adjacent stuck-at-1 faults. There is one file for each parameter of each layer.
Methodology information
First, the CNN was used to classify all the images of the workload in the absence of faults to get a reference to determine the impact of faults. This is golden_run.csv file.
After that, one fault injection experiment was executed for each bit of each element of each parameter of the CNN.
Each experiment consisted of:
Affecting the bits identified by the mask (inverting them in the case of bit-flip faults, setting them to 0 or 1 in the case of stuck-at-0 or stuck-at-1 faults).
Classifying all the images of the workload in the presence of this fault. The obtained output was stored in a given .csv file.
Removing the fault from the CNN by restoring the affected bits to their previous values.
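The three fault models above can be sketched as bitwise operations on a stored word. This is a minimal illustration, not the authors' injector: the function names are hypothetical, the 32-bit width is taken from the MASK variable's description, and the way adjacent-bit masks are enumerated is an assumption.

```python
WORD_MASK = 0xFFFFFFFF  # 32-bit word, matching the 8-digit hexadecimal MASK


def apply_fault(word: int, mask: int, fault: str) -> int:
    """Return `word` after applying `fault` to the bits set in `mask`."""
    word &= WORD_MASK
    mask &= WORD_MASK
    if fault == "NF":   # no fault
        return word
    if fault == "BF":   # bit-flip: invert the masked bits
        return word ^ mask
    if fault == "S0":   # stuck-at-0: clear the masked bits
        return word & ~mask & WORD_MASK
    if fault == "S1":   # stuck-at-1: set the masked bits
        return word | mask
    raise ValueError(f"unknown fault type: {fault}")


def adjacent_masks(width: int, n_bits: int = 32):
    """Yield every mask with `width` adjacent bits set (assumed enumeration)."""
    base = (1 << width) - 1
    for shift in range(n_bits - width + 1):
        yield base << shift


# Inject a double adjacent bit-flip, then restore the saved original value.
original = 0b1010
faulty = apply_fault(original, 0b0011, "BF")
restored = original  # restoring means writing back the saved value
```

Saving the original word before injection, rather than inverting the operation, is what makes restoration work for stuck-at faults as well, since those cannot be undone from the faulty value alone.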
List of variables (Name : Description (Possible values))
IMGID: Integer number identifying the considered image (200-249).
TENSORID: Integer number identifying the parameter affected by the fault (0 - No fault, 1 - conv1.w, 2 - conv1.zw, 3 - conv1.m, 4 - conv1.b, 5 - conv1.z, 6 - conv2.w, 7 - conv2.zw, 8 - conv2.m, 9 - conv2.b, 10 - conv2.z, 11 - fc1.w, 12 - fc1.zw, 13 - fc1.m, 14 - fc1.b, 15 - fc1.z, 16 - fc2.w, 17 - fc2.zw, 18 - fc2.m, 19 - fc2.b, 20 - fc2.z)
ELEMID: Integer number identifying the element of the parameter affected by the fault (-1 - No fault, [0-2] - {conv1.b, conv1.m, conv1.zw}, [0-74] - conv1.w, [0-5] - {conv2.b, conv2.m, conv2.zw}, [0-149] - conv2.w, 0 - {conv1.z, conv2.z, fc1.z, fc2.z}, [0-146] - {fc1.b, fc1.m, fc1.zw}, [0-43217] - fc1.w, [0-9] - {fc2.b, fc2.m, fc2.zw}, [0-1469] - fc2.w)
MASK: 8-digit hexadecimal number identifying those bits affected by the fault ([00000000 - No fault, FFFFFFFF - all 32 bits faulty])
FAULT: String identifying the type of fault (NF - No fault, BF - bit-flip, S0 - Stuck-at-0, S1 - Stuck-at-1)
OUTPUT: 10 integer numbers provided by the CNN as output after processing the image. The highest value identifies the selected category for classification.
SOFTMAX: 10 decimal numbers obtained after applying the softmax function to the provided output. They represent the probability that the image belongs to the corresponding category.
PRED: Integer number representing the category predicted for the processed image.
LABEL: Integer number representing the actual category for the processed image.
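The relationship between MASK, OUTPUT, SOFTMAX and PRED can be sketched as follows. This is a hedged illustration only: the helper names are hypothetical, the CSV column layout is not assumed, and the numerically stable softmax shown here is a standard formulation rather than the one confirmed to be used in the dataset.

```python
import math


def faulty_bits(mask_hex: str) -> list:
    """Return the bit positions (0 = least significant) set in an 8-digit hex MASK."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(32) if (mask >> bit) & 1]


def softmax(output):
    """Convert the raw output values into probabilities that sum to 1."""
    peak = max(output)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in output]
    total = sum(exps)
    return [e / total for e in exps]


def predict(output) -> int:
    """Return the index of the highest output value (the predicted category)."""
    return max(range(len(output)), key=lambda i: output[i])


# Made-up output values, for illustration only.
example_output = [3, -1, 12, 0, 5, 2, -4, 7, 1, 0]
example_probs = softmax(example_output)
example_pred = predict(example_output)
```

Comparing `predict(...)` for a faulty run against the PRED value of the same IMGID in the golden run is one way to tell whether a fault changed the classification.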
MedMNIST: Standardized Biomedical Images
kaggle.com
Updated Feb 2, 2024
Cite
Möbius (2024). MedMNIST: Standardized Biomedical Images [Dataset]. https://www.kaggle.com/datasets/arashnic/standardized-biomedical-images-medmnist
Dataset updated
Feb 2, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Möbius
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification: https://www.nature.com/articles/s41597-022-01721-8

A large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision and machine learning. The providers benchmark several baseline methods on MedMNIST, including 2D / 3D neural networks and open-source / commercial AutoML tools.

MedMNIST Landscape: [figure: medmnistlandscape.png]

About the MedMNIST Landscape figure: The horizontal axis denotes the base-10 logarithm of the dataset scale, and the vertical axis denotes the base-10 logarithm of the imaging resolution. Upward and downward triangles distinguish 2D datasets from 3D datasets, and the 4 different colors represent different tasks.

Key Features

Diverse: It covers diverse data modalities, dataset scales (from 100 to 100,000), and tasks (binary/multi-class, multi-label, and ordinal regression). It is as diverse as the VDD and MSD to fairly evaluate the generalizable performance of machine learning algorithms in different settings, but both 2D and 3D biomedical images are provided.

Standardized: Each sub-dataset is pre-processed into the same format, which requires no background knowledge for users. As an MNIST-like dataset collection to perform classification tasks on small images, it primarily focuses on the machine learning part rather than the end-to-end system. Furthermore, we provide standard train-validation-test splits for all datasets in MedMNIST, therefore algorithms could be easily compared.

User-Friendly: The small size of 28×28 (2D) or 28×28×28 (3D) is lightweight and ideal for evaluating machine learning algorithms. We also offer a larger-size version, MedMNIST+: 64x64 (2D), 128x128 (2D), 224x224 (2D), and 64x64x64 (3D). Serving as a complement to the 28-size MedMNIST, this could be a standardized resource for developing medical foundation models. All these datasets are accessible via the same API.

Educational: As an interdisciplinary research area, biomedical image analysis is difficult for researchers from other communities to get hands-on with, as it requires background knowledge from computer vision, machine learning, biomedical imaging, and clinical science. Our data with the Creative Commons (CC) License is easy to use for educational purposes.

Refer to the paper to learn more about data : https://www.nature.com/articles/s41597-022-01721-8

Starter Code: download more data and training

Github Page: https://github.com/MedMNIST/MedMNIST

My Kaggle Starter Notebook: https://www.kaggle.com/code/arashnic/medmnist-download-and-use-data?scriptVersionId=161421937

Acknowledgements

Jiancheng Yang, Rui Shi, Donglai Wei, Zequan Liu, Lin Zhao, Bilian Ke, Hanspeter Pfister, Bingbing Ni. Shanghai Jiao Tong University, Shanghai, China; Boston College, Chestnut Hill, MA; RWTH Aachen University, Aachen, Germany; Fudan Institute of Metabolic Diseases, Zhongshan Hospital, Fudan University, Shanghai, China; Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Harvard University, Cambridge, MA

License and Citation

The code is under Apache-2.0 License.

The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)...
Data_Sheet_1_CRBA: A Competitive Rate-Based Algorithm Based on Competitive...
frontiersin.figshare.com
pdf
Updated Jun 2, 2023
Cite
Paolo G. Cachi; Sebastián Ventura; Krzysztof J. Cios (2023). Data_Sheet_1_CRBA: A Competitive Rate-Based Algorithm Based on Competitive Spiking Neural Networks.PDF [Dataset]. http://doi.org/10.3389/fncom.2021.627567.s001
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fncom.2021.627567.s001
Dataset updated
Jun 2, 2023
Dataset provided by
Frontiers
Authors
Paolo G. Cachi; Sebastián Ventura; Krzysztof J. Cios
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In this paper we present a Competitive Rate-Based Algorithm (CRBA) that approximates the operation of a Competitive Spiking Neural Network (CSNN). CRBA is based on modeling the competition between neurons during a sample presentation, which can be reduced to ranking the neurons based on a dot product operation and the use of a discrete Expectation Maximization algorithm; the latter is equivalent to the spike time-dependent plasticity rule. CRBA's performance is compared with that of CSNN on the MNIST and Fashion-MNIST datasets. The results show that CRBA performs on par with CSNN while using three orders of magnitude less computational time. Importantly, we show that the weights and firing thresholds learned by CRBA can be used to initialize CSNN's parameters, which results in much more efficient operation of the CSNN.
OSCAR: Occluded Stereo dataset for Convolutional Architectures with...
zenodo.org
data.niaid.nih.gov
bin, text/x-python +1
Updated Dec 31, 2021
+ more versions
Cite
Markus Roland Ernst; Thomas Burwick; Jochen Triesch (2021). OSCAR: Occluded Stereo dataset for Convolutional Architectures with Recurrence [Dataset]. http://doi.org/10.5281/zenodo.4085133
Explore at:
Available download formats: bin, zip, text/x-python
Unique identifier
https://doi.org/10.5281/zenodo.4085133
Dataset updated
Dec 31, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Markus Roland Ernst; Thomas Burwick; Jochen Triesch
Description
OSCAR, the Occluded Stereo dataset for Convolutional Architectures with Recurrence. Version: 2.0
(dataset as presented in our JOV 2021 journal publication "Recurrent Processing Improves Occluded Object Recognition and Gives Rise to Perceptual Hysteresis")

If you make use of the dataset, please cite as follows:

Ernst, M. R., Burwick, T., & Triesch, J. (2021). Recurrent Processing Improves Occluded Object Recognition and Gives Rise to Perceptual Hysteresis. In Journal of Vision

Contents

readme.md - detailed description and sample pictures

img.zip - folder that contains images for the readme file

licence.md - licence agreement for using the datasets

os-fmnist2c.zip - compressed archive of the occluded stereo FashionMNIST dataset (centered, ~1.1GB)

os-fmnist2r.zip - compressed archive of the occluded stereo FashionMNIST dataset (random, ~1.2GB)

os-mnist2c.zip - compressed archive of the occluded stereo MNIST dataset (centered, ~865MB)

os-mnist2r.zip - compressed archive of the occluded stereo MNIST dataset (random, ~851MB)

os-ycb2.zip - compressed archive of the occluded stereo ycb-object dataset (~1.1GB)

os-ycb2_highres.zip - compressed archive of the occluded stereo ycb-object dataset (high resolution, ~9.8GB)

OSCARv2_dataset.py - python script to directly load image data from folder, pytorch dataset
Tamil Vowels (உயிர் எழுத்துக்கள்) Image dataset
kaggle.com
zip
Updated Jun 13, 2020
Cite
Muthu A (2020). Tamil Vowels (உயிர் எழுத்துக்கள்) Image dataset [Dataset]. https://www.kaggle.com/muthua/tamil-vowels-image-dataset
Explore at:
Available download formats: zip (2837781 bytes)
Dataset updated
Jun 13, 2020
Authors
Muthu A
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Context

This dataset contains 60,000 MNIST-compatible grayscale images of Tamil vowels அ - ஔ + ஆய்த எழுத்து, each 28x28 pixels in size. A total of 13 classes are to be identified from the image data. However, some of the augmented images overflow the bounding box, so some minor cleanup may be required, as the following routine shows:

import os
import numpy as np

def load_acchu_data(mode='train'):
    # The .npy files are expected in a 'data' folder next to this script.
    path = os.path.split(__file__)[0]
    labels_path = os.path.join(path, 'data', mode + '-label-onehot.npy')
    images_path = os.path.join(path, 'data', mode + '-image.npy')
    labels = np.load(labels_path)
    images = np.load(images_path)
    # Keep only the rows whose glyph touches fewer than 2 of the 4 image borders.
    keep_rows = []
    for i in range(images.shape[0]):
        img = images[i, :].reshape(28, 28)
        hasTopFilled = any(img[0, :])
        hasBotFilled = any(img[27, :])
        hasLeftFilled = any(img[:, 0])
        hasRightFilled = any(img[:, 27])
        if sum([hasBotFilled, hasTopFilled, hasLeftFilled, hasRightFilled]) < 2:
            keep_rows.append(i)
    return labels[keep_rows, :], images[keep_rows, :]

Content

The content is a float32 dataset of 60,000 rows and 784 (= 28x28) columns, where each row holds one image of a Tamil vowel. Each label is a one-hot encoding of the image's class (0-12), following the அரிசுவடி வரிசை (alphabet order) + ஆய்தம்.
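The layout described above (784-column rows, 13-way one-hot labels) can be unpacked as in the following minimal sketch. The function name and the synthetic demo arrays are hypothetical stand-ins for the real files; only the shapes are taken from the description.

```python
import numpy as np


def decode_batch(images, onehot):
    """Reshape flat rows to 28x28 images and turn one-hot labels into class ids."""
    images_2d = images.reshape(-1, 28, 28)
    class_ids = onehot.argmax(axis=1)  # position of the 1 in each one-hot row
    return images_2d, class_ids


# Synthetic stand-in data with the documented shapes (5 rows instead of 60,000).
rng = np.random.default_rng(0)
demo_images = rng.random((5, 784), dtype=np.float32)
demo_labels = np.eye(13, dtype=np.float32)[[0, 3, 12, 7, 3]]

imgs, ids = decode_batch(demo_images, demo_labels)
```

With the real data, the arrays passed in would be the labels and images returned by a loader such as `load_acchu_data`.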

Inspiration

This dataset was inspired by the classic MNIST dataset used by Yann LeCun.

Model Zoo: A Dataset of Diverse Populations of Neural Network Models - Fashion-MNIST


Dataset

This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from Fashion-MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "fmnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.

For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
