47 datasets found
  1. Model Zoo: A Dataset of Diverse Populations of Neural Network Models - Fashion-MNIST

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 13, 2022
    + more versions
    Cite
    Borth, Damian (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - Fashion-MNIST [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6632104
    Explore at:
    Dataset updated
    Jun 13, 2022
    Dataset provided by
    Taskiran, Diyar
    Borth, Damian
    Giró-i-Nieto, Xavier
    Knyazev, Boris
    Schürholt, Konstantin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    In recent years, neural networks have advanced from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47’360 unique neural network models, resulting in over 2’415’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.

    Dataset

    This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from Fashion-MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

    This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "fmnist_"), as well as preprocessed model zoos wrapped in a custom pytorch dataset class (filenames beginning with "dataset"). Zoos are trained in three configurations varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix) or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.
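The exact schema of index_dict.json is not given here, so the following is only a hypothetical sketch of what reading a vectorized model back into named layer tensors could look like (the field names start, end and shape are assumptions; see www.modelzoos.cc for the authoritative loading code):

```python
import numpy as np

# Hypothetical index_dict layout -- the real index_dict.json schema may differ.
index_dict = {
    "conv1.weight": {"start": 0, "end": 75, "shape": [3, 1, 5, 5]},
    "conv1.bias": {"start": 75, "end": 78, "shape": [3]},
}

def unflatten(vector, index_dict):
    """Slice a flat weight vector back into named per-layer tensors."""
    return {
        name: np.asarray(vector[idx["start"]:idx["end"]]).reshape(idx["shape"])
        for name, idx in index_dict.items()
    }

flat = np.arange(78, dtype=np.float32)  # stand-in for one vectorized model
params = unflatten(flat, index_dict)
print(params["conv1.weight"].shape)  # (3, 1, 5, 5)
```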

    For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.

  2. MNIST Data for Digit Recognition

    • kaggle.com
    Updated Dec 22, 2017
    Cite
    Sylvia Mittal (2017). MNIST Data for Digit Recognition [Dataset]. https://www.kaggle.com/sylvia23/mnist-data-for-digit-recognation/metadata
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Dec 22, 2017
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sylvia Mittal
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description
    • This dataset contains training and testing data for digit recognition, consisting of handwritten images of digits.
      • It contains four zip files which you can easily include in your neural network; download all four of them by clicking the "Download all" button.
      • This is the MNIST dataset, used worldwide to benchmark the performance of neural networks on digit recognition.
      • It also contains training and testing labels.
  3. N-MNIST Dataset

    • paperswithcode.com
    Updated Mar 31, 2023
    + more versions
    Cite
    (2023). N-MNIST Dataset [Dataset]. https://paperswithcode.com/dataset/n-mnist
    Explore at:
    Dataset updated
    Mar 31, 2023
    Description

    Brief Description The Neuromorphic-MNIST (N-MNIST) dataset is a spiking version of the original frame-based MNIST dataset. It consists of the same 60 000 training and 10 000 testing samples as the original MNIST dataset, and is captured at the same visual scale as the original MNIST dataset (28x28 pixels). The N-MNIST dataset was captured by mounting the ATIS sensor on a motorized pan-tilt unit and having the sensor move while it views MNIST examples on an LCD monitor as shown in this video. A full description of the dataset and how it was created can be found in the paper below. Please cite this paper if you make use of the dataset.

    Orchard, G.; Cohen, G.; Jayawant, A.; and Thakor, N. “Converting Static Image Datasets to Spiking Neuromorphic Datasets Using Saccades", Frontiers in Neuroscience, vol.9, no.437, Oct. 2015

  4. MNIST

    • datasets.activeloop.ai
    deeplake
    + more versions
    Cite
    Yann LeCun, MNIST [Dataset]. https://datasets.activeloop.ai/docs/ml/datasets/mnist/
    Explore at:
    deeplake. Available download formats
    Authors
    Yann LeCun
    License

    Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1998 - Dec 31, 2000
    Area covered
    Earth
    Dataset funded by
    AT&T Bell Labs
    Description

    The MNIST dataset is a dataset of handwritten digits. It is a popular dataset for machine learning and artificial intelligence research. The dataset consists of 60,000 training images and 10,000 test images. Each image is a 28x28 pixel grayscale image of a handwritten digit. The digits are labeled from 0 to 9.

  5. Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST...

    • zenodo.org
    • data.niaid.nih.gov
    bin, json, zip
    Updated Jun 13, 2022
    + more versions
    Cite
    Konstantin Schürholt; Diyar Taskiran; Boris Knyazev; Xavier Giró-i-Nieto; Damian Borth (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST [Dataset]. http://doi.org/10.5281/zenodo.6632087
    Explore at:
    zip, json, bin. Available download formats
    Dataset updated
    Jun 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Konstantin Schürholt; Diyar Taskiran; Boris Knyazev; Xavier Giró-i-Nieto; Damian Borth
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    In recent years, neural networks have advanced from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47’360 unique neural network models, resulting in over 2’415’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.

    Dataset

    This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

    This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "mnist_"), as well as preprocessed model zoos wrapped in a custom pytorch dataset class (filenames beginning with "dataset"). Zoos are trained in three configurations varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix) or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.

    For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.

  6. MNIST IDX Dataset- Fasion

    • kaggle.com
    Updated May 21, 2025
    Cite
    ShreyaSuresh (2025). MNIST IDX Dataset- Fasion [Dataset]. https://www.kaggle.com/datasets/shreyasuresh0407/mnist-idx-dataset-fasion
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ShreyaSuresh
    Description

    📦 About the Dataset

    This project uses a classic machine learning dataset of handwritten digits — the MNIST dataset — stored in IDX format.

    🧠 Each image is a 28x28 pixel grayscale picture of a handwritten number from 0 to 9. Your task is to teach a simple neural network (your "brain") to recognize these digits.

    🔍 What’s Inside?

    • train-images-idx3-ubyte: 🖼️ 60,000 training images (28x28 pixels each)
    • train-labels-idx1-ubyte: 🔢 Labels (0–9) for each training image
    • t10k-images-idx3-ubyte: 🖼️ 10,000 test images
    • t10k-labels-idx1-ubyte: 🔢 Labels (0–9) for test images

    All files are in the IDX binary format, which is compact and fast for loading, but needs to be parsed using a small Python function (see below 👇).
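A parsing function like the following sketch works for all four files; the IDX header layout is standard: two zero bytes, a dtype code (0x08 for unsigned byte, used by all MNIST files), the number of dimensions, then one 4-byte big-endian size per dimension.

```python
import gzip
import struct
import numpy as np

def load_idx(path):
    """Parse an IDX file (the MNIST binary format) into a numpy array."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        # Big-endian header: 2 zero bytes, dtype code, number of dimensions
        _zeros, _dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        shape = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
        # 0x08 = unsigned byte, the code used by all MNIST files
        data = np.frombuffer(f.read(), dtype=np.uint8)
    return data.reshape(shape)

# Usage (file names as listed in the table above):
# train_images = load_idx("train-images-idx3-ubyte")  # shape (60000, 28, 28)
# train_labels = load_idx("train-labels-idx1-ubyte")  # shape (60000,)
```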

    ✨ Why This Dataset Is Awesome

    • 🎯 It's the “Hello World” of machine learning — perfect for beginners
    • 📊 Ideal for testing image classification algorithms
    • 🧠 Helps you learn how neural networks "see" numbers
    • 💥 Small enough to train quickly, powerful enough to learn real skills

    🧩 Sample Image

    (Add this cell below in your notebook to visualize a few images)

    import matplotlib.pyplot as plt

    # Show the first 10 images. train_images/train_labels come from parsing
    # the IDX files; squeeze() drops a channel axis if the loader adds one.
    fig, axes = plt.subplots(1, 10, figsize=(15, 2))
    for i in range(10):
      axes[i].imshow(train_images[i].squeeze(), cmap="gray")
      axes[i].set_title(f"Label: {train_labels[i]}")
      axes[i].axis("off")
    plt.show()
    
  7. Discretized MNIST for Digital Circuits and Neural Networks based on Acid-Base Chemistry implemented by Robotic Fluid Handling

    • figshare.com
    zip
    Updated Jan 13, 2023
    Cite
    Ahmed Agiza (2023). Discretized MNIST for Digital Circuits and Neural Networks based on Acid-Base Chemistry implemented by Robotic Fluid Handling [Dataset]. http://doi.org/10.6084/m9.figshare.21753545.v4
    Explore at:
    zip. Available download formats
    Dataset updated
    Jan 13, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Ahmed Agiza
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A discretized (binary & 3-bit) version of the MNIST dataset for the Nature Communications paper "Digital Circuits and Neural Networks based on Acid-Base Chemistry implemented by Robotic Fluid Handling"

  8. MNIST-224by224-train-test-dataset

    • kaggle.com
    Updated Nov 26, 2021
    Cite
    DHRUV Desh (2021). MNIST-224by224-train-test-dataset [Dataset]. https://www.kaggle.com/dhruvdesh/mnist224by224testdataset/code
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 26, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DHRUV Desh
    License

    CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    I needed 224 by 224 version of MNIST dataset for one of my projects so I made this.

    Content

    The dataset has 35 files in idx3-ubyte format, each containing 2000 images of dimension 224x224. Five of these are test data files and 30 are train data files.

    Acknowledgements

    The actual MNIST dataset creators

  9. Data_Sheet_1_Supervised Learning With First-to-Spike Decoding in Multilayer Spiking Neural Networks

    • frontiersin.figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Brian Gardner; André Grüning (2023). Data_Sheet_1_Supervised Learning With First-to-Spike Decoding in Multilayer Spiking Neural Networks.PDF [Dataset]. http://doi.org/10.3389/fncom.2021.617862.s001
    Explore at:
    pdf. Available download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Brian Gardner; André Grüning
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Experimental studies support the notion of spike-based neuronal information processing in the brain, with neural circuits exhibiting a wide range of temporally-based coding strategies to rapidly and efficiently represent sensory stimuli. Accordingly, it would be desirable to apply spike-based computation to tackling real-world challenges, and in particular transferring such theory to neuromorphic systems for low-power embedded applications. Motivated by this, we propose a new supervised learning method that can train multilayer spiking neural networks to solve classification problems based on a rapid, first-to-spike decoding strategy. The proposed learning rule supports multiple spikes fired by stochastic hidden neurons, and yet is stable by relying on first-spike responses generated by a deterministic output layer. In addition to this, we also explore several distinct, spike-based encoding strategies in order to form compact representations of presented input data. We demonstrate the classification performance of the learning rule as applied to several benchmark datasets, including MNIST. The learning rule is capable of generalizing from the data, and is successful even when used with constrained network architectures containing few input and hidden layer neurons. Furthermore, we highlight a novel encoding strategy, termed “scanline encoding,” that can transform image data into compact spatiotemporal patterns for subsequent network processing. Designing constrained, but optimized, network structures and performing input dimensionality reduction has strong implications for neuromorphic applications.

  10. Model comparison results using MNIST-C and MNIST-C-shape datasets.

    • plos.figshare.com
    xls
    Updated Jun 13, 2024
    Cite
    Seoyoung Ahn; Hossein Adeli; Gregory J. Zelinsky (2024). Model comparison results using MNIST-C and MNIST-C-shape datasets. [Dataset]. http://doi.org/10.1371/journal.pcbi.1012159.t001
    Explore at:
    xls. Available download formats
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    PLOS Computational Biology
    Authors
    Seoyoung Ahn; Hossein Adeli; Gregory J. Zelinsky
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recognition accuracy (means and standard deviations from 5 trained models, hereafter referred to as model “runs”) from ORA and two CNN baselines, both of which were trained using identical CNN encoders (one a 2-layer CNN and the other a Resnet-18), and a CapsNet model following the implementation in [51].

  11. MNIST Self Drawn Test Numbers

    • kaggle.com
    Updated Mar 7, 2023
    Cite
    Hilkar (2023). MNIST Self Drawn Test Numbers [Dataset]. https://www.kaggle.com/datasets/hilkar/mnist-self-drawn-test-numbers
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Mar 7, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Hilkar
    Description

    Self-drawn numbers for testing the performance of convolutional neural networks trained with the MNIST dataset.

  12. Words MNIST

    • kaggle.com
    Updated Jun 6, 2018
    Cite
    TusharPawar (2018). Words MNIST [Dataset]. https://www.kaggle.com/backalla/words-mnist/code
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 6, 2018
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    TusharPawar
    License

    GNU GPL 2.0: http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Description

    Context

    This dataset was used to train the neural network during the development of an OCR software. The original dataset has 1.2M images; this dataset is a random sample of 10k images from the main dataset.

    Content

    This dataset is a mixed bag of images gathered from multiple sources:
    1. Manually cropped and labelled images from natural scanned documents.
    2. Synthetically generated images which look very similar to natural images, to boost infrequent characters.
    3. Data labelled using the tesseract OCR software and manually checked for OCR errors.

    Preview

    • https://image.ibb.co/g3ZLTT/12.jpg ("IT")
    • https://image.ibb.co/gnN78T/23.jpg ("oftener")
    • https://image.ibb.co/kHVS8T/49.png ("check")
    • https://image.ibb.co/iFpLTT/75.jpg ("Spor")
    • https://image.ibb.co/gMcUNo/104.jpg ("she>")
    • https://image.ibb.co/mrSUNo/116.jpg ("smirking")
    • https://image.ibb.co/eKUyF8/135.jpg ("for")
    • https://image.ibb.co/g8MOho/188.png ("(2)")

    Details

    Images are in raw format and do not have a specific size; they may need to be resized for training. Only JPEG and PNG images are included. Character vocabulary: English characters (small/capital) and special symbols.

    Acknowledgements

    This dataset would not have been possible without the contributions of all the manual labellers, data contributors and also the developers of tesseract OCR software which was used to label a portion of this dataset.

    Inspiration

    This dataset is portrayed as a successor to the very famous MNIST dataset, offering a more challenging task to beginners who have solved the MNIST dataset and are looking for a level 2.

  13. Sparsified Model Zoo Twins: A Dataset of Sparsified Populations of Neural Network Models - MNIST

    • data.niaid.nih.gov
    Updated Aug 28, 2022
    + more versions
    Cite
    Giró-i-Nieto, Xavier (2022). Sparsified Model Zoo Twins: A Dataset of Sparsified Populations of Neural Network Models - MNIST [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7023335
    Explore at:
    Dataset updated
    Aug 28, 2022
    Dataset provided by
    Taskiran, Diyar
    Borth, Damian
    Giró-i-Nieto, Xavier
    Knyazev, Boris
    Schürholt, Konstantin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    In recent years, neural networks have advanced from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 27 model zoos generated with varying hyperparameter combinations, and includes 50’360 unique neural network models, resulting in over 2’585’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.

    Dataset

    This dataset is part of a larger collection of model zoos and contains the sparsified twins of models trained on MNIST. The original population is made available at https://doi.org/10.5281/zenodo.6632086. Sparsification is done using Variational Dropout, starting from the last epoch of the original population. The zip file contains the sparsification trajectory for 25 epochs for all 1000 models. All zoos with extensive information and code can be found at www.modelzoos.cc.

  14. Federated EMNIST Dataset

    • figshare.com
    xz
    Updated Jul 16, 2024
    Cite
    Saroj Mali (2024). Federated EMNIST Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.26308777.v1
    Explore at:
    xz. Available download formats
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    figshare
    Authors
    Saroj Mali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is derived from the Leaf repository (https://github.com/TalwalkarLab/leaf) pre-processing of the Extended MNIST dataset, grouping examples by writer. Details about Leaf were published in "LEAF: A Benchmark for Federated Settings" (https://arxiv.org/abs/1812.01097).

    Note: This dataset does not include some additional preprocessing that MNIST includes, such as size-normalization and centering. In the Federated EMNIST data, the value 1.0 corresponds to the background, and 0.0 corresponds to the color of the digits themselves; this is the inverse of some MNIST representations, e.g. in tensorflow_datasets, where 0 corresponds to the background color, and 255 represents the color of the digit.

    Data set sizes:

    • only_digits=True: 3,383 users, 10 label classes; train: 341,873 examples; test: 40,832 examples
    • only_digits=False: 3,400 users, 62 label classes; train: 671,585 examples; test: 77,483 examples

    Rather than holding out specific users, each user's examples are split across train and test so that all users have at least one example in train and one example in test. Writers that had fewer than 2 examples are excluded from the data set.

    The tf.data.Datasets returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration, with the following keys and values, in lexicographic order by key:

    • 'label': a tf.Tensor with dtype=tf.int32 and shape [1], the class label of the corresponding pixels. Labels [0-9] correspond to the digit classes, labels [10-35] correspond to the uppercase classes (e.g., label 11 is 'B'), and labels [36-61] correspond to the lowercase classes (e.g., label 37 is 'b').
    • 'pixels': a tf.Tensor with dtype=tf.float32 and shape [28, 28], containing the pixels of the handwritten digit, with values in the range [0.0, 1.0].

    Args:

    • only_digits: (Optional) whether to only include examples that are from the digits [0-9] classes. If False, includes lower and upper case characters, for a total of 62 class labels.
    • cache_dir: (Optional) directory to cache the downloaded file. If None, caches in Keras' default cache directory.

    Returns: Tuple of (train, test) where the tuple elements are tff.simulation.datasets.ClientData objects.
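The inverted pixel convention noted above (1.0 = background, 0.0 = ink) can be converted to the more common 0 = background representation with a one-line transform. This is an illustrative sketch, not part of the dataset:

```python
import numpy as np

# Federated EMNIST convention: 1.0 = background, 0.0 = ink.
# Many MNIST pipelines (e.g. tensorflow_datasets) expect the inverse,
# with 0 = background and 255 = ink.
def to_mnist_convention(pixels):
    """Invert a [28, 28] float image in [0.0, 1.0] to uint8 with 0 = background."""
    return np.round((1.0 - pixels) * 255).astype(np.uint8)

fed_pixels = np.ones((28, 28), dtype=np.float32)  # an all-background image
fed_pixels[10:18, 10:18] = 0.0                    # a fake "ink" patch
out = to_mnist_convention(fed_pixels)
print(out.min(), out.max())  # 0 255
```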

  15. Neural Field Arena - Classification Dataset

    • paperswithcode.com
    • data.niaid.nih.gov
    • +1more
    Updated Dec 15, 2023
    Cite
    Samuele Papa; Riccardo Valperga; David Knigge; Miltiadis Kofinas; Phillip Lippe; Jan-Jakob Sonke; Efstratios Gavves (2023). Neural Field Arena - Classification Dataset [Dataset]. https://paperswithcode.com/dataset/neural-field-arena-classification
    Explore at:
    Dataset updated
    Dec 15, 2023
    Authors
    Samuele Papa; Riccardo Valperga; David Knigge; Miltiadis Kofinas; Phillip Lippe; Jan-Jakob Sonke; Efstratios Gavves
    Description

    Neural fields (NeFs) have recently emerged as a versatile method for modeling signals of various modalities, including images, shapes, and scenes. Subsequently, many works have explored the use of NeFs as representations for downstream tasks, e.g. classifying an image based on the parameters of a NeF that has been fit to it. However, the impact of the NeF hyperparameters on their quality as downstream representation is scarcely understood and remains largely unexplored. This is partly caused by the large amount of time required to fit datasets of neural fields.

    Thanks to fit-a-nef, a JAX-based library that leverages parallelization to enable fast optimization of large-scale NeF datasets, we performed a comprehensive study that investigates the effects of different hyperparameters --including initialization, network architecture, and optimization strategies-- on fitting NeFs for downstream tasks. Based on the proposed library and our analysis, we propose Neural Field Arena, a benchmark consisting of neural field variants of popular vision datasets, including MNIST, CIFAR, variants of ImageNet, and ShapeNetv2. Our library and the Neural Field Arena will be open-sourced to introduce standardized benchmarking and promote further research on neural fields.

    The datasets that are currently available are the following:

    • MNIST, SIREN
    • CIFAR10, SIREN
    • MicroImageNet, SIREN
    • ShapeNet, SIREN

    More datasets will be added in the future.

  16. Robustness assessment of a C++ implementation of a quantized (int8) version of the LeNet-5 convolutional neural network

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Nov 22, 2023
    Cite
    David de Andrés; Juan Carlos Ruiz (2023). Robustness assessment of a C++ implementation of a quantized (int8) version of the LeNet-5 convolutional neural network [Dataset]. http://doi.org/10.5281/zenodo.10196616
    Explore at:
    zip. Available download formats
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    David de Andrés; Juan Carlos Ruiz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 24, 2023 - Jun 26, 2023
    Description

    The architecture of the LeNet-5 convolutional neural network (CNN) was defined by LeCun in his paper "Gradient-based learning applied to document recognition" (https://ieeexplore.ieee.org/document/726791) to classify images of handwritten digits (MNIST dataset).

    This architecture has been customized to use Rectified Linear Unit (ReLU) as activation functions instead of Sigmoid, and 8-bit integers for weights and activations instead of floating-point.

    It consists of the following layers:

    • conv1: Convolution 2D, 1 input channel (28x28), 3 output channels (28x28), kernel size 5, stride 1, padding 2.
    • relu1: Rectified Linear Unit (3@28x28).
    • max1: Subsampling by max pooling (3@14x14).
    • conv2: Convolution 2D, 3 input channels (14x14), 6 output channels (14x14), kernel size 5, stride 1, padding 2.
    • relu2: Rectified Linear Unit (6@14x14).
    • max2: Subsampling by max pooling (6@7x7).
    • fc1: Fully connected (294, 147)
    • fc2: Fully connected (147, 10)
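The layer list above can be sketched as an illustrative (float, unquantized) PyTorch model; note the published network additionally quantizes weights and activations to int8, which is not reproduced here:

```python
import torch
import torch.nn as nn

# Float sketch of the customized LeNet-5 described above (ReLU activations,
# 3 and 6 conv channels, 294-147-10 fully connected head).
class LeNet5Variant(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 3, kernel_size=5, stride=1, padding=2),  # conv1: 3@28x28
            nn.ReLU(),                                            # relu1
            nn.MaxPool2d(2),                                      # max1: 3@14x14
            nn.Conv2d(3, 6, kernel_size=5, stride=1, padding=2),  # conv2: 6@14x14
            nn.ReLU(),                                            # relu2
            nn.MaxPool2d(2),                                      # max2: 6@7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),          # 6 * 7 * 7 = 294 features
            nn.Linear(294, 147),   # fc1
            nn.Linear(147, 10),    # fc2
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = LeNet5Variant()(torch.zeros(1, 1, 28, 28))
print(logits.shape)  # torch.Size([1, 10])
```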

    The fault hypotheses for this work include the occurrence of:

    • BF: single, double-adjacent and triple-adjacent bit-flip faults
    • S0: single, double-adjacent and triple-adjacent stuck-at-0 faults
    • S1: single, double-adjacent and triple-adjacent stuck-at-1 faults

    In the memory cells containing all the parameters of the CNN:

    • w: weights (int8)
    • zw: zero point of the weights (int8)
    • b: biases (int32)
    • z: zero point (int8)
    • m: m (int32)

    Images 200 to 249 from the MNIST dataset have been used as workload.

    This dataset contains the raw data obtained from running exhaustive fault injection campaigns for all considered fault models, targeting all considered locations and for all the images in the workload.

    In addition, the raw data have been lightly processed to obtain global data related to the particular bits and parameters affected by the faults, and the obtained failure modes.
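For illustration only (this is not the authors' campaign code), the considered fault models can be sketched as bit-level manipulations of an int8 parameter array:

```python
import numpy as np

# Illustrative sketch: inject single or adjacent multi-bit faults into an
# int8 parameter array, mirroring the BF/S0/S1 fault hypotheses above.
def inject_fault(params, index, bit, width=1, mode="bit_flip"):
    """Apply a fault of `width` adjacent bits starting at `bit` (0 = LSB)."""
    mask = 0
    for b in range(bit, min(bit + width, 8)):
        mask |= 1 << b
    raw = params.astype(np.uint8)  # reinterpret bytes for bit manipulation
    if mode == "bit_flip":
        raw[index] ^= mask
    elif mode == "stuck_at_0":
        raw[index] &= ~mask & 0xFF
    elif mode == "stuck_at_1":
        raw[index] |= mask
    return raw.astype(np.int8)

w = np.array([0, 1, -1], dtype=np.int8)
print(inject_fault(w, 0, 7))           # single bit-flip of the sign bit of w[0]
print(inject_fault(w, 1, 0, width=2))  # double-adjacent flip of bits 0-1 of w[1]
```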

    Files information

    • golden_run.csv: Prediction obtained for all the images considered in the workload in the absence of faults (Golden Run). This is intended to act as oracle to determine the impact of injected faults.
    • single_faults/bit_flip folder: Prediction obtained for all the images considered in the workload in presence of single bit-flip faults. There is one file for each parameter of each layer.
    • single_faults/stuck_at_0 folder: Prediction obtained for all the images considered in the workload in presence of single stuck-at-0 faults. There is one file for each parameter of each layer.
    • single_faults/stuck_at_1 folder: Prediction obtained for all the images considered in the workload in presence of single stuck-at-1 faults. There is one file for each parameter of each layer.
    • double_adjacent_faults/bit_flip folder: Prediction obtained for all the images considered in the workload in presence of double adjacent bit-flip faults. There is one file for each parameter of each layer.
    • double_adjacent_faults/stuck_at_0 folder: Prediction obtained for all the images considered in the workload in presence of double adjacent stuck-at-0 faults. There is one file for each parameter of each layer.
    • double_adjacent_faults/stuck_at_1 folder: Prediction obtained for all the images considered in the workload in presence of double adjacent stuck-at-1 faults. There is one file for each parameter of each layer.
    • triple_adjacent_faults/bit_flip folder: Prediction obtained for all the images considered in the workload in presence of triple adjacent bit-flip faults. There is one file for each parameter of each layer.
    • triple_adjacent_faults/stuck_at_0 folder: Prediction obtained for all the images considered in the workload in presence of triple adjacent stuck-at-0 faults. There is one file for each parameter of each layer.
    • triple_adjacent_faults/stuck_at_1 folder: Prediction obtained for all the images considered in the workload in presence of triple adjacent stuck-at-1 faults. There is one file for each parameter of each layer.

    Methodology information

    First, the CNN was used to classify all the images of the workload in the absence of faults, to obtain a reference for determining the impact of faults. This is the golden_run.csv file.

    After that, one fault injection experiment was executed for each bit of each element of each parameter of the CNN.

    Each experiment consisted of:

    • Affecting the bits identified by the mask (inverting them in the case of bit-flip faults, or setting them to 0 or 1 in the case of stuck-at-0 or stuck-at-1 faults).
    • Classifying all the images of the workload in the presence of this fault. The obtained output was stored in a given .csv file.
    • Removing the fault from the CNN by restoring the affected bits to their previous values.
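    A minimal sketch of how a single injection step can be realized with bitwise operations (the function name is ours; the actual campaign code may differ):

    ```python
    def inject_fault(value, mask, fault, bits=32):
        # Apply a fault to the bits of `value` selected by `mask`.
        if fault == 'BF':      # bit-flip: invert the masked bits
            faulty = value ^ mask
        elif fault == 'S0':    # stuck-at-0: clear the masked bits
            faulty = value & ~mask
        elif fault == 'S1':    # stuck-at-1: set the masked bits
            faulty = value | mask
        else:                  # 'NF': no fault
            faulty = value
        return faulty & ((1 << bits) - 1)  # keep the word width

    # single bit-flip in bit 3 of an 8-bit weight
    print(inject_fault(0b01010101, 0b00001000, 'BF', bits=8))  # 93 (0b01011101)
    ```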

    List of variables (Name : Description (Possible values))

    • IMGID: Integer number identifying the considered image (200-249).
    • TENSORID: Integer number identifying the parameter affected by the fault (0 - No fault, 1 - conv1.w, 2 - conv1.zw, 3 - conv1.m, 4 - conv1.b, 5 - conv1.z, 6 - conv2.w, 7 - conv2.zw, 8 - conv2.m, 9 - conv2.b, 10 - conv2.z, 11 - fc1.w, 12 - fc1.zw, 13 - fc1.m, 14 - fc1.b, 15 - fc1.z, 16 - fc2.w, 17 - fc2.zw, 18 - fc2.m, 19 - fc2.b, 20 - fc2.z)
    • ELEMID: Integer number identifying the element of the parameter affected by the fault (-1 - No fault, [0-2] - {conv1.b, conv1.m, conv1.zw}, [0-74] - conv1.w, [0-5] - {conv2.b, conv2.m, conv2.zw}, [0-149] - conv2.w, 0 - {conv1.z, conv2.z, fc1.z, fc2.z}, [0-146] - {fc1.b, fc1.m, fc1.zw}, [0-43217] - fc1.w, [0-9] - {fc2.b, fc2.m, fc2.zw}, [0-1469] - fc2.w)
    • MASK: 8-digit hexadecimal number identifying the bits affected by the fault (00000000 - No fault, FFFFFFFF - all 32 bits faulty)
    • FAULT: String identifying the type of fault (NF - No fault, BF - bit-flip, S0 - Stuck-at-0, S1 - Stuck-at-1)
    • OUTPUT: 10 integer numbers provided by the CNN as output after processing the image. The highest value identifies the selected category for classification.
    • SOFTMAX: 10 decimal numbers obtained after applying the softmax function to the provided output. They represent the probability of the image belonging to the corresponding category for classification.
    • PRED: Integer number representing the category predicted for the processed image.
    • LABEL: Integer number representing the actual category of the processed image.
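    The OUTPUT, SOFTMAX and PRED fields are related by the standard softmax/argmax computation; a small illustrative sketch (the values are made up):

    ```python
    import math

    def softmax(logits):
        # Turn raw CNN outputs into class probabilities.
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    output = [0, 0, 0, 10, 0, 0, 0, 0, 0, 0]  # OUTPUT: raw scores for the 10 digits
    probs = softmax(output)                   # SOFTMAX: class probabilities
    pred = probs.index(max(probs))            # PRED: index of the highest value
    print(pred)  # 3
    ```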

  17. MedMNIST: Standardized Biomedical Images

    • kaggle.com
    Updated Feb 2, 2024
    Möbius (2024). MedMNIST: Standardized Biomedical Images [Dataset]. https://www.kaggle.com/datasets/arashnic/standardized-biomedical-images-medmnist
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Feb 2, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Möbius
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    MedMNIST v2 - A large-scale lightweight benchmark for 2D and 3D biomedical image classification: https://www.nature.com/articles/s41597-022-01721-8

    A large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed into 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required for users. Covering primary data modalities in biomedical images, MedMNIST is designed to perform classification on lightweight 2D and 3D images with various data scales (from 100 to 100,000) and diverse tasks (binary/multi-class, ordinal regression and multi-label). The resulting dataset, consisting of approximately 708K 2D images and 10K 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision and machine learning. The providers benchmark several baseline methods on MedMNIST, including 2D / 3D neural networks and open-source / commercial AutoML tools.

    MedMNIST Landscape: [overview figure omitted]

    About the MedMNIST Landscape figure: The horizontal axis denotes the base-10 logarithm of the dataset scale, and the vertical axis denotes the base-10 logarithm of the imaging resolution. The upward and downward triangles distinguish 2D datasets from 3D datasets, and the 4 different colors represent different tasks.

    Key Features


    Diverse: It covers diverse data modalities, dataset scales (from 100 to 100,000), and tasks (binary/multi-class, multi-label, and ordinal regression). It is as diverse as the VDD and MSD to fairly evaluate the generalizable performance of machine learning algorithms in different settings, while additionally providing both 2D and 3D biomedical images.

    Standardized: Each sub-dataset is pre-processed into the same format, which requires no background knowledge for users. As an MNIST-like dataset collection to perform classification tasks on small images, it primarily focuses on the machine learning part rather than the end-to-end system. Furthermore, we provide standard train-validation-test splits for all datasets in MedMNIST, therefore algorithms could be easily compared.

    User-Friendly: The small size of 28×28 (2D) or 28×28×28 (3D) is lightweight and ideal for evaluating machine learning algorithms. We also offer a larger-size version, MedMNIST+: 64x64 (2D), 128x128 (2D), 224x224 (2D), and 64x64x64 (3D). Serving as a complement to the 28-size MedMNIST, this could be a standardized resource for developing medical foundation models. All these datasets are accessible via the same API.

    Educational: As an interdisciplinary research area, biomedical image analysis is difficult for researchers from other communities to get hands-on experience with, as it requires background knowledge from computer vision, machine learning, biomedical imaging, and clinical science. Our data with the Creative Commons (CC) License is easy to use for educational purposes.

    Refer to the paper to learn more about data : https://www.nature.com/articles/s41597-022-01721-8

    Starter Code: download more data and training

    Github Page: https://github.com/MedMNIST/MedMNIST

    My Kaggle Starter Notebook: https://www.kaggle.com/code/arashnic/medmnist-download-and-use-data?scriptVersionId=161421937

    Acknowledgements

    Jiancheng Yang,Rui Shi,Donglai Wei,Zequan Liu,Lin Zhao,Bilian Ke,Hanspeter Pfister,Bingbing Ni Shanghai Jiao Tong University, Shanghai, China, Boston College, Chestnut Hill, MA RWTH Aachen University, Aachen, Germany, Fudan Institute of Metabolic Diseases, Zhongshan Hospital, Fudan University, Shanghai, China, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China, Harvard University, Cambridge, MA

    License and Citation

    The code is under Apache-2.0 License.

    The MedMNIST dataset is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)...

  18. Data_Sheet_1_CRBA: A Competitive Rate-Based Algorithm Based on Competitive...

    • frontiersin.figshare.com
    pdf
    Updated Jun 2, 2023
    Paolo G. Cachi; Sebastián Ventura; Krzysztof J. Cios (2023). Data_Sheet_1_CRBA: A Competitive Rate-Based Algorithm Based on Competitive Spiking Neural Networks.PDF [Dataset]. http://doi.org/10.3389/fncom.2021.627567.s001
    Explore at:
    pdf (available download formats)
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Frontiers
    Authors
    Paolo G. Cachi; Sebastián Ventura; Krzysztof J. Cios
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this paper we present a Competitive Rate-Based Algorithm (CRBA) that approximates the operation of a Competitive Spiking Neural Network (CSNN). CRBA is based on modeling the competition between neurons during a sample presentation, which can be reduced to a ranking of the neurons based on a dot product operation and the use of a discrete Expectation Maximization algorithm; the latter is equivalent to the spike-timing-dependent plasticity rule. CRBA's performance is compared with that of CSNN on the MNIST and Fashion-MNIST datasets. The results show that CRBA performs on par with CSNN, while using three orders of magnitude less computational time. Importantly, we show that the weights and firing thresholds learned by CRBA can be used to initialize CSNN's parameters, which results in its much more efficient operation.

  19. OSCAR: Occluded Stereo dataset for Convolutional Architectures with...

    • zenodo.org
    • data.niaid.nih.gov
    bin, text/x-python +1
    Updated Dec 31, 2021
    Markus Roland Ernst; Markus Roland Ernst; Thomas Burwick; Thomas Burwick; Jochen Triesch; Jochen Triesch (2021). OSCAR: Occluded Stereo dataset for Convolutional Architectures with Recurrence [Dataset]. http://doi.org/10.5281/zenodo.4085133
    Explore at:
    bin, zip, text/x-python (available download formats)
    Dataset updated
    Dec 31, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Markus Roland Ernst; Markus Roland Ernst; Thomas Burwick; Thomas Burwick; Jochen Triesch; Jochen Triesch
    Description

    OSCAR, the Occluded Stereo dataset for Convolutional Architectures with Recurrence. Version: 2.0
    (dataset as presented in our JOV 2021 journal publication "Recurrent Processing Improves Occluded Object Recognition and Gives Rise to Perceptual Hysteresis")

    If you make use of the dataset, please cite as follows:

    Ernst, M. R., Burwick, T., & Triesch, J. (2021). Recurrent Processing Improves Occluded Object Recognition and Gives Rise to Perceptual Hysteresis. Journal of Vision.

    Contents

    • readme.md - detailed description and sample pictures
    • img.zip - folder that contains images for the readme file
    • licence.md - licence agreement for using the datasets
    • os-fmnist2c.zip - compressed archive of the occluded stereo FashionMNIST dataset (centered, ~1.1GB)
    • os-fmnist2r.zip - compressed archive of the occluded stereo FashionMNIST dataset (random, ~1.2GB)
    • os-mnist2c.zip - compressed archive of the occluded stereo MNIST dataset (centered, ~865MB)
    • os-mnist2r.zip - compressed archive of the occluded stereo MNIST dataset (random, ~851MB)
    • os-ycb2.zip - compressed archive of the occluded stereo ycb-object dataset (~1.1GB)
    • os-ycb2_highres.zip - compressed archive of the occluded stereo ycb-object dataset (high resolution, ~9.8GB)
    • OSCARv2_dataset.py - python script to directly load image data from folder, pytorch dataset
  20. Tamil Vowels (உயிர் எழுத்துக்கள்) Image dataset

    • kaggle.com
    zip
    Updated Jun 13, 2020
    Muthu A (2020). Tamil Vowels (உயிர் எழுத்துக்கள்) Image dataset [Dataset]. https://www.kaggle.com/muthua/tamil-vowels-image-dataset
    Explore at:
    zip (2837781 bytes; available download formats)
    Dataset updated
    Jun 13, 2020
    Authors
    Muthu A
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    This dataset contains 60,000 MNIST-compatible grayscale 28x28 images of the Tamil vowels அ - ஔ plus ஆய்த எழுத்து. A total of 13 classes are to be identified from the image data. However, some of the augmented data overflow the bounding box, so some minor cleanup may be required, as the following routine shows:

    import os
    import numpy as np

    def load_acchu_data(mode='train'):
      # Load the one-hot labels and flattened 28x28 images from .npy
      # files located next to this script.
      path = os.path.split(__file__)[0]
      labels_path = os.path.join(path, 'data', mode + '-label-onehot.npy')
      images_path = os.path.join(path, 'data', mode + '-image.npy')
      labels = np.load(labels_path)
      images = np.load(images_path)
      # Skip images that touch two or more sides of the bounding box.
      keep_rows = []
      for i in range(images.shape[0]):
        img = images[i, :].reshape(28, 28)
        hasTopFilled = any(img[0, :])
        hasBotFilled = any(img[27, :])
        hasLeftFilled = any(img[:, 0])
        hasRightFilled = any(img[:, 27])
        if sum([hasBotFilled, hasTopFilled, hasLeftFilled, hasRightFilled]) < 2:
          keep_rows.append(i)
      return labels[keep_rows, :], images[keep_rows, :]
    

    Content

    The content is a float32 dataset of 60,000 rows and 784 (=28x28) columns, where each row is one image of a Tamil vowel. The label is a one-hot encoded class index from 0-12, corresponding to the அரிசுவடி வரிசை + ஆய்தம் order.
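    Recovering the class index from a one-hot label row can be done with a tiny helper (a sketch; the function name is ours):

    ```python
    def onehot_to_index(row):
        # Recover the class id (0-12) from a one-hot encoded label row.
        return max(range(len(row)), key=row.__getitem__)

    label = [0.0] * 13
    label[5] = 1.0
    print(onehot_to_index(label))  # 5
    ```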

    Inspiration

    This dataset was inspired by the classic MNIST dataset used by Yann LeCun.


Model Zoo: A Dataset of Diverse Populations of Neural Network Models - Fashion-MNIST


Dataset

This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from Fashion-MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "fmnist_"), as well as preprocessed model zoos wrapped in a custom pytorch dataset class (filenames beginning with "dataset"). Zoos are trained in three configurations varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix) or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.

For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
