6 datasets found
  1. Model Zoo: A Dataset of Diverse Populations of Neural Network Models - Fashion-MNIST

    • zenodo.org
    • data.niaid.nih.gov
    bin, json, zip
    Updated Jun 13, 2022
    Cite
    Konstantin Schürholt; Diyar Taskiran; Boris Knyazev; Xavier Giró-i-Nieto; Damian Borth (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - Fashion-MNIST [Dataset]. http://doi.org/10.5281/zenodo.6632105
    Available download formats: bin, zip, json
    Dataset updated
    Jun 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Konstantin Schürholt; Diyar Taskiran; Boris Knyazev; Xavier Giró-i-Nieto; Damian Borth
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    In recent years, neural networks have advanced from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. It follows that a population of such neural network models (referred to as a "model zoo") would form topological structures in weight space. We think that the geometry, curvature, and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches to (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. The proposed dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47,360 unique neural network models, resulting in over 2,415,360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned above.

    Dataset

    This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from Fashion-MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

    This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "fmnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.
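
    A rough sketch of how one might inspect these files (the file name below is a placeholder; the dataset class and exact loading code are documented at www.modelzoos.cc):

    # Hedged sketch, not the official loader: inspect the index and open a preprocessed zoo file.
    import json
    import torch

    # index_dict.json describes how the flattened weight vectors map back to layers.
    with open("index_dict.json") as f:
        index_dict = json.load(f)
    print(index_dict.keys())

    # "dataset_fmnist_seed.pt" is a placeholder file name; loading the wrapped dataset
    # may require the custom dataset class from the modelzoos.cc code to be importable.
    zoo = torch.load("dataset_fmnist_seed.pt", map_location="cpu")
    print(type(zoo))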

    For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.

  2. Deep-Learning-using-MNIST-Dataset

    • kaggle.com
    zip
    Updated Feb 26, 2023
    Cite
    Adeolu Joseph (2023). Deep-Learning-using-MNIST-Dataset [Dataset]. https://www.kaggle.com/datasets/adeolujoseph/deep-learning-using-mnist-dataset/suggestions
    Available download formats: zip (36110 bytes)
    Dataset updated
    Feb 26, 2023
    Authors
    Adeolu Joseph
    License

    CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    PyTorch. The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. The original dataset can be found at http://yann.lecun.com/exdb/mnist/. This project uses two hidden layers with 128 and 64 units; the SGD optimizer was used to update the weights and biases.
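
    A minimal sketch of the described setup (two hidden layers with 128 and 64 units, trained with SGD); the learning rate and dummy batch below are assumptions, not the author's notebook:

    # Sketch of the described model: 784 -> 128 -> 64 -> 10, trained with SGD.
    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128), nn.ReLU(),   # first hidden layer, 128 units
        nn.Linear(128, 64), nn.ReLU(),        # second hidden layer, 64 units
        nn.Linear(64, 10),                    # 10 digit classes
    )
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr is an assumption

    # One illustrative training step on a dummy batch:
    images, labels = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
    loss = criterion(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()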

  3. Bangla-MNIST

    • kaggle.com
    zip
    Updated Mar 16, 2021
    Cite
    Rito Ghosh (2021). Bangla-MNIST [Dataset]. https://www.kaggle.com/truthr/banglamnist
    Available download formats: zip (1371699534 bytes)
    Dataset updated
    Mar 16, 2021
    Authors
    Rito Ghosh
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Starter Notebook (PyTorch)

    Context

    The MNIST dataset, which was once a challenge for people in the field of Computer Vision, has long been 'solved': vision models have achieved superhuman levels of accuracy on it. Still, MNIST remains one of the most important datasets a student or practitioner of Computer Vision comes across. It is widely used as a benchmark for newer models and architectures, to demonstrate new frameworks and methods, and it is also one of the most famous datasets for teaching Deep Learning.

    Such a dataset was lacking for the Bengali language. Bengali has its own digits, and the goal of this dataset is to present an easily usable, completely labeled dataset.

    The NumtaDB database has existed for a while, but some work has to be put into it to reach the ease of use that MNIST provides. This dataset aims to do just that: it offers the same ease of use you get with MNIST.

    Content

    The dataset contains more than 72,000 files, all completely labeled. The labels are supplied in the CSV file provided. The dataset does not contain a train-validation-test split, as one can be created trivially (see the sketch below).
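
    For illustration, such a split could be done along these lines (a hedged sketch: the CSV file name and its columns are assumptions; check the files shipped with the dataset):

    # Hedged sketch of a trivial train/validation/test split from the label CSV.
    # "labels.csv" is a placeholder name; adapt it to the actual file in the dataset.
    import pandas as pd

    df = pd.read_csv("labels.csv")               # one row per image with its label
    df = df.sample(frac=1.0, random_state=42)    # shuffle reproducibly
    n = len(df)
    train = df.iloc[: int(0.8 * n)]              # 80% train
    val = df.iloc[int(0.8 * n): int(0.9 * n)]    # 10% validation
    test = df.iloc[int(0.9 * n):]                # 10% test
    print(len(train), len(val), len(test))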

    Acknowledgements

    • The folks at and contributing to Bengali AI, who created the NumtaDB database, a huge undertaking. This dataset is created solely from that dataset, with no additional data.
    • Yann LeCun, for creating MNIST, which has served the DL community and research in a major and unique way.

    Inspiration

    Bengali is spoken by 228 million people all over the world. Bengali digits are predominantly used on billboards, signs, and car signs in several states of India and in Bangladesh. This dataset is intended for use in both commercial and non-commercial settings.

    • Research - This dataset matters to Vision researchers, whether their interest in Bengali applications is industrial or academic. It provides an example for applying newly developed methods to a digit dataset other than English, and it lets you prototype solutions quickly during research, because it is aimed at ease and comfort of use.
    • Education - This dataset paves the way for educators to teach Deep Learning techniques quickly.
    • Commerce - As this dataset is provided under the CC BY-SA 4.0 license, it is free for commercial use.

    Citation

    If you use this dataset for research or a project, please cite both of the entries below:

    @dataset{banglamnist,
      author = {Ritobrata Ghosh},
      year = {2021},
      title = {Bangla-MNIST},
      publisher = {Kaggle},
      address = {Kolkata}
    }
    

    Or,

    Ghosh, Ritobrata; Bangla-MNIST via Kaggle, doi: 10.34740/kaggle/dsv/2029296 - and BengaliAI.

    It's exciting to have a dataset that provides the ease of MNIST for Bengali digits. Fascinating things are possible. Let's begin- ৯, ৮, ৭, ৬, ৫, ৪, ৩, ২, ১, ০!

  4. PHCD - Polish Handwritten Characters Database

    • kaggle.com
    zip
    Updated Dec 30, 2023
    Cite
    Wiktor Flis (2023). PHCD - Polish Handwritten Characters Database [Dataset]. https://www.kaggle.com/datasets/westedcrean/phcd-polish-handwritten-characters-database/versions/3
    Available download formats: zip (250262763 bytes)
    Dataset updated
    Dec 30, 2023
    Authors
    Wiktor Flis
    Description


    The process for collecting this dataset was documented in the paper "Development of Extensive Polish Handwritten Characters Database for Text Recognition Research" (https://doi.org/10.12913/22998624/122567) by Mikhail Tokovarov, Dr. Monika Kaczorowska and Dr. Marek Miłosz. Link to download the original dataset: https://cs.pollub.pl/phcd/. The source fileset also contains a dataset of raw images of whole sentences written in Polish.

    Context

    PHCD (Polish Handwritten Characters Database) is a collection of handwritten texts in Polish. It was created by researchers at Lublin University of Technology for the purpose of offline handwritten text recognition. The database contains more than 530,000 images of handwritten characters. Each image is a 32x32 pixel grayscale image representing one of 89 classes (10 digits, 26 lowercase Latin letters, 26 uppercase Latin letters, 9 lowercase Polish letters, 9 uppercase Polish letters and 9 special characters), with around 6,000 examples per class.

    How to use

    This notebook contains a PyTorch example of how to load the dataset from .npz files and train a CNN model. You can also use the dataset with other frameworks, such as TensorFlow or Keras.

    For .npz files, use the numpy.load method (a loading sketch follows the Contents list below).

    Contents

    The dataset contains the following:

    • dataset.npz - a file with two compressed numpy arrays:
      • "signs" - with all the images, sized 32 x 32 (grayscale)
      • "labels" - with all the labels (0-88) for examples from signs
    • label_mapping.csv - a CSV file with columns label and char, mapping from ids to characters in the dataset
    • images - a folder with the original 530,000 PNG images, sized 32 x 32, to use with other loading techniques
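
    A minimal loading sketch based on the contents listed above (it assumes the files sit in the working directory):

    # Load the compressed arrays from dataset.npz and the label-to-character mapping.
    import numpy as np
    import pandas as pd

    data = np.load("dataset.npz")
    signs, labels = data["signs"], data["labels"]     # 32x32 grayscale images and class ids (0-88)
    print(signs.shape, labels.shape)

    mapping = pd.read_csv("label_mapping.csv")        # columns: label, char
    id_to_char = dict(zip(mapping["label"], mapping["char"]))
    print(id_to_char[int(labels[0])])                 # character for the first example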

    Acknowledgements

    I want to express my gratitude to the following people: Dr. Edyta Łukasik, for introducing me to this dataset, and the authors of the dataset - Mikhail Tokovarov, Dr. Monika Kaczorowska and Dr. Marek Miłosz from the Lublin University of Technology in Poland.

    Inspiration

    You can use this data the same way you used MNIST, KMNIST or Fashion-MNIST: refine your image classification skills, and use GPUs & TPUs to implement CNN architectures for such multiclass classification tasks.

  5. Chinese MNIST

    • kaggle.com
    zip
    Updated Mar 28, 2021
    Cite
    Gabriel Preda (2021). Chinese MNIST [Dataset]. https://www.kaggle.com/datasets/gpreda/chinese-mnist/code
    Available download formats: zip (17261860 bytes)
    Dataset updated
    Mar 28, 2021
    Authors
    Gabriel Preda
    License

    CC0 1.0 (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context


    The Chinese MNIST dataset uses data collected in the framework of a project at Newcastle University.

    Project Description

    One hundred Chinese nationals took part in data collection. Each participant wrote all 15 numbers with a standard black ink pen in a table with 15 designated regions drawn on a white A4 sheet. This process was repeated 10 times by each participant. Each sheet was scanned at a resolution of 300x300 pixels. This resulted in a dataset of 15,000 images, each representing one character from a set of 15 characters (grouped in samples and suites, with 10 samples per volunteer and 100 volunteers).

    Further Data Processing

    I downloaded the raw images from the original project page. Based on the image names, I created an index for each image, as follows:

    original name (example): Locate{1,3,4}.jpg 
    index extracted: suite_id: 1, sample_id: 3, code: 4 
    resulted file name: input_1_3_4.jpg 
    

    I also added the mapping of each image code to the actual numeric value of the Chinese number and the actual Chinese character. The mapping is shown below.

    [Image: character code mapping table]

    Content

    The dataset contains the following:

    • an index file, chinese_mnist.csv
    • a folder with 15,000 jpg images, sized 64 x 64. See the images folder description for details.
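
    A possible way to pair the index file with the image files (a hedged sketch: the column names are assumptions inferred from the naming scheme above; verify them against chinese_mnist.csv):

    # Hedged sketch: read the index CSV and rebuild the image file names described above.
    # Column names are assumptions based on the naming scheme; check chinese_mnist.csv.
    import pandas as pd

    index = pd.read_csv("chinese_mnist.csv")
    index["file"] = (
        "input_" + index["suite_id"].astype(str) + "_"
        + index["sample_id"].astype(str) + "_" + index["code"].astype(str) + ".jpg"
    )
    print(index.head())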

    Acknowledgements

    I want to express my gratitude to the following people: Dr. K Nazarpour and Dr. M Chen from Newcastle University, who collected the data.

    Inspiration

    You can use this data the same way you used MNIST, KMNIST or Fashion-MNIST: refine your image classification skills, and use GPUs & TPUs to implement CNN architectures for such multiclass classification tasks.

  6. torchsummary-1.5.1-wheel

    • kaggle.com
    zip
    Updated Mar 20, 2021
    Cite
    Rito Ghosh (2021). torchsummary-1.5.1-wheel [Dataset]. https://www.kaggle.com/truthr/torchsummary
    Available download formats: zip (2494 bytes)
    Dataset updated
    Mar 20, 2021
    Authors
    Rito Ghosh
    Description

    Starter Notebook

    ABOUT (from project's README)

    Keras style model.summary() in PyTorch

    Keras has a neat API to view a visualization of the model, which is very helpful while debugging your network. Here is barebones code that tries to mimic the same in PyTorch. The aim is to provide information complementary to what print(your_model) gives you in PyTorch.

    Usage

    from torchsummary import summary
    summary(your_model, input_size=(channels, H, W))
    
    • Note that the input_size is required to make a forward pass through the network.

    Examples

    CNN for MNIST

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchsummary import summary
    
    class Net(nn.Module):
      def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)
    
      def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
    
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # PyTorch v0.4.0
    model = Net().to(device)
    
    summary(model, (1, 28, 28))
    
    ----------------------------------------------------------------
            Layer (type)               Output Shape         Param #
    ================================================================
                Conv2d-1           [-1, 10, 24, 24]             260
                Conv2d-2             [-1, 20, 8, 8]           5,020
             Dropout2d-3             [-1, 20, 8, 8]               0
                Linear-4                   [-1, 50]          16,050
                Linear-5                   [-1, 10]             510
    ================================================================
    Total params: 21,840
    Trainable params: 21,840
    Non-trainable params: 0
    ----------------------------------------------------------------
    Input size (MB): 0.00
    Forward/backward pass size (MB): 0.06
    Params size (MB): 0.08
    Estimated Total Size (MB): 0.15
    ----------------------------------------------------------------
    

    VGG16

    import torch
    from torchvision import models
    from torchsummary import summary
    
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    vgg = models.vgg16().to(device)
    
    summary(vgg, (3, 224, 224))
    
    ----------------------------------------------------------------
            Layer (type)               Output Shape         Param #
    ================================================================
                Conv2d-1         [-1, 64, 224, 224]           1,792
                  ReLU-2         [-1, 64, 224, 224]               0
                Conv2d-3         [-1, 64, 224, 224]          36,928
                  ReLU-4         [-1, 64, 224, 224]               0
             MaxPool2d-5         [-1, 64, 112, 112]               0
                Conv2d-6        [-1, 128, 112, 112]          73,856
                  ReLU-7        [-1, 128, 112, 112]               0
                Conv2d-8        [-1, 128, 112, 112]         147,584
                  ReLU-9        [-1, 128, 112, 112]               0
            MaxPool2d-10          [-1, 128, 56, 56]               0
               Conv2d-11          [-1, 256, 56, 56]         295,168
                 ReLU-12          [-1, 256, 56, 56]               0
               Conv2d-13          [-1, 256, 56, 56]         590,080
                 ReLU-14          [-1, 256, 56, 56]               0
               Conv2d-15          [-1, 256, 56, 56]         590,080
                 ReLU-16          [-1, 256, 56, 56]               0
            MaxPool2d-17          [-1, 256, 28, 28]               0
               Conv2d-18          [-1, 512, 28, 28]       1,180,160
                 ReLU-19          [-1, 512, 28, 28]               0
               Conv2d-20          [-1, 512, 28, 28]       2,359,808
                 ReLU-21          [-1, 512, 28, 28]               0
               Conv2d-22          [-1, 512, 28, 28]       2,359,808
                 ReLU-23          [-1, 512, 28, 28]               0
            MaxPool2d-24          [-1, 512, 14, 14]               0
               Conv2d-25          [-1, 512, 14, 14]       2,359,808
                 ReLU-26          [-1, 512, 14, 14]               0
               Conv2d-27          [-1, 512, 14, 14]       2,359,808
                 ReLU...
