100+ datasets found
  1. Training a Neural Network Model on Encrypted MNIST Data

    • resodate.org
    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    Dowlin et al. (2024). Training a Neural Network Model on Encrypted MNIST Data [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9zZXJ2aWNlLnRpYi5ldS9sZG1zZXJ2aWNlL2RhdGFzZXQvdHJhaW5pbmctYS1uZXVyYWwtbmV0d29yay1tb2RlbC1vbi1lbmNyeXB0ZWQtbW5pc3QtZGF0YQ==
    Dataset updated
    Dec 16, 2024
    Dataset provided by
    Leibniz Data Manager
    Authors
    Dowlin et al.
    Description

    The dataset used in this paper is not explicitly named, but it is implied to be a large-scale machine learning dataset.

  2. Data from: Deep learning neural network derivation and testing to...

    • tandf.figshare.com
    • datasetcatalog.nlm.nih.gov
    png
    Updated Aug 8, 2023
    Cite
    Omid Mehrpour; Christopher Hoyte; Abdullah Al Masud; Ashis Biswas; Jonathan Schimmel; Samaneh Nakhaee; Mohammad Sadegh Nasr; Heather Delva-Clark; Foster Goss (2023). Deep learning neural network derivation and testing to distinguish acute poisonings [Dataset]. http://doi.org/10.6084/m9.figshare.23694504.v1
    Available download formats: png
    Dataset updated
    Aug 8, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Omid Mehrpour; Christopher Hoyte; Abdullah Al Masud; Ashis Biswas; Jonathan Schimmel; Samaneh Nakhaee; Mohammad Sadegh Nasr; Heather Delva-Clark; Foster Goss
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Acute poisoning is a significant global health burden, and the causative agent is often unclear. The primary aim of this pilot study was to develop a deep learning algorithm that predicts the most probable agent a poisoned patient was exposed to from a pre-specified list of drugs. Data were queried from the National Poison Data System (NPDS) from 2014 through 2018 for eight single-agent poisonings (acetaminophen, diphenhydramine, aspirin, calcium channel blockers, sulfonylureas, benzodiazepines, bupropion, and lithium). Two Deep Neural Networks (PyTorch and Keras) designed for multi-class classification tasks were applied. There were 201,031 single-agent poisonings included in the analysis. For distinguishing among the selected poisonings, the PyTorch model had a specificity of 97%, accuracy of 83%, precision of 83%, recall of 83%, and an F1-score of 82%. The Keras model had a specificity of 98%, accuracy of 83%, precision of 84%, recall of 83%, and an F1-score of 83%. The best performance was achieved in diagnosing poisoning by lithium, sulfonylureas, diphenhydramine, calcium channel blockers, and acetaminophen, in PyTorch (F1-scores = 99%, 94%, 85%, 83%, and 82%, respectively) and Keras (F1-scores = 99%, 94%, 86%, 82%, and 82%, respectively). Deep neural networks can potentially help in distinguishing the causative agent of acute poisoning. This study used a small list of drugs, with polysubstance ingestions excluded. Reproducible source code and results can be obtained at https://github.com/ashiskb/npds-workspace.git.
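
    The reported per-model metrics can be recomputed from raw predictions with standard tooling; below is a minimal sketch, assuming integer-encoded labels for the eight agents and using scikit-learn (the paper's actual evaluation code lives in the linked repository):

      import numpy as np
      from sklearn.metrics import accuracy_score, confusion_matrix, precision_recall_fscore_support

      # Hypothetical predictions over the eight agent classes (0..7)
      y_true = np.array([0, 1, 2, 3, 4, 5, 6, 7, 0, 1])
      y_pred = np.array([0, 1, 2, 3, 4, 5, 6, 0, 0, 1])

      accuracy = accuracy_score(y_true, y_pred)
      precision, recall, f1, _ = precision_recall_fscore_support(
          y_true, y_pred, average="macro", zero_division=0)

      # Macro-averaged specificity: per class TN / (TN + FP), then averaged
      cm = confusion_matrix(y_true, y_pred)
      fp = cm.sum(axis=0) - np.diag(cm)
      tn = cm.sum() - cm.sum(axis=1) - fp   # total - (TP + FN) - FP
      specificity = np.mean(tn / (tn + fp))
      print(accuracy, precision, recall, f1, specificity)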

  3. dataset for neural network training (gas network)

    • kaggle.com
    zip
    Updated Apr 27, 2024
    Cite
    mohammad meysam mollai (2024). dataset for neural network training (gas network) [Dataset]. https://www.kaggle.com/datasets/mohammadmeysammollai/dataset-for-neural-network-training-gas-network
    Available download formats: zip (9282 bytes)
    Dataset updated
    Apr 27, 2024
    Authors
    mohammad meysam mollai
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset for neural network training (Gas Network). This dataset was compiled from data available on the internet: a large amount of data was cleaned, extraneous items were removed, and the important items were retained.

  4. Training and Validation Datasets for Neural Network to Fill in Missing Data...

    • catalog.data.gov
    • gimi9.com
    Updated Jul 9, 2025
    Cite
    National Institute of Standards and Technology (2025). Training and Validation Datasets for Neural Network to Fill in Missing Data in EBSD Maps [Dataset]. https://catalog.data.gov/dataset/training-and-validation-datasets-for-neural-network-to-fill-in-missing-data-in-ebsd-maps
    Dataset updated
    Jul 9, 2025
    Dataset provided by
    National Institute of Standards and Technology (http://www.nist.gov/)
    Description

    This dataset consists of the synthetic electron backscatter diffraction (EBSD) maps generated for the paper titled "Hybrid Algorithm for Filling in Missing Data in Electron Backscatter Diffraction Maps" by Emmanuel Atindama, Conor Miller-Lynch, Huston Wilhite, Cody Mattice, Günay Doğan, and Prashant Athavale. The EBSD maps were used to train, test, and validate a neural network algorithm that fills in missing data points in a given EBSD map. The dataset includes 8000 maps for training, 1000 maps for testing, and 2000 maps for validation. It also includes noise-added versions of the maps, namely one additional noisy map for each clean map.

  5. Training dataset used in the magazine paper entitled "A Flexible Machine...

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    • +1 more
    Updated Jan 24, 2020
    Cite
    Francisco Wilhelmi (2020). Training dataset used in the magazine paper entitled "A Flexible Machine Learning-Aware Architecture for Future WLANs" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3626690
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Universitat Pompeu Fabra
    Authors
    Francisco Wilhelmi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A Flexible Machine Learning-Aware Architecture for Future WLANs

    Authors: Francesc Wilhelmi, Sergio Barrachina-Muñoz, Boris Bellalta, Cristina Cano, Anders Jonsson & Vishnu Ram.

    Abstract: Lots of hopes have been placed in Machine Learning (ML) as a key enabler of future wireless networks. By taking advantage of the large volumes of data generated by networks, ML is expected to deal with the ever-increasing complexity of networking problems. Unfortunately, current networking systems are not yet prepared for supporting the ensuing requirements of ML-based applications, especially for enabling procedures related to data collection, processing, and output distribution. This article points out the architectural requirements that are needed to pervasively include ML as part of future wireless networks operation. To this aim, we propose to adopt the International Telecommunications Union (ITU) unified architecture for 5G and beyond. Specifically, we look into Wireless Local Area Networks (WLANs), which, due to their nature, can be found in multiple forms, ranging from cloud-based to edge-computing-like deployments. Based on ITU's architecture, we provide insights on the main requirements and the major challenges of introducing ML to the multiple modalities of WLANs.

    Dataset description: This is the dataset generated for training a Neural Network (NN) in the Access Point (AP) (re)association problem in IEEE 802.11 Wireless Local Area Networks (WLANs).

    In particular, the NN is meant to output a prediction function of the throughput that a given station (STA) can obtain from a given Access Point (AP) after association. The features included in the dataset are:

    Identifier of the AP to which the STA has been associated.

    RSSI obtained from the AP to which the STA has been associated.

    Data rate in bits per second (bps) that the STA is allowed to use for the selected AP.

    Load in packets per second (pkt/s) that the STA generates.

    Percentage of data that the AP is able to serve before the user association is done.

    Amount of traffic load in pkt/s handled by the AP before the user association is done.

    Airtime in % that the AP enjoys before the user association is done.

    Throughput in pkt/s that the STA receives after the user association is done.

    The dataset has been generated through random simulations, based on the model provided in https://github.com/toniadame/WiFi_AP_Selection_Framework. More details regarding the dataset generation have been provided in https://github.com/fwilhelmi/machine_learning_aware_architecture_wlans.
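
    As a rough illustration of how this dataset can be consumed, here is a minimal sketch that fits a throughput regressor on the seven input features; the column names and the CSV layout below are assumptions, not part of the published dataset:

      import pandas as pd
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPRegressor

      # Hypothetical column names mirroring the feature list above
      cols = ["ap_id", "rssi", "data_rate_bps", "sta_load_pkts",
              "served_pct", "ap_load_pkts", "airtime_pct", "throughput_pkts"]
      df = pd.read_csv("wlan_training_dataset.csv", names=cols)  # file name is an assumption

      X, y = df[cols[:-1]].values, df["throughput_pkts"].values
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

      model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
      model.fit(X_tr, y_tr)
      print("R^2 on held-out data:", model.score(X_te, y_te))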

  6. Data from: Prediction models in the design of neural network based ECG...

    • healthdata.gov
    • data.virginia.gov
    • +1 more
    csv, xlsx, xml
    Updated Jul 14, 2025
    Cite
    (2025). Prediction models in the design of neural network based ECG classifiers: A neural network and genetic programming approach [Dataset]. https://healthdata.gov/d/6w28-3tdb
    Available download formats: csv, xml, xlsx
    Dataset updated
    Jul 14, 2025
    Description

    Background: Classification of the electrocardiogram using Neural Networks has become a widely used method in recent years. The efficiency of these classifiers depends upon a number of factors, including network training. Unfortunately, there is a shortage of evidence available to enable specific design choices to be made, and as a consequence many designs are made on the basis of trial and error. In this study we develop prediction models to indicate the point at which training should stop for Neural Network based Electrocardiogram classifiers in order to ensure maximum generalisation.

    Methods: Two prediction models are presented, one based on Neural Networks and the other on Genetic Programming. The inputs to the models were five variable training parameters, and the output indicated the point at which training should stop. Training and testing of the models were based on the results from 44 previously developed bi-group Neural Network classifiers discriminating between Anterior Myocardial Infarction and normal patients.

    Results: Both approaches provide close fits to the training data (p = 0.627 and p = 0.304 for the Neural Network and Genetic Programming methods, respectively). For unseen data, the Neural Network exhibited no significant differences between actual and predicted outputs (p = 0.306), while the Genetic Programming method showed a marginally significant difference (p = 0.047).

    Conclusions: The approaches provide reverse-engineering solutions to the development of Neural Network based Electrocardiogram classifiers. That is, given the network design and architecture, an indication can be given as to when training should stop to obtain maximum network generalisation.
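
    A minimal sketch of the Neural-Network-based variant of such a stop-point predictor; the five input parameters, their values, and the regressor configuration below are placeholders, since the abstract does not enumerate them:

      import numpy as np
      from sklearn.neural_network import MLPRegressor

      rng = np.random.default_rng(0)
      X = rng.uniform(size=(44, 5))        # 44 previously developed classifiers, 5 training parameters each
      y = rng.integers(50, 500, size=44)   # placeholder stopping points (e.g. epochs)

      predictor = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
      predictor.fit(X, y)
      print("Predicted stopping point:", predictor.predict(X[:1]))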
    
  7. Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST...

    • data.niaid.nih.gov
    Updated Jun 13, 2022
    Cite
    Schürholt, Konstantin; Taskiran, Diyar; Knyazev, Boris; Giró-i-Nieto, Xavier; Borth, Damian (2022). Model Zoo: A Dataset of Diverse Populations of Neural Network Models - MNIST [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6632086
    Dataset updated
    Jun 13, 2022
    Dataset provided by
    AI Lab Montreal, Samsung Advanced Institute of Technology
    AIML Lab, University of St.Gallen
    Image Processing Group, Universitat Politècnica de Catalunya
    Authors
    Schürholt, Konstantin; Taskiran, Diyar; Knyazev, Boris; Giró-i-Nieto, Xavier; Borth, Damian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    In the last years, neural networks have evolved from laboratory environments to the state of the art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. A population of such neural network models (referred to as a "model zoo") would then form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47'360 unique neural network models resulting in over 2'415'360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and provide benchmarks for multiple downstream tasks as mentioned before.

    Dataset

    This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.

    This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "mnist_"), as well as preprocessed model zoos wrapped in a custom pytorch dataset class (filenames beginning with "dataset"). Zoos are trained in three configurations varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix) or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.
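
    A minimal sketch of reading one of the preprocessed zoos; the file name and dictionary keys below are assumptions, and unpickling the custom dataset class requires the repository's code on the import path (see www.modelzoos.cc):

      import json
      import torch

      # Hypothetical file names following the naming patterns described above
      zoo = torch.load("dataset_mnist_hyp_rand.pt")
      with open("index_dict.json") as f:
          index_dict = json.load(f)   # describes how to slice the vectorized model weights

      print(type(zoo), list(index_dict)[:5])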

    For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.

  8. Training and testing XRD dataset for crystallite size and microstrain...

    • entrepot.recherche.data.gouv.fr
    image/x-silx-numpy +1
    Updated Nov 20, 2025
    Cite
    Alexandre BOULLE; Arthur SOUESME (2025). Training and testing XRD dataset for crystallite size and microstrain determination using deep neural networks [Dataset]. http://doi.org/10.57745/SVQART
    Available download formats: text/markdown (1068 bytes), image/x-silx-numpy (6059958836 bytes), image/x-silx-numpy (673347924 bytes)
    Dataset updated
    Nov 20, 2025
    Dataset provided by
    Recherche Data Gouv
    Authors
    Alexandre BOULLE; Arthur SOUESME
    License

    https://spdx.org/licenses/etalab-2.0.html

    Time period covered
    Oct 1, 2023 - Oct 1, 2026
    Dataset funded by
    Région Nouvelle Aquitaine
    Description

    Numpy tensors to train and test a convolutional neural network dedicated to determining crystallite size and/or microstrain from X-ray diffraction (XRD) data:

    • train_size.npz: training dataset with only crystallite size
    • test_size.npz: testing dataset with only crystallite size
    • train_size_strain.npz: training dataset with crystallite size and microstrain
    • test_size_strain.npz: testing dataset with crystallite size and microstrain

    Each dataset contains the XRD data and the labels ("ground truth") in the form of 2D tensors, with 10501 data points (columns) for the XRD data and 24 columns for the labels. Training data contain 71971 rows; testing data contain 7997 rows.

    Example python script to read the data:

      import numpy as np
      train = np.load("train_size.npz")
      train_data, train_label = train["train_data"], train["train_label"]
      print(f"Train data shape: {train_data.shape}, Train labels shape: {train_label.shape}")

    Jupyter notebooks to train and test a neural network can be found here: https://github.com/aboulle/LPA-NN

  9. Data from: BEND: Bagging Deep Learning Training Based on Efficient Neural...

    • service.tib.eu
    Updated Dec 16, 2024
    Cite
    (2024). BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion [Dataset]. https://service.tib.eu/ldmservice/dataset/bend--bagging-deep-learning-training-based-on-efficient-neural-network-diffusion
    Dataset updated
    Dec 16, 2024
    Description

    The paper proposes a Bagging Deep Learning Training Framework (BEND) based on efficient neural network diffusion.

  10. Prediction of biological wastewater treatment performance using artificial...

    • esango.cput.ac.za
    xlsx
    Updated Jun 2, 2023
    Cite
    Winile Sindane (2023). Prediction of biological wastewater treatment performance using artificial neural networks [Dataset]. http://doi.org/10.25381/cput.22261720.v1
    Available download formats: xlsx
    Dataset updated
    Jun 2, 2023
    Dataset provided by
    Cape Peninsula University of Technology
    Authors
    Winile Sindane
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Ethical Clearance Reference Number: 2021FEBEREC-STD- 065

    Pre-processing data from published and unpublished previous studies treating biodiesel, textile, polymer, and pulp-and-paper wastewater using an anaerobic baffled reactor (ABR) and an expanded granular sludge bed (EGSB) reactor, for artificial neural network (ANN) model simulation and development.

    For ANN problems to be solved, the selection of a suitable learning rate, momentum, number of neurons in each hidden layer, and activation function is crucial. Therefore, the collected data must be prepared in a Microsoft Excel spreadsheet format with input and output columns. A training file is then created with samples from the whole problem domain to select the required parameters. Three data sets are used: a training data set, a test data set and a validation data set. During training, the neural network is tested against the test data to determine accuracy, and training is stopped when the mean average error remains the same for a period of time.
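
    A minimal sketch of the stopping rule described above; the framework (scikit-learn), the spreadsheet layout, and the patience threshold are assumptions:

      import numpy as np
      import pandas as pd
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPRegressor

      # Hypothetical Excel sheet: input columns followed by one output column
      df = pd.read_excel("wastewater_ann_data.xlsx")
      X, y = df.iloc[:, :-1].values, df.iloc[:, -1].values
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

      model = MLPRegressor(hidden_layer_sizes=(10,), solver="sgd", momentum=0.9,
                           learning_rate_init=0.01, warm_start=True, max_iter=1)
      best, stall, patience = np.inf, 0, 20
      for epoch in range(1000):
          model.fit(X_tr, y_tr)            # one epoch per call thanks to warm_start=True
          err = np.mean(np.abs(model.predict(X_te) - y_te))   # mean error on the test set
          stall = 0 if err < best - 1e-6 else stall + 1
          best = min(best, err)
          if stall >= patience:            # error has stayed the same for a while: stop
              break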

  11. Data from: Evaluation of the preprocessing and training stages in text...

    • scielo.figshare.com
    jpeg
    Updated May 30, 2023
    Cite
    Lucas Marques Sathler Guimarães; Magali Rezende Gouvêa Meireles; Paulo Eduardo Maciel de Almeida (2023). Evaluation of the preprocessing and training stages in text classification algorithms in the context of information retrieval [Dataset]. http://doi.org/10.6084/m9.figshare.8162216.v1
    Available download formats: jpeg
    Dataset updated
    May 30, 2023
    Dataset provided by
    SciELO (http://www.scielo.org/)
    Authors
    Lucas Marques Sathler Guimarães; Magali Rezende Gouvêa Meireles; Paulo Eduardo Maciel de Almeida
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract: The amount of unstructured data grows with the popularization of the Internet. Texts in natural language represent a relevant and significant set for the analysis and production of knowledge. This work proposes a quantitative analysis of the preprocessing and training stages of a text classifier that uses the feelings expressed by users as an attribute. An Artificial Neural Network was used as the classifier algorithm, and texts from the Amazon, IMDB and Yelp sites were used for the experiments. The database allows the analysis of users' expression of positive and negative feelings in unstructured-text evaluations of products and services. Two distinct preprocessing processes and different trainings of the Artificial Neural Networks were carried out to classify the textual set. The results quantitatively confirm the importance of the preprocessing and training stages of the classifier, highlighting the importance of the vocabulary selected for text representation and classification. The available classification techniques achieve satisfactory results. However, even using two distinct preprocessing processes and identifying the best training process, it was not possible to fully eliminate the model's difficulties in learning and understanding the classification of feelings involving subjective characteristics of human expression.

  12. Training datasets for AIMNet2 machine-learned neural network potential

    • kilthub.cmu.edu
    txt
    Updated Jan 27, 2025
    Cite
    Roman Zubatiuk; Olexandr Isayev; Dylan Anstine (2025). Training datasets for AIMNet2 machine-learned neural network potential [Dataset]. http://doi.org/10.1184/R1/27629937.v2
    Available download formats: txt
    Dataset updated
    Jan 27, 2025
    Dataset provided by
    Carnegie Mellon University
    Authors
    Roman Zubatiuk; Olexandr Isayev; Dylan Anstine
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The datasets contain molecular structures and properties computed with the B97-3c (GGA DFT) or wB97M-def2-TZVPP (range-separated hybrid DFT) methods. Each data file contains about 20M structures. The DFT calculations were performed with the ORCA 5.0.3 software. Properties include energy, forces, atomic charges, and molecular dipole and quadrupole moments.

  13. Dataset: Handwritten Digits and Operators

    • kaggle.com
    zip
    Updated Jul 13, 2020
    Cite
    Michel Heusser (2020). Dataset: Handwritten Digits and Operators [Dataset]. https://www.kaggle.com/datasets/michelheusser/handwritten-digits-and-operators/code
    Available download formats: zip (214990104 bytes)
    Dataset updated
    Jul 13, 2020
    Authors
    Michel Heusser
    Description

    Context

    I created this dataset as part of a larger, rather educational project, which aims to create a simple web interface for writing simple numerical mathematical operations on a drawing grid (on a PC, tablet, or smartphone), which are then recognized and evaluated. It also aims, however, to contribute to and help other projects that may involve or require large datasets of handwritten digits.

    Dataset Characteristics

    What makes this dataset different from existing ones is that it lays emphasis on the actual strokes of handwritten signs, instead of just being a compilation of scanned images of existing records (the way, for example, the MNIST dataset was created). Each pixel is strictly full black or full white (no greytones), and the strokes are rather thin. It is meant to carry the information of the stroke itself, and not just a scanned image.

    In the Context of Neural Networks and Deep Learning

    The dataset is meant to stretch and challenge a neural network's understanding of what makes a specific symbol mean what it does. For this, I initially created a starting dataset of the different ways people write symbols, playing with their internal proportions and adding "ill"-written signs that are still barely recognizable from the rest. After that, I artificially augmented the dataset by performing stretches and small rotations on all images, making sure that a human being would still recognize them. A neural network is then forced to understand better what gives a symbol its characteristics. I made sure not to delete many rather ambiguous cases (e.g. a clockwise-rotated '1' versus '/') so that the neural network has to deal with and understand nuances during training and classification.

    Using 60% of the dataset (randomly selected) to train a neural network with two inner layers, and 20% to validate it, I achieved a +96% validation accuracy using my own implementation of the stochastic gradient descent and backpropagation algorithm.
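
    A minimal sketch of that split and model shape, using scikit-learn rather than the author's own implementation (array file names and layout are assumptions):

      import numpy as np
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPClassifier

      # Hypothetical arrays: flattened 28x28 black/white images and their symbol labels
      X = np.load("images.npy")   # shape (n_samples, 784)
      y = np.load("labels.npy")

      # 60% training, 20% validation, 20% testing, randomly selected
      X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, train_size=0.6, random_state=0)
      X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

      # Two inner (hidden) layers, trained with stochastic gradient descent
      clf = MLPClassifier(hidden_layer_sizes=(128, 64), solver="sgd", max_iter=200, random_state=0)
      clf.fit(X_tr, y_tr)
      print("Validation accuracy:", clf.score(X_val, y_val))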

    Data Augmentation

    The resizing, rotating, and scaling algorithms I wrote do not work on images by modifying each pixel, but rather on the stroke. This means that each line of one pixel width is scaled/resized/rotated to a (longer or shorter) line of the same pixel width. This is advantageous for the following reasons:

    • When I created the dataset, the original images containing many symbols had a pen stroke that was very thin compared to the symbols' proportions. Using the stroke-scaling methods mentioned, resizing to smaller images made the strokes thinner, which is desirable.
    • Transformations on the symbols that change their proportions do not increase the stroke width or make the stroke disappear.
    • It is easy to create new datasets out of this one by thickening strokes, or by adding noise and artificial irregularities.

    Tools for the Dataset Creation

    In the following github repository, one can access the code, as well as the modules I created, to perform the following tasks:

    • Import large images containing multiple symbols to be extracted
    • Agglomerate each independent symbol, perform image processing on them (resizing, scaling, and rotating), and save them to individual images
    • Create custom datasets out of the individual images for training, validation, and testing of machine learning models (e.g. neural networks)

    https://github.com/michheusser/neural-network-training

    • main_dataset_creation.py - Creation of datasets
    • main_neural_network_training.py - Training of the neural network
    • datatools (folder) - Package for dataset and image manipulation
    • nntools (folder) - Package with neural network tools (incl. training)

    Content

    CompleteImages - ca. 300'000 symbol images as .png containing transformation information in their name with syntax: [symbol]_[papersheet_index]_[rotation]_[index in untransformed dataset]_scaled_x[scaling in x]y[scaling in y].png (e.g. +_1_8ccw_26_scaled_x1_2y1_2.png)

    • CompleteDataSet_tuples.npy - List of tuples with all datapoints. (ca. 300'000 Datapoints)
    • CompleteDataSet_training_tuples.npy - Training dataset (60% of CompleteDataSet.npy randomly selected)
    • CompleteDataSet_validation_tuples.npy - Validation dataset (20% of CompleteDataSet.npy randomly selected)
    • CompleteDataSet_testing_tuples.npy - Testing dataset (20% of CompleteDataSet.npy randomly selected)
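
    The .npy tuple files can be read directly with numpy; a minimal sketch (the (image, label) tuple layout is an assumption):

      import numpy as np

      # allow_pickle is required because the file stores Python tuples, not a plain array
      train = np.load("CompleteDataSet_training_tuples.npy", allow_pickle=True)
      image, label = train[0]   # assumed (image, label) layout
      print(len(train), label)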

    Creation

    The datasets were created in the following way:

    • I drew each symbol in ~500 ways on pieces of white paper using a thin pen, and scanned them to .pdf images.
    • All images were run through my 'datatool' module for each symbol to be isolated and fit into a 28x28 greyscale .png with each pixel being either black or white.
    • Each image was transformed with all possible combinations of the following: rotation (-15°, -8°, 0°, 8°, 15°) and stretching in each axis (1, 1.2, 1.3). Each transformed image was saved as a .png ...
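
    Those transformation combinations (5 rotations x 3 stretch factors per axis = 45 variants per image) can be enumerated directly. The sketch below uses ordinary pixel-based Pillow operations for illustration, unlike the author's stroke-preserving 'datatool' module; the file names are hypothetical:

      from itertools import product
      from PIL import Image

      rotations = (-15, -8, 0, 8, 15)
      stretches = (1.0, 1.2, 1.3)

      img = Image.open("symbol.png").convert("L")   # hypothetical 28x28 input image
      w, h = img.size
      for angle, sx, sy in product(rotations, stretches, stretches):
          out = img.rotate(angle, fillcolor=255)                # rotate, padding with white
          out = out.resize((int(w * sx), int(h * sy)))          # stretch each axis
          out.save(f"symbol_r{angle}_x{sx}_y{sy}.png")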

  14. Data from: Using convolutional neural networks to efficiently extract...

    • datadryad.org
    • data.niaid.nih.gov
    • +1 more
    zip
    Updated Jan 4, 2022
    Cite
    Rachel Reeb; Naeem Aziz; Samuel Lapp; Justin Kitzes; J. Mason Heberling; Sara Kuebbing (2022). Using convolutional neural networks to efficiently extract immense phenological data from community science images [Dataset]. http://doi.org/10.5061/dryad.mkkwh7123
    Available download formats: zip
    Dataset updated
    Jan 4, 2022
    Dataset provided by
    Dryad
    Authors
    Rachel Reeb; Naeem Aziz; Samuel Lapp; Justin Kitzes; J. Mason Heberling; Sara Kuebbing
    Time period covered
    Dec 15, 2021
    Description

    Community science image libraries offer a massive, but largely untapped, source of observational data for phenological research. The iNaturalist platform offers a particularly rich archive, containing more than 49 million verifiable, georeferenced, open access images, encompassing seven continents and over 278,000 species. A critical limitation preventing scientists from taking full advantage of this rich data source is labor. Each image must be manually inspected and categorized by phenophase, which is both time-intensive and costly. Consequently, researchers may only be able to use a subset of the total number of images available in the database. While iNaturalist has the potential to yield enough data for high-resolution and spatially extensive studies, it requires more efficient tools for phenological data extraction. A promising solution is automation of the image annotation process using deep learning. Recent innovations in deep learning have made these open-source tools accessibl...

  15. Training Over-Parameterized Deep Neural Networks - Dataset - LDM

    • service.tib.eu
    • resodate.org
    Updated Dec 16, 2024
    Cite
    (2024). Training Over-Parameterized Deep Neural Networks - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/training-over-parameterized-deep-neural-networks
    Dataset updated
    Dec 16, 2024
    Description

    The dataset used in this paper is a collection of training data for over-parameterized deep neural networks.

  16. Dataset for neural network (water network)

    • kaggle.com
    zip
    Updated Apr 27, 2024
    Cite
    mohammad meysam mollai (2024). Dataset for neural network (water network) [Dataset]. https://www.kaggle.com/datasets/mohammadmeysammollai/dataset-for-neural-network-water-network
    Available download formats: zip (24308 bytes)
    Dataset updated
    Apr 27, 2024
    Authors
    mohammad meysam mollai
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Dataset for neural network training (water network). This dataset was compiled from data available on the internet: a large amount of data was cleaned, extraneous items were removed, and the important items were retained.

  17. Subset of Quick, Draw! dataset for neural network pre-training / Subconjunto...

    • resodate.org
    • portalcientifico.universidadeuropea.com
    Updated Sep 27, 2024
    Cite
    Juan Guerrero Martín; Alba Gómez-Valadés Batanero; Estela Díaz López; Margarita Bachiller Mayoral; José Manuel Cuadra Troncoso; Rafael Martínez Tomás; Sara García Herranz; María del Carmen Díaz Mardomingo; Herminia Peraita; Mariano Rincón Zamorano (2024). Subset of Quick, Draw! dataset for neural network pre-training / Subconjunto del conjunto de datos Quick, Draw! para pre-entrenamiento de redes neuronales [Dataset]. http://doi.org/10.21950/GWO9RA
    Dataset updated
    Sep 27, 2024
    Dataset provided by
    Universidad Nacional de Educación a Distancia
    Eciencia Data
    Rey-Osterrieth Complex Figure (ROCF) Test Assessment
    Authors
    Juan Guerrero Martín; Alba Gómez-Valadés Batanero; Estela Díaz López; Margarita Bachiller Mayoral; José Manuel Cuadra Troncoso; Rafael Martínez Tomás; Sara García Herranz; María del Carmen Díaz Mardomingo; Herminia Peraita; Mariano Rincón Zamorano
    Description

    Description of the project: This dataset is the result of the research carried out in the project "A Benchmark for Rey-Osterrieth Complex Figure (ROCF) Test Automatic Scoring", whose main goal was to establish a baseline for the scoring task, consisting of a dataset with 528 ROCFs and the results obtained by several deep learning models as well as by a group of psychology experts.

  18. Dataset for Vehicle Detection by Neural Network

    • kaggle.com
    zip
    Updated Dec 25, 2023
    Cite
    Niloy Kanti Paul (2023). Dataset for Vehicle Detection by Neural Network [Dataset]. https://www.kaggle.com/datasets/niloykantipaul/dataset-for-vehicle-detection-by-neural-network
    Available download formats: zip (1222125894 bytes)
    Dataset updated
    Dec 25, 2023
    Authors
    Niloy Kanti Paul
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    About Dataset

    This dataset was used by the SKYHAWK project with team REDHAWK. To download the full dataset or to submit a request for your new data collection needs, please send a mail to: redhawk.team.info@gmail.com

    Dive into a rigorous collection of 11,000+ vehicle images of multiple types, captured and crowdsourced from 100+ urban and rural areas. Each image is meticulously reviewed and verified by the researcher.

    Versatile Training Data: Explore a Spectrum of Resolutions and Weather Conditions in Our Dataset for Comprehensive Vehicle Detection Model Training.

    Dataset Features:

    • Dataset size: 11,000+ images
    • Location: Bangladesh
    • Diversity: various lighting conditions (day and night), various weather conditions, varied distances, viewpoints, etc.
    • Device used: captured using mobile phones
    • Usage: vehicle detection, traffic automation, traffic surveillance, etc.

    Vehicle Classes:

    • Bike
    • Auto
    • Car
    • Truck
    • Bus
    • Other Vehicles (Rickshaw, Van, Cycle, etc.)

    Available Annotation formats: YOLO, PYTORCH

  19. Data repository for "Minimial-Risk Training Samples for QNN Training from...

    • darus.uni-stuttgart.de
    Updated Oct 8, 2024
    Cite
    Alexander Mandl; Marvin Bechtold; Johanna Barzen; Frank Leymann (2024). Data repository for "Minimial-Risk Training Samples for QNN Training from Measurements" [Dataset]. http://doi.org/10.18419/DARUS-4113
    Croissant metadata available (Croissant is a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 8, 2024
    Dataset provided by
    DaRUS
    Authors
    Alexander Mandl; Marvin Bechtold; Johanna Barzen; Frank Leymann
    License

    https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4113

    Dataset funded by
    BMWK
    Description

    Replication code and experiment result data for training Quantum Neural Networks (QNNs) with entangled data, using one-dimensional projectors as observables. This is the version of the code that was used to generate the experiment results in the related publication.

    Experiments:
    • exp_inf_coeffvariation.py: trains QNNs using training samples of varying Schmidt rank, with a fixed vector as Schmidt basis state, varying the associated Schmidt coefficient.
    • exp_inf_random.py: trains QNNs using random training data.

    Experiment results:
    • exp_inf_coeffvariation.zip and exp_inf_random.zip contain the raw results for both experiments.
    • For each combination of controlled variables there is one directory containing the results of all 20 runs of the training process.
    • The results for each run comprise 3 files:
      - [id]_losses.npy: the loss during the training process
      - [id]_params.npy: the parameters of the QNN after the training process
      - [id]_V.npy: the trained QNN exported as a 2^4 x 2^4 unitary matrix

    Analysis of data (data_extraction.py): computes means and standard deviations of various risk measures and saves the results.

    Plots (plot_obs_risk.py): plots the risk w.r.t. the observable for both experiments, based on the analysed data obtained from data_extraction.py, and generates plot_coeffvariation.pdf and plot_random.pdf.
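
    A minimal sketch of inspecting one run's result files; the [id] placeholder is kept abstract in the description above, and the run identifier below is hypothetical:

      import numpy as np

      run_id = "0"                                # hypothetical run identifier
      losses = np.load(f"{run_id}_losses.npy")    # loss trajectory over training
      params = np.load(f"{run_id}_params.npy")    # QNN parameters after training
      V = np.load(f"{run_id}_V.npy")              # trained QNN as a 16 x 16 (2^4 x 2^4) matrix

      # Sanity check: the exported QNN should be unitary, i.e. V V† = I
      assert np.allclose(V @ V.conj().T, np.eye(V.shape[0]), atol=1e-8)
      print("final loss:", losses[-1], "| parameter shape:", params.shape)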

  20. Data from: Self-Supervised Representation Learning on Neural Network Weights...

    • data.niaid.nih.gov
    Updated Nov 13, 2021
    Cite
    Schürholt, Konstantin; Kostadinov, Dimche; Borth, Damian (2021). Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction - Datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5645137
    Dataset updated
    Nov 13, 2021
    Dataset provided by
    University of St.Gallen
    Authors
    Schürholt, Konstantin; Kostadinov, Dimche; Borth, Damian
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Datasets to NeurIPS 2021 accepted paper "Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction".

    Datasets are pytorch files containing a dictionary with training, validation and test sets. The train, validation and test sets are custom dataset classes which inherit from the standard torch dataset class. Corresponding code can be found at https://github.com/HSG-AIML/NeurIPS_2021-Weight_Space_Learning.

    Datasets 41, 42, 43 and 44 are our dataset format wrapped around the zoos from Unterthiner et al., 2020 (https://github.com/google-research/google-research/tree/master/dnn_predict_accuracy).

    Abstract: Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. Neural Networks (NNs) are widely applied, yet their weight space is still not fully understood. Therefore, we propose to use SSL to learn neural representations of the weights of populations of NNs. To that end, we introduce domain specific data augmentations and an adapted attention architecture. Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics. Further, we show that the proposed learned representations outperform prior work for predicting hyper-parameters, test accuracy, and generalization gap as well as transfer to out-of-distribution settings.
