This dataset consists of the synthetic electron backscatter diffraction (EBSD) maps generated for the paper "Hybrid Algorithm for Filling in Missing Data in Electron Backscatter Diffraction Maps" by Emmanuel Atindama, Conor Miller-Lynch, Huston Wilhite, Cody Mattice, Günay Doğan, and Prashant Athavale. The EBSD maps were used to train, test, and validate a neural network algorithm that fills in missing data points in a given EBSD map. The dataset includes 8,000 maps for training, 1,000 for testing, and 2,000 for validation. It also includes noise-added versions of the maps, namely one noisy map for each clean map.
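The on-disk format is not specified here; as a minimal sketch, assuming each map is stored as a NumPy array with its noise-added counterpart under the same file name in a parallel directory (both names hypothetical), the (noisy, clean) training pairs could be assembled like this:

import numpy as np
from pathlib import Path

# Hypothetical layout: clean maps in clean/, noise-added maps in noisy/,
# matched by file name (assumption; the actual layout may differ).
clean_dir, noisy_dir = Path("clean"), Path("noisy")

pairs = []
for clean_path in sorted(clean_dir.glob("*.npy")):
    clean_map = np.load(clean_path)               # target: complete EBSD map
    noisy_map = np.load(noisy_dir / clean_path.name)  # input: degraded map
    pairs.append((noisy_map, clean_map))

print(f"Loaded {len(pairs)} (noisy, clean) training pairs")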
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset with observations to train a neural network.
Etalab Open License 2.0 (etalab-2.0): https://spdx.org/licenses/etalab-2.0.html
NumPy tensors to train and test a convolutional neural network dedicated to determining crystallite size and/or microstrain from X-ray diffraction (XRD) data:

train_size.npz: training dataset with only crystallite size
test_size.npz: testing dataset with only crystallite size
train_size_strain.npz: training dataset with crystallite size and microstrain
test_size_strain.npz: testing dataset with crystallite size and microstrain

Each dataset contains the XRD data and the labels ("ground truth") as 2D tensors, with 10501 data points (columns) for the XRD data and 24 columns for the labels. Training data contain 71971 rows; testing data contain 7997 rows. Example Python script to read the data:

import numpy as np

train = np.load("train_size.npz")
train_data, train_label = train["train_data"], train["train_label"]
print(f"Train data shape: {train_data.shape}, Train labels shape: {train_label.shape}")

Jupyter notebooks to train and test a neural network can be found here: https://github.com/aboulle/LPA-NN
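The linked notebooks define the actual architecture; purely as an illustrative sketch (layer sizes and hyperparameters are assumptions, not the paper's model), a 1D CNN mapping the 10501-point XRD profiles to the 24 label columns could look like this in Keras:

import numpy as np
import tensorflow as tf

train = np.load("train_size.npz")
x, y = train["train_data"], train["train_label"]
x = x[..., np.newaxis]  # reshape to (n_samples, 10501, 1) for Conv1D

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10501, 1)),
    tf.keras.layers.Conv1D(32, 16, strides=4, activation="relu"),
    tf.keras.layers.Conv1D(64, 16, strides=4, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(24),  # one output per label column
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=10, batch_size=64, validation_split=0.1)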
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This online resource contains two archived folders, Matlab and Python, with the relevant code for the article "A Bayesian finite-element trained machine learning approach for predicting post-burn contraction".
The Matlab folder contains the code used to generate the large dataset. The file Main.m is the entry point, from which the Monte Carlo simulation can be run. A README file is included.
The Python folder contains the code used for training the neural networks and creating the online application. The file Data.mat contains the data generated by the Matlab Monte Carlo simulation. The files run_bound.py, run_rsa.py, and run_tse.py train the neural networks, and the best-scoring ones are saved in the Training folder. The DashApp folder contains the code for the application.
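As a minimal sketch (the variable names stored inside Data.mat are assumptions; loadmat reports the actual ones), the Matlab-generated data could be inspected from Python with scipy:

import scipy.io

# Load the Monte Carlo results generated by the Matlab code.
data = scipy.io.loadmat("Data.mat")

# Keys starting with "__" are loadmat metadata; the rest are stored variables.
for key, value in data.items():
    if not key.startswith("__"):
        print(key, getattr(value, "shape", type(value)))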
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For Pompon-main.zip:

See README.md first for installation.
docs/data/*.npy: data files; they can be loaded with numpy.load (see the sketch below).
docs/data/bagel_h2co_dft.s0.harmonic.json
docs/notebook/_h2co_opt.py: easily executable via "uv run _h2co_opt.py" if you have installed uv.
docs/data/nnmpo_final_rmse_8.365e-04.h5 (HDF5 format)
docs/notebook/create-random-mpo.ipynb and docs/notebook/nnmpo_to_itensor_mpo.ipynb (need ITensors.jl version 0.6.x)
docs/notebook/itensor_vDMRG.ipynb: an ITensors.jl example.
If you have any questions, please post an issue on GitHub.
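As a brief sketch of inspecting the bundled data files (paths taken from the list above; the internal layout of the HDF5 file is not documented here, so this only lists its groups and datasets):

import glob
import numpy as np
import h5py

# The .npy arrays load directly with numpy.
for path in glob.glob("docs/data/*.npy"):
    arr = np.load(path)
    print(path, arr.shape, arr.dtype)

# The HDF5 file can be inspected with h5py.
with h5py.File("docs/data/nnmpo_final_rmse_8.365e-04.h5", "r") as f:
    f.visit(print)  # print every group/dataset name in the file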
Discvar-main.zip is an implementation of discrete variable representation (DVR).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the training and test data, as well as the trained neural networks, as used for the paper 'Machine Learning of Combinatorial Rules in Mechanical Metamaterials', as published in XXX. In this paper, a neural network is used to classify each (k \times k) unit cell design into one of two classes (C or I). Additionally, the performance of the trained networks is analysed in detail. A more detailed description of the contents of the dataset follows below.

NeuralNetwork_train_and_test_data.zip
This file contains the train and test data used to train the Convolutional Neural Networks (CNNs) of the paper. Each unit cell size has its own file, saved as a zipped NumPy archive (.npz).

CNN_saves_kxk.zip
This file contains the parameter configurations of the CNNs trained on (k \times k) unit cells. Every hyperparameter combination (number of filters nf, number of hidden neurons nh, learning rate lr) is saved separately. The neural networks can be loaded using Google's TensorFlow package in Python, specifically the 'tf.keras.models.load_model' function.
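As a minimal sketch of opening the training data (the file name and the array key names inside the .npz archives are assumptions; np.load reports the actual ones):

import numpy as np

# Open one unit-cell-size file; .npz archives behave like dictionaries.
with np.load("NeuralNetwork_train_and_test_data/k5.npz") as data:  # hypothetical name
    print("Arrays in archive:", data.files)
    for name in data.files:
        print(name, data[name].shape)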
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Flexible Machine Learning-Aware Architecture for Future WLANs
Authors: Francesc Wilhelmi, Sergio Barrachina-Muñoz, Boris Bellalta, Cristina Cano, Anders Jonsson & Vishnu Ram.
Abstract: High hopes have been placed in Machine Learning (ML) as a key enabler of future wireless networks. By taking advantage of the large volumes of data generated by networks, ML is expected to deal with the ever-increasing complexity of networking problems. Unfortunately, current networking systems are not yet prepared to support the ensuing requirements of ML-based applications, especially the procedures related to data collection, processing, and output distribution. This article points out the architectural requirements needed to pervasively include ML as part of the operation of future wireless networks. To this aim, we propose adopting the International Telecommunications Union (ITU) unified architecture for 5G and beyond. Specifically, we look into Wireless Local Area Networks (WLANs), which, due to their nature, can be found in multiple forms, ranging from cloud-based to edge-computing-like deployments. Based on ITU's architecture, we provide insights on the main requirements and the major challenges of introducing ML into the multiple modalities of WLANs.
Dataset description: This is the dataset generated for training a Neural Network (NN) in the Access Point (AP) (re)association problem in IEEE 802.11 Wireless Local Area Networks (WLANs).
In particular, the NN is meant to predict the throughput that a given station (STA) can obtain from a given Access Point (AP) after association. The features included in the dataset are:
Identifier of the AP to which the STA has been associated.
RSSI obtained from the AP to which the STA has been associated.
Data rate in bits per second (bps) that the STA is allowed to use for the selected AP.
Load in packets per second (pkt/s) that the STA generates.
Percentage of data that the AP is able to serve before the user association is done.
Amount of traffic load in pkt/s handled by the AP before the user association is done.
Airtime in % that the AP enjoys before the user association is done.
Throughput in pkt/s that the STA receives after the user association is done.
The dataset has been generated through random simulations, based on the model provided in https://github.com/toniadame/WiFi_AP_Selection_Framework. More details regarding the dataset generation have been provided in https://github.com/fwilhelmi/machine_learning_aware_architecture_wlans.
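As an illustrative sketch only (the export file name, column order, and network size are assumptions, not part of the dataset description), a small Keras regressor over the first seven features with the throughput column as target could look like:

import numpy as np
import tensorflow as tf

# Hypothetical: dataset exported as CSV with the eight columns listed above,
# the last one (throughput in pkt/s) being the prediction target.
data = np.loadtxt("wlan_dataset.csv", delimiter=",", skiprows=1)
x, y = data[:, :7], data[:, 7]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(7,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # predicted throughput
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=20, batch_size=32, validation_split=0.1)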
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the training and test data, as well as the trained neural networks as used for the paper 'Machine Learning of Implicit Combinatorial Rules in Mechanical Metamaterials', as published in Physical Review Letters.
In this paper, a neural network is used to classify each \(k \times k\) unit cell design of metamaterials M1 and M2 into one of two classes (C or I). Additionally, the performance of the trained networks is analysed in detail. A more detailed description of the contents of the dataset follows below.
NeuralNetwork_train_and_test_data.zip
This file contains the train and test data used to train the Convolutional Neural Networks (CNNs) of the paper. Each unit cell size has its own file, saved as a zipped NumPy archive (.npz). It contains data for metamaterial M1 ("smiley_cube") and for metamaterial M2 classifications (i) ("prek_xy") and (ii) ("unimodal_vs_oligomodal_inc_stripmodes").
CNN_saves_kxk.zip
This file contains the parameter configurations of the CNNs trained on \(k \times k\) unit cells for metamaterial M2 classification (ii). Classification (i) is denoted by an additional M2ii in the file name, and metamaterial M1 by an extra M1 in the file name. Every hyperparameter combination (number of filters nf, number of hidden neurons nh, learning rate lr) is saved separately. The neural networks can be loaded using Google's TensorFlow package in Python, specifically the 'tf.keras.models.load_model' function.
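A minimal sketch of restoring one of the saved networks with the function named above (the directory path is hypothetical; the archive defines the actual naming scheme):

import tensorflow as tf

# Load one saved CNN; the path below is illustrative only.
model = tf.keras.models.load_model("CNN_saves_5x5/nf32_nh64_lr0.001")
model.summary()  # inspect the restored architecture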
CC0 1.0 (CC0-1.0): https://spdx.org/licenses/CC0-1.0.html
An automatic bird sound recognition system is a useful tool for collecting data on different bird species for ecological analysis. Together with autonomous recording units (ARUs), such a system makes it possible to collect bird observations on a scale no human observer could ever match. Progress has been made in automatic bird sound recognition over the last decades, but recognizing bird species in untargeted soundscape recordings remains a challenge. In this article we demonstrate a workflow for building a global identification model and adjusting it to perform well on data from autonomous recorders in a specific region. We show how data augmentation and a combination of global and local data can be used to train a convolutional neural network to classify vocalizations of 101 bird species. We construct a model and train it on a global data set to obtain a base model. The base model is then fine-tuned with local data from Southern Finland to adapt it to the sound environment of a specific location, and tested with two data sets: one originating from the same Southern Finnish region and another from a different region in the German Alps. Our results suggest that fine-tuning with local data significantly improves network performance. Classification accuracy improved for test recordings from the same area as the local training data (Southern Finland) but not for recordings from a different region (the German Alps). Data augmentation enables training with a limited amount of training data, and even with few local data samples a significant improvement over the base model can be achieved. Our model outperforms the current state-of-the-art tool for automatic bird sound classification. Using local data to adjust the recognition model to the target domain leads to improvement over general, non-tailored solutions. The process introduced in this article can be applied to build a fine-tuned bird sound classification model for a specific environment.

Methods: This repository contains the data and recognition models described in the paper "Domain-specific neural networks improve automated bird sound recognition already with small amount of local data" (Lauha et al., 2022).
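The paper defines the actual fine-tuning protocol; as a generic sketch only (the model file name, the frozen-layer split, and the learning rate are assumptions), the local fine-tuning step could look like this in Keras:

import tensorflow as tf

# Load the global base model (hypothetical file name).
base = tf.keras.models.load_model("global_base_model.h5")

# Freeze the early feature-extraction layers; leave the last few trainable.
for layer in base.layers[:-4]:
    layer.trainable = False

# Recompile with a small learning rate so local data only nudges the weights.
base.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
             loss="categorical_crossentropy", metrics=["accuracy"])

# local_train would be a tf.data.Dataset of (spectrogram, label) batches
# prepared from the local recordings, e.g.:
# base.fit(local_train, epochs=5)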
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for the NeurIPS 2021 accepted paper "Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction".
Datasets are PyTorch files containing a dictionary with training, validation, and test sets. The train, validation, and test sets are custom dataset classes which inherit from the standard torch dataset class. Corresponding code can be found at https://github.com/HSG-AIML/NeurIPS_2021-Weight_Space_Learning.
Datasets 41, 42, 43, and 44 are our dataset format wrapped around the zoos from Unterthiner et al., 2020 (https://github.com/google-research/google-research/tree/master/dnn_predict_accuracy).
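As a minimal sketch of opening one of these files (the file name and dictionary keys are assumptions; the linked repository documents the actual ones):

import torch

# Each dataset file holds a dictionary of custom torch Dataset objects.
data = torch.load("dataset.pt")  # hypothetical file name
print(data.keys())               # e.g. train / validation / test splits

trainset = data["trainset"]      # assumed key
print(len(trainset), "training samples")
sample = trainset[0]             # behaves like a standard torch Dataset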
Abstract: Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. Neural Networks (NNs) are widely applied, yet their weight space is still not fully understood. Therefore, we propose to use SSL to learn neural representations of the weights of populations of NNs. To that end, we introduce domain-specific data augmentations and an adapted attention architecture. Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics. Further, we show that the proposed learned representations outperform prior work for predicting hyper-parameters, test accuracy, and generalization gap, as well as transfer to out-of-distribution settings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Simulation experiment data for training Quantum Neural Networks (QNNs) using entangled datasets. The experiments investigate the validity of the lower bounds on the expected risk after training QNNs given by the extensions to the Quantum No-Free-Lunch theorem presented in the related publication. The QNNs are trained with (i) samples of varying Schmidt rank, (ii) orthogonal samples of fixed Schmidt rank, and (iii) linearly dependent samples of fixed Schmidt rank. The dataset contains raw experiment data (directory "raw_data"), analyzed mean risks and errors (directory "plot_data"), and the resulting plots (directory "plots").

Experiments: The experiments train QNNs using various compositions of training samples on a simulator and extract the risk after training to compute average risks.

Experiment 1: Trains QNNs using entangled training samples of varying Schmidt rank. The average Schmidt rank and the number of training samples are controlled. Raw data: average_rank_results.zip; computed average risks: avg_rank_risks.npy; computed average losses: avg_rank_losses.npy; plotted average risks: avg_rank_experiments.pdf; plotted average losses: avg_rank_losses.pdf.

Experiment 2: Trains QNNs using entangled orthogonal training samples. The number of training samples is controlled and the Schmidt rank is fixed such that d = r*t for the dimension d of the Hilbert space. Raw data: orthogonal_results.zip; computed average risks: orthogonal_exp_points.npy; plotted average risks: orthogonal_experiments.pdf.

Experiment 3: Trains QNNs using entangled linearly dependent training samples. The number of training samples is controlled and the Schmidt rank is fixed such that d = r*t for the dimension d of the Hilbert space. Raw data: not_linearly_independent_results.zip; computed average risks: nlihx_exp_points.npy; plotted average risks: nlihx_experiments.pdf.

Additionally, this repository contains the reproduction data for Figure 1 (phases_in_orthogonal_training.zip). This file contains the training data, the target unitary, and the resulting hypothesis unitary for orthogonal training samples of (i) high risk and (ii) low risk. For the code to reproduce and analyze the experiments, see the Code repository.
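As a quick sketch of inspecting the analyzed results (the array shapes and contents are not documented here, and placing the file under "plot_data" is an assumption, so this just loads and reports one array):

import numpy as np

# Load the precomputed average risks from Experiment 1.
avg_risks = np.load("plot_data/avg_rank_risks.npy", allow_pickle=True)
print(avg_risks.shape, avg_risks.dtype)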
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The datasets contain molecular structures and properties computed with the B97-3c (GGA DFT) or wB97M-def2-TZVPP (range-separated hybrid DFT) methods. Each data file contains about 20M structures. DFT calculations were performed with the ORCA 5.0.3 software. Properties include energy, forces, atomic charges, and molecular dipole and quadrupole moments.
Machine learning can be used to predict fault properties such as shear stress, friction, and time to failure using continuous records of fault zone acoustic emissions. The files are features and labels extracted from lab data (experiment p4679). The features are extracted with a non-overlapping window from the original acoustic data. The first column is the time of the window. The second and third columns are the mean and variance of the acoustic data in the window, respectively. The 4th-11th columns are the power spectral density, ordered from low to high frequency. The last column is the corresponding label (shear stress level). The file name indicates the driving velocity from which the sequence was generated. Data were generated from laboratory friction experiments conducted with a biaxial shear apparatus. Experiments were conducted in the double direct shear configuration, in which two fault zones are sheared between three rigid forcing blocks. Our samples consisted of two 5-mm-thick layers of simulated fault gouge with a nominal contact area of 10 by 10 cm^2. Gouge material consisted of soda-lime glass beads with initial particle size between 105 and 149 micrometers. Prior to shearing, we impose a constant fault normal stress of 2 MPa using a servo-controlled load-feedback mechanism and allow the sample to compact. Once the sample has reached a constant layer thickness, the central block is driven down at a constant rate of 10 micrometers per second. In tandem, we collect an acoustic emission (AE) signal continuously at 4 MHz from a piezoceramic sensor embedded in a steel forcing block about 22 mm from the gouge layer. The data from this experiment can be used to train a deep learning algorithm for future fault property prediction.
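As an illustrative sketch of the described feature extraction (the window length, the frequency band edges, and the helper name are assumptions, not the authors' values):

import numpy as np
from scipy.signal import welch

FS = 4_000_000   # acoustic sampling rate: 4 MHz
WINDOW = 40_000  # assumed non-overlapping window length (10 ms)

def extract_features(signal, times):
    rows = []
    for start in range(0, len(signal) - WINDOW + 1, WINDOW):
        seg = signal[start:start + WINDOW]
        freqs, psd = welch(seg, fs=FS)
        # Split the spectrum into 8 bands, low to high frequency (columns 4-11).
        bands = [b.mean() for b in np.array_split(psd, 8)]
        rows.append([times[start], seg.mean(), seg.var(), *bands])
    return np.array(rows)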
This dataset contains the spatiotemporal data used to train the spatiotemporal deep neural networks described in "Modeling the Spread of a Livestock Disease With Semi-Supervised Spatiotemporal Deep Neural Networks". The dataset consists of two sets of NumPy arrays: X_grid.npy and Y_grid.npy were used to train the convolutional LSTM, while X_graph.npy, Y_graph.npy, and edge_index.npy were used to train the graph convolutional LSTM. The data consist of spatiotemporally varying environmental and anthropogenic variables along with case reports of vesicular stomatitis.

Resources in this dataset:
Resource Title: NumPy Arrays of Spatiotemporal Features and VS Cases. File Name: vs_data.zip. Resource Description: a ZIP archive containing five NumPy arrays of spatiotemporal features and geotagged VS cases. Recommended software: NumPy, https://numpy.org/
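A minimal sketch of loading the five arrays after unpacking vs_data.zip (the array shapes are not documented here, so this just reports them):

import numpy as np

for name in ["X_grid", "Y_grid", "X_graph", "Y_graph", "edge_index"]:
    arr = np.load(f"{name}.npy")
    print(name, arr.shape, arr.dtype)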
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
Most machine learning courses start by implementing a fully-connected Deep Neural Network (DNN) and proceed towards Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), teaching skills for managing training, inference, and deployment along the way. For most beginners, the problem with building DNNs from scratch is that either the input data has to be grossly simplified (working with 64x64x3 images, for example) or the network has so many parameters that it is very hard to train. Meanwhile, Transfer Learning has made building even CNNs and RNNs from scratch unnecessary: one can reuse and/or fine-tune publicly available CNNs like Inception V3 with very little data for a new problem.
The purpose of this dataset is to make a large dataset of 25000 training examples and 12500 test examples, drawn from the ever-popular Dogs vs Cats Redux competition, available to students just starting out in machine learning. The base dataset, which consists of fairly large images, has been passed through publicly available CNNs like Inception V3, Inception Resnet V2, Resnet 50, Xception, and MobileNet, creating features from which a pretty good DNN classifier is very easy to build. This should make learning to build DNNs from scratch easy to do, while learning a bit of transfer learning and even "competing" in Dogs vs Cats Redux for kicks!
As mentioned, the input data for this dataset are images from the Dogs vs Cats Redux competition. All transfer-learning CNN models were obtained from keras.applications. The features derived by processing the input images through the transfer models are flat (25000x2048 training examples and 12500x2048 test examples when using Inception V3) and ready for ingestion into a DNN. In addition, the dataset provides IDs from the original training and test examples so classification results can be reviewed against the base data.
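Since the features are flat 2048-dimensional vectors, a small dense classifier trains quickly; as a sketch (the file names and layer sizes are hypothetical, not this dataset's actual layout):

import numpy as np
import tensorflow as tf

# Hypothetical file layout: transferred Inception V3 features plus labels.
features = np.load("inception_v3_train_features.npy")  # shape (25000, 2048)
labels = np.load("train_labels.npy")                   # 1 = dog, 0 = cat

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2048,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(features, labels, epochs=5, batch_size=128, validation_split=0.1)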
Note that while the classic goal of transfer learning is to apply a network on a smaller dataset and/or fine tune the transferred network on said dataset, the purpose of this dataset is subtly different: make a large dataset available for beginners to build DNNs with. Of course, a subset of the dataset can be used for classification and the base transfer models can be fine tuned.
Francois Chollet's Keras framework, specifically keras.applications.
Dr. Andrew Ng's deeplearning.ai specialization on Coursera. In my spare time, I mentor students in Coursera's Neural Networks and Deep Learning and Convolutional Neural Networks courses.
Initially I am posting just the dataset; later I will post the kernel that produced it and a kernel that uses it to classify Dogs vs Cats Redux. Can you duplicate the log loss score of 0.21 currently possible by reusing the transfer models with no fine-tuning? Can you get into the top 50 by fine-tuning the base models and/or augmenting the input data?
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is part of a course assignment at IIT Madras and is ideal for people who are new to neural networks. The data essentially contains features extracted from images; the goal is to train a model for multiclass classification.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This data was collected to train a convolutional neural network classifier that distinguishes manga from classic-art-style comic images using transfer learning (VGG16); the model was later deployed using Flask.
The code provided relates to training an autoencoder, evaluating its performance, and using it to impute missing values in a dataset. Breaking down each part:

Training the autoencoder (train_autoencoder function): This function takes an autoencoder model and the input features. It trains the autoencoder using the input features as both input and target output (hence features, features), for a specified number of epochs (epochs) with a given batch size (batch_size). The shuffle=True argument ensures that the data is shuffled before each epoch to prevent the model from memorizing the input order. After training, it returns the trained autoencoder model and the training history.

Evaluating the autoencoder (evaluate_autoencoder function): This function takes a trained autoencoder model and the input features. It uses the trained autoencoder to predict the reconstructed features from the input features, then calculates Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) scores between the original and reconstructed features. These metrics provide insight into how well the autoencoder reconstructs the input features.

Imputing with the autoencoder (impute_with_autoencoder function): This function takes a trained autoencoder model and the input features. It identifies missing values (e.g., -9999) in the input features, predicts the missing values with the trained autoencoder for each row that has them, and replaces the missing values with the predicted values. The imputed features are returned as output.

To reuse this code: load your dataset and preprocess it as necessary; build an autoencoder model using the build_autoencoder function; train it with train_autoencoder on your input features; evaluate its performance with evaluate_autoencoder; if your dataset contains missing values, impute them with impute_with_autoencoder; and use the trained autoencoder for any other relevant tasks, such as feature extraction or anomaly detection. A sketch of what these functions might look like is given below.
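As a minimal Keras sketch consistent with the description above (the -9999 sentinel comes from the text; the layer sizes, exact signatures, and column-mean pre-fill are assumptions, not the original code):

import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def build_autoencoder(n_features, latent_dim=8):
    # Simple symmetric encoder/decoder; sizes are illustrative.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(latent_dim, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(n_features),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def train_autoencoder(autoencoder, features, epochs=50, batch_size=32):
    # Input and target are the same (features, features); shuffle each epoch.
    history = autoencoder.fit(features, features, epochs=epochs,
                              batch_size=batch_size, shuffle=True, verbose=0)
    return autoencoder, history

def evaluate_autoencoder(autoencoder, features):
    reconstructed = autoencoder.predict(features, verbose=0)
    return (mean_squared_error(features, reconstructed),
            mean_absolute_error(features, reconstructed),
            r2_score(features, reconstructed))

def impute_with_autoencoder(autoencoder, features, missing_value=-9999):
    imputed = features.astype(float).copy()
    mask = imputed == missing_value
    # Pre-fill the sentinel with column means before the forward pass (assumption).
    col_means = np.nanmean(np.where(mask, np.nan, imputed), axis=0)
    imputed[mask] = np.take(col_means, np.where(mask)[1])
    predictions = autoencoder.predict(imputed, verbose=0)
    imputed[mask] = predictions[mask]  # keep observed values, fill missing ones
    return imputed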
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The individual values, mean and standard deviation for file-format load times during the analysis. (XLSX)
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Convolutional neural network (CNN) approaches available in the current literature are designed to work primarily with low-resolution images. When applied to very large images, challenges arise related to GPU memory, a smaller receptive field than needed for semantic correspondence, and the need to incorporate multi-scale features. The resolution of input images can be reduced, but only with significant loss of critical information. Based on these issues, we introduce a novel research problem of training CNN models for very large images and present the 'UltraMNIST dataset', a simple yet representative benchmark dataset for this task. UltraMNIST has been designed using the popular MNIST digits with additional levels of complexity added to replicate well the challenges of real-world problems. We present two variants of the problem: 'UltraMNIST classification' and 'Budget-aware UltraMNIST classification'. The standard UltraMNIST classification benchmark is intended to facilitate the development of novel CNN training methods that make effective use of the best available GPU resources. The budget-aware variant is intended to promote development of methods that work under constrained GPU memory. For the development of competitive solutions, we present several baseline models for the standard benchmark and its budget-aware variant. We study the effect of reducing resolution on performance and present results for baseline models involving pretrained backbones from among the popular state-of-the-art models. Finally, with the presented benchmark dataset and the baselines, we hope to pave the way for a new generation of CNN methods suitable for handling large images in an efficient and resource-light manner.

The UltraMNIST dataset comprises very large-scale images, each of 4000x4000 pixels, with 3-5 digits per image. Each of these digits has been extracted from the original MNIST dataset. Your task is to predict the sum of the digits per image; this number can be anything from 0 to 27.
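As an illustrative baseline sketch only (not one of the paper's baselines; the preprocessing and layer sizes are assumptions), the task can be framed as 28-class classification after aggressive downsampling, which is exactly the information-loss trade-off the benchmark is designed to expose:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4000, 4000, 1)),
    tf.keras.layers.AveragePooling2D(pool_size=8),    # downsample to 500x500
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(4),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(28, activation="softmax"),  # digit sums 0..27
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()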