This dataset consists of the synthetic electron backscatter diffraction (EBSD) maps generated for the paper "Hybrid Algorithm for Filling in Missing Data in Electron Backscatter Diffraction Maps" by Emmanuel Atindama, Conor Miller-Lynch, Huston Wilhite, Cody Mattice, Günay Doğan, and Prashant Athavale. The EBSD maps were used to train, test, and validate a neural network algorithm that fills in missing data points in a given EBSD map. The dataset includes 8,000 maps for training, 1,000 for testing, and 2,000 for validation. It also includes noise-added versions of the maps, namely one noisy map for each clean map.
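The on-disk format is not specified here; as a minimal sketch, assuming each map is stored as a NumPy array with its noise-added counterpart under the same file name in a parallel directory (both names hypothetical), the (noisy, clean) training pairs could be assembled like this:

import numpy as np
from pathlib import Path

# Hypothetical layout: clean maps in clean/, noise-added maps in noisy/,
# matched by file name (assumption; the actual layout may differ).
clean_dir, noisy_dir = Path("clean"), Path("noisy")

pairs = []
for clean_path in sorted(clean_dir.glob("*.npy")):
    clean_map = np.load(clean_path)               # target: complete EBSD map
    noisy_map = np.load(noisy_dir / clean_path.name)  # input: degraded map
    pairs.append((noisy_map, clean_map))

print(f"Loaded {len(pairs)} (noisy, clean) training pairs")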
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A dataset with observations to train a neural network.
Etalab Open License 2.0 (etalab-2.0): https://spdx.org/licenses/etalab-2.0.html
NumPy tensors to train and test a convolutional neural network dedicated to determining crystallite size and/or microstrain from X-ray diffraction (XRD) data:

train_size.npz: training dataset with only crystallite size
test_size.npz: testing dataset with only crystallite size
train_size_strain.npz: training dataset with crystallite size and microstrain
test_size_strain.npz: testing dataset with crystallite size and microstrain

Each dataset contains the XRD data and the labels ("ground truth") as 2D tensors, with 10501 data points (columns) for the XRD data and 24 columns for the labels. Training data contain 71971 rows; testing data contain 7997 rows. Example Python script to read the data:

import numpy as np

train = np.load("train_size.npz")
train_data, train_label = train["train_data"], train["train_label"]
print(f"Train data shape: {train_data.shape}, Train labels shape: {train_label.shape}")

Jupyter notebooks to train and test a neural network can be found here: https://github.com/aboulle/LPA-NN
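The linked notebooks define the actual architecture; purely as an illustrative sketch (layer sizes and hyperparameters are assumptions, not the paper's model), a 1D CNN mapping the 10501-point XRD profiles to the 24 label columns could look like this in Keras:

import numpy as np
import tensorflow as tf

train = np.load("train_size.npz")
x, y = train["train_data"], train["train_label"]
x = x[..., np.newaxis]  # reshape to (n_samples, 10501, 1) for Conv1D

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10501, 1)),
    tf.keras.layers.Conv1D(32, 16, strides=4, activation="relu"),
    tf.keras.layers.Conv1D(64, 16, strides=4, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(24),  # one output per label column
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=10, batch_size=64, validation_split=0.1)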
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This online resource contains two archived folders, Matlab and Python, with the relevant code for the article "A Bayesian finite-element trained machine learning approach for predicting post-burn contraction".
The Matlab folder contains the code used to generate the large dataset. The file Main.m is the entry point, from which the Monte Carlo simulation can be run. A README file is included.
The Python folder contains the code used for training the neural networks and creating the online application. The file Data.mat contains the data generated by the Matlab Monte Carlo simulation. The files run_bound.py, run_rsa.py, and run_tse.py train the neural networks, and the best-scoring ones are saved in the Training folder. The DashApp folder contains the code for the application.
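As a minimal sketch (the variable names stored inside Data.mat are assumptions; loadmat reports the actual ones), the Matlab-generated data could be inspected from Python with scipy:

import scipy.io

# Load the Monte Carlo results generated by the Matlab code.
data = scipy.io.loadmat("Data.mat")

# Keys starting with "__" are loadmat metadata; the rest are stored variables.
for key, value in data.items():
    if not key.startswith("__"):
        print(key, getattr(value, "shape", type(value)))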
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For Pompon-main.zip:

See README.md first for installation.
docs/data/*.npy: data files; they can be loaded with numpy.load (see the sketch below).
docs/data/bagel_h2co_dft.s0.harmonic.json
docs/notebook/_h2co_opt.py: easily executable via "uv run _h2co_opt.py" if you have installed uv.
docs/data/nnmpo_final_rmse_8.365e-04.h5 (HDF5 format)
docs/notebook/create-random-mpo.ipynb and docs/notebook/nnmpo_to_itensor_mpo.ipynb (need ITensors.jl version 0.6.x)
docs/notebook/itensor_vDMRG.ipynb: an ITensors.jl example.
If you have any questions, please post an issue on GitHub.
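As a brief sketch of inspecting the bundled data files (paths taken from the list above; the internal layout of the HDF5 file is not documented here, so this only lists its groups and datasets):

import glob
import numpy as np
import h5py

# The .npy arrays load directly with numpy.
for path in glob.glob("docs/data/*.npy"):
    arr = np.load(path)
    print(path, arr.shape, arr.dtype)

# The HDF5 file can be inspected with h5py.
with h5py.File("docs/data/nnmpo_final_rmse_8.365e-04.h5", "r") as f:
    f.visit(print)  # print every group/dataset name in the file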
Discvar-main.zip is an implementation of discrete variable representation (DVR).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the training and test data, as well as the trained neural networks, as used for the paper 'Machine Learning of Combinatorial Rules in Mechanical Metamaterials', as published in XXX. In this paper, a neural network is used to classify each (k \times k) unit cell design into one of two classes (C or I). Additionally, the performance of the trained networks is analysed in detail. A more detailed description of the contents of the dataset follows below.

NeuralNetwork_train_and_test_data.zip
This file contains the train and test data used to train the Convolutional Neural Networks (CNNs) of the paper. Each unit cell size has its own file, saved as a zipped NumPy archive (.npz).

CNN_saves_kxk.zip
This file contains the parameter configurations of the CNNs trained on (k \times k) unit cells. Every hyperparameter combination (number of filters nf, number of hidden neurons nh, learning rate lr) is saved separately. The neural networks can be loaded using Google's TensorFlow package in Python, specifically the 'tf.keras.models.load_model' function.
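As a minimal sketch of opening the training data (the file name and the array key names inside the .npz archives are assumptions; np.load reports the actual ones):

import numpy as np

# Open one unit-cell-size file; .npz archives behave like dictionaries.
with np.load("NeuralNetwork_train_and_test_data/k5.npz") as data:  # hypothetical name
    print("Arrays in archive:", data.files)
    for name in data.files:
        print(name, data[name].shape)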
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Flexible Machine Learning-Aware Architecture for Future WLANs
Authors: Francesc Wilhelmi, Sergio Barrachina-Muñoz, Boris Bellalta, Cristina Cano, Anders Jonsson & Vishnu Ram.
Abstract: High hopes have been placed in Machine Learning (ML) as a key enabler of future wireless networks. By taking advantage of the large volumes of data generated by networks, ML is expected to deal with the ever-increasing complexity of networking problems. Unfortunately, current networking systems are not yet prepared to support the ensuing requirements of ML-based applications, especially the procedures related to data collection, processing, and output distribution. This article points out the architectural requirements needed to pervasively include ML as part of the operation of future wireless networks. To this aim, we propose adopting the International Telecommunications Union (ITU) unified architecture for 5G and beyond. Specifically, we look into Wireless Local Area Networks (WLANs), which, due to their nature, can be found in multiple forms, ranging from cloud-based to edge-computing-like deployments. Based on ITU's architecture, we provide insights on the main requirements and the major challenges of introducing ML into the multiple modalities of WLANs.
Dataset description: This is the dataset generated for training a Neural Network (NN) in the Access Point (AP) (re)association problem in IEEE 802.11 Wireless Local Area Networks (WLANs).
In particular, the NN is meant to predict the throughput that a given station (STA) can obtain from a given Access Point (AP) after association. The features included in the dataset are:
Identifier of the AP to which the STA has been associated.
RSSI obtained from the AP to which the STA has been associated.
Data rate in bits per second (bps) that the STA is allowed to use for the selected AP.
Load in packets per second (pkt/s) that the STA generates.
Percentage of data that the AP is able to serve before the user association is done.
Amount of traffic load in pkt/s handled by the AP before the user association is done.
Airtime in % that the AP enjoys before the user association is done.
Throughput in pkt/s that the STA receives after the user association is done.
The dataset has been generated through random simulations, based on the model provided in https://github.com/toniadame/WiFi_AP_Selection_Framework. More details regarding the dataset generation have been provided in https://github.com/fwilhelmi/machine_learning_aware_architecture_wlans.
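As an illustrative sketch only (the export file name, column order, and network size are assumptions, not part of the dataset description), a small Keras regressor over the first seven features with the throughput column as target could look like:

import numpy as np
import tensorflow as tf

# Hypothetical: dataset exported as CSV with the eight columns listed above,
# the last one (throughput in pkt/s) being the prediction target.
data = np.loadtxt("wlan_dataset.csv", delimiter=",", skiprows=1)
x, y = data[:, :7], data[:, 7]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(7,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # predicted throughput
])
model.compile(optimizer="adam", loss="mse")
model.fit(x, y, epochs=20, batch_size=32, validation_split=0.1)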
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the training and test data, as well as the trained neural networks as used for the paper 'Machine Learning of Implicit Combinatorial Rules in Mechanical Metamaterials', as published in Physical Review Letters.
In this paper, a neural network is used to classify each \(k \times k\) unit cell design of metamaterials M1 and M2 into one of two classes (C or I). Additionally, the performance of the trained networks is analysed in detail. A more detailed description of the contents of the dataset follows below.
NeuralNetwork_train_and_test_data.zip
This file contains the train and test data used to train the Convolutional Neural Networks (CNNs) of the paper. Each unit cell size has its own file, saved as a zipped NumPy archive (.npz). It contains data for metamaterial M1 ("smiley_cube") and for metamaterial M2 classifications (i) ("prek_xy") and (ii) ("unimodal_vs_oligomodal_inc_stripmodes").
CNN_saves_kxk.zip
This file contains the parameter configurations of the CNNs trained on \(k \times k\) unit cells for metamaterial M2 classification (ii). Classification (i) is denoted by an additional M2ii in the file name, and metamaterial M1 by an extra M1 in the file name. Every hyperparameter combination (number of filters nf, number of hidden neurons nh, learning rate lr) is saved separately. The neural networks can be loaded using Google's TensorFlow package in Python, specifically the 'tf.keras.models.load_model' function.
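A minimal sketch of restoring one of the saved networks with the function named above (the directory path is hypothetical; the archive defines the actual naming scheme):

import tensorflow as tf

# Load one saved CNN; the path below is illustrative only.
model = tf.keras.models.load_model("CNN_saves_5x5/nf32_nh64_lr0.001")
model.summary()  # inspect the restored architecture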
CC0 1.0 (CC0-1.0): https://spdx.org/licenses/CC0-1.0.html
An automatic bird sound recognition system is a useful tool for collecting data on different bird species for ecological analysis. Together with autonomous recording units (ARUs), such a system makes it possible to collect bird observations on a scale no human observer could ever match. Progress has been made in automatic bird sound recognition over the last decades, but recognizing bird species in untargeted soundscape recordings remains a challenge. In this article we demonstrate a workflow for building a global identification model and adjusting it to perform well on data from autonomous recorders in a specific region. We show how data augmentation and a combination of global and local data can be used to train a convolutional neural network to classify vocalizations of 101 bird species. We construct a model and train it on a global data set to obtain a base model. The base model is then fine-tuned with local data from Southern Finland to adapt it to the sound environment of a specific location, and tested with two data sets: one originating from the same Southern Finnish region and another from a different region in the German Alps. Our results suggest that fine-tuning with local data significantly improves network performance. Classification accuracy improved for test recordings from the same area as the local training data (Southern Finland) but not for recordings from a different region (the German Alps). Data augmentation enables training with a limited amount of training data, and even with few local data samples a significant improvement over the base model can be achieved. Our model outperforms the current state-of-the-art tool for automatic bird sound classification. Using local data to adjust the recognition model to the target domain leads to improvement over general, non-tailored solutions. The process introduced in this article can be applied to build a fine-tuned bird sound classification model for a specific environment.

Methods: This repository contains the data and recognition models described in the paper "Domain-specific neural networks improve automated bird sound recognition already with small amount of local data" (Lauha et al., 2022).
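The paper defines the actual fine-tuning protocol; as a generic sketch only (the model file name, the frozen-layer split, and the learning rate are assumptions), the local fine-tuning step could look like this in Keras:

import tensorflow as tf

# Load the global base model (hypothetical file name).
base = tf.keras.models.load_model("global_base_model.h5")

# Freeze the early feature-extraction layers; leave the last few trainable.
for layer in base.layers[:-4]:
    layer.trainable = False

# Recompile with a small learning rate so local data only nudges the weights.
base.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
             loss="categorical_crossentropy", metrics=["accuracy"])

# local_train would be a tf.data.Dataset of (spectrogram, label) batches
# prepared from the local recordings, e.g.:
# base.fit(local_train, epochs=5)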
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for the NeurIPS 2021 accepted paper "Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction".
Datasets are PyTorch files containing a dictionary with training, validation, and test sets. The train, validation, and test sets are custom dataset classes which inherit from the standard torch dataset class. Corresponding code can be found at https://github.com/HSG-AIML/NeurIPS_2021-Weight_Space_Learning.
Datasets 41, 42, 43, and 44 are our dataset format wrapped around the zoos from Unterthiner et al., 2020 (https://github.com/google-research/google-research/tree/master/dnn_predict_accuracy).
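As a minimal sketch of opening one of these files (the file name and dictionary keys are assumptions; the linked repository documents the actual ones):

import torch

# Each dataset file holds a dictionary of custom torch Dataset objects.
data = torch.load("dataset.pt")  # hypothetical file name
print(data.keys())               # e.g. train / validation / test splits

trainset = data["trainset"]      # assumed key
print(len(trainset), "training samples")
sample = trainset[0]             # behaves like a standard torch Dataset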
Abstract: Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. Neural Networks (NNs) are widely applied, yet their weight space is still not fully understood. Therefore, we propose to use SSL to learn neural representations of the weights of populations of NNs. To that end, we introduce domain-specific data augmentations and an adapted attention architecture. Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics. Further, we show that the proposed learned representations outperform prior work for predicting hyper-parameters, test accuracy, and generalization gap, as well as transfer to out-of-distribution settings.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Simulation experiment data for training Quantum Neural Networks (QNNs) using entangled datasets. The experiments investigate the validity of the lower bounds on the expected risk after training QNNs given by the extensions to the Quantum No-Free-Lunch theorem presented in the related publication. The QNNs are trained with (i) samples of varying Schmidt rank, (ii) orthogonal samples of fixed Schmidt rank, and (iii) linearly dependent samples of fixed Schmidt rank. The dataset contains raw experiment data (directory "raw_data"), analyzed mean risks and errors (directory "plot_data"), and the resulting plots (directory "plots").

Experiments: The experiments train QNNs using various compositions of training samples on a simulator and extract the risk after training to compute average risks.

Experiment 1: Trains QNNs using entangled training samples of varying Schmidt rank. The average Schmidt rank and the number of training samples are controlled. Raw data: average_rank_results.zip; computed average risks: avg_rank_risks.npy; computed average losses: avg_rank_losses.npy; plotted average risks: avg_rank_experiments.pdf; plotted average losses: avg_rank_losses.pdf.

Experiment 2: Trains QNNs using entangled orthogonal training samples. The number of training samples is controlled and the Schmidt rank is fixed such that d = r*t for the dimension d of the Hilbert space. Raw data: orthogonal_results.zip; computed average risks: orthogonal_exp_points.npy; plotted average risks: orthogonal_experiments.pdf.

Experiment 3: Trains QNNs using entangled linearly dependent training samples. The number of training samples is controlled and the Schmidt rank is fixed such that d = r*t for the dimension d of the Hilbert space. Raw data: not_linearly_independent_results.zip; computed average risks: nlihx_exp_points.npy; plotted average risks: nlihx_experiments.pdf.

Additionally, this repository contains the reproduction data for Figure 1 (phases_in_orthogonal_training.zip). This file contains the training data, the target unitary, and the resulting hypothesis unitary for orthogonal training samples of (i) high risk and (ii) low risk. For the code to reproduce and analyze the experiments, see the Code repository.
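As a quick sketch of inspecting the analyzed results (the array shapes and contents are not documented here, and placing the file under "plot_data" is an assumption, so this just loads and reports one array):

import numpy as np

# Load the precomputed average risks from Experiment 1.
avg_risks = np.load("plot_data/avg_rank_risks.npy", allow_pickle=True)
print(avg_risks.shape, avg_risks.dtype)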
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The datasets contain molecular structures and properties computed with the B97-3c (GGA DFT) or wB97M-def2-TZVPP (range-separated hybrid DFT) methods. Each data file contains about 20M structures. DFT calculations were performed with the ORCA 5.0.3 software. Properties include energy, forces, atomic charges, and molecular dipole and quadrupole moments.
Machine learning can be used to predict fault properties such as shear stress, friction, and time to failure using continuous records of fault zone acoustic emissions. The files are features and labels extracted from lab data (experiment p4679). The features are extracted with a non-overlapping window from the original acoustic data. The first column is the time of the window. The second and third columns are the mean and variance of the acoustic data in the window, respectively. The 4th-11th columns are the power spectral density, ordered from low to high frequency. The last column is the corresponding label (shear stress level). The file name indicates the driving velocity from which the sequence was generated. Data were generated from laboratory friction experiments conducted with a biaxial shear apparatus. Experiments were conducted in the double direct shear configuration, in which two fault zones are sheared between three rigid forcing blocks. Our samples consisted of two 5-mm-thick layers of simulated fault gouge with a nominal contact area of 10 by 10 cm^2. Gouge material consisted of soda-lime glass beads with initial particle size between 105 and 149 micrometers. Prior to shearing, we impose a constant fault normal stress of 2 MPa using a servo-controlled load-feedback mechanism and allow the sample to compact. Once the sample has reached a constant layer thickness, the central block is driven down at a constant rate of 10 micrometers per second. In tandem, we collect an acoustic emission (AE) signal continuously at 4 MHz from a piezoceramic sensor embedded in a steel forcing block about 22 mm from the gouge layer. The data from this experiment can be used to train a deep learning algorithm for future fault property prediction.
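As an illustrative sketch of the described feature extraction (the window length, the frequency band edges, and the helper name are assumptions, not the authors' values):

import numpy as np
from scipy.signal import welch

FS = 4_000_000   # acoustic sampling rate: 4 MHz
WINDOW = 40_000  # assumed non-overlapping window length (10 ms)

def extract_features(signal, times):
    rows = []
    for start in range(0, len(signal) - WINDOW + 1, WINDOW):
        seg = signal[start:start + WINDOW]
        freqs, psd = welch(seg, fs=FS)
        # Split the spectrum into 8 bands, low to high frequency (columns 4-11).
        bands = [b.mean() for b in np.array_split(psd, 8)]
        rows.append([times[start], seg.mean(), seg.var(), *bands])
    return np.array(rows)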
This dataset contains the spatiotemporal data used to train the spatiotemporal deep neural networks described in "Modeling the Spread of a Livestock Disease With Semi-Supervised Spatiotemporal Deep Neural Networks". The dataset consists of two sets of NumPy arrays: X_grid.npy and Y_grid.npy were used to train the convolutional LSTM, while X_graph.npy, Y_graph.npy, and edge_index.npy were used to train the graph convolutional LSTM. The data consist of spatiotemporally varying environmental and anthropogenic variables along with case reports of vesicular stomatitis.

Resources in this dataset:
Resource Title: NumPy Arrays of Spatiotemporal Features and VS Cases. File Name: vs_data.zip. Resource Description: a ZIP archive containing five NumPy arrays of spatiotemporal features and geotagged VS cases. Recommended software: NumPy, https://numpy.org/
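A minimal sketch of loading the five arrays after unpacking vs_data.zip (the array shapes are not documented here, so this just reports them):

import numpy as np

for name in ["X_grid", "Y_grid", "X_graph", "Y_graph", "edge_index"]:
    arr = np.load(f"{name}.npy")
    print(name, arr.shape, arr.dtype)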
CC0 1.0 Universal: https://creativecommons.org/publicdomain/zero/1.0/
Most machine learning courses start by implementing a fully-connected Deep Neural Network (DNN) and proceed towards Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), teaching skills for managing training, inference, and deployment along the way. For most beginners, the problem with building DNNs from scratch is that either the input data has to be grossly simplified (working with 64x64x3 images, for example) or the network has so many parameters that it is very hard to train. Meanwhile, Transfer Learning has made building even CNNs and RNNs from scratch unnecessary: one can reuse and/or fine-tune publicly available CNNs like Inception V3 with very little data for a new problem.
The purpose of this dataset is to make a large dataset of 25000 training examples and 12500 test examples, drawn from the ever-popular Dogs vs Cats Redux competition, available to students just starting out in machine learning. The base dataset, which consists of fairly large images, has been passed through publicly available CNNs like Inception V3, Inception Resnet V2, Resnet 50, Xception, and MobileNet, creating features from which a pretty good DNN classifier is very easy to build. This should make learning to build DNNs from scratch easy to do, while learning a bit of transfer learning and even "competing" in Dogs vs Cats Redux for kicks!
As mentioned, the input data for this dataset are images from the Dogs vs Cats Redux competition. All transfer-learning CNN models were obtained from keras.applications. The features derived by processing the input images through the transfer models are flat (25000x2048 training examples and 12500x2048 test examples when using Inception V3) and ready for ingestion into a DNN. In addition, the dataset provides IDs from the original training and test examples so classification results can be reviewed against the base data.
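Since the features are flat 2048-dimensional vectors, a small dense classifier trains quickly; as a sketch (the file names and layer sizes are hypothetical, not this dataset's actual layout):

import numpy as np
import tensorflow as tf

# Hypothetical file layout: transferred Inception V3 features plus labels.
features = np.load("inception_v3_train_features.npy")  # shape (25000, 2048)
labels = np.load("train_labels.npy")                   # 1 = dog, 0 = cat

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2048,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(features, labels, epochs=5, batch_size=128, validation_split=0.1)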
Note that while the classic goal of transfer learning is to apply a network on a smaller dataset and/or fine tune the transferred network on said dataset, the purpose of this dataset is subtly different: make a large dataset available for beginners to build DNNs with. Of course, a subset of the dataset can be used for classification and the base transfer models can be fine tuned.
Francois Chollet's Keras framework, specifically keras.applications.
Dr. Andrew Ng's deeplearning.ai specialization on Coursera. In my spare time, I mentor students in Coursera's Neural Networks and Deep Learning and Convolutional Neural Networks courses.
Initially I am posting just the dataset; later I will post the kernel that produced it and a kernel that uses it to classify Dogs vs Cats Redux. Can you duplicate the log loss score of 0.21 currently possible by reusing the transfer models with no fine-tuning? Can you get into the top 50 by fine-tuning the base models and/or augmenting the input data?
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is part of a course assignment at IIT Madras and is ideal for people who are new to neural networks. The data essentially contains features extracted from images; the goal is to train a model for multiclass classification.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This data was collected to train a convolutional neural network classifier that distinguishes manga from classic-art-style comic images using transfer learning (VGG16); the model was later deployed using Flask.
The code provided relates to training an autoencoder, evaluating its performance, and using it to impute missing values in a dataset. Breaking down each part:

Training the autoencoder (train_autoencoder function): This function takes an autoencoder model and the input features. It trains the autoencoder using the input features as both input and target output (hence features, features), for a specified number of epochs (epochs) with a given batch size (batch_size). The shuffle=True argument ensures that the data is shuffled before each epoch to prevent the model from memorizing the input order. After training, it returns the trained autoencoder model and the training history.

Evaluating the autoencoder (evaluate_autoencoder function): This function takes a trained autoencoder model and the input features. It uses the trained autoencoder to predict the reconstructed features from the input features, then calculates Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R2) scores between the original and reconstructed features. These metrics provide insight into how well the autoencoder reconstructs the input features.

Imputing with the autoencoder (impute_with_autoencoder function): This function takes a trained autoencoder model and the input features. It identifies missing values (e.g., -9999) in the input features, predicts the missing values with the trained autoencoder for each row that has them, and replaces the missing values with the predicted values. The imputed features are returned as output.

To reuse this code: load your dataset and preprocess it as necessary; build an autoencoder model using the build_autoencoder function; train it with train_autoencoder on your input features; evaluate its performance with evaluate_autoencoder; if your dataset contains missing values, impute them with impute_with_autoencoder; and use the trained autoencoder for any other relevant tasks, such as feature extraction or anomaly detection. A sketch of what these functions might look like is given below.
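As a minimal Keras sketch consistent with the description above (the -9999 sentinel comes from the text; the layer sizes, exact signatures, and column-mean pre-fill are assumptions, not the original code):

import numpy as np
import tensorflow as tf
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def build_autoencoder(n_features, latent_dim=8):
    # Simple symmetric encoder/decoder; sizes are illustrative.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(latent_dim, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(n_features),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def train_autoencoder(autoencoder, features, epochs=50, batch_size=32):
    # Input and target are the same (features, features); shuffle each epoch.
    history = autoencoder.fit(features, features, epochs=epochs,
                              batch_size=batch_size, shuffle=True, verbose=0)
    return autoencoder, history

def evaluate_autoencoder(autoencoder, features):
    reconstructed = autoencoder.predict(features, verbose=0)
    return (mean_squared_error(features, reconstructed),
            mean_absolute_error(features, reconstructed),
            r2_score(features, reconstructed))

def impute_with_autoencoder(autoencoder, features, missing_value=-9999):
    imputed = features.astype(float).copy()
    mask = imputed == missing_value
    # Pre-fill the sentinel with column means before the forward pass (assumption).
    col_means = np.nanmean(np.where(mask, np.nan, imputed), axis=0)
    imputed[mask] = np.take(col_means, np.where(mask)[1])
    predictions = autoencoder.predict(imputed, verbose=0)
    imputed[mask] = predictions[mask]  # keep observed values, fill missing ones
    return imputed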
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The individual values, mean and standard deviation for file-format load times during the analysis. (XLSX)
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Convolutional neural network (CNN) approaches available in the current literature are designed to work primarily with low-resolution images. When applied to very large images, challenges arise related to GPU memory, a smaller receptive field than needed for semantic correspondence, and the need to incorporate multi-scale features. The resolution of input images can be reduced, but only with significant loss of critical information. Based on these issues, we introduce a novel research problem of training CNN models for very large images and present the 'UltraMNIST dataset', a simple yet representative benchmark dataset for this task. UltraMNIST has been designed using the popular MNIST digits with additional levels of complexity added to replicate well the challenges of real-world problems. We present two variants of the problem: 'UltraMNIST classification' and 'Budget-aware UltraMNIST classification'. The standard UltraMNIST classification benchmark is intended to facilitate the development of novel CNN training methods that make effective use of the best available GPU resources. The budget-aware variant is intended to promote development of methods that work under constrained GPU memory. For the development of competitive solutions, we present several baseline models for the standard benchmark and its budget-aware variant. We study the effect of reducing resolution on performance and present results for baseline models involving pretrained backbones from among the popular state-of-the-art models. Finally, with the presented benchmark dataset and the baselines, we hope to pave the way for a new generation of CNN methods suitable for handling large images in an efficient and resource-light manner.

The UltraMNIST dataset comprises very large-scale images, each of 4000x4000 pixels, with 3-5 digits per image. Each of these digits has been extracted from the original MNIST dataset. Your task is to predict the sum of the digits per image; this number can be anything from 0 to 27.
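As an illustrative baseline sketch only (not one of the paper's baselines; the preprocessing and layer sizes are assumptions), the task can be framed as 28-class classification after aggressive downsampling, which is exactly the information-loss trade-off the benchmark is designed to expose:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4000, 4000, 1)),
    tf.keras.layers.AveragePooling2D(pool_size=8),    # downsample to 500x500
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(4),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(28, activation="softmax"),  # digit sums 0..27
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()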