The dataset used in this paper is not explicitly mentioned, but it is implied to be a large-scale dataset for machine learning.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Acute poisoning is a significant global health burden, and the causative agent is often unclear. The primary aim of this pilot study was to develop a deep learning algorithm that predicts the most probable agent a poisoned patient was exposed to from a pre-specified list of drugs. Data were queried from the National Poison Data System (NPDS) from 2014 through 2018 for eight single-agent poisonings (acetaminophen, diphenhydramine, aspirin, calcium channel blockers, sulfonylureas, benzodiazepines, bupropion, and lithium). Two deep neural networks (implemented in PyTorch and Keras) designed for multi-class classification tasks were applied. There were 201,031 single-agent poisonings included in the analysis. For distinguishing among the selected poisonings, the PyTorch model had a specificity of 97%, accuracy of 83%, precision of 83%, recall of 83%, and an F1-score of 82%. The Keras model had a specificity of 98%, accuracy of 83%, precision of 84%, recall of 83%, and an F1-score of 83%. The best performance was achieved in diagnosing poisoning by lithium, sulfonylureas, diphenhydramine, calcium channel blockers, and then acetaminophen, in both PyTorch (F1-score = 99%, 94%, 85%, 83%, and 82%, respectively) and Keras (F1-score = 99%, 94%, 86%, 82%, and 82%, respectively). Deep neural networks can potentially help in distinguishing the causative agent of acute poisoning. This study used a small list of drugs, with polysubstance ingestions excluded. Reproducible source code and results can be obtained at https://github.com/ashiskb/npds-workspace.git.
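The metrics quoted in this abstract (specificity, accuracy, precision, recall, F1-score) can all be derived from a multi-class confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function name `macro_metrics` is hypothetical, rows are assumed to be true classes and columns predicted classes, and macro-averaging is an assumption, since the abstract does not state which averaging scheme was used.

```python
import numpy as np

def macro_metrics(confusion):
    """Macro-averaged metrics from a confusion matrix.

    Assumption: confusion[i, j] counts samples of true class i
    that were predicted as class j.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as the class, but belongs to another
    fn = cm.sum(axis=1) - tp   # belongs to the class, but predicted as another
    tn = cm.sum() - tp - fp - fn

    def safe_div(num, den):
        # Return 0 where the denominator is 0 instead of dividing by zero.
        return np.divide(num, den, out=np.zeros_like(num), where=den != 0)

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)

    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "precision": float(precision.mean()),
        "recall": float(recall.mean()),
        "specificity": float(specificity.mean()),
        "f1": float(f1.mean()),
    }

# Toy two-class example (made-up counts, unrelated to the NPDS data).
toy = [[8, 2],
       [1, 9]]
metrics = macro_metrics(toy)
```

With eight classes, as in the study, the same function would take an 8 x 8 matrix; specificity tends to be high in that setting because each class has many true negatives.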
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset for neural network training (Gas Network). This dataset was compiled from data available on the internet: a large amount of data was cleaned, with extraneous items removed and important items retained.
This dataset consists of the synthetic electron backscatter diffraction (EBSD) maps generated for the paper titled "Hybrid Algorithm for Filling in Missing Data in Electron Backscatter Diffraction Maps" by Emmanuel Atindama, Conor Miller-Lynch, Huston Wilhite, Cody Mattice, Günay Doğan, and Prashant Athavale. The EBSD maps were used to train, test, and validate a neural network algorithm to fill in missing data points in a given EBSD map. The dataset includes 8000 maps for training, 1000 maps for testing, and 2000 maps for validation. The dataset also includes noise-added versions of the maps, namely one noisy map for each clean map.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A Flexible Machine Learning-Aware Architecture for Future WLANs
Authors: Francesc Wilhelmi, Sergio Barrachina-Muñoz, Boris Bellalta, Cristina Cano, Anders Jonsson & Vishnu Ram.
Abstract: Lots of hopes have been placed in Machine Learning (ML) as a key enabler of future wireless networks. By taking advantage of the large volumes of data generated by networks, ML is expected to deal with the ever-increasing complexity of networking problems. Unfortunately, current networking systems are not yet prepared for supporting the ensuing requirements of ML-based applications, especially for enabling procedures related to data collection, processing, and output distribution. This article points out the architectural requirements that are needed to pervasively include ML as part of future wireless networks operation. To this aim, we propose to adopt the International Telecommunications Union (ITU) unified architecture for 5G and beyond. Specifically, we look into Wireless Local Area Networks (WLANs), which, due to their nature, can be found in multiple forms, ranging from cloud-based to edge-computing-like deployments. Based on ITU's architecture, we provide insights on the main requirements and the major challenges of introducing ML to the multiple modalities of WLANs.
Dataset description: This is the dataset generated for training a Neural Network (NN) in the Access Point (AP) (re)association problem in IEEE 802.11 Wireless Local Area Networks (WLANs).
In particular, the NN is meant to output a prediction function of the throughput that a given station (STA) can obtain from a given Access Point (AP) after association. The features included in the dataset are:
Identifier of the AP to which the STA has been associated.
RSSI obtained from the AP to which the STA has been associated.
Data rate in bits per second (bps) that the STA is allowed to use for the selected AP.
Load in packets per second (pkt/s) that the STA generates.
Percentage of data that the AP is able to serve before the user association is done.
Amount of traffic load in pkt/s handled by the AP before the user association is done.
Airtime in % that the AP enjoys before the user association is done.
Throughput in pkt/s that the STA receives after the user association is done.
The dataset has been generated through random simulations, based on the model provided in https://github.com/toniadame/WiFi_AP_Selection_Framework. More details regarding the dataset generation have been provided in https://github.com/fwilhelmi/machine_learning_aware_architecture_wlans.
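As an illustration of the regression task described above (seven input features, throughput as the target), here is a minimal NumPy sketch of a one-hidden-layer network trained with full-batch gradient descent. It is not the authors' model: the feature names, layer size, learning rate, and the synthetic data are all assumptions made for the example, and the real data would need to be loaded and standardized first. The AP identifier is categorical, so in practice it would likely be one-hot encoded rather than fed in as a number.

```python
import numpy as np

# Hypothetical column names for the seven input features listed above.
FEATURES = ["ap_id", "rssi", "data_rate_bps", "sta_load_pkts",
            "ap_delivery_ratio", "ap_load_pkts", "ap_airtime"]

def init_params(n_in, n_hidden, rng):
    """Random initial weights for a one-hidden-layer regressor."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Return hidden activations (ReLU) and predicted throughput."""
    H = np.maximum(0.0, X @ params["W1"] + params["b1"])
    return H, (H @ params["W2"] + params["b2"]).ravel()

def train_step(params, X, y, lr):
    """One full-batch gradient step on the mean squared error."""
    n = len(y)
    H, pred = forward(params, X)
    err = pred - y
    loss = float(np.mean(err ** 2))
    d_pred = (2.0 / n) * err[:, None]
    grads = {"W2": H.T @ d_pred, "b2": d_pred.sum(axis=0)}
    dH = d_pred @ params["W2"].T
    dH[H <= 0.0] = 0.0               # ReLU passes no gradient where inactive
    grads["W1"] = X.T @ dH
    grads["b1"] = dH.sum(axis=0)
    for name, grad in grads.items():
        params[name] -= lr * grad
    return loss

def fit(X, y, hidden=16, lr=0.01, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    params = init_params(X.shape[1], hidden, rng)
    losses = [train_step(params, X, y, lr) for _ in range(steps)]
    return params, losses

def make_synthetic(n=256, seed=1):
    """Stand-in data with the right shape; NOT the real dataset."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(FEATURES)))
    y = X @ rng.normal(size=len(FEATURES)) + 0.1 * rng.normal(size=n)
    return X, y

X, y = make_synthetic()
params, losses = fit(X, y)
```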
Background
Classification of the electrocardiogram using Neural Networks has become a widely used method in recent years. The efficiency of these classifiers depends upon a number of factors including network training. Unfortunately, there is a shortage of evidence available to enable specific design choices to be made and as a consequence, many designs are made on the basis of trial and error. In this study we develop prediction models to indicate the point at which training should stop for Neural Network based Electrocardiogram classifiers in order to ensure maximum generalisation.
Methods
Two prediction models have been presented; one based on Neural Networks and the other on Genetic Programming. The inputs to the models were 5 variable training parameters and the output indicated the point at which training should stop. Training and testing of the models was based on the results from 44 previously developed bi-group Neural Network classifiers, discriminating between Anterior Myocardial Infarction and normal patients.
Results
Our results show that both approaches provide close fits to the training data; p = 0.627 and p = 0.304 for the Neural Network and Genetic Programming methods respectively. For unseen data, the Neural Network exhibited no significant differences between actual and predicted outputs (p = 0.306) while the Genetic Programming method showed a marginally significant difference (p = 0.047).
Conclusions
The approaches provide reverse engineering solutions to the development of Neural Network based Electrocardiogram classifiers. That is, given the network design and architecture, an indication can be given as to when training should stop to obtain maximum network generalisation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
In recent years, neural networks have evolved from laboratory environments to the state-of-the-art for many real-world problems. Our hypothesis is that neural network models (i.e., their weights and biases) evolve on unique, smooth trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a “model zoo”) would form topological structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such zoos, one could investigate novel approaches for (i) model analysis, (ii) discovering unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting the model zoos for generative modelling of neural network weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research about populations of neural networks. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of neural network models for further research. In total, the proposed model zoo dataset is based on six image datasets, consists of 24 model zoos generated with varying hyperparameter combinations, and includes 47’360 unique neural network models resulting in over 2’415’360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks as mentioned before.
Dataset
This dataset is part of a larger collection of model zoos and contains the zoos trained on the labelled samples from MNIST. All zoos with extensive information and code can be found at www.modelzoos.cc.
This repository contains two types of files: the raw model zoos as collections of models (file names beginning with "mnist_"), and preprocessed model zoos wrapped in a custom PyTorch dataset class (file names beginning with "dataset"). Zoos are trained in three configurations: varying the seed only (seed), varying hyperparameters with fixed seeds (hyp_fix), or varying hyperparameters with random seeds (hyp_rand). The index_dict.json files contain information on how to read the vectorized models.
For more information on the zoos and code to access and use the zoos, please see www.modelzoos.cc.
https://spdx.org/licenses/etalab-2.0.html
NumPy tensors to train and test a convolutional neural network dedicated to determining crystallite size and/or microstrain from X-ray diffraction (XRD) data:
- train_size.npz: training dataset with only crystallite size
- test_size.npz: testing dataset with only crystallite size
- train_size_strain.npz: training dataset with crystallite size and microstrain
- test_size_strain.npz: testing dataset with crystallite size and microstrain
Each dataset contains the XRD data and the labels ("ground truth") in the form of 2D tensors with 10501 data points (columns) for the XRD data, and 24 labels (columns) for the labels. Training data contain 71971 rows; testing data contain 7997 rows.
Example Python script to read the data:
    import numpy as np
    train = np.load("train_size.npz")
    train_data, train_label = train["train_data"], train["train_label"]
    print(f"Train data shape: {train_data.shape}, Train labels shape: {train_label.shape}")
Jupyter notebooks to train and test a neural network can be found here: https://github.com/aboulle/LPA-NN
The paper proposes a Bagging Deep Learning Training Framework (BEND) based on efficient neural network diffusion.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Ethical Clearance Reference Number: 2021FEBEREC-STD- 065
Pre-processing data from published and unpublished previous studies treating biodiesel-, textile-, polymer-, and pulp and paper wastewater using an ABR and EGSB for artificial neural network (ANN) model simulation and development.
For ANN problems to be solved, the selection of a suitable learning rate, momentum, the number of neurons from each of the hidden layers and the activation function is crucial. Therefore, the collected data must be prepared in a Microsoft Excel spreadsheet format with input and output columns. A training file is then created with samples of the whole problem domain to select the required parameters. Three data sets are used: a training data set, test data set and validation data set. When the training process takes place, the neural network will be tested against the testing data to determine accuracy, and training will be stopped when the mean average error remains the same for a period of time.
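The stopping rule described above (stop when the error "remains the same for a period of time") can be sketched as a plateau check on the recorded error history. This is an illustrative sketch only: the function name `plateau_reached` and the `patience` and `tol` parameters are assumptions, and the tool used in the study may apply a different criterion.

```python
def plateau_reached(errors, patience=5, tol=1e-4):
    """Return True when the last `patience` errors differ by at most `tol`.

    `errors` is the list of error values measured on the test data after
    each training epoch, oldest first.
    """
    if len(errors) < patience:
        return False
    window = errors[-patience:]
    return max(window) - min(window) <= tol

# Made-up error histories for illustration.
still_improving = [0.5, 0.4, 0.3, 0.2, 0.1]
flattened = [0.5, 0.4, 0.3, 0.3, 0.3, 0.3, 0.3]
```

In a training loop, the check would run once per epoch after appending the newest error, and training would end the first time it returns True.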
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract The amount of unstructured data grows with the popularization of the Internet. Texts in natural language represent a relevant and significant set for the analysis and production of knowledge. This work proposes a quantitative analysis of the preprocessing and training stages of a text classifier, which uses as an attribute the feelings expressed by the users. Artificial Neural Network, as a classifier algorithm, and texts from Amazon, IMDB and Yelp sites were used for the experiments. The database allows the analysis of the expression of positive and negative feelings of the users in evaluations of products and services in unstructured texts. Two distinct processes of preprocessing and different training of the Artificial Neural Networks were carried out to classify the textual set. The results quantitatively confirm the importance of the preprocessing and training stages of the classifier, highlighting the importance of the vocabulary selected for the text representation and classification. The available classification techniques achieve satisfactory results. However, even by using two distinct processes of preprocessing and identifying the best training process, it was not possible to totally eliminate the learning difficulties and understanding of the model for the classifications of feelings that involved subjective characteristics of the expression of human feeling.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The datasets contain molecular structures and the properties computed with B97-3c (GGA DFT) or wB97M-def2-TZVPP (range-separated hybrid DFT) methods. Each data file contains about 20M structures. DFT calculations were performed with the ORCA 5.0.3 software. Properties include energy, forces, atomic charges, and molecular dipole and quadrupole moments.
I created this dataset as part of a larger, rather educational project, which aims to create a simple web-interface to write simple numerical mathematical operations on a drawing grid (either on a PC, tablet, or smartphone), which is then recognized and evaluated. It also aims, however, to contribute and help other projects that may involve or require large datasets of handwritten digits.
What makes this dataset different from existing ones, is that it lays emphasis on the actual strokes of handwritten signs, instead of just being a compilation of scanned images of existing records (the way, for example, the MNIST dataset was created). Each pixel is strictly full black, or full white (no greytones), and the strokes are rather thin. It is meant to have the information of the stroke itself, and not just a scanned image.
The dataset is meant to stretch and challenge the understanding of a neural network about what makes a specific symbol mean what it does. For this, I initially created a starting dataset of the different ways people write symbols, playing with their internal proportions and adding "ill"-written signs that are still barely recognizable from the rest. After that, I artificially augmented the dataset by performing stretches and small rotations on all images, making sure that a human being would still recognize them. A neural network would be then forced to understand better what gives a symbol its characteristics. I made sure not to delete many, rather ambiguous, cases (e.g. a clockwise-rotated '1' and '/') for the neural network to have to deal and understand nuances during training and classification.
Using 60% of the dataset (randomly selected) to train a neural network with two inner layers, and 20% to validate it, I achieved a validation accuracy above 96% using my own implementation of the stochastic gradient descent and backpropagation algorithms.
The resizing, rotating, and scaling algorithms I wrote do not work with images by modifying each pixel, but rather the stroke. This means that each line of one pixel width is scaled/resized/rotated to a line (longer or shorter) of the same pixel width. This is advantageous for the following reasons:
- When I created the dataset, the original images containing many symbols had a pen-stroke that was very thin compared to the symbol's proportions. Using the stroke scaling methods mentioned, resizing to smaller images made the strokes thinner, which is desirable
- Transformations on the symbols that change their proportions would not increase stroke width or make the stroke disappear
- It is easy to create new datasets out of this one by easily thickening strokes, or adding noise and artificial irregularities
In the following GitHub repository, one can access the code, as well as the modules I created to perform the following tasks:
- Import large images containing multiple symbols to be extracted
- Agglomerate each independent symbol, perform image processes on them (resizing, scaling, and rotating) and save them to individual images
- Create custom datasets out of the individual images for training, validation, and testing of machine learning models (e.g. neural networks)
https://github.com/michheusser/neural-network-training
- main_dataset_creation.py - Creation of datasets
- main_neural_network_training.py - Training of neural network
- datatools (Folder) - Package for dataset and image manipulation
- nntools (Folder) - Package with neural network tools (incl. training)
CompleteImages - ca. 300'000 symbol images as .png containing transformation information in their name with syntax: [symbol]_[papersheet_index]_[rotation]_[index in untransformed dataset]_scaled_x[scaling in x]y[scaling in y].png (e.g. +_1_8ccw_26_scaled_x1_2y1_2.png)
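The file-name syntax above can be unpacked with a regular expression. The sketch below is not part of the repository: the function name `parse_image_name` is hypothetical, and it assumes that scaling factors encode the decimal point as an underscore (so `1_2` means 1.2) and that the rotation token contains no underscore. Only the one example name given above has been checked against it.

```python
import re

# [symbol]_[papersheet_index]_[rotation]_[index]_scaled_x[sx]y[sy].png
FILENAME_RE = re.compile(
    r"^(?P<symbol>.+?)_(?P<sheet>\d+)_(?P<rotation>[^_]+)_(?P<index>\d+)"
    r"_scaled_x(?P<sx>\d+(?:_\d+)?)y(?P<sy>\d+(?:_\d+)?)\.png$"
)

def _scale(token):
    # Assumption: "1_2" encodes the factor 1.2.
    return float(token.replace("_", "."))

def parse_image_name(name):
    """Split a CompleteImages file name into its labelled parts."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return {
        "symbol": match["symbol"],
        "sheet": int(match["sheet"]),
        "rotation": match["rotation"],   # kept as text, e.g. "8ccw"
        "index": int(match["index"]),
        "scale_x": _scale(match["sx"]),
        "scale_y": _scale(match["sy"]),
    }

info = parse_image_name("+_1_8ccw_26_scaled_x1_2y1_2.png")
```

The rotation is left as a string because the listing shows only one example of its format.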
The datasets were created in the following way:
- I drew each symbol in ~500 ways on pieces of white paper using a thin pen, and scanned them to .pdf images.
- All images were run through my 'datatool' module for each symbol to be isolated and fit into a 28x28 greyscale .png with each pixel being either black or white
- Each image was transformed with all possible combinations of the following: rotation (-15°, -8°, 0°, 8°, 15°), stretching in each axis (1, 1.2, 1.3). Each transformed image was saved as a .png ...
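To make "all possible combinations" concrete, the transformation grid can be enumerated directly from the values listed above: five rotations and three stretch factors per axis give 5 x 3 x 3 = 45 variants per source image, including the untransformed one. This is a sketch of the arithmetic only and does not reproduce the stroke-based transformations themselves; the variable names are chosen for the example.

```python
from itertools import product

ROTATIONS_DEG = (-15, -8, 0, 8, 15)
STRETCH_FACTORS = (1.0, 1.2, 1.3)

# Every (rotation, stretch in x, stretch in y) triple.
combinations = list(product(ROTATIONS_DEG, STRETCH_FACTORS, STRETCH_FACTORS))

print(len(combinations))  # 45 variants per source image
```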
Community science image libraries offer a massive, but largely untapped, source of observational data for phenological research. The iNaturalist platform offers a particularly rich archive, containing more than 49 million verifiable, georeferenced, open access images, encompassing seven continents and over 278,000 species. A critical limitation preventing scientists from taking full advantage of this rich data source is labor. Each image must be manually inspected and categorized by phenophase, which is both time-intensive and costly. Consequently, researchers may only be able to use a subset of the total number of images available in the database. While iNaturalist has the potential to yield enough data for high-resolution and spatially extensive studies, it requires more efficient tools for phenological data extraction. A promising solution is automation of the image annotation process using deep learning. Recent innovations in deep learning have made these open-source tools accessibl...
The dataset used in this paper is a collection of training data for over-parameterized deep neural networks.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Dataset for neural network training (water network). This dataset was compiled from data available on the internet: a large amount of data was cleaned, with extraneous items removed and important items retained.
Description of the project: This dataset is the result of the research carried out in the project "A Benchmark for Rey-Osterrieth Complex Figure (ROCF) Test Automatic Scoring", whose main goal was to establish a baseline for the scoring task consisting of a dataset with 528 ROCF and the results obtained by several deep learning models as well as by a group of psychology experts.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was used by the SKYHAWK project with the team REDHAWK. To download full dataset or to submit a request for your new data collection needs, please drop a mail to: redhawk.team.info@gmail.com
A collection of 11,000+ images of multiple vehicle types, captured and crowdsourced from 100+ urban and rural areas. Each image is reviewed and verified by the researcher.
Versatile Training Data: Explore a Spectrum of Resolutions and Weather Conditions in Our Dataset for Comprehensive Vehicle Detection Model Training.
Dataset size: 11,000+ images
Location: Bangladesh
Diversity: Various lighting conditions like day and night, various weather conditions, varied distances, view points, etc.
Device used: Captured using mobile phones.
Usage: Vehicle detection, Traffic automation, Traffic surveillance, etc.
Available Annotation formats: YOLO, PYTORCH
https://darus.uni-stuttgart.de/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.18419/DARUS-4113
Replication code and experiment result data for training Quantum Neural Networks with entangled data using one-dimensional projectors as observables. This is the version of the code that was used to generate the experiment results in the related publication.
Experiments:
- exp_inf_coeffvariation.py: Trains QNNs using training samples of varying Schmidt rank with fixed vector as Schmidt basis state. Varies the associated Schmidt coefficient.
- exp_inf_random.py: Trains QNNs using random training data.
Experiment results:
- exp_inf_coeffvariation.zip and exp_inf_random.zip contain the raw experiment results for both experiments.
- For each combination of controlled variables there is one directory containing the result of all 20 runs of the training process.
- The results for each run are comprised of 3 files:
  - [id]_losses.npy: The loss during the training process.
  - [id]_params.npy: The parameters of the QNN after the training process.
  - [id]_V.npy: The trained QNN exported as a 2^4 * 2^4 unitary matrix.
Analysis of data (data_extraction.py):
- Computes means and standard deviation of various risk measures and saves the results.
Plots (plot_obs_risk.py):
- Plots the risk w.r.t. the observable for both experiments based on the analysed data obtained from data_extraction.py.
- Generates plot_coeffvariation.pdf and plot_random.pdf.
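A run's three result files can be read with NumPy and the exported matrix checked for unitarity. The sketch below is not taken from the replication code: `load_run`, `is_unitary`, and `write_synthetic_run` are hypothetical helpers, the run identifier "0" and array shapes other than the 16 x 16 matrix are made up, and the demo writes synthetic stand-in files to a temporary directory rather than reading the published archives.

```python
import tempfile
from pathlib import Path

import numpy as np

def load_run(run_dir, run_id):
    """Load the three per-run files named [id]_losses/params/V.npy."""
    run_dir = Path(run_dir)
    return {
        "losses": np.load(run_dir / f"{run_id}_losses.npy"),
        "params": np.load(run_dir / f"{run_id}_params.npy"),
        "V": np.load(run_dir / f"{run_id}_V.npy"),
    }

def is_unitary(matrix, atol=1e-8):
    """Check that V^dagger V equals the identity within a tolerance."""
    matrix = np.asarray(matrix)
    if matrix.ndim != 2 or matrix.shape[0] != matrix.shape[1]:
        return False
    identity = np.eye(matrix.shape[0])
    return bool(np.allclose(matrix.conj().T @ matrix, identity, atol=atol))

def write_synthetic_run(run_dir, run_id, dim=2 ** 4, seed=0):
    """Write stand-in files with plausible shapes (NOT real results)."""
    rng = np.random.default_rng(seed)
    random_complex = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    unitary, _ = np.linalg.qr(random_complex)   # Q factor of QR is unitary
    run_dir = Path(run_dir)
    np.save(run_dir / f"{run_id}_losses.npy", rng.random(100))
    np.save(run_dir / f"{run_id}_params.npy", rng.random(30))
    np.save(run_dir / f"{run_id}_V.npy", unitary)

with tempfile.TemporaryDirectory() as tmp:
    write_synthetic_run(tmp, "0")
    run = load_run(tmp, "0")
    print(run["V"].shape, is_unitary(run["V"]))
```

For the published data, `run_dir` would point at one of the per-configuration directories extracted from the zip archives.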
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Datasets for the NeurIPS 2021 accepted paper "Self-Supervised Representation Learning on Neural Network Weights for Model Characteristic Prediction".
Datasets are PyTorch files containing a dictionary with training, validation and test sets. The training, validation and test sets are custom dataset classes which inherit from the standard torch dataset class. Corresponding code can be found at https://github.com/HSG-AIML/NeurIPS_2021-Weight_Space_Learning.
Datasets 41, 42, 43 and 44 are our dataset format wrapped around the zoos from Unterthiner et al, 2020 (https://github.com/google-research/google-research/tree/master/dnn_predict_accuracy)
Abstract: Self-Supervised Learning (SSL) has been shown to learn useful and information-preserving representations. Neural Networks (NNs) are widely applied, yet their weight space is still not fully understood. Therefore, we propose to use SSL to learn neural representations of the weights of populations of NNs. To that end, we introduce domain specific data augmentations and an adapted attention architecture. Our empirical evaluation demonstrates that self-supervised representation learning in this domain is able to recover diverse NN model characteristics. Further, we show that the proposed learned representations outperform prior work for predicting hyper-parameters, test accuracy, and generalization gap as well as transfer to out-of-distribution settings.
The dataset used in this paper is not explicitly mentioned, but it is implied to be a large-scale dataset for machine learning.