Extracting useful and accurate information from scanned geologic and other earth science maps is a time-consuming and laborious manual process. To address this limitation, the USGS partnered with the Defense Advanced Research Projects Agency (DARPA) to run the AI for Critical Mineral Assessment Competition, soliciting innovative solutions for automatically georeferencing and extracting features from maps. The competition opened for registration in August 2022 and concluded in December 2022. Training and validation data from the competition are provided here, as well as competition details and baseline solutions. The data are derived from published sources and are provided to the public to support continued development of automated georeferencing and feature extraction tools. References for all maps are included with the data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Convolutional neural network (CNN) models and their respective training, validation and test datasets used in the manuscript:
Tuomo Hartonen, Teemu Kivioja and Jussi Taipale, "PlotMI: interpretation of pairwise interactions and positional preferences learned by a deep learning model from sequence data"
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset includes the date and time, latitude (“lat”), longitude (“lon”), sun angle (“sun_angle”, in degrees [°]), rainbow presence (TRUE = rainbow, FALSE = no rainbow), cloud cover (“cloud_cover”, proportion), and liquid precipitation (“liquid_precip”, kg m⁻² s⁻¹) for each record used to train and/or validate the models.
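A minimal sketch of how such a table might be inspected with pandas; the file name is a placeholder and only the column names quoted above are assumed to exist:

```python
import pandas as pd

# Hypothetical file name; the release's actual file layout may differ.
records = pd.read_csv("rainbow_records.csv")

# Columns quoted in the description: lat, lon, sun_angle (degrees),
# cloud_cover (proportion), liquid_precip (kg m^-2 s^-1), a date/time field,
# and a TRUE/FALSE rainbow-presence flag used as the prediction target.
print(records.dtypes)
print(records[["lat", "lon", "sun_angle"]].describe())
```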
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The package contains files for two modules designed to improve the accuracy of the indoor positioning system, namely the following:
door detection:
videos_test - videos used to demonstrate the application of the door detector
videos_res - videos from the videos_test directory with detected doors marked
parts detection:
frames_train_val - images generated from videos, used for training and validation of the VGG16 neural network model
frames_test - images generated from videos, used for testing the trained model
videos_test - videos used to demonstrate the application of the parts detector
videos_res - videos from the videos_test directory with detected parts marked
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training and test data, and model parameters. The last three columns show the MinORG, LT and HT parameters used to create the pathogenicity families and build each of the 10 models. Zthr is a threshold value, calculated for each model during the cross-validation phase, that is applied to the final prediction score to decide whether an input organism is predicted as pathogenic or non-pathogenic. The parameters for each model were chosen after 5-fold cross-validation tests.
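A minimal sketch of how such a per-model threshold might be applied; the values and the tie-breaking convention (score equal to Zthr counted as pathogenic) are illustrative assumptions, not taken from the release:

```python
def classify(prediction_score: float, z_thr: float) -> str:
    """Apply a model-specific threshold Zthr to the final prediction score."""
    # Scores at or above the threshold are treated here as pathogenic
    # (the exact tie-breaking convention is an assumption).
    return "pathogenic" if prediction_score >= z_thr else "non-pathogenic"

# Illustrative values only; each of the 10 models carries its own Zthr.
print(classify(prediction_score=0.73, z_thr=0.61))
```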
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The repository includes the training, test and validation data used in the paper "On the accuracy of posterior recovery with neural network emulators". Note that, due to the convention employed by the emulator framework in the paper, the test data are used for early stopping and the validation data are used to measure the accuracy of the emulator after training; this is the opposite of the convention used in most of the machine learning literature.
The corresponding code used in the paper is found at: https://github.com/htjb/validating_posteriors.
`_data.txt` corresponds to the ARES parameters used to generate the signals in `_labels.txt`.
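A minimal sketch of how a pair of these files might be loaded; the file prefix is not specified above, so the names below are placeholders:

```python
import numpy as np

# Placeholder prefix; actual files follow the `<prefix>_data.txt` / `<prefix>_labels.txt` pattern.
params = np.loadtxt("train_data.txt")     # ARES parameters, one row per signal
signals = np.loadtxt("train_labels.txt")  # corresponding signals generated from those parameters

assert params.shape[0] == signals.shape[0], "each parameter set should pair with one signal"
```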
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Partial and incremental stratification analysis of a quantitative structure-interference relationship (QSIR) is a novel strategy for categorizing the classifications produced by machine learning techniques. It is based on a 2D mapping of classification statistics onto two categorical axes: the degree of consensus and the level of applicability domain. An internal cross-validation set makes it possible to determine the statistical performance of the ensemble at every stratum of the 2D map and hence to define isometric local performance regions, with the aim of better hit ranking and selection. During training, the isometric stratified ensemble (ISE) approach applies recursive decorrelated variable selection and considers the cardinal ratio of the classes to balance the training sets, thus avoiding bias due to possible class imbalance. To exemplify the interest of this strategy, three different highly imbalanced PubChem pairs of AmpC β-lactamase and cruzain inhibition assay campaigns of colloidal aggregators, together with the complementary aggregators data set available at the AGGREGATOR ADVISOR predictor web page, were employed. The statistics obtained with this new strategy outperform previously published tools, with and without a classical applicability domain. ISE performance in classifying colloidal aggregators ranges from a global AUC of 0.82, when the whole test data set is considered, up to a maximum AUC of 0.88 when only its highest-confidence isometric stratum is retained.
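A schematic sketch of the 2D stratification idea described above, assuming an ensemble of binary classifiers and a simple in/out applicability-domain flag; all names, bin counts, and the consensus measure are illustrative assumptions, not the published method:

```python
import numpy as np

def stratify(votes: np.ndarray, in_domain: np.ndarray, n_consensus_bins: int = 3):
    """Assign each compound to a (consensus degree, applicability domain) stratum.

    votes     : array of shape (n_models, n_samples) with 0/1 predictions
    in_domain : boolean array of shape (n_samples,), True if inside the AD
    """
    consensus = votes.mean(axis=0)  # fraction of models voting "active"
    # Degree of consensus, e.g. distance from the 50/50 split, binned into levels.
    consensus_level = np.digitize(np.abs(consensus - 0.5) * 2,
                                  np.linspace(0, 1, n_consensus_bins + 1)[1:-1])
    return list(zip(consensus_level, in_domain))

# Per-stratum statistics (e.g. AUC) can then be computed on an internal
# cross-validation set to delimit local performance regions.
```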
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Collocated data between AHI at 2 km resolution (nadir) and the CALIOP 1 km cloud product v4.20, used for training and validating cloud-identification neural networks. The main training and validation data from 2019 are stored in monthly directories, whilst the collocated dataset used to compare the NN, JMA and BoM cloud mask performances is the file "superdf.h5". All collocated data are stored as .h5 files and were built using the Python pandas package. In this archive, each monthly directory is packed with tar and compressed with bzip2, while "superdf.h5" is provided as a single bzip2-compressed file.
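A minimal sketch of how these archives might be unpacked and read back with pandas; the archive file names and the keys inside the HDF5 stores are not documented above, so they are treated as placeholders or discovered at run time:

```python
import bz2
import shutil
import tarfile
import pandas as pd

# Unpack one monthly archive (archive name is a placeholder).
with tarfile.open("201901.tar.bz2", mode="r:bz2") as archive:
    archive.extractall("201901")

# "superdf.h5" ships bzip2-compressed; decompress it first (compressed name assumed).
with bz2.open("superdf.h5.bz2", "rb") as src, open("superdf.h5", "wb") as dst:
    shutil.copyfileobj(src, dst)

# The .h5 files were built with pandas, so HDFStore can list and load their keys.
with pd.HDFStore("superdf.h5", mode="r") as store:
    print(store.keys())
    df = store[store.keys()[0]]
print(df.head())
```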
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Abstract: The aim of the dataset is to train and validate models for predicting time series for milling processes. For this purpose, processes were recorded at a sampling rate of 500 Hz by a Siemens Industrial Edge on a DMC 60H. The machine was upgraded in terms of control technology. Processes for model training and validation were recorded, suitable for both steel and aluminum machining. Several recordings were made with and without a workpiece (aircut) in order to cover as many cases as possible. This is the same series of experiments as in "Training and validation dataset of milling processes for time series prediction" (DOI 10.5445/IR/1000157789) and is intended to enable an investigation of the transferability of models between different machines.
Technical remarks:
Documents:
- Design of Experiments: information on the paths as well as the technological values of the experiments
- Recording information: information about the recordings, with comments
- Data: all recorded datasets. The first level contains the folders for training and validation, both with and without the workpiece. The next level contains the individual test executions. Each recording is stored as a JSON file consisting of a header with all relevant information, such as the signal sources, followed by the entries of the recorded time series.
- NC code: NC programs executed on the machine
Experimental data:
- Machine: retrofitted DMC 60H
- Material: S235JR, 2007 T4
- Tools:
  - Solid carbide end mill (VHM-Fräser) HPC, TiSi, ⌀ f8, DC: 5 mm
  - Solid carbide end mill (VHM-Fräser) HPC, TiSi, ⌀ f8, DC: 10 mm
  - Solid carbide end mill (VHM-Fräser) HPC, TiSi, ⌀ f8, DC: 20 mm
  - End mill (Schaftfräser) HSS-Co8, TiAlN, ⌀ k10, DC: 5 mm
  - End mill (Schaftfräser) HSS-Co8, TiAlN, ⌀ k10, DC: 10 mm
  - End mill (Schaftfräser) HSS-Co8, TiAlN, ⌀ k10, DC: 5 mm
- Workpiece blank dimensions: 150 x 75 x 50 mm
License: This work is licensed under a Creative Commons Attribution 4.0 International License. Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0).
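A minimal sketch of how one of the JSON recordings might be inspected; the file path and the exact key names inside the header and time-series entries are not specified above, so the code only reports whatever top-level structure it finds:

```python
import json

# Placeholder path; recordings sit below the training/validation folders
# described above, one JSON file per recording.
with open("recording.json", "r", encoding="utf-8") as fh:
    recording = json.load(fh)

# The file consists of a header (e.g. signal sources) followed by the
# recorded time-series entries; print the top-level structure to see both.
if isinstance(recording, dict):
    for key, value in recording.items():
        size = len(value) if isinstance(value, (list, dict)) else 1
        print(f"{key}: {type(value).__name__} ({size} entries)")
```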
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This research study aims to understand the application of Artificial Neural Networks (ANNs) to forecast the compressive strength of Self-Compacting Recycled Coarse Aggregate Concrete (SCRCAC). From the literature, 602 available data sets of SCRCAC mix designs were collected, and the data were rearranged, reconstructed, trained and tested for ANN model development. The models were established using seven input variables: the mass of cementitious content, water, natural coarse aggregate content, natural fine aggregate content, recycled coarse aggregate content, chemical admixture and mineral admixture used in the SCRCAC mix designs. Two normalization techniques were used for data normalization to visualize the data distribution. For each normalization technique, three transfer functions were used for modelling. In total, six different types of models were run in MATLAB and used to estimate the 28th-day SCRCAC compressive strength. Normalization technique 2 performs better than technique 1, and TANSIG is the best transfer function. The best k-fold cross-validation fold is k = 7. The coefficient of determination for predicted versus actual compressive strength is 0.78 for training and 0.86 for testing. The impact of the number of neurons and layers on the model was also assessed. Inputs from standards were used to forecast the 28th-day compressive strength. Apart from ANN, Machine Learning (ML) techniques like random forest, extra trees, extreme boosting and light gradient boosting were adopted to predict the 28th-day compressive strength of SCRCAC. Compared to ML, the ANN prediction shows better results in terms of sensitivity analysis. The study was also extended to determine the 28th-day compressive strength from experimental work and compare it with that predicted by the best ANN model. Standard and ANN mix designs have similar fresh and hardened properties. The average compressive strengths from the ANN model and the experimental results are 39.067 and 38.36 MPa, respectively, with a correlation coefficient of 1. It appears that the ANN can validly predict the compressive strength of concrete.
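The original models were built in MATLAB; as a rough, hedged illustration of the described setup (seven inputs, a tanh-type transfer function standing in for TANSIG, and 7-fold cross-validation), an equivalent sketch in Python with scikit-learn might look as follows. The data, layer size, and scaler choice are placeholders, not the study's actual configuration:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# X: 7 input variables (cementitious content, water, natural coarse/fine aggregate,
# recycled coarse aggregate, chemical admixture, mineral admixture); y: 28-day strength.
rng = np.random.default_rng(0)
X = rng.random((602, 7))      # placeholder for the 602 collected mix designs
y = rng.random(602) * 60      # placeholder compressive strengths in MPa

# MinMax scaling stands in for a normalization technique; tanh for the TANSIG transfer function.
model = make_pipeline(MinMaxScaler(),
                      MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                                   max_iter=5000, random_state=0))

scores = cross_val_score(model, X, y,
                         cv=KFold(n_splits=7, shuffle=True, random_state=0), scoring="r2")
print(scores.mean())
```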
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These are the training and validation datasets used in the manuscript "Three-Dimensional Implicit Structural Modeling Using Convolutional Neural Network". In this manuscript, we propose an efficient deep learning method using a Convolutional Neural Network (CNN) to predict a scalar field from sparse structural data associated with multiple distinct stratigraphic layers and faults. The CNN architecture is beneficial for the flexible incorporation of empirical geological knowledge when trained with numerous and realistic structural models that are automatically generated from a data simulation workflow. It also offers an expressive way of integrating various types of structural constraints by minimizing a hybrid loss function that compares predicted and reference structural models, opening new opportunities for further improving geological modeling.
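The manuscript's actual loss terms are not listed here; as a generic, hedged sketch, a "hybrid" loss of this kind is typically a weighted sum of per-constraint misfits between predicted and reference scalar fields, for example:

```python
import numpy as np

def hybrid_loss(pred: np.ndarray, ref: np.ndarray, mask: np.ndarray,
                w_field: float = 1.0, w_grad: float = 0.1) -> float:
    """Illustrative weighted combination of two misfit terms (not the paper's definition).

    pred, ref : predicted and reference 3D scalar fields
    mask      : 1 where sparse structural observations exist, 0 elsewhere
    """
    # Term 1: misfit of the scalar field at the sparse structural data points.
    field_term = np.mean(mask * (pred - ref) ** 2)
    # Term 2: misfit of the field gradients, a stand-in for orientation-type constraints.
    grad_term = sum(np.mean((gp - gr) ** 2)
                    for gp, gr in zip(np.gradient(pred), np.gradient(ref)))
    return w_field * field_term + w_grad * grad_term
```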
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset represents the training and validation data that were used to produce the pre-trained model for the TomoTwin paper. Please see 10.5281/zenodo.6637357 for the raw tomograms.
This dataset was created by Hamed Etezadi
https://brightdata.com/license
Utilize our machine learning datasets to develop and validate your models. Our datasets are designed to support a variety of machine learning applications, from image recognition to natural language processing and recommendation systems. You can access a comprehensive dataset or tailor a subset to fit your specific requirements, using data from a combination of various sources and websites, including custom ones. Popular use cases include model training and validation, where the dataset can be used to ensure robust performance across different applications. Additionally, the dataset helps in algorithm benchmarking by providing extensive data to test and compare various machine learning algorithms, identifying the most effective ones for tasks such as fraud detection, sentiment analysis, and predictive maintenance. Furthermore, it supports feature engineering by allowing you to uncover significant data attributes, enhancing the predictive accuracy of your machine learning models for applications like customer segmentation, personalized marketing, and financial forecasting.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Training and validation datasets for the first subtask of the shared task "Field of Research Classification" to be held at the Natural Scientific Language Processing and Research Knowledge Graphs (NSLP 2024) workshop (https://nfdi4ds.github.io/nslp2024/).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Annotated test and training data sets. Images and annotations are provided separately.
Validation data set for Hi5, Sf9 and HEK cells.
Confusion matrices for the determination of performance parameters.
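As a brief, hedged illustration of how performance parameters are typically derived from such a confusion matrix (the counts below are placeholders, not values from this dataset):

```python
# 2x2 confusion matrix counts: true/false positives and negatives (placeholder values).
tp, fp, fn, tn = 90, 10, 5, 95

accuracy    = (tp + tn) / (tp + fp + fn + tn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)          # sensitivity
specificity = tn / (tn + fp)
f1          = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f} f1={f1:.3f}")
```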
Bats play crucial ecological roles and provide valuable ecosystem services, yet many populations face serious threats from various ecological disturbances. The North American Bat Monitoring Program (NABat) aims to assess the status and trends of bat populations while developing innovative and community-driven conservation solutions using its unique data and technology infrastructure. To support scalability and transparency in the NABat acoustic data pipeline, we developed a fully automated machine-learning algorithm. This dataset includes audio files of bat echolocation calls that were considered in developing V1.0 of the NABat machine-learning algorithm; however, the test set (i.e., holdout dataset) has been excluded from this release. These recordings were collected by various bat monitoring partners across North America using ultrasonic acoustic recorders for stationary and mobile acoustic surveys. For more information on how these surveys may be conducted, see Chapters 4 and 5 of “A Plan for the North American Bat Monitoring Program” (https://doi.org/10.2737/SRS-GTR-208). These data were then post-processed by bat monitoring partners to remove noise files (i.e., those that do not contain recognizable bat calls) and apply a species label to each file. There is undoubtedly variation in the steps that monitoring partners take to apply a species label, but the steps documented in “A Guide to Processing Bat Acoustic Data for the North American Bat Monitoring Program” (https://doi.org/10.3133/ofr20181068) include first processing with an automated classifier and then manually reviewing to confirm or downgrade the suggested species label. Once a manual ID label was applied, audio files of bat acoustic recordings were submitted to the NABat database in Waveform Audio File format. From these available files in the NABat database, we considered files from 35 classes (34 species and a noise class). Files for 4 species were excluded due to low sample size (Corynorhinus rafinesquii, N = 3; Eumops floridanus, N = 3; Lasiurus xanthinus, N = 4; Nyctinomops femorosaccus, N = 11). From this pool, files were randomly selected until files for each species/grid cell combination were exhausted or the number of recordings reached 1,250. The dataset was then randomly split into training, validation, and test sets (i.e., holdout dataset). This data release includes all files considered for training and validation, including files that had been excluded from model development and testing due to low sample size for a given species or because the threshold for species/grid cell combinations had been met. The test set (i.e., holdout dataset) is not included. Audio files are grouped by species, as indicated by the four-letter species code in the name of each folder. Definitions for each four-letter code, including Family, Genus, Species, and Common name, are also included as a dataset in this release.
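A minimal sketch of how the released folder structure might be indexed, assuming only what is stated above (WAV files grouped into folders named by four-letter species code); the root directory name is a placeholder:

```python
from pathlib import Path

# Placeholder for wherever the release has been downloaded and unpacked.
root = Path("nabat_training_validation")

# Each folder is named with a four-letter species code (or the noise class);
# build a simple (file, label) index from the WAV files it contains.
index = [(wav, folder.name)
         for folder in sorted(root.iterdir()) if folder.is_dir()
         for wav in sorted(folder.glob("*.wav"))]

print(f"{len(index)} recordings across {len({label for _, label in index})} classes")
```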
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset necessary for the DocTOR utility.
DocTOR (Direct fOreCast Target On Reaction) is a utility written in Python 3.9 (using the conda framework) that allows the user to upload a list of UniProt IDs and adverse reactions (from the available models) in order to study the relationship between the two.
As output, the program assigns a positive or negative class to each protein, assessing its possible involvement in the onset of the selected ADRs.
DocTOR exploits the data coming from T-ARDIS [https://doi.org/10.1093/database/baab068] to train different Machine Learning approaches (SVM, RF, NN) using network topological measurements as features.
The predictions coming from the single trained models are combined in a meta-predictor exploiting three different voting systems, as sketched below.
The results of the meta-predictor, together with the ones from the single ML methods, will be available in the output log file (named "predictions_community" or "predictions_curated" based on the database type).
The DocTOR utility is available at https://github.com/cristian931/DocTOR
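DocTOR's own voting systems are not detailed here; as a hedged sketch of the general idea of combining SVM, RF and NN outputs in a meta-predictor, a simple majority vote could look like this (illustrative only, not DocTOR's implementation):

```python
from collections import Counter

def majority_vote(predictions: dict[str, int]) -> int:
    """Combine per-model class calls (1 = involved in the ADR, 0 = not) by simple majority."""
    counts = Counter(predictions.values())
    return counts.most_common(1)[0][0]

# Illustrative per-protein calls from the three trained approaches.
print(majority_vote({"SVM": 1, "RF": 1, "NN": 0}))  # -> 1 (positive class)
```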
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
*Once again, one sample is left out as the testing sample and the remaining 28 samples form the training dataset. The four features (columns “1”, “2”, “3”, and “4”) of each miRNA are calculated from the genomic coordinates of the miRNA, the miRNA-hosting intron, and the host gene. ER represents the experimental results and PR represents the prediction results. The symbol “+” means high co-expression and the symbol “−” means low co-expression.
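A brief, hedged sketch of the leave-one-out scheme described above (29 samples, four features); the data and classifier choice are placeholders, not the study's actual model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.random((29, 4))            # placeholder: 4 features per miRNA
y = rng.integers(0, 2, size=29)    # placeholder: "+" (high) vs "−" (low) co-expression

# Each round leaves one sample out for testing; the remaining 28 form the training set.
correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
print(f"leave-one-out accuracy: {correct / len(X):.2f}")
```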
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The construction of a robust healthcare information system is fundamental to enhancing countries’ capabilities in the surveillance and control of hepatitis B virus (HBV). Making use of China’s rapidly expanding primary healthcare system, this innovative approach using big data and machine learning (ML) could help towards the World Health Organization’s (WHO) HBV infection elimination goals of reaching 90% diagnosis and treatment rates by 2030. We aimed to develop and validate HBV detection models using routine clinical data to improve the detection of HBV and support the development of effective interventions to mitigate the impact of this disease in China. Relevant data records extracted from the Family Medicine Clinic of the University of Hong Kong-Shenzhen Hospital’s Hospital Information System were structured using state-of-the-art Natural Language Processing techniques. Several ML models were used to develop HBV risk assessment models. The performance of the ML models was then interpreted using Shapley values (SHAP) and validated using cohort data randomly divided at a ratio of 2:1 within a five-fold cross-validation framework. The patterns of physical complaints of patients with and without HBV infection were identified by processing 158,988 clinic attendance records. After removing cases without any clinical parameters from the derivation sample (n = 105,992), 27,392 cases were analysed using six modelling methods. A simplified model for HBV using patients’ physical complaints and parameters was developed with good discrimination (AUC = 0.78) and calibration (goodness-of-fit test p-value >0.05). Suspected case detection models for HBV, showing potential for clinical deployment, have been developed to improve HBV surveillance in the primary care setting in China. This study has developed a suspected case detection model for HBV, which can facilitate early identification and treatment of HBV in the primary care setting in China, contributing towards the achievement of WHO’s elimination goals for HBV infections. We utilized state-of-the-art natural language processing techniques to structure the data records, leading to the development of a robust healthcare information system which enhances the surveillance and control of HBV in China.
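As a rough, hedged illustration of the validation-and-interpretation workflow described above (a 2:1 cohort split, five-fold cross-validation, and SHAP-based interpretation), using placeholder data and a generic tree model rather than the study's actual pipeline:

```python
import numpy as np
import shap  # third-party SHAP package
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.random((3000, 20))            # placeholder clinical features
y = rng.integers(0, 2, size=3000)     # placeholder HBV labels

# Randomly divide the cohort at a 2:1 ratio (derivation vs validation).
X_dev, X_val, y_dev, y_val = train_test_split(X, y, test_size=1/3, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(model, X_dev, y_dev, cv=5, scoring="roc_auc").mean())  # five-fold CV AUC

# Interpret the fitted model with SHAP values on the validation cohort.
model.fit(X_dev, y_dev)
shap_values = shap.TreeExplainer(model).shap_values(X_val)
```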