100+ datasets found

Process-guided deep learning water temperature predictions: 6 Model...
gimi9.com
data.usgs.gov
Cite
Process-guided deep learning water temperature predictions: 6 Model evaluation (test data and RMSE) [Dataset]. https://gimi9.com/dataset/data-gov_485517587d70c5aee9050558fc1578749f6351e4/
Description
This dataset includes evaluation data ("test" data) and performance metrics for water temperature predictions from multiple modeling frameworks. Process-Based (PB) models were configured and calibrated with training data to reduce root-mean-square error (RMSE). Uncalibrated models (PB0; see Winslow et al. 2016 for details) used default configurations, and no parameters were adjusted according to model fit with observations. Deep Learning (DL) models were Long Short-Term Memory (LSTM) recurrent neural network models that used training data to adjust model structure and weights for temperature predictions (Jia et al. 2019). Process-Guided Deep Learning (PGDL) models were DL models with an added physical constraint for energy conservation as a loss term; these models were pre-trained on uncalibrated Process-Based model outputs (PB0) before training on actual temperature observations. Performance was measured as RMSE relative to temperature observations during the test period. Test data include compiled water temperature data from a variety of sources, including the Water Quality Portal (Read et al. 2017), the North Temperate Lakes Long-Term Ecological Research Program (https://lter.limnology.wisc.edu/), the Minnesota Department of Natural Resources, and the Global Lake Ecological Observatory Network (gleon.org). This dataset is part of a larger data release of lake temperature model inputs and outputs for 68 lakes in the U.S. states of Minnesota and Wisconsin (http://dx.doi.org/10.5066/P9AQPIVD).
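The performance metric above is the standard root-mean-square error between predicted and observed temperatures. A minimal sketch, assuming predictions and observations have already been paired by lake, date, and depth; the function name and sample values are hypothetical and not taken from the data release:

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between paired predictions and observations."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("need at least one prediction/observation pair")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical water temperatures (deg C) at one depth on three test dates.
obs = [12.0, 14.0, 16.0]
pred = [13.0, 14.0, 15.0]
print(round(rmse(pred, obs), 4))  # 0.8165
```

Because errors are squared before averaging, a few large misses (for example near the thermocline) raise RMSE more than many small ones.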
Challenge Round 0 (Dry Run) Test Dataset
catalog.data.gov
data.nist.gov
Updated Jul 29, 2022
Cite
National Institute of Standards and Technology (2022). Challenge Round 0 (Dry Run) Test Dataset [Dataset]. https://catalog.data.gov/dataset/challenge-round-0-dry-run-test-dataset-ff885
Dataset updated
Jul 29, 2022
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Description
This dataset was an initial test of the test harness infrastructure for the TrojAI program. It should not be used for research; please use the more refined datasets generated for the other rounds. The data generated and disseminated by the program are training, validation, and test data used to construct trojan detection software solutions. These data, generated at NIST, consist of human-level AI models trained to perform a variety of tasks (image classification, natural language processing, etc.). A known percentage of these trained models have been poisoned with a known trigger that induces incorrect behavior. The data will be used to develop software solutions for detecting which trained AI models have been poisoned via embedded triggers. This dataset consists of 200 trained, human-level image classification AI models using the following architectures: Inception-v3, DenseNet-121, and ResNet50. The models were trained on synthetically created images of non-real traffic signs superimposed on road background scenes. Half (50%) of the models have been poisoned with an embedded trigger that causes misclassification of images when the trigger is present.
Training data and test data sets for simultaneous inversion of velocity...
zenodo.org
data.niaid.nih.gov
zip
Updated May 25, 2023
Cite
Chen Guoxin (2023). Training data and test data sets for simultaneous inversion of velocity density based on U-T [Dataset]. http://doi.org/10.5281/zenodo.7965402
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.7965402
Dataset updated
May 25, 2023
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Chen Guoxin
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
These are the training and testing datasets used in the numerical experiments on the Marmousi model in the article "Joint Model and Data-Driven Simultaneous Inversion of Velocity and Density", which has been submitted to the Journal of Geophysical Research: Solid Earth. Each dataset consists of two parts, a training dataset and a testing dataset, and both contain three parts: seismic data, a velocity model, and a density model.
Predictive modeling of treatment resistant depression using data from STAR*D...
plos.figshare.com
docx
Updated Jun 1, 2023
Cite
Zhi Nie; Srinivasan Vairavan; Vaibhav A. Narayan; Jieping Ye; Qingqin S. Li (2023). Predictive modeling of treatment resistant depression using data from STAR*D and an independent clinical study [Dataset]. http://doi.org/10.1371/journal.pone.0197268
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0197268
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Zhi Nie; Srinivasan Vairavan; Vaibhav A. Narayan; Jieping Ye; Qingqin S. Li
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Identifying risk factors for treatment resistance may help guide treatment selection, avoid inefficient trial-and-error, and improve major depressive disorder (MDD) care. We extended the work in predictive modeling of treatment resistant depression (TRD) by partitioning the data from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) cohort into a training and a testing dataset. We also included data from a small yet completely independent cohort, RIS-INT-93, as an external test dataset. We used features from enrollment and level 1 treatment (up to week 2 response only) of STAR*D to explore the feature space comprehensively and applied machine learning methods to model TRD outcome at level 2. For TRD defined using QIDS-C16 remission criteria, multiple machine learning models were internally cross-validated in the STAR*D training dataset and externally validated in both the STAR*D testing dataset and the RIS-INT-93 independent dataset, with an area under the receiver operating characteristic curve (AUC) of 0.70–0.78 and 0.72–0.77, respectively. The upper bound for the AUC achievable with the full set of features could be as high as 0.78 in the STAR*D testing dataset. A model developed using the top 30 features identified by a feature selection technique (k-means clustering followed by a χ2 test) achieved an AUC of 0.77 in the STAR*D testing dataset. In addition, the model developed using the features shared by STAR*D and RIS-INT-93 achieved an AUC of > 0.70 in both the STAR*D testing and RIS-INT-93 datasets. Among all the features explored in the STAR*D and RIS-INT-93 datasets, the most important was early or initial treatment response or symptom severity at week 2. These results indicate that predicting TRD before a second round of antidepressant treatment could be feasible even in the absence of biomarker data.
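The AUC values above summarize how well a model ranks TRD cases above non-TRD cases. To illustrate the metric only (this is not the authors' pipeline), AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, counting ties as one half. The labels and scores below are hypothetical:

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie), over all positive/negative pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = TRD at level 2) and model risk scores.
y = [1, 0, 1, 0, 1]
s = [0.9, 0.3, 0.6, 0.6, 0.2]
print(roc_auc(y, s))
```

The pairwise loop is quadratic and meant for clarity; library implementations compute the same quantity from sorted scores.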
Dataset, splits, models, and scripts for the QM descriptors prediction
zenodo.org
explore.openaire.eu
application/gzip
Updated Apr 4, 2024
Cite
Shih-Cheng Li; Haoyang Wu; Angiras Menon; Kevin A. Spiekermann; Yi-Pei Li; William H. Green (2024). Dataset, splits, models, and scripts for the QM descriptors prediction [Dataset]. http://doi.org/10.5281/zenodo.10668491
Available download formats: application/gzip
Unique identifier
https://doi.org/10.5281/zenodo.10668491
Dataset updated
Apr 4, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Shih-Cheng Li; Haoyang Wu; Angiras Menon; Kevin A. Spiekermann; Yi-Pei Li; William H. Green
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset, splits, models, and scripts from the manuscript "When Do Quantum Mechanical Descriptors Help Graph Neural Networks Predict Chemical Properties?" are provided. The curated dataset includes 37 QM descriptors for 64,921 unique molecules across six levels of theory: wB97XD, B3LYP, M06-2X, PBE0, TPSS, and BP86. This dataset is stored in the data.tar.gz file, which also contains a file for multitask constraints applied to various atomic and bond properties. The data splits (training, validation, and test splits) for both random and scaffold-based divisions are saved as separate index files in splits.tar.gz. The trained D-MPNN models for predicting QM descriptors are saved in the models.tar.gz file. The scripts.tar.gz file contains ready-to-use scripts for training machine learning models to predict QM descriptors, as well as scripts for predicting QM descriptors using our trained models on unseen molecules and for applying radial basis function (RBF) expansion to QM atom and bond features.

Below are descriptions of the available scripts:

atom_bond_descriptors.sh: Trains atom/bond targets.

atom_bond_descriptors_predict.sh: Predicts atom/bond targets from pre-trained model.

dipole_quadrupole_moments.sh: Trains dipole and quadrupole moments.

dipole_quadrupole_moments_predict.sh: Predicts dipole and quadrupole moments from pre-trained model.

energy_gaps_IP_EA.sh: Trains energy gaps, ionization potential (IP), and electron affinity (EA).

energy_gaps_IP_EA_predict.sh: Predicts energy gaps, IP, and EA from pre-trained model.

get_constraints.py: Generates the constraints file for a test dataset. This file must be generated before using our trained models to predict the atom/bond QM descriptors of your test data.

csv2pkl.py: Converts QM atom and bond features to .pkl files using RBF expansion for use with Chemprop software.
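RBF expansion turns a scalar atom or bond descriptor into a vector of Gaussian activations over fixed centers, a common way to feed continuous features to a graph neural network. The centers, widths, and file handling used by csv2pkl.py are not documented here, so the value range, number of centers, and gamma below are hypothetical choices, and `rbf_expand` is an illustrative helper rather than part of the released scripts:

```python
import math
from typing import List


def rbf_expand(value: float, low: float, high: float,
               n_centers: int, gamma: float) -> List[float]:
    """Expand one scalar into features exp(-gamma * (value - mu_k)^2).

    Centers mu_k are evenly spaced on [low, high]; all parameters are hypothetical.
    """
    if n_centers < 2:
        raise ValueError("n_centers must be at least 2")
    step = (high - low) / (n_centers - 1)
    centers = [low + k * step for k in range(n_centers)]
    return [math.exp(-gamma * (value - mu) ** 2) for mu in centers]


# Hypothetical partial atomic charge of 0.0 expanded over 5 centers in [-1, 1].
features = rbf_expand(0.0, low=-1.0, high=1.0, n_centers=5, gamma=10.0)
print([round(f, 3) for f in features])
```

Larger gamma gives narrower Gaussians, so each value activates fewer centers; the expanded vector replaces the raw scalar as an input feature.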

Below is the procedure for running the ml-QM-GNN on your own dataset:

Use get_constraints.py to generate the constraints file required for predicting atom/bond QM descriptors with the trained ML models.

Run atom_bond_descriptors_predict.sh to predict atom and bond properties. Run dipole_quadrupole_moments_predict.sh and energy_gaps_IP_EA_predict.sh to predict molecular QM descriptors.

Use csv2pkl.py to convert the predicted atom/bond descriptor .csv file into separate atom and bond feature files (saved as .pkl files).

Run Chemprop to train your models with the additional predicted features provided here.
deepvl-training-data
huggingface.co
Updated Apr 27, 2025
Cite
NTNU Autonomous Robots Lab (2025). deepvl-training-data [Dataset]. https://huggingface.co/datasets/ntnu-arl/deepvl-training-data
Dataset updated
Apr 27, 2025
Dataset authored and provided by
NTNU Autonomous Robots Lab
License
BSD 3-Clause, https://choosealicense.com/licenses/bsd-3-clause/
Description
DeepVL training dataset

Introduction

This dataset repository contains the training and testing datasets used in the paper "DeepVL: Dynamics and Inertial Measurements-based Deep Velocity Learning for Underwater Odometry". The dataset was collected by manually piloting an underwater robot in a pool and in the Trondheim fjord.

Dataset details

The training data is located in the train_full directory and the test data in the test directory. The training… See the full description on the dataset page: https://huggingface.co/datasets/ntnu-arl/deepvl-training-data.

SVG Code Generation Sample Training Data

kaggle.com

Updated May 3, 2025

Cite

Vinothkumar Sekar (2025). SVG Code Generation Sample Training Data [Dataset]. https://www.kaggle.com/datasets/vinothkumarsekar89/svg-generation-sample-training-data

Available formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)

Dataset updated

May 3, 2025

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

Vinothkumar Sekar

License

Apache License, v2.0, https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically

Description

This training data was generated using GPT-4o as part of the 'Drawing with LLMs' competition (https://www.kaggle.com/competitions/drawing-with-llms). It can be used to fine-tune small language models for the competition or as an augmentation dataset alongside other data sources.

The dataset was generated in two steps using GPT-4o. In the first step, topic descriptions relevant to the competition were generated with a specific prompt. Running this prompt multiple times yielded over 3,000 descriptions.

 
prompt = """I am participating in an SVG code generation competition.

The competition involves generating SVG images based on short textual descriptions of everyday objects and scenes, spanning a wide range of categories. The key guidelines are as follows:

- Descriptions are generic and do not contain brand names, trademarks, or personal names.
- No descriptions include people, even in generic terms.
- Descriptions are concise—each is no more than 200 characters, with an average length of about 50 characters.
- Categories cover various domains, with some overlap between public and private test sets.

To train a small LLM model, I am preparing a synthetic dataset. Could you generate 100 unique topics aligned with the competition style?

Requirements:
- Each topic should range between **20 and 200 characters**, with an **average around 60 characters**.
- Ensure **diversity and creativity** across topics.
- **50% of the topics** should come from the categories of **landscapes**, **abstract art**, and **fashion**.
- Avoid duplication or overly similar phrasing.

Example topics:
a purple forest at dusk, gray wool coat with a faux fur collar, a lighthouse overlooking the ocean, burgundy corduroy, pants with patch pockets and silver buttons, orange corduroy overalls, a purple silk scarf with tassel trim, a green lagoon under a cloudy sky, crimson rectangles forming a chaotic grid, purple pyramids spiraling around a bronze cone, magenta trapezoids layered on a translucent silver sheet, a snowy plain, black and white checkered pants, a starlit night over snow-covered peaks, khaki triangles and azure crescents, a maroon dodecahedron interwoven with teal threads.

Please return the 100 topics in csv format.
"""
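The prompt asks for topics of 20 to 200 characters without duplicates, but the description does not say how the replies were checked. A minimal sketch of one way to parse a CSV reply and enforce those stated constraints; the `filter_topics` helper, the case- and whitespace-insensitive exact-match deduplication, and the sample reply are all hypothetical:

```python
import csv
import io
from typing import List


def filter_topics(csv_reply: str, min_len: int = 20, max_len: int = 200) -> List[str]:
    """Keep unique topics whose length is within [min_len, max_len] characters."""
    seen = set()
    kept = []
    for row in csv.reader(io.StringIO(csv_reply)):
        for cell in row:
            topic = " ".join(cell.split())  # trim and collapse internal whitespace
            key = topic.lower()             # treat case variants as duplicates
            if min_len <= len(topic) <= max_len and key not in seen:
                seen.add(key)
                kept.append(topic)
    return kept


reply = ('a purple forest at dusk\n'
         'a snowy plain\n'
         'A purple forest  at dusk\n'
         '"gray wool coat with a faux fur collar"')
print(filter_topics(reply))
```

Every CSV cell is treated as a candidate topic, so a short header such as "topic" is dropped by the length check; unquoted commas inside a topic would split it, which is one reason to ask the model for quoted fields.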

In the second step, SVG code was generated by querying GPT-4o with the following prompt:

 
text = "a purple forest at dusk"  # example description; each generated topic is substituted here

prompt = f"""
Generate SVG code to visually represent the following text description, while respecting the given constraints.

Allowed Elements: `svg`, `path`, `circle`, `rect`, `ellipse`, `line`, `polyline`, `polygon`, `g`, `linearGradient`, `radialGradient`, `stop`, `defs`
Allowed Attributes: `viewBox`, `width`, `height`, `fill`, `stroke`, `stroke-width`, `d`, `cx`, `cy`, `r`, `x`, `y`, `rx`, `ry`, `x1`, `y1`, `x2`, `y2`, `points`, `transform`, `opacity`

Please ensure that the generated SVG code is well-formed, valid, and strictly adheres to these constraints.
Focus on a clear and concise representation of the input description within the given limitations.
Always give the complete SVG code with nothing omitted. Never use an ellipsis.

The code is scored based on similarity to the description, Visual question answering and aesthetic components.
Please generate a detailed svg code accordingly.

input description: {text}
"""

The raw SVG output is then cleaned and sanitized using a competition-specific sanitization class. After that, the cleaned SVG is scored using the SigLIP model to evaluate text-to-SVG similarity. Only SVGs with a score above 0.5 are included in the dataset. On average, out of three SVG generations, only one meets the quality threshold after the cleaning, sanitization, and scoring process.
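The competition's sanitization class and the SigLIP scoring code are not reproduced in this description. As a hypothetical stand-in, the sketch below rejects (rather than cleans) SVGs that fail to parse or that use elements or attributes outside the allow-lists from the second prompt, then keeps candidates whose score exceeds 0.5. Taken literally, those lists omit attributes such as `id` and `offset` that gradients normally need, so this strict check is a simplification. The `score` callable is a placeholder for a SigLIP text-to-SVG similarity function that you would have to supply:

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# Allow-lists copied from the second prompt.
ALLOWED_ELEMENTS = {"svg", "path", "circle", "rect", "ellipse", "line", "polyline",
                    "polygon", "g", "linearGradient", "radialGradient", "stop", "defs"}
ALLOWED_ATTRIBUTES = {"viewBox", "width", "height", "fill", "stroke", "stroke-width", "d",
                      "cx", "cy", "r", "x", "y", "rx", "ry", "x1", "y1", "x2", "y2",
                      "points", "transform", "opacity"}


def _local(name: str) -> str:
    """Drop an XML namespace prefix such as '{http://www.w3.org/2000/svg}'."""
    return name.rsplit("}", 1)[-1]


def passes_allow_lists(svg: str) -> bool:
    """Hypothetical strict check: well-formed XML using only allowed elements/attributes."""
    try:
        root = ET.fromstring(svg)
    except ET.ParseError:
        return False
    for elem in root.iter():
        if _local(elem.tag) not in ALLOWED_ELEMENTS:
            return False
        if any(_local(attr) not in ALLOWED_ATTRIBUTES for attr in elem.attrib):
            return False
    return True


def select_svgs(description: str, candidates: List[str],
                score: Callable[[str, str], float], threshold: float = 0.5) -> List[str]:
    """Keep candidates that pass the check and score strictly above the threshold."""
    return [svg for svg in candidates
            if passes_allow_lists(svg) and score(description, svg) > threshold]


good = ('<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">'
        '<rect x="0" y="0" width="10" height="10" fill="purple"/></svg>')
bad = '<svg xmlns="http://www.w3.org/2000/svg"><text x="1" y="5">hi</text></svg>'


def fake_score(description: str, svg: str) -> float:
    return 0.6  # placeholder constant, NOT a SigLIP similarity


print(select_svgs("a purple square", [good, bad], fake_score) == [good])
```

Running the cheap structural check before the expensive model-based score avoids scoring candidates that would be discarded anyway.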

TREC 2022 Deep Learning test collection
data.nist.gov
catalog.data.gov
Updated Mar 1, 2023
Cite
Ian Soboroff (2023). TREC 2022 Deep Learning test collection [Dataset]. http://doi.org/10.18434/mds2-2974
Unique identifier
https://doi.org/10.18434/mds2-2974, https://identifiers.org/ark:/88434/mds2-2974
Dataset updated
Mar 1, 2023
Dataset provided by
National Institute of Standards and Technology (http://www.nist.gov/)
Authors
Ian Soboroff
License
https://www.nist.gov/open/license
Description
This is a test collection for passage and document retrieval, produced in the TREC 2022 Deep Learning track. The Deep Learning Track studies information retrieval in a large training data regime, where the number of training queries with at least one positive label is at least in the tens of thousands, if not hundreds of thousands or more. This corresponds to real-world scenarios such as training based on click logs and training based on labels from shallow pools (such as the pooling in the TREC Million Query Track or the evaluation of search engines based on early precision). Certain machine learning methods, such as those based on deep learning, are known to require very large datasets for training. The lack of such large-scale datasets has limited the development of these methods for common information retrieval tasks, such as document ranking. The Deep Learning Track organized in previous years aimed to provide large-scale datasets to TREC and to create a focused research effort with a rigorous blind evaluation of rankers for the passage ranking and document ranking tasks. As in previous years, one of the main goals of the track in 2022 is to study which methods work best when a large amount of training data is available. For example, do the same methods that work on small data also work on large data? How much do methods improve when given more training data? What external data and models can be brought to bear in this scenario, and how useful is it to combine full supervision with other forms of supervision? The collection contains 12 million web pages, 138 million passages from those web pages, search queries, and relevance judgments for the queries.
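Relevance judgments for TREC collections are conventionally distributed as "qrels" files with whitespace-separated `query-id iteration doc-id relevance` lines. Assuming this collection follows that convention (check the actual files), a minimal parser that groups graded judgments by query might look like this; the sample lines, IDs, and the `min_grade` cutoff for counting a judgment as relevant are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable


def parse_qrels(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Map query id -> {document/passage id: relevance grade} from qrels-style lines."""
    qrels: Dict[str, Dict[str, int]] = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 4:
            raise ValueError(f"expected 4 fields, got {len(parts)}: {line!r}")
        qid, _iteration, docid, grade = parts
        qrels[qid][docid] = int(grade)
    return dict(qrels)


def count_relevant(qrels: Dict[str, Dict[str, int]], min_grade: int = 1) -> Dict[str, int]:
    """Count judged items per query whose grade is at least min_grade."""
    return {qid: sum(g >= min_grade for g in docs.values()) for qid, docs in qrels.items()}


sample = ["q1 0 p_001 3", "q1 0 p_002 0", "q2 0 p_010 2", ""]
judgments = parse_qrels(sample)
print(count_relevant(judgments))
```

Because judgments are graded, the cutoff matters: metrics that need binary relevance depend on which grade is treated as the threshold.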
Data from: Leveraging Supervised Machine Learning Algorithms for System...
acs.figshare.com
zip
Updated Sep 3, 2024
Cite
Russell R. Kibbe; Alexandria L. Sohn; David C. Muddiman (2024). Leveraging Supervised Machine Learning Algorithms for System Suitability Testing of Mass Spectrometry Imaging Platforms [Dataset]. http://doi.org/10.1021/acs.jproteome.4c00360.s001
Available download formats: zip
Unique identifier
https://doi.org/10.1021/acs.jproteome.4c00360.s001
Dataset updated
Sep 3, 2024
Dataset provided by
ACS Publications
Authors
Russell R. Kibbe; Alexandria L. Sohn; David C. Muddiman
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Quality control and system suitability testing are vital protocols implemented to ensure the repeatability and reproducibility of data in mass spectrometry investigations. However, mass spectrometry imaging (MSI) analyses present added complexity since both chemical and spatial information are measured. Herein, we employ various machine learning algorithms and a novel quality control mixture to classify the working conditions of an MSI platform. Each algorithm was evaluated in terms of its performance on unseen data, validated with negative control data sets to rule out confounding variables or chance agreement, and used to determine the sample size necessary to achieve a high level of classification accuracy. In this work, a robust machine learning workflow was established where models could accurately classify the instrument condition as clean or compromised based on data metrics extracted from the analyzed quality control sample. This work highlights the power of machine learning to recognize complex patterns in MSI data and use those relationships to perform a system suitability test for MSI platforms.
FAIR Dataset for Disease Prediction in Healthcare Applications
test.researchdata.tuwien.ac.at
bin, csv, json, png
Updated Apr 14, 2025
Cite
Sufyan Yousaf (2025). FAIR Dataset for Disease Prediction in Healthcare Applications [Dataset]. http://doi.org/10.70124/5n77a-dnf02
Available download formats: csv, json, bin, png
Unique identifier
https://doi.org/10.70124/5n77a-dnf02
Dataset updated
Apr 14, 2025
Dataset provided by
TU Wien
Authors
Sufyan Yousaf
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Description

Context and Methodology

Research Domain/Project:
This dataset was created for a machine learning experiment aimed at developing a classification model to predict outcomes based on a set of features. The primary research domain is disease prediction in patients. The dataset was used in the context of training, validating, and testing.

Purpose of the Dataset:
The purpose of this dataset is to provide training, validation, and testing data for the development of machine learning models. It includes labeled examples that help train classifiers to recognize patterns in the data and make predictions.

Dataset Creation:
Data preprocessing steps involved cleaning, normalization, and splitting the data into training, validation, and test sets. The data was carefully curated to ensure its quality and relevance to the problem at hand. Missing values and outliers were handled with appropriate techniques (e.g., imputation or removal).

Technical Details

Structure of the Dataset:
The dataset consists of several files organized into folders by data type:

Training Data: Contains the training dataset used to train the machine learning model.

Validation Data: Used for hyperparameter tuning and model selection.

Test Data: Reserved for final model evaluation.

Each folder contains files with consistent naming conventions for easy navigation, such as train_data.csv, validation_data.csv, and test_data.csv. Each file follows a tabular format with columns representing features and rows representing individual data points.
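A minimal sketch of loading one split with Python's standard csv module, assuming each file has a header row. The label column name `label` is hypothetical, since the description does not name the target column, and the exact paths depend on where the folders are downloaded; pandas.read_csv would work equally well:

```python
import csv
from typing import Dict, List, Tuple


def load_split(path: str, label_column: str = "label") -> Tuple[List[Dict[str, str]], List[str]]:
    """Read one split file (e.g. train_data.csv) into (feature rows, labels).

    Assumes a header row; label_column is a hypothetical name. Values are
    returned as strings, so convert numeric features before modeling.
    """
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames is None or label_column not in reader.fieldnames:
            raise KeyError(f"{path} has no column named {label_column!r}")
        features: List[Dict[str, str]] = []
        labels: List[str] = []
        for row in reader:
            labels.append(row.pop(label_column))  # separate target from features
            features.append(row)
    return features, labels


# Typical use, keeping the roles of the three splits distinct:
# X_train, y_train = load_split("train_data.csv")        # fit models
# X_val, y_val = load_split("validation_data.csv")       # tune hyperparameters, select a model
# X_test, y_test = load_split("test_data.csv")           # final evaluation only
```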

Software Requirements:
To open and work with this dataset, you can use VS Code or Jupyter together with tools such as:

Python (with libraries such as pandas, numpy, scikit-learn, matplotlib, etc.)

Further Details

Reusability:
Users of this dataset should be aware that it is designed for machine learning experiments involving classification tasks. The dataset is already split into training, validation, and test subsets. Any model trained with this dataset should be evaluated using the test set to ensure proper validation.

Limitations:
The dataset may not cover all edge cases, and it might have biases depending on the selection of data sources. It's important to consider these limitations when generalizing model results to real-world applications.
DCASE 2024 Challenge Task 2 Additional Training Dataset
data.niaid.nih.gov
Updated May 15, 2024
Cite
Harsh, Purohit (2024). DCASE 2024 Challenge Task 2 Additional Training Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11183283
Dataset updated
May 15, 2024
Dataset provided by
Noboru, Harada
Sannino, Roberto
Yohei, Kawaguchi
Kota, Dohi
Albertini, Davide
Takashi, Endo
Daisuke, Niizumi
Tomoya, Nishida
Keisuke, Imoto
Pradolini, Simone
Harsh, Purohit
Augusti, Filippo
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description

This dataset is the "additional training dataset" for the DCASE 2024 Challenge Task 2.

The data consists of the normal/anomalous operating sounds of nine types of real/toy machines. Each recording is a single-channel audio that includes both a machine's operating sound and environmental noise. The duration of recordings varies from 6 to 10 seconds. The following nine types of real/toy machines are used in this task:

3DPrinter

AirCompressor

BrushlessMotor

HairDryer

HoveringDrone

RoboticArm

Scanner

ToothBrush

ToyCircuit

Overview of the task

Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.

This task is a follow-up to DCASE 2020 Task 2 through DCASE 2023 Task 2. This year's task is to develop an ASD system that meets the following five requirements.

1. Train a model using only normal sound (unsupervised learning scenario). Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data. This is the same requirement as in the previous tasks.

2. Detect anomalies regardless of domain shifts (domain generalization task). In real-world cases, the operational states of a machine or the environmental noise can change, causing domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard to notice. In this task, the system is required to use domain-generalization techniques to handle these domain shifts. This requirement is the same as in DCASE 2022 Task 2 and DCASE 2023 Task 2.

3. Train a model for a completely new machine type. For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should be able to train models without additional hyperparameter tuning. This requirement is the same as in DCASE 2023 Task 2.

4. Train a model using a limited number of machines from its machine type. While sounds from multiple machines of the same machine type can be used to enhance detection performance, often only a limited number of machines are available for a machine type. In such a case, the system should be able to train models using a few machines from a machine type. This requirement is the same as in DCASE 2023 Task 2.

5. Train a model both with and without attribute information. While additional attribute information can help enhance detection performance, such information cannot always be obtained. Therefore, the system must work well both when attribute information is available and when it is not.

The last requirement is newly introduced in DCASE 2024 Task 2.

Definition

We first define key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."

"Machine type" indicates the type of machine, which in the additional training dataset is one of nine: 3D printer, air compressor, brushless motor, hair dryer, hovering drone, robotic arm, document scanner (scanner), toothbrush, and toy circuit.

A section is defined as a subset of the dataset for calculating performance metrics.

The source domain is the domain under which most of the training data and some of the test data were recorded, and the target domain is a different set of domains under which some of the training data and some of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, signal-to-noise ratio, etc.

Attributes are parameters that define states of machines or types of noise. For several machine types, the attributes are hidden.

Dataset

This dataset consists of nine machine types. For each machine type, one section is provided, and the section is a complete set of training data. A set of test data corresponding to this training data will be provided on a separate Zenodo page as the "evaluation dataset" for DCASE 2024 Challenge Task 2. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training and (ii) ten clips of normal sounds in the target domain for training. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.

File names and attribute csv files

File names and attribute csv files provide reference labels for each clip. The reference labels given for each training clip include machine type, section index, normal/anomaly information, and attributes regarding conditions other than normal/anomaly. The machine type is given by the directory name. The section index is given by the file name. For datasets other than the evaluation dataset, the normal/anomaly information and the attributes are also given by the file names. Note that for machine types whose attribute information is hidden, the attribute information in each file name is labeled only as "noAttributes". Attribute csv files provide easy access to the attributes that cause domain shifts. These files list the file names, the names of parameters that cause domain shifts (domain shift parameter, dp), and the value or type of these parameters (domain shift value, dv). Each row takes the following format:

[filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...

For machine types that have their attribute information hidden, all columns except the filename column are left blank for each row.
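The naming and csv conventions above can be read back with a small parser. This is a sketch under stated assumptions: the field order follows the example names in the directory listing (`section_00_source_train_normal_0001_.wav`), the attribute suffix of a file name is returned unparsed because its internal format is not specified here, and rows whose first field does not end in `.wav` (such as a header, if the files have one) are skipped. The helper names and the `speed`/`4` attribute values in the example are hypothetical:

```python
import csv
import io
from pathlib import PurePosixPath
from typing import Dict


def parse_clip_name(path: str) -> Dict[str, str]:
    """Split 'section_00_source_train_normal_0001_<attrs>.wav' into labelled fields."""
    p = PurePosixPath(path)
    parts = p.stem.split("_")
    if len(parts) < 6 or parts[0] != "section":
        raise ValueError(f"unexpected clip name: {path}")
    suffix = "_".join(parts[6:])  # attribute part, left unparsed
    return {
        "machine_type": p.parent.parent.name,  # e.g. '3DPrinter' for '3DPrinter/train/...'
        "section": parts[1],
        "domain": parts[2],     # source or target
        "split": parts[3],      # e.g. train
        "condition": parts[4],  # normal/anomaly label
        "clip": parts[5],
        "attributes": "" if suffix == "noAttributes" else suffix,
    }


def parse_attribute_csv(text: str) -> Dict[str, Dict[str, str]]:
    """Map file name -> {domain shift parameter: value}; values are kept as strings."""
    table: Dict[str, Dict[str, str]] = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip().endswith(".wav"):
            continue  # blank line or header row
        cells = [c.strip() for c in row[1:]]
        pairs: Dict[str, str] = {}
        for i in range(0, len(cells) - 1, 2):  # (d1p, d1v), (d2p, d2v), ...
            if cells[i]:  # hidden-attribute rows leave these cells blank
                pairs[cells[i]] = cells[i + 1]
        table[row[0].strip()] = pairs
    return table


info = parse_clip_name("3DPrinter/train/section_00_target_train_normal_0003_.wav")
sample_csv = (
    "file_name,d1p,d1v\n"
    "section_00_source_train_normal_0001_.wav,speed,4\n"
    "section_00_source_train_normal_0002_.wav,,\n"
)
attributes = parse_attribute_csv(sample_csv)
print(info["domain"], attributes)
```

The domain field is what separates the 990 source-domain clips from the 10 target-domain clips of each section, which matters for the domain generalization requirement.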

Recording procedure

Normal/anomalous operating sounds of machines and their related equipment were recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings from a fixed microphone. We mixed target machine sounds with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset explaining the details of the recording procedure by the submission deadline.

Directory structure

/eval_data
    /raw
        /3DPrinter
            /train (only normal clips)
                /section_00_source_train_normal_0001_.wav
                ...
                /section_00_source_train_normal_0990_.wav
                /section_00_target_train_normal_0001_.wav
                ...
                /section_00_target_train_normal_0010_.wav
            attributes_00.csv (attribute csv for section 00)
        /AirCompressor (The other machine types have the same directory structure as 3DPrinter.)
        /BrushlessMotor
        /HairDryer
        /HoveringDrone
        /RoboticArm
        /Scanner
        /ToothBrush
        /ToyCircuit

Baseline system

The baseline system is available on the GitHub repository. The baseline systems provide a simple entry-level approach that gives reasonable performance on the Task 2 dataset. They are good starting points, especially for entry-level researchers who want to become familiar with the anomalous-sound-detection task.

Condition of use

This dataset was created jointly by Hitachi, Ltd., NTT Corporation and STMicroelectronics and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Citation

Contact

If there is any problem, please contact us:

Tomoya Nishida, tomoya.nishida.ax@hitachi.com

Keisuke Imoto, keisuke.imoto@ieee.org

Noboru Harada, noboru@ieee.org

Daisuke Niizumi, daisuke.niizumi.dt@hco.ntt.co.jp

Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com
Train, validation, test data sets and confusion matrices underlying...
data.4tu.nl
zip
Updated Sep 7, 2023
Louis Kuijpers; Nynke Dekker; Belen Solano Hermosilla; Edo van Veen (2023). Train, validation, test data sets and confusion matrices underlying publication: "Automated cell counting for Trypan blue stained cell cultures using machine learning" [Dataset]. http://doi.org/10.4121/21695819.v1
Available download formats: zip
Unique identifier
https://doi.org/10.4121/21695819.v1
Dataset updated
Sep 7, 2023
Dataset provided by
4TU.ResearchData
Authors
Louis Kuijpers; Nynke Dekker; Belen Solano Hermosilla; Edo van Veen
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Annotated test and train data sets. Both images and annotations are provided separately.

Validation data set for Hi5, Sf9 and HEK cells.

Confusion matrices for the determination of performance parameters
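The description does not name the performance parameters derived from the confusion matrices. The sketch below assumes the common ones (accuracy plus per-class precision, recall, and F1) and assumes that rows are true classes and columns are predicted classes; `metrics_from_confusion` is a hypothetical helper, not the authors' code.

```python
from typing import Dict, List, Sequence


def metrics_from_confusion(matrix: Sequence[Sequence[int]]) -> Dict[str, object]:
    """Compute accuracy and per-class precision/recall/F1 (rows = true, columns = predicted)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precision: List[float] = []
    recall: List[float] = []
    f1: List[float] = []
    for c in range(n):
        tp = matrix[c][c]
        predicted = sum(matrix[row][c] for row in range(n))  # column sum
        actual = sum(matrix[c])  # row sum
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        precision.append(prec)
        recall.append(rec)
        f1.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return {"accuracy": correct / total, "precision": precision, "recall": recall, "f1": f1}
```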
Dollar street 10 - 64x64x3
zenodo.org
data.niaid.nih.gov
bin
Updated May 6, 2025
Sven van der burg (2025). Dollar street 10 - 64x64x3 [Dataset]. http://doi.org/10.5281/zenodo.10970014
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.10970014
Dataset updated
May 6, 2025
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sven van der burg
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The MLCommons Dollar Street Dataset is a collection of images of everyday household items from homes around the world that visually captures socioeconomic diversity of traditionally underrepresented populations. It consists of public domain data, licensed for academic, commercial and non-commercial usage, under CC-BY and CC-BY-SA 4.0. The dataset was developed because similar datasets lack socioeconomic metadata and are not representative of global diversity.

This is a subset of the original dataset that can be used for multiclass classification with 10 categories. It is designed to be used in teaching, similar to the widely used, but unlicensed CIFAR-10 dataset.

These are the preprocessing steps that were performed:

Only take examples with one imagenet_synonym label

Use only examples with the 10 most frequently occurring labels

Downscale images to 64 x 64 pixels

Split data into train and test sets

Store as NumPy arrays
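The preprocessing steps above can be sketched with NumPy. This is not the notebook's code: `build_subset` and `nearest_resize` are hypothetical helpers, the input format (a list of image arrays paired with their imagenet_synonym labels) is assumed, nearest-neighbour resizing stands in for whatever resampling the notebook used, and the test fraction is a made-up default. Sorting the retained category names alphabetically before assigning indices reproduces the ordering seen in the label mapping table.

```python
from collections import Counter
from typing import List, Sequence, Tuple

import numpy as np


def nearest_resize(img: np.ndarray, size: int = 64) -> np.ndarray:
    """Nearest-neighbour downscale of an H x W x C image to size x size x C."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]


def build_subset(
    examples: Sequence[Tuple[np.ndarray, List[str]]],
    n_classes: int = 10,
    size: int = 64,
    test_fraction: float = 0.2,
    seed: int = 0,
):
    """Return ((x_train, y_train), (x_test, y_test)) as NumPy arrays."""
    # Keep only examples that carry exactly one imagenet_synonym label.
    single = [(img, labels[0]) for img, labels in examples if len(labels) == 1]
    # Keep the n_classes most frequent labels and index them alphabetically.
    counts = Counter(label for _, label in single)
    top = sorted(label for label, _ in counts.most_common(n_classes))
    index = {label: i for i, label in enumerate(top)}
    kept = [(img, index[label]) for img, label in single if label in index]
    # Downscale every image and stack everything into arrays.
    x = np.stack([nearest_resize(img, size) for img, _ in kept]).astype(np.uint8)
    y = np.array([label for _, label in kept], dtype=np.int64)
    # Shuffle reproducibly and split into train and test.
    order = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(len(y) * test_fraction))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (x[train_idx], y[train_idx]), (x[test_idx], y[test_idx])
```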

This is the label mapping:

Category          Label
day bed           0
dishrag           1
plate             2
running shoe      3
soap dispenser    4
street sign       5
table lamp        6
tile roof         7
toilet seat       8
washing machine   9
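To turn integer predictions back into names, the mapping can be held as a tuple indexed by label. `CATEGORIES`, `LABEL_OF`, and `decode` are illustrative names, not part of the dataset. The labels happen to follow the alphabetical order of the category names, which the test below checks.

```python
from typing import List, Sequence

# Index i holds the category name for integer label i (copied from the mapping above).
CATEGORIES = (
    "day bed", "dishrag", "plate", "running shoe", "soap dispenser",
    "street sign", "table lamp", "tile roof", "toilet seat", "washing machine",
)
LABEL_OF = {name: label for label, name in enumerate(CATEGORIES)}


def decode(labels: Sequence[int]) -> List[str]:
    """Turn integer class predictions back into category names."""
    return [CATEGORIES[i] for i in labels]
```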

Check out this notebook (https://github.com/carpentries-lab/deep-learning-intro/blob/main/instructors/prepare-dollar-street-data.ipynb) to see how the subset was created.

The original dataset was downloaded from https://www.kaggle.com/datasets/mlcommons/the-dollar-street-dataset. See https://mlcommons.org/datasets/dollar-street/ for more information.
CARLA Simulation Datasets for Training, Validation, and Test Data of the...
data.niaid.nih.gov
Updated Jan 15, 2024
Shaikh, Hamdaan Asif (2024). CARLA Simulation Datasets for Training, Validation, and Test Data of the project "Out-Of-Domain Data Detection using Uncertainty Quantification in End-to-End Driving Algorithms" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10511420
Dataset updated
Jan 15, 2024
Dataset authored and provided by
Shaikh, Hamdaan Asif
Description
These are CARLA Simulation Datasets of the project "Out-Of-Domain Data Detection using Uncertainty Quantification in End-to-End Driving Algorithms". The simulations are generated in CARLA Town 02 for different sun angles (in degrees). You will find image frames, command labels, and steering control values in the respective 'xxxx_files_data' folder. You will find videos of each simulation run in the 'xxxx_files_visualizations' folder.

The 8 simulation runs for Training Data use sun angles of 90, 80, 70, 60, 50, 40, 30, and 20.

The 8 simulation runs for Training Data were seeded at 0000, 1000, 2000, 3000, 4000, 5000, 6000, and 7000, respectively.

The 4 simulation runs for Validation Data use sun angles of 87, 67, 47, and 23.

The 4 simulation runs for Validation Data were seeded at 0000, 2000, 4000, and 7000, respectively.

The 29 simulation runs for Testing Data use sun angles of 85, 75, 65, 55, 45, 35, 25, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 09, 08, 07, 06, 05, 04, 03, 02, 01, 00, -1, and -10.

The 29 simulation runs for Testing Data were all seeded at 5000.
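The run configuration listed above can be encoded as data, which makes it easy to check the split sizes and that no sun angle is shared between training and test runs. `Run`, `TRAIN`, `VALIDATION`, and `TEST` are hypothetical names; this is not the project's own file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    """One CARLA Town 02 simulation run (hypothetical record type)."""
    split: str
    sun_angle: int  # degrees
    seed: int


TRAIN: List[Run] = [
    Run("train", angle, seed)
    for angle, seed in zip([90, 80, 70, 60, 50, 40, 30, 20], range(0, 8000, 1000))
]
VALIDATION: List[Run] = [
    Run("validation", angle, seed)
    for angle, seed in zip([87, 67, 47, 23], [0, 2000, 4000, 7000])
]
# 85 down to 25 in steps of 10, then 19 down to 0, then -1 and -10; all use seed 5000.
TEST_ANGLES = list(range(85, 24, -10)) + list(range(19, -1, -1)) + [-1, -10]
TEST: List[Run] = [Run("test", angle, 5000) for angle in TEST_ANGLES]
```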
LOCBEEF: BEEF Quality Image dataset for Deep Learning Models
data.mendeley.com
Updated Nov 30, 2022
Tri Mulya Dharma (2022). LOCBEEF: BEEF Quality Image dataset for Deep Learning Models [Dataset]. http://doi.org/10.17632/7b67ynzr6k.1
Unique identifier
https://doi.org/10.17632/7b67ynzr6k.1
Dataset updated
Nov 30, 2022
Authors
Tri Mulya Dharma
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The LOCBEEF dataset contains 3268 images of local Aceh beef collected between 07:00 and 22:00; more information about the capture times is shown in the Figure. The dataset has two top-level directories, train and test, and each of them contains two subdirectories, fresh and rotten. The train directory contains 2228 images (1114 per subdirectory), and the test directory contains 980 images (490 per subdirectory). Images have resolutions of 176 x 144, 320 x 240, 640 x 480, 720 x 480, 720 x 720, 1280 x 720, 1920 x 1080, 2560 x 1920, 3120 x 3120, 3264 x 2248, and 4160 x 3120 pixels.

Each beef sample, about 10 x 10 cm, was placed on a plate and photographed with a smartphone that had an open camera application installed, so that images could be captured at different resolutions. The smartphone was mounted on a tripod 20 cm from the meat, and images were taken at a room temperature of 28 to 30 °C under lamp lighting.

Classification on the LOCBEEF dataset has been carried out with Convolutional Neural Networks, a deep learning method, using 70% of the images for training and 30% for testing. Images of all the dimensions listed above are included in the LOCBEEF dataset for use with ResNet-50.
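Given the directory layout described above (train and test, each containing fresh and rotten), a loader only needs to walk two levels. This sketch is illustrative: the helper name `list_split`, the fresh = 0 / rotten = 1 label order, and the accepted image extensions are assumptions not specified by the dataset.

```python
from pathlib import Path
from typing import List, Tuple

CLASSES = ("fresh", "rotten")  # assumed label order: fresh -> 0, rotten -> 1
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed file types


def list_split(root: Path, split: str) -> List[Tuple[Path, int]]:
    """Return (image path, class index) pairs for root/<split>/<class>/*."""
    samples: List[Tuple[Path, int]] = []
    for label, name in enumerate(CLASSES):
        class_dir = root / split / name
        for path in sorted(class_dir.iterdir()):
            # Skip anything that is not an image file, such as notes or subfolders.
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples
```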

Test Data Management Market Analysis, Size, and Forecast 2025-2029: North...

technavio.com


Technavio, Test Data Management Market Analysis, Size, and Forecast 2025-2029: North America (US and Canada), Europe (France, Germany, Italy, and UK), APAC (Australia, China, India, and Japan), and Rest of World (ROW) [Dataset]. https://www.technavio.com/report/test-data-management-market-industry-analysis


Dataset provided by

TechNavio

Authors

Technavio

Time period covered

2021 - 2025

Area covered

United States, Global

Description


Test Data Management Market Size 2025-2029

The test data management market size is forecast to increase by USD 727.3 million at a CAGR of 10.5% between 2024 and 2029.

The market is experiencing significant growth, driven by the increasing adoption of automation by enterprises and rising consumer spending on technological solutions. Automation in testing is becoming a priority for businesses seeking to improve efficiency and reduce costs. However, the lack of awareness and standardization in Test Data Management (TDM) presents challenges to market growth. Regulatory hurdles, such as data privacy laws and compliance requirements, also affect adoption. Furthermore, supply chain inconsistencies and the complexity of managing test data across multiple systems and applications can temper growth potential. To capitalize on market opportunities and navigate these challenges, companies must invest in education and training to increase awareness and standardization. The future of TDM lies in the integration of advanced technologies such as Artificial Intelligence and Machine Learning to enhance simulation capabilities.
Implementing robust data management strategies and leveraging advanced technologies, such as artificial intelligence and machine learning, can help address regulatory requirements and supply chain inconsistencies. By focusing on these areas, Test Data Management solution providers can help businesses streamline their testing processes, improve data quality, and ultimately drive innovation and growth.

What will be the Size of the Test Data Management Market during the forecast period?


The market is experiencing significant growth as businesses prioritize data accuracy, consistency, and security in their operations. Data lifecycle automation plays a crucial role in managing the various stages of data, from creation to retirement. Data quality metrics, such as test data accuracy and validity, are essential for ensuring data consistency and reliability. Data anonymization methods and masking techniques are used to protect sensitive data during testing, aligning with data security solutions and compliance requirements. Data analytics platforms facilitate data volume management and data visualization, enabling businesses to gain valuable insights from their data. Data modeling tools and data provisioning automation streamline the process of creating test environments, while cloud data storage offers scalability and flexibility. Technology advancements, such as component-based development, the Internet of Things (IoT), machine learning, and artificial intelligence, necessitate efficient simulation file management.
Synthetic data generation is an emerging trend, providing a cost-effective and secure alternative to using real data for testing. Data compliance solutions and data governance tools ensure adherence to regulations and best practices, while data migration and data modernization initiatives enable businesses to leverage new technologies and improve data integrity. Overall, the market is dynamic and evolving, with a focus on automation, security, and data-driven insights. New technologies like advanced data analytics, digital twin simulations, and maintenance efficiency contribute to business process streamlining.

How is this Test Data Management Industry segmented?

The test data management industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

Application

  On-premises
  Cloud-based


Component

  Solutions
  Services


End-user

  Information technology
  Telecom
  BFSI
  Healthcare and life sciences
  Others


Sector

  Large enterprise
  SMEs


Geography

  North America

    US
    Canada


  Europe

    France
    Germany
    Italy
    UK


  APAC

    Australia
    China
    India
    Japan


  Rest of World (ROW)

By Application Insights

The on-premises segment is estimated to witness significant growth during the forecast period. In the realm of Test Data Management, on-premises testing continues to hold significance. This approach entails establishing and managing testing infrastructure within an office or a physical data center. The advantage of on-premises testing lies in the control it offers over the testing process and infrastructure. Testers can configure hardware and software according to their needs without relying on external entities. Moreover, with on-premises testing, data privacy is ensured as the data remains within the organization's premises. Machine learning algorithms play a crucial role in test data generation, ensuring data quality and reducing manual efforts. Functional testing and integration testing are critical components of the Software Development Lifecycle (SDLC

Test Data Generation Tools Market Report | Global Forecast From 2025 To 2033...
dataintelo.com
csv, pdf, pptx
Updated Jan 7, 2025
Dataintelo (2025). Test Data Generation Tools Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/global-test-data-generation-tools-market
Available download formats: csv, pptx, pdf
Dataset updated
Jan 7, 2025
Dataset authored and provided by
Dataintelo
License
https://dataintelo.com/privacy-and-policy
Time period covered
2024 - 2032
Area covered
Global
Description
Test Data Generation Tools Market Outlook

The global market size for Test Data Generation Tools was valued at USD 800 million in 2023 and is projected to reach USD 2.2 billion by 2032, growing at a CAGR of 12.1% during the forecast period. The surge in the adoption of agile and DevOps practices, along with the increasing complexity of software applications, is driving the growth of this market.
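Growth figures like these follow the usual compound annual growth rate (CAGR) relation, end = start * (1 + r)^n. The sketch below, with hypothetical helper names `cagr` and `project`, backs out r from two values. Applied to USD 800 million (2023) and USD 2.2 billion (2032), which is nine annual periods, it gives roughly 11.9% per year, close to the stated 12.1%; the small gap may come from rounded endpoint values or a different base year, which the report does not specify.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` periods apart."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


def project(start: float, rate: float, years: float) -> float:
    """Value after compounding `start` at `rate` per period for `years` periods."""
    return start * (1.0 + rate) ** years


# Report figures in USD million: 800 in 2023, 2200 in 2032.
implied_rate = cagr(800, 2200, 2032 - 2023)
```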

One of the primary growth factors for the Test Data Generation Tools market is the increasing need for high-quality test data in software development. As businesses shift towards more agile and DevOps methodologies, the demand for automated and efficient test data generation solutions has surged. These tools help in reducing the time required for test data creation, thereby accelerating the overall software development lifecycle. Additionally, the rise in digital transformation across various industries has necessitated the need for robust testing frameworks, further propelling the market growth.

The proliferation of big data and the growing emphasis on data privacy and security are also significant contributors to market expansion. With the introduction of stringent regulations like GDPR and CCPA, organizations are compelled to ensure that their test data is compliant with these laws. Test Data Generation Tools that offer features like data masking and data subsetting are increasingly being adopted to address these compliance requirements. Furthermore, the increasing instances of data breaches have underscored the importance of using synthetic data for testing purposes, thereby driving the demand for these tools.

Another critical growth factor is the technological advancements in artificial intelligence and machine learning. These technologies have revolutionized the field of test data generation by enabling the creation of more realistic and comprehensive test data sets. Machine learning algorithms can analyze large datasets to generate synthetic data that closely mimics real-world data, thus enhancing the effectiveness of software testing. This aspect has made AI and ML-powered test data generation tools highly sought after in the market.

Regional outlook for the Test Data Generation Tools market shows promising growth across various regions. North America is expected to hold the largest market share due to the early adoption of advanced technologies and the presence of major software companies. Europe is also anticipated to witness significant growth owing to strict regulatory requirements and increased focus on data security. The Asia Pacific region is projected to grow at the highest CAGR, driven by rapid industrialization and the growing IT sector in countries like India and China.

Synthetic Data Generation has emerged as a pivotal component in the realm of test data generation tools. This process involves creating artificial data that closely resembles real-world data, without compromising on privacy or security. The ability to generate synthetic data is particularly beneficial in scenarios where access to real data is restricted due to privacy concerns or regulatory constraints. By leveraging synthetic data, organizations can perform comprehensive testing without the risk of exposing sensitive information. This not only ensures compliance with data protection regulations but also enhances the overall quality and reliability of software applications. As the demand for privacy-compliant testing solutions grows, synthetic data generation is becoming an indispensable tool in the software development lifecycle.

Component Analysis

The Test Data Generation Tools market is segmented into software and services. The software segment is expected to dominate the market throughout the forecast period. This dominance can be attributed to the increasing adoption of automated testing tools and the growing need for robust test data management solutions. Software tools offer a wide range of functionalities, including data profiling, data masking, and data subsetting, which are essential for effective software testing. The continuous advancements in software capabilities also contribute to the growth of this segment.

In contrast, the services segment, although smaller in market share, is expected to grow at a substantial rate. Services include consulting, implementation, and support services, which are crucial for the successful deployment and management of test data generation tools. The increasing complexity of IT inf
Training, test data and model parameters.
plos.figshare.com
figshare.com
xls
Updated Jun 3, 2023
Salvatore Cosentino; Mette Voldby Larsen; Frank Møller Aarestrup; Ole Lund (2023). Training, test data and model parameters. [Dataset]. http://doi.org/10.1371/journal.pone.0077302.t001
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0077302.t001
Dataset updated
Jun 3, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Salvatore Cosentino; Mette Voldby Larsen; Frank Møller Aarestrup; Ole Lund
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Training, test data and model parameters. The last 3 columns show the MinORG, LT, and HT parameters used to create the pathogenicity families and to build each of the 10 models. Zthr is a threshold value, calculated for each model during the cross-validation phase; the final prediction score is compared with Zthr to decide whether an input organism is predicted as pathogenic or non-pathogenic. The parameters for each model were chosen after 5-fold cross-validation tests.
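The description states that the final prediction score is compared with Zthr, but not in which direction, nor how the five folds were formed. The sketch below assumes that scores at or above Zthr mean "pathogenic" and that the folds are contiguous index blocks; `predict_pathogenic` and `k_fold_indices` are hypothetical names, not the authors' code.

```python
from typing import Iterator, List, Tuple


def predict_pathogenic(score: float, zthr: float) -> bool:
    """Assumption: a score at or above the model's threshold is called pathogenic."""
    return score >= zthr


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k contiguous folds of n examples."""
    if k < 2 or k > n:
        raise ValueError("k must be between 2 and n")
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size
```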
Data from: Evaluating the Use of Graph Neural Networks and Transfer Learning...
acs.figshare.com
bin
Updated Aug 15, 2024
Sherwin S. S. Ng; Yunpeng Lu (2024). Evaluating the Use of Graph Neural Networks and Transfer Learning for Oral Bioavailability Prediction [Dataset]. http://doi.org/10.1021/acs.jcim.3c00554.s002
Available download formats: bin
Unique identifier
https://doi.org/10.1021/acs.jcim.3c00554.s002
Dataset updated
Aug 15, 2024
Dataset provided by
ACS Publications
Authors
Sherwin S. S. Ng; Yunpeng Lu
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Description
Oral bioavailability is a pharmacokinetic property that plays an important role in drug discovery. Recently developed computational models involve the use of molecular descriptors, fingerprints, and conventional machine-learning models. However, determining the type of molecular descriptors requires domain expert knowledge and time for feature selection. With the emergence of the graph neural network (GNN), models can be trained to automatically extract features that they deem important. In this article, we exploited the automatic feature selection of GNN to predict oral bioavailability. To enhance the prediction performance of GNN, we utilized transfer learning by pre-training a model to predict solubility and obtained a final average accuracy of 0.797, an F1 score of 0.840, and an AUC-ROC of 0.867, which outperformed previous studies on predicting oral bioavailability with the same test data set.
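AUC-ROC, one of the reported metrics, equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. A minimal, quadratic-time sketch of that definition follows; `roc_auc` is a hypothetical helper, and the article's own evaluation code is not described here.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```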
Dataset for Cost-effective Simulation-based Test Selection in Self-driving...
zenodo.org
data.niaid.nih.gov
pdf, zip
Updated Jul 17, 2024
Christian Birchler; Nicolas Ganz; Sajad Khatiri; Alessio Gambi; Sebastiano Panichella (2024). Dataset for Cost-effective Simulation-based Test Selection in Self-driving Cars Software with SDC-Scissor [Dataset]. http://doi.org/10.5281/zenodo.5914130
Available download formats: zip, pdf
Unique identifier
https://doi.org/10.5281/zenodo.5914130
Dataset updated
Jul 17, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Christian Birchler; Nicolas Ganz; Sajad Khatiri; Alessio Gambi; Sebastiano Panichella
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SDC-Scissor tool for Cost-effective Simulation-based Test Selection in Self-driving Cars Software

This dataset provides test cases for self-driving cars with the BeamNG simulator. Check out the repository and demo video to get started.

GitHub: github.com/ChristianBirchler/sdc-scissor

This project extends the tool competition platform from the Cyber-Physical Systems Testing Competition, which was part of the SBST Workshop in 2021.

Usage

Demo

YouTube Link

Installation

The tool can either be run with Docker or locally using Poetry.

Running the simulations requires a working installation of BeamNG.research. The simulations cannot run in a Docker container; they must run locally.

To install the application use one of the following approaches:

Docker: docker build --tag sdc-scissor .

Poetry: poetry install

Using the Tool

The tool can be used with the following two commands:

Docker: docker run --volume "$(pwd)/results:/out" --rm sdc-scissor [COMMAND] [OPTIONS] (this will write all files written to /out to the local folder results)

Poetry: poetry run python sdc-scissor.py [COMMAND] [OPTIONS]

Several commands are available. To keep the documentation short, only each command and its options are described.

Generation of tests:

generate-tests --out-path /path/to/store/tests

Automated labeling of tests:

label-tests --road-scenarios /path/to/tests --result-folder /path/to/store/labeled/tests

Note: This only works locally with BeamNG.research installed

Model evaluation:

evaluate-models --dataset /path/to/train/set --save

Split train and test data:

split-train-test-data --scenarios /path/to/scenarios --train-dir /path/for/train/data --test-dir /path/for/test/data --train-ratio 0.8
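The `--train-ratio 0.8` option suggests that 80% of the scenarios go to the training directory. As an illustration only (this is not SDC-Scissor's implementation, and the shuffling and seeding behavior of the real command are assumptions), a ratio-based split of scenario file names could look like this; `split_by_ratio` and the `road_*.json` names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_by_ratio(
    items: Sequence[str], train_ratio: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle items reproducibly and cut them into (train, test) lists."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


scenarios = [f"road_{i:03d}.json" for i in range(10)]
train, test = split_by_ratio(scenarios, 0.8, seed=42)
```

Using a dedicated `random.Random` instance keeps the split reproducible without touching the global random state.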

Test outcome prediction:

predict-tests --scenarios /path/to/scenarios --classifier /path/to/model.joblib

Evaluation based on random strategy:

evaluate --scenarios /path/to/test/scenarios --classifier /path/to/model.joblib

The available parameters of each command are documented with --help.

Linting

The tool is checked with the linters flake8 and pylint. These are enabled automatically in Visual Studio Code and can be run manually with the following commands:

poetry run flake8 .
poetry run pylint **/*.py

License

The software we developed is distributed under the GNU GPL license. See the LICENSE.md file.

Contacts

Christian Birchler - Zurich University of Applied Science (ZHAW), Switzerland - birc@zhaw.ch

Nicolas Ganz - Zurich University of Applied Science (ZHAW), Switzerland - gann@zhaw.ch

Sajad Khatiri - Zurich University of Applied Science (ZHAW), Switzerland - mazr@zhaw.ch

Dr. Alessio Gambi - Passau University, Germany - alessio.gambi@uni-passau.de

Dr. Sebastiano Panichella - Zurich University of Applied Science (ZHAW), Switzerland - panc@zhaw.ch

References

Christian Birchler, Nicolas Ganz, Sajad Khatiri, Alessio Gambi, and Sebastiano Panichella. 2022. Cost-effective Simulation-based Test Selection in Self-driving Cars Software with SDC-Scissor. In 2022 IEEE 29th International Conference on Software Analysis, Evolution and Reengineering (SANER), IEEE.

If you use this tool in your research, please cite the following paper:

@INPROCEEDINGS{Birchler2022,
  author={Birchler, Christian and Ganz, Nicolas and Khatiri, Sajad and Gambi, Alessio and Panichella, Sebastiano},
  booktitle={2022 IEEE 29th International Conference on Software Analysis, Evolution and Reengineering (SANER)},
  title={Cost-effective Simulation-based Test Selection in Self-driving Cars Software with SDC-Scissor},
  year={2022},
}

