Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This brief literature survey groups the (numerical) validation methods and highlights the contradictions and confusion concerning bias, variance, and predictive performance. A multicriteria decision-making analysis was carried out using the sum of absolute ranking differences (SRD), illustrated with five case studies (seven examples). SRD was applied to compare external and cross-validation techniques and indicators of predictive performance, and to select optimal methods for determining the applicability domain (AD). The ordering of model validation methods was in accordance with the claims of the original authors, but those claims contradict one another, suggesting that any variant of cross-validation can be superior or inferior to other variants depending on the algorithm, the data structure, and the circumstances of application. A simple fivefold cross-validation proved superior to the Bayesian Information Criterion in the vast majority of situations. It is simply not sufficient to test a numerical validation method in one situation only, even if it is a well-defined one. SRD, as a preferable multicriteria decision-making algorithm, is suitable for tailoring the validation techniques and for the optimal determination of the applicability domain according to the dataset in question.
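As a rough illustration of the SRD idea (not the authors' implementation), the sketch below ranks each method's scores and a reference column, then sums the absolute rank differences; the toy score matrix and the row-wise-mean reference are invented for the example.

```python
import numpy as np
from scipy.stats import rankdata

def srd(values, reference):
    """Sum of absolute ranking differences between one method's column
    and the reference column (tie-safe average ranks)."""
    return float(np.abs(rankdata(values) - rankdata(reference)).sum())

# Toy example: three validation methods scored on five datasets; the
# reference is the row-wise mean, a common choice in SRD studies.
scores = np.array([
    [0.71, 0.68, 0.74],
    [0.65, 0.66, 0.70],
    [0.80, 0.77, 0.82],
    [0.59, 0.61, 0.60],
    [0.73, 0.70, 0.75],
])
reference = scores.mean(axis=1)
for name, column in zip(["5-fold CV", "LOO", "external"], scores.T):
    print(name, srd(column, reference))  # smaller SRD = closer to reference
```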
This data package contains information on Structured Product Labeling (SPL) Terminology for SPL validation procedures and information on performing SPL validations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary of Classifiers, Features, Validation Techniques and Sample Sizes used in this study.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit because they ignore the uncertainty in the testing sample. We develop a novel, statistically principled inference tool based on cross-validation that takes into account the uncertainty in the testing sample. This method outputs a set of highly competitive candidate models containing the optimal one with guaranteed probability. As a consequence, our method can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method can provide an alternative trade-off between prediction accuracy and model interpretability compared with existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
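A loosely related heuristic (not the authors' procedure) is to retain every tuning value whose cross-validated error falls within one standard error of the best, which also yields a set of competitive candidates rather than a single winner. The sketch below does this for a lasso, with synthetic data standing in for a real application.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

stats = []
for alpha in [0.01, 0.05, 0.1, 0.5, 1.0]:
    # Five-fold CV mean squared error and its standard error across folds.
    scores = -cross_val_score(Lasso(alpha=alpha), X, y,
                              scoring="neg_mean_squared_error", cv=5)
    stats.append((alpha, scores.mean(), scores.std(ddof=1) / np.sqrt(len(scores))))

best_alpha, best_mse, best_se = min(stats, key=lambda s: s[1])
# Keep every alpha whose CV error lies within one standard error of the best:
candidates = [a for a, mse, _ in stats if mse <= best_mse + best_se]
print("candidate set:", candidates)
```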
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This package contains the organized Python functions for the methods proposed in Yanwen Wang's PhD research. Researchers can use these functions directly to conduct spatial+ cross-validation, the dissimilarity quantification method, and dissimilarity-adaptive cross-validation.
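Since the package's own function names are not listed here, the sketch below illustrates the general spatial blocking idea with scikit-learn's GroupKFold: grid cells act as cross-validation groups so training and test folds are spatially disjoint. The grid size and synthetic data are assumptions for the example, not the package's API.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(300, 2))   # synthetic point locations
X = rng.normal(size=(300, 5))
y = X[:, 0] + 0.1 * coords[:, 0] + rng.normal(size=300)

# Assign each point to a 25x25 grid cell; cells act as CV groups so that
# training and test folds never share a spatial block.
blocks = (coords[:, 0] // 25).astype(int) * 4 + (coords[:, 1] // 25).astype(int)

gkf = GroupKFold(n_splits=5)
for fold, (tr, te) in enumerate(gkf.split(X, y, groups=blocks)):
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[tr], y[tr])
    rmse = np.sqrt(np.mean((model.predict(X[te]) - y[te]) ** 2))
    print(f"fold {fold}: RMSE = {rmse:.2f}")
```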
One of NASA's key mission requirements is robust state estimation. Sensing, using a wide range of sensors and sensor fusion approaches, plays a central role in robust state estimation, and there is a need to diagnose sensor failure as well as component failure. Sensor validation techniques address this problem: given a vector of sensor readings, decide whether sensors have failed and are therefore producing bad data. In this paper we take a probabilistic approach, using Bayesian networks, to diagnosis and sensor validation, and investigate several relevant but slightly different Bayesian network queries. We emphasize that on-board inference can be performed on a compiled model, giving fast and predictable execution times. Our results are illustrated using an electrical power system, and we show that a Bayesian network with over 400 nodes can be compiled into an arithmetic circuit that can correctly answer queries in less than 500 microseconds on average. Reference: O. J. Mengshoel, A. Darwiche, and S. Uckun, "Sensor Validation using Bayesian Networks." In Proc. of the 9th International Symposium on Artificial Intelligence, Robotics, and Automation in Space (iSAIRAS-08), Los Angeles, CA, 2008. BibTex Reference: @inproceedings{mengshoel08sensor, author = {Mengshoel, O. J. and Darwiche, A. and Uckun, S.}, title = {Sensor Validation using {Bayesian} Networks}, booktitle = {Proceedings of the 9th International Symposium on Artificial Intelligence, Robotics, and Automation in Space (iSAIRAS-08)}, year = {2008} }
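As a minimal illustration of the kind of query involved (far simpler than the 400-node network in the paper), the sketch below computes the posterior probability that a sensor is healthy given whether its reading agrees with a redundant estimate; all probabilities are invented for the example.

```python
# Tiny two-node sensor model: H = sensor healthy, R = reading agrees with
# a redundant estimate. Posterior P(H | R) by direct enumeration.
P_H = {True: 0.99, False: 0.01}              # prior on sensor health
P_R_given_H = {True: 0.95, False: 0.20}      # P(reading consistent | health)

def posterior_healthy(reading_consistent: bool) -> float:
    like = {h: (P_R_given_H[h] if reading_consistent else 1 - P_R_given_H[h])
            for h in (True, False)}
    joint = {h: P_H[h] * like[h] for h in (True, False)}
    return joint[True] / sum(joint.values())

print(posterior_healthy(True))   # ~0.998: a consistent reading supports health
print(posterior_healthy(False))  # ~0.86: inconsistency raises failure suspicion
```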
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains supplementary materials, including R scripts, data files, figures, and documentation for the agent-based model validation framework presented in the article. README.md includes a detailed description.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data relating to the thesis: Constraining Microbial Community Composition, Structure, and Function Within Acidic Geothermal Springs on São Miguel Island
This is a set of proteins used to validate the hidden Markov models (HMMs) constructed in the thesis.
These proteins are known to be actively involved in carbon monoxide oxidation.
No description was included in this dataset collected from the OSF.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Policy search methods provide a heuristic mapping between observations and decisions and have been widely used in reservoir control studies. However, recent studies have observed a tendency for policy search methods to overfit to the hydrologic data used in training, particularly the sequence of flood and drought events. This technical note develops an extension of bootstrap aggregation (bagging) and cross-validation techniques, inspired by the machine learning literature, to improve control policy performance on out-of-sample hydrology. We explore these methods in a case study of Folsom Reservoir, California, using control policies structured as binary trees and daily streamflow resampling based on the paleo-inflow record. Results show that calibration-validation strategies for policy selection and certain ensemble aggregation methods can improve out-of-sample tradeoffs between water supply and flood risk objectives over baseline performance at fixed computational cost. These results highlight the potential to improve policy search methodologies by leveraging well-established model training strategies from machine learning.
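A minimal sketch of the bagging idea applied to policy training follows; the toy threshold "policy", the objectives, and the synthetic inflows are stand-ins, not the binary-tree policies or Folsom objectives of the study.

```python
import numpy as np

rng = np.random.default_rng(7)
inflow = rng.gamma(shape=2.0, scale=50.0, size=3650)  # synthetic daily inflow

def train_policy(trace: np.ndarray) -> float:
    # Stand-in "policy search": pick the release threshold that best balances
    # a toy supply objective against a toy flood objective on this trace.
    thresholds = np.linspace(50, 300, 26)
    def cost(t):
        shortage = np.clip(t - trace, 0, None).mean()  # unmet-demand proxy
        spill = np.clip(trace - t, 0, None).mean()     # flood proxy
        return shortage + 2.0 * spill
    return thresholds[np.argmin([cost(t) for t in thresholds])]

# Bagging: train one policy per bootstrap resample of the inflow record,
# then aggregate (here, by averaging the learned thresholds).
policies = [train_policy(rng.choice(inflow, size=inflow.size, replace=True))
            for _ in range(20)]
print("bagged policy threshold:", np.mean(policies))
```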
https://www.datainsightsmarket.com/privacy-policy
The global thermal validation system market is experiencing robust growth, driven by increasing regulatory scrutiny across pharmaceutical, biotechnology, and food processing industries. Stringent quality control standards and the need for accurate temperature monitoring throughout the manufacturing and storage processes are key factors fueling market expansion. The market is segmented by system type (e.g., autoclaves, ovens, incubators), application (pharmaceutical, food & beverage, etc.), and end-user (contract research organizations, pharmaceutical manufacturers, etc.). Technological advancements, such as the integration of IoT sensors and cloud-based data analysis, are enhancing the capabilities of thermal validation systems, leading to improved efficiency and data management. Furthermore, the rising demand for sophisticated validation techniques to comply with international regulations like GMP and FDA guidelines is further bolstering market growth. We estimate the 2025 market size to be approximately $850 million, growing at a Compound Annual Growth Rate (CAGR) of 7% from 2025 to 2033. This growth reflects the increasing adoption of advanced technologies and the expanding regulatory landscape in key regions like North America and Europe.

Competition in the thermal validation system market is intense, with several established players and emerging companies vying for market share. Key players like Kaye, Ellab, and Thermo Fisher Scientific are leveraging their strong brand reputation and technological expertise to maintain market leadership. However, smaller, specialized firms are also gaining traction by offering niche solutions and innovative technologies. The market is expected to witness further consolidation in the coming years, with strategic acquisitions and partnerships playing a crucial role in shaping the competitive landscape. Geographic expansion, particularly in emerging markets in Asia-Pacific and Latin America, represents a significant growth opportunity for market participants. Restraints on growth include the high initial investment cost of implementing thermal validation systems and the need for skilled personnel to operate and maintain them.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The package contains files for two modules designed to improve the accuracy of the indoor positioning system (a fine-tuning sketch for the VGG16 model follows this list):

door detection
- videos_test: videos used to demonstrate the application of the door detector
- videos_res: videos from the videos_test directory with the detected doors marked

parts detection
- frames_train_val: images generated from videos, used for training and validation of the VGG16 neural network model
- frames_test: images generated from videos, used for testing the trained model
- videos_test: videos used to demonstrate the application of the parts detector
- videos_res: videos from the videos_test directory with the detected parts marked
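A minimal fine-tuning sketch for the parts-detection model, assuming the frames_train_val directory above contains one folder per class; the image size, epoch count, and simple 1/255 rescaling (in place of VGG16's canonical preprocessing) are simplifications, not details from the package.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Assumed layout: frames_train_val/<class_name>/<image files>
train_ds = tf.keras.utils.image_dataset_from_directory(
    "frames_train_val", validation_split=0.2, subset="training",
    seed=1, image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "frames_train_val", validation_split=0.2, subset="validation",
    seed=1, image_size=(224, 224), batch_size=32)

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained convolutional features frozen

model = models.Sequential([
    layers.Rescaling(1.0 / 255),  # simplified stand-in for VGG16 preprocessing
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(len(train_ds.class_names), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=3)
```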
https://spdx.org/licenses/CC0-1.0.html
The abundance of individuals in a population is a fundamental metric in basic and applied ecology, but sampling protocols yielding precise and unbiased estimates of abundance are often cost prohibitive. Proxies of abundance are therefore common, but require calibration and validation. There are many ways to calibrate a proxy, and it is not obvious which will perform best. We use data from eight populations of Chinook salmon (Oncorhynchus tshawytscha) on the Oregon coast where multiple proxies of abundance were obtained contemporaneously with independent mark-recapture estimates. We combined multiple proxy values associated with a single level of abundance into a univariate index and then calibrated that index to mark-recapture estimates using several different techniques. We tested our calibration methods using leave-one-out cross-validation and simulation. Our cross-validation analysis did not definitively identify a single best calibration technique for all populations, but we could identify consistently inferior approaches. The simulations suggested that incorporating the known mark-recapture uncertainty into the calibration technique added bias and imprecision. Cross-validation techniques should be used to test multiple methods of calibrating multiple proxies to an estimate of abundance. Critical uncertainties with the application of calibrated proxies still exist, and cost-benefit analysis should be performed to help identify optimal monitoring designs.
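The sketch below shows one generic calibration-and-validation loop of the kind described: a linear calibration of a univariate proxy index to mark-recapture estimates, scored by leave-one-out cross-validation. The numbers are invented, and the linear form is only one of many candidate techniques.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

# Hypothetical data: a univariate proxy index and independent
# mark-recapture abundance estimates for the same years.
index = np.array([1.2, 0.8, 2.5, 1.9, 3.1, 0.6, 2.2, 1.5]).reshape(-1, 1)
abundance = np.array([420, 300, 910, 700, 1150, 240, 820, 560])

errors = []
for tr, te in LeaveOneOut().split(index):
    # Refit the calibration with one observation held out, then predict it.
    fit = LinearRegression().fit(index[tr], abundance[tr])
    errors.append(abundance[te][0] - fit.predict(index[te])[0])
print("LOO RMSE:", np.sqrt(np.mean(np.square(errors))))
```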
The GPM Ground Validation NOAA CPC Morphing Technique (CMORPH) IFloodS dataset consists of global precipitation analyses data produced by the NOAA Climate Prediction Center (CPC). The Iowa Flood Studies (IFloodS) campaign was a ground measurement campaign that took place in eastern Iowa from May 1 to June 15, 2013. The goals of the campaign were to collect detailed measurements of precipitation at the Earth's surface using ground instruments and advanced weather radars and, simultaneously, collect data from satellites passing overhead. The CPC morphing technique uses precipitation estimates from low orbiter satellite microwave observations to produce global precipitation analyses at a high temporal and spatial resolution. Data have been selected for the Iowa Flood Studies (IFloodS) field campaign, covering April 1, 2013 to June 30, 2013. The dataset includes both the near real-time raw data and bias-corrected data from NOAA in binary and netCDF format.
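For users opening the netCDF files, a minimal inspection sketch follows; the file name is a placeholder and the data-variable selection is generic rather than a confirmed CMORPH variable name.

```python
import xarray as xr

# Placeholder file name; substitute an actual file from the dataset.
ds = xr.open_dataset("cmorph_ifloods_example.nc")
print(ds)                              # dimensions, coordinates, variables
precip = ds[list(ds.data_vars)[0]]     # grab the first data variable
print(float(precip.mean()))            # domain-average precipitation estimate
```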
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The construction of a robust healthcare information system is fundamental to enhancing countries' capabilities in the surveillance and control of hepatitis B virus (HBV). Making use of China's rapidly expanding primary healthcare system, this innovative approach using big data and machine learning (ML) could help towards the World Health Organization's (WHO) HBV infection elimination goals of reaching 90% diagnosis and treatment rates by 2030. We aimed to develop and validate HBV detection models using routine clinical data to improve the detection of HBV and support the development of effective interventions to mitigate the impact of this disease in China. Relevant data records extracted from the Hospital Information System of the Family Medicine Clinic of the University of Hong Kong-Shenzhen Hospital were structured using state-of-the-art natural language processing (NLP) techniques. Several ML models were used to develop HBV risk assessment models. The performance of the ML models was interpreted using Shapley values (SHAP) and validated using cohort data randomly divided at a ratio of 2:1 within a five-fold cross-validation framework. The patterns of physical complaints of patients with and without HBV infection were identified by processing 158,988 clinic attendance records. After removing cases without any clinical parameters from the derivation sample (n = 105,992), 27,392 cases were analysed using six modelling methods. A simplified model for HBV using patients' physical complaints and parameters was developed with good discrimination (AUC = 0.78) and calibration (goodness-of-fit test p-value > 0.05). Suspected case detection models for HBV, showing potential for clinical deployment, have been developed to improve HBV surveillance in the primary care setting in China, facilitating early identification and treatment of HBV and contributing towards the achievement of the WHO's HBV elimination goals.
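A minimal sketch of the validation step named above (five-fold cross-validated AUC), with synthetic data standing in for the clinic records and a logistic regression standing in for the six modelling methods:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic imbalanced binary task as a stand-in for HBV case detection.
X, y = make_classification(n_samples=2000, n_features=30, weights=[0.9, 0.1],
                           random_state=0)
auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                      scoring="roc_auc", cv=5)
print("five-fold AUC: %.2f +/- %.2f" % (auc.mean(), auc.std()))
```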
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset for the published paper:
G. D’Avanzo et al., "Theory and Experimental Validation of Two Techniques for Compensating VT Nonlinearities," in IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1-12, 2022, Art no. 9001312, doi: 10.1109/TIM.2022.3147883.
The GPM Ground Validation NOAA CPC Morphing Technique (CMORPH) IPHEx dataset consists of global precipitation analyses data produced by the NOAA Climate Prediction Center (CPC) during the Global Precipitation Mission (GPM) Integrated Precipitation and Hydrology Experiment (IPHEx) field campaign in North Carolina. The goal of IPHEx was to evaluate the accuracy of satellite precipitation measurements and use the collected data for hydrology models in the region. The CPC morphing technique uses precipitation estimates from low orbiter satellite microwave observations to produce global precipitation analyses at a high temporal and spatial resolution. CMORPH data has been selected from May 1, 2014 through June 14, 2014, during the IPHEx field campaign. These data files are available in raw binary and netCDF-4 file formats.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We aim to validate machine learning methods for the accurate identification of in vitro research and the subsequent assessment of risk-of-bias reporting in these papers. In doing so, we will address the research question: has the reporting of risk-of-bias measures relevant to in vitro experiments improved over time?
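A toy sketch of how such identification might be framed as text classification (not the protocol's actual pipeline), with invented titles and labels:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy titles; labels mark whether the paper reports in vitro work.
texts = ["cell culture assay of hepatocyte toxicity",
         "randomized controlled trial of statin therapy",
         "in vitro inhibition of kinase activity in HeLa cells",
         "cohort study of dietary patterns"]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["western blot of cultured fibroblasts"]))
```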
As fault diagnosis and prognosis systems in aerospace applications become more capable, the ability to utilize information supplied by them becomes increasingly important. While certain types of vehicle health data can be effectively processed and acted upon by crew or support personnel, others, due to their complexity or time constraints, require either automated or semi-automated reasoning. Prognostics-enabled Decision Making (PDM) is an emerging research area that aims to integrate prognostic health information and knowledge about the future operating conditions into the process of selecting subsequent actions for the system. The newly developed PDM algorithms require suitable software and hardware platforms for testing under realistic fault scenarios. The paper describes the development of such a platform, based on the K11 planetary rover prototype. A variety of injectable fault modes are being investigated for electrical, mechanical, and power subsystems of the testbed, along with methods for data collection and processing. In addition to the hardware platform, a software simulator with matching capabilities has been developed. The simulator allows for prototyping and initial validation of the algorithms prior to their deployment on the K11. The simulator is also available to the PDM algorithms to assist with the reasoning process. A reference set of diagnostic, prognostic, and decision making algorithms is also described, followed by an overview of the current test scenarios and the results of their execution on the simulator.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the PEN-Predictor-Keras-Model as well as the 100 validation data sets.
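A hypothetical loading sketch; the exact file names inside the package are not specified in this description, so both names below are placeholders.

```python
from tensorflow import keras

# Placeholder path: a Keras SavedModel directory or .h5 file from the package.
model = keras.models.load_model("PEN-Predictor-Keras-Model")
model.summary()
# x_val = np.load("validation_set_001.npy")   # placeholder validation file
# print(model.predict(x_val))
```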