Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The construction of a robust healthcare information system is fundamental to enhancing countries’ capabilities in the surveillance and control of hepatitis B virus (HBV). Making use of China’s rapidly expanding primary healthcare system, this innovative approach using big data and machine learning (ML) could help towards the World Health Organization’s (WHO) HBV infection elimination goals of reaching 90% diagnosis and treatment rates by 2030. We aimed to develop and validate HBV detection models using routine clinical data to improve the detection of HBV and support the development of effective interventions to mitigate the impact of this disease in China. Relevant data records extracted from the Hospital Information System of the Family Medicine Clinic of the University of Hong Kong-Shenzhen Hospital were structured using state-of-the-art natural language processing (NLP) techniques. Several ML methods were used to develop HBV risk assessment models. Model performance was then interpreted using Shapley values (SHAP) and validated on cohort data randomly divided at a ratio of 2:1 within a five-fold cross-validation framework. The patterns of physical complaints of patients with and without HBV infection were identified by processing 158,988 clinic attendance records. After removing cases without any clinical parameters from the derivation sample (n = 105,992), 27,392 cases were analysed using six modelling methods. A simplified HBV model based on patients’ physical complaints and clinical parameters was developed with good discrimination (AUC = 0.78) and calibration (goodness-of-fit test p-value > 0.05). Suspected case detection models for HBV, showing potential for clinical deployment, have been developed to improve HBV surveillance in the primary care setting in China. This study has developed a suspected case detection model for HBV that can facilitate early identification and treatment of HBV in the primary care setting in China, contributing towards the achievement of the WHO’s HBV elimination goals. We utilized state-of-the-art natural language processing techniques to structure the data records, leading to the development of a robust healthcare information system that enhances the surveillance and control of HBV in China.
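The validation scheme described above (a 2:1 random split combined with five-fold cross-validation, with SHAP used for interpretation) can be illustrated with a minimal Python sketch on synthetic data; the features, model family, and dataset below are placeholders, not the study's actual pipeline:

```python
# Minimal sketch of the validation scheme described above: a 2:1 random
# split, five-fold cross-validation on the derivation set, and SHAP-based
# interpretation. Synthetic data stands in for the study's clinical records.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=3000, n_features=12, random_state=0)

# 2:1 derivation/validation split (test_size = 1/3).
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0)

model = GradientBoostingClassifier(random_state=0)

# Five-fold cross-validated AUC on the derivation sample.
cv_auc = cross_val_score(model, X_dev, y_dev, cv=5, scoring="roc_auc")
print("five-fold CV AUC: %.3f +/- %.3f" % (cv_auc.mean(), cv_auc.std()))

# Fit on the derivation set and interpret with SHAP on the held-out set.
model.fit(X_dev, y_dev)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_val)
```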
This data package contains information on Structured Product Labeling (SPL) Terminology for SPL validation procedures and information on performing SPL validations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the PEN-Predictor-Keras-Model as well as the 100 validation data sets.
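A saved Keras model like this can typically be loaded and scored against the accompanying validation sets; the sketch below assumes hypothetical file names and array keys, since the actual layout of this package is not described here:

```python
# Hypothetical sketch: load a saved Keras model and score it on one of the
# bundled validation sets. File names and array keys are assumptions.
import numpy as np
from tensorflow import keras

model = keras.models.load_model("pen_predictor.h5")  # assumed file name

data = np.load("validation_000.npz")                 # assumed file name
X_val, y_val = data["X"], data["y"]                  # assumed array keys

predictions = model.predict(X_val)
print("predictions shape:", predictions.shape)
```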
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains supplementary materials, including R scripts, data files, figures, and documentation for the agent-based model validation framework presented in the article. README.md includes a detailed description.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This package contains the organized Python functions of the methods proposed in Yanwen Wang's PhD research. Researchers can directly use these functions to conduct spatial+ cross-validation, the dissimilarity quantification method, and dissimilarity-adaptive cross-validation.
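Spatial cross-validation methods of this kind typically hold out spatially contiguous blocks rather than random rows. The generic sketch below uses scikit-learn's GroupKFold with grid-cell blocks to illustrate the idea; it is not this package's API, and all names and data are placeholders:

```python
# Generic spatial block cross-validation sketch (not this package's API):
# points are grouped into grid cells, and whole cells are held out together
# so that test points are spatially separated from training points.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(500, 2))   # synthetic x/y locations
X = rng.normal(size=(500, 5))                 # synthetic covariates
y = X[:, 0] + 0.01 * coords[:, 0] + rng.normal(size=500)

# Assign each point to a 20x20 grid cell; cells act as CV groups.
blocks = (coords[:, 0] // 20) * 5 + (coords[:, 1] // 20)

for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=blocks):
    model = RandomForestRegressor(random_state=0).fit(X[train_idx], y[train_idx])
    mse = mean_squared_error(y[test_idx], model.predict(X[test_idx]))
    print("held-out block MSE: %.3f" % mse)
```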
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data for validation of a method for detecting PMP-glucose by HPLC.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Copied directly from the abstract of the PSRM publication: As the use of computational text analysis in the social sciences has increased, topic modeling has emerged as a popular method for identifying latent themes in textual data. Nevertheless, concerns have been raised regarding the validity of the results produced by this method, given that it is largely automated and inductive in nature, and the lack of clear guidelines for validating topic models has been identified by scholars as an area of concern. In response, we conducted a comprehensive systematic review of 789 studies that employ topic modeling. Our goal is to investigate whether the field is moving towards a common framework for validating these models. The findings of our review indicate a notable absence of standardized validation practices and a lack of convergence towards specific methods of validation. This gap may be attributed to the inherent incompatibility between the inductive, qualitative approach of topic modeling and the deductive, quantitative tradition that favors standardized validation. To address this, we advocate for incorporating qualitative validation approaches when using topic modeling, emphasizing transparency and detailed reporting to improve the credibility of findings in computational social science research.
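For readers unfamiliar with the method under review, the sketch below fits a small topic model and reports two common validation signals: perplexity (quantitative, computed in-sample here for brevity) and top words per topic (the raw material for the qualitative inspection the abstract advocates). The toy corpus is invented:

```python
# Toy topic-model fit with two common validation signals: perplexity
# (quantitative) and top words per topic (qualitative inspection).
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "budget tax spending deficit economy",
    "tax economy growth jobs spending",
    "war troops military defense security",
    "security defense military border war",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
# In-sample perplexity; held-out perplexity would use unseen documents.
print("perplexity:", lda.perplexity(X))

# Top words per topic, the input to qualitative validation.
vocab = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [vocab[i] for i in weights.argsort()[::-1][:4]]
    print("topic %d: %s" % (k, ", ".join(top)))
```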
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This brief literature survey groups the (numerical) validation methods and emphasizes the contradictions and confusion concerning bias, variance and predictive performance. A multicriteria decision-making analysis has been made using the sum of absolute ranking differences (SRD), illustrated with five case studies (seven examples). SRD was applied to compare external and cross-validation techniques, indicators of predictive performance, and to select optimal methods to determine the applicability domain (AD). The ordering of model validation methods was in accordance with the claims of the original authors, but those claims contradict one another, suggesting that any variant of cross-validation can be superior or inferior to other variants depending on the algorithm, data structure and circumstances applied. A simple fivefold cross-validation proved to be superior to the Bayesian Information Criterion in the vast majority of situations. It is simply not sufficient to test a numerical validation method in one situation only, even if it is a well-defined one. SRD, as a preferable multicriteria decision-making algorithm, is suitable for tailoring the techniques for validation, and for the optimal determination of the applicability domain according to the dataset in question.
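The comparison between fivefold cross-validation and the Bayesian Information Criterion can be made concrete with a small model-selection example; the sketch below scores polynomial degrees both ways on synthetic data, using the standard Gaussian-likelihood BIC formula (an assumption about the variant a reader would compare):

```python
# Compare fivefold cross-validation and BIC for choosing a polynomial degree.
# BIC uses the standard Gaussian form: n*log(RSS/n) + k*log(n).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(200, 1))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(0, 0.3, 200)

n = len(y)
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    cv_mse = -cross_val_score(model, x, y, cv=5,
                              scoring="neg_mean_squared_error").mean()
    rss = ((y - model.fit(x, y).predict(x)) ** 2).sum()
    k = degree + 1  # number of fitted coefficients
    bic = n * np.log(rss / n) + k * np.log(n)
    print("degree %d  CV-MSE %.4f  BIC %.1f" % (degree, cv_mse, bic))
```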
An overview of the current results of prognostics for DC-DC power converters is presented, focusing on the output filter capacitor component. The electrolytic capacitor typically used as the filter capacitor is one of the power-supply components with the highest failure rates, hence the effort in developing component-level prognostics methods for capacitors. An overview of prognostics algorithms based on electrical overstress and thermal overstress accelerated aging data is presented, and a discussion of the current efforts in terms of validation of the algorithms is included. The focus of current and future work is to develop a methodology that allows for algorithm development using accelerated aging data and then transforms that into a valid algorithm on the real usage time scale.
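One common way to bridge accelerated-aging data and the real usage time scale is to fit a degradation model, for example growth in equivalent series resistance (ESR), and rescale time by an acceleration factor. The sketch below is a simplified illustration with an assumed exponential model, synthetic measurements, and a made-up acceleration factor; it is not the algorithm from this work:

```python
# Simplified illustration: fit an exponential ESR-growth model to
# accelerated-aging data, then map the fitted curve onto the real usage
# time scale with an assumed acceleration factor. All numbers are synthetic.
import numpy as np
from scipy.optimize import curve_fit

def esr_growth(t, esr0, rate):
    """ESR as a function of aging time: ESR(t) = esr0 * exp(rate * t)."""
    return esr0 * np.exp(rate * t)

# Synthetic accelerated-aging measurements (hours, ohms).
t_accel = np.array([0, 50, 100, 150, 200, 250], dtype=float)
esr = np.array([0.10, 0.112, 0.125, 0.141, 0.158, 0.177])

(esr0, rate), _ = curve_fit(esr_growth, t_accel, esr, p0=(0.1, 0.001))

# Assumed acceleration factor: 1 hour of accelerated aging ~ 40 hours of use.
ACCEL_FACTOR = 40.0
rate_usage = rate / ACCEL_FACTOR

# Remaining-useful-life style question: when does ESR double?
t_eol_usage = np.log(2.0) / rate_usage
print("predicted time to ESR doubling: %.0f usage hours" % t_eol_usage)
```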
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The Validation extension for CKAN enhances data quality within the CKAN ecosystem by leveraging the Frictionless Framework to validate tabular data. This extension allows for automated data validation, generating comprehensive reports directly accessible within the CKAN interface. The validation process helps identify structural and schema-level issues, ensuring data consistency and reliability.
Key Features:
- Automated Data Validation: performs data validation automatically in the background or during dataset creation, streamlining the quality assurance process.
- Comprehensive Validation Reports: generates detailed reports on data quality, highlighting issues such as missing headers, blank rows, incorrect data types, or values outside of defined ranges.
- Frictionless Framework Integration: utilizes the Frictionless Framework library for robust and standardized data validation.
- Exposed Actions: provides accessible action functions that allow data validation to be integrated into custom workflows from other CKAN extensions.
- Command Line Interface: offers a command-line interface (CLI) to manually trigger validation jobs for specific datasets, resources, or based on search criteria.
- Reporting Utilities: enables the generation of global reports summarizing validation statuses across all resources.
Use Cases:
- Improve Data Quality: ensures data integrity and adherence to defined schemas, leading to better data-driven decision-making.
- Streamline Data Workflows: integrates validation as part of data creation or update processes, automating quality checks and saving time.
- Customize Data Validation Rules: allows developers to extend the validation process with their own custom workflows and integrations using the exposed actions.
Technical Integration: The Validation extension integrates deeply within CKAN by providing new action functions (resource_validation_run, resource_validation_show, resource_validation_delete, resource_validation_run_batch) that can be called via the CKAN API. It also includes a plugin interface (IPipeValidation) for more advanced customization, which allows other extensions to receive and process validation reports. Users can use the command-line interface to trigger validation jobs and generate overview reports.
Benefits & Impact: By implementing the Validation extension, CKAN installations can significantly improve the quality and reliability of their data. This leads to increased trust in the data, better data governance, and reduced errors in downstream applications that rely on the data. Automated validation helps to proactively identify and resolve data issues, contributing to a more efficient data management process.
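The exposed actions follow the standard CKAN action-API calling convention, so a validation job can be triggered over HTTP; the sketch below uses the requests library with a placeholder host, API token, and resource id:

```python
# Trigger and inspect a validation job through the CKAN action API.
# The host, API token, and resource id are placeholders.
import requests

CKAN_URL = "https://ckan.example.org"      # placeholder instance
HEADERS = {"Authorization": "API-TOKEN"}   # placeholder token
RESOURCE_ID = "00000000-0000-0000-0000-000000000000"

# Queue a validation job for one resource.
run = requests.post(f"{CKAN_URL}/api/3/action/resource_validation_run",
                    json={"resource_id": RESOURCE_ID}, headers=HEADERS)
run.raise_for_status()

# Fetch the validation status/report for the same resource.
show = requests.post(f"{CKAN_URL}/api/3/action/resource_validation_show",
                     json={"resource_id": RESOURCE_ID}, headers=HEADERS)
print(show.json()["result"])
```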
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The package contains files for two modules designed to improve the accuracy of the indoor positioning system, namely the following:
door detection
- videos_test: videos used to demonstrate the application of the door detector
- videos_res: videos from the videos_test directory with detected doors marked
parts detection (a transfer-learning sketch for the VGG16 model follows this list)
- frames_train_val: images generated from videos, used for training and validation of the VGG16 neural network model
- frames_test: images generated from videos, used for testing the trained model
- videos_test: videos used to demonstrate the application of the parts detector
- videos_res: videos from the videos_test directory with detected parts marked
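Since the parts-detection module trains a VGG16 model on the frames_train_val images, a typical transfer-learning setup looks like the following; the directory layout, image size, and class count are assumptions, not details taken from this package:

```python
# Hedged sketch of VGG16 transfer learning on a directory of frames.
# Directory layout, image size, and number of classes are assumptions.
import tensorflow as tf
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

train_ds = tf.keras.utils.image_dataset_from_directory(
    "frames_train_val", validation_split=0.2, subset="training",
    seed=0, image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "frames_train_val", validation_split=0.2, subset="validation",
    seed=0, image_size=(224, 224), batch_size=32)

# Apply the VGG16-specific preprocessing to both splits.
train_ds = train_ds.map(lambda x, y: (preprocess_input(x), y))
val_ds = val_ds.map(lambda x, y: (preprocess_input(x), y))

base = VGG16(weights="imagenet", include_top=False,
             input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # assumed two classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=5)
```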
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit because they ignore the uncertainty in the testing sample. We develop a novel statistically principled inference tool based on cross-validation that takes into account the uncertainty in the testing sample. This method outputs a set of highly competitive candidate models containing the optimal one with guaranteed probability. As a consequence, our method can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method can provide a different trade-off between prediction accuracy and model interpretability from that offered by existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
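The idea of returning a set of competitive candidate models rather than a single winner can be illustrated with a much simpler heuristic, the familiar one-standard-error rule; the sketch below implements that heuristic, not the statistically principled inference procedure proposed in the article:

```python
# Illustrative only: keep every model whose five-fold CV error is within one
# standard error of the best model's error. This is the classic one-SE
# heuristic, not the article's inference procedure.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=10, random_state=0)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
means, sems = [], []
for alpha in alphas:
    scores = -cross_val_score(Lasso(alpha=alpha), X, y, cv=5,
                              scoring="neg_mean_squared_error")
    means.append(scores.mean())
    sems.append(scores.std(ddof=1) / np.sqrt(len(scores)))

best = int(np.argmin(means))
threshold = means[best] + sems[best]
candidate_set = [a for a, m in zip(alphas, means) if m <= threshold]
print("candidate alphas:", candidate_set)
```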
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a synthetic smart card data set that can be used to test pattern detection methods for the extraction of temporal and spatial data. The data set is tab-separated and based on a stylized travel pattern description for the city of Utrecht in the Netherlands; it was developed and used in Chapter 6 of the PhD thesis of Paul Bouman.
This dataset contains the following files (a minimal loading sketch follows the list):
journeys.tsv : the actual data set of synthetic smart card data
utrecht.xml : the activity pattern definition that was used to randomly generate the synthetic smart card data
validate.ref : a file derived from the activity pattern definition that can be used for validation purposes. It specifies which activity types occur at each location in the smart card data set.
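A minimal way to start exploring journeys.tsv is shown below; since the column layout is not documented here, the sketch assumes only that the file is tab-separated, and the printed summary adapts to whatever columns are present:

```python
# Load the tab-separated synthetic smart card data and show its shape and
# columns. No column names are assumed beyond the file being TSV.
import pandas as pd

journeys = pd.read_csv("journeys.tsv", sep="\t")
print(journeys.shape)
print(journeys.columns.tolist())
print(journeys.head())
```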
The objective of the fourth Technical Meeting on Fusion Data Processing, Validation and Analysis was to provide a platform for discussing a set of topics relevant to fusion data processing, validation and analysis, with a view to extrapolating needs to next-step fusion devices such as ITER. The validation and analysis of experimental data obtained from diagnostics used to characterize fusion plasmas are crucial for a knowledge-based understanding of the physical processes governing the dynamics of these plasmas. This paper presents the recent progress and achievements in the domain of plasma diagnostics and synthetic diagnostics data analysis (including image processing, regression analysis, inverse problems, deep learning, machine learning, big data and physics-based models for control) reported at the meeting. The progress in these areas highlights trends observed in current major fusion confinement devices. A special focus is placed on data analysis requirements for ITER and DEMO, with particular attention paid to artificial intelligence for automation and for improving the reliability of control processes.
This dataset contains data from household surveys conducted with women in Argentina, Ghana, and India to validate the construct of "Demand for family planning satisfied through modern methods of contraception."
Metadata
Indicator: Demand for family planning satisfied through modern methods of contraception
Definition: Percentage of women of reproductive age (15-49 years) who have their need for family planning satisfied with modern methods
Numerator: Number of women of reproductive age (15-49 years) who have their need for family planning satisfied with modern methods
Denominator: Total number of women of reproductive age (15-49 years) in need of family planning
Disaggregator(s): Wealth; Age; Education; Residence
Data Source: MICS; DHS; RHS; other national surveys
Indicator Reference: Countdown to 2030
Construct for Validation: Women's self-identified satisfaction of demand for family planning through a modern method of contraception; search for convergent validity comparing women's subjective perception of satisfaction with their family planning method with an estimation of the concept derived via a constructed measure.
Validation Question(s):
1. How does a direct measure of demand satisfaction for family planning (a woman's self-report) compare to the assigned result provided by the DHS algorithm derived from the responses to the series of questions used to calculate the indicator (same woman surveyed) (construct validity)?
2. How does the value of the indicator vary based on a new data source/estimation method compared to an established source/method?
II. Study Aims
This study aims to validate the DHS algorithm used to determine demand for family planning satisfied by comparing the results of the derived measure for a sample of women to the gold standard of those women's own subjective perceptions as to whether their demand for family planning was actually satisfied.
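The indicator itself is a simple ratio of the numerator and denominator defined above, and can be computed directly from survey microdata once each respondent is coded for need and modern-method use; the sketch below uses invented column names on a toy DataFrame:

```python
# Compute the indicator from toy survey microdata. The column names
# (in_need, uses_modern_method) are invented for illustration.
import pandas as pd

women = pd.DataFrame({
    "in_need":            [True, True, True, False, True, True],
    "uses_modern_method": [True, False, True, False, True, False],
})

numerator = (women["in_need"] & women["uses_modern_method"]).sum()
denominator = women["in_need"].sum()
indicator = 100.0 * numerator / denominator
print("demand satisfied with modern methods: %.1f%%" % indicator)
```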
The GPM Ground Validation NOAA CPC Morphing Technique (CMORPH) IFloodS dataset consists of global precipitation analyses produced by the NOAA Climate Prediction Center (CPC). The Iowa Flood Studies (IFloodS) campaign was a ground measurement campaign that took place in eastern Iowa from May 1 to June 15, 2013. The goals of the campaign were to collect detailed measurements of precipitation at the Earth's surface using ground instruments and advanced weather radars and, simultaneously, to collect data from satellites passing overhead. The CPC morphing technique uses precipitation estimates from low orbiter satellite microwave observations to produce global precipitation analyses at high temporal and spatial resolution. Data were selected for the period surrounding the IFloodS field campaign, from April 1, 2013 to June 30, 2013. The dataset includes both the near real-time raw data and the bias-corrected data from NOAA, in binary and netCDF formats.
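The netCDF portion of the dataset can be inspected with standard tools; the sketch below uses xarray with a placeholder file name and selects the April-June window, and the precipitation variable name is an assumption to check against the actual files:

```python
# Open one CMORPH netCDF file and subset it to the IFloodS window.
# The file name and the precipitation variable name are assumptions.
import xarray as xr

ds = xr.open_dataset("cmorph_ifloods.nc")        # placeholder file name
print(ds)                                        # list variables and coords

subset = ds.sel(time=slice("2013-04-01", "2013-06-30"))
precip = subset["precip"]                        # assumed variable name
print(precip.mean(dim="time"))
```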
Spaceflight is known to affect immune cell populations. In particular, splenic B-cell numbers decrease during spaceflight and in ground-based physiological models. Although antibody isotype changes have been assessed during and after spaceflight, an extensive characterization of the impact of spaceflight on antibody composition has not been conducted in mice. Next Generation Sequencing and bioinformatic tools are now available to assess antibody repertoires. We can now identify immunoglobulin gene segment usage, junctional regions, and modifications that contribute to specificity and diversity. Due to limitations on the International Space Station, alternate sample collection and storage methods must be employed. Our group compared Illumina MiSeq sequencing data from multiple sample preparation methods in normal C57BL/6J mice to validate that sample preparation and storage would not bias the outcome of antibody repertoire characterization. In this report, we also compared the effects of sequencing techniques and of a bioinformatic workflow on the data output when assessing IgH and Igκ variable gene usage. Our bioinformatic workflow has been optimized for Illumina HiSeq and MiSeq datasets, and is designed specifically to reduce bias, capture the most information from Ig sequences, and produce a data set that provides other data mining options.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.7910/DVN/29911
When studying sensitive issues, including corruption, prejudice, and sexual behavior, researchers have increasingly relied upon indirect questioning techniques to mitigate known problems of direct survey questions such as under-reporting and nonresponse. However, there have been surprisingly few empirical validation studies of these indirect techniques, because the information required to verify the resulting estimates is often difficult to access. This paper reports findings from the first comprehensive validation study of indirect methods. We estimate whether people voted for an anti-abortion referendum held during the 2011 Mississippi General Election using direct questioning and three popular indirect methods: the list experiment, the endorsement experiment, and randomized response. We then validate these estimates against the official election outcome. While direct questioning leads to significant under-estimation of sensitive votes against the referendum, these survey techniques yield estimates much closer to the actual vote count, with the endorsement experiment and randomized response yielding the least bias.
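Of the three indirect methods, the list experiment has the simplest estimator: the difference in mean item counts between respondents who receive the list with the sensitive item and those who receive it without. A toy computation on invented counts:

```python
# Difference-in-means estimator for a list experiment, on invented data.
# Control respondents see J baseline items; treatment respondents see J+1
# items including the sensitive one. The mean difference estimates the
# proportion holding the sensitive trait.
import numpy as np

control_counts = np.array([1, 2, 0, 1, 2, 1, 3, 2])    # invented responses
treatment_counts = np.array([2, 2, 1, 2, 3, 1, 3, 3])  # invented responses

estimate = treatment_counts.mean() - control_counts.mean()
print("estimated prevalence of sensitive trait: %.2f" % estimate)
```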