100+ datasets found
  1. Data from: Development and validation of HBV surveillance models using big...

    • tandf.figshare.com
    docx
    Updated Dec 3, 2024
    Cite
    Weinan Dong; Cecilia Clara Da Roza; Dandan Cheng; Dahao Zhang; Yuling Xiang; Wai Kay Seto; William C. W. Wong (2024). Development and validation of HBV surveillance models using big data and machine learning [Dataset]. http://doi.org/10.6084/m9.figshare.25201473.v1
    Explore at:
    docx
    Dataset updated
    Dec 3, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Weinan Dong; Cecilia Clara Da Roza; Dandan Cheng; Dahao Zhang; Yuling Xiang; Wai Kay Seto; William C. W. Wong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The construction of a robust healthcare information system is fundamental to enhancing countries’ capabilities in the surveillance and control of hepatitis B virus (HBV). Making use of China’s rapidly expanding primary healthcare system, this innovative approach using big data and machine learning (ML) could help towards the World Health Organization’s (WHO) HBV infection elimination goals of reaching 90% diagnosis and treatment rates by 2030. We aimed to develop and validate HBV detection models using routine clinical data to improve the detection of HBV and support the development of effective interventions to mitigate the impact of this disease in China. Relevant data records extracted from the Family Medicine Clinic of the University of Hong Kong-Shenzhen Hospital’s Hospital Information System were structuralized using state-of-the-art Natural Language Processing techniques. Several ML models have been used to develop HBV risk assessment models. The performance of the ML model was then interpreted using the Shapley value (SHAP) and validated using cohort data randomly divided at a ratio of 2:1 using a five-fold cross-validation framework. The patterns of physical complaints of patients with and without HBV infection were identified by processing 158,988 clinic attendance records. After removing cases without any clinical parameters from the derivation sample (n = 105,992), 27,392 cases were analysed using six modelling methods. A simplified model for HBV using patients’ physical complaints and parameters was developed with good discrimination (AUC = 0.78) and calibration (goodness of fit test p-value >0.05). Suspected case detection models of HBV, showing potential for clinical deployment, have been developed to improve HBV surveillance in primary care setting in China. 
    This study developed a suspected case detection model for HBV that can facilitate early identification and treatment of HBV in the primary care setting in China, contributing towards the achievement of the WHO's HBV elimination goals. We utilized state-of-the-art natural language processing techniques to structure the data records, leading to the development of a robust healthcare information system which enhances the surveillance and control of HBV in China.
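The validation design described in the abstract (cohort data randomly divided at a 2:1 ratio, evaluated under a five-fold cross-validation framework) can be sketched in plain Python. This is an illustrative sketch only; the function names are ours, not the authors' code, and only the case count comes from the abstract:

```python
import random

def split_derivation_validation(n_samples, ratio=(2, 1), seed=0):
    """Randomly split sample indices at the given ratio (derivation : validation)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = n_samples * ratio[0] // sum(ratio)
    return indices[:cut], indices[cut:]

def k_fold_indices(indices, k=5):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for fold in folds[:i] + folds[i + 1:] for idx in fold]
        yield train, test

# 27,392 analysable cases, as in the abstract; split 2:1, then 5-fold CV.
derivation, validation = split_derivation_validation(27392)
for train, test in k_fold_indices(derivation, k=5):
    pass  # fit each candidate model on `train`, evaluate on `test`
```

Each candidate model would be fitted on the train folds and scored (e.g. by AUC) on the held-out fold, with the one-third validation portion reserved for final evaluation.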

  2. FDA Drug Product Labels Validation Method Data Package

    • johnsnowlabs.com
    csv
    Updated Jan 20, 2021
    Cite
    John Snow Labs (2021). FDA Drug Product Labels Validation Method Data Package [Dataset]. https://www.johnsnowlabs.com/marketplace/fda-drug-product-labels-validation-method-data-package/
    Explore at:
    csv
    Dataset updated
    Jan 20, 2021
    Dataset authored and provided by
    John Snow Labs
    Description

    This data package contains information on Structured Product Labeling (SPL) Terminology for SPL validation procedures and information on performing SPL validations.

  3. PEN-Method: Predictor model and Validation Data

    • data.mendeley.com
    • narcis.nl
    Updated Sep 3, 2021
    Cite
    Alex Halle (2021). PEN-Method: Predictor model and Validation Data [Dataset]. http://doi.org/10.17632/459f33wxf6.4
    Explore at:
    Dataset updated
    Sep 3, 2021
    Authors
    Alex Halle
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains the PEN-Predictor Keras model as well as the 100 validation data sets.

  4. Supplementary Materials: Choosing Validation Methods for Agent-Based Models...

    • zenodo.org
    bin, csv, png
    Updated Jun 10, 2025
    Cite
    Artem Serdyuk (2025). Supplementary Materials: Choosing Validation Methods for Agent-Based Models - R Scripts, Data, and Visualizations [Dataset]. http://doi.org/10.5281/zenodo.15633195
    Explore at:
    bin, csv, png
    Dataset updated
    Jun 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Artem Serdyuk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains supplementary materials, including R scripts, data files, figures, and documentation for the agent-based model validation framework presented in the article. README.md includes a detailed description.

  5. Python functions -- cross-validation methods from a data-driven perspective

    • phys-techsciences.datastations.nl
    • zenodo.org
    docx, png +4
    Updated Aug 16, 2024
    + more versions
    Cite
    Y. Wang (2024). Python functions -- cross-validation methods from a data-driven perspective [Dataset]. http://doi.org/10.17026/PT/TXAU9W
    Explore at:
    tiff(2474294), tiff(2412540), tsv(49141), txt(1220), tiff(2413148), tsv(20072), tsv(30174), tiff(4833081), tiff(12196238), tiff(1606453), tiff(4729349), tiff(5695336), tsv(29), tiff(6478950), tiff(6534556), tiff(6466131), text/x-python(8210), docx(63366), tsv(12056), tiff(6567360), tsv(28), tiff(5385805), tsv(263901), tiff(6385076), text/x-python(5598), tiff(2423836), tiff(3417568), text/x-python(8181), png(110251), tiff(5726045), tsv(48948), tsv(1564525), tiff(3031197), tiff(2059260), tiff(2880005), tiff(6135064), tiff(3648419), tsv(102), tiff(3060978), tiff(3802696), tiff(4396561), tiff(1385025), text/x-python(1184), tiff(2817752), tiff(2516606), tsv(27725), text/x-python(12795), tiff(2282443)
    Dataset updated
    Aug 16, 2024
    Dataset provided by
    DANS Data Station Physical and Technical Sciences
    Authors
    Y. Wang
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    These are the organized Python functions for the methods proposed in Yanwen Wang's PhD research. Researchers can directly use these functions to conduct spatial+ cross-validation, the dissimilarity quantification method, and dissimilarity-adaptive cross-validation.

  6. Method Validation Data.xlsx

    • figshare.com
    xlsx
    Updated Jan 28, 2020
    Cite
    Norberto Gonzalez; Alanah Fitch (2020). Method Validation Data.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.11741703.v1
    Explore at:
    xlsx
    Dataset updated
    Jan 28, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Norberto Gonzalez; Alanah Fitch
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data for validation of a method for detecting PMP-glucose by HPLC.

  7. Replication Data for: Beyond Standardization: A Comprehensive Review of...

    • dataverse.harvard.edu
    Updated Apr 28, 2025
    Cite
    Jana Bernhard-Harrer (2025). Replication Data for: Beyond Standardization: A Comprehensive Review of Topic Modeling Validation Methods for Computational Social Science Research [Dataset]. http://doi.org/10.7910/DVN/N67BDI
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    Harvard Dataverse
    Authors
    Jana Bernhard-Harrer
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Copied directly from the abstract of the PSRM publication. As the use of computational text analysis in the social sciences has increased, topic modeling has emerged as a popular method for identifying latent themes in textual data. Nevertheless, concerns have been raised regarding the validity of the results produced by this method, given that it is largely automated and inductive in nature, and the lack of clear guidelines for validating topic models has been identified by scholars as an area of concern. In response, we conducted a comprehensive systematic review of 789 studies that employ topic modeling. Our goal is to investigate whether the field is moving towards a common framework for validating these models. The findings of our review indicate a notable absence of standardized validation practices and a lack of convergence towards specific methods of validation. This gap may be attributed to the inherent incompatibility between the inductive, qualitative approach of topic modeling and the deductive, quantitative tradition that favors standardized validation. To address this, we advocate for incorporating qualitative validation approaches, emphasizing transparency and detailed reporting to improve the credibility of findings in computational social science research when using topic modeling.

  8. Data from: Selection of optimal validation methods for quantitative...

    • tandf.figshare.com
    xlsx
    Updated Jun 1, 2023
    Cite
    K. Héberger (2023). Selection of optimal validation methods for quantitative structure–activity relationships and applicability domain [Dataset]. http://doi.org/10.6084/m9.figshare.23185916.v1
    Explore at:
    xlsx
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    K. Héberger
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This brief literature survey groups the (numerical) validation methods and emphasizes the contradictions and confusion surrounding bias, variance and predictive performance. A multicriteria decision-making analysis has been made using the sum of absolute ranking differences (SRD), illustrated with five case studies (seven examples). SRD was applied to compare external and cross-validation techniques and indicators of predictive performance, and to select optimal methods to determine the applicability domain (AD). The ordering of model validation methods was in accordance with the claims of the original authors, but these claims contradict each other, suggesting that any variant of cross-validation can be superior or inferior to other variants depending on the algorithm, data structure and circumstances applied. A simple fivefold cross-validation proved to be superior to the Bayesian Information Criterion in the vast majority of situations. It is simply not sufficient to test a numerical validation method in one situation only, even if it is a well-defined one. SRD, as a preferable multicriteria decision-making algorithm, is suitable for tailoring the techniques for validation and for the optimal determination of the applicability domain according to the dataset in question.
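As a toy illustration of the SRD idea described above (our own example data and code, not Héberger's implementation): each validation method's scores are ranked case by case and compared against a reference ranking, here the per-case average of all methods:

```python
def ranks(values):
    """Rank values from 1 (smallest); ties receive the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def srd(method_scores, reference_scores):
    """Sum of absolute rank differences between a method and the reference."""
    m, ref = ranks(method_scores), ranks(reference_scores)
    return sum(abs(a - b) for a, b in zip(m, ref))

# Hypothetical scores of three validation methods on five case studies;
# the reference is the per-case average, a common SRD choice.
cases = {
    "5-fold CV": [0.78, 0.81, 0.74, 0.80, 0.77],
    "LOO CV":    [0.80, 0.79, 0.70, 0.83, 0.76],
    "BIC":       [0.71, 0.75, 0.69, 0.78, 0.72],
}
n = 5
reference = [sum(cases[m][i] for m in cases) / len(cases) for i in range(n)]
scores = {m: srd(v, reference) for m, v in cases.items()}
```

Lower SRD means closer agreement with the reference ranking; the published method additionally compares the obtained SRD values against those of random rankings to judge significance.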

  9. Prognostics of Power Electronics, methods and validation testbeds - Dataset...

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    Updated Mar 31, 2025
    Cite
    nasa.gov (2025). Prognostics of Power Electronics, methods and validation testbeds - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/prognostics-of-power-electronics-methods-and-validation-testbeds
    Explore at:
    Dataset updated
    Mar 31, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    An overview of the current results of prognostics for DC-DC power converters is presented, focusing on the output filter capacitor component. The electrolytic capacitor typically used as the filter capacitor is one of the power supply components with the highest failure rates, hence the effort in developing component-level prognostics methods for capacitors. An overview of prognostics algorithms based on electrical overstress and thermal overstress accelerated aging data is presented, and a discussion on the current efforts in terms of validation of the algorithms is included. The focus of current and future work is to develop a methodology that allows for algorithm development using accelerated aging data and then transforms that into a valid algorithm on the real usage time scale.

  10. ckanext-validation

    • catalog.civicdataecosystem.org
    Updated Dec 16, 2024
    Cite
    (2024). ckanext-validation [Dataset]. https://catalog.civicdataecosystem.org/dataset/ckanext-validation
    Explore at:
    Dataset updated
    Dec 16, 2024
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Validation extension for CKAN enhances data quality within the CKAN ecosystem by leveraging the Frictionless Framework to validate tabular data. This extension allows for automated data validation, generating comprehensive reports directly accessible within the CKAN interface. The validation process helps identify structural and schema-level issues, ensuring data consistency and reliability. Key Features: Automated Data Validation: Performs data validation automatically in the background or during dataset creation, streamlining the quality assurance process. Comprehensive Validation Reports: Generates detailed reports on data quality, highlighting issues such as missing headers, blank rows, incorrect data types, or values outside of defined ranges. Frictionless Framework Integration: Utilizes the Frictionless Framework library for robust and standardized data validation. Exposed Actions: Provides accessible action functions that allows data validation to be integrated into custom workflows from other CKAN extensions. Command Line Interface: Offers a command-line interface (CLI) to manually trigger validation jobs for specific datasets, resources, or based on search criteria. Reporting Utilities: Enables the generation of global reports summarizing validation statuses across all resources. Use Cases: Improve Data Quality: Ensures data integrity and adherence to defined schemas, leading to better data-driven decision-making. Streamline Data Workflows: Integrates validation as part of data creation or update processes, automating quality checks and saving time. Customize Data Validation Rules: Allows developers to extend the validation process with their own custom workflows and integrations using the exposed actions. 
    Technical Integration: The Validation extension integrates deeply within CKAN by providing new action functions (resource_validation_run, resource_validation_show, resource_validation_delete, resource_validation_run_batch) that can be called via the CKAN API. It also includes a plugin interface (IPipeValidation) for more advanced customization, which allows other extensions to receive and process validation reports. Users can utilize the command-line interface to trigger validation jobs and generate overview reports. Benefits & Impact: By implementing the Validation extension, CKAN installations can significantly improve the quality and reliability of their data. This leads to increased trust in the data, better data governance, and reduced errors in downstream applications that rely on the data. Automated validation helps to proactively identify and resolve data issues, contributing to a more efficient data management process.
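The exposed actions described above are reachable through CKAN's standard action API. A minimal sketch of triggering a validation job (the instance URL, resource id, and API key below are placeholders; the action name is taken from the extension description):

```python
import json
import urllib.request

def validation_run_request(ckan_url, resource_id, api_key):
    """Build a POST request for the resource_validation_run action.

    ckan_url, resource_id and api_key are placeholders for a real instance.
    """
    url = f"{ckan_url}/api/3/action/resource_validation_run"
    payload = json.dumps({"resource_id": resource_id}).encode()
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json", "Authorization": api_key},
        method="POST",
    )

req = validation_run_request("https://demo.ckan.org", "my-resource-id", "my-api-key")
# urllib.request.urlopen(req) would queue the validation job; the resulting
# report could then be fetched with the resource_validation_show action.
```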

  11. Prognostics of Power Electronics, methods and validation testbeds

    • catalog.data.gov
    Updated Apr 11, 2025
    + more versions
    Cite
    Dashlink (2025). Prognostics of Power Electronics, methods and validation testbeds [Dataset]. https://catalog.data.gov/dataset/prognostics-of-power-electronics-methods-and-validation-testbeds
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dashlink
    Description

    An overview of the current results of prognostics for DC-DC power converters is presented, focusing on the output filter capacitor component. The electrolytic capacitor typically used as the filter capacitor is one of the power supply components with the highest failure rates, hence the effort in developing component-level prognostics methods for capacitors. An overview of prognostics algorithms based on electrical overstress and thermal overstress accelerated aging data is presented, and a discussion on the current efforts in terms of validation of the algorithms is included. The focus of current and future work is to develop a methodology that allows for algorithm development using accelerated aging data and then transforms that into a valid algorithm on the real usage time scale.

  12. Data for training, validation and testing of methods in the thesis:...

    • data.niaid.nih.gov
    Updated May 1, 2021
    Cite
    Lucia Hajduková (2021). Data for training, validation and testing of methods in the thesis: Camera-based Accuracy Improvement of Indoor Localization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4730337
    Explore at:
    Dataset updated
    May 1, 2021
    Dataset authored and provided by
    Lucia Hajduková
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The package contains files for two modules designed to improve the accuracy of the indoor positioning system, namely the following:

    door detection

    videos_test - videos used to demonstrate the application of door detector

    videos_res - videos from videos_test directory with detected doors marked

    parts detection

    frames_train_val - images generated from videos used for training and validation of VGG16 neural network model

    frames_test - images generated from videos used for testing of the trained model

    videos_test - videos used to demonstrate the application of parts detector

    videos_res - videos from videos_test directory with detected parts marked

  13. Data from: Cross-Validation With Confidence

    • tandf.figshare.com
    zip
    Updated May 31, 2023
    Cite
    Jing Lei (2023). Cross-Validation With Confidence [Dataset]. http://doi.org/10.6084/m9.figshare.9976901.v3
    Explore at:
    zip
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Jing Lei
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cross-validation is one of the most popular model and tuning parameter selection methods in statistics and machine learning. Despite its wide applicability, traditional cross-validation methods tend to overfit, due to the ignorance of the uncertainty in the testing sample. We develop a novel statistically principled inference tool based on cross-validation that takes into account the uncertainty in the testing sample. This method outputs a set of highly competitive candidate models containing the optimal one with guaranteed probability. As a consequence, our method can achieve consistent variable selection in a classical linear regression setting, for which existing cross-validation methods require unconventional split ratios. When used for tuning parameter selection, the method can provide an alternative trade-off between prediction accuracy and model interpretability than existing variants of cross-validation. We demonstrate the performance of the proposed method in several simulated and real data examples. Supplemental materials for this article can be found online.
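A toy sketch of the idea behind the abstract (an illustrative one-sided paired z-test on per-fold losses at roughly the 5% level, not the paper's calibrated procedure): retain every candidate model whose cross-validated loss is not significantly worse than the best model's, so the output is a set of competitive models rather than a single winner.

```python
from math import sqrt
from statistics import mean, stdev

def cv_confidence_set(fold_losses, z=1.645):
    """Keep every model whose CV loss is not significantly worse than the best.

    fold_losses maps model name -> per-fold test losses (paired across models).
    Illustrative only; the paper's procedure calibrates this test carefully.
    """
    best = min(fold_losses, key=lambda m: mean(fold_losses[m]))
    keep = {best}
    for name, losses in fold_losses.items():
        if name == best:
            continue
        diffs = [a - b for a, b in zip(losses, fold_losses[best])]
        se = stdev(diffs) / sqrt(len(diffs))
        if se == 0.0:
            if mean(diffs) <= 0:
                keep.add(name)
            continue
        if mean(diffs) / se <= z:  # excess loss not significantly above zero
            keep.add(name)
    return keep

# Hypothetical per-fold losses for three candidate models under 5-fold CV.
fold_losses = {
    "lasso": [0.21, 0.19, 0.22, 0.20, 0.18],
    "ridge": [0.22, 0.20, 0.21, 0.21, 0.19],
    "ols":   [0.30, 0.29, 0.31, 0.32, 0.30],
}
selected = cv_confidence_set(fold_losses)
```

With noisy folds several near-equivalent models survive, and final selection can then favor the simplest surviving model, which is the trade-off between accuracy and interpretability the abstract describes.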

  14. Data from: Synthetic Smart Card Data for the Analysis of Temporal and...

    • data.niaid.nih.gov
    Updated Jan 24, 2020
    Cite
    Paul Bouman (2020). Synthetic Smart Card Data for the Analysis of Temporal and Spatial Patterns [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_776718
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Paul Bouman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a synthetic smart card data set that can be used to test pattern detection methods for the extraction of temporal and spatial data. The data set is tab-separated and based on a stylized travel pattern description for the city of Utrecht in the Netherlands; it was developed and used in Chapter 6 of the PhD thesis of Paul Bouman.

    This dataset contains the following files:

    journeys.tsv : the actual data set of synthetic smart card data

    utrecht.xml : the activity pattern definition that was used to randomly generate the synthetic smart card data

    validate.ref : a file derived from the activity pattern definition that can be used for validation purposes. It specifies which activity types occur at each location in the smart card data set.

  15. Summary report of the 4th IAEA Technical Meeting on Fusion Data Processing,...

    • search.dataone.org
    Updated Sep 24, 2024
    Cite
    S.M. Gonzalez de Vicente, D. Mazon, M. Xu, S. Pinches, M. Churchill, A. Dinklage, R. Fischer, A. Murari, P. Rodriguez-Fernandez, J. Stillerman, J. Vega, G. Verdoolaege (2024). Summary report of the 4th IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis (FDPVA) [Dataset]. http://doi.org/10.7910/DVN/ZZ9UKO
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    S.M. Gonzalez de Vicente, D. Mazon, M. Xu, S. Pinches, M. Churchill, A. Dinklage, R. Fischer, A. Murari, P. Rodriguez-Fernandez, J. Stillerman, J. Vega, G. Verdoolaege
    Description

    The objective of the fourth Technical Meeting on Fusion Data Processing, Validation and Analysis was to provide a platform during which a set of topics relevant to fusion data processing, validation and analysis could be discussed with the view of extrapolating needs to next-step fusion devices such as ITER. The validation and analysis of experimental data obtained from diagnostics used to characterize fusion plasmas are crucial for a knowledge-based understanding of the physical processes governing the dynamics of these plasmas. This paper presents the recent progress and achievements in the domain of plasma diagnostics and synthetic diagnostics data analysis (including image processing, regression analysis, inverse problems, deep learning, machine learning, big data and physics-based models for control) reported at the meeting. The progress in these areas highlights trends observed in current major fusion confinement devices. A special focus is dedicated to data analysis requirements for ITER and DEMO, with particular attention paid to artificial intelligence for automation and improving the reliability of control processes.

  16. Validation Data for: Demand for family planning satisfied through modern...

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Jolivet, Rima; Gausman, Jewel; Adanu, Richard; Bandoh, Delia; Berrueta, Mabel; Chakraborty, Suchandrima; Kenu, Ernest; Khan, Nizamuddin; Odikro, Magdalene; Pingray, Veronica; Ramesh, Sowmya; Vázquez, Paula; Williams, Caitlin; Langer, Ana (2023). Validation Data for: Demand for family planning satisfied through modern methods of contraception [Dataset]. http://doi.org/10.7910/DVN/DZSODD
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Jolivet, Rima; Gausman, Jewel; Adanu, Richard; Bandoh, Delia; Berrueta, Mabel; Chakraborty, Suchandrima; Kenu, Ernest; Khan, Nizamuddin; Odikro, Magdalene; Pingray, Veronica; Ramesh, Sowmya; Vázquez, Paula; Williams, Caitlin; Langer, Ana
    Description

    This dataset contains data from household surveys conducted with women in Argentina, Ghana, and India to validate the construct of "Demand for family planning satisfied through modern methods of contraception."

    Metadata

    Indicator: Demand for family planning satisfied through modern methods of contraception

    Definition: Percentage of women of reproductive age (15–49 years) who have their need for family planning satisfied with modern methods. Numerator: number of women of reproductive age (15–49 years) who have their need for family planning satisfied with modern methods. Denominator: total number of women of reproductive age (15–49 years) in need of family planning.

    Disaggregator(s): wealth, age, education, residence

    Data Source: MICS, DHS, RHS, other national surveys

    Indicator Reference: Countdown to 2030

    Construct for Validation: Women's self-identified satisfaction of demand for family planning through a modern method of contraception; search for convergent validity comparing women's subjective perception of satisfaction with their family planning method with an estimation of the concept derived via a constructed measure.

    Validation Question(s):

    1. How does a direct measure of demand satisfaction for family planning (a woman's self-report) compare to the assigned result provided by the DHS algorithm derived from the responses to the series of questions used to calculate the indicator (same woman surveyed) (construct validity)?

    2. How does the value of the indicator vary based on a new data source/estimation method compared to an established source/method?

    Study Aims: This study aims to validate the DHS algorithm used to determine demand for family planning satisfied by comparing the results of the derived measure for a sample of women to the gold standard of those women's own subjective perceptions as to whether their demand for family planning was actually satisfied.
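The numerator/denominator definition above reduces to a simple percentage; a minimal sketch with hypothetical counts (the function name and figures are ours, for illustration only):

```python
def demand_satisfied_modern(num_satisfied_modern, num_in_need):
    """Indicator value as a percentage: numerator / denominator * 100.

    num_satisfied_modern: women aged 15-49 whose need for family planning
    is satisfied with modern methods; num_in_need: all women aged 15-49
    in need of family planning.
    """
    if num_in_need <= 0:
        raise ValueError("denominator must be a positive count")
    return 100.0 * num_satisfied_modern / num_in_need

# Hypothetical counts from one survey site.
pct = demand_satisfied_modern(620, 800)
```

The validation study compares this derived value against women's own reports, disaggregated by wealth, age, education, and residence.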

  17. GPM GROUND VALIDATION NOAA CPC MORPHING TECHNIQUE (CMORPH) IFLOODS V1

    • catalog.data.gov
    • datasets.ai
    • +3 more
    Updated Apr 11, 2025
    + more versions
    Cite
    NASA/MSFC/GHRC (2025). GPM GROUND VALIDATION NOAA CPC MORPHING TECHNIQUE (CMORPH) IFLOODS V1 [Dataset]. https://catalog.data.gov/dataset/gpm-ground-validation-noaa-cpc-morphing-technique-cmorph-ifloods-v1-f4b35
    Explore at:
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The GPM Ground Validation NOAA CPC Morphing Technique (CMORPH) IFloodS dataset consists of global precipitation analyses data produced by the NOAA Climate Prediction Center (CPC). The Iowa Flood Studies (IFloodS) campaign was a ground measurement campaign that took place in eastern Iowa from May 1 to June 15, 2013. The goals of the campaign were to collect detailed measurements of precipitation at the Earth's surface using ground instruments and advanced weather radars and, simultaneously, collect data from satellites passing overhead. The CPC morphing technique uses precipitation estimates from low orbiter satellite microwave observations to produce global precipitation analyses at a high temporal and spatial resolution. Data has been selected for the Iowa Flood Studies (IFloodS) field campaign which took place from April 1, 2013 to June 30, 2013. The dataset includes both the near real-time raw data and bias corrected data from NOAA in binary and netCDF format.

  18. Data from: Validation of Methods to Assess the Immunoglobulin Gene...

    • data.nasa.gov
    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • +1 more
    Updated Apr 1, 2025
    + more versions
    Cite
    data.nasa.gov (2025). Validation of Methods to Assess the Immunoglobulin Gene Repertoire in Tissues Obtained from Mice on the International Space Station [Dataset]. https://data.nasa.gov/dataset/validation-of-methods-to-assess-the-immunoglobulin-gene-repertoire-in-tissues-obtained-fro-e1070
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Spaceflight is known to affect immune cell populations. In particular, splenic B-cell numbers decrease during spaceflight and in ground-based physiological models. Although antibody isotype changes have been assessed during and after spaceflight, an extensive characterization of the impact of spaceflight on antibody composition has not been conducted in mice. Next Generation Sequencing and bioinformatic tools are now available to assess antibody repertoires. We can now identify immunoglobulin gene-segment usage, junctional regions, and modifications that contribute to specificity and diversity. Due to limitations on the International Space Station, alternate sample collection and storage methods must be employed. Our group compared Illumina MiSeq sequencing data from multiple sample preparation methods in normal C57Bl/6J mice to validate that sample preparation and storage would not bias the outcome of antibody repertoire characterization. In this report, we also compared sequencing techniques and a bioinformatic workflow on the data output when we assessed the IgH and Igκ variable gene usage. Our bioinformatic workflow has been optimized for Illumina HiSeq and MiSeq datasets, and is designed specifically to reduce bias, capture the most information from Ig sequences, and produce a data set that provides other data mining options.

  19.

    Data from: Partnering for science: proceedings of the USGS Workshop on...

    • datadiscoverystudio.org
    pdf
    Updated Jan 14, 2017
    Cite
    (2017). Partnering for science: proceedings of the USGS Workshop on Citizen Science [Dataset]. http://datadiscoverystudio.org/geoportal/rest/metadata/item/aee29daa83444cb7bdb59539ad895bc8/html
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jan 14, 2017
    Area covered
    Description

    Link to the ScienceBase Item Summary page for the item described by this metadata record. Service Protocol: Link to the ScienceBase Item Summary page for the item described by this metadata record. Application Profile: Web Browser. Link Function: information

  20.

    Replication Data for: An Empirical Validation Study of Popular Survey...

    • dataverse.harvard.edu
    application/x-gzip +6
    Updated Oct 19, 2020
    Cite
    Harvard Dataverse (2020). Replication Data for: An Empirical Validation Study of Popular Survey Methodologies for Sensitive Questions [Dataset]. http://doi.org/10.7910/DVN/29911
    Explore at:
    application/x-rlang-transport(1439200), pdf(26067), tsv(3673), zip(13725), txt(3726), application/x-gzip(61656), text/plain; charset=us-ascii(455)Available download formats
    Dataset updated
    Oct 19, 2020
    Dataset provided by
    Harvard Dataverse
    License

    https://dataverse.harvard.edu/api/datasets/:persistentId/versions/3.0/customlicense?persistentId=doi:10.7910/DVN/29911

    Description

    When studying sensitive issues such as corruption, prejudice, and sexual behavior, researchers have increasingly relied upon indirect questioning techniques to mitigate known problems of direct survey questions such as under-reporting and nonresponse. However, there have been surprisingly few empirical validation studies of these indirect techniques, because the information required to verify the resulting estimates is often difficult to access. This paper reports findings from the first comprehensive validation study of indirect methods. We estimate whether people voted for an anti-abortion referendum held during the 2011 Mississippi General Election using direct questioning and three popular indirect methods: the list experiment, the endorsement experiment, and randomized response. We then validate these estimates against the official election outcome. While direct questioning leads to significant under-estimation of sensitive votes against the referendum, the indirect techniques yield estimates much closer to the actual vote count, with the endorsement experiment and randomized response yielding the least bias.
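    The list experiment mentioned in the description estimates the prevalence of a sensitive item as the difference in mean item counts between a treatment group (which sees the sensitive item added to a list of control items) and a control group. A minimal sketch of that difference-in-means estimator, with made-up counts rather than this study's data:

    ```python
    def list_experiment_estimate(treatment_counts, control_counts):
        """Difference-in-means estimator for a list experiment:
        estimated prevalence of the sensitive item is
        mean(treatment counts) - mean(control counts)."""
        mean_t = sum(treatment_counts) / len(treatment_counts)
        mean_c = sum(control_counts) / len(control_counts)
        return mean_t - mean_c

    # Illustrative counts only: 3 control items, +1 sensitive item in treatment
    treatment = [2, 3, 1, 2, 3, 2]
    control = [1, 2, 1, 2, 2, 1]
    print(list_experiment_estimate(treatment, control))
    ```

    The anonymity protection comes from respondents reporting only a count, never which items they endorsed; the estimator recovers prevalence in aggregate.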

Cite
Weinan Dong; Cecilia Clara Da Roza; Dandan Cheng; Dahao Zhang; Yuling Xiang; Wai Kay Seto; William C. W. Wong (2024). Development and validation of HBV surveillance models using big data and machine learning [Dataset]. http://doi.org/10.6084/m9.figshare.25201473.v1

Data from: Development and validation of HBV surveillance models using big data and machine learning

Related Article
Explore at:
docxAvailable download formats
Dataset updated
Dec 3, 2024
Dataset provided by
Taylor & Francis
Authors
Weinan Dong; Cecilia Clara Da Roza; Dandan Cheng; Dahao Zhang; Yuling Xiang; Wai Kay Seto; William C. W. Wong
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The construction of a robust healthcare information system is fundamental to enhancing countries’ capabilities in the surveillance and control of hepatitis B virus (HBV). Making use of China’s rapidly expanding primary healthcare system, this innovative approach using big data and machine learning (ML) could help towards the World Health Organization’s (WHO) HBV infection elimination goals of reaching 90% diagnosis and treatment rates by 2030. We aimed to develop and validate HBV detection models using routine clinical data to improve the detection of HBV and support the development of effective interventions to mitigate the impact of this disease in China. Relevant data records extracted from the Family Medicine Clinic of the University of Hong Kong-Shenzhen Hospital’s Hospital Information System were structuralized using state-of-the-art Natural Language Processing techniques. Several ML models have been used to develop HBV risk assessment models. The performance of the ML model was then interpreted using the Shapley value (SHAP) and validated using cohort data randomly divided at a ratio of 2:1 using a five-fold cross-validation framework. The patterns of physical complaints of patients with and without HBV infection were identified by processing 158,988 clinic attendance records. After removing cases without any clinical parameters from the derivation sample (n = 105,992), 27,392 cases were analysed using six modelling methods. A simplified model for HBV using patients’ physical complaints and parameters was developed with good discrimination (AUC = 0.78) and calibration (goodness of fit test p-value >0.05). Suspected case detection models of HBV, showing potential for clinical deployment, have been developed to improve HBV surveillance in primary care setting in China. 
This study developed a suspected case detection model for HBV that can facilitate early identification and treatment of HBV in the primary care setting in China, contributing towards the achievement of the WHO's HBV elimination goals. We utilized state-of-the-art natural language processing techniques to structure the data records, leading to the development of a robust healthcare information system that enhances the surveillance and control of HBV in China.
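The description reports validation on cohort data randomly divided at a 2:1 ratio within a five-fold cross-validation framework. A minimal stdlib sketch of generating such splits, assuming a simple shuffled 2:1 derivation/validation split and round-robin fold assignment (illustrative only, not the authors' code):

```python
import random

def two_to_one_split(ids, seed=0):
    """Randomly split record IDs 2:1 into derivation and validation sets."""
    rng = random.Random(seed)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

def five_fold_indices(n):
    """Yield (train, test) index lists for 5-fold cross-validation."""
    folds = [list(range(i, n, 5)) for i in range(5)]
    for k in range(5):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

# 27,392 analysed cases, as in the description above
derivation, validation = two_to_one_split(list(range(27392)))
print(len(derivation), len(validation))  # 18261 9131
```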
