41 datasets found
  1. Time to Update the Split-Sample Approach in Hydrological Model Calibration...

    • zenodo.org
    zip
    Updated May 31, 2022
    Cite
    Hongren Shen; Bryan A. Tolson; Juliane Mai (2022). Time to Update the Split-Sample Approach in Hydrological Model Calibration v1.0 [Dataset]. http://doi.org/10.5281/zenodo.5915374
    Available download formats: zip
    Dataset updated
    May 31, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hongren Shen; Bryan A. Tolson; Juliane Mai
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Time to Update the Split-Sample Approach in Hydrological Model Calibration

    Hongren Shen1, Bryan A. Tolson1, Juliane Mai1

    1Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, Ontario, Canada

    Corresponding author: Hongren Shen (hongren.shen@uwaterloo.ca)

    Abstract

    Model calibration and validation are critical in hydrological model robustness assessment. Unfortunately, the commonly-used split-sample test (SST) framework for data splitting requires modelers to make subjective decisions without clear guidelines. This large-sample SST assessment study empirically assesses how different data splitting methods influence post-validation model testing period performance, thereby identifying optimal data splitting methods under different conditions. This study investigates the performance of two lumped conceptual hydrological models calibrated and tested in 463 catchments across the United States using 50 different data splitting schemes. These schemes are established regarding the data availability, length and data recentness of the continuous calibration sub-periods (CSPs). A full-period CSP is also included in the experiment, which skips model validation. The assessment approach is novel in multiple ways including how model building decisions are framed as a decision tree problem and viewing the model building process as a formal testing period classification problem, aiming to accurately predict model success/failure in the testing period. Results span different climate and catchment conditions across a 35-year period with available data, making conclusions quite generalizable. Calibrating to older data and then validating models on newer data produces inferior model testing period performance in every single analysis conducted and should be avoided. Calibrating to the full available data and skipping model validation entirely is the most robust split-sample decision. Experimental findings remain consistent no matter how model building factors (i.e., catchments, model types, data availability, and testing periods) are varied. Results strongly support revising the traditional split-sample approach in hydrological modeling.

    Data description

    This data was used in the paper entitled "Time to Update the Split-Sample Approach in Hydrological Model Calibration" by Shen et al. (2022).

    Catchment, meteorological forcing and streamflow data are provided for hydrological modeling use. Specifically, the forcing and streamflow data are archived in the Raven hydrological modeling required format. The GR4J and HMETS model building results in the paper, i.e., reference KGE and KGE metrics in calibration, validation and testing periods, are provided for replication of the split-sample assessment performed in the paper.

    Data content

    The data folder contains a gauge info file (CAMELS_463_gauge_info.txt), which reports basic information of each catchment, and 463 subfolders, each having four files for a catchment, including:

    (1) Raven_Daymet_forcing.rvt, which contains Daymet meteorological forcing (i.e., daily precipitation in mm/d, minimum and maximum air temperature in deg_C, shortwave in MJ/m2/day, and day length in day) from Jan 1st 1980 to Dec 31 2014 in a Raven hydrological modeling required format.

    (2) Raven_USGS_streamflow.rvt, which contains daily discharge data (in m3/s) from Jan 1st 1980 to Dec 31 2014 in a Raven hydrological modeling required format.

    (3) GR4J_metrics.txt, which contains reference KGE and GR4J-based KGE metrics in calibration, validation and testing periods.

    (4) HMETS_metrics.txt, which contains reference KGE and HMETS-based KGE metrics in calibration, validation and testing periods.

    Data collection and processing methods

    Data source

    • Catchment information and the Daymet meteorological forcing are retrieved from the CAMELS data set, which can be found here.
    • The USGS streamflow data are collected from the U.S. Geological Survey's (USGS) National Water Information System (NWIS), which can be found here.
    • The GR4J and HMETS performance metrics (i.e., reference KGE and KGE) are produced in the study by Shen et al. (2022).

    Forcing data processing

    • A quality assessment procedure was performed. For example, the daily maximum air temperature should be no lower than the daily minimum air temperature; where this was violated, the two values were swapped.
    • Units are converted to Raven-required ones. Precipitation: mm/day, unchanged; daily minimum/maximum air temperature: deg_C, unchanged; shortwave: W/m2 to MJ/m2/day; day length: seconds to days.
    • Data for a catchment is archived in a RVT (ASCII-based) file, in which the second line specifies the start time of the forcing series, the time step (= 1 day), and the total time steps in the series (= 12784), respectively; the third and the fourth lines specify the forcing variables and their corresponding units, respectively.
    • More details of Raven formatted forcing files can be found in the Raven manual (here).
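    The quality check and unit conversions described above can be sketched as follows (a minimal sketch; the helper names are ours and are not part of the published processing code):

```python
def qa_swap_temperatures(tmin_c, tmax_c):
    """Ensure daily max temperature is not below daily min; swap if needed."""
    if tmax_c < tmin_c:
        tmin_c, tmax_c = tmax_c, tmin_c
    return tmin_c, tmax_c

def shortwave_wm2_to_mj_per_day(sw_wm2):
    """Convert shortwave from W/m2 to MJ/m2/day: 1 W/m2 = 86400 J/m2/day = 0.0864 MJ/m2/day."""
    return sw_wm2 * 86400.0 / 1e6

def daylength_seconds_to_days(dayl_s):
    """Convert day length from seconds to days."""
    return dayl_s / 86400.0
```

    Precipitation and air temperature are already in Raven's expected units (mm/day and deg_C), so they pass through unchanged.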

    Streamflow data processing

    • Units are converted to Raven-required ones. Daily discharge originally in cfs is converted to m3/s.
    • Missing data are replaced with -1.2345 as Raven requires. Those missing time steps will not be counted in performance metrics calculation.
    • Streamflow series are archived in RVT (ASCII-based) files, each opening with eight comment lines that specify relevant gauge and streamflow information, such as gauge name, gauge ID, USGS-reported catchment area, calculated catchment area (based on the catchment shapefiles in the CAMELS dataset), streamflow data range, data time step, and missing data periods. The first line after the comment lines specifies the data type (default is HYDROGRAPH), the subbasin ID (i.e., SubID), and the discharge unit (m3/s), respectively. The next line specifies the start of the streamflow data, the time step (= 1 day), and the total time steps in the series (= 12784), respectively.
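    The discharge conversion and missing-value convention above can be sketched as follows (the factor 0.0283168 m3 per cubic foot is the standard conversion; the function name is ours):

```python
CFS_TO_CMS = 0.0283168   # 1 cubic foot per second = 0.0283168 m3/s
RAVEN_MISSING = -1.2345  # Raven's sentinel for missing time steps

def discharge_cfs_to_cms(q_cfs):
    """Convert a USGS daily discharge value from cfs to m3/s.

    Missing observations (None) are mapped to Raven's missing-data sentinel,
    which Raven then excludes from performance metric calculations.
    """
    if q_cfs is None:
        return RAVEN_MISSING
    return q_cfs * CFS_TO_CMS
```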

    GR4J and HMETS metrics

    The GR4J and HMETS metrics files consist of reference KGE and KGE in the model calibration, validation, and testing periods, which are derived from the massive split-sample test experiment performed in the paper.

    • Columns in these metrics files are gauge ID, calibration sub-period (CSP) identifier, KGE in calibration, validation, testing1, testing2, and testing3, respectively.
    • We proposed 50 different CSPs in the experiment. "CSP_identifier" is a unique name for each CSP, e.g., the CSP identifier "CSP-3A_1990" indicates that the model is built on Jan 1st, 1990, calibrated on the first 3-year sample (1981-1983), and validated on the remaining years in the period 1980 to 1989. Note that 1980 is always used for spin-up.
    • We defined three testing periods (independent of the calibration and validation periods) for each CSP: the first 3 years from the model build year inclusive, the first 5 years from the model build year inclusive, and all years from the model build year inclusive. e.g., "testing1", "testing2", and "testing3" for CSP-3A_1990 are 1990-1992, 1990-1994, and 1990-2014, respectively.
    • Reference flow is the interannual mean daily flow based on a specific period, which is derived for a one-year period and then repeated in each year in the calculation period.
      • For calibration, its reference flow is based on spin-up + calibration periods.
      • For validation, its reference flow is based on spin-up + calibration periods.
      • For testing, its reference flow is based on spin-up + calibration + validation periods.
    • Reference KGE is calculated from the reference flow and the observed streamflow in a specific calculation period (e.g., calibration). It is computed with the KGE equation, substituting the reference flow for the simulated flow in the period of calculation. Note that the reference KGEs for the three testing periods correspond to the same historical period but differ, because each testing period spans a different time window and covers a different series of observed flow.
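    The reference-KGE construction can be sketched with the standard KGE equation, KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2), where r is the correlation, alpha the ratio of standard deviations, and beta the ratio of means. This is a plain-Python sketch with our own function names; the paper's actual metrics were produced within the Raven workflow:

```python
import math

def kge(sim, obs):
    """Kling-Gupta efficiency between a simulated and an observed series."""
    n = len(obs)
    mu_s, mu_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((x - mu_s) ** 2 for x in sim) / n)
    sd_o = math.sqrt(sum((x - mu_o) ** 2 for x in obs) / n)
    r = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / (n * sd_s * sd_o)
    alpha, beta = sd_s / sd_o, mu_s / mu_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def reference_kge(one_year_mean_flow, obs_by_year):
    """Score the interannual mean daily flow, repeated over each year of the
    calculation period, against the observed flow using the KGE equation
    (i.e., the reference flow stands in for the simulated flow)."""
    ref = one_year_mean_flow * len(obs_by_year)      # repeat for each year
    obs = [q for year in obs_by_year for q in year]  # flatten observations
    return kge(ref, obs)
```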

    More details of the split-sample test experiment and the modeling results analysis can be found in the paper by Shen et al. (2022).

    Citation

    Journal Publication

    This study:

    Shen, H., Tolson, B. A., & Mai, J. (2022). Time to update the split-sample approach in hydrological model calibration. Water Resources Research, 58, e2021WR031523. https://doi.org/10.1029/2021WR031523

    Original CAMELS dataset:

    A. J. Newman, M. P. Clark, K. Sampson, A. Wood, L. E. Hay, A. Bock, R. J. Viger, D. Blodgett, L. Brekke, J. R. Arnold, T. Hopson, and Q. Duan (2015). Development of a large-sample watershed-scale hydrometeorological dataset for the contiguous USA: dataset characteristics and assessment of regional variability in hydrologic model performance. Hydrol. Earth Syst. Sci., 19, 209-223, http://doi.org/10.5194/hess-19-209-2015

    Data Publication

    This study:

    H. Shen, B.

  2. Data for: Identification of hindered internal rotational mode for complex...

    • data.mendeley.com
    Updated Nov 21, 2017
    + more versions
    Cite
    Lam Huynh (2017). Data for: Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model [Dataset]. http://doi.org/10.17632/snstf5rd5n.1
    Dataset updated
    Nov 21, 2017
    Authors
    Lam Huynh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The "Dataset_HIR" folder contains the data to reproduce the results of the data mining approach proposed in the manuscript titled "Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model".

    More specifically, the folder contains the raw electronic structure calculation input data provided by the domain experts as well as the training and testing dataset with the extracted features.

    The "Dataset_HIR" folder contains the following subfolders namely:

    1. Electronic structure calculation input data: contains the electronic structure calculation input generated by the Gaussian program

      1.1. Training data: contains the raw data of all training species (each is stored in a separate folder) used for extracting the dataset for the training and validation phases.

      1.2. Testing data: contains the raw data of all testing species (each is stored in a separate folder) used for extracting data for the testing phase.

    2. Dataset

      2.1. Training dataset: used to produce the results in Tables 3 and 4 in the manuscript

      + datasetTrain_raw.csv: contains the features for all vibrational modes associated with corresponding labeled species to let the chemists select the Hindered Internal Rotor from the list easily for the training and validation steps.  
      
      + datasetTrain.csv: refines the datasetTrain_raw.csv where the names of the species are all removed to transform the dataset into an appropriate form for the modeling and validation steps.
      

      2.2. Testing dataset: used to produce the results of the data mining approach in Table 5 in the manuscript.

      + datasetTest_raw.csv: contains the features for all vibrational modes of each labeled species to let the chemists select the Hindered Internal Rotor from the list for the testing step.
      
      + datasetTest.csv: refines the datasetTest_raw.csv where the names of the species are all removed to transform the dataset into an appropriate form for the testing step.
      

    Note for the Result feature in the dataset: 1 is for the mode needed to be treated as Hindered Internal Rotor, and 0 otherwise.
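    The Result encoding pairs with the manuscript's multivariate logistic regression model; a minimal prediction sketch follows (the weights, bias, and function name are illustrative, not taken from the manuscript):

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_hindered_rotor(features, weights, bias, threshold=0.5):
    """Multivariate logistic regression prediction for one vibrational mode:
    returns 1 if the mode is classified as a hindered internal rotor,
    0 otherwise, matching the dataset's Result feature encoding."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if sigmoid(z) >= threshold else 0
```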

  3. Analytical and Simulation Framework for Performance Validation of Complex...

    • data.nasa.gov
    • data.amerigeoss.org
    application/rdfxml +5
    Updated Jun 26, 2018
    Cite
    (2018). Analytical and Simulation Framework for Performance Validation of Complex Systems, Phase II [Dataset]. https://data.nasa.gov/dataset/Analytical-and-Simulation-Framework-for-Performanc/qdfg-i8yz
    Available download formats: application/rdfxml, json, csv, xml, tsv, application/rssxml
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    Next-generation aerospace systems will require increased autonomy to modify system behavior based on changing mission requirements, environmental factors, and system performance. For example, intelligent systems have been employed to improve safety by adaptively compensating for unexpected failures or damage. Despite many successful demonstrations of autonomous and intelligent control laws in simulations and flight tests, the difficulty associated with the verification, validation, and testing of adaptive and nondeterministic systems poses a significant barrier to their use in safety-critical systems. The proposed Phase II research addresses this challenge through the development of innovative V&V algorithms and easy-to-use software tools that will provide intelligent, automated, and interactive test plan generation and execution. The tool will integrate state-of-the-art analysis and numerical methods to automatically generate test vector sets, execute high-fidelity simulations or monitor pilot-in-the-loop simulations, analyze simulation results, and adapt/modify future test vectors based on observations to date. The result will be a significant reduction in cost associated with system testing while simultaneously offering a significant increase in the probability that system problems are uncovered early in the development process. The tool will have broad applicability for aerospace as well as non-aerospace applications.

  4. Global Testing Center of Excellence (TCoE) Market Size By Type Of Service,...

    • verifiedmarketresearch.com
    Updated Jul 18, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Testing Center of Excellence (TCoE) Market Size By Type Of Service, By Industry Vertical, By Organization Size, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/testing-center-of-excellence-market/
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2031
    Area covered
    Global
    Description

    Testing Center of Excellence (TCoE) Market size was valued at USD 51 Billion in 2023 and is projected to reach USD 96 Billion by 2031, growing at a CAGR of 12% during the forecast period 2024-2031.

    Global Testing Center of Excellence (TCoE) Market Drivers
    Increased Demand for Quality Assurance

    Customer Expectations: Consumers today expect seamless, bug-free experiences. Any minor glitch can lead to user dissatisfaction, damaging brand reputation.
    Market Differentiation: In a saturated market, quality can be a key differentiator. Companies that can guarantee high-performing, reliable software often hold a competitive edge.
    Complexity of Applications: As applications become more complex, integrating multiple features and services, the risks of defects increase. This requires robust QA processes.
    IoT and Connected Devices: The rise of the Internet of Things (IoT) has further heightened the need for thorough QA, as the interconnectivity of devices adds layers of complexity and potential failure points.
    Global Market Reach: Companies releasing products on a global stage need to assure quality in various markets, considering diverse user needs, regulatory conditions, and usage environments.
    Rising Adoption of Agile and DevOps
    Faster Time-to-Market: Agile and DevOps focus on iterative development and continuous delivery, which necessitates regular and rigorous testing to ensure that each iteration and deployment is reliable.
    Integrated Teams: The blending of development, operations, and QA teams in DevOps promotes a culture of shared responsibility for quality, fortifying the significance of a structured TCoE.
    Automated Testing: Both methodologies emphasize automation to speed up processes. A TCoE can establish best practices for automated testing, ensuring consistency and reliability across the board.
    Continuous Feedback: Agile and DevOps rely on continuous feedback mechanisms. A specialized TCoE can streamline the capture, analysis, and implementation of feedback into the testing cycles.
    Scalability: As organizations scale their Agile and DevOps practices, a centralized Testing Center of Excellence can help in maintaining standards and managing resources efficiently.
    Regulatory Compliance Requirements
    Avoidance of Penalties: Non-compliance with regulations can lead to hefty fines and legal repercussions. Organizations are motivated to invest in quality assurance frameworks that ensure adherence to regulatory standards.
    Industry-Specific Regulations: Sectors such as finance, healthcare, and telecommunications experience tighter regulations. A TCoE can specialize in the specific compliance requirements of these industries to ensure that all testing processes and outcomes align with legal mandates.
    Data Security and Privacy: With regulations such as GDPR, organizations must ensure that their software applications handle data securely and privately. Comprehensive testing within a TCoE can preempt compliance issues.
    Audits and Validation: Regular audits are a part of regulatory compliance. A TCoE can help organizations prepare for these audits by maintaining thorough documentation and validation processes.
    Global Standards: Companies operating globally face the challenge of multi-jurisdictional compliance. A centralized TCoE can help standardize testing processes to adhere to different regulatory frameworks simultaneously.
    Need for Cost Efficiency
    Resource Utilization: A centralized TCoE can optimize the use of testing resources, ensuring that skilled personnel and tools are employed where they are most needed, avoiding redundancy and inefficiencies.
    Standardization: Standardized testing processes and tools across projects lead to reduced training costs and faster onboarding of new team members.
    Automation: By adopting automated testing processes, a TCoE can reduce the time and cost associated with manual testing, while also minimizing human errors.
    Economies of Scale: Centralizing testing services can lead to economies of scale, with bulk licensing of tools and technologies, and shared infrastructure costs.
    Risk Mitigation: Proactive and comprehensive testing reduces the risk of post-release defects, which can be costly to fix and damage customer relations. Prevention is generally more cost-effective than remediation.
    Vendor Management: A TCoE can effectively manage relationships with third-party vendors, negotiating better terms and ensuring that external services conform to internal quality standards.
    Complex IT Environments: The increasing complexity of IT environments and applications requires specialized testing capabilities, which TCoEs can provide.
    Digital Transformation Initiatives: Ongoing digital transformation efforts across industries demand robust testing frameworks to ensure the reliability and performance of digital solutions.
    Increased Outsourcing and Offshoring: Companies are increasingly outsourcing their testing functions to TCoEs to leverage specialized expertise and focus on core business activities.
    Integration of Automation and AI in Testing: The integration of automation tools and AI in testing processes enhances efficiency and accuracy, making TCoEs more attractive to organizations.
    Growing Need for Cybersecurity: With the rising threat of cyberattacks, organizations need comprehensive testing frameworks to ensure robust security measures, which TCoEs can provide.
    Globalization and Collaboration: The need for global collaboration and standardized testing practices across geographically dispersed teams drives the demand for TCoEs.

  5. Model performance per dataset.

    • plos.figshare.com
    xls
    Updated May 2, 2024
    + more versions
    Cite
    Erik D. Huckvale; Hunter N. B. Moseley (2024). Model performance per dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0299583.t010
    Available download formats: xls
    Dataset updated
    May 2, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Erik D. Huckvale; Hunter N. B. Moseley
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The mapping of metabolite-specific data to pathways within cellular metabolism is a major data analysis step needed for biochemical interpretation. A variety of machine learning approaches, particularly deep learning approaches, have been used to predict these metabolite-to-pathway mappings, utilizing a training dataset of known metabolite-to-pathway mappings. A few such training datasets have been derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG). However, several prior published machine learning approaches utilized an erroneous KEGG-derived training dataset that used SMILES molecular representation strings (the KEGG-SMILES dataset) and contained a sizable proportion (~26%) of duplicate entries. The presence of so many duplicates taints the training and testing sets generated from k-fold cross-validation of the KEGG-SMILES dataset. Therefore, the k-fold cross-validation performance of the resulting machine learning models was grossly inflated by the erroneous presence of these duplicate entries. Here we describe and evaluate the KEGG-SMILES dataset so that others may avoid using it. We also identify the prior publications that utilized this erroneous KEGG-SMILES dataset so their machine learning results can be properly and critically evaluated. In addition, we demonstrate the reduction of model k-fold cross-validation (CV) performance after de-duplicating the KEGG-SMILES dataset. This is a cautionary tale about properly vetting prior published benchmark datasets before using them in machine learning approaches. We hope others will avoid similar mistakes.
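    The de-duplication step implied above can be sketched minimally: drop repeated SMILES strings before any cross-validation split, so identical molecules cannot land in both a training fold and a testing fold (the function name and record layout are ours):

```python
def deduplicate_by_smiles(records):
    """Remove duplicate SMILES entries, keeping the first occurrence of each.

    `records` is an iterable of (smiles, label) pairs; the returned list is
    safe to pass to a k-fold splitter without leakage from exact duplicates.
    """
    seen, unique = set(), []
    for smiles, label in records:
        if smiles not in seen:
            seen.add(smiles)
            unique.append((smiles, label))
    return unique
```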

  6. Preclinical Software For Physiology DA And AS Market Analysis North America,...

    • technavio.com
    Updated Jan 30, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Technavio (2025). Preclinical Software For Physiology DA And AS Market Analysis North America, Europe, Asia, Rest of World (ROW) - US, UK, Germany, Japan, Canada, France, China, Italy, India, The Netherlands - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/preclinical-software-for-physiology-data-assessment-and-animal-supervision-market-industry-analysis
    Explore at:
    Dataset updated
    Jan 30, 2025
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2021 - 2025
    Area covered
    Germany, United Kingdom, United States, Global
    Description


    Preclinical Software For Physiology DA and AS Market Size 2025-2029

    The preclinical software for physiology DA and AS market size is forecast to increase by USD 4.38 billion at a CAGR of 6% between 2024 and 2029.

    Preclinical software for physiology in the global market is witnessing significant growth due to several key trends. The emerging role of bioinformatics tools and software in preclinical research is a major growth factor. Bioinformatics tools are increasingly being used to analyze large datasets generated during preclinical studies, enabling researchers to gain deeper insights into the biological mechanisms underlying various diseases. Another trend driving market growth is the rising digitalization of preclinical research. The adoption of digital technologies, such as electronic data capture systems and cloud-based platforms, is streamlining research processes and improving data accuracy and accessibility. The software facilitates data visualization, scientific collaboration, and data analysis, enabling researchers to make informed decisions in areas such as neurology, gene therapy, pharmacokinetic studies, and biosimilar development. Furthermore, the stringent ethical framework for using animals in preclinical research is pushing the need for advanced software solutions to ensure compliance with regulations and improve animal welfare. These trends are expected to continue shaping the preclinical software for physiology market in the coming years.
    

    What will be the Size of the Preclinical Software For Physiology DA And AS Market During the Forecast Period?


    In the dynamic and innovative realm of biomedical research, preclinical software for physiology DA and AS plays a pivotal role in drug discovery and scientific advancement. This market caters to various sectors, including pharmaceutical research, academic labs, government labs, and biotechnology, to streamline laboratory automation, ensure regulatory compliance, and enhance data security. It also supports data integrity, pharmacovigilance, research infrastructure, and clinical trial design. Furthermore, it caters to specialized fields like cardiology research, drug repurposing, biomarker development, precision medicine, and pharmacodynamic studies.
    Moreover, with a strong emphasis on data security and regulatory compliance, this software is indispensable for pharmaceutical R&D outsourcing and pharmaceutical consulting. In summary, preclinical software for physiology DA and AS is a vital tool in the life sciences analytics sector, driving scientific progress and ensuring the integrity and security of research data.
    

    How is this Preclinical Software For Physiology DA And AS Industry segmented and which is the largest segment?

    The preclinical software for physiology DA and AS industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.

    End-user
      • Industrial labs and CROs
      • Academic, government, and research labs

    Deployment
      • On-premises
      • Cloud

    Geography
      • North America: Canada, US
      • Europe: Germany, UK, France, Italy
      • Asia: China, India, Japan
      • Rest of World (ROW)

    By End-user Insights

    The industrial labs and CROs segment is estimated to witness significant growth during the forecast period.
    

    Preclinical software plays a crucial role in the research and development of new drugs and therapies in the pharmaceutical and biotechnology industries. Many companies outsource their preclinical research to Contract Research Organizations (CROs), which offer advanced technologies and facilities, including preclinical software. CROs execute various research activities, from fundamental research to late-stage development, encompassing tasks like genetic engineering, animal testing, assay development, target validation, and clinical trials. Pharmaceutical and biotech firms specializing in chronic conditions and disorders may prefer small-scale CROs or conduct preclinical research in-house. Preclinical software solutions facilitate physiology data assessment, compound management, cardiology, chemistry, and toxicology testing, and adhere to drug approval standards.


    The industrial labs and CROs segment was valued at USD 6.71 billion in 2019 and showed a gradual increase during the forecast period.

    Regional Analysis

    North America is estimated to contribute 44% to the growth of the global market during the forecast period.
    

    Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period.

    For more insights on the market share of various regio

  7. Time to Update the Split-Sample Approach in Hydrological Model Calibration...

    • zenodo.org
    • data.subak.org
    • +1more
    bin
    Updated May 31, 2022
    + more versions
    Cite
    Hongren Shen; Bryan A. Tolson; Juliane Mai (2022). Time to Update the Split-Sample Approach in Hydrological Model Calibration v1.1 [Dataset]. http://doi.org/10.5281/zenodo.6578924
    Available download formats: bin
    Dataset updated
    May 31, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Hongren Shen; Bryan A. Tolson; Juliane Mai
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Time to Update the Split-Sample Approach in Hydrological Model Calibration

    Hongren Shen1, Bryan A. Tolson1, Juliane Mai1

    1Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, Ontario, Canada

    Corresponding author: Hongren Shen (hongren.shen@uwaterloo.ca)

    Abstract

    Model calibration and validation are critical in hydrological model robustness assessment. Unfortunately, the commonly-used split-sample test (SST) framework for data splitting requires modelers to make subjective decisions without clear guidelines. This large-sample SST assessment study empirically assesses how different data splitting methods influence post-validation model testing period performance, thereby identifying optimal data splitting methods under different conditions. This study investigates the performance of two lumped conceptual hydrological models calibrated and tested in 463 catchments across the United States using 50 different data splitting schemes. These schemes are established regarding the data availability, length and data recentness of the continuous calibration sub-periods (CSPs). A full-period CSP is also included in the experiment, which skips model validation. The assessment approach is novel in multiple ways including how model building decisions are framed as a decision tree problem and viewing the model building process as a formal testing period classification problem, aiming to accurately predict model success/failure in the testing period. Results span different climate and catchment conditions across a 35-year period with available data, making conclusions quite generalizable. Calibrating to older data and then validating models on newer data produces inferior model testing period performance in every single analysis conducted and should be avoided. Calibrating to the full available data and skipping model validation entirely is the most robust split-sample decision. Experimental findings remain consistent no matter how model building factors (i.e., catchments, model types, data availability, and testing periods) are varied. Results strongly support revising the traditional split-sample approach in hydrological modeling.

    Version updates

    v1.1 Updated on May 19, 2022. We added hydrographs for each catchment.

    The v1.1 upload is split into eight zipped parts. Download all eight parts and unzip them together.

    In this update, we added two zipped files in each gauge subfolder:

    (1) GR4J_Hydrographs.zip and

    (2) HMETS_Hydrographs.zip

    Each of the zip files contains 50 CSV files, named using the model name, gauge ID, and calibration sub-period (CSP) identifier.

    Each hydrograph CSV file contains four key columns:

    (1) Date time (note that the hour column is less significant since this is daily data);

    (2) Precipitation in mm, i.e., the aggregated basin-mean precipitation;

    (3) Simulated streamflow in m3/s, in a column named "subXXX", where XXX is the ID of the catchment specified in the CAMELS_463_gauge_info.txt file; and

    (4) Observed streamflow in m3/s, in a column named "subXXX(observed)".

    Note that these hydrograph CSV files report period-ending, time-averaged flows. They were produced directly by the Raven hydrological modeling framework. More information about the format of the hydrograph CSV files is available on the Raven webpage.
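    As a minimal sketch of how these hydrograph CSVs can be consumed (the gauge ID sub01013500 and the exact header spellings below are hypothetical; real files follow the subXXX / subXXX(observed) column convention described above):

    ```python
    import csv
    import io

    # Hypothetical excerpt of a hydrograph CSV for catchment "sub01013500";
    # real Raven output follows the column naming described above.
    sample = (
        "date,hour,precip_mm,sub01013500,sub01013500(observed)\n"
        "1990-01-01,00:00:00,3.2,12.5,11.9\n"
        "1990-01-02,00:00:00,0.0,11.8,11.4\n"
    )

    rows = list(csv.DictReader(io.StringIO(sample)))
    sim = [float(r["sub01013500"]) for r in rows]            # simulated flow, m3/s
    obs = [float(r["sub01013500(observed)"]) for r in rows]  # observed flow, m3/s
    ```
    
    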

    v1.0 First version published on Jan 29, 2022.

    Data description

    This data was used in the paper entitled "Time to Update the Split-Sample Approach in Hydrological Model Calibration" by Shen et al. (2022).

    Catchment, meteorological forcing and streamflow data are provided for hydrological modeling use. Specifically, the forcing and streamflow data are archived in the format required by the Raven hydrological modeling framework. The GR4J and HMETS model building results in the paper, i.e., reference KGE and KGE metrics in the calibration, validation and testing periods, are provided for replication of the split-sample assessment performed in the paper.

    Data content

    The data folder contains a gauge info file (CAMELS_463_gauge_info.txt), which reports basic information of each catchment, and 463 subfolders, each having four files for a catchment, including:

    (1) Raven_Daymet_forcing.rvt, which contains Daymet meteorological forcing (i.e., daily precipitation in mm/d, minimum and maximum air temperature in deg_C, shortwave in MJ/m2/day, and day length in days) from Jan 1st 1980 to Dec 31 2014 in the Raven-required format.

    (2) Raven_USGS_streamflow.rvt, which contains daily discharge data (in m3/s) from Jan 1st 1980 to Dec 31 2014 in the Raven-required format.

    (3) GR4J_metrics.txt, which contains reference KGE and GR4J-based KGE metrics in calibration, validation and testing periods.

    (4) HMETS_metrics.txt, which contains reference KGE and HMETS-based KGE metrics in calibration, validation and testing periods.

    Data collection and processing methods

    Data source

    • Catchment information and the Daymet meteorological forcing are retrieved from the CAMELS data set, which can be found here.
    • The USGS streamflow data are collected from the U.S. Geological Survey's (USGS) National Water Information System (NWIS), which can be found here.
    • The GR4J and HMETS performance metrics (i.e., reference KGE and KGE) are produced in the study by Shen et al. (2022).

    Forcing data processing

    • A quality assessment procedure was performed. For example, the daily maximum air temperature should be larger than the daily minimum air temperature; otherwise, the two values were swapped.
    • Units are converted to Raven-required ones. Precipitation: mm/day, unchanged; daily minimum/maximum air temperature: deg_C, unchanged; shortwave: W/m2 to MJ/m2/day; day length: seconds to days.
    • Data for a catchment is archived in an RVT (ASCII-based) file, in which the second line specifies the start time of the forcing series, the time step (= 1 day), and the total number of time steps in the series (= 12784); the third and fourth lines specify the forcing variables and their corresponding units, respectively.
    • More details of Raven formatted forcing files can be found in the Raven manual (here).
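    The QA swap and the unit conversions above can be sketched as follows (a simplified illustration with our own function and variable names, not the dataset's processing code):

    ```python
    # Conversion factor: 1 W/m2 sustained over a day = 86400 J/m2/day = 0.0864 MJ/m2/day
    W_M2_TO_MJ_M2_DAY = 86400 / 1e6
    SECONDS_PER_DAY = 86400.0

    def clean_forcing(tmin_c, tmax_c, shortwave_w_m2, daylength_s):
        """Apply the quality check and unit conversions described above."""
        if tmax_c < tmin_c:                   # quality check: swap inverted temperatures
            tmin_c, tmax_c = tmax_c, tmin_c
        shortwave_mj = shortwave_w_m2 * W_M2_TO_MJ_M2_DAY  # W/m2 -> MJ/m2/day
        daylength_d = daylength_s / SECONDS_PER_DAY        # seconds -> days
        return tmin_c, tmax_c, shortwave_mj, daylength_d
    ```
    
    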

    Streamflow data processing

    • Units are converted to Raven-required ones. Daily discharge originally in cfs is converted to m3/s.
    • Missing data are replaced with -1.2345 as Raven requires. Those missing time steps will not be counted in performance metrics calculation.
    • The streamflow series is archived in an RVT (ASCII-based) file, which opens with eight comment lines specifying relevant gauge and streamflow data information, such as gauge name, gauge ID, USGS-reported catchment area, calculated catchment area (based on the catchment shapefiles in the CAMELS dataset), streamflow data range, data time step, and missing data periods. The first line after the comment lines specifies the data type (default is HYDROGRAPH), subbasin ID (i.e., SubID), and discharge unit (m3/s), respectively. The next line specifies the start of the streamflow data, the time step (= 1 day), and the total number of time steps in the series (= 12784), respectively.
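    The two streamflow processing rules above can be sketched as follows (our own helper for illustration; gaps are represented here as None):

    ```python
    CFS_TO_M3S = 0.0283168   # 1 cubic foot per second in m3/s
    RAVEN_MISSING = -1.2345  # Raven's missing-data code, as noted above

    def to_raven_flow(discharge_cfs):
        """Convert a daily discharge series from cfs to m3/s, writing Raven's
        missing-data code wherever an observation is absent (None)."""
        return [RAVEN_MISSING if q is None else round(q * CFS_TO_M3S, 4)
                for q in discharge_cfs]
    ```
    
    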

    GR4J and HMETS metrics

    The GR4J and HMETS metrics files consist of reference KGE and KGE values for the model calibration, validation, and testing periods, derived from the large split-sample test experiment performed in the paper.

    • Columns in these metrics files are gauge ID, calibration sub-period (CSP) identifier, KGE in calibration, validation, testing1, testing2, and testing3, respectively.
    • We proposed 50 different CSPs in the experiment. The "CSP_identifier" is a unique name for each CSP. For example, the identifier "CSP-3A_1990" means the model is built on Jan 1st 1990, calibrated on the first 3-year sample (1981-1983), and validated on the remaining years of the 1980-1989 period. Note that 1980 is always used for spin-up.
    • We defined three testing periods (independent of the calibration and validation periods) for each CSP: the first 3 years from the model build year inclusive, the first 5 years from the model build year inclusive, and all years from the model build year inclusive. For example, "testing1", "testing2", and "testing3" for CSP-3A_1990 are 1990-1992, 1990-1994, and 1990-2014, respectively.
    • Reference flow is the interannual mean daily flow based on a specific period, which is derived for a one-year period and then repeated in each year in the calculation period.
      • For calibration, its reference flow is based on spin-up + calibration periods.
      • For validation, its reference flow is based on spin-up + calibration periods.
      • For testing, its reference flow is based on spin-up + calibration + validation periods.
    • Reference KGE is calculated based on the reference flow and the observed streamflow in the corresponding calculation period.

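    The reference-flow construction described above can be sketched as follows (a simplified illustration, not the paper's code; the day-of-year averaging is keyed on (month, day)):

    ```python
    from collections import defaultdict
    from datetime import date

    def reference_flow(dates, flows):
        """Interannual mean daily flow: average the flow observed on each
        (month, day) across all years, then repeat that one-year pattern
        over every date in the calculation period."""
        by_day = defaultdict(list)
        for d, q in zip(dates, flows):
            by_day[(d.month, d.day)].append(q)
        day_mean = {k: sum(v) / len(v) for k, v in by_day.items()}
        return [day_mean[(d.month, d.day)] for d in dates]

    # Two Jan 1st observations across years average to a single reference value
    ref = reference_flow([date(1981, 1, 1), date(1982, 1, 1)], [2.0, 4.0])
    ```
    
    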
  8. Data from: Target and non-target analysis of xenobiotics in urine samples. Method optimization and validation

    • research.science.eus
    • ekoizpen-zientifikoa.ehu.eus
    Updated 2024
    Musatadi, Mikel; Anakabe Iturriaga, Eneritz; Olivares Zabalandikoetxea, Maitane; Etxebarria, Nestor; Zuloaga Zubieta, Olatz; Mijangos, Leire; De Angelis, Francesca; Prieto Sobrino, Ailette (2024). Target and non-target analysis of xenobiotics in urine samples. Method optimization and validation [Dataset]. https://research.science.eus/documentos/67a9c7bc19544708f8c70cb1
    Explore at:
    Dataset updated
    2024
    Authors
    Musatadi, Mikel; Anakabe Iturriaga, Eneritz; Olivares Zabalandikoetxea, Maitane; Etxebarria, Nestor; Zuloaga Zubieta, Olatz; Mijangos, Leire; De Angelis, Francesca; Prieto Sobrino, Ailette
    Description

    This is the collection of the metadata associated with target and non-target analysis of xenobiotics (optimization and validation) in urine samples obtained within the AQUASOMIC project and related to the following publications:

    Development and evaluation of a comprehensive workflow for suspect screening of exposome-related xenobiotics and phase II metabolites indiverse human biofluids (http://hdl.handle.net/10810/66708; https://doi.org/10.1016/j.chemosphere.2024.141221)

    The role of sample preparation in suspect and non-target screening for exposome analysis using human urine (http://hdl.handle.net/10810/63636; https://doi.org/10.1016/j.chemosphere.2023.139690)

    Sample preparation for suspect and non-target screening of xenobiotics in human biofluids by liquid chromatography —High resolution tandem mass spectrometry (http://hdl.handle.net/10810/66707; https://doi.org/10.1016/j.mex.2023.102501)

    From target analysis to suspect and non-target screening of endocrine-disrupting compounds in human urine (http://hdl.handle.net/10810/57486; https://doi.org/10.1007/s00216-022-04250-w)

    Description of tables included in the metadata - datasheet

    Table Z1 includes information and physico-chemical properties of benzophenones, bisphenols, parabens, phthalates, and antibacterials and biocides analyzed by low-resolution mass spectrometry (MS/MS).

    Table Z2 includes MS/MS conditions for the target analysis of selected endocrine disrupting compounds and 4 surrogates by LC-QQQ at pH 2.5 and 10.5, including the ionization mode, retention time, m/z fragments, fragmentor voltage (V) and collision energy (eV).

    Table Z3 includes instrumental and procedural limits of quantification (ng/g), upper limits (ng/g) and determination coefficients of the calibration curves for UHPLC-QqQ and UHPLC-qOrbitrap at pH 2.5 and pH 10.5 for selected endocrine disrupting compounds and 4 surrogates.

    Table Z4 includes apparent recoveries and matrix effect during solid-phase extraction of endocrine disrupting compounds in human urine using Oasis-HLB and mixed mode strong ion exchanger and cation exchanger cartridges.

    Table Z5 includes average (n=5) absolute (%) and apparent (%) recoveries and relative standard deviations (%) obtained at the (a) low (3 ng/g), (b) medium (6 ng/g) and (c) high (30 ng/g) spiking levels for selected endocrine disrupting compounds using OASIS HLB SPE cartridges.

    Table Z6 includes commercial information and physico-chemical properties of selected compounds determined using UHPLC-qOrbitrap.

    Table Z7 includes UHPLC-qOrbitrap conditions and limits of identification of pure chemical standards determined by HRMS.

    Table Z8 includes figures of merit (limits of quantification, trueness and precision) for selected compounds determined in non-hydrolysed urine samples by Oasis HLB SPE-RPLC-QqQ and Oasis HLB SPE-RPLC-qOrbitrap.

    Table Z9 includes figures of merit (limits of quantification, trueness and precision) for selected compounds determined in hydrolysed urine samples by Oasis HLB SPE-RPLC-QqQ and Oasis HLB SPE-RPLC-qOrbitrap.

    Table Z10 includes data pre-processing parameters for suspect screening analysis of urine samples using Compound Discoverer 3.2 and 3.3.

    Table Z11 includes commercial and chemical information of the compounds used as model calibrants for the retention time index model.

    Table Z12 includes absolute recoveries and matrix effect values obtained using liquid-liquid extraction, salt assisted liquid-liquid extraction and dilute-and-shoot for a selected number of xenobiotics.

    Table Z13 includes the list of xenobiotics and phase II metabolites annotated in real urine samples (n=5) with different sample preparation procedures (dilute-and-shoot, salt-assisted liquid-liquid extraction using acetonitrile or ethyl acetate, liquid-liquid extraction using ethyl acetate, and solid-phase extraction using Oasis HLB) and with/without hydrolysis, using suspect lists included in files 1, 2, 5 and 7 published in https://doi.org/10.5281/zenodo.14527541

    Table Z14 includes annotated suspects (levels 1 - 3) in the human biofluids at the positive ionization mode using suspect lists included in files 1, 2, 5 and 7 published in https://doi.org/10.5281/zenodo.14527541

    Table Z15 includes annotated suspects (levels 1 - 3) in the human biofluids at the negative ionization mode using suspect lists included in files 1, 2, 5 and 7 published in https://doi.org/10.5281/zenodo.14527541

  9. Continuous Integrated Invariant Inference, Phase I

    • data.nasa.gov
    application/rdfxml +5
    Updated Jun 26, 2018
    (2018). Continuous Integrated Invariant Inference, Phase I [Dataset]. https://data.nasa.gov/dataset/Continuous-Integrated-Invariant-Inference-Phase-I/tdqa-gatj
    Explore at:
    application/rssxml, tsv, csv, json, application/rdfxml, xmlAvailable download formats
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The proposed project will develop a new technique for invariant inference and embed this and other current invariant inference and checking techniques in an easy-to-use tool. The result will enhance an engineer's ability to use formal methods -- generating, editing, reviewing, proving and testing invariants -- and improve productivity in the verification and validation of safety and correctness properties of software.

    Currently, invariants that represent such properties require extensive human effort to write; automated techniques, though improving, are still insufficiently capable of automatically inferring them.

    The proposed project will develop innovative techniques to infer logical invariants describing the behavior of individual software modules by combining static (analyzing a program without running it) and hybrid analysis (inferring invariants from observations of executing software). In particular, the project will (a) combine concolic execution and hybrid analysis to find candidate invariants from high-branch-coverage test suites, (b) apply that combination to obtain invariants for individual functions and data structures, (c) iterate the analysis to broaden data coverage of the test suite and improve the accuracy of invariants, and (d) create early prototypes and development plans to integrate the resulting tools in selected IDEs (Eclipse and GrammaTech's CodeSonar tool).
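    The hybrid-analysis idea above (inferring invariants from observations of executing software) can be illustrated with a toy Daikon-style sketch; this is our own illustration of the general technique, not GrammaTech's tool or any real invariant-inference API:

    ```python
    # Toy dynamic invariant inference: keep a pool of candidate invariants over
    # observed variable values at a program point, and discard any candidate
    # that is falsified by at least one observed execution.
    candidates = {
        "x >= 0": lambda x, y: x >= 0,
        "x <= y": lambda x, y: x <= y,
        "x == y": lambda x, y: x == y,
    }

    # (x, y) pairs observed while running a (hypothetical) test suite
    observations = [(0, 5), (3, 3), (2, 7)]

    # Candidates never falsified by an observation survive as likely invariants
    surviving = sorted(name for name, holds in candidates.items()
                       if all(holds(x, y) for x, y in observations))
    ```

    In practice, a broader test suite (e.g., one with high branch coverage from concolic execution) falsifies more spurious candidates, which is why the proposal iterates the analysis to broaden data coverage.
    
    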

    In carrying out this project, GrammaTech will build on its static analysis tools, concolic engine, and software dynamic translation module. It will leverage its base of research and expertise in static and hybrid analysis, specification languages, automated SMT theorem provers, and GUI tools for program analysis and development. The commercialization prospects for the proposed project are enhanced by GrammaTech's demonstrated experience in producing prototypes and commercial products from research results.

  10. Data Acquisition Industry Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Nov 23, 2024
    AMA Research & Media LLP (2024). Data Acquisition Industry Report [Dataset]. https://www.datainsightsmarket.com/reports/data-acquisition-industry-10779
    Explore at:
    pdf, ppt, docAvailable download formats
    Dataset updated
    Nov 23, 2024
    Dataset provided by
    AMA Research & Media LLP
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Data Acquisition Industry market was valued at USD XXX Million in 2023 and is projected to reach USD XXX Million by 2032, at an expected CAGR of 7.10% over the forecast period. Data acquisition (DAQ) refers to the measurement of real-time physical phenomena such as temperature, pressure, voltage, current, acceleration and sound, which are converted into digital values that a computer can use for analysis and interpretation. A typical DAQ system consists of sensors, data acquisition hardware, and a computer fitted with dedicated software.

    DAQ systems are applied in several process industries for monitoring and control, data analysis, and intelligent decision-making. In manufacturing, they are used to monitor production lines, flag anomalies in the manufacturing process, and optimize it. In research and development, they acquire data from experiments, support analysis of results, and validate hypotheses. In healthcare, they monitor patient health parameters and support medical data analysis and remote patient monitoring. Other applications include environmental monitoring, energy management, and automotive testing. The increasing demand for real-time data and automation is driving demand for DAQ systems, while advances in sensors, data acquisition hardware, and software enable more complex and efficient data gathering and analysis. As the world becomes increasingly data-driven, data acquisition will take center stage, fueling innovation and shaping many industries.

    Recent developments include: June 2022 - Advantech announced the launch of a new series of data acquisition modules, the iDAQ series, a line of modular DAQ modules and chassis including the iDAQ-900 series chassis and iDAQ-700 and 800 series DAQ modules; December 2021 - ABB India partnered with the Indore smart city development to deploy digital technology that enables continuous electricity supply to homes and businesses. ABB's Compact Secondary Substations (CSS) used in the project reduce downtime by providing a steady and reliable power supply through digitally-enabled supervisory control and data acquisition (SCADA) solutions. Key drivers for this market are: growing adoption of industrial Ethernet solutions; increasing complexity in manufacturing establishments driving operators towards adoption of DAQ for design validation and testing; and technological advancements such as edge computing and TSN. Potential restraints include: cost implications and saturation in key markets could hinder growth over the forecast period. A notable trend is that aerospace and defense accounts for a significant share of the market.

  11. MSL Curiosity Rover Images with Science and Engineering Classes

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Sep 17, 2020
    Steven Lu; Steven Lu; Kiri L. Wagstaff; Kiri L. Wagstaff (2020). MSL Curiosity Rover Images with Science and Engineering Classes [Dataset]. http://doi.org/10.5281/zenodo.4033453
    Explore at:
    zipAvailable download formats
    Dataset updated
    Sep 17, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Steven Lu; Steven Lu; Kiri L. Wagstaff; Kiri L. Wagstaff
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Please note that the file msl-labeled-data-set-v2.1.zip below contains the latest images and labels associated with this data set.

    Data Set Description

    The data set consists of 6,820 images that were collected by the Mars Science Laboratory (MSL) Curiosity Rover by three instruments: (1) the Mast Camera (Mastcam) Left Eye; (2) the Mast Camera Right Eye; (3) the Mars Hand Lens Imager (MAHLI). With the help from Dr. Raymond Francis, a member of the MSL operations team, we identified 19 classes with science and engineering interests (see the "Classes" section for more information), and each image is assigned with 1 class label. We split the data set into training, validation, and test sets in order to train and evaluate machine learning algorithms. The training set contains 5,920 images (including augmented images; see the "Image Augmentation" section for more information); the validation set contains 300 images; the test set contains 600 images. The training set images were randomly sampled from sol (Martian day) range 1 - 948; validation set images were randomly sampled from sol range 949 - 1920; test set images were randomly sampled from sol range 1921 - 2224. All images are resized to 227 x 227 pixels without preserving the original height/width aspect ratio.

    Directory Contents

    • images - contains all 6,820 images
    • class_map.csv - string-integer class mappings
    • train-set-v2.1.txt - label file for the training set
    • val-set-v2.1.txt - label file for the validation set
    • test-set-v2.1.txt - label file for the test set

    The label files are formatted as below:

    "Image-file-name class_in_integer_representation"

    Labeling Process

    Each image was labeled with help from three different volunteers (see Contributor list). The final labels are determined using the following processes:

    • If all three labels agree with each other, then use the label as the final label.
    • If the three labels do not agree with each other, then we manually review the labels and decide the final label.
    • We also performed error analysis to correct labels as a post-processing step in order to remove noisy/incorrect labels in the data set.

    Classes

    There are 19 classes identified in this data set. In order to simplify our training and evaluation algorithms, we mapped the class names from string to integer representations. The names of classes, string-integer mappings, distributions are shown below:

    Class name, counts (training set), counts (validation set), counts (test set), integer representation

    Arm cover, 10, 1, 4, 0
    Other rover part, 190, 11, 10, 1
    Artifact, 680, 62, 132, 2
    Nearby surface, 1554, 74, 187, 3
    Close-up rock, 1422, 50, 84, 4
    DRT, 8, 4, 6, 5
    DRT spot, 214, 1, 7, 6
    Distant landscape, 342, 14, 34, 7
    Drill hole, 252, 5, 12, 8
    Night sky, 40, 3, 4, 9
    Float, 190, 5, 1, 10
    Layers, 182, 21, 17, 11
    Light-toned veins, 42, 4, 27, 12
    Mastcam cal target, 122, 12, 29, 13
    Sand, 228, 19, 16, 14
    Sun, 182, 5, 19, 15
    Wheel, 212, 5, 5, 16
    Wheel joint, 62, 1, 5, 17
    Wheel tracks, 26, 3, 1, 18

    Image Augmentation

    Only the training set contains augmented images: 3,920 of the 5,920 training images are augmented versions of the remaining 2,000 original images. Images taken by different instruments were augmented differently. As shown below, we employed 5 different augmentation methods. Images taken by the Mastcam left and right eye cameras were augmented using horizontal flipping only, and images taken by the MAHLI camera were augmented using all 5 methods. Note that one can filter based on the file names listed in the train-set-v2.1.txt file to obtain the set of non-augmented images.

    • 90 degrees clockwise rotation (file name ends with -r90.jpg)
    • 180 degrees clockwise rotation (file name ends with -r180.jpg)
    • 270 degrees clockwise rotation (file name ends with -r270.jpg)
    • Horizontal flip (file name ends with -fh.jpg)
    • Vertical flip (file name ends with -fv.jpg)
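    The filtering noted above can be sketched as follows (the file names are made up; the suffixes are those listed above):

    ```python
    # Augmented images carry one of the five file-name suffixes, so excluding
    # them recovers the original (non-augmented) image subset.
    AUG_SUFFIXES = ("-r90.jpg", "-r180.jpg", "-r270.jpg", "-fh.jpg", "-fv.jpg")

    def is_augmented(filename):
        return filename.endswith(AUG_SUFFIXES)

    files = ["0001.jpg", "0001-fh.jpg", "0002-r180.jpg", "0002.jpg"]
    originals = [f for f in files if not is_augmented(f)]
    ```
    
    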

    Acknowledgment

    The authors would like to thank the volunteers (as in the Contributor list) who provided annotations for this data set. We would also like to thank the PDS Imaging Node for the continuous support of this work.

  12. AWC to 60cm DSM data of the Roper catchment NT generated by the Roper River...

    • data.csiro.au
    • researchdata.edu.au
    Updated Apr 16, 2024
    Ian Watson; Mark Thomas; Seonaid Philip; Uta Stockmann; Ross Searle; Linda Gregory; jason hill; Elisabeth Bui; John Gallant; Peter R Wilson; Peter Wilson (2024). AWC to 60cm DSM data of the Roper catchment NT generated by the Roper River Water Resource Assessment [Dataset]. http://doi.org/10.25919/y0v9-7b58
    Explore at:
    Dataset updated
    Apr 16, 2024
    Dataset provided by
    CSIRO: http://www.csiro.au/
    Authors
    Ian Watson; Mark Thomas; Seonaid Philip; Uta Stockmann; Ross Searle; Linda Gregory; jason hill; Elisabeth Bui; John Gallant; Peter R Wilson; Peter Wilson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 1, 2020 - Jun 30, 2023
    Area covered
    Dataset funded by
    CSIRO: http://www.csiro.au/
    Northern Territory Department of Environment, Parks and Water Security
    Description

    AWC to 60cm is one of 18 attributes of soils chosen to underpin the land suitability assessment of the Roper River Water Resource Assessment (ROWRA) through the digital soil mapping (DSM) process. AWC (available water capacity) indicates the ability of a soil to retain and supply water for plant growth. This AWC raster data represents a modelled dataset of AWC to 60cm (mm of water to 60cm of soil depth) and is derived from analysed site data, spline calculations and environmental covariates. AWC is a parameter used in land suitability assessments for rainfed cropping and for water use efficiency in irrigated land uses. This raster data provides improved soil information used to underpin and identify opportunities and promote detailed investigation for a range of sustainable regional development options, and was created within the 'Land Suitability' activity of the CSIRO ROWRA. A companion dataset and statistics reflecting the reliability of this data are also provided and are described in the lineage section of this metadata record. Processing information is supplied in ranger R scripts, and attributes were modelled using a Random Forest approach. The DSM process is described in the CSIRO ROWRA published report 'Soils and land suitability for the Roper catchment, Northern Territory', a technical report from the CSIRO Roper River Water Resource Assessment to the Government of Australia. The Roper River Water Resource Assessment provides a comprehensive overview and integrated evaluation of the feasibility of aquaculture and agriculture development in the Roper catchment NT, as well as the ecological, social and cultural (Indigenous water values, rights and aspirations) impacts of development.

    Lineage: This AWC to 60cm dataset has been generated from a range of inputs and processing steps; an overview follows. For more information refer to the CSIRO ROWRA published reports, in particular 'Soils and land suitability for the Roper catchment, Northern Territory'.

    1. Collated existing data (relating to soils, climate, topography, natural resources and remote sensing, in various formats: reports, spatial vector, spatial raster, etc.).
    2. Selected additional soil and land attribute site data locations by a conditioned Latin hypercube statistical sampling method applied across the covariate data space.
    3. Carried out fieldwork to collect new attribute data and soil samples for analysis, and to build an understanding of geomorphology and landscape processes.
    4. Performed database analysis to extract the data meeting the selection criteria required for the attribute to be modelled.
    5. Computed the attribute in the R statistical programming environment. Models were built from selected input data and covariate data using predictive learning with a Random Forest approach implemented in the ranger R package.
    6. Created the AWC to 60cm Digital Soil Mapping (DSM) attribute raster dataset. DSM data is a geo-referenced dataset generated from field observations and laboratory data coupled with environmental covariate data through quantitative relationships. It applies pedometrics: mathematical and statistical models that combine information from soil observations with information contained in correlated environmental variables, remote sensing images and some geophysical measurements.
    7. Produced companion predicted reliability data from the 500 individual Random Forest attribute models created.
    8. Conducted quality assessment (QA) of this DSM attribute data by three methods:
      • Method 1: Statistical (quantitative) assessment of the model and input data. The quality of the DSM models was tested using data withheld from model computations, expressed as OOB and R-squared results, giving an estimate of the reliability of the model predictions. These results are supplied.
      • Method 2: Statistical (quantitative) assessment of the spatial attribute output data, presented as a raster of the attribute's "reliability". This used the 500 individual trees of the attribute's RF models to generate 500 datasets of the attribute and estimate model reliability. For continuous attributes, reliability is estimated as the coefficient of variation. This data is supplied.
      • Method 3: Independent external validation site data combined with on-ground expert (qualitative) examination of outputs during validation field trips. Across each study area, a two-week validation field trip was conducted using a new validation site set produced by a random sampling design based on conditioned Latin hypercube sampling of the attribute's reliability data. The modelled DSM attribute value was assessed against the actual on-ground value. These results are published in the report cited in this metadata record.
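    The per-cell reliability measure described for continuous attributes (coefficient of variation across the Random Forest's individual-tree predictions) can be sketched as follows; this is a simplified stand-in, not the published R/ranger workflow, which uses 500 trees per model:

    ```python
    import math

    def coefficient_of_variation(tree_predictions):
        """Reliability of a continuous DSM attribute at one raster cell from
        the spread of per-tree Random Forest predictions: CoV = sd / mean."""
        n = len(tree_predictions)
        mean = sum(tree_predictions) / n
        variance = sum((p - mean) ** 2 for p in tree_predictions) / n
        return math.sqrt(variance) / mean

    # Tight agreement between trees -> low CoV -> high predicted reliability
    cov = coefficient_of_variation([90.0, 100.0, 110.0])
    ```
    
    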

  13. Food Insecurity Experience Scale 2014 - Brazil

    • microdata.worldbank.org
    • catalog.ihsn.org
    Updated Jan 25, 2023
    + more versions
    FAO Statistics Division (2023). Food Insecurity Experience Scale 2014 - Brazil [Dataset]. https://microdata.worldbank.org/index.php/catalog/5598
    Explore at:
    Dataset updated
    Jan 25, 2023
    Dataset provided by
    Food and Agriculture Organization: http://fao.org/
    Authors
    FAO Statistics Division
    Time period covered
    2014
    Area covered
    Brazil
    Description

    Abstract

    Sustainable Development Goal (SDG) target 2.1 commits countries to end hunger, ensure access by all people to safe, nutritious and sufficient food all year around. Indicator 2.1.2, “Prevalence of moderate or severe food insecurity based on the Food Insecurity Experience Scale (FIES)”, provides internationally-comparable estimates of the proportion of the population facing difficulties in accessing food. More detailed background information is available at http://www.fao.org/in-action/voices-of-the-hungry/fies/en/ .

    The FIES-based indicators are compiled using the FIES survey module, containing 8 questions. Two indicators can be computed: 1. The proportion of the population experiencing moderate or severe food insecurity (SDG indicator 2.1.2), 2. The proportion of the population experiencing severe food insecurity.

    These data were collected by FAO through the Gallup World Poll. General information on the methodology can be found here: https://www.gallup.com/178667/gallup-world-poll-work.aspx. National institutions can also collect FIES data by including the FIES survey module in nationally representative surveys.

    Microdata can be used to calculate the indicator 2.1.2 at national level. Instructions for computing this indicator are described in the methodological document available under the "DOCUMENTATION" tab above. Disaggregating results at sub-national level is not encouraged because estimates will suffer from substantial sampling and measurement error.

    Geographic coverage

    National coverage

    Analysis unit

    Individuals

    Universe

    Individuals of 15 years or older.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A multi-stage sampling method was employed. The sample was first distributed proportionally to the 5 regions of the country and to the size of each municipality's population. Municipalities were chosen and then further subdivided by census region. Exclusions: none. Design effect: 1.3.

    Mode of data collection

    Face-to-face [f2f]

    Cleaning operations

    Statistical validation assesses the quality of the FIES data collected by testing their consistency with the assumptions of the Rasch model. This analysis involves the interpretation of several statistics that reveal 1) items that do not perform well in a given context, 2) cases with highly erratic response patterns, 3) pairs of items that may be redundant, and 4) the proportion of total variance in the population that is accounted for by the measurement model.

    Sampling error estimates

    The margin of error is estimated as 3.5. This is calculated around a proportion at the 95% confidence level. The maximum margin of error was calculated assuming a reported percentage of 50% and takes into account the design effect.
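The maximum margin of error described above follows the standard formula MOE = z * sqrt(DEFF * p * (1 - p) / n), evaluated at p = 0.5. A small illustration; the completed sample size used below is hypothetical, as it is not reported in this record:

```python
import math

def max_margin_of_error(n, deff=1.3, p=0.5, z=1.96):
    """Maximum margin of error (in percentage points) at the 95% level,
    accounting for the design effect."""
    return 100 * z * math.sqrt(deff * p * (1 - p) / n)

# With the reported design effect of 1.3, a sample of roughly 1,000
# completed interviews (hypothetical) reproduces the ~3.5-point margin.
print(round(max_margin_of_error(1019), 1))  # prints 3.5
```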

    Data appraisal

    Based on the results of the validation process, the variables WHLDAY, RUNOUT, HUNGRY, WORRIED, ATELESS, HEALTHY, FEWFOOD and SKIPPED were not considered in the computation of the published FAO food insecurity indicator based on FIES.

  14. Data from: A comparison of new cardiovascular endurance test using the...

    • data.niaid.nih.gov
    • search.dataone.org
    zip
    Updated Jul 24, 2024
    Cite
    Suchai Surapichpong; Sucheela Jisarojito; Chaiyanut Surapichpong (2024). A comparison of new cardiovascular endurance test using the 2-minute marching test vs. 6-minute walk test in healthy volunteers: A crossover randomized controlled trial [Dataset]. http://doi.org/10.5061/dryad.31zcrjdv2
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Bangkok Hospital
    Bangbo Hospital
    Authors
    Suchai Surapichpong; Sucheela Jisarojito; Chaiyanut Surapichpong
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    This was a 2×2 randomized crossover controlled trial comparing the cardiovascular endurance of healthy volunteers using a 2-minute marching test (2MMT) and a 6-minute walk test (6MWT). The study enrolled 254 participants of both sexes, aged 20–50 years, with height ≥150 cm and body mass index (BMI) ≤25 kg/m2. Participants could perform activities independently and had normal annual chest radiographs and electrocardiograms. A group-randomized design assigned participants to Sequence 1 (AB) or Sequence 2 (BA). The tests were conducted over 2 consecutive days, with a 1-day washout period. On day 1, participants underwent either the 6MWT or the 2MMT in a single-blinded setup, and on day 2 the tests were performed in reverse order. We analyzed maximal oxygen consumption (VO2max) as the primary outcome, and heart rate (HR), respiratory rate (RR), blood pressure (BP), oxygen saturation, dyspnea, and leg fatigue as secondary outcomes. Data were collected from 127 participants, categorized into two groups for different testing sequences: the first (AB) group had 63 participants and the second (BA) group had 64. The estimated VO2max was equivalent between both tests: the 2MMT and 6MWT estimated VO2max with means of 41.00 ± 3.95 mL/kg/min and 40.65 ± 3.98 mL/kg/min, respectively. The mean difference was -0.35 mL/kg/min (95% confidence interval: -1.09 to 0.38; p < 0.001), and no treatment or carryover effects were observed. No significant changes were observed in HR, RR, or systolic BP (p = 0.295, p = 0.361, and p = 0.389, respectively). However, significant changes were found in the ratings of perceived exertion (p < 0.001) and the leg fatigue scale (p < 0.001). The 2MMT is practical, simple, and equivalent to the 6MWT in estimating VO2max.

    Methods

    Sample size

    The sample size required for the equivalence study was estimated using nQuery software and calculated using two one-sided equivalence tests for a crossover design.
To calculate the sample size, we set the alpha error probability, statistical power, and the lower and upper equivalence limits at 5%, 90%, -2.00, and +2.00, respectively, using a clinical margin based on the minimal clinically important difference (MCID) of VO2max from a previous study, which was 2 mL/kg/min [15], with a standard deviation of 8.6 [16]. Based on these values, we needed 101 participants for the crossover design; allowing for a 20% dropout rate, we decided to randomize 127 participants per arm, resulting in 254 participants. However, due to the COVID-19 pandemic, data collection was incomplete and only 127 data sets could be analyzed in this study.

Inclusion and exclusion criteria

The inclusion criteria were healthy male and female volunteers, aged 20–50 years, with height ≥150 cm and BMI ≤25 kg/m2, who could perform activities independently and had normal annual chest radiographs and electrocardiograms. The exclusion criteria were significantly unstable vital signs, a history of COVID-19, and underlying heart disease or neuromuscular/skeletal impairment.

Procedure and measurement

We conducted the 2MMT and compared the results with those of the standard test, the 6MWT, to test the equivalence of both tests in estimating VO2max. Condition A: According to the standard protocol, the 6MWT was performed indoors on a flat surface in a 30-m straight corridor, with 180° turns every 30 m [10]. The walk test was performed with stable vital signs, and SpO2 was maintained at >95%, all monitored by a cardiopulmonary physical therapist. Condition B: The 2MMT was developed to determine the number of steps performed within 2 min. After the “start” command, participants began marching in place, lifting their knees to a height of 30 cm. Participants were instructed to perform as many steps as possible (reaching a height of 30 cm) within 2 min.
Participants were allowed to perform a few training steps to adjust to the marching technique and verify their ability to complete the task. They marched at their own pace; they could slow down or even stop if necessary and continue marching until the end of the 2-minute test period. The investigator counted the number of steps performed, informed participants of the time left until the end of the trial, and motivated them to achieve the best possible result. The test result was expressed as the number of steps during which the right foot touched the ground. When participants exhibited severe symptoms of exercise intolerance in either test, such as severe dyspnea, fatigue, or other alarming symptoms, they were allowed to slow down or stop and rest; however, they were encouraged to resume the test as soon as possible. Adverse events were monitored during and after test completion. Both tests were terminated and interpreted as incomplete if any of the following symptoms were present: chest pain, intolerable dyspnea, leg cramps, staggering, diaphoresis, or an ashen appearance. Data on sex, age, BMI, HR, and RR were collected; SpO2 was assessed using the NONIN Onyx2 9590 oximeter; systolic and diastolic blood pressure (SBP, DBP) were measured using the Philips Patient Monitor Efficia CM100; and ratings of perceived exertion (RPE) and the leg fatigue scale (LFS) were assessed using Borg's scale. All parameters were recorded at 1 min, 5 min, and 10 min pre-test and post-test. Cardiovascular endurance was estimated as VO2max using the following formulas. VO2max estimated from the 6MWT: 70.161 + (0.023 × 6MWT distance [m]) - (0.276 × weight [kg]) - (6.79 × sex, where m = 0, f = 1) - (0.193 × resting HR [beats per minute]) - (0.191 × age [years]) [15], where resting HR is the 10-min post-test resting HR. VO2max estimated from the 2MMT: 13.341 + (0.138 × total up-and-down steps [UDS]) - (0.183 × BMI) [16].
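The two published estimating equations quoted above can be written directly as functions. This sketch simply transcribes the coefficients from the description; the example inputs are hypothetical:

```python
def vo2max_6mwt(distance_m, weight_kg, sex, resting_hr, age):
    """Estimated VO2max (mL/kg/min) from the 6MWT equation quoted above.
    sex: 0 = male, 1 = female; resting_hr is the 10-min post-test resting HR."""
    return (70.161 + 0.023 * distance_m - 0.276 * weight_kg
            - 6.79 * sex - 0.193 * resting_hr - 0.191 * age)

def vo2max_2mmt(steps, bmi):
    """Estimated VO2max (mL/kg/min) from the 2MMT equation quoted above.
    steps: total up-and-down steps (UDS) performed in 2 minutes."""
    return 13.341 + 0.138 * steps - 0.183 * bmi

# Illustrative (hypothetical) participant: the two estimates are of
# similar magnitude, as the equivalence analysis found.
print(round(vo2max_6mwt(600, 65, 1, 70, 35), 2))
print(round(vo2max_2mmt(210, 22.5), 2))
```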
Data analysis

Due to the COVID-19 pandemic in Thailand and hospital policies, only 127 of the 254 enrolled healthy volunteers could complete data collection. Descriptive statistics were used to evaluate demographic characteristics. Continuous variables were reported as mean ± standard deviation, whereas binary variables were reported as percentages. The primary outcome (VO2max) was analyzed with a two one-sided tests (TOST) procedure using Statgraphics software, with an equivalence bound of ±2 mL/kg/min based on the VO2max margin observed in a previous study [17]. The carryover and treatment effects were not significant, and the equivalence result was significant. For the secondary outcomes, all parameters were analyzed using a linear mixed-effects model comparing the 2MMT and 6MWT in STATA software.

    References

    Chu P, Gotink RA, Yeh GY, Goldie SJ, Hunink MG. The effectiveness of yoga in modifying risk factors for cardiovascular disease and metabolic syndrome: A systematic review and meta-analysis of randomized controlled trials. Eur J Prev Cardiol. 2016;23: 291-307. doi: 10.1177/2047487314562741.

    Khushoo TN, Rafiq N, Qayoom O. Assessment of cardiovascular fitness (VO2 max) among medical students by Queens College step test. Int J Biomed Adv Res. 2015;6: 418-421. doi: 10.7439/IJBAR.V6I5.1965.

    Haas F, Sweeney G, Pierre A, Plusch T, Whiteson JH. Validation of a 2-minute step test for assessing functional improvement. OJTR. 2017;05: 71-81. doi: 10.4236/ojtr.2017.52007.

    Oliveros MJ, Seron P, Román C, Gálvez M, Navarro R, Latin G, et al. Two-minute step test as a complement to six-minute walk test in subjects with treated coronary artery disease. Front Cardiovasc Med. 2022;9: 848589. doi: 10.3389/fcvm.2022.848589.

    Kammin EJ. The 6-minute walk test: indications and guidelines for use in outpatient practices. J Nurse Pract. 2022;18: 608-610. doi: 10.1016/j.nurpra.2022.04.013.

    Cazzola M, Biscione GL, Pasqua F, Crigna G, Appodia M, Cardaci V, et al. Use of 6-min and 12-min walking test for assessing the efficacy of formoterol in COPD. Respir Med. 2008;102: 1425-1430. doi:10.1016/j.rmed.2008.04.017.

    Pollentier B, Irons SL, Benedetto CM, Dibenedetto AM, Loton D, Seyler RD, et al. Examination of the six minute walk test to determine functional capacity in people with chronic heart failure: a systematic review. Cardiopulm Phys Ther J. 2010;21: 13-21. doi: 10.1097/01823246-201021010-00003.

    ATS Committee on Proficiency Standards for Clinical Pulmonary Function Laboratories. ATS statement: guidelines for the six-minute walk test. Am J Respir Crit Care Med. 2002;166: 111-117. doi: 10.1164/ajrccm.166.1.at1102.

    Rasekaba T, Lee AL, Naughton MT, Williams TJ, Holland AE. The six-minute walk test: a useful metric for the cardiopulmonary patient. Intern Med J. 2009;39: 495-501. doi: 10.1111/j.1445-5994.2008.01880.x.

    Bohannon RW, Crouch RH. Two-minute step test of exercise capacity: systematic review of procedures, performance, and clinimetric properties. J Geriatr Phys Ther. 2019;42: 105-112. doi: 10.1519/JPT.0000000000000164.

    Wells CL, Kegelmeyer D, Mayer KP, Kumble S, Reilley A, Campbell A, et al. APTA cross sections and academies recommendations for COVID-19 core outcome measures. J Acute Care Phys Ther. 2022;13: 62-76. doi: 10.1097/JAT.0000000000000172.

    Berlanga LA, Matos-Duarte M, Abdalla P, Alves E, Mota J, Bohn L. Validity of the two-minute step test for healthy older adults. Geriatr Nurs. 2023;51: 415-421. doi: 10.1016/j.gerinurse.2023.04.009.

    Vilarinho R, Caneiras C, Montes AM. Measurement properties of step tests for exercise capacity in COPD: A systematic review. Clin Rehabil. 2021;35: 578-588. doi: 10.1177/0269215520968054.

    Burr JF, Bredin SS, Faktor MD, Warburton DE. The 6-minute walk test as a predictor of objectively measured aerobic fitness in healthy working-aged adults. Phys Sportsmed. 2011;39: 133-139. doi: 10.3810/psm.2011.05.1904.

    Ricci PA, Cabiddu R, Jürgensen SP, André LD, Oliveira CR, Di Thommazo-Luporini L, et al. Validation of the two-minute step test in obese with comorbidities and morbidly obese patients. Braz J Med

  15. Soil surface salinity DSM data of the Victoria catchment NT generated by the...

    • data.csiro.au
    Updated Dec 13, 2024
    Cite
    Ian Watson; Mark Thomas; Seonaid Philip; Uta Stockmann; Ross Searle; Linda Gregory; Jason Hill; Peter R Wilson; Peter Wilson (2024). Soil surface salinity DSM data of the Victoria catchment NT generated by the Victoria River Water Resource Assessment [Dataset]. http://doi.org/10.25919/pyx8-x959
    Dataset updated
    Dec 13, 2024
    Dataset provided by
    CSIRO (http://www.csiro.au/)
    Authors
    Ian Watson; Mark Thomas; Seonaid Philip; Uta Stockmann; Ross Searle; Linda Gregory; Jason Hill; Peter R Wilson; Peter Wilson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 1, 2021 - Sep 30, 2024
    Area covered
    Dataset funded by
    Northern Territory Department of Environment, Parks and Water Security
    CSIRO (http://www.csiro.au/)
    Description

    Soil surface salinity is one of 18 attributes of soils chosen to underpin the land suitability assessment of the Victoria River Water Resource Assessment (VIWRA) through the digital soil mapping (DSM) process. Soil salinity represents the salt content of the soil. This raster data represents a modelled dataset of salinity at the soil surface and is derived from field-measured and laboratory-analysed site data, and environmental covariates. Data values are: 1 = surface salinity absent, 2 = surface salinity present. Soil surface salinity is a parameter used in land suitability assessments because it hinders seed establishment and retards plant growth. This raster data provides improved soil information used to underpin and identify opportunities, and to promote detailed investigation, for a range of sustainable regional development options; it was created within the ‘Land Suitability’ activity of the CSIRO VIWRA. A companion dataset and statistics reflecting the reliability of this data are also provided and are described in the lineage section of this metadata record. Processing information is supplied in ranger R scripts; attributes were modelled using a Random Forest approach. The DSM process is described in the CSIRO VIWRA published report ‘Soils and land suitability for the Victoria catchment, Northern Territory’, a technical report from the CSIRO Victoria River Water Resource Assessment to the Government of Australia. The Victoria River Water Resource Assessment provides a comprehensive overview and integrated evaluation of the feasibility of aquaculture and agriculture development in the Victoria catchment NT, as well as the ecological, social and cultural (Indigenous water values, rights and aspirations) impacts of development. Lineage: The soil surface salinity dataset has been generated from a range of inputs and processing steps; an overview follows.
For more information refer to the CSIRO VIWRA published reports, in particular ‘Soils and land suitability for the Victoria catchment, Northern Territory’, a technical report from the CSIRO Victoria River Water Resource Assessment to the Government of Australia.
1. Collated existing data (relating to soils, climate, topography, natural resources and remote sensing, in various formats: reports, spatial vector, spatial raster, etc.).
2. Selected additional soil and land attribute site data locations by a conditioned Latin hypercube statistical sampling method applied across the covariate data space.
3. Carried out fieldwork to collect new attribute data and soil samples for analysis, and to build an understanding of geomorphology and landscape processes.
4. Performed database analysis to extract the data to the specific selection criteria required for the attribute to be modelled.
5. Used the R statistical programming environment for the attribute computing: models were built from selected input data and covariate data using predictive learning with a Random Forest approach implemented in the ranger R package.
6. Created the soil surface salinity Digital Soil Mapping (DSM) attribute raster dataset. DSM data is a geo-referenced dataset generated from field observations and laboratory data, coupled with environmental covariate data through quantitative relationships. It applies pedometrics: mathematical and statistical models that combine information from soil observations with information contained in correlated environmental variables, remote sensing images and some geophysical measurements.
7. Produced companion predicted reliability data from the 500 individual Random Forest attribute models created.
8. Conducted quality assessment (QA) of this DSM attribute data by three methods. Method 1: Statistical (quantitative) assessment of the model and input data.
Testing the quality of the DSM models was carried out using data withheld from model computations, expressed as OOB and confusion matrix results, giving an estimate of the reliability of the model predictions. These results are supplied. Method 2: Statistical (quantitative) assessment of the spatial attribute output data, presented as a raster of the attribute's "reliability". This used the 500 individual trees of the attribute's RF model to generate 500 datasets of the attribute to estimate model reliability. For categorical attributes, reliability is estimated as the Confusion Index. This data is supplied. Method 3: Independent external validation site data combined with on-ground expert (qualitative) examination of outputs during validation field trips. Across each of the study areas, a two-week validation field trip was conducted using a new validation site set produced by a random sampling design based on conditioned Latin hypercube sampling of the attribute's reliability data. The modelled DSM attribute value was assessed against the actual on-ground value. These results are published in the report cited in this metadata record.
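For categorical attributes such as surface salinity, the Confusion Index named in Method 2 is commonly computed as one minus the gap between the two largest class-membership probabilities, which can be derived from the per-cell vote shares of the 500 trees. A hypothetical sketch with synthetic vote counts, not the actual model output:

```python
import numpy as np

def confusion_index(class_probs):
    """Confusion Index per raster cell: 1 - (p_first - p_second), where
    p_first and p_second are the two largest class probabilities.
    Near 0 = confident prediction; near 1 = highly uncertain."""
    p = np.sort(class_probs, axis=1)[:, ::-1]   # descending per cell
    return 1.0 - (p[:, 0] - p[:, 1])

# Hypothetical vote counts from 500 trees for the two salinity classes
# (1 = absent, 2 = present) over four raster cells.
votes = np.array([[480, 20],    # near-unanimous -> low CI
                  [260, 240],   # split vote     -> high CI
                  [350, 150],
                  [500, 0]])
probs = votes / votes.sum(axis=1, keepdims=True)
print(confusion_index(probs))
```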

  16. Multiple Indicator Cluster Survey 2006 - Iraq

    • catalog.ihsn.org
    • datacatalog.ihsn.org
    Updated Mar 29, 2019
    Cite
    Suleimaniya Statistical Directorate (2019). Multiple Indicator Cluster Survey 2006 - Iraq [Dataset]. https://catalog.ihsn.org/catalog/842
    Dataset updated
    Mar 29, 2019
    Dataset provided by
    Ministry of Health
    Kurdistan Region Statistics Office
    Suleimaniya Statistical Directorate
    Central Organization for Statistics and Information Technology
    Time period covered
    2006
    Area covered
    Iraq
    Description

    Abstract

    The Multiple Indicator Cluster Survey (MICS) is a household survey programme developed by UNICEF to assist countries in filling data gaps for monitoring human development in general and the situation of children and women in particular. MICS is capable of producing statistically sound, internationally comparable estimates of social indicators. The current round of MICS is focused on providing a monitoring tool for the Millennium Development Goals (MDGs), the World Fit for Children (WFFC), as well as for other major international commitments, such as the United Nations General Assembly Special Session (UNGASS) on HIV/AIDS and the Abuja targets for malaria.

    The 2006 Iraq Multiple Indicator Cluster Survey has as its primary objectives: - To provide up-to-date information for assessing the situation of children and women in Iraq; - To furnish data needed for monitoring progress toward goals established by the Millennium Development Goals and the goals of A World Fit For Children (WFFC) as a basis for future action; - To contribute to the improvement of data and monitoring systems in Iraq and to strengthen technical expertise in the design, implementation and analysis of such systems.

    Survey Content MICS questionnaires are designed in a modular fashion that was customized to the needs of the country. They consist of a household questionnaire, a questionnaire for women aged 15-49 and a questionnaire for children under the age of five (to be administered to the mother or caretaker). Other than a set of core modules, countries can select which modules they want to include in each questionnaire.

    Survey Implementation The survey was implemented by the Central Organization for Statistics and Information Technology (COSIT), the Kurdistan Region Statistics Office (KRSO) and Suleimaniya Statistical Directorate (SSD), in partnership with the Ministry of Health (MOH). The survey also received support and assistance of UNICEF and other partners. Technical assistance and training for the surveys was provided through a series of regional workshops, covering questionnaire content, sampling and survey implementation; data processing; data quality and data analysis; report writing and dissemination.

    Geographic coverage

    The survey is nationally representative and covers the whole of Iraq.

    Analysis unit

    Households (defined as a group of persons who usually live and eat together)

    De jure household members (defined as members of the household who usually live in the household, which may include people who did not sleep in the household the previous night, but does not include visitors who slept in the household the previous night but do not usually live in the household)

    Women aged 15-49

    Children aged 0-4

    Universe

    The survey covered all de jure household members (usual residents), all women aged 15-49 years resident in the household, and all children aged 0-4 years (under age 5) resident in the household. The survey also includes a full birth history listing all children ever born to ever-married women age 15-49 years.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    The sample for the Iraq Multiple Indicator Cluster Survey was designed to provide estimates on a large number of indicators on the situation of children and women at the national level; for areas of residence of Iraq represented by rural and urban (metropolitan and other urban) areas; for the 18 governorates of Iraq; and also for metropolitan, other urban, and rural areas for each governorate. Thus, in total, the sample consists of 56 different sampling domains: 3 sampling domains in each of the 17 governorates outside the capital city Baghdad (namely, a metropolitan area domain representing the governorate city centre, an other urban area domain representing the urban area outside the governorate city centre, and a rural area domain) and 5 sampling domains in Baghdad (namely, 3 metropolitan areas representing Sadir City, the Resafa side, and the Kurkh side, an other urban area sampling domain representing the urban area outside the three Baghdad governorate city centres, and a sampling domain comprising the rural area of Baghdad).

    The sample was selected in two stages. Within each of the 56 sampling domains, 54 PSUs were selected with linear systematic probability proportional to size (PPS).

    After mapping and listing of households were carried out within the selected PSU or segment of the PSU, linear systematic samples of six households were drawn. Cluster sizes of 6 households were selected to accommodate the current security conditions in the country, allowing the survey teams to complete a full cluster in minimal time. The total sample size for the survey is 18144 households. The sample is not self-weighting. For reporting national level results, sample weights are used.

    The sampling procedures are more fully described in the sampling appendix of the final report and can also be found in the list of technical documents within this archive.

    (Extracted from the final report: Central Organisation for Statistics & Information Technology and Kurdistan Statistics Office. 2007. Iraq Multiple Indicator Cluster Survey 2006, Final Report. Iraq.)
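Linear systematic PPS selection, as used in the first sampling stage, can be sketched generically: cumulate the PSU measures of size, pick a random start within the first sampling interval, and select every unit whose cumulative-size span contains a selection point. The PSU sizes below are hypothetical; the actual sampling frame is not part of this record.

```python
import random

def systematic_pps(sizes, n, seed=None):
    """Linear systematic PPS: select n units with probability proportional
    to size, using a single random start and a fixed sampling interval."""
    total = sum(sizes)
    interval = total / n
    rng = random.Random(seed)
    start = rng.uniform(0, interval)
    points = [start + k * interval for k in range(n)]

    selected, cum = [], 0.0
    it = iter(enumerate(sizes))
    idx, size = next(it)
    for p in points:
        # Advance until the current unit's span [cum, cum + size) contains p.
        while cum + size <= p:
            cum += size
            idx, size = next(it)
        selected.append(idx)
    return selected

# Hypothetical PSU measures of size; draw 6 of 20 PSUs.
sizes = [120, 80, 300, 50, 60, 500, 90, 70, 110, 40,
         220, 130, 75, 65, 400, 55, 85, 95, 150, 45]
print(systematic_pps(sizes, 6, seed=1))
```

Note that a unit larger than the sampling interval (here the PSU of size 500) is selected with certainty, a standard property of systematic PPS.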

    Sampling deviation

    No major deviations from the original sample design were made. One cluster of the 3024 selected clusters was not completed; all other clusters were accessed.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaires were based on the third round of the Multiple Indicator Cluster survey model questionnaires. From the MICS-3 model English version, the questionnaires were revised and customized to suit local conditions and translated into Arabic and Kurdish languages. The Arabic language version of the questionnaire was pre-tested during January 2006 while the Kurdish language version was pre-tested during March 2006. Based on the results of the pre-test, modifications were made to the wording and translation of the questionnaires.

    In addition to the administration of questionnaires, fieldwork teams tested the salt used for cooking in the households for iodine content, and measured the weights and heights of children age under-5 years.

    Cleaning operations

    Data were processed in clusters, with each cluster being processed as a complete unit through each stage of data processing. Each cluster goes through the following steps: 1) Questionnaire reception 2) Office editing and coding 3) Data entry 4) Structure and completeness checking 5) Verification entry 6) Comparison of verification data 7) Back up of raw data 8) Secondary editing 9) Edited data back up

    After all clusters are processed, all data is concatenated together and then the following steps are completed for all data files: 10) Export to SPSS in 5 files (hh - household, hl - household members, wm - women age 15-49, ch - children under 5, bh - birth history) 11) Recoding of variables needed for analysis 12) Adding of sample weights 13) Calculation of wealth quintiles and merging into data 14) Structural checking of SPSS files 15) Data quality tabulations 16) Production of analysis tabulations

    Detailed documentation of the editing of data can be found in the data processing guidelines in the MICS Manual (http://www.childinfo.org/mics/mics3/manual.php)

    Data entry was conducted by 12 data entry operators in two shifts, supervised by 2 data entry supervisors, using a total of 7 computers (6 data entry computers plus one supervisor's computer). All data entry was conducted at the GenCenStat head office using manual data entry. For data entry, CSPro version 2.6.007 was used with a highly structured data entry program, using a system-controlled approach that controlled entry of each variable. All range checks and skips were controlled by the program and operators could not override these. A limited set of consistency checks was also included in the data entry program. In addition, the calculation of anthropometric Z-scores was included in the data entry programs for use during analysis. Open-ended responses ("Other" answers) were not entered or coded, except in rare circumstances where the response matched an existing code in the questionnaire.

    Structure and completeness checking ensured that all questionnaires for the cluster had been entered, were structurally sound, and that women's and children's questionnaires existed for each eligible woman and child.

    100% verification of all variables was performed using independent verification, i.e. double entry of data, with separate comparison of data followed by modification of one or both datasets to correct keying errors by original operators who first keyed the files.

    After completion of all processing in CSPro, all individual cluster files were backed up before concatenating data together using the CSPro file concatenate utility.

    Data editing took place at a number of stages throughout the processing (see Other processing), including: a) Office editing and coding b) During data entry c) Structure checking and completeness d) Secondary editing e) Structural checking of SPSS data files

    Detailed documentation of the editing of data can be found in the data processing guidelines in the MICS Manual (http://www.childinfo.org/mics/mics3/manual.php)

    Response rate

    Of the 18144 households selected for the sample, 18123 were found to be occupied. Of these, 17873 were successfully interviewed for a household response rate of 98.6 percent. In the interviewed households, 27564 women (age 15-49 years) were identified. Of these, 27186 were successfully interviewed, yielding a

  17. Food Insecurity Experience Scale (FIES) - Maldives

    • microdata.fao.org
    Updated Jun 29, 2022
    Cite
    FAO Statistics Division (2022). Food Insecurity Experience Scale (FIES) - Maldives [Dataset]. https://microdata.fao.org/index.php/catalog/2270
    Dataset updated
    Jun 29, 2022
    Dataset provided by
    Food and Agriculture Organization (http://fao.org/)
    Authors
    FAO Statistics Division
    Time period covered
    2021
    Area covered
    Maldives
    Description

    Abstract

    Sustainable Development Goal (SDG) target 2.1 commits countries to end hunger and ensure access by all people to safe, nutritious and sufficient food all year round. Indicator 2.1.2, “Prevalence of moderate or severe food insecurity based on the Food Insecurity Experience Scale (FIES)”, provides internationally comparable estimates of the proportion of the population facing difficulties in accessing food. More detailed background information is available at http://www.fao.org/in-action/voices-of-the-hungry/fies/en/.

    The FIES-based indicators are compiled using the FIES survey module, containing 8 questions. Two indicators can be computed:
    1. The proportion of the population experiencing moderate or severe food insecurity (SDG indicator 2.1.2).
    2. The proportion of the population experiencing severe food insecurity.

    These data were collected by FAO through GeoPoll. National institutions can also collect FIES data by including the FIES survey module in nationally representative surveys.

    Microdata can be used to calculate indicator 2.1.2 at the national level. Instructions for computing this indicator are described in the methodological document available in the documentation tab. Disaggregating results at the sub-national level is not encouraged because the estimates would suffer from substantial sampling and measurement error.

    Geographic coverage

    National

    Analysis unit

    Individuals

    Universe

    Individuals aged 15 years or older.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    A sampling quota of at least 200 observations per Administrative 1 area was set. Exclusions: NA. Design effect: NA.

    Mode of data collection

    Computer Assisted Telephone Interview [CATI]

    Cleaning operations

    Statistical validation assesses the quality of the FIES data collected by testing their consistency with the assumptions of the Rasch model. This analysis involves the interpretation of several statistics that reveal 1) items that do not perform well in a given context, 2) cases with highly erratic response patterns, 3) pairs of items that may be redundant, and 4) the proportion of total variance in the population that is accounted for by the measurement model.

    Sampling error estimates

    The margin of error is estimated as NA. This is calculated around a proportion at the 95% confidence level. The maximum margin of error was calculated assuming a reported percentage of 50% and takes into account the design effect.
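For reference, the maximum margin of error described here follows the standard formula for a proportion inflated by the design effect; the sketch below is illustrative only (the sample size and design-effect values are assumptions, not figures from this survey):

```python
import math

def margin_of_error(p: float, n: int, deff: float = 1.0, z: float = 1.96) -> float:
    """Half-width of the confidence interval around a proportion p,
    inflated by the survey design effect; z = 1.96 gives the 95% level."""
    return z * math.sqrt(deff * p * (1.0 - p) / n)

# Maximum margin of error: worst case p = 0.5, with assumed n and deff
print(round(margin_of_error(p=0.5, n=1000, deff=1.5), 3))  # 0.038
```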

    Data appraisal

    Since the population with access to mobile telephones is likely to differ from the rest of the population with respect to their access to food, post-hoc adjustments were made to control for the potential resulting bias. Post-stratification weights were built to adjust the sample distribution by gender and education of the respondent at admin-1 level, to match the same distribution in the total population. However, an additional step was needed to try to ascertain the food insecurity condition of those with access to phones compared to that of the total population.
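The post-stratification adjustment amounts to reweighting each respondent by the ratio of the population share to the sample share of their stratum. A minimal sketch under assumed strata (gender only, for brevity; the actual weights combine gender and education at admin-1 level):

```python
from collections import Counter

def poststrat_weights(sample, pop_shares):
    """One weight per respondent so the weighted sample matches the
    known population shares of each stratum."""
    n = len(sample)
    sample_share = {k: c / n for k, c in Counter(sample).items()}
    return [pop_shares[k] / sample_share[k] for k in sample]

# Hypothetical example: women are 50% of the population but 60% of the sample,
# so they are down-weighted (0.5/0.6) and men are up-weighted (0.5/0.4)
sample = ["f"] * 6 + ["m"] * 4
weights = poststrat_weights(sample, {"f": 0.5, "m": 0.5})
```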

    Using FIES data collected by FAO through the GWP between 2014 and 2019, and a variable on access to mobile telephones that was also in the dataset, it was possible to compare the prevalence of food insecurity at moderate or severe level, and severe level only, of respondents with access to a mobile phone to that of the total population at national level.

  18. Automotive Simulation Software Market Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Dec 8, 2024
    Cite
    Archive Market Research (2024). Automotive Simulation Software Market Report [Dataset]. https://www.archivemarketresearch.com/reports/automotive-simulation-software-market-4932
    Explore at:
    ppt, pdf, doc (available download formats)
    Dataset updated
    Dec 8, 2024
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    global
    Variables measured
    Market Size
    Description

    The Automotive Simulation Software Market was valued at USD 6,034.3 million in 2023 and is projected to reach USD 15,873.9 million by 2032, an expected CAGR of 14.9% over the forecast period. Automotive simulation software covers the technologies and systems used to model and analyze different vehicle characteristics during design and evaluation. By building virtual models of vehicles and realistic scenarios, this software helps enhance performance, safety, and efficiency in a cost-effective way. It is used for energy- and performance-related simulations, impact tests, and the investigation of individual vehicle systems, with applications spanning automotive design, engineering, and manufacturing. Key trends include the incorporation of artificial intelligence for predictive analytics, support for electric and self-driving vehicles, and real-time data processing and analytics for better vehicle performance and safety. Recent developments include: In February 2024, Dassault Systèmes announced a strategic partnership with BMW Group to develop BMW's future engineering platform, utilizing Dassault Systèmes' 3DEXPERIENCE platform as its core. This collaboration involves over 17,000 BMW employees globally working on a virtual twin of a vehicle, allowing real-time configuration for different model variants. The partnership represents the next phase in their longstanding collaboration, leveraging digital innovation to streamline engineering processes and enhance the development of personalized and sustainable automotive experiences for BMW customers. In January 2024, ANSYS, Inc. announced that its AVxcelerate Sensors will be integrated into NVIDIA DRIVE Sim, a scenario-based autonomous vehicle (AV) simulator powered by NVIDIA Omniverse. This collaboration aimed to enhance the development and validation of AV perception systems, incorporating Ansys' physics solvers for camera, lidar, radar, and thermal camera sensors. The integration enables users to access high-fidelity sensor simulation outputs for training and validating perception ADAS/AV systems in a controlled virtual environment, addressing the challenges of testing and validating sensor suites and software in real-world driving scenarios.

  19. Public portrait of the uses of AI in the Quebec public administration

    • gimi9.com
    • open.canada.ca
    Updated Mar 1, 2025
    Cite
    (2025). Public portrait of the uses of AI in the Quebec public administration [Dataset]. https://gimi9.com/dataset/ca_1c9131d2-ab2a-4513-9a46-945b2bc96e1b
    Explore at:
    Dataset updated
    Mar 1, 2025
    Area covered
    Quebec
    Description

    In a perspective of transparency and to ensure the coherence of government action in the field of artificial intelligence (AI), the Minister of Cybersecurity and Digital Technology issued, on February 28, 2024, decree number 2024-01 concerning information resource requirements with regard to the use of artificial intelligence by public organizations. This new obligation allowed the Ministry of Cybersecurity and Digital Affairs (MCN) to document the use cases of AI in public administration and to establish a portrait of them. The information contained in the file therefore comes from a collection carried out among public bodies. The portrait shows all the initiatives that are currently under development or in production within public organizations. However, for obvious security reasons, cybersecurity initiatives are excluded. For the same reason, the commercial names of solutions and solution providers have been replaced with generic names. The data contains the name of the IT asset, project or AI initiative, its category, the responsible public body, the ministerial portfolio to which the organization is attached, the associated benefits and its status.

    The categories are:
    * Decision support, planning and/or prediction: AI system used primarily as a tool to support decisions or planning, or to make predictions.
    * Behaviour/sentiment analysis: AI system used primarily to analyze behaviour or sentiment.
    * Assistant/conversational agent: AI system using natural language communication, used primarily to assist with specific tasks or to hold a conversation with a user.
    * Automation: AI system used primarily to automate defined and targeted tasks and/or processes.
    * User experience - personalization: AI system used primarily to improve the user experience on platforms or websites (for example, facilitating searches or customizing content).
    * AI-assisted training and learning: AI system used primarily for training and learning assistance.
    * Geomatics and geospatial management: AI system used for forest, geological, cartographic and geomatics management.
    * Laboratory and equipment: initiative or project aimed at establishing an experimental laboratory and/or acquiring computer equipment dedicated to AI development.
    * Connected electronic system and ambient intelligence: AI system integrated into initiatives or projects implementing an intelligent electronic system, ambient intelligence or connected objects.
    * Image processing: AI system used mainly for image processing and analysis.

    The statuses are:
    * Solution in development: projects or initiatives in the development/acquisition stage or in the testing and validation phase of an AI system's development cycle.
    * Solution in production: projects or initiatives in the deployment/integration stage or in the maintenance and support phase of an AI system's development cycle.
    * Information unavailable: the information could not be confirmed by the body responsible for the project or initiative.

  20. 2017 Africa-wide Breeding Task Force Trials for Rainfed Lowland

    • dataverse.harvard.edu
    Updated Aug 23, 2019
    + more versions
    Cite
    Ramaiah Venuprasad (2019). 2017 Africa-wide Breeding Task Force Trials for Rainfed Lowland [Dataset]. http://doi.org/10.7910/DVN/VJZFT6
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 23, 2019
    Dataset provided by
    Harvard Dataverse
    Authors
    Ramaiah Venuprasad
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    Jan 1, 2016 - Dec 31, 2016
    Description

    Objective and Requirement of MET by Breeding Task Force: In 2010, the Rice Breeding Task Force in Africa was established to accelerate the breeding process, especially in the later phase of evaluating promising breeding lines in multiple locations throughout Africa. Breeding lines included in the multi-environment trials (MET) are developed by various institutions such as IRRI, CIAT, NARS in Africa and AfricaRice. The objective of the MET is to identify lines that are suitable for cultivation under an ecological environment in the target region in Africa. To reduce noise and to capture real genetic differences among test lines, the following conditions must be met for the MET:
    1. The experimental field is uniform in soil fertility;
    2. The experimental field is very well leveled before seeding or transplanting, to facilitate uniform water and fertilizer management;
    3. Fertilizer is evenly applied to every plot, and ideally to every plant within a plot;
    4. Weed control, either by applying herbicide or by hand weeding, is carried out uniformly across the whole trial and finished within one day each time;
    5. Any other operations necessary for a trial are done uniformly across the whole trial.

    The MET serves as part of a national testing program. In other words, the MET conducted by the BTF is integrated into the corresponding national testing system. Data collected from the MET will be recognized and used by the Varietal Testing and Release Committee of the country where the MET is conducted. This is a measure to shorten the breeding cycle and to increase genetic gains. The structure of the varietal evaluation series has been changed since the 2017 season. Explanation of the changes:
    1. The number of entries is reduced to ensure better conduct of the trials and higher-quality data.
    1.1 For each sub-region (WCA and ESA), a maximum of 29 new entries per production system (irrigated lowland, rainfed lowland, rainfed upland, mangrove and high-elevation) is considered to enter the BTF (Phase I, former MET) each year.
    1.2 After evaluation and data analysis, a maximum of 10 entries is selected for further evaluation (Phase II, former PET).
    1.3 After evaluation and data analysis, a maximum of 3 entries is selected for further evaluation (Phase III, former PAT).
    2. Materials from Phases I, II and III are combined into a single trial (up to 42 test entries every year).
    3. Participatory varietal selection (PVS) may be conducted in the single trial (one replication may be chosen for the PVS).
    4. The single trial is conducted in three different locations in each participating country, and in priority in the Hubs.
    5. The Farmers Adoption Trial (FAT) and Validation AfricaRice Trial (VAT) are removed from the BTF scheme. Where required, further testing of selected entries in farmers' fields will ideally be handled by NARS with the assistance of the Rice Agronomy Task Force.

Cite
Hongren Shen; Hongren Shen; Bryan A. Tolson; Bryan A. Tolson; Juliane Mai; Juliane Mai (2022). Time to Update the Split-Sample Approach in Hydrological Model Calibration v1.0 [Dataset]. http://doi.org/10.5281/zenodo.5915374

Time to Update the Split-Sample Approach in Hydrological Model Calibration v1.0

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
zip (available download formats)
Dataset updated
May 31, 2022
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Hongren Shen; Hongren Shen; Bryan A. Tolson; Bryan A. Tolson; Juliane Mai; Juliane Mai
License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Time to Update the Split-Sample Approach in Hydrological Model Calibration

Hongren Shen1, Bryan A. Tolson1, Juliane Mai1

1Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, Ontario, Canada

Corresponding author: Hongren Shen (hongren.shen@uwaterloo.ca)


Data description

This data was used in the paper entitled "Time to Update the Split-Sample Approach in Hydrological Model Calibration" by Shen et al. (2022).

Catchment, meteorological forcing and streamflow data are provided for hydrological modeling use. Specifically, the forcing and streamflow data are archived in the input format required by the Raven hydrological modeling framework. The GR4J and HMETS model building results from the paper, i.e., reference KGE and KGE metrics in the calibration, validation and testing periods, are provided so the split-sample assessment performed in the paper can be replicated.

Data content

The data folder contains a gauge info file (CAMELS_463_gauge_info.txt), which reports basic information of each catchment, and 463 subfolders, each having four files for a catchment, including:

(1) Raven_Daymet_forcing.rvt, which contains Daymet meteorological forcing (i.e., daily precipitation in mm/d, minimum and maximum air temperature in deg_C, shortwave in MJ/m2/day, and day length in days) from Jan 1st 1980 to Dec 31st 2014 in the Raven-required format.

(2) Raven_USGS_streamflow.rvt, which contains daily discharge data (in m3/s) from Jan 1st 1980 to Dec 31st 2014 in the Raven-required format.

(3) GR4J_metrics.txt, which contains reference KGE and GR4J-based KGE metrics in calibration, validation and testing periods.

(4) HMETS_metrics.txt, which contains reference KGE and HMETS-based KGE metrics in calibration, validation and testing periods.

Data collection and processing methods

Data source

  • Catchment information and the Daymet meteorological forcing are retrieved from the CAMELS data set, which can be found here.
  • The USGS streamflow data are collected from the U.S. Geological Survey's (USGS) National Water Information System (NWIS), which can be found here.
  • The GR4J and HMETS performance metrics (i.e., reference KGE and KGE) are produced in the study by Shen et al. (2022).

Forcing data processing

  • A quality assessment procedure was performed. For example, the daily maximum air temperature should be larger than the daily minimum air temperature; otherwise, the two values were swapped.
  • Units are converted to Raven-required ones. Precipitation: mm/day, unchanged; daily minimum/maximum air temperature: deg_C, unchanged; shortwave: W/m2 to MJ/m2/day; day length: seconds to days.
  • Data for a catchment is archived in a RVT (ASCII-based) file, in which the second line specifies the start time of the forcing series, the time step (= 1 day), and the total time steps in the series (= 12784), respectively; the third and the fourth lines specify the forcing variables and their corresponding units, respectively.
  • More details of Raven formatted forcing files can be found in the Raven manual (here).
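The unit conversions above can be sketched as follows. One assumption is made here: Daymet's shortwave (srad) is documented as the average flux over the daylit period, so the daily total in MJ/m2/day is srad times day length. This helper is illustrative, not the authors' processing code:

```python
def to_raven_units(srad_w_m2, daylength_s, tmin_c, tmax_c):
    # Quality check: swap inverted daily min/max air temperatures
    if tmax_c < tmin_c:
        tmin_c, tmax_c = tmax_c, tmin_c
    # W/m2 averaged over the daylit period -> MJ/m2/day (J -> MJ)
    shortwave_mj = srad_w_m2 * daylength_s / 1e6
    # day length: seconds -> days
    daylength_d = daylength_s / 86400.0
    return shortwave_mj, daylength_d, tmin_c, tmax_c

# 400 W/m2 over a 12-hour day -> 17.28 MJ/m2/day, day length 0.5 d
print(to_raven_units(400.0, 43200.0, 10.0, 5.0))
```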

Streamflow data processing

  • Units are converted to Raven-required ones. Daily discharge originally in cfs is converted to m3/s.
  • Missing data are replaced with -1.2345 as Raven requires. Those missing time steps will not be counted in performance metrics calculation.
  • Streamflow series are archived in a RVT (ASCII-based) file, which opens with eight commented lines specifying relevant gauge and streamflow information, such as gauge name, gauge ID, USGS-reported catchment area, calculated catchment area (based on the catchment shapefiles in the CAMELS dataset), streamflow data range, data time step, and missing data periods. The first line after the commented lines specifies the data type (default is HYDROGRAPH), subbasin ID (i.e., SubID), and discharge unit (m3/s), respectively. The next line specifies the start of the streamflow data, the time step (= 1 day), and the total time steps in the series (= 12784), respectively.
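The two conversion rules above (cfs to m3/s, and the Raven missing-value sentinel) reduce to a short helper; this is a sketch, not Raven's API:

```python
CFS_TO_CMS = 0.3048 ** 3  # 1 ft3/s = 0.3048^3 m3/s, about 0.0283168

def to_raven_flow(q_cfs):
    """Convert daily discharge from cfs to m3/s, using -1.2345
    for missing values as Raven requires."""
    return [q * CFS_TO_CMS if q is not None else -1.2345 for q in q_cfs]

flows = to_raven_flow([100.0, None])  # first value is about 2.832 m3/s
```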

GR4J and HMETS metrics

The GR4J and HMETS metrics files consist of reference KGE and KGE values in the model calibration, validation, and testing periods, which are derived from the massive split-sample test experiment performed in the paper.

  • Columns in these metrics files are gauge ID, calibration sub-period (CSP) identifier, KGE in calibration, validation, testing1, testing2, and testing3, respectively.
  • We proposed 50 different CSPs in the experiment. "CSP_identifier" is a unique name for each CSP. e.g., the CSP identifier "CSP-3A_1990" indicates that the model is built on Jan 1st 1990, calibrated on the first 3-year sample (1981-1983), and validated on the remaining years of the 1980-1989 period. Note that 1980 is always used for spin-up.
  • We defined three testing periods (independent of the calibration and validation periods) for each CSP: the first 3 years from the model build year inclusive, the first 5 years from the model build year inclusive, and all years from the model build year inclusive. e.g., "testing1", "testing2", and "testing3" for CSP-3A_1990 are 1990-1992, 1990-1994, and 1990-2014, respectively.
  • Reference flow is the interannual mean daily flow based on a specific period, which is derived for a one-year period and then repeated in each year in the calculation period.
    • For calibration, its reference flow is based on spin-up + calibration periods.
    • For validation, its reference flow is based on spin-up + calibration periods.
    • For testing, its reference flow is based on spin-up + calibration + validation periods.
  • Reference KGE is calculated from the reference flow and the observed streamflow in a specific calculation period (e.g., calibration). It is computed using the KGE equation, substituting the reference flow for the simulated flow in the period of calculation. Note that the reference KGEs for the three testing periods correspond to the same historical period but are different, because each testing period spans a different time range and therefore covers a different series of observed flow.
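Putting the two definitions together, the sketch below builds the interannual-mean reference flow and scores it with the standard Kling-Gupta efficiency (Gupta et al., 2009), which is how a reference KGE can be obtained. Day-aligned years are assumed (no leap-day handling), and this is not the authors' code:

```python
import statistics as st

def kge(sim, obs):
    """KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2), with r the
    correlation, alpha the ratio of standard deviations and beta the
    ratio of means (Gupta et al., 2009)."""
    mu_s, mu_o = st.mean(sim), st.mean(obs)
    sd_s, sd_o = st.pstdev(sim), st.pstdev(obs)
    r = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / (len(obs) * sd_s * sd_o)
    return 1 - ((r - 1) ** 2 + (sd_s / sd_o - 1) ** 2 + (mu_s / mu_o - 1) ** 2) ** 0.5

def reference_flow(obs_by_year):
    """Interannual mean daily flow: average each day of year across
    years, then repeat that one-year pattern over the whole period."""
    ndays = len(obs_by_year[0])
    one_year = [st.mean(y[d] for y in obs_by_year) for d in range(ndays)]
    return one_year * len(obs_by_year)

# Reference KGE: feed the reference flow into KGE in place of the simulation
years = [[1.0, 4.0, 2.0], [3.0, 6.0, 2.0]]
obs = [q for y in years for q in y]
ref_kge = kge(reference_flow(years), obs)
```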

More details of the split-sample test experiment and the analysis of modeling results can be found in the paper by Shen et al. (2022).

Citation

Journal Publication

This study:

Shen, H., Tolson, B. A., & Mai, J.(2022). Time to update the split-sample approach in hydrological model calibration. Water Resources Research, 58, e2021WR031523. https://doi.org/10.1029/2021WR031523

Original CAMELS dataset:

A. J. Newman, M. P. Clark, K. Sampson, A. Wood, L. E. Hay, A. Bock, R. J. Viger, D. Blodgett, L. Brekke, J. R. Arnold, T. Hopson, and Q. Duan (2015). Development of a large-sample watershed-scale hydrometeorological dataset for the contiguous USA: dataset characteristics and assessment of regional variability in hydrologic model performance. Hydrol. Earth Syst. Sci., 19, 209-223, http://doi.org/10.5194/hess-19-209-2015

Data Publication

This study:

H. Shen, B.
