100+ datasets found
  1. Data from: Development of a Mobile Robot Test Platform and Methods for...

    • catalog.data.gov
    • data.nasa.gov
    • +1more
    Updated Dec 6, 2023
    Cite
    Dashlink (2023). Development of a Mobile Robot Test Platform and Methods for Validation of Prognostics-Enabled Decision Making Algorithms [Dataset]. https://catalog.data.gov/dataset/development-of-a-mobile-robot-test-platform-and-methods-for-validation-of-prognostics-enab
    Explore at:
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Dashlink
    Description

    As fault diagnosis and prognosis systems in aerospace applications become more capable, the ability to utilize information supplied by them becomes increasingly important. While certain types of vehicle health data can be effectively processed and acted upon by crew or support personnel, others, due to their complexity or time constraints, require either automated or semi-automated reasoning. Prognostics-enabled Decision Making (PDM) is an emerging research area that aims to integrate prognostic health information and knowledge about the future operating conditions into the process of selecting subsequent actions for the system. The newly developed PDM algorithms require suitable software and hardware platforms for testing under realistic fault scenarios. The paper describes the development of such a platform, based on the K11 planetary rover prototype. A variety of injectable fault modes are being investigated for electrical, mechanical, and power subsystems of the testbed, along with methods for data collection and processing. In addition to the hardware platform, a software simulator with matching capabilities has been developed. The simulator allows for prototyping and initial validation of the algorithms prior to their deployment on the K11. The simulator is also available to the PDM algorithms to assist with the reasoning process. A reference set of diagnostic, prognostic, and decision making algorithms is also described, followed by an overview of the current test scenarios and the results of their execution on the simulator.

  2. Data from: S1 Dataset -

    • plos.figshare.com
    xlsx
    Updated Jul 18, 2024
    Cite
    Navid Behzadi Koochani; Raúl Muñoz Romo; Ignacio Hernández Palencia; Sergio López Bernal; Carmen Martin Curto; José Cabezas Rodríguez; Almudena Castaño Reguillo (2024). S1 Dataset - [Dataset]. http://doi.org/10.1371/journal.pone.0305699.s002
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 18, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Navid Behzadi Koochani; Raúl Muñoz Romo; Ignacio Hernández Palencia; Sergio López Bernal; Carmen Martin Curto; José Cabezas Rodríguez; Almudena Castaño Reguillo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction: There is a need to develop harmonized procedures and a Minimum Data Set (MDS) for cross-border Multi Casualty Incidents (MCI) in medical emergency scenarios to ensure appropriate management of such incidents, regardless of place, language and internal processes of the institutions involved. That information should be capable of real-time communication to the command-and-control chain. It is crucial that the models adopted are interoperable between countries so that the rights of patients to cross-border healthcare are fully respected.

    Objective: To optimize management of cross-border Multi Casualty Incidents through a Minimum Data Set collected and communicated in real time to the chain of command and control for each incident, and to determine the degree of agreement among experts.

    Method: We used the modified Delphi method supplemented with the Utstein technique to reach consensus among experts. In the first phase, the minimum requirements of the project, the profile of the experts who were to participate, the basic requirements of each variable chosen and the way of collecting the data were defined, with bibliography provided on the subject. In the second phase, the preliminary variables were grouped into 6 clusters, and the objectives, the characteristics of the variables and the logistics of the work were approved. Several meetings were held to reach a consensus on the MDS variables using a modified Delphi technique: each expert scored each variable from 1 to 10. Non-voting variables were eliminated, and the round of voting ended. In the third phase, the Utstein style was applied to discuss each group of variables and choose the ones with the highest consensus. After several rounds of discussion, it was agreed to eliminate the variables with a score of less than 5 points. In phase four, the researchers submitted the variables to the external experts for final assessment and validation before their use in the simulations. Data were analysed with SPSS Statistics (IBM, version 2) software.

    Results: Six data entities with 31 sub-entities were defined, generating 127 items representing the final MDS regarded as essential for incident management. The level of consensus for the choice of items was very high and was highest for the category ‘Incident’, with an overall kappa of 0.7401 (95% CI 0.1265–0.5812, p = 0.000), a good level of consensus in the Landis and Koch model. The items with the greatest degree of consensus (a score of ten) were those relating to location, type of incident, date, time and identification of the incident. All items met the criteria set, such as digital collection and real-time transmission to the chain of command and control.

    Conclusions: This study documents the development of an MDS through consensus with a high degree of agreement among a group of experts of different nationalities working in different fields. All items in the MDS were digitally collected and forwarded in real time to the chain of command and control. This tool has demonstrated its validity in four large cross-border simulations involving more than eight countries and their emergency services.
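
    The headline agreement figure above is a kappa coefficient. As a toy illustration of how such a coefficient is computed (here Cohen's kappa for two raters and a 2×2 table, not the study's multi-expert voting), one might write:

```python
def cohens_kappa(table):
    """Cohen's kappa for a square inter-rater agreement table."""
    n = sum(sum(row) for row in table)
    # Observed agreement: fraction of cases on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / n
    # Expected agreement under independence of the two raters.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical counts: raters agree on 40 "keep" and 45 "drop" decisions.
kappa = cohens_kappa([[40, 5], [10, 45]])  # → 0.7
```

    A kappa around 0.7 falls in the "substantial agreement" band of the Landis and Koch scale mentioned above.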

  3. ASSIST Dominican Republic Data Validation

    • gimi9.com
    • catalog.data.gov
    Updated Nov 1, 2020
    Cite
    (2020). ASSIST Dominican Republic Data Validation [Dataset]. https://www.gimi9.com/dataset/data-gov_assist-dominican-republic-data-validation/
    Explore at:
    Dataset updated
    Nov 1, 2020
    Area covered
    Dominican Republic
    Description

    Three indicators reported by the ASSIST project for its Zika activities in the Dominican Republic were evaluated using the following approaches: 1) external evaluators re-assessed the same patient records that were originally reviewed by facility quality improvement teams; 2) external evaluators selected a new systematic random sample of records; and 3) external evaluators tallied totals for the indicators of interest from facility registers to determine differences between the indicator values reported by the USAID ASSIST Project and the values for the universe of clients seen at these facilities.

  4. Summary report of the 4th IAEA Technical Meeting on Fusion Data Processing,...

    • dataone.org
    Updated Sep 24, 2024
    Cite
    S.M. Gonzalez de Vicente, D. Mazon, M. Xu, S. Pinches, M. Churchill, A. Dinklage, R. Fischer, A. Murari, P. Rodriguez-Fernandez, J. Stillerman, J. Vega, G. Verdoolaege (2024). Summary report of the 4th IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis (FDPVA) [Dataset]. https://dataone.org/datasets/sha256%3A77e8a13f897db6e70537f800ceea9a2c8228a4eb1b33289b7e2c056845298a12
    Explore at:
    Dataset updated
    Sep 24, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    S.M. Gonzalez de Vicente, D. Mazon, M. Xu, S. Pinches, M. Churchill, A. Dinklage, R. Fischer, A. Murari, P. Rodriguez-Fernandez, J. Stillerman, J. Vega, G. Verdoolaege
    Description

    The objective of the fourth Technical Meeting on Fusion Data Processing, Validation and Analysis was to provide a platform for discussing a set of topics relevant to fusion data processing, validation and analysis, with a view to extrapolating needs to next-step fusion devices such as ITER. The validation and analysis of experimental data obtained from diagnostics used to characterize fusion plasmas are crucial for a knowledge-based understanding of the physical processes governing the dynamics of these plasmas. This paper presents the recent progress and achievements in the domain of plasma diagnostics and synthetic diagnostics data analysis (including image processing, regression analysis, inverse problems, deep learning, machine learning, big data and physics-based models for control) reported at the meeting. The progress in these areas highlights trends observed in current major fusion confinement devices. A special focus is placed on data analysis requirements for ITER and DEMO, with particular attention paid to artificial intelligence for automating control processes and improving their reliability.

  5. Expenditure and Consumption Survey, 2010 - West Bank and Gaza

    • catalog.ihsn.org
    • dev.ihsn.org
    Updated Mar 29, 2019
    + more versions
    Cite
    Palestinian Central Bureau of Statistics (2019). Expenditure and Consumption Survey, 2010 - West Bank and Gaza [Dataset]. https://catalog.ihsn.org/catalog/3089
    Explore at:
    Dataset updated
    Mar 29, 2019
    Dataset authored and provided by
    Palestinian Central Bureau of Statistics (http://pcbs.gov.ps/)
    Time period covered
    2010 - 2011
    Area covered
    West Bank, Gaza Strip, Gaza
    Description

    Abstract

    The basic goal of this survey is to provide the necessary database for formulating national policies at various levels. It represents the contribution of the household sector to the Gross National Product (GNP). Household surveys also help in determining the incidence of poverty and provide weighted data reflecting the relative importance of the consumption items to be employed in determining the benchmark for rates and prices of items and services. Generally, the Household Expenditure and Consumption Survey is a fundamental cornerstone in the process of studying the nutritional status in the Palestinian territory.

    The raw survey data provided by the Statistical Office was cleaned and harmonized by the Economic Research Forum, in the context of a major research project to develop and expand knowledge on equity and inequality in the Arab region. The main focus of the project is to measure the magnitude and direction of change in inequality and to understand the complex contributing social, political and economic forces influencing its levels. However, the measurement and analysis of the magnitude and direction of change in this inequality cannot be consistently carried out without harmonized and comparable micro-level data on income and expenditures. Therefore, one important component of this research project is securing and harmonizing household surveys from as many countries in the region as possible, adhering to international statistics on household living standards distribution. Once the dataset has been compiled, the Economic Research Forum makes it available, subject to confidentiality agreements, to all researchers and institutions concerned with data collection and issues of inequality. Data is a public good, in the interest of the region, and it is consistent with the Economic Research Forum's mandate to make micro data available, aiding regional research on this important topic.

    Geographic coverage

    The survey data covers urban, rural and camp areas in West Bank and Gaza Strip.

    Analysis unit

    1- Household/families. 2- Individuals.

    Universe

    The survey covered all Palestinian households who are usually resident in the Palestinian Territory during 2010.

    Kind of data

    Sample survey data [ssd]

    Sampling procedure

    Sample and Frame:

    The sampling frame consists of all enumeration areas which were enumerated in 2007; each enumeration area consists of buildings and housing units, with an average of about 120 households. These enumeration areas are used as primary sampling units (PSUs) in the first stage of the sample selection.

    Sample Design:

    The sample is a stratified cluster systematic random sample with two stages: First stage: selection of a systematic random sample of 192 enumeration areas. Second stage: selection of a systematic random sample of 24 households from each enumeration area selected in the first stage.

    Note: in Jerusalem Governorate (J1), 13 enumeration areas were selected; then, in the second phase, a group of households from each enumeration area was chosen using the census-2007 method of delineation and enumeration. This method was adopted to maximize household response and to comply with the non-response percentage set in the sample design. Enumeration areas were distributed across twelve months, and the sample for each quarter covers the sample strata (governorate, locality type).

    Sample strata:

    The population was divided by:

    1- Governorate 2- Type of Locality (urban, rural, refugee camps)

    Sample Size:

    The calculated sample size for the Expenditure and Consumption Survey in 2010 is about 3,757 households, 2,574 households in West Bank and 1,183 households in Gaza Strip.

    Mode of data collection

    Face-to-face [f2f]

    Research instrument

    The questionnaire consists of two main parts:

    First: Survey's questionnaire

    Part of the questionnaire is to be filled in during the visit at the beginning of the month, while the other part is to be filled in at the end of the month. The questionnaire includes:

    Control sheet: Includes household’s identification data, date of visit, data on the fieldwork and data processing team, and summary of household’s members by gender.

    Household roster: Includes demographic, social, and economic characteristics of household’s members.

    Housing characteristics: Includes data like type of housing unit, number of rooms, value of rent, and connection of housing unit to basic services like water, electricity and sewage. In addition, data in this section includes source of energy used for cooking and heating, distance of housing unit from transportation, education, and health centers, and sources of income generation like ownership of farm land or animals.

    Food and Non-Food Items: Includes food and non-food items; the household records its expenditure on them for one month.

    Durable Goods Schedule: Includes a list of main goods like washing machine, refrigerator, TV.

    Assistances and Poverty: Includes data about cash and in-kind assistance (assistance value, assistance source), as well as data about the household's situation and the procedures used to cover expenses.

    Monthly and annual income: Data pertinent to household’s income from different sources is collected at the end of the registration period.

    Second: List of goods

    The classification of the list of goods is based on the recommendation of the United Nations for the SNA, under the name Classification of Personal Consumption by Purpose. The list includes 55 groups of expenditure and consumption, each given a sequence number based on its importance to the household, starting with food goods, clothing groups, housing, medical treatment, transportation and communication, and lastly durable goods. Each group consists of important goods. The total number of goods in all groups amounted to 667 items for goods and services. Groups 1-21 include goods pertinent to food, drinks and cigarettes. Group 22 includes goods that are home-produced and consumed by the household. Groups 23-45 include all items except food, drinks and cigarettes. Groups 50-55 include durable goods. The data is collected based on different reference periods to represent expenditure during the whole year, except for cars, where data is collected for the last three years.

    Registration form

    The registration form includes instructions and examples on how to record consumption and expenditure items. The form includes the following columns: 1. Monetary (if the good is purchased) or in kind (if the item is self-produced) 2. Title of the good or service 3. Unit of measurement (kilogram, liter, number) 4. Quantity 5. Value

    The pages of the registration form are colored differently for the weeks of the month. The footer of each page includes remarks that encourage households to participate in the survey. The following instructions illustrate the nature of the items that should be recorded: 1. Monetary expenditures during purchases 2. Purchases based on debts 3. Monetary gifts once presented 4. Interest at pay 5. Self-produced food and goods once consumed 6. Food and merchandise from a commercial project once consumed 7. Merchandise received as a wage or part of a wage from the employer.

    Cleaning operations

    Raw Data

    Data editing took place through a number of stages, including: 1. Office editing and coding 2. Data entry 3. Structure checking and completeness 4. Structural checking of SPSS data files

    Harmonized Data

    • The Statistical Package for Social Science (SPSS) is used to clean and harmonize the datasets.
    • The harmonization process starts with cleaning all raw data files received from the Statistical Office.
    • Cleaned data files are then all merged to produce one data file on the individual level containing all variables subject to harmonization.
    • A country-specific program is generated for each dataset to generate/compute/recode/rename/format/label harmonized variables.
    • A post-harmonization cleaning process is run on the data.
    • Harmonized data is saved on the household as well as the individual level, in SPSS and converted to STATA format.

    Response rate

    The survey sample consisted of 4,767 households, which includes 4,608 households of the original sample plus 159 households as an additional sample. A total of 3,757 households completed the interview: 2,574 households in the West Bank and 1,183 households in the Gaza Strip. Weights were modified to account for the non-response rate. The response rate in the Palestinian Territory was 82.1% (82.4% in the West Bank and 81.6% in the Gaza Strip).
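
    The weighting step described above can be sketched arithmetically. This is an illustrative calculation only, not PCBS's actual weighting procedure; the design weight of 1.0 is a placeholder:

```python
# Completed interviews reported above.
west_bank, gaza = 2574, 1183
completed = west_bank + gaza  # 3,757 households in total

# Non-response adjustment: divide the design weight by the response rate,
# so responding households stand in for non-respondents.
def adjusted_weight(design_weight, response_rate):
    return design_weight / response_rate

w_west_bank = adjusted_weight(1.0, 0.824)  # West Bank rate from the text
```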

    Sampling error estimates

    The impact of errors on data quality was reduced to a minimum due to the high efficiency and outstanding selection, training, and performance of the fieldworkers. Procedures adopted during the fieldwork of the survey were considered a necessity to ensure the collection of accurate data, notably: 1) Develop schedules to conduct field visits to households during survey fieldwork. The objectives of the visits and the data collected on each visit were predetermined. 2) Fieldwork editing rules were applied during the data collection to ensure corrections were implemented before the end of fieldwork activities. 3) Fieldworkers were instructed to provide details in cases of extreme expenditure or consumption by the household. 4) Questions on income were postponed until the final visit at the end of the month. 5) Validation rules were embedded in the data processing systems, along with procedures to verify data entry and data edit.

  6. Data from: Synthetic Smart Card Data for the Analysis of Temporal and...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Paul Bouman (2020). Synthetic Smart Card Data for the Analysis of Temporal and Spatial Patterns [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_776718
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset authored and provided by
    Paul Bouman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a synthetic smart card data set that can be used to test pattern detection methods for the extraction of temporal and spatial data. The data set is tab-separated and based on a stylized travel pattern description for the city of Utrecht in the Netherlands; it was developed and used in Chapter 6 of the PhD thesis of Paul Bouman.

    This dataset contains the following files:

    journeys.tsv : the actual data set of synthetic smart card data

    utrecht.xml : the activity pattern definition that was used to randomly generate the synthetic smart card data

    validate.ref : a file derived from the activity pattern definition that can be used for validation purposes. It specifies which activity types occur at each location in the smart card data set.
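
    Since this listing does not give the schema of journeys.tsv, the following sketch parses a small hypothetical tab-separated journey table with Python's csv module; the column names and values here are invented for illustration and may differ from the real file:

```python
import csv
import io

# Hypothetical smart-card journeys; the real journeys.tsv columns may differ.
raw = (
    "card_id\tcheck_in_stop\tcheck_out_stop\tcheck_in\tcheck_out\n"
    "c001\tUtrecht Centraal\tOvervecht\t07:55\t08:07\n"
    "c002\tVaartsche Rijn\tUtrecht Centraal\t08:10\t08:16\n"
)
journeys = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))

# Group journeys by card, a typical first step before looking for
# per-card temporal and spatial patterns.
by_card = {}
for journey in journeys:
    by_card.setdefault(journey["card_id"], []).append(journey)
```

    For the actual file, the io.StringIO wrapper would be replaced by open("journeys.tsv", newline="").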

  7. Forage Fish Aerial Validation Data from Prince William Sound, Alaska

    • data.usgs.gov
    • s.cnmilf.com
    • +1more
    Updated Mar 4, 2025
    + more versions
    Cite
    Forage Fish Aerial Validation Data from Prince William Sound, Alaska [Dataset]. https://data.usgs.gov/datacatalog/data/USGS:ASC523
    Explore at:
    Dataset updated
    Mar 4, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Mayumi Arimitsu; John Piatt; Brielle Heflin; Caitlin Marsteller
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    2014 - 2022
    Area covered
    Prince William Sound, Alaska
    Description

    These data are part of the Gulf Watch Alaska (GWA) long-term monitoring program, pelagic monitoring component. This dataset consists of one table, providing fish aerial validation data from summer surveys in Prince William Sound, Alaska. Data include: date, location, latitude, longitude, aerial ID, validation ID, total length and validation method. Various catch methods were used to obtain fish samples for aerial validation, including: cast net, GoPro, hydroacoustics, jig, dip net, gill net, purse seine, photo and visual identification.

  8. Data Repository for 'Bootstrap aggregation and cross-validation methods to...

    • beta.hydroshare.org
    • hydroshare.org
    zip
    Updated Jun 24, 2020
    + more versions
    Cite
    Zachary Paul Brodeur; Scott S. Steinschneider; Jonathan D. Herman (2020). Data Repository for 'Bootstrap aggregation and cross-validation methods to reduce overfitting in reservoir control policy search' [Dataset]. http://doi.org/10.4211/hs.b8f87a7b680d44cebfb4b3f4f4a6a447
    Explore at:
    Available download formats: zip (8.3 MB)
    Dataset updated
    Jun 24, 2020
    Dataset provided by
    HydroShare
    Authors
    Zachary Paul Brodeur; Scott S. Steinschneider; Jonathan D. Herman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 1, 1922 - Sep 30, 2016
    Area covered
    Description

    Policy search methods provide a heuristic mapping between observations and decisions and have been widely used in reservoir control studies. However, recent studies have observed a tendency for policy search methods to overfit to the hydrologic data used in training, particularly the sequence of flood and drought events. This technical note develops an extension of bootstrap aggregation (bagging) and cross-validation techniques, inspired by the machine learning literature, to improve control policy performance on out-of-sample hydrology. We explore these methods using a case study of Folsom Reservoir, California using control policies structured as binary trees and daily streamflow resampling based on the paleo-inflow record. Results show that calibration-validation strategies for policy selection and certain ensemble aggregation methods can improve out-of-sample tradeoffs between water supply and flood risk objectives over baseline performance given fixed computational costs. These results highlight the potential to improve policy search methodologies by leveraging well-established model training strategies from machine learning.
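
    The core idea of the note — select a policy on each bootstrap resample of the hydrologic record, then aggregate the selections — can be sketched generically. Everything below (the fixed-threshold release policy, the penalty objective, the numbers) is a toy stand-in for the paper's tree-structured policies and paleo-inflow resampling:

```python
import random

random.seed(42)
# Synthetic daily inflow record standing in for the historical record.
inflows = [random.gauss(100, 30) for _ in range(365)]

def policy_penalty(release_threshold, series):
    """Toy objective: penalize high storage (flood-risk proxy) and low
    releases (supply-shortfall proxy) under a fixed-threshold policy."""
    storage, penalty = 500.0, 0.0
    for inflow in series:
        storage += inflow
        release = min(storage, release_threshold)
        storage -= release
        penalty += max(0.0, storage - 1000.0)  # flood-risk proxy
        penalty += max(0.0, 90.0 - release)    # supply-shortfall proxy
    return penalty

def bagged_threshold(series, n_boot=20, candidates=range(80, 160, 10)):
    """Bagging: pick the best candidate policy on each bootstrap resample
    of the record, then average the picks so no single wet/dry sequence
    dominates the selection."""
    picks = []
    for _ in range(n_boot):
        resample = [random.choice(series) for _ in range(len(series))]
        picks.append(min(candidates, key=lambda t: policy_penalty(t, resample)))
    return sum(picks) / len(picks)
```

    A calibration-validation variant would instead score each candidate on resamples held out from its selection.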

  9. Data for training, validation and testing of methods in the thesis:...

    • data.niaid.nih.gov
    • zenodo.org
    Updated May 1, 2021
    Cite
    Lucia Hajduková (2021). Data for training, validation and testing of methods in the thesis: Camera-based Accuracy Improvement of Indoor Localization [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4730337
    Explore at:
    Dataset updated
    May 1, 2021
    Dataset authored and provided by
    Lucia Hajduková
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The package contains files for two modules designed to improve the accuracy of the indoor positioning system, namely the following:

    Door detection:

    • videos_test - videos used to demonstrate the application of the door detector

    • videos_res - videos from the videos_test directory with detected doors marked

    Parts detection:

    • frames_train_val - images generated from videos, used for training and validation of the VGG16 neural network model

    • frames_test - images generated from videos, used for testing of the trained model

    • videos_test - videos used to demonstrate the application of the parts detector

    • videos_res - videos from the videos_test directory with detected parts marked

  10. Data from: Sensor Validation using Bayesian Networks

    • catalog.data.gov
    • data.nasa.gov
    • +1more
    Updated Dec 6, 2023
    + more versions
    Cite
    Dashlink (2023). Sensor Validation using Bayesian Networks [Dataset]. https://catalog.data.gov/dataset/sensor-validation-using-bayesian-networks
    Explore at:
    Dataset updated
    Dec 6, 2023
    Dataset provided by
    Dashlink
    Description

    One of NASA’s key mission requirements is robust state estimation. Sensing, using a wide range of sensors and sensor fusion approaches, plays a central role in robust state estimation, and there is a need to diagnose sensor failure as well as component failure. Sensor validation techniques address this problem: given a vector of sensor readings, decide whether sensors have failed, therefore producing bad data. We take in this paper a probabilistic approach, using Bayesian networks, to diagnosis and sensor validation, and investigate several relevant but slightly different Bayesian network queries. We emphasize that on-board inference can be performed on a compiled model, giving fast and predictable execution times. Our results are illustrated using an electrical power system, and we show that a Bayesian network with over 400 nodes can be compiled into an arithmetic circuit that can correctly answer queries in less than 500 microseconds on average.

    Reference: O. J. Mengshoel, A. Darwiche, and S. Uckun, "Sensor Validation using Bayesian Networks." In Proc. of the 9th International Symposium on Artificial Intelligence, Robotics, and Automation in Space (iSAIRAS-08), Los Angeles, CA, 2008.

    BibTeX reference:

    @inproceedings{mengshoel08sensor,
      author = {Mengshoel, O. J. and Darwiche, A. and Uckun, S.},
      title = {Sensor Validation using {Bayesian} Networks},
      booktitle = {Proceedings of the 9th International Symposium on Artificial Intelligence, Robotics, and Automation in Space (iSAIRAS-08)},
      year = {2008}
    }
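
    The query type described — the posterior probability that a sensor has failed given the evidence — reduces, in the simplest case, to Bayes' rule. A toy two-node version with illustrative numbers (the paper's model has over 400 nodes and answers such queries via compiled arithmetic circuits):

```python
# Prior probability that the sensor has failed (illustrative value).
p_fail = 0.01
# Likelihood of the evidence (e.g. the reading disagrees with a
# redundant sensor) under each hypothesis.
p_evidence_given_fail = 0.95
p_evidence_given_ok = 0.02

# Bayes' rule: P(fail | evidence) = P(evidence | fail) P(fail) / P(evidence).
p_evidence = p_fail * p_evidence_given_fail + (1 - p_fail) * p_evidence_given_ok
posterior = p_fail * p_evidence_given_fail / p_evidence
```

    Even with a 95% hit rate, the low prior keeps the posterior failure probability near one third; this sensitivity to priors is one reason full-network inference, rather than per-sensor thresholds, is attractive.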

  11. Data from: Validation of Methods to Assess the Immunoglobulin Gene...

    • data.staging.idas-ds1.appdat.jsc.nasa.gov
    • gimi9.com
    • +2more
    Updated Feb 19, 2025
    + more versions
    Cite
    data.staging.idas-ds1.appdat.jsc.nasa.gov (2025). Validation of Methods to Assess the Immunoglobulin Gene Repertoire in Tissues Obtained from Mice on the International Space Station [Dataset]. https://data.staging.idas-ds1.appdat.jsc.nasa.gov/dataset/validation-of-methods-to-assess-the-immunoglobulin-gene-repertoire-in-tissues-obtained-fro
    Explore at:
    Dataset updated
    Feb 19, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    Spaceflight is known to affect immune cell populations. In particular, splenic B-cell numbers decrease during spaceflight and in ground-based physiological models. Although antibody isotype changes have been assessed during and after spaceflight, an extensive characterization of the impact of spaceflight on antibody composition has not been conducted in mice. Next Generation Sequencing and bioinformatic tools are now available to assess antibody repertoires. We can now identify immunoglobulin gene-segment usage, junctional regions, and modifications that contribute to specificity and diversity. Due to limitations on the International Space Station, alternate sample collection and storage methods must be employed. Our group compared Illumina MiSeq® sequencing data from multiple sample preparation methods in normal C57Bl/6J mice to validate that sample preparation and storage would not bias the outcome of antibody repertoire characterization. In this report we also compared sequencing techniques and a bioinformatic workflow on the data output when we assessed the IgH and Igκ variable gene usage. Our bioinformatic workflow has been optimized for Illumina HiSeq® and MiSeq® datasets and is designed specifically to reduce bias, capture the most information from Ig sequences, and produce a data set that provides other data mining options.

  12. Chi-square test result of the dimensional error count by institution.

    • plos.figshare.com
    xls
    Updated Nov 20, 2023
    Cite
    Ki-Hoon Kim; Seol Whan Oh; Soo Jeong Ko; Kang Hyuck Lee; Wona Choi; In Young Choi (2023). Chi-square test result of the dimensional error count by institution. [Dataset]. http://doi.org/10.1371/journal.pone.0294554.t005
    Explore at:
    Available download formats: xls
    Dataset updated
    Nov 20, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Ki-Hoon Kim; Seol Whan Oh; Soo Jeong Ko; Kang Hyuck Lee; Wona Choi; In Young Choi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Chi-square test result of the dimensional error count by institution.
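
    As an illustration of the test behind this table, here is a chi-square statistic computed by hand on a hypothetical error-count contingency table (the counts below are invented; the dataset's actual table is in the linked file):

```python
# Hypothetical counts: rows are (error, no error), columns are institutions.
observed = [
    [12, 5, 9],
    [88, 95, 91],
]

grand = sum(map(sum, observed))
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected,
# where expected assumes error rate is independent of institution.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)  # 2 degrees of freedom
```

    With chi2 ≈ 3.12 below the 5% critical value of 5.99 for 2 degrees of freedom, these invented counts would not show a significant institutional difference; scipy.stats.chi2_contingency performs the same calculation and returns a p-value.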

  13. Icing Validation Database

    • dataverse.no
    • dataverse.azure.uit.no
    tsv, txt, xlsx
    Updated Sep 28, 2023
    Cite
    Richard Hann; Nicolas Müller (2023). Icing Validation Database [Dataset]. http://doi.org/10.18710/5XYALW
    Explore at:
    Available download formats: txt(2378), xlsx(35849), tsv(35777)
    Dataset updated
    Sep 28, 2023
    Dataset provided by
    DataverseNO
    Authors
    Richard Hann; Richard Hann; Nicolas Müller; Nicolas Müller
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    1983 - 2019
    Description

    This database contains an overview of experimental datasets that can be used for the validation of ice prediction simulation methods. This database was generated for the 1st AIAA Ice Prediction Workshop, scheduled for 2021. The database contains entries on 71 experimental datasets in the literature. For each entry, a series of parameters has been identified, including the investigated geometries, Reynolds numbers, Mach numbers, and icing envelopes.

  14. Time to Update the Split-Sample Approach in Hydrological Model Calibration

    • zenodo.org
    zip
    Updated May 31, 2022
    Cite
    Hongren Shen; Hongren Shen; Bryan A. Tolson; Bryan A. Tolson; Juliane Mai; Juliane Mai (2022). Time to Update the Split-Sample Approach in Hydrological Model Calibration v1.0 [Dataset]. http://doi.org/10.5281/zenodo.5915374
    Explore at:
    Available download formats: zip
    Dataset updated
    May 31, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Hongren Shen; Hongren Shen; Bryan A. Tolson; Bryan A. Tolson; Juliane Mai; Juliane Mai
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Time to Update the Split-Sample Approach in Hydrological Model Calibration

    Hongren Shen1, Bryan A. Tolson1, Juliane Mai1

    1Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, Ontario, Canada

    Corresponding author: Hongren Shen (hongren.shen@uwaterloo.ca)

    Abstract

    Model calibration and validation are critical in hydrological model robustness assessment. Unfortunately, the commonly-used split-sample test (SST) framework for data splitting requires modelers to make subjective decisions without clear guidelines. This large-sample SST assessment study empirically assesses how different data splitting methods influence post-validation model testing period performance, thereby identifying optimal data splitting methods under different conditions. This study investigates the performance of two lumped conceptual hydrological models calibrated and tested in 463 catchments across the United States using 50 different data splitting schemes. These schemes are established according to the data availability, length, and recentness of the continuous calibration sub-periods (CSPs). A full-period CSP is also included in the experiment, which skips model validation. The assessment approach is novel in multiple ways, including framing model building decisions as a decision tree problem and viewing the model building process as a formal testing period classification problem, aiming to accurately predict model success/failure in the testing period. Results span different climate and catchment conditions across a 35-year period with available data, making conclusions quite generalizable. Calibrating to older data and then validating models on newer data produces inferior model testing period performance in every single analysis conducted and should be avoided. Calibrating to the full available data and skipping model validation entirely is the most robust split-sample decision. Experimental findings remain consistent no matter how model building factors (i.e., catchments, model types, data availability, and testing periods) are varied. Results strongly support revising the traditional split-sample approach in hydrological modeling.

    Data description

    This data was used in the paper entitled "Time to Update the Split-Sample Approach in Hydrological Model Calibration" by Shen et al. (2022).

    Catchment, meteorological forcing and streamflow data are provided for hydrological modeling use. Specifically, the forcing and streamflow data are archived in the input format required by the Raven hydrological modeling framework. The GR4J and HMETS model building results in the paper, i.e., reference KGE and KGE metrics in calibration, validation and testing periods, are provided for replication of the split-sample assessment performed in the paper.

    Data content

    The data folder contains a gauge info file (CAMELS_463_gauge_info.txt), which reports basic information of each catchment, and 463 subfolders, each having four files for a catchment, including:

    (1) Raven_Daymet_forcing.rvt, which contains Daymet meteorological forcing (i.e., daily precipitation in mm/d, minimum and maximum air temperature in deg_C, shortwave in MJ/m2/day, and day length in day) from Jan 1st 1980 to Dec 31 2014 in the Raven-required input format.

    (2) Raven_USGS_streamflow.rvt, which contains daily discharge data (in m3/s) from Jan 1st 1980 to Dec 31 2014 in the Raven-required input format.

    (3) GR4J_metrics.txt, which contains reference KGE and GR4J-based KGE metrics in calibration, validation and testing periods.

    (4) HMETS_metrics.txt, which contains reference KGE and HMETS-based KGE metrics in calibration, validation and testing periods.

    Data collection and processing methods

    Data source

    • Catchment information and the Daymet meteorological forcing are retrieved from the CAMELS data set, which can be found here.
    • The USGS streamflow data are collected from the U.S. Geological Survey's (USGS) National Water Information System (NWIS), which can be found here.
    • The GR4J and HMETS performance metrics (i.e., reference KGE and KGE) are produced in the study by Shen et al. (2022).

    Forcing data processing

    • A quality assessment procedure was performed. For example, daily maximum air temperature should be larger than the daily minimum air temperature; where this was violated, the two values were swapped.
    • Units are converted to Raven-required ones. Precipitation: mm/day, unchanged; daily minimum/maximum air temperature: deg_C, unchanged; shortwave: W/m2 to MJ/m2/day; day length: seconds to days.
    • Data for a catchment is archived in a RVT (ASCII-based) file, in which the second line specifies the start time of the forcing series, the time step (= 1 day), and the total time steps in the series (= 12784), respectively; the third and the fourth lines specify the forcing variables and their corresponding units, respectively.
    • More details of Raven formatted forcing files can be found in the Raven manual (here).
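    The QA check and unit conversions listed above can be sketched as follows. The record field names are illustrative, not the dataset's own column names.

```python
def qa_and_convert(row):
    """Apply the QA swap and unit conversions described above to one
    daily forcing record (field names are illustrative)."""
    tmin, tmax = row["tmin_C"], row["tmax_C"]
    if tmax < tmin:  # QA: swap inverted min/max temperatures
        tmin, tmax = tmax, tmin
    return {
        "precip_mm_d": row["precip_mm_d"],               # unchanged
        "tmin_C": tmin,                                  # unchanged
        "tmax_C": tmax,                                  # unchanged
        "srad_MJ_m2_d": row["srad_W_m2"] * 86400 / 1e6,  # W/m2 -> MJ/m2/day
        "dayl_d": row["dayl_s"] / 86400.0,               # seconds -> days
    }
```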

    Streamflow data processing

    • Units are converted to Raven-required ones. Daily discharge originally in cfs is converted to m3/s.
    • Missing data are replaced with -1.2345 as Raven requires. Those missing time steps will not be counted in performance metrics calculation.
    • Streamflow series are archived in a RVT (ASCII-based) file, which opens with eight comment lines specifying relevant gauge and streamflow data information, such as gauge name, gauge ID, USGS-reported catchment area, calculated catchment area (based on the catchment shapefiles in the CAMELS dataset), streamflow data range, data time step, and missing data periods. The first line after the comment lines specifies the data type (default is HYDROGRAPH), subbasin ID (i.e., SubID), and discharge unit (m3/s), respectively. The next line specifies the start of the streamflow data, the time step (= 1 day), and the total time steps in the series (= 12784), respectively.
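    A minimal sketch of the streamflow unit conversion and missing-data flagging described above (the function name and input shape are assumptions):

```python
CFS_TO_M3S = 0.0283168  # 1 cubic foot = 0.0283168 m3
MISSING = -1.2345       # Raven's missing-data sentinel, as described above

def to_raven_flow(q_cfs):
    """Convert a daily discharge series from cfs to m3/s, replacing
    missing values (None) with Raven's sentinel value."""
    return [MISSING if q is None else q * CFS_TO_M3S for q in q_cfs]
```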

    GR4J and HMETS metrics

    The GR4J and HMETS metrics files consist of reference KGE and KGE values in the model calibration, validation, and testing periods, derived in the massive split-sample test experiment performed in the paper.

    • Columns in these metrics files are gauge ID, calibration sub-period (CSP) identifier, KGE in calibration, validation, testing1, testing2, and testing3, respectively.
    • We proposed 50 different CSPs in the experiment. "CSP_identifier" is a unique name for each CSP, e.g., the CSP identifier "CSP-3A_1990" indicates the model is built on Jan 1st 1990, calibrated on the first 3-year sample (1981-1983), and validated on the remaining years of the 1980-1989 period. Note that 1980 is always used for spin-up.
    • We defined three testing periods (independent of the calibration and validation periods) for each CSP: the first 3 years from the model build year inclusive, the first 5 years from the model build year inclusive, and all years from the model build year inclusive. e.g., "testing1", "testing2", and "testing3" for CSP-3A_1990 are 1990-1992, 1990-1994, and 1990-2014, respectively.
    • Reference flow is the interannual mean daily flow based on a specific period, which is derived for a one-year period and then repeated in each year in the calculation period.
      • For calibration, its reference flow is based on spin-up + calibration periods.
      • For validation, its reference flow is based on spin-up + calibration periods.
      • For testing, its reference flow is based on spin-up +calibration + validation periods.
    • Reference KGE is calculated from the reference flow and the observed streamflow in a specific calculation period (e.g., calibration). It is computed using the KGE equation by substituting the reference flow for the simulated flow in the period of calculation. Note that the reference KGEs for the three different testing periods correspond to the same historical period but differ, because each testing period spans a different time period and covers a different series of observed flow.
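    The KGE and reference-KGE calculations above can be sketched as follows, assuming the standard Kling-Gupta efficiency formula; the function names and input shapes are illustrative, not the paper's code.

```python
import math

def kge(sim, obs):
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (a-1)^2 + (b-1)^2),
    where r is the linear correlation, a the ratio of standard
    deviations (sim/obs), and b the ratio of means (sim/obs)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    so = math.sqrt(sum((x - mo) ** 2 for x in obs) / n)
    ss = math.sqrt(sum((x - ms) ** 2 for x in sim) / n)
    r = sum((s - ms) * (o - mo) for s, o in zip(sim, obs)) / (n * ss * so)
    return 1 - math.sqrt((r - 1) ** 2 + (ss / so - 1) ** 2 + (ms / mo - 1) ** 2)

def reference_kge(obs, doy, ref_by_doy):
    """Reference KGE: evaluate the KGE equation with the interannual
    mean daily flow (repeated each year) in place of the simulation."""
    ref = [ref_by_doy[d] for d in doy]
    return kge(ref, obs)
```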

    More details of the split-sample test experiment and modeling results analysis can be found in the paper by Shen et al. (2022).

    Citation

    Journal Publication

    This study:

    Shen, H., Tolson, B. A., & Mai, J.(2022). Time to update the split-sample approach in hydrological model calibration. Water Resources Research, 58, e2021WR031523. https://doi.org/10.1029/2021WR031523

    Original CAMELS dataset:

    A. J. Newman, M. P. Clark, K. Sampson, A. Wood, L. E. Hay, A. Bock, R. J. Viger, D. Blodgett, L. Brekke, J. R. Arnold, T. Hopson, and Q. Duan (2015). Development of a large-sample watershed-scale hydrometeorological dataset for the contiguous USA: dataset characteristics and assessment of regional variability in hydrologic model performance. Hydrol. Earth Syst. Sci., 19, 209-223, http://doi.org/10.5194/hess-19-209-2015

    Data Publication

    This study:

    H. Shen, B.

  15. Resources for the article 'Reproducible neural network simulations: statistical methods for model validation on the level of network activity data'

    • doi.gin.g-node.org
    Updated Oct 24, 2018
    Cite
    Robin Gutzen; Michael von Papen; Guido Trensch; Pietro Quaglio; Sonja Grün; Michael Denker (2018). Resources for the article 'Reproducible neural network simulations: statistical methods for model validation on the level of network activity data' [Dataset]. http://doi.org/10.12751/g-node.85d46c
    Explore at:
    Dataset updated
    Oct 24, 2018
    Dataset provided by
    Forschungszentrum Jülich: http://www.fz-juelich.de/
    Authors
    Robin Gutzen; Michael von Papen; Guido Trensch; Pietro Quaglio; Sonja Grün; Michael Denker
    License

    BSD 3-Clause License: https://opensource.org/licenses/BSD-3-Clause

    Dataset funded by
    EU
    Helmholtz
    Description

    This repository hosts code and data to reproduce the findings of the article 'Reproducible neural network simulations: statistical methods for model validation on the level of network activity data'. In addition, the repository hosts an additional example for the use of the tool "NetworkUnit".

  16. Data from: Cokriging-Based Sequential Design Strategies Using Fast Cross-Validation Techniques for Multi-Fidelity Computer Codes

    • tandf.figshare.com
    text/x-tex
    Updated Jun 1, 2023
    Cite
    Loic Le Gratiet; Claire Cannamela (2023). Cokriging-Based Sequential Design Strategies Using Fast Cross-Validation Techniques for Multi-Fidelity Computer Codes [Dataset]. http://doi.org/10.6084/m9.figshare.1568327.v5
    Explore at:
    Available download formats: text/x-tex
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis
    Authors
    Loic Le Gratiet; Claire Cannamela
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Cokriging-based surrogate models have become popular in recent decades to approximate a computer code output from a few simulations using both coarse and more complex versions of the code. In practical applications, it is common to sequentially add new simulations to obtain more accurate approximations. We propose a method of cokriging-based sequential design, which combines both the error evaluation provided by the cokriging model and the observed errors of a leave-one-out cross-validation procedure. This method is proposed in two versions: the first selects points one at a time; the second allows us to parallelize the simulations and to add several design points at a time. The main advantage of the suggested strategies is that at a new design point they choose which code versions should be simulated (i.e., the complex code or one of its fast approximations). A multifidelity application is used to illustrate the efficiency of the proposed approaches. In this example, the accurate code is a two-dimensional finite element model and the less accurate one is a one-dimensional approximation of the system. This article has supplementary material online.
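    The selection logic described above can be sketched with a drastically simplified 1-D stand-in: a piecewise-linear interpolant replaces the cokriging model, and leave-one-out errors drive the choice of the next design point. All names and the scoring rule here are illustrative assumptions, not the paper's actual method.

```python
import bisect

def interp(xs, ys, x):
    """Piecewise-linear interpolation: a toy surrogate standing in
    for the cokriging model (xs must be sorted ascending)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_left(xs, x)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

def loo_errors(xs, ys):
    """Leave-one-out errors: re-predict each design point without it."""
    errs = []
    for i in range(len(xs)):
        xs_i, ys_i = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        errs.append(abs(ys[i] - interp(xs_i, ys_i, xs[i])))
    return errs

def next_point(xs, ys, candidates):
    """Pick the candidate whose nearest design point has the largest
    LOO error, plus its distance to the design as an uncertainty proxy."""
    errs = loo_errors(xs, ys)
    def score(c):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - c))
        return errs[i] + min(abs(x - c) for x in xs)
    return max(candidates, key=score)
```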

  17. Data from: Experimental Validation of a Prognostic Health Management System for Electro-Mechanical Actuators

    • data.nasa.gov
    • datasets.ai
    application/rdfxml +5
    Updated Jun 26, 2018
    Cite
    (2018). Experimental Validation of a Prognostic Health Management System for Electro-Mechanical Actuators [Dataset]. https://data.nasa.gov/dataset/Experimental-Validation-of-a-Prognostic-Health-Man/tddm-8y6c
    Explore at:
    Available download formats: application/rdfxml, application/rssxml, json, csv, tsv, xml
    Dataset updated
    Jun 26, 2018
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Description

    The work described herein is aimed at advancing prognostic health management solutions for electro-mechanical actuators and, thus, increasing their reliability and attractiveness to designers of the next generation of aircraft and spacecraft. In pursuit of this goal the team adopted a systematic approach, starting with EMA FMECA reviews, consultations with EMA manufacturers, and extensive literature reviews of previous efforts. Based on the acquired knowledge, nominal/off-nominal physics models and prognostic health management algorithms were developed. In order to aid with development of the algorithms and validate them on realistic data, a testbed capable of supporting experiments in both laboratory and flight environments was developed. Test actuators with architectures similar to potential flight-certified units were obtained for the purposes of testing, and realistic fault injection methods were designed. Several hundred fault scenarios were created, using permutations of position and load profiles, as well as fault severity levels. The diagnostic system was tested extensively on these scenarios, with the test results demonstrating high accuracy and low numbers of false positive and false negative diagnoses. The prognostic system was utilized to track fault progression in some of the fault scenarios, predicting the remaining useful life of the actuator. A series of run-to-failure experiments were conducted to validate its performance, with the resulting error in predicting time to failure generally less than 10%. While a more robust validation procedure would require dozens more experiments executed under the same conditions (and, consequently, more test articles destroyed), the current results already demonstrate the potential for predicting fault progression in this type of device.
More prognostic experiments are planned for the next phase of this work, including investigation and comparison of other prognostic algorithms (such as various types of Particle Filter and GPR), addition of new fault types, and execution of prognostic experiments in flight environment.

  18. Prognostics of Power Electronics, methods and validation testbeds

    • catalog.data.gov
    • data.nasa.gov
    Updated Dec 7, 2023
    Cite
    Dashlink (2023). Prognostics of Power Electronics, methods and validation testbeds [Dataset]. https://catalog.data.gov/dataset/prognostics-of-power-electronics-methods-and-validation-testbeds
    Explore at:
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    Dashlink
    Description

    An overview of the current results of prognostics for DC-DC power converters is presented, focusing on the output filter capacitor component. The electrolytic capacitor typically used as the filter capacitor is one of the components of the power supply with the highest failure rate, hence the effort in developing component-level prognostics methods for capacitors. An overview of prognostics algorithms based on electrical overstress and thermal overstress accelerated aging data is presented, and a discussion of the current efforts in terms of validation of the algorithms is included. The focus of current and future work is to develop a methodology that allows for algorithm development using accelerated aging data and then transforms that into a valid algorithm on the real usage time scale.

  19. Raw mask quality assessment data from all experiments.

    • figshare.com
    • plos.figshare.com
    zip
    Updated May 30, 2023
    Cite
    Rolf A. Heckemann; Christian Ledig; Katherine R. Gray; Paul Aljabar; Daniel Rueckert; Joseph V. Hajnal; Alexander Hammers (2023). Raw mask quality assessment data from all experiments. [Dataset]. http://doi.org/10.1371/journal.pone.0129211.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Rolf A. Heckemann; Christian Ledig; Katherine R. Gray; Paul Aljabar; Daniel Rueckert; Joseph V. Hajnal; Alexander Hammers
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reference brain volume in mm3 (A.csv), overlap values for all pairings (B.csv), surface distance for all pairings (C.csv), success index for all relevant pairings (D.csv). (ZIP)

  20. Microphone Comparison Array Validation Dataset

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 24, 2020
    Cite
    Microphone Comparison Array Validation Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_20547
    Explore at:
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Pearce, Andy
    Brookes, Tim
    Dewhirst, Martin
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Data generated as part of research to determine an appropriate method to make audio recordings suitable for use in comparing the perceptual characteristics imparted by microphones. Data comprise audio files, listening test interfaces and MATLAB code.

    References

    BBC SNN (2015): A. Pearce, T. Brookes, M. Dewhirst, "Timbral differences between microphones", BBC Sound Now & Next Technology Fair, London, UK, 19-20 May 2015

    AES139 (2015): A. Pearce, T. Brookes, M. Dewhirst, "Validation of experimental methods to record stimuli for microphone comparisons", Audio Eng. Soc. 139th Convention, New York, USA, 29 Oct - 1 Nov 2015
