These data contain concentrations of major and trace elements in quality-assurance samples. These are the machine-readable versions of Tables 2–5 from the U.S. Geological Survey Scientific Investigations Report, Distribution of Mining Related Trace Elements in Streambed and Floodplain Sediment along the Middle Big River and Tributaries in the Southeast Missouri Barite District, 2012–15 (Smith and Schumacher, 2018).
A comprehensive quality assurance (QA) and quality control (QC) statistical framework consists of three major phases. Phase 1 covers preliminary exploration of the raw data sets, including time formatting and combining datasets of different lengths and time intervals. Phase 2 covers QA of the datasets, including detecting and flagging duplicates, outliers, and extreme values. Phase 3 covers development of a time series at the desired frequency, imputation of missing values, visualization, and a final statistical summary. The time series data collected at the Billy Barr meteorological station (East River Watershed, Colorado) were analyzed. The developed statistical framework is suitable for both real-time and post-collection QA/QC analysis of meteorological datasets. This data package includes one Excel file converted to CSV format (Billy_Barr_raw_qaqc.csv) that contains the raw meteorological data, i.e., the input to the QA/QC analysis. A second CSV file (Billy_Barr_1hr.csv) contains the QA/QC-processed and flagged meteorological data, i.e., the output of the QA/QC analysis. The last file (QAQC_Billy_Barr_2021-03-22.R) is an R script that implements the QA/QC and flagging process. The CSV data files in this package serve as the input and output files for the R script.
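The R script implements the full workflow; a compact pandas sketch of the same flag-then-resample idea is shown below. The column name, plausible range, and interpolation limit are illustrative assumptions, not values taken from the Billy Barr script.

```python
import pandas as pd

def qaqc_flags(df, col, valid_range, n_sd=4.0):
    """Phase 2: flag duplicate timestamps, out-of-range values, and
    extreme values; Phase 3: build an hourly series from unflagged data.
    Thresholds here are illustrative, not the station's actual limits."""
    out = df.copy()
    # Flag repeated timestamps (keep the first occurrence)
    out["dup_flag"] = out.index.duplicated(keep="first")
    # Flag values outside a physically plausible range
    lo, hi = valid_range
    out["range_flag"] = ~out[col].between(lo, hi)
    # Flag extreme values more than n_sd standard deviations from the mean
    z = (out[col] - out[col].mean()) / out[col].std()
    out["extreme_flag"] = z.abs() > n_sd
    # Phase 3: hourly mean of unflagged data, interpolating short gaps only
    bad = out["dup_flag"] | out["range_flag"] | out["extreme_flag"]
    hourly = out.loc[~bad, col].resample("1h").mean().interpolate(limit=2)
    return out, hourly
```

The returned frame keeps every raw value with its flags (so nothing is silently discarded), while the hourly series is the cleaned product analogous to Billy_Barr_1hr.csv.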
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This file is used by the SampleQC Tableau workbook to provide insights into which samples passed QC. It is a subset of the file generated by the RNASeq pipeline, with all gene-level records dropped.
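A hypothetical sketch of how such a subset can be produced with pandas; the "gene_" column prefix is an assumption for illustration, not the pipeline's actual naming:

```python
import pandas as pd

def sample_qc_subset(pipeline_table: pd.DataFrame) -> pd.DataFrame:
    """Drop per-gene columns, keeping only sample-level QC fields.
    The 'gene_' prefix is a hypothetical naming convention."""
    gene_cols = [c for c in pipeline_table.columns if c.startswith("gene_")]
    return pipeline_table.drop(columns=gene_cols)
```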
According to our latest research, the global PEMS Data Quality Assurance Toolchain market size reached USD 1.38 billion in 2024, driven by stringent emission regulations and the growing adoption of Portable Emissions Measurement Systems (PEMS) across critical industries. The market is expected to expand at a robust CAGR of 10.6% from 2025 to 2033, with the forecasted market size projected to hit USD 3.34 billion by 2033. This growth is primarily attributed to advances in data analytics, increasing environmental compliance requirements, and the need for real-time, accurate emissions data in automotive and industrial sectors.
One of the primary growth factors fueling the PEMS Data Quality Assurance Toolchain market is the intensification of global emission standards, particularly in the automotive and industrial sectors. Governments and regulatory agencies are tightening their oversight on pollutants, requiring manufacturers to adopt advanced data quality assurance toolchains to ensure compliance. The integration of PEMS with sophisticated software and hardware for real-time emissions monitoring has become a necessity, enabling precise data collection and analysis. This trend is further amplified by the transition from laboratory-based emissions testing to real-world driving emissions (RDE) testing, which demands robust data quality assurance solutions to validate and verify measurement accuracy under variable conditions.
Another significant driver is the technological evolution in data analytics and cloud computing. The deployment of cloud-based PEMS data quality assurance toolchains allows stakeholders to centralize, store, and analyze vast volumes of emissions data from dispersed locations. This not only enhances operational efficiency but also supports predictive maintenance and anomaly detection, reducing downtime and ensuring continuous compliance. The rise of artificial intelligence and machine learning in data validation processes further streamlines quality assurance, automating error detection and correction, and empowering organizations to make data-driven decisions swiftly. As industries increasingly digitize their operations, the demand for scalable and flexible toolchain solutions is expected to surge.
Additionally, the growing focus on environmental sustainability and corporate social responsibility is compelling organizations to invest in comprehensive emissions monitoring and reporting solutions. PEMS data quality assurance toolchains play a critical role in enabling transparent and accurate emissions reporting, which is essential for securing regulatory approvals, maintaining brand reputation, and attracting environmentally conscious investors. The expansion of emissions monitoring to non-traditional sectors, such as shipping and aviation, is also opening new avenues for market growth. However, the complexity of integrating PEMS with legacy systems and the high initial investment required for advanced toolchains remain challenges that market players are striving to address through modular and cost-effective solutions.
Regionally, Europe dominates the PEMS Data Quality Assurance Toolchain market, accounting for the largest share in 2024, driven by aggressive climate policies and the rapid adoption of RDE testing protocols. North America follows closely, benefiting from technological innovations and stringent emission norms. The Asia Pacific region is anticipated to witness the highest CAGR during the forecast period, fueled by rapid industrialization, urbanization, and increasing regulatory scrutiny in emerging economies such as China and India. Latin America and the Middle East & Africa are gradually catching up, supported by growing environmental awareness and the introduction of emission control initiatives. This regional diversification underscores the global importance of data quality assurance in emissions monitoring and compliance.
The PEMS Data Quality Assuran
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource contains Jupyter Notebooks with examples for conducting quality control post-processing for in situ aquatic sensor data. The code uses the Python pyhydroqc package. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of three example notebooks and associated data files.
Notebooks:
1. Example 1: Import and plot data
2. Example 2: Perform rules-based quality control
3. Example 3: Perform model-based quality control (ARIMA)
Data files: Data files are available for 6 aquatic sites in the Logan River Observatory. Each file contains data for one site for a single year. The files are named according to monitoring site (FranklinBasin, TonyGrove, WaterLab, MainStreet, Mendon, BlackSmithFork) and year. The files were sourced by querying the Logan River Observatory relational database, and equivalent data could be obtained from the LRO website or on HydroShare. Additional information on sites, variables, and methods can be found on the LRO website (http://lrodata.usu.edu/tsa/) or HydroShare (https://www.hydroshare.org/search/?q=logan%20river%20observatory). Each file has the same structure: a datetime index column (mountain standard time) and three columns per variable. Variable abbreviations and units are:
- temp: water temperature, degrees C
- cond: specific conductance, μS/cm
- ph: pH, standard units
- do: dissolved oxygen, mg/L
- turb: turbidity, NTU
- stage: stage height, cm
For each variable, there are 3 columns:
- Raw data value measured by the sensor (column header is the variable abbreviation).
- Technician quality controlled (corrected) value (column header is the variable abbreviation appended with '_cor').
- Technician labels/qualifiers (column header is the variable abbreviation appended with '_qual').
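pyhydroqc implements the rules-based checks used in Example 2; as a minimal pandas equivalent, a range check over the raw columns described above might look like the following. The thresholds here are illustrative, not Logan River Observatory values.

```python
import pandas as pd

# Illustrative sensor ranges, keyed by the variable abbreviations above
# (NOT the observatory's official thresholds)
RANGES = {"temp": (-5, 40), "cond": (0, 2000), "ph": (0, 14),
          "do": (0, 20), "turb": (0, 4000), "stage": (0, 500)}

def rules_based_qc(df: pd.DataFrame) -> pd.DataFrame:
    """Add a '<var>_rule_flag' column marking out-of-range raw values."""
    out = df.copy()
    for var, (lo, hi) in RANGES.items():
        if var in out.columns:
            out[f"{var}_rule_flag"] = ~out[var].between(lo, hi)
    return out
```

Flags produced this way can then be compared against the technician '_qual' labels to evaluate how well automated rules reproduce manual QC.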
According to our latest research, the global Synchrophasor Data Quality Assurance market size reached USD 765 million in 2024, reflecting strong momentum in the power grid modernization sector. The market is projected to expand at a robust CAGR of 12.1% from 2025 to 2033, reaching an estimated USD 2.14 billion by 2033. This growth is primarily driven by the increasing need for real-time grid monitoring, the proliferation of renewable energy sources, and the stringent regulatory mandates for grid reliability and security. As utilities and grid operators worldwide prioritize grid resilience and operational efficiency, the adoption of advanced synchrophasor data quality assurance solutions is accelerating.
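The headline figures follow the standard compound-growth formula FV = PV · (1 + r)^n; a quick check over the nine years from 2024 to 2033:

```python
def project(pv: float, cagr: float, years: int) -> float:
    """Future value under compound annual growth."""
    return pv * (1.0 + cagr) ** years

# USD 0.765 billion in 2024 growing at a 12.1% CAGR for 9 years
projected = project(0.765, 0.121, 9)  # ~2.14 (USD billion), matching the forecast
```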
One of the primary growth factors for the Synchrophasor Data Quality Assurance market is the global shift towards smart grid infrastructure and the integration of distributed energy resources. As power grids become more complex and interconnected, the volume and velocity of synchrophasor data generated by Phasor Measurement Units (PMUs) are increasing exponentially. This surge in data necessitates robust data quality assurance mechanisms to ensure accurate, reliable, and timely information for critical grid operations. Furthermore, the adoption of renewable energy sources such as wind and solar has introduced greater variability and uncertainty into grid operations, making high-quality synchrophasor data essential for real-time monitoring, state estimation, and fault detection.
Another significant driver is the growing regulatory emphasis on grid reliability and cybersecurity. Regulatory agencies across North America, Europe, and Asia Pacific are mandating utilities to implement advanced monitoring and reporting systems to enhance grid resilience against physical and cyber threats. Synchrophasor data quality assurance solutions play a pivotal role in meeting these regulatory requirements by providing comprehensive data validation, cleansing, and anomaly detection capabilities. Additionally, the increasing frequency of extreme weather events and grid disturbances has heightened the need for continuous, high-fidelity data streams to support rapid situational awareness and decision-making.
Technological advancements in data analytics, artificial intelligence, and machine learning are further propelling market growth. Modern synchrophasor data quality assurance platforms leverage these technologies to automate data validation processes, detect subtle anomalies, and provide actionable insights for grid operators. The convergence of big data analytics with synchrophasor technology is enabling utilities to move beyond traditional monitoring towards predictive maintenance and proactive grid management. This technological evolution is not only enhancing operational efficiency but also reducing downtime and maintenance costs, thereby driving the adoption of data quality assurance solutions across the energy sector.
From a regional perspective, North America currently leads the Synchrophasor Data Quality Assurance market, accounting for the largest share in 2024, followed by Europe and Asia Pacific. The United States, in particular, has been at the forefront of synchrophasor technology deployment, supported by significant investments from the Department of Energy and other government agencies. Europe is witnessing rapid growth, driven by the increasing integration of renewables and cross-border interconnections, while Asia Pacific is emerging as a high-growth region due to ongoing grid modernization initiatives in countries such as China, India, and Japan. Latin America and the Middle East & Africa are also gradually adopting synchrophasor data quality assurance solutions, albeit at a slower pace, as they embark on their respective grid modernization journeys.
The Synchrophasor Data Quality Assurance market is segmented by component into Software, Hardware, and Services. The software segment dominate
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
RNA sequencing (RNA-seq) is widely used for RNA quantification in the environmental, biological and medical sciences. It enables the description of genome-wide patterns of expression and the identification of regulatory interactions and networks. The aim of RNA-seq data analyses is to achieve rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite variation in levels of noise and inherent biases in sequencing data. This can be especially challenging for datasets in which gene expression differences are subtle, as in the behavioural transcriptomics test dataset from D. melanogaster that we used here. We investigated the power of existing approaches for quality checking mRNA-seq data and explored additional, quantitative quality checks. To accommodate nested, multi-level experimental designs, we incorporated sample layout into our analyses. We employed a subsampling without replacement-based normalization and an identification of DE that accounted for the hierarchy and amplitude of effect sizes within samples, then evaluated the resulting differential expression calls in comparison to existing approaches. In a final step to test for broader applicability, we applied our approaches to a published set of H. sapiens mRNA-seq samples. The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. The proposed approaches have the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments.
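The subsampling-without-replacement normalization described above can be illustrated with NumPy's multivariate hypergeometric sampler, which draws reads without replacement down to a common library size. This is a generic sketch of the idea, not the authors' code.

```python
import numpy as np

def subsample_counts(counts: np.ndarray, seed: int = 0) -> np.ndarray:
    """Subsample each sample's gene counts without replacement down to
    the smallest library size. Rows are genes, columns are samples."""
    rng = np.random.default_rng(seed)
    depth = counts.sum(axis=0).min()   # target common sequencing depth
    out = np.empty_like(counts)
    for j in range(counts.shape[1]):
        # Hypergeometric draw = sampling reads without replacement
        out[:, j] = rng.multivariate_hypergeometric(counts[:, j], depth)
    return out
```

Unlike scaling-based normalization, this preserves the integer, count-like nature of the data, at the cost of discarding reads from deeper libraries.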
This dataset contains quality assurance sample concentrations for 19 constituents for the 15 monitored watersheds in Gwinnett County, Georgia, from 2000 to 2020. Samples include 431 replicate pairs and 335 field blanks. Replicate sampling allows for comparison of water quality to assess the combined variability in sample representativeness and laboratory precision and can also be used to identify possible sample contamination. Field blanks, samples of deionized “blank” water collected using the same sampling equipment and similar methodology as routine samples, were used to identify any possible contamination from field collection equipment and methodology.
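Replicate-pair variability of this kind is commonly summarized as a relative percent difference (RPD); a small sketch of that calculation follows. The metric choice is a common convention, not necessarily the statistic used in this dataset's report.

```python
def relative_percent_difference(x1: float, x2: float) -> float:
    """RPD between a routine sample and its replicate, in percent:
    |x1 - x2| divided by the pair mean, times 100."""
    mean = (x1 + x2) / 2.0
    if mean == 0:
        return 0.0   # both below detection; treat as no difference
    return abs(x1 - x2) / mean * 100.0
```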
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Ontologies play an important role in the representation, standardization, and integration of biomedical data, but are known to have data quality (DQ) issues. We aimed to understand whether the Harmonized Data Quality Framework (HDQF), developed to standardize electronic health record DQ assessment strategies, could be used to improve ontology quality assessment. A novel set of 14 ontology checks was developed. These DQ checks were aligned to the HDQF and examined by HDQF developers. The ontology checks were evaluated using 11 Open Biomedical Ontology Foundry ontologies. 85.7% of the ontology checks were successfully aligned to at least 1 HDQF category. Accommodating the unmapped DQ checks (n = 2) required modifying an original HDQF category and adding a new Data Dependency category. The HDQF is a valuable resource within the clinical domain, and this work demonstrates its ability to categorize ontology quality assessment strategies.
This repository contains the following:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Purpose: The aim of this study was to compare the functional characteristics of two computer-based systems for quality control of cancer registry data through analysis of their output differences.
Methods: The study used cancer incidence data from 22 of the 49 registries of the Italian Network of Cancer Registries registered between 1986 and 2017. Two different data checking systems, developed by the WHO International Agency for Research on Cancer (IARC) and by the Joint Research Centre (JRC) with the European Network of Cancer Registries (ENCR), and routinely used by registrars, were used to check the quality of the data. The outputs generated by the two systems on the same dataset of each registry were analyzed and compared.
Results: The study included a total of 1,305,689 cancer cases. The overall quality of the dataset was high, with 86% (81.7–94.1) microscopically verified cases and only 1.3% (0.03–3.06) cases with a diagnosis by death certificate only. The two check systems identified a low percentage of errors (JRC-ENCR 0.17% and IARC 0.003%) and about the same proportion of warnings (JRC-ENCR 2.79% and IARC 2.42%) in the dataset. Forty-two cases (2% of errors) and 7,067 cases (11.5% of warnings) were identified by both systems in equivalent categories. 11.7% of warnings related to TNM staging were identified by the JRC-ENCR system only. The IARC system mainly identified incorrect combinations of tumor grade and morphology (72.5% of warnings).
Conclusion: Both systems apply checks to a common set of variables, but some variables are checked by only one of the systems (for example, checks on patient follow-up and tumor stage at diagnosis are included in the JRC-ENCR system only). Most errors and warnings were categorized differently by the two systems, but usually described the same issues, with warnings related to “morphology” (JRC-ENCR) and “histology” (IARC) being the most frequent.
It is important to find the right balance between the need to maintain high standards of data quality and the workability of such systems in the daily routine of the cancer registry.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
In the context of omics disciplines and especially proteomics and biomarker discovery, the analysis of a clinical sample using label-based tandem mass spectrometry (MS) can be affected by sample preparation effects or by the measurement process itself, resulting in an incorrect outcome. Detection and correction of these mistakes using state-of-the-art methods based on mixed models can consume large amounts of computing time. MS-based proteomics laboratories are high-throughput and need to avoid a bottleneck in their quantitative pipeline by quickly discriminating between high- and low-quality data. To this end, we developed an easy-to-use web-tool called QCQuan (available at qcquan.net) which is built around the CONSTANd normalization algorithm. It automatically provides the user with exploratory and quality control information as well as a differential expression analysis based on conservative, simple statistics. In this document we describe in detail the scientifically relevant steps that constitute the workflow and assess its qualitative and quantitative performance on three reference data sets. We find that QCQuan provides clear and accurate indications about the scientific value of both a high- and a low-quality data set. Moreover, it performed quantitatively better on a third data set than a comparable workflow assembled using established, reliable software.
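CONSTANd normalizes the quantification matrix by alternately rescaling rows and columns until their means converge to a common target. The following is a minimal sketch of that raking idea, not the reference implementation:

```python
import numpy as np

def constand_like(x: np.ndarray, n_iter: int = 50) -> np.ndarray:
    """Alternately rescale rows and columns of a positive quantification
    matrix so row and column means converge to 1/n (n = number of columns).
    A sketch of the raking idea behind CONSTANd, not the published code."""
    x = x.astype(float).copy()
    n = x.shape[1]
    target = 1.0 / n
    for _ in range(n_iter):
        x *= target / x.mean(axis=1, keepdims=True)   # fix row means
        x *= target / x.mean(axis=0, keepdims=True)   # fix column means
    return x
```

Because each pass only rescales, relative abundances within a row are preserved while systematic per-sample and per-feature biases are removed.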
Research Ship Roger Revelle Underway Meteorological Data (delayed ~10 days for quality control) are from the Shipboard Automated Meteorological and Oceanographic System (SAMOS) program. IMPORTANT: ALWAYS USE THE QUALITY FLAG DATA! Each data variable's metadata includes a qcindex attribute which indicates a character number in the flag data. ALWAYS check the flag data for each row of data to see which data is good (flag='Z') and which data isn't. For example, to extract just data where time (qcindex=1), latitude (qcindex=2), longitude (qcindex=3), and airTemperature (qcindex=12) are 'good' data, include this constraint in your ERDDAP query: flag=~"ZZZ........Z.*". '=~' indicates this is a regular expression constraint. The 'Z's are literal characters. In this dataset, 'Z' indicates 'good' data. The '.'s say to match any character. The '*' says to match the previous character 0 or more times. (Don't include backslashes in your query.) See the tutorial for regular expressions at https://www.vogella.com/tutorials/JavaRegularExpressions/article.html
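Because the flag string has one character per qcindex position, constraints like the one above can be assembled programmatically. A small illustrative helper (the flag-string width is dataset-dependent; 12 here only covers positions through airTemperature, with any further characters matched by the trailing '.*'):

```python
import re

def good_flag_pattern(qc_indexes, width=12):
    """Build an ERDDAP-style flag pattern requiring 'Z' (good) at the
    given 1-based qcindex positions and allowing any character elsewhere."""
    chars = ["."] * width
    for i in qc_indexes:
        chars[i - 1] = "Z"
    return "".join(chars) + ".*"

# time=1, latitude=2, longitude=3, airTemperature=12
pattern = good_flag_pattern([1, 2, 3, 12])
```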
The global automotive quality control service market size was valued at USD 2.5 billion in 2023 and is projected to reach USD 5.8 billion by 2032, growing at a Compound Annual Growth Rate (CAGR) of 8.9% over the forecast period. This robust growth trajectory is attributed to the increasing demand for higher-quality vehicles, stringent regulatory frameworks, and technological advancements in quality control systems.
One of the primary growth factors driving the automotive quality control service market is the rising consumer demand for quality and reliability in vehicles. As the automotive industry becomes increasingly competitive, manufacturers are under pressure to maintain high standards of quality to differentiate their products. This demand has led to increased adoption of quality control services, including inspection, testing, and certification, to ensure that vehicles meet stringent quality and safety standards.
Another significant growth factor is the implementation of stringent government regulations and industry standards. Regulatory bodies across the world are imposing rigorous guidelines to ensure the safety, performance, and environmental compatibility of vehicles. These regulations necessitate regular quality checks and certifications, driving the demand for quality control services. For example, the European Union's stringent emissions standards and the U.S. National Highway Traffic Safety Administration's (NHTSA) safety regulations are notable contributors to this trend.
Technological advancements are also playing a crucial role in the growth of this market. Innovations such as automated inspection systems, advanced testing equipment, and data analytics have revolutionized quality control processes. These technologies not only enhance the accuracy and efficiency of quality control but also reduce operational costs. The integration of IoT and AI in quality control systems is expected to further boost market growth by enabling real-time monitoring and predictive maintenance.
From a regional perspective, Asia Pacific is anticipated to dominate the automotive quality control service market during the forecast period, primarily due to the rapid expansion of the automotive industry in countries like China, India, and Japan. North America and Europe are also expected to witness significant growth, driven by the presence of established automotive manufacturers and stringent regulatory environments. Latin America and the Middle East & Africa regions are expected to show moderate growth due to slower adoption rates and less stringent regulatory frameworks.
The automotive quality control service market is segmented into inspection, testing, certification, and others. The inspection segment holds the largest market share, driven by the need for thorough and detailed examination of vehicles and their components. Inspections are critical in identifying defects, ensuring compliance with safety standards, and maintaining quality assurance throughout the production process. Technological advancements, such as automated inspection systems, have enhanced the efficiency and accuracy of inspection services, further propelling their demand.
The testing segment is also experiencing substantial growth, attributed to the increasing complexity of automotive systems and the need for rigorous testing to ensure their reliability and performance. Testing services include a wide range of activities, such as performance testing, durability testing, and crash testing. The adoption of advanced testing equipment and methodologies has enhanced the effectiveness of these services, making them indispensable for automotive manufacturers. The growing focus on electric vehicles (EVs) and the need for specialized testing for battery systems and electric drivetrains are further boosting this segment.
Certification services are another critical component of the automotive quality control service market. Certifications are essential for validating that vehicles and their components meet established standards and regulations. The increasing emphasis on safety, environmental sustainability, and performance standards has led to a higher demand for certification services. Regulatory bodies and industry standards organizations play a significant role in driving this demand, as manufacturers must obtain certifications to launch their products in various markets.
The 'others' segment includes a range of additional quality
According to our latest research, the global Production Data Quality Management market size reached USD 1.74 billion in 2024, reflecting the surging demand for robust data management solutions across industrial and commercial sectors. The market is projected to expand at a CAGR of 12.7% from 2025 to 2033, with the market size forecasted to reach approximately USD 5.13 billion by 2033. The primary growth driver remains the increasing need for accurate, real-time, and actionable production data to optimize operations, ensure compliance, and drive digital transformation initiatives across diverse industries.
The ongoing digitalization of manufacturing and process industries is a significant growth factor for the Production Data Quality Management market. As organizations embrace Industry 4.0, smart manufacturing, and IoT-driven automation, the volume, velocity, and variety of production data have increased exponentially. This surge in data complexity amplifies the importance of data quality management solutions that can ensure data accuracy, consistency, and completeness. Companies are investing heavily in advanced analytics, artificial intelligence, and machine learning to automate data validation, cleansing, and governance processes, which in turn enhances operational efficiency, reduces downtime, and supports data-driven decision-making. The rising adoption of cloud-based solutions further facilitates scalable, centralized, and collaborative data quality management, enabling enterprises to extract greater value from their production data assets.
Another critical growth driver is the stringent regulatory landscape and the growing emphasis on compliance and risk management, particularly in sectors such as pharmaceuticals, food & beverages, and chemicals. Regulatory bodies across the globe are mandating traceability, quality control, and data integrity, compelling organizations to implement comprehensive production data quality management frameworks. These frameworks not only ensure compliance but also help companies avoid costly recalls, reputational damage, and legal penalties. The integration of data quality management with enterprise resource planning (ERP), manufacturing execution systems (MES), and quality management systems (QMS) is becoming increasingly common, providing a holistic approach to end-to-end production data lifecycle management.
Furthermore, the rising focus on operational excellence, cost optimization, and competitive differentiation is driving organizations to invest in production data quality management solutions. Accurate and reliable production data enables predictive maintenance, process optimization, and supply chain efficiency, directly impacting profitability and customer satisfaction. The proliferation of Industrial IoT devices and edge computing is generating new data sources and opportunities for real-time data quality monitoring and management. As organizations strive to leverage advanced analytics and artificial intelligence for predictive insights, the foundational requirement for high-quality data becomes even more pronounced, reinforcing the market's growth trajectory.
Regionally, North America continues to dominate the Production Data Quality Management market due to the early adoption of digital technologies, a strong presence of major industry players, and robust investments in smart manufacturing initiatives. However, Asia Pacific is emerging as the fastest-growing region, driven by rapid industrialization, expanding manufacturing bases, and supportive government policies promoting digital transformation. Europe also holds a significant market share, propelled by stringent regulatory standards and a mature industrial landscape. The Middle East & Africa and Latin America are witnessing steady growth, supported by increasing investments in infrastructure development and industrial automation. These regional dynamics underscore the global relevance and expanding footprint of production data quality management solutions.
NOAA Ship Henry B. Bigelow Underway Meteorological Data (delayed ~10 days for quality control) are from the Shipboard Automated Meteorological and Oceanographic System (SAMOS) program. IMPORTANT: ALWAYS USE THE QUALITY FLAG DATA! Each data variable's metadata includes a qcindex attribute which indicates a character number in the flag data. ALWAYS check the flag data for each row of data to see which data is good (flag='Z') and which data isn't. For example, to extract just data where time (qcindex=1), latitude (qcindex=2), longitude (qcindex=3), and airTemperature (qcindex=12) are 'good' data, include this constraint in your ERDDAP query: flag=~"ZZZ........Z.*". "=~" indicates this is a regular expression constraint. The 'Z's are literal characters. In this dataset, 'Z' indicates 'good' data. The '.'s say to match any character. The '*' says to match the previous character 0 or more times. See the tutorial for regular expressions at https://www.vogella.com/tutorials/JavaRegularExpressions/article.html
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Examples from the analysis of qualitative responses to the question “Are data quality processes sufficiently rigorous to provide a ‘fit-for-purpose’ dataset?”.
NOAA Ship Nancy Foster Underway Meteorological Data (delayed ~10 days for quality control) are from the Shipboard Automated Meteorological and Oceanographic System (SAMOS) program. IMPORTANT: ALWAYS USE THE QUALITY FLAG DATA! Each data variable's metadata includes a qcindex attribute which indicates a character number in the flag data. ALWAYS check the flag data for each row of data to see which data is good (flag='Z') and which data isn't. For example, to extract just data where time (qcindex=1), latitude (qcindex=2), longitude (qcindex=3), and airTemperature (qcindex=12) are 'good' data, include this constraint in your ERDDAP query: flag=~"ZZZ........Z.*". "=~" indicates this is a regular expression constraint. The 'Z's are literal characters. In this dataset, 'Z' indicates 'good' data. The '.'s say to match any character. The '*' says to match the previous character 0 or more times. See the tutorial for regular expressions at https://www.vogella.com/tutorials/JavaRegularExpressions/article.html
Link to the ScienceBase Item Summary page for the item described by this metadata record. Application Profile: Web Browser. Link Function: information.
TwitterStudy-specific data quality testing is an essential part of minimizing analytic errors, particularly for studies making secondary use of clinical data. We applied a systematic and reproducible approach for study-specific data quality testing to the analysis plan for PRESERVE, a 15-site, EHR-based observational study of chronic kidney disease in children. This approach integrated widely adopted data quality concepts with healthcare-specific evaluation methods. We implemented two rounds of data quality assessment. The first produced high-level evaluation using aggregate results from a distributed query, focused on cohort identification and main analytic requirements. The second focused on extended testing of row-level data centralized for analysis. We systematized reporting and cataloguing of data quality issues, providing institutional teams with prioritized issues for resolution. We tracked improvements and documented anomalous data for consideration during analyses. The checks we developed identified 115 and 157 data quality issues in the two rounds, involving completeness, data model conformance, cross-variable concordance, consistency, and plausibility, extending traditional data quality approaches to address more complex stratification and temporal patterns. Resolution efforts focused on higher priority issues, given finite study resources. In many cases, institutional teams were able to correct data extraction errors or obtain additional data, avoiding exclusion of 2 institutions entirely and resolving 123 other gaps. Other results identified complexities in measures of kidney function, bearing on the study’s outcome definition. Where limitations such as these are intrinsic to clinical data, the study team must account for them in conducting analyses. This study rigorously evaluated fitness of data for intended use. The framework is reusable and built on a strong theoretical underpinning. 
Significant data quality issues that would have otherwise delayed analyses or made data unusable were addressed. This study highlights the need for teams combining subject-matter and informatics expertise to address data quality when working with real world data.
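The categories named above (completeness, plausibility, cross-variable concordance) translate directly into row-level checks. The following is a small pandas sketch with hypothetical column names and thresholds, not PRESERVE's actual rules:

```python
import pandas as pd

def dq_report(df: pd.DataFrame) -> dict:
    """Row-level checks in three of the categories described above.
    Column names ('patient_id', 'egfr', dates) and the eGFR range
    are illustrative assumptions."""
    issues = {}
    # Completeness: missing values per required column
    issues["completeness"] = df[["patient_id", "egfr"]].isna().sum().to_dict()
    # Plausibility: eGFR outside a broad physiologic range (or missing)
    issues["plausibility"] = int((~df["egfr"].between(0, 200)).sum())
    # Concordance: a visit date should not precede the enrollment date
    issues["concordance"] = int((df["visit_date"] < df["enroll_date"]).sum())
    return issues
```

In a multi-site study, a report like this would be generated per institution, with counts prioritized for resolution as described above.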
NOAA Ship Oregon II Underway Meteorological Data (delayed ~10 days for quality control) are from the Shipboard Automated Meteorological and Oceanographic System (SAMOS) program. IMPORTANT: ALWAYS USE THE QUALITY FLAG DATA! Each data variable's metadata includes a qcindex attribute which indicates a character number in the flag data. ALWAYS check the flag data for each row of data to see which data is good (flag='Z') and which data isn't. For example, to extract just data where time (qcindex=1), latitude (qcindex=2), longitude (qcindex=3), and airTemperature (qcindex=12) are 'good' data, include this constraint in your ERDDAP query: flag=~"ZZZ........Z.*". "=~" indicates this is a regular expression constraint. The 'Z's are literal characters. In this dataset, 'Z' indicates 'good' data. The '.'s say to match any character. The '*' says to match the previous character 0 or more times. See the tutorial for regular expressions at https://www.vogella.com/tutorials/JavaRegularExpressions/article.html