This report describes the quality assurance arrangements for the registered provider (RP) Tenant Satisfaction Measures statistics, providing more detail on the regulatory and operational context for data collections which feed these statistics and the safeguards that aim to maximise data quality.
The statistics we publish are based on data collected directly from local authority registered providers (LARPs) and from private registered providers (PRPs) through the Tenant Satisfaction Measures (TSM) return. We use the data collected through these returns extensively as a source of administrative data. The United Kingdom Statistics Authority (UKSA) encourages public bodies to use administrative data for statistical purposes and, as such, we publish these data.
These data are first being published in 2024, following the first collection and publication of the TSM.
In February 2018, the UKSA published the Code of Practice for Statistics. This sets standards for organisations producing and publishing statistics, ensuring quality, trustworthiness and value.
These statistics are drawn from our TSM data collection and are being published for the first time in 2024 as official statistics in development.
Official statistics in development are official statistics that are undergoing development. Over the next year we will review these statistics and consider areas for improvement to guidance, validations, data processing and analysis. We will also seek user feedback with a view to improving these statistics to meet user needs and to explore issues of data quality and consistency.
Until September 2023, ‘official statistics in development’ were called ‘experimental statistics’. Further information can be found on the Office for Statistics Regulation website: https://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/guidetoofficialstatisticsindevelopment
We are keen to increase understanding of the data, including their accuracy and reliability, and their value to users. Please complete the feedback form at https://forms.office.com/e/cetNnYkHfL, or email feedback, including suggestions for improvements or queries about the source data or processing, to enquiries@rsh.gov.uk.
We intend to publish these statistics in Autumn each year, with the data pre-announced in the release calendar.
All data and additional information (including a list of individuals (if any) with 24 hour pre-release access) are published on our statistics pages.
The data used in the production of these statistics are classed as administrative data. In 2015 the UKSA published a regulatory standard for the quality assurance of administrative data. As part of our compliance with the Code of Practice, and in the context of other statistics published by the UK Government and its agencies, we have determined that the statistics drawn from the TSMs are likely to be categorised as low quality risk – medium public interest (with a requirement for basic/enhanced assurance).
The publication of these statistics can be considered as being of medium public interest.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
ABSTRACT: The exponential increase in published data and the diversity of systems require the adoption of good practices to achieve quality levels that enable discovery, access, and reuse. To identify good practices, an integrative review was used, together with procedures from the ProKnow-C methodology. After applying the ProKnow-C procedures to the documents retrieved from the Web of Science, Scopus, and Library, Information Science & Technology Abstracts databases, 31 items were analysed. This analysis showed that over the last 20 years the guidelines for publishing open government data had a great impact on the implementation of the Linked Data model in several domains, and that currently the FAIR principles and the Data on the Web Best Practices are the most prominent in the literature. These guidelines provide orientation on various aspects of data publication in order to help optimise quality, independent of the context in which they are applied. The CARE and FACT principles, on the other hand, although not formulated with the same objective as FAIR and the Best Practices, pose great challenges for information and technology scientists regarding ethics, responsibility, confidentiality, impartiality, security, and transparency of data.
This dataset was created by shamiul islam shifat
This data record contains questions and responses to a USGS-wide survey conducted to identify issues and needs associated with quality assurance and quality control (QA/QC) of USGS timeseries data streams. This research was funded by the USGS Community for Data Integration as part of a project titled “From reactive- to condition-based maintenance: Artificial intelligence for anomaly predictions and operational decision-making”. The poll targeted monitoring network managers and technicians and asked questions about operational data streams and timeseries data collection in order to identify opportunities to streamline data access, expedite the response to data quality issues, improve QA/QC procedures, reduce operations costs, and uncover other maintenance needs. The poll was created using an online survey platform. It was sent to 2326 systematically selected USGS email addresses and received 175 responses in 11 days before it was closed to respondents. The poll contained 48 questions of various types including long answer, multiple choice, and ranking questions. The survey contained a mix of mandatory and optional questions. These distinctions, as well as full descriptions of the survey questions, are noted in the metadata.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this paper, the concept of Statistical Process Control (SPC) tools is examined thoroughly and definitions of quality control concepts are presented. This is significant because it is anticipated that the study will contribute to the literature as an exemplary application demonstrating the role of SPC tools in quality improvement during the evaluation and decision-making phase.
The aim of this study is to investigate applications of quality control, to clarify statistical control methods and problem-solving procedures, to generate proposals for problem-solving approaches, and to disseminate improvement studies in the ready-to-wear industry. Using the basic Statistical Process Control tools, the most frequently recurring faults were detected and divided into sub-headings for more detailed analysis. In this way, repetition of faults was prevented by tracing each detected fault back to its root causes. With this perspective, the study is expected to contribute to other fields as well.
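As a minimal, hypothetical illustration of the kind of basic SPC tooling described above (the fault categories and inspection counts below are invented for the example, not taken from the study), a Pareto ranking of fault categories and p-chart control limits could be computed along these lines:

```python
import math
from collections import Counter

# Hypothetical daily inspection results: (faults_found, items_inspected)
daily = [(12, 400), (9, 380), (15, 420), (7, 390), (11, 410)]

# Hypothetical fault categories recorded during inspection
faults = (["loose stitching"] * 23 + ["stain"] * 14 +
          ["misaligned seam"] * 10 + ["missing button"] * 7)

# Pareto analysis: rank fault categories by frequency and cumulative share
counts = Counter(faults)
total = sum(counts.values())
cumulative = 0.0
for category, n in counts.most_common():
    cumulative += n / total
    print(f"{category:20s} {n:3d}  cumulative {cumulative:5.1%}")

# p-chart: centre line and 3-sigma control limits for the daily fault proportion
p_bar = sum(d for d, _ in daily) / sum(n for _, n in daily)
for defective, inspected in daily:
    sigma = math.sqrt(p_bar * (1 - p_bar) / inspected)
    ucl, lcl = p_bar + 3 * sigma, max(0.0, p_bar - 3 * sigma)
    p = defective / inspected
    flag = "in control" if lcl <= p <= ucl else "out of control"
    print(f"p={p:.4f}  LCL={lcl:.4f}  UCL={ucl:.4f}  {flag}")
```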
We give consent for the publication of identifiable details, which can include photograph(s) and case history and details within the text (“Material”), to be published in the Journal of Quality Technology. We confirm that we have seen and been given the opportunity to read both the Material and the Article (as attached) to be published by Taylor & Francis.
Data quality assurance and quality control are critical to the effective conduct of a clinical trial. In the present commentary, we discuss our experience in a large, multicenter stroke trial. In addition to standard data quality control techniques, we have developed novel methods to enhance the entire process. Central to our methods is the use of clinical monitors who are trained in the techniques of data monitoring.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset presents crowdsourced data from the Leuven.cool network, a citizen science network of around 100 low-cost weather stations (Fine Offset WH2600) distributed across Leuven, Belgium. The data were quality controlled and corrected by a newly developed, station-specific temperature quality control (QC) and correction procedure. The procedure consists of three levels that remove implausible measurements, while also correcting for inter-station (between stations) and intra-station (station-specific) temperature biases by means of a random-forest approach.
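A minimal sketch of the kind of random-forest bias correction described above, assuming a table of crowdsourced temperatures, co-located reference temperatures, and a few predictor columns (all file and column names here are hypothetical, not those of the published dataset or procedure):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training table: one row per station-hour.
# t_raw: crowdsourced temperature, t_ref: reference temperature,
# solar, wind, hour: predictors that drive station-specific biases.
df = pd.read_csv("station_hours.csv")  # hypothetical file
df["bias"] = df["t_raw"] - df["t_ref"]

features = ["t_raw", "solar", "wind", "hour"]
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(df[features], df["bias"])

# Apply the learned correction to new crowdsourced observations
new = pd.read_csv("new_observations.csv")  # hypothetical file
new["t_corrected"] = new["t_raw"] - model.predict(new[features])
```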
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains simulated data related to manufacturing defects observed during quality control processes. It includes information such as defect type, detection date, location within the product, severity level, inspection method used, and repair costs. This dataset can be used for analyzing defect patterns, improving quality control processes, and assessing the impact of defects on product quality and production costs.
Columns:
- defect_id: Unique identifier for each defect.
- product_id: Identifier for the product associated with the defect.
- defect_type: Type or category of the defect (e.g., cosmetic, functional, structural).
- defect_description: Description of the defect.
- defect_date: Date when the defect was detected.
- defect_location: Location within the product where the defect was found (e.g., surface, component).
- severity: Severity level of the defect (e.g., minor, moderate, critical).
- inspection_method: Method used to detect the defect (e.g., visual inspection, automated testing).
- repair_action: Action taken to repair or address the defect.
- repair_cost: Cost incurred to repair the defect (in local currency).
Potential uses (see the sketch after this list):
- Quality control analysis: analyze defect patterns and trends in manufacturing processes.
- Process improvement: identify areas for process optimization to reduce defect rates.
- Cost analysis: evaluate the financial impact of defects on production costs and profitability.
- Product quality assurance: enhance product quality assurance strategies based on defect data analysis.
This dataset is entirely synthetic and generated for educational and research purposes. It can be a valuable resource for manufacturing engineers, quality assurance professionals, and researchers interested in defect analysis and quality control.
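A minimal sketch of the kind of defect-pattern and cost analysis the column list above supports, assuming the records are exported to a CSV file (the file name is hypothetical; the column names are those documented above):

```python
import pandas as pd

# Load the synthetic defect records (file name is hypothetical)
defects = pd.read_csv("manufacturing_defects.csv", parse_dates=["defect_date"])

# Defect pattern analysis: counts by defect type and severity
pattern = defects.pivot_table(index="defect_type", columns="severity",
                              values="defect_id", aggfunc="count", fill_value=0)
print(pattern)

# Cost analysis: total and mean repair cost per inspection method
cost = defects.groupby("inspection_method")["repair_cost"].agg(["sum", "mean"])
print(cost.sort_values("sum", ascending=False))

# Monthly defect counts, to look for trends over time
monthly = defects.groupby(defects["defect_date"].dt.to_period("M"))["defect_id"].count()
print(monthly)
```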
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset documents findings from a survey on the status quo of data quality assurance practices at research data repositories.
The personalized online survey was conducted among repositories indexed in re3data in 2021. It covered the scope of the repository, types of data quality assessment, quality criteria, responsibilities, details of the review process, and data quality information, and yielded 332 complete responses.
The dataset comprises a documentation file, the data file, a codebook, and the survey instrument.
The documentation file (documentation.pdf) outlines details of the survey design and administration, survey response, and data processing. The data file (01_survey_data.csv) contains all 332 complete responses to 19 survey questions, fully anonymized. The codebook (02_codebook.csv) describes the variables, and the survey instrument (03_survey_instrument.pdf) comprises the questionnaire that was distributed to survey participants.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper presents the importance of simple spatial statistics techniques applied to the positional quality control of spatial data. To this end, methods for analysing the spatial distribution pattern of point data are presented, as well as bias analysis of the positional discrepancy samples. To evaluate the spatial distribution of the points, the Nearest Neighbor method and Ripley's K function were used. For the bias analysis, the average directional vectors of the discrepancies and the circular variance were used. A methodology for positional quality control of spatial data is proposed, which includes sampling planning and evaluation of the sample's spatial distribution pattern, analysis of data normality together with the application of bias tests, and positional accuracy classification according to a standard. For the practical experiment, an orthoimage generated from a PRISM scene of the ALOS satellite was evaluated. Results showed that the orthoimage is accurate at a scale of 1:25,000, being classified as Class A according to the Brazilian positional accuracy standard and showing no bias in the coordinates. The main contribution of this work is the incorporation of spatial statistics techniques into cartographic quality control.
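A minimal sketch of the two kinds of statistic mentioned above, assuming a small set of checkpoint coordinates and their positional discrepancies (all values below are invented for illustration): the nearest neighbor index for the sample's spatial distribution, and the mean direction and circular variance of the discrepancy vectors.

```python
import math

# Hypothetical checkpoint coordinates (metres) and positional discrepancies (dx, dy)
points = [(100, 200), (340, 180), (520, 400), (260, 610), (700, 550)]
discrepancies = [(1.2, 0.4), (0.8, -0.3), (1.5, 0.9), (0.6, 0.2), (1.1, 0.5)]

# Nearest neighbor index: observed mean NN distance / expected mean NN distance
n = len(points)
nn_dists = [min(math.dist(p, q) for q in points if q != p) for p in points]
observed = sum(nn_dists) / n
xs, ys = zip(*points)
area = (max(xs) - min(xs)) * (max(ys) - min(ys))  # simple bounding-box study area
expected = 0.5 / math.sqrt(n / area)
print("Nearest neighbor index R =", observed / expected)  # ~1 random, <1 clustered, >1 dispersed

# Circular statistics of the discrepancy directions
angles = [math.atan2(dy, dx) for dx, dy in discrepancies]
C = sum(math.cos(a) for a in angles)
S = sum(math.sin(a) for a in angles)
mean_direction = math.degrees(math.atan2(S, C))
resultant_length = math.sqrt(C**2 + S**2) / len(angles)
circular_variance = 1 - resultant_length  # near 0 => directions concentrated => possible bias
print("Mean direction (deg):", mean_direction)
print("Circular variance:", circular_variance)
```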
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource contains Jupyter Notebooks with examples for conducting quality control post-processing of in situ aquatic sensor data. The code uses the Python pyhydroqc package. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 3 example notebooks and associated data files.
Notebooks:
1. Example 1: Import and plot data
2. Example 2: Perform rules-based quality control
3. Example 3: Perform model-based quality control (ARIMA)
Data files: Data files are available for 6 aquatic sites in the Logan River Observatory. Each file contains data for one site for a single year. The files are named according to monitoring site (FranklinBasin, TonyGrove, WaterLab, MainStreet, Mendon, BlackSmithFork) and year. The files were sourced by querying the Logan River Observatory relational database, and equivalent data could be obtained from the LRO website or on HydroShare. Additional information on sites, variables, and methods can be found on the LRO website (http://lrodata.usu.edu/tsa/) or HydroShare (https://www.hydroshare.org/search/?q=logan%20river%20observatory). Each file has the same structure: it is indexed by a datetime column (mountain standard time), with three columns for each variable. Variable abbreviations and units are:
- temp: water temperature, degrees C
- cond: specific conductance, μS/cm
- ph: pH, standard units
- do: dissolved oxygen, mg/L
- turb: turbidity, NTU
- stage: stage height, cm
For each variable, there are 3 columns (see the sketch after this list):
- Raw data value measured by the sensor (column header is the variable abbreviation).
- Technician quality controlled (corrected) value (column header is the variable abbreviation appended with '_cor').
- Technician labels/qualifiers (column header is the variable abbreviation appended with '_qual').
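As a rough, pandas-only illustration of the rules-based checks covered in Example 2 (this does not reproduce the pyhydroqc API; the file name and thresholds are hypothetical, while the column naming follows the '_cor' convention described above):

```python
import pandas as pd

# Load one site-year file (file name is hypothetical), indexed by its datetime column
df = pd.read_csv("MainStreet_2019.csv", index_col=0, parse_dates=True)

# Rule 1: flag values outside a plausible physical range for water temperature
TEMP_MIN, TEMP_MAX = -1.0, 40.0  # hypothetical thresholds, degrees C
out_of_range = (df["temp"] < TEMP_MIN) | (df["temp"] > TEMP_MAX)

# Rule 2: flag long runs of identical values (a stuck sensor), here 12+ repeats
run_length = df["temp"].groupby((df["temp"] != df["temp"].shift()).cumsum()).transform("size")
persistent = run_length >= 12

flagged = out_of_range | persistent
print(f"{flagged.sum()} of {len(df)} temperature values flagged by the rules")

# Compare the rule-based flags against the technician-corrected series
changed_by_technician = df["temp"] != df["temp_cor"]
print(f"Points flagged by rules and edited by the technician: {(flagged & changed_by_technician).sum()}")
```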
https://spdx.org/licenses/CC0-1.0.html
Objective: Assess the effectiveness of providing the Logical Observation Identifiers Names and Codes (LOINC®)-to-In Vitro Diagnostic (LIVD) coding specification, required by the United States Department of Health and Human Services for SARS-CoV-2 reporting, in medical center laboratories, and utilize findings to inform future United States Food and Drug Administration policy on the use of real-world evidence in regulatory decisions.
Materials and Methods: We compared gaps and similarities between diagnostic test manufacturers' recommended LOINC® codes and the LOINC® codes used in medical center laboratories for the same tests.
Results: Five medical centers and three test manufacturers extracted data from laboratory information systems (LIS) for prioritized tests of interest. The data submissions ranged from 74 to 532 LOINC® codes per site. Three test manufacturers submitted 15 LIVD catalogs representing 26 distinct devices, 6,956 tests, and 686 LOINC® codes. We identified mismatches in how medical centers use LOINC® to encode laboratory tests compared to how test manufacturers encode the same laboratory tests. Of 331 tests available in the LIVD files, 136 (41%) were represented by a mismatched LOINC® code at the medical centers (chi-square 45.0, 4 df, P < .0001).
Discussion: The five medical centers and three test manufacturers vary in how they organize, categorize, and store LIS catalog information. This variation impacts data quality and interoperability.
Conclusion: The results of the study indicate that providing the LIVD mappings was not sufficient to support laboratory data interoperability. National implementation of LIVD and further efforts to promote laboratory interoperability will require a more comprehensive effort and continuing evaluation and quality control.
Data collection from medical center laboratory pilot sites: Each medical center was asked to extract about 100 LOINC® codes from their LIS for prioritized tests of interest focused on high-risk conditions and SARS-CoV-2. For each selected test (e.g., SARS-CoV-2 RNA COVID-19), we collected the following data elements: test names/descriptions (e.g., SARS coronavirus 2 RNA [Presence] in Respiratory specimen by NAA with probe detection), associated instruments (e.g., IVD Vendor Model), and LOINC® codes (e.g., 94500-6). High-risk conditions were defined by referencing the CDC's published list of Underlying Medical Conditions Associated with High Risk for Severe COVID-19.[29] A data collection template spreadsheet was created and disseminated to the medical centers to help provide consistency and reporting clarity for data elements from sites.
Data collection from IVD manufacturers: We coordinated with SHIELD stakeholders and the IICC to request manufacturer LIVD catalogs containing the LOINC® codes per IVD instrument per test.
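A minimal, hypothetical sketch of the core comparison described above: matching the LOINC® codes a laboratory actually used against the manufacturer's LIVD-recommended codes for the same device and test, and counting mismatches (the device names and most codes below are invented placeholders, not values from the study):

```python
# Manufacturer LIVD catalog: (device, test) -> recommended LOINC code (placeholder values)
livd_catalog = {
    ("DeviceA", "SARS-CoV-2 RNA"): "94500-6",
    ("DeviceA", "Influenza A RNA"): "00000-0",
    ("DeviceB", "SARS-CoV-2 RNA"): "94500-6",
}

# Codes extracted from a medical center LIS for the same (device, test) pairs (placeholders)
lis_codes = {
    ("DeviceA", "SARS-CoV-2 RNA"): "94500-6",   # match
    ("DeviceA", "Influenza A RNA"): "11111-1",  # mismatch
    ("DeviceB", "SARS-CoV-2 RNA"): "99999-9",   # mismatch
}

mismatches = [key for key, code in lis_codes.items()
              if key in livd_catalog and code != livd_catalog[key]]
rate = len(mismatches) / len(lis_codes)
print(f"{len(mismatches)} of {len(lis_codes)} tests mismatched ({rate:.0%})")
```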
https://www.technavio.com/content/privacy-notice
Test Data Management Market Size 2025-2029
The test data management market size is forecast to increase by USD 727.3 million, at a CAGR of 10.5% between 2024 and 2029.
The market is experiencing significant growth, driven by the increasing adoption of automation by enterprises to streamline their testing processes. The automation trend is fueled by the growing consumer spending on technological solutions, as businesses seek to improve efficiency and reduce costs. However, the market faces challenges, including the lack of awareness and standardization in test data management practices. This obstacle hinders the effective implementation of test data management solutions, requiring companies to invest in education and training to ensure successful integration. To capitalize on market opportunities and navigate challenges effectively, businesses must stay informed about emerging trends and best practices in test data management. By doing so, they can optimize their testing processes, reduce risks, and enhance overall quality.
What will be the Size of the Test Data Management Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
The market continues to evolve, driven by the ever-increasing volume and complexity of data. Data exploration and analysis are at the forefront of this dynamic landscape, with data ethics and governance frameworks ensuring data transparency and integrity. Data masking, cleansing, and validation are crucial components of data management, enabling data warehousing, orchestration, and pipeline development. Data security and privacy remain paramount, with encryption, access control, and anonymization key strategies. Data governance, lineage, and cataloging facilitate data management software automation and reporting. Hybrid data management solutions, including artificial intelligence and machine learning, are transforming data insights and analytics.
Data regulations and compliance are shaping the market, driving the need for data accountability and stewardship. Data visualization, mining, and reporting provide valuable insights, while data quality management, archiving, and backup ensure data availability and recovery. Data modeling, data integrity, and data transformation are essential for data warehousing and data lake implementations. Data management platforms are seamlessly integrated into these evolving patterns, enabling organizations to effectively manage their data assets and gain valuable insights. Data management services, cloud and on-premise, are essential for organizations to adapt to the continuous changes in the market and effectively leverage their data resources.
How is this Test Data Management Industry segmented?
The test data management industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023, for the following segments.
Application: On-premises, Cloud-based
Component: Solutions, Services
End-user: Information technology, Telecom, BFSI, Healthcare and life sciences, Others
Sector: Large enterprise, SMEs
Geography: North America (US, Canada), Europe (France, Germany, Italy, UK), APAC (Australia, China, India, Japan), Rest of World (ROW)
By Application Insights
The on-premises segment is estimated to witness significant growth during the forecast period. In the realm of data management, on-premises testing represents a popular approach for businesses seeking control over their infrastructure and testing process. This approach involves establishing testing facilities within an office or data center, necessitating a dedicated team with the necessary skills. The benefits of on-premises testing extend beyond control, as it enables organizations to upgrade and configure hardware and software at their discretion, providing opportunities for exploratory testing. Furthermore, data security is a significant concern for many businesses, and on-premises testing reduces the risk of exposing sensitive information to third-party companies.
Data exploration, a crucial aspect of data analysis, can be carried out more effectively with on-premises testing, ensuring data integrity and security. Data masking, cleansing, and validation are essential data preparation techniques that can be executed efficiently in an on-premises environment. Data warehousing, data pipelines, and data orchestration are integral components of data management, and on-premises testing allows for seamless integration and management of these elements. Data governance frameworks, lineage, catalogs, and metadata are essential for maintaining data transparency and compliance. Data security, encryption, and access control are paramount, and on-premises testing offers greater control over these aspects, as it does over data reporting, visualization, and insights.
This data set includes monthly counts and rates (per 1,000 beneficiaries) of COVID-19 testing services provided to Medicaid and CHIP beneficiaries, by state.
These metrics are based on data in the T-MSIS Analytic Files (TAF). Some states have serious data quality issues for one or more months, making the data unusable for calculating COVID-19 testing services measures. To assess data quality, analysts adapted measures featured in the DQ Atlas. Data for a state and month are considered unusable if at least one of the following topics meets the DQ Atlas threshold for unusable: Total Medicaid and CHIP Enrollment, Procedure Codes - OT Professional, Claims Volume - OT. Please refer to the DQ Atlas at http://medicaid.gov/dq-atlas for more information about data quality assessment methods. Cells with a value of “DQ” indicate that data were suppressed due to unusable data.
Some cells have a value of “DS”. This indicates that data were suppressed for confidentiality reasons because the group included fewer than 11 beneficiaries.
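A minimal sketch of the suppression logic described above, applied to a hypothetical table of state-month counts (the column names and the unusable-data flag are placeholders for the DQ Atlas assessment; the under-11 confidentiality rule is the one stated above):

```python
import pandas as pd

# Hypothetical monthly state-level counts of COVID-19 testing services
df = pd.DataFrame({
    "state": ["AL", "AK", "AZ"],
    "month": ["2020-06"] * 3,
    "beneficiaries_tested": [15000, 8, 23000],
    "unusable_dq_topic": [False, False, True],  # placeholder flag from a DQ assessment
})

def suppress(row):
    if row["unusable_dq_topic"]:
        return "DQ"  # suppressed: unusable data quality for this state-month
    if row["beneficiaries_tested"] < 11:
        return "DS"  # suppressed: fewer than 11 beneficiaries (confidentiality)
    return row["beneficiaries_tested"]

df["reported_value"] = df.apply(suppress, axis=1)
print(df[["state", "month", "reported_value"]])
```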
[ NOTE – 2022/05/06: this dataset supersedes the earlier versions https://doi.org/10.15482/USDA.ADC/1482548 and https://doi.org/10.15482/USDA.ADC/1526329 ]
This dataset contains 15-minute mean weather data from the USDA-ARS Conservation and Production Research Laboratory (CPRL), Soil and Water Management Research Unit (SWMRU) research weather station, Bushland, Texas (Lat. 35.186714°, Long. -102.094189°, elevation 1170 m above MSL) for all days in each year. The data are from sensors placed at 2-m height over a level grass surface mowed to not exceed 12 cm height and irrigated and fertilized to maintain reference conditions as promulgated by Allen et al. (2005, 1998). Irrigation was by surface flood from 1989 through 1994, and by subsurface drip irrigation after 1994. Sensors were replicated and intercompared between replicates and with data from nearby weather stations, which were sometimes used for gap filling. Quality control and assurance methods are described by Evett et al. (2018). Data from a duplicate sensor were used to fill gaps in data from the primary sensor using appropriate regression relationships. Gap filling was also accomplished using sensors deployed at one of the four large weighing lysimeters immediately west of the weather station, or using sensors at other nearby stations when reliable regression relationships could be developed. The primary paper describes details of the sensors used and methods of testing, calibration, inter-comparison, and use.
The weather data include air temperature (C), relative humidity (%), wind speed (m/s), solar irradiance (W m-2), barometric pressure (kPa), and precipitation (rain and snow, in mm). Because the large (3 m by 3 m surface area) weighing lysimeters are better rain gages than tipping bucket gages, the 15-minute precipitation data are derived for each lysimeter from changes in lysimeter mass. The land slope is <0.3% and flat. The mean annual precipitation is ~470 mm, the 20-year pan evaporation record indicates ~2,600 mm Class A pan evaporation per year, and winds are typically from the South and Southwest. The climate is semi-arid with ~70% (350 mm) of the annual precipitation occurring from May to September, during which period the pan evaporation averages ~1,520 mm.
These datasets originate from research aimed at determining crop water use (ET), crop coefficients for use in ET-based irrigation scheduling based on a reference ET, crop growth, yield, harvest index, and crop water productivity as affected by irrigation method, timing, amount (full or some degree of deficit), agronomic practices, cultivar, and weather. The data have utility for testing simulation models of crop ET, growth, and yield and have been used by the Agricultural Model Intercomparison and Improvement Project (AgMIP), by OPENET, and by many others for testing and calibrating models of ET that use satellite and/or weather data. See the README for details of each data resource.
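A rough sketch of the kind of regression-based gap filling described above, assuming 15-minute air temperature series from a primary and a duplicate sensor in a pandas DataFrame (the file and column names are hypothetical; the actual relationships and QC rules are those described by Evett et al., 2018):

```python
import numpy as np
import pandas as pd

# Hypothetical 15-minute records: primary and duplicate air temperature sensors
df = pd.read_csv("swmru_weather_15min.csv", index_col=0, parse_dates=True)

# Fit a linear relationship between the sensors where both report a value
both = df.dropna(subset=["t_air_primary", "t_air_duplicate"])
slope, intercept = np.polyfit(both["t_air_duplicate"], both["t_air_primary"], deg=1)

# Fill gaps in the primary series from the duplicate sensor via the regression
predicted = slope * df["t_air_duplicate"] + intercept
df["t_air_filled"] = df["t_air_primary"].fillna(predicted)

n_filled = df["t_air_primary"].isna().sum() - df["t_air_filled"].isna().sum()
print(f"filled {n_filled} gaps in the primary air temperature series")
```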
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Relevant number of data points, discrepancy type, number of discrepancies and discrepancy rate.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This dataset contains data collected from field blanks. Field blanks are deionized water processed in the field by community technicians using processing methods identical to those for surface water samples. Field blanks are then analyzed in the laboratory following procedures identical to those for surface water samples.
This report examines the number, percentage, and characteristics of specialty mental health treatment facilities in the United States that use three quality assurance practices related to the behavioral health workforce as part of their standard operating procedures.
Batch effects are technical sources of variation introduced by the necessity of conducting gene expression analyses on different dates due to the large number of biological samples in population-based studies. The aim of this study is to evaluate the performance of linear mixed models (LMM) and ComBat in batch effect removal. We also assessed the utility of adding quality control samples to the study design as technical replicates. To do so, we simulated gene expression data by adding “treatment” and batch effects to a real gene expression dataset. The performance of LMM and ComBat, with and without quality control samples, was assessed in terms of sensitivity and specificity while correcting for the batch effect across a wide range of effect sizes, statistical noise, sample sizes, and levels of balanced/unbalanced designs. The simulations showed small differences between LMM and ComBat. LMM identifies stronger relationships between big effect sizes and gene expression than ComBat, while ComBat identifies in general more true and false positives than LMM. However, these small differences can still be relevant depending on the research goal. When either of these methods is applied, quality control samples did not reduce the batch effect, showing no added value of including them in the study design.
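As a minimal sketch of the simulation-and-correction idea described above for a single gene (the effect sizes, sample sizes, and the use of statsmodels' mixed linear model are illustrative assumptions, not the study's actual pipeline):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_per_batch, batches = 30, 4

# Simulate one gene: baseline + treatment effect + batch effect + noise
df = pd.DataFrame({
    "batch": np.repeat([f"b{i}" for i in range(batches)], n_per_batch),
    "treatment": np.tile([0, 1], batches * n_per_batch // 2),
})
batch_shift = {f"b{i}": rng.normal(0, 1.0) for i in range(batches)}  # technical batch effect
df["expression"] = (5.0 + 0.8 * df["treatment"]            # true treatment effect of 0.8
                    + df["batch"].map(batch_shift)
                    + rng.normal(0, 0.5, len(df)))

# Linear mixed model: treatment as fixed effect, batch as random intercept
model = smf.mixedlm("expression ~ treatment", data=df, groups=df["batch"]).fit()
print(model.summary())  # the treatment coefficient estimates the effect net of batch
```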
Machine learning is an established and frequently used technique in industry and academia, but a standard process model to improve success and efficiency of machine learning applications is still missing. Project organizations and machine learning practitioners face manifold challenges and risks when developing machine learning applications and have a need for guidance to meet business expectations. This paper therefore proposes a process model for the development of machine learning applications, covering six phases from defining the scope to maintaining the deployed machine learning application. Business and data understanding are executed simultaneously in the first phase, as both have considerable impact on the feasibility of the project. The next phases comprise data preparation, modeling, evaluation, and deployment. Special focus is applied to the last phase, as a model running in changing real-time environments requires close monitoring and maintenance to reduce the risk of performance degradation over time. With each task of the process, this work proposes a quality assurance methodology suitable for addressing challenges in machine learning development that are identified in the form of risks. The methodology is drawn from practical experience and scientific literature, and has proven to be general and stable. The process model expands on CRISP-DM, a data mining process model that enjoys strong industry support but fails to address machine learning specific tasks. The presented work proposes an industry- and application-neutral process model tailored for machine learning applications with a focus on technical tasks for quality assurance.