Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The EOSC-A FAIR Metrics and Data Quality Task Force (TF) supported the European Open Science Cloud Association (EOSC-A) by providing strategic directions on FAIRness (Findable, Accessible, Interoperable, and Reusable) and data quality. The Task Force conducted a survey using the EUsurvey tool between 15.11.2022 and 18.01.2023, targeting both developers and users of FAIR assessment tools. The survey aimed to support the harmonisation of FAIR assessments, in terms of what is evaluated and how, across existing (and future) tools and services, and to explore whether and how a community-driven governance of these FAIR assessments could take shape. The survey received 78 responses, mainly from academia, representing various domains and organisational roles. This is the anonymised survey dataset in CSV format; most open-ended answers have been dropped. The codebook contains variable names, labels, and frequencies.
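As an illustration of how the codebook-style frequencies can be reproduced from the CSV, here is a minimal pandas sketch; the file name is an assumption, not part of the release.

```python
import pandas as pd

# Load the anonymised survey export (file name is an assumption).
df = pd.read_csv("eosc_fair_survey_anonymised.csv")

# Reproduce codebook-style frequency tables: one value count per variable,
# keeping missing answers visible so they can be reported alongside labels.
for column in df.columns:
    freq = df[column].value_counts(dropna=False).sort_index()
    print(f"\n{column}")
    print(freq.to_string())
```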
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The associated quality metrics for the 10-minute TRAAT average comprise the quality metric results from the range test, the null test, and the averaging flag, as shown in Table 1.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Metrics used to give an indication of data quality across our test groups. These include whether documentation was used and what proportion of respondents rounded their answers. Unit and item non-response are also reported.
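A minimal sketch of how such indicators might be computed from a response table; the column names, the rounding rule, and the toy values are hypothetical and only illustrate the definitions.

```python
import pandas as pd

# Hypothetical response table: one row per sampled case, NaN where no answer was given.
responses = pd.DataFrame({
    "responded": [True, True, True, False, True],    # unit-level participation
    "income":    [25000, 30000, None, None, 41234],  # an item with possible non-response
})

# Unit non-response: share of sampled cases that did not take part at all.
unit_nonresponse = 1 - responses["responded"].mean()

# Item non-response: share of missing answers among those who did take part.
respondents = responses[responses["responded"]]
item_nonresponse = respondents["income"].isna().mean()

# Rounding: proportion of reported incomes that are multiples of 1000.
reported = respondents["income"].dropna()
rounded_share = (reported % 1000 == 0).mean()

print(unit_nonresponse, item_nonresponse, rounded_share)
```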
https://www.nist.gov/open/license
This repository contains the raw data and analysis scripts supporting the associated publication which introduces a framework to help researchers select fit-for-purpose microbial cell counting methods and optimize protocols for quantification of microbial total cells and viable cells. Escherichia coli cells were enumerated using four methods (colony forming unit assay, impedance flow cytometry - Multisizer 4, impedance flow cytometry - BactoBox, and fluorescent flow cytometry - CytoFLEX LX) and repeated on multiple dates. The experimental design for a single date starts with a cell stock that is divided into 18 sample replicates (3 each for 6 different dilution factors), and each sample is assayed one or two times for a total of 30 observations. Raw data files are provided from the Multisizer 4 (.#m4) and CytoFLEX LX (.fcs 3.0). The colony forming unit assay and BactoBox readings are recorded for each date as are the derived results from the Multisizer 4 and CytoFLEX LX. Also provided are an example analysis script for the *.fcs files and the statistical analysis that was performed.
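For orientation, a minimal sketch of reading one of the CytoFLEX LX *.fcs files with the third-party fcsparser package and applying a crude gate; the file name, channel name, and threshold are assumptions, and the published example analysis script should be preferred for the actual workflow.

```python
import fcsparser  # third-party reader for FCS 3.0 files

# Parse one acquisition file (file name is an assumption).
meta, events = fcsparser.parse("ecoli_dilution_rep1.fcs", reformat_meta=True)

# 'events' is a pandas DataFrame with one row per recorded event.
print(f"{len(events)} events, channels: {list(events.columns)}")

# A crude gate on a hypothetical fluorescence channel; real gating should
# follow the example analysis script provided with the repository.
channel = "FITC-A"
if channel in events.columns:
    gated = events[events[channel] > 1_000]
    print(f"events above threshold on {channel}: {len(gated)}")
```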
The dataset contains quality and source code metrics information for 60 versions under 10 different repositories. The dataset is extracted at 3 levels: (1) Class, (2) Method, (3) Package. It was created by analyzing 9,420,246 lines of code and 173,237 classes. The provided dataset contains one quality_attributes folder and three associated files: repositories.csv, versions.csv, and attribute-details.csv. The first file (repositories.csv) contains general information (repository name, repository URL, number of commits, stars, forks, etc.) to convey each repository's size, popularity, and maintainability. The file versions.csv contains general information (version unique ID, number of classes, packages, external classes, external packages, version repository link) to provide an overview of the versions and of how the repository grows over time. The file attribute-details.csv contains detailed information (attribute name, attribute short form, category, and description) about the extracted static analysis metrics and code quality attributes. The short form is used in the actual data files as a unique identifier to report values for packages, classes, and methods.
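As an illustration only, a pandas sketch that joins the two overview tables described above; the join keys and exact column names are assumptions based on the description, not the published schema.

```python
import pandas as pd

repositories = pd.read_csv("repositories.csv")      # name, URL, commits, stars, forks, ...
versions = pd.read_csv("versions.csv")              # version ID, class/package counts, repository link
attributes = pd.read_csv("attribute-details.csv")   # attribute name, short form, category, description

# Hypothetical join: relate each version to its repository via the repository link/URL.
overview = versions.merge(
    repositories,
    left_on="version repository link",  # assumed column name
    right_on="repository URL",          # assumed column name
    how="left",
)

# Growth over time: distribution of class counts per version within each repository.
print(overview.groupby("repository name")["number of classes"].describe())
```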
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains code quality information for more than 86 thousand GitHub repositories comprising more than 1.1 billion lines of code, mainly written in C# and Java. The code quality information covers 7 kinds of detected architecture smells, 19 kinds of design smells, 11 kinds of implementation smells, and 27 commonly used code quality metrics computed at project, package, class, and method levels.
https://www.marketreportanalytics.com/privacy-policy
The Data Quality Tools market is experiencing robust growth, fueled by the increasing volume and complexity of data across diverse industries. The market, currently valued at an estimated $XX million in 2025 (assuming a logically derived value based on a 17.5% CAGR from a 2019 base year), is projected to reach $YY million by 2033. This substantial expansion is driven by several key factors. Firstly, the rising adoption of cloud-based solutions offers enhanced scalability, flexibility, and cost-effectiveness, attracting both small and medium enterprises (SMEs) and large enterprises. Secondly, the growing need for regulatory compliance (e.g., GDPR, CCPA) necessitates robust data quality management, pushing organizations to invest in advanced tools. Further, the increasing reliance on data-driven decision-making across sectors like BFSI, healthcare, and retail necessitates high-quality, reliable data, thus boosting market demand. The preference for software solutions over on-premise deployments and the substantial investments in services aimed at data integration and cleansing contribute to this growth.

However, certain challenges restrain market expansion. High initial investment costs, the complexity of implementation, and the need for skilled professionals to manage these tools can act as barriers for some organizations, particularly SMEs. Furthermore, concerns related to data security and privacy continue to impact adoption rates. Despite these challenges, the long-term outlook for the Data Quality Tools market remains positive, driven by the ever-increasing importance of data quality in a rapidly digitalizing world. The market segmentation highlights significant opportunities across different deployment models, organizational sizes, and industry verticals, suggesting diverse avenues for growth and innovation in the coming years. Competition among established players like IBM, Informatica, and Oracle, alongside emerging players, is intensifying, driving innovation and providing diverse solutions to meet varied customer needs.

Recent developments include: September 2022: MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) spin-off DataCebo announced the launch of a new tool, dubbed Synthetic Data (SD) Metrics, to help enterprises compare the quality of machine-generated synthetic data by pitching it against real data sets. May 2022: Pyramid Analytics, which developed its flagship platform, Pyramid Decision Intelligence, announced that it raised USD 120 million in a Series E round of funding. The Pyramid Decision Intelligence platform combines business analytics, data preparation, and data science capabilities with AI guidance functionality. It enables governed self-service analytics in a no-code environment.

Key drivers for this market are: Increasing Use of External Data Sources Owing to Mobile Connectivity Growth. Potential restraints include: Increasing Use of External Data Sources Owing to Mobile Connectivity Growth. Notable trends are: Healthcare is Expected to Witness Significant Growth.
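For readers checking the projection arithmetic, a minimal sketch of the compound-annual-growth calculation implied above; the base value is hypothetical because the report elides the 2025 and 2033 figures.

```python
# Compound annual growth: value_t = value_0 * (1 + CAGR) ** years
base_2025 = 1_000.0   # hypothetical base value in USD million (the report's figure is elided)
cagr = 0.175          # 17.5% CAGR as stated in the report
years = 2033 - 2025

projected_2033 = base_2025 * (1 + cagr) ** years
print(f"Projected 2033 value: {projected_2033:,.0f} USD million")
```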
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the replication data for our research article "ParSetgnostics: Quality Metrics for Parallel Sets." It contains the datasets used to obtain optimized Parallel Sets visualizations. We used the following six datasets for our experiments, which we describe on a per-file basis. All datasets are purely categorical datasets.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 3. An example of data quality management for tweets. The example code can be compared with the corresponding data views (Figs. 14, 15).
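The additional file itself contains the actual example code; as a rough, hypothetical analogue of such checks, the pandas sketch below screens a toy tweet table for duplicates and missing mandatory fields.

```python
import pandas as pd

# Hypothetical tweet table; the real example code in Additional file 3 defines its own fields.
tweets = pd.DataFrame({
    "tweet_id": ["1", "2", "2", "3"],
    "text": ["ok", "dup", "dup", None],
    "created_at": ["2021-01-01", "2021-01-02", "2021-01-02", None],
})

# Basic quality management steps: drop exact duplicates, then flag missing mandatory fields.
deduplicated = tweets.drop_duplicates(subset="tweet_id")
complete = deduplicated.dropna(subset=["text", "created_at"])

print(f"kept {len(complete)} of {len(tweets)} tweets after deduplication and completeness checks")
```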
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Additional file 2. The file contains quality metrics for evaluation of timeliness, relevancy, and popularity.
This data release documents spatiotemporal water-quality, landscape, and climatic conditions in Fairfax County, Virginia from 2007 through 2018. These data were used to evaluate the water-quality and ecological condition of 20 Fairfax County watersheds monitored since 2007. Data include measures of water quality, precipitation, air temperature, land use, land cover, wastewater and stormwater infrastructure, soil properties, geologic setting, and stream networks. Annual values from 2007 through 2018 are reported for data expected to change over time. Watershed-specific values are reported for data that differ across the landscape. Annual values for the 20 study watersheds and Fairfax County are reported in the file "Spatiotemporal Predictors.csv". A description of each spatiotemporal variable, including data sources, is provided in the file "Data Dictionary.csv".
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We developed a suite of quality metrics that characterize the annual frequency of satellite observations, their year-to-year recurrence, and their within-year distribution. These account for the quality of individual observations as reported by the data providers, and for limitations to the usability of these data caused by cloud cover. This dataset includes a zip file for each of the following metrics:
nrTiles - Number of Landsat tiles used to compute quality metric.
imageFrequency - Number of collected images.
monthFrequency - Number of months with collected images.
maxQuality - Maximum image quality.
totalQuality - Number of collected images, weighted by the quality of each image.
lastYear - Closest year with usable data, from the start of the time-series to the reference year.
distributionBalance - Average of the maximum monthly image quality.
distributionQuality - Ratio between the distribution balance of the first and second half of the year.
We calculated these quality metrics for each descending tile as drawn in the World Reference System 2 (WRS-2), and for each year. Then, for each year, we combined the tile-specific metrics by averaging them into global grids with a 1-km resolution using the script "map_landsat_quality.py". This script was executed in Python 3.10, and the associated module requirements are recorded in the file "requirements.txt". All tile-specific metrics, which are the input for the Python script, are provided in the file "LTQA_metadata.csv".
The calculation of quality metrics is informed by the metadata of all unique acquisitions obtained through Landsat's bulk metadata service. It considers images acquired with Landsat 4, 5, 7, 8, and 9, but disregards those from Landsat's Multispectral Scanner System (MSS). The original metadata are provided in the file "LTQA_metadata.zip", which also includes the R code used in the calculation of the quality metrics and a data structure that can be updated to generate new values.
These metrics were calculated annually, discounting images that were unusable due to 100% cloud cover or advanced image degradation.
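A minimal sketch, under assumed column names, of how the tile-level annual metrics in "LTQA_metadata.csv" could be summarised per year before gridding with "map_landsat_quality.py"; it is an illustration of the data layout, not the published processing code.

```python
import pandas as pd

# Tile-specific annual metrics; the column names below are assumptions, not the published schema.
metrics = pd.read_csv("LTQA_metadata.csv")

# Per-year summary of, e.g., image frequency and total quality across WRS-2 tiles.
summary = (
    metrics.groupby("year")[["imageFrequency", "totalQuality"]]
    .mean()
    .rename(columns=lambda name: f"mean_{name}")
)
print(summary.head())
```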
This repository provides metrics of water quality and quantity for 1386 German catchments based on time series from the "WQQDB - water quality and quantity data base Germany" (Musolff, 2020), using a subset covering the years 2000 to 2015. The data in this repository were created for Ebeling et al. (2021); selection criteria and results are presented in more depth therein. Natural and anthropogenic catchment characteristics are provided in another, linked repository, "CCDB - catchment characteristics data base Germany" (https://doi.org/10.4211/hs.0fc1b5b1be4a475aacfd9545e72e6839). Both repositories use the same unique identifier OBJECTID for each water quality station.
This repository includes:
1.) Data table with the unique identifier (OBJECTID), station name and calculated metrics of water quality dynamics for nitrate (NO3-N), phosphate (PO4-P) and total organic carbon (TOC). The metrics are mean concentrations, the slope b of the concentration (C) - discharge (Q) regression in logspace, the corresponding R2, and the ratio of the coefficients of variation CVC/CVQ (a sketch of these calculations follows below). The data table also includes a flag "indep" for the independence of catchments (max. 20% area overlap with each of its subcatchments), based on criteria (e.g. priority of C-Q catchments) described in Ebeling et al. (2021). Accordingly, 787 catchments are considered independent.
2.) Readme file providing information on the provided data.
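The C-Q metrics listed above follow standard definitions; the small numpy sketch below, using made-up paired samples, illustrates the log-space slope b, its R2, and the CVC/CVQ ratio. It is not the code used to build the data table.

```python
import numpy as np

# Hypothetical paired samples of concentration C (mg/L) and discharge Q (m3/s).
C = np.array([2.1, 1.8, 2.6, 3.0, 1.5])
Q = np.array([0.8, 0.5, 1.9, 2.5, 0.4])

# Slope b and intercept a of the C-Q regression in log space: log(C) = b * log(Q) + a
log_C, log_Q = np.log(C), np.log(Q)
b, a = np.polyfit(log_Q, log_C, deg=1)

# Coefficient of determination R2 of the log-log fit.
residuals = log_C - (b * log_Q + a)
r2 = 1 - residuals.var() / log_C.var()

# Ratio of the coefficients of variation, CVC / CVQ.
cv_ratio = (C.std() / C.mean()) / (Q.std() / Q.mean())

print(f"b = {b:.2f}, R2 = {r2:.2f}, CVC/CVQ = {cv_ratio:.2f}")
```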
Reference: Ebeling, P., Kumar, R., Weber, M., Knoll, L., Fleckenstein, J. H., & Musolff, A. (2021). Archetypes and Controls of Riverine Nutrient Export Across German Catchments. Water Resources Research, 57, e2020WR028134. https://doi.org/10.1029/2020WR028134
Conditions: Please reference both the original data publisher and this repository for correct acknowledgement when using the provided data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Previous studies have investigated the reasons behind refactoring operations performed by developers, and proposed methods and tools to recommend refactorings based on quality metric profiles, or on the presence of poor design and implementation choices, i.e., code smells. Nevertheless, the existing literature lacks observations about the relations between metrics/code smells and the refactoring operations performed by developers. In other words, the characteristics of code components that push developers to refactor them are still unknown. This paper aims at bridging this gap by analyzing which code characteristics trigger developers' refactoring attention. Specifically, we mined the evolution history of three Java open source projects to investigate whether developers' refactoring activities occur on code components for which certain indicators—such as quality metrics or the presence of smells as detected by tools—suggest there might be need for refactoring operations. Results indicate that, more often than not, quality metrics do not show a clear relationship with refactoring. In other words, refactoring operations performed by developers are generally focused on code components for which quality metrics do not suggest a need for refactoring. Finally, 42% of refactoring operations are performed on code entities affected by code smells. However, only 7% of the performed operations actually remove the code smells from the affected class.
https://www.archivemarketresearch.com/privacy-policy
The Data Quality Management (DQM) market is experiencing robust growth, driven by the increasing volume and velocity of data generated across various industries. Businesses are increasingly recognizing the critical need for accurate, reliable, and consistent data to support critical decision-making, improve operational efficiency, and comply with stringent data regulations. The market is estimated to be valued at $15 billion in 2025, exhibiting a Compound Annual Growth Rate (CAGR) of 12% from 2025 to 2033. This growth is fueled by several key factors, including the rising adoption of cloud-based DQM solutions, the expanding use of advanced analytics and AI in data quality processes, and the growing demand for data governance and compliance solutions. The market is segmented by deployment (cloud, on-premises), organization size (small, medium, large enterprises), and industry vertical (BFSI, healthcare, retail, etc.), with the cloud segment exhibiting the fastest growth. Major players in the DQM market include Informatica, Talend, IBM, Microsoft, Oracle, SAP, SAS Institute, Pitney Bowes, Syncsort, and Experian, each offering a range of solutions catering to diverse business needs. These companies are constantly innovating to provide more sophisticated and integrated DQM solutions incorporating machine learning, automation, and self-service capabilities. However, the market also faces some challenges, including the complexity of implementing DQM solutions, the lack of skilled professionals, and the high cost associated with some advanced technologies. Despite these restraints, the long-term outlook for the DQM market remains positive, with continued expansion driven by the expanding digital transformation initiatives across industries and the growing awareness of the significant return on investment associated with improved data quality.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset provides extensive data from large-scale subjective experiments, encompassing MOS values, eye-tracking data, and raw subjective scores, collected at two laboratories: Brno University of Technology (BUT) and Czech Technical University in Prague (CTU). This new dataset serves as a comprehensive foundation for future research into novel objective quality metrics for omnidirectional image quality assessment (OIQA).
▷ contact: xsimka01@vut.cz
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource provides the files used for a quality assessment of an RDF social media archive, where quality information is described using the Data Quality Vocabulary (DQV) and linked to RDF validation rules expressed in W3C SHACL. The actual quality assessment is then performed as a SPARQL query on these sources. The research activities were supported by the Belgian Federal Science Policy Office (BELSPO) BRAIN 2.0 Research Project BESOCIAL, Ghent University, and imec. More information is provided in the file README.md.
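To illustrate the general pattern (not the project's actual queries), a small rdflib sketch that runs a SPARQL query over a DQV-annotated graph; the file name and the selection of measurements are assumptions.

```python
from rdflib import Graph

# Load the quality-annotated archive description (file name is an assumption).
g = Graph()
g.parse("quality_assessment.ttl", format="turtle")

# Generic DQV pattern: retrieve quality measurements, their metric, and their value.
query = """
PREFIX dqv: <http://www.w3.org/ns/dqv#>
SELECT ?dataset ?metric ?value WHERE {
    ?measurement a dqv:QualityMeasurement ;
                 dqv:computedOn ?dataset ;
                 dqv:isMeasurementOf ?metric ;
                 dqv:value ?value .
}
"""
for dataset, metric, value in g.query(query):
    print(dataset, metric, value)
```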
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Dataset from the second part of the Master's Dissertation "Avaliação da qualidade da Wikipédia enquanto fonte de informação em saúde" (Assessment of Wikipedia's quality as a health information source), FEUP, 2021. It contains the data collected to assess the 1000 most viewed health-related Wikipedia articles listed by WikiProject Medicine, in English. The MediaWiki API was used to collect the current state of each article's contents and its metadata, revision history, language links, internal wiki links, and external links. Data not available through the API were obtained from the article's markup. Besides the 7 metrics defined by Stvilia et al., four other proposed metrics and their respective features were assessed. This dataset can be used to analyze quality, as well as other quantitative aspects of health-related articles from the English Wikipedia.
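As an indicative sketch of the collection step (the dissertation's own scripts are authoritative), a minimal MediaWiki API request for one article's metadata and latest revision; the article title is just an example.

```python
import requests

# Query the English Wikipedia API for one article's metadata and latest revision.
params = {
    "action": "query",
    "format": "json",
    "titles": "Asthma",                      # example article; the study covers ~1000 titles
    "prop": "info|revisions|langlinks|extlinks",
    "rvprop": "ids|timestamp|user|size",
    "rvlimit": 1,
}
response = requests.get("https://en.wikipedia.org/w/api.php", params=params, timeout=30)
page = next(iter(response.json()["query"]["pages"].values()))

print(page["title"], page["length"], page["revisions"][0]["timestamp"])
```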
https://www.datainsightsmarket.com/privacy-policy
The global data quality tools industry size was valued at USD XX million in 2025 and is expected to expand at a CAGR of 17.50% over the forecast period (2025-2033). Growing data volumes and the need for accurate, reliable data for decision-making are driving demand for data quality tools. These tools help organizations clean, standardize, and transform data to improve its quality and usability. Key industry trends include the rise of cloud-based data quality tools, the growing adoption of machine learning and artificial intelligence (AI) for data quality automation, and the increasing focus on data governance and compliance. The market is highly competitive, with several established vendors and emerging startups offering a range of data quality solutions. Some of the major players in the industry include SAS Institute Inc., Ataccama Corporation, Experian PLC, IBM Corporation, Pitney Bowes Inc., Information Builders Inc., Syncsort Inc., Oracle Corporation, Informatica LLC, Talend Inc., and SAP SE.

The data quality tools market is a rapidly growing industry, driven by the increasing need for businesses to improve the quality of their data. In 2023, the market is expected to be worth $3.5 billion, and it is projected to grow to $6.5 billion by 2028, at a CAGR of 12.3%. The market is highly concentrated, with the top five vendors accounting for over 50% of the market share. The leading vendors include SAS Institute Inc., Ataccama Corporation, Experian PLC, IBM Corporation, and Pitney Bowes Inc. The market is characterized by innovation, with new products and technologies being introduced regularly. The key market trends include the adoption of cloud-based solutions, the use of artificial intelligence (AI) and machine learning (ML) to improve data quality, and the growing importance of data governance. North America is the largest region for the data quality tools market, followed by Europe and Asia Pacific. The key end-user verticals include BFSI, government, IT & telecom, and retail and e-commerce. The market is expected to be driven by the increasing need for businesses to improve the quality of their data, the adoption of cloud-based solutions, and the use of AI and ML to improve data quality. The challenges and restraints include the lack of skilled professionals, the complexity of data quality tools, and the cost of implementation.

Recent developments include: September 2022: MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) spin-off DataCebo announced the launch of a new tool, dubbed Synthetic Data (SD) Metrics, to help enterprises compare the quality of machine-generated synthetic data by pitching it against real data sets. May 2022: Pyramid Analytics, which developed its flagship platform, Pyramid Decision Intelligence, announced that it raised USD 120 million in a Series E round of funding. The Pyramid Decision Intelligence platform combines business analytics, data preparation, and data science capabilities with AI guidance functionality. It enables governed self-service analytics in a no-code environment.

Key drivers for this market are: Increasing Use of External Data Sources Owing to Mobile Connectivity Growth. Potential restraints include: Lack of Information and Awareness about the Solutions Among Potential Users. Notable trends are: Healthcare is Expected to Witness Significant Growth.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The provided dataset contains the data used in "Towards Understanding the Impact of Code Modifications on Software Quality Metrics" to examine the impact of code changes on software quality metrics and to identify types of code changes with similar impact, together with the results obtained.