The statistic shows the problems caused by poor quality data for enterprises in North America, according to a survey of North American IT executives conducted by 451 Research in 2015. As of 2015, 44 percent of respondents indicated that having poor quality data can result in extra costs for the business.
Data Quality identifies FMCSA resources for evaluating, monitoring, and improving the quality of data submitted by States to the Motor Carrier Management Information System (MCMIS).
https://www.cognitivemarketresearch.com/privacy-policy
The global Data Quality Software market size in 2025 was XX million. The industry's compound annual growth rate (CAGR) is projected to be XX% from 2025 to 2033.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Data Quality Vocabulary (DQV) is seen as an extension to DCAT that covers the quality of the data: how frequently it is updated, whether it accepts user corrections, persistence commitments, etc. When used by publishers, this vocabulary fosters trust in the data among developers.
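As an illustration of how a publisher might attach quality information to a DCAT dataset with DQV, here is a minimal sketch that emits Turtle for a single quality measurement. It uses the DQV terms `dqv:hasQualityMeasurement`, `dqv:QualityMeasurement`, `dqv:computedOn`, `dqv:isMeasurementOf` and `dqv:value`; the function name, the example.org IRIs and the measured value are hypothetical placeholders, not part of any real catalog.

```python
# Prefix declarations for the DQV, DCAT and XSD namespaces
DQV_PREFIXES = """@prefix dqv: <http://www.w3.org/ns/dqv#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def quality_measurement_ttl(dataset_iri, measurement_iri, metric_iri, value):
    """Return Turtle linking one DQV quality measurement to a DCAT dataset.

    All IRIs passed in are assumed to be absolute; the helper itself is hypothetical.
    """
    return DQV_PREFIXES + f"""
<{dataset_iri}> a dcat:Dataset ;
    dqv:hasQualityMeasurement <{measurement_iri}> .

<{measurement_iri}> a dqv:QualityMeasurement ;
    dqv:computedOn <{dataset_iri}> ;
    dqv:isMeasurementOf <{metric_iri}> ;
    dqv:value "{value}"^^xsd:double .
"""


# Hypothetical example: a completeness score of 0.97 for one dataset
print(quality_measurement_ttl(
    "http://example.org/dataset/1",
    "http://example.org/measurement/1",
    "http://example.org/metric/completeness",
    0.97,
))
```

The metric IRI would normally point at a `dqv:Metric` defined by the publisher or a shared quality framework; this sketch only shows the shape of the measurement triples.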
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The data (112,013 records) combine the 26 fields from the TSV files with one field obtained from the XML files. The records are presented in the following six Recruitment Type categories: (1) Active, not recruiting (11,094 records), (2) Completed (67,294), (3) Enrolling by invitation (1,022), (4) Recruiting (23,223), (5) Suspended (597), and (6) Terminated (8,783). The sheets are numbered 1–6, respectively. The file is available at https://osf.io/jcb92. (ZIP 4860 kb)
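The six category counts should add up to the stated total of 112,013 records. A minimal consistency check, using only the figures quoted in the description:

```python
# Record counts per Recruitment Type, as quoted in the description
COUNTS = {
    "Active, not recruiting": 11_094,
    "Completed": 67_294,
    "Enrolling by invitation": 1_022,
    "Recruiting": 23_223,
    "Suspended": 597,
    "Terminated": 8_783,
}

total = sum(COUNTS.values())
print(total)  # 112013
```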
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Records with a primary completion date and registered with an authority in the USA. The data (5101 records from Additional file 7: Table S5) were sorted into a “USA_PriComplDate” sheet for trials registered with at least one authority in the US, and a “USA_PriComplDate_leftovers” sheet with the remaining records. The data are presented in the following six Recruitment Type categories: (1) Active, not recruiting (1085 selected records with 135 leftovers), (2) Completed (1100; 940), (3) Enrolling by invitation (19; 23), (4) Recruiting (773; 609), (5) Suspended (59; 32), and (6) Terminated (252; 74). The sheets for these categories are numbered 1–6, respectively. (XLS 493 kb)
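The sorting rule above (at least one US authority puts a trial in the USA sheet) and the per-category counts can be sketched as follows. The record layout, in particular the `authorities` key holding a list of country names, is a hypothetical assumption; the counts are the ones quoted in the description.

```python
# (selected, leftovers) per Recruitment Type, as quoted in the description
SHEET_COUNTS = {
    "Active, not recruiting": (1085, 135),
    "Completed": (1100, 940),
    "Enrolling by invitation": (19, 23),
    "Recruiting": (773, 609),
    "Suspended": (59, 32),
    "Terminated": (252, 74),
}


def split_usa(records):
    """Partition records into USA_PriComplDate and USA_PriComplDate_leftovers.

    Assumes each record is a dict with a hypothetical "authorities" key listing
    the countries of the authorities the trial is registered with.
    """
    usa, leftovers = [], []
    for rec in records:
        # One US authority is enough to place the trial in the USA sheet
        target = usa if "United States" in rec["authorities"] else leftovers
        target.append(rec)
    return usa, leftovers


selected = sum(s for s, _ in SHEET_COUNTS.values())
leftover = sum(l for _, l in SHEET_COUNTS.values())
print(selected, leftover, selected + leftover)  # 3288 1813 5101
```

The two sheets together account for all 5101 records from Table S5.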
According to a survey conducted at the EmTech Digital conference in March 2019, U.S. business leaders shared their opinions on trust issues regarding AI data quality and privacy. Nearly half of respondents reported a lack of trust in the quality of their companies' AI data, suggesting there is still a long way to go in achieving high-quality AI data.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
For more up to date quality metadata, please visit https://w3id.org/lodquator
This dataset is a collection of TriG files with quality metadata for different datasets on the LOD cloud. Each dataset was assessed for:
The length of URIs
Usage of RDF primitives
Re-use of existing terms
Usage of undefined terms
Usage of blank nodes
Indication for different serialisation formats
Usage of multiple languages
This data dump is part of the empirical study conducted for the paper "Are LOD Cloud Datasets Well Represented? A Data Representation Quality Survey."
For more information visit http://jerdeb.github.io/lodqa
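A few of the assessed aspects (URI length, blank-node usage, use of multiple languages) can be approximated with simple counts over a dataset's triples. The sketch below is illustrative only: it assumes triples are given as N-Triples-style term strings (`<...>` for URIs, `_:` for blank nodes, `"..."@lang` for tagged literals), and its metric definitions are simplified assumptions, not the ones used in the paper.

```python
def _terms(triples):
    """Flatten (subject, predicate, object) triples into a list of terms."""
    return [t for triple in triples for t in triple]


def mean_uri_length(triples):
    """Average length of URI terms, without the enclosing angle brackets."""
    uris = [t[1:-1] for t in _terms(triples) if t.startswith("<") and t.endswith(">")]
    return sum(map(len, uris)) / len(uris) if uris else 0.0


def blank_node_ratio(triples):
    """Share of all terms that are blank nodes."""
    terms = _terms(triples)
    return sum(t.startswith("_:") for t in terms) / len(terms) if terms else 0.0


def language_tags(triples):
    """Set of language tags found on literal objects."""
    tags = set()
    for _, _, obj in triples:
        if obj.startswith('"') and '"@' in obj:
            tags.add(obj.rsplit('"@', 1)[1])
    return tags


sample = [
    ("<http://ex.org/a>", "<http://ex.org/p>", '"Berlin"@de'),
    ("<http://ex.org/a>", "<http://ex.org/p>", '"Berlin"@en'),
    ("_:b0", "<http://ex.org/q>", "<http://ex.org/a>"),
]
print(mean_uri_length(sample), blank_node_ratio(sample), sorted(language_tags(sample)))
```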
The USACE IENC coverage area consists of 7,260 miles across 21 rivers, primarily in the Central United States. IENCs apply to inland waterways that USACE maintains for navigation by shallow-draft vessels (e.g., maintained at a depth of 9-14 feet, depending on the waterway project authorization). Generally, IENCs are produced for commercially navigable waterways for which the National Oceanic and Atmospheric Administration (NOAA) does not produce Electronic Navigational Charts (ENCs). However, Special Purpose IENCs may be produced in agreement with NOAA. IENC POC: IENC_POC@usace.army.mil
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
Data quality flags generated for the Atmospheric Chemistry Experiment Fourier Transform Spectrometer (ACE-FTS) Level 2 (L2) version 5.2 data products. These data quality flags are generated using the technique described in Sheese et al. (2015). One netCDF file is produced for each species, isotopologue or parameter retrieved from the ACE-FTS spectra for version 5.2. Each file contains the data quality flags organized by occultation (orbit number and occultation type). Note that the ACE-FTS Level 2 version 5.2 profiles themselves are not included in these files. The data quality flag files are updated monthly as new Level 2 version 5.2 data are produced for ACE-FTS.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
A table explaining how to solve data quality issues in digital citizen science. A total of 35 issues and 64 mechanisms to solve them are proposed.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
Quality characteristics for 21586 river flow time series from 13 datasets worldwide. The 13 datasets are: the Global Runoff Database from the Global Runoff Data Center (GRDC), the Global River Discharge Data (RIVDIS; Vörösmarty et al., 1998), Surface-Water Data from the United States Geological Survey (USGS), HYDAT from the Water Survey of Canada (WSC), WISKI from the Swedish Meteorological and Hydrological Institute (SMHI), Hidroweb from the Brazilian National Water Agency (ANA), National data from the Australian Bureau of Meteorology (BOM), Spanish river flow data from the Ecological Transition Ministry (Spain), R-ArcticNet v. 4.0 from the Pan-Arctic Project Consortium (R-ArcticNet), Russian River data (NCAR-UCAR; Bodo, 2000), Chinese river flow data from the China Hydrology Data Project (CHDP; Henck et al., 2010, 2011), the European Water Archive from GRDC - EURO-FRIEND-Water (EWA), and the GEWEX Asian Monsoon Experiment (GAME) – Tropics dataset provided by the Royal Irrigation Department of Thailand. Quality characteristics are based on availability, outliers, homogeneity and trends: overall availability (%), longest availability (%), continuity (%), monthly availability (%), outliers ratio (%), homogeneity of annual flows (number of statistical tests agreeing), trend in annual flows, trend in one month of the year.
Bodo, B. (2000) Russian River Flow Data by Bodo. Boulder, CO: Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory. Retrieved from http://rda.ucar.edu/datasets/ds553.1/
Henck, A. C., Huntington, K. W., Stone, J. O., Montgomery, D. R. & Hallet, B. (2011) Spatial controls on erosion in the Three Rivers Region, southeastern Tibet and southwestern China. Earth and Planetary Science Letters 303(1–2), 71–83. doi:10.1016/j.epsl.2010.12.038
Henck, A. C., Montgomery, D. R., Huntington, K. W. & Liang, C. (2010) Monsoon control of effective discharge, Yunnan and Tibet. Geology 38(11), 975–978. doi:10.1130/G31444.1
Vörösmarty, C. J., Fekete, B. M. & Tucker, B. A. (1998) Global River Discharge, 1807-1991, Version 1.1 (RivDIS). doi:10.3334/ornldaac/199
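The two availability characteristics can be illustrated on a single daily series. The description does not define them precisely, so this sketch assumes that overall availability is the percentage of time steps with a value, that longest availability is the longest unbroken run of values as a percentage of the record length, and that gaps are marked with `None`. Continuity, outliers, homogeneity and trends are not covered here.

```python
from typing import Optional, Sequence


def overall_availability(series: Sequence[Optional[float]]) -> float:
    """Percentage of time steps that have a value (None marks a gap)."""
    if not series:
        return 0.0
    present = sum(1 for v in series if v is not None)
    return 100.0 * present / len(series)


def longest_availability(series: Sequence[Optional[float]]) -> float:
    """Longest run of consecutive values, as a percentage of the record length."""
    if not series:
        return 0.0
    best = run = 0
    for v in series:
        run = run + 1 if v is not None else 0  # reset the run at every gap
        best = max(best, run)
    return 100.0 * best / len(series)


flows = [1.0, 2.0, None, 3.0, 4.0, 5.0, None, None, 6.0, 7.0]
print(overall_availability(flows), longest_availability(flows))  # 70.0 30.0
```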
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
The 10,572 rejected rows in Additional file 13: Table S11 came from 8907 unique NCT IDs. (ODS 203 kb)
NOAA Ship Henry B. Bigelow Underway Meteorological Data (delayed ~10 days for quality control) are from the Shipboard Automated Meteorological and Oceanographic System (SAMOS) program. IMPORTANT: ALWAYS USE THE QUALITY FLAG DATA! Each data variable's metadata includes a qcindex attribute, which gives that variable's character position in the flag string. ALWAYS check the flag data for each row to see which values are good (flag='Z') and which are not. For example, to extract only rows where time (qcindex=1), latitude (qcindex=2), longitude (qcindex=3), and airTemperature (qcindex=12) are all good, include this constraint in your ERDDAP query: flag=~"ZZZ........Z.*". The "=~" indicates a regular-expression constraint. The 'Z's are literal characters; in this dataset, 'Z' indicates good data. Each '.' matches any single character, and the '*' matches the preceding element zero or more times, so '.*' matches the rest of the flag string. See the tutorial for regular expressions at https://www.vogella.com/tutorials/JavaRegularExpressions/article.html
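The flag pattern can be built mechanically from the qcindex values, and the same check can be applied client-side to rows already downloaded. A minimal sketch, assuming each row is a dict with a `flag` string and that, as the trailing '.*' suggests, the pattern must match the whole flag string; the helper names are hypothetical.

```python
import re


def build_flag_regex(qc_indices):
    """Regex requiring 'Z' (good) at each 1-based qcindex position of the flag string."""
    wanted = set(qc_indices)
    last = max(wanted)
    # 'Z' where a variable must be good, '.' (any character) elsewhere
    body = "".join("Z" if i in wanted else "." for i in range(1, last + 1))
    return body + ".*"  # '.*' accepts whatever flag characters follow


def good_rows(rows, qc_indices):
    """Keep rows whose flag string marks every requested variable as good."""
    pattern = re.compile(build_flag_regex(qc_indices))
    return [row for row in rows if pattern.fullmatch(row["flag"])]


# time=1, latitude=2, longitude=3, airTemperature=12
print(build_flag_regex([1, 2, 3, 12]))
```

The printed pattern is the same one used in the ERDDAP constraint above, so the server-side and client-side filters agree.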
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
This is the accompanying data for the paper "Analyzing Dataset Annotation Quality Management in the Wild". Data quality is crucial for training accurate, unbiased, and trustworthy machine learning models and their correct evaluation. Recent works, however, have shown that even popular datasets used to train and evaluate state-of-the-art models contain a non-negligible amount of erroneous annotations, bias or annotation artifacts. There exist best practices and guidelines regarding annotation projects. But to the best of our knowledge, no large-scale analysis has been performed as of yet on how quality management is actually conducted when creating natural language datasets and whether these recommendations are followed. Therefore, we first survey and summarize recommended quality management practices for dataset creation as described in the literature and provide suggestions on how to apply them. Then, we compile a corpus of 591 scientific publications introducing text datasets and annotate it for quality-related aspects, such as annotator management, agreement, adjudication or data validation. Using these annotations, we then analyze how quality management is conducted in practice. We find that a majority of the annotated publications apply good or very good quality management. However, we deem the effort of 30% of the works as only subpar. Our analysis also shows common errors, especially with using inter-annotator agreement and computing annotation error rates.
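Inter-annotator agreement is one of the quality aspects annotated in the corpus. As a point of reference, here is a minimal sketch of Cohen's kappa for two annotators, one common agreement measure; the paper's own analysis may rely on other measures, and the function name is illustrative.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(a)
    # observed agreement: share of items with identical labels
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # expected chance agreement from each annotator's label distribution
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Unlike raw percent agreement (0.75 in this example), kappa discounts the agreement expected by chance.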
https://data.aussda.at/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.11587/VDKYZZ
The increasing popularity of online surveys in the social sciences has led to an ongoing discussion about mode effects in survey research. This article tests whether commonly discussed mode effects (e.g. sample differences and data quality, including item non-response, social desirability, and answers to open-ended questions) can be reproduced in a non-experimental mixed-mode study. Using data from two non-full-probability random samples, collected via an online survey and a face-to-face survey on opinions about migration and refugees, most assumptions found in the experimental literature can indeed be replicated with research data. Thus, mode effects need to be accounted for when a mixed-mode design is necessary, especially if online surveys are involved.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Services Performed: Average Duration: South was reported at 63.570 hours in 2022, an increase from 49.920 hours in 2021. The series is updated yearly, with a median of 69.630 hours over 11 observations from Dec 2012 to 2022. It reached an all-time high of 111.490 hours in 2016 and a record low of 22.000 hours in 2020. The series remains active in CEIC and is reported by the Ministry of Cities. It is categorized under Brazil Premium Database’s Environmental, Social and Governance Sector – Table BR.EVB013: Quality Indicators: Issues: Services Performed.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Intermittences: Economies Affected: North was reported at 684.590 units in 2022, a decrease from 978.390 units in 2021. The series is updated yearly, with a median of 983.600 units over 11 observations from Dec 2012 to 2022. It reached an all-time high of 13,085.960 units in 2019 and a record low of 83.490 units in 2013. The series remains active in CEIC and is reported by the Ministry of Cities. It is categorized under Brazil Premium Database’s Environmental, Social and Governance Sector – Table BR.EVB012: Quality Indicators: Issues: Intermittences.
Logistic regression classification models were fit to manually classified quality control (QC) LC-MS/MS datasets to develop a model that can predict whether a dataset is in or out of control. Model parameters were optimized by minimizing a loss function that accounts for the tradeoff between false positive and false negative errors. In addition to the 1152 training/testing datasets, we are including 2662 additional datasets, all of the same QC sample (whole cell lysate of Shewanella oneidensis). Datasets originate from 6 Thermo instrument platforms: Exactive, LTQ, VelosPro, Orbitrap, Q-Exactive, and Velos Orbitrap.
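The description does not give the exact loss function. A minimal sketch, assuming a class-weighted logistic (cross-entropy) loss in which the hypothetical parameters `fn_cost` and `fp_cost` set the tradeoff between missing an out-of-control dataset and raising a false alarm, fitted by plain batch gradient descent. This illustrates the general idea, not the authors' method.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_weighted_logreg(X, y, fn_cost=1.0, fp_cost=1.0, lr=0.1, epochs=2000):
    """Batch gradient descent on a cost-weighted log loss.

    y=1 marks an out-of-control dataset. Positive examples are weighted by
    fn_cost (missing one is a false negative), negatives by fp_cost.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # derivative of the weighted log loss w.r.t. the linear score
            err = (fn_cost if yi == 1 else fp_cost) * (p - yi)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (out of control) if the predicted probability reaches threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Toy example with one hypothetical QC metric per dataset
w, b = fit_weighted_logreg([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
print(predict(w, b, [0.0]), predict(w, b, [3.0]))
```

Raising `fn_cost` relative to `fp_cost` pushes the decision boundary toward flagging more datasets as out of control, trading extra false positives for fewer missed problems.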
This raster dataset represents the agricultural census data quality for coffee crop yields. Data quality categories are: 0 = missing, 0.25 = county-level census data, 0.5 = interpolated with census data from within 2 degrees of latitude/longitude, 0.75 = state-level census data, 1 = country-level census data. Croplands cover ~15 million km² of the planet and provide the bulk of the food and fiber essential to human well-being. Most global land cover datasets from satellites group croplands into just a few categories, thereby excluding information that is critical for answering key questions ranging from biodiversity conservation to food security to biogeochemical cycling. Information about agricultural land use practices such as crop selection, yield, and fertilizer use is even more limited. Here we present land use datasets created by combining national, state, and county level census statistics with a recently updated global dataset of croplands on a 5 arc-minute by 5 arc-minute (~10 km x 10 km) latitude/longitude grid. Temporal resolution: year 2000, based on the average of census data from 1997 to 2003.
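To make the grid and the quality codes concrete, here is a small sketch that maps the codes to their meanings and works out the size of a 5 arc-minute global grid and its cell size. It assumes a spherical Earth with roughly 111.32 km per degree, so the cell sizes are approximations; the constant and function names are illustrative.

```python
import math

# Data quality codes as listed in the dataset description
QUALITY_LEGEND = {
    0.0: "missing",
    0.25: "county-level census data",
    0.5: "interpolated with census data from within 2 degrees of latitude/longitude",
    0.75: "state-level census data",
    1.0: "country-level census data",
}

CELL_DEG = 5 / 60    # 5 arc-minutes expressed in degrees
KM_PER_DEG = 111.32  # approximate km per degree (spherical-Earth assumption)


def grid_shape():
    """(rows, columns) of a global grid at 5 arc-minute resolution."""
    return round(180 / CELL_DEG), round(360 / CELL_DEG)


def cell_size_km(lat_deg):
    """Approximate (width, height) of one grid cell in km at a given latitude."""
    height = CELL_DEG * KM_PER_DEG
    width = height * math.cos(math.radians(lat_deg))  # cells narrow toward the poles
    return width, height


print(grid_shape())     # (2160, 4320)
print(cell_size_km(0))  # roughly 9.28 km on each side at the equator
```

The ~9.3 km result at the equator is consistent with the "~10 km x 10 km" figure in the description; away from the equator the cells become narrower in the east-west direction.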