Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The increasing availability of digitized biodiversity data worldwide, provided by an increasing number of institutions and researchers, and the growing use of those data for a variety of purposes have raised concerns related to the "fitness for use" of such data and the impact of data quality (DQ) on the outcomes of analyses, reports, and decisions. A consistent approach to assess and manage data quality is currently critical for biodiversity data users. However, achieving this goal has been particularly challenging because of idiosyncrasies inherent in the concept of quality. DQ assessment and management cannot be performed if we have not clearly established the quality needs from a data user’s standpoint. This paper defines a formal conceptual framework to support the biodiversity informatics community allowing for the description of the meaning of "fitness for use" from a data user’s perspective in a common and standardized manner. This proposed framework defines nine concepts organized into three classes: DQ Needs, DQ Solutions and DQ Report. The framework is intended to formalize human thinking into well-defined components to make it possible to share and reuse concepts of DQ needs, solutions and reports in a common way among user communities. With this framework, we establish a common ground for the collaborative development of solutions for DQ assessment and management based on data fitness for use principles. To validate the framework, we present a proof of concept based on a case study at the Museum of Comparative Zoology of Harvard University. In future work, we will use the framework to engage the biodiversity informatics community to formalize and share DQ profiles related to DQ needs across the community.
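To make the structure concrete, here is a minimal, purely illustrative sketch of how the three classes could be represented in code; all class and attribute names below are assumptions made for illustration and are not taken from the framework's formal specification.

```python
from dataclasses import dataclass

# Illustrative sketch only: attribute names are assumed, not part of the framework.

@dataclass
class DQNeed:
    use_case: str        # e.g. "species distribution modelling"
    dimension: str       # e.g. "completeness", "accuracy"
    criterion: str       # a measurable requirement on the data

@dataclass
class DQSolution:
    need: DQNeed
    mechanism: str       # the validation or measurement method applied
    enhancement: str = ""  # an optional correction / improvement step

@dataclass
class DQReport:
    solution: DQSolution
    records_assessed: int
    records_fit_for_use: int

    @property
    def fitness_ratio(self) -> float:
        return self.records_fit_for_use / max(self.records_assessed, 1)

need = DQNeed("species distribution modelling", "completeness", "coordinates present")
report = DQReport(DQSolution(need, "coordinate presence check"), 1000, 874)
print(f"{report.fitness_ratio:.1%} of records meet the stated need")
```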
The Gridded Population of the World, Version 4 (GPWv4): Data Quality Indicators, Revision 11 consists of three data layers created to provide context for the population count and density rasters, and explicit information on the spatial precision of the input boundary data. The Data Context raster explains pixels with a "0" population estimate in the population count and density rasters based on information included in the census documents, such as areas that are part of a national park, areas that have no households, etc. The Water Mask raster distinguishes between pixels that are completely water and/or ice (Total Water Pixels), pixels that contain water and land (Partial Water Pixels), pixels that are completely land (Total Land Pixels), and pixels that are completely ocean water (Ocean Pixels). The Mean Administrative Unit Area raster represents the mean input unit size in square kilometers and provides a quantitative surface that indicates the size of the input unit(s) from which the population count and density rasters are created. The data files were produced as global rasters at 30 arc-second (~1 km at the equator) resolution. To enable faster global processing, and in support of research communities, the 30 arc-second data were aggregated to 2.5 arc-minute, 15 arc-minute, 30 arc-minute, and 1 degree resolutions.
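The aggregation step described above (fine 30 arc-second cells combined into coarser grids) can be illustrated with a small block-aggregation sketch; this is an illustration with synthetic data, not the actual GPWv4 processing code.

```python
import numpy as np

def aggregate(raster: np.ndarray, factor: int, how: str = "sum") -> np.ndarray:
    """Aggregate a 2-D raster by an integer factor; a factor of 5 takes
    30 arc-second cells to 2.5 arc-minute cells."""
    rows, cols = raster.shape
    assert rows % factor == 0 and cols % factor == 0, "shape must divide evenly"
    blocks = raster.reshape(rows // factor, factor, cols // factor, factor)
    if how == "sum":                      # appropriate for population counts
        return blocks.sum(axis=(1, 3))
    return blocks.mean(axis=(1, 3))       # e.g. for mean administrative unit area

counts_30s = np.random.poisson(3.0, size=(120, 120)).astype(float)  # synthetic counts
counts_2p5min = aggregate(counts_30s, factor=5, how="sum")
print(counts_30s.sum(), counts_2p5min.sum())  # totals are preserved by summing
```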
Study-specific data quality testing is an essential part of minimizing analytic errors, particularly for studies making secondary use of clinical data. We applied a systematic and reproducible approach for study-specific data quality testing to the analysis plan for PRESERVE, a 15-site, EHR-based observational study of chronic kidney disease in children. This approach integrated widely adopted data quality concepts with healthcare-specific evaluation methods. We implemented two rounds of data quality assessment. The first produced high-level evaluation using aggregate results from a distributed query, focused on cohort identification and main analytic requirements. The second focused on extended testing of row-level data centralized for analysis. We systematized reporting and cataloguing of data quality issues, providing institutional teams with prioritized issues for resolution. We tracked improvements and documented anomalous data for consideration during analyses. The checks we developed identified 115 and 157 data quality issues in the two rounds, involving completeness, data model conformance, cross-variable concordance, consistency, and plausibility, extending traditional data quality approaches to address more complex stratification and temporal patterns. Resolution efforts focused on higher priority issues, given finite study resources. In many cases, institutional teams were able to correct data extraction errors or obtain additional data, avoiding exclusion of 2 institutions entirely and resolving 123 other gaps. Other results identified complexities in measures of kidney function, bearing on the study’s outcome definition. Where limitations such as these are intrinsic to clinical data, the study team must account for them in conducting analyses. This study rigorously evaluated fitness of data for intended use. The framework is reusable and built on a strong theoretical underpinning. Significant data quality issues that would have otherwise delayed analyses or made data unusable were addressed. This study highlights the need for teams combining subject-matter and informatics expertise to address data quality when working with real world data.
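As a rough illustration of the kinds of row-level checks described (completeness, plausibility, temporal consistency), the sketch below runs a few such tests over a toy extract; the column names, thresholds, and issue categories are assumptions, not the PRESERVE study's actual check catalogue.

```python
import pandas as pd

# Toy row-level extract; column names and values are purely illustrative.
rows = pd.DataFrame({
    "person_id": [1, 2, 3, 4],
    "birth_date": pd.to_datetime(["2010-04-01", None, "2012-07-15", "2030-01-01"]),
    "serum_creatinine_mg_dl": [0.6, 0.7, -1.0, 0.9],
})

issues = []

# Completeness: required fields should not be missing.
missing = int(rows["birth_date"].isna().sum())
if missing:
    issues.append(("completeness", "birth_date", missing))

# Plausibility: values should fall within a clinically sensible range.
implausible = int(((rows["serum_creatinine_mg_dl"] <= 0) |
                   (rows["serum_creatinine_mg_dl"] > 20)).sum())
if implausible:
    issues.append(("plausibility", "serum_creatinine_mg_dl", implausible))

# Temporal consistency: no birth dates in the future.
future = int((rows["birth_date"] > pd.Timestamp.today()).sum())
if future:
    issues.append(("consistency", "birth_date", future))

print(pd.DataFrame(issues, columns=["category", "field", "n_rows"]))
```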
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).
There is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and of how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems. The aims of the study are therefore: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems, develops a conceptual model of the elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolution of public data ecosystems represented by six generations, called the Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.
This dataset is being made public to serve as supplementary data for the article "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" (Telematics and Informatics) and for the Systematic Literature Review component that informs the study.
Description of the data in this data set
PublicDataEcosystem_SLR provides the structure of the protocol
Spreadsheet #1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies.
Spreadsheet #2 provides the protocol structure.
Spreadsheet #3 provides the filled protocol for relevant studies.
The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information.
Descriptive Information
Article number
A study number, corresponding to the study number assigned in an Excel worksheet
Complete reference
The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.
Year of publication
The year in which the study was published.
Journal article / conference paper / book chapter
The type of the paper, i.e., journal article, conference paper, or book chapter.
Journal / conference / book
The journal, conference, or book in which the paper was published.
DOI / Website
A link to the website where the study can be found.
Number of words
The number of words in the study.
Number of citations in Scopus and WoS
The number of citations of the paper in Scopus and WoS digital libraries.
Availability in Open Access
Availability of the study in Open Access or Free / Full Access.
Keywords
Keywords of the paper as indicated by the authors (in the paper).
Relevance for our study (high / medium / low)
The relevance level of the paper for our study.
Approach- and research design-related information
Objective / Aim / Goal / Purpose & Research Questions
The research objective and established RQs.
Research method (including unit of analysis)
The methods used to collect data in the study, including the unit of analysis, i.e., the country, organisation, or other specific unit that has been analysed, such as the number of use cases or policy documents, the number and scope of the SLR, etc.
Study’s contributions
The study’s contribution as defined by the authors
Qualitative / quantitative / mixed method
Whether the study uses a qualitative, quantitative, or mixed-methods approach.
Availability of the underlying research data
Whether the paper refers to the public availability of the underlying research data (e.g., transcriptions of interviews, collected data, etc.) or explains why these data are not openly shared.
Period under investigation
Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)
Use of theory / theoretical concepts / approaches? If yes, specify them
Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).
Quality-related information
Quality concerns
Whether there are any quality concerns (e.g., limited information about the research methods used).
Public Data Ecosystem-related information
Public data ecosystem definition
How is the public data ecosystem defined in the paper, and is an equivalent term used instead (most often "infrastructure")? If an alternative term is used, what is the public data ecosystem called in the paper?
Public data ecosystem evolution / development
Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?
What constitutes a public data ecosystem?
What constitutes a public data ecosystem (components & relationships), and in what "FORM / OUTPUT" is it presented in the paper? (General description, with more detailed answers to the further questions below.)
Components and relationships
What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).
Stakeholders
What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?
Actors and their roles
What actors does the public data ecosystem involve? What are their roles?
Data (data types, data dynamism, data categories etc.)
What data does the public data ecosystem cover (or what data is it intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic data, real-time data, streams), prevailing data categories / domains / topics, etc.
Processes / activities / dimensions, data lifecycle phases
What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?
Level (if relevant)
What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).
Other elements or relationships (if any)
What other elements or relationships does the public data ecosystem consist of?
Additional comments
Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).
New papers
Does the study refer to any other potentially relevant papers?
Additional references to potentially relevant papers that were found in the analysed paper (snowballing).
Format of the files: .xls, .csv (for the first spreadsheet only), .docx
Licenses or restrictions: CC-BY
For more info, see README.txt
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this seminar, the presenter introduces essential concepts of ArcGIS Data Reviewer and highlights automated and semi-automated methods to streamline and expedite data validation. This seminar was developed to support the following: ArcGIS Desktop 10.3 (Basic, Standard, or Advanced), ArcGIS Server 10.3 Workgroup (Standard or Advanced), ArcGIS Data Reviewer for Desktop, and ArcGIS Data Reviewer for Server.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Presentation of the 32 items on the consolidated criteria for reporting qualitative research (COREQ) checklist. The information is used for the report on a focus group that was conducted as part of the preparation of a publication. The title of the article is (as of submission on 18.03.2024): 'Streamlining Concept Mapping for Clinical Data Enrichment: A Process-focused approach in Medical Data Warehouses'.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Python code generated in the context of the dissertation 'Improving the semantic quality of conceptual models through text mining. A proof of concept' (Postgraduate studies Big Data & Analytics for Business and Management, KU Leuven Faculty of Economics and Business, 2018)
An increase in the uptake of longer lasting products will be more likely if consumers associate longevity with quality, but this relationship has rarely been addressed by academics. To increase understanding in this area, this study explores how companies interpret and implement the concept of product quality. A literature review is used to provide a conceptual analysis of product quality and its evolution in management thinking. To explain the current notion of the concept, the paper discusses initial findings from interviews with informants in companies producing durable consumer goods. An argument is proposed that ideas of product quality have expanded to include aspects such as branding and marketing, and consequently there may be a need to revisit the concept in the light of these new developments. Furthermore, the paper’s purpose is to distinguish the concept of product quality from the quality of processes that build up a product’s quality, and to review the dimensions of product quality. Discussion on quality has evolved from a focus on production processes and employee training to customer satisfaction and delivering value. The paper also captures the influential role of marketing in incorporating the quality of products offered by companies and proposes a definition of product quality that forms a stance through which the concept can be studied further.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This USGS data release contains 2013 streamflow, baseflow, and precipitation data from three hydrologically-diverse streams in the United States used to develop a conceptual framework for effectively anticipating water-quality changes resulting from changes in agricultural activities. The framework combined generalized concepts on the movement of water, the environmental behavior of chemicals and eroded soil, and the designed functions of various agricultural activities. The framework addresses the impacts on water quality of a broad range of agricultural chemicals and sediment across a variety of hydrologic settings.
• Chesterville Branch near Crumpton, Maryland, (USGS site ID - 01493112) had substantial baseflow throughout the year with increased streamflow within a day of rainfall.
• Indian Creek at State Line RD, Leawood, Kansas (USGS site ID - 06893390) was a fast flow-dominated urban stream that was not well connected to shallow groundwater.
• The watershed of Leary-Weber ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this paper, the concept of Statistical Process Control (SPC) tools is examined thoroughly and definitions of quality control concepts are presented. This is significant because the study is anticipated to contribute to the literature as an exemplary application demonstrating the role of SPC tools in quality improvement during the evaluation and decision-making phase.
The aim of this study is to investigate applications of quality control, to clarify statistical control methods and problem-solving procedures, to generate proposals for problem-solving approaches, and to disseminate improvement studies in the ready-to-wear industry. Using the basic Statistical Process Control tools, the most repetitive faults were detected and divided into sub-headings for more detailed analysis. In this way, repetition of faults was prevented by tracing each detected fault down to its root causes. With this different perspective, the study is expected to contribute to other fields as well.
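As a minimal illustration of the kind of fault analysis described above, the sketch below computes a Pareto summary of fault counts; the fault categories and counts are invented for the example and do not come from the study.

```python
import pandas as pd

# Invented garment-inspection fault counts, for illustration only.
faults = pd.Series({
    "broken stitch": 120, "skipped stitch": 85, "stain": 40,
    "mislabeling": 25, "loose thread": 20, "other": 10,
})

pareto = faults.sort_values(ascending=False).to_frame("count")
pareto["cumulative_%"] = 100 * pareto["count"].cumsum() / pareto["count"].sum()
print(pareto)
# The few most frequent categories typically account for most faults,
# showing where root-cause analysis effort should be focused first.
```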
We give consent for the publication of identifiable details, which can include photograph(s) and case history and details within the text (“Material”) to be published in the Journal of Quality Technology. We confirm that we have seen and been given the opportunity to read both the Material and the Article (as attached) to be published by Taylor & Francis.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objective: The German Health Data Lab is going to provide access to German statutory health insurance claims data ranging from 2009 to the present for research purposes. Due to evolving data formats within the German Health Data Lab, there is a need to standardize these data into a Common Data Model to facilitate collaborative health research and minimize the need for researchers to adapt to multiple data formats. For this purpose we selected transforming the data to the Observational Medical Outcomes Partnership (OMOP) Common Data Model.
Methods: We developed an Extract, Transform, and Load (ETL) pipeline for two distinct German Health Data Lab data formats: Format 1 (2009-2016) and Format 3 (2019 onwards). Because Format 2 (2017-2018) has an identical structure to Format 1, the ETL pipeline for Format 1 can be applied to Format 2 as well. Our ETL process, supported by Observational Health Data Sciences and Informatics (OHDSI) tools, includes specification development, SQL skeleton creation, and concept mapping. We detail the process characteristics and present a quality assessment that includes field coverage and concept mapping accuracy using example data.
Results: For Format 1, we achieved a field coverage of 92.7%. The Data Quality Dashboard showed 100.0% conformance and 80.6% completeness, although plausibility checks were disabled. The mapping coverage for the Condition domain was low at 18.3% due to invalid codes and missing mappings in the provided example data. For Format 3, the field coverage was 86.2%, with the Data Quality Dashboard reporting 99.3% conformance and 75.9% completeness. The Procedure domain had very low mapping coverage (2.2%) due to the use of mocked data and unmapped local concepts, while 99.8% of unique codes in the Condition domain were mapped. The absence of real data limits a comprehensive assessment of quality.
Conclusion: The ETL process effectively transforms the data with high field coverage and conformance. It simplifies data utilization for German Health Data Lab users and enhances the use of OHDSI analysis tools. This initiative represents a significant step towards facilitating cross-border research in Europe by providing publicly available, standardized ETL processes (https://github.com/FraunhoferMEVIS/ETLfromHDLtoOMOP) and evaluations of their performance.
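To illustrate what a mapping-coverage figure of this kind measures, the sketch below computes unique-code and record-level coverage over a toy code list; the codes, concept IDs, and column names are assumptions and are not taken from the project's ETL specifications.

```python
import pandas as pd

# Toy source codes from a claims extract and a local-to-standard concept map.
source_codes = pd.Series(["E11.9", "I10", "XX99", "E11.9", "Z00.0"])   # e.g. ICD-10-GM
concept_map = {"E11.9": 201826, "I10": 320128, "Z00.0": 9201}          # code -> concept_id
mapped_codes = set(concept_map)

unique_codes = source_codes.drop_duplicates()
coverage_unique = unique_codes.isin(mapped_codes).mean()   # share of distinct codes mapped
coverage_records = source_codes.isin(mapped_codes).mean()  # record-weighted coverage

print(f"unique-code mapping coverage:  {coverage_unique:.1%}")
print(f"record-level mapping coverage: {coverage_records:.1%}")
```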
The data quality monitoring system (DQMS) developed by the Satellite Oceanography Program at the NOAA National Centers for Environmental Information (NCEI) is based on the concept of a Rich Inventory developed by the previous NCEI Enterprise Data Systems Group. The principal concept of a Rich Inventory is to calculate the data Quality Assurance (QA) descriptive statistics for selected parameters in each Level-2 data file and publish the pre-generated images and NetCDF-format data to the public. The QA descriptive statistics include valid observation number, observation number over 3-sigma edited, minimum, maximum, mean, and standard deviation. The parameters include sea surface height anomaly, significant wave height, altimeter and radiometer wind speed, radiometer water vapor content, and radiometer wet tropospheric correction from Jason-3 Level-2 Final Geophysical Data Record (GDR) and Interim Geophysical Data Record (IGDR) products.
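The per-granule statistics listed above can be sketched as follows; this is an illustration with synthetic values, not the DQMS implementation itself.

```python
import numpy as np

def qa_statistics(values: np.ndarray) -> dict:
    """Descriptive QA statistics for one parameter of a Level-2 file:
    valid count, count beyond 3 sigma, min, max, mean, standard deviation."""
    valid = values[np.isfinite(values)]
    mean, std = valid.mean(), valid.std()
    return {
        "n_valid": int(valid.size),
        "n_beyond_3sigma": int((np.abs(valid - mean) > 3 * std).sum()),
        "min": float(valid.min()),
        "max": float(valid.max()),
        "mean": float(mean),
        "std": float(std),
    }

ssha = np.random.normal(0.05, 0.10, size=5000)  # synthetic sea surface height anomaly (m)
ssha[::500] = np.nan                            # a few invalid observations
print(qa_statistics(ssha))
```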
Quality Concepts Manufacturing Inc Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Codebase [Github] | Dataset [Zenodo]
Abstract
The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance to safety and structural constraints. However, recent research observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts to the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available on Github.
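One simple example of a concept-quality metric of the sort the benchmark is concerned with is per-concept accuracy against ground-truth concept annotations; the sketch below is illustrative only and is not the metric implementation shipped with rsbench.

```python
import numpy as np

def concept_accuracy(pred: np.ndarray, true: np.ndarray) -> np.ndarray:
    """Per-concept accuracy: fraction of examples where each predicted binary
    concept matches its ground-truth annotation."""
    return (pred == true).mean(axis=0)

# Mock binary concept annotations for 6 examples and 3 concepts.
true = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 1]])
pred = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1], [1, 0, 1], [0, 1, 1]])

per_concept = concept_accuracy(pred, true)
print(per_concept, per_concept.mean())
# High downstream-label accuracy combined with low concept accuracy is one
# symptom of a reasoning shortcut: the task is solved via the wrong concepts.
```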
Usage
We recommend visiting the official code website for instructions on how to use the dataset and accompanying software code.
License
All ready-made data sets and generated datasets are distributed under the CC-BY-SA 4.0 license, with the exception of Kand-Logic, which is derived from Kandinsky-patterns and as such is distributed under the GPL-3.0 license.
Datasets Overview
The original BDD datasets can be downloaded from the following Google Drive link: [Download BDD Dataset].
References
[1] Xu et al., *Explainable Object-Induced Action Decision for Autonomous Vehicles*, CVPR 2020.
[2] Sawada and Nakamura, *Concept Bottleneck Model With Additional Unsupervised Concepts*, IEEE 2022.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Targeted quantitative mass spectrometry metabolite profiling is the workhorse of metabolomics research. Robust and reproducible data are essential for confidence in analytical results and are particularly important with large-scale studies. Commercial kits are now available which use carefully calibrated and validated internal and external standards to provide such reliability. However, they are still subject to processing and technical errors in their use and should be subject to a laboratory’s routine quality assurance and quality control measures to maintain confidence in the results. We discuss important systematic and random measurement errors when using these kits and suggest measures to detect and quantify them. We demonstrate how wider analysis of the entire data set alongside standard analyses of quality control samples can be used to identify outliers and quantify systematic trends to improve downstream analysis. Finally, we present the MeTaQuaC software which implements the above concepts and methods for Biocrates kits and other target data sets and creates a comprehensive quality control report containing rich visualization and informative scores and summary statistics. Preliminary unsupervised multivariate analysis methods are also included to provide rapid insight into study variables and groups. MeTaQuaC is provided as an open source R package under a permissive MIT license and includes detailed user documentation.
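As a small illustration of the kind of QC-sample analysis discussed (quantifying systematic trends and flagging outliers), the sketch below uses synthetic data; the metabolite name, thresholds, and drift model are assumptions and are not part of MeTaQuaC.

```python
import numpy as np
import pandas as pd

# Synthetic pooled-QC measurements for one metabolite, in injection order.
rng = np.random.default_rng(0)
qc = pd.DataFrame({
    "injection_order": np.arange(1, 21),
    "metabolite_x": 10 + 0.05 * np.arange(20) + rng.normal(0, 0.2, 20),  # slight drift
})

# Systematic trend: slope of concentration versus injection order.
slope = np.polyfit(qc["injection_order"], qc["metabolite_x"], 1)[0]

# Outliers: QC injections more than 3 standard deviations from the QC mean.
z = (qc["metabolite_x"] - qc["metabolite_x"].mean()) / qc["metabolite_x"].std()
outliers = qc.loc[z.abs() > 3, "injection_order"].tolist()

print(f"drift per injection: {slope:.3f}; outlier injections: {outliers}")
```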
https://www.ine.es/aviso_legal
Quality of Life Indicators: Population with material deprivation (at least 3 or 4 concepts) by number of concepts and sex. Annual. National.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Technical notes and documentation on the common data model of the project CONCEPT-DM2.
This publication corresponds to the Common Data Model (CDM) specification of the CONCEPT-DM2 project for the implementation of a federated network analysis of the healthcare pathway of type 2 diabetes.
Aims of the CONCEPT-DM2 project:
General aim: To analyse the effectiveness and efficiency of chronic care pathways in diabetes, assuming the relevance of care pathways as independent factors of health outcomes, using real-world data (RWD) from five Spanish Regional Health Systems.
Main specific aims:
Study Design: It is a population-based retrospective observational study centered on all T2D patients diagnosed in five Regional Health Services within the Spanish National Health Service. We will include all the contacts of these patients with the health services, using the electronic medical record systems, including Primary Care data, Specialized Care data, Hospitalizations, Urgent Care data, Pharmacy Claims, and also other registers such as the mortality register and the population register.
Cohort definition: All patients with a Type 2 Diabetes code in the clinical health records.
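As a purely illustrative sketch of this kind of cohort selection (the table layout and ICD-10 codes are assumptions, not the project's CDM specification), identifying patients with any type 2 diabetes code could look like this:

```python
import pandas as pd

# Toy diagnosis table; column names and code system are illustrative only.
diagnoses = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 4],
    "code": ["E11.9", "I10", "E11.65", "Z00.0", "E10.9"],  # ICD-10
})

# All patients with any type 2 diabetes code (ICD-10 E11.*).
t2d_cohort = (diagnoses.loc[diagnoses["code"].str.startswith("E11"), "patient_id"]
              .drop_duplicates())
print(t2d_cohort.tolist())  # -> [1, 2]
```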
Files included in this publication:
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Image: https://github.com/google-research-datasets/aart-ai-safety-dataset
AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications (Radharapu et al. 2023)
Authors: Bhaktipriya Radharapu, Kevin Robinson, Lora Aroyo, Preethi Lahoti
https://arxiv.org/abs/2311.08592
"Adversarial testing of large language models (LLMs) is crucial for their safe and responsible deployment. The authors introduced a novel approach for automated generation of adversarial evaluation datasets to test the safety of LLM generations on new downstream applications. They call it AI-assisted Red-Teaming (AART) - an automated alternative to current manual red-teaming efforts. AART offers a data generation and augmentation pipeline of reusable and customizable recipes that reduce human effort significantly and enable integration of adversarial testing earlier in new product development. AART generates evaluation datasets with high diversity of content characteristics critical for effective adversarial testing (e.g. sensitive and harmful concepts, specific to a wide range of cultural and geographic regions and application scenarios). The data generation is steered by AI-assisted recipes to define, scope and prioritize diversity within the application context. This feeds into a structured LLM-generation process that scales up evaluation priorities. Compared to some state-of-the-art tools, AART shows promising results in terms of concept coverage and data quality."
https://github.com/google-research-datasets/aart-ai-safety-dataset/blob/main/aart-v1-20231117.csv
Quality Concept Private Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data related to the paper "Everything you should know about process model quality - The Comprehensive Process Model Quality Framework".