41 datasets found
  1. A conceptual framework for quality assessment and management of biodiversity...

    • plos.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Allan Koch Veiga; Antonio Mauro Saraiva; Arthur David Chapman; Paul John Morris; Christian Gendreau; Dmitry Schigel; Tim James Robertson (2023). A conceptual framework for quality assessment and management of biodiversity data [Dataset]. http://doi.org/10.1371/journal.pone.0178731
    Explore at:
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Allan Koch Veiga; Antonio Mauro Saraiva; Arthur David Chapman; Paul John Morris; Christian Gendreau; Dmitry Schigel; Tim James Robertson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The increasing availability of digitized biodiversity data worldwide, provided by an increasing number of institutions and researchers, and the growing use of those data for a variety of purposes have raised concerns related to the "fitness for use" of such data and the impact of data quality (DQ) on the outcomes of analyses, reports, and decisions. A consistent approach to assess and manage data quality is currently critical for biodiversity data users. However, achieving this goal has been particularly challenging because of idiosyncrasies inherent in the concept of quality. DQ assessment and management cannot be performed if we have not clearly established the quality needs from a data user’s standpoint. This paper defines a formal conceptual framework to support the biodiversity informatics community allowing for the description of the meaning of "fitness for use" from a data user’s perspective in a common and standardized manner. This proposed framework defines nine concepts organized into three classes: DQ Needs, DQ Solutions and DQ Report. The framework is intended to formalize human thinking into well-defined components to make it possible to share and reuse concepts of DQ needs, solutions and reports in a common way among user communities. With this framework, we establish a common ground for the collaborative development of solutions for DQ assessment and management based on data fitness for use principles. To validate the framework, we present a proof of concept based on a case study at the Museum of Comparative Zoology of Harvard University. In future work, we will use the framework to engage the biodiversity informatics community to formalize and share DQ profiles related to DQ needs across the community.
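    The three classes translate naturally into code. The sketch below is illustrative only: the nine concepts are not enumerated in this description, so the class and field names are assumptions rather than the framework's normative definitions.

      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class DQNeeds:
          """A data user's quality requirements for a given use (assumed fields)."""
          use_case: str                    # e.g. "species distribution modelling"
          information_elements: List[str]  # the data elements the use depends on
          quality_criteria: List[str]      # measurable requirements on those elements

      @dataclass
      class DQSolutions:
          """Mechanisms that assess or improve data against the stated needs."""
          measures: List[str]     # what is measured (e.g. coordinate completeness)
          validations: List[str]  # pass/fail checks against the criteria
          amendments: List[str]   # proposed corrections (e.g. fill a missing country)

      @dataclass
      class DQReport:
          """Outcome of applying solutions to a dataset for one profile of needs."""
          needs: DQNeeds
          solutions: DQSolutions
          results: dict = field(default_factory=dict)  # check name -> outcome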

  2. Gridded Population of the World, Version 4 (GPWv4): Data Quality Indicators,...

    • s.cnmilf.com
    • dataverse.harvard.edu
    • +5more
    Updated Aug 23, 2025
    + more versions
    Cite
    SEDAC (2025). Gridded Population of the World, Version 4 (GPWv4): Data Quality Indicators, Revision 11 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/gridded-population-of-the-world-version-4-gpwv4-data-quality-indicators-revision-11
    Explore at:
    Dataset updated
    Aug 23, 2025
    Dataset provided by
    SEDAC
    Area covered
    World
    Description

    The Gridded Population of the World, Version 4 (GPWv4): Data Quality Indicators, Revision 11 consists of three data layers created to provide context for the population count and density rasters, and explicit information on the spatial precision of the input boundary data. The Data Context raster explains pixels with a "0" population estimate in the population count and density rasters based on information included in the census documents, such as areas that are part of a national park, areas that have no households, etc. The Water Mask raster distinguishes between pixels that are completely water and/or ice (Total Water Pixels), pixels that contain water and land (Partial Water Pixels), pixels that are completely land (Total Land Pixels), and pixels that are completely ocean water (Ocean Pixels). The Mean Administrative Unit Area raster represents the mean input unit size in square kilometers and provides a quantitative surface that indicates the size of the input unit(s) from which the population count and density rasters are created. The data files were produced as global rasters at 30 arc-second (~1 km at the equator) resolution. To enable faster global processing, and in support of research communities, the 30 arc-second data were aggregated to 2.5 arc-minute, 15 arc-minute, 30 arc-minute and 1 degree resolutions.
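    The resolution ladder is an integer block reduction: 2.5 arc-minutes is exactly 5 × 30 arc-seconds, so each coarse cell aggregates a 5 × 5 block of fine cells. A minimal numpy sketch of that aggregation (not SEDAC's production code):

      import numpy as np

      def aggregate(grid: np.ndarray, factor: int, how: str = "sum") -> np.ndarray:
          """Aggregate a fine grid to a coarser one by integer block reduction.

          A 30 arc-second grid aggregates to 2.5 arc-minute with factor=5,
          to 15 arc-minute with factor=30, to 30 arc-minute with factor=60,
          and to 1 degree with factor=120.
          """
          h, w = grid.shape
          assert h % factor == 0 and w % factor == 0, "grid must tile evenly"
          blocks = grid.reshape(h // factor, factor, w // factor, factor)
          if how == "sum":   # appropriate for population counts
              return np.nansum(blocks, axis=(1, 3))
          if how == "mean":  # appropriate for layers like mean unit area
              return np.nanmean(blocks, axis=(1, 3))
          raise ValueError(how)

      # Toy example: a 10 x 10 fine grid reduces to a 2 x 2 coarse grid.
      coarse = aggregate(np.random.rand(10, 10), 5, how="sum")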

  3. Tailored Site Data Quality Summaries.

    • datasetcatalog.nlm.nih.gov
    • figshare.com
    Updated Jun 27, 2024
    + more versions
    Cite
    Chrischilles, Elizabeth A.; Davies, Amy Goodwin; Huang, Yungui; Forrest, Christopher B.; Dickinson, Kimberley; Walters, Kellie; Mendonca, Eneida A.; Hanauer, David; Matthews, Kevin; Bailey, L. Charles; Lehmann, Harold; Denburg, Michelle R.; Rosenman, Marc; Chen, Yong; Taylor, Bradley; Bunnell, H. Timothy; Katsoufis, Chryso; Razzaghi, Hanieh; Morse, Keith; Ilunga, K. T. Sandra; Boss, Samuel; Lemas, Dominick J.; Ranade, Daksha (2024). Tailored Site Data Quality Summaries. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001477156
    Explore at:
    Dataset updated
    Jun 27, 2024
    Authors
    Chrischilles, Elizabeth A.; Davies, Amy Goodwin; Huang, Yungui; Forrest, Christopher B.; Dickinson, Kimberley; Walters, Kellie; Mendonca, Eneida A.; Hanauer, David; Matthews, Kevin; Bailey, L. Charles; Lehmann, Harold; Denburg, Michelle R.; Rosenman, Marc; Chen, Yong; Taylor, Bradley; Bunnell, H. Timothy; Katsoufis, Chryso; Razzaghi, Hanieh; Morse, Keith; Ilunga, K. T. Sandra; Boss, Samuel; Lemas, Dominick J.; Ranade, Daksha
    Description

    Study-specific data quality testing is an essential part of minimizing analytic errors, particularly for studies making secondary use of clinical data. We applied a systematic and reproducible approach for study-specific data quality testing to the analysis plan for PRESERVE, a 15-site, EHR-based observational study of chronic kidney disease in children. This approach integrated widely adopted data quality concepts with healthcare-specific evaluation methods. We implemented two rounds of data quality assessment. The first produced high-level evaluation using aggregate results from a distributed query, focused on cohort identification and main analytic requirements. The second focused on extended testing of row-level data centralized for analysis. We systematized reporting and cataloguing of data quality issues, providing institutional teams with prioritized issues for resolution. We tracked improvements and documented anomalous data for consideration during analyses. The checks we developed identified 115 and 157 data quality issues in the two rounds, involving completeness, data model conformance, cross-variable concordance, consistency, and plausibility, extending traditional data quality approaches to address more complex stratification and temporal patterns. Resolution efforts focused on higher priority issues, given finite study resources. In many cases, institutional teams were able to correct data extraction errors or obtain additional data, avoiding exclusion of 2 institutions entirely and resolving 123 other gaps. Other results identified complexities in measures of kidney function, bearing on the study’s outcome definition. Where limitations such as these are intrinsic to clinical data, the study team must account for them in conducting analyses. This study rigorously evaluated fitness of data for intended use. The framework is reusable and built on a strong theoretical underpinning. Significant data quality issues that would have otherwise delayed analyses or made data unusable were addressed. This study highlights the need for teams combining subject-matter and informatics expertise to address data quality when working with real world data.
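    A study-specific check catalogue of this kind can be approximated with a small registry of named, prioritized tests. The Python sketch below is illustrative; the column names and thresholds are invented, not taken from the PRESERVE analysis plan:

      import pandas as pd

      # Each check is tagged with the data quality dimension it tests and a
      # priority used to focus institutional resolution efforts.
      CHECKS = [
          {"id": "DQ-001", "dimension": "completeness", "priority": "high",
           "test": lambda df: df["serum_creatinine"].notna().mean() >= 0.95},
          {"id": "DQ-002", "dimension": "plausibility", "priority": "medium",
           "test": lambda df: df["height_cm"].between(30, 220).all()},
      ]

      def run_checks(df: pd.DataFrame) -> pd.DataFrame:
          """Run every registered check against one site's extract."""
          return pd.DataFrame(
              {"id": c["id"], "dimension": c["dimension"],
               "priority": c["priority"], "passed": bool(c["test"](df))}
              for c in CHECKS
          )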

  4. Conceptualization of public data ecosystems

    • data.niaid.nih.gov
    Updated Sep 26, 2024
    Cite
    Anastasija, Nikiforova; Martin, Lnenicka (2024). Conceptualization of public data ecosystems [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_13842001
    Explore at:
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    University of Tartu
    University of Hradec Králové
    Authors
    Anastasija, Nikiforova; Martin, Lnenicka
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains data collected during a study "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems" conducted by Martin Lnenicka (University of Hradec Králové, Czech Republic), Anastasija Nikiforova (University of Tartu, Estonia), Mariusz Luterek (University of Warsaw, Warsaw, Poland), Petar Milic (University of Pristina - Kosovska Mitrovica, Serbia), Daniel Rudmark (Swedish National Road and Transport Research Institute, Sweden), Sebastian Neumaier (St. Pölten University of Applied Sciences, Austria), Karlo Kević (University of Zagreb, Croatia), Anneke Zuiderwijk (Delft University of Technology, Delft, the Netherlands), Manuel Pedro Rodríguez Bolívar (University of Granada, Granada, Spain).

    As there is a lack of understanding of the elements that constitute different types of value-adding public data ecosystems and how these elements form and shape the development of these ecosystems over time, which can lead to misguided efforts to develop future public data ecosystems, the aim of the study is: (1) to explore how public data ecosystems have developed over time and (2) to identify the value-adding elements and formative characteristics of public data ecosystems. Using an exploratory retrospective analysis and a deductive approach, we systematically review 148 studies published between 1994 and 2023. Based on the results, this study presents a typology of public data ecosystems and develops a conceptual model of elements and formative characteristics that contribute most to value-adding public data ecosystems, and develops a conceptual model of the evolutionary generation of public data ecosystems represented by six generations called Evolutionary Model of Public Data Ecosystems (EMPDE). Finally, three avenues for a future research agenda are proposed.

    This dataset is being made public both to act as supplementary data for "Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems", Telematics and Informatics, and to document the Systematic Literature Review component that informs the study.

    Description of the data in this data set

    PublicDataEcosystem_SLR provides the structure of the protocol

    Spreadsheet#1 provides the list of results after the search over three indexing databases and filtering out irrelevant studies

    Spreadsheet #2 provides the protocol structure.

    Spreadsheet #3 provides the filled protocol for relevant studies.

    The information on each selected study was collected in four categories: (1) descriptive information, (2) approach- and research design-related information, (3) quality-related information, (4) HVD determination-related information.

    Descriptive Information

    Article number

    A study number, corresponding to the study number assigned in an Excel worksheet

    Complete reference

    The complete source information to refer to the study (in APA style), including the author(s) of the study, the year in which it was published, the study's title and other source information.

    Year of publication

    The year in which the study was published.

    Journal article / conference paper / book chapter

    The type of the paper, i.e., journal article, conference paper, or book chapter.

    Journal / conference / book

    The journal, conference, or book in which the paper is published.

    DOI / Website

    A link to the website where the study can be found.

    Number of words

    The number of words in the study.

    Number of citations in Scopus and WoS

    The number of citations of the paper in Scopus and WoS digital libraries.

    Availability in Open Access

    Availability of a study in the Open Access or Free / Full Access.

    Keywords

    Keywords of the paper as indicated by the authors (in the paper).

    Relevance for our study (high / medium / low)

    The relevance level of the paper for our study.

    Approach- and research design-related information

    Objective / Aim / Goal / Purpose & Research Questions

    The research objective and established RQs.

    Research method (including unit of analysis)

    The methods used to collect data in the study, including the unit of analysis, i.e., the country, organisation, or other specific unit that has been analysed (e.g., the number of use-cases or policy documents, the number and scope of the SLRs, etc.).

    Study’s contributions

    The study’s contribution as defined by the authors

    Qualitative / quantitative / mixed method

    Whether the study uses a qualitative, quantitative, or mixed methods approach?

    Availability of the underlying research data

    Whether the paper has a reference to the public availability of the underlying research data e.g., transcriptions of interviews, collected data etc., or explains why these data are not openly shared?

    Period under investigation

    Period (or moment) in which the study was conducted (e.g., January 2021-March 2022)

    Use of theory / theoretical concepts / approaches? If yes, specify them

    Does the study mention any theory / theoretical concepts / approaches? If yes, what theory / concepts / approaches? If any theory is mentioned, how is theory used in the study? (e.g., mentioned to explain a certain phenomenon, used as a framework for analysis, tested theory, theory mentioned in the future research section).

    Quality-related information

    Quality concerns

    Whether there are any quality concerns (e.g., limited information about the research methods used)?

    Public Data Ecosystem-related information

    Public data ecosystem definition

    How is the public data ecosystem defined in the paper? Is an equivalent term (e.g., infrastructure) used instead? If an alternative term is used, what is the public data ecosystem called in the paper?

    Public data ecosystem evolution / development

    Does the paper define the evolution of the public data ecosystem? If yes, how is it defined and what factors affect it?

    What constitutes a public data ecosystem?

    What constitutes a public data ecosystem (components & relationships) - their "FORM / OUTPUT" presented in the paper (general description with more detailed answers to further additional questions).

    Components and relationships

    What components does the public data ecosystem consist of and what are the relationships between these components? Alternative names for components - element, construct, concept, item, helix, dimension etc. (detailed description).

    Stakeholders

    What stakeholders (e.g., governments, citizens, businesses, Non-Governmental Organisations (NGOs) etc.) does the public data ecosystem involve?

    Actors and their roles

    What actors does the public data ecosystem involve? What are their roles?

    Data (data types, data dynamism, data categories etc.)

    What data does the public data ecosystem cover (what is it intended / designed for)? Refer to all data-related aspects, including but not limited to data types, data dynamism (static data, dynamic data, real-time data, streams), prevailing data categories / domains / topics, etc.

    Processes / activities / dimensions, data lifecycle phases

    What processes, activities, dimensions and data lifecycle phases (e.g., locate, acquire, download, reuse, transform, etc.) does the public data ecosystem involve or refer to?

    Level (if relevant)

    What is the level of the public data ecosystem covered in the paper? (e.g., city, municipal, regional, national (=country), supranational, international).

    Other elements or relationships (if any)

    What other elements or relationships does the public data ecosystem consist of?

    Additional comments

    Additional comments (e.g., what other topics affected the public data ecosystems and their elements, what is expected to affect the public data ecosystems in the future, what were important topics by which the period was characterised etc.).

    New papers

    Does the study refer to any other potentially relevant papers?

    Additional references to potentially relevant papers that were found in the analysed paper (snowballing).

    Format of the file: .xls, .csv (for the first spreadsheet only), .docx

    Licenses or restrictions: CC-BY

    For more info, see README.txt
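    As a practical starting point, the first spreadsheet (the only one also provided as .csv) can be loaded with pandas. The file name below is a placeholder; substitute the actual name from the deposit:

      import pandas as pd

      # Placeholder path: use the real .csv file name from the deposit.
      results = pd.read_csv("PublicDataEcosystem_SLR_spreadsheet1.csv")

      # One of the protocol's descriptive fields, assuming the column header
      # matches the field name listed above.
      print(results["Year of publication"].value_counts().sort_index())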

  5. 07.2 Assessing Data Quality using ArcGIS Data

    • training-iowadot.opendata.arcgis.com
    • hub.arcgis.com
    • +1more
    Updated Feb 23, 2017
    Cite
    Iowa Department of Transportation (2017). 07.2 Assessing Data Quality using ArcGIS Data [Dataset]. https://training-iowadot.opendata.arcgis.com/documents/c6c18d21a59a44588933122e2695022d
    Explore at:
    Dataset updated
    Feb 23, 2017
    Dataset authored and provided by
    Iowa Department of Transportation
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this seminar, the presenter introduces essential concepts of ArcGIS Data Reviewer and highlights automated and semi-automated methods to streamline and expedite data validation. This seminar was developed to support the following:
    • ArcGIS Desktop 10.3 (Basic, Standard, or Advanced)
    • ArcGIS Server 10.3 Workgroup (Standard or Advanced)
    • ArcGIS Data Reviewer for Desktop
    • ArcGIS Data Reviewer for Server

  6. COREQ checklist: Focus group for 'Streamlining Concept Mapping for Clinical...

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jul 6, 2024
    Cite
    Michele Zoch; Ines Reinecke (2024). COREQ checklist: Focus group for 'Streamlining Concept Mapping for Clinical Data Enrichment: A Process-focused approach in medical Data Warehouses' [Dataset]. http://doi.org/10.5281/zenodo.10827367
    Explore at:
    Dataset updated
    Jul 6, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Michele Zoch; Ines Reinecke
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Presentation of the 32 items on the consolidated criteria for reporting qualitative research (COREQ) checklist. The information is used for the report on a focus group that was conducted as part of the preparation of a publication. The title of the article is (as of submission on 18.03.2024): 'Streamlining Concept Mapping for Clinical Data Enrichment: A Process-focused approach in Medical Data Warehouses'.

  7. Data from: Improving the semantic quality of conceptual models through text...

    • figshare.com
    Updated May 30, 2023
    Cite
    Tom Willaert (2023). Improving the semantic quality of conceptual models through text mining. A proof of concept [Dataset]. http://doi.org/10.6084/m9.figshare.6951608.v1
    Explore at:
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Tom Willaert
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Python code generated in the context of the dissertation 'Improving the semantic quality of conceptual models through text mining. A proof of concept' (Postgraduate studies Big Data & Analytics for Business and Management, KU Leuven Faculty of Economics and Business, 2018)

  8. Data from: The story of product quality and its present day meaning

    • resodate.org
    Updated Sep 8, 2022
    Cite
    Deepti Mahajan; Tim Cooper; David Smith (2022). The story of product quality and its present day meaning [Dataset]. http://doi.org/10.14279/depositonce-16166
    Explore at:
    Dataset updated
    Sep 8, 2022
    Dataset provided by
    Technische Universität Berlin
    DepositOnce
    Authors
    Deepti Mahajan; Tim Cooper; David Smith
    Description

    An increase in the uptake of longer lasting products will be more likely if consumers associate longevity with quality, but this relationship has rarely been addressed by academics. To increase understanding in this area, this study explores how companies interpret and implement the concept of product quality. A literature review is used to provide a conceptual analysis of product quality and its evolution in management thinking. To explain the current notion of the concept, the paper discusses initial findings from interviews with informants in companies producing durable consumer goods. An argument is proposed that ideas of product quality have expanded to include aspects such as branding and marketing, and consequently there may be a need to revisit the concept in the light of these new developments. Furthermore, the paper’s purpose is to distinguish the concept of product quality from the quality of processes that build up a product’s quality, and to review the dimensions of product quality. Discussion on quality has evolved from a focus on production processes and employee training to customer satisfaction and delivering value. The paper also captures the influential role of marketing in incorporating the quality of products offered by companies and proposes a definition of product quality that forms a stance through which the concept can be studied further.

  9. Data set used to develop a conceptual framework for effectively anticipating...

    • data.usgs.gov
    • search.dataone.org
    • +2more
    Updated Jan 20, 2018
    + more versions
    Cite
    Paul Capel (2018). Data set used to develop a conceptual framework for effectively anticipating water-quality changes resulting from changes in agricultural activities [Dataset]. http://doi.org/10.5066/F75T3HN9
    Explore at:
    Dataset updated
    Jan 20, 2018
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Paul Capel
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jan 1, 2013 - Dec 31, 2013
    Description

    This USGS data release contains 2013 streamflow, baseflow, and precipitation data from three hydrologically-diverse streams in the United States used to develop a conceptual framework for effectively anticipating water-quality changes resulting from changes in agricultural activities. The framework combined generalized concepts on the movement of water, the environmental behavior of chemicals and eroded soil, and the designed functions of various agricultural activities. The framework addresses the impacts on water quality of a broad range of agricultural chemicals and sediment across a variety of hydrologic settings. • Chesterville Branch near Crumpton, Maryland, (USGS site ID - 01493112) had substantial baseflow throughout the year with increased streamflow within a day of rainfall.
    • Indian Creek at State Line RD, Leawood, Kansas (USGS site ID - 06893390) was a fastflow-dominated urban stream that was not well connected to shallow groundwater.
    • The watershed of Leary-Weber ...

  10. Data from: Statistical Process Control as a Tool for Quality Improvement A...

    • figshare.com
    docx
    Updated Feb 23, 2023
    Cite
    Canberk Elmalı; Özge Ural (2023). Statistical Process Control as a Tool for Quality Improvement A Case Study in Denim Pant Production [Dataset]. http://doi.org/10.6084/m9.figshare.22147508.v2
    Explore at:
    Available download formats: docx
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Canberk Elmalı; Özge Ural
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In this paper, the concept of Statistical Process Control (SPC) tools is thoroughly examined and the definitions of quality control concepts are presented. It is anticipated that this study will contribute to the literature as an exemplary application that demonstrates the role of SPC tools in quality improvement in the evaluation and decision-making phase.

    The aim of this study is to investigate applications of quality control, to clarify statistical control methods and problem-solving procedures, to generate proposals for problem-solving approaches, and to disseminate improvement studies in the ready-to-wear industry. Using the basic SPC tools, the most repetitive faults were detected and divided into sub-headings for more detailed analysis. In this way, the repetition of faults was prevented by tracing each detected fault down to its root causes. With this perspective, the study is also expected to contribute to other fields.

    We give consent for the publication of identifiable details, which can include photograph(s) and case history and details within the text ("Material") to be published in the Journal of Quality Technology. We confirm that we have seen and been given the opportunity to read both the Material and the Article (as attached) to be published by Taylor & Francis.
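    A Pareto analysis, one of the basic SPC tools, is the typical way to rank repetitive faults such as those described above. The fault categories and counts in this Python sketch are invented for illustration:

      import pandas as pd

      faults = pd.Series({"broken stitch": 120, "shade variation": 85,
                          "open seam": 60, "stain": 25, "misprint": 10})
      pareto = faults.sort_values(ascending=False).to_frame("count")
      pareto["cumulative_pct"] = 100 * pareto["count"].cumsum() / pareto["count"].sum()
      # Categories below ~80% cumulative share are the "vital few" to attack first.
      print(pareto)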

  11. DQD results of Format 3.

    • plos.figshare.com
    xls
    Updated Jan 6, 2025
    + more versions
    Cite
    Melissa Finster; Maxim Moinat; Elham Taghizadeh (2025). DQD results of Format 3. [Dataset]. http://doi.org/10.1371/journal.pone.0311511.t006
    Explore at:
    Available download formats: xls
    Dataset updated
    Jan 6, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Melissa Finster; Maxim Moinat; Elham Taghizadeh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: The German Health Data Lab is going to provide access to German statutory health insurance claims data ranging from 2009 to the present for research purposes. Due to evolving data formats within the German Health Data Lab, there is a need to standardize this data into a Common Data Model to facilitate collaborative health research and minimize the need for researchers to adapt to multiple data formats. For this purpose we selected transforming the data to the Observational Medical Outcomes Partnership Common Data Model.

    Methods: We developed an Extract, Transform, and Load (ETL) pipeline for two distinct German Health Data Lab data formats: Format 1 (2009-2016) and Format 3 (2019 onwards). Because Format 2 (2017-2018) has a structure identical to Format 1, the ETL pipeline for Format 1 can be applied to Format 2 as well. Our ETL process, supported by Observational Health Data Sciences and Informatics tools, includes specification development, SQL skeleton creation, and concept mapping. We detail the process characteristics and present a quality assessment that includes field coverage and concept mapping accuracy using example data.

    Results: For Format 1, we achieved a field coverage of 92.7%. The Data Quality Dashboard showed 100.0% conformance and 80.6% completeness, although plausibility checks were disabled. The mapping coverage for the Condition domain was low at 18.3% due to invalid codes and missing mappings in the provided example data. For Format 3, the field coverage was 86.2%, with the Data Quality Dashboard reporting 99.3% conformance and 75.9% completeness. The Procedure domain had very low mapping coverage (2.2%) due to the use of mocked data and unmapped local concepts. The Condition domain, in contrast, had 99.8% of unique codes mapped. The absence of real data limits a comprehensive assessment of quality.

    Conclusion: The ETL process effectively transforms the data with high field coverage and conformance. It simplifies data utilization for German Health Data Lab users and enhances the use of OHDSI analysis tools. This initiative represents a significant step towards facilitating cross-border research in Europe by providing publicly available, standardized ETL processes (https://github.com/FraunhoferMEVIS/ETLfromHDLtoOMOP) and evaluations of their performance.
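    The mapping-coverage figures above are, in essence, the share of unique source codes that resolve to a standard concept. A minimal sketch of that calculation, where concept_map stands in for a real vocabulary lookup (the codes and concept IDs are illustrative):

      import pandas as pd

      def mapping_coverage(source_codes: pd.Series, concept_map: dict) -> float:
          """Share of unique source codes mapped to a non-zero standard concept."""
          unique = source_codes.dropna().unique()
          mapped = sum(1 for code in unique if concept_map.get(code, 0) != 0)
          return mapped / len(unique) if len(unique) else float("nan")

      # Toy data: two of three unique condition codes have a standard mapping.
      codes = pd.Series(["E11.9", "I10", "E11.9", "XYZ"])
      print(mapping_coverage(codes, {"E11.9": 201826, "I10": 320128}))  # ~0.67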

  12. NCEI-generated data quality assurance descriptive statistics, images, and...

    • catalog.data.gov
    • gimi9.com
    • +2more
    Updated Nov 1, 2025
    Cite
    (Point of Contact) (2025). NCEI-generated data quality assurance descriptive statistics, images, and gridded Level-3 sea surface height anomaly and other parameters from the Jason-3 Level-2 final Geophysical Data Record (GDR) and interim GDR (IGDR) products from 2016-02-12 to 2021-01-12 (NCEI Accession 0225454) [Dataset]. https://catalog.data.gov/dataset/ncei-generated-data-quality-assurance-descriptive-statistics-images-and-gridded-level-3-sea-sur1
    Explore at:
    Dataset updated
    Nov 1, 2025
    Dataset provided by
    (Point of Contact)
    Description

    The data quality monitoring system (DQMS) developed by the Satellite Oceanography Program at the NOAA National Centers for Environmental Information (NCEI) is based on the concept of a Rich Inventory developed by the previous NCEI Enterprise Data Systems Group. The principal concept of a Rich Inventory is to calculate the data Quality Assurance (QA) descriptive statistics for selected parameters in each Level-2 data file and publish the pre-generated images and NetCDF-format data to the public. The QA descriptive statistics include valid observation number, observation number over 3-sigma edited, minimum, maximum, mean, and standard deviation. The parameters include sea surface height anomaly, significant wave height, altimeter, and radiometer wind speed, radiometer water vapor content, and radiometer wet tropospheric correction from Jason-3 Level-2 Final Geophysical Data Record (GDR) and Interim Geophysical Data Record (IGDR) products.
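    The listed QA statistics can be reproduced in a few lines. The sketch below is a generic illustration, not the NCEI DQMS implementation; it applies a 3-sigma edit and reports the counts and moments described above:

      import numpy as np

      def qa_statistics(values: np.ndarray) -> dict:
          """Valid count, 3-sigma-edited count, and min/max/mean/std."""
          v = values[np.isfinite(values)]
          mean, std = v.mean(), v.std()
          within = v[np.abs(v - mean) <= 3 * std]
          return {"n_valid": v.size,
                  "n_3sigma_edited": v.size - within.size,
                  "min": within.min(), "max": within.max(),
                  "mean": within.mean(), "std": within.std()}

      print(qa_statistics(np.random.normal(0.0, 0.1, size=10_000)))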

  13. Eximpedia Export Import Trade

    • eximpedia.app
    Updated Oct 3, 2025
    + more versions
    Cite
    Seair Exim (2025). Eximpedia Export Import Trade [Dataset]. https://www.eximpedia.app/
    Explore at:
    Available download formats: .bin, .xml, .csv, .xls
    Dataset updated
    Oct 3, 2025
    Dataset provided by
    Eximpedia Export Import Trade Data
    Eximpedia PTE LTD
    Authors
    Seair Exim
    Area covered
    Belize, Bahamas, Ghana, China, Guatemala, Sint Eustatius and Saba, Seychelles, Nicaragua, Denmark, Kyrgyzstan
    Description

    Quality Concepts Manufacturing Inc Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  14. Data from: A Benchmark Suite for Systematically Evaluating Reasoning...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jun 13, 2024
    Cite
    Bortolotti Samuele; Marconato Emanuele; Carraro Tommaso; Morettin Paolo; van Krieken Emile; Vergari Antonio; Teso Stefano; Passerini Andrea (2024). A Benchmark Suite for Systematically Evaluating Reasoning Shortcuts [Dataset]. http://doi.org/10.5281/zenodo.11612556
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 13, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Bortolotti Samuele; Marconato Emanuele; Carraro Tommaso; Morettin Paolo; van Krieken Emile; Vergari Antonio; Teso Stefano; Passerini Andrea
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Codebase [Github] | Dataset [Zenodo]

    Abstract

    The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance to safety and structural constraints. However, recent research observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts to the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available on Github.
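    A toy example makes the reasoning-shortcut problem concrete: if the task label is the XOR of two binary concepts, a predictor that learns both concepts inverted still achieves perfect label accuracy while getting every concept wrong. The numpy sketch below is illustrative only and unrelated to rsbench's actual tasks:

      import numpy as np

      rng = np.random.default_rng(0)
      c_true = rng.integers(0, 2, size=(1000, 2))   # two ground-truth concepts
      y_true = c_true[:, 0] ^ c_true[:, 1]          # label = XOR of the concepts

      c_pred = 1 - c_true                           # both concepts learned inverted
      y_pred = c_pred[:, 0] ^ c_pred[:, 1]          # ...but the XOR is unchanged

      print("label accuracy:  ", (y_pred == y_true).mean())  # 1.0
      print("concept accuracy:", (c_pred == c_true).mean())  # 0.0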

    Usage

    We recommend visiting the official code website for instructions on how to use the dataset and accompanying software code.

    License

    All ready-made data sets and generated datasets are distributed under the CC-BY-SA 4.0 license, with the exception of Kand-Logic, which is derived from Kandinsky-patterns and as such is distributed under the GPL-3.0 license.

    Datasets Overview

    • CLIP-embeddings. This folder contains the saved activations from a pretrained CLIP model applied to the tested dataset. It includes embeddings that represent the dataset in a format suitable for further analysis and experimentation.
    • BDD_OIA-original-dataset. This directory holds the original files from the X-OIA project by Xu et al. [1]. These datasets have been made publicly available for ease of access and further research. If you are going to use it, please consider citing the original authors.
    • kand-logic-3k. This folder contains all images generated for the Kand-Logic project. Each image is accompanied by annotations for both concepts and labels.
    • bbox-kand-logic-3k. In this directory, you will find images from the Kand-Logic project that have undergone a preprocessing step. These images are extracted based on bounding boxes, rescaled, and include annotations for concepts and labels.
    • sdd-oia. This folder includes all images and labels generated using rsbench.
    • sdd-oia-embeddings. This directory contains 512-dimensional embeddings extracted from a pretrained ResNet18 model on ImageNet. The embeddings are derived from the sdd-oia dataset.
    • BDD-OIA-preprocessed. Here you will find preprocessed data that follow the methodology outlined by Sawada and Nakamura [2]. The folder contains 2048-dimensional embeddings extracted from a pretrained Faster-RCNN model on the BDD-100k dataset.

    The original BDD datasets can be downloaded from the following Google Drive link: [Download BDD Dataset].

    References

    [1] Xu et al., *Explainable Object-Induced Action Decision for Autonomous Vehicles*, CVPR 2020.

    [2] Sawada and Nakamura, *Concept Bottleneck Model With Additional Unsupervised Concepts*, IEEE 2022.

  15. Data from: Concepts and Software Package for Efficient Quality Control in...

    • acs.figshare.com
    zip
    Updated Jun 1, 2023
    Cite
    Mathias Kuhring; Alina Eisenberger; Vanessa Schmidt; Nicolle Kränkel; David M. Leistner; Jennifer Kirwan; Dieter Beule (2023). Concepts and Software Package for Efficient Quality Control in Targeted Metabolomics Studies: MeTaQuaC [Dataset]. http://doi.org/10.1021/acs.analchem.0c00136.s001
    Explore at:
    Available download formats: zip
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    ACS Publications
    Authors
    Mathias Kuhring; Alina Eisenberger; Vanessa Schmidt; Nicolle Kränkel; David M. Leistner; Jennifer Kirwan; Dieter Beule
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Targeted quantitative mass spectrometry metabolite profiling is the workhorse of metabolomics research. Robust and reproducible data are essential for confidence in analytical results and are particularly important with large-scale studies. Commercial kits are now available which use carefully calibrated and validated internal and external standards to provide such reliability. However, they are still subject to processing and technical errors in their use and should be subject to a laboratory’s routine quality assurance and quality control measures to maintain confidence in the results. We discuss important systematic and random measurement errors when using these kits and suggest measures to detect and quantify them. We demonstrate how wider analysis of the entire data set alongside standard analyses of quality control samples can be used to identify outliers and quantify systematic trends to improve downstream analysis. Finally, we present the MeTaQuaC software which implements the above concepts and methods for Biocrates kits and other target data sets and creates a comprehensive quality control report containing rich visualization and informative scores and summary statistics. Preliminary unsupervised multivariate analysis methods are also included to provide rapid insight into study variables and groups. MeTaQuaC is provided as an open source R package under a permissive MIT license and includes detailed user documentation.
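    A common building block of such QC measures is the per-metabolite coefficient of variation (CV) across repeated QC samples. The sketch below is a generic Python illustration, not MeTaQuaC itself (which is an R package); the metabolite names and the 20% flagging threshold are assumptions:

      import pandas as pd

      def qc_cv(intensities: pd.DataFrame) -> pd.Series:
          """CV (%) per metabolite (columns) across repeated QC injections (rows)."""
          return (intensities.std() / intensities.mean() * 100).sort_values(ascending=False)

      # Toy data: 4 QC injections x 3 metabolites.
      qc = pd.DataFrame({"ala": [10.1, 9.9, 10.2, 10.0],
                         "glu": [5.0, 7.5, 4.2, 6.8],
                         "c0":  [1.1, 1.0, 1.1, 1.0]})
      cv = qc_cv(qc)
      print(cv[cv > 20])  # metabolites exceeding a 20% CV rule of thumb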

  16. Population with lack of material (at least 3, 4 concept) by number of...

    • ine.es
    csv, html, json +4
    Updated Oct 31, 2025
    Cite
    INE - Instituto Nacional de Estadística (2025). Population with lack of material (at least 3, 4 concept) by number of concept and sex [Dataset]. https://www.ine.es/jaxiT3/Tabla.htm?t=69472&L=1
    Explore at:
    Available download formats: text/pc-axis, csv, txt, json, xlsx, html, xls
    Dataset updated
    Oct 31, 2025
    Dataset provided by
    National Statistics Institute (http://www.ine.es/)
    Authors
    INE - Instituto Nacional de Estadística
    License

    https://www.ine.es/aviso_legal

    Time period covered
    Jan 1, 2004 - Jan 1, 2024
    Variables measured
    Sex, Type of data, Population age, Number of concept, Quality of Life Indicator
    Description

    Quality of Life Indicators: Population with lack of material (at least 3, 4 concept) by number of concept and sex. Annual. National.

  17. Data from: CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2...

    • zenodo.org
    bin, png, zip
    Updated Jul 12, 2024
    + more versions
    Cite
    Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ibai Tamayo; Julián Librero; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Enrique Bernal-Delgado (2024). CONCEPT- DM2 DATA MODEL TO ANALYSE HEALTHCARE PATHWAYS OF TYPE 2 DIABETES [Dataset]. http://doi.org/10.5281/zenodo.7778291
    Explore at:
    Available download formats: bin, png, zip
    Dataset updated
    Jul 12, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Berta Ibáñez-Beroiz; Asier Ballesteros-Domínguez; Ignacio Oscoz-Villanueva; Ibai Tamayo; Julián Librero; Mónica Enguita-Germán; Francisco Estupiñán-Romero; Enrique Bernal-Delgado
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Technical notes and documentation on the common data model of the project CONCEPT-DM2.

    This publication corresponds to the Common Data Model (CDM) specification of the CONCEPT-DM2 project for the implementation of a federated network analysis of the healthcare pathway of type 2 diabetes.

    Aims of the CONCEPT-DM2 project:

    General aim: To analyse chronic care effectiveness and efficiency of care pathways in diabetes, assuming the relevance of care pathways as independent factors of health outcomes, using real world data (RWD) from five Spanish Regional Health Systems.

    Main specific aims:

    • To characterize the care pathways in patients with diabetes through the whole care system in terms of process indicators and pharmacologic recommendations
    • To compare these observed care pathways with the theoretical clinical pathways derived from the clinical practice guidelines
    • To assess whether adherence to clinical guidelines influences important health outcomes, such as cardiovascular hospitalizations.
    • To compare the traditional analytical methods with process mining methods in terms of modeling quality, prediction performance and information provided.

    Study Design: It is a population-based retrospective observational study centered on all T2D patients diagnosed in five Regional Health Services within the Spanish National Health Service. We will include all the contacts of these patients with the health services using the electronic medical record systems, including Primary Care data, Specialized Care data, Hospitalizations, Urgent Care data, Pharmacy Claims, and other registers such as the mortality register and the population register.

    Cohort definition: all patients with a Type 2 Diabetes code in the clinical health records (a sketch of these criteria in code follows the list below)

    • Inclusion criteria: patients that, at 01/01/2017 or during the follow-up from 01/01/2017 to 31/12/2022, had an active health card (active TIS, tarjeta sanitaria activa) and a type 2 diabetes code (T2D; DM2 in Spanish) in the clinical records of primary care (CIAP2 T90 when the CIAP coding system is used)
    • Exclusion criteria:
      • patients with no contact with the health system from 01/01/2017 to 31/12/2022
      • patients that had a T1D (DM1) code opened after the T2D code during the follow-up.
    • Study period: from 01/01/2017 to 31/12/2022
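    A pandas sketch of the cohort filter implied by these criteria; the column names (active_card, t2d_date, t1d_after_t2d, last_contact) are assumptions for illustration, not fields of the CONCEPT-DM2 data model:

      import pandas as pd

      START, END = pd.Timestamp("2017-01-01"), pd.Timestamp("2022-12-31")

      def build_cohort(patients: pd.DataFrame) -> pd.DataFrame:
          included = (patients["active_card"]            # active health card
                      & patients["t2d_date"].notna()     # has a T2D code...
                      & (patients["t2d_date"] <= END))   # ...opened by end of follow-up
          excluded = (~patients["last_contact"].between(START, END)  # no contact in period
                      | patients["t1d_after_t2d"])       # T1D code opened after T2D code
          return patients[included & ~excluded]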

    Files included in this publication:

    • Datamodel_CONCEPT_DM2_diagram.png
    • Common data model specification (Datamodel_CONCEPT_DM2_v.0.1.0.xlsx)
    • Synthetic datasets (Datamodel_CONCEPT_DM2_sample_data)
      • sample_data1_dm_patient.csv
      • sample_data2_dm_param.csv
      • sample_data3_dm_patient.csv
      • sample_data4_dm_param.csv
      • sample_data5_dm_patient.csv
      • sample_data6_dm_param.csv
      • sample_data7_dm_param.csv
      • sample_data8_dm_param.csv
    • Datamodel_CONCEPT_DM2_explanation.pptx

  18. AART AI Safety

    • kaggle.com
    zip
    Updated Aug 9, 2024
    Cite
    Marília Prata (2024). AART AI Safety [Dataset]. https://www.kaggle.com/datasets/mpwolke/aart-ai-safety/code
    Explore at:
    Available download formats: zip (227163 bytes)
    Dataset updated
    Aug 9, 2024
    Authors
    Marília Prata
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Image: https://github.com/google-research-datasets/aart-ai-safety-dataset

    AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications (Radharapu et al. 2023)

    Authors: Bhaktipriya Radharapu, Kevin Robinson, Lora Aroyo, Preethi Lahoti

    https://arxiv.org/abs/2311.08592

    "Adversarial testing of large language models (LLMs) is crucial for their safe and responsible deployment. The authors introduced a novel approach for automated generation of adversarial evaluation datasets to test the safety of LLM generations on new downstream applications. They call it AI-assisted Red-Teaming (AART) - an automated alternative to current manual red-teaming efforts. AART offers a data generation and augmentation pipeline of reusable and customizable recipes that reduce human effort significantly and enable integration of adversarial testing earlier in new product development. AART generates evaluation datasets with high diversity of content characteristics critical for effective adversarial testing (e.g. sensitive and harmful concepts, specific to a wide range of cultural and geographic regions and application scenarios). The data generation is steered by AI-assisted recipes to define, scope and prioritize diversity within the application context. This feeds into a structured LLM-generation process that scales up evaluation priorities. Compared to some state-of-the-art tools, AART shows promising results in terms of concept coverage and data quality."

    kevinrobinson-at-elgoog

    https://github.com/google-research-datasets/aart-ai-safety-dataset/blob/main/aart-v1-20231117.csv
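    The CSV linked above can be loaded directly by rewriting the blob URL to its raw.githubusercontent.com form; inspecting the columns is a sensible first step, since the schema is not described here:

      import pandas as pd

      url = ("https://raw.githubusercontent.com/google-research-datasets/"
             "aart-ai-safety-dataset/main/aart-v1-20231117.csv")
      df = pd.read_csv(url)
      print(df.shape)
      print(df.columns.tolist())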

  19. Quality Concept Private Limited Export Import Data | Eximpedia

    • eximpedia.app
    Updated Aug 24, 2023
    Cite
    (2023). Quality Concept Private Limited Export Import Data | Eximpedia [Dataset]. https://www.eximpedia.app/companies/quality-concept-private-limited/63268482
    Explore at:
    Dataset updated
    Aug 24, 2023
    Description

    Quality Concept Private Limited Export Import Data. Follow the Eximpedia platform for HS code, importer-exporter records, and customs shipment details.

  20. Comprehensive Process Model Quality Framework

    • data.mendeley.com
    Updated Mar 12, 2018
    Cite
    Jan Claes (2018). Comprehensive Process Model Quality Framework [Dataset]. http://doi.org/10.17632/vh989pfrsn.1
    Explore at:
    Dataset updated
    Mar 12, 2018
    Authors
    Jan Claes
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data related to the paper "Everything you should know about process model quality - The Comprehensive Process Model Quality Framework".
