Road traffic accidents remain a leading cause of death worldwide, with roughly 1.3 million fatalities annually. Developing new safety approaches that address real-world challenges requires accurate information, so road safety experts are constantly looking for real-world data to answer the open questions and ultimately reach “Vision Zero”. The Global Safety Database (GSD) offers access to a one-of-a-kind, up-to-date repository of road traffic accident statistics and databases at the metadata level for road safety analyses. One main objective is the compilation of international data sources, for which a data management system has been developed. In addition to the inventory of road accident data sources, a questionnaire created by road safety experts is used to check the applicability of data sources for specific questions. An automated and dynamic matching process compares the variables representing each question with the data source content already held in the GSD. The results are stored in a result matrix that indicates, for each data source investigated, the proportion of variables that match the variables needed to answer the research question. To identify similarities and differences in road safety across countries, a clustering methodology has been developed that points out the possibilities and limitations of projecting information from the initial countries to other areas. The assessment of the representativeness of the individual data sources is the basis for this clustering. From a general perspective, the GSD is an essential tool for advancing the worldwide harmonisation of traffic accident statistics and databases. Bringing important databases together builds knowledge about real-world accidents and empowers data-driven development, a key step towards a road system without casualties and the achievement of Vision Zero.
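A minimal sketch of the matching step described above, using invented variable names and data sources (not the GSD implementation itself): for each data source, the proportion of the variables required by a research question that are present in the source's inventory is recorded in the result matrix.

```python
# Hypothetical variables required to answer one research question.
required_variables = {"collision_type", "road_surface", "speed_limit", "injury_severity"}

# Hypothetical variable inventories of two data sources in the GSD.
data_sources = {
    "source_A": {"collision_type", "speed_limit", "weather", "injury_severity"},
    "source_B": {"collision_type", "road_surface"},
}

# Result matrix: proportion of required variables each source can supply.
result_matrix = {
    name: len(required_variables & variables) / len(required_variables)
    for name, variables in data_sources.items()
}

print(result_matrix)  # e.g. {'source_A': 0.75, 'source_B': 0.5}
```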
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As part of the study “From Data Quality for AI to AI for Data Quality: A Systematic Review of Tools for AI-Augmented Data Quality Management in Data Warehouses” (Tamm & Nikiforova, 2025), a systematic review of DQ tools was conducted to evaluate their automation capabilities, particularly in detecting and recommending DQ rules in data warehouses, a key component of data ecosystems.
To attain this objective, five key research questions were established.
Q1. What is the current landscape of DQ tools?
Q2. What functionalities do DQ tools offer?
Q3. Which data storage systems do DQ tools support, and where does the processing of the organization’s data occur?
Q4. What methods do DQ tools use for rule detection?
Q5. What are the advantages and disadvantages of existing solutions?
Candidate DQ tools were identified through a combination of rankings from technology reviewers and academic sources. A Google search was conducted in December 2023 using the query (“the best data quality tools” OR “the best data quality software” OR “top data quality tools” OR “top data quality software”) AND "2023". Additionally, this list was complemented by DQ tools found in academic articles, identified with two Scopus queries, namely "data quality tool" OR "data quality software" and ("information quality" OR "data quality") AND ("software" OR "tool" OR "application") AND "data quality rule". Several exclusion criteria were applied when selecting DQ tools for further systematic analysis: tools from sponsored, outdated (pre-2023), non-English, or non-technical sources were excluded, and academic papers were restricted to those published within the last ten years in the computer science field.
This resulted in 151 DQ tools, which are provided in the file "DQ Tools Selection".
To structure the review process and facilitate answering the established questions (Q1-Q3), a review protocol consisting of three sections was developed. The initial tool assessment was based on availability, functionality, and trialability (e.g., open-source, demo version, or free trial); tools that were discontinued or lacked sufficient information were excluded. The second phase (and protocol section) focused on evaluating the functionalities of the identified tools. Core DQM functionalities were assessed first, such as data profiling, custom DQ rule creation, anomaly detection, data cleansing, report generation, rule detection, and data enrichment. Subsequently, additional data management functionalities such as master data management, data lineage, data cataloging, semantic discovery, and integration were considered. The final stage of the review examined the tools' compatibility with data warehouses and General Data Protection Regulation (GDPR) compliance; tools that did not meet these criteria were excluded. As such, the third section of the protocol evaluated each tool's environment and connectivity features: whether it operates in the cloud, hybrid, or on-premises; its API support; its input data types (.txt, .csv, .xlsx, .json); and its ability to connect to data sources including relational and non-relational databases, data warehouses, cloud data storage, and data lakes. Additionally, it assessed whether the tool processes data on-premises or in the vendor’s cloud environment. Tools were excluded if, for example, they did not support data warehouses or processed data externally.
The completed protocols are available in the file "DQ Tools Analysis".
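A minimal sketch of the final exclusion step described in the protocol, with hypothetical record fields (this is not the authors' actual review tooling): a tool is retained only if it supports data warehouses and does not process the organization's data externally.

```python
# Hypothetical tool records captured during the review.
tools = [
    {"name": "Tool A", "supports_data_warehouse": True,  "processes_data_externally": False},
    {"name": "Tool B", "supports_data_warehouse": False, "processes_data_externally": False},
    {"name": "Tool C", "supports_data_warehouse": True,  "processes_data_externally": True},
]

# Apply the two exclusion criteria from the third protocol section.
retained = [
    t for t in tools
    if t["supports_data_warehouse"] and not t["processes_data_externally"]
]

print([t["name"] for t in retained])  # ['Tool A']
```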
Groundwater samples were collected from 60 public supply wells in the Colorado Plateaus principal aquifer. Water-quality evaluations of groundwater for drinking water at public supply depths were made to summarize the current quality of source water (that is, untreated water) from public supply wells using two types of assessments: (1) status, an assessment that describes the current quality of the groundwater resource, and (2) understanding, an evaluation of the natural and human factors affecting the quality of groundwater, including an explanation of statistically significant associations between water quality and selected explanatory factors. To provide context for water-quality data, constituent concentrations of untreated groundwater are compared with available water-quality benchmarks. Federal regulatory benchmarks for protecting human health (maximum contaminant levels [MCLs]; U.S. Environmental Protection Agency [USEPA] primary drinking water regulations; U.S. Environmental Protection Agency, 2018a) are used for this evaluation. Additionally, non-regulatory human-health benchmarks (health-based screening levels [HBSLs]; Norman and others, 2018; U.S. Geological Survey, 2018) and federal non-regulatory benchmarks for nuisance chemicals (USEPA secondary maximum contaminant levels [SMCLs]; U.S. Environmental Protection Agency, 2018b) are used. This report considers benchmarks in the context of health-based (MCLs and HBSLs) and non-health-based (SMCLs) benchmarks. The sampling approach uses an equal-area grid design (Belitz and others, 2010), which allows estimation of the proportion of high, moderate, or low concentrations relative to federal water-quality benchmarks for selected constituents over the entire area of the aquifer. Tables included in this data release:
Table 1. Identification, location, and construction information for wells sampled for the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017.
Table 2. Constituent primary uses and sources; analytical schedules and sampling period; USGS parameter codes; and comparison thresholds and reporting levels for wells sampled for the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017.
Table 3. Water-quality indicators in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; <, less than]
Table 4. Nutrients and dissolved organic carbon in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level]
Table 5. Major and minor ions in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level; E, estimated]
Table 6. Trace elements in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; --, less than minimum laboratory reporting level]
Table 7. Radionuclides in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level]
Table 8. Volatile organic compounds (VOCs) in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; --, less than minimum laboratory reporting level]
Table 9. Pesticides in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; --, less than minimum laboratory reporting level; E, estimated]
Table 10. Quality control results for constituents analyzed for nutrients and dissolved organic carbon in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; --, less than minimum laboratory reporting level]
Table 11. Quality control results for constituents analyzed for major ions in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: NC, not collected; --, less than minimum laboratory reporting level]
Table 12. Quality control results for constituents analyzed for trace elements in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level]
Table 13. Quality control results of a replicate analyzed for radionuclides in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level; NC, not collected]
Table 14. Quality control results for constituents analyzed for VOCs in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level; NC, not collected; E, estimated]
Table 15. Quality control results for constituents analyzed for pesticides in groundwater samples collected by the U.S. Geological Survey National Water-Quality Assessment Project, Colorado Plateaus principal aquifer, June 2013 through December 2017. [Table code definitions: --, less than minimum laboratory reporting level; NC, not collected]
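A minimal sketch of the high/moderate/low classification relative to a benchmark described above. The moderate cutoff of one-half of the benchmark is illustrative only; the report defines the actual cutoffs used for each constituent class.

```python
def classify(concentration_mg_per_l: float, benchmark_mg_per_l: float,
             moderate_fraction: float = 0.5) -> str:
    """Classify a concentration relative to a water-quality benchmark."""
    if concentration_mg_per_l > benchmark_mg_per_l:
        return "high"
    if concentration_mg_per_l > moderate_fraction * benchmark_mg_per_l:
        return "moderate"
    return "low"

# Example: a nitrate concentration of 6 mg/L against an MCL of 10 mg/L.
print(classify(6.0, 10.0))  # 'moderate'
```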
Integrated Report 2022 (references)
The NC Department of Environmental Quality’s Integrated Water Quality Report assesses every two years whether North Carolina’s waterways meet federal and state standards. It incorporates information from multiple data sources, such as local monitoring programs and volunteer efforts, to group bodies of water into five categories:
Category 1: Waters meet all standards.
Category 2: Waters meet some standards.
Category 3: Waters lacking enough data.
Category 4: Impaired waters with a plan.
Category 5: Impaired waters that need a plan.
The data vary in quality and coverage, and not all water bodies are monitored equally, leading to some limitations in assessing smaller or remote areas.
https://www.deq.nc.gov/about/divisions/water-resources/water-planning/modeling-assessment/water-quality-data-assessment
https://ncdenr.maps.arcgis.com/apps/instant/sidebar/index.html?appid=06dda86e607b4ac6861b19b905c82a8f
https://ncdenr.maps.arcgis.com/apps/mapviewer/index.html?layers=37696e11dac34786bdc94db84d54ff70
NC DEQ AFO (Animal Feeding Operation)
The AFO (Animal Feeding Operations) program at NC DEQ manages permit applications for large farm operations, such as those raising hogs, chickens, and cows. These farms have to comply with regulations on waste disposal for water-quality reasons, and the permits are reviewed every five years to make sure they are up to date. Program data are collected through inspections and reports, but they do not always provide an accurate picture of the environmental impact, and tracking varies by farm. Its limitations include data inconsistencies and incomplete monitoring of waste management at every site.
https://www.deq.nc.gov/about/divisions/water-resources/permitting/animal-feeding-operations
NC Surface Water Supply Watersheds
In NC DEQ’s Surface Water Supply Watersheds maps, waterbodies are assigned to uses such as drinking water, swimming, or fishing. These classifications set the boundaries for water-quality management to safeguard public health and ecosystems. They are based on water monitoring systems and scientific research, but uncertainties arise from fluctuating water quality and variable monitoring performance; the classifications may need to be updated as water use changes.
https://www.deq.nc.gov/about/divisions/water-resources/water-planning/classification-standards/classifications
NPDES Wastewater Discharge Permits
NPDES Wastewater Discharge Permits regulate treatment facilities’ discharge of treated wastewater into rivers and lakes to protect water quality. The permits cap the concentration of contaminants according to the receiving water. They rely on regular monitoring and assessments, though facilities can vary in how these are tracked and reported, which can affect the consistency of enforcement.
https://www.deq.nc.gov/about/divisions/water-resources/permitting/npdes-wastewater
DWR Fish Tissue Monitoring Data
The DWR Fish Tissue Monitoring Program tests North Carolina waterways for heavy metals, pesticides, and PCBs (polychlorinated biphenyls). PCBs are industrial chemicals that build up in fish and are harmful if consumed. The data are used to issue fish-consumption advisories to help keep people safe. Samples are collected by electrofishing, targeting fish species that people commonly eat. Constraints: data are not collected in every watershed, and older data do not necessarily reflect recent pollution.
https://www.deq.nc.gov/about/divisions/water-resources/water-sciences/biological-assessment-branch/dwr-fish-tissue-monitoring-data
This dataset provides detailed information on the availability of model resources (including models and datasets) that support the modeling of six key water-quality constituents (or constituent categories) across the hydrologic system. In addition, resources associated with nine “cross-cutting” topics for modeling water quality are included, with “cross-cutting” defined herein as having relevance to more than one constituent. The model and data resources were generated as a companion product to a related publication (Lucas and others, 2025) that identifies gaps in water-quality modeling capabilities needed for assessments, projections, and evaluation of management alternatives to support ecosystem health and human beneficial use of water resources. Multiple spreadsheet tables include modeling resources for contemporary and representative models; they represent an extensive but not exhaustive list. The models or datasets within each worksheet are presented in terms of the model or data source type, relevant hydrologic compartment(s), and software availability (defined at the bottom of each worksheet). Models originating in government, academia, non-governmental organizations, and private industry were considered. We emphasize models that are widely used, open source, and representative of the state of the art; additionally, models were included that are published in the literature and (or) for which documentation is easily available on the internet. This data release includes the metadata and the modeling capabilities workbook, “WQ_Models_Tables_1-14.xlsx”, which includes a cross-cutting topics overview tab and the following cross-cutting topics worksheets: Table 1–Climate Forcing Datasets; Table 2–(Bio)geochemical Modeling; Table 3–Watershed Modeling; Table 4–River Modeling; Table 5–Lake and Reservoir Modeling; Table 6–Reservoir Operations and Outflow Modeling; Table 7–Estuary Modeling; Table 8–Groundwater Modeling; Table 9–Water Reuse Modeling; and a constituents tables overview tab and the following constituents worksheets: Table 10–Water Temperature; Table 11–Salinity; Table 12–Nutrients; Table 13–Sediment; Table 14–Geologically Sourced Constituents.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This data set compiles regulatory information about river and stream impairments within the Chesapeake Bay watershed as part of a Chesapeake Bay watershed multi-stressor meta-analysis project. Data are contained in a single combined and name-harmonized dataset originating from a snapshot of the Environmental Protection Agency's Assessment and Total Maximum Daily Load Tracking and Implementation System (ATTAINS) obtained in spring 2020. These data were clipped to waterbodies contained in the Chesapeake Bay watershed that are designated as free-flowing (e.g., rivers and streams). The compiled dataset contains information on a waterbody's designated uses, parameter impairments, and potential sources of those impairments. Be aware that, because data on potential sources were joined to parameter impairments in a "one-parameter to many-sources" format, individual impairments may have multiple rows in this dataset, with one row for each potential source. Use of this dataset to evaluat ...
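A minimal sketch of the "one-parameter to many-sources" structure described above, using hypothetical column names rather than the actual ATTAINS schema: each parameter impairment appears once per potential source, so deduplicate before counting impairments.

```python
import pandas as pd

impairments = pd.DataFrame({
    "assessment_unit": ["AU-1", "AU-1"],
    "parameter": ["Nitrogen", "Sediment"],
})
sources = pd.DataFrame({
    "assessment_unit": ["AU-1", "AU-1", "AU-1"],
    "parameter": ["Nitrogen", "Nitrogen", "Sediment"],
    "potential_source": ["Agriculture", "Urban runoff", "Construction"],
})

# One row per (impairment, potential source) pair.
joined = impairments.merge(sources, on=["assessment_unit", "parameter"], how="left")
print(len(joined))  # 3 rows: 'Nitrogen' appears twice, once per source

# Deduplicate to count distinct parameter impairments.
unique_impairments = joined.drop_duplicates(subset=["assessment_unit", "parameter"])
print(len(unique_impairments))  # 2
```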
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Version 1.1 (June 2024)
There are multiple well-recognized and peer-reviewed global datasets that can be used to assess water availability and water pollution. Each of these datasets is based on different inputs, modeling approaches, assumptions, and limitations. Therefore, in SBTN Step 1: Assess and Step 2: Interpret & Prioritize, companies are required to consult different global datasets for a robust and comprehensive State of Nature (SoN) assessment for water availability and water pollution.
To streamline this process, WWF, the World Resources Institute (WRI), and SBTN worked together to develop two ready-to-use unified layers of SoN, one for water availability and one for water pollution, in line with the Technical Guidance for Step 1: Assess and Step 2: Interpret & Prioritize (July 2024). The main outputs contain the maximum values of Water Availability and of Water Pollution as well as the individual indicators' values. This information is available at different spatial resolutions and therefore in two data formats: 1) a shapefile with values at the HydroBasins level (Pfafstetter level 6); and 2) an Excel file with values at sub-national divisions (Adm1) and national divisions (Adm0). These datasets and complete documentation are publicly available for download below.
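A minimal sketch of deriving a unified SoN value as the maximum across individual indicators, as described above. The indicator names and values here are hypothetical placeholders, not the actual WWF/WRI processing code or indicator set.

```python
import pandas as pd

# Hypothetical per-basin indicator values (HydroBasins level 6 IDs are illustrative).
basins = pd.DataFrame({
    "HYBAS_ID": [6050000010, 6050000020],
    "water_depletion": [0.2, 0.7],
    "baseline_water_stress": [0.5, 0.4],
    "drought_risk": [0.3, 0.6],
})

# Unified water availability layer: maximum of the individual indicators.
indicator_columns = ["water_depletion", "baseline_water_stress", "drought_risk"]
basins["water_availability_max"] = basins[indicator_columns].max(axis=1)

print(basins[["HYBAS_ID", "water_availability_max"]])
```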
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Brackish groundwater (BGW), defined for this assessment as groundwater having a dissolved-solids concentration between 1,000 and 10,000 milligrams per liter, is an unconventional source of water that may offer a partial solution to current (2016) and future water challenges. In support of the National Water Census, the U.S. Geological Survey has completed a BGW assessment to gain a better understanding of the occurrence and characteristics of BGW resources of the United States as an alternative source of water. Analyses completed as part of this assessment relied on previously collected data from multiple sources, and no new data were collected. One of the most important contributions of this assessment was the creation of a database containing chemical data and aquifer information for the known occurrences of BGW in the United States. Data were compiled from single publications to large datasets and from local studies to national assessments, and include chemical data on the concentrations of dissolved solids, major ions, trace elements, nutrients, and radionuclides, as well as physical properties of the resource (pH, temperature, specific conductance). This database represents a compilation of water-quality samples from more than 100,000 (major-ions data) to more than 300,000 (dissolved-solids data) groundwater wells across the continental U.S., Alaska, Hawaii, Puerto Rico, the U.S. Virgin Islands, Guam, and American Samoa. The data are published here as an ESRI geodatabase with a point feature class and associated attribute table, and also as a non-proprietary comma-separated values table. It was not possible to compile all data available for the Nation, and data selected for this investigation were mostly limited to larger datasets that were available in a digital format.
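A minimal sketch of the brackish-groundwater definition used in this assessment (dissolved solids between 1,000 and 10,000 mg/L), applied to hypothetical well records rather than the published geodatabase schema.

```python
# Hypothetical well records with dissolved-solids concentrations in mg/L.
wells = [
    {"well_id": "W-001", "dissolved_solids_mg_per_l": 650},
    {"well_id": "W-002", "dissolved_solids_mg_per_l": 3200},
    {"well_id": "W-003", "dissolved_solids_mg_per_l": 15500},
]

# Select wells that fall within the brackish range used by the assessment.
brackish = [w for w in wells if 1000 <= w["dissolved_solids_mg_per_l"] <= 10000]

print([w["well_id"] for w in brackish])  # ['W-002']
```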
This collection, accessible via the United States Geological Survey (USGS) EarthExplorer platform, contains worldwide Landsat-8 and 9 Collection 2 data since the beginning of the two missions. Collection 2 is the result of reprocessing the archive and newly acquired products, with significant data-quality improvements over Collection 1 obtained through advances in data processing algorithms. The primary characteristic is a substantial improvement in absolute geolocation accuracy (now re-baselined to the European Space Agency Copernicus Sentinel-2 Global Reference Image, GRI), but Collection 2 also includes updated digital elevation modelling sources, improved radiometric calibration (including correction of the TIRS striping effect), enhanced Quality Assessment bands, updated and consistent metadata files, and use of the Cloud Optimised GeoTIFF (COG) format.
Landsat-8 and 9 Level 1 products combine data from the two Landsat instruments, OLI and TIRS. The Level 1 products generated can be either L1TP or L1GT:
L1TP - Level 1 Precision and Terrain (Corrected) products: Radiometrically calibrated and orthorectified using ground control points (GCPs) and digital elevation model (DEM) data to correct for relief displacement. These are the highest quality Level 1 products, suitable for pixel-level time series analysis. GCPs used for L1TP correction are derived from the Global Land Survey 2000 (GLS2000) dataset.
L1GT - Level 1 Systematic Terrain (Corrected) products: L1GT products consist of L0 product data with systematic radiometric, geometric, and terrain corrections applied and resampled for registration to a cartographic projection, referenced to WGS84, G873, or the current version.
Three different classes of Level 1 products are available:
Real Time (RT): Newly acquired Landsat-8 OLI/TIRS data are processed upon downlink but use initial TIRS line-of-sight model parameters; the data are made available in less than 12 hours (typically 4-6 hours). Once the data have been reprocessed with the refined TIRS parameters, the products are transitioned to either Tier 1 or Tier 2 and removed from the Real-Time tier (in 14-16 days). Landsat-8 only.
Tier 1 (T1): Landsat scenes with the highest available data quality are placed into Tier 1 and are considered suitable for time-series analysis. Tier 1 includes Level 1 Precision and Terrain (L1TP) corrected data that have well-characterised radiometry and are inter-calibrated across the different Landsat instruments. The georegistration of Tier 1 scenes is consistent and within prescribed image-to-image tolerances of ≤ 12-metre radial root mean square error (RMSE).
Tier 2 (T2): Landsat scenes not meeting Tier 1 criteria during processing are assigned to Tier 2. Tier 2 scenes adhere to the same radiometric standard as Tier 1 scenes but do not meet the Tier 1 geometry specification due to less accurate orbital information (specific to older Landsat sensors), significant cloud cover, insufficient ground control, or other factors. This includes Systematic Terrain (L1GT) and Systematic (L1GS) processed data.
Landsat-8 and 9 Level 2 products are generated from L1GT and L1TP Level 1 products that meet the <76 degrees solar zenith angle constraint and include the required auxiliary data inputs to generate a scientifically viable product. The data are available a couple of days after the Level 1 T1/T2 products. The Level 2 products generated can be L2SP or L2SR:
L2SP - Level 2 Science Products: include Surface Reflectance (SR), Surface Temperature (ST), ST intermediate bands, an angle coefficients file, and Quality Assessment (QA) bands.
L2SR - Level 2 Surface Reflectance: includes Surface Reflectance (SR), an angle coefficients file, and Quality Assessment (QA) bands; it is generated when ST could not be generated.
Landsat-8 and 9 Level 3 science products represent biophysical properties of Earth's surface and are generated either from Landsat U.S. Analysis Ready Data inputs (tile-based products) or from Landsat Level 2 scene-based inputs (scene-based products). The following Level 3 products are available:
Dynamic Surface Water Extent: describes the existence and condition of surface water. Tile-based.
Fractional Snow Covered Area: indicates the percentage of a pixel covered by snow. Tile-based.
Burned Area: represents per-pixel burn classification and burn probability. Tile-based.
Provisional Actual Evapotranspiration: the quantity of water removed from a surface by evaporation and transpiration. Scene-based.
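A minimal sketch of selecting scenes suitable for pixel-level time-series analysis (Tier 1, L1TP) from a scene listing. The column names below are hypothetical; check the actual metadata fields exported from EarthExplorer for your download.

```python
import pandas as pd

# Hypothetical scene metadata listing.
scenes = pd.DataFrame({
    "scene_id": ["LC08_A", "LC09_B", "LC08_C"],
    "collection_category": ["T1", "T2", "RT"],
    "processing_level": ["L1TP", "L1GT", "L1TP"],
})

# Keep only Tier 1 scenes processed to L1TP.
time_series_ready = scenes[
    (scenes["collection_category"] == "T1") & (scenes["processing_level"] == "L1TP")
]
print(time_series_ready["scene_id"].tolist())  # ['LC08_A']
```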
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations
(E1a) Information on primary validated assessment data - measurements (Article 10)
Article 10 Primary validated assessment data and primary up-to-date assessment data 1. In accordance with the procedure referred to in Article 5 of this Decision, Member States shall make available the information set out in Part E of Annex II on primary validated assessment data for all sampling points where measurement data is collected for the purpose of the assessment as indicated by Member States according to Article 9 for the pollutants listed in Parts B and C of Annex I. Where in a particular zone or agglomeration modelling techniques are applied, Member States shall make available the information set out in Part E of Annex II at the highest time resolution available. 2. The primary validated assessment data shall be made available to the Commission for a full calendar year as complete time series no later than 9 months after the end of each calendar year. 3. Member States shall, where they make use of the possibility provided for in Articles 20(2) and 21(3) of Directive 2008/50/EC, make available information on the quantification of the contribution from natural sources pursuant to Article 20(1) of Directive 2008/50/EC or from the winter-sanding or -salting of roads pursuant to Article 21(1) and (2) of Directive 2008/50/EC. The information shall include: (a) the spatial extent of the subtraction; (b) the quantity of the primary validated assessment data made available according to paragraph 1 of this Article that can be attributed to natural sources or winter-sanding or -salting; (c) the results of the application of the methods reported according to Article 8. ... 5. Member States shall also make available the information set out in Part E of Annex II on primary validated assessment data for the networks and stations selected by the Member States for the purpose of the reciprocal exchange of information as referred to in point (b) of Article 1 for the pollutants listed in Part B of Annex I and where available for the additional pollutants listed in Part C of Annex I and for the additional pollutants listed on the portal for that purpose. Paragraphs 2 and 3 of this Article shall apply to the exchanged information.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A key aim of the FNS-Cloud project (grant agreement no. 863059) was to overcome fragmentation within food, nutrition, and health data through the development of tools and services facilitating matching and merging of data to promote increased reuse. However, in an era of increasing data reuse, it is imperative that the scientific quality of data analysis is maintained. Whilst it is true that many datasets can be reused, questions remain regarding whether they should be; thus, there is a need to support researchers in making such a decision. This paper describes the development and evaluation of the FNS-Cloud data quality assessment tool for dietary intake datasets. Markers of quality were identified from the literature for dietary intake, lifestyle, demographic, anthropometric, and consumer behavior data at all levels of data generation (data collection, underlying data sources used, dataset management, and data analysis). These markers informed the development of a quality assessment framework, which comprised decision trees and feedback messages relating to each quality parameter. These fed into a report provided to the researcher on completion of the assessment, with considerations to support them in deciding whether the dataset is appropriate for reuse. This quality assessment framework was transformed into an online tool, and a user evaluation study was undertaken. Participants recruited from three centres (N = 13) were observed and interviewed while using the tool to assess the quality of a dataset they were familiar with. Participants positively rated the assessment format and feedback messages in helping them assess the quality of a dataset. Several participants described the tool as potentially useful for training students and inexperienced researchers in the use of secondary datasets. This quality assessment tool, deployed within FNS-Cloud, is openly accessible to users as one of the first steps in identifying datasets suitable for use in their specific analyses. It is intended to support researchers in deciding whether previously collected datasets under consideration for reuse are fit for their new intended research purposes. While it has been developed and evaluated, further testing and refinement of this resource would improve its applicability to a broader range of users.
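A minimal, illustrative sketch of a decision-tree style quality check that produces a feedback message, in the spirit of the framework described above. The quality parameter, thresholds, and messages are invented for illustration; they are not the FNS-Cloud tool's actual logic.

```python
def assess_dietary_recall_days(num_recall_days: int) -> dict:
    """Return a quality flag and feedback message for one (invented) quality parameter."""
    if num_recall_days >= 2:
        return {"flag": "adequate",
                "message": "Multiple recall days support estimates of usual intake."}
    return {"flag": "caution",
            "message": "A single recall day limits conclusions about usual intake; "
                       "consider whether this suits the intended reuse."}

# Feedback messages from each decision tree are collected into a report.
report = [assess_dietary_recall_days(1)]
for item in report:
    print(f"[{item['flag']}] {item['message']}")
```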
Estache and Goicoechea present an infrastructure database that was assembled from multiple sources. Its main purposes are: (i) to provide a snapshot of the sector as of the end of 2004; and (ii) to facilitate quantitative analytical research on infrastructure sectors. The related working paper includes definitions, source information and the data available for 37 performance indicators that proxy access, affordability and quality of service (most recent data as of June 2005). Additionally, the database includes a snapshot of 15 reform indicators across infrastructure sectors.
This is the first attempt since the effort made in the World Development Report 1994 at generating a database on infrastructure sectors, and it needs to be recognized as such. This database is not a state-of-the-art output; that work is being carried out by sector experts on a different timetable. The effort has, however, generated a significant amount of new information. The database already provides enough information to launch a much more quantitative debate on the state of infrastructure. But much more is needed, and by circulating this information at this stage, we hope to generate feedback and fill the major knowledge gaps and inconsistencies we have identified.
The database covers the following countries: - Afghanistan - Albania - Algeria - American Samoa - Andorra - Angola - Antigua and Barbuda - Argentina - Armenia - Aruba - Australia - Austria - Azerbaijan - Bahamas, The - Bahrain - Bangladesh - Barbados - Belarus - Belgium - Belize - Benin - Bermuda - Bhutan - Bolivia - Bosnia and Herzegovina - Botswana - Brazil - Brunei - Bulgaria - Burkina Faso - Burundi - Cambodia - Cameroon - Canada - Cape Verde - Cayman Islands - Central African Republic - Chad - Channel Islands - Chile - China - Colombia - Comoros - Congo, Dem. Rep. - Congo, Rep. - Costa Rica - Cote d'Ivoire - Croatia - Cuba - Cyprus - Czech Republic - Denmark - Djibouti - Dominica - Dominican Republic - Ecuador - Egypt, Arab Rep. - El Salvador - Equatorial Guinea - Eritrea - Estonia - Ethiopia - Faeroe Islands - Fiji - Finland - France - French Polynesia - Gabon - Gambia, The - Georgia - Germany - Ghana - Greece - Greenland - Grenada - Guam - Guatemala - Guinea - Guinea-Bissau - Guyana - Haiti - Honduras - Hong Kong, China - Hungary - Iceland - India - Indonesia - Iran, Islamic Rep. - Iraq - Ireland - Isle of Man - Israel - Italy - Jamaica - Japan - Jordan - Kazakhstan - Kenya - Kiribati - Korea, Dem. Rep. - Korea, Rep. - Kuwait - Kyrgyz Republic - Lao PDR - Latvia - Lebanon - Lesotho - Liberia - Libya - Liechtenstein - Lithuania - Luxembourg - Macao, China - Macedonia, FYR - Madagascar - Malawi - Malaysia - Maldives - Mali - Malta - Marshall Islands - Mauritania - Mauritius - Mayotte - Mexico - Micronesia, Fed. Sts. - Moldova - Monaco - Mongolia - Morocco - Mozambique - Myanmar - Namibia - Nepal - Netherlands - Netherlands Antilles - New Caledonia - New Zealand - Nicaragua - Niger - Nigeria - Northern Mariana Islands - Norway - Oman - Pakistan - Palau - Panama - Papua New Guinea - Paraguay - Peru - Philippines - Poland - Portugal - Puerto Rico - Qatar - Romania - Russian Federation - Rwanda - Samoa - San Marino - Sao Tome and Principe - Saudi Arabia - Senegal - Seychelles - Sierra Leone - Singapore - Slovak Republic - Slovenia - Solomon Islands - Somalia - South Africa - Spain - Sri Lanka - St. Kitts and Nevis - St. Lucia - St. Vincent and the Grenadines - Sudan - Suriname - Swaziland - Sweden - Switzerland - Syrian Arab Republic - Tajikistan - Tanzania - Thailand - Togo - Tonga - Trinidad and Tobago - Tunisia - Turkey - Turkmenistan - Uganda - Ukraine - United Arab Emirates - United Kingdom - United States - Uruguay - Uzbekistan - Vanuatu - Venezuela, RB - Vietnam - Virgin Islands (U.S.) - West Bank and Gaza - Yemen, Rep. - Yugoslavia, FR (Serbia/Montenegro) - Zambia - Zimbabwe
Aggregate data [agg]
Face-to-face [f2f]
Sector Performance Indicators
Energy
The energy sector is relatively well covered by the database, at least in terms of providing a relatively recent snapshot for the main policy areas. The best covered area is access, where data are available for 2000 for about 61% of the 207 countries included in the database. The technical quality indicator is available for 60% of the countries, and at least one of the perceived quality indicators is available for 40% of the countries. Price information, distinguishing between residential and non-residential users, is available for about 41% of the countries.
Water & Sanitation Because the sector is part of the Millennium Development Goals (MDGs), it enjoys a lot of effort on data generation in terms of the access rates. The WHO is the main engine behind this effort in collaboration with the multilateral and bilateral aid agencies. The coverage is actually quite high -some national, urban and rural information is available for 75 to 85% of the countries- but there are significant concerns among the research community about the fact that access rates have been measured without much consideration to the quality of access level. The data on technical quality are only available for 27% of the countries. There are data on perceived quality for roughly 39% of the countries but it cannot be used to qualify the information provided by the raw access rates (i.e. access 3 hours a day is not equivalent to access 24 hours a day).
Information and Communication Technology
The ICT sector is probably the best covered among the infrastructure sub-sectors, to a large extent because the International Telecommunication Union (ITU) has taken on the responsibility of collecting the data. The ITU covers a wide spectrum of activity under the communications heading, and its coverage ranges from 85 to 99% for all national access indicators. The information on prices needed to assess affordability is also quite extensive, covering roughly 85 to 95% of the 207 countries. With respect to quality, the coverage of technical indicators is over 88%, while information on perceived quality is only available for roughly 40% of the countries.
Transport
The transport sector is possibly the least well covered in terms of the service orientation of infrastructure indicators. Regarding access, network density is the closest approximation to access to the service and is covered at a rate close to 90% for roads but only at a rate of 50% for rail. The relevant data on prices only cover about 30% of the sample for railways. Some type of technical quality information is available for 86% of the countries. Quality perception is only available for about 40% of the countries.
Institutional Reform Indicators
Electricity
The data on electricity policy reform were collected from the following sources: ABS Electricity Deregulation Report (2004), AEI-Brookings telecommunications and electricity regulation database (2003), Bacon (1999), Estache and Gassner (2004), Estache, Trujillo, and Tovar de la Fe (2004), Global Regulatory Network Program (2004), Henisz et al. (2003), International Power Finance Review (2003-04), International Power and Utilities Finance Review (2004-05), Kikukawa (2004), Wallsten et al. (2004), World Bank Caribbean Infrastructure Assessment (2004), World Bank Global Energy Sector Reform in Developing Countries (1999), World Bank staff, and country regulators. The coverage for the three types of institutional indicators is quite good for the electricity sector. For regulatory institutions and private participation in generation and distribution, the coverage is about 80% of the 207 countries. It is somewhat lower for market structure, at only 58%.
Water & Sanitation The data on water policy reform were collected from the following sources: ABS Water and Waste Utilities of the World (2004), Asian Developing Bank (2000), Bayliss (2002), Benoit (2004), Budds and McGranahan (2003), Hall, Bayliss, and Lobina (2002), Hall and Lobina (2002), Hall, Lobina, and De La Mote (2002), Halpern (2002), Lobina (2001), World Bank Caribbean Infrastructure Assessment (2004), World Bank Sector Note on Water Supply and Sanitation for Infrastructure in EAP (2004), and World Bank staff. The coverage for institutional reforms in W&S is not as exhaustive as for the other utilities. Information on the regulatory institutions responsible for large utilities is available for about 67% of the countries. Ownership data are available for about 70% of the countries. There is no information on the market structure good enough to be reported here at this stage. In most countries small scale operators are important private actors but there is no systematic record of their existence. Most of the information available on their role and importance is only anecdotal.
Information and Communication Technology
The report Trends in Telecommunications Reform from the ITU (revised by World Bank staff) is the main source of information for this sector. The information on institutional reforms in the sector is, however, not as exhaustive as it is for the sector performance indicators. While the coverage of regulatory institutions is 100%, it varies between 76 and 90% of the countries for most of the other indicators. Quite surprisingly, and in contrast to what is available for other sectors, it proved difficult to obtain data on the timing of reforms and of the creation of the regulatory agencies.
Transport
Information on transport institutions and reforms is not systematically generated by any agency. Even though more data are needed for a more comprehensive picture of the transport sector, it was possible to collect data on railways policy reform from Jane's World Railways (2003-04) and complement it with
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Passage Assessment Database (PAD) geospatial file contains locations of known and potential barriers to salmonid migration in California streams, with additional information about each record. The PAD is an ongoing map-based inventory of known and potential barriers to anadromous fish in California, compiled and maintained through a cooperative interagency agreement. The PAD compiles currently available fish passage information from many different sources, allows past and future barrier assessments to be standardized and stored in one place, and enables the analysis of cumulative effects of passage barriers in the context of overall watershed health. The database is set up to capture basic information about each potential barrier and is designed to be flexible; as the database grows, other modules may be added to increase data detail and complexity. For the PAD to be useful as a restoration tool, the data within the PAD need to accurately depict the on-the-ground reality of fish passage constraints. This requires the PAD to receive new barrier data and updates to existing sites and to verify and vet the information it receives. In 2013, new PAD data standards were designed to standardize this process and refine the data in the PAD, making the data more robust. They were further refined in 2014 and 2021. The data standards have been combined into one document with the PAD methodology, which describes the database structure, data collection procedures, and data quality and limitations, and is available online at: https://nrmsecure.dfg.ca.gov/FileHandler.ashx?DocumentID=78802. In the future, the new standards will be implemented for all existing records. If, after reading the metadata, additional details about the PAD project are needed, please visit the CalFish website at www.calfish.org/PAD. To send comments about data issues, corrections, or edits, or to map a new barrier location not yet reported in the PAD, please send an email to: Anne.Elston@wildlife.ca.gov. New as of 2020: this feature class identifies species and life stages that may be blocked, or not blocked, by structures and sites, and whether a structure blocks upstream migration, downstream migration, or both. Since one structure/site can be a barrier to more than one species, or block one species and not another, there may be multiple records at each site. Please note that these are not duplicates and each site/structure has a unique PAD ID and Passage ID. Preferred citation: California Department of Fish and Wildlife, Passage Assessment Database, September 2025.
The Assessment Unit (AU) Level Fish Consumption Assessment Results incorporate the water-quality results for all water-quality monitoring stations, fish tissue sampling sites, and fish consumption advisories associated with each AU included in the 2016 NJ Integrated Water Quality Monitoring and Assessment Report (Integrated Report). The data represent the assessment results in NJ's 958 AUs used to determine whether the Fish Consumption designated use was attained, along with the results for the 28 fish tissue and water-quality parameters associated with that designated use. If an AU includes more than one source of sampling data, the results for each parameter are combined, with the ‘worst case’ assessment representing the AU: if any of the data sources is impaired for a parameter, then the parameter is impaired at the AU level; if some data sources are fully attaining for a parameter but others have insufficient data, the parameter is fully attaining. The data reflect which of three assessment results each assessment was assigned: Attaining (fully supporting); Insufficient Data (insufficient data was available to assess); Non-Attaining (non-supporting). Since it was not possible to show fish consumption advisories and the spatial extent of many fish tissue sampling sites, only the AU-level assessment results are mapped. The 303(d) List includes the impairment source for non-supporting Fish Consumption parameters. Because of the large number of parameters associated with fish consumption, if a parameter had insufficient data for an assessment throughout the state, the parameter was removed from the file; this reduced the 62 parameters to the 28 that had sufficient data for an assessment.
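A minimal sketch of the 'worst case' roll-up described above, using hypothetical per-source results (not the NJ assessment code): any non-attaining source makes the AU non-attaining; otherwise one attaining source is enough to attain; otherwise the data are insufficient.

```python
def au_result(source_results: list) -> str:
    """Combine per-source results for one parameter into an AU-level result."""
    if "Non-Attaining" in source_results:
        return "Non-Attaining"
    if "Attaining" in source_results:
        return "Attaining"
    return "Insufficient Data"

print(au_result(["Attaining", "Insufficient Data"]))  # 'Attaining'
print(au_result(["Attaining", "Non-Attaining"]))      # 'Non-Attaining'
print(au_result(["Insufficient Data"]))               # 'Insufficient Data'
```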
The California Water Quality Status Report is an annual data-driven snapshot of the Water Board’s water quality and ecosystem data. This third edition of the report is organized around the watershed from land to sea. Each theme-specific story includes a brief background, a data analysis summary, an overview of management actions, and access to the raw data.
View the 2019 California Water Quality Status Report.
The CodeSO-QR dataset is a novel, high-quality dataset created by merging two influential data sources: CodeSearchNet and 60k Stack Overflow Questions with Quality Rating. This integrated dataset is designed to advance research in human-AI collaboration within data-driven software engineering, focusing on areas like code retrieval, question-answering, code summarization, and quality assessment of technical discussions.
CodeSearchNet Data: The CodeSearchNet dataset offers a large collection of code snippets and natural language descriptions across several programming languages, including Python, Java, JavaScript, Go, Ruby, and PHP. These are annotated with metadata that facilitates code search and retrieval tasks. By incorporating CodeSearchNet, CodeSO-QR supports various code-related tasks such as code summarization, generation, and context-based code suggestion.
Stack Overflow Questions with Quality Rating: This component comprises a set of 60,000 questions from Stack Overflow, each evaluated with a quality rating. The quality scores help delineate well-formed, high-quality questions from lower-quality posts. Including this data enables training models to assess the quality of questions and answers, a key feature for platforms facilitating collaborative coding, knowledge sharing, and AI-supported question-answering.
Each entry in CodeSO-QR includes:
Code Snippet: A code example in one of the programming languages, annotated with metadata on usage and context.
Natural Language Description: Accompanying text explaining the code's purpose or use case.
Stack Overflow Question: Real-world questions from developers, including title, body, tags, and additional metadata.
Quality Rating: A numerical quality score for each Stack Overflow question, facilitating quality assessment tasks.
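A minimal sketch of one CodeSO-QR entry as a typed record. The field names are inferred from the description above for illustration; the released files may use a different schema.

```python
from dataclasses import dataclass, field

@dataclass
class CodeSOQREntry:
    code_snippet: str          # code example in one of the supported languages
    language: str              # e.g. "python", "java"
    description: str           # natural language explanation of the code
    question_title: str        # Stack Overflow question title
    question_body: str         # Stack Overflow question body
    tags: list = field(default_factory=list)  # Stack Overflow tags
    quality_rating: float = 0.0                # numerical quality score of the question

entry = CodeSOQREntry(
    code_snippet="def add(a, b):\n    return a + b",
    language="python",
    description="Adds two numbers.",
    question_title="How do I add two numbers in Python?",
    question_body="I want to sum two integers...",
    tags=["python", "arithmetic"],
    quality_rating=0.87,
)
print(entry.quality_rating)
```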
Human-AI Collaboration Models: CodeSO-QR is optimized for developing models that foster collaboration between human users and AI systems, particularly in interpreting, generating, and assessing code and questions in software engineering contexts.
Code Search and Retrieval: The integration of CodeSearchNet enables robust code search capabilities, where models can retrieve relevant code snippets based on natural language queries.
Question Quality Assessment: The quality ratings from Stack Overflow data empower models to filter, prioritize, and improve question quality for enhanced knowledge sharing and collaborative problem-solving.
Code Summarization and Generation: By combining code snippets with natural language descriptions, CodeSO-QR aids in generating coherent and context-aware code summaries and assists in automated documentation.
This dataset is well-suited for training and evaluating AI systems in tasks such as:
Enhancing AI-driven code completion tools.
Improving question-answering frameworks for technical forums.
Enabling models to suggest improvements in question quality on collaborative coding platforms.
Supporting natural language-to-code generation systems and code-to-language summarization tools.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Impaired Streams 2012’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/f6d269eb-a8cc-47b5-89a7-2fd5dd9a35af on 12 November 2021.
--- Dataset description provided by original source is as follows ---
This dataset contains line work representing streams and rivers listed as "impaired" in Iowa's 2012 Section 305(b) Water Quality Assessment and the 303(d) Impaired Waters Report. Together, these two reports are known as Iowa's 2012 Integrated Report. Waterbodies in Iowa each have specific designations based on what they are commonly used for: recreation, such as swimming or fishing; drinking water; or maintaining a healthy population of fish and other aquatic life. Every two years, Iowa must report on its progress in meeting water quality goals to the U.S. Environmental Protection Agency (EPA).
The state prepares one report called the 305(b) Water Quality Assessment, or 305(b) list. This 305(b) list categorizes waterbodies to reflect: those that meet all the designated uses (category 1), those for which data availability is insufficient to determine whether any or all designated uses are being met (categories 2 and 3), and those waters in which the water quality prevents them from fully meeting their designated uses and which are thus considered "impaired".
New impairments (or category 5 listings) are placed on the "303(d) Impaired Waters Report", commonly referred to as the "impaired waters list." This is named after section 303(d) of the Federal Clean Water Act and means that the stream or lake needs a water quality improvement plan written (also known by a technical name, "Total Maximum Daily Load," or "TMDL"). The water quality improvement plan outlines water quality problems, identifies sources of the problem(s), identifies needed reductions in pollutants and offers possible solutions. Water quality improvement plans are approved by the EPA and then the waters are moved from the 303(d) list back to the 305(b) list as category 4 listings (waters considered impaired, but a water quality improvement plan has been written).
--- Original source retains full ownership of the source dataset ---
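A minimal, simplified sketch of the Integrated Report category logic described in this entry; the actual assessment methodology has additional rules.

```python
from typing import Optional

def ir_category(meets_all_uses: Optional[bool], has_tmdl: bool) -> int:
    """Return a simplified Integrated Report category for a waterbody.

    meets_all_uses: True if all designated uses are met, False if impaired,
    None if data are insufficient to decide.
    """
    if meets_all_uses is True:
        return 1                  # meets all designated uses
    if meets_all_uses is None:
        return 3                  # insufficient data (categories 2 and 3 in the report)
    return 4 if has_tmdl else 5   # impaired: 4 with a TMDL written, 5 without

print(ir_category(None, False))   # 3
print(ir_category(False, False))  # 5 -> goes on the 303(d) list
print(ir_category(False, True))   # 4 -> TMDL written, moved back to the 305(b) list
```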
This record provides an overview of the scope and research output of NESP Marine Biodiversity Hub Project B2 - "Analysis and elicitation to support State of the Environment reporting for the full spectrum of data availability". No data outputs are expected for this project. The availability and quality of observation data that may be used to support State of the Environment reporting lie on a spectrum from (i) high quality (e.g. Reef Life Survey, Long Term Reef Monitoring Programme, Temperate Reef Monitoring Programme, state-based MPA monitoring programmes); (ii) moderate quality (e.g. continuous plankton recorder, occasional bycatch surveys); (iii) low quality (anecdotal information); to (iv) expert beliefs with no empirical observations. We currently lack a principled process for utilising and merging data of varying quality and from different sources to form a national perspective to support State of the Environment reporting. The key unifying principle to support such a process is the extent to which the available data are representative of the environmental asset in question. As the extent to which the empirical observations accurately represent the state of the asset in both space and time diminishes, the reliance on expert opinion increases, to the limit where the only available information is expert opinion. This project will provide an over-arching framework to consider these issues, develop practical protocols for blending different data streams with or without experts’ judgement as appropriate, and thereby provide a foundation for improving State of the Environment reporting for all types of data sources, from high to low quality. It will do this by developing and applying protocols to support development of the marine chapter of SoE 2016, which is currently being developed within a separate CSIRO-funded project. The project will use the experience of developing this chapter to make recommendations about appropriate methodologies for future environmental reporting. Importantly, the statistical approach and analysis principles will be consistent regardless of the amount or quality of the information available. As a result, the framework and analysis methods will remain relevant even as the quality and quantity of environmental data at the department’s disposal changes. This will provide the consistency of analysis and reporting that is essential to SoE.
Expected Outcomes:
• The provision of two or three examples that demonstrate a unified approach to the use of expert opinion in SoE reporting. These examples will be identified in close collaboration with the Department and will be developed in time to support the marine chapter of the 2016 State of the Environment report, contingent on the availability of resources in the second year of the project and timely interaction with the department.
• Assessments of the status and trends of environmental assets in the State of the Environment report will be based on a principled and statistically defensible process that can merge and utilise data from all sources, including expert opinion.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The global challenges and threats from infectious diseases, including antimicrobial drug resistance and emerging infections due to the rapidly changing climate, require that we continuously revisit the fitness of our infrastructure. The databases used for surveillance represent an important part of this infrastructure. Historically, many databases have evolved from different needs and different organizations. Despite growing data storage and computing capacities, data are, however, rarely used to their full potential. The objective of this review was to outline different data sources available in Denmark. We applied a one-health perspective and included data sources on animal demographics and movements, medicine prescription, and diagnostic test results, as well as relevant data on human health. Another objective was to suggest approaches for fit-for-purpose integration of data as a resource for risk assessment and generation of evidence for policies to protect animal and human health. Danish databases were reviewed according to a systematic procedure including ownership, intended purposes of the database, target and study populations, metrics and information used, measuring methods (observers, diagnostic tests), recording procedures, data flow, database structure, and control procedures to ensure data quality. Thereby, structural metadata were gathered across available Danish databases covering animal health, zoonotic infections, antimicrobial use, and relevant administrative data that can support the overall aim of supporting risk assessment and development of evidence. Illustrative cases were then used to assess how combinations and integration of databases could improve existing evidence to support decisions in animal health policies (e.g., combining information on diseases in different herds or regions with information on isolation of pathogens from humans). Due to the complexity of the databases, full integration at the individual level is often not possible. Still, integration of data at a higher level (e.g., municipality or region) can provide important information on risks and hence risk management. We conclude by discussing how linkage of databases can be improved in the future, and emphasize that legal issues are important to address in order to optimize the use of the available data.