Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These tables list all GIS attributes associated with both the land cover and urban tree canopy results, along with attribute definitions, and associated raster values from the input raster datasets.
https://spdx.org/licenses/CC0-1.0.html
This data set contains a collection of attributes associated with CloudSat-identified echo objects (or contiguous regions of radar/dBZ echo) from 15 June 2006 to 17 January 2013. CloudSat is a NASA satellite that carries a 94 GHz (3 mm) nadir-pointing cloud profiling radar (CPR). CloudSat makes approximately 14 orbits per day with an equator passing time of 0130 and 1330 local time. Echo objects were identified using CloudSat's 2B-GEOPROF product, which includes 2D arrays (along-track x vertical) of the radar reflectivity factor and gaseous attenuation correction. Also included in the product is a "cloud mask" with values ranging between 0 and 40, with higher values indicating a greater likelihood of cloud detection. An EO was defined as a contiguous region of cloud mask greater than or equal to 20, consisting of at least three pixels with their edges, and not merely their corners, touching. Each echo object (EO) is assigned multiple attributes. The geographic attributes include minimum, mean, and maximum latitude and longitude, minimum and maximum location along the CloudSat orbit track, and the underlying surface altitude and land mask data, which allow the EOs to be categorized as occurring over land, sea, or the coast. The geometric attributes include top, mean, and bottom height, width, and the total number of pixels within the EO. Attributes describing the internal structure of the EO are also available, including the number of pixels and cells (i.e., groups of pixels) greater than 0 dBZ and -17 dBZ. Finally, the time of day of occurrence was also recorded to compare the statistics of EOs occurring during the daytime versus nighttime. In total, we identified 15,181,193 EOs from 15 June 2006 to 17 January 2013. After 17 April 2011, data were only collected during the day due to a battery failure onboard CloudSat. Each attribute is organized as a 1D array where the size of the array corresponds to the number of EOs. 
This organization allows subsets of EOs to be easily identified using simple "where" statements when writing code. The attributes were used to identify cloud types and analyze global cloud climatology according to season, surface type, and region (i.e., Riley 2009; Riley and Mapes 2009). The variability of EOs across the MJO was also analyzed (Riley et al. 2011).
Methods
Data:
Raw files were downloaded from ftp1.cloudsat.cira.colostate.edu in directory 2B-GEOPROF.R04.
Processed files are in netCDF format.
Processing:
Data were processed and analyzed using IDL. See CloudSat_code_README.txt for details.
The initial processing was done while I was a graduate student at the University of Miami working on my master's from 2006-2009.
Code is available at https://github.com/erileydellaripa/CYGNSS_code
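The echo-object definition given in the dataset description (cloud mask of at least 20, three or more pixels joined by edges rather than corners) can be illustrated with a short sketch. The original processing was done in IDL; the Python below is not the author's code. The function name, the toy mask values, and the row/column orientation are assumptions made for illustration only.

```python
from collections import deque

def label_echo_objects(mask, threshold=20, min_pixels=3):
    """Group pixels with mask >= threshold into edge-connected objects.

    Returns a list of objects, each a list of (row, col) pixels.
    Objects with fewer than min_pixels pixels are discarded.
    """
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    seen = [[False] * n_cols for _ in range(n_rows)]
    objects = []
    for r in range(n_rows):
        for c in range(n_cols):
            if seen[r][c] or mask[r][c] < threshold:
                continue
            # Breadth-first search over the four edge neighbours only,
            # so pixels touching merely at a corner are not joined.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                i, j = queue.popleft()
                pixels.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < n_rows and 0 <= nj < n_cols
                            and not seen[ni][nj]
                            and mask[ni][nj] >= threshold):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            if len(pixels) >= min_pixels:
                objects.append(pixels)
    return objects

# Toy cloud mask (rows = vertical bins, columns = along-track profiles).
# The values are made up for this example.
toy_mask = [
    [30, 30,  0,  0, 20],
    [ 0, 40,  0, 20,  0],
    [ 0,  0,  0,  0,  0],
    [25,  0, 40, 40, 40],
]
echo_objects = label_echo_objects(toy_mask)
```

In this toy mask the two pixels with value 20 touch only at a corner, so each forms a one-pixel region and is dropped, as is the isolated pixel with value 25.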
Data file description:
Once the tar.gz file is unpacked, the EO attributes are provided in the EO_masterlistYYYY.nc files, where YYYY corresponds to the different years. I transferred the EO attributes from IDL .save files to netCDF files for sharing. A description of each EO attribute is provided in the README.md and can also be viewed by running ncdump -h on a file in a terminal window.
The attributes are organized in 1D arrays, where the element of each array corresponds to a unique EO and the total size of the array corresponds to the total number of EOs identified.
Data are processed from the start of CloudSat 15 June 2006 till 17 January 2013 for the EO attributes.
In total, there are 15,181,193 EOs.
There was a battery failure 17 April 2011. CloudSat resumed collecting data 27 October 2011, but only during the day.
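Because every attribute is a 1D array indexed by EO, a subset can be selected by computing one list of indices and reusing it across arrays. The sketch below uses plain Python lists; the variable names, values, and thresholds are hypothetical and are not the variable names stored in the netCDF files.

```python
# Hypothetical attribute arrays: element i of every array describes the same EO.
top_height_km = [12.5, 2.1, 7.8, 15.0, 1.2]
mean_lat = [5.0, -42.3, 10.2, -3.7, 61.0]
surface = ["sea", "land", "sea", "coast", "land"]

# Equivalent of a "where" statement: indices of EOs meeting all conditions
# (here: within 15 degrees of the equator and topping out at 10 km or higher).
tropical_deep_idx = [
    i for i in range(len(top_height_km))
    if abs(mean_lat[i]) <= 15.0 and top_height_km[i] >= 10.0
]

# The same index list selects matching elements from any other attribute.
selected_surface = [surface[i] for i in tropical_deep_idx]
```

With array libraries the same pattern is typically written with a function such as numpy.where, which returns the matching indices directly.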
References:
Riley, E. M., B. E. Mapes, and S. N. Tulich, 2011: Clouds Associated with the Madden-Julian Oscillation: A New Perspective from CloudSat. J. Atmos. Sci., 68, 3032-3051, https://doi.org/10.1175/JAS-D-11-030.1.
Riley, E. M., and B. E. Mapes, 2009: Unexpected peak near -15°C in CloudSat echo top climatology. Geophys. Res. Lett., 36, L09819, https://doi.org/10.1029/2009GL037558.
Riley, E. M., 2009: A global survey of clouds by CloudSat. M.S. thesis, Division of Meteorology and Physical Oceanography, University of Miami, 134 pp, https://scholarship.miami.edu/esploro/outputs/991031447848002976.
This dataset contains all of the attribute data. This includes RXNORM-provided attributes for currently prescribable drugs, such as normalized 11-digit National Drug Codes (NDCs), UNII codes, and human or veterinary usage markers, as well as source-provided attributes, such as labeler, definition, and imprint information. Each attribute has an 'Attribute Name' (ATN) and 'Attribute Value' (ATV) combination. For example, NDCs have an ATN of 'NDC' and an ATV of the actual NDC value.
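The ATN/ATV pairing means that one attribute type can be pulled out by filtering on the name and reading the value. A minimal sketch follows; only the ATN and ATV field names come from the description above, while the concept_id field, the helper name, and all values are placeholders invented for illustration.

```python
# Hypothetical rows in name/value form; the values are placeholders, not real codes.
rows = [
    {"concept_id": "A1", "ATN": "NDC", "ATV": "00000000001"},
    {"concept_id": "A1", "ATN": "LABELER", "ATV": "Example Labeler"},
    {"concept_id": "B2", "ATN": "NDC", "ATV": "00000000002"},
]

def values_for(rows, attribute_name):
    """Return the ATV of every row whose ATN equals attribute_name."""
    return [row["ATV"] for row in rows if row["ATN"] == attribute_name]

ndc_values = values_for(rows, "NDC")
```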
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Data Description: This data set contains all records of payments made to vendors by the City of Cincinnati from fiscal year 2014 to present. It includes information such as the department that paid for the service, the reason for payment, and vendor name.
Data Creation: This data is pulled directly from the City's financial software, which centralizes all department financial transactions citywide.
Data Created By: The Cincinnati Financial System (CFS)
Refresh Frequency: Weekly
Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this data set.
Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit, the Office of Performance and Data Analytics applies standard processing to most raw data prior to publication. Processing includes, but is not limited to, address verification, geocoding, decoding attributes, and addition of administrative areas (e.g., Census, neighborhoods, police districts, etc.).
Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad
This tabular data set represents mean saturation overland flow as a percent of streamflow compiled for two spatial components of the NHDPlus version 2 data suite (NHDPlusv2) for the conterminous United States: 1) individual reach catchments and 2) reach catchments accumulated upstream through the river network. This dataset can be linked to the NHDPlus version 2 data suite by the unique identifier COMID. The source data is "Saturation overland flow estimated by TOPMODEL for the conterminous United States" produced by the United States Geological Survey (Wolock, 2003). Units are percent of total stream flow. Reach catchment information characterizes data at the local scale. Reach catchments accumulated upstream through the river network characterize cumulative upstream conditions. Network-accumulated values are computed using two methods, 1) divergence-routed and 2) total cumulative drainage area. Both approaches use a modified routing database to navigate the NHDPlus reach network to aggregate (accumulate) the metrics derived from the reach catchment scale (Schwarz and Wieczorek, 2018).
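Linking this table to NHDPlusv2 features amounts to a key join on COMID. The sketch below shows that join with plain Python dictionaries; apart from COMID, every field name and value is hypothetical and does not reflect the actual column names in either dataset.

```python
# Hypothetical NHDPlusv2 reach records keyed by COMID.
reaches = {
    101: {"reach_name": "Reach A"},
    102: {"reach_name": "Reach B"},
}

# Hypothetical rows from the tabular data set: a local catchment value and
# a network-accumulated value, both in percent of total streamflow.
flow_rows = [
    {"COMID": 101, "catchment_pct": 2.5, "accumulated_pct": 3.1},
    {"COMID": 103, "catchment_pct": 4.0, "accumulated_pct": 4.4},
]

# Inner join on COMID: keep only rows whose COMID exists in the reach records.
joined = {
    row["COMID"]: {**reaches[row["COMID"]], **row}
    for row in flow_rows
    if row["COMID"] in reaches
}
```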
This is a list of data elements and their attributes that are used by data assets at the Federal Highway Administration.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Description:
This dataset presents the spatial outcome of an analysis modelling perceived green space quality across the city of Espoo, Finland. The analysis relies on data gathered through the My Espoo on the Map survey (Mun Espoo kartalla) in the autumn of 2020 as part of the NordForsk-funded research project NORDGREEN. A comprehensive account of the analytical process and potential applications of the dataset is available in the associated publication, "Predicting context-sensitive urban green space quality to support urban green infrastructure planning" (open access: https://doi.org/10.1016/j.landurbplan.2023.104952).
Data Processing:
This dataset results from an analysis that integrates both primary and secondary sources of geospatial data. The primary data were collected with an online public participation GIS (PPGIS) survey directed at the adult inhabitants of Espoo. The data collection took place in September-October 2020 and was carried out in collaboration with Aalto University and the City of Espoo. For a detailed overview of the data collection process, please refer to the related publication.
Data characteristics:
Format: Shapefile (50m x 50m grid)
Geographical area: Espoo, Finland
Spatial reference: EUREF FIN TM35FIN
Note: Only grid cells intersecting with greenspace have been included in the dataset. For the employed definition of green areas, please consult the related publication.
Data attributes and their descriptions:
"P_PROB": Probability (P), positive perceived quality
"N_PROB": Probability (P), negative perceived quality
Funding:
This research was funded by NordForsk, Sustainable Urban Development and Smart Cities Programme, Project Smart Planning for Healthy and Green Nordic Cities – NORDGREEN, under Grant Number: 95322.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
This dataset is the "additional training dataset" for the DCASE 2024 Challenge Task 2.
The data consists of the normal/anomalous operating sounds of nine types of real/toy machines. Each recording is a single-channel audio that includes both a machine's operating sound and environmental noise. The duration of recordings varies from 6 to 10 seconds. The following nine types of real/toy machines are used in this task:
3DPrinter
AirCompressor
BrushlessMotor
HairDryer
HoveringDrone
RoboticArm
Scanner
ToothBrush
ToyCircuit
Overview of the task
Anomalous sound detection (ASD) is the task of identifying whether the sound emitted from a target machine is normal or anomalous. Automatic detection of mechanical failure is an essential technology in the fourth industrial revolution, which involves artificial-intelligence-based factory automation. Prompt detection of machine anomalies by observing sounds is useful for monitoring the condition of machines.
This task is the follow-up from DCASE 2020 Task 2 to DCASE 2023 Task 2. The task this year is to develop an ASD system that meets the following five requirements.
1. Train a model using only normal sound (unsupervised learning scenario). Because anomalies rarely occur and are highly diverse in real-world factories, it can be difficult to collect exhaustive patterns of anomalous sounds. Therefore, the system must detect unknown types of anomalous sounds that are not provided in the training data. This is the same requirement as in the previous tasks.
2. Detect anomalies regardless of domain shifts (domain generalization task). In real-world cases, the operational states of a machine or the environmental noise can change to cause domain shifts. Domain-generalization techniques can be useful for handling domain shifts that occur frequently or are hard to notice. In this task, the system is required to use domain-generalization techniques for handling these domain shifts. This requirement is the same as in DCASE 2022 Task 2 and DCASE 2023 Task 2.
3. Train a model for a completely new machine type. For a completely new machine type, hyperparameters of the trained model cannot be tuned. Therefore, the system should have the ability to train models without additional hyperparameter tuning. This requirement is the same as in DCASE 2023 Task 2.
4. Train a model using a limited number of machines from its machine type. While sounds from multiple machines of the same machine type can be used to enhance the detection performance, it is often the case that only a limited number of machines are available for a machine type. In such a case, the system should be able to train models using a few machines from a machine type. This requirement is the same as in DCASE 2023 Task 2.
5. Train a model both with and without attribute information. While additional attribute information can help enhance the detection performance, we cannot always obtain such information. Therefore, the system must work well both when attribute information is available and when it is not.
The last requirement is newly introduced in DCASE 2024 Task 2.
Definition
We first define key terms in this task: "machine type," "section," "source domain," "target domain," and "attributes."
"Machine type" indicates the type of machine, which in the additional training dataset is one of nine: 3D printer, air compressor, brushless motor, hair dryer, hovering drone, robotic arm, document scanner (scanner), toothbrush, and toy circuit.
A section is defined as a subset of the dataset for calculating performance metrics.
The source domain is the domain under which most of the training data and some of the test data were recorded, and the target domain is a different set of domains under which some of the training data and some of the test data were recorded. There are differences between the source and target domains in terms of operating speed, machine load, viscosity, heating temperature, type of environmental noise, signal-to-noise ratio, etc.
Attributes are parameters that define states of machines or types of noise. For several machine types, the attributes are hidden.
Dataset
This dataset consists of nine machine types. For each machine type, one section is provided, and the section is a complete set of training data. A set of test data corresponding to this training data will be provided on a separate Zenodo page as an "evaluation dataset" for the DCASE 2024 Challenge Task 2. For each section, this dataset provides (i) 990 clips of normal sounds in the source domain for training and (ii) ten clips of normal sounds in the target domain for training. The source/target domain of each sample is provided. Additionally, the attributes of each sample in the training and test data are provided in the file names and attribute csv files.
File names and attribute csv files
File names and attribute csv files provide reference labels for each clip. The given reference labels for each training clip include machine type, section index, normal/anomaly information, and attributes regarding the condition other than normal/anomaly. The machine type is given by the directory name. The section index is given by the respective file names. For the datasets other than the evaluation dataset, the normal/anomaly information and the attributes are given by the respective file names. Note that for machine types that have their attribute information hidden, the attribute information in each file name is labeled only as "noAttributes". Attribute csv files are for easy access to attributes that cause domain shifts. In these files, the file names, names of parameters that cause domain shifts (domain shift parameter, dp), and the values or types of these parameters (domain shift value, dv) are listed. Each row takes the following format:
[filename (string)], [d1p (string)], [d1v (int | float | string)], [d2p], [d2v]...
For machine types that have their attribute information hidden, all columns except the filename column are left blank for each row.
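The row format above (a file name followed by alternating parameter-name and parameter-value columns) can be read with the standard csv module. The sketch below is one possible reading, not part of the official baseline; the file names, parameter names, and helper name are invented for illustration, and values are kept as strings rather than converted to int or float.

```python
import csv
import io

def parse_attribute_row(row):
    """Split one csv row into (filename, {parameter_name: value}).

    Blank name columns are skipped, which covers machine types whose
    attribute information is hidden.
    """
    filename, rest = row[0], row[1:]
    pairs = {}
    # Columns alternate: d1p, d1v, d2p, d2v, ...
    for name, value in zip(rest[0::2], rest[1::2]):
        if name:
            pairs[name] = value
    return filename, pairs

# Hypothetical file contents: one row with two attributes, one with none.
sample = io.StringIO(
    "clip_0001.wav,speed,3,noise,factoryA\n"
    "clip_0002.wav,,,,\n"
)
parsed = dict(parse_attribute_row(row) for row in csv.reader(sample) if row)
```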
Recording procedure
Normal/anomalous operating sounds of machines and their related equipment were recorded. Anomalous sounds were collected by deliberately damaging target machines. To simplify the task, we use only the first channel of multi-channel recordings; all recordings are regarded as single-channel recordings of a fixed microphone. We mixed a target machine sound with environmental noise, and only noisy recordings are provided as training/test data. The environmental noise samples were recorded in several real factory environments. We will publish papers on the dataset to explain the details of the recording procedure by the submission deadline.
Directory structure
/eval_data
Baseline system
The baseline system is available on the GitHub repository. The baseline systems provide a simple entry-level approach that gives a reasonable performance in the dataset of Task 2. They are good starting points, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.
Condition of use
This dataset was created jointly by Hitachi, Ltd., NTT Corporation and STMicroelectronics and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
Citation
Contact
If there is any problem, please contact us:
Tomoya Nishida, tomoya.nishida.ax@hitachi.com
Keisuke Imoto, keisuke.imoto@ieee.org
Noboru Harada, noboru@ieee.org
Daisuke Niizumi, daisuke.niizumi.dt@hco.ntt.co.jp
Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com
These tabular data sets represent mean monthly temperature (degrees Celsius) data from 800 meter resolution PRISM for the years 2016 and 2017 compiled for two spatial components of the NHDPlus version 2.1 data suite (NHDPlusv2) for the conterminous United States: 1) individual reach catchments and 2) reach catchments accumulated upstream through the river network. This dataset can be linked to the NHDPlus version 2 data suite by the unique identifier COMID. The source data for mean monthly temperature (degrees Celsius) from 800 meter resolution PRISM data was produced by the PRISM Group at Oregon State University. Units are degrees Celsius. Reach catchment information characterizes data at the local scale. Reach catchments accumulated upstream through the river network characterize cumulative upstream conditions. Network-accumulated values are computed using two methods, 1) divergence-routed and 2) total cumulative drainage area. Both approaches use a modified routing database to navigate the NHDPlus reach network to aggregate (accumulate) the metrics derived from the reach catchment scale (Schwarz and Wieczorek, 2018).
The National Hydrography Dataset Plus (NHDPlus) maps the lakes, ponds, streams, rivers and other surface waters of the United States. Created by the US EPA Office of Water and the US Geological Survey, the NHDPlus provides mean annual and monthly flow estimates for rivers and streams. Additional attributes provide connections between features facilitating complicated analyses. For more information on the NHDPlus dataset see the NHDPlus v2 User Guide.
Dataset Summary
Phenomenon Mapped: Surface waters and related features of the United States and associated territories not including Alaska.
Geographic Extent: The United States not including Alaska, Puerto Rico, Guam, US Virgin Islands, Marshall Islands, Northern Marianas Islands, Palau, Federated States of Micronesia, and American Samoa
Projection: Web Mercator Auxiliary Sphere
Visible Scale: Visible at all scales but layer draws best at scales larger than 1:1,000,000
Source: EPA and USGS
Update Frequency: There is no new data since this 2019 version, so no updates are planned in the future
Publication Date: March 13, 2019
Prior to publication, the NHDPlus network and non-network flowline feature classes were combined into a single flowline layer. Similarly, the NHDPlus Area and Waterbody feature classes were merged under a single schema.
Attribute fields were added to the flowline and waterbody layers to simplify symbology and enhance the layer's pop-ups. Fields added include Pop-up Title, Pop-up Subtitle, On or Off Network (flowlines only), Esri Symbology (waterbodies only), and Feature Code Description. All other attributes are from the original NHDPlus dataset. No data values -9999 and -9998 were converted to Null values for many of the flowline fields.
What can you do with this layer?
Feature layers work throughout the ArcGIS system. Generally your work flow with feature layers will begin in ArcGIS Online or ArcGIS Pro. 
Below are just a few of the things you can do with a feature service in Online and Pro.
ArcGIS Online
Add this layer to a map in the map viewer. The layer is limited to scales of approximately 1:1,000,000 or larger but a vector tile layer created from the same data can be used at smaller scales to produce a webmap that displays across the full range of scales. The layer or a map containing it can be used in an application.
Change the layer’s transparency and set its visibility range
Open the layer’s attribute table and make selections. Selections made in the map or table are reflected in the other. Center on selection allows you to zoom to features selected in the map or table and show selected records allows you to view the selected records in the table.
Apply filters. For example you can set a filter to show larger streams and rivers using the mean annual flow attribute or the stream order attribute.
Change the layer’s style and symbology
Add labels and set their properties
Customize the pop-up
Use as an input to the ArcGIS Online analysis tools. This layer works well as a reference layer with the trace downstream and watershed tools. The buffer tool can be used to draw protective boundaries around streams and the extract data tool can be used to create copies of portions of the data.
ArcGIS Pro
Add this layer to a 2d or 3d map.
Use as an input to geoprocessing. For example, copy features allows you to select then export portions of the data to a new feature class.
Change the symbology and the attribute field used to symbolize the data
Open table and make interactive selections with the map
Modify the pop-ups
Apply Definition Queries to create sub-sets of the layer
This layer is part of the ArcGIS Living Atlas of the World that provides an easy way to explore the landscape layers and many other beautiful and authoritative maps on hundreds of topics.
Questions?
Please leave a comment below if you have a question about this layer, and we will get back to you as soon as possible.
Attribution 3.0 (CC BY 3.0) https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset and its metadata statement were supplied to the Bioregional Assessment Programme by a third party and are presented here as originally supplied.
This is Version 2 of the Australian Soil Depth of Regolith product of the Soil and Landscape Grid of Australia (produced 2015-06-01). The Soil and Landscape Grid of Australia has produced a range of digital soil attribute products. The digital soil attribute maps are in raster format at a resolution of 3 arc sec (~90 x 90 m pixels). Attribute Definition: The regolith is the in situ and transported material overlying unweathered bedrock; Units: metres; Spatial prediction method: data mining using piecewise linear regression; Period (temporal coverage; approximately): 1900-2013; Spatial resolution: 3 arc seconds (approx 90 m); Total number of gridded maps for this attribute: 3; Number of pixels with coverage per layer: 2007M (49200 * 40800); Total size before compression: about 8GB; Total size after compression: about 4GB; Data license: Creative Commons Attribution 3.0 (CC BY); Variance explained (cross-validation): R^2 = 0.38; Target data standard: GlobalSoilMap specifications; Format: GeoTIFF.
The methodology consisted of the following steps: (i) drillhole data preparation, (ii) compilation and selection of the environmental covariate raster layers and (iii) model implementation and evaluation. Drillhole data preparation: Drillhole data was sourced from the National Groundwater Information System (NGIS) database. This spatial database holds nationally consistent information about bores that were drilled as part of the Bore Construction Licensing Framework (http://www.bom.gov.au/water/groundwater/ngis/). The database contains 357,834 bore locations with associated lithology, bore construction and hydrostratigraphy records. This information was loaded into a relational database to facilitate analysis. Regolith depth extraction: The first step was to recognise and extract the boundary between the regolith and bedrock within each drillhole record. This was done using a key word look-up table of bedrock or lithology related words from the record descriptions. 1,910 unique descriptors were discovered. Using this list of new standardised terms, analysis of the drillholes was conducted, and the depth value associated with the word in the description that was unequivocally pointing to reaching fresh bedrock material was extracted from each record using a tool developed in C# code. The second step of regolith depth extraction involved removing drillhole bedrock depth records, which was deemed necessary because of the "noisiness" in depth records resulting from inconsistencies in drilling and description standards identified in the legacy database. On completion of the filtering and removal of outliers, the drillhole database used in the model comprised 128,033 depth sites. Selection and preparation of environmental covariates: The environmental correlations style of DSM applies environmental covariate datasets to predict target variables, here regolith depth. 
Strongly performing environmental covariates operate as proxies for the factors that control regolith formation including climate, relief, parent material, organisms and time (Jenny, 1941). Depth modelling was implemented using the PC-based R-statistical software (R Core Team, 2014), and relied on the R-Cubist package (Kuhn et al. 2013). To generate modelling uncertainty estimates, the following procedures were followed: (i) the random withholding of a subset comprising 20% of the whole depth record dataset for external validation; (ii) bootstrap sampling of the remaining dataset 100 times to produce repeated model training datasets. The Cubist model was then run repeatedly to produce a unique rule set for each of these training sets. Repeated model runs using different training sets, a procedure referred to as bagging or bootstrap aggregating, is a machine learning ensemble procedure designed to improve the stability and accuracy of the model. The Cubist rule sets generated were then evaluated and applied spatially, calculating a mean predicted value (i.e. the final map). The 5% and 95% confidence intervals were estimated for each grid cell (pixel) in the prediction dataset by combining the variance from the bootstrapping process and the variance of the model residuals. Version 2 differs from version 1 in that the modelling of depths was performed on the log scale to better conform to assumptions of normality used in calculating the confidence intervals. The method to estimate the confidence intervals was improved to better represent the full range of variability in the modelling process (Wilford et al, in press).
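The uncertainty procedure described above (20% hold-out, 100 bootstrap resamples, log-scale modelling, and limits built from bootstrap variance plus residual variance) can be sketched in a few lines. This is an illustration only: a one-covariate least-squares line stands in for the Cubist rule sets, the data are synthetic, and the use of a normal quantile of 1.645 for the 5% and 95% limits is an assumption rather than the published method.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope * x (stand-in for a Cubist model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def bagged_depth(xs, depths, x_new, n_boot=100, holdout=0.2, seed=0):
    """Return (lower, mean, upper) depth at x_new, back-transformed from logs."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    rng.shuffle(order)
    n_hold = int(len(order) * holdout)
    train = order[n_hold:]  # the first n_hold indices are withheld for validation
    log_d = [math.log(d) for d in depths]
    preds, resid_vars = [], []
    for _ in range(n_boot):
        # Bootstrap: resample the training indices with replacement.
        sample = [rng.choice(train) for _ in train]
        bx = [xs[i] for i in sample]
        by = [log_d[i] for i in sample]
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
        resid_vars.append(
            statistics.fmean([(y - (a + b * x)) ** 2 for x, y in zip(bx, by)])
        )
    mean_log = statistics.fmean(preds)
    # Combine bootstrap variance with the average model residual variance.
    total_var = statistics.pvariance(preds) + statistics.fmean(resid_vars)
    half_width = 1.645 * math.sqrt(total_var)  # assumed normal quantile
    return (math.exp(mean_log - half_width),
            math.exp(mean_log),
            math.exp(mean_log + half_width))

# Synthetic covariate and depth values, generated only for this example.
gen = random.Random(1)
covariate = [i / 10 for i in range(50)]
depth_m = [math.exp(0.5 + 0.3 * x + gen.gauss(0, 0.2)) for x in covariate]
lower, predicted, upper = bagged_depth(covariate, depth_m, x_new=2.0)
```

Working on the log scale keeps the back-transformed limits positive, which is consistent with the reason given for the change in Version 2.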
CSIRO (2015) AUS Soil and Landscape Grid National Soil Attribute Maps - Depth of Regolith (3" resolution) - Release 2. Bioregional Assessment Source Dataset. Viewed 22 June 2018, http://data.bioregionalassessments.gov.au/dataset/c28597e8-8cfc-4b4f-8777-c9934051cce2.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Attribute definitions and collected data values.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
Water companies in the UK are responsible for testing the quality of drinking water. This dataset contains the results of samples taken from the taps in domestic households to make sure they meet the standards set out by UK and European legislation. This data shows the location, date, and measured levels of determinands set out by the Drinking Water Inspectorate (DWI).
Key Definitions
Aggregation: Process involving summarising or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes.
Anonymisation: Anonymised data is a type of information sanitisation in which data anonymisation tools encrypt or remove personally identifiable information from datasets for the purpose of preserving a data subject's privacy.
Dataset: Structured and organised collection of related elements, often stored digitally, used for analysis and interpretation in various fields.
Determinand: A constituent or property of drinking water which can be determined or estimated.
DWI: Drinking Water Inspectorate, an organisation “providing independent reassurance that water supplies in England and Wales are safe and drinking water quality is acceptable to consumers.”
DWI Determinands: Constituents or properties that are tested for when evaluating a sample for its quality as per the guidance of the DWI. For this dataset, only determinands with “point of compliance” as “customer taps” are included.
Granularity: Data granularity is a measure of the level of detail in a data structure. In time-series data, for example, the granularity of measurement might be based on intervals of years, months, weeks, days, or hours.
ID: Abbreviation for Identification that refers to any means of verifying the unique identifier assigned to each asset for the purposes of tracking, management, and maintenance.
LSOA: Lower Layer Super Output Area, a small geographic area used for statistical and administrative purposes by the Office for National Statistics. 
It is designed to have homogeneous populations in terms of population size, making them suitable for statistical analysis and reporting. Each LSOA is built from groups of contiguous Output Areas with an average of about 1,500 residents or 650 households, allowing for granular data collection useful for analysis, planning and policy-making while ensuring privacy.
ONS: Office for National Statistics
Open Data Triage: The process carried out by a Data Custodian to determine if there is any evidence of sensitivities associated with Data Assets, their associated Metadata and Software Scripts used to process Data Assets if they are used as Open Data.
Sample: A sample is a representative segment or portion of water taken from a larger whole for the purpose of analysing or testing to ensure compliance with safety and quality standards.
Schema: Structure for organizing and handling data within a dataset, defining the attributes, their data types, and the relationships between different entities. It acts as a framework that ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.
Units: Standard measurements used to quantify and compare different physical quantities.
Water Quality: The chemical, physical, biological, and radiological characteristics of water, typically in relation to its suitability for a specific purpose, such as drinking, swimming, or ecological health. It is determined by assessing a variety of parameters, including but not limited to pH, turbidity, microbial content, dissolved oxygen, presence of substances and temperature.
Data History
Data Origin: These samples were taken from customer taps. They were then analysed for water quality, and the results were uploaded to a database. This dataset is an extract from this database.
Data Triage Considerations
Granularity: Is it useful to share results as averages or individual? 
We decided to share as individual results as the lowest level of granularity.
Anonymisation: It is a requirement that this data cannot be used to identify a singular person or household. We discussed many options for aggregating the data to a specific geography to ensure this requirement is met. The following geographical aggregations were discussed:
• Water Supply Zone (WSZ) - Limits interoperability with other datasets
• Postcode – Some postcodes contain very few households and may not offer necessary anonymisation
• Postal Sector – Deemed not granular enough in highly populated areas
• Rounded Co-ordinates – Not a recognised standard and may cause overlapping areas
• MSOA – Deemed not granular enough
• LSOA – Agreed as a recognised standard appropriate for England and Wales
• Data Zones – Agreed as a recognised standard appropriate for Scotland
Data Triage Review Frequency: Annually unless otherwise requested
Publish Frequency: Annually
Data Specifications
• Each dataset will cover a year of samples in calendar year
• This dataset will be published annually
• Historical datasets will be published as far back as 2016 from the introduction of The Water Supply (Water Quality) Regulations 2016
• The determinands included in the dataset are as per the list that is required to be reported to the Drinking Water Inspectorate.
• A small proportion of samples could not be allocated to an LSOA – these represented less than 0.1% of samples and were removed from the dataset in 2023.
• The postcode to LSOA lookup table used for 2022 was not available when 2023 data was processed, see supplementary information for the lookup table applied to each calendar year of data.
Context
Many UK water companies provide a search tool on their websites where you can search for water quality in your area by postcode. The results of the search may identify the water supply zone that supplies the postcode searched. 
Water supply zones are not linked to LSOAs which means the results may differ to this dataset. Some sample results are influenced by internal plumbing and may not be representative of drinking water quality in the wider area. Some samples are tested on site and others are sent to scientific laboratories.Supplementary informationBelow is a curated selection of links for additional reading, which provide a deeper understanding of this dataset. 1. Drinking Water Inspectorate Standards and Regulations: https://www.dwi.gov.uk/drinking-water-standards-and-regulations/ 2. LSOA (England and Wales) and Data Zone (Scotland): https://www.nrscotland.gov.uk/files/geography/2011-census/geography-bckground-info-comparison-of-thresholds.pdf 3. Description for LSOA boundaries by the ONS: https://www.ons.gov.uk/methodology/geography/ukgeographies/censusgeographies/census2021geographies4. Postcode to LSOA lookup tables (2022 calendar year data): https://geoportal.statistics.gov.uk/datasets/postcode-to-2021-census-output-area-to-lower-layer-super-output-area-to-middle-layer-super-output-area-to-local-authority-district-august-2023-lookup-in-the-uk/about 5. Postcode to LSOA lookup tables (2023 calendar year data): https://geoportal.statistics.gov.uk/datasets/b8451168e985446eb8269328615dec62/about6. Legislation history: https://www.dwi.gov.uk/water-companies/legislation/
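The allocation of samples to an LSOA described in this entry can be illustrated with a minimal sketch. It assumes each sample carries a postcode and that a postcode-to-LSOA lookup is available as a simple mapping; the lookup contents, field names and function names below are hypothetical and are not taken from the publisher's actual process.

```python
# Hypothetical extract of a postcode-to-LSOA lookup table (illustrative codes only).
POSTCODE_TO_LSOA = {
    "AB1 2CD": "E01000001",
    "AB1 2CE": "E01000001",
    "XY9 8ZZ": "E01000002",
}


def normalise_postcode(postcode: str) -> str:
    """Upper-case a postcode and put a single space before the last three characters."""
    compact = "".join(postcode.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}"


def allocate_lsoa(samples, lookup):
    """Swap each sample's postcode for its LSOA code.

    Returns the allocated rows and the number of samples that could not be
    allocated (these are dropped, mirroring the removal described above).
    """
    allocated, unallocated = [], 0
    for sample in samples:
        lsoa = lookup.get(normalise_postcode(sample["postcode"]))
        if lsoa is None:
            unallocated += 1
            continue
        # Remove the postcode so the published row carries only the LSOA.
        row = {key: value for key, value in sample.items() if key != "postcode"}
        row["lsoa"] = lsoa
        allocated.append(row)
    return allocated, unallocated
```

Returning the unallocated count alongside the rows makes it straightforward to check what share of samples was dropped for a given year.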
FiVA Dataset
Project page | Paper
News
[2024-12] We will be uploading version 2 during the first week of December (data/full_data_v2). This update includes additional attribute definitions and a more stringent filtering process. [2024-08] The first version of the FiVA dataset has been released.
Data structure
Folder ./data/data_part0 contains an example subset and folder ./data/full_data contains the full data currently, 1.04M images in total. Under each… See the full description on the dataset page: https://huggingface.co/datasets/FiVA/FiVA.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Data Description: This data set contains all records of payments made to vendors by the City of Cincinnati from fiscal year 2014 to present. It includes information such as the department that paid for the service, the reason for payment, and the vendor name.
Data Creation: This data is pulled directly from the City's financial software, which centralizes all department financial transactions citywide.
Data Created By: The Cincinnati Financial System (CFS)
Refresh Frequency: Weekly
Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this data set.
Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit, the Office of Performance and Data Analytics applies standard processing to most raw data prior to publication. Processing includes, but is not limited to: address verification, geocoding, decoding attributes, and the addition of administrative areas (e.g., Census, neighborhoods, police districts).
Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the following data and source code and results.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of supplementary data from the paper evaluating a public-private partnership (PPP) in the veterinary sector in Tunisia: the Sanitary Mandate. The evaluation was conducted remotely from March to June 2021. It includes an Excel spreadsheet containing the analysis of interviews conducted during the evaluation, capturing responses from different groups of stakeholders involved in the PPP. The database is structured to support the qualitative evaluation of collaboration within the PPP and includes analysed data from semi-structured interviews with various actors at both regional and national levels. The analysis focuses on the historical and contextual elements of the evaluated PPP as well as the collaborative processes between public and private actors. Additionally, a supplementary Word document accompanies this dataset, containing three appendices: Appendix 1: The different interview guides used for the semi-structured interviews with various stakeholder groups. Appendix 2: Results of the historical and contextual analysis of the evaluated PPP. Appendix 3: Results of the collaborative evaluation, providing feedback from public and private actors at both regional and national levels, with separate analysis for each group and scale. Box: A definition of the different quality attributes of the PPP Process Evaluation Tool.
https://www.arcgis.com/sharing/rest/content/items/89679671cfa64832ac2399a0ef52e414/data
An in-depth description of the Street Centerline GIS dataset outlining terms of use, update frequency, attribute explanations, and more.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset offers valuable insights into yearly domestic water consumption across various Lower Super Output Areas (LSOAs) or Data Zones, accompanied by the count of water meters within each area. It is instrumental for analysing residential water use patterns, facilitating water conservation efforts, and guiding infrastructure development and policy making at a localised level.
Key Definitions
Aggregation
The process of summarising or grouping data to obtain a single or reduced set of information, often for analysis or reporting purposes.
AMR Meter
Automatic meter reading (AMR) is the technology of automatically collecting consumption, diagnostic, and status data from a water meter remotely and periodically.
Dataset
Structured and organised collection of related elements, often stored digitally, used for analysis and interpretation in various fields.
Data Zone
Data zones are the key geography for the dissemination of small area statistics in Scotland.
Dumb Meter
A dumb meter or analogue meter is read manually. It does not have any external connectivity.
Granularity
Data granularity is a measure of the level of detail in a data structure. In time-series data, for example, the granularity of measurement might be based on intervals of years, months, weeks, days, or hours.
ID
Abbreviation for Identification that refers to any means of verifying the unique identifier assigned to each asset for the purposes of tracking, management, and maintenance.
LSOA
Lower Layer Super Output Areas (LSOA) are a geographic hierarchy designed to improve the reporting of small area statistics in England and Wales.
Open Data Triage
The process carried out by a Data Custodian to determine if there is any evidence of sensitivities associated with Data Assets, their associated Metadata and Software Scripts used to process Data Assets if they are used as Open Data.
Schema
Structure for organising and handling data within a dataset, defining the attributes, their data types, and the relationships between different entities. It acts as a framework that ensures data integrity and consistency by specifying permissible data types and constraints for each attribute.
Smart Meter
A smart meter is an electronic device that records information and communicates it to the consumer and the supplier. It differs from automatic meter reading (AMR) in that it enables two-way communication between the meter and the supplier.
Units
Standard measurements used to quantify and compare different physical quantities.
Water Meter
Water metering is the practice of measuring water use. Water meters measure the volume of water used by residential and commercial building units that are supplied with water by a public water supply system.
Data History
Data Origin
Domestic consumption data is recorded using water meters. The consumption recorded is then sent back to water companies. This dataset is an extract of the data held by the water companies.
Data Triage Considerations
This section discusses the careful handling of data to maintain anonymity and addresses the challenges associated with data updates, such as identifying household changes or meter replacements.
Identification of Critical Infrastructure
This aspect is not applicable for the dataset, as the focus is on domestic water consumption and does not contain any information that reveals critical infrastructure details.
Commercial Risks and Anonymisation
Individual Identification Risks
There is a potential risk of identifying individuals or households if the consumption data is updated irregularly (e.g., every 6 months) and an out-of-cycle update occurs (e.g., after 2 months), which could signal a change in occupancy or ownership. Such patterns need careful handling to avoid accidental exposure of sensitive information.
Meter and Property Association
Challenges arise in maintaining historical data integrity when meters are replaced but the property remains the same. Ensuring continuity in the data without revealing personal information is crucial.
Interpretation of Null Consumption
Instances of null consumption could be misunderstood as a lack of water use, whereas they might simply indicate missing data. Distinguishing between these scenarios is vital to prevent misleading conclusions.
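The distinction between missing data and genuine zero use can be made explicit when summarising readings. The sketch below is illustrative only: it assumes missing readings are represented as None, and the function name and return keys are hypothetical.

```python
def summarise_consumption(readings):
    """Sum recorded consumption, counting missing values separately from zeros.

    A reading of 0.0 is treated as genuine zero use; None is treated as
    missing data and is excluded from the total rather than read as zero.
    """
    total = 0.0
    recorded = 0
    missing = 0
    for value in readings:
        if value is None:
            missing += 1
        else:
            total += value
            recorded += 1
    return {"total": total, "recorded": recorded, "missing": missing}
```

Reporting the missing count next to the total lets a reader judge how complete the underlying data is before drawing conclusions from a low figure.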
Meter Re-reads
The dataset must account for instances where meters are read multiple times for accuracy.
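One possible way to handle re-reads, shown here purely as an assumption rather than the publisher's method, is to keep a single reading per meter and read date. The field names (meter_id, read_date, sequence, value) are hypothetical, and the sketch assumes a higher sequence number marks a later re-read.

```python
def latest_reads(reads):
    """Keep one reading per (meter_id, read_date): the highest sequence number wins."""
    best = {}
    for read in reads:
        key = (read["meter_id"], read["read_date"])
        if key not in best or read["sequence"] > best[key]["sequence"]:
            best[key] = read
    return list(best.values())
```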
Joint Supplies & Multiple Meters per Household
Special consideration is required for households with multiple meters as well as multiple households that share a meter as this could complicate data aggregation.
Schema Consistency with the Energy Industry
In formulating the schema for the domestic water consumption dataset, careful consideration was given to the potential risks to individual privacy. This evaluation included examining the frequency of data updates, the handling of property and meter associations, interpretations of null consumption, meter re-reads, joint supplies, and the presence of multiple meters within a single household, as described above.
After a thorough assessment of these factors and their implications for individual privacy, it was decided to align the dataset's schema with the standards established within the energy industry. This decision was influenced by the energy sector's experience and established practices in managing similar risks associated with smart meters. This ensures a high level of data integrity and privacy protection.
Schema
The dataset schema is aligned with those used in the energy industry, which has encountered similar challenges with smart meters. However, it is important to note that the energy industry has a much higher density of meter distribution, especially smart meters.
Aggregation to Mitigate Risks
The dataset employs an elevated level of data aggregation to minimise the risk of individual identification. This approach is crucial in maintaining the utility of the dataset while ensuring individual privacy. The aggregation level is carefully chosen to remove identifiable risks without excluding valuable data, thus balancing data utility with privacy concerns.
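As an illustration of the aggregation idea, the sketch below rolls per-meter annual totals up to one row per LSOA or Data Zone, with a total consumption figure and a count of distinct meters. The record fields (lsoa, meter_id, annual_consumption) are hypothetical, and the sketch does not reproduce any additional safeguards the publisher may apply.

```python
from collections import defaultdict


def aggregate_by_lsoa(meter_totals):
    """Return {lsoa: {"consumption": total, "meter_count": n}} from per-meter totals."""
    by_area = defaultdict(lambda: {"consumption": 0.0, "meters": set()})
    for record in meter_totals:
        area = by_area[record["lsoa"]]
        area["consumption"] += record["annual_consumption"]
        # A set ensures each meter is counted once, even if it appears in several records.
        area["meters"].add(record["meter_id"])
    return {
        lsoa: {"consumption": area["consumption"], "meter_count": len(area["meters"])}
        for lsoa, area in by_area.items()
    }
```

Only the area code, the total and the meter count leave the function, so no individual meter identifier appears in the aggregated output.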
Data Freshness
Users should be aware that this dataset reflects historical consumption patterns and does not represent real-time data.
Publish Frequency
Annually
Data Triage Review Frequency
An annual review is conducted to ensure the dataset's relevance and accuracy, with adjustments made based on specific requests or evolving data trends.
Data Specifications
For the domestic water consumption dataset, the data specifications are designed to ensure comprehensiveness and relevance, while maintaining clarity and focus. The specifications for this dataset include:
· Each dataset encompasses recordings of domestic water consumption as measured and reported by the data publisher. It excludes commercial consumption.
· Where it is necessary to estimate consumption, this is calculated based on actual meter readings.
· Meters of all types (smart, dumb, AMR) are included in this dataset.
· The dataset is updated and published annually.
· Historical data may be made available to facilitate trend analysis and comparative studies, although it is not mandatory for each dataset release.
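The specifications above state that estimated consumption is based on actual meter readings but do not give the method. One simple possibility, offered only as an assumption, is a pro-rata estimate that scales the consumption between two actual reads to a full year. The function name and the (date, cumulative reading) pairs are hypothetical.

```python
from datetime import date


def estimate_annual_consumption(first_read, last_read, days_in_year=365):
    """Scale the consumption between two actual reads to a full year.

    Each read is a (date, cumulative_meter_reading) pair. This is a simple
    pro-rata assumption and ignores seasonal variation.
    """
    (start_date, start_value), (end_date, end_value) = first_read, last_read
    days = (end_date - start_date).days
    if days <= 0:
        raise ValueError("last_read must be later than first_read")
    return (end_value - start_value) / days * days_in_year
```

Because the estimate ignores seasonality, two reads taken over a summer period would tend to overstate annual use, which is consistent with the caution about seasonal influences given under Context.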
Context
Users are cautioned against using the dataset for immediate operational decisions regarding water supply management. The data should be interpreted considering potential seasonal and weather-related influences on water consumption patterns.
The geographical data provided does not pinpoint locations of water meters within an LSOA.
The dataset aims to cover a broad spectrum of households, from single-meter homes to those with multiple meters, to accurately reflect the diversity of water use within an LSOA.
Supplementary Information
Below is a curated selection of links for additional reading, which provide a deeper understanding of this dataset.
Ofwat guidance on water meters
https://www.ofwat.gov.uk/wp-content/uploads/2015/11/prs_lft_101117meters.pdf