This dataset comprises the .prj and .cvf files used to build the database for the Virus Particle Exposure in Residences (ViPER) Webtool, a single-zone indoor air quality and ventilation analysis tool developed by the National Institute of Standards and Technology (NIST).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Duplicate the Projection.prj file and rename the duplicate to match the name of the ASCII grid, e.g. MAT.asc and MAT.prj. When MAT.asc is imported into ESRI ArcGIS or QGIS, the GIS software will automatically pick up the correct grid projection.
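A minimal sketch of that duplication step in Python (Projection.prj and MAT.asc are the example names from the text; the function name is illustrative):

```python
import shutil
from pathlib import Path

def attach_projection(grid_path: str, template: str = "Projection.prj") -> Path:
    """Copy the shared projection file next to an ASCII grid, renamed to match it."""
    target = Path(grid_path).with_suffix(".prj")  # e.g. MAT.asc -> MAT.prj
    shutil.copyfile(template, target)
    return target

# attach_projection("MAT.asc") creates MAT.prj, which ArcGIS/QGIS then pick up.
```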
This dataset provides shapefile outlines of the 7,150 lakes that had temperature modeled as part of this study. The format is a shapefile for all lakes combined (.shp, .shx, .dbf, and .prj files). A csv file of lake metadata is also included. This dataset is part of a larger data release of lake temperature model inputs and outputs for 7,150 lakes in the U.S. states of Minnesota and Wisconsin (http://dx.doi.org/10.5066/P9CA6XP8).
This dataset provides shapefile outlines of the 68 lakes where temperature was modeled as part of this study. The format is a shapefile for all lakes combined (.shp, .shx, .dbf, and .prj files). This dataset is part of a larger data release of lake temperature model inputs and outputs for 68 lakes in the U.S. states of Minnesota and Wisconsin (http://dx.doi.org/10.5066/P9AQPIVD).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This replication package contains datasets and scripts related to the paper: "*How do Hugging Face Models Document Datasets, Bias, and Licenses? An Empirical Study*"
statistics.r: R script used to compute the correlation between usage and downloads, and the RQ1/RQ2 inter-rater agreements
modelsInfo.zip: zip file containing all the downloaded model cards (in JSON format)
script: directory containing all the scripts used to collect and process data. For further details, see the README file inside the script directory.
Dataset/Dataset_HF-models-list.csv: list of HF models analyzed
Dataset/Dataset_github-prj-list.txt: list of GitHub projects using the transformers library
Dataset/Dataset_github-Prj_model-Used.csv: usage pairs: project, model
Dataset/Dataset_prj-num-models-reused.csv: number of models used by each GitHub project
Dataset/Dataset_model-download_num-prj_correlation.csv: contains, for each model used by GitHub projects: the name, the task, the number of reusing projects, and the number of downloads
RQ1/RQ1_dataset-list.txt: list of HF datasets
RQ1/RQ1_datasetSample.csv: sample set of models used for the manual analysis of datasets
RQ1/RQ1_analyzeDatasetTags.py: Python script to analyze model tags for the presence of datasets. It requires modelsInfo.zip to be unzipped into a directory with the same name (modelsInfo) at the root of the replication package folder. Produces its output to stdout; redirect it to a file to be analyzed by the RQ2/countDataset.py script.
RQ1/RQ1_countDataset.py: given the output of RQ2/analyzeDatasetTags.py (passed as argument), produces, for each model, a list of Booleans indicating whether (i) the model only declares HF datasets, (ii) the model only declares external datasets, (iii) the model declares both, and (iv) the model is part of the sample for the manual analysis
RQ1/RQ1_datasetTags.csv: output of RQ2/analyzeDatasetTags.py
RQ1/RQ1_dataset_usage_count.csv: output of RQ2/countDataset.py
RQ2/tableBias.pdf: table detailing the number of occurrences of different types of bias by model task
RQ2/RQ2_bias_classification_sheet.csv: results of the manual labeling
RQ2/RQ2_isBiased.csv: file to compute the inter-rater agreement on whether or not a model documents bias
RQ2/RQ2_biasAgrLabels.csv: file to compute the inter-rater agreement related to bias categories
RQ2/RQ2_final_bias_categories_with_levels.csv: for each model in the sample, lists (i) the bias leaf category, (ii) the first-level category, and (iii) the intermediate category
RQ3/RQ3_LicenseValidation.csv: manual validation of a sample of licenses
RQ3/RQ3_{NETWORK-RESTRICTIVE|RESTRICTIVE|WEAK-RESTRICTIVE|PERMISSIVE}-license-list.txt: lists of licenses with different levels of permissiveness
RQ3/RQ3_prjs_license.csv: for each project linked to models, indicates (among other fields) the license tag and name
RQ3/RQ3_models_license.csv: for each model, indicates (among other information) whether the model has a license and, if so, what kind of license
RQ3/RQ3_model-prj-license_contingency_table.csv: usage contingency table between projects' licenses (columns) and models' licenses (rows)
RQ3/RQ3_models_prjs_licenses_with_type.csv: project-model pairs, with their respective licenses and permissiveness level
Contains the scripts used to mine Hugging Face and GitHub. Details are in the enclosed README.
The datasets in this zip file support Intelligent Transportation Systems Joint Program Office (ITS JPO) report FHWA-JPO-16-385, "Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs — Evaluation Report for ATDM Program," https://rosap.ntl.bts.gov/view/dot/32520 and FHWA-JPO-16-373, "Analysis, Modeling, and Simulation (AMS) Testbed Development and Evaluation to Support Dynamic Mobility Applications (DMA) and Active Transportation and Demand Management (ATDM) Programs: Dallas Testbed Analysis Plan," https://rosap.ntl.bts.gov/view/dot/32106. The files in this zip file are specifically related to the Dallas Testbed. The compressed zip files total 2.2 GB in size. The files have been uploaded as-is; no further documentation was supplied by NTL. All located .docx files were converted to .pdf document files, an open, archival format; these PDFs were then added to the zip file alongside the original .docx files. These files can be unzipped using any zip compression/decompression software. This zip file contains files in the following formats: .pdf document files, which can be read using any PDF reader; .csv text files, which can be read using any text editor; .txt text files, which can be read using any text editor; .docx document files, which can be read in Microsoft Word and some other word processing programs; .xlsx spreadsheet files, which can be read in Microsoft Excel and some other spreadsheet programs; .dat data files, which may be text or multimedia; as well as GIS or mapping files in the following formats: .mxd, .dbf, .prj, .sbn, .shp, .shp.xml, which may be opened in ArcGIS or other GIS software. [software requirements] These files were last accessed in 2017.
This dataset contains shapefiles outlining 558 neighborhoods in 50 major cities in New York state, notably including Albany, Buffalo, Ithaca, New York City, Rochester, and Syracuse. This adds context to your datasets by identifying the neighborhood of any locations you have, as coordinates on their own don't carry a lot of information.
Four files containing the shape data are included: an SHX file, a DBF file, an SHP file, and a PRJ file. Including all of them in your input data is necessary, as each contains part of the data; one file alone will not have everything you need.
Since none of these files are plaintext, getting set up with them can be a little difficult. I highly recommend using mapshaper.org to get started: the site will show you the boundaries drawn on a plane, and it lets you export the files in a number of different formats (e.g. GeoJSON, CSV) if you are unable to use them in the format they are provided in. Personally, though, I have found the shapefile format easier to work with.
To get started with the shapefile in R, you can use the rgdal and rgeos packages. To see an example of these being used, be sure to check out my kernel, "Incorporating neighborhoods into your model".
These files were provided by Zillow and are available under a Creative Commons license.
I'll be using these in the NYC Taxi Trip Duration competition to add context to the pickup and dropoff locations of the taxi rides and hopefully greatly improve my predictions.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This dataset provides shapefile outlines of the 881 lakes that had temperature modeled as part of this study. The format is a shapefile for all lakes combined (.shp, .shx, .dbf, and .prj files). A csv file of lake metadata is also included. This dataset is part of a larger data release of lake temperature model inputs and outputs for 881 lakes in the U.S. state of Minnesota (https://doi.org/10.5066/P9PPHJE2).
GeoJunxion's ZIP+4 is a complete dataset based on US postal data consisting of more than 35 million polygons. The dataset is not just a table of point data downloadable as CSV or another text format, as offered by other suppliers. The data can be delivered as a shapefile through a single raw data delivery or through an API.
The January 2021 USPS data source has significantly changed since the previous delivery. Some States have sizably lower ZIP+4 totals across all counties when compared with previous deliveries due to USPS parcelpoint cleanup, while other States have a significant increase in ZIP+4 totals across all counties due to cleanup and other rezoning. California and North Carolina in particular have several new ZIP5s, contributing to the increase in distinct ZIPs and ZIP+4s.
GeoJunxion's ZIP+4 data can be used as an additional layer on an existing map to run customer or other analyses, e.g. who is (and who is not) my customer, or what is the density of my customer base in a certain ZIP+4.
Because the data are polygons, information can be put into visual context, which is useful for complex overviews and management decisions. CRM data can be enriched with the ZIP+4 to provide more detailed customer information.
Key specifications:
Topologized ZIP polygons
GeoJunxion ZIP+4 polygons follow USPS postal codes
ZIP+4 code polygons:
ZIP5 attributes
State codes.
Overlapping ZIP+4 boundaries for multiple ZIP+4 addresses in one area
Updated USPS source (January 2021)
Distinct ZIP5 codes: 34 731
Distinct ZIP+4 codes: 35 146 957
The ZIP+4 polygons are delivered in Esri shapefile format. This format allows the storage of geometry and attribute information for each of the features.
The four components for the shapefile data are:
.shp – This file stores the geometry of the feature
.shx – This file stores an index of the feature geometry
.dbf – This file stores attribute information relating to individual features
.prj – This file stores projection information associated with the features
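Since all four components must travel together, a quick pre-flight check before loading can save confusion. A small sketch (the extension list comes from the description above; the function name is illustrative):

```python
from pathlib import Path

REQUIRED = [".shp", ".shx", ".dbf", ".prj"]

def missing_components(shp_path: str) -> list:
    """Return the required shapefile sidecar extensions that are absent on disk."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED if not base.with_suffix(ext).exists()]
```

An empty result means the shapefile set is complete and safe to load.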
Current release version 2021. Earlier versions from previous years available on request.
Climate change has been shown to influence lake temperatures in different ways. To better understand the diversity of lake responses to climate change and give managers tools to manage individual lakes, we focused on improving prediction accuracy for daily water temperature profiles in 7,150 lakes in Minnesota and Wisconsin during 1980-2019.
The data are organized into these items:
This study was funded by the Department of the Interior Northeast and North Central Climate Adaptation Science Centers. Access to computing facilities was provided by USGS Core Science Analytics and Synthesis Advanced Research Computing, USGS Yeti Supercomputer (https://doi.org/10.5066/F7D798MJ).
This is a compiled geospatial dataset in ESRI polygon shapefile format of ultramafic soils of the Americas showing the location of ultramafic soils in Canada, the United States of America, Mexico, Guatemala, Cuba, Dominican Republic, Puerto Rico, Costa Rica, Colombia, Argentina, Chile, Venezuela, Ecuador, Brazil, Suriname, French Guiana, and Bolivia. The R code used to compile the dataset, as well as an image of the compiled dataset, is also included. The data are derived from ten geospatial datasets. Original datasets were subset to include only ultramafic areas, datasets were assigned a common projection (WGS84), attribute tables were reconciled to a common set of fields, and the datasets were combined.
README: Geodatabase of ultramafic soils of the Americas
Author: Catherine Hulshof, Virginia Commonwealth University, cmhulshof@vcu.edu
Abstract: This is a compiled geospatial dataset in ESRI polygon shapefile format of ultramafic soils of many countries in the Americas showing the location of ultramafic soils in Canada, the United States of America, Guatemala, Cuba, Dominican Republic, Puerto Rico, Costa Rica, Colombia, Argentina, Chile, Venezuela, Ecuador, Brazil, Suriname, French Guiana, and Bolivia. The data are derived from nine geospatial datasets. Original datasets were subset to include only ultramafic areas, datasets were assigned a common projection (WGS84), attribute tables were reconciled to a common set of fields, and the datasets were combined.
Contents: The data are in ESRI shapefile format and thus have four components with extensions .shp, .shx, .prj, and .dbf. The .shp file contains the feature geometries, the .prj file contains the geographic coordin...
This dataset mainly includes the twice-daily (ascending/descending orbit) brightness temperatures (K) of the spaceborne microwave radiometers SSM/I and SSMIS carried by the US Defense Meteorological Satellite Program satellites (DMSP-F08, DMSP-F11, DMSP-F13, and DMSP-F17), with time coverage from September 15, 1987 to December 31, 2015. The SSM/I brightness temperatures of DMSP-F08, DMSP-F11, and DMSP-F13 include seven channels: 19.35H, 19.35V, 22.24V, 37.05H, 37.05V, 85.50H, and 85.50V. The SSMIS brightness temperature observations of DMSP-F17 consist of seven channels: 19.35H, 19.35V, 22.24V, 37.05H, 37.05V, 91.66H, and 91.66V. DMSP-F08 brightness temperature coverage runs from September 15, 1987 to December 31, 1991; DMSP-F11 from January 1, 1992 to December 31, 1995; DMSP-F13 from January 1, 1996 to April 29, 2009; and DMSP-F17 from January 1, 2009 to December 31, 2015.
1. File format and naming: The brightness temperatures are stored by year; each directory consists of the remote sensing data files for each frequency, and the SSMIS data also contain a .TIM time information file.
The data file names and their naming rules are as follows:
EASE-Fnn-ML/HyyyydddA/D.subset.ccH/V (remote sensing data)
EASE-Fnn-ML/HyyyydddA/D.subset.TIM (time information file)
where: EASE stands for the EASE-Grid projection method; Fnn is the satellite number (F08, F11, F13, F17); ML/H stands for multi-channel low resolution and multi-channel high resolution, respectively; yyyy is the year; ddd is the Julian day of the year (1-365/366); A/D stands for ascending (A) or descending (D); subset denotes brightness temperature data over China; cc is the frequency (19.35 GHz, 22.24 GHz, 37.05 GHz, 85.50 GHz, 91.66 GHz); H/V stands for horizontal (H) or vertical (V) polarization.
2. Coordinate system and projection: The projection of this dataset is EASE-Grid, an equal-area secant cylindrical projection with double standard parallels at 30° north and south. For more information about EASE-Grid, please refer to http://www.ncgia.ucsb.edu/globalgrids-book/ease_grid/. To convert the EASE-Grid projection to a geographic projection, refer to the ease2geo.prj file, whose content is as follows:
Input
projection cylindrical
units meters
parameters 6371228 6371228
1 /* Enter projection type (1, 2, or 3)
0 00 00 /* Longitude of central meridian
30 00 00 /* Latitude of standard parallel
Output
projection GEOGRAPHIC
spheroid KRASOVSKY
units dd
parameters
end
3. Data format: Stored as 2-byte integer binary, 308 x 166 values per file. The stored value is the actual brightness temperature multiplied by 10; after reading the data, divide by 10 to obtain the real brightness temperature.
4. Data resolution: Spatial resolution: 25.067525 km, and 12.5 km for SSM/I 85 GHz and SSMIS 91 GHz. Temporal resolution: daily, from 1987 to 2015.
5. Spatial range: Longitude: 60.1°E to 140.0°E; Latitude: 14.9°N to 55.0°N.
6. Data reading: Remote sensing image data files in each set of data can be opened in ArcMap, ENVI, and ERDAS software.
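As a sketch of reading one brightness-temperature file outside of GIS software: the grid is 308 x 166 two-byte integers, and stored values are the brightness temperature multiplied by 10. Byte order and signedness are not stated in the description, so native-order signed integers are an assumption here:

```python
import array

NCOLS, NROWS = 308, 166  # grid size from the description

def read_brightness_temp(path: str) -> list:
    """Read one file of 2-byte integers and rescale to kelvin (stored value = TB * 10)."""
    raw = array.array("h")  # signed 16-bit, native byte order; call raw.byteswap() if needed
    with open(path, "rb") as f:
        raw.fromfile(f, NCOLS * NROWS)
    return [v / 10.0 for v in raw]
```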
The Ontario government generates and maintains thousands of datasets. Since 2012, we have shared data with Ontarians via a data catalogue. Open data is data that is shared with the public. Click here to learn more about open data and why Ontario releases it. Ontario's Open Data Directive states that all data must be open, unless there is good reason for it to remain confidential. Ontario's Chief Digital and Data Officer also has the authority to make certain datasets available publicly. Datasets listed in the catalogue that are not open will have one of the following labels: If you want to use data you find in the catalogue, that data must have a licence – a set of rules that describes how you can use it. A licence: Most of the data available in the catalogue is released under Ontario's Open Government Licence. However, each dataset may be shared with the public under other kinds of licences or no licence at all. If a dataset doesn't have a licence, you don't have the right to use the data. If you have questions about how you can use a specific dataset, please contact us. The Ontario Data Catalogue endeavors to publish open data in a machine-readable format. For machine-readable datasets, you can simply retrieve the file you need using the file URL. The Ontario Data Catalogue is built on CKAN, which means the catalogue has the following features you can use when building applications. APIs (application programming interfaces) let software applications communicate directly with each other. If you are using the catalogue in a software application, you might want to extract data from the catalogue through the catalogue API. Note: All Datastore API requests to the Ontario Data Catalogue must be made server-side. The catalogue's collection of dataset metadata (and dataset files) is searchable through the CKAN API. The Ontario Data Catalogue has more than just CKAN's documented search fields. You can also search these custom fields.
You can also use the CKAN API to retrieve metadata about a particular dataset and check for updated files. Read the complete documentation for CKAN's API. Some of the open data in the Ontario Data Catalogue is available through the Datastore API. You can also search and access the machine-readable open data that is available in the catalogue. How to use the API feature: Read the complete documentation for CKAN's Datastore API. The Ontario Data Catalogue contains a record for each dataset that the Government of Ontario possesses. Some of these datasets will be available to you as open data. Others will not be available to you. This is because the Government of Ontario is unable to share data that would break the law or put someone's safety at risk. You can search for a dataset with a word that might describe a dataset or topic. Use words like “taxes” or “hospital locations” to discover what datasets the catalogue contains. You can search for a dataset from 3 spots on the catalogue: the homepage, the dataset search page, or the menu bar available across the catalogue. On the dataset search page, you can also filter your search results. You can select filters on the left hand side of the page to limit your search for datasets with your favourite file format, datasets that are updated weekly, datasets released by a particular organization, or datasets that are released under a specific licence. Go to the dataset search page to see the filters that are available to make your search easier. You can also do a quick search by selecting one of the catalogue’s categories on the homepage. These categories can help you see the types of data we have on key topic areas. When you find the dataset you are looking for, click on it to go to the dataset record. Each dataset record will tell you whether the data is available, and, if so, tell you about the data available. An open dataset might contain several data files. 
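As an illustration of the CKAN action-API pattern described above, here is a hedged sketch that only builds a standard package_search request URL; the data.ontario.ca host is assumed to be the catalogue's public endpoint, the query is a made-up example, and the actual fetch (which requires network access and, per the note above, should be server-side) is left commented out:

```python
import urllib.parse

# Assumed CKAN action-API base for the Ontario Data Catalogue.
BASE = "https://data.ontario.ca/api/3/action"

def package_search_url(query: str, rows: int = 5) -> str:
    """Build a CKAN package_search URL for a keyword query."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{BASE}/package_search?{params}"

url = package_search_url("hospital locations")
# import urllib.request, json
# with urllib.request.urlopen(url) as resp:  # server-side call
#     datasets = json.load(resp)["result"]["results"]
```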
These files might represent different periods of time, different sub-sets of the dataset, different regions, language translations, or other breakdowns. You can select a file and either download it or preview it. Make sure to read the licence agreement to make sure you have permission to use it the way you want. Read more about previewing data. A non-open dataset may not be available for many reasons. Read more about non-open data. Read more about restricted data. Data that is non-open may still be subject to freedom of information requests. The catalogue has tools that enable all users to visualize the data in the catalogue without leaving the catalogue – no additional software needed. Have a look at our walk-through of how to make a chart in the catalogue. Get automatic notifications when datasets are updated. You can choose to get notifications for individual datasets, an organization's datasets or the full catalogue. You don't have to provide any personal information – just subscribe to our feeds, using the corresponding notification web addresses, with any feed reader you like. Copy those addresses and paste them into your reader. Your feed reader will let you know when the catalogue has been updated. The catalogue provides open data in several file formats (e.g., spreadsheets, geospatial data, etc.). Learn about each format and how you can access and use the data each file contains. A file that has a list of items and values separated by commas without formatting (e.g. colours, italics, etc.) or extra visual features. This format provides just the data that you would display in a table. XLSX (Excel) files may be converted to CSV so they can be opened in a text editor. How to access the data: Open with any spreadsheet software application (e.g., Open Office Calc, Microsoft Excel) or text editor. Note: This format is considered machine-readable; it can be easily processed and used by a computer. Files that have visual formatting (e.g.
bolded headers and colour-coded rows) can be hard for machines to understand; these elements make a file more human-readable and less machine-readable. A file that provides information without formatted text or extra visual features and that may not follow a pattern of separated values like a CSV. How to access the data: Open with any word processor or text editor available on your device (e.g., Microsoft Word, Notepad). A spreadsheet file that may also include charts, graphs, and formatting. How to access the data: Open with a spreadsheet software application that supports this format (e.g., Open Office Calc, Microsoft Excel). Data can be converted to a CSV for a non-proprietary format of the same data without formatted text or extra visual features. A shapefile provides geographic information that can be used to create a map or perform geospatial analysis based on location, points/lines and other data about the shape and features of the area. It includes required files (.shp, .shx, .dbf) and might include corresponding files (e.g., .prj). How to access the data: Open with a geographic information system (GIS) software program (e.g., QGIS). A package of files and folders. The package can contain any number of different file types. How to access the data: Open with an unzipping software application (e.g., WinZIP, 7Zip). Note: If a ZIP file contains .shp, .shx, and .dbf file types, it is an ArcGIS ZIP: a package of shapefiles which provide information to create maps or perform geospatial analysis that can be opened with ArcGIS (a geographic information system software program). A file that provides information related to a geographic area (e.g., phone number, address, average rainfall, number of owl sightings in 2011, etc.) and its geospatial location (i.e., points/lines). How to access the data: Open using a GIS software application to create a map or do geospatial analysis. It can also be opened with a text editor to view raw information.
Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format for sharing data in a machine-readable way that can store data with more unconventional structures such as complex lists. How to access the data: Open with any text editor (e.g., Notepad) or access through a browser. Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A text-based format to store and organize data in a machine-readable way that can store data with more unconventional structures (not just data organized in tables). How to access the data: Open with any text editor (e.g., Notepad). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. A file that provides information related to an area (e.g., phone number, address, average rainfall, number of owl sightings in 2011 etc.) and its geospatial location (i.e., points/lines). How to access the data: Open with a geospatial software application that supports the KML format (e.g., Google Earth). Note: This format is machine-readable, and it can be easily processed and used by a computer. Human-readable data (including visual formatting) is easy for users to read and understand. This format contains files with data from tables used for statistical analysis and data visualization of Statistics Canada census data. How to access the data: Open with the Beyond 20/20 application. A database which links and combines data from different files or applications (including HTML, XML, Excel, etc.). The database file can be converted to a CSV/TXT to make the data machine-readable, but human-readable formatting will be lost. 
How to access the data: Open with Microsoft Office Access (a database management system used to develop application software). A file that keeps the original layout and
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set provides a grid of quads and projection information to be used for rover operations, together with the informal geographic naming convention for the regional geography of Oxia Planum. Both are subject to update prior to the landed mission.
Contents: This data set contains 4 shapefiles and 1 zipped folder.
OxiaPlanum_GeographicFeatures_2021_08_26: point shapefile with the names of geographic features, last updated at the date indicated.
OxiaPlanum_GeographicRegions_2021_08_26: polygon shapefile with the outlines of geographic regions fitted to the master quad grid, last updated at the date indicated.
OxiaPlanum_QuadGrid_1km: polygon shapefile of the 1 km quad grid that will be used for the ExoMars rover mission.
OxiaPlanum_Origin_clong_335_45E_18_20N: the center point of Oxia Planum as defined by the Rover Operations and Control Center, and the origin point used for the quad grid.
CRS_PRJ_Equirectangular_OxiaPlanum_Mars2000.zip: zip folder containing the projection information used for all the data associated with this study, saved in ESRI projection (.prj) and well-known text (.wkt) formats.
Guide to individual files (example):
OxiaPlanum_QuadGrid_1km.cpg: text display information
OxiaPlanum_QuadGrid_1km.dbf: database file
OxiaPlanum_QuadGrid_1km.prj: projection information
OxiaPlanum_QuadGrid_1km.sbx: spatial index file
OxiaPlanum_QuadGrid_1km.shp: shapefile data
This dataset provides site locations as shapefile points. The format is a shapefile for all sites combined (.shp, .shx, .dbf, and .prj files). This dataset is part of a larger data release of metabolism model inputs and outputs for 356 streams and rivers across the United States (https://doi.org/10.5066/F70864KX). The complete release includes: modeled estimates of gross primary productivity, ecosystem respiration, and the gas exchange coefficient; model input data and alternative input data; model fit and diagnostic information; site catchment boundaries and site point locations; and potential predictors of metabolism such as discharge and light availability.
This data set contains spatial data representing the results of data worth analyses based on linear prediction uncertainty analysis using the original GABtran groundwater flow model. Datasets with the suffix 'increase' represent the data worth of observations calculated from their inclusion in a model calibration process. Datasets with the suffix 'decrease' represent the data worth of observations calculated from their removal from a model calibration process. The remaining part of the filename indicates the GABWRA reporting region to which the dataset relates. Projection information is in the file GABWRA.prj. Cell size is 5000 m x 5000 m; the 'no data' value is -9999. This data and metadata were produced by CSIRO for the Great Artesian Basin Water Resource Assessment. The data are used in figures 5.10-5.16 of Welsh WD, Moore CR, Turnadge CJ, Smith AJ and Barr TM (2012) 'Modelling of climate and groundwater development. A technical report to the Australian Government from the CSIRO Great Artesian Basin Water Resource Assessment'. CSIRO Water for a Healthy Country Flagship, Australia. The projection is Albers equal area conic, with central meridian 143 degrees longitude, standard parallels at -21 and -29 degrees latitude, and latitude of projection's origin at -25.
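For reference, an ESRI .prj (well-known text) definition matching those stated parameters might look like the following sketch. The datum is not stated in the text, so GDA94/GRS 1980 is assumed here, and the projection name is illustrative:

```
PROJCS["GABWRA_Albers",
  GEOGCS["GCS_GDA_1994",
    DATUM["D_GDA_1994", SPHEROID["GRS_1980", 6378137.0, 298.257222101]],
    PRIMEM["Greenwich", 0.0],
    UNIT["Degree", 0.0174532925199433]],
  PROJECTION["Albers"],
  PARAMETER["False_Easting", 0.0],
  PARAMETER["False_Northing", 0.0],
  PARAMETER["Central_Meridian", 143.0],
  PARAMETER["Standard_Parallel_1", -21.0],
  PARAMETER["Standard_Parallel_2", -29.0],
  PARAMETER["Latitude_Of_Origin", -25.0],
  UNIT["Meter", 1.0]]
```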
This dataset provides shapefile outlines of the catchments contributing to sites where metabolism was or could have been estimated. The format is a shapefile for all sites combined (.shp, .shx, .dbf, and .prj files). This dataset is part of a larger data release of metabolism model inputs and outputs for 356 streams and rivers across the United States (https://doi.org/10.5066/F70864KX). The complete release includes: modeled estimates of gross primary productivity, ecosystem respiration, and the gas exchange coefficient; model input data and alternative input data; model fit and diagnostic information; site catchment boundaries and site point locations; and potential predictors of metabolism such as discharge and light availability.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Global prevalence of non-perennial rivers and streams
June 2021
Prepared by Mathis L. Messager (mathis.messager@mail.mcgill.ca) and Bernhard Lehner (bernhard.lehner@mcgill.ca)

1. Overview and background
2. Repository content
3. Data format and projection
4. License and citations
4.1 License agreement
4.2 Citations and acknowledgements

1. Overview and background

This documentation describes the data produced for the research article: Messager, M. L., Lehner, B., Cockburn, C., Lamouroux, N., Pella, H., Snelder, T., Tockner, K., Trautmann, T., Watt, C. & Datry, T. (2021). Global prevalence of non-perennial rivers and streams. Nature. https://doi.org/10.1038/s41586-021-03565-5

In this study, we developed a statistical Random Forest model to produce the first reach-scale estimate of the global distribution of non-perennial rivers and streams. For this purpose, we linked quality-checked observed streamflow data from 5,615 gauging stations (on 4,428 perennial and 1,187 non-perennial reaches) with 113 candidate environmental predictors available globally. Predictors included variables describing climate, physiography, land cover, soil, geology, and groundwater, as well as estimates of long-term naturalised (i.e., without anthropogenic water use in the form of abstractions or impoundments) mean monthly and mean annual flow (MAF), derived from a global hydrological model (WaterGAP 2.2; Müller Schmied et al. 2014). Following model training and validation, we predicted the probability of flow intermittence for all river reaches in the RiverATLAS database (Linke et al. 2019), a digital representation of the global river network at high spatial resolution.

The data repository includes two datasets resulting from this study:
1. a geometric network of the global river system where each river segment is associated with:
   i. the 113 hydro-environmental predictors used in model development and predictions, and
   ii. the probability and class of flow intermittence predicted by the model;
2. point locations of the 5,615 gauging stations used in model training/testing, where each station is associated with a line segment representing a reach in the river network, and a set of metadata.

These datasets were generated with source code located at messamat.github.io/globalirmap/.

Note that, although several attributes initially included in RiverATLAS version 1.0 have been updated for this study, the dataset provided here is not an established new version of RiverATLAS.

2. Repository content

The data repository has the following structure (for usage, see section 3. Data format and projection; GIRES stands for Global Intermittent Rivers and Ephemeral Streams):

— GIRES_v10_gdb.zip/ : file geodatabase in ESRI® geodatabase format containing two feature classes (zipped)
|——— GIRES_v10_rivers : river network lines
|——— GIRES_v10_stations : points with streamflow summary statistics and metadata
— GIRES_v10_shp.zip/ : directory containing ten shapefiles (zipped); same content as GIRES_v10_gdb.zip for users that cannot read ESRI geodatabases (tiled by region due to size limitations)
|——— GIRES_v10_rivers_af.shp : Africa
|——— GIRES_v10_rivers_ar.shp : North American Arctic
|——— GIRES_v10_rivers_as.shp : Asia
|——— GIRES_v10_rivers_au.shp : Australasia
|——— GIRES_v10_rivers_eu.shp : Europe
|——— GIRES_v10_rivers_gr.shp : Greenland
|——— GIRES_v10_rivers_na.shp : North America
|——— GIRES_v10_rivers_sa.shp : South America
|——— GIRES_v10_rivers_si.shp : Siberia
|——— GIRES_v10_stations.shp : points with streamflow summary statistics and metadata
— Other_technical_documentations.zip/ : directory containing three documentation files (zipped)
|——— HydroATLAS_TechDoc_v10.pdf : documentation for the river network framework
|——— RiverATLAS_Catalog_v10.pdf : documentation for the river network hydro-environmental attributes
|——— Readme_GSIM_part1.txt : documentation for gauging stations from the Global Streamflow Indices and Metadata (GSIM) archive
— README_Technical_documentation_GIRES_v10.pdf : full documentation for this repository

3. Data format and projection

The geometric network (lines) and gauging station (points) datasets are distributed in both ESRI® file geodatabase and shapefile formats. The file geodatabase contains all data and is the primary, recommended format. Shapefiles are provided as a copy for users that cannot read the geodatabase. Each shapefile consists of five main files (.dbf, .sbn, .sbx, .shp, .shx), and projection information is provided in an ASCII text file (.prj). The attribute table can be accessed as a stand-alone file in dBASE format (.dbf), which is included in the shapefile format. These datasets are distributed as compressed zip files, which must be decompressed before use.

All data layers are provided in a geographic (latitude/longitude) projection referenced to datum WGS84. In ESRI® software this projection is defined by the geographic coordinate system GCS_WGS_1984 and datum D_WGS_1984 (EPSG: 4326).

4. License and citations

4.1 License agreement

This documentation and the datasets are licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). For all regulations regarding license grants, copyright, redistribution restrictions, required attributions, disclaimer of warranty, indemnification, liability, waiver of damages, and a precise definition of licensed materials, please refer to the License Agreement (https://creativecommons.org/licenses/by/4.0/legalcode). For a human-readable summary of the license, please see https://creativecommons.org/licenses/by/4.0/.

4.2 Citations and acknowledgements

Citations and acknowledgements of this dataset should be made as follows: Messager, M. L., Lehner, B., Cockburn, C., Lamouroux, N., Pella, H., Snelder, T., Tockner, K., Trautmann, T., Watt, C. & Datry, T. (2021). Global prevalence of non-perennial rivers and streams. Nature. https://doi.org/10.1038/s41586-021-03565-5

We kindly ask users to cite this study in any published material produced using it. If possible, online links to this repository (https://doi.org/10.6084/m9.figshare.14633022) should also be provided.
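Because the shapefile copy is tiled by region, scripted workflows need to resolve the right file per continent. The sketch below is a hypothetical convenience (the `regional_shapefile` helper is not part of the repository); it encodes the region codes listed above, and the commented `geopandas` call shows how a tile could then be loaded (attribute field names for the intermittence predictions are documented in README_Technical_documentation_GIRES_v10.pdf, not here).

```python
# Hypothetical helper: map GIRES regional tile codes to shapefile names.
REGIONS = {
    "af": "Africa", "ar": "North American Arctic", "as": "Asia",
    "au": "Australasia", "eu": "Europe", "gr": "Greenland",
    "na": "North America", "sa": "South America", "si": "Siberia",
}

def regional_shapefile(code: str) -> str:
    """Return the river-network shapefile name for a region code."""
    if code not in REGIONS:
        raise ValueError(f"unknown GIRES region code: {code!r}")
    return f"GIRES_v10_rivers_{code}.shp"

# Loading a tile (assumes geopandas is installed; layers are EPSG:4326):
# import geopandas as gpd
# rivers = gpd.read_file(regional_shapefile("eu"))
```

For example, `regional_shapefile("eu")` resolves to `GIRES_v10_rivers_eu.shp`.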
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The folder contains the output data relative to the paper:
Citrini, A., Sangiorgio, M., and Rosa, L. Global multi-model trends of unsustainable irrigation water consumption under 21st century climate change scenarios
Contact: lrosa@carnegiescience.edu
## List of files
- Irrigation_regions.zip: Geospatial extent of irrigation regions (WGS 1984, ESRI shapefile)
## Shapefile Structure
The shapefile includes the following files:
- `Irrigation_regions.shp`: Geometry of the objects.
- `Irrigation_regions.shx`: Geometry index.
- `Irrigation_regions.dbf`: Database of attributes associated with the geometry.
- `Irrigation_regions.prj`: Projection file.
- `Irrigation_regions.cpg`: Character encoding file.
## Shapefile Attributes
The attributes present in the `.dbf` file are described below:
- **ID**: [Long] - ID of the irrigation region
- **Name**: [Text] - Name of the irrigation region
- **Country1**: [Text] - Main country covered by the irrigation region, by covered area (ISO 3166-1 alpha-3)
- **Country2**: [Text] - Other countries covered by the irrigation region (ISO 3166-1 alpha-3)
- **Continent**: [Text] - Continent covered by the irrigation region (AF: Africa, AS: Asia, AU: Oceania, EU: Europe, NA: North America, SA: South America)
- **Area_sqkm**: [Double] - Geodesic area of the irrigation region in km2
- UIWC_Irrigation_regions.xlsx: Multi-model results of Unsustainable Irrigation Water Consumption (UIWC) volumes (km3/year) aggregated by irrigation region for each decade of the 21st century under the SSP1-2.6 and SSP5-8.5 scenarios
- Reliance_Irrigation_regions.xlsx: Multi-model results of Reliance on Unsustainable Irrigation Water Consumption (%) aggregated by irrigation region for each decade of the 21st century under the SSP1-2.6 and SSP5-8.5 scenarios
- Irr_cons_Irrigation_region.xlsx: Multi-model results of Irrigation Water Consumption volumes (km3/year) aggregated by irrigation region for each decade of the 21st century under the SSP1-2.6 and SSP5-8.5 scenarios
- Share_Irr_tot_cons_Irrigation_regions.xlsx: Multi-model results of the Share of Irrigation Water Consumption over Total Water Consumption (%) aggregated by irrigation region for each decade of the 21st century under the SSP1-2.6 and SSP5-8.5 scenarios
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset consists of two vector files showing the change in the building stock of the City of Da Nang retrieved from satellite image analysis. Buildings were first identified from a Pléiades satellite image from 24.10.2015 and classified into 9 categories in a semi-automatic workflow described by Warth et al. (2019) and Vetter-Gindele et al. (2019).
In a second step, these buildings were inspected for changes by visual interpretation of a second Pléiades satellite image acquired on 13.08.2017. Changes were classified into 5 categories and aggregated by administrative wards (first dataset: adm) and a hexagon grid of 250 meter length (second dataset: hex).
The full workflow of the generation of this dataset, including a detailed description of its contents and a discussion of its potential use, is published by Braun et al. (2020): Changes in the Building Stock of Da Nang between 2015 and 2017.
Contents
Both datasets (adm and hex) are stored as ESRI shapefiles which can be used in common Geographic Information Systems (GIS) and consist of the following parts:
shp: polygon geometries (geometries of the administrative boundaries and hexagons)
dbf: attribute table (containing the number of buildings per class for 2015 and 2017 and the underlying changes, e.g. the number of new buildings, number of demolished buildings, etc.)
shx: index file combining the geometries with the attributes
cpg: encoding of the attributes (UTF-8)
prj: spatial reference of the datasets (UTM zone 49 North, EPSG:32649) for ArcGIS
qpj: spatial reference of the datasets (UTM zone 49 North, EPSG:32649) for QGIS
lyr: symbology suggestion for the polygons (predefined is the number of local type shophouses in 2017) for ArcGIS
qml: symbology suggestion for the polygons (predefined is the number of new buildings between 2015 and 2017) for QGIS
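To relate the per-cell building counts in the hex dataset to building densities, the area of one grid cell is needed. A sketch, assuming the 250 m figure above denotes the edge length of a regular hexagon (the publication should be consulted for the exact grid definition):

```python
import math

def hexagon_area_km2(edge_m: float) -> float:
    """Area of a regular hexagon with edge length edge_m (in metres),
    returned in km^2, using A = (3 * sqrt(3) / 2) * a**2."""
    return (3 * math.sqrt(3) / 2) * edge_m**2 / 1e6
```

Under this assumption, a 250 m hexagon covers roughly 0.16 km2, so a count of, say, 32 buildings in one cell corresponds to about 200 buildings per km2.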
Citation and documentation
To cite this dataset, please refer to the publication:
Braun, A.; Warth, G.; Bachofer, F.; Quynh Bui, T.T.; Tran, H.; Hochschild, V. (2020): Changes in the Building Stock of Da Nang between 2015 and 2017. Data, 5, 42. doi:10.3390/data5020042
This article contains a detailed description of the dataset, the defined building type classes and the types of changes which were analyzed. Furthermore, the article makes recommendations on the use of the datasets and discusses potential error sources.