Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this course, you will learn to work within the free and open-source R environment, with a specific focus on working with and analyzing geospatial data. We will cover a wide variety of data and spatial data analytics topics, and you will learn how to code in R along the way. The Introduction module provides more background about the course and course setup. This course is designed for someone with some prior GIS knowledge. For example, you should know the basics of working with maps, map projections, and vector and raster data, and you should be able to perform common spatial analysis tasks and make map layouts. If you do not have a GIS background, we recommend checking out the West Virginia View GIScience class. We do not assume that you have any prior experience with R or with coding, so don't worry if you haven't developed these skill sets yet; that is a major goal of this course. Background material is provided using code examples, videos, and presentations, and assignments offer hands-on learning opportunities. Data links for the lecture modules are provided within each module, while data for the assignments are linked to the assignment buttons below. Please see the sequencing document for our suggested order in which to work through the material. After completing this course you will be able to:
- prepare, manipulate, query, and generally work with data in R
- perform data summarization, comparisons, and statistical tests
- create quality graphs, map layouts, and interactive web maps to visualize data and findings
- present your research, methods, results, and code as web pages to foster reproducible research
- work with spatial data in R
- analyze vector and raster geospatial data to answer a question with a spatial component
- make spatial models and predictions using regression and machine learning
- code in the R language at an intermediate level
Our dataset delivers unprecedented scale and diversity for geospatial AI training:
🌍 Massive scale: 165,000 unique 3D map sequences and locations, 82,500,000 images, and 0.73 PB of data, orders of magnitude larger than datasets currently used for SOTA vision/spatial models.
⏱️ Constantly growing dataset: 12k new 3D Map sequences and locations monthly.
📷 Full-frame, high-res captures: OVER retains full-resolution, dynamic-aspect-ratio images with complete Exif metadata (GPS, timestamp, device orientation), multiple resolutions from 1920×1080 to 3840×2880, and pre-computed COLMAP poses.
🧭 Global diversity: Environments span urban, suburban, rural, and natural settings across 120+ countries, capturing architectural, infrastructural, and environmental variety.
📐 Rich metadata: Per-image geolocation (±3 m accuracy), timestamps, device pose, COLMAP pose; per-map calibration data (camera intrinsics/extrinsics).
🧠 Applications: Spatial Models Training, Multi-view stereo & NeRF/3DGS training, semantic segmentation, novel view synthesis, 3D object detection, geolocation, urban planning, AR/VR, autonomous navigation.
The establishment of a BES Multi-User Geodatabase (BES-MUG) allows for the storage, management, and distribution of geospatial data associated with the Baltimore Ecosystem Study. At present, BES data is distributed over the internet via the BES website. While having geospatial data available for download is a vast improvement over having the data housed at individual research institutions, it still suffers from some limitations. BES-MUG overcomes these limitations, improving the quality of the geospatial data available to BES researchers and thereby leading to more informed decision-making.

BES-MUG builds on Environmental Systems Research Institute's (ESRI) ArcGIS and ArcSDE technology. ESRI was selected because its geospatial software offers robust capabilities; ArcGIS is implemented agency-wide within the USDA and is the predominant geospatial software package used by collaborating institutions. Commercially available enterprise database packages (DB2, Oracle, SQL Server) provide an efficient means to store, manage, and share large datasets. However, standard database capabilities are limited with respect to geographic datasets because they lack the ability to deal with complex spatial relationships. By using ESRI's ArcSDE (Spatial Database Engine) in conjunction with database software, geospatial data can be handled much more effectively through the implementation of the Geodatabase model. Through ArcSDE and the Geodatabase model, the database's capabilities are expanded, allowing for multiuser editing, intelligent feature types, and the establishment of rules and relationships. ArcSDE also allows users to connect to the database using ArcGIS software without being burdened by the intricacies of the database itself.

For an example of how BES-MUG will help improve the quality and timeliness of BES geospatial data, consider a census block group layer that is in need of updating. Rather than the researcher downloading the dataset, editing it, and resubmitting it through ORS, access rules will allow the authorized user to edit the dataset over the network. Established rules will ensure that attribute and topological integrity is maintained, so that key fields are not left blank and block group boundaries stay within tract boundaries. Metadata will automatically be updated to show who edited the dataset and when, in the event any questions arise.

Currently, a functioning prototype multi-user database has been developed for BES at the University of Vermont Spatial Analysis Lab, using ArcSDE and IBM's DB2 Enterprise Database as a back-end architecture. This database, which is currently only accessible to those on the UVM campus network, will shortly be migrated to a Linux server where it will be accessible for database connections over the Internet. Passwords can then be handed out to all interested researchers on the project, who will be able to make a database connection through the GIS software interface on their desktop computer.

This database will include a very large number of thematic layers, currently divided into biophysical, socio-economic, and imagery categories. Biophysical layers include data on topography, soils, forest cover, habitat areas, hydrology, and toxics. Socio-economic layers include political and administrative boundaries, transportation and infrastructure networks, property data, census data, household survey data, parks, protected areas, land use/land cover, zoning, public health, and historic land use change.
Imagery includes a variety of aerial and satellite imagery. See the readme: http://96.56.36.108/geodatabase_SAL/readme.txt See the file listing: http://96.56.36.108/geodatabase_SAL/diroutput.txt
Learn Geographic Mapping with Altair, Vega-Lite and Vega using Curated Datasets
Complete geographic and geophysical data collection for mapping and visualization. This consolidation includes 18 complementary datasets used by 31+ Vega, Vega-Lite, and Altair examples 📊. Perfect for learning geographic visualization techniques including projections, choropleths, point maps, vector fields, and interactive displays.
Source data lives on GitHub and can also be accessed via CDN. The vega-datasets project serves as a common repository for example datasets used across these visualization libraries and related projects.
The collection spans points (like airports.csv), lines (like londonTubeLines.json), and polygons (like us-10m.json), plus gridded geophysical data (windvectors.csv, annual-precip.json). This pack includes 18 datasets covering base maps, reference points, statistical data for choropleths, and geophysical data.
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Map (1:10m) | us-10m.json | 627 KB | TopoJSON | CC-BY-4.0 | US state and county boundaries. Contains states and counties objects. Ideal for choropleths. | id (FIPS code) property on geometries |
| World Map (1:110m) | world-110m.json | 117 KB | TopoJSON | CC-BY-4.0 | World country boundaries. Contains countries object. Suitable for world-scale viz. | id property on geometries |
| London Boroughs | londonBoroughs.json | 14 KB | TopoJSON | CC-BY-4.0 | London borough boundaries. | properties.BOROUGHN (name) |
| London Centroids | londonCentroids.json | 2 KB | GeoJSON | CC-BY-4.0 | Center points for London boroughs. | properties.id, properties.name |
| London Tube Lines | londonTubeLines.json | 78 KB | GeoJSON | CC-BY-4.0 | London Underground network lines. | properties.name, properties.color |
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Airports | airports.csv | 205 KB | CSV | Public Domain | US airports with codes and coordinates. | iata, state, `l... |
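As a quick start, here is a minimal Altair sketch (assuming the vega_datasets Python package, which serves these same files) that layers the airports points over the us-10m.json state boundaries:

```python
import altair as alt
from vega_datasets import data

# US state boundaries from us-10m.json (TopoJSON "states" object)
states = alt.topo_feature(data.us_10m.url, "states")

# Gray basemap using the Albers USA projection
background = alt.Chart(states).mark_geoshape(
    fill="lightgray", stroke="white"
).project("albersUsa").properties(width=600, height=400)

# Airports from airports.csv plotted by longitude/latitude
points = alt.Chart(data.airports.url).mark_circle(size=10).encode(
    longitude="longitude:Q",
    latitude="latitude:Q",
    tooltip=["iata:N", "state:N"],
)

chart = background + points
chart.save("us_airports.html")
```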
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.
| Feature | Description | Notes |
|---|---|---|
| 10 Features | Economic, environmental & social indicators | Realistically scaled |
| 300 Cities | Europe, Asia, Americas, Africa, Oceania | Diverse distributions |
| Strong Correlations | Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6) | ML-ready |
| No Missing Values | Clean, preprocessed data | Ready for analysis |
| 4-5 Natural Clusters | Metropolitan hubs, eco-towns, developing centers | Pre-validated |
✅ Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
✅ Regional Diversity: Each region has distinct economic and environmental characteristics
✅ Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
✅ Beginner-Friendly: No data cleaning required, includes example code
✅ Documented: Comprehensive README with methodology and use cases
```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load and prepare: drop the identifier columns, keep the numeric features
df = pd.read_csv('city_lifestyle_dataset.csv')
X = df.drop(['city_name', 'country'], axis=1)
X_scaled = StandardScaler().fit_transform(X)

# Cluster into the 5 expected lifestyle archetypes
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df['cluster'] = kmeans.fit_predict(X_scaled)

# Analyze: average feature values per cluster (numeric columns only)
print(df.groupby('cluster').mean(numeric_only=True))
```
After working with this dataset, you will be able to:
1. Apply K-Means, DBSCAN, and Hierarchical Clustering
2. Use PCA for dimensionality reduction and visualization
3. Interpret correlation matrices and feature relationships
4. Create geographic visualizations with cluster assignments
5. Profile and name discovered clusters based on characteristics
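For outcome 2, a minimal sketch continuing from the starter code above (X_scaled and df['cluster'] as defined there; the matplotlib plotting choices are my own):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Project the scaled features onto the first two principal components
pca = PCA(n_components=2)
coords = pca.fit_transform(X_scaled)

# Color each city by its K-Means cluster label from the starter code
plt.scatter(coords[:, 0], coords[:, 1], c=df['cluster'], cmap='tab10', s=15)
plt.xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.0%} of variance)")
plt.ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.0%} of variance)")
plt.title("City lifestyle clusters in PCA space")
plt.show()
```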
| Cluster | Characteristics | Example Cities |
|---|---|---|
| Metropolitan Tech Hubs | High income, density, rent | Silicon Valley, Singapore |
| Eco-Friendly Towns | Low density, clean air, high happiness | Nordic cities |
| Developing Centers | Mid income, high density, poor air | Emerging markets |
| Low-Income Suburban | Low infrastructure, income | Rural areas |
| Industrial Mega-Cities | Very high density, pollution | Manufacturing hubs |
Unlike random synthetic data, this dataset was carefully engineered with: - ✨ Realistic correlation structures based on urban research - 🌍 Regional characteristics matching real-world patterns - 🎯 Optimal cluster separability (validated via silhouette scores) - 📚 Comprehensive documentation and starter code
✓ Learn clustering without data cleaning hassles
✓ Practice PCA and dimensionality reduction
✓ Create beautiful geographic visualizations
✓ Understand feature correlation in real-world contexts
✓ Build a portfolio project with clear business insights
This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.
Happy Clustering! 🎉
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication data for the turnout example in Chapter 6 of Spatial Analysis for the Social Sciences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geostatistics analyzes and predicts the values associated with spatial or spatial-temporal phenomena. It incorporates the spatial (and in some cases temporal) coordinates of the data within the analyses. It is a practical means of describing spatial patterns and interpolating values for locations where samples were not taken (and measuring the uncertainty of those values, which is critical to informed decision making). This archive contains results of geostatistical analysis of COVID-19 case counts for all available US counties. Results were obtained with ArcGIS Pro (ESRI). Sources are state health departments, which are scraped and aggregated by the Johns Hopkins Coronavirus Resource Center and then pre-processed by MappingSupport.com.
This update of the Zenodo dataset (version 6) consists of three compressed archives containing geostatistical analyses of SARS-CoV-2 testing data. This dataset utilizes many of the geostatistical techniques used in previous versions of this Zenodo archive, but has been significantly expanded to include analyses of up-to-date U.S. COVID-19 case data (from March 24th to September 8th, 2020):
Archive #1: “1.Geostat. Space-Time analysis of SARS-CoV-2 in the US (Mar24-Sept6).zip” – results of a geostatistical analysis of COVID-19 cases incorporating spatially-weighted hotspots that are conserved over one-week timespans. Results are reported starting from when U.S. COVID-19 case data first became available (March 24th, 2020) for 25 consecutive 1-week intervals (March 24th through to September 6th, 2020). Hotspots, where found, are reported in each individual state, rather than the entire continental United States.
Archive #2: "2.Geostat. Spatial analysis of SARS-CoV-2 in the US (Mar24-Sept8).zip" – the results from geostatistical spatial analyses only of corrected COVID-19 case data for the continental United States, spanning the period from March 24th through September 8th, 2020. The geostatistical techniques utilized in this archive include ‘Hot Spot’ analysis and ‘Cluster and Outlier’ analysis.
Archive #3: "3.Kriging and Densification of SARS-CoV-2 in LA and MA.zip" – this dataset provides preliminary kriging and densification analysis of COVID-19 case data for certain dates within the U.S. states of Louisiana and Massachusetts.
These archives consist of map files (as both static images and as animations) and data files (including text files which contain the underlying data of said map files [where applicable]) which were generated when performing the following Geostatistical analyses: Hot Spot analysis (Getis-Ord Gi*) [‘Archive #1’: consecutive weeklong Space-Time Hot Spot analysis; ‘Archive #2’: daily Hot Spot Analysis], Cluster and Outlier analysis (Anselin Local Moran's I) [‘Archive #2’], Spatial Autocorrelation (Global Moran's I) [‘Archive #2’], and point-to-point comparisons with Kriging and Densification analysis [‘Archive #3’].
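The archives were produced with ArcGIS Pro, but the same statistics can be reproduced with open-source tooling. A minimal PySAL sketch of Global Moran's I, with hypothetical file and column names:

```python
import geopandas as gpd
from libpysal.weights import Queen
from esda.moran import Moran

# Hypothetical county layer with a cumulative case-count column
counties = gpd.read_file("us_counties_covid.shp")

# Queen-contiguity spatial weights, row-standardized
w = Queen.from_dataframe(counties)
w.transform = "r"

# Global Moran's I on case counts, with a permutation-based pseudo p-value
mi = Moran(counties["cases"], w)
print(f"Moran's I = {mi.I:.3f}, pseudo p-value = {mi.p_sim:.4f}")
```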
The Word document provided ("Description-of-Archive.Updated-Geostatistical-Analysis-of-SARS-CoV-2 (version 6).docx") details the contents of each file and folder within these three archives and gives general interpretations of these results.
The files linked to this reference are the geospatial data created as part of the completion of the baseline vegetation inventory project for the NPS park unit. The current format is an ArcGIS file geodatabase, but older formats may exist as shapefiles. We converted the photointerpreted data into a format usable in a geographic information system (GIS) by employing three fundamental processes: (1) orthorectify, (2) digitize, and (3) develop the geodatabase. All digital map automation was projected in Universal Transverse Mercator (UTM), Zone 16, using the North American Datum of 1983 (NAD83).

Orthorectify: We orthorectified the interpreted overlays by using OrthoMapper, a softcopy photogrammetric software package for GIS. One function of OrthoMapper is to create orthorectified imagery from scanned and unrectified imagery (Image Processing Software, Inc., 2002). The software features a method of visual orientation involving a point-and-click operation that uses existing orthorectified horizontal and vertical base maps. Of primary importance to us, OrthoMapper also has the capability to orthorectify the photointerpreted overlays of each photograph based on the reference information provided.

Digitize: To produce a polygon vector layer for use in ArcGIS (Environmental Systems Research Institute [ESRI], Redlands, California), we converted each raster-based image mosaic of orthorectified overlays containing the photointerpreted data into a grid format by using ArcGIS. In ArcGIS, we used the ArcScan extension to trace the raster data and produce ESRI shapefiles. We digitally assigned map-attribute codes (both map-class codes and physiognomic modifier codes) to the polygons and checked the digital data against the photointerpreted overlays for line and attribute consistency. Ultimately, we merged the individual layers into a seamless layer.

Geodatabase: At this stage, the map layer has only map-attribute codes assigned to each polygon. To assign meaningful information to each polygon (e.g., map-class names, physiognomic definitions, links to NVCS types), we produced a feature-class table, along with other supportive tables, and subsequently related them together via an ArcGIS geodatabase. This geodatabase also links the map to other feature-class layers produced from this project, including vegetation sample plots, accuracy assessment (AA) sites, aerial photo locations, and project boundary extent. A geodatabase provides access to a variety of interlocking data sets, is expandable, and equips resource managers and researchers with a powerful GIS tool.
This project is a component of a broader effort focused on geothermal heating and cooling (GHC) with the aim of illustrating the numerous benefits of incorporating GHC and geothermal heat exchange (GHX) into community energy planning and national decarbonization strategies. To better assist private sector investment, it is currently necessary to define and assess the potential of low-temperature geothermal resources. For shallow GHC/GHX fields, there is no formal compilation of subsurface characteristics shared among industry practitioners that can improve system design and operations. Alaska is specifically noted in this work because, heretofore, it has not received a similar focus in geothermal potential evaluations as the contiguous United States. The methodology consists of leveraging relevant data to generate a baseline geospatial dataset of low-temperature resources (less than 150 degrees C) to compare and analyze information accessible to anyone trying to understand the potential of GHC/GHX and small-scale low-temperature geothermal power in Alaska (e.g., energy modelers, communities, planners, and policymakers). Importantly, this project identifies data related to (1) the evaluation of GHC/GHX in the shallow subsurface, and (2) the evaluation of low-temperature geothermal resource availability. Additionally, data is being compiled to assess repurposing of oil and gas wells to contribute co-produced fluids toward the geothermal direct use and heating and cooling resource potential. In this work we identified new data from three different datasets of isolated geothermal systems in Alaska and bottom-hole temperature data from oil and gas wells that can be leveraged for evaluation of low-temperature geothermal resource potential. The goal of this project is to facilitate future deployment of GHC/GHX analysis and community-led programs and to update the low-temperature geothermal resources assessment of Alaska. A better understanding of shallow potential for GHX will improve design and operations of highly efficient GHC systems. The deployment and impact that can be achieved for low-temperature geothermal resources will contribute to decarbonization goals and facilitate widespread electrification by shaving and shifting grid loads. Most of the data uses the WGS84 coordinate system; however, each dataset comes from a different source and has a metadata file with the original coordinate system.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Have you ever wanted to create your own maps, or integrate and visualize spatial datasets to examine changes in trends between locations and over time? Follow along with these training tutorials on QGIS, an open source geographic information system (GIS) and learn key concepts, procedures and skills for performing common GIS tasks – such as creating maps, as well as joining, overlaying and visualizing spatial datasets. These tutorials are geared towards new GIS users. We’ll start with foundational concepts, and build towards more advanced topics throughout – demonstrating how with a few relatively easy steps you can get quite a lot out of GIS. You can then extend these skills to datasets of thematic relevance to you in addressing tasks faced in your day-to-day work.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper develops a new method for analyzing the relationship between a set of points and another single point, the latter of which we call a reference point.
Public Domain Mark: https://creativecommons.org/share-your-work/public-domain/pdm
This collection consists of geospatial data layers and summary data at the country and country sub-division levels that are part of USAID's Demographic Health Survey Spatial Data Repository. This collection includes geographically linked health and demographic data from the DHS Program and the U.S. Census Bureau for mapping in a geographic information system (GIS). The data includes indicators related to fertility, family planning, maternal and child health, gender, HIV/AIDS, literacy, malaria, nutrition, and sanitation. Each set of files is associated with a specific health survey for a given year for over 90 different countries that were part of the following surveys:
- Demographic Health Survey (DHS)
- Malaria Indicator Survey (MIS)
- Service Provisions Assessment (SPA)
- Other qualitative surveys (OTH)

Individual files are named with identifiers that indicate country, survey year, survey, and in some cases the name of a variable or indicator. A list of the two-letter country codes is included in a CSV file. Datasets are subdivided into the following folders:
- Survey boundaries: polygon shapefiles of administrative subdivision boundaries for countries used in specific surveys.
- Indicator data: polygon shapefiles and geodatabases of countries and subdivisions with 25 of the most common health indicators collected in the DHS; estimates generated from survey data.
- Modeled surfaces: geospatial raster files that represent gridded population and health indicators generated from survey data, for several countries.
- Geospatial covariates: CSV files that link survey cluster locations to ancillary data (known as covariates) on topics including population, climate, and environmental factors.
- Population estimates: spreadsheets and polygon shapefiles for countries and subdivisions with 5-year age/sex group population estimates and projections for 2000-2020 from the US Census Bureau, for designated countries in the PEPFAR program.
- Workshop materials: a tutorial with sample data for learning how to map health data using DHS SDR datasets with QGIS.

Documentation that is specific to each dataset is included in the subfolders, and a methodological summary for all of the datasets is included in the root folder as an HTML file. File-level metadata is available for most files. Countries for which data is included in the repository: Afghanistan, Albania, Angola, Armenia, Azerbaijan, Bangladesh, Benin, Bolivia, Botswana, Brazil, Burkina Faso, Burundi, Cape Verde, Cambodia, Cameroon, Central African Republic, Chad, Colombia, Comoros, Congo, Congo (Democratic Republic of the), Cote d'Ivoire, Dominican Republic, Ecuador, Egypt, El Salvador, Equatorial Guinea, Eritrea, Eswatini (Swaziland), Ethiopia, Gabon, Gambia, Ghana, Guatemala, Guinea, Guyana, Haiti, Honduras, India, Indonesia, Jordan, Kazakhstan, Kenya, Kyrgyzstan, Lesotho, Liberia, Madagascar, Malawi, Maldives, Mali, Mauritania, Mexico, Moldova, Morocco, Mozambique, Myanmar, Namibia, Nepal, Nicaragua, Niger, Nigeria, Pakistan, Papua New Guinea, Paraguay, Peru, Philippines, Russia, Rwanda, Samoa, Sao Tome and Principe, Senegal, Sierra Leone, South Africa, Sri Lanka, Sudan, Tajikistan, Tanzania, Thailand, Timor-Leste, Togo, Trinidad and Tobago, Tunisia, Turkey, Turkmenistan, Uganda, Ukraine, Uzbekistan, Viet Nam, Yemen, Zambia, Zimbabwe.
Terms of use: https://www.icpsr.umich.edu/web/ICPSR/studies/8379/terms
This dataset consists of cartographic data in digital line graph (DLG) form for the northeastern states (Connecticut, Maine, Massachusetts, New Hampshire, New York, Rhode Island and Vermont). Information is presented on two planimetric base categories, political boundaries and administrative boundaries, each available in two formats: a topologically structured format and a simpler format optimized for graphic display. These DLG data can be used to plot base maps and for various kinds of spatial analysis. They may also be combined with other geographically referenced data, such as the Geographic Names Information System, to facilitate analysis.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An example of voter data with real, Zip4, Street Segment, Census Block Group, and Zip centroid details.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This resource contains the pre-computed land indication datasets (also known as the "Prior" datasets) developed for the preprint "Methodological Framework for Determining the Land Eligibility of Renewable Energy Sources" [1] and evaluated in the publication "Evaluating Land Eligibility Constraints of Renewable Energy Sources in Europe" [2]. Please cite these sources if this data is used in published works. Note that the original sources from which these datasets were processed are also indicated in the README.txt file, which should also be credited.
The main aspects of the Prior datasets are:
1. Each Prior dataset is a raster file covering the European domain (see note N1)
   * Spatial reference system: "EPSG:3035"
   * Spatial resolution: 100 meters
   * Datatype: "UInt8"
2. Each dataset represents exactly one geospatial criterion
   * Example: "road_proximity" = the distance of each pixel from the nearest roadway
3. Pixel values represent edge indexes, rather than the explicit values themselves
   * This was chosen to conserve storage space
   * Example for "road_proximity": index 0 = "within 0 meters", index 1 = "within 50 meters", index 2 = "within 100 meters", index 3 = "within 200 meters"
4. Each Prior dataset represents a processed view of the fundamental data source; therefore, if any of this data is used in a published work, the fundamental source should also be cited
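To illustrate the edge-index encoding, a minimal rasterio sketch (hypothetical file name; the full index-to-edge mapping for each dataset is documented in README.txt) that converts indexes back into edge distances:

```python
import numpy as np
import rasterio

# Truncated lookup from the "road_proximity" example above; the real
# table for each Prior dataset comes from README.txt
edge_values_m = np.array([0, 50, 100, 200], dtype=np.float32)

with rasterio.open("road_proximity.tif") as src:
    idx = src.read(1)  # UInt8 band of edge indexes

# Map each pixel's index to its edge distance in meters
# (assumes the lookup covers every index value present in the band)
distance_m = edge_values_m[idx]
```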
View README.txt for more information
The following datasets are available with this resource:
1. agriculture_arable_proximity
2. agriculture_heterogeneous_proximity
3. agriculture_pasture_proximity
4. agriculture_permanent_crop_proximity
5. agriculture_proximity
6. airfield_proximity
7. airport_proximity
8. camping_proximity
9. dni_threshold
10. elevation_threshold
11. ghi_threshold
12. industrial_proximity
13. lake_proximity
14. leisure_proximity
15. mining_proximity
16. ocean_proximity
17. power_line_proximity
18. connection_distance
19. protected_biosphere_proximity
20. protected_bird_proximity
21. protected_habitat_proximity
22. protected_landscape_proximity
23. protected_natural_monument_proximity
24. protected_park_proximity
25. protected_reserve_proximity
26. protected_wilderness_proximity
27. railway_proximity
28. river_proximity
29. roads_main_proximity
30. roads_proximity
31. access_distance
32. roads_secondary_proximity
33. sand_proximity
34. settlement_proximity
35. settlement_urban_proximity
36. slope_north_facing_threshold
37. slope_threshold
38. touristic_proximity
39. waterbody_proximity
40. wetland_proximity
41. windspeed_100m_threshold
42. windspeed_50m_threshold
43. woodland_coniferous_proximity
44. woodland_deciduous_proximity
45. woodland_mixed_proximity
46. woodland_proximity
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Spatial Transcriptomics (ST) data matching with Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry Imaging (MALDI-MSI). This data is complementary to data contained in the same project: files with the same identifiers in the two datasets originated from the very same tissue section and can be combined into a multimodal ST-MSI object. For more information about the dataset please see our manuscript posted on BioRxiv (doi: https://doi.org/10.1101/2023.01.26.525195). This dataset includes ST data from 19 tissue sections, including human post-mortem and mouse samples. The spatial transcriptomics data was generated using the Visium protocol (10x Genomics). The murine tissue sections come from three different mice unilaterally injected with 6-OHDA, a neurotoxin that, when injected in the brain, can selectively destroy dopaminergic neurons. We used this mouse model to show the applicability of the technology that we developed, named Spatial Multimodal Analysis (SMA). Using our technology on these mouse brain tissue sections, we were able to detect both dopamine with MALDI-MSI and the corresponding gene expression with ST. This dataset also includes one human post-mortem striatum sample that was placed on one Visium slide across the four capture areas. This sample was analyzed with a different ST protocol named RRST (Mirzazadeh, R., Andrusivova, Z., Larsson, L. et al. Spatially resolved transcriptomic profiling of degraded and challenging fresh frozen samples. Nat Commun 14, 509 (2023). https://doi.org/10.1038/s41467-023-36071-5), where probes capturing the whole transcriptome are first hybridized in the tissue section and then spatially detected.

Each tissue section in the dataset has been given a unique identifier composed of the Visium array ID and capture area ID of the Visium slide that the tissue section was placed on. This unique identifier is included in the file names of all files relative to the same tissue section, including the MALDI-MSI files published in the other dataset included in this project. In this dataset you will find the following files for each tissue section:
- Raw files: the read one fastq files (containing the pattern *R1*fastq.gz in the file name), read two fastq files (containing the pattern *R2*fastq.gz in the file name), and the raw microscope images (containing the pattern Spot.jpg in the file name). These are the only files needed to run the Space Ranger pipeline, which is freely available for any user (please see the 10x Genomics website for information on how to install and run Space Ranger).
- Processed data files, of three types: (a) Space Ranger outputs that were used to produce the figures in our publication; (b) manual annotation tables in csv format produced using Loupe Browser 6 (csv tables with file names ending in the patterns _RegionLoupe.csv, _filter.csv, _dopamine.csv, _lesion.csv, _region.csv); (c) json files that we used as input for Space Ranger in the cases where the automatic tissue detection included in the pipeline failed to recognize the tissue or the fiducials. Using these processed files, the user can reproduce the figures of our publication without having to restart from the raw data files.

The MALDI-MSI analysis preceding ST was performed with different matrices on different tissue sections. We used (1) 9-aminoacridine (9-AA) for detection of metabolites in negative ionization mode, (2) 2,5-dihydroxybenzoic acid (DHB) for detection of metabolites in positive ionization mode, and (3) 4-(anthracen-9-yl)-2-fluoro-1-ethylpyridin-1-ium iodide (FMP-10), which charge-tags molecules with phenolic hydroxyls and/or primary amines, including neurotransmitters. The information about which matrix was sprayed on each tissue section, and other information about the samples, is included in the metadata table. We also used three types of control samples:
- Standard Visium: samples processed with standard Visium (i.e., no matrix spraying, no MALDI-MSI; protocol as recommended by 10x Genomics with no exceptions).
- Internal controls (iCTRL): samples not sprayed with any matrix nor processed with MALDI-MSI, but located on the same Visium slide where other samples were processed with MALDI-MSI.
- FMP-10-iCTRL: a sample sprayed with FMP-10 and then processed as an iCTRL.

This and other information is provided in the metadata table.
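As an example of working with the processed files, a minimal sketch loading one section's Space Ranger output with scanpy (the tool choice and folder name are assumptions of mine; the identifier follows the array-ID + capture-area naming convention described above):

```python
import scanpy as sc

# Hypothetical Space Ranger output folder for one tissue section,
# named by Visium array ID and capture area ID
adata = sc.read_visium("V10A27-004_A1/outs")
adata.var_names_make_unique()

# Light QC and normalization before pairing with the MALDI-MSI modality
sc.pp.filter_genes(adata, min_counts=1)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
print(adata)
```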
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
I wanted to make some geospatial visualizations to convey the current severity of COVID-19 in different parts of the U.S.
I liked the NYTimes COVID dataset, but it lacked county boundary shape data, population per county, new cases/deaths per day, per-capita calculations, and county demographics.
After a lot of work tracking down the different data sources I wanted and doing all of the data wrangling and joins in python, I wanted to open-source the final enriched data set in order to give others a head start in their COVID-19 related analytic, modeling, and visualization efforts.
This dataset is enriched with county shapes, county center point coordinates, 2019 census population estimates, county population densities, cases and deaths per capita, and calculated per-day cases/deaths metrics. It contains daily data per county back to January, allowing for analyzing changes over time.
UPDATE: I have also included demographic information per county, including ages, races, and gender breakdown. This could help determine which counties are most susceptible to an outbreak.
Geospatial analysis and visualization ideas:
- Which counties are currently getting hit the hardest (per capita and totals)?
- What patterns are there in the spread of the virus across counties (e.g., network-based spread simulations using county center lat/lons)?
- Do county population densities play a role in how quickly the virus spreads?
- How do a specific county's or state's cases and deaths compare to other counties/states?
- Join with other county-level datasets easily (with the fips code column).
See the column descriptions for more details on the dataset
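For example, the per-day metrics can be recomputed from the cumulative counts with pandas (the file and column names here are assumptions based on the description):

```python
import pandas as pd

# Hypothetical file/column names: date, fips, cases, deaths, population
df = pd.read_csv("us_counties_covid19_enriched.csv", parse_dates=["date"])
df = df.sort_values(["fips", "date"])

# New cases/deaths per day: day-over-day difference of cumulative counts
df["new_cases"] = df.groupby("fips")["cases"].diff().fillna(df["cases"])
df["new_deaths"] = df.groupby("fips")["deaths"].diff().fillna(df["deaths"])

# Per-capita rate per 100k residents, using the 2019 population estimates
df["cases_per_100k"] = df["cases"] / df["population"] * 1e5
```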
COVID-19 U.S. Time-lapse: Confirmed Cases per County (per capita)
Example animation: https://github.com/ringhilterra/enriched-covid19-data/blob/master/example_viz/covid-cases-final-04-06.gif?raw=true
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
Microsoft released a U.S.-wide vector building dataset in 2018. Although the vector building layers provide relatively accurate geometries, their use in large-extent geospatial analysis comes at a high computational cost. We used High-Performance Computing (HPC) to develop an algorithm that calculates six summary values for each cell in a raster representation of each U.S. state, excluding Alaska and Hawaii: (1) total footprint coverage, (2) number of unique buildings intersecting each cell, (3) number of building centroids falling inside each cell, and the (4) average, (5) smallest, and (6) largest area of buildings that intersect each cell. These values are represented as raster layers with 30 m cell size covering the 48 conterminous states. We also identify errors in the original building dataset. We evaluate precision and recall in the data for three large U.S. urban areas. Precision is high and comparable to results reported by Microsoft, while recall is high for buildings with footprints larger than 200 m2 but lower for progressively smaller buildings.
Building footprints are a critical environmental descriptor. Microsoft produced a U.S.-wide vector building dataset in 2018 [1] that was generated from aerial images available to Bing Maps using deep learning methods for object classification [2]. The main goal of this product has been to increase the coverage of building footprints available for OpenStreetMap. Microsoft identified building footprints in two phases: first, using semantic segmentation to identify building pixels from aerial imagery using deep neural networks, and second, converting building pixel blobs into polygons. The final dataset includes 125,192,184 building footprint polygon geometries in GeoJSON vector format, covering all 50 U.S. states, with data for each state distributed separately. These data have 99.3% precision and 93.5% pixel recall accuracy [2]. Temporal resolution of the data (i.e., years of the aerial imagery used to derive the data) is not provided by Microsoft in the metadata.
Using vector layers for large-extent (i.e., national or state-level) spatial analysis and modelling (e.g., mapping the Wildland-Urban Interface, flood and coastal hazards, or large-extent urban typology modelling) is challenging in practice. Although vector data provide accurate geometries, incorporating them in large-extent spatial analysis comes at a high computational cost. We used High-Performance Computing (HPC) to develop an algorithm that calculates six summary statistics (described below) for buildings at 30-m cell size in the 48 conterminous U.S. states, to better support national-scale and multi-state modelling that requires building footprint data. To develop these six derived products from the Microsoft buildings dataset, we created an algorithm that took every single building, built a small meshgrid (a 2D array) for the bounding box of the building, and calculated unique values for each cell of the meshgrid. This grid structure is aligned with National Land Cover Database (NLCD) products (projected using the Albers Equal Area Conic system), enabling researchers to combine or compare our products with standard national-scale datasets such as land cover, tree canopy cover, and urban imperviousness [3].
Locations, shapes, and distribution patterns of structures in urban and rural areas are the subject of many studies. Buildings represent the density of built-up areas as an indicator of urban morphology or the spatial structure of cities and metropolitan areas [4,5]. In local studies, the use of vector data types is easier [6,7]. However, in regional and national studies a raster dataset is preferable. For example, in measuring the spatial structure of metropolitan areas, a rasterized building layer is more useful than the original vector datasets [8].
Our output raster products are: (1) total building footprint coverage per cell (m2 of building footprint per 900 m2 cell); (2) number of buildings that intersect each cell; (3) number of building centroids falling within each cell; (4) area of the largest building intersecting each cell (m2); (5) area of the smallest building intersecting each cell (m2); and (6) average area of all buildings intersecting each cell (m2). The last three area metrics include building area that falls outside the cell but where part of the building intersects the cell (Fig. 1). These values can be used to describe the intensity and typology of the built environment.
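A simplified sketch of the per-building meshgrid idea for the first two products (total footprint area and intersecting-building count per 30 m cell), using geopandas/shapely with hypothetical file names rather than the authors' HPC code:

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import box

CELL = 30.0  # cell size in meters, in the spirit of the NLCD Albers grid

# Hypothetical input: one state's footprints, reprojected to Albers (EPSG:5070)
bld = gpd.read_file("state_buildings.geojson").to_crs(epsg=5070)
xmin, ymin, xmax, ymax = bld.total_bounds
ncols = int(np.ceil((xmax - xmin) / CELL))
nrows = int(np.ceil((ymax - ymin) / CELL))

footprint = np.zeros((nrows, ncols))          # (1) m2 of footprint per cell
n_intersect = np.zeros((nrows, ncols), int)   # (2) buildings touching each cell

for geom in bld.geometry:
    # Meshgrid over this building's bounding box only
    bx0, by0, bx1, by1 = geom.bounds
    c0, c1 = int((bx0 - xmin) // CELL), min(int((bx1 - xmin) // CELL), ncols - 1)
    r0, r1 = int((by0 - ymin) // CELL), min(int((by1 - ymin) // CELL), nrows - 1)
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            cell = box(xmin + c * CELL, ymin + r * CELL,
                       xmin + (c + 1) * CELL, ymin + (r + 1) * CELL)
            area = geom.intersection(cell).area
            if area > 0:
                footprint[r, c] += area
                n_intersect[r, c] += 1
```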
Our software is available through U.S. Geological Survey code r...
Our model is a full-annual-cycle population model (Hostetler et al. 2015) that tracks groups of bats through four seasons: breeding season/summer, fall migration, non-breeding season/winter, and spring migration. Our state variables are groups of bats that use a specific maternity colony/breeding site and hibernaculum/non-breeding site. Bats are also accounted for by life stage (juveniles/first-year breeders versus adults) and seasonal habitat (breeding versus non-breeding) during each year. This leads to four state variables (depicted in vector notation): the population of juveniles during the non-breeding season, the population of adults during the non-breeding season, the population of juveniles during the breeding season, and the population of adults during the breeding season. Each vector's elements correspond to a specific migratory pathway, i.e., a {non-breeding site, breeding site} pair. The variables may be summed by either breeding site or non-breeding site to calculate the total population using a specific geographic location; within our code, we account for this using an index column for breeding sites and an index column for non-breeding sites within the data table. Our choice of state variables makes the time step t equal to 1 year, although we also recorded the population of each group during the breeding and non-breeding seasons as an artifact of our state-variable choice. We chose these state variables partially for their biological information and partially to simplify programming. We ran our simulation for 30 years because the USFWS currently issues Indiana Bat take permits for 30 years. Our model covers the range of the Indiana Bat, which is approximately the eastern half of the contiguous United States (see the range figure in the source publication). The boundaries of our range were based upon the United States boundary, the NatureServe range map, and observations of the species. The maximum migration distance was 500 km, based upon field observations reported in the literature (Gardner et al. 2002; Winhold et al. 2006). The landscape was covered with approximately 33,000 6475-ha grid cells; the grid size was based upon management considerations. The U.S. Fish and Wildlife Service considers a 2.5-mile radius around a known maternity colony to be its summer habitat range and all of the hibernacula within a 2.5-mile radius to be a single management unit, hence the choice of 5-by-5-mile square grids (25 mi², or 6475 ha). Each group of bats within the model has a summer and a winter grid cell as well as a pathway connecting the cells. It is possible for a group to be in the same cell for both seasons, but improbable for females (which we modeled). The straight lines between summer and winter cells were buffered with different distances (1 km, 2 km, 10 km, 20 km, 100 km, and 200 km) as part of the turbine sensitivity and uncertainty analysis. We dropped the two largest buffer sizes during model development because they were biologically unrealistic and including them caused all populations to go extinct all of the time. Note that a 1-km buffer yields a 2-km-wide path; the buffers account for bats not migrating in straight lines. An example of two pathways is shown in the pathway figure of the source publication. If we had precise locations for all summer maternity colonies, other approaches such as Circuitscape (Hanks and Hooten 2013) could have been used to model migration routes, which would have reduced migration uncertainty.
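To make the bookkeeping concrete, a minimal sketch of the yearly update over pathway-indexed vectors; the survival and fecundity values below are purely illustrative, not the model's:

```python
import numpy as np

rng = np.random.default_rng(42)
n_paths = 5  # each element is one migratory pathway (breeding site, hibernaculum pair)

# Non-breeding-season populations at the start of the simulation
adults_nb = rng.integers(50, 500, size=n_paths).astype(float)
juv_nb = rng.integers(20, 200, size=n_paths).astype(float)

# Illustrative seasonal survival rates and fecundity (hypothetical values)
s_spring, s_summer, s_fall_winter = 0.95, 0.98, 0.85
fecundity = 0.45  # juveniles per adult female

for year in range(30):  # take permits are issued for 30 years
    # Spring migration to the breeding sites; last year's juveniles recruit to adults
    adults_b = (adults_nb + juv_nb) * s_spring
    # Reproduction and summer survival at the maternity colonies
    juv_b = adults_b * fecundity * s_summer
    adults_b = adults_b * s_summer
    # Fall migration and winter survival at the hibernacula
    adults_nb = adults_b * s_fall_winter
    juv_nb = juv_b * s_fall_winter

print(adults_nb + juv_nb)  # population per pathway after 30 years
```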
Replication data for the higher education spending example in Chapter 6 of Spatial Analysis for the Social Sciences.