Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this course, you will learn to work within the free and open-source R environment with a specific focus on working with and analyzing geospatial data. We will cover a wide variety of data and spatial data analytics topics, and you will learn how to code in R along the way. The Introduction module provides more background information about the course and course setup.

This course is designed for someone with some prior GIS knowledge. For example, you should know the basics of working with maps, map projections, and vector and raster data, and you should be able to perform common spatial analysis tasks and make map layouts. If you do not have a GIS background, we recommend checking out the West Virginia View GIScience class. We do not assume that you have any prior experience with R or with coding, so don't worry if you haven't developed these skill sets yet; that is a major goal of this course.

Background material is provided through code examples, videos, and presentations, and assignments offer hands-on learning opportunities. Data links for the lecture modules are provided within each module, while data for the assignments are linked to the assignment buttons below. Please see the sequencing document for the suggested order in which to work through the material.

After completing this course you will be able to:
- prepare, manipulate, query, and generally work with data in R
- perform data summarization, comparisons, and statistical tests
- create quality graphs, map layouts, and interactive web maps to visualize data and findings
- present your research, methods, results, and code as web pages to foster reproducible research
- work with spatial data in R
- analyze vector and raster geospatial data to answer a question with a spatial component
- make spatial models and predictions using regression and machine learning
- code in the R language at an intermediate level
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication data for the turnout example in Chapter 6 of Spatial Analysis for the Social Sciences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper presents the importance of simple spatial statistics techniques applied to positional quality control of spatial data. To this end, methods for analyzing the spatial distribution pattern of point data are presented, as well as bias analysis of the positional discrepancy samples. To evaluate the spatial distribution of the points, the Nearest Neighbor and Ripley's K function methods were used; for bias analysis, the mean directional vectors of the discrepancies and the circular variance were used. A methodology for positional quality control of spatial data is proposed, which includes sampling planning and evaluation of its spatial distribution pattern, analysis of data normality through the application of bias tests, and positional accuracy classification according to a standard. For the practical experiment, an orthoimage generated from a PRISM scene of the ALOS satellite was evaluated. Results showed that the orthoimage is accurate at a scale of 1:25,000, being classified as Class A according to the Brazilian positional accuracy standard and showing no bias in the coordinates. The main contribution of this work is the incorporation of spatial statistics techniques into cartographic quality control.
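The paper itself does not ship code; purely as a rough illustration of two of the checks it describes, the sketch below computes the Clark-Evans nearest-neighbor ratio for a set of checkpoint locations and the mean direction and circular variance of the positional discrepancy vectors. The arrays points, dx, and dy are hypothetical stand-ins for real survey data, not values from the study.

```python
# Minimal sketch of two spatial-statistics checks mentioned above:
# (1) nearest-neighbor analysis of the checkpoint distribution and
# (2) bias analysis of positional discrepancies via circular statistics.
# All input arrays are hypothetical stand-ins for real survey data.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(0, 1000, size=(50, 2))   # checkpoint coordinates (m)
dx = rng.normal(0.2, 1.0, 50)                 # east discrepancies (m)
dy = rng.normal(-0.1, 1.0, 50)                # north discrepancies (m)

# Clark-Evans ratio: observed mean nearest-neighbor distance vs. the value
# expected under complete spatial randomness.
tree = cKDTree(points)
nn_dist, _ = tree.query(points, k=2)          # k=2: the first neighbor is the point itself
d_obs = nn_dist[:, 1].mean()
area = np.ptp(points[:, 0]) * np.ptp(points[:, 1])
density = len(points) / area
d_exp = 0.5 / np.sqrt(density)
print("Clark-Evans ratio R =", d_obs / d_exp)  # R near 1 suggests a random pattern

# Directional bias: mean direction and circular variance of the discrepancy vectors.
theta = np.arctan2(dy, dx)
R_bar = np.hypot(np.cos(theta).mean(), np.sin(theta).mean())
mean_dir = np.degrees(np.arctan2(np.sin(theta).mean(), np.cos(theta).mean()))
print("mean direction (deg):", mean_dir, "circular variance:", 1 - R_bar)
```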
https://www.technavio.com/content/privacy-notice
Geographic Information System Analytics Market Size 2024-2028
The geographic information system analytics market size is forecast to increase by USD 12 billion at a CAGR of 12.41% between 2023 and 2028.
The GIS analytics market is experiencing significant growth, driven by the increasing need for efficient land management and emerging methods of data collection and generation. The defense industry's reliance on geospatial technology for situational awareness and real-time location monitoring is a major factor fueling market expansion. Additionally, the oil and gas industry's adoption of GIS for resource exploration and management is a key trend. Building Information Modeling (BIM) and smart city initiatives are also contributing to market growth, as they require multiple layered maps for effective planning and implementation. The Internet of Things (IoT) and Software as a Service (SaaS) are transforming GIS analytics by enabling real-time data processing and analysis.
Augmented reality is another emerging trend, as it enhances the user experience and provides valuable insights through visual overlays. Overall, heavy investments are required for setting up GIS stations and accessing data sources, making this a promising market for technology innovators and investors alike.
What will be the Size of the GIS Analytics Market during the forecast period?
The geographic information system analytics market encompasses various industries, including government sectors, agriculture, and infrastructure development. Smart city projects, building information modeling, and infrastructure development are key areas driving market growth. Spatial data plays a crucial role in sectors such as transportation, mining, and oil and gas. Cloud technology is transforming GIS analytics by enabling real-time data access and analysis. Startups are disrupting traditional GIS markets with innovative location-based services and smart city planning solutions. Infrastructure development in sectors like construction and green buildings relies on modern GIS solutions for efficient planning and management. Smart utilities and telematics navigation are also leveraging GIS analytics for improved operational efficiency.
GIS technology is essential for zoning and land use management, enabling data-driven decision-making. Smart public works and urban planning projects utilize mapping and geospatial technology for effective implementation. Surveying is another sector that benefits from advanced GIS solutions. Overall, the GIS analytics market is evolving, with a focus on providing actionable insights to businesses and organizations.
How is this Geographic Information System Analytics Industry segmented?
The geographic information system analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
End-user
Retail and Real Estate
Government
Utilities
Telecom
Manufacturing and Automotive
Agriculture
Construction
Mining
Transportation
Healthcare
Defense and Intelligence
Energy
Education and Research
BFSI
Components
Software
Services
Deployment Modes
On-Premises
Cloud-Based
Applications
Urban and Regional Planning
Disaster Management
Environmental Monitoring
Asset Management
Surveying and Mapping
Location-Based Services
Geospatial Business Intelligence
Natural Resource Management
Geography
North America
US
Canada
Europe
France
Germany
UK
APAC
China
India
South Korea
Middle East and Africa
UAE
South America
Brazil
Rest of World
By End-user Insights
The retail and real estate segment is estimated to witness significant growth during the forecast period.
The GIS analytics market is witnessing significant growth due to the increasing demand for advanced technologies in various industries. In the retail sector, for instance, retailers are utilizing GIS analytics to gain a competitive edge by analyzing customer demographics and buying patterns through real-time location monitoring and multiple layered maps. The retail industry's success relies heavily on these insights for effective marketing strategies. Moreover, the defense industry is integrating GIS analytics into its operations for infrastructure development, permitting, and public safety. Building Information Modeling (BIM) and 4D GIS software are increasingly being adopted for construction project workflows, while urban planning and design require geospatial data for smart city planning and site selection.
The oil and gas industry is leveraging satellite imaging and IoT devices for land acquisition and mining operations. In the public sector, gover
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geostatistics analyzes and predicts the values associated with spatial or spatial-temporal phenomena. It incorporates the spatial (and in some cases temporal) coordinates of the data within the analyses. It is a practical means of describing spatial patterns and interpolating values for locations where samples were not taken (and measures the uncertainty of those values, which is critical to informed decision making). This archive contains results of geostatistical analysis of COVID-19 case counts for all available US counties. Test results were obtained with ArcGIS Pro (ESRI). Sources are state health departments, which are scraped and aggregated by the Johns Hopkins Coronavirus Resource Center and then pre-processed by MappingSupport.com.
This update of the Zenodo dataset (version 6) consists of three compressed archives containing geostatistical analyses of SARS-CoV-2 testing data. This dataset utilizes many of the geostatistical techniques used in previous versions of this Zenodo archive, but has been significantly expanded to include analyses of up-to-date U.S. COVID-19 case data (from March 24th to September 8th, 2020):
Archive #1: “1.Geostat. Space-Time analysis of SARS-CoV-2 in the US (Mar24-Sept6).zip” – results of a geostatistical analysis of COVID-19 cases incorporating spatially-weighted hotspots that are conserved over one-week timespans. Results are reported starting from when U.S. COVID-19 case data first became available (March 24th, 2020) for 25 consecutive 1-week intervals (March 24th through to September 6th, 2020). Hotspots, where found, are reported in each individual state, rather than the entire continental United States.
Archive #2: "2.Geostat. Spatial analysis of SARS-CoV-2 in the US (Mar24-Sept8).zip" – the results from geostatistical spatial analyses only of corrected COVID-19 case data for the continental United States, spanning the period from March 24th through September 8th, 2020. The geostatistical techniques utilized in this archive include ‘Hot Spot’ analysis and ‘Cluster and Outlier’ analysis.
Archive #3: "3.Kriging and Densification of SARS-CoV-2 in LA and MA.zip" – this dataset provides preliminary kriging and densification analysis of COVID-19 case data for certain dates within the U.S. states of Louisiana and Massachusetts.
These archives consist of map files (as both static images and as animations) and data files (including text files which contain the underlying data of said map files [where applicable]) which were generated when performing the following Geostatistical analyses: Hot Spot analysis (Getis-Ord Gi*) [‘Archive #1’: consecutive weeklong Space-Time Hot Spot analysis; ‘Archive #2’: daily Hot Spot Analysis], Cluster and Outlier analysis (Anselin Local Moran's I) [‘Archive #2’], Spatial Autocorrelation (Global Moran's I) [‘Archive #2’], and point-to-point comparisons with Kriging and Densification analysis [‘Archive #3’].
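The archives above were produced with ArcGIS Pro; purely as a point of reference, the same three statistics (Global Moran's I, Anselin Local Moran's I, and Getis-Ord Gi*) can also be computed with the open-source PySAL stack. The sketch below is illustrative only and assumes a hypothetical counties.gpkg polygon layer with a cases column; it does not reproduce the ArcGIS workflow used for these files.

```python
# Open-source analogue (geopandas + libpysal/esda) of the analyses listed above.
# 'counties.gpkg' and the 'cases' column are hypothetical placeholders.
import geopandas as gpd
from libpysal.weights import Queen
from esda.moran import Moran, Moran_Local
from esda.getisord import G_Local

counties = gpd.read_file("counties.gpkg")   # county polygons with case counts
w = Queen.from_dataframe(counties)          # queen-contiguity spatial weights
w.transform = "r"                           # row-standardize the weights

y = counties["cases"].values
print("Global Moran's I:", Moran(y, w).I)   # overall spatial autocorrelation
lisa = Moran_Local(y, w)                    # Anselin Local Moran's I (clusters/outliers)
gi = G_Local(y, w, star=True)               # Getis-Ord Gi* hot spot statistic
counties["lisa_quadrant"] = lisa.q          # cluster/outlier quadrant per county
counties["gi_star_z"] = gi.Zs               # Gi* z-scores (hot/cold spots)
```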
The Word document provided ("Description-of-Archive.Updated-Geostatistical-Analysis-of-SARS-CoV-2 (version 6).docx") details the contents of each file and folder within these three archives and gives general interpretations of these results.
Replication data for the higher education spending example in Chapter 6 of Spatial Analysis for the Social Sciences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data required to reproduce all analyses presented for the manuscript:
MuSpAn: A Toolbox for Multiscale Spatial Analysis
The data is organised into two main folders:
domains_for_figs_2_to_6 (MuSpAn domains)
Four domains of increasing size from regions within a healthy mouse colon (10x Genomics Colon Atlas panel).
Four samples of AKPT mouse tumors (10x Genomics 480 custom panel).
misc_checkpoint_data (Metadata - analysis checkpointing)
Colormap dictionaries for consistent visualization with the published figures.
Checkpointing files to support analyses requiring extended computation times.
Annotation data used for MuSpAn labeling.
The MuSpAn domains were created and saved using v1.2.0 of MuSpAn. This data is to be used with the associated Python notebooks, which can be found at:
https://github.com/joshwillmoore1/Supporting_material_muspan_paper
These notebooks both reproduce the analysis conducted in the study and serve as example material for MuSpAn usage, fully explained and linked to relevant documentation.
Replication data for the poverty rates example in Chapter 4 of Spatial Analysis for the Social Sciences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the last decade, a plethora of algorithms have been developed for spatial ecology studies. In our case, we use some of these codes for underwater research in applied ecology analysis of threatened endemic fishes and their natural habitat. For this, we developed code in the RStudio® scripting environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The R packages employed are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), lattice (Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), modelmetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbuettel & Balamuta, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).
It is important to follow all the code in order to obtain results from the ecological response and spatial distribution models. In particular, for the ecological scenario we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance-similarity metric because of their adequacy and robustness for studies with endemic or threatened species (e.g., Naoki et al., 2006). Next, we explain the statistical parameterization of the code used to run the GLM and DOMAIN models:
First, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend the use of 10,000 background points when using regression methods (e.g., Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we considered factors such as the extent of the area and the type of study species to be important for the correct selection of the number of points (pers. obs.). Then, we extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) as a function of the presence and background points (e.g., Hijmans and Elith, 2017).
Subsequently, we subdivided both the presence and background point groups into 75% training data and 25% test data each, following the method of Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, the 10-fold cross-validation method was selected, with the response variable presence assigned as a factor. If any other variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
After that, we ran the code for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), from which we obtained the relative contribution of the variables used in the model. We parameterized the code with a Gaussian distribution and a cross-iteration of 5,000 repetitions (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017). In addition, we selected a validation interval of 4 random training points (personal test). The resulting plots were the partial dependence plots, as a function of each predictor variable.
Subsequently, the correlation between the variables was computed using Pearson's method (Code5_Pearson_Correlation.R) to evaluate multicollinearity between variables (Guisan & Hofer, 2003). It is recommended to use a bivariate correlation threshold of ±0.70 to discard highly correlated variables (e.g., Awan et al., 2021).
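The correlation screening itself is done in R (Code5_Pearson_Correlation.R); purely to illustrate the ±0.70 rule described above, an equivalent check might look like the sketch below, where predictors.csv is a hypothetical table of the variable values extracted at the presence and background points.

```python
# Illustrative multicollinearity screen at the |r| >= 0.70 threshold described above.
# The original workflow uses R; 'predictors.csv' is a hypothetical input file.
import pandas as pd

predictors = pd.read_csv("predictors.csv")        # values extracted at presence/background points
corr = predictors.corr(method="pearson")

# Flag pairs of variables whose absolute Pearson correlation reaches 0.70.
flagged = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) >= 0.70
]
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.2f} -> consider dropping one of the two")
```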
Once the above codes were run, we loaded the same subgroups (i.e., presence and background groups with 75% training and 25% testing) (Code6_Presence&backgrounds.R) for the GLM method code (Code7_GLM_model.R). Here, we first ran the GLM models per variable to obtain the p-significance value of each variable (alpha ≤ 0.05); we selected the value one (i.e., presence) as the likelihood factor. The generated models are of polynomial degree, to obtain linear and quadratic responses (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results, we ran ecological response curve models, where the resulting plots included the probability of occurrence and the values for continuous variables or the categories for discrete variables. The points of the presence and background training groups are also included.
On the other hand, a global GLM was also run, from which the generalized model is evaluated by means of a 2 x 2 contingency matrix that includes both observed and predicted records. A representation of this is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary threshold of 0.5 to obtain better modeling performance and to avoid a high percentage of type I (omission) or type II (commission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).
Table 1. Example of 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - error of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).
| Model | Validation set: True | Validation set: False |
|---|---|---|
| Presence | A | B |
| Background | C | D |
We then calculated the Overall accuracy and True Skill Statistic (TSS) metrics. The first is used to assess the proportion of correctly predicted cases, while the second assesses the prevalence of correctly predicted cases (Olden and Jackson, 2002); the TSS also gives equal importance to the prevalence of presence prediction and to the random performance correction (Fielding and Bell, 1997; Allouche et al., 2006).
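Using the cell labels A, B, C, and D from Table 1, both metrics reduce to simple arithmetic. The function below is a minimal sketch of that calculation (it is not the authors' R code), and the counts in the usage line are hypothetical.

```python
# Overall accuracy and True Skill Statistic from the 2 x 2 matrix in Table 1, with
# A = true presences, B = false presences (commission), C = true backgrounds,
# D = false backgrounds (omission). Sketch of the arithmetic only.
def overall_and_tss(A, B, C, D):
    sensitivity = A / (A + D)             # proportion of presences correctly predicted
    specificity = C / (C + B)             # proportion of backgrounds correctly predicted
    overall = (A + C) / (A + B + C + D)   # proportion of all cases correctly predicted
    tss = sensitivity + specificity - 1   # Allouche et al. (2006)
    return overall, tss

print(overall_and_tss(A=70, B=10, C=65, D=15))  # hypothetical counts
```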
The last code (i.e., Code8_DOMAIN_SuitHab_model.R) is for species distribution modelling using the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background group subdivided into 75% training and 25% test, each. We only included the presence training subset and the predictor variables stack in the calculation of the DOMAIN metric, as well as in the evaluation and validation of the model.
Regarding the model evaluation and estimation, we selected the following estimators:
1) partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model's prediction performance for the correct spatial distribution of the species (Manzanilla-Quiñones, 2020).
2) the ROC/AUC curve for model validation, where an optimal performance threshold is estimated so as to have an expected confidence of 75% to 99% probability (DeLong et al., 1988); see the sketch below for a generic version of this validation step.
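The partial ROC estimator is computed with dedicated tooling in the original R workflow and is not reproduced here; purely as a generic reference for the ROC/AUC validation step, a scikit-learn sketch follows. The arrays y_true and scores are hypothetical test-set labels and model suitability scores.

```python
# Standard (non-partial) ROC/AUC validation as a generic reference for step 2 above.
# 'y_true' (1 = presence, 0 = background) and 'scores' (model suitability values)
# are hypothetical test-set arrays.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0])
scores = np.array([0.91, 0.84, 0.62, 0.40, 0.13, 0.77, 0.55, 0.08, 0.69, 0.31])

auc = roc_auc_score(y_true, scores)
fpr, tpr, thresholds = roc_curve(y_true, scores)
# Pick the threshold that maximizes TPR - FPR (Youden's J) as one common operating point.
best = thresholds[np.argmax(tpr - fpr)]
print(f"AUC = {auc:.2f}, suggested threshold = {best:.2f}")
```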
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Have you ever wanted to create your own maps, or integrate and visualize spatial datasets to examine changes in trends between locations and over time? Follow along with these training tutorials on QGIS, an open source geographic information system (GIS) and learn key concepts, procedures and skills for performing common GIS tasks – such as creating maps, as well as joining, overlaying and visualizing spatial datasets. These tutorials are geared towards new GIS users. We’ll start with foundational concepts, and build towards more advanced topics throughout – demonstrating how with a few relatively easy steps you can get quite a lot out of GIS. You can then extend these skills to datasets of thematic relevance to you in addressing tasks faced in your day-to-day work.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper provides an abstract analysis of parallel processing strategies for spatial and spatio-temporal data. It isolates aspects such as data locality and computational locality as well as redundancy and locally sequential access as central elements of parallel algorithm design for spatial data. Furthermore, the paper gives some examples from simple and advanced GIS and spatial data analysis highlighting both that big data systems have been around long before the current hype of big data and that they follow some design principles which are inevitable for spatial data, including distributed data structures and messaging, which are, however, incompatible with the popular MapReduce paradigm. Throughout this discussion, the need for a replacement or extension of the MapReduce paradigm for spatial data is derived. This paradigm should be able to deal with the imperfect data locality inherent to spatial data, which hinders full independence of non-trivial computational tasks. We conclude that more research is needed and that spatial big data systems should pick up more concepts like graphs, shortest paths, raster data, events, and streams at the same time instead of solving exactly the set of spatially separable problems such as line simplifications or range queries in many different ways.
The establishment of a BES Multi-User Geodatabase (BES-MUG) allows for the storage, management, and distribution of geospatial data associated with the Baltimore Ecosystem Study. At present, BES data is distributed over the internet via the BES website. While having geospatial data available for download is a vast improvement over having the data housed at individual research institutions, it still suffers from some limitations. BES-MUG overcomes these limitations, improving the quality of the geospatial data available to BES researchers and thereby leading to more informed decision-making.

BES-MUG builds on Environmental Systems Research Institute's (ESRI) ArcGIS and ArcSDE technology. ESRI was selected because its geospatial software offers robust capabilities. ArcGIS is implemented agency-wide within the USDA and is the predominant geospatial software package used by collaborating institutions. Commercially available enterprise database packages (DB2, Oracle, SQL) provide an efficient means to store, manage, and share large datasets. However, standard database capabilities are limited with respect to geographic datasets because they lack the ability to deal with complex spatial relationships. By using ESRI's ArcSDE (Spatial Database Engine) in conjunction with database software, geospatial data can be handled much more effectively through the implementation of the Geodatabase model. Through ArcSDE and the Geodatabase model, the database's capabilities are expanded, allowing for multiuser editing, intelligent feature types, and the establishment of rules and relationships. ArcSDE also allows users to connect to the database using ArcGIS software without being burdened by the intricacies of the database itself.

For an example of how BES-MUG will help improve the quality and timeliness of BES geospatial data, consider a census block group layer that is in need of updating. Rather than the researcher downloading the dataset, editing it, and resubmitting it through ORS, access rules will allow the authorized user to edit the dataset over the network. Established rules will ensure that attribute and topological integrity is maintained, so that key fields are not left blank and the block group boundaries stay within tract boundaries. Metadata will automatically be updated, showing who edited the dataset and when they did, in the event any questions arise.

Currently, a functioning prototype Multi-User Database has been developed for BES at the University of Vermont Spatial Analysis Lab, using ArcSDE and IBM's DB2 Enterprise Database as a back-end architecture. This database, which is currently only accessible to those on the UVM campus network, will shortly be migrated to a Linux server where it will be accessible for database connections over the Internet. Passwords can then be handed out to all interested researchers on the project, who will be able to make a database connection through the geographic information systems software interface on their desktop computer.

This database will include a very large number of thematic layers, currently divided into biophysical, socio-economic, and imagery categories. Biophysical includes data on topography, soils, forest cover, habitat areas, hydrology, and toxics. Socio-economic includes political and administrative boundaries, transportation and infrastructure networks, property data, census data, household survey data, parks, protected areas, land use/land cover, zoning, public health, and historic land use change.
Imagery includes a variety of aerial and satellite imagery. See the readme: http://96.56.36.108/geodatabase_SAL/readme.txt See the file listing: http://96.56.36.108/geodatabase_SAL/diroutput.txt
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset constitutes an introduction to plotting and mapping and to the essential concepts of spatial data management and modeling. It also provides data ready for several examples of regression and classification algorithms (Multiple Linear Regression, Generalized Linear Models, CART and Random Forest), and explores classic interpolation methods (Inverse Distance Weighting and Kriging).
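As a small, self-contained illustration of one of the interpolation methods named above, the sketch below implements basic Inverse Distance Weighting on hypothetical sample points; it is not taken from the dataset's own example material.

```python
# Minimal Inverse Distance Weighting (IDW) interpolation, illustrating one of the
# classic methods mentioned above. Sample coordinates and values are hypothetical.
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0, eps=1e-12):
    """Interpolate z at xy_query as a distance-weighted mean of the known samples."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, eps) ** power     # closer samples receive larger weights
    return (w * z_known).sum(axis=1) / w.sum(axis=1)

xy_known = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
z_known = np.array([1.0, 3.0, 5.0, 7.0])
xy_query = np.array([[5.0, 5.0], [1.0, 1.0]])
print(idw(xy_known, z_known, xy_query))
```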
Learn Geographic Mapping with Altair, Vega-Lite and Vega using Curated Datasets
Complete geographic and geophysical data collection for mapping and visualization. This consolidation includes 18 complementary datasets used by 31+ Vega, Vega-Lite, and Altair examples 📊. Perfect for learning geographic visualization techniques including projections, choropleths, point maps, vector fields, and interactive displays.
Source data lives on GitHub and can also be accessed via CDN. The vega-datasets project serves as a common repository for example datasets used across these visualization libraries and related projects.
The geometry types include points (like airports.csv), lines (like londonTubeLines.json), and polygons (like us-10m.json), plus gridded field data (windvectors.csv, annual-precip.json). This pack includes 18 datasets covering base maps, reference points, statistical data for choropleths, and geophysical data.
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Map (1:10m) | us-10m.json | 627 KB | TopoJSON | CC-BY-4.0 | US state and county boundaries. Contains states and counties objects. Ideal for choropleths. | id (FIPS code) property on geometries |
| World Map (1:110m) | world-110m.json | 117 KB | TopoJSON | CC-BY-4.0 | World country boundaries. Contains countries object. Suitable for world-scale viz. | id property on geometries |
| London Boroughs | londonBoroughs.json | 14 KB | TopoJSON | CC-BY-4.0 | London borough boundaries. | properties.BOROUGHN (name) |
| London Centroids | londonCentroids.json | 2 KB | GeoJSON | CC-BY-4.0 | Center points for London boroughs. | properties.id, properties.name |
| London Tube Lines | londonTubeLines.json | 78 KB | GeoJSON | CC-BY-4.0 | London Underground network lines. | properties.name, properties.color |
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Airports | airports.csv | 205 KB | CSV | Public Domain | US airports with codes and coordinates. | iata, state, `l... |
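A typical Altair pattern for combining these files, drawing state boundaries from us-10m.json and overlaying the points from airports.csv via the vega_datasets loader, is sketched below; it is an illustrative example rather than one of the 31+ catalogued examples.

```python
# Illustrative Altair pattern using two of the datasets listed above:
# us-10m.json as a TopoJSON base map and airports.csv as point data.
import altair as alt
from vega_datasets import data

states = alt.topo_feature(data.us_10m.url, feature="states")

background = (
    alt.Chart(states)
    .mark_geoshape(fill="lightgray", stroke="white")
    .project("albersUsa")
    .properties(width=600, height=360)
)

points = (
    alt.Chart(data.airports.url)
    .mark_circle(size=8, color="steelblue")
    .encode(longitude="longitude:Q", latitude="latitude:Q", tooltip=["iata:N", "name:N"])
)

chart = background + points
chart.save("us_airports.html")
```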
Replication data for the blue states versus red states example in Chapter 4 of Spatial Analysis for the Social Sciences.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
I wanted to make some geospatial visualizations to convey the current severity of COVID-19 in different parts of the U.S.
I liked the NYTimes COVID dataset, but it was lacking county boundary shape data, population per county, new cases/deaths per day, per capita calculations, and county demographics.
After a lot of work tracking down the different data sources I wanted and doing all of the data wrangling and joins in python, I wanted to open-source the final enriched data set in order to give others a head start in their COVID-19 related analytic, modeling, and visualization efforts.
This dataset is enriched with county shapes, county center point coordinates, 2019 census population estimates, county population densities, cases and deaths per capita, and calculated per-day cases/deaths metrics. It contains daily data per county back to January, allowing for analyzing changes over time.
UPDATE: I have also included demographic information per county, including ages, races, and gender breakdown. This could help determine which counties are most susceptible to an outbreak.
Geospatial analysis and visualization:
- Which counties are currently getting hit the hardest (per capita and totals)?
- What patterns are there in the spread of the virus across counties? (network-based spread simulations using county center lat/lons)
- Do county population densities play a role in how quickly the virus spreads?
- How do a specific county's or state's cases and deaths compare to other counties/states?
- Join with other county-level datasets easily (with the fips code column)
See the column descriptions for more details on the dataset
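The derived fields described above (per-day counts, per capita rates, FIPS joins) boil down to a few pandas operations. The sketch below shows the general pattern with hypothetical file and column names (fips, date, cases, deaths, population); it is not the repository's actual wrangling code.

```python
# General pattern for the derived metrics described above: per-day counts from
# cumulative totals, per-capita rates, and a FIPS join. File and column names
# are hypothetical placeholders.
import pandas as pd

covid = pd.read_csv("us-counties.csv", dtype={"fips": str}, parse_dates=["date"])
pop = pd.read_csv("county_population.csv", dtype={"fips": str})  # fips, population

df = covid.merge(pop, on="fips", how="left").sort_values(["fips", "date"])

# New cases/deaths per day from cumulative counts, then per-100k-capita rates.
g = df.groupby("fips")
df["new_cases"] = g["cases"].diff().fillna(0)
df["new_deaths"] = g["deaths"].diff().fillna(0)
df["cases_per_100k"] = df["cases"] / df["population"] * 1e5
df["new_cases_per_100k"] = df["new_cases"] / df["population"] * 1e5
```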
COVID-19 U.S. Time-lapse: Confirmed Cases per County (per capita): https://github.com/ringhilterra/enriched-covid19-data/blob/master/example_viz/covid-cases-final-04-06.gif?raw=true
The files linked to this reference are the geospatial data created as part of the completion of the baseline vegetation inventory project for the NPS park unit. Current format is ArcGIS file geodatabase, but older formats may exist as shapefiles. We converted the photointerpreted data into a format usable in a geographic information system (GIS) by employing three fundamental processes: (1) orthorectify, (2) digitize, and (3) develop the geodatabase. All digital map automation was projected in Universal Transverse Mercator (UTM), Zone 16, using the North American Datum of 1983 (NAD83).

Orthorectify: We orthorectified the interpreted overlays by using OrthoMapper, a softcopy photogrammetric software for GIS. One function of OrthoMapper is to create orthorectified imagery from scanned and unrectified imagery (Image Processing Software, Inc., 2002). The software features a method of visual orientation involving a point-and-click operation that uses existing orthorectified horizontal and vertical base maps. Of primary importance to us, OrthoMapper also has the capability to orthorectify the photointerpreted overlays of each photograph based on the reference information provided.

Digitize: To produce a polygon vector layer for use in ArcGIS (Environmental Systems Research Institute [ESRI], Redlands, California), we converted each raster-based image mosaic of orthorectified overlays containing the photointerpreted data into a grid format by using ArcGIS. In ArcGIS, we used the ArcScan extension to trace the raster data and produce ESRI shapefiles. We digitally assigned map-attribute codes (both map-class codes and physiognomic modifier codes) to the polygons and checked the digital data against the photointerpreted overlays for line and attribute consistency. Ultimately, we merged the individual layers into a seamless layer.

Geodatabase: At this stage, the map layer has only map-attribute codes assigned to each polygon. To assign meaningful information to each polygon (e.g., map-class names, physiognomic definitions, links to NVCS types), we produced a feature-class table, along with other supportive tables, and subsequently related them together via an ArcGIS Geodatabase. This geodatabase also links the map to other feature-class layers produced from this project, including vegetation sample plots, accuracy assessment (AA) sites, aerial photo locations, and project boundary extent. A geodatabase provides access to a variety of interlocking data sets, is expandable, and equips resource managers and researchers with a powerful GIS tool.
The OneMap template can be used to connect multiple organizations to collaborate and share with internal and external stakeholders. Today, organizations must work beyond borders, jurisdictions, and sectors to address shared challenges. Collaboration is key whether you call your initiative a Spatial Data Infrastructure (SDI), Open Data, Digital Twin, Knowledge Infrastructure, Digital Ecosystem, or otherwise. The term 'OneMap' is a placeholder for your community GIS branding. The 'OneMap' Hub concept is multi-organizational: the website is designed to help communities of practice jumpstart their initiatives. Use it to share and collaborate, provide focus on thematic topics, and more. This item is available to ArcGIS Hub Basic and Premium licensed organizations.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.
| Feature | Description | Notes |
|---|---|---|
| 10 Features | Economic, environmental & social indicators | Realistically scaled |
| 300 Cities | Europe, Asia, Americas, Africa, Oceania | Diverse distributions |
| Strong Correlations | Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6) | ML-ready |
| No Missing Values | Clean, preprocessed data | Ready for analysis |
| 4-5 Natural Clusters | Metropolitan hubs, eco-towns, developing centers | Pre-validated |
✅ Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
✅ Regional Diversity: Each region has distinct economic and environmental characteristics
✅ Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
✅ Beginner-Friendly: No data cleaning required, includes example code
✅ Documented: Comprehensive README with methodology and use cases
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Load and prepare
df = pd.read_csv('city_lifestyle_dataset.csv')
X = df.drop(['city_name', 'country'], axis=1)
X_scaled = StandardScaler().fit_transform(X)
# Cluster
kmeans = KMeans(n_clusters=5, random_state=42)
df['cluster'] = kmeans.fit_predict(X_scaled)
# Analyze
print(df.groupby('cluster').mean())
After working with this dataset, you will be able to:
1. Apply K-Means, DBSCAN, and Hierarchical Clustering
2. Use PCA for dimensionality reduction and visualization (see the sketch below)
3. Interpret correlation matrices and feature relationships
4. Create geographic visualizations with cluster assignments
5. Profile and name discovered clusters based on characteristics
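Continuing the clustering snippet above, a minimal PCA projection (learning outcome 2) could look like the following; it re-loads the same hypothetical CSV and columns as that snippet, and the plot is purely illustrative.

```python
# Minimal PCA projection of the scaled features, colored by K-Means clusters.
# Assumes the same hypothetical CSV and columns as the clustering snippet above.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("city_lifestyle_dataset.csv")
X_scaled = StandardScaler().fit_transform(df.drop(["city_name", "country"], axis=1))
clusters = KMeans(n_clusters=5, random_state=42, n_init=10).fit_predict(X_scaled)

coords = PCA(n_components=2).fit_transform(X_scaled)
plt.scatter(coords[:, 0], coords[:, 1], c=clusters, cmap="tab10", s=20)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("City lifestyle clusters in PCA space")
plt.show()
```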
| Cluster | Characteristics | Example Cities |
|---|---|---|
| Metropolitan Tech Hubs | High income, density, rent | Silicon Valley, Singapore |
| Eco-Friendly Towns | Low density, clean air, high happiness | Nordic cities |
| Developing Centers | Mid income, high density, poor air | Emerging markets |
| Low-Income Suburban | Low infrastructure, income | Rural areas |
| Industrial Mega-Cities | Very high density, pollution | Manufacturing hubs |
Unlike random synthetic data, this dataset was carefully engineered with: - ✨ Realistic correlation structures based on urban research - 🌍 Regional characteristics matching real-world patterns - 🎯 Optimal cluster separability (validated via silhouette scores) - 📚 Comprehensive documentation and starter code
✓ Learn clustering without data cleaning hassles
✓ Practice PCA and dimensionality reduction
✓ Create beautiful geographic visualizations
✓ Understand feature correlation in real-world contexts
✓ Build a portfolio project with clear business insights
This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.
Happy Clustering! 🎉
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Spatial Transcriptomics (ST) data matching Matrix Assisted Laser Desorption/Ionization - Mass Spectrometry Imaging (MALDI-MSI). This data is complementary to data contained in the same project: files with the same identifiers in the two datasets originated from the very same tissue section and can be combined in a multimodal ST-MSI object. For more information about the dataset please see our manuscript posted on BioRxiv (doi: https://doi.org/10.1101/2023.01.26.525195).

This dataset includes ST data from 19 tissue sections, including human post-mortem and mouse samples. The spatial transcriptomics data was generated using the Visium protocol (10x Genomics). The murine tissue sections come from three different mice unilaterally injected with 6-OHDA, a neurotoxin that, when injected in the brain, can selectively destroy dopaminergic neurons. We used this mouse model to show the applicability of the technology that we developed, named Spatial Multimodal Analysis (SMA). Using our technology on these mouse brain tissue sections, we were able to detect both dopamine with MALDI-MSI and the corresponding gene expression with ST. This dataset also includes one human post-mortem striatum sample that was placed on one Visium slide across the four capture areas. This sample was analyzed with a different ST protocol named RRST (Mirzazadeh, R., Andrusivova, Z., Larsson, L. et al. Spatially resolved transcriptomic profiling of degraded and challenging fresh frozen samples. Nat Commun 14, 509 (2023). https://doi.org/10.1038/s41467-023-36071-5), where probes capturing the whole transcriptome are first hybridized in the tissue section and then spatially detected.

Each tissue section contained in the dataset has been given a unique identifier that is composed of the Visium array ID and capture area ID of the Visium slide that the tissue section was placed on. This unique identifier is included in the file names of all the files relative to the same tissue section, including the MALDI-MSI files published in the other dataset included in this project. In this dataset you will find the following files for each tissue section:
- raw files: the read one fastq files (containing the pattern *R1*fastq.gz in the file name), the read two fastq files (containing the pattern *R2*fastq.gz in the file name) and the raw microscope images (containing the pattern Spot.jpg in the file name). These are the only files needed to run the Space Ranger pipeline, which is freely available for any user (please see the 10x Genomics website for information on how to install and run Space Ranger);
- processed data files, of three types: a) Space Ranger outputs that were used to produce the figures in our publication; b) manual annotation tables in csv format produced using Loupe Browser 6 (csv tables with file names ending in _RegionLoupe.csv, _filter.csv, _dopamine.csv, _lesion.csv, _region.csv); c) json files that we used as input for Space Ranger in the cases where the automatic tissue detection included in the pipeline failed to recognize the tissue or the fiducials. Using these processed files the user can reproduce the figures of our publication without having to restart from the raw data files.

The MALDI-MSI analyses preceding ST were performed with different matrices in different tissue sections.
We used 1) 9-aminoacridine (9-AA) for detection of metabolites in negative ionization mode, 2) 2,5-dihydroxybenzoic acid (DHB) for detection of metabolites in positive ionization mode, and 3) 4-(anthracen-9-yl)-2-fluoro-1-ethylpyridin-1-ium iodide (FMP-10), which charge-tags molecules with phenolic hydroxyls and/or primary amines, including neurotransmitters. The information about which matrix was sprayed on each tissue section, and other information about the samples, is included in the metadata table. We also used three types of control samples:
- standard Visium: samples processed with standard Visium (i.e., no matrix spraying, no MALDI-MSI; protocol as recommended by 10x Genomics with no exceptions);
- internal controls (iCTRL): samples not sprayed with any matrix nor processed with MALDI-MSI, but located on the same Visium slide where other samples were processed with MALDI-MSI;
- FMP-10-iCTRL: a sample sprayed with FMP-10 and then processed as an iCTRL.
This and other information is provided in the metadata table.