Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In this course, you will learn to work within the free and open-source R environment with a specific focus on working with and analyzing geospatial data. We will cover a wide variety of data and spatial data analytics topics, and you will learn how to code in R along the way. The Introduction module provides more background information about the course and course setup.

This course is designed for someone with some prior GIS knowledge. For example, you should know the basics of working with maps, map projections, and vector and raster data, and you should be able to perform common spatial analysis tasks and make map layouts. If you do not have a GIS background, we recommend checking out the West Virginia View GIScience class. We do not assume that you have any prior experience with R or with coding, so don't worry if you haven't developed these skill sets yet; that is a major goal of this course.

Background material is provided through code examples, videos, and presentations, and assignments offer hands-on learning opportunities. Data links for the lecture modules are provided within each module, while data for the assignments are linked to the assignment buttons below. Please see the sequencing document for the suggested order in which to work through the material.

After completing this course you will be able to:
- prepare, manipulate, query, and generally work with data in R
- perform data summarization, comparisons, and statistical tests
- create quality graphs, map layouts, and interactive web maps to visualize data and findings
- present your research, methods, results, and code as web pages to foster reproducible research
- work with spatial data in R
- analyze vector and raster geospatial data to answer a question with a spatial component
- make spatial models and predictions using regression and machine learning
- code in the R language at an intermediate level
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Replication data for the turnout example in Chapter 6 of Spatial Analysis for the Social Sciences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper presents the importance of simple spatial statistics techniques applied to positional quality control of spatial data. To this end, methods for analyzing the spatial distribution pattern of point data are presented, as well as bias analysis of the positional discrepancy samples. To evaluate the spatial distribution of the points, the Nearest Neighbor and Ripley's K function methods were used; for bias analysis, the mean directional vectors of the discrepancies and the circular variance were used. A methodology for positional quality control of spatial data is proposed, which includes sampling planning and evaluation of its spatial distribution pattern, analysis of data normality through the application of bias tests, and positional accuracy classification according to a standard. For the practical experiment, an orthoimage generated from a PRISM scene of the ALOS satellite was evaluated. Results showed that the orthoimage is accurate at a scale of 1:25,000, being classified as Class A according to the Brazilian positional accuracy standard and showing no bias in the coordinates. The main contribution of this work is the incorporation of spatial statistics techniques into cartographic quality control.
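The paper itself does not ship code; purely as a rough illustration of two of the checks it describes, the sketch below computes the Clark-Evans nearest-neighbor ratio for a set of checkpoint locations and the mean direction and circular variance of the positional discrepancy vectors. The arrays points, dx, and dy are hypothetical stand-ins for real survey data, not values from the study.

```python
# Minimal sketch of two spatial-statistics checks mentioned above:
# (1) nearest-neighbor analysis of the checkpoint distribution and
# (2) bias analysis of positional discrepancies via circular statistics.
# All input arrays are hypothetical stand-ins for real survey data.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
points = rng.uniform(0, 1000, size=(50, 2))   # checkpoint coordinates (m)
dx = rng.normal(0.2, 1.0, 50)                 # east discrepancies (m)
dy = rng.normal(-0.1, 1.0, 50)                # north discrepancies (m)

# Clark-Evans ratio: observed mean nearest-neighbor distance vs. the value
# expected under complete spatial randomness.
tree = cKDTree(points)
nn_dist, _ = tree.query(points, k=2)          # k=2: the first neighbor is the point itself
d_obs = nn_dist[:, 1].mean()
area = np.ptp(points[:, 0]) * np.ptp(points[:, 1])
density = len(points) / area
d_exp = 0.5 / np.sqrt(density)
print("Clark-Evans ratio R =", d_obs / d_exp)  # R near 1 suggests a random pattern

# Directional bias: mean direction and circular variance of the discrepancy vectors.
theta = np.arctan2(dy, dx)
R_bar = np.hypot(np.cos(theta).mean(), np.sin(theta).mean())
mean_dir = np.degrees(np.arctan2(np.sin(theta).mean(), np.cos(theta).mean()))
print("mean direction (deg):", mean_dir, "circular variance:", 1 - R_bar)
```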
https://www.technavio.com/content/privacy-notice
Geographic Information System Analytics Market Size 2024-2028
The geographic information system analytics market size is forecast to increase by USD 12 billion at a CAGR of 12.41% between 2023 and 2028.
The GIS analytics market is experiencing significant growth, driven by the increasing need for efficient land management and emerging methods of data collection and generation. The defense industry's reliance on geospatial technology for situational awareness and real-time location monitoring is a major factor fueling market expansion. Additionally, the oil and gas industry's adoption of GIS for resource exploration and management is a key trend. Building Information Modeling (BIM) and smart city initiatives are also contributing to market growth, as they require multiple layered maps for effective planning and implementation. The Internet of Things (IoT) and Software as a Service (SaaS) are transforming GIS analytics by enabling real-time data processing and analysis.
Augmented reality is another emerging trend, as it enhances the user experience and provides valuable insights through visual overlays. Overall, heavy investments are required for setting up GIS stations and accessing data sources, making this a promising market for technology innovators and investors alike.
What will be the Size of the GIS Analytics Market during the forecast period?
The geographic information system analytics market encompasses various industries, including government sectors, agriculture, and infrastructure development. Smart city projects, building information modeling, and infrastructure development are key areas driving market growth. Spatial data plays a crucial role in sectors such as transportation, mining, and oil and gas. Cloud technology is transforming GIS analytics by enabling real-time data access and analysis. Startups are disrupting traditional GIS markets with innovative location-based services and smart city planning solutions. Infrastructure development in sectors like construction and green buildings relies on modern GIS solutions for efficient planning and management. Smart utilities and telematics navigation are also leveraging GIS analytics for improved operational efficiency.
GIS technology is essential for zoning and land use management, enabling data-driven decision-making. Smart public works and urban planning projects utilize mapping and geospatial technology for effective implementation. Surveying is another sector that benefits from advanced GIS solutions. Overall, the GIS analytics market is evolving, with a focus on providing actionable insights to businesses and organizations.
How is this Geographic Information System Analytics Industry segmented?
The geographic information system analytics industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018-2022 for the following segments.
End-user
Retail and Real Estate
Government
Utilities
Telecom
Manufacturing and Automotive
Agriculture
Construction
Mining
Transportation
Healthcare
Defense and Intelligence
Energy
Education and Research
BFSI
Components
Software
Services
Deployment Modes
On-Premises
Cloud-Based
Applications
Urban and Regional Planning
Disaster Management
Environmental Monitoring
Asset Management
Surveying and Mapping
Location-Based Services
Geospatial Business Intelligence
Natural Resource Management
Geography
North America
US
Canada
Europe
France
Germany
UK
APAC
China
India
South Korea
Middle East and Africa
UAE
South America
Brazil
Rest of World
By End-user Insights
The retail and real estate segment is estimated to witness significant growth during the forecast period.
The GIS analytics market is witnessing significant growth due to the increasing demand for advanced technologies in various industries. In the retail sector, for instance, retailers are utilizing GIS analytics to gain a competitive edge by analyzing customer demographics and buying patterns through real-time location monitoring and multiple layered maps. The retail industry's success relies heavily on these insights for effective marketing strategies. Moreover, the defense industry is integrating GIS analytics into its operations for infrastructure development, permitting, and public safety. Building Information Modeling (BIM) and 4D GIS software are increasingly being adopted for construction project workflows, while urban planning and design require geospatial data for smart city planning and site selection.
The oil and gas industry is leveraging satellite imaging and IoT devices for land acquisition and mining operations. In the public sector, gover
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Geostatistics analyzes and predicts the values associated with spatial or spatial-temporal phenomena. It incorporates the spatial (and in some cases temporal) coordinates of the data within the analyses. It is a practical means of describing spatial patterns and interpolating values for locations where samples were not taken (and measures the uncertainty of those values, which is critical to informed decision making). This archive contains results of geostatistical analysis of COVID-19 case counts for all available US counties. Test results were obtained with ArcGIS Pro (ESRI). Sources are state health departments, which are scraped and aggregated by the Johns Hopkins Coronavirus Resource Center and then pre-processed by MappingSupport.com.
This update of the Zenodo dataset (version 6) consists of three compressed archives containing geostatistical analyses of SARS-CoV-2 testing data. This dataset utilizes many of the geostatistical techniques used in previous versions of this Zenodo archive, but has been significantly expanded to include analyses of up-to-date U.S. COVID-19 case data (from March 24th to September 8th, 2020):
Archive #1: “1.Geostat. Space-Time analysis of SARS-CoV-2 in the US (Mar24-Sept6).zip” – results of a geostatistical analysis of COVID-19 cases incorporating spatially-weighted hotspots that are conserved over one-week timespans. Results are reported starting from when U.S. COVID-19 case data first became available (March 24th, 2020) for 25 consecutive 1-week intervals (March 24th through to September 6th, 2020). Hotspots, where found, are reported in each individual state, rather than the entire continental United States.
Archive #2: "2.Geostat. Spatial analysis of SARS-CoV-2 in the US (Mar24-Sept8).zip" – the results from geostatistical spatial analyses only of corrected COVID-19 case data for the continental United States, spanning the period from March 24th through September 8th, 2020. The geostatistical techniques utilized in this archive include ‘Hot Spot’ analysis and ‘Cluster and Outlier’ analysis.
Archive #3: "3.Kriging and Densification of SARS-CoV-2 in LA and MA.zip" – this dataset provides preliminary kriging and densification analysis of COVID-19 case data for certain dates within the U.S. states of Louisiana and Massachusetts.
These archives consist of map files (as both static images and as animations) and data files (including text files which contain the underlying data of said map files [where applicable]) which were generated when performing the following Geostatistical analyses: Hot Spot analysis (Getis-Ord Gi*) [‘Archive #1’: consecutive weeklong Space-Time Hot Spot analysis; ‘Archive #2’: daily Hot Spot Analysis], Cluster and Outlier analysis (Anselin Local Moran's I) [‘Archive #2’], Spatial Autocorrelation (Global Moran's I) [‘Archive #2’], and point-to-point comparisons with Kriging and Densification analysis [‘Archive #3’].
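The archives above were produced with ArcGIS Pro; purely as a point of reference, the same three statistics (Global Moran's I, Anselin Local Moran's I, and Getis-Ord Gi*) can also be computed with the open-source PySAL stack. The sketch below is illustrative only and assumes a hypothetical counties.gpkg polygon layer with a cases column; it does not reproduce the ArcGIS workflow used for these files.

```python
# Open-source analogue (geopandas + libpysal/esda) of the analyses listed above.
# 'counties.gpkg' and the 'cases' column are hypothetical placeholders.
import geopandas as gpd
from libpysal.weights import Queen
from esda.moran import Moran, Moran_Local
from esda.getisord import G_Local

counties = gpd.read_file("counties.gpkg")   # county polygons with case counts
w = Queen.from_dataframe(counties)          # queen-contiguity spatial weights
w.transform = "r"                           # row-standardize the weights

y = counties["cases"].values
print("Global Moran's I:", Moran(y, w).I)   # overall spatial autocorrelation
lisa = Moran_Local(y, w)                    # Anselin Local Moran's I (clusters/outliers)
gi = G_Local(y, w, star=True)               # Getis-Ord Gi* hot spot statistic
counties["lisa_quadrant"] = lisa.q          # cluster/outlier quadrant per county
counties["gi_star_z"] = gi.Zs               # Gi* z-scores (hot/cold spots)
```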
The Word document provided ("Description-of-Archive.Updated-Geostatistical-Analysis-of-SARS-CoV-2 (version 6).docx") details the contents of each file and folder within these three archives and gives general interpretations of these results.
Replication data for the higher education spending example in Chapter 6 of Spatial Analysis for the Social Sciences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the data required to reproduce all analyses presented for the manuscript:
MuSpAn: A Toolbox for Multiscale Spatial Analysis
The data is organised into two main folders:
domains_for_figs_2_to_6 (MuSpAn domains)
Four domains of increasing size from regions within a healthy mouse colon (10x Genomics Colon Atlas panel).
Four samples of AKPT mouse tumors (10x Genomics 480 custom panel).
misc_checkpoint_data (Metadata - analysis checkpointing)
Colormap dictionaries for consistent visualization with the published figures.
Checkpointing files to support analyses requiring extended computation times.
Annotation data used for MuSpAn labeling.
The MuSpAn domains were created and saved using v1.2.0 of MuSpAn. This data is to be used with the associated Python notebooks, which can be found at:
https://github.com/joshwillmoore1/Supporting_material_muspan_paper
These notebooks both reproduce the analysis conducted in the study and serve as example material for MuSpAn usage, fully explained and linked to relevant documentation.
Replication data for the poverty rates example in Chapter 4 of Spatial Analysis for the Social Sciences.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the last decade, a plethora of algorithms have been developed for spatial ecology studies. In our case, we use some of these codes for underwater research in applied ecology analysis of threatened endemic fishes and their natural habitat. For this, we developed code in the RStudio® scripting environment to run spatial and statistical analyses for ecological response and spatial distribution models (e.g., Hijmans & Elith, 2017; Den Burg et al., 2020). The R packages employed are as follows: caret (Kuhn et al., 2020), corrplot (Wei & Simko, 2017), devtools (Wickham, 2015), dismo (Hijmans & Elith, 2017), gbm (Freund & Schapire, 1997; Friedman, 2002), ggplot2 (Wickham et al., 2019), lattice (Sarkar, 2008), lattice (Musa & Mansor, 2021), maptools (Hijmans & Elith, 2017), modelmetrics (Hvitfeldt & Silge, 2021), pander (Wickham, 2015), plyr (Wickham & Wickham, 2015), pROC (Robin et al., 2011), raster (Hijmans & Elith, 2017), RColorBrewer (Neuwirth, 2014), Rcpp (Eddelbuettel & Balamuta, 2018), rgdal (Verzani, 2011), sdm (Naimi & Araujo, 2016), sf (e.g., Zainuddin, 2023), sp (Pebesma, 2020) and usethis (Gladstone, 2022).
It is important to follow all the code in order to obtain results from the ecological response and spatial distribution models. In particular, for the ecological scenario we selected the Generalized Linear Model (GLM), and for the geographic scenario we selected DOMAIN, also known as Gower's metric (Carpenter et al., 1993). We selected this regression method and this distance-similarity metric because of their adequacy and robustness for studies with endemic or threatened species (e.g., Naoki et al., 2006). Next, we explain the statistical parameterization of the code used to run the GLM and DOMAIN models:
First, we generated the background points and extracted the values of the variables (Code2_Extract_values_DWp_SC.R). Barbet-Massin et al. (2012) recommend the use of 10,000 background points when using regression methods (e.g., Generalized Linear Model) or distance-based models (e.g., DOMAIN). However, we considered factors such as the extent of the area and the type of study species to be important for the correct selection of the number of points (pers. obs.). Then, we extracted the values of the predictor variables (e.g., bioclimatic, topographic, demographic, habitat) as a function of the presence and background points (e.g., Hijmans and Elith, 2017).
Subsequently, we subdivided both the presence and background point groups into 75% training data and 25% test data each, following the method of Soberón & Nakamura (2009) and Hijmans & Elith (2017). For training control, the 10-fold cross-validation method was selected, with the response variable presence assigned as a factor. If any other variable is important for the study species, it should also be assigned as a factor (Kim, 2009).
After that, we ran the code for the GBM method (Gradient Boosting Machine; Code3_GBM_Relative_contribution.R and Code4_Relative_contribution.R), from which we obtained the relative contribution of the variables used in the model. We parameterized the code with a Gaussian distribution and a cross-iteration of 5,000 repetitions (e.g., Friedman, 2002; Kim, 2009; Hijmans and Elith, 2017). In addition, we selected a validation interval of 4 random training points (personal test). The resulting plots were the partial dependence plots, as a function of each predictor variable.
Subsequently, the correlation between the variables was computed using Pearson's method (Code5_Pearson_Correlation.R) to evaluate multicollinearity between variables (Guisan & Hofer, 2003). It is recommended to use a bivariate correlation threshold of ±0.70 to discard highly correlated variables (e.g., Awan et al., 2021).
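The correlation screening itself is done in R (Code5_Pearson_Correlation.R); purely to illustrate the ±0.70 rule described above, an equivalent check might look like the sketch below, where predictors.csv is a hypothetical table of the variable values extracted at the presence and background points.

```python
# Illustrative multicollinearity screen at the |r| >= 0.70 threshold described above.
# The original workflow uses R; 'predictors.csv' is a hypothetical input file.
import pandas as pd

predictors = pd.read_csv("predictors.csv")        # values extracted at presence/background points
corr = predictors.corr(method="pearson")

# Flag pairs of variables whose absolute Pearson correlation reaches 0.70.
flagged = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) >= 0.70
]
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.2f} -> consider dropping one of the two")
```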
Once the above codes were run, we loaded the same subgroups (i.e., presence and background groups with 75% training and 25% testing) (Code6_Presence&backgrounds.R) for the GLM method code (Code7_GLM_model.R). Here, we first ran the GLM models per variable to obtain the p-significance value of each variable (alpha ≤ 0.05); we selected the value one (i.e., presence) as the likelihood factor. The generated models are of polynomial degree, to obtain linear and quadratic responses (e.g., Fielding and Bell, 1997; Allouche et al., 2006). From these results, we ran ecological response curve models, where the resulting plots included the probability of occurrence and the values for continuous variables or the categories for discrete variables. The points of the presence and background training groups are also included.
On the other hand, a global GLM was also run, from which the generalized model is evaluated by means of a 2 x 2 contingency matrix that includes both observed and predicted records. A representation of this is shown in Table 1 (adapted from Allouche et al., 2006). In this process we selected an arbitrary threshold of 0.5 to obtain better modeling performance and to avoid a high percentage of type I (omission) or type II (commission) errors (e.g., Carpenter et al., 1993; Fielding and Bell, 1997; Allouche et al., 2006; Kim, 2009; Hijmans and Elith, 2017).
Table 1. Example of 2 x 2 contingency matrix for calculating performance metrics for GLM models. A represents true presence records (true positives), B represents false presence records (false positives - error of commission), C represents true background points (true negatives) and D represents false backgrounds (false negatives - errors of omission).
| Model | Validation set: True | Validation set: False |
|---|---|---|
| Presence | A | B |
| Background | C | D |
We then calculated the Overall accuracy and True Skill Statistic (TSS) metrics. The first is used to assess the proportion of correctly predicted cases, while the second assesses the prevalence of correctly predicted cases (Olden and Jackson, 2002); the TSS also gives equal importance to the prevalence of presence prediction and to the random performance correction (Fielding and Bell, 1997; Allouche et al., 2006).
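Using the cell labels A, B, C, and D from Table 1, both metrics reduce to simple arithmetic. The function below is a minimal sketch of that calculation (it is not the authors' R code), and the counts in the usage line are hypothetical.

```python
# Overall accuracy and True Skill Statistic from the 2 x 2 matrix in Table 1, with
# A = true presences, B = false presences (commission), C = true backgrounds,
# D = false backgrounds (omission). Sketch of the arithmetic only.
def overall_and_tss(A, B, C, D):
    sensitivity = A / (A + D)             # proportion of presences correctly predicted
    specificity = C / (C + B)             # proportion of backgrounds correctly predicted
    overall = (A + C) / (A + B + C + D)   # proportion of all cases correctly predicted
    tss = sensitivity + specificity - 1   # Allouche et al. (2006)
    return overall, tss

print(overall_and_tss(A=70, B=10, C=65, D=15))  # hypothetical counts
```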
The last code (i.e., Code8_DOMAIN_SuitHab_model.R) is for species distribution modelling using the DOMAIN algorithm (Carpenter et al., 1993). Here, we loaded the variable stack and the presence and background group subdivided into 75% training and 25% test, each. We only included the presence training subset and the predictor variables stack in the calculation of the DOMAIN metric, as well as in the evaluation and validation of the model.
Regarding the model evaluation and estimation, we selected the following estimators:
1) partial ROC, which evaluates the separation between the curves of positive (i.e., correctly predicted presence) and negative (i.e., correctly predicted absence) cases. The farther apart these curves are, the better the model's prediction performance for the correct spatial distribution of the species (Manzanilla-Quiñones, 2020).
2) the ROC/AUC curve for model validation, where an optimal performance threshold is estimated so as to have an expected confidence of 75% to 99% probability (DeLong et al., 1988); see the sketch below for a generic version of this validation step.
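The partial ROC estimator is computed with dedicated tooling in the original R workflow and is not reproduced here; purely as a generic reference for the ROC/AUC validation step, a scikit-learn sketch follows. The arrays y_true and scores are hypothetical test-set labels and model suitability scores.

```python
# Standard (non-partial) ROC/AUC validation as a generic reference for step 2 above.
# 'y_true' (1 = presence, 0 = background) and 'scores' (model suitability values)
# are hypothetical test-set arrays.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0])
scores = np.array([0.91, 0.84, 0.62, 0.40, 0.13, 0.77, 0.55, 0.08, 0.69, 0.31])

auc = roc_auc_score(y_true, scores)
fpr, tpr, thresholds = roc_curve(y_true, scores)
# Pick the threshold that maximizes TPR - FPR (Youden's J) as one common operating point.
best = thresholds[np.argmax(tpr - fpr)]
print(f"AUC = {auc:.2f}, suggested threshold = {best:.2f}")
```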
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Have you ever wanted to create your own maps, or integrate and visualize spatial datasets to examine changes in trends between locations and over time? Follow along with these training tutorials on QGIS, an open source geographic information system (GIS) and learn key concepts, procedures and skills for performing common GIS tasks – such as creating maps, as well as joining, overlaying and visualizing spatial datasets. These tutorials are geared towards new GIS users. We’ll start with foundational concepts, and build towards more advanced topics throughout – demonstrating how with a few relatively easy steps you can get quite a lot out of GIS. You can then extend these skills to datasets of thematic relevance to you in addressing tasks faced in your day-to-day work.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper provides an abstract analysis of parallel processing strategies for spatial and spatio-temporal data. It isolates aspects such as data locality and computational locality as well as redundancy and locally sequential access as central elements of parallel algorithm design for spatial data. Furthermore, the paper gives some examples from simple and advanced GIS and spatial data analysis highlighting both that big data systems have been around long before the current hype of big data and that they follow some design principles which are inevitable for spatial data, including distributed data structures and messaging, which are, however, incompatible with the popular MapReduce paradigm. Throughout this discussion, the need for a replacement or extension of the MapReduce paradigm for spatial data is derived. This paradigm should be able to deal with the imperfect data locality inherent to spatial data, which hinders full independence of non-trivial computational tasks. We conclude that more research is needed and that spatial big data systems should pick up more concepts like graphs, shortest paths, raster data, events, and streams at the same time instead of solving exactly the set of spatially separable problems such as line simplifications or range queries in many different ways.
The establishment of a BES Multi-User Geodatabase (BES-MUG) allows for the storage, management, and distribution of geospatial data associated with the Baltimore Ecosystem Study. At present, BES data is distributed over the internet via the BES website. While having geospatial data available for download is a vast improvement over having the data housed at individual research institutions, it still suffers from some limitations. BES-MUG overcomes these limitations, improving the quality of the geospatial data available to BES researchers and thereby leading to more informed decision-making.

BES-MUG builds on Environmental Systems Research Institute's (ESRI) ArcGIS and ArcSDE technology. ESRI was selected because its geospatial software offers robust capabilities. ArcGIS is implemented agency-wide within the USDA and is the predominant geospatial software package used by collaborating institutions. Commercially available enterprise database packages (DB2, Oracle, SQL) provide an efficient means to store, manage, and share large datasets. However, standard database capabilities are limited with respect to geographic datasets because they lack the ability to deal with complex spatial relationships. By using ESRI's ArcSDE (Spatial Database Engine) in conjunction with database software, geospatial data can be handled much more effectively through the implementation of the Geodatabase model. Through ArcSDE and the Geodatabase model, the database's capabilities are expanded, allowing for multiuser editing, intelligent feature types, and the establishment of rules and relationships. ArcSDE also allows users to connect to the database using ArcGIS software without being burdened by the intricacies of the database itself.

For an example of how BES-MUG will help improve the quality and timeliness of BES geospatial data, consider a census block group layer that is in need of updating. Rather than the researcher downloading the dataset, editing it, and resubmitting it through ORS, access rules will allow the authorized user to edit the dataset over the network. Established rules will ensure that attribute and topological integrity is maintained, so that key fields are not left blank and the block group boundaries stay within tract boundaries. Metadata will automatically be updated, showing who edited the dataset and when they did, in the event any questions arise.

Currently, a functioning prototype Multi-User Database has been developed for BES at the University of Vermont Spatial Analysis Lab, using ArcSDE and IBM's DB2 Enterprise Database as a back-end architecture. This database, which is currently only accessible to those on the UVM campus network, will shortly be migrated to a Linux server where it will be accessible for database connections over the Internet. Passwords can then be handed out to all interested researchers on the project, who will be able to make a database connection through the geographic information systems software interface on their desktop computer.

This database will include a very large number of thematic layers, currently divided into biophysical, socio-economic, and imagery categories. Biophysical includes data on topography, soils, forest cover, habitat areas, hydrology, and toxics. Socio-economic includes political and administrative boundaries, transportation and infrastructure networks, property data, census data, household survey data, parks, protected areas, land use/land cover, zoning, public health, and historic land use change.
Imagery includes a variety of aerial and satellite imagery. See the readme: http://96.56.36.108/geodatabase_SAL/readme.txt See the file listing: http://96.56.36.108/geodatabase_SAL/diroutput.txt
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset constitutes an introduction to plotting and mapping and to the essential concepts of spatial data management and modeling. It also provides data ready for several examples of regression and classification algorithms (Multiple Linear Regression, Generalized Linear Models, CART and Random Forest), and explores classic interpolation methods (Inverse Distance Weighting and Kriging).
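As a small, self-contained illustration of one of the interpolation methods named above, the sketch below implements basic Inverse Distance Weighting on hypothetical sample points; it is not taken from the dataset's own example material.

```python
# Minimal Inverse Distance Weighting (IDW) interpolation, illustrating one of the
# classic methods mentioned above. Sample coordinates and values are hypothetical.
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0, eps=1e-12):
    """Interpolate z at xy_query as a distance-weighted mean of the known samples."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, eps) ** power     # closer samples receive larger weights
    return (w * z_known).sum(axis=1) / w.sum(axis=1)

xy_known = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
z_known = np.array([1.0, 3.0, 5.0, 7.0])
xy_query = np.array([[5.0, 5.0], [1.0, 1.0]])
print(idw(xy_known, z_known, xy_query))
```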
Learn Geographic Mapping with Altair, Vega-Lite and Vega using Curated Datasets
Complete geographic and geophysical data collection for mapping and visualization. This consolidation includes 18 complementary datasets used by 31+ Vega, Vega-Lite, and Altair examples 📊. Perfect for learning geographic visualization techniques including projections, choropleths, point maps, vector fields, and interactive displays.
Source data lives on GitHub and can also be accessed via CDN. The vega-datasets project serves as a common repository for example datasets used across these visualization libraries and related projects.
The geometry types include points (like airports.csv), lines (like londonTubeLines.json), and polygons (like us-10m.json), plus gridded field data (windvectors.csv, annual-precip.json). This pack includes 18 datasets covering base maps, reference points, statistical data for choropleths, and geophysical data.
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Map (1:10m) | us-10m.json | 627 KB | TopoJSON | CC-BY-4.0 | US state and county boundaries. Contains states and counties objects. Ideal for choropleths. | id (FIPS code) property on geometries |
| World Map (1:110m) | world-110m.json | 117 KB | TopoJSON | CC-BY-4.0 | World country boundaries. Contains countries object. Suitable for world-scale viz. | id property on geometries |
| London Boroughs | londonBoroughs.json | 14 KB | TopoJSON | CC-BY-4.0 | London borough boundaries. | properties.BOROUGHN (name) |
| London Centroids | londonCentroids.json | 2 KB | GeoJSON | CC-BY-4.0 | Center points for London boroughs. | properties.id, properties.name |
| London Tube Lines | londonTubeLines.json | 78 KB | GeoJSON | CC-BY-4.0 | London Underground network lines. | properties.name, properties.color |
| Dataset | File | Size | Format | License | Description | Key Fields / Join Info |
|---|---|---|---|---|---|---|
| US Airports | airports.csv | 205 KB | CSV | Public Domain | US airports with codes and coordinates. | iata, state, `l... |
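A typical Altair pattern for combining these files, drawing state boundaries from us-10m.json and overlaying the points from airports.csv via the vega_datasets loader, is sketched below; it is an illustrative example rather than one of the 31+ catalogued examples.

```python
# Illustrative Altair pattern using two of the datasets listed above:
# us-10m.json as a TopoJSON base map and airports.csv as point data.
import altair as alt
from vega_datasets import data

states = alt.topo_feature(data.us_10m.url, feature="states")

background = (
    alt.Chart(states)
    .mark_geoshape(fill="lightgray", stroke="white")
    .project("albersUsa")
    .properties(width=600, height=360)
)

points = (
    alt.Chart(data.airports.url)
    .mark_circle(size=8, color="steelblue")
    .encode(longitude="longitude:Q", latitude="latitude:Q", tooltip=["iata:N", "name:N"])
)

chart = background + points
chart.save("us_airports.html")
```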
Replication data for the blue states versus red states example in Chapter 4 of Spatial Analysis for the Social Sciences.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
I wanted to make some geospatial visualizations to convey the current severity of COVID-19 in different parts of the U.S.
I liked the NYTimes COVID dataset, but it was lacking county boundary shape data, population per county, new cases/deaths per day, per capita calculations, and county demographics.
After a lot of work tracking down the different data sources I wanted and doing all of the data wrangling and joins in python, I wanted to open-source the final enriched data set in order to give others a head start in their COVID-19 related analytic, modeling, and visualization efforts.
This dataset is enriched with county shapes, county center point coordinates, 2019 census population estimates, county population densities, cases and deaths per capita, and calculated per-day cases/deaths metrics. It contains daily data per county back to January, allowing for analyzing changes over time.
UPDATE: I have also included demographic information per county, including ages, races, and gender breakdown. This could help determine which counties are most susceptible to an outbreak.
Geospatial analysis and visualization:
- Which counties are currently getting hit the hardest (per capita and totals)?
- What patterns are there in the spread of the virus across counties? (network-based spread simulations using county center lat/lons)
- Do county population densities play a role in how quickly the virus spreads?
- How do a specific county's or state's cases and deaths compare to other counties/states?
- Join with other county-level datasets easily (with the fips code column)
See the column descriptions for more details on the dataset
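The derived fields described above (per-day counts, per capita rates, FIPS joins) boil down to a few pandas operations. The sketch below shows the general pattern with hypothetical file and column names (fips, date, cases, deaths, population); it is not the repository's actual wrangling code.

```python
# General pattern for the derived metrics described above: per-day counts from
# cumulative totals, per-capita rates, and a FIPS join. File and column names
# are hypothetical placeholders.
import pandas as pd

covid = pd.read_csv("us-counties.csv", dtype={"fips": str}, parse_dates=["date"])
pop = pd.read_csv("county_population.csv", dtype={"fips": str})  # fips, population

df = covid.merge(pop, on="fips", how="left").sort_values(["fips", "date"])

# New cases/deaths per day from cumulative counts, then per-100k-capita rates.
g = df.groupby("fips")
df["new_cases"] = g["cases"].diff().fillna(0)
df["new_deaths"] = g["deaths"].diff().fillna(0)
df["cases_per_100k"] = df["cases"] / df["population"] * 1e5
df["new_cases_per_100k"] = df["new_cases"] / df["population"] * 1e5
```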
COVID-19 U.S. Time-lapse: Confirmed Cases per County (per capita): https://github.com/ringhilterra/enriched-covid19-data/blob/master/example_viz/covid-cases-final-04-06.gif?raw=true
The files linked to this reference are the geospatial data created as part of the completion of the baseline vegetation inventory project for the NPS park unit. Current format is ArcGIS file geodatabase, but older formats may exist as shapefiles. We converted the photointerpreted data into a format usable in a geographic information system (GIS) by employing three fundamental processes: (1) orthorectify, (2) digitize, and (3) develop the geodatabase. All digital map automation was projected in Universal Transverse Mercator (UTM), Zone 16, using the North American Datum of 1983 (NAD83).

Orthorectify: We orthorectified the interpreted overlays by using OrthoMapper, a softcopy photogrammetric software for GIS. One function of OrthoMapper is to create orthorectified imagery from scanned and unrectified imagery (Image Processing Software, Inc., 2002). The software features a method of visual orientation involving a point-and-click operation that uses existing orthorectified horizontal and vertical base maps. Of primary importance to us, OrthoMapper also has the capability to orthorectify the photointerpreted overlays of each photograph based on the reference information provided.

Digitize: To produce a polygon vector layer for use in ArcGIS (Environmental Systems Research Institute [ESRI], Redlands, California), we converted each raster-based image mosaic of orthorectified overlays containing the photointerpreted data into a grid format by using ArcGIS. In ArcGIS, we used the ArcScan extension to trace the raster data and produce ESRI shapefiles. We digitally assigned map-attribute codes (both map-class codes and physiognomic modifier codes) to the polygons and checked the digital data against the photointerpreted overlays for line and attribute consistency. Ultimately, we merged the individual layers into a seamless layer.

Geodatabase: At this stage, the map layer has only map-attribute codes assigned to each polygon. To assign meaningful information to each polygon (e.g., map-class names, physiognomic definitions, links to NVCS types), we produced a feature-class table, along with other supportive tables, and subsequently related them together via an ArcGIS Geodatabase. This geodatabase also links the map to other feature-class layers produced from this project, including vegetation sample plots, accuracy assessment (AA) sites, aerial photo locations, and project boundary extent. A geodatabase provides access to a variety of interlocking data sets, is expandable, and equips resource managers and researchers with a powerful GIS tool.
The OneMap template can be used to connect multiple organizations to collaborate and share with internal and external stakeholders. Today, organizations must work beyond borders, jurisdictions, and sectors to address shared challenges. Collaboration is key whether you call your initiative a Spatial Data Infrastructure (SDI), Open Data, Digital Twin, Knowledge Infrastructure, Digital Ecosystem, or otherwise. The term 'OneMap' is a placeholder for your community GIS branding. The 'OneMap' Hub concept is multi-organizational: the website is designed to help communities of practice jumpstart their initiatives. Use it to share and collaborate, provide focus on thematic topics, and more. This item is available to ArcGIS Hub Basic and Premium licensed organizations.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset simulates 300 global cities across 6 major geographic regions, designed specifically for unsupervised machine learning and clustering analysis. It explores how economic status, environmental quality, infrastructure, and digital access shape urban lifestyles worldwide.
| Feature | Description | Notes |
|---|---|---|
| 10 Features | Economic, environmental & social indicators | Realistically scaled |
| 300 Cities | Europe, Asia, Americas, Africa, Oceania | Diverse distributions |
| Strong Correlations | Income ↔ Rent (+0.8), Density ↔ Pollution (+0.6) | ML-ready |
| No Missing Values | Clean, preprocessed data | Ready for analysis |
| 4-5 Natural Clusters | Metropolitan hubs, eco-towns, developing centers | Pre-validated |
✅ Realistic Correlations: Income strongly predicts rent (+0.8), internet access (+0.7), and happiness (+0.6)
✅ Regional Diversity: Each region has distinct economic and environmental characteristics
✅ Clustering-Ready: Naturally separable into 4-5 lifestyle archetypes
✅ Beginner-Friendly: No data cleaning required, includes example code
✅ Documented: Comprehensive README with methodology and use cases
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Load and prepare
df = pd.read_csv('city_lifestyle_dataset.csv')
X = df.drop(['city_name', 'country'], axis=1)
X_scaled = StandardScaler().fit_transform(X)
# Cluster
kmeans = KMeans(n_clusters=5, random_state=42)
df['cluster'] = kmeans.fit_predict(X_scaled)
# Analyze
print(df.groupby('cluster').mean())
After working with this dataset, you will be able to:
1. Apply K-Means, DBSCAN, and Hierarchical Clustering
2. Use PCA for dimensionality reduction and visualization (see the sketch below)
3. Interpret correlation matrices and feature relationships
4. Create geographic visualizations with cluster assignments
5. Profile and name discovered clusters based on characteristics
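Continuing the clustering snippet above, a minimal PCA projection (learning outcome 2) could look like the following; it re-loads the same hypothetical CSV and columns as that snippet, and the plot is purely illustrative.

```python
# Minimal PCA projection of the scaled features, colored by K-Means clusters.
# Assumes the same hypothetical CSV and columns as the clustering snippet above.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("city_lifestyle_dataset.csv")
X_scaled = StandardScaler().fit_transform(df.drop(["city_name", "country"], axis=1))
clusters = KMeans(n_clusters=5, random_state=42, n_init=10).fit_predict(X_scaled)

coords = PCA(n_components=2).fit_transform(X_scaled)
plt.scatter(coords[:, 0], coords[:, 1], c=clusters, cmap="tab10", s=20)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("City lifestyle clusters in PCA space")
plt.show()
```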
| Cluster | Characteristics | Example Cities |
|---|---|---|
| Metropolitan Tech Hubs | High income, density, rent | Silicon Valley, Singapore |
| Eco-Friendly Towns | Low density, clean air, high happiness | Nordic cities |
| Developing Centers | Mid income, high density, poor air | Emerging markets |
| Low-Income Suburban | Low infrastructure, income | Rural areas |
| Industrial Mega-Cities | Very high density, pollution | Manufacturing hubs |
Unlike random synthetic data, this dataset was carefully engineered with: - ✨ Realistic correlation structures based on urban research - 🌍 Regional characteristics matching real-world patterns - 🎯 Optimal cluster separability (validated via silhouette scores) - 📚 Comprehensive documentation and starter code
✓ Learn clustering without data cleaning hassles
✓ Practice PCA and dimensionality reduction
✓ Create beautiful geographic visualizations
✓ Understand feature correlation in real-world contexts
✓ Build a portfolio project with clear business insights
This dataset was designed for educational purposes in machine learning and data science. While synthetic, it reflects real patterns observed in global urban development research.
Happy Clustering! 🎉
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Spatial Transcriptomics (ST) data matching Matrix Assisted Laser Desorption/Ionization - Mass Spectrometry Imaging (MALDI-MSI). This data is complementary to data contained in the same project: files with the same identifiers in the two datasets originated from the very same tissue section and can be combined in a multimodal ST-MSI object. For more information about the dataset please see our manuscript posted on BioRxiv (doi: https://doi.org/10.1101/2023.01.26.525195).

This dataset includes ST data from 19 tissue sections, including human post-mortem and mouse samples. The spatial transcriptomics data was generated using the Visium protocol (10x Genomics). The murine tissue sections come from three different mice unilaterally injected with 6-OHDA, a neurotoxin that, when injected in the brain, can selectively destroy dopaminergic neurons. We used this mouse model to show the applicability of the technology that we developed, named Spatial Multimodal Analysis (SMA). Using our technology on these mouse brain tissue sections, we were able to detect both dopamine with MALDI-MSI and the corresponding gene expression with ST. This dataset also includes one human post-mortem striatum sample that was placed on one Visium slide across the four capture areas. This sample was analyzed with a different ST protocol named RRST (Mirzazadeh, R., Andrusivova, Z., Larsson, L. et al. Spatially resolved transcriptomic profiling of degraded and challenging fresh frozen samples. Nat Commun 14, 509 (2023). https://doi.org/10.1038/s41467-023-36071-5), where probes capturing the whole transcriptome are first hybridized in the tissue section and then spatially detected.

Each tissue section contained in the dataset has been given a unique identifier that is composed of the Visium array ID and capture area ID of the Visium slide that the tissue section was placed on. This unique identifier is included in the file names of all the files relative to the same tissue section, including the MALDI-MSI files published in the other dataset included in this project. In this dataset you will find the following files for each tissue section:
- raw files: the read one fastq files (containing the pattern *R1*fastq.gz in the file name), the read two fastq files (containing the pattern *R2*fastq.gz in the file name) and the raw microscope images (containing the pattern Spot.jpg in the file name). These are the only files needed to run the Space Ranger pipeline, which is freely available for any user (please see the 10x Genomics website for information on how to install and run Space Ranger);
- processed data files, of three types: a) Space Ranger outputs that were used to produce the figures in our publication; b) manual annotation tables in csv format produced using Loupe Browser 6 (csv tables with file names ending in _RegionLoupe.csv, _filter.csv, _dopamine.csv, _lesion.csv, _region.csv); c) json files that we used as input for Space Ranger in the cases where the automatic tissue detection included in the pipeline failed to recognize the tissue or the fiducials. Using these processed files the user can reproduce the figures of our publication without having to restart from the raw data files.

The MALDI-MSI analyses preceding ST were performed with different matrices in different tissue sections.
We used 1) 9-aminoacridine (9-AA) for detection of metabolites in negative ionization mode, 2) 2,5-dihydroxybenzoic acid (DHB) for detection of metabolites in positive ionization mode, and 3) 4-(anthracen-9-yl)-2-fluoro-1-ethylpyridin-1-ium iodide (FMP-10), which charge-tags molecules with phenolic hydroxyls and/or primary amines, including neurotransmitters. The information about which matrix was sprayed on each tissue section, and other information about the samples, is included in the metadata table. We also used three types of control samples:
- standard Visium: samples processed with standard Visium (i.e., no matrix spraying, no MALDI-MSI; protocol as recommended by 10x Genomics with no exceptions);
- internal controls (iCTRL): samples not sprayed with any matrix nor processed with MALDI-MSI, but located on the same Visium slide where other samples were processed with MALDI-MSI;
- FMP-10-iCTRL: a sample sprayed with FMP-10 and then processed as an iCTRL.
This and other information is provided in the metadata table.