Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a set of point-of-interest (POI) datasets for the cities of Shenzhen, Guangzhou, Beijing, and Shanghai, China.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The increasing popularity of social networks has made them a viable data source for many data mining applications, and event detection is no exception. Researchers aim not only to find events that happen within networks but, more importantly, to identify and locate events occurring in the real world. In this paper, we propose an enhanced version of the quadtree, the convolutional quadtree (ConvTree), and demonstrate its advantages over the standard quadtree. We also introduce an algorithm for detecting events of different scales using geospatial data obtained from social networks. The algorithm is based on statistical analysis of historical data, generation of ConvTrees representing the normal state of the city, and evaluation of anomalies for event detection. An experimental study conducted on a dataset of 60 million geotagged Instagram posts in the New York City area demonstrates that the proposed approach can find a wide range of events, from very local (an indie band concert or a wedding party) to city-scale (a baseball game or a holiday march) and even country-scale (a political protest or Christmas) events. This opens up the prospect of building a simple and fast yet powerful system for real-time multiscale event monitoring.
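The description does not give enough detail to reproduce ConvTree itself, so the following is only a minimal Python sketch of the standard point quadtree that the paper uses as its baseline, plus a simple per-cell anomaly score that compares today's post count with historical counts. The capacity and minimum-size stopping rules, the z-score anomaly rule, and all function names are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import statistics

Point = Tuple[float, float]

@dataclass
class QuadNode:
    """A cell covering the half-open box [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    children: List["QuadNode"] = field(default_factory=list)

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

def build_quadtree(node: QuadNode, points: List[Point], capacity: int, min_size: float) -> QuadNode:
    """Split a cell into four quadrants while it holds more than `capacity`
    points and is wider than `min_size` (hypothetical stopping rules)."""
    inside = [p for p in points if node.contains(p)]
    if len(inside) <= capacity or (node.x1 - node.x0) <= min_size:
        return node
    mx, my = (node.x0 + node.x1) / 2, (node.y0 + node.y1) / 2
    quadrants = [(node.x0, node.y0, mx, my), (mx, node.y0, node.x1, my),
                 (node.x0, my, mx, node.y1), (mx, my, node.x1, node.y1)]
    for a, b, c, d in quadrants:
        node.children.append(build_quadtree(QuadNode(a, b, c, d), inside, capacity, min_size))
    return node

def leaves(node: QuadNode) -> List[QuadNode]:
    """Collect the leaf cells, i.e. the final spatial partition."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

def anomaly_scores(cells: List[QuadNode], history: List[List[Point]], today: List[Point]) -> List[float]:
    """Z-score of today's post count in each cell against its counts on historical days."""
    scores = []
    for cell in cells:
        past = [sum(cell.contains(p) for p in day) for day in history]
        mean = statistics.mean(past)
        sd = statistics.pstdev(past) or 1.0  # fall back to 1.0 when history is constant
        now = sum(cell.contains(p) for p in today)
        scores.append((now - mean) / sd)
    return scores
```

For example, `leaves(build_quadtree(QuadNode(0, 0, 1, 1), posts, capacity=4, min_size=0.1))` yields a partition that is finer where posts are dense; cells whose score exceeds a chosen threshold could then be flagged as candidate events.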
From the site: "Coal Pillar Locations are pillars of coal that must remain in place to provide support for a coal mine."
Spatially continuous data on environmental variables are often required for marine conservation and management. However, information on environmental variables is usually collected by point sampling, particularly in the deep ocean. Methods that generate spatially continuous data by using point samples to estimate values at unsampled locations are therefore essential tools. Such methods are, however, often data- or even variable-specific, and it is difficult to select an appropriate method for any given dataset. In this study, 14 methods (37 sub-methods) are compared using samples of mud content at five levels of sample density across the southwest Australian margin. Bathymetry, distance to coast, and slope were used as secondary variables. Ten-fold cross-validation with relative mean absolute error (RMAE), together with visual examination, was used to assess the performance of these methods. A total of 1,850 prediction datasets were produced and used in this assessment. Considering both accuracy and visual examination, we found that a combined method, random forest and ordinary kriging (RKrf), is the most robust. No threshold in sample density was detected in relation to prediction accuracy, and no consistent patterns were observed between the performance of the methods and data variation. The RMAE of the three most accurate methods is about 30% lower than that of the best methods in previous publications, highlighting the robustness of the methods selected in this study. The limitations of this study are discussed, and a number of suggestions are made for further studies.
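The assessment loop (ten-fold cross-validation scored by RMAE) can be sketched in NumPy as below. RMAE is assumed here to be the mean absolute error divided by the mean of the observed values, expressed as a percentage; the study's exact definition should be checked against the paper. Inverse distance weighting stands in for the 37 sub-methods, and the fold assignment and seed are hypothetical choices.

```python
import numpy as np

def idw_predict(train_xy, train_z, query_xy, power=2.0):
    """Inverse distance weighting: a distance-weighted mean of the training values."""
    d = np.linalg.norm(query_xy[:, None, :] - train_xy[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero when a query coincides with a sample
    w = 1.0 / d ** power
    return (w @ train_z) / w.sum(axis=1)

def rmae(observed, predicted):
    """Relative mean absolute error (%): MAE divided by the mean observed value (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    return 100.0 * np.mean(np.abs(observed - predicted)) / np.mean(observed)

def kfold_rmae(xy, z, k=10, seed=0, predict=idw_predict):
    """k-fold cross-validation: predict each held-out fold from the others, then pool all errors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(z)), k)
    preds = np.empty(len(z))
    for test_idx in folds:
        train_mask = np.ones(len(z), dtype=bool)
        train_mask[test_idx] = False
        preds[test_idx] = predict(xy[train_mask], z[train_mask], xy[test_idx])
    return rmae(z, preds)
```

Any interpolator with the same `predict(train_xy, train_z, query_xy)` signature can be plugged in, which is what makes a side-by-side comparison of many methods on identical folds straightforward.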
From the site: "Coverages containing industrial mineral mining data by quadrangle for the state of Pennsylvania. Digitized from the Harrisburg Bureau of Mining and Reclamation mylar map system each quadrangle contains multiple coverages identifying seams in that quad. Also includes coverages indicating coal mining refuse disposal sites, permitted sites, point coverages of deep mine entry and other surface features of deep mines and Small Operators Assistance Program (SOAP) areas."
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Travel regions are not necessarily defined by political or administrative boundaries. For example, in the Schengen region of Europe, tourists can travel freely across national borders. Identifying transboundary travel regions is an interesting problem, which we aim to solve through mobility analysis of Twitter users. Our proposed solution comprises collecting geotagged tweets, combining them into trajectories, and thereby mining thousands of trips undertaken by Twitter users. After aggregating these trips into a mobility graph, we apply a community detection algorithm to find coherent regions throughout the world. The discovered regions provide insights into international travel and can reveal both domestic and transnational travel regions.
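The pipeline from trajectories to trips, then to a mobility graph, then to communities can be sketched as follows. The description does not name its community detection algorithm, so this sketch substitutes weighted label propagation as a simple stand-in; the region labels, tie-breaking rule, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def trips_from_trajectory(regions: List[str]) -> List[Tuple[str, str]]:
    """Turn one user's time-ordered regions into trips between consecutive distinct regions."""
    return [(a, b) for a, b in zip(regions, regions[1:]) if a != b]

def build_mobility_graph(trajectories: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Aggregate all users' trips into an undirected graph weighted by trip counts."""
    graph: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for regions in trajectories:
        for a, b in trips_from_trajectory(regions):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph

def label_propagation(graph: Dict[str, Dict[str, float]], max_iter: int = 100) -> List[Set[str]]:
    """Each node repeatedly adopts the label with the largest total edge weight among
    its neighbours (ties go to the smallest label) until no label changes."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            score: Dict[str, float] = defaultdict(float)
            for nbr, weight in graph[node].items():
                score[labels[nbr]] += weight
            best = max(score.values())
            new_label = min(lab for lab, s in score.items() if s == best)
            if new_label != labels[node]:
                labels[node], changed = new_label, True
        if not changed:
            break
    communities: Dict[str, Set[str]] = defaultdict(set)
    for node, lab in labels.items():
        communities[lab].add(node)
    return sorted(communities.values(), key=min)
```

With two groups of regions that exchange many trips internally and only one trip between them, the sketch recovers the two groups as separate travel regions; a modularity-based method would be a natural replacement at real scale.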
https://spdx.org/licenses/CC0-1.0.html
Chytridiomycosis, caused by the fungal pathogen Batrachochytrium dendrobatidis (Bd), is a major driver of amphibian decline worldwide. The global presence of Bd is driven by a synergy of factors, such as climate, species life history, and amphibian host susceptibility. Here, using a Bayesian data-mining approach, we modeled the epidemiological landscape of Bd to evaluate how infection varies across several spatial, ecological, and phylogenetic scales. We compiled global information on Bd occurrence, climate, species ranges, and phylogenetic diversity to infer the potential distribution and prevalence of Bd. By calculating the degree of co-distribution between Bd and our set of environmental and biological variables (e.g. climate and species), we identified the factors that could potentially be related to Bd presence and prevalence using a geographic correlation metric, epsilon (ε). We fitted five ecological models based on 1) amphibian species identity, 2) phylogenetic species variability values for a given species assemblage, 3) temperature, 4) precipitation, and 5) all variables together. Our results extend the findings of previous studies by identifying the epidemiological landscape features of Bd. This ecological modeling framework allowed us to generate explicit spatial predictions of Bd prevalence at the global scale and a ranked list of species with high/low probability of Bd presence. Our geographic model identified areas with high potential for Bd prevalence (potential Bd-risk areas) and areas with low potential Bd prevalence as potential refuges (Bd-free). At the amphibian assemblage level, we found no relationship with amphibian phylogenetic signals, but a significant negative correlation between observed species richness and Bd prevalence, indicating a potential dilution effect at the landscape scale.
Our model may identify species and areas potentially susceptible to and at risk of Bd presence, which could be used to prioritize regions for amphibian conservation efforts and to assess species and assemblages at risk.
Usage notes
These datasets include the geographic data used to build ecological and geographical models for Batrachochytrium dendrobatidis, as well as supplementary results of the following paper: Basanta et al. Epidemiological landscape of Batrachochytrium dendrobatidis and its impact on amphibian diversity at the global scale. Missing values are denoted by NA. Details for each dataset are provided in the README file. Datasets included:
Information on Bd records: Table S1.xls contains Bd occurrence records and prevalence of infection from the Bd-Maps online database (http://www.bd-maps.net; Olson et al. 2013), accessed in 2013, and from recent papers with Bd infection reports found by searching Google Scholar with the keyword 'Batrachochytrium dendrobatidis'. We excluded records from studies of captive individuals and records without coordinates, keeping only records whose coordinates reflected site-specific sample locations.
Supplementary figures: Supplementary information S1.docx contains supplementary figures of the results obtained in this study. The methods and the interpretation of the figures are described in the Methods and Results sections of the manuscript.
Ecological traits of global amphibians and score-trait values: Table S2.xlsx contains information on amphibians' ecological traits obtained from AmphiBIO (Oliveira et al., 2017), as well as score-trait values for each species. Methods for calculating score-trait values are described in the Methods section of the manuscript.
Values of ε(C|X) and S(C|X) for each Bd-variable pair: Table S3.xlsx contains the epsilon and score values obtained for each pairwise combination of Bd with temperature, precipitation, phylogenetic, and amphibian species variables. Methods for calculating epsilon and score values are described in the Methods section of the manuscript.
List of species newly confirmed as Bd-positive (Olson et al. 2021): Table S4.xlsx contains a list of species recently confirmed as Bd-positive (Olson et al. 2021), together with their score values and score quantiles.
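Table S3 reports ε(C|X) and S(C|X) for each pairing of Bd (C) with a variable X. As a hedged illustration only, the sketch below assumes the forms commonly used in this Bayesian data-mining framework: ε as a binomial test statistic on grid-cell co-occurrence counts, and S as the log-likelihood ratio ln[P(X|C)/P(X|not C)]. The manuscript's Methods section is the authoritative definition; the pseudocount `alpha` is a hypothetical addition to avoid log(0).

```python
import math
from typing import Iterable, Set, Tuple

def cooccurrence_counts(all_cells: Set[int], bd_cells: Iterable[int], x_cells: Iterable[int]) -> Tuple[int, int, int, int]:
    """Return N (all cells), N_C (cells with Bd), N_X (cells with X), N_CX (cells with both)."""
    bd, x = set(bd_cells) & all_cells, set(x_cells) & all_cells
    return len(all_cells), len(bd), len(x), len(bd & x)

def epsilon(n: int, n_c: int, n_x: int, n_cx: int) -> float:
    """eps(C|X) = N_X (P(C|X) - P(C)) / sqrt(N_X P(C) (1 - P(C))): departure of co-occurrence from chance."""
    p_c = n_c / n
    p_c_given_x = n_cx / n_x
    return n_x * (p_c_given_x - p_c) / math.sqrt(n_x * p_c * (1 - p_c))

def score(n: int, n_c: int, n_x: int, n_cx: int, alpha: float = 0.1) -> float:
    """S(C|X) = ln[P(X|C) / P(X|not C)], with a hypothetical Laplace pseudocount alpha."""
    p_x_given_c = (n_cx + alpha) / (n_c + 2 * alpha)
    p_x_given_not_c = (n_x - n_cx + alpha) / (n - n_c + 2 * alpha)
    return math.log(p_x_given_c / p_x_given_not_c)
```

For instance, with 100 cells, Bd in 20, X in 10, and both in 6, ε equals √10 (about 3.16), a positive association; summing S over a site's variables is one way such scores can be used to rank species or areas.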
This geodatabase reflects the U.S. Geological Survey’s (USGS) ongoing commitment to its mission of understanding the nature and distribution of global mineral commodity supply chains by updating and publishing the georeferenced locations of mineral commodity production and processing facilities, mineral exploration and development sites, and mineral commodity exporting ports in Africa. The geodatabase and geospatial data layers serve to create a new geographic information product in the form of a geospatial portable document format (PDF) map. The geodatabase contains data layers from USGS, foreign governmental, and open-source sources as follows: (1) mineral production and processing facilities, (2) mineral exploration and development sites, (3) mineral occurrence sites and deposits, (4) undiscovered mineral resource tracts for Gabon and Mauritania, (5) undiscovered mineral resource tracts for potash, platinum-group elements, and copper, (6) coal occurrence areas, (7) electric power generating facilities, (8) electric power transmission lines, (9) liquefied natural gas terminals, (10) oil and gas pipelines, (11) undiscovered, technically recoverable conventional and continuous hydrocarbon resources (by USGS geologic/petroleum province), (12) cumulative production, and recoverable conventional resources (by oil- and gas-producing nation), (13) major mineral exporting maritime ports, (14) railroads, (15) major roads, (16) major cities, (17) major lakes, (18) major river systems, (19) first-level administrative division (ADM1) boundaries for all countries in Africa, and (20) international boundaries for all countries in Africa.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Land cover is the visible, biophysical cover on the Earth’s surface including trees, shrubs, grasses, soils, exposed rocks and water bodies, as well as anthropogenic elements such as plantations, crops and built environments. Land cover changes for many reasons, including seasonal weather, severe weather events such as cyclones, floods and fires, and human activities such as mining, agriculture and urbanisation. Remote sensing data recorded over a period of time allows the observation of land cover dynamics. Classifying these responses provides a robust and repeatable way of characterising land cover types. These complement on-ground surveys where available.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Data from the article "Unraveling spatial, structural, and social country-level conditions for the emergence of the foreign fighter phenomenon: an exploratory data mining approach to the case of ISIS", by Agustin Pájaro, Ignacio J. Duran and Pablo Rodrigo, published in Revista DADOS, v. 65, n. 3, 2022.
The Project Approval Boundary spatial data set provides information on the location of the project approvals granted for each mine in NSW by an approval authority (either the NSW Department of Planning or a local council). This information may not align with the mine authorisation (i.e. mine title, etc.) granted under the Mining Act 1992. The information is created and submitted by each large mine operator to fulfill the Final Landuse and Rehabilitation Plan data submission requirements set out in Schedule 8A of the Mining Regulation 2016.
The collection of this spatial data is administered by the Resources Regulator in NSW, which reviews the submitted data for assessment purposes. In some cases, the information provided may contain inaccuracies that require adjustment following the Regulator's assessment. The Regulator will request data resubmission if issues are identified.
Further information on the reporting requirements associated with mine rehabilitation can be found at https://www.resourcesregulator.nsw.gov.au/rehabilitation/mine-rehabilitation.
More information about the data is available at https://www.seed.nsw.gov.au/project-approvals-boundary-layer
Any data-related questions should be directed to nswresourcesregulator@service-now.com
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MRSDA Exploration Graticules, Mineral points and regions, Heavy Mineral Sands areas and lines, Mining and Mineral Operation Locations, Extractive Industry Interest Areas, Deep and Shallow Leads and Shallow Workings. Collected for Earth Resources within DSDBI.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The U.S. Geological Survey (USGS) has compiled a geodatabase containing mineral-related geospatial data for 10 countries of interest in Southwest Asia (area of study): Afghanistan, Cambodia, Laos, India, Indonesia, Iran, Nepal, North Korea, Pakistan, and Thailand. The data can be used in analyses of the extractive fuel and nonfuel mineral industries and of the related economic and physical infrastructure integral to the successful operation of the mineral industries within the area of study, as well as the movement of mineral products across domestic and global markets. This geodatabase reflects the USGS's ongoing commitment to its mission of understanding the nature and distribution of global mineral commodity supply chains by updating and publishing the georeferenced locations of mineral commodity production and processing facilities, mineral exploration and development sites, and mineral commodity exporting ports for the countries in the area of study. The geodatabase contains data feat ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
These datasets provide the data underlying the publication "Lines in the sand: quantifying the cumulative development footprint in the world’s largest remaining temperate woodland" (https://link.springer.com/article/10.1007/s10980-017-0558-z). The datasets are: (A) data in csv format: [1] development footprint by sample area: information on the 24 sample areas (~490 km^2 each) assessed in the study, including the different infrastructure types (roads, railways, mapped tracks, un-mapped tracks manually digitized in the study from aerial imagery, and hub infrastructure such as mine pits and waste rock dumps, also manually digitized in the study). It also contains key co-variables assessed as potential explanatory variables for development footprint. The region-wide modelling of development footprint found strong positive effects of mining project density and pastoralism, as well as a highly significant negative interaction between the two. At low mining project densities, development footprints are more extensive in pastoral areas, but at high mining project densities, pastoral areas are, on average, relatively less developed than non-pastoral areas. [2] Great Western Woodlands (GWW) 20 km grid: this dataset provides data for the 20 x 20 km grid placed over the whole GWW and used for the regional estimation of development footprint, linear infrastructure density, and linear infrastructure type based on the region-wide analysis. Data are given for each grid cell and include the total length of roads in the cell, MINEDEX mining projects, pastoral status, etc. This dataset was used to project the data from the 24 study areas across the whole of the Great Western Woodlands and to calculate region-wide estimates of development footprint and linear infrastructure lengths.
[3] disturbance by patch: this dataset provides the data for each patch used in the analysis of patch-level drivers of development footprint, which was performed to gain further insight into the effects of other landscape variables than could be gleaned from the region-wide analysis. For this analysis, we divided sample areas into polygonal patch types, each with a unique combination of the following categorical co-variables: pastoral tenure, greenstone lithology, conservation tenure, ironstone formation, schedule-1 area clearing restrictions, environmentally sensitive area designation, vegetation formation, and sample area. For each patch type (n = 261), we calculated the following attributes: a) number of mining projects, b) number of dead mineral tenements, c) sum of the durations of all live and dead tenements, d) type of tenement (exploration/prospecting tenement, mining and related activities tenement, none), e) primary target commodity (gold, nickel, iron-ore, other), f) distance to the wheatbelt, and g) distance to the nearest town. [4] mapped versus digitized tracks: this dataset provides mapped and un-mapped track widths, measured on high-resolution aerial imagery at no fewer than 20 randomly generated locations within each of the 24 sample areas. Pastoral tenure and mining intensity for each sample area are included for analysis purposes. [5] edge effect scenarios: hypothetical edge effect zones were created, based on effect zones reported in the literature and arranged under three scenarios, to reflect potential risks of offsite impacts in areas adjacent to the observed development footprints (see Appendix 3 of the article). The calculated proportion of the entire GWW within edge effect zones varied from ~3% under the conservative scenario to ~35% under the maximal scenario.
Within the range of development footprints observed in this study, the proportion of a landscape that lies within edge effect zones increases hyperbolically with the number of mining projects, approaching 100% under the maximal scenario, 60% under the moderate scenario, and ~20% under the conservative scenario. (B) shapefiles: [6] Great Western Woodlands boundary, [7] sample areas (the layer file shows sample areas by category).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the results for a journal paper titled "Spatial Parameters for Circular Construction Hubs: Location Criteria for a Circular Built Environment". For this research, we reviewed policy documents and interviewed experts to identify the spatial parameters (or location requirements) for circular construction hubs, which are facilities that collect, store, and redistribute construction waste as secondary resources. The following files document the research process and results:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Mining Inspectorate Boundaries and Land Exempted from Exploration or Mining Licences under the MRSDA (not the only land). Collected for Earth Resources within DSDBI.
Spatial interpolation methods for generating spatially continuous data from point samples of environmental variables are essential for environmental management and conservation. They fall into three groups: non-geostatistical methods (e.g., inverse distance weighting), geostatistical methods (e.g., ordinary kriging), and combined/hybrid methods (e.g., regression kriging), and their performance is often data-specific (Li and Heap, 2008). Because machine learning methods such as random forest and support vector machine have proved robust in data mining, we introduced them into spatial statistics by applying them, in combination with existing spatial interpolation methods, to the spatial prediction of seabed mud content (Li et al., 2011). This development can be viewed as an extension of the combined methods from statistical methods to the machine learning field. These applications have significantly improved prediction accuracy and opened up an alternative source of methods for spatial interpolation. Given that they have been applied to only one variable, several questions remain: are they dataset-specific? How reliable are their predictions for different datasets and variables? Could other machine learning methods (such as boosted regression trees) improve spatial interpolation? To address these questions, we experimentally compared the predictions of several methods for sand content on the southwest Australian marine margin. We tested a variety of existing spatial interpolation methods, machine learning methods, and their combinations. In this study, we discuss the experimental results and the value of this advancement in spatial interpolation, visually examine the spatial predictions, and compare the results with the findings of previous publications. The outcomes of this study can be applied to the spatial prediction of marine and terrestrial environmental variables.
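The structure of a combined/hybrid method (model a trend from secondary variables, then spatially interpolate that model's residuals and add them back, as in regression kriging or RKrf) can be sketched as below. To stay dependency-light, this sketch substitutes ordinary least squares for random forest and inverse distance weighting for kriging; it illustrates the hybrid pattern, not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np

def idw(train_xy, values, query_xy, power=2.0):
    """Inverse distance weighting of `values` (here: trend residuals) onto query points."""
    d = np.linalg.norm(query_xy[:, None, :] - train_xy[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return (w @ values) / w.sum(axis=1)

def fit_trend(covariates, z):
    """Least-squares linear trend z ~ b0 + covariates @ b (stand-in for random forest)."""
    design = np.column_stack([np.ones(len(z)), covariates])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coef

def predict_trend(coef, covariates):
    return np.column_stack([np.ones(len(covariates)), covariates]) @ coef

def combined_predict(train_xy, train_cov, train_z, query_xy, query_cov):
    """Hybrid prediction = trend from secondary variables + spatially interpolated residuals."""
    coef = fit_trend(train_cov, train_z)
    residuals = train_z - predict_trend(coef, train_cov)
    return predict_trend(coef, query_cov) + idw(train_xy, residuals, query_xy)
```

Here the covariate columns would hold secondary variables such as bathymetry or distance to coast. Swapping `fit_trend` for a random forest regressor and `idw` for ordinary kriging would give the RKrf-style combination, since the two stages are independent.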
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Statistically significant hotspots of fishing activities in the Mediterranean and Atlantic Seas were identified by applying the Getis-Ord Gi statistic (Getis and Ord 2010) in the statistical software R, using the globalG.test function (spdep package). The function computes a global test for spatial autocorrelation using a Monte Carlo simulation approach. It tests the null hypothesis of no autocorrelation against the alternative hypothesis of positive spatial autocorrelation. Local spatial autocorrelation was then tested by calculating the Gi statistic with the local_g_perm function (sfdep package), which indicates the strength of the clustering.
Hotspots were categorized according to the Gi value and the p-value of a folded permutation test obtained for each grid cell, as follows:
Grid cells with a p-value > 0.1 were categorized as Insignificant.
The analyses were performed separately for the two macro-areas, using cumulative fishing activity data at 0.5° resolution for each of seven different gears.
For each area, the dataset includes hotspot maps for each gear and spatial layers of the gear hotspots (.shp, .csv).
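The local step can be illustrated with a small NumPy sketch of the local G_i statistic (excluding cell i) and a folded, two-sided conditional permutation test. This is not the sfdep implementation: the binary weights matrix, the number of simulations, and the folding rule (absolute deviation from the permutation mean) are assumptions. Because the source states only the "p > 0.1 means Insignificant" rule, the sketch does not invent the remaining hotspot categories.

```python
import numpy as np

def local_g(x, w):
    """Local Getis-Ord G_i = sum_j w_ij x_j / sum_{j != i} x_j (weights matrix with zero diagonal)."""
    x = np.asarray(x, dtype=float)
    totals = x.sum() - x
    return (w @ x) / totals

def folded_perm_pvalues(x, w, nsim=999, seed=0):
    """For each cell i, shuffle the other cells' values, recompute G_i, and count simulated
    deviations from the permutation mean at least as large as the observed one (folded test)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    g_obs = local_g(x, w)
    p = np.empty(len(x))
    for i in range(len(x)):
        others = np.delete(x, i)
        w_i = np.delete(w[i], i)
        sims = np.array([w_i @ rng.permutation(others) for _ in range(nsim)]) / others.sum()
        dev_obs = abs(g_obs[i] - sims.mean())
        extreme = np.sum(np.abs(sims - sims.mean()) >= dev_obs)
        p[i] = (extreme + 1) / (nsim + 1)
    return g_obs, p

def categorize(p_value):
    """Only the 'p > 0.1 -> Insignificant' rule comes from the source; finer classes are omitted."""
    return "Insignificant" if p_value > 0.1 else "Significant"
```

On a real grid, `w` would be built from the 0.5° cell adjacency and `x` from the cumulative fishing activity per cell, run separately for each gear and macro-area.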
These data represent the spatial distribution of median single-family home sale prices for new and resale homes in 2001.
Land cover is the observed biophysical cover on the earth’s surface including trees, shrubs, grasses, soils, exposed rocks and water bodies, as well as anthropogenic elements such as plantations, crops and built environments. Land cover changes for many reasons, including seasonal weather, severe weather events such as cyclones, floods and fires, and human activities such as mining, agriculture and urbanisation. Remote sensing data recorded over a period of time allows the observation of land cover dynamics. Classifying these responses provides a robust and repeatable way of characterising land cover types. These complement on-ground surveys where available.