55 datasets found

DataSheet1_Optimal parameters of random forest for land cover classification...
figshare.com
frontiersin.figshare.com
zip
Updated Oct 23, 2023
Cite
Jing Sun; Suwit Ongsomwang (2023). DataSheet1_Optimal parameters of random forest for land cover classification with suitable data type and dataset on Google Earth Engine.zip [Dataset]. http://doi.org/10.3389/feart.2023.1188093.s001
Available download formats: zip
Unique identifier
https://doi.org/10.3389/feart.2023.1188093.s001
Dataset updated
Oct 23, 2023
Dataset provided by
Frontiers
Authors
Jing Sun; Suwit Ongsomwang
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
An accurate land cover (LC) map is essential for understanding the development of human societies and studying the impacts of climate and environmental change. To meet this need, optimal Random Forest (RF) parameters for LC classification with a suitable data type and dataset on Google Earth Engine (GEE) were investigated. The research objectives were 1) to examine optimum RF parameters for LC classification at the local scale, 2) to classify LC data and assess accuracy in the model area (Hefei City), 3) to identify a suitable data type and dataset for LC classification, and 4) to validate the optimum RF parameters with a suitable data type and dataset in the test area (Nanjing City). The study suggests that the suitable data type for LC classification was Sentinel-2 data with auxiliary data, and that the suitable dataset was monthly and seasonal medians of Sentinel-2, elevation, and nighttime light data. The appropriate values of the number of trees, the variables per split, and the bag fraction for RF were 800, 22, and 0.9, respectively. The overall accuracy (OA) and Kappa index of LC in the model area (Hefei City) with the suitable dataset were 93.17% and 0.9102; in the test area (Nanjing City) they were 92.38% and 0.8914. Thus, the developed research methodology can be applied to update LC maps where LC changes occur quickly.
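The overall accuracy and Kappa index quoted in this description are both computed from a confusion matrix. The following is a minimal sketch in plain JavaScript of how the two figures relate; the helper name `accuracyMetrics` and the 3-class matrix are invented for illustration and are not the study's data or code.

```javascript
// Hypothetical helper: overall accuracy (OA) and Cohen's kappa from a
// square confusion matrix (rows = reference classes, columns = predictions).
function accuracyMetrics(matrix) {
  var n = matrix.length;
  var total = 0;
  var diagonal = 0;
  var rowSums = new Array(n).fill(0);
  var colSums = new Array(n).fill(0);
  for (var i = 0; i < n; i++) {
    for (var j = 0; j < n; j++) {
      total += matrix[i][j];
      rowSums[i] += matrix[i][j];
      colSums[j] += matrix[i][j];
      if (i === j) diagonal += matrix[i][j];
    }
  }
  // Observed agreement is the share of correctly classified samples.
  var observed = diagonal / total;
  // Chance agreement is the sum over classes of (row share x column share).
  var expected = 0;
  for (var k = 0; k < n; k++) {
    expected += (rowSums[k] * colSums[k]) / (total * total);
  }
  return {
    overallAccuracy: observed,
    kappa: (observed - expected) / (1 - expected)
  };
}

// Illustrative counts only; they do not come from the dataset.
var exampleMatrix = [
  [50, 2, 3],
  [4, 40, 1],
  [1, 2, 47]
];
var metrics = accuracyMetrics(exampleMatrix);
console.log(metrics.overallAccuracy.toFixed(4), metrics.kappa.toFixed(4));
```

Kappa is always lower than OA when some agreement is expected by chance, which is why the description reports both.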
OpenLandMap-soildb: soil type probability - suborder: Psamments
repository.soilwise-he.eu
zenodo.org
Updated Jul 9, 2025
Cite
(2025). OpenLandMap-soildb: soil type probability - suborder: Psamments [Dataset]. http://doi.org/10.5281/zenodo.15481497
Unique identifier
https://doi.org/10.5281/zenodo.15481497
Dataset updated
Jul 9, 2025
Description
Sub-dataset: soil type probability - suborder: Psamments

Description

Global annual maps of soil properties for 2000–2022 produced within the scope of the Land & Carbon Lab, integrating digital surface/terrain models, vegetation/tillage indices, and climatic/bioclimatic variables, and based on tree-based spatiotemporal machine learning. While the primary focus is on improving the monitoring of global soil properties, the dataset provides wall-to-wall coverage across all terrestrial ecosystems and is organized into 300+ global mosaics in COG (Cloud Optimized GeoTIFF) format. Data are presented at 5-year intervals, across 3 standard depth intervals (0–30 cm, 30–60 cm, 60–100 cm), and cover 79 USDA soil taxonomy suborders. Original layers use the WGS84 coordinate system (EPSG:4326) at a pixel resolution of 0.00025 degrees, and 0.00075 degrees with uncertainty (STAC and GEE). Layers archived on Zenodo are at 0.00075 degrees with uncertainty but include only the initial and final periods (2000–2005 and 2020–2022), including:

Soil Organic Carbon Content (g/kg): A key indicator of soil fertility, structure, and microbial activity; it represents the concentration of organic carbon in the fine earth fraction of the soil. The standard method of measurement is dry combustion using elemental analyzers (e.g., ISO 10694).

Soil Organic Carbon Density (kg/m³): Represents the mass of organic carbon per unit volume of soil. It is derived as: SOC content × bulk density × (1 − coarse fragment volume fraction). This value is critical for estimating total carbon stocks and monitoring soil carbon changes over time.

Soil pH: Indicates the acidity or alkalinity of soil, affecting nutrient availability and microbial processes. Reported as pH measured in water solution (pH in H₂O).

Bulk Density (g/cm³): Refers to the mass of dry fine earth (<2 mm) per unit volume, excluding coarse fragments. It reflects soil compaction and porosity, influencing water retention and root penetration. Commonly determined using the core method or calculated from pedotransfer functions.

Soil Texture Fraction: Defines the relative proportions of mineral particles by size. Texture influences water movement, nutrient holding capacity, and plant growth.
Clay content (%): Proportion of particles <0.002 mm in diameter.
Sand content (%): Proportion of particles between 0.05–2.0 mm (some definitions use 0.063 mm as the lower threshold).
Silt content (%): Particles sized between 0.002–0.05 mm, or up to 0.063 mm depending on the classification system.
Textural fractions follow USDA or FAO particle size classifications.

Soil Type Probability: Probabilistic classification of soils based on USDA Soil Taxonomy at the subgroup level. Each pixel is assigned a probability distribution across potential soil types, based on legacy point data and environmental covariates.

30m layers can be accessed through STAC and Google Earth Engine (GEE):
OpenLandMap STAC https://stac.openlandmap.org
Google Earth Engine https://code.earthengine.google.com/?asset=projects/global-pasture-watch/assets/gsm-30m

The full modeling framework is publicly available at OpenLandMap GitHub - soildb.

Data Detail

Time period: 2000-2022, in 5-year intervals (last period covers 2020–2022) for soil properties; 2000-2022 static for soil type
Type of data: Spatiotemporal soil database, with depth ranges and weighted percentage data for soil assessments and static soil type classification.
How the data was collected or derived: The data was derived using machine learning models.
Statistical methods used: Tree-based spatiotemporal machine learning
Depth reference: b30cm..60cm = below ground at 30-60cm interval
Limitations or exclusions in the data: no Antarctica; masking out permanent ice and deserts
Coordinate reference system: EPSG:4326
Bounding box (Xmin, Ymin, Xmax, Ymax): (-180, -56, 180, 76)
Spatial resolution: 0.00075 degree (~120m)
Image size: 360,000P, 132,000L
File format: Cloud Optimized GeoTIFF (COG) format

Dataset Contents

This dataset includes: soil type probability - suborder: Psamments

Related Identifiers

SOC density: below ground 0cm-30cm 2000-2005, 0cm-30cm 2020-2022, 30cm-60cm 2000-2005, 30cm-60cm 2020-2022, 60cm-100cm 2000-2005, 60cm-100cm 2020-2022
SOC content: below ground 0cm-30cm 2000-2005 - part 1, 0cm-30cm 2000-2005 - part 2, 0cm-30cm 2020-2022 - part 1, 0cm-30cm 2020-2022 - part 2, 30cm-60cm 2000-2005, 30cm-60cm 2020-2022, 60cm-100cm 2000-2005, 60cm-100cm 2020-2022
Bulk density: below ground 0cm-30cm 2000-2005, 0cm-30cm 2020-2022, 30cm-60cm 2000-2005, 30cm-60cm 2020-2022, 60cm-100cm 2000-2005, 60cm-100cm 2020-2022
Soil pH in water: below ground 0cm-30cm 2000-2005, 0cm-30cm 2020-2022, 30cm-60cm 2000-2005, 30cm-60cm 2020-2022, 60cm-100cm 2000-2005, 60cm-100cm 2020-2022
Soil texture fraction, clay: below ground 0cm-30cm 2000-2005, 0cm-30cm 2020-2022, 30cm-60cm 2000-2005, 30cm-60cm 2020-2022, 60cm-100cm 2000-2005, 60cm-100cm 2020-2022
Soil texture fraction, sand: below ground 0cm-30cm 2000-2005, 0cm-30cm 2020-2022, 30cm-60cm 2000-2005, 30cm-60cm 2020-2022, 60cm-100cm 2000-2005, 60cm-100cm 2020-2022
Soil texture fraction, silt: below ground 0cm-30cm 2000-2005, 0cm-30cm 2020-2022, 30cm-60cm 2000-2005, 30cm-60cm 2020-2022, 60cm-100cm 2000-2005, 60cm-100cm 2020-2022
Soil type (Suborder): Uderts, Calcids, Xerands, Orthents, Cryands, Ustalfs, Cryalfs, Aquepts, Udalfs, Cryolls, Durids, Usterts, Boralfs, Orthids, Udands, Torrerts, Histels, Rendolls, Aqualfs, Udepts, Xeralfs, Gelepts, Xerults, Fibrists, Ustepts, Xererts, Ustults, Aquands, Perox, Xerolls, Tropepts, Turbels, Udults, Aquents, Aquerts, Ustox, Aquods, Aquolls, Xerepts, Udox, Cryods, Ustolls, Aquults, Psamments, Arents, Fluvents, Humults, Vitrands, Udolls, Borolls, Orthels, Hemists, Wassents, Albolls, Salids, Cryepts, Saprists, Folists, Gypsids, Ochrepts, Cambids, Argids, Orthods

Data Details

Time period: 2000-2022
Type of data: soil type probability - suborder: Psamments
How the data was collected or derived: Machine learning models.
Statistical methods used: Random Forest.
Limitations or exclusions in the data: The dataset does not include Antarctica.
Coordinate reference system: EPSG:4326
Bounding box (Xmin, Ymin, Xmax, Ymax): (-180, -56, 180, 76)
Spatial resolution: 120m
Image size: 360,000P x 132,000L
File format: Cloud Optimized GeoTIFF (COG) format.

Layer information:

File Name | Unit | Scale | Data Type | No Data | Description
oc_iso.10694.1995.mg.cm3 | kg/m³ | 10 | UInt16 | 32767 | Organic carbon density derived by multiplying fine earth bulk density and organic carbon content
oc_iso.10694.1995.wpml | g/kg | 10 | UInt16 | 32767 | Organic carbon content based on dry combustion, weight percent
ph.h2o_iso.10390.2021.index | - | 10 | Byte | 255 | The pH, 1:1 soil-water suspension, is the pH of a sample measured in distilled water at a 1:1 soil:solution ratio
bd.core_iso.11272.2017.g.cm3 | g/cm³ | 100 | UInt16 | 32767 | Bulk density, <2mm fraction, dry, is the weight per unit volume of the <2 mm fraction, with volume measured in the laboratory
sand.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total laboratory-estimated sand, 0.063 to 2.0 mm particle diameter
silt.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total laboratory-estimated silt, 0.002 to 0.063 mm particle size
clay.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total clay is the soil separate with <0.002 mm particle diameter
soil.types_ensemble | % | 1 | Byte | 255 | Probability of soil type occurrence

Support

If you discover a bug, artifact, or inconsistency, or if you have a question, please raise a GitHub issue.

Naming convention

To ensure consistency and ease of use across and within the projects, we follow the standard Ai4SoilHealth and Open-Earth-Monitor file-naming convention. The convention works with 10 fields that describe important properties of the data. In this way users can search files, prepare data analysis, etc., without needing to open files. For example, for oc_iso.10694.1995.wpml_m_30m_b30cm..60cm_20000101_20051231_g_epsg.4326_v20250204.tif, the fields are: generic
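Two pieces of this description lend themselves to a short worked example: the SOC density formula and the underscore-separated file-naming convention. The sketch below is plain JavaScript with hypothetical function names (`socDensity`, `parseLayerName`). The positions assigned to depth, dates, CRS, and version are an assumption inferred from the single example file name quoted in the description, not from the convention's full specification, and the input values are invented.

```javascript
// SOC density (kg/m³) from SOC content (g/kg), fine-earth bulk density
// (g/cm³) and coarse fragment volume fraction (0..1), following the
// formula quoted in the description. Numerically, (g/kg) x (g/cm³)
// equals kg/m³, so no extra unit factor is needed.
function socDensity(socContent, bulkDensity, coarseFraction) {
  return socContent * bulkDensity * (1 - coarseFraction);
}

// Split a layer file name into its underscore-separated fields.
// ASSUMPTION: the field positions below are read off the one example
// file name in the description; other fields are left uninterpreted.
function parseLayerName(fileName) {
  var stem = fileName.replace(/\.tif$/, '');
  var fields = stem.split('_');
  return {
    fields: fields,
    depth: fields[4],
    startDate: fields[5],
    endDate: fields[6],
    crs: fields[8],
    version: fields[9]
  };
}

// Illustrative inputs: 20 g/kg SOC, 1.3 g/cm³ bulk density, 10% coarse fragments.
var density = socDensity(20, 1.3, 0.1);
var parsed = parseLayerName(
  'oc_iso.10694.1995.wpml_m_30m_b30cm..60cm_20000101_20051231_g_epsg.4326_v20250204.tif'
);
console.log(density.toFixed(1), parsed.fields.length, parsed.depth);
```

Splitting on underscores yields exactly 10 tokens for the example name, which matches the "10 fields" stated in the naming convention.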
MCD12Q1.061 MODIS Land Cover Type Yearly Global 500m
developers.google.com
Updated Jan 1, 2024
Cite
NASA LP DAAC at the USGS EROS Center (2024). MCD12Q1.061 MODIS Land Cover Type Yearly Global 500m [Dataset]. http://doi.org/10.5067/MODIS/MCD12Q1.061
Unique identifier
https://doi.org/10.5067/MODIS/MCD12Q1.061
Dataset updated
Jan 1, 2024
Dataset provided by
NASA LP DAAC at the USGS EROS Center
Time period covered
Jan 1, 2001 - Jan 1, 2024
Area covered
Earth
Description
The Terra and Aqua combined Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Type (MCD12Q1) Version 6.1 data product provides global land cover types at yearly intervals. The MCD12Q1 Version 6.1 data product is derived using supervised classifications of MODIS Terra and Aqua reflectance data. Land cover types are derived from the International Geosphere-Biosphere Programme (IGBP), University of Maryland (UMD), Leaf Area Index (LAI), BIOME-Biogeochemical Cycles (BGC), and Plant Functional Types (PFT) classification schemes. The supervised classifications then underwent additional post-processing that incorporates prior knowledge and ancillary information to further refine specific classes. Additional land cover property assessment layers are provided by the Food and Agriculture Organization (FAO) Land Cover Classification System (LCCS) for land cover, land use, and surface hydrology. Layers for Land Cover Type 1-5, Land Cover Property 1-3, Land Cover Property Assessment 1-3, Land Cover Quality Control (QC), and a Land Water Mask are also provided. Documentation: User's Guide; Algorithm Theoretical Basis Document (ATBD); General Documentation
Google Earth Engine code to generate water coverage data, Schaffer-Smith et...
hydroshare.org
beta.hydroshare.org
zip
Updated Sep 18, 2022
Cite
Margaret Swift (2022). Google Earth Engine code to generate water coverage data, Schaffer-Smith et al 2022 [Dataset]. https://www.hydroshare.org/resource/01c98336686a44d8892d57e7e2637ccb
Available download formats: zip (64.6 KB)
Dataset updated
Sep 18, 2022
Dataset provided by
HydroShare
Authors
Margaret Swift
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 1, 2017 - Jul 31, 2020
Description
Surface water in arid regions is essential to many organisms, including large mammals of conservation concern. For many regions little is known about the extent, ecology, and hydrology of ephemeral waters, because they are challenging to map given their ephemeral nature and small sizes. Our goal was to advance surface water knowledge by mapping and monitoring ephemeral water from the wet to dry seasons across the Kavango-Zambezi (KAZA) transfrontier conservation area of southern Africa (300,000 km²). We mapped individual waterholes for six time points each year from mid-2017 to mid-2020, and described their presence, extent, duration, variability, and recurrence. We further analyzed a wide range of physical and landscape aspects of waterhole locations, from soils, geology, and topography to climate and soil moisture. We identified 2.1 million previously unmapped ephemeral waterholes (85-89% accuracy) that seasonally extend across 23.5% of the study area. We confirmed a distinct ‘blue wave’, with ephemeral water across the region peaking at the end of the rainy season. We observed a wide range of waterhole types and sizes, with large variances in seasonal and interannual hydrology. We found that ephemeral surface water spatiotemporal patterns were associated with soil type; loam soils were most likely to hold water for longer periods in the study area. From the wettest time period to the driest, there was a ~44,000 km² (62%) decrease in ephemeral water extent across the region; these dramatic seasonal fluctuations have implications for wildlife movement. A warmer and drier climate, expected human population growth, and associated agricultural expansion and development may threaten these sensitive and highly variable water resources and the wildlife that depend on them.

This contains Google Earth Engine code to generate water coverage data for Schaffer-Smith et al 2022.
GEE-TED: A tsetse ecological distribution model for Google Earth Engine
dataverse.harvard.edu
search.dataone.org
Updated Jul 8, 2024
Cite
Brad Peter; Joseph Messina (2024). GEE-TED: A tsetse ecological distribution model for Google Earth Engine [Dataset]. http://doi.org/10.7910/DVN/6JR87X
Explore at: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/6JR87X
Dataset updated
Jul 8, 2024
Dataset provided by
Harvard Dataverse
Authors
Brad Peter; Joseph Messina
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
GEE-TED: A tsetse ecological distribution model for Google Earth Engine

Please refer to the associated publication: Fox, L., Peter, B.G., Frake, A.N. and Messina, J.P., 2023. A Bayesian maximum entropy model for predicting tsetse ecological distributions. International Journal of Health Geographics, 22(1), p.31. https://link.springer.com/article/10.1186/s12942-023-00349-0

Description

GEE-TED is a Google Earth Engine (GEE; Gorelick et al. 2017) adaptation of a tsetse ecological distribution (TED) model developed by DeVisser et al. (2010), which was designed for use in ESRI's ArcGIS. TED uses time-series climate and land-use/land-cover (LULC) data to predict the probability of tsetse presence across space based on species habitat preferences (in this case Glossina morsitans). Model parameterization includes (1) day and night temperatures (MODIS Land Surface Temperature; MOD11A2), (2) available moisture/humidity using a vegetation index as a proxy (MODIS NDVI; MOD13Q1), (3) LULC (MODIS Land Cover Type 1; MCD12Q1), (4) year selections, and (5) fly movement rate (meters/16-days). TED has also been used as a basis for the development of an agent-based model by Lin et al. (2015) and in a cost-benefit analysis of tsetse control in Tanzania by Yang et al. (2017).

Parameterization in Fox et al. (2023): Suitable LULC types and climate thresholds used here are specific to Glossina morsitans in Kenya and are based on the parameterization selections in DeVisser et al. (2010) and DeVisser and Messina (2009). Suitable temperatures range from 17–40°C during the day and 10–40°C at night, and available moisture is characterized as NDVI > 0.39. Suitable LULC comprises predominantly woody vegetation; a complete list of suitable categories is available in DeVisser and Messina (2009). In the Fox et al. (Forthcoming) publication, two versions of MCD12Q1 were used to assess suitable LULC types: Versions 051 and 006.

The GeoTIFF supplied in this dataset entry (GEE-TED_Kenya_2016-2017.tif) uses the aforementioned parameters to show the probable tsetse distribution across Kenya for the years 2016-2017. A static graphic of this GEE-TED output is shown below and an interactive version can be viewed at: https://cartoscience.users.earthengine.app/view/gee-ted.

Figure associated with Fox et al. (2023)

GEE code

The code supplied below is generalizable across geographies and species; however, it is highly recommended that parameterization be given considerable attention to produce reliable results. Note that on-the-fly output visualization will take some time, and it is recommended that results be exported as an asset within GEE or exported as a GeoTIFF.

Note: Since completing the Fox et al. (2023) manuscript, GEE has removed Version 051 per NASA's deprecation of the product. The current release of GEE-TED now uses only MCD12Q1 Version 006; however, alternative LULC data selections can be used with minimal modification to the code.
// Input options
var tempMin = 10 // Temperature thresholds in degrees Celsius
var tempMax = 40
var ndviMin = 0.39 // NDVI thresholds; proxy for available moisture/humidity
var ndviMax = 1
var movement = 500 // Fly movement rate in meters/16-days
var startYear = 2008 // The first 2 years will be used for model initialization
var endYear = 2019 // Computed probability is based on startYear+2 to endYear
var country = 'KE' // Country codes - https://en.wikipedia.org/wiki/List_of_FIPS_country_codes
var crs = 'EPSG:32737' // See https://epsg.io/ for appropriate country UTM zone
var rescale = 250 // Output spatial resolution
var labelSuffix = '02052020' // For file export labeling only

//[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17] MODIS/006/MCD12Q1
var lulcOptions006 = [1,1,1,1,1,1,1,1,1, 0, 1, 0, 0, 0, 0, 0, 0] // 1 = suitable 0 = unsuitable

// No more input required ------------------------------ //

var region = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017")
  .filterMetadata('country_co', 'equals', country)

// Input parameter modifications
var tempMinMod = (tempMin+273.15)/0.02
var tempMaxMod = (tempMax+273.15)/0.02
var ndviMinMod = ndviMin*10000
var ndviMaxMod = ndviMax*10000
var ndviResolution = 250
var movementRate = movement+(ndviResolution/2)

// Loading image collections
var lst = ee.ImageCollection('MODIS/006/MOD11A2').select('LST_Day_1km', 'LST_Night_1km')
  .filter(ee.Filter.calendarRange(startYear,endYear,'year'))
var ndvi = ee.ImageCollection('MODIS/006/MOD13Q1').select('NDVI')
  .filter(ee.Filter.calendarRange(startYear,endYear,'year'))
var lulc006 = ee.ImageCollection('MODIS/006/MCD12Q1').select('LC_Type1')

// Lulc mode and boolean reclassification
var lulcMask = lulc006.mode().remap([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],lulcOptions006)
  .eq(1).rename('remapped').clip(region)

// Merge NDVI and LST image collections
var combined = ndvi.combine(lst, true)
var combinedList = combined.toList(10000)

// Boolean reclassifications (suitable/unsuitable) for day/night temperatures and ndvi
var con =...
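The "input parameter modifications" in the script convert human-readable thresholds into the raw units stored in the MODIS products: MOD11A2 land surface temperature is stored as Kelvin divided by a 0.02 scale factor, and MOD13Q1 NDVI is stored multiplied by 10000. Because the published script is truncated at `var con =...`, the sketch below does not reproduce it; it is a plain-JavaScript illustration of the same conversions, with hypothetical helper names and an assumed per-pixel suitability check that applies the script's single temperature range to both day and night values.

```javascript
// Celsius threshold -> raw MOD11A2 LST units (Kelvin / 0.02),
// the same arithmetic as tempMinMod and tempMaxMod in the script.
function celsiusToLstUnits(tempC) {
  return (tempC + 273.15) / 0.02;
}

// NDVI threshold -> raw MOD13Q1 units (NDVI x 10000),
// the same arithmetic as ndviMinMod and ndviMaxMod in the script.
function ndviToRawUnits(ndvi) {
  return ndvi * 10000;
}

// Thresholds taken from the script's input options.
var limits = {
  tempMinRaw: celsiusToLstUnits(10),
  tempMaxRaw: celsiusToLstUnits(40),
  ndviMinRaw: ndviToRawUnits(0.39),
  ndviMaxRaw: ndviToRawUnits(1)
};

// ASSUMPTION: an illustrative boolean check, not the truncated original.
// A pixel is suitable when day LST, night LST and NDVI all fall in range.
function isSuitable(pixel, lim) {
  var dayOk = pixel.lstDay >= lim.tempMinRaw && pixel.lstDay <= lim.tempMaxRaw;
  var nightOk = pixel.lstNight >= lim.tempMinRaw && pixel.lstNight <= lim.tempMaxRaw;
  var ndviOk = pixel.ndvi >= lim.ndviMinRaw && pixel.ndvi <= lim.ndviMaxRaw;
  return dayOk && nightOk && ndviOk;
}

// Invented pixel values: 25°C by day, 15°C at night, NDVI 0.5 and 0.3.
var greenPixel = { lstDay: celsiusToLstUnits(25), lstNight: celsiusToLstUnits(15), ndvi: 5000 };
var dryPixel = { lstDay: celsiusToLstUnits(25), lstNight: celsiusToLstUnits(15), ndvi: 3000 };
console.log(isSuitable(greenPixel, limits), isSuitable(dryPixel, limits));
```

Converting the thresholds once, rather than rescaling every image, keeps the comparison in the products' native integer units.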
USFS TreeMap v2016 (Conterminous United States)
developers.google.com
Updated Jan 1, 2016
Cite
USDA Forest Service (USFS) Geospatial Technology and Applications Center (GTAC) (2016). USFS TreeMap v2016 (Conterminous United States) [Dataset]. https://developers.google.com/earth-engine/datasets/catalog/USFS_GTAC_TreeMap_v2016
Dataset updated
Jan 1, 2016
Dataset provided by
U.S. Department of Agriculture Forest Service http://fs.fed.us/
Time period covered
Jan 1, 2016 - Jan 1, 2017
Description
This product is part of the TreeMap data suite. It provides detailed spatial information on forest characteristics including number of live and dead trees, biomass, and carbon across the entire forested extent of the continental United States in 2016. TreeMap v2016 contains one image, a 22-band 30 x 30m resolution …
Data from: Monitoring the storage volume of water reservoirs using Google...
hydroshare.org
beta.hydroshare.org
zip
Updated Sep 24, 2021
Cite
Joaquim Condeça; João Nascimento; Nuno Barreiras (2021). Monitoring the storage volume of water reservoirs using Google Earth Engine [Dataset]. http://doi.org/10.4211/hs.44849607416745c98fc0946672128100
Available download formats: zip (435.1 KB)
Unique identifier
https://doi.org/10.4211/hs.44849607416745c98fc0946672128100
Dataset updated
Sep 24, 2021
Dataset provided by
HydroShare
Authors
Joaquim Condeça; João Nascimento; Nuno Barreiras
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 1, 1984 - Dec 31, 2019
Description
Recently, satellite images have been used in remote sensing, allowing observations with high temporal and spatial distribution. The use of water indices has proved to be an effective methodology for monitoring surface water resources. However, precise or automatic methodologies that use satellite imagery to determine reservoir volumes are lacking. To fill that gap, this methodology proposes 3 stages: (1) use Google Earth Engine (GEE) to select images; (2) automatically calculate flooded surface areas by applying water indices; (3) determine the volume stored in reservoirs over those years based on the relation between the flooded area and the stored volume. The method was applied to four reservoirs and uses Landsat 4 and 5 ETM and Landsat 8 OLI imagery. For the calculation of the flooded area, the NDWI indices (McFeeters, 1996; Gao, 1996) and the MNDWI index (Xu, 2006) were applied and tested. The stored volume of water was estimated from the area indices, and the calculated volume was cross-checked against the real stored volume. Finally, the selection of the best-fitting water indices was analyzed. The results of all case studies showed quantifiable proficiency and reliability under quite varied natural conditions. In conclusion, this methodology could serve as a tool for water resources management, in developing countries and elsewhere, to automatically measure trends in stored volumes and their relation to precipitation, and could eventually be extended to other types of surface water bodies, such as lakes and coastal lagoons.
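The three water indices named in this description are all normalized differences of two spectral bands. The sketch below, in plain JavaScript, shows their standard definitions and one way a flooded area could be derived from them. The function names, the threshold of 0, and the 900 m² pixel area (a 30 m Landsat pixel) are assumptions for illustration, not the authors' published settings, and the reflectance values are invented.

```javascript
// Generic normalized difference (a - b) / (a + b); NaN when both are zero.
function normalizedDifference(a, b) {
  var sum = a + b;
  return sum === 0 ? NaN : (a - b) / sum;
}

// NDWI (McFeeters, 1996): green vs. near-infrared reflectance.
function ndwiMcFeeters(green, nir) {
  return normalizedDifference(green, nir);
}

// NDWI (Gao, 1996): near-infrared vs. shortwave-infrared reflectance.
function ndwiGao(nir, swir) {
  return normalizedDifference(nir, swir);
}

// MNDWI (Xu, 2006): green vs. shortwave-infrared reflectance.
function mndwi(green, swir) {
  return normalizedDifference(green, swir);
}

// ASSUMPTION: pixels whose index exceeds the threshold count as water;
// the flooded area is the water pixel count times the pixel area.
function floodedAreaSqM(indexValues, threshold, pixelAreaSqM) {
  var waterPixels = indexValues.filter(function (v) { return v > threshold; }).length;
  return waterPixels * pixelAreaSqM;
}

// Invented reflectances for one water-like pixel, then a tiny index raster.
var sampleNdwi = ndwiMcFeeters(0.3, 0.1);
var area = floodedAreaSqM([0.5, -0.2, 0.1], 0, 900);
console.log(sampleNdwi.toFixed(2), area);
```

The resulting area series is what the description's third stage relates to stored volume; that area-volume relation is reservoir-specific and is not sketched here.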
MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory
repository.soilwise-he.eu
dataverse.harvard.edu
Updated Apr 18, 2025
Cite
(2025). MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory [Dataset]. http://doi.org/10.7910/DVN/M4ZGXP
Unique identifier
https://doi.org/10.7910/DVN/M4ZGXP
Dataset updated
Apr 18, 2025
Description
MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory

--------------------------------------------------------------------------------------
MSZSI is a data extraction tool for Google Earth Engine that aggregates time-series remote sensing information to multiple administrative levels using the FAO GAUL data layers. The code at the bottom of this page (metadata) can be pasted into the Google Earth Engine JavaScript code editor and run at https://code.earthengine.google.com/.

Please refer to the associated publication:
Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624.
https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

Input options:
[1] Country of interest
[2] Start and end year
[3] Start and end month
[4] Option to mask data to a specific land-use/land-cover type
[5] Land-use/land-cover type code from CGLS LULC
[6] Image collection for data aggregation
[7] Desired band from the image collection
[8] Statistics type for the zonal aggregations
[9] Statistic to use for annual aggregation
[10] Scaling options
[11] Export folder and label suffix

Output: Two CSVs containing zonal statistics for each of the FAO GAUL administrative level boundaries
Output fields: system:index, 0-ADM0_CODE, 0-ADM0_NAME, 0-ADM1_CODE, 0-ADM1_NAME, 0-ADMN_CODE, 0-ADMN_NAME, 1-AREA_PERCENT_LULC, 1-AREA_SQM_LULC, 1-AREA_SQM_ZONE, 2-X_2001, 2-X_2002, 2-X_2003, ..., 2-X_2020, .geo
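To make the masking and zonal-aggregation options above concrete, here is a minimal sketch in plain JavaScript. It is not the MSZSI code: the function name `zonalMeans`, the pixel record layout, and the sample values are all hypothetical, and it covers only the "mean" statistic with an optional land-use/land-cover mask.

```javascript
// Hypothetical zonal mean: each pixel record carries the administrative
// zone it falls in, its value, and whether it belongs to the target
// land-use/land-cover class (corresponding to input option [4]).
function zonalMeans(pixels, maskToLulc) {
  var sums = {};
  var counts = {};
  pixels.forEach(function (p) {
    // Skip pixels outside the target LULC class when masking is on.
    if (maskToLulc && !p.isTargetLulc) return;
    sums[p.zone] = (sums[p.zone] || 0) + p.value;
    counts[p.zone] = (counts[p.zone] || 0) + 1;
  });
  var means = {};
  Object.keys(sums).forEach(function (zone) {
    means[zone] = sums[zone] / counts[zone];
  });
  return means;
}

// Invented NDVI-like values for two zones.
var samplePixels = [
  { zone: 'A', value: 0.2, isTargetLulc: true },
  { zone: 'A', value: 0.4, isTargetLulc: true },
  { zone: 'A', value: 0.9, isTargetLulc: false },
  { zone: 'B', value: 0.6, isTargetLulc: true }
];
var maskedMeans = zonalMeans(samplePixels, true);
var unmaskedMeans = zonalMeans(samplePixels, false);
console.log(maskedMeans, unmaskedMeans);
```

Running the aggregation with and without the mask shows how strongly the zone mean can depend on the masking choice, which is the kind of sensitivity the associated publication discusses.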

PREPROCESSED DATA DOWNLOAD

The datasets available for download contain zonal statistics at 2 administrative levels (FAO GAUL levels 1 and 2). Select countries from Southeast Asia and Sub-Saharan Africa (Cambodia, Indonesia, Lao PDR, Myanmar, Philippines, Thailand, Vietnam, Burundi, Kenya, Malawi, Mozambique, Rwanda, Tanzania, Uganda, Zambia, Zimbabwe) are included in the current version, with plans to extend the dataset to contain global metrics. Each zip file is described below and two example NDVI tables are available for preview.

Key: [source, data, units, temporal range, aggregation, masking, zonal statistic, notes]

Currently available:
MSZSI-V2_V-NDVI-MEAN.tar: [NASA-MODIS, NDVI, index, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_T-LST-DAY-MEAN.tar: [NASA-MODIS, LST Day, °C, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_T-LST-NIGHT-MEAN.tar: [NASA-MODIS, LST Night, °C, 2001–2020, annual mean, agriculture, mean, n/a]
MSZSI-V2_R-PRECIP-SUM.tar: [UCSB-CHG-CHIRPS, Precipitation, mm, 2001–2020, annual sum, agriculture, mean, n/a]
MSZSI-V2_S-BDENS-MEAN.tar: [OpenLandMap, Bulk density, g/cm3, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-ORGC-MEAN.tar: [OpenLandMap, Organic carbon, g/kg, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-PH-MEAN.tar: [OpenLandMap, pH in H2O, pH, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-WATER-MEAN.tar: [OpenLandMap, Soil water, % at 33kPa, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-SAND-MEAN.tar: [OpenLandMap, Sand, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-SILT-MEAN.tar: [OpenLandMap, Silt, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_S-CLAY-MEAN.tar: [OpenLandMap, Clay, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
MSZSI-V2_E-ELEV-MEAN.tar: [MERIT, [elevation, slope, flowacc, HAND], [m, degrees, km², m], static, n/a, agriculture, mean, n/a]

Coming soon
MSZSI-V2_C-STAX-MEAN.tar: [OpenLandMap, Soil taxonomy, category, static, n/a, agriculture, area sum, n/a]
MSZSI-V2_C-LULC-MEAN.tar: [CGLS-LC100-V3, LULC, category, 2015–2019, mode, none, area sum, n/a]

Data sources:
https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD13Q1
https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD11A2
https://developers.google.com/earth-engine/datasets/catalog/UCSB-CHG_CHIRPS_PENTAD
https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_BULKDENS-FINEEARTH_USDA-4A1H_M_v02
https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_ORGANIC-CARBON_USDA-6A1C_M_v02
https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_PH-H2O_USDA-4C1A2A_M_v02
https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_WATERCONTENT-33KPA_USDA-4B1C_M_v01
https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_CLAY-WFRACTION_USDA-3A1A1A_M_v02
https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_SAND-WFRACTION_USDA-3A1A1A_M_v02
https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_GRTGROUP_USDA-SOILTAX_C_v01
https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_Landcover_100m_Proba-V-C3_Global
https://developers.google.com/earth-engine/datasets/catalog/MERIT_Hydro_v1_0_1
https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level0
https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level1
https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level2

Project information:

https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental

https://cartoscience.users.earthengine.app/view/maup-mapper-multi-scale-modis-ndvi

Google Earth Engine code

/*///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
MSZSI: Multi-Scale Zonal Statistics Inventory

Authors:
Brad G. Peter, Department of Geography, University of Alabama
Joseph Messina, Department of Geography, University of Alabama
Austin Raney, Department of Geography, University of Alabama
Rodrigo E. Principe, AgriCircle AG
Peilei Fan, Department of Geography, Environment, and Spatial Sciences, Michigan State University

Citation:
Peter, Brad; Messina, Joseph; Raney, Austin; Principe, Rodrigo; Fan, Peilei, 2021, 'MSZSI: Multi-Scale Zonal Statistics Inventory', https://doi.org/10.7910/DVN/YCUBXS, Harvard Dataverse, V#

SEAGUL: Southeast Asia Globalization, Urbanization, Land and Environment Changes
http://seagul.info/
https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental

This project was made possible by the NASA Land-Cover/Land-Use Change Program (Grant #: 80NSSC20K0740)

Image identifiers used in this study (Google Earth Engine)

figshare.com

csv

Updated Aug 21, 2025

Cite

MPANDA MUKENZA Médard (2025). Image identifiers used in this study (Google Earth Engine) [Dataset]. http://doi.org/10.6084/m9.figshare.29955524.v1

Available download formats

csv

Unique identifier

https://doi.org/10.6084/m9.figshare.29955524.v1

Dataset updated

Aug 21, 2025

Dataset provided by

figshare
Figshare (http://figshare.com/)

Authors

MPANDA MUKENZA Médard

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

This file contains the identifiers of the 1,250 Sentinel-2 images used in the analysis presented in [Mapping forest types of semi-arid regions of DR Congo, the need to account for leaf phenology]

PNW mountains NAIP 1m resolution

kaggle.com

zip

Updated Jan 31, 2022

Cite

Noahbadoa (2022). PNW mountains NAIP 1m resolution [Dataset]. https://www.kaggle.com/datasets/noahbadoa/pnw-mountains-naip-1m-resolution

Available download formats

zip (79307378676 bytes)

Dataset updated

Jan 31, 2022

Authors

Noahbadoa

Area covered

Pacific Northwest

Description

Content

This dataset contains 30k RGB images, each 1024 by 1024 pixels. They are 1 m resolution satellite images from two patches of Pacific Northwest mountain ranges.

Acknowledgements

Data were collected by the USDA; I obtained them through Google Earth Engine.

Data from: Evaluation of machine learning methods and multi-source remote...

tandf.figshare.com

doc

Updated Feb 20, 2024

Cite

Xingguang Yan; Jing Li; Andrew R. Smith; Di Yang; Tianyue Ma; YiTing Su; Jiahao Shao (2024). Evaluation of machine learning methods and multi-source remote sensing data combinations to construct forest above-ground biomass models [Dataset]. http://doi.org/10.6084/m9.figshare.24481669.v1

Available download formats

doc

Unique identifier

https://doi.org/10.6084/m9.figshare.24481669.v1

Dataset updated

Feb 20, 2024

Dataset provided by

Taylor & Francis (https://taylorandfrancis.com/)

Authors

Xingguang Yan; Jing Li; Andrew R. Smith; Di Yang; Tianyue Ma; YiTing Su; Jiahao Shao

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Rapid and accurate estimation of forest biomass is essential to drive sustainable management of forests. Field-based measurements of forest above-ground biomass (AGB) can be costly and difficult to conduct. Multi-source remote sensing data offer the potential to improve the accuracy of modelled AGB predictions. Here, four machine learning methods, Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Classification and Regression Trees (CART), and Minimum Distance (MD), were used to construct forest AGB models of Taiyue Mountain forest, Shanxi Province, China, using single and multi-sourced remote sensing data and the Google Earth Engine platform. Results showed that the combination that most accurately predicted AGB was GBDT with the spectral index for coniferous (R² = 0.99; RMSE = 65.52 Mg/ha), broadleaved (R² = 0.97; RMSE = 29.14 Mg/ha), and mixed-species (R² = 0.97; RMSE = 81.12 Mg/ha) forest types. Models constructed using bivariate variable combinations that included the spectral index improved the AGB estimation accuracy of mixed-species (R² = 0.99; RMSE = 59.52 Mg/ha) forest types and slightly reduced the accuracy of coniferous (R² = 0.99; RMSE = 101.46 Mg/ha) and broadleaved (R² = 0.97; RMSE = 37.59 Mg/ha) forest AGB estimation. Overall, parameterizing machine learning algorithms with multi-source remote sensing variables can improve the prediction accuracy of mixed-species forests.
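
The description above reports model quality as R² and RMSE. As a minimal sketch of how such scores are commonly computed from observed and predicted AGB values, the following uses only the Python standard library. The function names are hypothetical, and R² is taken here as the coefficient of determination (1 minus residual over total sum of squares); the study may have used a different definition.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (e.g. Mg/ha)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Illustrative values only; these are not data from the study.
field_agb = [120.0, 80.0, 150.0, 95.0]
model_agb = [110.0, 85.0, 160.0, 90.0]
print(rmse(field_agb, model_agb), r_squared(field_agb, model_agb))
```
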

USDA NASS Cropland Data Layers

developers.google.com

Updated Jan 1, 2024

Cite

USDA National Agricultural Statistics Service (2024). USDA NASS Cropland Data Layers [Dataset]. https://developers.google.com/earth-engine/datasets/catalog/USDA_NASS_CDL

Dataset updated

Jan 1, 2024

Dataset provided by

National Agricultural Statistics Service (http://www.nass.usda.gov/)
United States Department of Agriculture (http://usda.gov/)

Time period covered

Jan 1, 1997 - Jan 1, 2024

Area covered

Description

The Cropland Data Layer (CDL) is a crop-specific land cover data layer created annually for the continental United States using moderate resolution satellite imagery and extensive agricultural ground truth. The CDL is created by the USDA, National Agricultural Statistics Service (NASS), Research and Development Division, Geospatial Information Branch, Spatial Analysis Research Section. For detailed FAQ please visit CropScape and Cropland Data Layers - FAQs. To explore details about the classification accuracies and utility of the data, see state-level omission and commission errors by crop type and year.

The asset date is aligned with the calendar year of harvest. For most crops the planted and harvest year are the same. Some exceptions: winter wheat is unique, as it is planted in the prior year. A hay crop like alfalfa could have been planted years prior. For winter wheat the data also have a class called "Double Crop Winter Wheat/Soybeans". Some mid-latitude areas of the US have conditions such that a second crop (usually soybeans) can be planted immediately after the harvest of winter wheat and itself still be harvested within the same year. So for mapping winter wheat areas, use both classes (values 24 and 26). While the CDL date is aligned with year of harvest, the map itself is more representative of what was planted. In other words, a small percentage of fields in a given year will not be harvested.

Some non-agricultural categories are duplicated due to two very different epochs in methodology. The non-ag codes 63-65 and 81-88 are holdovers from the older methodology and will only appear in CDLs from 2007 and earlier. The non-ag codes from 111-195 are from the current methodology, which uses the USGS NLCD as non-ag training, and will only appear in CDLs 2007 and newer. 2007 was a transition year, so there may be both sets of categories in the 2007 national product, but they will not appear within the same state.

Note: The 2024 CDL only has the data band. The cultivated and confidence bands are yet to be released by the provider.
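
The class-code rules in this description (winter wheat as values 24 and 26, and the two non-agricultural code ranges) can be sketched in plain Python. This is a minimal illustration on a flat list of pixel values, not Earth Engine code; the function names and the "legacy"/"current" labels are hypothetical.

```python
# Class values named in the description: winter wheat and
# "Double Crop Winter Wheat/Soybeans".
WINTER_WHEAT_CODES = {24, 26}


def is_winter_wheat(code):
    """True when a CDL class value counts toward winter wheat area."""
    return code in WINTER_WHEAT_CODES


def non_ag_methodology(code):
    """Return which methodology epoch a non-ag code belongs to.

    "legacy"  -> codes 63-65 and 81-88 (CDLs from 2007 and earlier)
    "current" -> codes 111-195 (CDLs from 2007 onward)
    None      -> the code is outside both non-ag ranges
    """
    if 63 <= code <= 65 or 81 <= code <= 88:
        return "legacy"
    if 111 <= code <= 195:
        return "current"
    return None


def winter_wheat_fraction(pixels):
    """Share of pixels in a sequence of class values that are winter wheat."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    return sum(1 for p in pixels if is_winter_wheat(p)) / len(pixels)


# Illustrative pixel values only.
print(winter_wheat_fraction([24, 26, 1, 5]))
```
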

Global map of forest types 2020

developers.google.com

Updated Dec 31, 2020

Cite

Joint Research Centre, European Commission (2020). Global map of forest types 2020 [Dataset]. https://developers.google.com/earth-engine/datasets/catalog/JRC_GFC2020_subtypes_V0

Dataset updated

Dec 31, 2020

Dataset provided by

Joint Research Centre, European Commission

Time period covered

Jan 1, 2020 - Dec 31, 2020

Area covered

Earth

Description

The global map of forest types provides a spatially explicit representation of primary forest, naturally regenerating forest and planted forest (including plantation forest) for the year 2020 at 10m spatial resolution. The base layer for mapping these forest types is the extent of forest cover of version 1 of the …

OpenLandMap-soildb: soil type probability - suborder: Borolls

repository.soilwise-he.eu
data.europa.eu

Updated Jul 9, 2025

Cite

(2025). OpenLandMap-soildb: soil type probability - suborder: Borolls [Dataset]. http://doi.org/10.5281/zenodo.15481509

Unique identifier

https://doi.org/10.5281/zenodo.15481509

Dataset updated

Jul 9, 2025

Description

Sub-dataset: soil type probability - suborder: Boralfs

Description

Global annual maps of soil properties for 2000–2022 produced within the scope of the Land & Carbon Lab, integrating Digital surface/terrain model, vegetation/tillage indices, climatic/bioclimatic variables, and based on tree-based spatiotemporal Machine Learning. While the primary focus is on improving monitoring in global soil properties, the dataset provides wall-to-wall coverage across all terrestrial ecosystems and is organized into 300+ global mosaics in COG (Cloud Optimized GeoTIFF) format. Data are presented at 5-year intervals, across 3 standard depth intervals (0–30 cm, 30–60 cm, 60–100 cm), and cover 79 USDA soil taxonomy suborders. Original layers use the WGS84 Coordinate System (EPSG:4326) at a pixel resolution of 0.00025 degrees, and 0.00075 degrees with uncertainty (STAC and GEE). Layers archived on Zenodo are at 0.00075 degrees with uncertainty but include only the initial and final periods (2000–2005 & 2020–2022), including:

Soil Organic Carbon Content (g/kg): As a key indicator of soil fertility, structure, and microbial activity, it represents the concentration of organic carbon in the fine earth fraction of the soil. Standard method of measurement is dry combustion using elemental analyzers (e.g., ISO 10694).

Soil Organic Carbon Density (kg/m³): Represents the mass of organic carbon per unit volume of soil. It is derived as: SOC content × bulk density × (1 − coarse fragment volume fraction). This value is critical for estimating total carbon stocks and monitoring soil carbon changes over time.

Soil pH: Indicates the acidity or alkalinity of soil, affecting nutrient availability and microbial processes. Reported as pH measured in water solution (pH in H₂O).

Bulk Density (g/cm³): Refers to the mass of dry fine earth (<2 mm) per unit volume, excluding coarse fragments. It reflects soil compaction and porosity, influencing water retention and root penetration. Commonly determined using the core method or calculated from pedotransfer functions.

Soil Texture Fraction: Defines the relative proportions of mineral particles by size. Texture influences water movement, nutrient holding capacity, and plant growth.
Clay content (%): Proportion of particles <0.002 mm in diameter.
Sand content (%): Proportion of particles between 0.05–2.0 mm (some definitions use 0.063 mm as lower threshold).
Silt content (%): Particles sized between 0.002–0.05 mm or up to 0.063 mm depending on classification system.
Textural fractions follow USDA or FAO particle size classifications.

Soil Type Probability: Probabilistic classification of soils based on USDA Soil Taxonomy at the subgroup level. Each pixel is assigned a probability distribution across potential soil types, based on legacy point data and environmental covariates.

30m layers can be accessed through STAC and Google Earth Engine (GEE) through:
OpenLandMap STAC https://stac.openlandmap.org
Google Earth Engine https://code.earthengine.google.com/?asset=projects/global-pasture-watch/assets/gsm-30m

All modeling framework is publicly available at OpenLandMap GitHub - soildb

Data Detail

Time period: 2000-2022, in 5-year intervals (last period covers 2020–2022) for soil properties; 2000-2022 static for soil type
Type of data: Spatiotemporal soil data base, with depth ranges and weighted percentage data for soil assessments and static soil type classification.
How the data was collected or derived: The data was derived using machine learning models.
Statistical methods used: Tree-based spatiotemporal machine learning
Depth reference: b30cm..60cm = below ground at 30-60cm interval
Limitations or exclusions in the data: no Antarctica; masking out permanent ice and deserts
Coordinate reference system: EPSG:4326
Bounding box (Xmin, Ymin, Xmax, Ymax): (-180, -56, 180, 76)
Spatial resolution: 0.00075 degree (~120m)
Image size: 360,000P, 132,000L
File format: Cloud Optimized Geotiff (COG) format

Dataset Contents

This dataset includes: soil type probability - suborder: Boralfs

Related Identifiers

SOC density: below ground 0cm-30cm 2000-2005, below ground 0cm-30cm 2020-2022, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022

SOC content: below ground 0cm-30cm 2000-2005 - part 1, below ground 0cm-30cm 2000-2005 - part 2, below ground 0cm-30cm 2020-2022 - part 1, below ground 0cm-30cm 2020-2022 - part 2, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022

Bulk density: below ground 0cm-30cm 2000-2005, below ground 0cm-30cm 2020-2022, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022

Soil pH of water: below ground 0cm-30cm 2000-2005, below ground 0cm-30cm 2020-2022, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022

Soil textures fraction: clay below ground 0cm-30cm 2000-2005, clay below ground 0cm-30cm 2020-2022, clay below ground 30cm-60cm 2000-2005, clay below ground 30cm-60cm 2020-2022, clay below ground 60cm-100cm 2000-2005, clay below ground 60cm-100cm 2020-2022, sand below ground 0cm-30cm 2000-2005, sand below ground 0cm-30cm 2020-2022, sand below ground 30cm-60cm 2000-2005, sand below ground 30cm-60cm 2020-2022, sand below ground 60cm-100cm 2000-2005, sand below ground 60cm-100cm 2020-2022, silt below ground 0cm-30cm 2000-2005, silt below ground 0cm-30cm 2020-2022, silt below ground 30cm-60cm 2000-2005, silt below ground 30cm-60cm 2020-2022, silt below ground 60cm-100cm 2000-2005, silt below ground 60cm-100cm 2020-2022

Soil type (Suborder): Uderts, Calcids, Xerands, Orthents, Cryands, Ustalfs, Cryalfs, Aquepts, Udalfs, Cryolls, Durids, Usterts, Boralfs, Orthids, Udands, Torrerts, Histels, Rendolls, Aqualfs, Udepts, Xeralfs, Gelepts, Xerults, Fibrists, Ustepts, Xererts, Ustults, Aquands, Perox, Xerolls, Tropepts, Turbels, Udults, Aquents, Aquerts, Ustox, Aquods, Aquolls, Xerepts, Udox, Cryods, Ustolls, Aquults, Psamments, Arents, Fluvents, Humults, Vitrands, Udolls, Borolls, Orthels, Hemists, Wassents, Albolls, Salids, Cryepts, Saprists, Folists, Gypsids, Ochrepts, Cambids, Argids, Orthods

Data Details

Time period: 2000-2022
Type of data: soil type probability - suborder: Boralfs
How the data was collected or derived: Machine learning models.
Statistical Methods used: Random Forest.
Limitations or exclusions in the data: The dataset does not include Antarctica.
Coordinate reference system: EPSG:4326
Bounding box (Xmin, Ymin, Xmax, Ymax): (-180, -56, 180, 76)
Spatial resolution: 120m
Image size: 360,000P x 132,000L
File format: Cloud Optimized Geotiff (COG) format.

Layer information:

File Name | Unit | Scale | Data Type | No Data | Description
oc_iso.10694.1995.mg.cm3 | kg/m³ | 10 | UInt16 | 32767 | Organic carbon density derived by multiplying fine earth bulk density and organic carbon content
oc_iso.10694.1995.wpml | g/kg | 10 | UInt16 | 32767 | Organic carbon content based on dry combustion weight percent
ph.h2o_iso.10390.2021.index | - | 10 | Byte | 255 | The pH, 1:1 soil-water suspension is the pH of a sample measured in distilled water at a 1:1 soil:solution ratio
bd.core_iso.11272.2017.g.cm3 | g/cm³ | 100 | UInt16 | 32767 | Bulk density, <2mm fraction, dry is the weight per unit volume of the <2 mm fraction, with volume measured in laboratory
sand.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total laboratory-estimated sand 0.063 to 2.0 mm particle diameter
silt.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total laboratory-estimated silt 0.002 to 0.063 mm particle size
clay.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total clay is the soil separate with <0.002 mm particle diameter
soil.types_ensemble | % | 1 | Byte | 255 | Probability of soil type occurrence

Support

If you discover a bug, artifact, or inconsistency, or if you have a question please raise a GitHub issue here

Naming convention

To ensure consistency and ease of use across and within the projects, we follow the standard Ai4SoilHealth and Open-Earth-Monitor file-naming convention. The convention works with 10 fields that describe important properties of the data. In this way users can search files, prepare data analysis etc, without needing to open files. For example, for oc_iso.10694.1995.wpml_m_30m_b30cm..60cm_20000101_20051231_g_epsg.4326_v20250204.tif, the fields are: generic variable
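
Two pieces of the description lend themselves to a short worked sketch: the SOC density formula (SOC content × bulk density × (1 − coarse fragment volume fraction)) and the 10-field file-naming convention. The Python below is an illustration under stated assumptions, not the project's code: the function names are hypothetical, the file name is split on underscores (which happens to yield 10 parts for the example given), and only the first field ("generic variable") and the depth token are interpreted, since the rest of the field list is cut off in this listing.

```python
def soc_density_kg_m3(soc_g_per_kg, bulk_density_g_cm3, coarse_fraction):
    """SOC density from the formula quoted in the description.

    g/kg multiplied by g/cm3 is numerically kg/m3, because
    1 g/cm3 = 1000 kg/m3 and 1 kg = 1000 g.
    """
    if not 0.0 <= coarse_fraction <= 1.0:
        raise ValueError("coarse_fraction must be between 0 and 1")
    return soc_g_per_kg * bulk_density_g_cm3 * (1.0 - coarse_fraction)


def split_layer_filename(name):
    """Split a layer file name into its underscore-separated fields.

    Assumption: fields are separated by "_" and the extension is ".tif".
    """
    stem = name[:-4] if name.endswith(".tif") else name
    return stem.split("_")


example = (
    "oc_iso.10694.1995.wpml_m_30m_b30cm..60cm_"
    "20000101_20051231_g_epsg.4326_v20250204.tif"
)
fields = split_layer_filename(example)

print(len(fields))   # number of underscore-separated fields
print(fields[0])     # first field, the "generic variable"
print(fields[4])     # depth token; the listing explains b30cm..60cm
print(soc_density_kg_m3(20.0, 1.3, 0.1))  # illustrative inputs only
```

The remaining fields are left unnamed on purpose; consult the naming-convention documentation before relying on their positions.
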

Data from: Annual (1986-2020) land-use/land cover maps of the Santa Cruz...

catalog.data.gov
data.usgs.gov

Updated Nov 26, 2025

Cite

U.S. Geological Survey (2025). Annual (1986-2020) land-use/land cover maps of the Santa Cruz Watershed and Tucson metropolitan area, Arizona [Dataset]. https://catalog.data.gov/dataset/annual-1986-2020-land-use-land-cover-maps-of-the-santa-cruz-watershed-and-tucson-metropoli

Dataset updated

Nov 26, 2025

Dataset provided by

United States Geological Survey (http://www.usgs.gov/)

Area covered

Tucson Metropolitan Area, Arizona

Description

Annual (1986-2020) land-use/land cover maps at 30-meter resolution of the Tucson metropolitan area, Arizona and the greater Santa Cruz Watershed including Nogales, Sonora, Mexico. Maps were created using a combination of Landsat imagery, derived transformations and indices, texture analysis and other ancillary data fed to a Random Forest classifier in Google Earth Engine. The maps contain 13 classes based on the National Land Cover Classification scheme and modified to reflect local land cover types. Data are presented as a stacked, multi-band raster with one "band" for each year (Band 1 = 1986, Band 2 = 1987 and so on). Note that the year 2012 was left out of our time series because of a lack of quality Landsat data. A color file (.clr) is included that can be imported to match the color of the National Land Cover Classification scheme. This data release also contains two JavaScript files with the Google Earth Engine code developed for pre-processing Landsat imagery and for image classification, and a zip folder "Accuracy Data" with five Excel files: 1) Accuracy Statistics describing overall accuracy for each LULC year, 2) Confusion Matrices for each LULC year, 3) Land Cover Evolution - changes in pixel count for each class per year, 4) LULC Change Matrix - to and from class changes over the period, and 5) Variable Importance - results of the Random Forest Classification.
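
Because 2012 is missing from the series, band numbers and calendar years drift apart after 2011. The sketch below shows one way to map between them. It assumes the bands stay contiguous, so that the band after 2011 holds 2013; the description does not state this explicitly, so check the raster's band metadata before relying on it. All names are hypothetical.

```python
FIRST_YEAR, LAST_YEAR = 1986, 2020
SKIPPED_YEARS = {2012}  # left out of the time series per the description

# Assumption: bands are contiguous, so later years shift down by one band.
YEARS = [y for y in range(FIRST_YEAR, LAST_YEAR + 1) if y not in SKIPPED_YEARS]


def band_to_year(band):
    """Calendar year for a 1-based band number."""
    if not 1 <= band <= len(YEARS):
        raise ValueError(f"band must be between 1 and {len(YEARS)}")
    return YEARS[band - 1]


def year_to_band(year):
    """1-based band number for a calendar year present in the series."""
    if year not in YEARS:
        raise ValueError(f"no band for year {year}")
    return YEARS.index(year) + 1


print(band_to_year(1), band_to_year(2), year_to_band(2020))
```
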

Mumbai-Slum-Detection-Dataset

kaggle.com

zip

Updated Jul 22, 2025

Cite

Rupesh Kumar Yadav (2025). Mumbai-Slum-Detection-Dataset [Dataset]. https://www.kaggle.com/datasets/rupeshkumaryadav/mumbai-slum-detection-dataset/data

Available download formats

zip (304746333 bytes)

Dataset updated

Jul 22, 2025

Authors

Rupesh Kumar Yadav

License

Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically

Area covered

Dharavi Slums, Mumbai

Description

Dataset Summary

This dataset was developed for pixel-level classification of urban informal settlements using satellite imagery. The input data consists of Sentinel-2 imagery (2015–2016), and the ground truth is derived from a government-conducted survey available as a KML vector file, rasterized to align with the imagery. Formats include NumPy arrays and HDF5 files for easy ML integration. Intended for land‑use/land‑cover classification tasks.

🛰️ Data Source

Satellite Imagery: Sentinel‑2 L2A (Surface Reflectance) images from 2015–16, accessed via Google Earth Engine (GEE)

Ground Truth: Official government KML vector file, manually rasterized to match imagery resolution and alignment

📦 Data Format

Ground Truth Source: Government survey KML converted to raster via QGIS

Satellite Data: Sentinel‑2 L2A (Surface Reflectance) images from 2015–16

CRS & Extent: EPSG:4326

Bounding Box: Longitude: 72.7827462580 to 72.9718317340

Latitude: 18.9086328640 to 19.2638524900

Spatial Accuracy: ~±2 m (WGS84)

Raster Size: 2105 × 3954 pixels (Float64 GeoTIFF)

Formats: NumPy (.npy) and HDF5 (.h5) for image bands and per-pixel labels

Pixel size: ~10m (based on Sentinel-2 native resolution)

Label Values:

        1 → Informal/Slum

        0 → Formal/Non-slum

Data Type: float64 (image), uint8 (labels)
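
The bounding box and raster size listed above can be cross-checked against the stated ~10 m pixel size. The sketch below is a rough consistency check, not part of the dataset's tooling. It assumes the raster size is given as columns × rows (2105 wide, 3954 tall) and uses a spherical approximation of about 111,320 m per degree; both are assumptions.

```python
import math

# Values copied from the data-format listing.
LON_MIN, LON_MAX = 72.7827462580, 72.9718317340
LAT_MIN, LAT_MAX = 18.9086328640, 19.2638524900
WIDTH_PX, HEIGHT_PX = 2105, 3954  # assumed order: columns x rows

# Assumption: spherical Earth, roughly 111,320 m per degree of latitude.
METERS_PER_DEGREE = 111_320.0

# Pixel size in degrees along each axis.
deg_per_px_x = (LON_MAX - LON_MIN) / WIDTH_PX
deg_per_px_y = (LAT_MAX - LAT_MIN) / HEIGHT_PX

# Convert to meters; longitude degrees shrink with cos(latitude).
mid_lat = math.radians((LAT_MIN + LAT_MAX) / 2)
m_per_px_y = deg_per_px_y * METERS_PER_DEGREE
m_per_px_x = deg_per_px_x * METERS_PER_DEGREE * math.cos(mid_lat)

print(f"pixel size in degrees: {deg_per_px_x:.3e} x {deg_per_px_y:.3e}")
print(f"pixel size in meters:  {m_per_px_x:.2f} x {m_per_px_y:.2f}")
```

Under these assumptions both axes come out near 9e-5 degrees per pixel, which is about 10 m north-south and slightly less east-west at Mumbai's latitude, consistent with the stated pixel size.
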

📜 Coordinate System Details

CRS Name: EPSG:4326 - WGS 84

Datum: World Geodetic System 1984 (EPSG:6326)

Units: Geographic (degrees)

Accuracy: ≤ 2 meters (approximate)

Type: Geographic 2D

Celestial Body: Earth

Reference: Dynamic (not plate-fixed)

Additional Details

1. Processing Pipeline

KML to Raster: Ground truth polygons from KML rasterized using GDAL to match Sentinel-2 extent and resolution.

Image Preprocessing: Cloud masking and band selection (R, G, B, NIR) through Google Earth Engine.

Export Format: .tif downloaded, converted to .npy and .h5 using rasterio, numpy, and h5py.

Alignment: Verified pixel-wise correspondence between image and label arrays.

2. Authorship & Provenance

Creators: M Rupesh Kumar Yadav, Mtech, Dept of Centre of Studies in Resources Engineering, IIT Bombay. You can contact through mail rupesh32003@gmail.com, 24m0319@iitb.ac.in, or checkout github for further resources/assistance. orcid id, github, LinkedIn

3. Content & Structure

Bands per sample: RGB (3 bands) + NIR (1 band)

Ground truth: Per-pixel labels aligned with imagery

Data splits: (e.g.) train/val/test percentages or file lists

File naming conventions: Explain if files correspond to tiles, dates, etc.

Example sample: Show dimensions, dtype, label values, and their mapping to classes.

4. Collection & Processing

Satellite imagery: Retrieved via Google Earth Engine over 2015–16; filtered by cloud cover threshold

Ground truth conversion: KML survey data rasterized using same spatial resolution and CRS

Alignment: Resampled and aligned bands using GEE reprojection

Preprocessing steps: Cloud masking, atmospheric correction (L2A), normalization, dtype cast to Float64

Label handling: Ensured spatial overlap and clipping; labeled invalid/missing areas as class 0 or mask

5. Usage & Intended Applications

Tasks: Semantic segmentation or pixel-level land-cover mapping

Ideal for: Land use change detection, agricultural mapping, validation of remote sensing models

Not suitable for: Tasks needing multispectral beyond NIR, very high-res (<10 m) labeling, temporal sequence modeling

6. Limitations & Bias

Temporal span: Only covers 2015–2016; may not reflect current conditions

Spatial scope bias: Limited geographic area (Mumbai region)

Labeling bias: Dependent on government survey accuracy and rasterization fidelity

Cloud coverage: Some tiles may still contain residual cloud pixels

Data from: European LCZ map

figshare.com

txt

Updated May 30, 2023

Cite

Matthias Demuzere; Benjamin Bechtel; Ariane Middel; Gerald Mills (2023). European LCZ map [Dataset]. http://doi.org/10.6084/m9.figshare.13322450.v4

Available download formats

txt

Unique identifier

https://doi.org/10.6084/m9.figshare.13322450.v4

Dataset updated

May 30, 2023

Dataset provided by

figshare
Figshare (http://figshare.com/)

Authors

Matthias Demuzere; Benjamin Bechtel; Ariane Middel; Gerald Mills

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

A European Local Climate Zone map at a 100 m spatial resolution, derived from multiple earth observation datasets and expert LCZ class labels. There are 10 urban LCZ types, each associated with a set of relevant variables such that the map represents a valuable database of urban properties.

Data from: A dataset of spatiotemporally sampled MODIS Leaf Area Index with...

agdatacommons.nal.usda.gov

application/csv

Updated Nov 22, 2025

Cite

Yanghui Kang; Mutlu Ozdogan; Feng Gao; Martha C. Anderson; William A. White; Yun Yang; Yang Yang; Tyler A. Erickson (2025). A dataset of spatiotemporally sampled MODIS Leaf Area Index with corresponding Landsat surface reflectance over the contiguous US [Dataset]. http://doi.org/10.15482/USDA.ADC/1521097

Available download formats

application/csv

Unique identifier

https://doi.org/10.15482/USDA.ADC/1521097

Dataset updated

Nov 22, 2025

Dataset provided by

Ag Data Commons

Authors

Yanghui Kang; Mutlu Ozdogan; Feng Gao; Martha C. Anderson; William A. White; Yun Yang; Yang Yang; Tyler A. Erickson

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Area covered

United States

Description

Leaf Area Index (LAI) is a fundamental vegetation structural variable that drives energy and mass exchanges between the plant and the atmosphere. Moderate-resolution (300m – 7km) global LAI data products have been widely applied to track global vegetation changes, drive Earth system models, monitor crop growth and productivity, etc. Yet, cutting-edge applications in climate adaptation, hydrology, and sustainable agriculture require LAI information at higher spatial resolution (< 100m) to model and understand heterogeneous landscapes. This dataset was built to assist a machine-learning-based approach for mapping LAI from 30m-resolution Landsat images across the contiguous US (CONUS). The data was derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) Version 6 LAI/FPAR, Landsat Collection 1 surface reflectance, and NLCD Land Cover datasets over 2006 – 2018 using Google Earth Engine. Each record/sample/row includes a MODIS LAI value, corresponding Landsat surface reflectance in green, red, NIR, SWIR1 bands, a land cover (biome) type, geographic location, and other auxiliary information. Each sample represents a MODIS LAI pixel (500m) within which a single biome type dominates 90% of the area. The spatial homogeneity of the samples was further controlled by a screening process based on the coefficient of variation of the Landsat surface reflectance. In total, there are approximately 1.6 million samples, stratified by biome, Landsat sensor, and saturation status from the MODIS LAI algorithm. This dataset can be used to train machine learning models and generate LAI maps for Landsat 5, 7, 8 surface reflectance images within CONUS. Detailed information on the sample generation and quality control can be found in the related journal article.

Resources in this dataset:

Resource Title: README.
File Name: LAI_train_samples_CONUS_README.txt
Resource Description: Description and metadata of the main dataset
Resource Software Recommended: Notepad, url: https://www.microsoft.com/en-us/p/windows-notepad/9msmlrh6lzf3?activetab=pivot:overviewtab

Resource Title: LAI_training_samples_CONUS.
File Name: LAI_train_samples_CONUS_v0.1.1.csv
Resource Description: This CSV file consists of the training samples for estimating Leaf Area Index based on Landsat surface reflectance images (Collection 1 Tier 1). Each sample has a MODIS LAI value and corresponding surface reflectance derived from Landsat pixels within the MODIS pixel. Contact: Yanghui Kang (kangyanghui@gmail.com)

Column description

UID: Unique identifier. Format: LATITUDE_LONGITUDE_SENSOR_PATHROW_DATE
Landsat_ID: Landsat image ID
Date: Landsat image date in "YYYYMMDD"
Latitude: Latitude (WGS84) of the MODIS LAI pixel center
Longitude: Longitude (WGS84) of the MODIS LAI pixel center
MODIS_LAI: MODIS LAI value in "m2/m2"
MODIS_LAI_std: MODIS LAI standard deviation in "m2/m2"
MODIS_LAI_sat: 0 - MODIS Main (RT) method used, no saturation; 1 - MODIS Main (RT) method with saturation
NLCD_class: Majority class code from the National Land Cover Dataset (NLCD)
NLCD_frequency: Percentage of the area covered by the majority class from NLCD
Biome: Biome type code mapped from NLCD (see below for more information)
Blue: Landsat surface reflectance in the blue band
Green: Landsat surface reflectance in the green band
Red: Landsat surface reflectance in the red band
Nir: Landsat surface reflectance in the near infrared band
Swir1: Landsat surface reflectance in the shortwave infrared 1 band
Swir2: Landsat surface reflectance in the shortwave infrared 2 band
Sun_zenith: Solar zenith angle from the Landsat image metadata. This is a scene-level value.
Sun_azimuth: Solar azimuth angle from the Landsat image metadata. This is a scene-level value.
NDVI: Normalized Difference Vegetation Index computed from Landsat surface reflectance
EVI: Enhanced Vegetation Index computed from Landsat surface reflectance
NDWI: Normalized Difference Water Index computed from Landsat surface reflectance
GCI: Green Chlorophyll Index = Nir/Green - 1
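
Two of the index columns are easy to reproduce from the reflectance columns. The sketch below computes GCI with the formula given in the column description (Nir/Green - 1) and NDVI with its standard definition, (Nir - Red) / (Nir + Red); the dataset's exact scaling and rounding are not stated here, so recomputed values may differ slightly from the stored ones. Function names are hypothetical.

```python
def gci(nir, green):
    """Green Chlorophyll Index as defined in the column list: Nir/Green - 1."""
    if green == 0:
        raise ValueError("green reflectance must be non-zero")
    return nir / green - 1.0


def ndvi(nir, red):
    """Normalized Difference Vegetation Index, standard definition."""
    if nir + red == 0:
        raise ValueError("nir + red must be non-zero")
    return (nir - red) / (nir + red)


# Illustrative reflectance values only; not taken from the dataset.
sample = {"Green": 0.08, "Red": 0.10, "Nir": 0.40}
print(gci(sample["Nir"], sample["Green"]))
print(ndvi(sample["Nir"], sample["Red"]))
```
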

Biome code

1 - Deciduous Forest
2 - Evergreen Forest
3 - Mixed Forest
4 - Shrubland
5 - Grassland/Pasture
6 - Cropland
7 - Woody Wetland
8 - Herbaceous Wetland
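
When working with the CSV, the biome codes above are easier to read as names. A minimal sketch of a lookup table and a per-biome tally follows; the helper name and the sample codes are hypothetical.

```python
from collections import Counter

# Code-to-name mapping copied from the biome code list.
BIOME_NAMES = {
    1: "Deciduous Forest",
    2: "Evergreen Forest",
    3: "Mixed Forest",
    4: "Shrubland",
    5: "Grassland/Pasture",
    6: "Cropland",
    7: "Woody Wetland",
    8: "Herbaceous Wetland",
}


def count_by_biome(codes):
    """Tally samples per biome name; unknown codes raise KeyError."""
    return Counter(BIOME_NAMES[code] for code in codes)


# Illustrative codes only, standing in for the "Biome" column.
counts = count_by_biome([6, 6, 1, 5, 6])
print(counts.most_common())
```
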

Reference Dataset: All data was accessed through Google Earth Engine

Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment.

MODIS Version 6 Leaf Area Index/FPAR 4-day L5 Global 500m
Myneni, R., Y. Knyazikhin, T. Park. MOD15A2H MODIS/Terra Leaf Area Index/FPAR 8-Day L4 Global 500m SIN Grid V006. 2015, distributed by NASA EOSDIS Land Processes DAAC, https://doi.org/10.5067/MODIS/MOD15A2H.006

Landsat 5/7/8 Collection 1 Surface Reflectance
Landsat Level-2 Surface Reflectance Science Product courtesy of the U.S. Geological Survey.
Masek, J.G., Vermote, E.F., Saleous N.E., Wolfe, R., Hall, F.G., Huemmrich, K.F., Gao, F., Kutler, J., and Lim, T-K. (2006). A Landsat surface reflectance dataset for North America, 1990–2000. IEEE Geoscience and Remote Sensing Letters 3(1):68-72. http://dx.doi.org/10.1109/LGRS.2005.857030.
Vermote, E., Justice, C., Claverie, M., & Franch, B. (2016). Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sensing of Environment. http://dx.doi.org/10.1016/j.rse.2016.04.008.

National Land Cover Dataset (NLCD)
Yang, Limin, Jin, Suming, Danielson, Patrick, Homer, Collin G., Gass, L., Bender, S.M., Case, Adam, Costello, C., Dewitz, Jon A., Fry, Joyce A., Funk, M., Granneman, Brian J., Liknes, G.C., Rigge, Matthew B., Xian, George, A new generation of the United States National Land Cover Database: Requirements, research priorities, design, and implementation strategies: ISPRS Journal of Photogrammetry and Remote Sensing, v. 146, p. 108–123, at https://doi.org/10.1016/j.isprsjprs.2018.09.006

Resource Software Recommended: Microsoft Excel, url: https://www.microsoft.com/en-us/microsoft-365/excel

Data from: Distribution, frequency, and global extent of hypoxia in rivers

catalog.data.gov
data.usgs.gov

Updated Oct 30, 2025

Cite

U.S. Geological Survey (2025). Distribution, frequency, and global extent of hypoxia in rivers [Dataset]. https://catalog.data.gov/dataset/distribution-frequency-and-global-extent-of-hypoxia-in-rivers

Dataset updated

Oct 30, 2025

Dataset provided by

United States Geological Survey (http://www.usgs.gov/)

Description

To assess the distribution, frequency, and global extent of riverine hypoxia, we compiled 118 million paired dissolved oxygen (DO) and water temperature measurements from 125,158 unique locations in rivers in 93 countries and territories across the globe. The dataset also includes site characteristics derived from StreamCat, the National Hydrography and HydroAtlas datasets and proximal land cover derived from MODIS-based IGBP land cover types compiled using Google Earth Engine (GEE).

Annual global forest gain maps from 1984 to 2020

figshare.com

tiff

Updated Mar 8, 2022

Cite

Zhenrong Du; Le Yu; Jianyu Yang; David Coomes; Haohuan Fu; Peng Gong (2022). Annual global forest gain maps from 1984 to 2020 [Dataset]. http://doi.org/10.6084/m9.figshare.18461609.v1

Available download formats

tiff

Unique identifier

https://doi.org/10.6084/m9.figshare.18461609.v1

Dataset updated

Mar 8, 2022

Dataset provided by

Figshare (http://figshare.com/)

Authors

Zhenrong Du; Le Yu; Jianyu Yang; David Coomes; Haohuan Fu; Peng Gong

License

Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Forest cover is changing rapidly at the global scale as a result of land-use change (principally deforestation in many tropical regions and afforestation in many temperate regions) and climate change. However, a detailed map of global forest gain at fine spatial and temporal resolutions is still lacking. In this study, we developed a new automatic framework to map annual forest gain across the globe, based on Landsat time series, the LandTrendr algorithm, and the Google Earth Engine (GEE) platform. First, samples of stable forest, collected based on the Global Forest Change (GFC) product, were used to determine annual Normalized Burn Ratio (NBR) thresholds for forest gain detection. Second, using the NBR time series from 1982 to 2020 and the LandTrendr algorithm, we produced a dataset of the year of global forest gain from 1984 to 2020 based on a set of decision rules. Our results reveal that large areas of forest gain occurred in China, Russia, Brazil, and North America, and that the vast majority of global forest gain has occurred since 2000. The new dataset is consistent, in both spatial extent and year of forest gain, with data from field inventories and alternative remote sensing products. Our dataset is valuable for policy-relevant research on the net impact of forest cover change on the global carbon cycle, and the framework provides an efficient and transferable approach for monitoring other types of land cover dynamics.
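To make the NBR-threshold idea in the description concrete, here is a minimal Python sketch. It uses the standard NBR definition, (NIR - SWIR2) / (NIR + SWIR2), and a deliberately simplified threshold-crossing rule. It is not the LandTrendr algorithm and not the authors' decision rules, which the description does not spell out; the function names `nbr` and `first_gain_year`, the per-year threshold dictionary, and all sample values are hypothetical.

```python
import math


def nbr(nir: float, swir2: float) -> float:
    """Normalized Burn Ratio: (NIR - SWIR2) / (NIR + SWIR2)."""
    denom = nir + swir2
    if denom == 0:
        # Undefined ratio; signal it with NaN rather than raising.
        return math.nan
    return (nir - swir2) / denom


def first_gain_year(years, nbr_series, thresholds):
    """Return the first year whose NBR reaches that year's threshold
    after being below the previous year's threshold, else None.

    Assumption: a single threshold crossing stands in for the study's
    decision rules, which are not given in the description.
    """
    for i in range(1, len(years)):
        was_below = nbr_series[i - 1] < thresholds[years[i - 1]]
        now_at_or_above = nbr_series[i] >= thresholds[years[i]]
        if was_below and now_at_or_above:
            return years[i]
    return None


# Hypothetical reflectance pairs (NIR, SWIR2) for one pixel over three years.
years = [2000, 2001, 2002]
reflectance = [(0.25, 0.20), (0.30, 0.18), (0.45, 0.12)]
series = [nbr(nir_value, swir2_value) for nir_value, swir2_value in reflectance]

# Hypothetical annual thresholds; the study derives these from stable-forest samples.
thresholds = {2000: 0.4, 2001: 0.4, 2002: 0.4}
gain_year = first_gain_year(years, series, thresholds)
```

In this toy series the pixel's NBR rises from about 0.11 to about 0.58, so the crossing falls in the last year. A real workflow would segment the full time series first, which is the step LandTrendr performs.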
