55 datasets found
  1. DataSheet1_Optimal parameters of random forest for land cover classification...

    • figshare.com
    • frontiersin.figshare.com
    zip
    Updated Oct 23, 2023
    Cite
    Jing Sun; Suwit Ongsomwang (2023). DataSheet1_Optimal parameters of random forest for land cover classification with suitable data type and dataset on Google Earth Engine.zip [Dataset]. http://doi.org/10.3389/feart.2023.1188093.s001
    Available download formats: zip
    Dataset updated
    Oct 23, 2023
    Dataset provided by
    Frontiers
    Authors
    Jing Sun; Suwit Ongsomwang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    An accurate land cover (LC) map is essential for understanding the development of human societies and for studying the impacts of climate and environmental change. To meet this need, the study investigated optimal Random Forest (RF) parameters for LC classification with a suitable data type and dataset on Google Earth Engine (GEE). The research objectives were 1) to examine optimum RF parameters for LC classification at local scale, 2) to classify LC data and assess accuracy in the model area (Hefei City), 3) to identify a suitable data type and dataset for LC classification, and 4) to validate the optimum RF parameters with a suitable data type and dataset in the test area (Nanjing City). The study suggests that the suitable data type for LC classification was Sentinel-2 data with auxiliary data, and that the suitable dataset was monthly and seasonal medians of Sentinel-2, elevation, and nighttime light data. The appropriate values of the number of trees, the variables per split, and the bag fraction for RF were 800, 22, and 0.9, respectively. The overall accuracy (OA) and Kappa index of LC in the model area (Hefei City) with the suitable dataset were 93.17% and 0.9102. The OA and Kappa index of LC in the test area (Nanjing City) were 92.38% and 0.8914. Thus, the developed methodology can be applied to update LC maps where LC changes occur quickly.
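The description reports overall accuracy (OA) and a Kappa index but not how they are computed. As a minimal sketch, assuming the usual definitions from a square confusion matrix (rows as reference classes, columns as predicted classes), both can be derived as below. The function name `accuracyMetrics` and the 3-class matrix are hypothetical and are not taken from the study.

```javascript
// Overall accuracy and Cohen's kappa from a square confusion matrix.
// Rows = reference classes, columns = predicted classes.
function accuracyMetrics(matrix) {
  const n = matrix.length;
  let total = 0;
  let diagonal = 0;
  const rowSums = new Array(n).fill(0);
  const colSums = new Array(n).fill(0);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      const count = matrix[i][j];
      total += count;
      rowSums[i] += count;
      colSums[j] += count;
      if (i === j) diagonal += count;
    }
  }
  const observed = diagonal / total; // overall accuracy
  let expected = 0; // agreement expected by chance
  for (let k = 0; k < n; k++) {
    expected += (rowSums[k] * colSums[k]) / (total * total);
  }
  const kappa = (observed - expected) / (1 - expected);
  return { overallAccuracy: observed, kappa: kappa };
}

// Hypothetical 3-class matrix (made-up counts, not data from the study).
const exampleMatrix = [
  [50, 2, 3],
  [4, 45, 1],
  [1, 3, 41],
];
const metrics = accuracyMetrics(exampleMatrix);
console.log(metrics.overallAccuracy.toFixed(4), metrics.kappa.toFixed(4));
```

Kappa discounts the agreement that would occur by chance, which is why it is reported next to OA and is always the lower of the two for an imperfect classifier.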

  2. OpenLandMap-soildb: soil type probability - suborder: Psamments

    • repository.soilwise-he.eu
    • zenodo.org
    • +1 more
    Updated Jul 9, 2025
    + more versions
    Cite
    (2025). OpenLandMap-soildb: soil type probability - suborder: Psamments [Dataset]. http://doi.org/10.5281/zenodo.15481497
    Dataset updated
    Jul 9, 2025
    Description

    Sub-dataset: soil type probability - suborder: Psamments

    Description

    Global annual maps of soil properties for 2000–2022 produced within the scope of the Land & Carbon Lab, integrating digital surface/terrain models, vegetation/tillage indices, and climatic/bioclimatic variables, and based on tree-based spatiotemporal machine learning. While the primary focus is on improving the monitoring of global soil properties, the dataset provides wall-to-wall coverage across all terrestrial ecosystems and is organized into 300+ global mosaics in COG (Cloud Optimized GeoTIFF) format. Data are presented at 5-year intervals, across 3 standard depth intervals (0–30 cm, 30–60 cm, 60–100 cm), and cover 79 USDA soil taxonomy suborders. Original layers use the WGS84 Coordinate System (EPSG:4326) at a pixel resolution of 0.00025 degrees, and 0.00075 degrees with uncertainty (STAC and GEE). Layers archived on Zenodo are at 0.00075 degrees with uncertainty but include only the initial and final periods (2000–2005 & 2020–2022), including:

    Soil Organic Carbon Content (g/kg)
    A key indicator of soil fertility, structure, and microbial activity, it represents the concentration of organic carbon in the fine earth fraction of the soil. The standard method of measurement is dry combustion using elemental analyzers (e.g., ISO 10694).

    Soil Organic Carbon Density (kg/m³)
    Represents the mass of organic carbon per unit volume of soil. It is derived as: SOC content × bulk density × (1 − coarse fragment volume fraction). This value is critical for estimating total carbon stocks and monitoring soil carbon changes over time.

    Soil pH
    Indicates the acidity or alkalinity of soil, affecting nutrient availability and microbial processes. Reported as pH measured in water solution (pH in H₂O).

    Bulk Density (g/cm³)
    Refers to the mass of dry fine earth (<2 mm) per unit volume, excluding coarse fragments. It reflects soil compaction and porosity, influencing water retention and root penetration. Commonly determined using the core method or calculated from pedotransfer functions.

    Soil Texture Fraction
    Defines the relative proportions of mineral particles by size. Texture influences water movement, nutrient holding capacity, and plant growth.
    Clay content (%): Proportion of particles <0.002 mm in diameter.
    Sand content (%): Proportion of particles between 0.05–2.0 mm (some definitions use 0.063 mm as lower threshold).
    Silt content (%): Particles sized between 0.002–0.05 mm or up to 0.063 mm depending on classification system.
    Textural fractions follow USDA or FAO particle size classifications.

    Soil Type Probability
    Probabilistic classification of soils based on USDA Soil Taxonomy at the subgroup level. Each pixel is assigned a probability distribution across potential soil types, based on legacy point data and environmental covariates.

    30m layers can be accessed through STAC and Google Earth Engine (GEE) at:
    OpenLandMap STAC: https://stac.openlandmap.org
    Google Earth Engine: https://code.earthengine.google.com/?asset=projects/global-pasture-watch/assets/gsm-30m

    The full modeling framework is publicly available at OpenLandMap GitHub - soildb

    Data Detail

    Time period: 2000-2022, in 5-year intervals (last period covers 2020–2022) for soil properties; 2000-2022 static for soil type
    Type of data: Spatiotemporal soil database, with depth ranges and weighted percentage data for soil assessments and static soil type classification.
    How the data was collected or derived: The data was derived using machine learning models.
    Statistical methods used: Tree-based spatiotemporal machine learning
    Depth reference: b30cm..60cm = below ground at 30-60cm interval
    Limitations or exclusions in the data: no Antarctica; masking out permanent ice and deserts
    Coordinate reference system: EPSG:4326
    Bounding box (Xmin, Ymin, Xmax, Ymax): (-180, -56, 180, 76)
    Spatial resolution: 0.00075 degree (~120m)
    Image size: 360,000P, 132,000L
    File format: Cloud Optimized Geotiff (COG) format

    Dataset Contents

    This dataset includes: soil type probability - suborder: Psamments

    Related Identifiers

    SOC density: below ground 0cm-30cm 2000-2005, below ground 0cm-30cm 2020-2022, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022
    SOC content: below ground 0cm-30cm 2000-2005 - part 1, below ground 0cm-30cm 2000-2005 - part 2, below ground 0cm-30cm 2020-2022 - part 1, below ground 0cm-30cm 2020-2022 - part 2, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022
    Bulk density: below ground 0cm-30cm 2000-2005, below ground 0cm-30cm 2020-2022, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022
    Soil pH of water: below ground 0cm-30cm 2000-2005, below ground 0cm-30cm 2020-2022, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022
    Soil textures fraction (clay): below ground 0cm-30cm 2000-2005, below ground 0cm-30cm 2020-2022, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022
    Soil textures fraction (sand): below ground 0cm-30cm 2000-2005, below ground 0cm-30cm 2020-2022, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022
    Soil textures fraction (silt): below ground 0cm-30cm 2000-2005, below ground 0cm-30cm 2020-2022, below ground 30cm-60cm 2000-2005, below ground 30cm-60cm 2020-2022, below ground 60cm-100cm 2000-2005, below ground 60cm-100cm 2020-2022
    Soil type (Suborder): Uderts, Calcids, Xerands, Orthents, Cryands, Ustalfs, Cryalfs, Aquepts, Udalfs, Cryolls, Durids, Usterts, Boralfs, Orthids, Udands, Torrerts, Histels, Rendolls, Aqualfs, Udepts, Xeralfs, Gelepts, Xerults, Fibrists, Ustepts, Xererts, Ustults, Aquands, Perox, Xerolls, Tropepts, Turbels, Udults, Aquents, Aquerts, Ustox, Aquods, Aquolls, Xerepts, Udox, Cryods, Ustolls, Aquults, Psamments, Arents, Fluvents, Humults, Vitrands, Udolls, Borolls, Orthels, Hemists, Wassents, Albolls, Salids, Cryepts, Saprists, Folists, Gypsids, Ochrepts, Cambids, Argids, Orthods

    Data Details

    Time period: 2000-2022
    Type of data: soil type probability - suborder: Psamments
    How the data was collected or derived: Machine learning models.
    Statistical methods used: Random Forest.
    Limitations or exclusions in the data: The dataset does not include Antarctica.
    Coordinate reference system: EPSG:4326
    Bounding box (Xmin, Ymin, Xmax, Ymax): (-180, -56, 180, 76)
    Spatial resolution: 120m
    Image size: 360,000P x 132,000L
    File format: Cloud Optimized Geotiff (COG) format.
    Layer information:

    File Name | Unit | Scale | Data Type | No Data | Description
    oc_iso.10694.1995.mg.cm3 | kg/m³ | 10 | UInt16 | 32767 | Organic carbon density derived by multiplying fine earth bulk density and organic carbon content
    oc_iso.10694.1995.wpml | g/kg | 10 | UInt16 | 32767 | Organic carbon content based on dry combustion weight percent
    ph.h2o_iso.10390.2021.index | - | 10 | Byte | 255 | The pH, 1:1 soil-water suspension is the pH of a sample measured in distilled water at a 1:1 soil:solution ratio
    bd.core_iso.11272.2017.g.cm3 | g/cm³ | 100 | UInt16 | 32767 | Bulk density, <2mm fraction, dry is the weight per unit volume of the <2 mm fraction, with volume measured in laboratory
    sand.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total laboratory-estimated sand 0.063 to 2.0 mm particle diameter
    silt.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total laboratory-estimated silt 0.002 to 0.063 mm particle size
    clay.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total clay is the soil separate with <0.002 mm particle diameter
    soil.types_ensemble | % | 1 | Byte | 255 | Probability of soil type occurrence

    Support

    If you discover a bug, artifact, or inconsistency, or if you have a question, please raise a GitHub issue here

    Naming convention

    To ensure consistency and ease of use across and within the projects, we follow the standard Ai4SoilHealth and Open-Earth-Monitor file-naming convention. The convention works with 10 fields that describe important properties of the data. In this way users can search files, prepare data analysis, etc., without needing to open files. For example, for oc_iso.10694.1995.wpml_m_30m_b30cm..60cm_20000101_20051231_g_epsg.4326_v20250204.tif, the fields are: generic
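The layer table gives a Scale and a No Data value per file, and the description states that SOC density is derived as SOC content × bulk density × (1 − coarse fragment volume fraction). The sketch below shows how stored integers might be turned back into physical values and how the density formula plays out numerically. It assumes that the physical value equals the stored value divided by Scale, which the source does not state explicitly; the helper names `decode` and `socDensity` and the sample pixel values are hypothetical.

```javascript
// Assumption: physical value = stored integer / scale, and the
// "No Data" value marks missing pixels. Sample values are made up.
const LAYERS = {
  'oc_iso.10694.1995.wpml': { unit: 'g/kg', scale: 10, noData: 32767 },
  'bd.core_iso.11272.2017.g.cm3': { unit: 'g/cm3', scale: 100, noData: 32767 },
  'ph.h2o_iso.10390.2021.index': { unit: '-', scale: 10, noData: 255 },
};

// Convert one stored pixel value to its physical value, or null if missing.
function decode(layerName, stored) {
  const layer = LAYERS[layerName];
  if (!layer) throw new Error('Unknown layer: ' + layerName);
  return stored === layer.noData ? null : stored / layer.scale;
}

// SOC density (kg/m3) = SOC content (g/kg) x bulk density (g/cm3)
//                       x (1 - coarse fragment volume fraction)
function socDensity(socContent, bulkDensity, coarseFraction) {
  if (socContent === null || bulkDensity === null) return null;
  return socContent * bulkDensity * (1 - coarseFraction);
}

const soc = decode('oc_iso.10694.1995.wpml', 200);      // 20 g/kg
const bd = decode('bd.core_iso.11272.2017.g.cm3', 130); // 1.3 g/cm3
console.log(socDensity(soc, bd, 0.1));
```

The units work out without an extra factor because 1 g/cm³ equals 1000 kg/m³ while 1 g/kg equals 0.001 kg/kg, so the two conversions cancel.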

  3. MCD12Q1.061 MODIS Land Cover Type Yearly Global 500m

    • developers.google.com
    Updated Jan 1, 2024
    + more versions
    Cite
    NASA LP DAAC at the USGS EROS Center (2024). MCD12Q1.061 MODIS Land Cover Type Yearly Global 500m [Dataset]. http://doi.org/10.5067/MODIS/MCD12Q1.061
    Dataset updated
    Jan 1, 2024
    Dataset provided by
    NASA LP DAAC at the USGS EROS Center
    Time period covered
    Jan 1, 2001 - Jan 1, 2024
    Area covered
    Earth
    Description

    The Terra and Aqua combined Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Type (MCD12Q1) Version 6.1 data product provides global land cover types at yearly intervals. The MCD12Q1 Version 6.1 data product is derived using supervised classifications of MODIS Terra and Aqua reflectance data. Land cover types are derived from the International Geosphere-Biosphere Programme (IGBP), University of Maryland (UMD), Leaf Area Index (LAI), BIOME-Biogeochemical Cycles (BGC), and Plant Functional Types (PFT) classification schemes. The supervised classifications then underwent additional post-processing that incorporates prior knowledge and ancillary information to further refine specific classes. Additional land cover property assessment layers are provided by the Food and Agriculture Organization (FAO) Land Cover Classification System (LCCS) for land cover, land use, and surface hydrology. Layers for Land Cover Type 1-5, Land Cover Property 1-3, Land Cover Property Assessment 1-3, Land Cover Quality Control (QC), and a Land Water Mask are also provided.

    Documentation: User's Guide; Algorithm Theoretical Basis Document (ATBD); General Documentation

  4. Google Earth Engine code to generate water coverage data, Schaffer-Smith et...

    • hydroshare.org
    • beta.hydroshare.org
    • +1 more
    zip
    Updated Sep 18, 2022
    Cite
    Margaret Swift (2022). Google Earth Engine code to generate water coverage data, Schaffer-Smith et al 2022 [Dataset]. https://www.hydroshare.org/resource/01c98336686a44d8892d57e7e2637ccb
    Available download formats: zip (64.6 KB)
    Dataset updated
    Sep 18, 2022
    Dataset provided by
    HydroShare
    Authors
    Margaret Swift
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 1, 2017 - Jul 31, 2020
    Description

    Surface water in arid regions is essential to many organisms, including large mammals of conservation concern. For many regions little is known about the extent, ecology, and hydrology of ephemeral waters, because their short-lived nature and small size make them challenging to map. Our goal was to advance surface water knowledge by mapping and monitoring ephemeral water from the wet to dry seasons across the Kavango-Zambezi (KAZA) transfrontier conservation area of southern Africa (300,000 km²). We mapped individual waterholes for six time points each year from mid-2017 to mid-2020, and described their presence, extent, duration, variability, and recurrence. We further analyzed a wide range of physical and landscape aspects of waterhole locations, from soils, geology, and topography to climate and soil moisture. We identified 2.1 million previously unmapped ephemeral waterholes (85-89% accuracy) that seasonally extend across 23.5% of the study area. We confirmed a distinct ‘blue wave’ with ephemeral water across the region peaking at the end of the rainy season. We observed a wide range of waterhole types and sizes, with large variances in seasonal and interannual hydrology. We found that spatiotemporal patterns of ephemeral surface water were associated with soil type; loam soils were most likely to hold water for longer periods in the study area. From the wettest time period to the driest, there was a ~44,000 km² (62%) decrease in ephemeral water extent across the region. These dramatic seasonal fluctuations have implications for wildlife movement. A warmer and drier climate, expected human population growth, and associated agricultural expansion and development may threaten these sensitive and highly variable water resources and the wildlife that depend on them.
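The extent figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes that the 23.5% share describes the wettest period and that the 62% decrease is expressed relative to that peak extent, neither of which the description states outright.

```javascript
// Figures as quoted in the description; their interpretation is an assumption.
const studyAreaKm2 = 300000; // study area as stated
const peakShare = 0.235;     // assumed to describe the wettest period
const decreaseKm2 = 44000;   // stated wet-to-dry decrease (approximate)

const peakExtentKm2 = studyAreaKm2 * peakShare;
const relativeDecrease = decreaseKm2 / peakExtentKm2;
console.log(Math.round(peakExtentKm2), (relativeDecrease * 100).toFixed(1) + '%');
```

Under those assumptions the three reported numbers are mutually consistent to within rounding.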

    This contains Google Earth Engine code to generate water coverage data for Schaffer-Smith et al 2022.

  5. GEE-TED: A tsetse ecological distribution model for Google Earth Engine

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Jul 8, 2024
    Cite
    Brad Peter; Joseph Messina (2024). GEE-TED: A tsetse ecological distribution model for Google Earth Engine [Dataset]. http://doi.org/10.7910/DVN/6JR87X
    Dataset updated
    Jul 8, 2024
    Dataset provided by
    Harvard Dataverse
    Authors
    Brad Peter; Joseph Messina
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    GEE-TED: A tsetse ecological distribution model for Google Earth Engine

    Please refer to the associated publication:
    Fox, L., Peter, B.G., Frake, A.N. and Messina, J.P., 2023. A Bayesian maximum entropy model for predicting tsetse ecological distributions. International Journal of Health Geographics, 22(1), p.31.
    https://link.springer.com/article/10.1186/s12942-023-00349-0

    Description

    GEE-TED is a Google Earth Engine (GEE; Gorelick et al. 2017) adaptation of a tsetse ecological distribution (TED) model developed by DeVisser et al. (2010), which was designed for use in ESRI's ArcGIS. TED uses time-series climate and land-use/land-cover (LULC) data to predict the probability of tsetse presence across space based on species habitat preferences (in this case Glossina morsitans). Model parameterization includes (1) day and night temperatures (MODIS Land Surface Temperature; MOD11A2), (2) available moisture/humidity using a vegetation index as a proxy (MODIS NDVI; MOD13Q1), (3) LULC (MODIS Land Cover Type 1; MCD12Q1), (4) year selections, and (5) fly movement rate (meters/16-days). TED has also been used as a basis for the development of an agent-based model by Lin et al. (2015) and in a cost-benefit analysis of tsetse control in Tanzania by Yang et al. (2017).

    Parameterization in Fox et al. (2023): Suitable LULC types and climate thresholds used here are specific to Glossina morsitans in Kenya and are based on the parameterization selections in DeVisser et al. (2010) and DeVisser and Messina (2009). Suitable temperatures range from 17–40°C during the day and 10–40°C at night, and available moisture is characterized as NDVI > 0.39. Suitable LULC comprises predominantly woody vegetation; a complete list of suitable categories is available in DeVisser and Messina (2009). In the Fox et al. (2023) publication, two versions of MCD12Q1 were used to assess suitable LULC types: Versions 051 and 006.

    The GeoTIFF supplied in this dataset entry (GEE-TED_Kenya_2016-2017.tif) uses the aforementioned parameters to show the probable tsetse distribution across Kenya for the years 2016-2017. A static graphic of this GEE-TED output is shown below and an interactive version can be viewed at: https://cartoscience.users.earthengine.app/view/gee-ted.

    [Figure associated with Fox et al. (2023)]

    GEE code

    The code supplied below is generalizable across geographies and species; however, parameterization should be given considerable attention to produce reliable results. Note that on-the-fly output visualization will take some time, so it is recommended that results be exported as an asset within GEE or as a GeoTIFF.

    Note: Since completing the Fox et al. (2023) manuscript, GEE has removed Version 051 per NASA's deprecation of the product. The current release of GEE-TED now uses only MCD12Q1 Version 006; however, alternative LULC data selections can be used with minimal modification to the code.
    // Input options
    var tempMin = 10 // Temperature thresholds in degrees Celsius
    var tempMax = 40
    var ndviMin = 0.39 // NDVI thresholds; proxy for available moisture/humidity
    var ndviMax = 1
    var movement = 500 // Fly movement rate in meters/16-days
    var startYear = 2008 // The first 2 years will be used for model initialization
    var endYear = 2019 // Computed probability is based on startYear+2 to endYear
    var country = 'KE' // Country codes - https://en.wikipedia.org/wiki/List_of_FIPS_country_codes
    var crs = 'EPSG:32737' // See https://epsg.io/ for appropriate country UTM zone
    var rescale = 250 // Output spatial resolution
    var labelSuffix = '02052020' // For file export labeling only

    //[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17] MODIS/006/MCD12Q1
    var lulcOptions006 = [1,1,1,1,1,1,1,1,1, 0, 1, 0, 0, 0, 0, 0, 0] // 1 = suitable 0 = unsuitable

    // No more input required ------------------------------ //

    var region = ee.FeatureCollection("USDOS/LSIB_SIMPLE/2017")
      .filterMetadata('country_co', 'equals', country)

    // Input parameter modifications
    var tempMinMod = (tempMin+273.15)/0.02
    var tempMaxMod = (tempMax+273.15)/0.02
    var ndviMinMod = ndviMin*10000
    var ndviMaxMod = ndviMax*10000
    var ndviResolution = 250
    var movementRate = movement+(ndviResolution/2)

    // Loading image collections
    var lst = ee.ImageCollection('MODIS/006/MOD11A2').select('LST_Day_1km', 'LST_Night_1km')
      .filter(ee.Filter.calendarRange(startYear,endYear,'year'))
    var ndvi = ee.ImageCollection('MODIS/006/MOD13Q1').select('NDVI')
      .filter(ee.Filter.calendarRange(startYear,endYear,'year'))
    var lulc006 = ee.ImageCollection('MODIS/006/MCD12Q1').select('LC_Type1')

    // Lulc mode and boolean reclassification
    var lulcMask = lulc006.mode().remap([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17],lulcOptions006)
      .eq(1).rename('remapped').clip(region)

    // Merge NDVI and LST image collections
    var combined = ndvi.combine(lst, true)
    var combinedList = combined.toList(10000)

    // Boolean reclassifications (suitable/unsuitable) for day/night temperatures and ndvi
    var con =...
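The "input parameter modifications" in the listing convert human-readable thresholds into the raw integer units used by the MODIS products: land surface temperature is stored as Kelvin divided by 0.02, and NDVI as the index multiplied by 10000. The plain JavaScript sketch below reproduces those conversions without the Earth Engine API and applies them to a single pixel. The helper `isSuitable` is hypothetical, and because the original reclassification code is truncated, the single temperature range and the inclusive comparisons used here are assumptions rather than the authors' logic.

```javascript
// Thresholds copied from the input options in the listing.
const tempMin = 10; // degrees Celsius
const tempMax = 40;
const ndviMin = 0.39;
const ndviMax = 1;

// MODIS LST is stored as Kelvin / 0.02; NDVI is stored as NDVI * 10000.
const tempMinMod = (tempMin + 273.15) / 0.02;
const tempMaxMod = (tempMax + 273.15) / 0.02;
const ndviMinMod = ndviMin * 10000;
const ndviMaxMod = ndviMax * 10000;

// 1 = suitable, 0 = unsuitable, indexed by LC_Type1 class 1..17.
const lulcOptions006 = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0];

// Hypothetical helper: is one raw pixel inside all thresholds?
// Inclusive bounds are an assumption; the original logic is not shown.
function isSuitable(rawLst, rawNdvi, lulcClass) {
  const tempOk = rawLst >= tempMinMod && rawLst <= tempMaxMod;
  const ndviOk = rawNdvi >= ndviMinMod && rawNdvi <= ndviMaxMod;
  const lulcOk = lulcOptions006[lulcClass - 1] === 1;
  return tempOk && ndviOk && lulcOk;
}

// Raw LST 15000 is 300 K (26.85 C); raw NDVI 5200 is 0.52; class 8.
console.log(isSuitable(15000, 5200, 8));
```

Converting the thresholds once, rather than rescaling every image, keeps the comparison in the products' native integer units.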

  6. USFS TreeMap v2016 (Conterminous United States)

    • developers.google.com
    Updated Jan 1, 2016
    + more versions
    Cite
    USDA Forest Service (USFS) Geospatial Technology and Applications Center (GTAC) (2016). USFS TreeMap v2016 (Conterminous United States) [Dataset]. https://developers.google.com/earth-engine/datasets/catalog/USFS_GTAC_TreeMap_v2016
    Dataset updated
    Jan 1, 2016
    Dataset provided by
    U.S. Department of Agriculture Forest Service: http://fs.fed.us/
    Time period covered
    Jan 1, 2016 - Jan 1, 2017
    Area covered
    Description

    This product is part of the TreeMap data suite. It provides detailed spatial information on forest characteristics including number of live and dead trees, biomass, and carbon across the entire forested extent of the continental United States in 2016. TreeMap v2016 contains one image, a 22-band 30 x 30m resolution …

  7. Data from: Monitoring the storage volume of water reservoirs using Google...

    • hydroshare.org
    • beta.hydroshare.org
    • +1 more
    zip
    Updated Sep 24, 2021
    Cite
    Joaquim Condeça; João Nascimento; Nuno Barreiras (2021). Monitoring the storage volume of water reservoirs using Google Earth Engine [Dataset]. http://doi.org/10.4211/hs.44849607416745c98fc0946672128100
    Available download formats: zip (435.1 KB)
    Dataset updated
    Sep 24, 2021
    Dataset provided by
    HydroShare
    Authors
    Joaquim Condeça; João Nascimento; Nuno Barreiras
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 1, 1984 - Dec 31, 2019
    Area covered
    Description

    Satellite images are now widely used in remote sensing, allowing observations with high temporal and spatial coverage. Water indices have proved to be an effective way of monitoring surface water resources. However, precise or automatic methodologies that use satellite imagery to determine reservoir volumes are lacking. To fill that gap, this methodology proposes 3 stages: use Google Earth Engine (GEE) to select images; automatically calculate flooded surface areas by applying water indices; and determine the volume stored in reservoirs over those years based on the relation between the flooded area and the stored volume. The method was applied to four reservoirs and used Landsat 4 and 5 ETM and Landsat 8 OLI imagery. For the calculation of the flooded area, the NDWI indices (McFeeters, 1996; Gao, 1996) and the MNDWI index (Xu, 2006) were applied and tested. The stored volume of water was estimated from the index-derived areas, and the calculated volume was cross-checked against the real stored volume. Finally, the best-fitting water indices were analyzed. The results of all case studies presented here showed quantifiable proficiency and reliability under quite varied natural conditions. In conclusion, this methodology could serve as a tool for water resources management, in developing countries and elsewhere, to automatically measure trends in stored volumes and their relation to precipitation, and could eventually be extended to other types of surface water bodies, such as lakes and coastal lagoons.
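The three stages described above (index calculation, flooded area, stored volume) can be sketched in plain JavaScript. The index formulas are the standard normalized differences from the cited papers. The zero threshold, the sample reflectances, the area-volume table, and the use of linear interpolation between table points are all assumptions made for illustration, since the description only says that volume is based on the relation between flooded area and stored volume.

```javascript
// Normalized difference of two band reflectances.
function normalizedDifference(a, b) {
  return (a - b) / (a + b);
}
const ndwiMcFeeters = (green, nir) => normalizedDifference(green, nir);
const ndwiGao = (nir, swir) => normalizedDifference(nir, swir);
const mndwiXu = (green, swir) => normalizedDifference(green, swir);

// Flooded area: count pixels above a threshold, times the pixel area.
function floodedAreaM2(indexValues, threshold, pixelSizeM) {
  const waterPixels = indexValues.filter((v) => v > threshold).length;
  return waterPixels * pixelSizeM * pixelSizeM;
}

// Volume from a hypothetical area-volume table (sorted by area),
// using linear interpolation and clamping outside the table range.
function volumeFromArea(areaM2, curve) {
  if (areaM2 <= curve[0].area) return curve[0].volume;
  for (let i = 1; i < curve.length; i++) {
    const lo = curve[i - 1];
    const hi = curve[i];
    if (areaM2 <= hi.area) {
      const t = (areaM2 - lo.area) / (hi.area - lo.area);
      return lo.volume + t * (hi.volume - lo.volume);
    }
  }
  return curve[curve.length - 1].volume;
}

// Made-up reflectances for three 30 m Landsat pixels.
const pixels = [
  { green: 0.10, nir: 0.03, swir: 0.02 },
  { green: 0.08, nir: 0.30, swir: 0.20 },
  { green: 0.12, nir: 0.04, swir: 0.01 },
];
const mndwiValues = pixels.map((p) => mndwiXu(p.green, p.swir));
const areaM2 = floodedAreaM2(mndwiValues, 0, 30);
const curve = [{ area: 0, volume: 0 }, { area: 3600, volume: 7200 }];
console.log(areaM2, volumeFromArea(areaM2, curve));
```

In practice the threshold and the area-volume relation would be calibrated per reservoir, which is the index comparison and cross-check that the description refers to.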

  8. MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory

    • repository.soilwise-he.eu
    • dataverse.harvard.edu
    • +1 more
    Updated Apr 18, 2025
    Cite
    (2025). MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory [Dataset]. http://doi.org/10.7910/DVN/M4ZGXP
    Dataset updated
    Apr 18, 2025
    Description

    MSZSI: Multi-Scale Zonal Statistics [AgriClimate] Inventory

    --------------------------------------------------------------------------------------
    MSZSI is a data extraction tool for Google Earth Engine that aggregates time-series remote sensing information to multiple administrative levels using the FAO GAUL data layers. The code at the bottom of this page (metadata) can be pasted into the Google Earth Engine JavaScript code editor and run at https://code.earthengine.google.com/.

    Please refer to the associated publication:
    Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624.
    https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

    Input options:
    [1] Country of interest
    [2] Start and end year
    [3] Start and end month
    [4] Option to mask data to a specific land-use/land-cover type
    [5] Land-use/land-cover type code from CGLS LULC
    [6] Image collection for data aggregation
    [7] Desired band from the image collection
    [8] Statistics type for the zonal aggregations
    [9] Statistic to use for annual aggregation
    [10] Scaling options
    [11] Export folder and label suffix

    Output: Two CSVs containing zonal statistics for each of the FAO GAUL administrative level boundaries
    Output fields: system:index, 0-ADM0_CODE, 0-ADM0_NAME, 0-ADM1_CODE, 0-ADM1_NAME, 0-ADMN_CODE, 0-ADMN_NAME, 1-AREA_PERCENT_LULC, 1-AREA_SQM_LULC, 1-AREA_SQM_ZONE, 2-X_2001, 2-X_2002, 2-X_2003, ..., 2-X_2020, .geo
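To make the output layout concrete, the sketch below aggregates per-pixel samples into one row per zone with one column per year, mirroring the `2-X_2001` style of the output fields listed above. It is a plain JavaScript illustration of zonal averaging and does not use the Earth Engine reducers that the tool itself relies on. The function `zonalMeans`, the sample values, and the choice of `0-ADM1_CODE` as the zone key are assumptions.

```javascript
// Group pixel samples by zone code and year, then emit one row per zone
// with a mean value per year. Sample data below are made up.
function zonalMeans(samples) {
  const sums = new Map();
  for (const { zone, year, value } of samples) {
    const key = zone + '|' + year;
    const entry = sums.get(key) || { zone, year, sum: 0, count: 0 };
    entry.sum += value;
    entry.count += 1;
    sums.set(key, entry);
  }
  const rows = new Map();
  for (const { zone, year, sum, count } of sums.values()) {
    const row = rows.get(zone) || { '0-ADM1_CODE': zone };
    row['2-X_' + year] = sum / count;
    rows.set(zone, row);
  }
  return Array.from(rows.values());
}

const samples = [
  { zone: 101, year: 2001, value: 0.4 },
  { zone: 101, year: 2001, value: 0.6 },
  { zone: 101, year: 2002, value: 0.5 },
  { zone: 202, year: 2001, value: 0.2 },
];
const table = zonalMeans(samples);
console.log(JSON.stringify(table));
```

Swapping the mean for a sum or a mode would correspond to the "statistics type" input option, at the cost of tracking different accumulators per zone and year.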



    PREPROCESSED DATA DOWNLOAD

    The datasets available for download contain zonal statistics at 2 administrative levels (FAO GAUL levels 1 and 2). Select countries from Southeast Asia and Sub-Saharan Africa (Cambodia, Indonesia, Lao PDR, Myanmar, Philippines, Thailand, Vietnam, Burundi, Kenya, Malawi, Mozambique, Rwanda, Tanzania, Uganda, Zambia, Zimbabwe) are included in the current version, with plans to extend the dataset to contain global metrics. Each zip file is described below and two example NDVI tables are available for preview.

    Key: [source, data, units, temporal range, aggregation, masking, zonal statistic, notes]

    Currently available:
    MSZSI-V2_V-NDVI-MEAN.tar: [NASA-MODIS, NDVI, index, 2001–2020, annual mean, agriculture, mean, n/a]
    MSZSI-V2_T-LST-DAY-MEAN.tar: [NASA-MODIS, LST Day, °C, 2001–2020, annual mean, agriculture, mean, n/a]
    MSZSI-V2_T-LST-NIGHT-MEAN.tar: [NASA-MODIS, LST Night, °C, 2001–2020, annual mean, agriculture, mean, n/a]
    MSZSI-V2_R-PRECIP-SUM.tar: [UCSB-CHG-CHIRPS, Precipitation, mm, 2001–2020, annual sum, agriculture, mean, n/a]
    MSZSI-V2_S-BDENS-MEAN.tar: [OpenLandMap, Bulk density, g/cm3, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
    MSZSI-V2_S-ORGC-MEAN.tar: [OpenLandMap, Organic carbon, g/kg, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
    MSZSI-V2_S-PH-MEAN.tar: [OpenLandMap, pH in H2O, pH, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
    MSZSI-V2_S-WATER-MEAN.tar: [OpenLandMap, Soil water, % at 33kPa, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
    MSZSI-V2_S-SAND-MEAN.tar: [OpenLandMap, Sand, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
    MSZSI-V2_S-SILT-MEAN.tar: [OpenLandMap, Silt, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
    MSZSI-V2_S-CLAY-MEAN.tar: [OpenLandMap, Clay, %, static, n/a, agriculture, mean, at depths 0-10-30-60-100-200]
    MSZSI-V2_E-ELEV-MEAN.tar: [MERIT, [elevation, slope, flowacc, HAND], [m, degrees, km2, m], static, n/a, agriculture, mean, n/a]

    Coming soon
    MSZSI-V2_C-STAX-MEAN.tar: [OpenLandMap, Soil taxonomy, category, static, n/a, agriculture, area sum, n/a]
    MSZSI-V2_C-LULC-MEAN.tar: [CGLS-LC100-V3, LULC, category, 2015–2019, mode, none, area sum, n/a]




    Data sources:

    https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD13Q1
    https://developers.google.com/earth-engine/datasets/catalog/MODIS_006_MOD11A2
    https://developers.google.com/earth-engine/datasets/catalog/UCSB-CHG_CHIRPS_PENTAD
    https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_BULKDENS-FINEEARTH_USDA-4A1H_M_v02
    https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_ORGANIC-CARBON_USDA-6A1C_M_v02
    https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_PH-H2O_USDA-4C1A2A_M_v02
    https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_WATERCONTENT-33KPA_USDA-4B1C_M_v01
    https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_CLAY-WFRACTION_USDA-3A1A1A_M_v02
    https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_SAND-WFRACTION_USDA-3A1A1A_M_v02
    https://developers.google.com/earth-engine/datasets/catalog/OpenLandMap_SOL_SOL_GRTGROUP_USDA-SOILTAX_C_v01
    https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_Landcover_100m_Proba-V-C3_Global
    https://developers.google.com/earth-engine/datasets/catalog/MERIT_Hydro_v1_0_1
    https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level0
    https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level1
    https://developers.google.com/earth-engine/datasets/catalog/FAO_GAUL_2015_level2

    Project information:
    SEAGUL: Southeast Asia Globalization, Urbanization, Land and Environment Changes
    http://seagul.info/; https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental
    This project was made possible by the NASA Land-Cover/Land-Use Change Program (Grant #: 80NSSC20K0740)

    For an additional interactive visualization, visit: https://cartoscience.users.earthengine.app/view/maup-mapper-multi-scale-modis-ndvi




    Google Earth Engine code
     /*///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
     MSZSI: Multi-Scale Zonal Statistics Inventory

     Authors:
     Brad G. Peter, Department of Geography, University of Alabama
     Joseph Messina, Department of Geography, University of Alabama
     Austin Raney, Department of Geography, University of Alabama
     Rodrigo E. Principe, AgriCircle AG
     Peilei Fan, Department of Geography, Environment, and Spatial Sciences, Michigan State University

     Citation:
     Peter, Brad; Messina, Joseph; Raney, Austin; Principe, Rodrigo; Fan, Peilei, 2021, 'MSZSI: Multi-Scale Zonal Statistics Inventory', https://doi.org/10.7910/DVN/YCUBXS, Harvard Dataverse, V#

     SEAGUL: Southeast Asia Globalization, Urbanization, Land and Environment Changes
     http://seagul.info/
     https://lcluc.umd.edu/projects/divergent-local-responses-globalization-urbanization-land-transition-and-environmental
     This project was made possible by the NASA Land-Cover/Land-Use Change Program (Grant #: 80NSSC20K0740)

  • Image identifiers used in this study (Google Earth Engine)

    • figshare.com
    csv
    Updated Aug 21, 2025
    Cite
    MPANDA MUKENZA Médard (2025). Image identifiers used in this study (Google Earth Engine) [Dataset]. http://doi.org/10.6084/m9.figshare.29955524.v1
    Available download formats: csv
    Dataset updated
    Aug 21, 2025
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    MPANDA MUKENZA Médard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This file contains the identifiers of the 1,250 Sentinel-2 images used in the analysis presented in [Mapping forest types of semi-arid regions of DR Congo, the need to account for leaf phenology]

  • PNW mountains NAIP 1m resolution

    • kaggle.com
    zip
    Updated Jan 31, 2022
    Cite
    Noahbadoa (2022). PNW mountains NAIP 1m resolution [Dataset]. https://www.kaggle.com/datasets/noahbadoa/pnw-mountains-naip-1m-resolution
    Available download formats: zip (79307378676 bytes)
    Dataset updated
    Jan 31, 2022
    Authors
    Noahbadoa
    Area covered
    Pacific Northwest
    Description

    Content

    This dataset contains 30k RGB images, each 1024 by 1024 pixels. They are 1 m resolution NAIP images from two patches of Pacific Northwest mountain ranges.

    Acknowledgements

    Data were collected by the USDA and obtained through Google Earth Engine.

  • Data from: Evaluation of machine learning methods and multi-source remote...

    • tandf.figshare.com
    doc
    Updated Feb 20, 2024
    Cite
    Xingguang Yan; Jing Li; Andrew R. Smith; Di Yang; Tianyue Ma; YiTing Su; Jiahao Shao (2024). Evaluation of machine learning methods and multi-source remote sensing data combinations to construct forest above-ground biomass models [Dataset]. http://doi.org/10.6084/m9.figshare.24481669.v1
    Available download formats: doc
    Dataset updated
    Feb 20, 2024
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Xingguang Yan; Jing Li; Andrew R. Smith; Di Yang; Tianyue Ma; YiTing Su; Jiahao Shao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Rapid and accurate estimation of forest biomass is essential to drive sustainable management of forests. Field-based measurements of forest above-ground biomass (AGB) can be costly and difficult to conduct. Multi-source remote sensing data offer the potential to improve the accuracy of modelled AGB predictions. Here, four machine learning methods, Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Classification and Regression Trees (CART), and Minimum Distance (MD), were used to construct forest AGB models of Taiyue Mountain forest, Shanxi Province, China, using single and multi-sourced remote sensing data and the Google Earth Engine platform. Results showed that the combination that most accurately predicted AGB was GBDT with the spectral index for coniferous (R² = 0.99; RMSE = 65.52 Mg/ha), broadleaved (R² = 0.97; RMSE = 29.14 Mg/ha), and mixed-species (R² = 0.97; RMSE = 81.12 Mg/ha) forest types. Models constructed using bivariate variable combinations that included the spectral index improved the AGB estimation accuracy of mixed-species (R² = 0.99; RMSE = 59.52 Mg/ha) forest types and slightly reduced the accuracy of coniferous (R² = 0.99; RMSE = 101.46 Mg/ha) and broadleaved (R² = 0.97; RMSE = 37.59 Mg/ha) forest AGB estimation. Overall, parameterizing machine learning algorithms with multi-source remote sensing variables can improve the prediction accuracy of mixed-species forests.
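
    The R² and RMSE figures quoted in this description are standard regression metrics. The following is a minimal sketch of how such values are typically computed. It assumes R² means the coefficient of determination (the abstract does not say which definition the authors used), and the AGB values are made up for illustration, not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observations (e.g. Mg/ha)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical plot-level AGB values in Mg/ha (illustrative only).
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(f"RMSE = {rmse(observed, predicted):.2f} Mg/ha")
print(f"R2   = {r_squared(observed, predicted):.3f}")
```

    Because RMSE carries the units of the target variable, a model can have a high R² and still a large RMSE when the biomass range is wide, which is one way to read the coniferous results above.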

  • USDA NASS Cropland Data Layers

    • developers.google.com
    Updated Jan 1, 2024
    Cite
    USDA National Agricultural Statistics Service (2024). USDA NASS Cropland Data Layers [Dataset]. https://developers.google.com/earth-engine/datasets/catalog/USDA_NASS_CDL
    Dataset updated
    Jan 1, 2024
    Dataset provided by
    National Agricultural Statistics Service (http://www.nass.usda.gov/)
    United States Department of Agriculture (http://usda.gov/)
    Time period covered
    Jan 1, 1997 - Jan 1, 2024
    Area covered
    Description

    The Cropland Data Layer (CDL) is a crop-specific land cover data layer created annually for the continental United States using moderate resolution satellite imagery and extensive agricultural ground truth. The CDL is created by the USDA, National Agricultural Statistics Service (NASS), Research and Development Division, Geospatial Information Branch, Spatial Analysis Research Section. For a detailed FAQ, please visit CropScape and Cropland Data Layers - FAQs. To explore details about the classification accuracies and utility of the data, see state-level omission and commission errors by crop type and year.

    The asset date is aligned with the calendar year of harvest. For most crops the planted and harvest year are the same. Some exceptions: winter wheat is unique, as it is planted in the prior year. A hay crop like alfalfa could have been planted years prior. For winter wheat the data also have a class called "Double Crop Winter Wheat/Soybeans". Some mid-latitude areas of the US have conditions such that a second crop (usually soybeans) can be planted immediately after the harvest of winter wheat and itself still be harvested within the same year. So for mapping winter wheat areas use both classes (use both values 24 and 26). While the CDL date is aligned with year of harvest, the map itself is more representative of what was planted. In other words, a small percentage of fields on a given year will not be harvested.

    Some non-agricultural categories are duplicated due to two very different epochs in methodology. The non-ag codes 63-65 and 81-88 are holdovers from the older methodology and will only appear in CDLs from 2007 and earlier. The non-ag codes from 111-195 are from the current methodology which uses the USGS NLCD as non-ag training and will only appear in CDLs 2007 and newer. 2007 was a transition year so there may be both sets of categories in the 2007 national product, but they will not appear within the same state.

    Note: The 2024 CDL only has the data band. The cultivated and confidence bands are yet to be released by the provider.
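
    The class-value rules in this description can be sketched in a few lines. The example below is a plain-Python illustration on a small made-up grid, not Earth Engine code; the function names and the sample patch are hypothetical, while the numeric codes (24, 26, 63-65, 81-88, 111-195) come from the description above.

```python
# Class values named in the description: use both 24 and 26
# when mapping winter wheat areas.
WINTER_WHEAT_CODES = {24, 26}


def winter_wheat_mask(cdl_grid):
    """Return a 0/1 grid marking cells whose CDL value is 24 or 26."""
    return [[1 if value in WINTER_WHEAT_CODES else 0 for value in row]
            for row in cdl_grid]


def non_ag_epoch(code):
    """Classify a non-agricultural code by methodology epoch, per the text."""
    if 63 <= code <= 65 or 81 <= code <= 88:
        return "older methodology (CDLs from 2007 and earlier)"
    if 111 <= code <= 195:
        return "current methodology (CDLs from 2007 and newer)"
    return None  # not one of the non-ag ranges listed in the description


# Hypothetical 2 x 3 patch of CDL class values (illustrative only).
patch = [[1, 24, 5],
         [26, 24, 121]]

mask = winter_wheat_mask(patch)
wheat_cells = sum(sum(row) for row in mask)
print(mask, wheat_cells)
print(non_ag_epoch(121))
```

    Treating the two wheat classes as one set keeps the rule in a single place, so a cell labelled "Double Crop Winter Wheat/Soybeans" is not silently dropped from a winter wheat area estimate.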

  • Global map of forest types 2020

    • developers.google.com
    Updated Dec 31, 2020
    Cite
    Joint Research Centre, European Commission (2020). Global map of forest types 2020 [Dataset]. https://developers.google.com/earth-engine/datasets/catalog/JRC_GFC2020_subtypes_V0
    Dataset updated
    Dec 31, 2020
    Dataset provided by
    Joint Research Centre, European Commission
    Time period covered
    Jan 1, 2020 - Dec 31, 2020
    Area covered
    Earth
    Description

    The global map of forest types provides a spatially explicit representation of primary forest, naturally regenerating forest and planted forest (including plantation forest) for the year 2020 at 10m spatial resolution. The base layer for mapping these forest types is the extent of forest cover of version 1 of the …

  • OpenLandMap-soildb: soil type probability - suborder: Borolls

    • repository.soilwise-he.eu
    • data.europa.eu
    Updated Jul 9, 2025
    + more versions
    Cite
    (2025). OpenLandMap-soildb: soil type probability - suborder: Borolls [Dataset]. http://doi.org/10.5281/zenodo.15481509
    Dataset updated
    Jul 9, 2025
    Description

    Sub-dataset: soil type probability - suborder: Boralfs Description Global annual maps of soil properties for 2000—2022 produced within the scope of the Land & Carbon Lab, integrating Digital surface/terrain model, vegetation/tillage indices, climatic/bioclimatic variables, and based on tree-based spatiotemporal Machine Learning. While the primary focus is on improving monitoring in global soil properties, the dataset provides wall-to-wall coverage across all terrestrial ecosystems and is organized into 300+ global mosaics in COG (Cloud Optimized GeoTIFF) format. Data are presented at 5-year intervals, across 3 standard depth intervals (0–30 cm, 30–60 cm, 60–100 cm), and cover 79 USDA soil taxonomy suborders. Original layers use the WGS84 Coordinate System (EPSG:4326) at a pixel resolution of 0.00025 degrees, and 0.00075 degrees with uncertainty (STAC and GEE). Layers archived on Zenodo are at 0.00075 degrees with uncertainty but include only the initial and final periods (2000–2005 & 2020–2022), including: Soil Organic Carbon Content (g/kg) As a key indicator of soil fertility, structure, and microbial activity, it represents the concentration of organic carbon in the fine earth fraction of the soil. Standard method of measurement is dry combustion using elemental analyzers (e.g., ISO 10694). Soil Organic Carbon Density (kg/m³) Represents the mass of organic carbon per unit volume of soil. It is derived as: SOC content × bulk density × (1 − coarse fragment volume fraction). This value is critical for estimating total carbon stocks and monitoring soil carbon changes over time. Soil pH Indicates the acidity or alkalinity of soil, affecting nutrient availability and microbial processes. Reported as pH measured in water solution (pH in H₂O). Bulk Density (g/cm³) Refers to the mass of dry fine earth (<2 mm) per unit volume, excluding coarse fragments. It reflects soil compaction and porosity, influencing water retention and root penetration. 
Commonly determined using the core method or calculated from pedotransfer functions. Soil Texture Fraction Defines the relative proportions of mineral particles by size. Texture influences water movement, nutrient holding capacity, and plant growth. Clay content (%): Proportion of particles <0.002 mm in diameter. Sand content (%): Proportion of particles between 0.05–2.0 mm (some definitions use 0.063 mm as lower threshold). Silt content (%): Particles sized between 0.002–0.05 mm or up to 0.063 mm depending on classification system. Textural fractions follow USDA or FAO particle size classifications. Soil Type Probability Probabilistic classification of soils based on USDA Soil Taxonomy at the subgroup level. Each pixel is assigned a probability distribution across potential soil types, based on legacy point data and environmental covariates. 30m layers can be accessed through STAC and Google Earth Engine (GEE): OpenLandMap STAC https://stac.openlandmap.org Google Earth Engine https://code.earthengine.google.com/?asset=projects/global-pasture-watch/assets/gsm-30m The full modeling framework is publicly available at OpenLandMap GitHub - soildb Data Detail Time period: 2000-2022, in 5-year intervals (last period covers 2020–2022) for soil properties ; 2000-2022 static for soil type Type of data: Spatiotemporal soil data base, with depth ranges and weighted percentage data for soil assessments and static soil type classification. How the data was collected or derived: The data was derived using machine learning models. 
Statistical methods used: Tree-based spatiotemporal machine learning Depth reference: b30cm..60cm = below ground at 30-60cm interval Limitations or exclusions in the data: no Antarctica; masking out permanent ice and deserts Coordinate reference system: EPSG:4326 Bounding box (Xmin, Ymin, Xmax, Ymax): (-180, -56, 180, 76) Spatial resolution: 0.00075 degree (~120m) Image size: 360,000P, 132,000L File format: Cloud Optimized Geotiff (COG) format Dataset Contents This dataset includes: soil type probability - suborder: Boralfs Related Identifiers SOC density: below ground 0cm-30cm 2000-2005 , below ground 0cm-30cm 2020-2022 , below ground 30cm-60cm 2000-2005 , below ground 30cm-60cm 2020-2022 , below ground 60cm-100cm 2000-2005 , below ground 60cm-100cm 2020-2022 , SOC content: below ground 0cm-30cm 2000-2005 - part 1 , below ground 0cm-30cm 2000-2005 - part 2 , below ground 0cm-30cm 2020-2022 - part 1 , below ground 0cm-30cm 2020-2022 - part 2 , below ground 30cm-60cm 2000-2005 , below ground 30cm-60cm 2020-2022 , below ground 60cm-100cm 2000-2005 , below ground 60cm-100cm 2020-2022 , Bulk density: below ground 0cm-30cm 2000-2005 , below ground 0cm-30cm 2020-2022 , below ground 30cm-60cm 2000-2005 , below ground 30cm-60cm 2020-2022 , below ground 60cm-100cm 2000-2005 , below ground 60cm-100cm 2020-2022 , Soil ph of water: below ground 0cm-30cm 2000-2005 , below ground 0cm-30cm 2020-2022 , below ground 30cm-60cm 2000-2005 , below ground 30cm-60cm 2020-2022 , below ground 60cm-100cm 2000-2005 , below ground 60cm-100cm 2020-2022 , Soil textures fraction: clay below ground 0cm-30cm 2000-2005 , clay below ground 0cm-30cm 2020-2022 , clay below ground 30cm-60cm 2000-2005 , clay below ground 30cm-60cm 2020-2022 , clay below ground 60cm-100cm 2000-2005 , clay below ground 60cm-100cm 2020-2022 , sand below ground 0cm-30cm 2000-2005 , sand below ground 0cm-30cm 2020-2022 , sand below ground 30cm-60cm 2000-2005 , sand below ground 30cm-60cm 2020-2022 , sand below ground 
60cm-100cm 2000-2005 , sand below ground 60cm-100cm 2020-2022 , silt below ground 0cm-30cm 2000-2005 , silt below ground 0cm-30cm 2020-2022 , silt below ground 30cm-60cm 2000-2005 , silt below ground 30cm-60cm 2020-2022 , silt below ground 60cm-100cm 2000-2005 , silt below ground 60cm-100cm 2020-2022 , Soil type (Suborder): Uderts , Calcids , Xerands , Orthents , Cryands , Ustalfs , Cryalfs , Aquepts , Udalfs , Cryolls , Durids , Usterts , Boralfs , Orthids , Udands , Torrerts , Histels , Rendolls , Aqualfs , Udepts , Xeralfs , Gelepts , Xerults , Fibrists , Ustepts , Xererts , Ustults , Aquands , Perox , Xerolls , Tropepts , Turbels , Udults , Aquents , Aquerts , Ustox , Aquods , Aquolls , Xerepts , Udox , Cryods , Ustolls , Aquults , Psamments , Arents , Fluvents , Humults , Vitrands , Udolls , Borolls , Orthels , Hemists , Wassents , Albolls , Salids , Cryepts , Saprists , Folists , Gypsids , Ochrepts , Cambids , Argids , Orthods , Data Details Time period: 2000-2022 Type of data: soil type probability - suborder: Boralfs How the data was collected or derived: Machine learning models. Statistical Methods used: Random Forest. Limitations or exclusions in the data: The dataset does not include Antarctica. Coordinate reference system: EPSG:4326 Bounding box (Xmin, Ymin, Xmax, Ymax): (-180, -56, 180, 76) Spatial resolution: 120m Image size: 360,000P x 132,000L File format: Cloud Optimized Geotiff (COG) format. 
    Layer information:

    File Name | Unit | Scale | Data Type | No Data | Description
    oc_iso.10694.1995.mg.cm3 | kg/m³ | 10 | UInt16 | 32767 | Organic carbon density derived by multiplying fine earth bulk density and organic carbon content
    oc_iso.10694.1995.wpml | g/kg | 10 | UInt16 | 32767 | Organic carbon content based on dry combustion weight percent
    ph.h2o_iso.10390.2021.index | - | 10 | Byte | 255 | The pH, 1:1 soil-water suspension is the pH of a sample measured in distilled water at a 1:1 soil:solution ratio
    bd.core_iso.11272.2017.g.cm3 | g/cm³ | 100 | UInt16 | 32767 | Bulk density, <2mm fraction, dry is the weight per unit volume of the <2 mm fraction, with volume measured in laboratory
    sand.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total laboratory-estimated sand 0.063 to 2.0 mm particle diameter
    silt.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total laboratory-estimated silt 0.002 to 0.063 mm particle size
    clay.tot_iso.11277.2020.wpct | % | 1 | Byte | 255 | Total clay is the soil separate with <0.002 mm particle diameter
    soil.types_ensemble | % | 1 | Byte | 255 | Probability of soil type occurrence

    Support: If you discover a bug, artifact, or inconsistency, or if you have a question, please raise a GitHub issue here

    Naming convention: To ensure consistency and ease of use across and within the projects, we follow the standard Ai4SoilHealth and Open-Earth-Monitor file-naming convention. The convention works with 10 fields that describe important properties of the data. In this way users can search files, prepare data analysis etc, without needing to open files. For example, for oc_iso.10694.1995.wpml_m_30m_b30cm..60cm_20000101_20051231_g_epsg.4326_v20250204.tif, the fields are: generic variable
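
    Two parts of this description lend themselves to a short sketch: the 10-field file name and the SOC density formula (SOC content × bulk density × (1 − coarse fragment volume fraction)). The code below assumes the fields are separated by underscores, which matches the example file name but is not stated outright in the text. Only the meaning of the first field ("generic variable") is given in the listing, so the remaining fields are left unnamed. The function names and soil values are hypothetical.

```python
EXAMPLE_NAME = (
    "oc_iso.10694.1995.wpml_m_30m_b30cm..60cm_"
    "20000101_20051231_g_epsg.4326_v20250204.tif"
)


def split_layer_name(filename):
    """Split a layer file name into its underscore-separated fields."""
    stem = filename[:-4] if filename.endswith(".tif") else filename
    return stem.split("_")


def soc_density(soc_content_g_per_kg, bulk_density_g_per_cm3, coarse_fraction):
    """SOC density in kg/m3.

    g/kg times g/cm3 is numerically equal to kg/m3, because
    1 g/cm3 = 1000 kg/m3 and 1 g/kg = 0.001 kg/kg.
    """
    return soc_content_g_per_kg * bulk_density_g_per_cm3 * (1.0 - coarse_fraction)


fields = split_layer_name(EXAMPLE_NAME)
print(len(fields), fields)

# Hypothetical soil sample: 20 g/kg SOC, 1.3 g/cm3 bulk density,
# 10 % coarse fragments by volume.
print(round(soc_density(20.0, 1.3, 0.10), 2))
```

    The depth token in the example name (b30cm..60cm) is the same string the description gives under "Depth reference", which is a useful sanity check that the split lands on field boundaries.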

  • Data from: Annual (1986-2020) land-use/land cover maps of the Santa Cruz...

    • catalog.data.gov
    • data.usgs.gov
    Updated Nov 26, 2025
    + more versions
    Cite
    U.S. Geological Survey (2025). Annual (1986-2020) land-use/land cover maps of the Santa Cruz Watershed and Tucson metropolitan area, Arizona [Dataset]. https://catalog.data.gov/dataset/annual-1986-2020-land-use-land-cover-maps-of-the-santa-cruz-watershed-and-tucson-metropoli
    Dataset updated
    Nov 26, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Tucson Metropolitan Area, Arizona
    Description

    Annual (1986-2020) land-use/land cover maps at 30-meter resolution of the Tucson metropolitan area, Arizona and the greater Santa Cruz Watershed including Nogales, Sonora, Mexico. Maps were created using a combination of Landsat imagery, derived transformation and indices, texture analysis and other ancillary data fed to a Random Forest classifier in Google Earth Engine. The maps contain 13 classes based on the National Land Cover Classification scheme and modified to reflect local land cover types. Data are presented as a stacked, multi-band raster with one "band" for each year (Band 1 = 1986, Band 2 = 1987 and so on). Note that the year 2012 was left out of our time series because of lack of quality Landsat data. A color file (.clr) is included that can be imported to match the color of the National Land Cover Classification scheme. This data release also contains two JavaScript files with the Google Earth Engine code developed for pre-processing Landsat imagery and for image classification, and a zip folder "Accuracy Data" with five excel files: 1) Accuracy Statistics describing overall accuracy for each LULC year, 2) Confusion Matrices for each LULC year, 3) Land Cover Evolution - changes in pixel count for each class per year, 4) LULC Change Matrix - to and from class changes over the period, and 5) Variable Importance - results of the Random Forest Classification.
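
    The band layout described here (Band 1 = 1986, Band 2 = 1987, and so on, with 2012 left out) can be turned into a small lookup. The sketch below assumes the omitted year has no placeholder band, so later years shift down by one; the description does not state this explicitly, so check the band count of the actual raster before relying on it. The function names are hypothetical.

```python
FIRST_YEAR, LAST_YEAR = 1986, 2020
SKIPPED_YEARS = {2012}  # left out for lack of quality Landsat data

# Assumption: bands are packed consecutively with no empty band for 2012.
YEARS = [year for year in range(FIRST_YEAR, LAST_YEAR + 1)
         if year not in SKIPPED_YEARS]


def band_to_year(band):
    """Map a 1-based band number to its calendar year."""
    if not 1 <= band <= len(YEARS):
        raise ValueError(f"band must be between 1 and {len(YEARS)}")
    return YEARS[band - 1]


def year_to_band(year):
    """Map a calendar year to its 1-based band number."""
    if year not in YEARS:
        raise ValueError(f"no band for year {year}")
    return YEARS.index(year) + 1


print(len(YEARS), band_to_year(1), band_to_year(27), year_to_band(2020))
```

    Under this assumption the stack holds 34 bands, and every year after 2011 sits one band earlier than a naive "year minus 1985" calculation would suggest.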

  • Mumbai-Slum-Detection-Dataset

    • kaggle.com
    zip
    Updated Jul 22, 2025
    Cite
    Rupesh Kumar Yadav (2025). Mumbai-Slum-Detection-Dataset [Dataset]. https://www.kaggle.com/datasets/rupeshkumaryadav/mumbai-slum-detection-dataset/data
    Available download formats: zip (304746333 bytes)
    Dataset updated
    Jul 22, 2025
    Authors
    Rupesh Kumar Yadav
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Dharavi Slums, Mumbai
    Description

    Dataset Summary

    This dataset is developed for pixel-level classification of urban informal settlements using satellite imagery. The input data consists of Sentinel-2 imagery (2015–2016), and the ground truth is derived from a government-conducted survey available as a KML vector file, rasterized to align with the imagery. Formats include NumPy arrays and HDF5 files for easy ML integration. Intended for land‑use/land‑cover classification tasks.

    🛰️ Data Source

    Satellite Imagery: Sentinel‑2 L2A (Surface Reflectance) images from 2015–16, accessed via Google Earth Engine (GEE)

    Ground Truth: Official government KML vector file, manually rasterized to match imagery resolution and alignment

    📦 Data Format

    Ground Truth Source: Government survey KML converted to raster via QGIS

    Satellite Data: Sentinel‑2 L2A (Surface Reflectance) images from 2015–16

    CRS & Extent: EPSG:4326

    Bounding Box: Longitude: 72.7827462580 to 72.9718317340

    Latitude: 18.9086328640 to 19.2638524900

    Spatial Accuracy: ~±2 m (WGS84)

    Raster Size: 2105 × 3954 pixels (Float64 GeoTIFF)

    Formats: NumPy (.npy) and HDF5 (.h5) for image bands and per-pixel labels

    Pixel size: ~10m (based on Sentinel-2 native resolution)

    Label Values:

            1 → Informal/Slum
    
            0 → Formal/Non-slum
    

    Data Type: float64 (image), uint8 (labels)
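
    The bounding box, raster size, and "~10m" pixel size listed above can be cross-checked with a little arithmetic. The sketch below assumes 2105 is the raster width (columns) and 3954 the height (rows), and uses an approximate 111,320 m per degree of latitude; both are assumptions for illustration, not values from the dataset card.

```python
import math

# Bounding box and raster size as listed in the dataset description.
WEST, EAST = 72.7827462580, 72.9718317340
SOUTH, NORTH = 18.9086328640, 19.2638524900
WIDTH, HEIGHT = 2105, 3954  # assumed order: columns, rows

# Approximate length of one degree of latitude in meters (assumption).
METERS_PER_DEGREE = 111_320.0

deg_per_pixel_x = (EAST - WEST) / WIDTH
deg_per_pixel_y = (NORTH - SOUTH) / HEIGHT

mid_latitude = math.radians((SOUTH + NORTH) / 2)
meters_per_pixel_y = deg_per_pixel_y * METERS_PER_DEGREE
# East-west distances shrink with the cosine of latitude.
meters_per_pixel_x = deg_per_pixel_x * METERS_PER_DEGREE * math.cos(mid_latitude)

print(f"pixel size in degrees: {deg_per_pixel_x:.3e} x {deg_per_pixel_y:.3e}")
print(f"pixel size in meters:  {meters_per_pixel_x:.2f} x {meters_per_pixel_y:.2f}")
```

    With these inputs the pixel is nearly square in degrees, about 10 m north to south and a little under 10 m east to west at Mumbai's latitude, which is consistent with the stated Sentinel-2 native resolution.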

    📜 Coordinate System Details

    CRS Name: EPSG:4326 - WGS 84

    Datum: World Geodetic System 1984 (EPSG:6326)

    Units: Geographic (degrees)

    Accuracy: ≤ 2 meters (approximate)

    Type: Geographic 2D

    Celestial Body: Earth

    Reference: Dynamic (not plate-fixed)

    Additional Details

    1. Processing Pipeline
       KML to Raster: Ground truth polygons from KML rasterized using GDAL to match Sentinel-2 extent and resolution.
       Image Preprocessing: Cloud masking and band selection (R, G, B, NIR) through Google Earth Engine.
       Export Format: .tif downloaded, converted to .npy and .h5 using rasterio, numpy, and h5py.
       Alignment: Verified pixel-wise correspondence between image and label arrays.

    2. Authorship & Provenance
       Creators: M Rupesh Kumar Yadav, Mtech, Dept of Centre of Studies in Resources Engineering, IIT Bombay. You can reach the author by email at rupesh32003@gmail.com or 24m0319@iitb.ac.in, or check out GitHub for further resources/assistance.
       Links: ORCID iD, GitHub, LinkedIn

    3. Content & Structure
       Bands per sample: RGB (3 bands) + NIR (1 band)
       Ground truth: Per-pixel labels aligned with imagery
       Data splits: (e.g.) train/val/test percentages or file lists
       File naming conventions: Explain if files correspond to tiles, dates, etc.
       Example sample: Show dimensions, dtype, label values, and their mapping to classes.

    4. Collection & Processing
       Satellite imagery: Retrieved via Google Earth Engine over 2015–16; filtered by cloud cover threshold
       Ground truth conversion: KML survey data rasterized using same spatial resolution and CRS
       Alignment: Resampled and aligned bands using GEE reprojection
       Preprocessing steps: Cloud masking, atmospheric correction (L2A), normalization, dtype cast to Float64
       Label handling: Ensured spatial overlap and clipping; labeled invalid/missing areas as class 0 or mask

    5. Usage & Intended Applications
       Tasks: Semantic segmentation or pixel-level land-cover mapping
       Ideal for: Land use change detection, agricultural mapping, validation of remote sensing models
       Not suitable for: Tasks needing multispectral beyond NIR, very high-res (<10 m) labeling, temporal sequence modeling

    6. Limitations & Bias
       Temporal span: Only covers 2015–2016; may not reflect current conditions
       Spatial scope bias: Limited geographic area (Mumbai region)
       Labeling bias: Dependent on government survey accuracy and rasterization fidelity
       Cloud coverage: Some tiles may still contain residual cloud pixels

  • Data from: European LCZ map

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Matthias Demuzere; Benjamin Bechtel; Ariane Middel; Gerald Mills (2023). European LCZ map [Dataset]. http://doi.org/10.6084/m9.figshare.13322450.v4
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Matthias Demuzere; Benjamin Bechtel; Ariane Middel; Gerald Mills
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A European Local Climate Zone map at a 100 m spatial resolution, derived from multiple earth observation datasets and expert LCZ class labels. There are 10 urban LCZ types, each associated with a set of relevant variables such that the map represents a valuable database of urban properties.

  • Data from: A dataset of spatiotemporally sampled MODIS Leaf Area Index with...

    • agdatacommons.nal.usda.gov
    application/csv
    Updated Nov 22, 2025
    Cite
    Yanghui Kang; Mutlu Ozdogan; Feng Gao; Martha C. Anderson; William A. White; Yun Yang; Yang Yang; Tyler A. Erickson (2025). A dataset of spatiotemporally sampled MODIS Leaf Area Index with corresponding Landsat surface reflectance over the contiguous US [Dataset]. http://doi.org/10.15482/USDA.ADC/1521097
    Available download formats: application/csv
    Dataset updated
    Nov 22, 2025
    Dataset provided by
    Ag Data Commons
    Authors
    Yanghui Kang; Mutlu Ozdogan; Feng Gao; Martha C. Anderson; William A. White; Yun Yang; Yang Yang; Tyler A. Erickson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    Leaf Area Index (LAI) is a fundamental vegetation structural variable that drives energy and mass exchanges between the plant and the atmosphere. Moderate-resolution (300m – 7km) global LAI data products have been widely applied to track global vegetation changes, drive Earth system models, monitor crop growth and productivity, etc. Yet, cutting-edge applications in climate adaptation, hydrology, and sustainable agriculture require LAI information at higher spatial resolution (< 100m) to model and understand heterogeneous landscapes. This dataset was built to assist a machine-learning-based approach for mapping LAI from 30m-resolution Landsat images across the contiguous US (CONUS). The data was derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) Version 6 LAI/FPAR, Landsat Collection 1 surface reflectance, and NLCD Land Cover datasets over 2006 – 2018 using Google Earth Engine. Each record/sample/row includes a MODIS LAI value, corresponding Landsat surface reflectance in green, red, NIR, SWIR1 bands, a land cover (biome) type, geographic location, and other auxiliary information. Each sample represents a MODIS LAI pixel (500m) within which a single biome type dominates 90% of the area. The spatial homogeneity of the samples was further controlled by a screening process based on the coefficient of variation of the Landsat surface reflectance. In total, there are approximately 1.6 million samples, stratified by biome, Landsat sensor, and saturation status from the MODIS LAI algorithm. This dataset can be used to train machine learning models and generate LAI maps for Landsat 5, 7, 8 surface reflectance images within CONUS. Detailed information on the sample generation and quality control can be found in the related journal article.

    Resources in this dataset:

    Resource Title: README.
    File Name: LAI_train_samples_CONUS_README.txt
    Resource Description: Description and metadata of the main dataset
    Resource Software Recommended: Notepad, url: https://www.microsoft.com/en-us/p/windows-notepad/9msmlrh6lzf3?activetab=pivot:overviewtab

    Resource Title: LAI_training_samples_CONUS.
    File Name: LAI_train_samples_CONUS_v0.1.1.csv
    Resource Description: This CSV file consists of the training samples for estimating Leaf Area Index based on Landsat surface reflectance images (Collection 1 Tier 1). Each sample has a MODIS LAI value and corresponding surface reflectance derived from Landsat pixels within the MODIS pixel.
    Contact: Yanghui Kang (kangyanghui@gmail.com)

    Column description

    UID: Unique identifier. Format: LATITUDE_LONGITUDE_SENSOR_PATHROW_DATE
    Landsat_ID: Landsat image ID
    Date: Landsat image date in "YYYYMMDD"
    Latitude: Latitude (WGS84) of the MODIS LAI pixel center
    Longitude: Longitude (WGS84) of the MODIS LAI pixel center
    MODIS_LAI: MODIS LAI value in "m2/m2"
    MODIS_LAI_std: MODIS LAI standard deviation in "m2/m2"
    MODIS_LAI_sat: 0 - MODIS Main (RT) method used no saturation; 1 - MODIS Main (RT) method with saturation
    NLCD_class: Majority class code from the National Land Cover Dataset (NLCD)
    NLCD_frequency: Percentage of the area covered by the majority class from NLCD
    Biome: Biome type code mapped from NLCD (see below for more information)
    Blue: Landsat surface reflectance in the blue band
    Green: Landsat surface reflectance in the green band
    Red: Landsat surface reflectance in the red band
    Nir: Landsat surface reflectance in the near infrared band
    Swir1: Landsat surface reflectance in the shortwave infrared 1 band
    Swir2: Landsat surface reflectance in the shortwave infrared 2 band
    Sun_zenith: Solar zenith angle from the Landsat image metadata. This is a scene-level value.
    Sun_azimuth: Solar azimuth angle from the Landsat image metadata. This is a scene-level value.
    NDVI: Normalized Difference Vegetation Index computed from Landsat surface reflectance
    EVI: Enhanced Vegetation Index computed from Landsat surface reflectance
    NDWI: Normalized Difference Water Index computed from Landsat surface reflectance
    GCI: Green Chlorophyll Index = Nir/Green - 1
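
    Several of the columns above are indices derived from the band values. The sketch below shows the GCI formula given in the column description together with the widely used NDVI and EVI formulas. It assumes reflectance is expressed as a fraction between 0 and 1 (the listing does not say how the CSV values are scaled), and the sample values are hypothetical. NDWI is omitted because more than one definition is in common use and the listing does not say which one applies.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def gci(nir, green):
    """Green Chlorophyll Index, as given in the column description."""
    return nir / green - 1.0


# Hypothetical surface reflectance values for one vegetated sample (0-1 scale).
sample = {"Blue": 0.05, "Green": 0.10, "Red": 0.08, "Nir": 0.40}

print(round(ndvi(sample["Nir"], sample["Red"]), 4))
print(round(evi(sample["Nir"], sample["Red"], sample["Blue"]), 4))
print(round(gci(sample["Nir"], sample["Green"]), 4))
```

    Recomputing an index from the band columns and comparing it with the stored column is a quick way to confirm the reflectance scaling before training a model on the file.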

    Biome code

    1 - Deciduous Forest
    2 - Evergreen Forest
    3 - Mixed Forest
    4 - Shrubland
    5 - Grassland/Pasture
    6 - Cropland
    7 - Woody Wetland
    8 - Herbaceous Wetland

    Reference Dataset: All data was accessed through Google Earth Engine Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment. MODIS Version 6 Leaf Area Index/FPAR 4-day L5 Global 500m Myneni, R., Y. Knyazikhin, T. Park. MOD15A2H MODIS/Terra Leaf Area Index/FPAR 8-Day L4 Global 500m SIN Grid V006. 2015, distributed by NASA EOSDIS Land Processes DAAC, https://doi.org/10.5067/MODIS/MOD15A2H.006 Landsat 5/7/8 Collection 1 Surface Reflectance Landsat Level-2 Surface Reflectance Science Product courtesy of the U.S. Geological Survey. Masek, J.G., Vermote, E.F., Saleous N.E., Wolfe, R., Hall, F.G., Huemmrich, K.F., Gao, F., Kutler, J., and Lim, T-K. (2006). A Landsat surface reflectance dataset for North America, 1990–2000. IEEE Geoscience and Remote Sensing Letters 3(1):68-72. http://dx.doi.org/10.1109/LGRS.2005.857030. Vermote, E., Justice, C., Claverie, M., & Franch, B. (2016). Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sensing of Environment. http://dx.doi.org/10.1016/j.rse.2016.04.008. National Land Cover Dataset (NLCD) Yang, Limin, Jin, Suming, Danielson, Patrick, Homer, Collin G., Gass, L., Bender, S.M., Case, Adam, Costello, C., Dewitz, Jon A., Fry, Joyce A., Funk, M., Granneman, Brian J., Liknes, G.C., Rigge, Matthew B., Xian, George, A new generation of the United States National Land Cover Database—Requirements, research priorities, design, and implementation strategies: ISPRS Journal of Photogrammetry and Remote Sensing, v. 146, p. 108–123, at https://doi.org/10.1016/j.isprsjprs.2018.09.006 Resource Software Recommended: Microsoft Excel,url: https://www.microsoft.com/en-us/microsoft-365/excel

  • Data from: Distribution, frequency, and global extent of hypoxia in rivers

    • catalog.data.gov
    • data.usgs.gov
    Updated Oct 30, 2025
    Cite
    U.S. Geological Survey (2025). Distribution, frequency, and global extent of hypoxia in rivers [Dataset]. https://catalog.data.gov/dataset/distribution-frequency-and-global-extent-of-hypoxia-in-rivers
    Dataset updated
    Oct 30, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Description

    To assess the distribution, frequency, and global extent of riverine hypoxia, we compiled 118 million paired dissolved oxygen (DO) and water temperature measurements from 125,158 unique locations in rivers in 93 countries and territories across the globe. The dataset also includes site characteristics derived from StreamCat, the National Hydrography and HydroAtlas datasets, and proximal land cover derived from MODIS-based IGBP land cover types compiled using Google Earth Engine (GEE).

  • Annual global forest gain maps from 1984 to 2020

    • figshare.com
    tiff
    Updated Mar 8, 2022
    Cite
    Zhenrong Du; Le Yu; Jianyu Yang; David Coomes; Haohuan Fu; Peng Gong (2022). Annual global forest gain maps from 1984 to 2020 [Dataset]. http://doi.org/10.6084/m9.figshare.18461609.v1
    Dataset updated
    Mar 8, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Zhenrong Du; Le Yu; Jianyu Yang; David Coomes; Haohuan Fu; Peng Gong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Forest cover is rapidly changing at the global scale as a result of land-use change (principally deforestation in many tropical regions and afforestation in many temperate regions) and climate change. However, a detailed map of global forest gain is still lacking at fine spatial and temporal resolutions. In this study, we developed a new automatic framework to map annual forest gain across the globe, based on Landsat time series, the LandTrendr algorithm, and the Google Earth Engine (GEE) platform. First, samples of stable forest collected from the Global Forest Change (GFC) product were used to determine annual Normalized Burn Ratio (NBR) thresholds for forest gain detection. Second, using the NBR time series from 1982 to 2020 and the LandTrendr algorithm, we produced a dataset of global forest gain years from 1984 to 2020 based on a set of decision rules. Our results reveal that large areas of forest gain occurred in China, Russia, Brazil, and North America, and that the vast majority of global forest gain has occurred since 2000. The new dataset was consistent in both spatial extent and years of forest gain with data from field inventories and alternative remote sensing products. Our dataset is valuable for policy-relevant research on the net impact of forest cover change on the global carbon cycle and provides an efficient and transferable approach for monitoring other types of land cover dynamics.
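
    For readers unfamiliar with the index named in this description, the sketch below shows the standard NBR formula, (NIR - SWIR2) / (NIR + SWIR2), together with a simplified threshold check on an annual series. It is an illustration only: the function names, the sample values, and the single-threshold crossing rule are assumptions made here, and they do not reproduce the authors' LandTrendr segmentation or their decision rules.

```python
def nbr(nir, swir2):
    """Normalized Burn Ratio from NIR and SWIR2 surface reflectance."""
    denom = nir + swir2
    if denom == 0:
        return 0.0  # avoid division by zero on empty or masked pixels
    return (nir - swir2) / denom


def first_gain_year(nbr_by_year, threshold):
    """Return the first year in which NBR rises from below the threshold
    to at or above it, or None if no such crossing exists.

    Hypothetical simplification: one fixed threshold for all years.
    """
    years = sorted(nbr_by_year)
    for prev, cur in zip(years, years[1:]):
        if nbr_by_year[prev] < threshold <= nbr_by_year[cur]:
            return cur
    return None


# Illustrative reflectance pairs (NIR, SWIR2) for a single pixel.
reflectance = {1984: (0.20, 0.18), 1985: (0.25, 0.15), 1986: (0.40, 0.10)}
series = {year: nbr(nir, swir2) for year, (nir, swir2) in reflectance.items()}
gain_year = first_gain_year(series, threshold=0.4)
```

    A pixel whose NBR never crosses the threshold from below returns None, so stable forest and stable non-forest are both excluded by this toy rule.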

