Accurate land use land cover (LULC) maps that delineate built infrastructure are useful for numerous applications, from urban planning, humanitarian response, and disaster management to informing decisions that reduce human exposure to natural hazards such as wildfire. Existing products lack sufficient spatial, temporal, and thematic resolution, omitting critical information needed to capture LULC trends accurately over time. Advancements in remote sensing imagery, open-source software, and cloud computing offer opportunities to address these challenges. Using Google Earth Engine, we developed a novel built infrastructure detection method for semi-arid systems by applying a random forest classifier to a fusion of Sentinel-1 and Sentinel-2 time series. Our classifier performed well, differentiating three built environment types (residential, infrastructure, and paved) with overall accuracies ranging from 90 to 96%. Producer accuracies were highest for the infrastructure class (98–99%)...
# Mapped built infrastructure (MBI)
These data are annual maps of built infrastructure, with six classes, spanning the Snake River Plain ecoregion in southern Idaho. These products are ready to use and can be imported into any geospatial software for analysis. The data were generated from a fusion of Sentinel-1 radar and Sentinel-2 multispectral imagery. The final MBI products are annual rasters, that is, gridded categorical data with six classes: 1. Residential, 2. Infrastructure, 3. Paved, 4. Agriculture, 5. Vegetation, and 6. Range/Scrub.
If a user wants to generate these products themselves, or reproduce them for a similar area, then Google Earth Engine and QGIS are required. The user must have an account with Google Earth Engine (GEE), load the MBI scripts into their repository, and run the code. For applying this model outside of the Snake River Plain Level III ecoregion, new training data must be...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the companion dataset to publication {TBD}. It contains 1) seasonal composites of predicted maize cover and yield at 10 m resolution in Rwanda for two annual agricultural seasons over five years, 2) scripts for the end-to-end machine learning pipeline that produces these data products, and 3) data or references needed as inputs to the pipeline.
The data are provided here as netCDF4 files with four dimensions: x, y, band, and season. They can also be accessed as Google Earth Engine ImageCollections at:
The land cover classification file is found at data/composites/lulc_classifier_Rwanda_2019to2023.nc.
The land cover classification images contain 3 bands/variables: maizeProb, the raw predicted probability of the pixel being maize given by the gradient boosted tree model; majorityClass, the categorical land cover class with the highest predicted probability among any of the nine classes in the respective pixel; and optimalClass, the categorical land cover class adjusted to agree with national statistics for expected maize area.
The land cover classes map to the raster values as follows:
{
1: 'maize',
2: 'nonmaize_annual',
3: 'nonmaize_perennial',
4: 'scrub_shrub_land',
5: 'forest',
6: 'flooded_vegetation',
7: 'water',
8: 'structure',
9: 'bare'
}
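The mapping above can be used to label raster values when reading the classification bands. The following is a minimal sketch, not the authors' code: the helper names `class_name` and `majority_class` are hypothetical, and the nine-element probability list is an assumption for illustration only (the published file stores only maizeProb, not the full per-class probabilities).

```python
# Raster value -> class name, copied from the mapping above.
LAND_COVER_CLASSES = {
    1: "maize",
    2: "nonmaize_annual",
    3: "nonmaize_perennial",
    4: "scrub_shrub_land",
    5: "forest",
    6: "flooded_vegetation",
    7: "water",
    8: "structure",
    9: "bare",
}


def class_name(value):
    """Return the class name for a raster value, or None if the value is unmapped."""
    return LAND_COVER_CLASSES.get(value)


def majority_class(probabilities):
    """Return the raster value of the class with the highest probability.

    `probabilities` is a hypothetical list of nine per-class probabilities
    ordered by raster value 1..9 (an assumption made for this sketch).
    """
    if len(probabilities) != len(LAND_COVER_CLASSES):
        raise ValueError("expected one probability per land cover class")
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best_index + 1  # raster values start at 1, list indices at 0
```

For example, `class_name(majority_class(p))` gives the label that a majorityClass pixel would carry for a probability list `p`.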
The dataset includes 5 years (2019-2023) and 10 seasons - the available time period at time of publication. In Rwanda, maize is typically planted and harvested during two distinct agricultural seasons per year: Season A from September to February and Season B from March to June. Therefore the seasons in the data are: 2019_Season_A, 2019_Season_B, 2020_Season_A, 2020_Season_B, 2021_Season_A, 2021_Season_B, 2022_Season_A, 2022_Season_B, 2023_Season_A, 2023_Season_B.
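The season labels listed above follow a regular pattern, so they can be generated rather than typed by hand when indexing the season dimension. This is a small sketch under that assumption; the function names `season_labels` and `parse_season` are hypothetical and are not part of the provided scripts.

```python
def season_labels(first_year=2019, last_year=2023):
    """Build labels such as '2019_Season_A' for every year and both seasons."""
    labels = []
    for year in range(first_year, last_year + 1):
        for season in ("A", "B"):
            labels.append(f"{year}_Season_{season}")
    return labels


def parse_season(label):
    """Split a label like '2021_Season_B' into (2021, 'B')."""
    year_text, keyword, season = label.split("_")
    if keyword != "Season" or season not in ("A", "B"):
        raise ValueError(f"unrecognized season label: {label}")
    return int(year_text), season
```

The sketch deliberately does not assign calendar dates to each label, because the text does not state which calendar year the September start of a given Season A falls in.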
The maize yield file is found at data/composites/maize_yield_Rwanda_2019to2023.nc.
Each image in the yield composites also has 3 bands/variables: maizeYield, the model's continuous predicted yield (kg/ha) in each pixel regardless of land class; maizeYield_majorityClass, the predicted maize yield masked to the majority class land classification; and maizeYieldAdj_optimalClass, where the raw predicted yields were masked to the optimal maize classification land cover layer and normalized to national statistics.
The dataset includes the same seasons as the classification product; see above for a description.
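The masking step behind maizeYield_majorityClass can be illustrated on plain lists. This is a hedged sketch rather than the pipeline's implementation: the function names and the use of `None` for masked pixels are assumptions, and the normalization to national statistics applied to maizeYieldAdj_optimalClass is not reproduced because its method is not described here.

```python
MAIZE = 1  # raster value of the 'maize' class in the classification product


def mask_to_maize(yield_values, class_values, nodata=None):
    """Keep predicted yield only where the land cover class is maize."""
    if len(yield_values) != len(class_values):
        raise ValueError("yield and class sequences must have the same length")
    return [
        y if c == MAIZE else nodata
        for y, c in zip(yield_values, class_values)
    ]


def mean_valid(values, nodata=None):
    """Mean of the unmasked values, or None if every pixel is masked."""
    valid = [v for v in values if v is not nodata]
    if not valid:
        return None
    return sum(valid) / len(valid)
```

Given yields `[1000.0, 2000.0, 1500.0]` and classes `[1, 5, 1]`, the forest pixel (class 5) is masked and only the two maize pixels contribute to the mean.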
Unless otherwise stated, all Earth observation imagery, analysis, and outputs were hosted in the Google Earth Engine (GEE) environment and developed with the Earth Engine Python API in Python v3.10. To set up a local conda environment, use the scripts/environment.yml file. The user must have Google Cloud Storage (GCS) and Google Earth Engine (GEE) accounts. The pipeline, at this scale, will incur some processing and storage fees, although Google offers a free trial to all new users and the total cost of the high-resolution wall-to-wall predictions is nominal (~$20 for one season).
The scripts needed to run the pipeline are located in the scripts folder.
The files in the scripts/helpers directory are called by subsequent scripts and do not need to be run interactively by the user.
Run the scripts in the order described below. After running each script, pause and confirm that all outputs were created and loaded to GCS before continuing the pipeline; for some steps this may take hours to days depending on processing speed.
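The "confirm before continuing" check can be reduced to comparing the object names a step should produce against what is actually in the bucket. The sketch below assumes you have already obtained a listing of object names (for example with `gsutil ls` or the google-cloud-storage client); the function names and the example object names are hypothetical and do not come from the pipeline.

```python
def missing_outputs(expected, existing):
    """Return the expected object names that are not yet present, in order."""
    existing_set = set(existing)
    return [name for name in expected if name not in existing_set]


def ready_for_next_step(expected, existing):
    """True only when every expected output of the current step exists."""
    return not missing_outputs(expected, existing)


# Hypothetical object names, for illustration only.
expected = ["step1/part_a.csv", "step1/part_b.csv"]
existing = ["step1/part_a.csv"]
print(missing_outputs(expected, existing))  # names still to wait for
```

Running the next script only when `ready_for_next_step` returns True avoids starting a long step on incomplete inputs.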
Users should specify the names of the bucket and asset project that were chosen during setup of their GCS and GEE environments in the Objects section of scripts/helpers/maize_pipeline_0_workspace.py.
In scripts/pipeline_setup, you will find the following scripts, which prepare the input data for model building and prediction.
maize_pipeline_1_clean_training_data.py - Cleans and merges all available crop label and yield data for model training and validation
maize_pipeline_2_dwnld_data_training.py - Downloads satellite-derived and auxiliary features at training data points for model building
maize_pipeline_3_dwnld_data_inference.py - Downloads satellite-derived and auxiliary features at every 10 m pixel in Rwanda on a district-wise basis for prediction
In scripts/maize_classification, you will find the following scripts to perform model building, prediction, and post-processing for the classification of land cover type and maize cover.
maize_classifier_1_feature_selection.py - Selects a feature subset for land cover classification with mutual information score or variable importance
maize_classifier_2_build_model.py - Builds gradient boosted tree model for land cover classification from training data
maize_classifier_3_prediction.py - Applies model for land cover classification to every 10 m pixel in Rwanda by season and district
maize_classifier_4_postprocess.py - Mosaics district-wise predictions and normalizes maize cover predictions to national agricultural statistics
In scripts/maize_yield, you will find the following scripts to perform model building, prediction, and post-processing for maize yield estimation.
maize_yield_1_build_model.py - Builds gradient boosted tree model and performs bias correction for maize yield estimation from training data
maize_yield_2_prediction.py - Applies model for maize yield estimation to every 10 m pixel in Rwanda by season and district
maize_yield_3_postprocess.py - Mosaics district-wise predictions and normalizes maize yield predictions to national agricultural statistics
If you are running the entire pipeline with refreshed training data and model building, run each of these scripts in order. By default, the scripts run all A and B seasons from 2019A to the present. Otherwise, if you only wish to re-run or update seasonal predictions from the existing classification or yield model, run maize_pipeline_3_dwnld_data_inference.py to download the seasonal feature data across Rwanda, then maize_classifier_3_prediction.py and maize_classifier_4_postprocess.py for classification predictions, or maize_yield_2_prediction.py and maize_yield_3_postprocess.py for yield predictions, making sure to specify which season(s) are of interest in each script. To do this, you also need a copy of the previously built models in your GCS (provided at data/models).
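The two ways of running the pipeline described above (a full rebuild versus a prediction-only update) can be summarized as an ordered script list. This is a sketch for orientation only: the script names are taken from the lists above, while the function `scripts_to_run` and its `mode` and `product` arguments are hypothetical and not part of the repository.

```python
PIPELINE_SETUP = [
    "maize_pipeline_1_clean_training_data.py",
    "maize_pipeline_2_dwnld_data_training.py",
    "maize_pipeline_3_dwnld_data_inference.py",
]
CLASSIFICATION = [
    "maize_classifier_1_feature_selection.py",
    "maize_classifier_2_build_model.py",
    "maize_classifier_3_prediction.py",
    "maize_classifier_4_postprocess.py",
]
YIELD = [
    "maize_yield_1_build_model.py",
    "maize_yield_2_prediction.py",
    "maize_yield_3_postprocess.py",
]


def scripts_to_run(mode, product=None):
    """Return script names in run order.

    mode="full" rebuilds everything; mode="update" only refreshes seasonal
    predictions from existing models. `product` may be "classification",
    "yield", or None for both (only used when mode="update").
    """
    if mode == "full":
        return PIPELINE_SETUP + CLASSIFICATION + YIELD
    if mode == "update":
        if product not in (None, "classification", "yield"):
            raise ValueError(f"unknown product: {product}")
        steps = ["maize_pipeline_3_dwnld_data_inference.py"]
        if product in (None, "classification"):
            steps += CLASSIFICATION[2:]  # prediction and post-processing
        if product in (None, "yield"):
            steps += YIELD[1:]  # prediction and post-processing
        return steps
    raise ValueError(f"unknown mode: {mode}")
```

For instance, `scripts_to_run("update", "yield")` lists the inference download followed by the two yield scripts, which matches the prediction-only path described above and still requires the previously built models from data/models in your GCS.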
A description of datasets that must be sourced outside of the GEE platform is provided below. When available, the primary data source is also included in the directory data/baselayers. All other data, including Sentinel-2 imagery, auxiliary data, and other existing global land cover classification products, are hosted on GEE and called by the scripts directly. All datasets were last accessed on 12 March 2024.
data/baselayers/World_Countries.
data/baselayers/WB_NISR_2018. This should be loaded into a FeatureCollection GEE asset named districts_fc for use in the pipeline.
data/baselayers/MINAGRI_AEZ_1980. This should be loaded into a FeatureCollection GEE asset named aez_rwanda for use in the pipeline.
data/baselayers/impactobs_lulc_rwa_2021.tif. This should be loaded into an ImageCollection GEE asset named impact_obs_lulc for use in the pipeline. (The others - Dynamic World and ESA's WorldCover - are hosted on GEE directly.)