2 datasets found

Data from: Mapping built infrastructure in semi-arid systems using data...
search.dataone.org
data.niaid.nih.gov
Updated Apr 11, 2025
Cite
Megan R. Dolman; Nicholas E. Kolarik; T. Trevor Caughlin; Jodi S. Brandt; Rebecca L. Som Castellano; Megan E. Cattau (2025). Mapping built infrastructure in semi-arid systems using data integration and open-source approaches for image classification [Dataset]. http://doi.org/10.5061/dryad.mcvdnckb3
Unique identifier
https://doi.org/10.5061/dryad.mcvdnckb3
Dataset updated
Apr 11, 2025
Dataset provided by
Dryad Digital Repository
Authors
Megan R. Dolman; Nicholas E. Kolarik; T. Trevor Caughlin; Jodi S. Brandt; Rebecca L. Som Castellano; Megan E. Cattau
Description
Accurate land use land cover (LULC) maps that delineate built infrastructure are useful for numerous applications, from urban planning, humanitarian response, and disaster management to informing decisions that reduce human exposure to natural hazards such as wildfire. Existing products lack sufficient spatial, temporal, and thematic resolution, omitting critical information needed to capture LULC trends accurately over time. Advancements in remote sensing imagery, open-source software, and cloud computing offer opportunities to address these challenges. Using Google Earth Engine, we developed a novel built infrastructure detection method for semi-arid systems by applying a random forest classifier to a fusion of Sentinel-1 and Sentinel-2 time series. Our classifier performed well, differentiating three built environment types (residential, infrastructure, and paved) with overall accuracies ranging from 90 to 96%. Producer accuracies were highest for the infrastructure class (98–99%)...

Mapped built infrastructure (MBI)

Description of the data and file structure

These data are annual maps of built infrastructure, with six classes, spanning the Snake River Plain ecoregion in southern Idaho. These products are ready to use and can be imported into any geospatial software for analysis. They were generated from a fusion of Sentinel-1 radar and Sentinel-2 multispectral imagery. The final MBI products are annual rasters, that is, pixelated categorical data with six classes: 1. Residential, 2. Infrastructure, 3. Paved, 4. Agriculture, 5. Vegetation, and 6. Range/Scrub.
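A minimal sketch of summarizing one annual MBI raster, assuming it has already been read into a NumPy integer array (the file names and reading library are not given here) and that values outside 1 to 6 are no-data. The names `MBI_CLASSES`, `class_counts`, and `built_fraction` are hypothetical, not part of the MBI scripts. Classes 1 to 3 are treated as built, matching the three built environment types named in the abstract.

```python
import numpy as np

# Class codes as listed in the MBI description.
MBI_CLASSES = {
    1: "Residential",
    2: "Infrastructure",
    3: "Paved",
    4: "Agriculture",
    5: "Vegetation",
    6: "Range/Scrub",
}
# The three built environment types distinguished by the classifier.
BUILT_CODES = (1, 2, 3)


def class_counts(mbi: np.ndarray) -> dict[str, int]:
    """Pixel count per MBI class; codes outside 1-6 (e.g. no-data) are ignored."""
    return {name: int(np.count_nonzero(mbi == code)) for code, name in MBI_CLASSES.items()}


def built_fraction(mbi: np.ndarray) -> float:
    """Share of valid (1-6) pixels that fall in one of the built classes."""
    valid = np.isin(mbi, list(MBI_CLASSES))
    if not valid.any():
        return 0.0
    built = np.isin(mbi, BUILT_CODES)
    return float(built.sum() / valid.sum())


# Toy 3x3 raster with one no-data value (0).
toy = np.array([[1, 2, 4], [5, 6, 6], [3, 0, 4]])
print(class_counts(toy))
print(built_fraction(toy))  # 0.375 (3 built pixels out of 8 valid)
```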

If users want to generate these products themselves, or reproduce them for a similar area, Google Earth Engine and QGIS are required. The user must have a Google Earth Engine (GEE) account, load the MBI scripts into their repository, and run the code. For applying this model outside of the Snake River Plain Level III ecoregion, new training data must be...
High-resolution wall-to-wall time series predictions of seasonal maize area...
zenodo.org
zip
Updated May 2, 2024
Cite
Katie Fankhauser; Evan Thomas; Christopher Brook; Arsene Gatera; Zia Mehrabi (2024). High-resolution wall-to-wall time series predictions of seasonal maize area and yield for Rwanda over 2019-2023 [Dataset]. http://doi.org/10.5281/zenodo.10659095
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.10659095
Dataset updated
May 2, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Katie Fankhauser; Evan Thomas; Christopher Brook; Arsene Gatera; Zia Mehrabi
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Rwanda
Description
This is the companion dataset to publication {TBD}. It contains 1) seasonal composites of predicted maize cover and yield at 10 m resolution in Rwanda for two annual agricultural seasons over five years, 2) scripts for the end-to-end machine learning pipeline that produces these data products, and 3) data or references needed as inputs to the pipeline.

1) Maize cover and yield seasonal composites

The data are provided here as netCDF4 files with four dimensions: x, y, band, and season. They can also be accessed as Google Earth Engine ImageCollections at:

https://code.earthengine.google.com/?asset=projects/b2p-geospatial/assets/lulc_classifier_composite

https://code.earthengine.google.com/?asset=projects/b2p-geospatial/assets/maize_yield_composite

Land cover and maize classification

The land cover classification file is found at data/composites/lulc_classifier_Rwanda_2019to2023.nc.

The land cover classification images contain 3 bands/variables: maizeProb, the raw predicted probability of the pixel being maize given by the gradient boosted tree model; majorityClass, the categorical land cover class with the highest predicted probability among any of the nine classes in the respective pixel; and optimalClass, the categorical land cover class adjusted to agree with national statistics for expected maize area.

The land cover classes map to the raster values as follows:

{
1: 'maize',
2: 'nonmaize_annual',
3: 'nonmaize_perennial',
4: 'scrub_shrub_land',
5: 'forest',
6: 'flooded_vegetation',
7: 'water',
8: 'structure',
9: 'bare'
}
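The class codes above can be turned into per-class areas with a short sketch like the following. It assumes one season's majorityClass (or optimalClass) layer has already been loaded from the netCDF file into a NumPy array (the variable and dimension names inside the file are not documented here) and that masked pixels are NaN. Each 10 m pixel covers 10 m × 10 m = 100 m², or 0.01 ha. `LULC_CLASSES` and `class_area_ha` are hypothetical names.

```python
import numpy as np

# Raster value -> land cover class, as listed above.
LULC_CLASSES = {
    1: "maize",
    2: "nonmaize_annual",
    3: "nonmaize_perennial",
    4: "scrub_shrub_land",
    5: "forest",
    6: "flooded_vegetation",
    7: "water",
    8: "structure",
    9: "bare",
}


def class_area_ha(classes: np.ndarray, pixel_size_m: float = 10.0) -> dict[str, float]:
    """Area in hectares of each land cover class in a categorical raster.

    NaN or out-of-range values (e.g. masked pixels) match no class and are ignored.
    """
    pixel_ha = pixel_size_m ** 2 / 10_000.0  # 100 m^2 = 0.01 ha
    return {
        name: float(np.count_nonzero(classes == code) * pixel_ha)
        for code, name in LULC_CLASSES.items()
    }


season = np.array([[1, 1, 2], [5, np.nan, 1]])
print(class_area_ha(season))
```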

The dataset includes 5 years (2019-2023) and 10 seasons, the time period available at the time of publication. In Rwanda, maize is typically planted and harvested in two distinct agricultural seasons per year: Season A, from September to February, and Season B, from March to June. The seasons in the data are therefore: 2019_Season_A, 2019_Season_B, 2020_Season_A, 2020_Season_B, 2021_Season_A, 2021_Season_B, 2022_Season_A, 2022_Season_B, 2023_Season_A, 2023_Season_B.
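The ten season identifiers follow a regular pattern, so they can be generated rather than typed out. This hypothetical helper also maps a calendar month to Season A or B using the month ranges above. Which calendar year labels a Season A that runs from September into February is not stated here, so the sketch deliberately does not map full dates to season labels.

```python
from typing import Optional

# Month ranges for the two Rwandan agricultural seasons described above.
SEASON_MONTHS = {
    "A": {9, 10, 11, 12, 1, 2},  # September to February
    "B": {3, 4, 5, 6},           # March to June
}


def season_labels(first_year: int = 2019, last_year: int = 2023) -> list[str]:
    """Season identifiers in the order used by the composites (A before B each year)."""
    return [
        f"{year}_Season_{letter}"
        for year in range(first_year, last_year + 1)
        for letter in ("A", "B")
    ]


def season_letter(month: int) -> Optional[str]:
    """Season (A or B) that a calendar month falls in, or None for July and August."""
    for letter, months in SEASON_MONTHS.items():
        if month in months:
            return letter
    return None


print(season_labels()[:2])  # ['2019_Season_A', '2019_Season_B']
print(season_letter(8))     # None
```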

Maize yield

The maize yield file is found at data/composites/maize_yield_Rwanda_2019to2023.nc.

Each of the images in the yield composites also has 3 bands/variables: maizeYield, the model's continuous predicted yield (kg/ha) in each pixel regardless of land cover class; maizeYield_majorityClass, the predicted maize yield masked to the majority-class land cover classification; and maizeYieldAdj_optimalClass, the raw predicted yield masked to the optimal maize classification land cover layer and normalized to national statistics.

The dataset includes the same seasons as the classification product; see above for a description.
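Because the yield bands are in kg/ha and each 10 m pixel is 0.01 ha, a pixel's production is its yield × 0.01 ha; for example, a pixel predicted at 2,000 kg/ha contributes 20 kg. A minimal sketch of totaling production from one masked yield layer (such as maizeYieldAdj_optimalClass), assuming it has been loaded as a NumPy array with NaN outside maize pixels; `total_production_t` is a hypothetical name.

```python
import numpy as np


def total_production_t(yield_kg_ha: np.ndarray, pixel_size_m: float = 10.0) -> float:
    """Total production in tonnes from a yield raster in kg/ha.

    NaN pixels (outside the maize mask) contribute nothing.
    """
    pixel_ha = pixel_size_m ** 2 / 10_000.0  # 100 m^2 = 0.01 ha
    production_kg = np.nansum(yield_kg_ha) * pixel_ha  # kg/ha x ha, summed over pixels
    return float(production_kg / 1000.0)


layer = np.array([[2000.0, np.nan], [3000.0, 1000.0]])
print(total_production_t(layer))  # ≈ 0.06 t: (2000 + 3000 + 1000) x 0.01 ha / 1000
```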

2) End-to-end machine learning pipeline

All Earth observation imagery, analysis, and outputs, unless otherwise stated, were hosted in the Google Earth Engine (GEE) environment and developed with the Earth Engine Python API in Python v3.10. To set up a local conda environment, use the scripts/environment.yml file. The user must have Google Cloud Storage (GCS) and Google Earth Engine (GEE) accounts. At this scale the pipeline will incur some processing and storage fees, although Google offers a free trial to all new users and the total cost of the high-resolution wall-to-wall predictions is nominal (~$20 for one season).

The scripts needed to perform the pipeline are located in the scripts folder.

The files in the scripts/helpers directory are called by various subsequent scripts and do not need to be run interactively by the user.

Run the scripts in the order described below. Pause after running each script and confirm that all outputs were created and loaded to GCS before continuing the pipeline; for some steps this may take hours to days depending on processing speed.

Google Cloud Storage and Earth Engine set-up

Users should specify the names of the bucket and asset project chosen during setup of their GCS and GEE environments in the Objects section of scripts/helpers/maize_pipeline_0_workspace.py.
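The exact contents of the Objects section are not reproduced here. As a hypothetical sketch only, the two user-specific values can be held in a small object that builds Earth Engine asset IDs in the `projects/<project>/assets/<name>` form used by the composite links above, and `gs://<bucket>/...` URIs for Cloud Storage. `Workspace`, `asset_id`, and `gcs_uri` are illustrative names, and the bucket and project values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workspace:
    """Hypothetical stand-in for the Objects section of the workspace helper."""

    bucket: str       # GCS bucket name chosen during setup (placeholder)
    gee_project: str  # Cloud project hosting the Earth Engine assets (placeholder)

    def asset_id(self, name: str) -> str:
        # Earth Engine asset IDs take the form projects/<project>/assets/<name>.
        return f"projects/{self.gee_project}/assets/{name}"

    def gcs_uri(self, *parts: str) -> str:
        # Cloud Storage objects are addressed as gs://<bucket>/<path>.
        return "gs://" + "/".join((self.bucket, *parts))


ws = Workspace(bucket="my-maize-bucket", gee_project="my-gee-project")
print(ws.asset_id("districts_fc"))  # projects/my-gee-project/assets/districts_fc
print(ws.gcs_uri("data", "models"))  # gs://my-maize-bucket/data/models
```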

Pipeline set-up

In scripts/pipeline_setup, you will find the following scripts to perform data preparation of inputs into model building and prediction.

maize_pipeline_1_clean_training_data.py - Cleans and merges all available crop label and yield data for model training and validation

maize_pipeline_2_dwnld_data_training.py - Downloads satellite-derived and auxiliary features at training data points for model building

maize_pipeline_3_dwnld_data_inference.py - Downloads satellite-derived and auxiliary features at every 10 m pixel in Rwanda on a district-wise basis for prediction

Land cover and maize classification

In scripts/maize_classification, you will find the following scripts to perform model building, prediction, and post-processing for the classification of land cover type and maize cover.

maize_classifier_1_feature_selection.py - Selects features subset for land cover classification with mutual information score or variable importance

maize_classifier_2_build_model.py - Builds gradient boosted tree model for land cover classification from training data

maize_classifier_3_prediction.py - Applies model for land cover classification to every 10 m pixel in Rwanda by season and district

maize_classifier_4_postprocess.py - Mosaics district-wise predictions and normalizes maize cover predictions to national agricultural statistics
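How maize_classifier_4_postprocess.py performs the normalization is not described here. One simple approach, shown as an assumption-laden sketch rather than the authors' method, is to rank pixels by maizeProb and keep the most probable ones until their combined area (0.01 ha per 10 m pixel) equals the national maize area for the season. `calibrate_maize_mask` is a hypothetical name, and a real implementation might apply targets per district rather than nationally.

```python
import numpy as np


def calibrate_maize_mask(maize_prob: np.ndarray, national_area_ha: float,
                         pixel_size_m: float = 10.0) -> np.ndarray:
    """Boolean maize mask whose total area matches a national area statistic.

    Pixels are ranked by predicted maize probability and the most probable ones
    are kept until the target area is reached. NaN (no-data) pixels are never chosen.
    """
    pixel_ha = pixel_size_m ** 2 / 10_000.0
    n_target = int(round(national_area_ha / pixel_ha))
    flat = maize_prob.ravel()
    valid = np.flatnonzero(~np.isnan(flat))
    n_target = min(n_target, valid.size)
    # Highest probability first; a stable sort keeps ties in raster order.
    order = valid[np.argsort(-flat[valid], kind="stable")]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[order[:n_target]] = True
    return mask.reshape(maize_prob.shape)


prob = np.array([[0.9, 0.2], [0.6, np.nan]])
print(calibrate_maize_mask(prob, national_area_ha=0.02).tolist())  # [[True, False], [True, False]]
```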

Maize yield

In scripts/maize_yield, you will find the following scripts to perform model building, prediction, and post-processing for maize yield estimation.

maize_yield_1_build_model.py - Builds gradient boosted tree model and performs bias correction for maize yield estimation from training data

maize_yield_2_prediction.py - Applies model for maize yield estimation to every 10 m pixel in Rwanda by season and district

maize_yield_3_postprocess.py - Mosaics district-wise predictions and normalizes maize yield predictions to national agricultural statistics
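Likewise, the yield normalization in maize_yield_3_postprocess.py is not spelled out here. A minimal sketch of one possible approach, assuming the target is a single national average yield (kg/ha) per season: mask predicted yields to the maize map and rescale them by one factor so their mean over maize pixels matches the national figure. Since every pixel has the same area, the plain pixel mean equals the area-weighted mean. `normalize_yield` is a hypothetical name.

```python
import numpy as np


def normalize_yield(yield_kg_ha: np.ndarray, maize_mask: np.ndarray,
                    national_yield_kg_ha: float) -> np.ndarray:
    """Mask yields to maize pixels and rescale so their mean matches a national yield.

    Non-maize pixels become NaN, like a yield layer masked to a maize map.
    """
    masked = np.where(maize_mask, yield_kg_ha, np.nan)
    # Equal-area pixels: the mean over maize pixels is the area-weighted mean.
    predicted_mean = np.nanmean(masked)
    return masked * (national_yield_kg_ha / predicted_mean)


pred = np.array([[1000.0, 3000.0], [2000.0, 5000.0]])
maize = np.array([[True, True], [False, False]])
print(normalize_yield(pred, maize, national_yield_kg_ha=2500.0))
```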

If you are running the entire pipeline with refreshed training data and model building, run each of these scripts in order. By default, the scripts run all A and B seasons from 2019A to the present. Otherwise, if you only wish to re-run or update seasonal predictions from the existing classification or yield model, run maize_pipeline_3_dwnld_data_inference.py to download the seasonal feature data across Rwanda, then maize_classifier_3_prediction.py and maize_classifier_4_postprocess.py for classification predictions, or maize_yield_2_prediction.py and maize_yield_3_postprocess.py for yield predictions, making sure to specify the season(s) of interest in each script. To do this, however, you also need a copy of the previously built models in your GCS bucket (provided at data/models).
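The two run modes can be written down as ordered script lists. This hypothetical helper assumes the scripts are launched from the repository root with the current Python interpreter. It does not pass season choices, which per the text above are set inside each script, and in non-dry-run mode it pauses after each step so outputs can be checked in GCS. `pipeline_steps` and `run` are illustrative names.

```python
import subprocess
import sys
from typing import Optional

# Folder and script order as listed above.
SCRIPTS = {
    "setup": ("scripts/pipeline_setup", [
        "maize_pipeline_1_clean_training_data.py",
        "maize_pipeline_2_dwnld_data_training.py",
        "maize_pipeline_3_dwnld_data_inference.py",
    ]),
    "classification": ("scripts/maize_classification", [
        "maize_classifier_1_feature_selection.py",
        "maize_classifier_2_build_model.py",
        "maize_classifier_3_prediction.py",
        "maize_classifier_4_postprocess.py",
    ]),
    "yield": ("scripts/maize_yield", [
        "maize_yield_1_build_model.py",
        "maize_yield_2_prediction.py",
        "maize_yield_3_postprocess.py",
    ]),
}


def _paths(group: str, keep: Optional[set] = None) -> list[str]:
    """Relative paths for one script group, optionally restricted to `keep`."""
    folder, names = SCRIPTS[group]
    return [f"{folder}/{n}" for n in names if keep is None or n in keep]


def pipeline_steps(full_rerun: bool, products=("classification", "yield")) -> list[str]:
    """Ordered scripts for a full rebuild (products ignored) or a prediction-only update."""
    if full_rerun:
        return _paths("setup") + _paths("classification") + _paths("yield")
    # Update mode: download seasonal features, then predict and post-process
    # with the existing models, which must already be in GCS.
    steps = _paths("setup", {"maize_pipeline_3_dwnld_data_inference.py"})
    if "classification" in products:
        steps += _paths("classification", {"maize_classifier_3_prediction.py",
                                           "maize_classifier_4_postprocess.py"})
    if "yield" in products:
        steps += _paths("yield", {"maize_yield_2_prediction.py",
                                  "maize_yield_3_postprocess.py"})
    return steps


def run(steps: list[str], dry_run: bool = True) -> None:
    """Run each step, pausing so outputs can be confirmed before continuing."""
    for path in steps:
        if dry_run:
            print("would run:", path)
            continue
        subprocess.run([sys.executable, path], check=True)
        input(f"Finished {path}. Confirm outputs are in GCS, then press Enter...")


run(pipeline_steps(full_rerun=False, products=("yield",)))
```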

3) Input data into machine learning pipeline

A description of the datasets that must be sourced outside the GEE platform is provided below. When available, the primary data source is also included in the directory data/baselayers. All other data, including Sentinel-2 imagery, auxiliary data, and other existing global land cover classification products, are hosted on GEE and called by the scripts directly. All datasets were last accessed on 12 March 2024.

Administrative and geological boundaries

World Countries - Downloaded from The World Bank Official Boundaries and included here at data/baselayers/World_Countries.

Rwanda district boundaries - Downloaded from The World Bank Rwanda Admin Boundaries And Villages and included here at data/baselayers/WB_NISR_2018. This should be loaded into a FeatureCollection GEE asset named districts_fc for use in the pipeline.

Rwanda agro-ecological zones - Downloaded from Nzeyimana, Hartemink & Geissen (2016) and included here at data/baselayers/MINAGRI_AEZ_1980. This should be loaded into a FeatureCollection GEE asset named aez_rwanda for use in the pipeline.

Global land cover classification product

Microsoft/Impact Observatory LULC - Although the 10m Annual Land Use Land Cover (9-class) V1 product contains data from 2017-2022, only the LULC map from the year 2021 was used, provided here at data/baselayers/impactobs_lulc_rwa_2021.tif. This should be loaded into an ImageCollection GEE asset named impact_obs_lulc for use in the pipeline.

(The other global land cover products, Dynamic World and ESA WorldCover, are hosted on GEE directly.)
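Assuming the shapefiles and the GeoTIFF have first been copied to a Cloud Storage bucket, the three required assets could be created with the `earthengine` command-line tool. This hypothetical helper only builds the command strings. The gs:// paths are placeholders for wherever you copied the files, the image name inside the collection (`impactobs_lulc_rwa_2021`) is an assumption, and each shapefile's sidecar files (.dbf, .shx, .prj) must sit next to its .shp.

```python
def upload_commands(project: str, districts_shp: str, aez_shp: str, lulc_tif: str) -> list[str]:
    """earthengine CLI commands for the three externally sourced inputs.

    Asset names (districts_fc, aez_rwanda, impact_obs_lulc) are the ones the
    pipeline expects; the gs:// source paths are supplied by the caller.
    """
    root = f"projects/{project}/assets"
    return [
        f"earthengine upload table --asset_id={root}/districts_fc {districts_shp}",
        f"earthengine upload table --asset_id={root}/aez_rwanda {aez_shp}",
        # impact_obs_lulc is an ImageCollection, so create it before adding the image.
        f"earthengine create collection {root}/impact_obs_lulc",
        f"earthengine upload image --asset_id={root}/impact_obs_lulc/impactobs_lulc_rwa_2021 {lulc_tif}",
    ]


for cmd in upload_commands(
    "my-gee-project",                                   # placeholder project
    "gs://my-maize-bucket/WB_NISR_2018/districts.shp",  # hypothetical file name
    "gs://my-maize-bucket/MINAGRI_AEZ_1980/aez.shp",    # hypothetical file name
    "gs://my-maize-bucket/impactobs_lulc_rwa_2021.tif",
):
    print(cmd)
```

Uploads run as asynchronous Earth Engine tasks, so wait for them to finish before starting the pipeline.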

Land cover labels and maize yield crop cuttings

One Acre Fund - Contact authors to request access as this dataset is not hosted publicly.

RTI International - The original source of