2 datasets found
  1. Data from: Mapping built infrastructure in semi-arid systems using data...

    • search.dataone.org
    • data.niaid.nih.gov
    Updated Apr 11, 2025
    Megan R. Dolman; Nicholas E. Kolarik; T. Trevor Caughlin; Jodi S. Brandt; Rebecca L. Som Castellano; Megan E. Cattau (2025). Mapping built infrastructure in semi-arid systems using data integration and open-source approaches for image classification [Dataset]. http://doi.org/10.5061/dryad.mcvdnckb3
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Megan R. Dolman; Nicholas E. Kolarik; T. Trevor Caughlin; Jodi S. Brandt; Rebecca L. Som Castellano; Megan E. Cattau
    Description

    Accurate land use land cover (LULC) maps that delineate built infrastructure are useful for numerous applications, from urban planning, humanitarian response, and disaster management to informing decision making for reducing human exposure to natural hazards, such as wildfire. Existing products lack sufficient spatial, temporal, and thematic resolution, omitting critical information needed to capture LULC trends accurately over time. Advancements in remote sensing imagery, open-source software, and cloud computing offer opportunities to address these challenges. Using Google Earth Engine, we developed a novel built infrastructure detection method in semi-arid systems by applying a random forest classifier to a fusion of Sentinel-1 and Sentinel-2 time series. Our classifier performed well, differentiating three built environment types: residential, infrastructure, and paved, with overall accuracies ranging from 90 to 96%. Producer accuracies were highest for the infrastructure class (98–99%)...

    Mapped built infrastructure (MBI)

    Description of the data and file structure

    These data are annual maps of built infrastructure, with six classes, spanning the Snake River Plain ecoregion in southern Idaho. These products are ready to use and can be imported into any geospatial software for analysis. These data were generated from a fusion of Sentinel-1 radar and Sentinel-2 multispectral imagery. The final MBI products are annual rasters, that is, pixelated categorical data with 6 classes: 1. Residential, 2. Infrastructure, 3. Paved, 4. Agriculture, 5. Vegetation, and 6. Range/Scrub.
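
As a minimal sketch of how the six-class legend might be used once a raster has been read into memory, the Python snippet below tallies the share of pixels in the three built classes. It assumes the list numbers 1 to 6 are also the raster values, which the description does not state explicitly; the sample pixel values and the function name built_fraction are hypothetical.

```python
# Class codes as listed in the dataset description.
# Assumption: the list numbers 1-6 are also the stored raster values.
MBI_CLASSES = {
    1: "Residential",
    2: "Infrastructure",
    3: "Paved",
    4: "Agriculture",
    5: "Vegetation",
    6: "Range/Scrub",
}

# The three built environment types named in the abstract.
BUILT_CODES = {1, 2, 3}


def built_fraction(pixels):
    """Return the share of valid pixels that fall in a built class."""
    valid = [p for p in pixels if p in MBI_CLASSES]
    if not valid:
        return 0.0
    built = sum(1 for p in valid if p in BUILT_CODES)
    return built / len(valid)


# Hypothetical pixel values, for illustration only.
sample = [1, 4, 4, 6, 2, 3, 6, 6]
print(built_fraction(sample))
```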

    If a user wants to generate these products themselves, or reproduce these products for a similar area, then Google Earth Engine and QGIS are required. The user must have an account with Google Earth Engine (GEE), load the MBI scripts into their repository, and run the code. For applying this model outside of the Snake River Plain Level III ecoregion, new training data must be...

  2. High-resolution wall-to-wall time series predictions of seasonal maize area...

    • zenodo.org
    zip
    Updated May 2, 2024
    Katie Fankhauser; Evan Thomas; Christopher Brook; Arsene Gatera; Zia Mehrabi (2024). High-resolution wall-to-wall time series predictions of seasonal maize area and yield for Rwanda over 2019-2023 [Dataset]. http://doi.org/10.5281/zenodo.10659095
    Available download formats
    zip
    Dataset updated
    May 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Katie Fankhauser; Evan Thomas; Christopher Brook; Arsene Gatera; Zia Mehrabi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Rwanda
    Description

    This is the companion dataset to publication {TBD}. It contains 1) seasonal composites of predicted maize cover and yield at 10 m resolution in Rwanda for two annual agricultural seasons over five years, 2) scripts for the end-to-end machine learning pipeline that produces these data products, and 3) data or references needed as inputs to the pipeline.

    1) Maize cover and yield seasonal composites

    The data are provided here as netCDF4 files with four dimensions for x, y, band, and season. They can also be accessed as Google Earth Engine ImageCollections at:

    Land cover and maize classification

    The land cover classification file is found at data/composites/lulc_classifier_Rwanda_2019to2023.nc.

    The land cover classification images contain 3 bands/variables: maizeProb, the raw predicted probability of the pixel being maize given by the gradient boosted tree model; majorityClass, the categorical land cover class with the highest predicted probability among any of the nine classes in the respective pixel; and optimalClass, the categorical land cover class adjusted to agree with national statistics for expected maize area.

    The land cover classes map to the raster values as follows:

    {
    1: 'maize',
    2: 'nonmaize_annual',
    3: 'nonmaize_perennial',
    4: 'scrub_shrub_land',
    5: 'forest',
    6: 'flooded_vegetation',
    7: 'water',
    8: 'structure',
    9: 'bare'
    }
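
For readers working with the raster values directly, the mapping above can be used as a lookup table. The following is a small Python sketch, not part of the published pipeline; the helper name class_counts and the sample values are hypothetical, and any code outside 1 to 9 is grouped under an "unmapped" label chosen here for illustration.

```python
from collections import Counter

# Raster value to land cover class, copied from the mapping above.
LAND_COVER_CLASSES = {
    1: "maize",
    2: "nonmaize_annual",
    3: "nonmaize_perennial",
    4: "scrub_shrub_land",
    5: "forest",
    6: "flooded_vegetation",
    7: "water",
    8: "structure",
    9: "bare",
}


def class_counts(values):
    """Count pixels per class name; unknown codes are grouped as 'unmapped'."""
    return Counter(LAND_COVER_CLASSES.get(v, "unmapped") for v in values)


# Hypothetical pixel values, for illustration only.
sample = [1, 1, 5, 7, 9, 0]
counts = class_counts(sample)
print(dict(counts))
```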

    The dataset includes 5 years (2019-2023) and 10 seasons - the available time period at time of publication. In Rwanda, maize is typically planted and harvested during two distinct agricultural seasons per year: Season A from September to February and Season B from March to June. Therefore the seasons in the data are: 2019_Season_A, 2019_Season_B, 2020_Season_A, 2020_Season_B, 2021_Season_A, 2021_Season_B, 2022_Season_A, 2022_Season_B, 2023_Season_A, 2023_Season_B.
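
The season labels follow a regular pattern, so they can be generated rather than typed out. A minimal Python sketch is shown here; the function name season_labels is hypothetical, and whether these exact strings are the coordinate values of the season dimension in the netCDF files is an assumption.

```python
def season_labels(first_year=2019, last_year=2023):
    """Build labels such as '2019_Season_A' for two seasons per year."""
    return [
        f"{year}_Season_{season}"
        for year in range(first_year, last_year + 1)
        for season in ("A", "B")
    ]


labels = season_labels()
print(labels)
```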

    Maize yield

    The maize yield file is found at data/composites/maize_yield_Rwanda_2019to2023.nc.

    Each of the images in the yield composites also has 3 bands/variables: maizeYield, the model's output of continuous predicted yield (kg/ha) in each pixel regardless of land class; maizeYield_majorityClass, predicted maize yield masked to the majority class land classification; and maizeYieldAdj_optimalClass, where the raw predicted yields were masked to the optimal maize classification land cover layer and normalized to national statistics.
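
To illustrate what masking a yield band to a classification layer means, here is a minimal Python sketch. It assumes that "masked" means keeping the predicted yield only where the class layer equals 1 (maize) and marking every other pixel as missing; the authors' exact masking and normalization steps are not reproduced, and the function name and sample values are hypothetical.

```python
MAIZE_CODE = 1  # raster value for 'maize' in the land cover mapping


def mask_yield_to_maize(yields, classes, maize_code=MAIZE_CODE):
    """Keep yield where the class is maize; use None for all other pixels."""
    if len(yields) != len(classes):
        raise ValueError("yields and classes must have the same length")
    return [y if c == maize_code else None for y, c in zip(yields, classes)]


# Hypothetical per-pixel values (kg/ha) and class codes.
raw_yield = [1500.0, 900.0, 2100.0]
majority_class = [1, 5, 1]
masked = mask_yield_to_maize(raw_yield, majority_class)
print(masked)
```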

    The dataset includes the same seasons as the classification product; see above for a description.

    2) End-to-end machine learning pipeline

    All earth observation imagery, analysis, and outputs unless otherwise stated were hosted in the Google Earth Engine (GEE) environment and developed with the Earth Engine Python API in Python v3.10. To set up a local conda environment use the scripts/environment.yml file. The user must have Google Cloud Storage (GCS) and Google Earth Engine (GEE) accounts. The pipeline, at this scale, will incur some processing and storage fees, although Google offers a free trial to all new users and the total cost of the high-resolution wall-to-wall predictions is nominal (~$20 for one season).

    The scripts needed to perform the pipeline are located in the scripts folder.

    The files contained in the scripts/helpers directory will be called by various subsequent scripts and do not need to be run interactively by the user.

    Run the scripts in the order described below. The user should pause after running each script and confirm that all outputs were created and loaded to GCS before continuing the pipeline; for some steps this may take hours to days depending on processing speed.

    Google Cloud Storage and Earth Engine set-up

    Users should specify the names of the bucket and asset project that were chosen during set up of their GCS and GEE environments in the Objects section of scripts/helpers/maize_pipeline_0_workspace.py.

    Pipeline set-up

    In scripts/pipeline_setup, you will find the following scripts to perform data preparation of inputs into model building and prediction.

    • maize_pipeline_1_clean_training_data.py - Cleans and merges all available crop label and yield data for model training and validation
    • maize_pipeline_2_dwnld_data_training.py - Downloads satellite-derived and auxiliary features at training data points for model building
    • maize_pipeline_3_dwnld_data_inference.py - Downloads satellite-derived and auxiliary features at every 10 m pixel in Rwanda on a district-wise basis for prediction

    Land cover and maize classification

    In scripts/maize_classification, you will find the following scripts to perform model building, prediction, and post-processing for the classification of land cover type and maize cover.

    • maize_classifier_1_feature_selection.py - Selects features subset for land cover classification with mutual information score or variable importance
    • maize_classifier_2_build_model.py - Builds gradient boosted tree model for land cover classification from training data
    • maize_classifier_3_prediction.py - Applies model for land cover classification to every 10 m pixel in Rwanda by season and district
    • maize_classifier_4_postprocess.py - Mosaics district-wise predictions and normalizes maize cover predictions to national agricultural statistics

    Maize yield

    In scripts/maize_yield, you will find the following scripts to perform model building, prediction, and post-processing for maize yield estimation.

    • maize_yield_1_build_model.py - Builds gradient boosted tree model and performs bias correction for maize yield estimation from training data
    • maize_yield_2_prediction.py - Applies model for maize yield estimation to every 10 m pixel in Rwanda by season and district
    • maize_yield_3_postprocess.py - Mosaics district-wise predictions and normalizes maize yield predictions to national agricultural statistics

    If you are running the entire pipeline with refreshed training data and model building, run each of these scripts in order. By default, the scripts will run all A and B seasons from 2019A to the present. Otherwise, if you only wish to re-run or update seasonal predictions from the existing classification or yield model, run maize_pipeline_3_dwnld_data_inference.py to download the seasonal feature data across Rwanda, then maize_classifier_3_prediction.py and maize_classifier_4_postprocess.py for classification predictions, or maize_yield_2_prediction.py and maize_yield_3_postprocess.py for yield predictions, making sure to specify the season(s) of interest in each script. To do this, you also need a copy of the previously built models in your GCS (provided at data/models).
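
The two run modes described above can be summarized as ordered lists of script names. The Python sketch below only returns the names and does not execute anything; the helper scripts_to_run and its mode and product arguments are hypothetical, and the full order simply follows the order in which the scripts are listed in this description.

```python
# Full run: every script, in the order listed in this description.
FULL_PIPELINE = [
    "maize_pipeline_1_clean_training_data.py",
    "maize_pipeline_2_dwnld_data_training.py",
    "maize_pipeline_3_dwnld_data_inference.py",
    "maize_classifier_1_feature_selection.py",
    "maize_classifier_2_build_model.py",
    "maize_classifier_3_prediction.py",
    "maize_classifier_4_postprocess.py",
    "maize_yield_1_build_model.py",
    "maize_yield_2_prediction.py",
    "maize_yield_3_postprocess.py",
]

# Prediction-only runs reuse the previously built models (data/models).
PREDICTION_ONLY = {
    "classification": [
        "maize_pipeline_3_dwnld_data_inference.py",
        "maize_classifier_3_prediction.py",
        "maize_classifier_4_postprocess.py",
    ],
    "yield": [
        "maize_pipeline_3_dwnld_data_inference.py",
        "maize_yield_2_prediction.py",
        "maize_yield_3_postprocess.py",
    ],
}


def scripts_to_run(mode="full", product=None):
    """Return the ordered script names for a full or prediction-only run."""
    if mode == "full":
        return list(FULL_PIPELINE)
    if mode == "predict":
        if product not in PREDICTION_ONLY:
            raise ValueError("product must be 'classification' or 'yield'")
        return list(PREDICTION_ONLY[product])
    raise ValueError("mode must be 'full' or 'predict'")


for name in scripts_to_run("predict", "yield"):
    print(name)
```

Remember to confirm that each script's outputs have reached GCS before starting the next one, as noted earlier in this description.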

    3) Input data into machine learning pipeline

    A description of datasets that must be sourced outside of the GEE platform is provided below. When available, the primary data source is also included in the directory data/baselayers. All other data, including Sentinel-2 imagery, auxiliary data, and other existing global land cover classification products, are hosted on GEE and called by the scripts directly. All datasets were last accessed on 12 March 2024.

    Administrative and geological boundaries

    • World Countries - Downloaded from The World Bank Official Boundaries and included here at data/baselayers/World_Countries.
    • Rwanda district boundaries - Downloaded from The World Bank Rwanda Admin Boundaries And Villages and included here at data/baselayers/WB_NISR_2018. This should be loaded into a FeatureCollection GEE asset named districts_fc for use in the pipeline.
    • Rwanda agro-ecological zones - Downloaded from Nzeyimana, Hartemink & Geissen (2016) and included here at data/baselayers/MINAGRI_AEZ_1980. This should be loaded into a FeatureCollection GEE asset named aez_rwanda for use in the pipeline.

    Global land cover classification product

    • Microsoft/Impact Observatory LULC - Although the 10m Annual Land Use Land Cover (9-class) V1 product contains data from 2017-2022, only the LULC map from the year 2021 was used, provided here at data/baselayers/impactobs_lulc_rwa_2021.tif. This should be loaded into an ImageCollection GEE asset named impact_obs_lulc for use in the pipeline.

    (The others - Dynamic World and ESA's WorldCover - are hosted on GEE directly.)

    Land cover labels and maize yield crop cuttings

    • One Acre Fund - Contact authors to request access as this dataset is not hosted publicly.
    • RTI International - The original source of
