14 datasets found
  1. Harmonized Sentinel-2 MSI: MultiSpectral Instrument, Level-2A (SR)

    • developers.google.com
    Updated Jan 30, 2020
    Cite
    European Union/ESA/Copernicus (2020). Harmonized Sentinel-2 MSI: MultiSpectral Instrument, Level-2A (SR) [Dataset]. https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED
    Explore at:
    Dataset updated
    Jan 30, 2020
    Dataset provided by
European Space Agency (http://www.esa.int/)
    Time period covered
    Mar 28, 2017 - Dec 2, 2025
    Area covered
    Description

    After 2022-01-25, Sentinel-2 scenes with PROCESSING_BASELINE '04.00' or above have their DN (value) range shifted by 1000. The HARMONIZED collection shifts data in newer scenes to be in the same range as in older scenes. Sentinel-2 is a wide-swath, high-resolution, multi-spectral imaging mission supporting Copernicus Land Monitoring studies, including the …
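    For reference, a minimal GEE Python sketch of loading this harmonized collection (the point, date range, and cloud threshold below are placeholders, not part of the catalog entry):

    ```python
    import ee

    ee.Initialize()

    # Placeholder region and dates; adjust to your study area.
    aoi = ee.Geometry.Point(7.59, 47.56)
    s2_sr = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
             .filterBounds(aoi)
             .filterDate('2023-06-01', '2023-09-01')
             .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)))

    # Band values are scaled surface reflectance; multiplying by 0.0001
    # recovers reflectance in [0, 1]. The harmonization means this scaling
    # is consistent across the 2022-01-25 baseline change.
    first = ee.Image(s2_sr.first()).multiply(0.0001)
    ```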

  2. SEN12TP - Sentinel-1 and -2 images, timely paired

    • zenodo.org
    • nde-dev.biothings.io
    • +1 more
    json, txt, zip
    Updated Apr 20, 2023
    Cite
    Thomas Roßberg; Michael Schmitt (2023). SEN12TP - Sentinel-1 and -2 images, timely paired [Dataset]. http://doi.org/10.5281/zenodo.7342060
    Explore at:
    Available download formats: json, zip, txt
    Dataset updated
    Apr 20, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Thomas Roßberg; Michael Schmitt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SEN12TP dataset (Sentinel-1 and -2 imagery, timely paired) contains 2319 scenes of Sentinel-1 radar and Sentinel-2 optical imagery, together with elevation and land cover information, for 1236 distinct ROIs taken between 28 March 2017 and 31 December 2020. Each scene has a size of 20 km x 20 km at 10 m pixel spacing. The time difference between optical and radar images is at most 12 h, but for almost all scenes it is around 6 h, owing to the offset between the Sentinel-1 and Sentinel-2 orbits. In addition to the \(\sigma^\circ\) radar backscatter, the radiometrically terrain-corrected \(\gamma^\circ\) radar backscatter is also calculated and included. \(\gamma^\circ\) values are calculated using the volumetric model presented by Vollrath et al. (2020).

    The uncompressed dataset has a size of 222 GB and is split spatially into a train (~90%) and a test set (~10%). For easier download the train set is split into four separate zip archives.

    Please cite the following paper when using the dataset, in which the design and creation are detailed:
    T. Roßberg and M. Schmitt. A globally applicable method for NDVI estimation from Sentinel-1 SAR backscatter using a deep neural network and the SEN12TP dataset. PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 2023. https://doi.org/10.1007/s41064-023-00238-y.

    The file sen12tp-metadata.json includes metadata of the selected scenes. It includes for each scene the geometry, an ID for the ROI and the scene, the climate and land cover information used when sampling the central point, the timestamps (in ms) when the Sentinel-1 and -2 image was taken, the month of the year, and the EPSG code of the local UTM Grid (e.g. EPSG:32643 - WGS 84 / UTM zone 43N).

    Naming scheme: The images are contained in directories called {roi_id}_{scene_id}, as for some unique regions image pairs of multiple dates are included. In each directory are six files for the different modalities with the naming {scene_id}_{modality}.tif. Multiple modalities are included: radar backscatter and multispectral optical images, the elevation as DSM (digital surface model) and different land cover maps.
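    A small sketch of walking this layout in Python (the extraction path is an assumption; the metadata JSON's internal structure follows the description above only loosely, and rasterio is one common choice for reading the GeoTiffs):

    ```python
    import json
    from pathlib import Path

    import rasterio  # one common choice for reading the .tif modalities

    root = Path('SEN12TP/train')  # assumed extraction directory
    metadata = json.loads(Path('sen12tp-metadata.json').read_text())

    for scene_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Directories are named {roi_id}_{scene_id}.
        roi_id, scene_id = scene_dir.name.split('_', 1)
        # Files are named {scene_id}_{modality}.tif; 's1' shown here.
        with rasterio.open(scene_dir / f'{scene_id}_s1.tif') as src:
            s1 = src.read()  # VV_sigma0, VH_sigma0, VV_gamma0flat, VH_gamma0flat, incAngle
    ```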

    Data modalities:
    - s1: Sentinel-1 radar backscatter (GEE collection: COPERNICUS/S1_GRD)
    - s2: Sentinel-2 Level-2A (bottom of atmosphere, BOA) multispectral optical data with an added cloud probability band (GEE collections: COPERNICUS/S2_SR and COPERNICUS/S2_CLOUD_PROBABILITY)
    - dsm: 30 m digital surface model (GEE collection: JAXA/ALOS/AW3D30/V3_2)
    - worldcover: land cover, 10 m resolution (GEE collection: ESA/WorldCover/v100)

    The following bands are included in the tif files; for further explanation see the documentation on GEE. All bands are resampled to 10 m resolution and reprojected to the coordinate reference system of the Sentinel-2 image.

    Modality bands:
    - s1 (5 bands): VV_sigma0, VH_sigma0, VV_gamma0flat, VH_gamma0flat, incAngle. VV/VH_sigma0 are the \(\sigma^\circ\) values, VV/VH_gamma0flat are the radiometrically terrain-corrected \(\gamma^\circ\) backscatter values, and incAngle is the incidence angle.
    - s2 (13 bands): B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, B12, cloud_probability. Multispectral optical bands plus the probability that a pixel is cloudy, calculated with the sentinel2-cloud-detector library; optical reflectances are bottom of atmosphere (BOA) reflectances calculated using sen2cor.
    - dsm (1 band): DSM. Height above sea level, signed 16 bits. Elevation (in metres) converted from the ellipsoidal height based on ITRF97 and GRS80, using the EGM96 geoid model.
    - worldcover (1 band): Map. Land cover class.

    Checking the file integrity
    After downloading and decompressing, the file integrity can be checked using the provided file of MD5 checksums.
    Under Linux: md5sum --check --quiet md5sums.txt

    References:

    Vollrath, Andreas, Adugna Mullissa, Johannes Reiche (2020). "Angular-Based Radiometric Slope Correction for Sentinel-1 on Google Earth Engine". In: Remote Sensing 12.11, Art. no. 1867. https://doi.org/10.3390/rs12111867.

  3. Global overview of cloud-, snow-, and shade-free Landsat (1982-2024) and...

    • data.niaid.nih.gov
    • search.dataone.org
    • +2 more
    zip
    Updated Apr 11, 2025
    Cite
    Katarzyna Ewa Lewińska; Stefan Ernst; David Frantz; Ulf Leser; Patrick Hostert (2025). Global overview of cloud-, snow-, and shade-free Landsat (1982-2024) and Sentinel-2 (2015-2024) data [Dataset]. http://doi.org/10.5061/dryad.gb5mkkwxm
    Explore at:
    Available download formats: zip
    Dataset updated
    Apr 11, 2025
    Dataset provided by
    Trier University of Applied Sciences
    Humboldt-Universität zu Berlin
    Authors
    Katarzyna Ewa Lewińska; Stefan Ernst; David Frantz; Ulf Leser; Patrick Hostert
    License

    CC0 1.0 (https://spdx.org/licenses/CC0-1.0.html)

    Description

    Landsat and Sentinel-2 acquisitions are among the most frequently used medium-resolution (i.e., 10-30 m) optical data. The data are extensively used in terrestrial vegetation applications, including but not limited to land cover and land use mapping, vegetation condition and phenology monitoring, and disturbance and change mapping. While the Landsat archives alone provide over 40 years, and counting, of continuous and consistent observations, since mid-2015 Sentinel-2 has enabled a revisit frequency of up to 2 days. Although the spatio-temporal availability of both data archives is well known at the scene level, information on the actual availability of usable (i.e., cloud-, snow-, and shade-free) observations at the pixel level needs to be explored for each study to ensure correct parametrization of the algorithms used, and thus the robustness of subsequent analyses. However, a priori data exploration is time- and resource-consuming and thus rarely performed. As a result, the spatio-temporal heterogeneity of usable data is often inadequately accounted for in the analysis design, risking ill-advised selection of algorithms and hypotheses, and thus inferior quality of final results. Here we present a global dataset comprising precomputed daily availability of usable Landsat and Sentinel-2 data sampled at the pixel level in a regular 0.18°-point grid. We based the dataset on the complete 1982-2024 Landsat surface reflectance data (Collection 2) and 2015-2024 Sentinel-2 top-of-atmosphere reflectance scenes (pre-Collection-1 and Collection-1). Derivation of cloud-, snow-, and shade-free observations followed the methodology developed in our recent study on data availability over Europe (Lewińska et al., 2023; https://doi.org/10.20944/preprints202308.2174.v2). Furthermore, we expanded the dataset with growing season information derived from the 2001-2019 time series of the yearly 500 m MODIS land cover dynamics product (MCD12Q2; Collection 6). As such, our dataset presents a unique overview of the spatio-temporal availability of usable daily Landsat and Sentinel-2 data at the global scale, hence offering much-needed a priori information aiding the identification of appropriate methods and challenges for terrestrial vegetation analyses at local to global scales. The dataset can be viewed using the dedicated GEE App (link in Related Works). As of February 2025 the dataset has been extended with the 2024 data.

    Methods

    We based our analyses on the freely and openly accessible Landsat and Sentinel-2 data archives available in Google Earth Engine (Gorelick et al., 2017). We used all Landsat surface reflectance Level 2, Tier 1, Collection 2 scenes acquired with the Thematic Mapper (TM) (Earth Resources Observation And Science (EROS) Center, 1982), Enhanced Thematic Mapper (ETM+) (Earth Resources Observation And Science (EROS) Center, 1999), and Operational Land Imager (OLI) (Earth Resources Observation And Science (EROS) Center, 2013) scanners between 22nd August 1982 and 31st December 2024, and Sentinel-2 TOA reflectance Level-1C scenes (pre-Collection-1 (European Space Agency, 2015, 2021) and Collection-1 (European Space Agency, 2022)) acquired with the MultiSpectral Instrument (MSI) between 23rd June 2015 and 31st December 2024. We implemented a conservative pixel-quality screening to identify cloud-, snow-, and shade-free land pixels.
    For the Landsat time series, we relied on the inherent pixel quality bands (Foga et al., 2017; Zhu & Woodcock, 2012), excluding all pixels flagged as cloud, snow, or shadow as well as pixels with the fill-in value of 20,000 (scale factor 0.0001; Zhang et al., 2022). Furthermore, due to the Landsat 7 orbit drift (Qiu et al., 2021), we excluded all ETM+ scenes acquired after 31st December 2020. Because Sentinel-2 Level-2A quality masks lack the desired scope and accuracy (Baetens et al., 2019; Coluzzi et al., 2018), we resorted to Level-1C scenes accompanied by the supporting Cloud Probability product. Furthermore, we employed a selection of conditions, including a threshold on Band 10 (SWIR-Cirrus), which is not available at Level-2A. Overall, our Sentinel-2-specific cloud, shadow, and snow screening comprised:

    - exclusion of all pixels flagged as clouds or cirrus in the inherent 'QA60' cloud mask band;
    - exclusion of all pixels with cloud probability >50%, as defined in the corresponding Cloud Probability product available for each scene;
    - exclusion of cirrus clouds (B10 reflectance >0.01);
    - exclusion of clouds based on Cloud Displacement Analysis (CDI < -0.5) (Frantz et al., 2018);
    - exclusion of dark pixels (B8 reflectance <0.16) within cloud shadows modelled for each scene with scene-specific sun parameters for the clouds identified in the previous steps, assuming a cloud height of 2,000 m;
    - exclusion of pixels within a 40 m buffer (two pixels at 20 m resolution) around each identified cloud and cloud shadow object;
    - exclusion of snow pixels identified with the snow mask branch of the Sen2Cor processor (Main-Knorn et al., 2017).
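    A minimal sketch of part of this screening in the GEE Python API follows. It covers only the QA60, cloud-probability, cirrus, and dark-pixel tests; the CDI test, shadow projection, 40 m buffering, and Sen2Cor snow mask are omitted, and the dark-pixel test is applied everywhere here, whereas the paper restricts it to modelled shadow zones:

    ```python
    import ee

    def usable_s2_mask(img, cloud_prob_img):
        """Approximate per-pixel 'usable' test for a Level-1C scene (sketch)."""
        qa = img.select('QA60')
        # QA60 bits 10 and 11 flag opaque clouds and cirrus respectively.
        clear_qa = qa.bitwiseAnd(1 << 10).eq(0).And(qa.bitwiseAnd(1 << 11).eq(0))
        low_prob = cloud_prob_img.select('probability').lte(50)
        # DN values are reflectance * 10000 in the L1C collection.
        no_cirrus = img.select('B10').multiply(0.0001).lte(0.01)
        not_dark = img.select('B8').multiply(0.0001).gte(0.16)
        return clear_qa.And(low_prob).And(no_cirrus).And(not_dark)
    ```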

    Through applying the data screening, we generated a collection of daily availability records for the Landsat and Sentinel-2 data archives. We next subsampled the resulting binary time series with a regular 0.18° x 0.18°-point grid defined in the EPSG:4326 projection, obtaining 475,150 points located over land between -179.8867°W and 179.5733°E and 83.50834°N and -59.05167°S. Owing to the substantial amount of data comprised in the Landsat and Sentinel-2 archives and the computationally demanding process of cloud-, snow-, and shade-screening, we performed the subsampling in batches corresponding to a 4° x 4° regular grid and consolidated the final data in post-processing.

    We derived the pixel-specific growing season information from the 2001-2019 time series of the yearly 500 m MODIS land cover dynamics product (MCD12Q2; Collection 6) available in Google Earth Engine. We only used information on the start and the end of a growing season, excluding all pixels with quality below 'best'. When a pixel went through more than one growing cycle per year, we approximated a growing season as the period between the beginning of the first growing cycle and the end of the last growing cycle. To fill in data gaps arising from low-quality data and insufficiently pronounced seasonality (Friedl et al., 2019), we used a 5x5 mean moving window filter to ensure better spatial continuity of our growing season datasets. Following Lewińska et al. (2023), we defined the start of the season as the pixel-specific 25th percentile of the 2001-2019 distribution of start-of-season dates, and the end of the season as the pixel-specific 75th percentile of the 2001-2019 distribution of end-of-season dates. Finally, we subsampled the start and end of the season datasets with the same regular 0.18° x 0.18°-point grid defined in the EPSG:4326 projection.

    References:

    Baetens, L., Desjardins, C., & Hagolle, O. (2019). Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. Remote Sensing, 11(4), 433. https://doi.org/10.3390/rs11040433
    Coluzzi, R., Imbrenda, V., Lanfredi, M., & Simoniello, T. (2018). A first assessment of the Sentinel-2 Level 1-C cloud mask product to support informed surface analyses. Remote Sensing of Environment, 217, 426–443. https://doi.org/10.1016/j.rse.2018.08.009
    Earth Resources Observation And Science (EROS) Center. (1982). Collection-2 Landsat 4-5 Thematic Mapper (TM) Level-1 Data Products [Other]. U.S. Geological Survey. https://doi.org/10.5066/P918ROHC
    Earth Resources Observation And Science (EROS) Center. (1999). Collection-2 Landsat 7 Enhanced Thematic Mapper Plus (ETM+) Level-1 Data Products [dataset]. U.S. Geological Survey. https://doi.org/10.5066/P9TU80IG
    Earth Resources Observation And Science (EROS) Center. (2013). Collection-2 Landsat 8-9 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) Level-1 Data Products [Other]. U.S. Geological Survey. https://doi.org/10.5066/P975CC9B
    European Space Agency. (2015). Sentinel-2 MSI Level-1C TOA Reflectance [dataset]. European Space Agency. https://doi.org/10.5270/S2_-d8we2fl
    European Space Agency. (2021). Sentinel-2 MSI Level-1C TOA Reflectance, Collection 0 [dataset]. European Space Agency. https://doi.org/10.5270/S2_-d8we2fl
    European Space Agency. (2022). Sentinel-2 MSI Level-1C TOA Reflectance [dataset]. European Space Agency. https://doi.org/10.5270/S2_-742ikth
    Foga, S., Scaramuzza, P. L., Guo, S., Zhu, Z., Dilley, R. D., Beckmann, T., Schmidt, G. L., Dwyer, J. L., Joseph Hughes, M., & Laue, B. (2017). Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sensing of Environment, 194, 379–390. https://doi.org/10.1016/j.rse.2017.03.026
    Frantz, D., Haß, E., Uhl, A., Stoffels, J., & Hill, J. (2018). Improvement of the Fmask algorithm for Sentinel-2 images: Separating clouds from bright surfaces based on parallax effects. Remote Sensing of Environment, 215, 471–481. https://doi.org/10.1016/j.rse.2018.04.046
    Friedl, M., Josh, G., & Sulla-Menashe, D. (2019). MCD12Q2 MODIS/Terra+Aqua Land Cover Dynamics Yearly L3 Global 500m SIN Grid V006 [dataset]. NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/MODIS/MCD12Q2.006
    Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27. https://doi.org/10.1016/j.rse.2017.06.031
    Lewińska, K.E., Ernst, S., Frantz, D., Leser, U., & Hostert, P. (2024). Global Overview of Usable Landsat and Sentinel-2 Data for 1982–2023. Data in Brief, 57. https://doi.org/10.1016/j.dib.2024.111054
    Main-Knorn, M., Pflug, B., Louis, J., Debaecker, V., Müller-Wilm, U., & Gascon, F. (2017). Sen2Cor for Sentinel-2. In L. Bruzzone, F. Bovolo,

  4. Sentinel-2: Cloud Probability

    • developers.google.com
    Cite
    European Union/ESA/Copernicus/SentinelHub, Sentinel-2: Cloud Probability [Dataset]. https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_CLOUD_PROBABILITY
    Explore at:
    Dataset provided by
    European Union/ESA/Copernicus/SentinelHub
    Time period covered
    Jun 27, 2015 - Dec 2, 2025
    Area covered
    Description

    The S2 cloud probability is created with the sentinel2-cloud-detector library (using LightGBM). All bands are upsampled using bilinear interpolation to 10 m resolution before the gradient-boosting base algorithm is applied. The resulting 0..1 floating point probability is scaled to 0..100 and stored as a UINT8. Areas missing any or all …
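    Each probability image shares a system:index with its source Sentinel-2 scene, so the two collections can be joined. A sketch of this pairing in the GEE Python API (region, dates, and the 50% threshold are placeholders):

    ```python
    import ee

    ee.Initialize()

    aoi = ee.Geometry.Point(147.0, -18.0)  # placeholder point
    s2 = (ee.ImageCollection('COPERNICUS/S2_HARMONIZED')
          .filterBounds(aoi).filterDate('2021-01-01', '2021-12-31'))
    clouds = (ee.ImageCollection('COPERNICUS/S2_CLOUD_PROBABILITY')
              .filterBounds(aoi).filterDate('2021-01-01', '2021-12-31'))

    # Attach each scene's cloud-probability image as a property.
    paired = ee.ImageCollection(ee.Join.saveFirst('cloud_prob').apply(
        primary=s2, secondary=clouds,
        condition=ee.Filter.equals(leftField='system:index',
                                   rightField='system:index')))

    def mask_cloudy(img):
        prob = ee.Image(img.get('cloud_prob')).select('probability')
        return img.updateMask(prob.lt(50))  # threshold is a placeholder

    masked = paired.map(mask_cloudy)
    ```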

  5. Coral Sea Sentinel 2 Marine Satellite Composite Draft Imagery version 0...

    • researchdata.edu.au
    • catalogue.eatlas.org.au
    Updated Nov 30, 2021
    Cite
    Lawrey, Eric, Dr (2021). Coral Sea Sentinel 2 Marine Satellite Composite Draft Imagery version 0 (AIMS) [Dataset]. https://researchdata.edu.au/coral-sea-sentinel-0-aims/2973700
    Explore at:
    Dataset updated
    Nov 30, 2021
    Dataset provided by
    Australian Ocean Data Network
    Authors
    Lawrey, Eric, Dr
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Time period covered
    Oct 1, 2016 - Sep 20, 2021
    Area covered
    Description

    This dataset contains composite satellite images for the Coral Sea region based on 10 m resolution Sentinel 2 imagery from 2015 – 2021. This image collection is intended to allow mapping of the reef and island features of the Coral Sea. This is a draft version of the dataset prepared from approximately 60% of the available Sentinel 2 imagery. An improved version of this dataset was released at https://doi.org/10.26274/NH77-ZW79.

    This collection contains composite imagery for 31 Sentinel 2 tiles in the Coral Sea. For each tile there are 5 different colour and contrast enhancement styles intended to highlight different features. These include:
    - DeepFalse - Bands: B1 (ultraviolet), B2 (blue), B3 (green): False colour image that shows deep marine features to 50 - 60 m depth. This imagery exploits the clear waters of the Coral Sea to allow the ultraviolet band to provide a much deeper view of coral reefs than is typically achievable with true colour imagery. This technique doesn't work where the water is less clear, as the ultraviolet light is scattered easily.
    - DeepMarine - Bands: B2 (blue), B3 (green), B4 (red): This is a contrast enhanced version of the true colour imagery, focusing on being able to better see the deeper features. Shallow features are over exposed due to the increased contrast.
    - ReefTop - Bands: B4 (red): This imagery is contrast enhanced to create a mask (black and white) of reef tops, delineating areas that are shallower or deeper than approximately 4 - 5 m. This mask is intended to assist in the creation of a GIS layer equivalent to the 'GBR Dry Reefs' dataset. The depth mapping exploits the limited water penetration of the red channel. In clear water the red channel can only see features to approximately 6 m regardless of the substrate type.
    - Shallow - Bands: B5 (red edge), B8 (near infrared), B11 (short wave infrared): This false colour imagery focuses on identifying very shallow and dry regions in the imagery. It exploits the property that the longer wavelength bands progressively penetrate the water less. B5 penetrates the water approximately 3 - 5 m, B8 approximately 0.5 m and B11 < 0.1 m. Features less than a couple of metres deep appear dark blue; dry areas are white.
    - TrueColour - Bands: B2 (blue), B3 (green), B4 (red): True colour imagery. This is useful for interpreting shallow features and for mapping the vegetation on cays and identifying beach rock.

    For most Sentinel tiles there are two versions of the DeepFalse and DeepMarine imagery based on different collections (dates). The R1 imagery are composites made up from the best available imagery while the R2 imagery uses the next best set of imagery. This splitting of the imagery is to allow two composites to be created from the pool of available imagery so that mapped features could be checked against two images. Typically the R2 imagery will have more artefacts from clouds.

    The satellite imagery was processed in tiles (approximately 100 x 100 km) to keep each final image small enough to manage. The dataset only covers the portion of the Coral Sea where there are shallow coral reefs.


    # Methods:

    The satellite image composites were created by combining multiple Sentinel 2 images using the Google Earth Engine. The core algorithm was:
    1. For each Sentinel 2 tile, the set of Sentinel images from 2015 – 2021 was reviewed manually. In some tiles the cloud cover threshold was raised to gather more images, particularly if there were fewer than 20 images available. The Google Earth Engine image IDs of the best images were recorded. These were the images with the clearest water, lowest waves, lowest cloud, and lowest sun glint.
    2. A composite image was created from the best images by taking the statistical median of the stack of images selected in the previous stage, after masking out clouds and their shadows (described in detail later).
    3. The contrast of the images was enhanced to create a series of products for different uses. The true colour image retained the full range of tones visible, so that bright sand cays still retained some detail. The marine enhanced version stretched the blue, green and red channels so that they focused on the deeper, darker marine features. This stretching was done to ensure that, when converted to 8-bit colour imagery, all the dark detail in the deeper areas remained visible. This contrast enhancement resulted in bright areas of the imagery clipping, leading to loss of detail in shallow reef areas and land-area colours looking unnatural. A reef top estimate was produced from the red channel (B4), where the contrast was stretched so that the imagery contains an almost binary mask. The threshold was chosen to approximate the 5 m depth contour for the clear waters of the Coral Sea. Lastly a false colour image was produced to allow mapping of shallow water features such as cays and islands. This image was produced from B5 (red edge), B8 (NIR) and B11 (SWIR), where blue represents depths from approximately 0.5 – 5 m, green areas with 0 – 0.5 m depth, and brown and white corresponding to dry land.
    4. The various contrast enhanced composite images were exported from Google Earth Engine (default of 32-bit GeoTiff) and reprocessed to smaller LZW-compressed 8-bit GeoTiff images using GDAL.


    ## Cloud Masking

    Prior to combining the best images each image was processed to mask out clouds and their shadows.
    The cloud masking uses the COPERNICUS/S2_CLOUD_PROBABILITY dataset developed by SentinelHub (Google, n.d.; Zupanc, 2017). The mask includes the cloud areas, plus a mask to remove cloud shadows. The cloud shadows were estimated by projecting the cloud mask in the direction opposite the angle to the sun. The shadow distance was estimated in two parts.

    A low cloud mask was created based on the assumption that small clouds have a small shadow distance. These were detected using a 40% cloud probability threshold. These were projected over 400 m, followed by a 150 m buffer to expand the final mask.

    A high cloud mask was created to cover longer shadows created by taller, larger clouds. These clouds were detected based on an 80% cloud probability threshold, followed by an erosion and dilation of 300 m to remove small clouds. These were then projected over a 1.5 km distance followed by a 300 m buffer.
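    A sketch of the low-cloud branch of this masking in the GEE Python API is given below. The parameter values (40% threshold, 400 m projection, 150 m buffer) are those quoted above; the angle convention and 100 m working scale follow the public GEE s2cloudless tutorial rather than the authors' own script:

    ```python
    import ee

    def low_cloud_shadow_mask(img, cloud_prob):
        """40% threshold, ~400 m shadow projection, 150 m buffer (sketch)."""
        cloud = cloud_prob.select('probability').gt(40)
        # Shadows fall opposite the sun. directionalDistanceTransform projects
        # the cloud mask along that azimuth; distances are in pixels, so 4 px
        # at a 100 m working scale approximates 400 m.
        shadow_az = ee.Number(90).subtract(
            ee.Number(img.get('MEAN_SOLAR_AZIMUTH_ANGLE')))
        shadow = (cloud.directionalDistanceTransform(shadow_az, 4)
                  .select('distance').mask()
                  .reproject(crs=img.select(0).projection(), scale=100))
        # Expand the combined mask by a 150 m buffer.
        return cloud.Or(shadow).focalMax(150, 'circle', 'meters')
    ```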

    The parameters for the cloud masking (probability threshold, projection distance and buffer radius) were determined through trial and error on a small number of scenes. As such there are probably significant potential improvements that could be made to this algorithm.

    Erosion, dilation and buffer operations were performed at a lower image resolution than the native satellite image resolution to improve the computational speed. The resolution of these operations was adjusted so that they were performed with approximately a 4 pixel resolution. This made the cloud mask significantly more spatially coarse than the 10 m Sentinel imagery. This resolution was chosen as a trade-off between the coarseness of the mask versus the processing time for these operations. Even with 4-pixel filter resolutions these operations still accounted for over 90% of the total processing, resulting in each image taking approximately 10 min to compute on the Google Earth Engine.


    ## Sun glint removal and atmospheric correction.

    Sun glint was removed from the images using the infrared B8 band to estimate the reflection off the water from the sun glint. B8 penetrates water less than 0.5 m and so in water areas it only detects reflections off the surface of the water. The sun glint detected by B8 correlates very highly with the sun glint experienced by the ultraviolet and visible channels (B1, B2, B3 and B4) and so the sun glint in these channels can be removed by subtracting B8 from these channels.

    This simple sun glint correction fails in very shallow and land areas. On land areas B8 is very bright and thus subtracting it from the other channels results in black land. In shallow areas (< 0.5 m) the B8 channel detects the substrate, resulting in too much sun glint correction. To resolve these issues the sun glint correction was adjusted by transitioning to B11 for shallow areas as it penetrates the water even less than B8. We don't use B11 everywhere because it is half the resolution of B8.
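    A minimal sketch of the B8 subtraction in the GEE Python API. The per-band scale factors and the B8-to-B11 transition for shallow areas are tuned values not given in this summary, so a single placeholder factor is used:

    ```python
    import ee

    def remove_sunglint(img, factor=1.0):
        """Subtract NIR-estimated glint from the UV/visible bands (sketch)."""
        glint = img.select('B8').multiply(factor)  # factor is a placeholder
        vis = img.select(['B1', 'B2', 'B3', 'B4'])
        # Single-band glint is broadcast across all four visible bands.
        deglinted = vis.subtract(glint)
        return img.addBands(deglinted, overwrite=True)
    ```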

    Land areas need their tonal levels to be adjusted to match the water areas after sun glint correction. Ideally this would be achieved using an atmospheric correction that compensates for the contrast loss due to haze in the atmosphere. Complex models for atmospheric correction involve considering the elevation of the surface (higher areas have less atmosphere to pass through) and the weather conditions. Since this dataset is focused on coral reef areas, elevation compensation is unnecessary due to the very low and flat land features being imaged. Additionally, the focus of the dataset is on marine features and so only a basic atmospheric correction is needed. Land areas (as determined by very bright B8 areas) were assigned a fixed smaller correction factor to approximate atmospheric correction. This fixed atmospheric correction was determined iteratively so that land areas matched the tonal value of shallow and water areas.


    ## Image selection

    Available Sentinel 2 images with a cloud cover of less than 0.5% were manually reviewed using a Google Earth Engine App (01-select-sentinel2-images.js). Where there were few images available (less than 30 images) the cloud cover threshold was raised to increase the set of images that were reviewed.

    Images were excluded from the composites primarily due to two main

  6. North Australia Sentinel 2 Satellite Composite Imagery - 15th percentile...

    • researchdata.edu.au
    Updated Nov 30, 2021
    Cite
    Lawrey, Eric; Hammerton, Marc (2021). North Australia Sentinel 2 Satellite Composite Imagery - 15th percentile true colour (NESP MaC 3.17, AIMS) [Dataset]. http://doi.org/10.26274/HD2Z-KM55
    Explore at:
    Dataset updated
    Nov 30, 2021
    Dataset provided by
    Australian Ocean Data Network
    Authors
    Lawrey, Eric; Hammerton, Marc
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jun 27, 2015 - May 31, 2024
    Area covered
    Description

    This dataset is true colour cloud-free composite satellite imagery optimised for mapping shallow marine habitats in northern Australia, based on 10-meter resolution Sentinel 2 data collected from 2015 to 2024. It contains composite imagery for 333 Sentinel 2 tiles of northern Australia and the Great Barrier Reef. This dataset offers improved visual clarity of shallow water features as compared to existing satellite imagery, allowing deeper marine features to be observed. These composites were specifically designed to address challenges such as sun glint, clouds and turbidity that typically hinder marine environment analyses. No tides were considered in the selection of the imagery and so this imagery corresponds to an 'All tide' image, approximating mean sea level.

    This dataset is an updated version (Version 2), published in July 2024, which succeeds the initial draft version (Version 1, published in March 2024). The current version spans imagery from 2015–2024, an extension of the earlier timeframe that covered 2018–2022. This longer temporal range allowed the imagery to be cleaner with lower image noise allowing deeper marine features to be visible. The deprecated draft version was removed from online download to save on storage space and is now only available on request.

    While the final imagery corresponds to true colour based primarily on Sentinel 2 bands B2 (blue), B3 (green), and B4 (red), the near infrared (B8) band was used as part of sun glint correction and automated selection of low noise imagery.

    Contrast enhancement was applied to the imagery to compress the original 12-bit per channel Sentinel 2 imagery into the final 8-bit per channel GeoTiffs. Black and white point correction was used to enhance the contrast as much as possible without too much clipping of the darkest and lightest marine features. Gamma correction of 2 (red), 2 (green) and 2.3 (blue) was applied to allow a wider dynamic range to be represented in the 8-bit data, helping to ensure that little precision was lost in representing darker marine features. As a result, the image brightness is not linearly scaled. Further details of the corrections applied are available from https://github.com/eatlas/AU_NESP-MaC-3-17_AIMS_S2-comp/blob/main/src/processors/s2processor.py.


    Methods:

    The satellite image composites were created by combining multiple Sentinel 2 images using the Google Earth Engine. The core algorithm was:
    1. For each Sentinel 2 tile filter the "COPERNICUS/S2_HARMONIZED" image collection by
    - tile ID
    - maximum cloud cover 20%
    - date between '2015-06-27' and '2024-05-31'
    - asset_size > 100000000 (remove small fragments of tiles)
    Note: A maximum cloud cover of 20% was used to improve the processing times. In most cases this filtering does not have an effect on the final composite as images with higher cloud coverage mostly result in higher noise levels and are not used in the final composite.
    2. Split images by "SENSING_ORBIT_NUMBER" (see "Using SENSING_ORBIT_NUMBER for a more balanced composite" for more information).
    3. For each SENSING_ORBIT_NUMBER collection filter out all noise-adding images:
    3.1 Calculate the image noise level for each image in the collection (see "Image noise level calculation" for more information) and sort the collection by noise level.
    3.2 Remove all images with a very high noise index (>15).
    3.3 Calculate a baseline noise level using a minimum number of images (min_images_in_collection=30). This minimum number of images is needed to ensure a smooth composite where cloud "holes" in one image are covered by other images.
    3.4 Iterate over remaining images (images not used in base noise level calculation) and check if adding image to the composite adds to or reduces the noise. If it reduces the noise add it to the composite. If it increases the noise stop iterating over images.
    4. Combine SENSING_ORBIT_NUMBER collections into one image collection.
    5. Remove sun-glint (true colour only) and apply atmospheric correction on each image (see "Sun-glint removal and atmospheric correction" for more information).
    6. Duplicate image collection to first create a composite image without cloud masking and using the 30th percentile of the images in the collection (i.e. for each pixel the 30th percentile value of all images is used).
    7. Apply cloud masking to all images in the original image collection (see "Cloud Masking" for more information) and create a composite by using the 30th percentile of the images in the collection (i.e. for each pixel the 30th percentile value of all images is used).
    8. Combine the two composite images (no cloud mask composite and cloud mask composite). This solves the problem of some coral cays and islands being misinterpreted as clouds and therefore creating holes in the composite image. These holes are "plugged" with the underlying composite without cloud masking. (Lawrey et al. 2022)
    9. The final composite was exported as cloud optimized 8 bit GeoTIFF

    Note: The following tiles were generated with no "maximum cloud cover" as they did not have enough images to create a composite with the standard settings: 46LGM, 46LGN, 46LHM, 50KKD, 50KPG, 53LMH, 53LMJ, 53LNH, 53LPH, 53LPJ, 54LVP, 57JVH, 59JKJ.
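    A sketch of step 1's filtering in the GEE Python API (the tile ID is a placeholder; the per-orbit split and noise filtering of steps 2–3 are only indicated, and the percentile reducer mirrors the value quoted in steps 6–7):

    ```python
    import ee

    ee.Initialize()

    tile = '55KEV'  # placeholder Sentinel 2 tile ID
    col = (ee.ImageCollection('COPERNICUS/S2_HARMONIZED')
           .filter(ee.Filter.eq('MGRS_TILE', tile))
           .filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', 20))
           .filterDate('2015-06-27', '2024-05-31')
           .filter(ee.Filter.gt('system:asset_size', 100000000)))

    # Step 2: one sub-collection per orbit, so each pass contributes
    # a balanced share of images.
    orbits = col.aggregate_array('SENSING_ORBIT_NUMBER').distinct()

    # Steps 5-7 reduce each (noise-filtered) sub-collection to a
    # percentile composite; per-orbit noise filtering is omitted here.
    composite = col.reduce(ee.Reducer.percentile([30]))
    ```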

    Compositing Process:

    The dataset was created using a multi-step compositing process. A percentile-based image compositing technique was employed, with the 15th percentile chosen as the optimal value for most regions. This percentile was identified as the most effective in minimizing noise and enhancing key features such as coral reefs, islands, and other shallow water habitats. The 15th percentile was chosen as a trade-off between the desire to select darker pixels that typically correspond to clearer water, and very dark values (often occurring at the 10th percentile) corresponding to cloud shadows.

    The cloud masking predictor would often misinterpret very pale areas, such as cays and beaches, as clouds. To overcome this limitation a dual-image compositing method was used. A primary composite was generated with cloud masks applied, and a secondary composite with no cloud masking was layered beneath to fill in potential gaps (or "holes") caused by cloud masking mistakes.

    Image noise level calculation:

    The noise level for each image in this dataset is calculated to ensure high-quality composites by minimizing the inclusion of noisy images. This process begins by creating a water mask using the Normalized Difference Water Index (NDWI) derived from the NIR and Green bands. High reflectance areas in the NIR and SWIR bands, indicative of sun-glint, are identified and masked by the water mask to focus on water areas affected by sun-glint. The proportion of high sun-glint pixels within these water areas is calculated and amplified to compute a noise index. If no water pixels are detected, a high noise index value is assigned.

    In any set of satellite images, some will be taken under favourable conditions (low wind, low sun-glint, and minimal cloud cover), while others will be affected by high sun-glint or cloud. Combining multiple images into a composite reduces noise by averaging out these fluctuations.

    When all images have the same noise level, increasing the number of images in the composite reduces the overall noise. However, in practice, there is a mix of high and low noise images. The optimal composite is created by including as many low-noise images as possible while excluding high-noise ones. The challenge lies in determining the acceptable noise threshold for a given scene, as some areas are more affected by cloud and sun glint than others.

    To address this, we rank the available Sentinel 2 images for each scene by their noise index, from lowest to highest. The goal is to determine the ideal number of images (N) to include in the composite to minimize overall noise. For each N, we use the lowest noise images and estimate the final composite noise based on the noise index. This is repeated for all values of N up to a maximum of 200 images, and we select the N that results in the lowest noise.

    This approach has some limitations. It estimates noise based on sun glint and residual clouds (after cloud masking) using NIR bands, without accounting for image turbidity. The final composite noise is not directly measured as this would be computationally expensive. It is instead estimated by dividing the average noise of the selected images by the square root of the number of images. We found this method tends to underestimate the ideal image count, so we adjusted the noise estimates, scaling them by the inverse of their ranking, to favor larger sets of images. The algorithm is not fully optimized, and further refinement is needed to improve accuracy.
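    In other words, the unadjusted estimator is mean(noise of the N lowest-noise images) / sqrt(N), minimised over N. A small sketch (the ranking-based rescaling the authors apply to favour larger sets is omitted):

    ```python
    import math

    def choose_image_count(noise_indices, max_n=200):
        """Return the N (lowest-noise images) minimising the estimated
        composite noise mean(best N) / sqrt(N)."""
        ranked = sorted(noise_indices)
        best_n, best_est = 1, float('inf')
        for n in range(1, min(max_n, len(ranked)) + 1):
            est = (sum(ranked[:n]) / n) / math.sqrt(n)
            if est < best_est:
                best_n, best_est = n, est
        return best_n
    ```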

    Full details of the algorithm can be found at https://github.com/eatlas/AU_NESP-MaC-3-17_AIMS_S2-comp/blob/main/src/utilities/noise_predictor.py

    Sun glint removal and atmospheric correction:

    Sun glint was removed from the images using the infrared B8 band to estimate the reflection off the water from the sun glint. B8 penetrates water less than 0.5 m and so in water areas it only detects reflections off the surface of the water. The sun glint detected by B8 correlates very highly with the sun glint experienced by the visible channels (B2, B3 and B4) and so the sun glint in these channels can be removed by subtracting B8 from these channels.

    Eric Lawrey developed this algorithm by fine tuning the value of the scaling between the B8 channel and each individual visible channel (B2, B3 and B4) so that the maximum level of sun glint would be removed. This work was based on a representative set of images, trying to determine a set of values that represent a good compromise across different water surface

  7. Mumbai-Slum-Detection-Dataset

    • kaggle.com
    zip
    Updated Jul 22, 2025
    Cite
    Rupesh Kumar Yadav (2025). Mumbai-Slum-Detection-Dataset [Dataset]. https://www.kaggle.com/datasets/rupeshkumaryadav/mumbai-slum-detection-dataset/data
    Explore at:
    Available download formats: zip (304746333 bytes)
    Dataset updated
    Jul 22, 2025
    Authors
    Rupesh Kumar Yadav
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Mumbai, Dharavi Slums
    Description

    Dataset Summary

    This dataset is developed for pixel-level classification of urban informal settlements using satellite imagery. The input data consists of Sentinel-2 imagery (2015–2016), and the ground truth is derived from a government-conducted survey available as a KML vector file, rasterized to align with the imagery. Formats include NumPy arrays and HDF5 files for easy ML integration. Intended for land-use/land-cover classification tasks.

    🛰️ Data Source

    Satellite Imagery: Sentinel‑2 L2A (Surface Reflectance) images from 2015–16, accessed via Google Earth Engine (GEE)

    Ground Truth: Official government KML vector file, manually rasterized to match imagery resolution and alignment

    📦 Data Format

    Ground Truth Source: Government survey KML converted to raster via QGIS

    Satellite Data: Sentinel‑2 L2A (Surface Reflectance) images from 2015–16

    CRS & Extent: EPSG:4326

    Bounding Box: Longitude: 72.7827462580 to 72.9718317340

    Latitude: 18.9086328640 to 19.2638524900

    Spatial Accuracy: ~±2 m (WGS84)

    Raster Size: 2105 × 3954 pixels (Float64 GeoTIFF)

    Formats: NumPy (.npy) and HDF5 (.h5) for image bands and per-pixel labels

    Pixel size: ~10m (based on Sentinel-2 native resolution)

    Label Values:

    1 → Informal/Slum
    0 → Formal/Non-slum

    Data Type: float64 (image), uint8 (labels)
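    A sketch of loading these arrays in Python. The file names below are placeholders, since the listing does not spell out the actual file names; inspect the HDF5 keys before assuming a layout:

    ```python
    import numpy as np
    import h5py

    # Placeholder file names; substitute the names in the downloaded archive.
    image = np.load('mumbai_s2_bands.npy')   # float64, e.g. (H, W, 4): R, G, B, NIR
    labels = np.load('mumbai_labels.npy')    # uint8, (H, W): 1 = slum, 0 = non-slum

    with h5py.File('mumbai_dataset.h5', 'r') as f:
        print(list(f.keys()))  # inspect the layout before assuming key names
    ```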

    📜 Coordinate System Details

    CRS Name: EPSG:4326 - WGS 84

    Datum: World Geodetic System 1984 (EPSG:6326)

    Units: Geographic (degrees)

    Accuracy: ≤ 2 meters (approximate)

    Type: Geographic 2D

    Celestial Body: Earth

    Reference: Dynamic (not plate-fixed)

    Additional Details

    1. Processing Pipeline. KML to raster: ground truth polygons from the KML rasterized using GDAL to match the Sentinel-2 extent and resolution. Image preprocessing: cloud masking and band selection (R, G, B, NIR) through Google Earth Engine. Export format: .tif downloaded, converted to .npy and .h5 using rasterio, numpy, and h5py. Alignment: verified pixel-wise correspondence between image and label arrays.

    2. Authorship & Provenance. Creator: M Rupesh Kumar Yadav, M.Tech, Centre of Studies in Resources Engineering, IIT Bombay. Contact: rupesh32003@gmail.com or 24m0319@iitb.ac.in, or check the author's GitHub for further resources/assistance.

    3. Content & Structure. Bands per sample: RGB (3 bands) + NIR (1 band). Ground truth: per-pixel labels aligned with the imagery. Data splits: e.g. train/val/test percentages or file lists. File naming conventions: files may correspond to tiles, dates, etc. Example sample: dimensions, dtype, label values, and their mapping to classes.

    4. Collection & Processing. Satellite imagery: retrieved via Google Earth Engine over 2015–16, filtered by a cloud cover threshold. Ground truth conversion: KML survey data rasterized at the same spatial resolution and CRS. Alignment: bands resampled and aligned using GEE reprojection. Preprocessing steps: cloud masking, atmospheric correction (L2A), normalization, dtype cast to Float64. Label handling: ensured spatial overlap and clipping; invalid/missing areas labelled as class 0 or masked.

    5. Usage & Intended Applications. Tasks: semantic segmentation or pixel-level land-cover mapping. Ideal for: land use change detection, agricultural mapping, validation of remote sensing models. Not suitable for: tasks needing multispectral bands beyond NIR, very high-resolution (<10 m) labelling, or temporal sequence modelling.

    6. Limitations & Bias. Temporal span: only covers 2015–2016; may not reflect current conditions. Spatial scope bias: limited geographic area (Mumbai region). Labelling bias: dependent on government survey accuracy and rasterization fidelity. Cloud coverage: some tiles may still contain residual cloud pixels.

  8. Data from: Coral Sea features satellite imagery and raw depth contours...

    • researchdata.edu.au
    Updated Mar 7, 2025
    Cite
    Hammerton, Marc; Lawrey, Eric (2025). Coral Sea features satellite imagery and raw depth contours (Sentinel 2 and Landsat 8) 2015 – 2021 (AIMS) [Dataset]. http://doi.org/10.26274/NH77-ZW79
    Explore at:
    Dataset updated
    Mar 7, 2025
    Dataset provided by
    Australian Ocean Data Network
    Authors
    Hammerton, Marc; Lawrey, Eric
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 1, 2016 - Sep 20, 2021
    Area covered
    Description

    This dataset contains Sentinel 2 and Landsat 8 cloud free composite satellite images of the Coral Sea reef areas and some parts of the Great Barrier Reef. It also contains raw depth contours derived from the satellite imagery. This dataset was developed as the base information for mapping the boundaries of reefs and coral cays in the Coral Sea. It is likely that the satellite imagery is useful for numerous other applications. The full source code is available and can be used to apply these techniques to other locations.

    This dataset contains two sets of raw satellite derived bathymetry polygons for 5 m, 10 m and 20 m depths based on both the Landsat 8 and Sentinel 2 imagery. These are intended to be post-processed using clipping and manual clean up to provide an estimate of the top structure of reefs. This dataset also contains select scenes on the Great Barrier Reef and Shark Bay in Western Australia that were used to calibrate the depth contours. Areas in the GBR were compared with the GA GBR30 2020 (Beaman, 2017) bathymetry dataset, and the imagery in Shark Bay was used to tune and verify the Satellite Derived Bathymetry algorithm in the handling of dark substrates such as seagrass meadows. This dataset also contains a couple of small Sentinel 3 images that were used to check the presence of reefs in the Coral Sea outside the bounds of the Sentinel 2 and Landsat 8 imagery.

    The Sentinel 2 and Landsat 8 imagery was prepared using the Google Earth Engine, followed by post processing in Python and GDAL. The processing code is available on GitHub (https://github.com/eatlas/CS_AIMS_Coral-Sea-Features_Img).

    This collection contains composite imagery for Sentinel 2 tiles (59 in Coral Sea, 8 in GBR) and Landsat 8 tiles (12 in Coral Sea, 4 in GBR and 1 in WA). For each Sentinel tile there are 3 different colour and contrast enhancement styles intended to highlight different features. These include:
    - TrueColour - Bands: B2 (blue), B3 (green), B4 (red): True colour imagery. This is useful for identifying shallow features and for mapping the vegetation on cays.
    - DeepFalse - Bands: B1 (ultraviolet), B2 (blue), B3 (green): False colour image that shows deep marine features to 50 - 60 m depth. This imagery exploits the clear waters of the Coral Sea to allow the ultraviolet band to provide a much deeper view of coral reefs than is typically achievable with true colour imagery. This imagery has a high level of contrast enhancement applied and so appears noisier (in particular showing artefacts from clouds) than the TrueColour styling.
    - Shallow - Bands: B5 (red edge), B8 (near infrared), B11 (short wave infrared): This false colour imagery focuses on identifying very shallow and dry regions in the imagery. It exploits the property that the longer wavelength bands progressively penetrate the water less. B5 penetrates the water approximately 3 - 5 m, B8 approximately 0.5 m and B11 < 0.1 m. Features less than a couple of metres deep appear dark blue; dry areas are white. This imagery is intended to help identify coral cay boundaries.

    For Landsat 8 imagery only the TrueColour and DeepFalse stylings were rendered.

    All Sentinel 2 and Landsat 8 imagery has Satellite Derived Bathymetry (SDB) depth contours.
    - Depth5m - This corresponds to an estimate of the area above 5 m depth (Mean Sea Level).
    - Depth10m - This corresponds to an estimate of the area above 10 m depth (Mean Sea Level).
    - Depth20m - This corresponds to an estimate of the area above 20 m depth (Mean Sea Level).

    For most Sentinel and some Landsat tiles there are two versions of the DeepFalse imagery based on different collections (dates). The R1 imagery are composites made up from the best available imagery while the R2 imagery uses the next best set of imagery. This splitting of the imagery is to allow two composites to be created from the pool of available imagery. This allows any mapped features to be checked against two images. Typically the R2 imagery will have more artefacts from clouds. In one Sentinel 2 tile a third image was created to help with mapping the reef platform boundary.

    The satellite imagery was processed in tiles (approximately 100 x 100 km for Sentinel 2 and 200 x 200 km for Landsat 8) to keep each final image small enough to manage. These tiles were not merged into a single mosaic as this allowed better individual image contrast enhancement when mapping deep features. The dataset only covers the portion of the Coral Sea where there are shallow coral reefs and where there might have been potential new reef platforms indicated by existing bathymetry datasets and the AHO Marine Charts. The extent of the imagery was limited by that available through the Google Earth Engine.

    # Methods:

    The Sentinel 2 imagery was created using the Google Earth Engine. The core algorithm was:
    1. For each Sentinel 2 tile, images from 2015 – 2021 were reviewed manually after first filtering to remove cloudy scenes. The allowable cloud cover was adjusted so that at least the 50 most cloud-free images were reviewed. The typical cloud cover threshold was 1%. Where very few images were available the cloud cover filter threshold was raised to 100% and all images were reviewed. The Google Earth Engine image IDs of the best images were recorded, along with notes to help sort the images based on those with the clearest water, lowest waves, lowest cloud, and lowest sun glint. Images where there were no or few clouds over the known coral reefs were preferred. No consideration of tides was used in the image selection process. The collection of usable images was grouped into two sets that would be combined into composite images. The best were added to the R1 composite, and the next best images into the R2 composite. Consideration was given to whether each image would improve the resultant composite or make it worse. Adding clear images to the collection reduces the visual noise in the image, allowing deeper features to be observed. Adding images with clouds introduces small artefacts to the images, which are magnified by the high contrast stretching applied to the imagery. Where there were few images, all available imagery was typically used.
    2. Sunglint was removed from the imagery using estimates of the sunglint using two of the infrared bands (described in detail in the section on Sun glint removal and atmospheric correction).
    3. A composite image was created from the best images by taking the statistical median of the stack of images selected in the previous stage, after masking out clouds and their shadows (described in detail later).
    4. The brightness of the composite image was normalised so that all tiles would have a similar average brightness for deep water areas. This correction was applied to allow more consistent contrast enhancement. Note: this brightness adjustment was applied as a single offset across all pixels in the tile and so this does not correct for finer spatial brightness variations.
    5. The contrast of the images was enhanced to create a series of products for different uses. The TrueColour image retained the full range of tones visible, so that bright sand cays still retain detail. The DeepFalse style was optimised to see features at depth, and the Shallow style provides access to far red and infrared bands for assessing shallow features, such as cays and islands.
    6. The various contrast enhanced composite images were exported from Google Earth Engine and optimised using Python and GDAL. This optimisation added internal tiling and overviews to the imagery. The depth polygons from each tile were merged into shapefiles covering the whole area, one for each depth.

    ## Cloud Masking

    Prior to combining the best images each image was processed to mask out clouds and their shadows.

    The cloud masking uses the COPERNICUS/S2_CLOUD_PROBABILITY dataset developed by SentinelHub (Google, n.d.; Zupanc, 2017). The mask includes the cloud areas, plus a mask to remove cloud shadows. The cloud shadows were estimated by projecting the cloud mask in the direction opposite the angle to the sun. The shadow distance was estimated in two parts.

    A low cloud mask was created based on the assumption that small clouds have a small shadow distance. These were detected using a 40% cloud probability threshold. These were projected over 400 m, followed by a 150 m buffer to expand the final mask.

    A high cloud mask was created to cover longer shadows created by taller, larger clouds. These clouds were detected based on an 80% cloud probability threshold, followed by an erosion and dilation of 300 m to remove small clouds. These were then projected over a 1.5 km distance followed by a 300 m buffer.

    The buffering was applied as the cloud masking would often miss significant portions of the edges of clouds and their shadows. The buffering allowed a higher percentage of the cloud to be excluded, whilst retaining as much of the original imagery as possible.

    The parameters for the cloud masking (probability threshold, projection distance and buffer radius) were determined through trial and error on a small number of scenes. The algorithm used is significantly better than the default Sentinel 2 cloud masking and slightly better than the COPERNICUS/S2_CLOUD_PROBABILITY cloud mask because it masks out shadows; however, there are potentially significant improvements that could be made to the method in the future.

    Erosion, dilation and buffer operations were performed at a lower image resolution than the native satellite image resolution to improve the computational speed. The resolution of these operations was adjusted so that they were performed with approximately a 4 pixel resolution. This made the cloud mask

  9. Wetland Land-Cover Segmentation and Classification in the Netherlands...

    • data.niaid.nih.gov
    Updated Apr 2, 2025
    Cite
    Gmelich Meijling, Eva (2025). Wetland Land-Cover Segmentation and Classification in the Netherlands (Sentinel-2 satellite imagery and Dynamic World labels) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_15125548
    Explore at:
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    University of Amsterdam
    Authors
    Gmelich Meijling, Eva
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Netherlands, World
    Description

    This dataset contains preprocessed Sentinel-2 imagery and corresponding Dynamic World land-cover labels for six wetland areas in the Netherlands. It was created to support land-cover classification and segmentation tasks in ecologically dynamic floodplain environments. The data covers the period January 2017 to November 2024 and includes only scenes with less than 5% cloud cover.

    Sentinel-2 imagery was retrieved using the Google Earth Engine (GEE) API from the COPERNICUS/S2_SR_HARMONIZED collection, which provides harmonized Level-2A data at 10 m spatial resolution. From the 26 available bands, 9 were selected based on their relevance for wetland delineation: RGB, Red Edge 1–3, Near-Infrared (NIR), and Shortwave Infrared (SWIR 1–2). The imagery was tiled into 256×256 pixel patches and filtered for quality (e.g., excluding patches with >10% black pixels).
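    A sketch of the tiling and black-pixel quality filter in numpy (the threshold matches the description above; the (H, W, C) array layout is an assumption):

    ```python
    import numpy as np

    def tile_image(arr, size=256, max_black=0.10):
        """Cut an (H, W, C) array into non-overlapping size x size patches,
        dropping patches where more than max_black of pixels are black."""
        patches = []
        h, w, _ = arr.shape
        for y in range(0, h - size + 1, size):
            for x in range(0, w - size + 1, size):
                patch = arr[y:y + size, x:x + size]
                # A pixel counts as black when every band is zero.
                if (patch == 0).all(axis=-1).mean() <= max_black:
                    patches.append(patch)
        return patches
    ```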

    Dynamic World land-cover labels (Brown et al., 2022) were used to generate pixel-wise semantic segmentation masks by selecting the most probable class (out of 9 land-cover types) for each pixel. The resulting masks are single-band images where pixel values 0–8 represent land-cover classes as follows:

    0: Water
    1: Trees
    2: Grass
    3: Flooded Vegetation
    4: Crops
    5: Shrub & Scrub
    6: Built
    7: Bare
    8: Snow & Ice
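    A sketch of deriving such a mask from Dynamic World in the GEE Python API (region and date window are placeholders). Note the collection also ships a precomputed 'label' band encoding the same per-pixel argmax:

    ```python
    import ee

    ee.Initialize()

    CLASSES = ['water', 'trees', 'grass', 'flooded_vegetation', 'crops',
               'shrub_and_scrub', 'built', 'bare', 'snow_and_ice']

    aoi = ee.Geometry.Point(5.4, 51.8)  # placeholder wetland location
    dw = (ee.ImageCollection('GOOGLE/DYNAMICWORLD/V1')
          .filterBounds(aoi)
          .filterDate('2021-06-01', '2021-07-01')
          .mosaic())

    # Per-pixel argmax over the nine probability bands -> values 0-8.
    label = (dw.select(CLASSES).toArray()
             .arrayArgmax().arrayGet([0]).rename('label'))
    ```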

    The dataset includes the following splits:

    Training set: Gelderse Poort, Oostvaardersplassen, Loosdrechtse Plassen, Land van Saeftinghe (1,701 images)

    Validation set: Lauwersmeer (948 images)

    Test set: Biesbosch (1,140 images)

    This resource enables benchmarking of supervised and self-supervised learning methods for wetland classification in medium-resolution optical satellite data.

    Reference: Brown, C.F., Brumby, S.P., Guzder-Williams, B., Birch, T., Hyde, S.B., Mazzariello, J., Czerwinski, W., Pasquarella, V.J., Haertel, R., Ilyushchenko, S., Schwehr, K., Weisse, M., Stolle, F., Hanson, C., Guinan, O., Moore, R., & Tait, A.M. (2022). Dynamic World, Near real-time global 10 m land use land cover mapping. Scientific Data, 9(1). https://doi.org/10.1038/s41597-022-01307-4

  10. BiSO-CR

    • kaggle.com
    zip
    Updated Jul 30, 2024
    yu xia WHU (2024). BiSO-CR [Dataset]. https://www.kaggle.com/datasets/yuxiawhu/biso-cr/discussion
    Explore at:
    zip(37272507023 bytes)Available download formats
    Dataset updated
    Jul 30, 2024
    Authors
    yu xia WHU
    License

    Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Read first

    User Guide is available at https://www.researchgate.net/profile/Yu-Xia-28/publications

    If you use this dataset, please cite:
    Y. Xia, W. He, and H. Zhang, "Began+: Leveraging bi-temporal SAR-optical data fusion to reconstruct clear-sky satellite imagery under large cloud cover," RSE, submitted, 2024.

    If you have any questions, please contact the author at whuxiayu@whu.edu.cn.

    Basic information
    Two worldwide bitemporal SAR-optical cloud removal datasets were constructed using the Landsat-8 and Sentinel-2 optical sensors together with the Sentinel-1 SAR sensor. Landsat-8 and Sentinel-2 are optical sensors that suffer from cloud and fog interference, while Sentinel-1 provides cloud-free SAR data for restoring Landsat-8 and Sentinel-2. For more practical applications, Level-2A products of Landsat-8 and Sentinel-2 were acquired, which provide bottom-of-atmosphere (BOA) reflectance, and the ground-range-detected (GRD) products of Sentinel-1 were collected in VH and VV polarization modes. All the data mentioned above are provided by the European Space Agency (ESA) and the U.S. Geological Survey (USGS) (Zuhlke et al., 2015), and are freely available via Google Earth Engine (GEE) (Gorelick et al., 2017). The Landsat-8 data have six bands at 30 m spatial resolution, while the Sentinel-2 data preserve 10 commonly used bands at 10/20 m spatial resolution. The Sentinel-1 data have a 5 m range resolution and a 20 m azimuth resolution. The revisit cycles of the three sensors are 16 days for Landsat-8, 5/10 days for Sentinel-2, and 6/12 days for Sentinel-1. Sentinel-1 data can therefore assist in reconstructing both Landsat-8 and Sentinel-2 data in both temporal and spatial aspects. To ensure the diversity of the data, all scenes were gathered globally from different geographic environments. To form bitemporal SAR-optical data pairs, Landsat-8, Sentinel-2, and Sentinel-1 data from both 2019 and 2020 were collected and co-registered in the UTM/WGS84 projection system (Gianinetto et al., 2005). It is worth noting that the two optical images (O1 and O2) were acquired with the least cloud cover throughout the year, while the acquisition times of the SAR and optical data (S1 and O1 / S2 and O2) differed by at most 5 days.
    Two SAR-optical datasets, named BiS1L8 and BiS1S2, are constructed, each comprising 1400 curated image pairs. For training and testing the proposed framework in removing clouds with large cover ratios, credible cloud removal datasets are indispensable. Based on the BiS1L8 and BiS1S2 datasets, two simulated cloud removal datasets, BiS1L8-CR and BiS1S2-CR, are created and made available at Kaggle. Specifically, cloud and shadow masks with coverage exceeding 40%, generally considered a large cover ratio, are randomly selected from three cloud detection datasets (Mohajerani and Saeedi, 2019). These masks are applied to clear-sky Landsat-8 and Sentinel-2 images acquired in 2020 to generate simulated cloud pairs, examples of which are displayed in Fig. 6. In the simulated optical images, the cloudy areas represent missing information, with some pixels completely obscured by clouds. To better constrain the training process of the proposed network, potential change masks are generated through threshold segmentation from the two optical images (Rensink, 2002). After manual screening, change masks with higher confidence are retained. Table 2 compares the proposed datasets with recent cloud removal datasets, revealing that BiS1L8-CR and BiS1S2-CR are globally distributed. The BiS1L8-CR dataset is one of the few bi-temporal, globally distributed datasets that can serve Landsat-8.

    Note: This is a worldwide dataset, extremely large (about 100 GB as a .zip file), and will be updated and released continuously.

  11. Tropical Australia Sentinel 2 Satellite Composite Imagery - Low Tide - 30th percentile true colour and near infrared false colour (NESP MaC 3.17, AIMS)

    • data.gov.au
    html, png, wms
    Updated Sep 28, 2025
    Australian Ocean Data Network (2025). Tropical Australia Sentinel 2 Satellite Composite Imagery - Low Tide - 30th percentile true colour and near infrared false colour (NESP MaC 3.17, AIMS) [Dataset]. https://www.data.gov.au/data/dataset/tropical-australia-sentinel-2-satellite-composite-imagery-low-tide-30th-percentile-true-colour-
    Explore at:
    png, html, wmsAvailable download formats
    Dataset updated
    Sep 28, 2025
    Dataset authored and provided by
    Australian Ocean Data Network
    Area covered
    Australia
    Description

    This dataset contains cloud free, low tide composite satellite images for the tropical Australia region based on 10 m resolution Sentinel 2 imagery from 2018 – 2023. This image collection was created as part of the NESP MaC 3.17 project and is intended to allow mapping of the reef features in tropical Australia. This collection contains composite imagery for 200 Sentinel 2 tiles around the tropical Australian coast.

    This dataset uses two styles:
    1. a true colour contrast and colour enhancement style (TrueColour) using the bands B2 (blue), B3 (green), and B4 (red)
    2. a near infrared false colour style (Shallow) using the bands B5 (red edge), B8 (near infrared), and B12 (short wave infrared).
    These styles are useful for identifying shallow features along the coastline. The Shallow false colour styling is optimised for viewing the first 3 m of the water column, providing an indication of water depth. This is because the different far red and near infrared bands used in this styling have limited penetration of the water column. In clear waters the maximum penetration of each of the bands is 3-5 m for B5, 0.5-1 m for B8 and < 0.05 m for B12. As a result, the image changes in colour with the depth of the water, with the following colours indicating different depths:
    - White, brown, bright green, red, light blue: dry land
    - Grey brown: damp intertidal sediment
    - Turquoise: 0.05-0.5 m of water
    - Blue: 0.5-3 m of water
    - Black: deeper than 3 m
    In very turbid areas the visible limit will be slightly reduced.

    Change log: Changes to this dataset and metadata will be noted here:
    2024-07-24 - Add tiles for the Great Barrier Reef
    2024-05-22 - Initial release for low-tide composites using 30th percentile (Git tag: "low_tide_composites_v1")

    Methods: The satellite image composites were created by combining multiple Sentinel 2 images using the Google Earth Engine. The core algorithm was (a minimal sketch of steps 9-11 follows this description):
    1. For each Sentinel 2 tile, filter the "COPERNICUS/S2_HARMONIZED" image collection by:
    - tile ID
    - maximum cloud cover 0.1%
    - date between '2018-01-01' and '2023-12-31'
    - asset_size > 100000000 (remove small fragments of tiles)
    2. Remove high sun-glint images (see "High sun-glint image detection" for more information).
    3. Split images by "SENSING_ORBIT_NUMBER" (see "Using SENSING_ORBIT_NUMBER for a more balanced composite" for more information).
    4. Iterate over all images in the split collections to predict the tide elevation for each image from the image timestamp (see "Tide prediction" for more information).
    5. Remove images where the tide elevation is above mean sea level, to make sure no high tide images are included.
    6. Select the 10 images with the lowest tide elevation.
    7. Combine the SENSING_ORBIT_NUMBER collections into one image collection.
    8. Remove sun-glint (true colour only) and apply atmospheric correction to each image (see "Sun-glint removal and atmospheric correction" for more information).
    9. Duplicate the image collection to first create a composite image without cloud masking, using the 30th percentile of the images in the collection (i.e. for each pixel the 30th percentile value of all images is used).
    10. Apply cloud masking to all images in the original image collection (see "Cloud Masking" for more information) and create a composite using the 30th percentile of the images in the collection.
    11. Combine the two composite images (no cloud mask composite and cloud mask composite). This solves the problem of some coral cays and islands being misinterpreted as clouds and therefore creating holes in the composite image. These holes are "plugged" with the underlying composite without cloud masking (Lawrey et al. 2022).
    12. The final composite was exported as a cloud optimized 8 bit GeoTIFF.

    Note: The following tiles were generated with different settings as they did not have enough images to create a composite with the standard settings:
    - 51KWA: no high sun-glint filter
    - 54LXP: maximum cloud cover set to 1%
    - 54LYK: maximum cloud cover set to 2%
    - 54LYM: maximum cloud cover set to 5%
    - 54LYN: maximum cloud cover set to 1%
    - 54LYQ: maximum cloud cover set to 5%
    - 54LYP: maximum cloud cover set to 1%
    - 54LZL: maximum cloud cover set to 1%
    - 54LZM: maximum cloud cover set to 1%
    - 54LZN: maximum cloud cover set to 1%
    - 54LZQ: maximum cloud cover set to 5%
    - 54LZP: maximum cloud cover set to 1%
    - 55LBD: maximum cloud cover set to 2%
    - 55LBE: maximum cloud cover set to 1%
    - 55LCC: maximum cloud cover set to 5%
    - 55LCD: maximum cloud cover set to 1%

    High sun-glint image detection: Images with high sun-glint can lead to lower quality composite images. To determine high sun-glint images, a mask is created for all pixels above a high reflectance threshold in the near-infrared and short-wave infrared bands. The proportion of such pixels is then calculated and compared against a sun-glint threshold. If the image exceeds this threshold, it is filtered out of the image collection. As we are only interested in the sun-glint on water pixels, a water mask is created using NDWI before creating the sun-glint mask.

    Sun-glint removal and atmospheric correction: Sun-glint was removed from the images using the infrared B8 band to estimate the reflection off the water from the sun-glint. B8 penetrates water less than 0.5 m and so in water areas it only detects reflections off the surface of the water. The sun-glint detected by B8 correlates very highly with the sun-glint experienced by the visible channels (B2, B3 and B4), and so the sun-glint in these channels can be removed by subtracting B8 from them. Eric Lawrey developed this algorithm by fine tuning the value of the scaling between the B8 channel and each individual visible channel (B2, B3 and B4) so that the maximum level of sun-glint would be removed. This work was based on a representative set of images, trying to determine a set of values that represent a good compromise across different water surface conditions. This algorithm is an adjustment of the algorithm already used in Lawrey et al. 2022.

    Tide prediction: To determine the tide elevation in a specific satellite image, we used a tide prediction model to predict the tide elevation for the image timestamp. After investigating and comparing a number of models, it was decided to use the empirical ocean tide model EOT20 (Hart-Davis et al., 2021). The model data can be freely accessed at https://doi.org/10.17882/79489 and works with the Python library pyTMD (https://github.com/tsutterley/pyTMD). In our comparison we found this model was able to accurately predict the tide elevation across multiple points along the study coastline when compared to historic Bureau of Meteorology and AusTide data. To determine the tide elevation of the satellite images we manually created a point dataset, placing a central point on the water for each Sentinel tile in the study area. We used these points as centroids in the ocean models and calculated the tide elevation from the image timestamp.

    Using "SENSING_ORBIT_NUMBER" for a more balanced composite: Some of the Sentinel 2 tiles are made up of different sections depending on the "SENSING_ORBIT_NUMBER". For example, a tile could have a small triangle on the left side and a bigger section on the right side. If we filter an image collection and use a subset to create a composite, we could end up with a high number of images for one section (e.g. the left side triangle) and only a few images for the other section(s). This would result in a composite image with one well-covered section and other sections derived from very few images. To avoid this issue, the initial unfiltered image collection is divided into multiple image collections using the image property "SENSING_ORBIT_NUMBER". The filtering and limiting (maximum number of images in a collection) is then performed on each "SENSING_ORBIT_NUMBER" image collection and finally they are combined back into one image collection to generate the final composite.

    Cloud Masking: Each image was processed to mask out clouds and their shadows before creating the composite image. The cloud masking uses the COPERNICUS/S2_CLOUD_PROBABILITY dataset developed by SentinelHub (Google, n.d.; Zupanc, 2017). The mask includes the cloud areas, plus a mask to remove cloud shadows. The cloud shadows were estimated by projecting the cloud mask in the direction opposite the angle to the sun. The shadow distance was estimated in two parts. A low cloud mask was created based on the assumption that small clouds have a small shadow distance. These were detected using a 35% cloud probability threshold, projected over 400 m, and followed by a 150 m buffer to expand the final mask. A high cloud mask was created to cover longer shadows created by taller, larger clouds. These clouds were detected based on an 80% cloud probability threshold, followed by an erosion and dilation of 300 m to remove small clouds, then projected over a 1.5 km distance followed by a 300 m buffer. The parameters for the cloud masking (probability threshold, projection distance and buffer radius) were determined through trial and error on a small number of scenes. As such there are probably significant potential improvements that could be made to this algorithm. Erosion, dilation and buffer operations were performed at a lower resolution than the native satellite image resolution to improve the computational speed. The scale of these operations was adjusted so that they were performed at approximately a 4 pixel resolution. This made the cloud mask significantly more spatially coarse than the 10 m Sentinel imagery. This resolution was chosen as a trade-off between the coarseness of the mask versus the processing time for these operations. With 4-pixel filter resolutions these operations were still using over 90% of the total processing time.
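    As an illustration of steps 9-11 above, the sketch below builds the two 30th-percentile composites and plugs the cloud-masked holes with the unmasked composite. This is not the project's actual code: the cloud/shadow mask here is a simplified QA60 stand-in for the COPERNICUS/S2_CLOUD_PROBABILITY masking described above, and the sun-glint and tide filtering steps are omitted.

        import ee

        ee.Initialize()

        def s2_tile_collection(tile_id, max_cloud=0.1):
            # Step 1: filter COPERNICUS/S2_HARMONIZED by tile, cloud cover,
            # date range and asset size.
            return (ee.ImageCollection('COPERNICUS/S2_HARMONIZED')
                    .filter(ee.Filter.eq('MGRS_TILE', tile_id))
                    .filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', max_cloud))
                    .filterDate('2018-01-01', '2023-12-31')
                    .filter(ee.Filter.gt('system:asset_size', 100000000)))

        def mask_clouds(img):
            # Simplified stand-in for the cloud/shadow masking described above,
            # using the QA60 bitmask (bits 10 and 11 flag opaque/cirrus cloud).
            qa = img.select('QA60')
            clear = (qa.bitwiseAnd(1 << 10).eq(0)
                     .And(qa.bitwiseAnd(1 << 11).eq(0)))
            return img.updateMask(clear)

        def dual_percentile_composite(col, pct=30):
            # Steps 9-11: composite without cloud masking, composite with
            # cloud masking, then plug masked holes with the unmasked one.
            unmasked = col.reduce(ee.Reducer.percentile([pct]))
            masked = col.map(mask_clouds).reduce(ee.Reducer.percentile([pct]))
            return masked.unmask(unmasked)

        composite = dual_percentile_composite(s2_tile_collection('54LZP', 1))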

  12. Sentinel-5P NRTI NO2: Near Real-Time Nitrogen Dioxide

    • developers.google.com
    Updated Jun 6, 2019
    + more versions
    European Union/ESA/Copernicus (2019). Sentinel-5P NRTI NO2: Near Real-Time Nitrogen Dioxide [Dataset]. https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_NRTI_L3_NO2
    Explore at:
    Dataset updated
    Jun 6, 2019
    Dataset provided by
    European Union/ESA/Copernicus
    Time period covered
    Jul 10, 2018 - Dec 3, 2025
    Area covered
    Earth
    Description

    NRTI/L3_NO2: This dataset provides near real-time high-resolution imagery of NO2 concentrations. Nitrogen oxides (NO2 and NO) are important trace gases in the Earth's atmosphere, present in both the troposphere and the stratosphere. They enter the atmosphere as a result of anthropogenic activities (notably fossil fuel combustion and biomass burning) and natural processes (wildfires, lightning, and microbiological processes in soils). Here, NO2 is used to represent concentrations of collective nitrogen oxides because during daytime, i.e. in the presence of sunlight, a photochemical cycle involving ozone (O3) converts NO into NO2 and vice versa on a timescale of minutes. The TROPOMI NO2 processing system is based on the algorithm developments for the DOMINO-2 product and for the EU QA4ECV NO2 reprocessed dataset for OMI, and has been adapted for TROPOMI. This retrieval-assimilation-modelling system uses the 3-dimensional global TM5-MP chemistry transport model at a resolution of 1x1 degree as an essential element. More information.

    NRTI L3 Product: To make our NRTI L3 products, we use harpconvert to grid the data. Example harpconvert invocation for one tile:

    harpconvert --format hdf5 --hdf5-compression 9 -a 'tropospheric_NO2_column_number_density_validity>50;derive(datetime_stop {time}); bin_spatial(2001, 50.000000, 0.01, 2001, -120.000000, 0.01); keep(NO2_column_number_density,tropospheric_NO2_column_number_density, stratospheric_NO2_column_number_density,NO2_slant_column_number_density, tropopause_pressure,absorbing_aerosol_index,cloud_fraction, sensor_altitude,sensor_azimuth_angle, sensor_zenith_angle,solar_azimuth_angle,solar_zenith_angle)' S5P_NRTI_L2_NO2_20181107T013042_20181107T013542_05529_01_010200_20181107T021824.nc output.h5

    Sentinel-5 Precursor: Sentinel-5 Precursor is a satellite launched on 13 October 2017 by the European Space Agency to monitor air pollution. The onboard sensor is frequently referred to as Tropomi (TROPOspheric Monitoring Instrument). All of the S5P datasets, except CH4, have two versions: Near Real-Time (NRTI) and Offline (OFFL). CH4 is available as OFFL only. The NRTI assets cover a smaller area than the OFFL assets, but appear more quickly after acquisition. The OFFL assets contain data from a single orbit (which, due to half the earth being dark, contains data only for a single hemisphere). Because of noise in the data, negative vertical column values are often observed, in particular over clean regions or for low SO2 emissions. It is recommended not to filter these values except for outliers, i.e. for vertical columns lower than -0.001 mol/m^2. The original Sentinel 5P Level 2 (L2) data is binned by time, not by latitude/longitude. To make it possible to ingest the data into Earth Engine, each Sentinel 5P L2 product is converted to L3, keeping a single grid per orbit (that is, no aggregation across products is performed). Source products spanning the antimeridian are ingested as two Earth Engine assets, with suffixes _1 and _2. The conversion to L3 is done by the harpconvert tool using the bin_spatial operation. The source data is filtered to remove pixels with QA values less than:
    - 80% for AER_AI
    - 75% for the tropospheric_NO2_column_number_density band of NO2
    - 50% for all other datasets except for O3 and SO2
    The O3_TCL product is ingested directly (without running harpconvert).
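    For orientation, a minimal Earth Engine Python sketch of loading this collection might look like the following. The collection ID matches the catalog URL above and the outlier cut-off follows the recommendation in this description; the one-week window is a placeholder.

        import ee

        ee.Initialize()

        no2 = (
            ee.ImageCollection('COPERNICUS/S5P/NRTI/L3_NO2')
            .select('tropospheric_NO2_column_number_density')
            .filterDate('2024-01-01', '2024-01-08')
        )

        # Mean tropospheric NO2 over the window, masking only the recommended
        # outliers (vertical columns below -0.001 mol/m^2).
        mean_no2 = no2.mean()
        mean_no2 = mean_no2.updateMask(mean_no2.gt(-0.001))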

  13. Ecological niche models for mapping cultural ecosystem services (CES)

    • produccioncientifica.ugr.es
    Updated 2025
    Pérez-Girón, José Carlos; Martínez-López, Javier; Alcaraz-Segura, Domingo; Tabik, Siham; Molina Cabrera, Daniel; del Águila, Ana; Khaldi, Rohaifa; Pistón, Nuria; Moreno Llorca, Ricardo Antonio; Ros-Candeira, Andrea; Navarro, Carlos Javier; Elghouat, Akram; ARENAS-CASTRO, SALVADOR; Irati, Nieto Pacho; Manuel, Merino Ceballos; Luis F., Romero (2025). Ecological niche models for mapping cultural ecosystem services (CES) [Dataset]. https://produccioncientifica.ugr.es/documentos/688b602217bb6239d2d48d67
    Explore at:
    Dataset updated
    2025
    Authors
    Pérez-Girón, José Carlos; Martínez-López, Javier; Alcaraz-Segura, Domingo; Tabik, Siham; Molina Cabrera, Daniel; del Águila, Ana; Khaldi, Rohaifa; Pistón, Nuria; Moreno Llorca, Ricardo Antonio; Ros-Candeira, Andrea; Navarro, Carlos Javier; Elghouat, Akram; ARENAS-CASTRO, SALVADOR; Irati, Nieto Pacho; Manuel, Merino Ceballos; Luis F., Romero
    Description


    This dataset includes the inputs and outputs generated in the spatial modeling of CES using social media data for eight mountain parks in Spain and Portugal (Aigüestortes, Sierra de Guadarrama, Ordesa, Peneda-Gerês, Picos de Europa, Sierra de las Nieves, Sierra Nevada and Teide). This spatial modeling is addressed in the article in preparation entitled: "What drives cultural ecosystem services in mountain protected areas? An AI-assisted answer using social media."

    The variables used as inputs to generate the models come from different sources:

    - CES presence points come from social media photos (Flickr and Twitter) labeled using AI models and validated by experts. The models used for automatic labeling were DINOv2 and OpenAI's GPT-4.1. Consensus was sought between these two label sources; the consensus labels showed F1 values above 0.75 and were used as presence data.

    The environmental variables used are mainly derived from the sources described in Table 2 below.

    The models were generated with the maximum entropy (MaxEnt) algorithm using the biomod2 R package, leveraging its suitability for presence-only data, low sample sizes, and mixed predictor types. To address sampling bias, we generated 10 pseudo-absence replicates based on the “target-group background” method. Models were evaluated using AUC-ROC and True Skill Statistic (TSS), with performance validation via 10-fold cross-validation, resulting in 100 runs per model. Ensemble models were created from runs with AUC-ROC > 0.6, using the median for spatial projections of CES and the coefficient of variation to estimate uncertainty. We implemented two modelling approaches: one assuming consistent CES preferences across parks, and another assuming park-specific preferences shaped by local environmental contexts.
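    The modelling itself used the biomod2 R package; purely as an illustration of the ensemble rule just described (keep runs with AUC-ROC > 0.6, project the median, use the coefficient of variation for uncertainty), here is a small NumPy sketch. All names and the toy data are hypothetical, not the authors' implementation.

        import numpy as np

        def ensemble_projection(run_predictions, run_auc, auc_threshold=0.6):
            """Combine per-run habitat-suitability maps as described above.

            run_predictions: (n_runs, H, W) array of per-run projections.
            run_auc:         (n_runs,) AUC-ROC score of each run.
            Returns the median ensemble map and a coefficient-of-variation map.
            """
            selected = run_predictions[run_auc > auc_threshold]
            median_map = np.median(selected, axis=0)
            cv_map = selected.std(axis=0) / (selected.mean(axis=0) + 1e-9)
            return median_map, cv_map

        # 100 runs per model (10 pseudo-absence sets x 10-fold CV), toy data:
        rng = np.random.default_rng(1)
        preds = rng.random((100, 64, 64))
        aucs = rng.uniform(0.5, 0.9, size=100)
        ces_map, uncertainty = ensemble_projection(preds, aucs)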

    Table 1. Categories used in social media photo tagging (Stoten), based on the scientific framework proposed by Moreno-Llorca et al. (2020) (https://doi.org/10.1016/j.scitotenv.2020.140067):

    Stoten categories: Cultural; Fauna/Flora; Gastronomy; Nature & Landscape; Not relevant; Recreational; Religious; Rural tourism; Sports; Sun and beach; Urban

    Table 2. Table of contents of the dataset (folder, format, description):

    Inputs
      Base layers (by National Park)
        100-meter grid: grid_wgs84_atrib (.shp) - 100 x 100 meter grid for each of the studied national parks that covers the study area
        Biosphere Reserve: MAB_wgs84 (.shp) - Biosphere reserve layers present in each of the national parks studied
        Municipality: Municipality (.shp) - Layers of municipalities that overlap with the park area, biosphere reserve, Natura 2000 and the socioeconomic influence area, with a 100-meter buffer
        National park limit: National_park_limit (.shp) - Boundaries of each of the national parks studied
        Natura 2000: RN2000 (.shp) - Natura 2000 layers for each of the national parks studied
        Socioeconomic influence area: AIS (.shp) - Area of socioeconomic influence of each of the parks studied
        Readme (.txt) - File containing layer metadata, including download locations and descriptions of shape attributes
      by National Park
        Accessibility (.tif) - Accessibility variables that include routes, streets, parking, and train tracks
        Climate (.tif) - Chelsea-derived climate variable layers and solar radiation layers
        Ecosystem functioning (.tif) - Layers derived from remote sensing that are related to the functional attributes of ecosystems
        Ecosystem structure (.tif) - Landscape and spectral diversity metrics
        Geodiversity (.tif) - Topographic and derived variables
        Land use Land cover (.tif) - Layers related to land use and cover
        Tourism and Culture (.tif) - Layers related to infrastructure associated with tourism, such as bars, restaurants and lodgings, and places of cultural interest such as monuments
    Scripts
      Modeling to get output data
        Biomod_modelling_by_park (.R) - Script used for modeling CES using data from social media by fitting one ENM for each park and CES
        Biomod_modelling_all_parks (.R) - Script used for modeling CES using data from social media by fitting one ENM for each CES
      Downloading and processing variables
        EFAS
          EFAs code (.js) - GEE scripts used to download the Ecosystem Functional Attributes (EFAs) (Paruelo et al. 2001; Alcaraz-Segura et al. 2006) derived from the Sentinel 2 dataset for each of the national parks studied
        OSM
          1) Download layers (.py) - Python scripts used to download the OpenStreetMap layers of interest for each of the national parks studied
          2) Join layers (.py) - Scripts used to merge OSM layers belonging to the same category, e.g. primary, secondary, and tertiary highways
          3) Count point (.py) - Scripts used to count the number of points in each cell of the 100 m grid for each park, used in the case of point type data
          4) Presence and absence (.py) - Scripts used to assess presence in each cell of the 100 m grid for each park, used in the case of data types such as points, lines, and polygons
        Remote sensing
          Canopy (.js) - GEE scripts used to download the canopy data (https://gee-community-catalog.org/projects/canopy/), downloaded and cropped for each of the national parks studied
          ESPI (.js) - GEE scripts used to download the ESPI index (Ecosystem Service Provision Index), downloaded and cropped for each of the national parks studied
          European disturbance map (.js) - GEE scripts used to download European disturbance maps (https://www.eea.europa.eu/data-and-maps/figures/biogeographical-regions-in-europe-2), downloaded and cropped for each of the national parks studied
          LST (.js) - GEE scripts used to download LST maps (from the Landsat Collection), downloaded and cropped for each of the national parks studied
          Night lights (.js) - GEE scripts used to download nighttime light maps (https://developers.google.com/earth-engine/datasets/catalog/NOAA_VIIRS_DNB_ANNUAL_V22), downloaded and cropped for each of the national parks studied
          Population density (.js) - GEE scripts used to download population density maps (https://developers.google.com/earth-engine/datasets/catalog/CIESIN_GPWv411_GPW_Population_Density), downloaded and cropped for each of the national parks studied
          Soil groups (.js) - GEE scripts used to download Hydrologic Soil Group maps (https://gee-community-catalog.org/projects/hihydro_soil/), downloaded and cropped for each of the national parks studied
          Solar radiation (.js) - GEE scripts used to download solar radiation maps (https://globalsolaratlas.info/support/faq), downloaded and cropped for each of the national parks studied
        RGB diversity
          Seasonal KMeans clustering (.js) - GEE scripts used to calculate seasonal clusters using Sentinel 2 RGB bands with GEE's .wekaKMeans algorithm; these layers were downloaded and cropped for each of the national parks studied (see the sketch after this table)
          Colour diversity analysis (.R) - R script used to calculate spectral diversity (Shannon, Simpson and inverse Simpson) using the cluster layers and RGB bands derived from Sentinel 2
        Post processing
          Align_and_Clip_rasters (.py) - Python scripts used to align and clip the downloaded layers to a 100-meter grid reference layer for each of the national parks studied
    Outputs
      CES projections
        proj_Aiguestortes_Sports_ensemble (.tif) - Spatial projections for the best models obtained for each CES and park
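    As referenced in the table above, a minimal Earth Engine Python sketch of that kind of seasonal KMeans clustering on Sentinel 2 RGB bands could look like the following. The region, season window, sample size, and cluster count are placeholder assumptions, not the authors' settings.

        import ee

        ee.Initialize()

        roi = ee.Geometry.Rectangle([-3.6, 36.9, -2.6, 37.3])  # e.g. Sierra Nevada

        # Cloud-filtered median RGB composite for one season.
        rgb = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
               .filterBounds(roi)
               .filterDate('2022-06-01', '2022-09-01')
               .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 10))
               .select(['B4', 'B3', 'B2'])
               .median())

        # Sample pixels, train a wekaKMeans clusterer, and label the image.
        training = rgb.sample(region=roi, scale=10, numPixels=5000, seed=42)
        clusterer = ee.Clusterer.wekaKMeans(10).train(training)
        clusters = rgb.cluster(clusterer)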

    References:

    Alcaraz-Segura, D., Paruelo, J., and Cabello, J. 2006: Identification of current ecosystem functional types in the Iberian Peninsula, Global Ecol. Biogeogr., 15, 200–212, https://doi.org/10.1111/j.1466-822X.2006.00215.x

    Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, H.P., Kessler, M., 2017. Climatologies at high resolution for the earth’s land surface areas. Sci Data 4, 170122. https://doi.org/10.1038/sdata.2017.122

    Lobo, J.M., Jiménez-Valverde, A., Hortal, J., 2010. The uncertain nature of absences and their importance in species distribution modelling. Ecography 33, 103–114. https://doi.org/10.1111/j.1600-0587.2009.06039.x

    Paruelo, J. M., Jobbágy, E. G., and Sala, O. E. 2001: Current Distribution of Ecosystem Functional Types in Temperate South America, Ecosystems, 4, 683–698, https://doi.org/10.1007/s10021-001-0037-9

    Phillips, S.J., Anderson, R.P., Schapire, R.E., 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231–259. https://doi.org/10.1016/j.ecolmodel.2005.03.026

    Phillips, S.J., Dudík, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., Ferrier, S., 2009. Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications 19, 181–197. https://doi.org/10.1890/07-2153.1

    Thuiller, W., Georges, D., Gueguen, M., Engler, R., Breiner, F., Lafourcade, B., Patin, R., 2023. biomod2: Ensemble Platform for Species Distribution Modeling.

    Sillero, N., Arenas-Castro, S., Enriquez‐Urzelai, U., Vale, C.G., Sousa-Guedes, D., Martínez-Freiría, F., Real,

  14. Data from: Australian Coastline 50K 2024 (NESP MaC 3.17, AIMS)

    • data.gov.au
    • researchdata.edu.au
    html, png
    Updated Jun 23, 2025
    Australian Ocean Data Network (2025). Australian Coastline 50K 2024 (NESP MaC 3.17, AIMS) [Dataset]. https://www.data.gov.au/data/dataset/australian-coastline-50k-2024-nesp-mac-3-17-aims
    Explore at:
    html, pngAvailable download formats
    Dataset updated
    Jun 23, 2025
    Dataset authored and provided by
    Australian Ocean Data Network
    Area covered
    Australia
    Description

    This dataset corresponds to land area polygons of the Australian coastline and surrounding islands. It was generated from 10 m Sentinel 2 imagery from 2022 - 2024 using the Normalized Difference Water Index (NDWI) to distinguish land from water. It was estimated from composite imagery made up of images where the tide is above the mean sea level. The coastline approximately corresponds to the mean high water level. This dataset was created as part of the NESP MaC 3.17 northern Australian Reef mapping project. It was developed to allow the inshore edge of digitised fringing reef features to be neatly clipped to the land areas without requiring manual digitisation of the neighbouring coastline. This required a coastline polygon with an edge positional error of below 50 m so as to not distort the shape of small fringing reefs.

    We found that existing coastline datasets such as the Geodata Coast 100K 2004 and the Australian Hydrographic Office (AHO) Australian land and coastline dataset did not meet our needs. The scale of the Geodata Coast 100K 2004 was too coarse to represent small islands, and the positional error of the AHO Australian land and coastline dataset was too high (typically 80 m) for our application, as the errors would have introduced significant errors in the shape of small fringing reefs. The Digital Earth Australia Coastline (GA) dataset was sufficiently accurate and detailed; however, the format of the data was unsuitable for our application as the coast was expressed as disconnected line features between rivers, rather than a closed polygon of the land areas. We did, however, base our approach on the process developed for the DEA coastline described in Bishop-Taylor et al., 2021 (https://doi.org/10.1016/j.rse.2021.112734), adapting it to our existing Sentinel 2 Google Earth Engine processing pipeline. The difference between the approach used for the DEA coastline and this dataset is that the DEA coastline performed the tidal calculations and filtering at the pixel level, whereas in this dataset we only estimated a single tidal level for each whole Sentinel image scene. This was done for computational simplicity and to align with our existing Google Earth Engine image processing code. The images in the stack were sorted by this tidal estimate and those with a tide height greater than the mean sea level were combined into the composite. The Sentinel 2 satellite follows a sun synchronous orbit and so does not observe the full range of tidal levels. This observed tidal range varies spatially due to the relative timing of peak tides with satellite image timing. We made no accommodation for variation in the tidal levels of the images used to calculate the coastline, other than selecting images that were above the mean tide level. This means the tidal height that the dataset coastline corresponds to will vary spatially. While this approach is less precise than that used in the DEA Coastline, the resulting errors were sufficiently low to meet the project goals. This simplified approach was chosen because it integrated well with our existing Sentinel 2 processing pipeline for generating composite imagery.

    To verify the accuracy of this dataset we manually checked the generated coastline against high resolution imagery (ArcGIS World Imagery). We found that 90% of the coastline polygons in this dataset have a horizontal position error of less than 20 m when compared to high-resolution imagery, except for isolated failure cases. During our manual checks we identified specific scenarios, or 'failure modes', where our algorithm struggled to distinguish between land and water. These are shown in the image "Potential failure modes":
    a) The coastline is pushed out due to breaking waves (example: western coast, S2 tile ID 49KPG).
    b) False land polygons are created because of very turbid water due to suspended sediment. In clear water areas the near infrared channel is almost black, starkly different to the bright land areas. In very highly turbid waters the suspended sediment appears in the near infrared channel, raising its brightness to a level where it starts to overlap with the brightness of the dimmest land features (example: Joseph Bonaparte Gulf, S2 tile ID 52LEJ). This results in turbid rivers not being correctly mapped. In version 1-1 of the dataset the rivers across northern Australia were manually corrected for these failures.
    c) Very shallow, gently sloping areas are not recognised as water and the coastline is pushed out (example: Mornington Island, S2 tile ID 54KUG). Update: a second review of this area indicated that the mapped coastline is likely to be very close to the true coastline.
    d) The coastline is lower than the mean high water level (example: Great Keppel (Wop-pa) Island, S2 tile ID 55KHQ).
    Some of these potential failure modes could probably be addressed in the future by using a higher resolution tide calculation and using adjusted NDWI thresholds per region to accommodate regional differences. Some of these failure modes are likely due to the near infrared channel (B8) being able to penetrate the water approximately 0.5 m, leading to errors in very shallow areas. Some additional failures include:
    - Interpreting jetties as land
    - Interpreting oil rigs as land
    - Bridges being interpreted as land, cutting off rivers

    Methods: The coastline polygons were created in four separate steps:
    1. Create above mean sea level (AMSL) composite images.
    2. Calculate the Normalized Difference Water Index (NDWI) and visualise it as a grey scale image.
    3. Generate vector polygons from the grey scale image using an NDWI threshold.
    4. Clean up and merge polygons.

    To create the AMSL composite images, multiple Sentinel 2 images were combined using the Google Earth Engine. The core algorithm was:
    1. For each Sentinel 2 tile, filter the "COPERNICUS/S2_HARMONIZED" image collection by:
    - tile ID
    - maximum cloud cover 20%
    - date between '2022-01-01' and '2024-06-30'
    - asset_size > 100000000 (remove small fragments of tiles)
    2. Remove high sun-glint images (see "High sun-glint image detection" for more information).
    3. Split images by "SENSING_ORBIT_NUMBER" (see "Using SENSING_ORBIT_NUMBER for a more balanced composite" for more information).
    4. Iterate over all images in the split collections to predict the tide elevation for each image from the image timestamp (see "Tide prediction" for more information).
    5. Remove images where the tide elevation is below mean sea level.
    6. Select a maximum of 200 images with AMSL tide elevation.
    7. Combine the SENSING_ORBIT_NUMBER collections into one image collection.
    8. Remove sun-glint and apply atmospheric correction to each image (see "Sun-glint removal and atmospheric correction" for more information).
    9. Duplicate the image collection to first create a composite image without cloud masking, using the 15th percentile of the images in the collection (i.e. for each pixel the 15th percentile value of all images is used).
    10. Apply cloud masking to all images in the original image collection (see "Cloud Masking" for more information) and create a composite using the 15th percentile of the images in the collection.
    11. Combine the two composite images (no cloud mask composite and cloud mask composite). This solves the problem of some coral cays and islands being misinterpreted as clouds and therefore creating holes in the composite image. These holes are "plugged" with the underlying composite without cloud masking (Lawrey et al. 2022).

    Next, for each image the NDWI was calculated (a minimal sketch of this step follows this description):
    1. Calculate the normalised difference using the B3 (green) and B8 (near infrared) bands.
    2. Shift the value range from between -1 and +1 to values between 1 and 255 (0 is reserved as the no-data value).
    3. Export the image as an 8 bit unsigned integer grey scale image.

    During the next step, we generated vector polygons from the grey scale image using an NDWI threshold:
    1. Upscale the image to 5 m resolution using bilinear interpolation. This was to help smooth the coastline and reduce the error introduced by the jagged pixel edges.
    2. Apply a threshold to create a binary image (see "NDWI Threshold" for more information) with the value 1 for land and 2 for water (0: no data).
    3. Create polygons for land values (1) in the binary image.
    4. Export as a shapefile.

    Finally, we created a single layer from the vectorised images:
    1. Merge and dissolve all vector layers in QGIS.
    2. Perform smoothing (QGIS toolbox, Iterations 1, Offset 0.25, Maximum node angle to smooth 180).
    3. Perform simplification (QGIS toolbox, tolerance 0.00003).
    4. Remove polygon vertices on the inner circle to fill out continental Australia.
    5. Perform manual QA/QC. In this step we removed false polygons created due to sun glint and breaking waves. We also removed very small features (1 - 1.5 pixel sized features, e.g. single mangrove trees) by calculating the area of each feature (in m2) and removing features smaller than 200 m2.

    15th percentile composite: The composite image was created using the 15th percentile of the pixel values in the image stack. The 15th percentile was chosen, in preference to the median, to select darker pixels in the stack, as these tend to correspond to images with clearer water conditions and higher tides.

    High sun-glint image detection: Images with high sun-glint can lead to lower quality composite images. To determine high sun-glint images, a land mask was first applied to the image to retain only water pixels. This land mask was estimated using NDWI. The proportion of the water pixels in the near-infrared and short-wave infrared bands above a sun-glint threshold was calculated, and images with a high proportion were then filtered out of the image collection.

    Sun-glint removal and atmospheric correction: The Top of Atmosphere L1 …
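    A minimal Earth Engine Python sketch of the NDWI grey-scale step described above follows; the composite input and the function name are assumptions, and the land/water threshold itself is tuned per dataset and is not given here.

        import ee

        ee.Initialize()

        def ndwi_grey(composite):
            # Normalised difference of B3 (green) and B8 (NIR), i.e. NDWI,
            # giving values in -1 .. +1.
            ndwi = composite.normalizedDifference(['B3', 'B8'])
            # Rescale -1..+1 to 1..255 for an 8-bit export (0 = no-data).
            return (ndwi.add(1).divide(2).multiply(254).add(1)
                    .toUint8())

        # A later threshold on the grey image separates land (1) from water (2).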

