After 2022-01-25, Sentinel-2 scenes with PROCESSING_BASELINE '04.00' or above have their DN (value) range shifted by 1000. The HARMONIZED collection shifts data in newer scenes to be in the same range as in older scenes. Sentinel-2 is a wide-swath, high-resolution, multi-spectral imaging mission supporting Copernicus Land Monitoring studies, including the …
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sentinel2GlobalLULC is a deep learning-ready dataset of RGB images from the Sentinel-2 satellites designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.1 contains 194,877 images in GeoTiff and JPEG format corresponding to 29 broad LULC classes. Each image has 224 x 224 pixels at 10 m spatial resolution and was produced by assigning the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020 in order to remove atmospheric effects (e.g., clouds, aerosols, shadows, snow). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE).
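The percentile compositing described above can be illustrated in NumPy (a minimal sketch of the idea, not the actual GEE export pipeline):

```python
import numpy as np

# Hypothetical stack of T observations of one RGB tile: shape (T, H, W, 3).
# Bright outliers (clouds) sit high in the per-pixel distribution and dark
# ones (shadows) low, so a low percentile such as the 25th suppresses
# clouds while avoiding the darkest shadow values.
def percentile_composite(stack, q=25):
    """Per-pixel q-th percentile across the time axis, ignoring NaN gaps."""
    return np.nanpercentile(stack, q, axis=0)

stack = np.stack([
    np.full((2, 2, 3), 0.10),   # clear observation
    np.full((2, 2, 3), 0.12),   # clear observation
    np.full((2, 2, 3), 0.90),   # cloudy observation (bright outlier)
])
composite = percentile_composite(stack)
```

The composite keeps a value near the clear observations (0.11 per pixel here) and discards the cloudy one entirely.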
Our dataset is structured into 3 main zip-compressed folders, an Excel file with a dictionary for class names and descriptive statistics per LULC class, and a Python script to convert RGB GeoTiff images into JPEG format. The first folder, "Sentinel2LULC_GeoTiff.zip", contains 29 zip-compressed subfolders, each corresponding to a specific LULC class with hundreds to thousands of GeoTiff Sentinel-2 RGB images. The second folder, "Sentinel2LULC_JPEG.zip", contains 29 zip-compressed subfolders with a JPEG-formatted version of the same images provided in the first main folder. The third folder, "Sentinel2LULC_CSV.zip", includes 29 zip-compressed CSV files with as many rows as provided images and 12 columns containing the following metadata (the same metadata is encoded in the image filenames):
For seven LULC classes, we could not export from GEE all images that fulfilled a spatial purity of 100% since there were millions of them. In this case, we exported a stratified random sample of 14,000 images and provided an additional CSV file with the images actually contained in our dataset. That is, for these seven LULC classes, we provide these 2 CSV files:
To clearly state the geographical coverage of the images available in this dataset, version v2.1 includes a compressed folder called "Geographic_Representativeness.zip". It contains a CSV file for each LULC class with the complete list of countries represented in that class. Each CSV file has two columns: the first gives the country code and the second gives the number of images provided in that country for that LULC class. In addition to these 29 CSV files, we provide another CSV file that maps each ISO Alpha-2 country code to its full country name.
© Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is marked with Attribution 4.0 International (CC-BY 4.0)
The S2 cloud probability is created with the sentinel2-cloud-detector library (using LightGBM). All bands are upsampled using bilinear interpolation to 10m resolution before the gradient-boosting-based algorithm is applied. The resulting 0..1 floating point probability is scaled to 0..100 and stored as a UINT8. Areas missing any or all …
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The SEN12TP dataset (Sentinel-1 and -2 imagery, timely paired) contains 2319 scenes of Sentinel-1 radar and Sentinel-2 optical imagery, together with elevation and land cover information, for 1236 distinct ROIs acquired between 28 March 2017 and 31 December 2020. Each scene has a size of 20 km x 20 km at 10 m pixel spacing. The time difference between optical and radar images is at most 12 h, and for almost all scenes it is around 6 h, since the orbits of Sentinel-1 and -2 are offset by roughly that amount. In addition to the \(\sigma^\circ\) radar backscatter, the radiometrically terrain-corrected \(\gamma^\circ\) backscatter is also calculated and included. \(\gamma^\circ\) values are calculated using the volumetric model presented by Vollrath et al. (2020).
The uncompressed dataset has a size of 222 GB and is split spatially into a train (~90%) and a test set (~10%). For easier download the train set is split into four separate zip archives.
When using the dataset, please cite the following paper, in which its design and creation are detailed:
T. Roßberg and M. Schmitt. A globally applicable method for NDVI estimation from Sentinel-1 SAR backscatter using a deep neural network and the SEN12TP dataset. PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 2023. https://doi.org/10.1007/s41064-023-00238-y.
The file sen12tp-metadata.json includes metadata for the selected scenes: for each scene the geometry, IDs for the ROI and the scene, the climate and land cover information used when sampling the central point, the timestamps (in ms) at which the Sentinel-1 and -2 images were taken, the month of the year, and the EPSG code of the local UTM grid (e.g. EPSG:32643 - WGS 84 / UTM zone 43N).
Naming scheme: The images are contained in directories called {roi_id}_{scene_id}, as for some unique regions image pairs of multiple dates are included. In each directory are six files for the different modalities with the naming {scene_id}_{modality}.tif. Multiple modalities are included: radar backscatter and multispectral optical images, the elevation as DSM (digital surface model) and different land cover maps.
Name | Modality | GEE collection |
---|---|---|
s1 | Sentinel-1 radar backscatter | COPERNICUS/S1_GRD |
s2 | Sentinel-2 Level-2A (bottom-of-atmosphere, BOA) multispectral optical data with added cloud probability band | COPERNICUS/S2_SR, COPERNICUS/S2_CLOUD_PROBABILITY |
dsm | 30 m digital surface model | JAXA/ALOS/AW3D30/V3_2 |
worldcover | land cover, 10 m resolution | ESA/WorldCover/v100 |
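The naming scheme above can be parsed with a small helper (a sketch; the string splitting is an assumption and only valid if roi_id itself contains no underscore):

```python
from pathlib import Path

# Modality names from the table above.
MODALITIES = {"s1", "s2", "dsm", "worldcover"}

def parse_scene_path(path):
    """Split '{roi_id}_{scene_id}/{scene_id}_{modality}.tif' into its parts."""
    p = Path(path)
    # Last underscore in the file stem separates scene_id from modality.
    scene_id, _, modality = p.stem.rpartition("_")
    # First underscore in the directory name separates roi_id from scene_id.
    roi_id, _, dir_scene_id = p.parent.name.partition("_")
    assert modality in MODALITIES and dir_scene_id == scene_id
    return {"roi_id": roi_id, "scene_id": scene_id, "modality": modality}
```

For example, `parse_scene_path("123_456/456_s1.tif")` yields the ROI id, scene id, and modality of that file.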
The following bands are included in the tif files; for further explanation, see the documentation on GEE. All bands are resampled to 10 m resolution and reprojected to the coordinate reference system of the Sentinel-2 image.
Modality | Band count | Band names in tif file | Notes |
---|---|---|---|
s1 | 5 | VV_sigma0, VH_sigma0, VV_gamma0flat, VH_gamma0flat, incAngle | VV/VH_sigma0 are the \(\sigma^\circ\) values; VV/VH_gamma0flat are the radiometrically terrain-corrected \(\gamma^\circ\) backscatter values; incAngle is the incidence angle |
s2 | 13 | B1, B2, B3, B4, B5, B6, B7, B8, B8A, B9, B11, B12, cloud_probability | multispectral optical bands plus the probability that a pixel is cloudy, calculated with the sentinel2-cloud-detector library; optical reflectances are bottom-of-atmosphere (BOA) reflectances calculated using sen2cor |
dsm | 1 | DSM | Height above sea level, signed 16-bit. Elevation (in metres) converted from the ellipsoidal height based on ITRF97 and GRS80, using the EGM96 geoid model. |
worldcover | 1 | Map | Land cover class |
Checking the file integrity
After downloading and decompressing, file integrity can be checked using the provided file of MD5 checksums.
Under Linux: md5sum --check --quiet md5sums.txt
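On platforms without md5sum, an equivalent check can be sketched in Python (hypothetical helper; it assumes the two-space "checksum  filename" lines that md5sum writes):

```python
import hashlib

def md5_of(path, chunk=1 << 20):
    """Stream a file through MD5 so large archives aren't loaded into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def check_md5sums(listing="md5sums.txt"):
    """Verify each 'checksum  filename' line; return the files that fail."""
    bad = []
    with open(listing) as lines:
        for line in lines:
            expected, _, name = line.strip().partition("  ")
            if md5_of(name) != expected:
                bad.append(name)
    return bad
```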
References:
Vollrath, Andreas, Adugna Mullissa, Johannes Reiche (2020). "Angular-Based Radiometric Slope Correction for Sentinel-1 on Google Earth Engine". In: Remote Sensing 12.1, Art no. 1867. https://doi.org/10.3390/rs12111867.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Links:
Sentinel-2: Cloud Probability in Earth Engine's Public Data Catalog
Sentinel-2: Cloud Probability in Earth Engine STAC viewed with STAC Browser
The S2 cloud probability is created with the sentinel2-cloud-detector library (using LightGBM). All bands are upsampled using bilinear interpolation to 10m resolution before the gradient-boosting-based algorithm is applied. The resulting 0..1 floating point probability is scaled to 0..100 and stored as a UINT8. Areas missing any or all of the bands are masked out. Higher values are more likely to be clouds or highly reflective surfaces (e.g. roof tops or snow).
Sentinel-2 is a wide-swath, high-resolution, multi-spectral imaging mission supporting Copernicus Land Monitoring studies, including the monitoring of vegetation, soil and water cover, as well as observation of inland waterways and coastal areas.
The Level-2A data can be found in the collection COPERNICUS/S2_SR. The Level-1C data can be found in the collection COPERNICUS/S2. Additional metadata is available on assets in those collections.
See this tutorial explaining how to apply the cloud mask.
https://spdx.org/licenses/CC0-1.0.html
Landsat and Sentinel-2 acquisitions are among the most frequently used medium-resolution (i.e., 10-30 m) optical data. The data are extensively used in terrestrial vegetation applications, including but not limited to, land cover and land use mapping, vegetation condition and phenology monitoring, and disturbance and change mapping. While the Landsat archives alone provide over 40 years, and counting, of continuous and consistent observations, since mid-2015 Sentinel-2 has enabled a revisit frequency of up to 2 days. Although the spatio-temporal availability of both data archives is well-known at the scene level, information on the actual availability of usable (i.e., cloud-, snow-, and shade-free) observations at the pixel level needs to be explored for each study to ensure correct parametrization of the used algorithms, and thus the robustness of subsequent analyses. However, a priori data exploration is time- and resource-consuming and thus rarely performed. As a result, the spatio-temporal heterogeneity of usable data is often inadequately accounted for in the analysis design, risking ill-advised selection of algorithms and hypotheses, and thus inferior quality of the final results. Here we present a global dataset comprising precomputed daily availability of usable Landsat and Sentinel-2 data sampled at a pixel level in a regular 0.18°-point grid. We based the dataset on the complete 1982-2024 Landsat surface reflectance data (Collection 2) and 2015-2024 Sentinel-2 top-of-the-atmosphere reflectance scenes (pre-Collection-1 and Collection-1). Derivation of cloud-, snow-, and shade-free observations followed the methodology developed in our recent study on data availability over Europe (Lewińska et al., 2023; https://doi.org/10.20944/preprints202308.2174.v2). Furthermore, we expanded the dataset with growing season information derived from the 2001-2019 time series of the yearly 500 m MODIS land cover dynamics product (MCD12Q2; Collection 6).
As such, our dataset presents a unique overview of the spatio-temporal availability of usable daily Landsat and Sentinel-2 data at the global scale, hence offering much-needed a priori information aiding the identification of appropriate methods and challenges for terrestrial vegetation analyses at the local to global scales. The dataset can be viewed using the dedicated GEE App (link in Related Works). As of February 2025 the dataset has been extended with the 2024 data.

Methods

We based our analyses on freely and openly accessible Landsat and Sentinel-2 data archives available in Google Earth Engine (Gorelick et al., 2017). We used all Landsat surface reflectance Level 2, Tier 1, Collection 2 scenes acquired with the Thematic Mapper (TM) (Earth Resources Observation And Science (EROS) Center, 1982), Enhanced Thematic Mapper (ETM+) (Earth Resources Observation And Science (EROS) Center, 1999), and Operational Land Imager (OLI) (Earth Resources Observation And Science (EROS) Center, 2013) scanners between 22nd August 1982 and 31st December 2024, and Sentinel-2 TOA reflectance Level-1C scenes (pre-Collection-1 (European Space Agency, 2015, 2021) and Collection-1 (European Space Agency, 2022)) acquired with the MultiSpectral Instrument (MSI) between 23rd June 2015 and 31st December 2024. We implemented a conservative pixel-quality screening to identify cloud-, snow-, and shade-free land pixels. For the Landsat time series, we relied on the inherent pixel quality bands (Foga et al., 2017; Zhu & Woodcock, 2012), excluding all pixels flagged as cloud, snow, or shadow as well as pixels with the fill-in value of 20,000 (scale factor 0.0001; (Zhang et al., 2022)). Furthermore, due to the Landsat 7 orbit drift (Qiu et al., 2021) we excluded all ETM+ scenes acquired after 31st December 2020.
Because Sentinel-2 Level-2A quality masks lack the desired scope and accuracy (Baetens et al., 2019; Coluzzi et al., 2018), we resorted to Level-1C scenes accompanied by the supporting Cloud Probability product. Furthermore, we employed a selection of conditions, including a threshold on Band 10 (SWIR-Cirrus), which is not available at Level‑2A. Overall, our Sentinel-2-specific cloud, shadow, and snow screening comprised:
- exclusion of all pixels flagged as clouds and cirrus in the inherent 'QA60' cloud mask band;
- exclusion of all pixels with cloud probability >50% as defined in the corresponding Cloud Probability product available for each scene;
- exclusion of cirrus clouds (B10 reflectance >0.01);
- exclusion of clouds based on Cloud Displacement Analysis (CDI < -0.5) (Frantz et al., 2018);
- exclusion of dark pixels (B8 reflectance <0.16) within cloud shadows modelled for each scene with scene-specific sun parameters for the clouds identified in the previous steps, assuming a cloud height of 2,000 m;
- exclusion of pixels within a 40-m buffer (two pixels at 20-m resolution) around each identified cloud and cloud-shadow object;
- exclusion of snow pixels identified with the snow-mask branch of the Sen2Cor processor (Main-Knorn et al., 2017).
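Several of the per-pixel threshold tests above can be sketched in NumPy (array names are assumptions; the QA60, buffering, and Sen2Cor snow steps are omitted in this sketch):

```python
import numpy as np

def usable_mask(cloud_prob, b10, cdi, b8, in_modelled_shadow):
    """Combine the per-pixel screening thresholds: cloud probability >50%,
    cirrus B10 reflectance >0.01, CDI < -0.5, and dark pixels (B8 < 0.16)
    inside the modelled shadow zones."""
    cloudy = (cloud_prob > 50) | (b10 > 0.01) | (cdi < -0.5)
    shadowed = in_modelled_shadow & (b8 < 0.16)
    return ~(cloudy | shadowed)

mask = usable_mask(
    cloud_prob=np.array([10.0, 80.0]),      # second pixel fails the 50% test
    b10=np.array([0.005, 0.005]),
    cdi=np.array([0.0, 0.0]),
    b8=np.array([0.3, 0.3]),
    in_modelled_shadow=np.array([False, False]),
)
```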
Through applying the data screening, we generated a collection of daily availability records for the Landsat and Sentinel-2 data archives. We next subsampled the resulting binary time series with a regular 0.18° x 0.18°-point grid defined in the EPSG:4326 projection, obtaining 475,150 points located over land between 179.8867°W and 179.5733°E and between 83.50834°N and 59.05167°S. Owing to the substantial amount of data comprised in the Landsat and Sentinel-2 archives and the computationally demanding process of cloud-, snow-, and shade-screening, we performed the subsampling in batches corresponding to a 4° x 4° regular grid and consolidated the final data in post-processing. We derived the pixel-specific growing season information from the 2001-2019 time series of the yearly 500-m MODIS land cover dynamics product (MCD12Q2; Collection 6) available in Google Earth Engine. We only used information on the start and the end of a growing season, excluding all pixels with quality below 'best'. When a pixel went through more than one growing cycle per year, we approximated a growing season as the period between the beginning of the first growing cycle and the end of the last growing cycle. To fill in data gaps arising from low-quality data and insufficiently pronounced seasonality (Friedl et al., 2019), we used a 5x5 mean moving window filter to ensure better spatial continuity of our growing season datasets. Following Lewińska et al. (2023), we defined the start of the season as the pixel-specific 25th percentile of the 2001-2019 distribution of start-of-season dates, and the end of the season as the pixel-specific 75th percentile of the 2001-2019 distribution of end-of-season dates. Finally, we subsampled the start and end of the season datasets with the same regular 0.18° x 0.18°-point grid defined in the EPSG:4326 projection.

References:
Baetens, L., Desjardins, C., & Hagolle, O. (2019). Validation of Copernicus Sentinel-2 Cloud Masks Obtained from MAJA, Sen2Cor, and FMask Processors Using Reference Cloud Masks Generated with a Supervised Active Learning Procedure. Remote Sensing, 11(4), 433. https://doi.org/10.3390/rs11040433
Coluzzi, R., Imbrenda, V., Lanfredi, M., & Simoniello, T. (2018). A first assessment of the Sentinel-2 Level 1-C cloud mask product to support informed surface analyses. Remote Sensing of Environment, 217, 426–443. https://doi.org/10.1016/j.rse.2018.08.009
Earth Resources Observation And Science (EROS) Center. (1982). Collection-2 Landsat 4-5 Thematic Mapper (TM) Level-1 Data Products [dataset]. U.S. Geological Survey. https://doi.org/10.5066/P918ROHC
Earth Resources Observation And Science (EROS) Center. (1999). Collection-2 Landsat 7 Enhanced Thematic Mapper Plus (ETM+) Level-1 Data Products [dataset]. U.S. Geological Survey. https://doi.org/10.5066/P9TU80IG
Earth Resources Observation And Science (EROS) Center. (2013). Collection-2 Landsat 8-9 OLI (Operational Land Imager) and TIRS (Thermal Infrared Sensor) Level-1 Data Products [dataset]. U.S. Geological Survey. https://doi.org/10.5066/P975CC9B
European Space Agency. (2015). Sentinel-2 MSI Level-1C TOA Reflectance [dataset]. European Space Agency. https://doi.org/10.5270/S2_-d8we2fl
European Space Agency. (2021). Sentinel-2 MSI Level-1C TOA Reflectance, Collection 0 [dataset]. European Space Agency. https://doi.org/10.5270/S2_-d8we2fl
European Space Agency. (2022). Sentinel-2 MSI Level-1C TOA Reflectance [dataset]. European Space Agency. https://doi.org/10.5270/S2_-742ikth
Foga, S., Scaramuzza, P. L., Guo, S., Zhu, Z., Dilley, R. D., Beckmann, T., Schmidt, G. L., Dwyer, J. L., Joseph Hughes, M., & Laue, B. (2017). Cloud detection algorithm comparison and validation for operational Landsat data products. Remote Sensing of Environment, 194, 379–390. https://doi.org/10.1016/j.rse.2017.03.026
Frantz, D., Haß, E., Uhl, A., Stoffels, J., & Hill, J. (2018). Improvement of the Fmask algorithm for Sentinel-2 images: Separating clouds from bright surfaces based on parallax effects. Remote Sensing of Environment, 215, 471–481. https://doi.org/10.1016/j.rse.2018.04.046
Friedl, M., Josh, G., & Sulla-Menashe, D. (2019). MCD12Q2 MODIS/Terra+Aqua Land Cover Dynamics Yearly L3 Global 500m SIN Grid V006 [dataset]. NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/MODIS/MCD12Q2.006
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27. https://doi.org/10.1016/j.rse.2017.06.031
Lewińska, K.E., Ernst, S., Frantz, D., Leser, U., & Hostert, P. (2024). Global Overview of Usable Landsat and Sentinel-2 Data for 1982–2023. Data in Brief, 57, 111054. https://doi.org/10.1016/j.dib.2024.111054
Main-Knorn, M., Pflug, B., Louis, J., Debaecker, V., Müller-Wilm, U., & Gascon, F. (2017). Sen2Cor for Sentinel-2. In L. Bruzzone, F. Bovolo,
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Land use/land cover (LULC) mapping in fragmented landscapes, characterized by multiple and small land uses, is still a challenge. This study aims to evaluate the effectiveness of Synthetic Aperture Radar (SAR) and multispectral optical data in land cover mapping using Google Earth Engine (GEE), a cloud computing platform allowing big geospatial data analysis. The proposed approach combines multi-source satellite imagery for accurate land cover classification in a fragmented municipal territory in Southern Italy over a 5-month vegetative period. Within the GEE platform, a set of Sentinel-1, Sentinel-2, and Landsat 8 data was acquired and processed to generate a land cover map for the 2021 greenness period. A supervised pixel-based classification was performed, using a Random Forest (RF) machine learning algorithm, to classify the imagery and derived spectral indices into eight land cover classes. Classification accuracy was assessed using Overall Accuracy (OA), Producer's and User's accuracies (PA, UA), and F-score. McNemar's test was applied to assess the significance of differences between classification results. The integrated optical datasets in combination with SAR data and derived indices (NDVI, GNDVI, NDBI, VHVV) produce the most accurate LULC map among those produced (OA: 89.64%), while SAR-only datasets yielded the lowest accuracy (OA: 61.30%). The classification process offers several advantages, including widespread spectral information, SAR's ability to capture imagery in almost all weather, day and night, and the computation of vegetation indices in the near-infrared spectral interval, within a short revisit time. The proposed digital techniques for processing multi-temporal satellite data provide useful tools for understanding territorial and environmental dynamics, supporting decision-making in land use planning, agricultural expansion, and environmental management in fragmented landscapes.
Sentinel-2 is a constellation of two Earth observation satellites, developed under the direction of the European Space Agency, as part of the European Commission’s ambitious Copernicus Earth observation program. The wide-swath, multi-spectral imaging capabilities of the Sentinel-2 satellites provide an unprecedented view of our Earth, covering all of the Earth’s land masses, large islands, and waterways. Sentinel-2 data is ideal for agriculture, forestry, and other land management applications. For example, it can be used to study leaf area as well as chlorophyll and water content; to map forest cover and soils; and to monitor inland waterways and coastal areas. Images of natural disasters such as floods and volcanic eruptions can be used for disaster mapping and humanitarian relief efforts. The Sentinel-2 mission consists of two identical satellites: Sentinel-2A, launched on June 23, 2015, and Sentinel-2B, launched in 2017. With both satellites launched, the constellation can revisit each point on the Earth's surface every five days. Each satellite carries a Multi-Spectral Instrument (MSI) that produces images of the Earth with a resolution as fine as ten meters per pixel and spanning a 290 km field of view in thirteen bands across the visible and infrared. This dataset includes a Pub/Sub topic you can subscribe to in order to be notified of updates. Subscribe to the topic 'projects/gcp-public-data---sentinel-2/topics/gcp-public-data-sentinel-2'. Use the Pub/Sub Quickstarts guide to learn more. Thanks to the free, full, and open data policy of the European Commission and European Space Agency, this dataset is available free as part of the Google Public Cloud Data program. It can be used by anyone as part of Google Cloud.
Cloud Score+ is a quality assessment (QA) processor for medium-to-high resolution optical satellite imagery. The Cloud Score+ S2_HARMONIZED dataset is being operationally produced from the harmonized Sentinel-2 L1C collection, and Cloud Score+ outputs can be used to identify relatively clear pixels and effectively remove clouds and cloud shadows from L1C (Top-of-Atmosphere) or L2A (Surface Reflectance) imagery. The Cloud Score+ S2_HARMONIZED dataset includes two QA bands, cs and cs_cdf, that both grade the usability of individual pixels with respect to surface visibility on a continuous scale between 0 and 1, where 0 represents "not clear" (occluded), while 1 represents "clear" (unoccluded) observations. The cs band scores QA based on a spectral distance between the observed pixel and a (theoretical) clear reference observation, while the cs_cdf band represents the likelihood an observed pixel is clear based on an estimated cumulative distribution of scores for a given location through time. In other words, cs can be thought of as a more instantaneous atmospheric similarity score (i.e., how similar is this pixel to what we'd expect to see in a perfectly clear reference), while cs_cdf captures an expectation of the estimated score through time (i.e., if we had all the scores for this pixel through time, how would this score rank?). Images in the Cloud Score+ S2_HARMONIZED collection have the same id and system:index properties as the individual Sentinel-2 L1C assets from which they were produced such that Cloud Score+ bands can be linked to source images based on their shared system:index. Cloud Score+ backfill for the entire Sentinel-2 archive is currently in progress and Dataset Availability dates will be updated periodically as new results are added to the Cloud Score+ collection. For more information about the Cloud Score+ dataset and modelling approach, see this Medium post.
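The distinction between an instantaneous score and its rank through time can be illustrated with a toy empirical CDF (a conceptual sketch only, not the actual Cloud Score+ model):

```python
import numpy as np

def empirical_cdf_rank(history, score):
    """Fraction of a pixel's historical scores at or below a new score,
    i.e. how the new score ranks within that pixel's own record."""
    history = np.asarray(history)
    return (history <= score).mean()

# A pixel whose scores are usually high: a middling new score of 0.82
# ranks low in its history, even though 0.82 looks "clear" in isolation.
rank = empirical_cdf_rank([0.9, 0.95, 0.85, 0.8, 0.99], 0.82)
```

This is the intuition behind cs_cdf: the same instantaneous score can carry different weight for pixels with different histories.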
The Sentinel-2 Bare Earth thematic product provides the first national-scale mosaic of the Australian continent to support improved mapping of soil and geology. The bare earth algorithm, using all available Sentinel-2A and Sentinel-2B observations up to September 2020, preferentially weights bare pixels through time to significantly reduce the effect of seasonal vegetation in the imagery. The result is image pixels that are more likely to reflect the mineralogy and/or geochemistry of soil and bedrock. The algorithm uses a high-dimensional weighted geometric median approach that maintains the spectral relationships across all Sentinel-2 bands. A similar bare earth algorithm has been applied to Geoscience Australia's deeper Landsat time series archive (please search for "Landsat barest Earth"). Both bare earth products have spectral bands in the visible, near-infrared, and shortwave infrared regions of the electromagnetic spectrum. However, the main visible and near-infrared Sentinel-2 bands have a spatial resolution of 10 meters compared to 30 m for the Landsat TM equivalents. The weighted median approach is robust to outliers (such as cloud, shadows, saturation, corrupted pixels) and also maintains the relationship between all the spectral wavelengths in the spectra observed through time.
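The (unweighted) geometric median at the heart of this approach can be sketched with the classic Weiszfeld iteration; the bare-pixel weighting used in the actual product is not reproduced here:

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-9):
    """Weiszfeld iteration: the point minimising the sum of Euclidean
    distances to all observations. Unlike the mean, it is robust to
    outliers such as cloudy observations."""
    points = np.asarray(points, dtype=float)
    m = points.mean(axis=0)                     # start from the centroid
    for _ in range(iters):
        d = np.linalg.norm(points - m, axis=1)
        w = 1.0 / np.maximum(d, eps)            # inverse-distance weights
        m = (points * w[:, None]).sum(axis=0) / w.sum()
    return m

# Three similar "bare" spectra and one bright cloudy outlier (2 bands here
# for illustration; the real product works across all Sentinel-2 bands):
spectra = [[0.1, 0.2], [0.11, 0.19], [0.09, 0.21], [0.9, 0.95]]
median = geometric_median(spectra)
```

The median lands inside the cluster of bare spectra, essentially ignoring the cloudy outlier, which is what makes the approach robust.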
Not all Sentinel-2 bands have been processed: the atmospheric bands 1, 9 and 10 are excluded. The remaining bands have been renumbered 1-10; the mapping to the original bands is given in brackets: 1 = blue (2), 2 = green (3), 3 = red (4), 4 = vegetation red edge (5), 5 = vegetation red edge (6), 6 = vegetation red edge (7), 7 = NIR (8), 8 = narrow NIR (8A), 9 = SWIR1 (11), 10 = SWIR2 (12).
All 10 bands have been resampled to 10 meters to facilitate band integration and use in machine learning.
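The band renumbering above can be captured as a simple lookup table (an illustrative mapping, not a file shipped with the product):

```python
# Mapping from the renumbered bare-earth bands (1-10) back to the original
# Sentinel-2 band designations listed above.
BARE_EARTH_TO_S2 = {
    1: "B2 (blue)",
    2: "B3 (green)",
    3: "B4 (red)",
    4: "B5 (vegetation red edge)",
    5: "B6 (vegetation red edge)",
    6: "B7 (vegetation red edge)",
    7: "B8 (NIR)",
    8: "B8A (narrow NIR)",
    9: "B11 (SWIR1)",
    10: "B12 (SWIR2)",
}
```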
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Region of Interest (ROI) comprises Belgium, the Netherlands, and Luxembourg
We use the communes administrative division, which is standardized across Europe by EUROSTAT: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units This is roughly equivalent to the notion of municipalities in most countries.
From the link above, commune definitions are taken from COMM_RG_01M_2016_4326.shp and country borders from NUTS_RG_01M_2021_3035.shp.
images: Sentinel-2 RGB from 2020-01-01 to 2020-12-31, with cloudy pixels filtered out according to the QA60 band, following the example given on the GEE dataset info page: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED
see also https://github.com/rramosp/geetiles/blob/main/geetiles/defs/sentinel2rgbmedian2020.py
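The QA60-based cloud filtering referenced above relies on two bit flags (bit 10 = opaque clouds, bit 11 = cirrus, per the catalog page); a NumPy sketch:

```python
import numpy as np

# QA60 cloud mask bits: bit 10 flags opaque clouds, bit 11 flags cirrus.
CLOUD_BIT, CIRRUS_BIT = 1 << 10, 1 << 11

def is_clear(qa60):
    """True where neither the opaque-cloud nor the cirrus bit is set."""
    qa60 = np.asarray(qa60)
    return (qa60 & (CLOUD_BIT | CIRRUS_BIT)) == 0

# Clear pixel, opaque-cloud pixel, cirrus pixel:
clear = is_clear([0, 1 << 10, 1 << 11])
```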
labels: Global Human Settlement Layers, Population Grid 2015
labels range from 0 to 31, with the following meaning:
label value | original value in GEE dataset
---|---
0 | 0
1 | 1-10
2 | 11-20
3 | 21-30
... | ...
31 | >=291
see https://developers.google.com/earth-engine/datasets/catalog/JRC_GHSL_P2016_POP_GPW_GLOBE_V1
see also https://github.com/rramosp/geetiles/blob/main/geetiles/defs/humanpop2015.py
_aschips.geojson: the image chip geometries along with label proportions, for easy visualization with QGIS, GeoPandas, etc.
_communes.geojson: the commune geometries with their label proportions, for easy visualization with QGIS, GeoPandas, etc.
splits.csv: two splits of the image chips into train, test, and val: one with geographical bands at 45° angles in NW-SE direction, and the same splits reorganized so that all chips within the same commune fall within the same split.
data/: a pickle file for each image chip containing a dict with the 100x100 RGB Sentinel-2 chip image, the 100x100 chip-level labels, the label proportions of the chip, and the aggregated label proportions of the commune the chip belongs to.
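Reading one of these pickle files might look like the following sketch (the file layout is assumed from the description above; the dict key names are not specified here):

```python
import pickle

def load_chip(path):
    """Load one chip's dict: per the description it holds the 100x100 RGB
    image, the 100x100 chip-level labels, the chip's label proportions,
    and the commune-level aggregated label proportions."""
    with open(path, "rb") as f:
        return pickle.load(f)
```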
Description
This dataset contains Sentinel-2 raw bands and reanalysis (NDVI, EVI, TCG) with valid-pixel fractions per field. Each parquet file contains multiple sunflower crop fields (identified by field_id). Each field has 36 timestamps per year, corresponding to 10-day-interval aggregations of the scaled and filtered band and reanalysis values. Shapefiles of field polygons were loaded from Google Earth Engine (GEE) project assets.
Content
Country: Hungary Fields: Sunflower… See the full description on the dataset page: https://huggingface.co/datasets/jaelin215/sentinel-2-reanalysis-NDVI-EVI-TCG.
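The NDVI and EVI indices named above follow standard formulas over Sentinel-2 bands (B8 = NIR, B4 = red, B2 = blue); a sketch assuming reflectances scaled to [0, 1], since this page does not state the dataset's own scaling:

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue):
    """Enhanced Vegetation Index with the standard coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

v = ndvi(np.array([0.5]), np.array([0.1]))
```

(TCG, the tasseled cap greenness, is a band-weighted linear combination and is omitted here.)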
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sentinel-2 L2A 120m mosaic is a derived product, which contains best pixel values for 10-daily periods, modelled by removing the cloudy pixels and then performing interpolation among remaining values. As clouds can be missed and as there are some parts of the world which have lengthy cloudy periods, clouds might be remaining in some parts. The actual modelling script is available here.
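The mosaicking approach described above (remove cloudy pixels, then interpolate among the remaining values) can be illustrated for a single pixel's time series (a conceptual NumPy sketch, not the actual modelling script):

```python
import numpy as np

def interpolate_gaps(values, valid):
    """Linearly interpolate one pixel's time series across positions
    where the observation was removed as cloudy."""
    t = np.arange(len(values))
    return np.interp(t, t[valid], np.asarray(values)[valid])

# Four 10-daily periods; the middle two were flagged as cloudy:
series = np.array([0.2, np.nan, np.nan, 0.5])
valid = ~np.isnan(series)
filled = interpolate_gaps(series, valid)
```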
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Images and 2-class labels for semantic segmentation of Sentinel-2 and Landsat RGB, NIR, and SWIR satellite images of coasts (water, other)
Images and 2-class labels for semantic segmentation of Sentinel-2 and Landsat 5-band (R+G+B+NIR+SWIR) satellite images of coasts (water, other)
Description
3649 images and 3649 associated labels for semantic segmentation of Sentinel-2 and Landsat 5-band (R+G+B+NIR+SWIR) satellite images of coasts. The 2 classes are 1=water, 0=other. Imagery is a mixture of 10-m Sentinel-2 and 15-m pansharpened Landsat 7, 8, and 9 imagery of various sizes; only the red, green, blue, near-infrared, and short-wave infrared bands are included.
These images and labels could be used within numerous Machine Learning frameworks for image segmentation, but have specifically been made for use with the Doodleverse software package, Segmentation Gym**.
Two data sources have been combined:
Dataset 1
Dataset 2
3070 image-label pairs from the Sentinel-2 Water Edges Dataset (SWED)***** dataset, https://openmldata.ukho.gov.uk/, described by Seale et al. (2022)******
A subset of the original SWED imagery (256 x 256 x 12) and labels (256 x 256 x 1) has been chosen, based on the criterion that more than 2.5% of the pixels represent water.
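The 2.5% selection criterion can be expressed as follows (a sketch; the actual subsetting code is not part of this release):

```python
import numpy as np

def keep_chip(label, min_water_frac=0.025):
    """Keep a chip only if more than min_water_frac of its pixels are
    water (label value 1)."""
    return (np.asarray(label) == 1).mean() > min_water_frac

# A 256x256 label with 8 full rows of water (~3.1% of pixels) passes:
label = np.zeros((256, 256), dtype=int)
label[:8, :] = 1
```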
File descriptions
classes.txt, a file containing the class names
images.zip, a zipped folder containing the 3-band RGB images of varying sizes and extents
labels.zip, a zipped folder containing the 1-band label images
nir.zip, a zipped folder containing the 1-band near-infrared (NIR) images
swir.zip, a zipped folder containing the 1-band shortwave infrared (SWIR) images
overlays.zip, a zipped folder containing a semi-transparent overlay of the color-coded label on the image (red=1=water, blue=0=other)
resized_images.zip, RGB images resized to 512x512x3 pixels
resized_labels.zip, label images resized to 512x512x1 pixels
resized_nir.zip, NIR images resized to 512x512x1 pixels
resized_swir.zip, SWIR images resized to 512x512x1 pixels
References
*Doodler: Buscombe, D., Goldstein, E.B., Sherwood, C.R., Bodine, C., Brown, J.A., Favela, J., Fitzpatrick, S., Kranenburg, C.J., Over, J.R., Ritchie, A.C. and Warrick, J.A., 2021. Human‐in‐the‐Loop Segmentation of Earth Surface Imagery. Earth and Space Science, p. e2021EA002085. https://doi.org/10.1029/2021EA002085. See https://github.com/Doodleverse/dash_doodler.
**Segmentation Gym: Buscombe, D., & Goldstein, E. B. (2022). A reproducible and reusable pipeline for segmentation of geoscientific imagery. Earth and Space Science, 9, e2022EA002332. https://doi.org/10.1029/2022EA002332 See: https://github.com/Doodleverse/segmentation_gym
***Coast Train data release: Wernette, P.A., Buscombe, D.D., Favela, J., Fitzpatrick, S., and Goldstein E., 2022, Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation: U.S. Geological Survey data release, https://doi.org/10.5066/P91NP87I. See https://coasttrain.github.io/CoastTrain/ for more information
****Buscombe, Daniel. (2022). Images and 4-class labels for semantic segmentation of Sentinel-2 and Landsat RGB, NIR, and SWIR satellite images of coasts (water, whitewater, sediment, other) (v1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7344571
*****Seale, C., Redfern, T., Chatfield, P. 2022. Sentinel-2 Water Edges Dataset (SWED) https://openmldata.ukho.gov.uk/
******Seale, C., Redfern, T., Chatfield, P., Luo, C. and Dempsey, K., 2022. Coastline detection in satellite imagery: A deep learning approach on new benchmark data. Remote Sensing of Environment, 278, p.113044.
Dynamic World is a 10m near-real-time (NRT) Land Use/Land Cover (LULC) dataset that includes class probabilities and label information for nine classes. Dynamic World predictions are available for the Sentinel-2 L1C collection from 2015-06-27 to present. The revisit frequency of Sentinel-2 is between 2-5 days depending on latitude. Dynamic World predictions are generated for Sentinel-2 L1C images with CLOUDY_PIXEL_PERCENTAGE <= 35%. Predictions are masked to remove clouds and cloud shadows using a combination of S2 Cloud Probability, Cloud Displacement Index, and Directional Distance Transform. Images in the Dynamic World collection have names matching the individual Sentinel-2 L1C asset names from which they were derived, e.g: ee.Image('COPERNICUS/S2/20160711T084022_20160711T084751_T35PKT') has a matching Dynamic World image named: ee.Image('GOOGLE/DYNAMICWORLD/V1/20160711T084022_20160711T084751_T35PKT'). All probability bands except the "label" band collectively sum to 1. To learn more about the Dynamic World dataset and see examples for generating composites, calculating regional statistics, and working with the time series, see the Introduction to Dynamic World tutorial series. Given Dynamic World class estimations are derived from single images using a spatial context from a small moving window, top-1 "probabilities" for predicted land covers that are in-part defined by cover over time, like crops, can be comparatively low in the absence of obvious distinguishing features. High-return surfaces in arid climates, sand, sunglint, etc may also exhibit this phenomenon. To select only pixels that confidently belong to a Dynamic World class, it is recommended to mask Dynamic World outputs by thresholding the estimated "probability" of the top-1 prediction.
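The recommended confidence masking (threshold the estimated probability of the top-1 prediction) can be sketched outside Earth Engine with NumPy. This is an illustrative sketch, not the Earth Engine API; the 0.5 threshold and the (classes, H, W) array layout are assumptions.

```python
import numpy as np

def confident_label(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Top-1 class per pixel, or -1 where the top-1 probability is below threshold.

    probs: (n_classes, H, W) stack of per-class probability bands, mirroring
    Dynamic World's probability bands (the "label" band excluded).
    """
    top1 = probs.max(axis=0)
    label = probs.argmax(axis=0)
    return np.where(top1 >= threshold, label, -1)

# 3 hypothetical classes over a 1 x 2 pixel patch; each pixel's probabilities sum to 1
probs = np.array([[[0.70, 0.20]],
                  [[0.20, 0.45]],
                  [[0.10, 0.35]]])
print(confident_label(probs))  # second pixel is masked (-1): top-1 is only 0.45
```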
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Machine learning algorithms have been widely adopted in ecosystem monitoring. British Columbia suffers from grassland degradation, but the province does not have an accurate spatial database for effective grassland management; moreover, computational power and storage space remain two of the limiting factors in developing such a database. In this study, we leverage supervised machine learning algorithms in Google Earth Engine to build a better annual grassland inventory through an automated process. The pilot study was conducted over the Rocky Mountain district. We compared two classification algorithms: the Random forest and the Support vector machine. Training data were sampled through stratified and gridded sampling. 19 predictor variables were chosen from Sentinel-1 and Sentinel-2 imagery and relevant topographic derivatives, spectral indices, and textural indices using a wrapper-based feature selection method. The resultant map was post-processed to remove land features that were confounded with grasslands. Random forest was chosen as the prototype because the algorithm predicted features relevant to the project's scope at relatively higher accuracy (67% - 86%) than its counterpart (50% - 76%). The prototype was good at delineating the boundaries between treed and non-treed areas and at ferreting out open patches among closed forests. These open patches are usually disregarded by the VRI, but they are deemed essential to grassland stewards and wildlife ecologists. The prototype demonstrated the feasibility of automating grassland delineation with a Random forest classifier in the Google Earth Engine. Furthermore, grassland stewards can use the product to strategically identify monitoring and restoration areas in the future.
Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
This dataset contains composite satellite images for the Coral Sea region based on 10 m resolution Sentinel 2 imagery from 2015 – 2021. This image collection is intended to allow mapping of the reef and island features of the Coral Sea. This is a draft version of the dataset, prepared from approximately 60% of the available Sentinel 2 imagery. An improved version of this dataset has since been released at https://doi.org/10.26274/NH77-ZW79.
This collection contains composite imagery for 31 Sentinel 2 tiles in the Coral Sea. For each tile there are 5 different colour and contrast enhancement styles intended to highlight different features. These include:
- DeepFalse
- Bands: B1 (ultraviolet), B2 (blue), B3 (green): False colour image that shows deep marine features to 50 - 60 m depth. This imagery exploits the clear waters of the Coral Sea to allow the ultraviolet band to provide a much deeper view of coral reefs than is typically achievable with true colour imagery. This technique doesn't work where the water is less clear, because ultraviolet light is easily scattered.
- DeepMarine
- Bands: B2 (blue), B3 (green), B4 (red): This is a contrast enhanced version of the true colour imagery, focused on making deeper features easier to see. Shallow features are overexposed due to the increased contrast.
- ReefTop
- Bands: B4 (red): This imagery is contrast enhanced to create a near-binary (black and white) mask of reef tops, delineating areas that are shallower or deeper than approximately 4 - 5 m. This mask is intended to assist in the creation of a GIS layer equivalent to the 'GBR Dry Reefs' dataset. The depth mapping exploits the limited water penetration of the red channel: in clear water the red channel can only see features to approximately 6 m, regardless of the substrate type.
- Shallow
- Bands: B5 (red edge), B8 (near infrared), B11 (shortwave infrared): This false colour imagery focuses on identifying very shallow and dry regions in the imagery. It exploits the property that the longer wavelength bands progressively penetrate the water less: B5 penetrates approximately 3 - 5 m, B8 approximately 0.5 m, and B11 less than 0.1 m. Features less than a couple of metres deep appear dark blue; dry areas are white.
- TrueColour
- Bands: B2 (blue), B3 (green), B4 (red): True colour imagery. This is useful for interpreting what shallow features are, for mapping the vegetation on cays, and for identifying beach rock.
For most Sentinel tiles there are two versions of the DeepFalse and DeepMarine imagery based on different collections (dates). The R1 imagery are composites made up from the best available imagery while the R2 imagery uses the next best set of imagery. This splitting of the imagery is to allow two composites to be created from the pool of available imagery so that mapped features could be checked against two images. Typically the R2 imagery will have more artefacts from clouds.
The satellite imagery was processed in tiles (approximately 100 x 100 km) to keep each final image small enough to manage. The dataset only covers the portion of the Coral Sea where there are shallow coral reefs.
# Methods:
The satellite image composites were created by combining multiple Sentinel 2 images using the Google Earth Engine. The core algorithm was:
1. For each Sentinel 2 tile, the set of Sentinel images from 2015 – 2021 were reviewed manually. In some tiles the cloud cover threshold was raised to gather more images, particularly if there were less than 20 images available. The Google Earth Engine image IDs of the best images were recorded. These were the images with the clearest water, lowest waves, lowest cloud, and lowest sun glint.
2. A composite image was created from the best images by taking the statistical median of the stack of images selected in the previous stage, after masking out clouds and their shadows (described in detail later).
3. The contrast of the images was enhanced to create a series of products for different uses. The true colour image retained the full range of tones visible, so that bright sand cays still retained some detail. The marine enhanced version stretched the blue, green and red channels so that they focused on the deeper, darker marine features. This stretching was done to ensure that, when converted to 8-bit colour imagery, all the dark detail in the deeper areas remained visible. This contrast enhancement resulted in bright areas of the imagery clipping, leading to loss of detail in shallow reef areas and to land colours looking unnatural. A reef top estimate was produced from the red channel (B4), where the contrast was stretched so that the imagery contains an almost binary mask. The threshold was chosen to approximate the 5 m depth contour for the clear waters of the Coral Sea. Lastly a false colour image was produced to allow mapping of shallow water features such as cays and islands. This image was produced from B5 (red edge), B8 (NIR) and B11 (SWIR), where blue represents depths from approximately 0.5 – 5 m, green represents areas with 0 – 0.5 m depth, and brown and white correspond to dry land.
4. The various contrast enhanced composite images were exported from Google Earth Engine (as 32 bit GeoTiff by default) and reprocessed into smaller LZW-compressed 8 bit GeoTiff images using GDAL.
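Steps 2 and 3 above can be sketched in NumPy, assuming the selected scenes have already been cloud-masked (masked pixels set to NaN). This is an illustrative sketch, not the actual Earth Engine pipeline; the 2nd/98th percentile stretch stands in for the product-specific contrast enhancements.

```python
import numpy as np

def median_composite(stack: np.ndarray) -> np.ndarray:
    """Per-pixel median of a (n_images, H, W) stack, ignoring NaN (masked) pixels."""
    return np.nanmedian(stack, axis=0)

def stretch_to_8bit(img: np.ndarray, lo_pct: float = 2, hi_pct: float = 98) -> np.ndarray:
    """Linear percentile stretch to the 0-255 range, clipping bright/dark extremes."""
    lo, hi = np.nanpercentile(img, [lo_pct, hi_pct])
    scaled = np.clip((img - lo) / (hi - lo), 0.0, 1.0)
    return (scaled * 255).astype(np.uint8)

# Example: three 1 x 1 pixel scenes, one fully cloud-masked (NaN)
stack = np.array([[[1.0]], [[3.0]], [[np.nan]]])
comp = median_composite(stack)  # median of the two unmasked scenes
```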
## Cloud Masking
Prior to combining the best images each image was processed to mask out clouds and their shadows.
The cloud masking uses the COPERNICUS/S2_CLOUD_PROBABILITY dataset developed by SentinelHub (Google, n.d.; Zupanc, 2017). The mask includes the cloud areas, plus a mask to remove cloud shadows. The cloud shadows were estimated by projecting the cloud mask in the direction opposite the angle to the sun. The shadow distance was estimated in two parts.
A low cloud mask was created based on the assumption that small clouds have a small shadow distance. These were detected using a 40% cloud probability threshold. These were projected over 400 m, followed by a 150 m buffer to expand the final mask.
A high cloud mask was created to cover longer shadows created by taller, larger clouds. These clouds were detected based on an 80% cloud probability threshold, followed by an erosion and dilation of 300 m to remove small clouds. These were then projected over a 1.5 km distance followed by a 300 m buffer.
The parameters for the cloud masking (probability threshold, projection distance and buffer radius) were determined through trial and error on a small number of scenes. As such there are probably significant potential improvements that could be made to this algorithm.
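The shadow projection described above can be sketched as repeated shifts of a binary cloud mask away from the sun. Pixel-grid rolling stands in for Earth Engine's directional projection; the angle convention (azimuth clockwise from north, rows increasing southward) and the step size are assumptions of this sketch.

```python
import numpy as np

def project_shadows(cloud: np.ndarray, sun_azimuth_deg: float,
                    distance_px: int, step_px: int = 1) -> np.ndarray:
    """Union of a binary cloud mask with copies shifted in the anti-sun direction."""
    az = np.deg2rad(sun_azimuth_deg)
    # Azimuth measured clockwise from north; rows increase southward,
    # so the shadow (anti-sun) direction is (cos az, -sin az).
    drow, dcol = np.cos(az), -np.sin(az)
    mask = cloud.astype(bool).copy()
    for d in range(step_px, distance_px + 1, step_px):
        # np.roll wraps at the image edges; fine for a sketch, a real
        # implementation would pad instead.
        shifted = np.roll(cloud, (int(round(d * drow)), int(round(d * dcol))),
                          axis=(0, 1))
        mask |= shifted.astype(bool)
    return mask
```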
Erosion, dilation and buffer operations were performed at a lower image resolution than the native satellite image resolution to improve the computational speed. The resolution of these operations was adjusted so that they were performed with approximately a 4 pixel resolution. This made the cloud mask significantly more spatially coarse than the 10 m Sentinel imagery. This resolution was chosen as a trade-off between the coarseness of the mask versus the processing time for these operations. Even with 4-pixel filter resolutions these operations still accounted for over 90% of the total processing time, resulting in each image taking approximately 10 min to compute on the Google Earth Engine.
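The coarse-resolution trick can be sketched as: block-downsample the mask, dilate at the low resolution, then upsample the result. The 3x3 shift-based maximum below is an illustrative stand-in for Earth Engine's focal operations; the factor-of-4 downsampling follows the text.

```python
import numpy as np

def dilate3x3(mask: np.ndarray) -> np.ndarray:
    """Binary dilation with a 3x3 structuring element, via 9 shifted views."""
    padded = np.pad(mask, 1)
    return np.max([padded[i:i + mask.shape[0], j:j + mask.shape[1]]
                   for i in range(3) for j in range(3)], axis=0)

def coarse_dilate(mask: np.ndarray, factor: int = 4) -> np.ndarray:
    """Dilate at 1/factor resolution, then upsample back to full resolution."""
    h, w = mask.shape
    # Block-downsample: a coarse cell is set if any fine pixel in it is set.
    coarse = mask.reshape(h // factor, factor, w // factor, factor).max(axis=(1, 3))
    # Upsample the dilated coarse mask by pixel replication.
    return np.kron(dilate3x3(coarse), np.ones((factor, factor), dtype=mask.dtype))
```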
## Sun glint removal and atmospheric correction
Sun glint was removed from the images using the infrared B8 band to estimate the reflection off the water from the sun glint. B8 penetrates water less than 0.5 m, so over water it only detects reflections off the surface. The sun glint detected by B8 correlates very highly with the sun glint experienced by the ultraviolet and visible channels (B1, B2, B3 and B4), and so the sun glint in these channels can be removed by subtracting B8 from them.
This simple sun glint correction fails in very shallow and land areas. On land areas B8 is very bright and thus subtracting it from the other channels results in black land. In shallow areas (< 0.5 m) the B8 channel detects the substrate, resulting in too much sun glint correction. To resolve these issues the sun glint correction was adjusted by transitioning to B11 for shallow areas as it penetrates the water even less than B8. We don't use B11 everywhere because it is half the resolution of B8.
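The two-band glint correction can be sketched as below. This is an illustrative sketch of the idea, not the authors' code: the fixed B8 threshold used to switch to B11 in shallow water is a hypothetical stand-in for the actual transition logic, and reflectance values are assumed to be in the 0-1 range.

```python
import numpy as np

def deglint(visible: np.ndarray, b8: np.ndarray, b11: np.ndarray,
            shallow_b8: float = 0.1) -> np.ndarray:
    """Subtract the surface-reflection estimate from a visible band.

    Where B8 is bright it is probably seeing the substrate (shallow water)
    rather than the surface, so switch to B11, which penetrates water even less.
    """
    glint = np.where(b8 > shallow_b8, b11, b8)
    return np.clip(visible - glint, 0.0, None)

# Deep water: B8 is dark, so B8 itself is subtracted.
# Shallow water: B8 is bright (substrate), so the darker B11 is subtracted instead.
deep = deglint(np.array([0.5]), np.array([0.05]), np.array([0.01]))
shallow = deglint(np.array([0.5]), np.array([0.30]), np.array([0.01]))
```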
Land areas need their tonal levels to be adjusted to match the water areas after sun glint correction. Ideally this would be achieved using an atmospheric correction that compensates for the contrast loss due to haze in the atmosphere. Complex models for atmospheric correction consider the elevation of the surface (higher areas have less atmosphere to pass through) and the weather conditions. Since this dataset is focused on coral reef areas, elevation compensation is unnecessary because the land features being imaged are very low and flat. Additionally, the focus of the dataset is on marine features, so only a basic atmospheric correction is needed. Land areas (as determined by very bright B8 areas) were assigned a fixed, smaller correction factor to approximate atmospheric correction. This fixed correction was determined iteratively, so that land areas matched the tonal value of shallow and water areas.
## Image selection
Available Sentinel 2 images with a cloud cover of less than 0.5% were manually reviewed using a Google Earth Engine App (01-select-sentinel2-images.js). Where there were few images available (less than 30), the cloud cover threshold was raised to increase the set of images reviewed.
Images were excluded from the composites primarily for two main reasons.
After 2022-01-25, Sentinel-2 scenes with PROCESSING_BASELINE '04.00' or above have their DN (value) range shifted by 1000. The HARMONIZED collection shifts data in newer scenes to be in the same range as in older scenes. Sentinel-2 is a wide-swath, high-resolution, multi-spectral imaging mission supporting Copernicus Land Monitoring studies, including the monitoring of vegetation, soil and water cover, as well as observation of inland waterways and coastal areas. The Sentinel-2 L2 data are downloaded from CDSE; they were computed by running sen2cor. WARNING: 2017-2018 L2 coverage in the EE collection is not yet global. The assets contain 12 UINT16 spectral bands representing SR scaled by 10000 (unlike in L1 data, there is no B10), plus several L2-specific bands (see the band list for details). For more details, see the Sentinel-2 User Handbook. QA60 is a bitmask band that contained rasterized cloud mask polygons until 2022-01-25, when the production of these polygons was stopped. Starting 2024-02-28, the previously consistent QA60 band is constructed from the MSK_CLASSI cloud classification bands; for more details, see the full explanation of how cloud masks are computed. EE asset IDs for Sentinel-2 L2 assets have the format COPERNICUS/S2_SR/20151128T002653_20151128T102149_T56MNN: the first numeric part is the sensing date and time, the second numeric part is the product generation date and time, and the final 6-character string is a unique granule identifier indicating its UTM grid reference (see MGRS). For datasets that assist with cloud and cloud shadow detection, see COPERNICUS/S2_CLOUD_PROBABILITY and GOOGLE/CLOUD_SCORE_PLUS/V1/S2_HARMONIZED. For more details on Sentinel-2 radiometric resolution, see this page.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This study constructed the first global PCGs dataset with a high spatial resolution of 10 meters based on deep learning and active learning. Specifically, we first divided the globe into 2,592 grids with a size of 5°×5° and retained those containing cropland as classification units. Then, we obtained pre-processed multi-temporal Sentinel-2 data through GEE and used random forest to generate initial labels to build the training dataset. Next, we developed a classification workflow that integrates active learning and deep learning to optimize weak labels, enhance model robustness, and reduce false positives. Subsequently, we used the trained deep learning model to predict the global distribution of PCGs, generating the Global-PCG-10 dataset. Experimental results show that the global PCGs area is approximately 14,259.85 km² in 2020. PCGs are mainly distributed between 30°N and 40°N, accounting for about 65.84% of the total area. Asia holds the most extensive area of PCGs, covering approximately 9,874.5 km², or 69.24% of the global total. China not only has the largest area of PCGs in Asia but also ranks first worldwide, with a PCGs area of 8,224.90 km², making up 57.67% of the global total and 83.29% of Asia's. We validated the Global-PCG-10 dataset using 46,000 randomly sampled points, which indicates a satisfactory overall accuracy of 98.04% ± 0.12%.