What is this dataset?
Nearly 10,000 km² of free high-resolution and matched low-resolution satellite imagery of unique locations, chosen to ensure stratified representation of all types of land use across the world: from agriculture to ice caps, from forests to multiple urbanization densities.
Those locations are also enriched with sites typically under-represented in ML datasets: sites of humanitarian interest, illegal mining sites, and settlements of persons at risk.
Each high-resolution image (1.5 m/pixel) comes with multiple temporally-matched low-resolution images from the freely accessible lower-resolution Sentinel-2 satellites (10 m/pixel).
We accompany this dataset with a paper, a datasheet for datasets, and an open-source Python package to rebuild or extend the WorldStrat dataset, train and run inference with baseline algorithms, and learn from abundant tutorials, all compatible with the popular EO-learn toolbox.
Why make this?
We hope to foster broad-spectrum applications of ML to satellite imagery and, possibly, to achieve from free public low-resolution Sentinel-2 imagery the same power of analysis currently afforded only by costly private high-resolution imagery. We illustrate this specific point by training and releasing several highly compute-efficient baselines on the task of Multi-Frame Super-Resolution.
Licences
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
validation
The data are 475 thematic land cover rasters at 2 m resolution. Land cover was classified into the classes: Tree (1), Water (2), Barren (3), Other Vegetation (4), and Ice & Snow (8). Cloud cover and shadow were sometimes coded as Cloud (5) and Shadow (6); however, for any land cover application these would be considered NoData. Some rasters may have Cloud and Shadow pixels coded or recoded to NoData already. Commercial high-resolution satellite data were used to create the classifications. Usable image data for the target year (2010) were acquired for 475 of the 500 primary sample locations, with 90% of images acquired within ±2 years of the 2010 target. The remaining 25 of the 500 sample blocks had no usable data and so could not be mapped. Tabular data are included with the raster classifications indicating the specific high-resolution sensor and date of acquisition of the source imagery, as well as the stratum to which each sample block belonged. Methods for this classification are described in Pengra et al. (2015). A one-stage cluster sampling design was used in which 500 (475 usable) 5 km x 5 km sample blocks were the primary sampling units (note: the nominal size was 5 km x 5 km, but some blocks deviate in dimensions because usable imagery only partially covered the sample block). Sample blocks were selected using stratified random sampling within a sample frame stratified by a modification of the Köppen climate/vegetation classification and population density (Olofsson et al., 2012). Secondary sampling units are each of the classified 2 m pixels of the raster. This design satisfies the criteria that define a probability sampling design and thus serves as the basis for rigorous design-based statistical inference (Stehman, 2000).
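To make that design-based inference concrete, the following is a minimal sketch of the standard stratified estimator (the overall class proportion is the stratum-weighted mean of per-stratum sample means); the stratum names, weights, and per-block class proportions are hypothetical placeholders, not values from this dataset.

```python
# Minimal sketch of a design-based stratified estimate of a class area proportion.
# Stratum weights (share of the study area) and per-block class proportions are
# hypothetical placeholders, not values from this dataset.
strata = {
    "humid_high_density": {"weight": 0.20, "block_proportions": [0.42, 0.38, 0.45]},
    "semi_arid_low_density": {"weight": 0.50, "block_proportions": [0.10, 0.07, 0.12]},
    "boreal": {"weight": 0.30, "block_proportions": [0.65, 0.70]},
}

# Stratified estimator: p_hat = sum_h W_h * p_hat_h, where p_hat_h is the mean
# proportion of the class over the sample blocks in stratum h.
p_hat = sum(
    s["weight"] * (sum(s["block_proportions"]) / len(s["block_proportions"]))
    for s in strata.values()
)
print(f"Estimated class proportion over the study area: {p_hat:.3f}")
```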
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The satellite image of Canada is a composite of several individual satellite images from the Advanced Very High Resolution Radiometer (AVHRR) sensor on board various NOAA satellites. The colours reflect differences in the density of vegetation cover: bright green for dense vegetation in humid southern regions; yellow for semi-arid and for mountainous regions; brown for the north where vegetation cover is very sparse; and white for snow and ice. An inset map shows a satellite image mosaic of North America with 35 land cover classes, based on data from the SPOT satellite VGT (vegetation) sensor.
This data set contains high-resolution QuickBird imagery and geospatial data for the entire Barrow QuickBird image area (156.15° W - 157.07° W, 71.15° N - 71.41° N) and Barrow B4 Quadrangle (156.29° W - 156.89° W, 71.25° N - 71.40° N), for use in Geographic Information Systems (GIS) and remote sensing software. The original QuickBird data sets were acquired by DigitalGlobe from 1 to 2 August 2002, and consist of orthorectified satellite imagery. Federal Geographic Data Committee (FGDC)-compliant metadata for all value-added data sets are provided in text, HTML, and XML formats. Accessory layers include: 1:250,000- and 1:63,360-scale USGS Digital Raster Graphic (DRG) mosaic images (GeoTIFF format); 1:250,000- and 1:63,360-scale USGS quadrangle index maps (ESRI Shapefile format); an index map for the 62 QuickBird tiles (ESRI Shapefile format); and a simple polygon layer of the extent of the Barrow QuickBird image area and the Barrow B4 quadrangle area (ESRI Shapefile format). Unmodified QuickBird data comprise 62 data tiles in Universal Transverse Mercator (UTM) Zone 4 in GeoTIFF format. Standard release files describing the QuickBird data are included, along with the DigitalGlobe license agreement and product handbooks. The baseline geospatial data support education, outreach, and multi-disciplinary research of environmental change in Barrow, which is an area of focused scientific interest. Data are provided on four DVDs. This product is available only to investigators funded specifically from the National Science Foundation (NSF), Office of Polar Programs (OPP), Arctic Sciences Section. An NSF OPP award number must be provided when ordering this data.
Satellite sensor artifacts can negatively impact the interpretation of satellite data. One such artifact is linear features in imagery, which can be caused by a variety of sensor issues and can present as either wide, consistent features called banding or as narrow, inconsistent features called striping. This study used high-resolution data from DigitalGlobe's WorldView-3 satellite collected at Lake Okeechobee, Florida, on 30 August 2017. Although WorldView-3 is primarily designed as a land sensor, this study investigated the impact of vertical artifacts on both at-sensor radiance and a spectral index for an aquatic target. This dataset is not publicly accessible because NGA NextView license agreements prohibit the distribution of original data files from WorldView due to copyright. It can be accessed through the following means: National Geospatial-Intelligence Agency contract details prevent distribution of Maxar data. Questions regarding NextView can be sent to NGANextView_License@nga.mil. Questions regarding the NASA Commercial Data Buy can be sent to yvonne.ivey@nasa.gov. Format: high-resolution data from DigitalGlobe's WorldView-3 satellite. This dataset is associated with the following publication: Coffer, M., P. Whitman, B. Schaeffer, V. Hill, R. Zimmerman, W. Salls, M. Lebrasse, and D. Graybill. Vertical artifacts in high-resolution WorldView-2 and WorldView-3 satellite imagery of aquatic systems. INTERNATIONAL JOURNAL OF REMOTE SENSING. Taylor & Francis, Inc., Philadelphia, PA, USA, 43(4): 1199-1225, (2022).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Dataset Description
This dataset consists of paired high-resolution (HR) and low-resolution (LR) satellite images designed for 4x super-resolution tasks. The images are organized into two directories, one for the HR images and one for the LR images.
All images are geographically aligned and cover the same regions, ensuring pixel-to-pixel correspondence between LR and HR pairs.
Recommended Dataset Split
To ensure robust model training and evaluation, we propose the following 75-15-10 split:
- Training Set (75%): used to train the super-resolution model
- Validation Set (15%): used for hyperparameter tuning
- Test Set (10%): reserved for final evaluation (unseen data to measure model generalization)
Split Methodology:
- Stratified Sampling: if images represent diverse terrains (urban, rural, water), ensure each subset reflects this distribution.
- Non-overlapping Regions: prevent data leakage by splitting across geographically distinct areas (e.g., tiles from different zones); a minimal sketch of such a split follows below.
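As a rough illustration of a region-wise split under these guidelines, the following sketch groups image pairs by a region identifier taken from the filename and assigns whole regions to the training, validation, and test sets. The "LR"/"HR" directory names and the "<region>_<id>.png" filename pattern are assumptions for illustration, not part of the dataset description.

```python
# Sketch: split LR/HR pairs into train/val/test by region so that no region
# appears in more than one subset. Filenames like "regionA_0001.png" and the
# "LR"/"HR" directory names are hypothetical assumptions.
import random
from collections import defaultdict
from pathlib import Path

pairs_by_region = defaultdict(list)
for lr_path in Path("LR").glob("*.png"):
    region = lr_path.stem.split("_")[0]          # hypothetical "<region>_<id>" pattern
    hr_path = Path("HR") / lr_path.name          # pixel-aligned counterpart
    pairs_by_region[region].append((lr_path, hr_path))

regions = sorted(pairs_by_region)
random.seed(0)
random.shuffle(regions)

n = len(regions)
train_regions = regions[: int(0.75 * n)]
val_regions = regions[int(0.75 * n): int(0.90 * n)]
test_regions = regions[int(0.90 * n):]

splits = {
    "train": [p for r in train_regions for p in pairs_by_region[r]],
    "val": [p for r in val_regions for p in pairs_by_region[r]],
    "test": [p for r in test_regions for p in pairs_by_region[r]],
}
print({name: len(pairs) for name, pairs in splits.items()})
```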
https://artefacts.ceda.ac.uk/licences/specific_licences/msg.pdf
The Meteosat Second Generation (MSG) satellites, operated by EUMETSAT (The European Organisation for the Exploitation of Meteorological Satellites), provide almost continuous imagery to meteorologists and researchers in Europe and around the world. These include visible, infra-red, water vapour, High Resolution Visible (HRV) images and derived cloud top height, cloud top temperature, fog, snow detection and volcanic ash products. These images are available for a range of geographical areas.
This dataset contains high resolution visible images from MSG satellites over the UK area. Imagery is available from March 2005 onwards at a frequency of 15 minutes (some images are hourly) and is at least 24 hours old.
The geographic extent for images within this dataset is available via the linked documentation 'MSG satellite imagery product geographic area details'. Each MSG imagery product area can be referenced from the third and fourth characters of the image product name given in the filename. E.g., for EEAO11 the corresponding geographic details can be found under the entry for area code 'AO' (i.e., West Africa).
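For instance, extracting that area code from a product name is a simple string slice, shown here with the example product name above:

```python
# Extract the two-letter area code (third and fourth characters) from an
# MSG image product name, e.g. "EEAO11" -> "AO" (West Africa).
product_name = "EEAO11"
area_code = product_name[2:4]
print(area_code)  # "AO"
```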
High-resolution satellite images can provide abundant, detailed spatial information for land cover classification, which is particularly important for studying the complicated built environment. However, due to complex land cover patterns, the cost of collecting training samples, and the severe distribution shifts of satellite imagery caused by, e.g., geographical differences or acquisition conditions, few studies have applied high-resolution images to land cover mapping in detailed categories at large scale.
We present a large-scale land cover dataset, Five-Billion-Pixels. It contains more than 5 billion labeled pixels of 150 high-resolution Gaofen-2 (4 m) satellite images, annotated in a 24-category system covering artificially constructed, agricultural, and natural classes.
Correspondence of colors (BGR) and categories:
- 0, 0, 0: unlabeled
- 200, 0, 0: industrial area
- 0, 200, 0: paddy field
- 150, 250, 0: irrigated field
- 150, 200, 150: dry cropland
- 200, 0, 200: garden land
- 150, 0, 250: arbor forest
- 150, 150, 250: shrub forest
- 200, 150, 200: park
- 250, 200, 0: natural meadow
- 200, 200, 0: artificial meadow
- 0, 0, 200: river
- 250, 0, 150: urban residential
- 0, 150, 200: lake
- 0, 200, 250: pond
- 150, 200, 250: fish pond
- 250, 250, 250: snow
- 200, 200, 200: bareland
- 200, 150, 150: rural residential
- 250, 200, 150: stadium
- 150, 150, 0: square
- 250, 150, 150: road
- 250, 150, 0: overpass
- 250, 200, 250: railway station
- 200, 150, 0: airport
Correspondence of indexes and categories:
- 0: unlabeled
- 1: industrial area
- 2: paddy field
- 3: irrigated field
- 4: dry cropland
- 5: garden land
- 6: arbor forest
- 7: shrub forest
- 8: park
- 9: natural meadow
- 10: artificial meadow
- 11: river
- 12: urban residential
- 13: lake
- 14: pond
- 15: fish pond
- 16: snow
- 17: bareland
- 18: rural residential
- 19: stadium
- 20: square
- 21: road
- 22: overpass
- 23: railway station
- 24: airport
Use the PIL library to read 8-bit data (which has been processed as normal images): image = Image.open(imgname).convert('CMYK').
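As a minimal sketch of that call (the filename is a placeholder, and numpy is used only to inspect the resulting array; band ordering should be checked against the dataset documentation):

```python
# Sketch: open an 8-bit image as suggested above; "example_image.tif" is a
# placeholder filename. The 'CMYK' conversion is simply a convenient way to get
# a 4-channel array out of PIL; verify band order against the dataset docs.
import numpy as np
from PIL import Image

image = Image.open("example_image.tif").convert("CMYK")
bands = np.array(image)            # expected shape: (height, width, 4)
print(bands.shape, bands.dtype)
```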
@article{FBP2023,
title={Enabling country-scale land cover mapping with meter-resolution satellite imagery},
author={Tong, Xin-Yi and Xia, Gui-Song and Zhu, Xiao Xiang},
journal={ISPRS Journal of Photogrammetry and Remote Sensing},
volume={196},
pages={178-196},
year={2023}
}
E-mail: xinyi.tong@tum.de
Personal page: Xin-Yi Tong
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides annotated very-high-resolution satellite RGB images extracted from Google Earth to train deep learning models to recognize Juniperus communis L. and Juniperus sabina L. shrubs. All images are from the high mountain of Sierra Nevada in Spain. The dataset contains 2000 images (.jpg) of size 512x512 pixels partitioned into two classes: Shrubs and NoShrubs. We also provide partitioning of the data into Train (1800 images), Test (100 images), and Validation (100 images) subsets.
Declassified satellite images provide an important worldwide record of land-surface change. With the success of the first release of classified satellite photography in 1995, images from U.S. military intelligence satellites KH-7 and KH-9 were declassified in accordance with Executive Order 12951 in 2002. The data were originally used for cartographic information and reconnaissance for U.S. intelligence agencies. Since the images could be of historical value for global change research and were no longer critical to national security, the collection was made available to the public. Keyhole (KH) satellite systems KH-7 and KH-9 acquired photographs of the Earth’s surface with a telescopic camera system and transported the exposed film through the use of recovery capsules. The capsules, or buckets, were de-orbited and retrieved in mid-air by aircraft as they parachuted to Earth. The exposed film was developed and the images were analyzed for a range of military applications. The KH-7 surveillance system was a high resolution imaging system that was operational from July 1963 to June 1967. Approximately 18,000 black-and-white images and 230 color images are available from the 38 missions flown during this program. Key features for this program were larger area of coverage and improved ground resolution. The cameras acquired imagery in continuous lengthwise sweeps of the terrain. KH-7 images are 9 inches wide, vary in length from 4 inches to 500 feet, and have a resolution of 2 to 4 feet. The KH-9 mapping program was operational from March 1973 to October 1980 and was designed to support mapping requirements and exact positioning of geographical points for the military. This was accomplished by using image overlap for stereo coverage and by using a camera system with a reseau grid to correct image distortion. The KH-9 framing cameras produced 9 x 18 inch imagery at a resolution of 20-30 feet. Approximately 29,000 mapping images were acquired from 12 missions. The original film sources are maintained by the National Archives and Records Administration (NARA). Duplicate film sources held in the USGS EROS Center archive are used to produce digital copies of the imagery.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
OpenSatMap Dataset Card
Description
The dataset contains 3,787 high-resolution satellite images with fine-grained annotations, covering diverse geographic locations and popular driving datasets. It can be used for large-scale map construction and downstream tasks like autonomous driving. The images are collected from Google Maps at level 19 resolution (0.3 m/pixel) and level 20 resolution (0.15 m/pixel); we denote them as OpenSatMap19 and OpenSatMap20, respectively. … See the full description on the dataset page: https://huggingface.co/datasets/z-hb/OpenSatMap.
This data set contains a time series of snow depth maps and related intermediary snow-on and snow-off DEMs for Grand Mesa and the Banded Peak Ranch areas of Colorado derived from very-high-resolution (VHR) satellite stereo images and lidar point cloud data. Two of the snow depth maps coincide temporally with the 2017 NASA SnowEx Grand Mesa field campaign, providing a comparison between the satellite derived snow depth and in-situ snow depth measurements. The VHR stereo images were acquired each year between 2016 and 2022 during the approximate timing of peak snow depth by the Maxar WorldView-2, WorldView-3, and CNES/Airbus Pléiades-HR 1A and 1B satellites, while lidar data was sourced from the USGS 3D Elevation Program.
https://spdx.org/licenses/CC0-1.0.html
For the purposes of training AI-based models to identify (map) road features in rural/remote tropical regions on the basis of true-colour satellite imagery, and subsequently testing the accuracy of these AI-derived road maps, we produced a dataset of 8904 satellite image ‘tiles’ and their corresponding known road features across Equatorial Asia (Indonesia, Malaysia, Papua New Guinea).
Methods
The main dataset shared here was derived from a set of 200 input satellite images, also provided here. These 200 images are effectively ‘screenshots’ (i.e., reduced-resolution copies) of high-resolution true-colour satellite imagery (~0.5-1m pixel resolution) observed using the Elvis Elevation and Depth spatial data portal (https://elevation.fsdf.org.au/), which here is functionally equivalent to the more familiar Google Earth. Each of these original images was initially acquired at a resolution of 1920x886 pixels. Actual image resolution was coarser than the native high-resolution imagery. Visual inspection of these 200 images suggests a pixel resolution of ~5 meters, given the number of pixels required to span features of familiar scale, such as roads and roofs, as well as the ready discrimination of specific land uses, vegetation types, etc. These 200 images generally spanned either forest-agricultural mosaics or intact forest landscapes with limited human intervention. Sloan et al. (2023) present a map indicating the various areas of Equatorial Asia from which these images were sourced.
IMAGE NAMING CONVENTION
A common naming convention applies to satellite images’ file names:
XX##.png
where:
XX – denotes the geographical region / major island of Equatorial Asia of the image, as follows: ‘bo’ (Borneo), ‘su’ (Sumatra), ‘sl’ (Sulawesi), ‘pn’ (Papua New Guinea), ‘jv’ (Java), ‘ng’ (New Guinea [i.e., Papua and West Papua provinces of Indonesia])
## – a two-digit number identifying the individual image within that region
INTERPRETING ROAD FEATURES IN THE IMAGES
For each of the 200 input satellite images, its road features were visually interpreted and manually digitized to create a reference image dataset by which to train, validate, and test AI road-mapping models, as detailed in Sloan et al. (2023). The reference dataset of road features was digitized using the ‘pen tool’ in Adobe Photoshop. The pen’s ‘width’ was held constant over varying scales of observation (i.e., image ‘zoom’) during digitization. Consequently, at relatively small scales at least, digitized road features likely incorporate vegetation immediately bordering roads. The resultant binary (Road / Not Road) reference images were saved as PNG images with the same image dimensions as the original 200 images.
IMAGE TILES AND REFERENCE DATA FOR MODEL DEVELOPMENT
The 200 satellite images and the corresponding 200 road-reference images were both subdivided (aka ‘sliced’) into thousands of smaller image ‘tiles’ of 256x256 pixels each. Subsequent to image subdivision, subdivided images were also rotated by 90, 180, or 270 degrees to create additional, complementary image tiles for model development. In total, 8904 image tiles resulted from image subdivision and rotation. These 8904 image tiles are the main data of interest disseminated here. Each image tile entails the true-colour satellite image (256x256 pixels) and a corresponding binary road reference image (Road / Not Road).
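As a rough sketch of that slicing-and-rotation step (assuming PIL; the input filename, the tile-name fields, and the handling of image edges are illustrative guesses, not the exact procedure used by Sloan et al.):

```python
# Sketch: slice a full-scale image into 256x256 tiles and create rotated copies.
# The input filename is a placeholder, the tile-name fields are one plausible
# reading of the A_B_C_D convention described further below, and edge strips
# narrower than 256 pixels are simply skipped in this simplified version.
from PIL import Image

TILE = 256
image = Image.open("bo12.png")     # hypothetical 1920x886 input image
width, height = image.size

tiles = []
for top in range(0, height - TILE + 1, TILE):
    for left in range(0, width - TILE + 1, TILE):
        name = f"bo12_{left}_{top}_{left + TILE}_{top + TILE}"
        tile = image.crop((left, top, left + TILE, top + TILE))
        tiles.append((name, tile))
        for angle in (90, 180, 270):               # complementary rotated copies
            tiles.append((f"{name}rot{angle}", tile.rotate(angle)))

print(len(tiles), "tiles generated")
```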
Of these 8904 image tiles, Sloan et al. (2023) randomly selected 80% for model training (during which a model ‘learns’ to recognize road features in the input imagery), 10% for model validation (during which model parameters are iteratively refined), and 10% for final model testing (during which the final accuracy of the output road map is assessed). Here we present these data in two folders accordingly:
‘Training’ – contains 7124 image tiles used for model training in Sloan et al. (2023), i.e., 80% of the original pool of 8904 image tiles.
‘Testing’ – contains 1780 image tiles used for model validation and model testing in Sloan et al. (2023), i.e., the remaining 20% of the original pool of 8904 image tiles (the combined set of image tiles for model validation and testing).
IMAGE TILE NAMING CONVENTION
A common naming convention applies to image tiles’ directories and file names, in both the ‘training’ and ‘testing’ folders:
XX##_A_B_C_DrotDDD
where:
XX – denotes the geographical region / major island of Equatorial Asia of the original input 1920x886 pixel image, as follows: ‘bo’ (Borneo), ‘su’ (Sumatra), ‘sl’ (Sulawesi), ‘pn’ (Papua New Guinea), ‘jv’ (Java), ‘ng’ (New Guinea [i.e., Papua and West Papua provinces of Indonesia])
A, B, C and D – can all be ignored. These values, each one of 0, 256, 512, 768, 1024, 1280, 1536, and 1792, are effectively ‘pixel coordinates’ in the corresponding original 1920x886-pixel input image. They were recorded within the names of image tiles’ sub-directories and file names merely to ensure that file and directory names were unique.
rot – implies an image rotation. Not all image tiles are rotated, so ‘rot’ will appear only occasionally.
DDD – denotes the degree of image-tile rotation, e.g., 90, 180, 270. Not all image tiles are rotated, so ‘DDD’ will appear only occasionally.
Note that the designator ‘XX##’ is directly equivalent to the filenames of the corresponding 1920x886-pixel input satellite images, detailed above. Therefore, each image tile can be ‘matched’ with its parent full-scale satellite image. For example, in the ‘training’ folder, the subdirectory ‘Bo12_0_0_256_256’ indicates that its image tile therein (also named ‘Bo12_0_0_256_256’) would have been sourced from the full-scale image ‘Bo12.png’.
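As an informal sketch of that matching, one could parse a tile name with a regular expression; the pattern below is inferred from the convention described above and should be checked against the actual directory listing (the second example name is hypothetical).

```python
# Sketch: recover the parent image name and rotation (if any) from a tile name.
# The regular expression is inferred from the naming convention described above.
import re

TILE_NAME = re.compile(
    r"^(?P<image>[A-Za-z]{2}\d+)_(?P<a>\d+)_(?P<b>\d+)_(?P<c>\d+)_(?P<d>\d+)"
    r"(?:rot(?P<rotation>\d+))?$"
)

for name in ["Bo12_0_0_256_256", "su03_512_256_768_512rot180"]:  # second name is hypothetical
    m = TILE_NAME.match(name)
    if m:
        parent = m.group("image") + ".png"   # e.g. "Bo12.png"
        rotation = m.group("rotation") or "none"
        print(f"{name}: parent image {parent}, rotation {rotation}")
```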
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of annotated high-resolution aerial imagery of roof materials in Bonn, Germany, in the Ultralytics YOLO instance segmentation dataset format. Aerial imagery was sourced from OpenAerialMap, specifically from the Maxar Open Data Program. Roof material labels and building outlines were sourced from OpenStreetMap. Images and labels are split into training, validation, and test sets, meant for training future machine learning models for both building segmentation and roof type classification. The dataset is intended for applications such as informing studies on thermal efficiency, roof durability, heritage conservation, or socioeconomic analyses. There are six roof material types: roof tiles, tar paper, metal, concrete, gravel, and glass. Note: The data is in a .zip due to file upload limits. Please find a more detailed dataset description in the README.md.
QuickBird high resolution optical products are available as part of the Maxar Standard Satellite Imagery products from the QuickBird, WorldView-1/-2/-3/-4, and GeoEye-1 satellites. All details about the data provision, data access conditions and quota assignment procedure are described into the Terms of Applicability available in Resources section.
In particular, QuickBird offers archive panchromatic products up to 0.60 m GSD resolution and 4-Bands Multispectral products up to 2.4 m GSD resolution.
Band combination: Panchromatic and 4-bands. Data processing levels and resolutions:
- Standard (2A) / View Ready Standard (OR2A): 15 cm HD, 30 cm HD, 30 cm, 40 cm, 50/60 cm
- View Ready Stereo: 30 cm, 40 cm, 50/60 cm
- Map-Ready (Ortho) 1:12,000 Orthorectified: 15 cm HD, 30 cm HD, 30 cm, 40 cm, 50/60 cm
4-Bands being an option from:
- 4-Band Multispectral (BLUE, GREEN, RED, NIR1)
- 4-Band Pan-sharpened (BLUE, GREEN, RED, NIR1)
- 4-Band Bundle (PAN, BLUE, GREEN, RED, NIR1)
- 3-Band Natural Colour (pan-sharpened BLUE, GREEN, RED)
- 3-Band Coloured Infrared (pan-sharpened GREEN, RED, NIR1)
- Natural Colour / Coloured Infrared (3-Band pan-sharpened)
Native 30 cm and 50/60 cm resolution products are processed with MAXAR HD Technology to generate the 15 cm HD and 30 cm HD products, respectively: the initial spatial resolution (GSD) is unchanged, but the HD technique intelligently increases the number of pixels and improves the visual clarity, achieving aesthetically refined imagery with precise edges and well reconstructed details.
This study used three high-resolution satellite images to estimate bathymetric conditions in the shallow-water coral reef environment around Pulau Panggang, Jakarta. WorldView-2 provides 2 m spatial resolution with 8 spectral bands, whereas QuickBird-2 provides 2.44 m spatial resolution in 4 spectral bands and ALOS provides 10 m spatial resolution, also with 4 spectral bands. The red bands of ALOS and QuickBird have high correlation with sand depth and the blue bands the lowest; among these, the QuickBird red band correlates most strongly and its blue band least. The WorldView visible bands may have low sand-depth correlation because of noise from ripple waves during acquisition. This study showed that the QuickBird image is able to map water depth variation up to 8 metres in the reef flat and lagoon area of Panggang island, Jakarta, with an RMSE of 1.1 metres. The results also show an opportunity to apply this approach to bathymetric mapping of shallow-water areas around remote small islands. Proceedings of the International Seminar on Hydrography, pp. 1-11.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The folders in labels.zip contain labels for solar panel objects as part of the Solar Panels in Satellite Imagery dataset. The labels are partitioned based on corresponding image type: 31 cm native and 15.5 cm HD resolution imagery. In total, there are 2,542 object labels for each image type, following the same naming convention as the corresponding image chips. The corresponding image chips may be accessed at:
https://resources.maxar.com/product-samples/15-cm-hd-and-30-cm-view-ready-solar-panels-germany
The naming convention for all labels includes the name of the dataset, image type, tile identification number, minimum x bound, minimum y bound, and window size. The minimum bounds correspond to the origin of the chip in the full tile.
Labels are provided in .txt format compatible with the YOLTv4 architecture, where a single row in a label file contains the following information for one solar panel object: category, x-center, y-center, x-width, and y-width. Center and width values are normalized by chip sizes (416 by 416 pixels for native chips and 832 by 832 pixels for HD chips).
The geocoordinates for each solar panel object may be determined using the native resolution labels (found in the labels_native directory). The center and width values for each object, along with the relative location information provided by the naming convention for each label, may be used to determine the pixel coordinates for each object in the full, corresponding native resolution tile. The pixel coordinates may be translated to geocoordinates using the EPSG:32633 coordinate system and the following geotransform for each tile:
Tile 1: (307670.04, 0.31, 0.0, 5434427.100000001, 0.0, -0.31)
Tile 2: (312749.07999999996, 0.31, 0.0, 5403952.860000001, 0.0, -0.31)
Tile 3: (312749.07999999996, 0.31, 0.0, 5363320.540000001, 0.0, -0.31)
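Putting those pieces together, here is a rough sketch of how one such label row could be converted first to pixel coordinates in the full native-resolution tile and then to EPSG:32633 geocoordinates using the GDAL-style geotransform for Tile 1 above; the label values and chip origin are hypothetical examples.

```python
# Sketch: convert a YOLT-style label (category, x-center, y-center, x-width,
# y-width, normalized by the 416-pixel native chip size) to geocoordinates.
# The label row and chip origin below are hypothetical examples.
CHIP_SIZE = 416                      # native-resolution chips are 416x416 pixels
chip_min_x, chip_min_y = 2496, 1664  # chip origin in the full tile (from the label filename)
label_row = "0 0.52 0.47 0.08 0.06"  # category, x-center, y-center, x-width, y-width

_, xc, yc, _, _ = (float(v) for v in label_row.split())

# Pixel coordinates of the object center in the full native-resolution tile.
px = chip_min_x + xc * CHIP_SIZE
py = chip_min_y + yc * CHIP_SIZE

# GDAL-style geotransform for Tile 1 (EPSG:32633), as listed above:
# (origin_x, pixel_width, row_rotation, origin_y, column_rotation, pixel_height)
gt = (307670.04, 0.31, 0.0, 5434427.100000001, 0.0, -0.31)
easting = gt[0] + px * gt[1] + py * gt[2]
northing = gt[3] + px * gt[4] + py * gt[5]
print(f"Object center: ({easting:.2f}, {northing:.2f}) in EPSG:32633")
```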
High resolution orthorectified images combine the image characteristics of an aerial photograph with the geometric qualities of a map. An orthoimage is a uniform-scale image where corrections have been made for feature displacement such as building tilt and for scale variations caused by terrain relief, sensor geometry, and camera tilt. A mathematical equation based on ground control points, sensor calibration information, and a digital elevation model is applied to each pixel to rectify the image to obtain the geometric qualities of a map.
A digital orthoimage may be created from several photographs mosaicked to form the final image. The source imagery may be black-and-white, natural color, or color infrared with a pixel resolution of 1-meter or finer. With orthoimagery, the resolution refers to the distance on the ground represented by each pixel.