Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides annotated very-high-resolution satellite RGB images extracted from Google Earth to train deep learning models to recognize Juniperus communis L. and Juniperus sabina L. shrubs. All images are from the high mountain of Sierra Nevada in Spain. The dataset contains 2000 images (.jpg) of size 512x512 pixels partitioned into two classes: Shrubs and NoShrubs. We also provide partitioning of the data into Train (1800 images), Test (100 images), and Validation (100 images) subsets.
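A minimal usage sketch, assuming the archive unpacks into Train/Test/Validation folders that each contain Shrubs and NoShrubs subfolders (this layout is an assumption) and that torchvision is installed:
# load the two-class training split with torchvision's ImageFolder (folder layout assumed)
from torchvision import datasets, transforms

transform = transforms.ToTensor()
train_set = datasets.ImageFolder("Train", transform=transform)   # 1800 images
print(train_set.classes)                                          # e.g. ['NoShrubs', 'Shrubs']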
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains satellite imagery of 4,454 power plants within the United States. The imagery is provided at two resolutions: 1 m (4-band NAIP imagery with near-infrared) and 30 m (Landsat 8, pansharpened to 15 m). The NAIP imagery is available for the U.S., and Landsat 8 is available globally. This dataset may be of value for computer vision and machine learning work, as well as energy and environmental analyses. Additionally, annotations of the spatial extent of the power plants in each image are provided. These annotations were collected via the crowdsourcing platform Amazon Mechanical Turk, using multiple annotators for each image to ensure quality. Links to the sources of the imagery data, the annotation tool, and the team that created the dataset are included in the "References" section. To read more on these data, please refer to the "Power Plant Satellite Imagery Dataset Overview.pdf" file. To download a sample of the data without downloading the entire dataset, download "sample.zip", which includes two sample power plants and the NAIP, Landsat 8, and binary annotations for each. Note: the NAIP imagery may appear "washed out" when viewed in standard image viewing software because it includes a near-infrared band in addition to the standard RGB data.
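A minimal sketch of producing a natural-colour preview from a 4-band NAIP GeoTIFF, assuming rasterio and Pillow are installed and that the bands are stored in red, green, blue, near-infrared order (the file name is hypothetical):
# drop the near-infrared band so the preview no longer looks "washed out"
import numpy as np
import rasterio
from PIL import Image

with rasterio.open("naip_sample.tif") as src:       # hypothetical file name
    rgb = src.read([1, 2, 3])                       # bands are 1-indexed in rasterio
rgb = np.transpose(rgb, (1, 2, 0))                  # (bands, rows, cols) -> (rows, cols, bands)
Image.fromarray(rgb.astype(np.uint8)).save("naip_rgb_preview.png")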
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images cover a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time series of high-resolution (≤1 m) orthomosaics and satellite image tiles (10–30 m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow the naming convention {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes us ...
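A minimal sketch of inspecting one of the NPZ files with NumPy; the file name below merely follows the naming convention above, and the stored array keys are not specified here, so they are listed rather than assumed:
# open an extracted .npz file and list the arrays it contains
import numpy as np

data = np.load("naip_4_001.npz", allow_pickle=True)   # hypothetical file name
print(data.files)                                      # names of the stored arrays (image, label image, ...)
for name in data.files:
    print(name, data[name].shape)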
https://www.datainsightsmarket.com/privacy-policy
The data labeling market is experiencing robust growth, projected to reach $3.84 billion in 2025 and maintain a compound annual growth rate (CAGR) of 28.13% from 2025 to 2033. This expansion is fueled by the increasing demand for high-quality training data across various sectors, including healthcare, automotive, and finance, which rely heavily on machine learning and artificial intelligence (AI). The surge in AI adoption, particularly in areas like autonomous vehicles, medical image analysis, and fraud detection, necessitates vast quantities of accurately labeled data. The market is segmented by sourcing type (in-house vs. outsourced), data type (text, image, audio), labeling method (manual, automatic, semi-supervised), and end-user industry. Outsourcing is expected to dominate the sourcing segment due to cost-effectiveness and access to specialized expertise. Similarly, image data labeling is likely to hold a significant share, given the visual nature of many AI applications. The shift towards automation and semi-supervised techniques aims to improve efficiency and reduce labeling costs, though manual labeling will remain crucial for tasks requiring high accuracy and nuanced understanding. Geographical distribution shows strong potential across North America and Europe, with Asia-Pacific emerging as a key growth region driven by increasing technological advancements and digital transformation.
Competition in the data labeling market is intense, with a mix of established players like Amazon Mechanical Turk and Appen alongside emerging specialized companies. The market's future trajectory will likely be shaped by advancements in automation technologies, the development of more efficient labeling techniques, and the increasing need for specialized data labeling services catering to niche applications. Companies are focusing on improving the accuracy and speed of data labeling through innovations in AI-powered tools and techniques. Furthermore, the rise of synthetic data generation offers a promising avenue for supplementing real-world data, potentially addressing data scarcity challenges and reducing labeling costs in certain applications. This will, however, require careful attention to ensure that the synthetic data generated is representative of real-world data so that model accuracy is maintained.
This comprehensive report provides an in-depth analysis of the global data labeling market, offering invaluable insights for businesses, investors, and researchers. The study period covers 2019-2033, with 2025 as the base and estimated year, and a forecast period of 2025-2033. We delve into market size, segmentation, growth drivers, challenges, and emerging trends, examining the impact of technological advancements and regulatory changes on this rapidly evolving sector. The market is projected to reach multi-billion dollar valuations by 2033, fueled by the increasing demand for high-quality data to train sophisticated machine learning models.
Recent developments include: September 2024: The National Geospatial-Intelligence Agency (NGA) is poised to invest heavily in artificial intelligence, earmarking up to USD 700 million for data labeling services over the next five years. This initiative aims to enhance NGA's machine-learning capabilities, particularly in analyzing satellite imagery and other geospatial data. The agency has opted for a multi-vendor indefinite-delivery/indefinite-quantity (IDIQ) contract, emphasizing the importance of annotating raw data, be it images or videos, to render it understandable for machine learning models. For instance, when dealing with satellite imagery, the focus could be on labeling distinct entities such as buildings, roads, or patches of vegetation. October 2023: Refuel.ai unveiled a new platform, Refuel Cloud, and a specialized large language model (LLM) for data labeling. Refuel Cloud harnesses advanced LLMs, including its proprietary model, to automate data cleaning, labeling, and enrichment at scale, catering to diverse industry use cases. Recognizing that clean data underpins modern AI and data-centric software, Refuel Cloud addresses the historical challenge of human labor bottlenecks in data production. With Refuel Cloud, enterprises can swiftly generate the expansive, precise datasets they require in mere minutes, a task that traditionally spanned weeks.
Key drivers for this market are: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Potential restraints include: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Notable trends are: Healthcare is Expected to Witness Remarkable Growth.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset consists of three categories: image subsets, burned area masks, and quicklooks. The image subsets are derived from Landsat-8 scenes acquired during 2019 and 2021. Each image has a size of 512x512 pixels and consists of 8 multispectral bands. The sequence of band names from band 1 to band 7 of the image subset is the same as in the original Landsat-8 scene, except for band 8 of the image subset, which corresponds to band 9 (cirrus band) of the original Landsat-8 scene. The image subsets are saved in GeoTIFF file format in the latitude/longitude coordinate system with WGS 1984 as the datum. The spatial resolution of the image subsets is 0.00025 degree, and the pixel values are stored as 16-bit unsigned integers with values ranging from 0 to 65535. The dataset totals 227 images containing burned areas surrounded by ecologically diverse backgrounds such as forest, shrub, grassland, waterbody, bare land, settlement, cloud, and cloud shadow. In some cases, the burned areas are partially covered by smoke because the fire was still active. Some image subsets also overlap each other to cover burned scars that are too large for a single subset. The burned area mask is a binary annotation image consisting of two classes: burned area as the foreground and non-burned area as the background. These binary images are saved as 8-bit unsigned integers, where the burned area is indicated by a pixel value of 1 and the non-burned area by 0. The burned area masks in this dataset contain only burned scars and are not contaminated with thick clouds, shadows, or vegetation. Of the 227 images, 206 contain burned areas and 21 contain only background. Most images have a burned-area coverage of between 0 and 10 percent. The dataset also provides a quicklook image as a fast, full-size preview of each image subset that can be viewed without opening the file in GIS software. The quicklook images can also be used for training and evaluating models as a substitute for the image subsets. Their size is 512x512 pixels, the same as the image subsets and annotation images. Each quicklook is a three-band false-color composite combining band 7 (SWIR-2), band 5 (NIR), and band 4 (red). Contrast stretching was applied to these RGB composites to enhance visualization. The quicklook images are stored in GeoTIFF file format as 8-bit unsigned integers.
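A minimal sketch, assuming rasterio is installed, of reading one image subset and its binary mask and computing the burned-area coverage percentage (file names are hypothetical):
# read the 16-bit, 8-band image subset and the 8-bit binary burned-area mask
import rasterio

with rasterio.open("subset_001.tif") as src:    # hypothetical file name
    image = src.read()                          # array of shape (8, 512, 512), uint16
with rasterio.open("mask_001.tif") as src:      # hypothetical file name
    mask = src.read(1)                          # burned area = 1, background = 0

burned_pct = 100.0 * (mask == 1).sum() / mask.size
print(f"burned-area coverage: {burned_pct:.1f}%")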
This work was financed by the Riset Inovatif Produktif (RISPRO) fund through the Prioritas Riset Nasional (PRN) project, grant no. 255/E1/PRN/2020, for the 2020-2021 contract period.
Satellite Image Classification Dataset (RSI-CB256): this dataset has 4 different classes mixed from sensor imagery and Google Map snapshots.
The past years have witnessed great progress on remote sensing (RS) image interpretation and its wide applications. With RS images becoming more accessible than ever before, there is an increasing demand for the automatic interpretation of these images. In this context, benchmark datasets serve as essential prerequisites for developing and testing intelligent interpretation algorithms. After reviewing existing benchmark datasets in the research community of RS image interpretation, this article discusses the problem of how to efficiently prepare a suitable benchmark dataset for RS image interpretation. Specifically, we first analyze the current challenges of developing intelligent algorithms for RS image interpretation with bibliometric investigations. We then present general guidance on creating benchmark datasets in efficient manners. Following the presented guidance, we also provide an example of building an RS image dataset, i.e., Million-AID, a new large-scale benchmark dataset containing a million instances for RS image scene classification. Several challenges and perspectives in RS image annotation are finally discussed to facilitate the research in benchmark dataset construction. We hope this paper will provide the RS community with an overall perspective on constructing large-scale and practical image datasets for further research, especially data-driven research.
Annotated Datasets for RS Image Interpretation
The interpretation of RS images has been playing an increasingly important role in a large diversity of applications, and thus has attracted remarkable research attention. Consequently, various datasets have been built to advance the development of interpretation algorithms for RS images. Covering literature published over the past decade, we perform a systematic review of the existing RS image datasets concerning the current mainstream RS image interpretation tasks, including scene classification, object detection, semantic segmentation, and change detection.
Artificial Intelligence, Computer Vision, Image Processing, Deep Learning, Satellite Image, Remote Sensing
BD-Sat provides a high-resolution dataset that includes pixel-by-pixel LULC annotations for Dhaka metropolitan city and the surrounding rural/urban areas. Following a strict, standardized procedure, the ground truth was created from Bing satellite imagery at a ground sample distance of 2.22 meters/pixel. A well-defined, three-stage annotation process was followed, with support from geographic information system (GIS) experts, to ensure the reliability of the annotations. We perform several experiments to establish benchmark results. The results show that the annotated BD-Sat is sufficient to train large deep-learning models with adequate accuracy on five major LULC classes: forest, farmland, built-up, water, and meadow.
https://www.verifiedmarketresearch.com/privacy-policy/
Data Annotation Tools Market size was valued at USD 0.03 Billion in 2023 and is projected to reach USD 4.04 Billion by 2030, growing at a CAGR of 25.5% during the forecast period 2024 to 2030.
Global Data Annotation Tools Market Drivers
The market drivers for the Data Annotation Tools Market can be influenced by various factors. These may include:
Rapid Growth in AI and Machine Learning: The demand for data annotation tools to label massive datasets for training and validation purposes is driven by the rapid growth of AI and machine learning applications across a variety of industries, including healthcare, automotive, retail, and finance.
Increasing Data Complexity: As data types such as photos, videos, text, and sensor data become more complex, more sophisticated annotation tools are needed to handle a variety of data formats, annotations, and labeling needs, spurring market adoption and innovation.
Quality and Accuracy Requirements: Training accurate and dependable AI models requires high-quality annotated data. Organizations can attain enhanced annotation accuracy and consistency by utilizing data annotation technologies that come with sophisticated annotation algorithms, quality control measures, and human-in-the-loop capabilities.
Applications Specific to Industries: Particular industries, such as autonomous vehicles, medical imaging, satellite imagery analysis, and natural language processing, have distinct regulatory standards and data annotation requirements that prompt the development of specialized annotation tools.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
[Please use version 1.0.1]
The CloudTracks dataset consists of 1,780 MODIS satellite images hand-labeled for the presence of more than 12,000 ship tracks. More information about how the dataset was constructed may be found at github.com/stanfordmlgroup/CloudTracks. The file structure of the dataset is as follows:
CloudTracks/
    full/
        images/
            (sample image name) mod2002121.1920D.png
        jsons/
            (sample json name) mod2002121.1920D.json
The naming convention is as follows:
mod2002121.1920D: the first 3 letters specify which of the sensors on the two MODIS satellites captured the image, mod for Terra and myd for Aqua. This is followed by a 4 digit year (2002) and a 3 digit day of the year (121). The following 4 digits specify the time of day (1920; 24 hour format in the UTC timezone), followed by D or N for Day or Night.
The 1,780 MODIS Terra and Aqua images were collected between 2002 and 2021 inclusive over various stratocumulus cloud regions (such as the East Pacific and East Atlantic) where ship tracks have commonly been observed. Each image has dimension 1354 x 2030 and a spatial resolution of 1km. Of the 36 bands collected by the instruments, we selected channels 1, 20, and 32 to capture useful physical properties of cloud formations.
The labels are found in the corresponding JSON files for each image. The following keys in the json are particularly important:
imagePath: the filename of the image.
shapes: the list of annotations corresponding to the image, where each element of the list is a dictionary corresponding to a single instance annotation. Each dictionary holds the label of the annotation, which is either "shiptrack" or "uncertain", together with a linestrip detailing the ship track path.
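A minimal sketch of decoding the file naming convention and reading one annotation JSON; the per-shape keys "label" and "points" follow the common labelme convention and are an assumption here:
# parse a CloudTracks file name and list the ship track annotations in its JSON
import json

name = "mod2002121.1920D"
sensor = name[:3]          # "mod" = Terra, "myd" = Aqua
year = name[3:7]           # "2002"
day_of_year = name[7:10]   # "121"
time_utc = name[11:15]     # "1920"
day_or_night = name[15]    # "D"

with open(f"CloudTracks/full/jsons/{name}.json") as f:
    annotation = json.load(f)

print(annotation["imagePath"])
for shape in annotation["shapes"]:
    # "label" and "points" are assumed key names (labelme-style)
    print(shape["label"], "with", len(shape["points"]), "linestrip vertices")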
Further pre-processing details may be found at the GitHub link above. If you have any questions about the dataset, contact us at:
mahmedch@stanford.edu, lynakim@stanford.edu, jirvin16@cs.stanford.edu
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MLRSNet provides different perspectives of the world captured from satellites. That is, it is composed of high spatial resolution optical satellite images. MLRSNet contains 109,161 remote sensing images that are annotated into 46 categories, and the number of sample images in a category varies from 1,500 to 3,000. The images have a fixed size of 256×256 pixels with various pixel resolutions (~10m to 0.1m). Moreover, each image in the dataset is tagged with several of 60 predefined class labels, and the number of labels associated with each image varies from 1 to 13. The dataset can be used for multi-label based image classification, multi-label based image retrieval, and image segmentation.
The dataset includes: 1. Images folder: 46 categories, 109,161 high-spatial-resolution remote sensing images. 2. Labels folder: each category has a .csv file. 3. Categories_names.xlsx: Sheet1 lists the names of the 46 categories, and Sheet2 shows the multi-labels associated with each category.
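A minimal sketch of inspecting one per-category label CSV with pandas; the category file name and the column layout are assumptions, so the code only prints what it finds:
# open a per-category label file and inspect its columns
import pandas as pd

labels = pd.read_csv("Labels/airport.csv")    # hypothetical category file name
print(labels.shape)                           # (number of images, number of columns)
print(labels.columns.tolist())                # inspect which of the 60 predefined labels appear
print(labels.head())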
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
The dataset contains fully annotated electric transmission and distribution infrastructure for approximately 321 sq km of high-resolution satellite and aerial imagery from around the world. The imagery and associated infrastructure annotations span 14 cities and 5 continents, and were selected to represent diversity in human settlement density (i.e. rural vs urban), terrain type, and development index. This dataset may be of particular interest to those looking to train machine learning algorithms to automatically identify energy infrastructure in satellite imagery, or to those working on domain adaptation for computer vision. Automated algorithms for identifying electricity infrastructure in satellite imagery may assist policy makers in identifying the best pathway to electrification for unelectrified areas.
Data Sources
This dataset contains data sourced from the LINZ Data Service, licensed for reuse under CC BY 4.0. This dataset also contains extracts from the SpaceNet dataset: SpaceNet on Amazon Web Services (AWS). “Datasets.” The SpaceNet Catalog. Last modified April 30, 2018 (link below). Other imagery data included in this dataset are from the Connecticut Department of Energy and Environmental Protection and the U.S. Geological Survey. Links to each of the imagery data sources are provided below, as well as the link to the annotation tool and the GitHub repository that provides tools for using these data.
Acknowledgements
This dataset was created as part of the Duke University Data+ project, "Energy Infrastructure Map of the World" (link below), in collaboration with the Information Initiative at Duke and the Duke University Energy Initiative.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of annotated high-resolution aerial imagery of roof materials in Bonn, Germany, in the Ultralytics YOLO instance segmentation dataset format. Aerial imagery was sourced from OpenAerialMap, specifically from the Maxar Open Data Program. Roof material labels and building outlines were sourced from OpenStreetMap. Images and labels are split into training, validation, and test sets, meant for future machine learning models to be trained upon, for both building segmentation and roof type classification. The dataset is intended for applications such as informing studies on thermal efficiency, roof durability, heritage conservation, or socioeconomic analyses. There are six roof material types: roof tiles, tar paper, metal, concrete, gravel, and glass. Note: The data is in a .zip due to file upload limits. Please find a more detailed dataset description in the README.md
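A minimal training sketch, assuming the unzipped dataset includes a YOLO data.yaml describing the splits and classes (the path below is an assumption) and that the ultralytics package is installed:
# fine-tune a small YOLO segmentation model on the roof material dataset
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")                  # pretrained segmentation checkpoint
model.train(data="roof_materials/data.yaml",    # hypothetical path to the dataset config
            epochs=50, imgsz=640)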
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset comprises cloud masks for 513 subscenes of 1022-by-1022 pixels, at 20 m resolution, sampled at random from the 2018 Level-1C Sentinel-2 archive. The design of this dataset follows from some observations about cloud masking: (i) performance over an entire product is highly correlated, thus subscenes provide more value per pixel than full scenes; (ii) current cloud masking datasets often focus on specific regions, or hand-select the products used, which introduces a bias into the dataset that is not representative of real-world data; (iii) cloud mask performance appears to be highly correlated with surface type and cloud structure, so testing should include analysis of failure modes in relation to these variables.
The data was annotated semi-automatically using the IRIS toolkit, which allows users to dynamically train a Random Forest (implemented using LightGBM), speeding up annotation by iteratively improving its predictions while preserving the annotator's ability to make final manual changes when needed. This hybrid approach allowed us to process many more masks than would have been possible manually, which we felt was vital in creating a dataset large enough to approximate the statistics of the whole Sentinel-2 archive.
In addition to the pixel-wise, 3-class (CLEAR, CLOUD, CLOUD_SHADOW) segmentation masks, we also provide users with binary classification "tags" for each subscene that can be used in testing to determine performance in specific circumstances. These include:
Wherever practical, cloud shadows were also annotated; however, this was sometimes not possible due to high-relief terrain or large ambiguities. In total, 424 subscenes were marked with shadows (if present), and 89 have shadows that were not annotatable due to very ambiguous shadow boundaries or terrain that cast significant shadows. If users wish to train an algorithm specifically for cloud shadow masks, we advise them to remove those 89 images for which shadow annotation was not possible; however, bear in mind that this will systematically reduce the difficulty of the shadow class compared to real-world use, as these contain the most difficult shadow examples.
In addition to the 20m sampled subscenes and masks, we also provide users with shapefiles that define the boundary of the mask on the original Sentinel-2 scene. If users wish to retrieve the L1C bands at their original resolutions, they can use these to do so.
Please see the README for further details on the dataset structure and more.
Contributions & Acknowledgements
The data were collected, annotated, checked, formatted and published by Alistair Francis and John Mrziglod.
Support and advice was provided by Prof. Jan-Peter Muller and Dr. Panagiotis Sidiropoulos, for which we are grateful.
We would like to extend our thanks to Dr. Pierre-Philippe Mathieu and the rest of the team at ESA PhiLab, who provided the environment in which this project was conceived, and continued to give technical support throughout.
Finally, we thank the ESA Network of Resources for sponsoring this project by providing ICT resources.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
RarePlanes is a unique open-source machine learning dataset from CosmiQ Works and AI.Reverie that incorporates both real and synthetically generated satellite imagery. The RarePlanes dataset specifically focuses on the value of AI.Reverie synthetic data to aid computer vision algorithms in their ability to automatically detect aircraft and their attributes in satellite imagery. Although other synthetic/real combination datasets exist, RarePlanes is the largest openly-available very-high resolution dataset built to test the value of synthetic data from an overhead perspective. Previous research has shown that synthetic data can reduce the amount of real training data needed and potentially improve performance for many tasks in the computer vision domain. The real portion of the dataset consists of 253 Maxar WorldView-3 satellite scenes spanning 112 locations and 2,142 km^2 with 14,700 hand-annotated aircraft. The accompanying synthetic dataset is generated via AI.Reverie’s novel simulation platform and features 50,000 synthetic satellite images with ~630,000 aircraft annotations. Both the real and synthetically generated aircraft feature 10 fine grain attributes including: aircraft length, wingspan, wing-shape, wing-position, wingspan class, propulsion, number of engines, number of vertical-stabilizers, presence of canards, and aircraft role. Finally, we conduct extensive experiments to evaluate the real and synthetic datasets and compare performances. By doing so, we show the value of synthetic data for the task of detecting and classifying aircraft from an overhead perspective.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sentinel2GlobalLULC is a deep learning-ready dataset of RGB images from the Sentinel-2 satellites designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.1 contains 194,877 images in GeoTiff and JPEG format corresponding to 29 broad LULC classes. Each image has 224 x 224 pixels at 10 m spatial resolution and was produced by assigning the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020 in order to remove atmospheric effects (i.e., clouds, aerosols, shadows, snow, etc.). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE).
Our dataset is structured into 3 main zip-compressed folders, an Excel file with a dictionary for class names and descriptive statistics per LULC class, and a python script to convert RGB GeoTiff images into JPEG format. The first folder called "Sentinel2LULC_GeoTiff.zip" contains 29 zip-compressed subfolders where each one corresponds to a specific LULC class with hundreds to thousands of GeoTiff Sentinel-2 RGB images. The second folder called "Sentinel2LULC_JPEG.zip" contains 29 zip-compressed subfolders with a JPEG formatted version of the same images provided in the first main folder. The third folder called "Sentinel2LULC_CSV.zip" includes 29 zip-compressed CSV files with as many rows as provided images and with 12 columns containing the following metadata (this same metadata is provided in the image filenames):
For seven LULC classes, we could not export from GEE all images that fulfilled a spatial purity of 100% since there were millions of them. In this case, we exported a stratified random sample of 14,000 images and provided an additional CSV file with the images actually contained in our dataset. That is, for these seven LULC classes, we provide these 2 CSV files:
To clearly state the geographical coverage of the images available in this dataset, we included in version v2.1 a compressed folder called "Geographic_Representativeness.zip". This zip-compressed folder contains a CSV file for each LULC class that provides the complete list of countries represented in that class. Each CSV file has two columns: the first gives the country code and the second gives the number of images provided in that country for that LULC class. In addition to these 29 CSV files, we provide another CSV file that maps each ISO Alpha-2 country code to its original full country name.
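The dataset ships its own python script for converting the GeoTiff images to JPEG; purely as an illustration, here is a minimal sketch of such a conversion, assuming rasterio and Pillow are installed and that the tile stores 8-bit RGB values (the file name is hypothetical):
# convert one 224 x 224 RGB Sentinel-2 GeoTiff tile to JPEG
import numpy as np
import rasterio
from PIL import Image

with rasterio.open("sample_tile.tif") as src:             # hypothetical file name
    rgb = np.transpose(src.read([1, 2, 3]), (1, 2, 0))    # (rows, cols, bands); rescale first if not 8-bit
Image.fromarray(rgb.astype(np.uint8)).save("sample_tile.jpg", quality=95)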
© Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is marked with Attribution 4.0 International (CC-BY 4.0)
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Google Earth Pro facilitated the acquisition of satellite imagery to monitor deforestation in Dhaka, Bangladesh. Multiple years of images were systematically captured from specific locations, allowing comprehensive analysis of tree cover reduction. The imagery displays diverse aspect ratios based on satellite perspectives and possesses high resolution, suitable for remote sensing. Each site provided 5 to 35 images annually, accumulating data over a ten-year period. The dataset classifies images into three primary categories: tree cover, deforested regions, and masked images. Organized by year, it comprises both raw and annotated images, each paired with a JSON file containing annotations and segmentation masks. This organization enhances accessibility and temporal analysis. Furthermore, the dataset is conducive to machine learning initiatives, particularly in training models for object detection and segmentation to evaluate environmental alterations.
https://creativecommons.org/publicdomain/zero/1.0/
In this guide, we will cover how to use this dataset and what information can be derived from it.
First, let’s take a look at the columns in the dataset. We have scene name, difficulty level, annotator name, shadows_marked (yes/no), clear percent, cloud percent, shadow percent, and dataset type (WorldView 2 or 3), followed by coverage percentages for forest/jungle, snow/ice, agricultural, urban/developed, coastal, hills/mountains, desert/barren, shrublands/plains, wetland/bog/marsh, open water, and enclosed water, as well as thin cloud %, thick cloud %, low cloud %, high cloud %, and isolated cloud %, along with extended cloud types (altocumulus/stratocumulus, cirrus, haze/fog, ice clouds, and contrails). All of these columns provide detailed percentages for the different types of landcover and the corresponding cloud types, plus other useful information such as the name of the annotator who created the annotation for a particular scene.
The data within each column can then be used to derive further insights about any given Sentinel-2 subscene, including landcover as well as associated meteorological events such as precipitation and wind patterns, which could enable specific decision-making applications like crop monitoring or urban development tracking, in addition to understanding environmental impacts over large areas easily visible through satellite imagery. Furthermore, by analyzing this data combined with other standard atmospheric parameters such as wind speed and direction, it is possible to track storm paths by looking at cyclonic activity predicted from the conditions in previously gathered satellite images, allowing more accurate forecasting.
- Using the geographical attributes associated with each scene, this dataset can be used to categorize cultures based on their characteristics and geography.
- This dataset can be used to better understand climate data, by looking at how cloud formations are distributed in a region and in relation to weather patterns.
- This dataset can also help with machine learning projects related to object detection, as the cloud patterns and layout of the scenes can be seen as objects that algorithms should try to recognize or identify correctly during training.
If you use this dataset in your research, please credit the original authors.
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: classification_tags.csv

| Column name | Description |
|:---|:---|
| scene | Unique identifier for each subscene. (String) |
| difficulty | Difficulty rating of the subscene. (Integer) |
| annotator | Name of the annotator who classified the subscene. (String) |
| shadows_marked | Whether shadows were marked in the subscene. (Boolean) |
| clear_percent | Percentage of clear sky in the subscene. (Float) |
| cloud_percent | Percentage of clouds in the subscene. (Float) |
| shadow_percent | Percentage of shadows in the subscene. (Float) |
| dataset | Dataset the subscene was taken from. (String) |
| forest/jungle | Percentage of forest/jungle in the subscene. (Float) |
| snow/ice | Percentage of snow/ice in the subscene. (Float) |
| agricultural ... |
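A minimal sketch, assuming pandas is installed, of loading the tags file and filtering subscenes using the columns described above:
# select mostly clear subscenes from the classification tags
import pandas as pd

tags = pd.read_csv("classification_tags.csv")
mostly_clear = tags[tags["clear_percent"] > 80]
print(mostly_clear[["scene", "annotator", "cloud_percent", "shadow_percent"]].head())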
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We present our dataset containing images with labeled polygons, annotated over Sentinel-2 L1C imagery from snow- and ice-covered regions. We use labels similar to those of the Fmask cloud detection algorithm, i.e., clear-sky land, cloud, shadow, snow, and water. We annotated the labels manually using the QGIS software. The dataset consists of 45 scenes divided into validation (22 scenes) and test (23 scenes) sets. The source images were captured by the satellite between October 2019 and December 2020. We provide the list of '.SAFE' filenames containing the satellite imagery; these files can be downloaded from the Copernicus Open Access Hub. The dataset can be used to test and benchmark deep neural networks for the task of cloud, shadow, and snow segmentation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description from the SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery GitHub repository. (The "Note" below was added by the Roboflow team.)
This is a single-class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspected might be a paraglider wing, which went missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison. The missing wing, as it was found after 3 weeks, is shown below.
![anomaly](https://michaeltpublic.s3.amazonaws.com/images/anomaly_small.jpg)
The dataset contains the following:
Set | Images | Annotations |
---|---|---|
Train | 1808 | 3048 |
Validate | 490 | 747 |
Test | 254 | 411 |
Total | 2552 | 4206 |
The data is in the COCO format, and is directly compatible with Faster R-CNN as implemented in Facebook's Detectron2.
Download the data here: sarnet.zip
Or follow these steps
# download the dataset
wget https://michaeltpublic.s3.amazonaws.com/sarnet.zip
# extract the files
unzip sarnet.zip
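As a minimal sketch (the annotation and image paths inside sarnet.zip are assumptions; adjust them to the unzipped layout), the COCO annotations can be registered with Detectron2 like this:
# register the COCO-format SaRNet splits so Detectron2 can use them by name
from detectron2.data.datasets import register_coco_instances

register_coco_instances("sarnet_train", {}, "sarnet/train/annotations.json", "sarnet/train/images")
register_coco_instances("sarnet_test", {}, "sarnet/test/annotations.json", "sarnet/test/images")
# the registered names can then be set in cfg.DATASETS.TRAIN and cfg.DATASETS.TEST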
*Note*: with Roboflow, you can download the data here (original, raw images, with annotations): https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/ (download v1, original_raw-images). Download the dataset in COCO JSON format, or another format of choice, and import it to Roboflow after unzipping the folder to get started on your project.
Get started with a Faster R-CNN model pretrained on SaRNet: SaRNet_Demo.ipynb
Source code for the paper is located here: SaRNet_train_test.ipynb
@misc{thoreau2021sarnet,
title={SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery},
author={Michael Thoreau and Frazer Wilson},
year={2021},
eprint={2107.12469},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
The source data was generously provided by Planet Labs, Airbus Defence and Space, and Maxar Technologies.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context and Aim
Deep learning in Earth Observation requires large image archives with highly reliable labels for model training and testing. However, a preferable quality standard for forest applications in Europe has not yet been determined. The TreeSatAI consortium investigated numerous sources for annotated datasets as an alternative to manually labeled training datasets.
We found the federal forest inventory of Lower Saxony, Germany represents an unseen treasure of annotated samples for training data generation. The respective 20-cm Color-infrared (CIR) imagery, which is used for forestry management through visual interpretation, constitutes an excellent baseline for deep learning tasks such as image segmentation and classification.
Description
The data archive is highly suitable for benchmarking as it represents the real-world data situation of many German forest management services. On the one hand, it has a high number of samples, which are supported by the high-resolution aerial imagery. On the other hand, this data archive presents challenges, including class label imbalances between the different forest stand types.
The TreeSatAI Benchmark Archive contains:
50,381 image triplets (aerial, Sentinel-1, Sentinel-2)
synchronized time steps and locations
all original spectral bands/polarizations from the sensors
20 species classes (single labels)
12 age classes (single labels)
15 genus classes (multi labels)
60 m and 200 m patches
fixed split for train (90%) and test (10%) data
additional single labels such as English species name, genus, forest stand type, foliage type, land cover
The GeoTIFF and GeoJSON files are readable in any GIS software, such as QGIS. For further information, we refer to the PDF document in the archive and the publications in the reference section.
Version history
v1.0.0 - First release
Citation
Ahlswede et al. (in prep.)
GitHub
Full code examples and pre-trained models from the dataset article (Ahlswede et al. 2022) using the TreeSatAI Benchmark Archive are published on the GitHub repositories of the Remote Sensing Image Analysis (RSiM) Group (https://git.tu-berlin.de/rsim/treesat_benchmark). Code examples for the sampling strategy can be made available by Christian Schulz via email request.
Folder structure
We refer to the proposed folder structure in the PDF file.
Folder “aerial” contains the aerial imagery patches derived from summertime orthophotos of the years 2011 to 2020. Patches are available in 60 x 60 m (304 x 304 pixels). Band order is near-infrared, red, green, and blue. Spatial resolution is 20 cm.
Folder “s1” contains the Sentinel-1 imagery patches derived from summertime mosaics of the years 2015 to 2020. Patches are available in 60 x 60 m (6 x 6 pixels) and 200 x 200 m (20 x 20 pixels). Band order is VV, VH, and VV/VH ratio. Spatial resolution is 10 m.
Folder “s2” contains the Sentinel-2 imagery patches derived from summertime mosaics of the years 2015 to 2020. Patches are available in 60 x 60 m (6 x 6 pixels) and 200 x 200 m (20 x 20 pixels). Band order is B02, B03, B04, B08, B05, B06, B07, B8A, B11, B12, B01, and B09. Spatial resolution is 10 m.
The folder “labels” contains a JSON string which was used for multi-labeling of the training patches. A code example for an image sample with respective proportions of 94% Abies and 6% Larix is: "Abies_alba_3_834_WEFL_NLF.tif": [["Abies", 0.93771], ["Larix", 0.06229]] (a short parsing sketch is given after the folder descriptions below).
The two files “test_filesnames.lst” and “train_filenames.lst” define the filenames used for train (90%) and test (10%) split. We refer to this fixed split for better reproducibility and comparability.
The folder “geojson” contains geoJSON files with all the samples chosen for the derivation of training patch generation (point, 60 m bounding box, 200 m bounding box).
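A minimal sketch, assuming rasterio is installed, of reading a 60 m Sentinel-2 patch with the band order stated above (B04 = red is band 3, B08 = near-infrared is band 4) and decoding its multi-label entry; the file paths and the 0.05 proportion threshold are assumptions:
# compute NDVI from a Sentinel-2 patch and turn its label proportions into genus labels
import json
import numpy as np
import rasterio

patch = "Abies_alba_3_834_WEFL_NLF.tif"                 # sample name from the labels example above

with rasterio.open(f"s2/60m/{patch}") as src:           # hypothetical folder layout
    red = src.read(3).astype(np.float32)                # B04
    nir = src.read(4).astype(np.float32)                # B08
ndvi = (nir - red) / (nir + red + 1e-6)
print("mean NDVI:", float(ndvi.mean()))

with open("labels/multi_labels.json") as f:             # hypothetical file name
    labels = json.load(f)
proportions = dict(labels[patch])                       # {"Abies": 0.93771, "Larix": 0.06229}
genera = [genus for genus, share in proportions.items() if share >= 0.05]
print(genera)                                           # ['Abies', 'Larix']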
CAUTION: As we could not upload the aerial patches as a single zip file on Zenodo, you need to download the 20 single species files (aerial_60m_…zip) separately. Then, unzip them into a folder named “aerial” with a subfolder named “60m”. This structure is recommended for better reproducibility and comparability to the experimental results of Ahlswede et al. (2022).
Join the archive
Model training, benchmarking, algorithm development… many applications are possible! Feel free to add samples from other regions in Europe or even worldwide. Additional remote sensing data from Lidar, UAVs, or aerial imagery from different time steps are very welcome. This helps the research community in developing better deep learning and machine learning models for forest applications. If you have questions or want to share code, results, or publications using the archive, feel free to contact the authors.
Project description
This work was part of the project TreeSatAI (Artificial Intelligence with Satellite data and Multi-Source Geodata for Monitoring of Trees at Infrastructures, Nature Conservation Sites and Forests). Its overall aim is the development of AI methods for the monitoring of forests and woody features on a local, regional and global scale. Based on freely available geodata from different sources (e.g., remote sensing, administration maps, and social media), prototypes will be developed for the deep learning-based extraction and classification of tree- and tree stand features. These prototypes deal with real cases from the monitoring of managed forests, nature conservation and infrastructures. The development of the resulting services by three enterprises (liveEO, Vision Impulse and LUP Potsdam) will be supported by three research institutes (German Research Center for Artificial Intelligence, TU Remote Sensing Image Analysis Group, TUB Geoinformation in Environmental Planning Lab).
Publications
Ahlswede et al. (2022, in prep.): TreeSatAI Dataset Publication
Ahlswede S., Nimisha, T.M., and Demir, B. (2022, in revision): Embedded Self-Enhancement Maps for Weakly Supervised Tree Species Mapping in Remote Sensing Images. IEEE Trans Geosci Remote Sens
Schulz et al. (2022, in prep.): Phenoprofiling
Conference contributions
S. Ahlswede, N. T. Madam, C. Schulz, B. Kleinschmit and B. Demir, "Weakly Supervised Semantic Segmentation of Remote Sensing Images for Tree Species Classification Based on Explanation Methods", IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022.
C. Schulz, M. Förster, S. Vulova, T. Gränzig and B. Kleinschmit, “Exploring the temporal fingerprints of mid-European forest types from Sentinel-1 RVI and Sentinel-2 NDVI time series”, IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022.
C. Schulz, M. Förster, S. Vulova and B. Kleinschmit, “The temporal fingerprints of common European forest types from SAR and optical remote sensing data”, AGU Fall Meeting, New Orleans, USA, 2021.
B. Kleinschmit, M. Förster, C. Schulz, F. Arias, B. Demir, S. Ahlswede, A. K. Aksoy, T. Ha Minh, J. Hees, C. Gava, P. Helber, B. Bischke, P. Habelitz, A. Frick, R. Klinke, S. Gey, D. Seidel, S. Przywarra, R. Zondag and B. Odermatt, “Artificial Intelligence with Satellite data and Multi-Source Geodata for Monitoring of Trees and Forests”, Living Planet Symposium, Bonn, Germany, 2022.
C. Schulz, M. Förster, S. Vulova, T. Gränzig and B. Kleinschmit, (2022, submitted): “Exploring the temporal fingerprints of sixteen mid-European forest types from Sentinel-1 and Sentinel-2 time series”, ForestSAT, Berlin, Germany, 2022.