32 datasets found
  1. Data from: Dataset of very-high-resolution satellite RGB images to train...

    • data.niaid.nih.gov
    • explore.openaire.eu
    • +2more
    Updated Jul 6, 2022
    Cite
    Siham Tabik (2022). Dataset of very-high-resolution satellite RGB images to train deep learning models to recognize high-mountain juniper shrubs from Sierra Nevada (Spain) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6793421
    Dataset updated
    Jul 6, 2022
    Dataset provided by
    Siham Tabik
    Sergio Puertas
    Domingo Alcaraz-Segura
    Rohaifa Khaldi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Spain, Sierra Nevada
    Description

    This dataset provides annotated very-high-resolution satellite RGB images extracted from Google Earth to train deep learning models to recognize Juniperus communis L. and Juniperus sabina L. shrubs. All images are from the high mountain of Sierra Nevada in Spain. The dataset contains 2000 images (.jpg) of size 512x512 pixels partitioned into two classes: Shrubs and NoShrubs. We also provide partitioning of the data into Train (1800 images), Test (100 images), and Validation (100 images) subsets.
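    A minimal loading sketch in Python (PyTorch/torchvision), assuming the archive unpacks into Train/, Validation/, and Test/ folders that each contain Shrubs/ and NoShrubs/ subfolders; the root path below is hypothetical and should be adjusted to the actual layout of the download.

    # Sketch: load the Train split with torchvision's ImageFolder.
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    tfm = transforms.Compose([
        transforms.Resize((512, 512)),  # images are already 512x512; resize kept only as a safeguard
        transforms.ToTensor(),
    ])

    train_set = datasets.ImageFolder("juniper_dataset/Train", transform=tfm)  # hypothetical path
    train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
    print(train_set.classes)  # expected: ['NoShrubs', 'Shrubs']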

  2. Power Plant Satellite Imagery Dataset

    • figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Kyle Bradbury; Benjamin Brigman; Gouttham Chandrasekar; Leslie Collins; Shamikh Hossain; Marc Jeuland; Timothy Johnson; Boning Li; Trishul Nagenalli (2023). Power Plant Satellite Imagery Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.5307364.v1
    Dataset updated
    May 31, 2023
    Dataset provided by
    figshare
    Authors
    Kyle Bradbury; Benjamin Brigman; Gouttham Chandrasekar; Leslie Collins; Shamikh Hossain; Marc Jeuland; Timothy Johnson; Boning Li; Trishul Nagenalli
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains satellite imagery of 4,454 power plants within the United States. The imagery is provided at two resolutions: 1m (4-band NAIP imagery with near-infrared) and 30m (Landsat 8, pansharpened to 15m). The NAIP imagery is available for the U.S., and Landsat 8 is available globally. This dataset may be of value for computer vision work, machine learning, as well as energy and environmental analyses.

    Additionally, annotations of the spatial extent of the power plants in each image are provided. These annotations were collected via the crowdsourcing platform Amazon Mechanical Turk, using multiple annotators for each image to ensure quality. Links to the sources of the imagery data, the annotation tool, and the team that created the dataset are included in the "References" section.

    To read more on these data, please refer to the "Power Plant Satellite Imagery Dataset Overview.pdf" file. To download a sample of the data without downloading the entire dataset, download "sample.zip", which includes two sample power plants and the NAIP, Landsat 8, and binary annotations for each.

    Note: the NAIP imagery may appear "washed out" when viewed in standard image viewing software because it includes a near-infrared band in addition to the standard RGB data.
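    As a small illustration of the note above, the 4-band NAIP tiles can be rendered without the washed-out look by reading only their RGB bands. The filename and the assumption that bands 1-3 are red, green, and blue are illustrative, not taken from the dataset documentation.

    # Sketch: display the RGB bands of a 4-band NAIP GeoTIFF, dropping the near-infrared band.
    import matplotlib.pyplot as plt
    import numpy as np
    import rasterio

    with rasterio.open("naip_sample.tif") as src:     # hypothetical filename
        rgb = src.read([1, 2, 3]).astype(np.float32)  # assumed band order: R, G, B

    rgb = np.transpose(rgb, (1, 2, 0))
    rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min())  # simple stretch for display only
    plt.imshow(rgb)
    plt.axis("off")
    plt.show()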

  3. Coast Train--Labeled imagery for training and evaluation of data-driven...

    • data.usgs.gov
    • catalog.data.gov
    Updated Aug 31, 2024
    Cite
    Phillipe Wernette; Daniel Buscombe; Jaycee Favela; Sharon Fitzpatrick; Evan Goldstein; Nicholas Enwright; Erin Dunand (2024). Coast Train--Labeled imagery for training and evaluation of data-driven models for image segmentation [Dataset]. http://doi.org/10.5066/P91NP87I
    Dataset updated
    Aug 31, 2024
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Authors
    Phillipe Wernette; Daniel Buscombe; Jaycee Favela; Sharon Fitzpatrick; Evan Goldstein; Nicholas Enwright; Erin Dunand
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Jan 1, 2008 - Dec 31, 2020
    Description

    Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images cover a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow the naming convention {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes us ...
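    One quick way to inspect an individual NPZ file from the library is to open it with NumPy and list the arrays it contains; the filename below is hypothetical and the array key names are not documented in this listing, so the snippet only prints whatever it finds.

    # Sketch: list the arrays stored in one Coast Train NPZ file.
    import numpy as np

    npz_path = "naip_4class_000.npz"  # hypothetical filename
    with np.load(npz_path, allow_pickle=True) as data:
        for key in data.files:
            arr = data[key]
            print(key, getattr(arr, "shape", type(arr)))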

  4. Data Labeling Market Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Mar 8, 2025
    Cite
    Data Insights Market (2025). Data Labeling Market Report [Dataset]. https://www.datainsightsmarket.com/reports/data-labeling-market-20383
    Dataset updated
    Mar 8, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The data labeling market is experiencing robust growth, projected to reach $3.84 billion in 2025 and maintain a Compound Annual Growth Rate (CAGR) of 28.13% from 2025 to 2033. This expansion is fueled by the increasing demand for high-quality training data across various sectors, including healthcare, automotive, and finance, which heavily rely on machine learning and artificial intelligence (AI). The surge in AI adoption, particularly in areas like autonomous vehicles, medical image analysis, and fraud detection, necessitates vast quantities of accurately labeled data. The market is segmented by sourcing type (in-house vs. outsourced), data type (text, image, audio), labeling method (manual, automatic, semi-supervised), and end-user industry. Outsourcing is expected to dominate the sourcing segment due to cost-effectiveness and access to specialized expertise. Similarly, image data labeling is likely to hold a significant share, given the visual nature of many AI applications. The shift towards automation and semi-supervised techniques aims to improve efficiency and reduce labeling costs, though manual labeling will remain crucial for tasks requiring high accuracy and nuanced understanding. Geographical distribution shows strong potential across North America and Europe, with Asia-Pacific emerging as a key growth region driven by increasing technological advancements and digital transformation. Competition in the data labeling market is intense, with a mix of established players like Amazon Mechanical Turk and Appen, alongside emerging specialized companies. The market's future trajectory will likely be shaped by advancements in automation technologies, the development of more efficient labeling techniques, and the increasing need for specialized data labeling services catering to niche applications. Companies are focusing on improving the accuracy and speed of data labeling through innovations in AI-powered tools and techniques. Furthermore, the rise of synthetic data generation offers a promising avenue for supplementing real-world data, potentially addressing data scarcity challenges and reducing labeling costs in certain applications. This will, however, require careful attention to ensure that the synthetic data generated is representative of real-world data to maintain model accuracy. This comprehensive report provides an in-depth analysis of the global data labeling market, offering invaluable insights for businesses, investors, and researchers. The study period covers 2019-2033, with 2025 as the base and estimated year, and a forecast period of 2025-2033. We delve into market size, segmentation, growth drivers, challenges, and emerging trends, examining the impact of technological advancements and regulatory changes on this rapidly evolving sector. The market is projected to reach multi-billion dollar valuations by 2033, fueled by the increasing demand for high-quality data to train sophisticated machine learning models. Recent developments include: September 2024: The National Geospatial-Intelligence Agency (NGA) is poised to invest heavily in artificial intelligence, earmarking up to USD 700 million for data labeling services over the next five years. This initiative aims to enhance NGA's machine-learning capabilities, particularly in analyzing satellite imagery and other geospatial data. 
The agency has opted for a multi-vendor indefinite-delivery/indefinite-quantity (IDIQ) contract, emphasizing the importance of annotating raw data, be it images or videos, to render it understandable for machine learning models. For instance, when dealing with satellite imagery, the focus could be on labeling distinct entities such as buildings, roads, or patches of vegetation. October 2023: Refuel.ai unveiled a new platform, Refuel Cloud, and a specialized large language model (LLM) for data labeling. Refuel Cloud harnesses advanced LLMs, including its proprietary model, to automate data cleaning, labeling, and enrichment at scale, catering to diverse industry use cases. Recognizing that clean data underpins modern AI and data-centric software, Refuel Cloud addresses the historical challenge of human labor bottlenecks in data production. With Refuel Cloud, enterprises can swiftly generate the expansive, precise datasets they require in mere minutes, a task that traditionally spanned weeks. Key drivers for this market are: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Potential restraints include: Rising Penetration of Connected Cars and Advances in Autonomous Driving Technology, Advances in Big Data Analytics based on AI and ML. Notable trends are: Healthcare is Expected to Witness Remarkable Growth.

  5. Dataset of Deep Learning from Landsat-8 Satellite Images for Estimating...

    • data.mendeley.com
    Updated Jun 6, 2022
    Cite
    Yudhi Prabowo (2022). Dataset of Deep Learning from Landsat-8 Satellite Images for Estimating Burned Areas in Indonesia [Dataset]. http://doi.org/10.17632/fs7mtkg2wk.5
    Dataset updated
    Jun 6, 2022
    Authors
    Yudhi Prabowo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Indonesia
    Description

    The dataset consists of three categories: image subsets, burned area masks, and quicklooks. The image subsets are derived from Landsat-8 scenes taken during the years 2019 and 2021. Each image has a size of 512x512 pixels and consists of 8 multispectral bands. The sequence of band names from band 1 to band 7 of the image subset is the same as in the original Landsat-8 scene, except for band 8 of the image subset, which is band 9 (the cirrus band) in the original Landsat-8 scene. The image subsets are saved in GeoTIFF format with the latitude/longitude coordinate system and WGS 1984 as the datum. The spatial resolution of the image subsets is 0.00025 degrees, and the pixel values are stored as 16-bit unsigned integers with a range from 0 to 65535. The dataset totals 227 images containing burned areas surrounded by ecologically diverse backgrounds such as forest, shrub, grassland, waterbodies, bare land, settlements, cloud, and cloud shadow. In some cases, the burned areas are covered by smoke because the fire was still active. Some image subsets also overlap each other to cover burned scars that are too large for a single subset.

    The burned area mask is a binary annotation image consisting of two classes: burned area as the foreground and non-burned area as the background. These binary images are saved as 8-bit unsigned integers, where burned area is indicated by a pixel value of 1 and non-burned area by 0. The burned area masks in this dataset contain only burned scars and are not contaminated with thick clouds, shadows, or vegetation. Among the 227 images, 206 contain burned areas whereas 21 contain only background. Most images have a burned-area coverage between 0 and 10 percent.

    The dataset also provides a quicklook image as a quick preview of each image subset. It offers a fast, full-size preview without opening the file in GIS software. The quicklook images can also be used for training and evaluating models as a substitute for the image subsets. The image size is 512x512 pixels, the same as the image subsets and annotation images. Each quicklook is a false-color composite of band 7 (SWIR-2), band 5 (NIR), and band 4 (red). Contrast stretching has been applied to these RGB composites to enhance visualization. The quicklook images are stored in GeoTIFF format as 8-bit unsigned integers.
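    A short sketch of reading one image subset and its burned-area mask with rasterio, based on the formats described above (8-band, 16-bit GeoTIFF subsets and 8-bit binary masks); the filenames are placeholders rather than actual files from the archive.

    # Sketch: pair an image subset with its binary burned-area mask.
    import rasterio

    with rasterio.open("subset_001.tif") as src:  # hypothetical image subset
        image = src.read()                        # expected shape: (8, 512, 512), uint16

    with rasterio.open("mask_001.tif") as src:    # hypothetical mask file
        mask = src.read(1)                        # expected shape: (512, 512), uint8; 1 = burned

    burned_fraction = (mask == 1).mean()
    print(f"burned area covers {burned_fraction:.1%} of the subset")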

    This work was financed by Riset Inovatif Produktif (RISPRO) fund through Prioritas Riset Nasional (PRN) project, grant no. 255/E1/PRN/2020 for 2020 - 2021 contract period.

  6. Data from: Satellite Image Classification

    • kaggle.com
    Updated Aug 21, 2021
    Cite
    Mahmoud Reda (2021). Satellite Image Classification [Dataset]. https://www.kaggle.com/mahmoudreda55/satellite-image-classification/metadata
    Dataset updated
    Aug 21, 2021
    Dataset provided by
    Kaggle
    Authors
    Mahmoud Reda
    Description

    Context

    Satellite Image Classification Dataset (RSI-CB256). This dataset has 4 different classes, mixed from sensor imagery and Google Maps snapshots.

    Content

    The past years have witnessed great progress on remote sensing (RS) image interpretation and its wide applications. With RS images becoming more accessible than ever before, there is an increasing demand for the automatic interpretation of these images. In this context, the benchmark datasets serve as essential prerequisites for developing and testing intelligent interpretation algorithms. After reviewing existing benchmark datasets in the research community of RS image interpretation, this article discusses the problem of how to efficiently prepare a suitable benchmark dataset for RS image interpretation. Specifically, we first analyze the current challenges of developing intelligent algorithms for RS image interpretation with bibliometric investigations. We then present the general guidance on creating benchmark datasets in efficient manners. Following the presented guidance, we also provide an example on building RS image dataset, i.e., Million-AID, a new large-scale benchmark dataset containing a million instances for RS image scene classification. Several challenges and perspectives in RS image annotation are finally discussed to facilitate the research in benchmark dataset construction. We do hope this paper will provide the RS community an overall perspective on constructing large-scale and practical image datasets for further research, especially data-driven ones.

    Acknowledgements

    Annotated Datasets for RS Image Interpretation: The interpretation of RS images has been playing an increasingly important role in a large diversity of applications, and thus has attracted remarkable research attention. Consequently, various datasets have been built to advance the development of interpretation algorithms for RS images. Covering literature published over the past decade, we perform a systematic review of the existing RS image datasets concerning the current mainstream of RS image interpretation tasks, including scene classification, object detection, semantic segmentation and change detection.

    Inspiration

    Artificial Intelligence, Computer Vision, Image Processing, Deep Learning, Satellite Image, Remote Sensing

  7. Data from: BD-Sat

    • search.dataone.org
    Updated Nov 8, 2023
    Cite
    Paul, Ovi; Nayem, Abu Bakar Siddik; Sarker, Anis; Ahsan Ali, Amin; Amin, M Ashraful; Rahman, AKM Mahbubur (2023). BD-Sat [Dataset]. http://doi.org/10.7910/DVN/TQZUCL
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Paul, Ovi; Nayem, Abu Bakar Siddik; Sarker, Anis; Ahsan Ali, Amin; Amin, M Ashraful; Rahman, AKM Mahbubur
    Description

    BD-Sat provides a high-resolution dataset with pixel-by-pixel LULC annotations for the Dhaka metropolitan city and the rural/urban areas surrounding it. The ground truth was produced from Bing satellite imagery at a ground sampling distance of 2.22 meters/pixel following a strict, standardized procedure. A well-defined three-stage annotation process was followed, with support from geographic information system (GIS) experts, to ensure the reliability of the annotations. We perform several experiments to establish benchmark results. Results show that the annotated BD-Sat is sufficient to train large deep-learning models with adequate accuracy for five major LULC classes: forest, farmland, built-up, water, and meadow.

  8. Global Data Annotation Tools Market Size By Data Type, By Functionality, By...

    • verifiedmarketresearch.com
    Updated Mar 19, 2024
    Cite
    VERIFIED MARKET RESEARCH (2024). Global Data Annotation Tools Market Size By Data Type, By Functionality, By Industry of End Use, By Geographic Scope And Forecast [Dataset]. https://www.verifiedmarketresearch.com/product/data-annotation-tools-market/
    Dataset updated
    Mar 19, 2024
    Dataset provided by
    Verified Market Research (https://www.verifiedmarketresearch.com/)
    Authors
    VERIFIED MARKET RESEARCH
    License

    https://www.verifiedmarketresearch.com/privacy-policy/

    Time period covered
    2024 - 2030
    Area covered
    Global
    Description

    Data Annotation Tools Market size was valued at USD 0.03 Billion in 2023 and is projected to reach USD 4.04 Billion by 2030, growing at a CAGR of 25.5% during the forecasted period 2024 to 2030.

    Global Data Annotation Tools Market Drivers

    The market drivers for the Data Annotation Tools Market can be influenced by various factors. These may include:

    Rapid Growth in AI and Machine Learning: The demand for data annotation tools to label massive datasets for training and validation purposes is driven by the rapid growth of AI and machine learning applications across a variety of industries, including healthcare, automotive, retail, and finance.

    Increasing Data Complexity: As data types such as photos, videos, text, and sensor data become more complex, more sophisticated annotation tools are needed to handle a variety of data formats, annotations, and labeling needs. This will spur market adoption and innovation.

    Quality and Accuracy Requirements: Training accurate and dependable AI models requires high-quality annotated data. Organizations can attain enhanced annotation accuracy and consistency by utilizing data annotation technologies that come with sophisticated annotation algorithms, quality control measures, and human-in-the-loop capabilities.

    Applications Specific to Industries: The development of specialized annotation tools for particular industries, like autonomous vehicles, medical imaging, satellite imagery analysis, and natural language processing, is prompted by their distinct regulatory standards and data annotation requirements.

  9. Data from: CloudTracks: A Dataset for Localizing Ship Tracks in Satellite...

    • zenodo.org
    zip
    Updated Nov 1, 2023
    Cite
    Muhammad Ahmed Chaudhry; Lyna Kim; Jeremy Irvin; Yuzu Ido; Sonia Chu; Jared Thomas Isobe; Andrew Y. Ng; Duncan Watson-Parris (2023). CloudTracks: A Dataset for Localizing Ship Tracks in Satellite Images of Clouds [Dataset]. http://doi.org/10.5281/zenodo.8412855
    Dataset updated
    Nov 1, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Muhammad Ahmed Chaudhry; Lyna Kim; Jeremy Irvin; Yuzu Ido; Sonia Chu; Jared Thomas Isobe; Andrew Y. Ng; Duncan Watson-Parris
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    [Please use version 1.0.1]

    The CloudTracks dataset consists of 1,780 MODIS satellite images hand-labeled for the presence of more than 12,000 ship tracks. More information about how the dataset was constructed may be found at github.com/stanfordmlgroup/CloudTracks. The file structure of the dataset is as follows:

    CloudTracks/
      full/
        images/
          mod2002121.1920D.png   (sample image name)
        jsons/
          mod2002121.1920D.json  (sample json name)

    The naming convention is as follows:
    mod2002121.1920D: the first 3 letters specify which of the sensors on the two MODIS satellites captured the image, mod for Terra and myd for Aqua. This is followed by a 4 digit year (2002) and a 3 digit day of the year (121). The following 4 digits specify the time of day (1920; 24 hour format in the UTC timezone), followed by D or N for Day or Night.

    The 1,780 MODIS Terra and Aqua images were collected between 2002 and 2021 inclusive over various stratocumulus cloud regions (such as the East Pacific and East Atlantic) where ship tracks have commonly been observed. Each image has dimension 1354 x 2030 and a spatial resolution of 1km. Of the 36 bands collected by the instruments, we selected channels 1, 20, and 32 to capture useful physical properties of cloud formations.

    The labels are found in the corresponding JSON files for each image. The following keys in the json are particularly important:

    imagePath: the filename of the image.
    shapes: the list of annotations corresponding to the image, where each element of the list is a dictionary corresponding to a single instance annotation. The dictionary has a key with value "shiptrack" or "uncertain" which is the label of the annotation and the corresponding value is a linestrip detailing the ship track path.

    Further pre-processing details may be found at the GitHub link above. If you have any questions about the dataset, contact us at:
    mahmedch@stanford.edu, lynakim@stanford.edu, jirvin16@cs.stanford.edu
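    Based on the naming convention and JSON keys described above, a small parsing sketch might look like the following; the paths are hypothetical and the annotation key name "label" is an assumption (labelme-style), so check the GitHub repository for the exact schema.

    # Sketch: decode a CloudTracks filename and count its ship-track annotations.
    import json

    name = "mod2002121.1920D"
    sensor = {"mod": "Terra", "myd": "Aqua"}[name[:3]]
    year, doy, hhmm, day_night = name[3:7], name[7:10], name[11:15], name[15]
    print(sensor, year, doy, hhmm, day_night)  # Terra 2002 121 1920 D

    with open(f"CloudTracks/full/jsons/{name}.json") as f:
        ann = json.load(f)

    tracks = [s for s in ann["shapes"] if s.get("label") == "shiptrack"]  # key name assumed
    print(ann["imagePath"], len(tracks), "ship track annotations")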

  10. Data from: MLRSNet: A Multi-label High Spatial Resolution Remote Sensing...

    • data.mendeley.com
    Updated Sep 18, 2023
    Cite
    Xiaoman Qi (2023). MLRSNet: A Multi-label High Spatial Resolution Remote Sensing Dataset for Semantic Scene Understanding [Dataset]. http://doi.org/10.17632/7j9bv9vwsx.4
    Dataset updated
    Sep 18, 2023
    Authors
    Xiaoman Qi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MLRSNet provides different perspectives of the world captured from satellites. That is, it is composed of high spatial resolution optical satellite images. MLRSNet contains 109,161 remote sensing images that are annotated into 46 categories, and the number of sample images in a category varies from 1,500 to 3,000. The images have a fixed size of 256×256 pixels with various pixel resolutions (~10m to 0.1m). Moreover, each image in the dataset is tagged with several of 60 predefined class labels, and the number of labels associated with each image varies from 1 to 13. The dataset can be used for multi-label based image classification, multi-label based image retrieval, and image segmentation.

    The Dataset includes: 1. Images folder: 46 categories, 109,161 high-spatial-resolution remote sensing images. 2. Labels folders: each category has a .csv file. 3. Categories_names.xlsx: Sheet1 lists the names of the 46 categories, and Sheet2 shows the multi-labels associated with each category.

  11. Electric Transmission and Distribution Infrastructure Imagery Dataset

    • figshare.com
    pdf
    Updated May 30, 2023
    Cite
    Kyle Bradbury; Qiwei Han; Varun Nair; Tamasha Pathirathna; Xiaolan You (2023). Electric Transmission and Distribution Infrastructure Imagery Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.6931088.v1
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Kyle Bradbury; Qiwei Han; Varun Nair; Tamasha Pathirathna; Xiaolan You
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview: The dataset contains fully annotated electric transmission and distribution infrastructure for approximately 321 sq km of high resolution satellite and aerial imagery from around the world. The imagery and associated infrastructure annotations span 14 cities and 5 continents, and were selected to represent diversity in human settlement density (i.e. rural vs urban), terrain type, and development index. This dataset may be of particular interest to those looking to train machine learning algorithms to automatically identify energy infrastructure in satellite imagery or for those working on domain adaptation for computer vision. Automated algorithms for identifying electricity infrastructure in satellite imagery may assist policy makers in identifying the best pathway to electrification for unelectrified areas.

    Data Sources: This dataset contains data sourced from the LINZ Data Service, licensed for reuse under CC BY 4.0. It also contains extracts from the SpaceNet dataset: SpaceNet on Amazon Web Services (AWS). “Datasets.” The SpaceNet Catalog. Last modified April 30, 2018 (link below). Other imagery data included in this dataset are from the Connecticut Department of Energy and Environmental Protection and the U.S. Geological Survey. Links to each of the imagery data sources are provided below, as well as the link to the annotation tool and the GitHub repository that provides tools for using these data.

    Acknowledgements: This dataset was created as part of the Duke University Data+ project, “Energy Infrastructure Map of the World” (link below), in collaboration with the Information Initiative at Duke and the Duke University Energy Initiative.

  12. Bonn Roof Material + Satellite Imagery Dataset

    • figshare.com
    zip
    Updated Apr 18, 2025
    Cite
    Julian Huang; Yue Lin; Alex Nhancololo (2025). Bonn Roof Material + Satellite Imagery Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.28713194.v2
    Dataset updated
    Apr 18, 2025
    Dataset provided by
    figshare
    Authors
    Julian Huang; Yue Lin; Alex Nhancololo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bonn
    Description

    This dataset consists of annotated high-resolution aerial imagery of roof materials in Bonn, Germany, in the Ultralytics YOLO instance segmentation dataset format. Aerial imagery was sourced from OpenAerialMap, specifically from the Maxar Open Data Program. Roof material labels and building outlines were sourced from OpenStreetMap. Images and labels are split into training, validation, and test sets, intended for training machine learning models for both building segmentation and roof type classification. The dataset is intended for applications such as informing studies on thermal efficiency, roof durability, heritage conservation, or socioeconomic analyses. There are six roof material types: roof tiles, tar paper, metal, concrete, gravel, and glass. Note: the data is in a .zip due to file upload limits. Please find a more detailed dataset description in the README.md.
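    Because the labels follow the Ultralytics YOLO instance segmentation format, each label file is a plain-text file whose lines hold a class index followed by a polygon of x/y coordinates normalized to [0, 1]. A minimal parsing sketch, with a hypothetical label path, is shown below.

    # Sketch: parse one YOLO instance-segmentation label file.
    from pathlib import Path

    label_file = Path("labels/train/sample_tile.txt")  # hypothetical path
    for line in label_file.read_text().splitlines():
        parts = line.split()
        class_id = int(parts[0])
        polygon = [(float(x), float(y)) for x, y in zip(parts[1::2], parts[2::2])]
        print(class_id, len(polygon), "polygon vertices")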

  13. Sentinel-2 Cloud Mask Catalogue

    • zenodo.org
    • data.niaid.nih.gov
    csv, pdf, zip
    Updated Jul 19, 2024
    Cite
    Alistair Francis; John Mrziglod; Panagiotis Sidiropoulos; Jan-Peter Muller (2024). Sentinel-2 Cloud Mask Catalogue [Dataset]. http://doi.org/10.5281/zenodo.4172871
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alistair Francis; John Mrziglod; Panagiotis Sidiropoulos; Jan-Peter Muller
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    This dataset comprises cloud masks for 513 1022-by-1022 pixel subscenes, at 20m resolution, sampled at random from the 2018 Level-1C Sentinel-2 archive. The design of this dataset follows from some observations about cloud masking: (i) performance over an entire product is highly correlated, thus subscenes provide more value per-pixel than full scenes, (ii) current cloud masking datasets often focus on specific regions, or hand-select the products used, which introduces a bias into the dataset that is not representative of the real-world data, (iii) cloud mask performance appears to be highly correlated to surface type and cloud structure, so testing should include analysis of failure modes in relation to these variables.

    The data was annotated semi-automatically using the IRIS toolkit, which allows users to dynamically train a Random Forest (implemented using LightGBM), speeding up annotation by iteratively improving its predictions while preserving the annotator's ability to make final manual changes when needed. This hybrid approach allowed us to process many more masks than would have been possible manually, which we felt was vital in creating a large enough dataset to approximate the statistics of the whole Sentinel-2 archive.

    In addition to the pixel-wise, 3-class (CLEAR, CLOUD, CLOUD_SHADOW) segmentation masks, we also provide users with binary classification "tags" for each subscene that can be used in testing to determine performance in specific circumstances. These include:

    • SURFACE TYPE: 11 categories
    • CLOUD TYPE: 7 categories
    • CLOUD HEIGHT: low, high
    • CLOUD THICKNESS: thin, thick
    • CLOUD EXTENT: isolated, extended

    Wherever practical, cloud shadows were also annotated; however, this was sometimes not possible due to high-relief terrain or large ambiguities. In total, 424 subscenes were marked with shadows (if present), and 89 have shadows that were not annotatable due to very ambiguous shadow boundaries or terrain that cast significant shadows. If users wish to train an algorithm specifically for cloud shadow masks, we advise them to remove those 89 images for which shadow annotation was not possible; however, bear in mind that this will systematically reduce the difficulty of the shadow class compared to real-world use, as these contain the most difficult shadow examples.

    In addition to the 20m sampled subscenes and masks, we also provide users with shapefiles that define the boundary of the mask on the original Sentinel-2 scene. If users wish to retrieve the L1C bands at their original resolutions, they can use these to do so.

    Please see the README for further details on the dataset structure and more.

    Contributions & Acknowledgements

    The data were collected, annotated, checked, formatted and published by Alistair Francis and John Mrziglod.

    Support and advice was provided by Prof. Jan-Peter Muller and Dr. Panagiotis Sidiropoulos, for which we are grateful.

    We would like to extend our thanks to Dr. Pierre-Philippe Mathieu and the rest of the team at ESA PhiLab, who provided the environment in which this project was conceived, and continued to give technical support throughout.

    Finally, we thank the ESA Network of Resources for sponsoring this project by providing ICT resources.

  14. RarePlanes Dataset

    • opendatalab.com
    • paperswithcode.com
    zip
    Updated Mar 24, 2023
    Cite
    AI.Reverie (2023). RarePlanes Dataset [Dataset]. https://opendatalab.com/OpenDataLab/RarePlanes_Dataset
    Dataset updated
    Mar 24, 2023
    Dataset provided by
    AI.Reverie, Inc.
    In-Q-Tel CosmiQ Works
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    RarePlanes is a unique open-source machine learning dataset from CosmiQ Works and AI.Reverie that incorporates both real and synthetically generated satellite imagery. The RarePlanes dataset specifically focuses on the value of AI.Reverie synthetic data to aid computer vision algorithms in their ability to automatically detect aircraft and their attributes in satellite imagery. Although other synthetic/real combination datasets exist, RarePlanes is the largest openly available very-high-resolution dataset built to test the value of synthetic data from an overhead perspective. Previous research has shown that synthetic data can reduce the amount of real training data needed and potentially improve performance for many tasks in the computer vision domain. The real portion of the dataset consists of 253 Maxar WorldView-3 satellite scenes spanning 112 locations and 2,142 km^2 with 14,700 hand-annotated aircraft. The accompanying synthetic dataset is generated via AI.Reverie’s novel simulation platform and features 50,000 synthetic satellite images with ~630,000 aircraft annotations. Both the real and synthetically generated aircraft feature 10 fine-grained attributes including: aircraft length, wingspan, wing-shape, wing-position, wingspan class, propulsion, number of engines, number of vertical stabilizers, presence of canards, and aircraft role. Finally, we conduct extensive experiments to evaluate the real and synthetic datasets and compare performances. By doing so, we show the value of synthetic data for the task of detecting and classifying aircraft from an overhead perspective.

  15. Data from: Sentinel2GlobalLULC: A dataset of Sentinel-2 georeferenced RGB...

    • zenodo.org
    • observatorio-cientifico.ua.es
    • +2more
    text/x-python, zip
    Updated Apr 24, 2025
    Cite
    Yassir Benhammou; Domingo Alcaraz-Segura; Emilio Guirado; Rohaifa Khaldi; Siham Tabik (2025). Sentinel2GlobalLULC: A dataset of Sentinel-2 georeferenced RGB imagery annotated for global land use/land cover mapping with deep learning (License CC BY 4.0) [Dataset]. http://doi.org/10.5281/zenodo.6941662
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Yassir Benhammou; Domingo Alcaraz-Segura; Emilio Guirado; Rohaifa Khaldi; Siham Tabik
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sentinel2GlobalLULC is a deep learning-ready dataset of RGB images from the Sentinel-2 satellites designed for global land use and land cover (LULC) mapping. Sentinel2GlobalLULC v2.1 contains 194,877 images in GeoTiff and JPEG format corresponding to 29 broad LULC classes. Each image has 224 x 224 pixels at 10 m spatial resolution and was produced by assigning the 25th percentile of all available observations in the Sentinel-2 collection between June 2015 and October 2020 in order to remove atmospheric effects (i.e., clouds, aerosols, shadows, snow, etc.). A spatial purity value was assigned to each image based on the consensus across 15 different global LULC products available in Google Earth Engine (GEE).

    Our dataset is structured into 3 main zip-compressed folders, an Excel file with a dictionary for class names and descriptive statistics per LULC class, and a python script to convert RGB GeoTiff images into JPEG format. The first folder called "Sentinel2LULC_GeoTiff.zip" contains 29 zip-compressed subfolders where each one corresponds to a specific LULC class with hundreds to thousands of GeoTiff Sentinel-2 RGB images. The second folder called "Sentinel2LULC_JPEG.zip" contains 29 zip-compressed subfolders with a JPEG formatted version of the same images provided in the first main folder. The third folder called "Sentinel2LULC_CSV.zip" includes 29 zip-compressed CSV files with as many rows as provided images and with 12 columns containing the following metadata (this same metadata is provided in the image filenames):

    • Land Cover Class ID: is the identification number of each LULC class
    • Land Cover Class Short Name: is the short name of each LULC class
    • Image ID: is the identification number of each image within its corresponding LULC class
    • Pixel purity Value: is the spatial purity of each pixel for its corresponding LULC class calculated as the spatial consensus across up to 15 land-cover products
    • GHM Value: is the spatial average of the Global Human Modification index (gHM) for each image
    • Latitude: is the latitude of the center point of each image
    • Longitude: is the longitude of the center point of each image
    • Country Code: is the Alpha-2 country code of each image as described in the ISO 3166 international standard. To understand the country codes, we recommend visiting https://www.iban.com/country-codes, which lists the Alpha-2 code for each country.
    • Administrative Department Level1: is the administrative level 1 name to which each image belongs
    • Administrative Department Level2: is the administrative level 2 name to which each image belongs
    • Locality: is the name of the locality to which each image belongs
    • Number of S2 images: is the number of Sentinel-2 observations found in the collection between June 2015 and October 2020 when compositing and exporting the corresponding image tile

    For seven LULC classes, we could not export from GEE all images that fulfilled a spatial purity of 100% since there were millions of them. In this case, we exported a stratified random sample of 14,000 images and provided an additional CSV file with the images actually contained in our dataset. That is, for these seven LULC classes, we provide these 2 CSV files:

    • A CSV file that contains all exported images for this class
    • A CSV file that contains all images available for this class at spatial purity of 100%, both the ones exported and the ones not exported, in case the user wants to export them. These CSV filenames end with "including_non_downloaded_images".

    To clearly state the geographical coverage of images available in this dataset, we included in the version v2.1, a compressed folder called "Geographic_Representativeness.zip". This zip-compressed folder contains a csv file for each LULC class that provides the complete list of countries represented in that class. Each csv file has two columns, the first one gives the country code and the second one gives the number of images provided in that country for that LULC class. In addition to these 29 csv files, we provided another csv file that maps each ISO Alpha-2 country code to its original full country name.
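    As a rough illustration, the per-class metadata CSVs can be loaded with pandas and filtered by spatial purity; the filename, the exact header spellings, and the purity scale (0-100 versus 0-1) below are assumptions based on the column list above, so check the archive before relying on them.

    # Sketch: read one per-class metadata CSV and keep only fully pure images.
    import pandas as pd

    df = pd.read_csv("Sentinel2LULC_CSV/class_01.csv")  # hypothetical filename
    pure = df[df["Pixel purity Value"] == 100]          # header name and 0-100 scale assumed
    print(len(df), "images,", len(pure), "with 100% purity")
    print(pure[["Latitude", "Longitude", "Country Code"]].head())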

    © Sentinel2GlobalLULC Dataset by Yassir Benhammou, Domingo Alcaraz-Segura, Emilio Guirado, Rohaifa Khaldi, Boujemâa Achchab, Francisco Herrera & Siham Tabik is marked with Attribution 4.0 International (CC-BY 4.0)

  16. Data from: Geospatial Dataset on Deforestation and Urban Sprawl in Dhaka,...

    • data.mendeley.com
    Updated May 28, 2025
    Cite
    Md Fahad Khan (2025). Geospatial Dataset on Deforestation and Urban Sprawl in Dhaka, Bangladesh: A Resource for Environmental Analysis [Dataset]. http://doi.org/10.17632/hst78yczmy.5
    Dataset updated
    May 28, 2025
    Authors
    Md Fahad Khan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Dhaka, Bangladesh
    Description

    Google Earth Pro facilitated the acquisition of satellite imagery to monitor deforestation in Dhaka, Bangladesh. Multiple years of images were systematically captured from specific locations, allowing comprehensive analysis of tree cover reduction. The imagery displays diverse aspect ratios based on satellite perspectives and possesses high resolution, suitable for remote sensing. Each site provided 5 to 35 images annually, accumulating data over a ten-year period. The dataset classifies images into three primary categories: tree cover, deforested regions, and masked images. Organized by year, it comprises both raw and annotated images, each paired with a JSON file containing annotations and segmentation masks. This organization enhances accessibility and temporal analysis. Furthermore, the dataset is conducive to machine learning initiatives, particularly in training models for object detection and segmentation to evaluate environmental alterations.

  17. Sentinel-2 Cloud Mask Annotations with Variability

    • kaggle.com
    Updated Jan 15, 2023
    Cite
    The Devastator (2023). Sentinel-2 Cloud Mask Annotations with Variability [Dataset]. https://www.kaggle.com/datasets/thedevastator/sentinel-2-cloud-mask-annotations-with-variabili/data
    Dataset updated
    Jan 15, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Sentinel-2 Cloud Mask Annotations with Variability Tags

    A Large, Representative Set of 20m Subscenes


    How to use the dataset

    In this guide, we will cover how to use this dataset and what information can be derived from it.

    First, let's take a look at the columns in the dataset. We have scene name, difficulty level, annotator name, shadows_marked (yes/no), clear percent, cloud percent, shadow percent, and dataset type (WorldView 2 or 3), plus coverage percentages for forest/jungle, snow/ice, agricultural, urban/developed, coastal, hills/mountains, desert/barren, shrublands/plains, wetland/bog/marsh, open water, and enclosed water, as well as thin cloud %, thick clouds %, low clouds %, high clouds %, and isolated clouds %, along with extended cloud types (altocumulus/stratocumulus), cirrus, haze/fog, ice_clouds, and contrails. All of these columns provide detailed percentages for the different types of land cover and the corresponding cloud types, plus other useful information such as the name of the annotator who created the annotations for a particular scene.

    The data within each column can then be used to derive further insights about any given Sentinel-2 subscene, including land cover as well as associated meteorological conditions, which could support decision-making applications like crop monitoring or urban development tracking, in addition to understanding environmental impacts over large areas visible through satellite imagery. Furthermore, by combining this data with standard atmospheric parameters such as wind speed and direction, it may be possible to track storm paths by relating cloud conditions across previously gathered satellite images, supporting more accurate forecasting.

    Research Ideas

    • Using the geographical attributes associated with each scene, this dataset can be used to categorize cultures based on their characteristics and geography.
    • This dataset can be used to better understand climate data, by looking at how cloud formations are distributed in a region and in relation to weather patterns.
    • This dataset can also help with machine learning projects related to object detection, as the cloud patterns and layout of the scenes can be seen as objects that algorithms should try to recognize or identify correctly while training

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    License

    License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

    Columns

    File: classification_tags.csv

    | Column name    | Description |
    |:---------------|:------------|
    | scene          | Unique identifier for each subscene. (String) |
    | difficulty     | Difficulty rating of the subscene. (Integer) |
    | annotator      | Name of the annotator who classified the subscene. (String) |
    | shadows_marked | Whether shadows were marked in the subscene. (Boolean) |
    | clear_percent  | Percentage of clear sky in the subscene. (Float) |
    | cloud_percent  | Percentage of clouds in the subscene. (Float) |
    | shadow_percent | Percentage of shadows in the subscene. (Float) |
    | dataset        | Dataset the subscene was taken from. (String) |
    | forest/jungle  | Percentage of forest/jungle in the subscene. (Float) |
    | snow/ice       | Percentage of snow/ice in the subscene. (Float) |
    | agricultural   | ... |
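    A small pandas sketch for exploring classification_tags.csv, using the column names listed above (the local file path is assumed):

    # Sketch: summarize the classification tags.
    import pandas as pd

    tags = pd.read_csv("classification_tags.csv")
    cloudy = tags[tags["cloud_percent"] > 50]  # mostly cloudy subscenes
    print(cloudy[["scene", "annotator", "clear_percent", "cloud_percent", "shadow_percent"]].head())
    print(tags.groupby("difficulty")["cloud_percent"].mean())  # mean cloud cover by difficulty rating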

  18. Deep Fmask Dataset: Labeled dataset for Cloud, Shadow, Clear-Sky Land, Snow...

    • doi.pangaea.de
    zip
    Updated Mar 14, 2022
    Cite
    Kamal Gopikrishnan Nambiar; Veniamin I Morgenshtern; Philipp Hochreuther; Thorsten Seehaus; Matthias Holger Braun (2022). Deep Fmask Dataset: Labeled dataset for Cloud, Shadow, Clear-Sky Land, Snow and Water Segmentation of Sentinel-2 Images over Snow and Ice Covered Regions [Dataset]. http://doi.org/10.1594/PANGAEA.942321
    Dataset updated
    Mar 14, 2022
    Dataset provided by
    PANGAEA
    Authors
    Kamal Gopikrishnan Nambiar; Veniamin I Morgenshtern; Philipp Hochreuther; Thorsten Seehaus; Matthias Holger Braun
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We present our dataset containing images with labeled polygons, annotated over Sentinel-2 L1C imagery from snow- and ice-covered regions. We use labels similar to those of the Fmask cloud detection algorithm, i.e., clear-sky land, cloud, shadow, snow, and water. We annotated the labels manually using the QGIS software. The dataset consists of 45 scenes divided into validation (22 scenes) and test (23 scenes) subsets. The source images were captured by the satellite between October 2019 and December 2020. We provide the list of '.SAFE' filenames containing the satellite imagery; these files can be downloaded from the Copernicus Open Access Hub. The dataset can be used to test and benchmark deep neural networks for the task of cloud, shadow, and snow segmentation.

  19. Sarnet Search And Rescue Dataset

    • universe.roboflow.com
    zip
    Updated Jun 16, 2022
    Cite
    Roboflow Public (2022). Sarnet Search And Rescue Dataset [Dataset]. https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue
    Dataset updated
    Jun 16, 2022
    Dataset provided by
    Roboflow (https://roboflow.com/)
    Authors
    Roboflow Public
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    SaR Bounding Boxes
    Description

    Description from the SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery GitHub repository. (The "Note" below was added by the Roboflow team.)

    Satellite Imagery for Search And Rescue Dataset - ArXiv

    This is a single-class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspected might be a paraglider wing, missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison. The missing wing, as it was found after 3 weeks, is shown below.

    https://michaeltpublic.s3.amazonaws.com/images/anomaly_small.jpg

    The dataset contains the following:

    | Set      | Images | Annotations |
    |:---------|-------:|------------:|
    | Train    | 1808   | 3048        |
    | Validate | 490    | 747         |
    | Test     | 254    | 411         |
    | Total    | 2552   | 4206        |

    The data is in the COCO format, and is directly compatible with faster r-cnn as implemented in Facebook's Detectron2.

    Getting hold of the Data

    Download the data here: sarnet.zip

    Or follow these steps

    # download the dataset
    wget https://michaeltpublic.s3.amazonaws.com/sarnet.zip
    
    # extract the files
    unzip sarnet.zip
    

    Note: with Roboflow, you can download the data here (original, raw images, with annotations): https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/ (download v1, original_raw-images). Download the dataset in COCO JSON format, or another format of choice, and import it to Roboflow after unzipping the folder to get started on your project.
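    Because the annotations are in COCO format, they can be registered with Detectron2 before training a Faster R-CNN model. The JSON and image paths below are assumptions about the unzipped layout, not the actual file names inside sarnet.zip.

    # Sketch: register the SaRNet COCO annotations with Detectron2.
    from detectron2.data.datasets import register_coco_instances

    register_coco_instances(
        "sarnet_train", {},
        "sarnet/annotations/train.json",  # hypothetical annotation file
        "sarnet/images/train",            # hypothetical image folder
    )
    # The registered name can then be referenced as cfg.DATASETS.TRAIN = ("sarnet_train",)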

    Getting started

    Get started with a Faster R-CNN model pretrained on SaRNet: SaRNet_Demo.ipynb

    Source Code for Paper

    Source code for the paper is located here: SaRNet_train_test.ipynb

    Cite this dataset

    @misc{thoreau2021sarnet,
       title={SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery}, 
       author={Michael Thoreau and Frazer Wilson},
       year={2021},
       eprint={2107.12469},
       archivePrefix={arXiv},
       primaryClass={eess.IV}
    }
    

    Acknowledgment

    The source data was generously provided by Planet Labs, Airbus Defence and Space, and Maxar Technologies.

  20. TreeSatAI Benchmark Archive for Deep Learning in Forest Applications

    • zenodo.org
    • data.niaid.nih.gov
    bin, pdf, zip
    Updated Jul 16, 2024
    Cite
    Christian Schulz; Steve Ahlswede; Christiano Gava; Patrick Helber; Benjamin Bischke; Florencia Arias; Michael Förster; Jörn Hees; Begüm Demir; Birgit Kleinschmit (2024). TreeSatAI Benchmark Archive for Deep Learning in Forest Applications [Dataset]. http://doi.org/10.5281/zenodo.6598391
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christian Schulz; Steve Ahlswede; Christiano Gava; Patrick Helber; Benjamin Bischke; Florencia Arias; Michael Förster; Jörn Hees; Begüm Demir; Birgit Kleinschmit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context and Aim

    Deep learning in Earth Observation requires large image archives with highly reliable labels for model training and testing. However, a preferable quality standard for forest applications in Europe has not yet been determined. The TreeSatAI consortium investigated numerous sources for annotated datasets as an alternative to manually labeled training datasets.

    We found that the federal forest inventory of Lower Saxony, Germany, represents an unseen treasure of annotated samples for training data generation. The respective 20-cm color-infrared (CIR) imagery, which is used for forestry management through visual interpretation, constitutes an excellent baseline for deep learning tasks such as image segmentation and classification.

    Description

    The data archive is highly suitable for benchmarking as it represents the real-world data situation of many German forest management services. On the one hand, it has a high number of samples which are supported by the high-resolution aerial imagery. On the other hand, this data archive presents challenges, including class label imbalances between the different forest stand types.

    The TreeSatAI Benchmark Archive contains:

    • 50,381 image triplets (aerial, Sentinel-1, Sentinel-2)

    • synchronized time steps and locations

    • all original spectral bands/polarizations from the sensors

    • 20 species classes (single labels)

    • 12 age classes (single labels)

    • 15 genus classes (multi labels)

    • 60 m and 200 m patches

    • fixed split for train (90%) and test (10%) data

    • additional single labels such as English species name, genus, forest stand type, foliage type, land cover

    The GeoTIFF and GeoJSON files are readable in any GIS software, such as QGIS. For further information, we refer to the PDF document in the archive and the publications in the reference section.

    Version history

    v1.0.0 - First release

    Citation

    Ahlswede et al. (in prep.)

    GitHub

    Full code examples and pre-trained models from the dataset article (Ahlswede et al. 2022) using the TreeSatAI Benchmark Archive are published on the GitHub repositories of the Remote Sensing Image Analysis (RSiM) Group (https://git.tu-berlin.de/rsim/treesat_benchmark). Code examples for the sampling strategy can be made available by Christian Schulz via email request.

    Folder structure

    We refer to the proposed folder structure in the PDF file.

    • Folder “aerial” contains the aerial imagery patches derived from summertime orthophotos of the years 2011 to 2020. Patches are available in 60 x 60 m (304 x 304 pixels). Band order is near-infrared, red, green, and blue. Spatial resolution is 20 cm.

    • Folder “s1” contains the Sentinel-1 imagery patches derived from summertime mosaics of the years 2015 to 2020. Patches are available in 60 x 60 m (6 x 6 pixels) and 200 x 200 m (20 x 20 pixels). Band order is VV, VH, and VV/VH ratio. Spatial resolution is 10 m.

    • Folder “s2” contains the Sentinel-2 imagery patches derived from summertime mosaics of the years 2015 to 2020. Patches are available in 60 x 60 m (6 x 6 pixels) and 200 x 200 m (20 x 20 pixels). Band order is B02, B03, B04, B08, B05, B06, B07, B8A, B11, B12, B01, and B09. Spatial resolution is 10 m.

    • The folder “labels” contains a JSON file that holds the multi-labels of the training patches. For example, an image sample with proportions of roughly 94% Abies and 6% Larix is encoded as: "Abies_alba_3_834_WEFL_NLF.tif": [["Abies", 0.93771], ["Larix", 0.06229]] (a loading sketch follows below).

    • The two files “test_filenames.lst” and “train_filenames.lst” define the filenames used for the train (90%) and test (10%) split. We refer to this fixed split for better reproducibility and comparability.

    • The folder “geojson” contains GeoJSON files with all the samples chosen for training patch generation (point, 60 m bounding box, 200 m bounding box).

    CAUTION: As we could not upload the aerial patches as a single zip file on Zenodo, you need to download the 20 single-species files (aerial_60m_…zip) separately. Then, unzip them into a folder named “aerial” with a subfolder named “60m”. This structure is recommended for better reproducibility and comparability to the experimental results of Ahlswede et al. (2022).
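    A minimal sketch, assuming the proposed folder structure, for pairing a training filename with its multi-label entry and reading the corresponding aerial patch. The labels JSON filename and file locations are assumptions; use the files shipped in the archive:

    import json
    import rasterio  # pip install rasterio

    # Multi-labels: filename -> list of [genus, fraction] pairs.
    with open("labels/multi_labels.json") as f:  # assumed name; use the JSON in "labels"
        labels = json.load(f)

    with open("train_filenames.lst") as f:
        train_files = [line.strip() for line in f if line.strip()]

    name = train_files[0]
    print(name, labels.get(name))  # e.g. [["Abies", 0.93771], ["Larix", 0.06229]]

    # Aerial patches: 304 x 304 px, 20 cm resolution, band order NIR, R, G, B.
    with rasterio.open(f"aerial/60m/{name}") as src:
        nir, red, green, blue = src.read()
        print(src.count, "bands,", src.width, "x", src.height, "pixels")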

    Join the archive

    Model training, benchmarking, algorithm development… many applications are possible! Feel free to add samples from other regions in Europe or even worldwide. Additional remote sensing data from Lidar, UAVs, or aerial imagery from different time steps are very welcome. This helps the research community develop better deep learning and machine learning models for forest applications. If you have questions or want to share code, results, or publications using the archive, feel free to contact the authors.

    Project description

    This work was part of the project TreeSatAI (Artificial Intelligence with Satellite data and Multi-Source Geodata for Monitoring of Trees at Infrastructures, Nature Conservation Sites and Forests). Its overall aim is the development of AI methods for the monitoring of forests and woody features on a local, regional and global scale. Based on freely available geodata from different sources (e.g., remote sensing, administration maps, and social media), prototypes will be developed for the deep learning-based extraction and classification of tree- and tree stand features. These prototypes deal with real cases from the monitoring of managed forests, nature conservation and infrastructures. The development of the resulting services by three enterprises (liveEO, Vision Impulse and LUP Potsdam) will be supported by three research institutes (German Research Center for Artificial Intelligence, TU Remote Sensing Image Analysis Group, TUB Geoinformation in Environmental Planning Lab).

    Publications

    Ahlswede et al. (2022, in prep.): TreeSatAI Dataset Publication

    Ahlswede S., Nimisha, T.M., and Demir, B. (2022, in revision): Embedded Self-Enhancement Maps for Weakly Supervised Tree Species Mapping in Remote Sensing Images. IEEE Trans Geosci Remote Sens

    Schulz et al. (2022, in prep.): Phenoprofiling

    Conference contributions

    S. Ahlswede, N. T. Madam, C. Schulz, B. Kleinschmit and B. Demir, "Weakly Supervised Semantic Segmentation of Remote Sensing Images for Tree Species Classification Based on Explanation Methods", IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022.

    C. Schulz, M. Förster, S. Vulova, T. Gränzig and B. Kleinschmit, “Exploring the temporal fingerprints of mid-European forest types from Sentinel-1 RVI and Sentinel-2 NDVI time series”, IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022.

    C. Schulz, M. Förster, S. Vulova and B. Kleinschmit, “The temporal fingerprints of common European forest types from SAR and optical remote sensing data”, AGU Fall Meeting, New Orleans, USA, 2021.

    B. Kleinschmit, M. Förster, C. Schulz, F. Arias, B. Demir, S. Ahlswede, A. K. Aksoy, T. Ha Minh, J. Hees, C. Gava, P. Helber, B. Bischke, P. Habelitz, A. Frick, R. Klinke, S. Gey, D. Seidel, S. Przywarra, R. Zondag and B. Odermatt, “Artificial Intelligence with Satellite data and Multi-Source Geodata for Monitoring of Trees and Forests”, Living Planet Symposium, Bonn, Germany, 2022.

    C. Schulz, M. Förster, S. Vulova, T. Gränzig and B. Kleinschmit, (2022, submitted): “Exploring the temporal fingerprints of sixteen mid-European forest types from Sentinel-1 and Sentinel-2 time series”, ForestSAT, Berlin, Germany, 2022.
