5 datasets found
  1. TreeSatAI Benchmark Archive for Deep Learning in Forest Applications

    • zenodo.org
    • data.niaid.nih.gov
    bin, pdf, zip
    Updated Jul 16, 2024
    Cite
    Christian Schulz; Steve Ahlswede; Christiano Gava; Patrick Helber; Benjamin Bischke; Florencia Arias; Michael Förster; Jörn Hees; Begüm Demir; Birgit Kleinschmit (2024). TreeSatAI Benchmark Archive for Deep Learning in Forest Applications [Dataset]. http://doi.org/10.5281/zenodo.6598391
    Explore at:
    Available download formats: pdf, zip, bin
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Christian Schulz; Steve Ahlswede; Christiano Gava; Patrick Helber; Benjamin Bischke; Florencia Arias; Michael Förster; Jörn Hees; Begüm Demir; Birgit Kleinschmit
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context and Aim

    Deep learning in Earth Observation requires large image archives with highly reliable labels for model training and testing. However, a preferable quality standard for forest applications in Europe has not yet been determined. The TreeSatAI consortium investigated numerous sources for annotated datasets as an alternative to manually labeled training datasets.

    We found that the federal forest inventory of Lower Saxony, Germany, represents an untapped treasure of annotated samples for training data generation. The respective 20-cm color-infrared (CIR) imagery, which is used for forestry management through visual interpretation, constitutes an excellent baseline for deep learning tasks such as image segmentation and classification.

    Description

    The data archive is highly suitable for benchmarking as it represents the real-world data situation of many German forest management services. On the one hand, it has a high number of samples, supported by the high-resolution aerial imagery. On the other hand, this data archive presents challenges, including class label imbalances between the different forest stand types.

    The TreeSatAI Benchmark Archive contains:

    • 50,381 image triplets (aerial, Sentinel-1, Sentinel-2)

    • synchronized time steps and locations

    • all original spectral bands/polarizations from the sensors

    • 20 species classes (single labels)

    • 12 age classes (single labels)

    • 15 genus classes (multi labels)

    • 60 m and 200 m patches

    • fixed split for train (90%) and test (10%) data

    • additional single labels such as English species name, genus, forest stand type, foliage type, land cover

    The GeoTIFF and GeoJSON files are readable in any GIS software, such as QGIS. For further information, we refer to the PDF document in the archive and to the publications in the reference section.

    Version history

    v1.0.0 - First release

    Citation

    Ahlswede et al. (in prep.)

    GitHub

    Full code examples and pre-trained models from the dataset article (Ahlswede et al. 2022) using the TreeSatAI Benchmark Archive are published on the GitHub repositories of the Remote Sensing Image Analysis (RSiM) Group (https://git.tu-berlin.de/rsim/treesat_benchmark). Code examples for the sampling strategy can be made available by Christian Schulz via email request.

    Folder structure

    We refer to the proposed folder structure in the PDF file.

    • Folder “aerial” contains the aerial imagery patches derived from summertime orthophotos of the years 2011 to 2020. Patches are available in 60 x 60 m (304 x 304 pixels). Band order is near-infrared, red, green, and blue. Spatial resolution is 20 cm.

    • Folder “s1” contains the Sentinel-1 imagery patches derived from summertime mosaics of the years 2015 to 2020. Patches are available in 60 x 60 m (6 x 6 pixels) and 200 x 200 m (20 x 20 pixels). Band order is VV, VH, and VV/VH ratio. Spatial resolution is 10 m.

    • Folder “s2” contains the Sentinel-2 imagery patches derived from summertime mosaics of the years 2015 to 2020. Patches are available in 60 x 60 m (6 x 6 pixels) and 200 x 200 m (20 x 20 pixels). Band order is B02, B03, B04, B08, B05, B06, B07, B8A, B11, B12, B01, and B09. Spatial resolution is 10 m.

    • The folder “labels” contains a JSON string which was used for multi-labeling of the training patches. An example entry for an image sample with proportions of approximately 94% Abies and 6% Larix is: "Abies_alba_3_834_WEFL_NLF.tif": [["Abies", 0.93771], ["Larix", 0.06229]]

    • The two files “test_filenames.lst” and “train_filenames.lst” define the filenames used for the train (90%) and test (10%) split. We refer to this fixed split for better reproducibility and comparability.

    • The folder “geojson” contains GeoJSON files with all the samples used for training patch generation (point, 60 m bounding box, 200 m bounding box).

    CAUTION: As we could not upload the aerial patches as a single zip file on Zenodo, you need to download the 20 single-species files (aerial_60m_…zip) separately. Then, unzip them into a folder named “aerial” with a subfolder named “60m”. This structure is recommended for better reproducibility and comparability to the experimental results of Ahlswede et al. (2022).
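The multi-label JSON described above can be parsed with a few lines of Python. A minimal sketch, using the example entry from the description; the 0.05 threshold is an illustrative choice of ours, not part of the dataset specification:

```python
import json

# Example entry copied from the description above; the real labels file
# contains one such entry per training patch.
labels_json = '''
{"Abies_alba_3_834_WEFL_NLF.tif": [["Abies", 0.93771], ["Larix", 0.06229]]}
'''

labels = json.loads(labels_json)

def multilabels(entry, threshold=0.05):
    """Return the genus names whose area share meets the threshold."""
    return [genus for genus, share in entry if share >= threshold]

for filename, genus_shares in labels.items():
    print(filename, "->", multilabels(genus_shares))
    # → Abies_alba_3_834_WEFL_NLF.tif -> ['Abies', 'Larix']
```

Raising the threshold (e.g. to 0.1) turns the same entry into a single-label sample, which is how one might derive single-label targets from the multi-label annotations.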

    Join the archive

    Model training, benchmarking, algorithm development… many applications are possible! Feel free to add samples from other regions in Europe or even worldwide. Additional remote sensing data from lidar, UAVs, or aerial imagery from different time steps are very welcome. This helps the research community develop better deep learning and machine learning models for forest applications. If you have questions or want to share code, results, or publications using the archive, feel free to contact the authors.

    Project description

    This work was part of the project TreeSatAI (Artificial Intelligence with Satellite data and Multi-Source Geodata for Monitoring of Trees at Infrastructures, Nature Conservation Sites and Forests). Its overall aim is the development of AI methods for the monitoring of forests and woody features on a local, regional, and global scale. Based on freely available geodata from different sources (e.g., remote sensing, administration maps, and social media), prototypes will be developed for the deep learning-based extraction and classification of tree and tree-stand features. These prototypes deal with real cases from the monitoring of managed forests, nature conservation sites, and infrastructures. The development of the resulting services by three enterprises (liveEO, Vision Impulse, and LUP Potsdam) will be supported by three research institutes (German Research Center for Artificial Intelligence, TU Remote Sensing Image Analysis Group, TUB Geoinformation in Environmental Planning Lab).

    Publications

    Ahlswede et al. (2022, in prep.): TreeSatAI Dataset Publication

    Ahlswede S., Nimisha, T.M., and Demir, B. (2022, in revision): Embedded Self-Enhancement Maps for Weakly Supervised Tree Species Mapping in Remote Sensing Images. IEEE Trans Geosci Remote Sens

    Schulz et al. (2022, in prep.): Phenoprofiling

    Conference contributions

    S. Ahlswede, N. T. Madam, C. Schulz, B. Kleinschmit and B. Demir, "Weakly Supervised Semantic Segmentation of Remote Sensing Images for Tree Species Classification Based on Explanation Methods", IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022.

    C. Schulz, M. Förster, S. Vulova, T. Gränzig and B. Kleinschmit, “Exploring the temporal fingerprints of mid-European forest types from Sentinel-1 RVI and Sentinel-2 NDVI time series”, IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022.

    C. Schulz, M. Förster, S. Vulova and B. Kleinschmit, “The temporal fingerprints of common European forest types from SAR and optical remote sensing data”, AGU Fall Meeting, New Orleans, USA, 2021.

    B. Kleinschmit, M. Förster, C. Schulz, F. Arias, B. Demir, S. Ahlswede, A. K. Aksoy, T. Ha Minh, J. Hees, C. Gava, P. Helber, B. Bischke, P. Habelitz, A. Frick, R. Klinke, S. Gey, D. Seidel, S. Przywarra, R. Zondag and B. Odermatt, “Artificial Intelligence with Satellite data and Multi-Source Geodata for Monitoring of Trees and Forests”, Living Planet Symposium, Bonn, Germany, 2022.

    C. Schulz, M. Förster, S. Vulova, T. Gränzig and B. Kleinschmit (2022, submitted): “Exploring the temporal fingerprints of sixteen mid-European forest types from Sentinel-1 and Sentinel-2 time series”, ForestSAT, Berlin, Germany, 2022.

  2. Bank Account Fraud Dataset Suite (NeurIPS 2022)

    • kaggle.com
    Updated Nov 29, 2023
    Cite
    Sérgio Jesus (2023). Bank Account Fraud Dataset Suite (NeurIPS 2022) [Dataset]. https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sérgio Jesus
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Bank Account Fraud (BAF) suite of datasets was published at NeurIPS 2022 and comprises six different synthetic bank account fraud tabular datasets. BAF is a realistic, complete, and robust test bed to evaluate novel and existing methods in ML and fair ML, and the first of its kind!

    This suite of datasets is:

    • Realistic, based on a present-day real-world dataset for fraud detection;
    • Biased, each dataset has distinct controlled types of bias;
    • Imbalanced, this setting presents an extremely low prevalence of the positive class;
    • Dynamic, with temporal data and observed distribution shifts;
    • Privacy-preserving, to protect the identity of potential applicants we have applied differential privacy techniques (noise addition), feature encoding, and trained a generative model (CTGAN).


    Each dataset is composed of:

    • 1 million instances;
    • 30 realistic features used in the fraud detection use-case;
    • a “month” column, providing temporal information about the dataset;
    • protected attributes (age group, employment status, and % income).
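Because every row carries a “month” value, the datasets support temporal evaluation. A minimal sketch of a month-based train/test split in plain Python; the 6/2 month cut below is an illustrative assumption, not the suite's official protocol:

```python
# Split rows (dicts with a 'month' key) into train/test by month,
# so that the model is evaluated on later, unseen months.
def temporal_split(rows, last_train_month=5):
    train = [r for r in rows if r["month"] <= last_train_month]
    test = [r for r in rows if r["month"] > last_train_month]
    return train, test

# Tiny synthetic stand-in for one BAF variant (1M rows in the real data);
# 'fraud_bool' here is just a dummy label for illustration.
rows = [{"month": m, "fraud_bool": m % 2} for m in range(8)]
train, test = temporal_split(rows)
print(len(train), len(test))  # → 6 2
```

A temporal split like this exposes the distribution shifts the suite was designed to contain, which a random split would hide.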

    Detailed information (datasheet) on the suite: https://github.com/feedzai/bank-account-fraud/blob/main/documents/datasheet.pdf

    Check out the github repository for more resources and some example notebooks: https://github.com/feedzai/bank-account-fraud

    Read the NeurIPS 2022 paper here: https://arxiv.org/abs/2211.13358

    Learn more about Feedzai Research here: https://research.feedzai.com/

    Please use the following citation for the BAF dataset suite:

    @article{jesusTurningTablesBiased2022,
      title={Turning the {{Tables}}: {{Biased}}, {{Imbalanced}}, {{Dynamic Tabular Datasets}} for {{ML Evaluation}}},
      author={Jesus, S{\'e}rgio and Pombal, Jos{\'e} and Alves, Duarte and Cruz, Andr{\'e} and Saleiro, Pedro and Ribeiro, Rita P. and Gama, Jo{\~a}o and Bizarro, Pedro},
      journal={Advances in Neural Information Processing Systems},
      year={2022}
    }

  3. Artificial Intelligence In Biotechnology Market Analysis North America,...

    • technavio.com
    pdf
    Updated Dec 25, 2024
    Cite
    Technavio (2024). Artificial Intelligence In Biotechnology Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, China, Germany, UK, Switzerland, The Netherlands, Japan, South Korea, India, Brazil - Size and Forecast 2025-2029 [Dataset]. https://www.technavio.com/report/artificial-intelligence-in-biotechnology-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Dec 25, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2025 - 2029
    Area covered
    United States
    Description

    Snapshot img

    What is the Artificial Intelligence In Biotechnology Market Size?

    The artificial intelligence in biotechnology market size is forecast to increase by USD 4.46 billion, at a CAGR of 19% between 2024 and 2029. Artificial Intelligence (AI) is revolutionizing the biotechnology industry by enhancing research and development processes, enabling accurate diagnoses, and improving productivity. Key growth factors fueling the market include substantial investments in biotechnology advancements and strategic collaborations between industry players and tech companies. However, the high initial cost of implementing AI solutions remains a challenge for smaller organizations. The market is expected to witness significant growth due to the increasing adoption of AI in areas such as drug discovery, genetic research, and agricultural technology. Furthermore, advancements in machine learning algorithms and natural language processing are enabling more precise and efficient data analysis, leading to new discoveries and innovations. Overall, the integration of AI in biotechnology is transforming the industry and offering numerous opportunities for growth.


    Market Segmentation

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2025-2029, as well as historical data from 2019 - 2023 for the following segments.

    Application

      • Drug discovery and development
      • Clinical trials and optimization
      • Medical imaging
      • Diagnostics
      • Others

    End-user

      • Pharmaceutical companies
      • Biotechnology companies
      • Contract research organizations (CRO)
      • Healthcare providers
      • Others

    Geography

      • North America (US)
      • Europe (Germany, UK)
      • APAC (China, India, Japan, South Korea)
      • South America (Brazil)
      • Middle East and Africa

    Which is the largest segment driving market growth?

    The drug discovery and development segment is estimated to witness significant growth during the forecast period, as AI technologies transform the drug discovery process by increasing the accuracy and efficiency of identifying potential drug candidates.


    The drug discovery and development segment was valued at USD 522.60 million in 2019. AI applications in biotechnology extend beyond drug discovery to include compound screening, personalized medicine, and environmental factor analysis. This growth is driven by the increasing demand for personalized treatments, the need for faster drug development, and the potential for AI to revolutionize various applications in biotech and pharmaceuticals.

    Which region is leading the market?


    North America is estimated to contribute 40% to the growth of the global market during the forecast period. Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period. The North American region leads the global artificial intelligence (AI) market in biotechnology due to substantial investments, strategic collaborations, and technological advancements. With robust infrastructure and a strong focus on innovation, the region is at the forefront of adopting and developing AI-driven biotechnological solutions. For example, in March 2023, Predictive Oncology partnered with Integra Therapeutics to enhance gene editing capabilities for cancer therapies. This collaboration leverages Predictive Oncology's expertise in protein expression to advance gene editing techniques, aiming to develop more effective cancer treatments. The partnership in Minnesota highlights the region's commitment to pioneering cancer research and therapeutic development through AI technology. In the life sciences sector, AI is utilized to analyze large datasets of genetic information, improve treatment outcomes, and increase productivity. Key players in the market include leading research institutions and biotechnology companies.

    How do company ranking index and market positioning come to your aid?

    Companies are implementing various strategies, such as strategic alliances, partnerships, mergers and acquisitions, geographical expansion, and product/service launches, to enhance their presence in the market.

    Abbott Laboratories - The company offers artificial intelligence in biotechnology solutions that include AI-driven medical imaging and predictive analytics for identifying individuals at risk of heart attacks.

    Technavio provides the ranking index for the top 20 companies, along with insights on the market.

  4. VasTexture: Vast repository of textures and PBR Materials extracted from...

    • zenodo.org
    jpeg, zip
    Updated Apr 26, 2025
    Cite
    Zenodo (2025). VasTexture: Vast repository of textures and PBR Materials extracted from images using unsupervised approach [Dataset]. http://doi.org/10.5281/zenodo.11391127
    Explore at:
    Available download formats: zip, jpeg
    Dataset updated
    Apr 26, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    VasTexture: Vast repository of textures and SVBRDF/PBR Materials extracted from images using an unsupervised approach.

    This is an old version. For the latest version, see: https://zenodo.org/records/12629301

    This dataset contains hundreds of thousands (hopefully millions soon) of textures and PBR/SV-BRDF materials extracted from real-world natural images.

    The repository is composed of texture images (each RGB image is one uniform texture) and folders of PBR/SVBRDF materials given as sets of property maps (base color, roughness, metallic, etc.).

    Visualisation of sampled PBRs and Textures can be seen in: PBR_examples.jpg and Textures_Examples.jpg

    Link to the main project page

    Link to paper

    File structure

    Texture images are given in the Extracted_textures_*.zip files.

    Each image in this zip file is a single texture, the textures were extracted and cropped from the open images dataset.

    PBR materials are available in the PBR_*.zip files. These PBRs were generated from the texture images in an unsupervised way (with no human intervention). Each subfolder in these files contains the property maps of one PBR (roughness, metallic, etc., suitable for Blender/Unreal Engine). A visualization of the rendered material appears in the Material_View.jpg file in each PBR folder.

    PBR materials that were generated by mixing other PBR materials are available in files with the names PBR_mix*.zip

    Samples for each case can be found in files named: Sample_*.zip

    Documented code used to extract the textures and generate the PBRs is available at:

    Texture_And_Material_ExtractionCode_And_Documentation.zip
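The PBR_*.zip layout described above (one subfolder per material, holding property maps plus a Material_View.jpg) can be walked with Python's standard zipfile module. The folder and file names in this sketch are illustrative assumptions based on the description, not the archive's exact naming; check the supplied documentation for the real convention:

```python
import io
import zipfile

# Build a tiny in-memory zip mimicking the described PBR_*.zip layout.
# Folder and file names here are illustrative, not the archive's own.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ("base_color.png", "roughness.png",
                 "metallic.png", "Material_View.jpg"):
        zf.writestr(f"material_0001/{name}", b"")

def list_materials(zip_source):
    """Group zip entries into {material_folder: [contained files]}."""
    materials = {}
    with zipfile.ZipFile(zip_source) as zf:
        for entry in zf.namelist():
            folder, _, fname = entry.partition("/")
            materials.setdefault(folder, []).append(fname)
    return materials

print(list_materials(buf))  # one material folder, four files
```

The same `list_materials` call works on a downloaded PBR_*.zip path, which makes it easy to index the repository before extracting anything.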

    Details:

    The materials and textures were extracted from real-world images using an unsupervised extraction method (code supplied). As such, they are far more diverse and wider in scope than existing repositories; at the same time, they are noisier and contain more outliers. This repository is therefore most useful for applications that demand large-scale, highly diverse data and can tolerate noise and lower quality than professional repositories of manually made assets such as ambientCG. It can be very useful for creating machine learning datasets or for large-scale procedural generation; it is less suitable for areas that demand precise, clean, and categorized PBRs, such as CGI art and graphic design. For a preview, we recommend looking at PBR_examples.jpg and Textures_Examples.jpg, or downloading the Sample files and viewing the Material_View.jpg files to assess the quality of the materials.

    Scale:

    Currently, there are a few hundred thousand PBR materials and textures, but the goal is to grow this to over a million in the near future.

    Data generation code:

    The Python scripts used to extract these assets are supplied at:

    Texture_And_Material_ExtractionCode_And_Documentation.zip

    The code can be run on any folder of random images; it extracts regions with uniform textures and turns them into PBR materials.

    Alternative download sources:

    https://sites.google.com/view/infinitexture/home

    https://e.pcloud.link/publink/show?code=kZON5TZtxLfdvKrVCzn12NADBFRNuCKHm70

    https://icedrive.net/s/jfY1xSDNkVwtYDYD4FN5wha2A8Pz

    Paper

    This work was done as part of the paper "Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data".

    @article{eppel2024learning,
      title={Learning Zero-Shot Material States Segmentation, by Implanting Natural Image Patterns in Synthetic Data},
      author={Eppel, Sagi and Li, Jolina and Drehwald, Manuel and Aspuru-Guzik, Alan},
      journal={arXiv preprint arXiv:2403.03309},
      year={2024}
    }

    License:

    All the code and repository contents are released under the CC0 license (free to use).

    The textures were extracted from the Open Images dataset, which is released under an Apache license.

  5. Ai Edge Computing Market Analysis North America, Europe, APAC, South...

    • technavio.com
    pdf
    Updated Nov 21, 2024
    Cite
    Technavio (2024). Ai Edge Computing Market Analysis North America, Europe, APAC, South America, Middle East and Africa - US, China, Canada, UK, Germany, France, Brazil, Japan, Saudi Arabia, India - Size and Forecast 2024-2028 [Dataset]. https://www.technavio.com/report/ai-edge-computing-market-industry-analysis
    Explore at:
    Available download formats: pdf
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    TechNavio
    Authors
    Technavio
    Time period covered
    2024 - 2028
    Area covered
    China, Saudi Arabia, Brazil, Germany, United States, Canada, France, United Kingdom, Japan
    Description

    Snapshot img

    AI Edge Computing Market Size and Trends

    The AI edge computing market size is forecast to increase by USD 69.72 billion at a CAGR of 38.6% between 2023 and 2028. The market is experiencing significant growth due to the increasing deployment of IoT sensors and programmable application-specific integrated circuits (ASICs). Edge AI technology is being adopted in various industries, including smart homes, autonomous vehicles, and manufacturing, for real-time applications of computer vision, object detection, and quality inspection. Edge AI enables data processing at the source, reducing latency and bandwidth requirements. However, security concerns related to edge AI devices remain a challenge, necessitating strong encryption and access-control mechanisms. Additionally, the integration of edge AI in commercial drones and in remote monitoring through augmented reality is a developing trend. Overall, the edge AI market is poised for expansion, driven by the need for real-time analytics and the proliferation of connected devices.


    AI edge computing refers to the practice of processing artificial intelligence (AI) algorithms at the edge of a network, closer to where data is generated, rather than relying on cloud computing for processing. This approach offers several advantages, including reduced latency, increased data security, and improved energy efficiency. The market in North America is witnessing significant growth due to the increasing adoption of connected devices and the need for real-time data processing. Data security is a major concern for organizations, and AI edge computing addresses this issue by keeping sensitive data local and encrypted. Network connectivity is also essential for edge computing, and the rollout of 5G technology is expected to accelerate the adoption of AI edge computing solutions. Image recognition and computer vision are two key applications of AI edge computing. These technologies are used in various industries, including retail and manufacturing, for tasks such as quality control and inventory management.

    Market Segmentation

    The market research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD billion' for the period 2024-2028, as well as historical data from 2018 - 2022 for the following segments.

    Type

      • Hardware
      • Software and services

    Geography

      • North America (Canada, US)
      • Europe (Germany, UK)
      • APAC (China)
      • South America
      • Middle East and Africa
    

    By Type Insights

    The hardware segment is estimated to witness significant growth during the forecast period. AI edge computing refers to the implementation of artificial intelligence and machine learning algorithms to process data from IoT sensors and other hardware devices at the local level, enabling real-time decision-making. The hardware component of the global market comprises processors and devices that necessitate cognitive computing.


    The hardware segment was the largest segment and was valued at USD 3.87 billion in 2018. The physical edge AI computing components consist of processors and sensors, while the devices encompass smartphones, laptops, smart speakers, drones, and surveillance cameras. Various processors, such as central processing units (CPUs), graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs), are utilized in edge AI devices. The expansion of interconnected systems and devices, including smartphones, laptops, and smart speakers, has fueled the growth of this market segment during the forecast period.

    Regional Analysis


    North America is estimated to contribute 79% to the growth of the global market during the forecast period. Technavio's analysts have elaborately explained the regional trends and drivers that shape the market during the forecast period. In the North American market, AI edge computing is experiencing significant growth due to various factors. One key driver is the increasing use of edge AI devices, such as surveillance cameras and IoT sensors. The adoption of these technologies is particularly high in the US, where government initiatives mandate their installation in public places. Additionally, the region's advanced IT and telecom infrastructure, including high-speed networks, supports the efficient implementation of AI algorithms in industries like automotive, robotics, and healthcare. The automotive sector, in particular, is witnessing an increase in AI edge computing applications, with the development of electroceuticals and autonomous vehicles requiring real-time decision-making capabilities. In healthcare, AI edge computing enables the processing of large health data sets at

  6. Not seeing a result you expected?
    Learn how you can add new datasets to our index.

Share
FacebookFacebook
TwitterTwitter
Email
Click to copy link
Link copied
Close
Cite
Christian Schulz; Christian Schulz; Steve Ahlswede; Steve Ahlswede; Christiano Gava; Patrick Helber; Patrick Helber; Benjamin Bischke; Benjamin Bischke; Florencia Arias; Michael Förster; Michael Förster; Jörn Hees; Jörn Hees; Begüm Demir; Begüm Demir; Birgit Kleinschmit; Birgit Kleinschmit; Christiano Gava; Florencia Arias (2024). TreeSatAI Benchmark Archive for Deep Learning in Forest Applications [Dataset]. http://doi.org/10.5281/zenodo.6598391
Organization logo

TreeSatAI Benchmark Archive for Deep Learning in Forest Applications

Explore at:
pdf, zip, binAvailable download formats
Dataset updated
Jul 16, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Christian Schulz; Christian Schulz; Steve Ahlswede; Steve Ahlswede; Christiano Gava; Patrick Helber; Patrick Helber; Benjamin Bischke; Benjamin Bischke; Florencia Arias; Michael Förster; Michael Förster; Jörn Hees; Jörn Hees; Begüm Demir; Begüm Demir; Birgit Kleinschmit; Birgit Kleinschmit; Christiano Gava; Florencia Arias
License

Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

Context and Aim

Deep learning in Earth Observation requires large image archives with highly reliable labels for model training and testing. However, a preferable quality standard for forest applications in Europe has not yet been determined. The TreeSatAI consortium investigated numerous sources for annotated datasets as an alternative to manually labeled training datasets.

We found the federal forest inventory of Lower Saxony, Germany represents an unseen treasure of annotated samples for training data generation. The respective 20-cm Color-infrared (CIR) imagery, which is used for forestry management through visual interpretation, constitutes an excellent baseline for deep learning tasks such as image segmentation and classification.

Description

The data archive is highly suitable for benchmarking as it represents the real-world data situation of many German forest management services. One the one hand, it has a high number of samples which are supported by the high-resolution aerial imagery. On the other hand, this data archive presents challenges, including class label imbalances between the different forest stand types.

The TreeSatAI Benchmark Archive contains:

  • 50,381 image triplets (aerial, Sentinel-1, Sentinel-2)

  • synchronized time steps and locations

  • all original spectral bands/polarizations from the sensors

  • 20 species classes (single labels)

  • 12 age classes (single labels)

  • 15 genus classes (multi labels)

  • 60 m and 200 m patches

  • fixed split for train (90%) and test (10%) data

  • additional single labels such as English species name, genus, forest stand type, foliage type, land cover

The GeoTIFF and GeoJSON files are readable in any GIS software, such as QGIS. For further information, we refer to the PDF document in the archive and to the publications in the reference section.
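Since GeoJSON is plain JSON, the sample geometries can also be inspected without GIS software. A minimal sketch in Python; the feature below only illustrates the GeoJSON structure, and the actual property names in the archive's geojson folder may differ:

```python
import json

# Illustrative FeatureCollection; coordinates and property names are
# placeholders, not values copied from the archive.
sample = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [9.73, 52.37]},
     "properties": {"filename": "Abies_alba_3_834_WEFL_NLF.tif"}}
  ]
}
"""

collection = json.loads(sample)
for feature in collection["features"]:
    lon, lat = feature["geometry"]["coordinates"]
    name = feature["properties"]["filename"]
    print(name, lon, lat)
```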

Version history

v1.0.0 - First release

Citation

Ahlswede et al. (in prep.)

GitHub

Full code examples and pre-trained models from the dataset article (Ahlswede et al. 2022) using the TreeSatAI Benchmark Archive are published in the repository of the Remote Sensing Image Analysis (RSiM) Group (https://git.tu-berlin.de/rsim/treesat_benchmark). Code examples for the sampling strategy can be made available by Christian Schulz via email request.

Folder structure

We refer to the proposed folder structure in the PDF file.

  • Folder “aerial” contains the aerial imagery patches derived from summertime orthophotos of the years 2011 to 2020. Patches are available in 60 x 60 m (304 x 304 pixels). Band order is near-infrared, red, green, and blue. Spatial resolution is 20 cm.

  • Folder “s1” contains the Sentinel-1 imagery patches derived from summertime mosaics of the years 2015 to 2020. Patches are available in 60 x 60 m (6 x 6 pixels) and 200 x 200 m (20 x 20 pixels). Band order is VV, VH, and VV/VH ratio. Spatial resolution is 10 m.

  • Folder “s2” contains the Sentinel-2 imagery patches derived from summertime mosaics of the years 2015 to 2020. Patches are available in 60 x 60 m (6 x 6 pixels) and 200 x 200 m (20 x 20 pixels). Band order is B02, B03, B04, B08, B05, B06, B07, B8A, B11, B12, B01, and B09. Spatial resolution is 10 m.

  • The folder “labels” contains a JSON file which maps each training patch to its multi-labels. For example, the entry for an image sample with proportions of about 94% Abies and 6% Larix is: "Abies_alba_3_834_WEFL_NLF.tif": [["Abies", 0.93771], ["Larix", 0.06229]]

  • The two files “test_filenames.lst” and “train_filenames.lst” define the filenames used for the train (90%) and test (10%) split. We refer to this fixed split for better reproducibility and comparability.

  • The folder “geojson” contains GeoJSON files with all samples used for training patch generation (point, 60 m bounding box, 200 m bounding box).
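The label entries above can be turned into multi-hot vectors for model training. A sketch, assuming an illustrative subset of the 15 genus classes (the full list is in the archive's PDF documentation) and a hypothetical proportion threshold:

```python
import json

# One entry from the labels JSON, copied from the example above.
labels_json = '{"Abies_alba_3_834_WEFL_NLF.tif": [["Abies", 0.93771], ["Larix", 0.06229]]}'

# Illustrative subset of the 15 genus classes, not the archive's full list.
GENERA = ["Abies", "Fagus", "Larix", "Picea", "Pinus", "Quercus"]

def multi_hot(entries, threshold=0.05):
    """Turn [genus, proportion] pairs into a binary label vector.
    The threshold (here 5%, an assumption) discards trace proportions."""
    vector = [0] * len(GENERA)
    for genus, proportion in entries:
        if proportion >= threshold:
            vector[GENERA.index(genus)] = 1
    return vector

labels = json.loads(labels_json)
vec = multi_hot(labels["Abies_alba_3_834_WEFL_NLF.tif"])
print(vec)  # Abies and Larix are both above the threshold
```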

CAUTION: As we could not upload the aerial patches as a single zip file to Zenodo, you need to download the 20 per-species files (aerial_60m_…zip) separately. Then, unzip them into a folder named “aerial” with a subfolder named “60m”. This structure is recommended for better reproducibility and comparability to the experimental results of Ahlswede et al. (2022).
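The unzipping step can be scripted. A sketch using only the Python standard library; the zip filename below is a stand-in for one of the downloaded per-species archives, and the demonstration uses a dummy zip rather than real data:

```python
import zipfile
from pathlib import Path
from tempfile import TemporaryDirectory

def unpack_species_zips(zip_paths, target_root):
    """Unzip the per-species archives into aerial/60m, the folder
    layout recommended above."""
    out_dir = Path(target_root) / "aerial" / "60m"
    out_dir.mkdir(parents=True, exist_ok=True)
    for zp in zip_paths:
        with zipfile.ZipFile(zp) as zf:
            zf.extractall(out_dir)
    return out_dir

# Demonstration with a dummy zip standing in for a downloaded archive.
with TemporaryDirectory() as tmp:
    dummy = Path(tmp) / "aerial_60m_dummy.zip"
    with zipfile.ZipFile(dummy, "w") as zf:
        zf.writestr("Abies_alba_3_834_WEFL_NLF.tif", b"not a real tiff")
    out = unpack_species_zips([dummy], tmp)
    extracted = sorted(p.name for p in out.iterdir())
    print(extracted)
```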

Join the archive

Model training, benchmarking, algorithm development… many applications are possible! Feel free to add samples from other regions in Europe or even worldwide. Additional remote sensing data from Lidar, UAVs, or aerial imagery from different time steps are very welcome. This helps the research community develop better deep learning and machine learning models for forest applications. If you have questions or want to share code, results, or publications using the archive, feel free to contact the authors.

Project description

This work was part of the project TreeSatAI (Artificial Intelligence with Satellite data and Multi-Source Geodata for Monitoring of Trees at Infrastructures, Nature Conservation Sites and Forests). Its overall aim is the development of AI methods for the monitoring of forests and woody features on a local, regional and global scale. Based on freely available geodata from different sources (e.g., remote sensing, administration maps, and social media), prototypes will be developed for the deep learning-based extraction and classification of tree- and tree stand features. These prototypes deal with real cases from the monitoring of managed forests, nature conservation and infrastructures. The development of the resulting services by three enterprises (liveEO, Vision Impulse and LUP Potsdam) will be supported by three research institutes (German Research Center for Artificial Intelligence, TU Remote Sensing Image Analysis Group, TUB Geoinformation in Environmental Planning Lab).

Publications

Ahlswede et al. (2022, in prep.): TreeSatAI Dataset Publication

Ahlswede S., Nimisha, T.M., and Demir, B. (2022, in revision): Embedded Self-Enhancement Maps for Weakly Supervised Tree Species Mapping in Remote Sensing Images. IEEE Trans Geosci Remote Sens

Schulz et al. (2022, in prep.): Phenoprofiling

Conference contributions

S. Ahlswede, N. T. Madam, C. Schulz, B. Kleinschmit and B. Demir, "Weakly Supervised Semantic Segmentation of Remote Sensing Images for Tree Species Classification Based on Explanation Methods", IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022.

C. Schulz, M. Förster, S. Vulova, T. Gränzig and B. Kleinschmit, “Exploring the temporal fingerprints of mid-European forest types from Sentinel-1 RVI and Sentinel-2 NDVI time series”, IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 2022.

C. Schulz, M. Förster, S. Vulova and B. Kleinschmit, “The temporal fingerprints of common European forest types from SAR and optical remote sensing data”, AGU Fall Meeting, New Orleans, USA, 2021.

B. Kleinschmit, M. Förster, C. Schulz, F. Arias, B. Demir, S. Ahlswede, A. K. Aksoy, T. Ha Minh, J. Hees, C. Gava, P. Helber, B. Bischke, P. Habelitz, A. Frick, R. Klinke, S. Gey, D. Seidel, S. Przywarra, R. Zondag and B. Odermatt, “Artificial Intelligence with Satellite data and Multi-Source Geodata for Monitoring of Trees and Forests”, Living Planet Symposium, Bonn, Germany, 2022.

C. Schulz, M. Förster, S. Vulova, T. Gränzig and B. Kleinschmit, (2022, submitted): “Exploring the temporal fingerprints of sixteen mid-European forest types from Sentinel-1 and Sentinel-2 time series”, ForestSAT, Berlin, Germany, 2022.
