Leaves from genetically unique Juglans regia plants were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS), Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA. Soil samples were collected in Fall 2017 from the riparian oak forest located at the Russell Ranch Sustainable Agricultural Institute at the University of California, Davis. The soil was sieved through a 2 mm mesh and air dried before imaging. A single soil aggregate was scanned at 23 keV using the 10x objective lens with a pixel resolution of 650 nanometers on beamline 8.3.2 at the ALS. Additionally, a drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned using a 4x lens with a pixel resolution of 1.72 µm on beamline 8.3.2 at the ALS.

Raw tomographic image data were reconstructed using TomoPy. Reconstructions were converted to 8-bit tif or png format using ImageJ or the PIL package in Python before further processing. Images were annotated using Intel's Computer Vision Annotation Tool (CVAT) and ImageJ; both tools are free to use and open source. Leaf images were annotated following Théroux-Rancourt et al. (2020): hand labeling was done directly in ImageJ by drawing around each tissue, with 5 images annotated per leaf. Care was taken to cover a range of anatomical variation to help improve the generalizability of the models to other leaves. All slices were labeled by Dr. Mina Momayyezi and Fiona Duong.

To annotate the flower bud and soil aggregate, images were imported into CVAT. The exterior border of the bud (i.e., bud scales) and flower were annotated in CVAT and exported as masks. Similarly, the exterior of the soil aggregate and particulate organic matter identified by eye were annotated in CVAT and exported as masks. To annotate air spaces in both the bud and soil aggregate, images were imported into ImageJ. A Gaussian blur was applied to decrease noise, and the air space was then segmented using thresholding. After applying the threshold, the selected air space region was converted to a binary image, with white representing air space and black representing everything else. This binary image was overlaid on the original image, and the air space within the flower bud or aggregate was selected using the "free hand" tool; air space outside the region of interest was eliminated for both image sets. The quality of the air space annotation was then visually inspected for accuracy against the underlying original image; incomplete annotations were corrected using the brush or pencil tool to paint missing air space white and incorrectly identified air space black. Once the annotation was satisfactorily corrected, the binary image of the air space was saved. Finally, the annotations of the bud and flower, or of the aggregate and organic matter, were opened in ImageJ and the associated air space mask was overlaid on top of them, forming a three-layer mask suitable for training the fully convolutional network. All labeling of the soil aggregate images was done by Dr. Devin Rippner.

These images and annotations are intended for training deep learning models to identify different constituents in leaves, almond buds, and soil aggregates.

Limitations: For the walnut leaves, some tissues (stomata, etc.) are not labeled, and the labeled slices represent only a small portion of a full leaf.
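The 8-bit conversion step mentioned above can be scripted with PIL; the sketch below shows one way to do it, assuming single-channel reconstructed slices and a simple per-slice contrast stretch (the paths, glob pattern, and scaling choice are illustrative, not part of the dataset).

```python
# Minimal sketch of the 8-bit conversion step described above, using Pillow and NumPy.
# Paths, glob pattern, and the min/max rescaling choice are assumptions for illustration.
import glob
import os

import numpy as np
from PIL import Image

def convert_to_8bit(src_dir: str, dst_dir: str) -> None:
    """Rescale each reconstructed slice to 0-255 and save it as an 8-bit PNG."""
    os.makedirs(dst_dir, exist_ok=True)
    for path in sorted(glob.glob(os.path.join(src_dir, "*.tif"))):
        slice_f = np.asarray(Image.open(path), dtype=np.float64)
        lo, hi = slice_f.min(), slice_f.max()
        scaled = (slice_f - lo) / (hi - lo + 1e-12) * 255.0   # per-slice contrast stretch
        img8 = Image.fromarray(scaled.astype(np.uint8), mode="L")
        img8.save(os.path.join(dst_dir, os.path.basename(path).replace(".tif", ".png")))

# convert_to_8bit("reconstructions/", "slices_8bit/")
```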
Similarly, both the almond bud and the aggregate represent just a single sample of each. The bud tissues are only divided into bud scales, flower, and air space; many other tissues remain unlabeled. For the soil aggregate, labels were assigned by eye with no supporting chemical information, so particulate organic matter identification may be incorrect.

Resources in this dataset:

Resource Title: Annotated X-ray CT images and masks of a Forest Soil Aggregate.
File Name: forest_soil_images_masks_for_testing_training.zip
Resource Description: This aggregate was collected from the riparian oak forest at the Russell Ranch Sustainable Agricultural Facility. The aggregate was scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS), Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of 0,0,0; pore spaces have a value of 250,250,250; mineral solids have a value of 128,0,0; and particulate organic matter has a value of 0,128,0. These files were used for training a model to segment the forest soil aggregate and for testing the accuracy, precision, recall, and F1 score of the model.

Resource Title: Annotated X-ray CT images and masks of an Almond bud (P. dulcis).
File Name: Almond_bud_tube_D_P6_training_testing_images_and_masks.zip
Resource Description: A drought-stressed almond flower bud (Prunus dulcis) from a plant housed at the University of California, Davis, was scanned by X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS), Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 4x lens with a pixel resolution of 1.72 µm. For masks, the background has a value of 0,0,0; air spaces have a value of 255,255,255; bud scales have a value of 128,0,0; and flower tissues have a value of 0,128,0. These files were used for training a model to segment the almond bud and for testing the accuracy, precision, recall, and F1 score of the model.
Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads

Resource Title: Annotated X-ray CT images and masks of Walnut leaves (J. regia).
File Name: 6_leaf_training_testing_images_and_masks_for_paper.zip
Resource Description: Stems were collected from genetically unique J. regia accessions at the USDA-ARS-NCGR in Wolfskill Experimental Orchard, Winters, California, USA, to use as scion, and were grafted by Sierra Gold Nursery onto a commonly used commercial rootstock, RX1 (J. microcarpa × J. regia). We used a common rootstock to eliminate any own-root effects and to simulate conditions in a commercial walnut orchard setting, where rootstocks are commonly used. The grafted saplings were repotted and transferred to the Armstrong lathe house facility at the University of California, Davis in June 2019 and kept under natural light and temperature. Leaves from each accession and treatment were scanned using X-ray micro-computed tomography (microCT) on the X-ray μCT beamline (8.3.2) at the Advanced Light Source (ALS), Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA, USA, using the 10x objective lens with a pixel resolution of 650 nanometers. For masks, the background has a value of 170,170,170; epidermis, 85,85,85; mesophyll, 0,0,0; bundle sheath extension, 152,152,152; vein, 220,220,220; air, 255,255,255.
Resource Software Recommended: Fiji (ImageJ), url: https://imagej.net/software/fiji/downloads
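Because each mask encodes classes as the RGB (or grayscale) values listed above, a small helper can convert a mask into an integer label map for training. The sketch below uses the almond bud values; the class-index assignment, file name, and function name are assumptions for illustration.

```python
# A minimal sketch for turning the RGB mask values listed above into integer class maps
# for training; the class-index assignment and file path are assumptions.
import numpy as np
from PIL import Image

# Almond bud masks (values as listed in the resource description).
BUD_CLASSES = {
    (0, 0, 0): 0,        # background
    (255, 255, 255): 1,  # air space
    (128, 0, 0): 2,      # bud scales
    (0, 128, 0): 3,      # flower tissue
}

def rgb_mask_to_labels(mask_path: str, palette: dict) -> np.ndarray:
    """Convert an RGB mask image into a 2-D array of integer class labels."""
    rgb = np.asarray(Image.open(mask_path).convert("RGB"))
    labels = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for color, class_id in palette.items():
        labels[np.all(rgb == color, axis=-1)] = class_id
    return labels

# labels = rgb_mask_to_labels("almond_bud_mask_0001.png", BUD_CLASSES)
```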
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Detecting Landscape Objects on Satellite Images with Artificial Intelligence

In recent years, there has been a significant increase in the use of artificial intelligence (AI) for image recognition and object detection. This technology has proven useful in a wide range of applications, from self-driving cars to facial recognition systems. In this project, the focus lies on using AI to detect landscape objects in satellite images (aerial photography angle), with the goal of creating an annotated map of the Netherlands containing the coordinates of the detected landscape objects.
Background Information
Problem Statement

One of the things Naturalis does is conduct research into the distribution of wild bees (Naturalis, n.d.). For this research they use a model that predicts whether or not a certain species can occur at a given location. There is currently no way to generate a digital inventory of landscape features, such as the presence of trees, ponds, and hedges, with their precise locations on a map. The current models rely on species observation data and climate variables, but it is expected that adding detailed physical landscape information could increase prediction accuracy. Common maps do not contain this level of detail, but high-resolution satellite images do.
Possible opportunities

Based on the problem statement, Naturalis currently lacks a map with the level of detail needed to detect small landscape elements. The idea emerged that it should be possible to use satellite images to find the locations of small landscape elements and produce an annotated map. By refining the accuracy of the current prediction model, researchers can gain a deeper understanding of wild bees in the Netherlands and take effective measures to protect wild bees and their living environment.
Goal of project

The goal of the project is to develop an artificial intelligence model for landscape detection on satellite images and to create an annotated map of the Netherlands, which would in turn increase the prediction accuracy of the current model used at Naturalis. The project addresses the lack of detailed landscape maps, which could change the way Naturalis conducts its research on wild bees. The ultimate long-term aim is to use this comprehensive knowledge to protect both the wild bee population and its natural habitats in the Netherlands.
Data Collection: Google Earth

One of the main challenges of this project was the difficulty of obtaining a suitable dataset (with or without annotations). Obtaining high-quality satellite images presents challenges in terms of cost and time: acquiring high-quality satellite images covering the Netherlands would cost approximately $1,038,575 in total. On top of that, the acquisition process for such images involves various steps, from the initial request to the actual delivery of the images, and numerous protocols and processes need to be followed.
After conducting further research, the best available solution was to use Google Earth as the primary source of data. While Google Earth imagery may not be used for commercial or promotional purposes, this project serves research purposes only, supporting Naturalis' research on wild bees, so this restriction does not apply in this case.
https://captain-whu.github.io/DOTA/dataset.html
In the past decade, significant progress in object detection has been made in natural images, but authors of the DOTA v2.0: Dataset of Object deTection in Aerial images note that this progress hasn't extended to aerial images. The main reason for this discrepancy is the substantial variations in object scale and orientation caused by the bird's-eye view of aerial images. One major obstacle to the development of object detection in aerial images (ODAI) is the lack of large-scale benchmark datasets. The DOTA dataset contains 1,793,658 object instances spanning 18 different categories, all annotated with oriented bounding box annotations (OBB). These annotations were collected from a total of 11,268 aerial images. Using this extensive and meticulously annotated dataset, the authors establish baselines covering ten state-of-the-art algorithms, each with over 70 different configurations. These configurations are evaluated for both speed and accuracy performance.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SynthAer is a dataset consisting of synthetic aerial images with pixel-level semantic annotations from a suburban scene generated using the 3D modelling tool Blender. SynthAer contains three time-of-day variations for each image - one for lighting conditions at dawn, one for midday, and one for dusk.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The FlameVision dataset is a comprehensive aerial image dataset designed specifically for detecting and classifying wildfires. It consists of a total of 8600 high-resolution images, with 5000 images depicting fire and the remaining 3600 images depicting non-fire scenes. The images are provided in PNG format for classification tasks and JPG format for detection tasks. The dataset is organized into two primary folders, one for detection and the other for classification, with further subdivisions into train, validation, and test sets for each folder. To facilitate accurate object detection, the dataset also includes 4500 image annotation files. These annotation files contain manual annotations in XML format, which specify the exact positions of objects and their corresponding labels within the images. The annotations were performed using Roboflow, ensuring high quality and consistency across the dataset. One of the notable features of the FlameVision dataset is its compatibility with various convolutional neural network (CNN) architectures, including EfficientNet, DenseNet, VGG-16, ResNet50, YOLO, and R-CNN. This makes it a versatile and valuable resource for researchers and practitioners in the field of wildfire detection and classification, enabling the development and evaluation of sophisticated ML models.
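Since the detection labels are XML files specifying object positions and class labels, they can be read with the Python standard library. The sketch below assumes Pascal-VOC-style tags (object/name/bndbox), which Roboflow commonly exports; the exact schema used in FlameVision may differ, and the file name is illustrative.

```python
# A short sketch for reading one annotation file, assuming Pascal-VOC-style XML
# (object/name/bndbox tags); treat this as illustrative only.
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path: str):
    """Return a list of (label, xmin, ymin, xmax, ymax) tuples from one XML file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        label = obj.findtext("name")
        bb = obj.find("bndbox")
        boxes.append((
            label,
            int(float(bb.findtext("xmin"))),
            int(float(bb.findtext("ymin"))),
            int(float(bb.findtext("xmax"))),
            int(float(bb.findtext("ymax"))),
        ))
    return boxes

# for label, x1, y1, x2, y2 in read_voc_boxes("fire_0001.xml"):
#     print(label, x1, y1, x2, y2)
```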
DOTA is a large-scale dataset for object detection in aerial images. It can be used to develop and evaluate object detectors in aerial images. The images are collected from different sensors and platforms. Each image ranges in size from 800 × 800 to 20,000 × 20,000 pixels and contains objects exhibiting a wide variety of scales, orientations, and shapes. The instances in DOTA images are annotated by experts in aerial image interpretation using arbitrary (8 d.o.f.) quadrilaterals. We will continue to update DOTA, to grow in size and scope to reflect evolving real-world conditions. Now it has three versions:
DOTA-v1.0 contains 15 common categories, 2,806 images and 188,282 instances. The proportions of the training set, validation set, and testing set in DOTA-v1.0 are 1/2, 1/6, and 1/3, respectively.
DOTA-v1.5 uses the same images as DOTA-v1.0, but the extremely small instances (less than 10 pixels) are also annotated. Moreover, a new category, "container crane", is added. It contains 403,318 instances in total. The number of images and dataset splits are the same as DOTA-v1.0. This version was released for the DOAI Challenge 2019 on Object Detection in Aerial Images in conjunction with IEEE CVPR 2019.
DOTA-v2.0 collects more Google Earth, GF-2 Satellite, and aerial images. There are 18 common categories, 11,268 images and 1,793,658 instances in DOTA-v2.0. Compared to DOTA-v1.5, it further adds the new categories of "airport" and "helipad". The 11,268 images of DOTA are split into training, validation, test-dev, and test-challenge sets. To avoid the problem of overfitting, the proportion of training and validation set is smaller than the test set. Furthermore, we have two test sets, namely test-dev and test-challenge. Training contains 1,830 images and 268,627 instances. Validation contains 593 images and 81,048 instances. We released the images and ground truths for training and validation sets. Test-dev contains 2,792 images and 353,346 instances. We released the images but not the ground truths. Test-challenge contains 6,053 images and 1,090,637 instances.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aerial Multi-Vehicle Detection Dataset: Efficient road traffic monitoring plays a fundamental role in successfully resolving traffic congestion in cities. Unmanned Aerial Vehicles (UAVs) or drones equipped with cameras are an attractive proposition for providing flexible and infrastructure-free traffic monitoring. Due to the affordability of such drones, computer vision solutions for traffic monitoring have been widely used. This dataset therefore provides images that can be used for either training or evaluating traffic monitoring applications. More specifically, it can be used to train an aerial vehicle detection algorithm, benchmark an already trained vehicle detection algorithm, enhance an existing dataset, and aid in traffic monitoring and analysis of road segments.
The dataset construction involved manually collecting aerial images of vehicles using UAV drones, which were then manually annotated into three classes: 'Car', 'Bus', and 'Truck'. The aerial images were collected through manual flights over road segments in Nicosia and Limassol, Cyprus, during busy hours. The images are high quality, Full HD (1080p) to 4K (2160p), but are usually resized before training. All images were manually annotated and inspected afterward, with 'Car' indicating small to medium-sized vehicles, 'Bus' indicating buses, and 'Truck' indicating large vehicles and trucks. All annotations were converted into VOC and COCO formats for training in numerous frameworks. The data collection took place in different periods, covering busy road segments in the cities of Nicosia and Limassol in Cyprus. The altitude of the flights varied between 150 and 250 meters, with a top-view perspective. Some of the images in this dataset are taken from the Harpy Data dataset [1].
The dataset includes a total of 9048 images of which 904 are split for validation, 905 for testing, and the rest 7239 for training.
| Subset | Images | Car | Bus | Truck |
|---|---|---|---|---|
| Training | 7239 | 200301 | 1601 | 6247 |
| Validation | 904 | 23397 | 193 | 727 |
| Testing | 905 | 24715 | 208 | 770 |
It is advised to further enhance the dataset so that random augmentations are probabilistically applied to each image prior to adding it to the batch for training. Specifically, there are a number of possible transformations such as geometric (rotations, translations, horizontal axis mirroring, cropping, and zooming), as well as image manipulations (illumination changes, color shifting, blurring, sharpening, and shadowing).
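As one possible way to apply such probabilistic augmentations while keeping bounding boxes consistent, the sketch below uses the albumentations library; the specific transforms, probabilities, file name, and box values are illustrative and not prescribed by the dataset authors.

```python
# One possible bbox-aware augmentation pipeline mirroring the transformations listed above.
# The transforms, probabilities, image path, and box values are illustrative assumptions.
import albumentations as A
import cv2

transform = A.Compose(
    [
        A.HorizontalFlip(p=0.5),                                                         # horizontal mirroring
        A.ShiftScaleRotate(shift_limit=0.05, scale_limit=0.1, rotate_limit=15, p=0.5),   # translation, zoom, rotation
        A.RandomBrightnessContrast(p=0.3),                                               # illumination changes
        A.HueSaturationValue(p=0.3),                                                     # color shifting
        A.Blur(blur_limit=3, p=0.2),                                                     # blurring
    ],
    bbox_params=A.BboxParams(format="pascal_voc", label_fields=["class_labels"]),
)

image = cv2.cvtColor(cv2.imread("frame_0001.jpg"), cv2.COLOR_BGR2RGB)
boxes = [[100, 150, 220, 260]]          # [x_min, y_min, x_max, y_max]
labels = ["Car"]

augmented = transform(image=image, bboxes=boxes, class_labels=labels)
aug_image, aug_boxes = augmented["image"], augmented["bboxes"]
```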
[1] Makrigiorgis, R., 2021. Harpy Data Dataset. [online] Kios.ucy.ac.cy. Available at: [Accessed 22 September 2022].
NOTE If you use this dataset in your research/publication please cite us using the following :
Rafael Makrigiorgis, Panayiotis Kolios, & Christos Kyrkou. (2022). Aerial Multi-Vehicle Detection Dataset (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7053442
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Aerial Vessels Detection Dataset: The dataset construction involved manually collecting aerial images of vessels using UAV drones, which were then manually annotated into three classes: 'Person', 'Ship', and 'Boat'. The aerial images were collected through manual flights above the Cyprus coast in the Limassol, Famagusta, and Larnaca areas. The main purpose of this dataset is marine monitoring. Capturing footage over large areas and localizing vessels entering an area of interest can aid in locating refugees who enter a country illegally or in managing marine traffic for commercial use.
The images are collected in 720p and Full HD (1080p) but are usually resized before training.
All images were manually annotated and inspected afterward, with 'Person' indicating people, 'Boat' indicating small to medium-sized boats, and 'Ship' indicating large or commercial ships. All annotations were initially labeled in YOLO format and converted into VOC and COCO formats for training in numerous frameworks. The data collection took place in different periods.
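Because the labels were created in YOLO format and then converted to VOC/COCO, the core of that conversion is turning normalized center/size values into absolute corner coordinates. The sketch below illustrates this for a single label line; the class-id ordering and image size are assumptions, not taken from the dataset.

```python
# A minimal sketch converting one YOLO-format label line (class, x_center, y_center,
# width, height, all normalized) into absolute VOC-style corner coordinates.
# The class-name ordering and image size below are assumptions for illustration.
def yolo_to_voc(line: str, img_w: int, img_h: int, class_names=("Person", "Boat", "Ship")):
    cls_id, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    xmin, ymin = int(round(xc - w / 2)), int(round(yc - h / 2))
    xmax, ymax = int(round(xc + w / 2)), int(round(yc + h / 2))
    return class_names[int(cls_id)], xmin, ymin, xmax, ymax

# print(yolo_to_voc("1 0.50 0.40 0.10 0.08", img_w=1920, img_h=1080))
# -> ('Boat', 864, 389, 1056, 475)
```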
The dataset includes a total of 10252 images of which 1024 are split for validation, 1025 for testing, and the rest 8203 for training.
| Subset | Images | Person | Boat | Ship |
|---|---|---|---|---|
| Training | 8203 | 219 | 48550 | 920 |
| Validation | 1024 | 7 | 5890 | 143 |
| Testing | 1025 | 13 | 5247 | 109 |
It is advised to further enhance the dataset so that random augmentations are probabilistically applied to each image prior to adding it to the batch for training. Specifically, there are a number of possible transformations such as geometric (rotations, translations, horizontal axis mirroring, cropping, and zooming), as well as image manipulations (illumination changes, color shifting, blurring, sharpening, and shadowing).
NOTE If you use this dataset in your research/publication please cite us using the following :
Rafael Makrigiorgis, Panayiotis Kolios, & Christos Kyrkou. (2022). Aerial Vessels Detection Dataset (1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7076145
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Small Object Aerial Person Detection Dataset:
The aerial dataset publication comprises a collection of frames captured from unmanned aerial vehicles (UAVs) during flights over the University of Cyprus campus and Civil Defense exercises. The dataset is primarily intended for people detection, with a focus on detecting small objects due to the top-view perspective of the images. The dataset includes annotations generated in popular formats such as YOLO, COCO, and VOC, making it highly versatile and accessible for a wide range of applications. Overall, this aerial dataset publication represents a valuable resource for researchers and practitioners working in the field of computer vision and machine learning, particularly those focused on people detection and related applications.
| Subset | Images | People |
|---|---|---|
| Training | 2092 | 40687 |
| Validation | 523 | 10589 |
| Testing | 521 | 10432 |
It is advised to further enhance the dataset so that random augmentations are probabilistically applied to each image prior to adding it to the batch for training. Specifically, there are a number of possible transformations such as geometric (rotations, translations, horizontal axis mirroring, cropping, and zooming), as well as image manipulations (illumination changes, color shifting, blurring, sharpening, and shadowing).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary
Photovoltaic (PV) energy generation plays a crucial role in the energy transition. Small-scale PV installations are deployed at an unprecedented pace, and their integration into the grid can be challenging since stakeholders often lack quality data about these installations. Overhead imagery is increasingly used to improve the knowledge of distributed PV installations with machine learning models capable of automatically mapping these installations. However, these models cannot be easily transferred from one region or data source to another due to differences in image acquisition. To address this issue known as domain shift and foster the development of PV array mapping pipelines, we propose a dataset containing aerial images, annotations, and segmentation masks. We provide installation metadata for more than 28,000 installations. We provide ground truth segmentation masks for 13,000 installations, including 7,000 with annotations for two different image providers. Finally, we provide ground truth annotations and associated installation metadata for more than 8,000 installations. Dataset applications include end-to-end PV registry construction, robust PV installations mapping, and analysis of crowdsourced datasets.
This dataset contains the complete records associated with the article "A crowdsourced dataset of aerial images of solar panels, their segmentation masks, and characteristics", currently under review. The preprint is accessible at this link: https://arxiv.org/abs/2209.03726. These complete records consist of:
The complete training dataset containing RGB overhead imagery, segmentation masks and metadata of PV installations (folder bdappv),
The raw crowdsourcing data, and the postprocessed data for replication and validation (folder data).
Data records
Folders are organized as follows:
bdappv/: root data folder
- google/ and ign/: one folder for each campaign
  - img/: folder containing all the images presented to the users. This folder contains 28807 images for Google and 17325 images for IGN. (See the pairing sketch after this listing.)
  - mask/: folder containing all segmentation masks generated from the polygon annotations of the users. This folder contains 13303 masks for Google and 7686 masks for IGN.
- metadata.csv: the .csv file with the installations' metadata.

data/: root data folder
- raw/: folder containing the raw crowdsourcing data and raw metadata;
  - input-google.json: .json input data containing all information on images and raw annotators' contributions for both phases (clicks and polygons) during the first annotation campaign;
  - input-ign.json: .json input data containing all information on images and raw annotators' contributions for both phases (clicks and polygons) during the second annotation campaign;
  - raw-metadata.json: .json output containing the PV systems' metadata extracted from the BDPV database before filtering. It can be used to replicate the association between the installations and the segmentation masks, as done in the notebook metadata.
- replication/: folder containing the compiled data used to generate the segmentation masks;
  - campaign-google/ and campaign-ign/: one folder for each campaign
    - click-analysis.json: .json output of the click analysis, compiling raw input into a few best-guess locations for the PV arrays. This dataset enables the replication of our annotations.
    - polygon-analysis.json: .json output of the polygon analysis, compiling raw input into a best-guess polygon for the PV arrays.
- validation/: folder containing the compiled data used for technical validation.
  - campaign-google/ and campaign-ign/: one folder for each campaign
    - click-analysis-thres=1.0.json: .json output of the click analysis with a lowered threshold, to analyze the effect of the threshold on image classification, as done in the notebook annotation;
    - polygon-analysis-thres=1.0.json: .json output of the polygon analysis with a lowered threshold, to analyze the effect of the threshold on polygon annotation, as done in the notebook annotations.
- metadata.csv: the .csv file of filtered installations' metadata.
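A minimal sketch for pairing images with their segmentation masks under the layout above, assuming mask files reuse the image file names (only a subset of images has a ground-truth mask); the function name and example path are illustrative.

```python
# Pair each image in img/ with its mask in mask/ for one campaign folder (google or ign),
# assuming matching filenames; images without a ground-truth mask are skipped.
from pathlib import Path

def list_image_mask_pairs(campaign_dir: str):
    """Yield (image_path, mask_path) pairs for one campaign folder."""
    root = Path(campaign_dir)
    for img_path in sorted((root / "img").iterdir()):
        mask_path = root / "mask" / img_path.name
        if mask_path.exists():
            yield img_path, mask_path

# pairs = list(list_image_mask_pairs("bdappv/google"))
```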
License
We extracted the thumbnails contained in the google/img/ folder using the Google Earth Engine API, and we generated the thumbnails contained in the ign/img/ folder from high-resolution tiles downloaded from the online IGN portal accessible here: https://geoservices.ign.fr/bdortho. Images provided by Google are subject to Google's terms and conditions. Images provided by the IGN are subject to an open license 2.0.
Access the terms and conditions of Google images at this URL: https://www.google.com/intl/en/help/legalnotices_maps/
Access the terms and conditions of IGN images at this URL: https://www.etalab.gouv.fr/wp-content/uploads/2018/11/open-licence.pdf
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Aerial Images Traffic is a dataset for object detection tasks - it contains Traffic Analysis 03bi annotations for 705 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This is an object detection dataset for small object detection from aerial images. All the annotations have been pre-processed to YOLO-format.
There are 3 classes in this dataset: airplane, ship, vehicle.
Here's the dataset split:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Merged Satellite Flood Images is a dataset for object detection tasks - it contains Flood annotations for 440 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Coast Train is a library of images of coastal environments, annotations, and corresponding thematic label masks (or ‘label images’) collated for the purposes of training and evaluating machine learning (ML), deep learning, and other models for image segmentation. It includes image sets from both geospatial satellite, aerial, and UAV imagery and orthomosaics, as well as non-geospatial oblique and nadir imagery. Images include a diverse range of coastal environments from the U.S. Pacific, Gulf of Mexico, Atlantic, and Great Lakes coastlines, consisting of time-series of high-resolution (≤1m) orthomosaics and satellite image tiles (10–30m). Each image, image annotation, and labelled image is available as a single NPZ zipped file. NPZ files follow the following naming convention: {datasource}_{numberofclasses}_{threedigitdatasetversion}.zip, where {datasource} is the source of the original images (for example, NAIP, Landsat 8, Sentinel 2), {numberofclasses} is the number of classes us ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
If you use this dataset, please cite this paper: Puertas, E.; De-Las-Heras, G.; Fernández-Andrés, J.; Sánchez-Soriano, J. Dataset: Roundabout Aerial Images for Vehicle Detection. Data 2022, 7, 47. https://doi.org/10.3390/data7040047
This publication presents a dataset of Spanish roundabouts aerial images taken from a UAV, along with annotations in PASCAL VOC XML files that indicate the position of vehicles within them. Additionally, a CSV file is attached containing information related to the location and characteristics of the captured roundabouts. This work details the process followed to obtain them: image capture, processing, and labeling. The dataset consists of 985,260 total instances: 947,400 cars, 19,596 cycles, 2,262 trucks, 7,008 buses and 2,208 empty roundabouts, in 61,896 1920x1080px JPG images. These are divided into 15,474 images extracted from 8 roundabouts with different traffic flows and 46,422 images created using data augmentation techniques. The purpose of this dataset is to help research on computer vision on the road, as such labeled images are not abundant. It can be used to train supervised learning models, such as convolutional neural networks, which are very popular in object detection.
| Roundabout (scenes) | Frames | Car | Truck | Cycle | Bus | Empty |
|---|---|---|---|---|---|---|
| 1 (00001) | 1,996 | 34,558 | 0 | 4229 | 0 | 0 |
| 2 (00002) | 514 | 743 | 0 | 0 | 0 | 157 |
| 3 (00003-00017) | 1,795 | 4822 | 58 | 0 | 0 | 0 |
| 4 (00018-00033) | 1,027 | 6615 | 0 | 0 | 0 | 0 |
| 5 (00034-00049) | 1,261 | 2248 | 0 | 550 | 0 | 81 |
| 6 (00050-00052) | 5,501 | 180,342 | 1420 | 120 | 1376 | 0 |
| 7 (00053) | 2,036 | 5,789 | 562 | 0 | 226 | 92 |
| 8 (00054) | 1,344 | 1,733 | 222 | 0 | 150 | 222 |
| Total | 15,474 | 236,850 | 2,262 | 4,899 | 1,752 | 552 |
| Data augmentation | x4 | x4 | x4 | x4 | x4 | x4 |
| Total | 61,896 | 947,400 | 9048 | 19,596 | 7,008 | 2,208 |
https://captain-whu.github.io/iSAID/dataset.html
The authors of the iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images dataset have introduced the first benchmark dataset for instance segmentation in aerial imagery, which merges instance-level object detection and pixel-level segmentation tasks. It contains 655,451 object instances spanning 15 different categories across 2,806 high-resolution images. Precise per-pixel annotations have been provided for each instance, ensuring accurate localization for detailed scene analysis. Compared to existing small-scale aerial image-based instance segmentation datasets, iSAID boasts 15 times the number of object categories and 5 times the number of instances.
This project consists of two datasets, both containing aerial images and videos of dolphins taken by drones. The data was captured at a few locations along the Italian and Israeli coastlines.
The aim of the project is to examine automated dolphin detection and tracking from aerial surveys.
The project description, details and results are presented in the paper (link to the paper).
Each dataset was organized and set for a different phase of the project. Each dataset is located in a different zip file:
1. Detection - Detection.zip
2. Tracking - Tracking.zip
Further information about the datasets' content and annotation format is below.
* To view each file's content, use the preview option; a description also appears later in this section.
Detection Dataset
This dataset contains 1125 aerial images; an image can contain several dolphins.
The detection phase of the project uses RetinaNet, a supervised deep-learning-based algorithm, with the Keras RetinaNet implementation. The data was therefore divided into three parts (Train, Validation and Test) at 70%, 15%, and 15%, respectively.
The annotation format follows the format required by that implementation (Keras RetinaNet). Each object (a dolphin) is annotated with bounding box coordinates and a class. For this project, the dolphins were not distinguished by species; each dolphin is therefore annotated as a bounding box and classified as 'Dolphin'.
*The annotation format is detailed in the Annotations section.
Detection zip file content:
Detection
|——————train_set (images)
|——————train_set.csv
|——————validation_set (images)
|——————train_set.csv
|——————test_set (images)
|——————train_set.csv
└——————class_mapping.csv
Tracking
This dataset contains 5 short videos (10-30 seconds), which were trimmed from longer aerial videos captured from a drone.
The tracking phase of the project is done using two metrics:
Both metrics require the videos' frame sequences as input. Therefore, the videos' frames were extracted. The first frame was annotated manually for initialization, and the algorithms track accordingly. As in the Detection dataset, each frame can include several objects (dolphins).
For annotation consistency, the videos' frame sequences were annotated similarly to the Detection dataset above (details can be found in the Annotations section). Each video's frames are annotated separately; the Tracking zip file therefore contains a folder for each video (5 folders in total), named after the video's file name.
Each video folder contains:
The examined videos description and details are displayed in 'Videos Description.xlsx' file. Use the preview option for displaying its content.
Tracking zip file content:
Tracking
|——————DJI_0195_trim_0015_0045
| └——————frames (images)
| └——————annotations_DJI_0195_trim_0015_0045.csv
| └——————class_mapping_DJI_0195_trim_0015_0045.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_0010_0025
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0010_0025.csv
| └——————class_mapping_DJI_0395_trim_0010_0025.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_00140_00150
| └——————frames (images)
| └——————annotations_DJI_0395_trim_00140_00150.csv
| └——————class_mapping_DJI_0395_trim_00140_00150.csv
| └——————DJI_0395_trim_00140_00150.MP4
|——————DJI_0395_trim_0055_0085
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0055_0085.csv
| └——————class_mapping_DJI_0395_trim_0055_0085.csv
| └——————DJI_0395_trim_0055_0085.MP4
└——————HighToLow_trim_0045_0070
└—————frames (images)
└—————annotations_HighToLow_trim_0045_0070.csv
└—————class_mapping_HighToLow_trim_0045_0070.csv
└—————HighToLow_trim_0045_0070.MP4
Annotations format
Both datasets share a similar annotation format, which is described below. It follows the format required by the Keras RetinaNet implementation, which was used for training in the dolphin detection phase of the project.
Each object (dolphin) is annotated by a bounding box, given by its top-left and bottom-right coordinates, and a class. Each image or frame can include several objects. All data was annotated using the Labelbox application.
For each subset (Train, Validation and Test of the Detection dataset, and each video of the Tracking dataset) there are two corresponding CSV files:
Each line in the Annotations CSV file contains an annotation (bounding box) in an image or frame.
The format of each line of the CSV annotation is:
path/to/image.jpg,x1,y1,x2,y2,class_name
An example from `train_set.csv`:
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,506,644,599,681,Dolphin
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,394,754,466,826,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,613,699,682,781,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,528,354,586,443,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,633,250,723,307,Dolphin
This defines a dataset with 2 images: 1146_20170730101_ce1_sc_GOPR3047 103.jpg contains 2 bounding boxes, and 1147_20170730101_ce1_sc_GOPR3047 104.jpg contains 3 bounding boxes.
Each line in the Class Mapping CSV file contains a mapping:
class_name,id
An example:
Dolphin,0
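A short sketch for reading the annotation and class-mapping CSV files in the format shown above; the file names follow the dataset listing, while the function name and return structure are illustrative.

```python
# Read the annotation and class-mapping CSV files in the format shown above.
# File names follow the dataset listing; the grouping by image is illustrative.
import csv
from collections import defaultdict

def load_annotations(annotations_csv: str, class_mapping_csv: str):
    """Return {image_path: [(x1, y1, x2, y2, class_id), ...]} for one subset."""
    with open(class_mapping_csv, newline="") as f:
        class_to_id = {name: int(idx) for name, idx in csv.reader(f)}

    boxes_per_image = defaultdict(list)
    with open(annotations_csv, newline="") as f:
        for path, x1, y1, x2, y2, class_name in csv.reader(f):
            boxes_per_image[path].append(
                (int(x1), int(y1), int(x2), int(y2), class_to_id[class_name])
            )
    return boxes_per_image

# boxes = load_annotations("train_set.csv", "class_mapping.csv")
```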
In the past decade, object detection has achieved significant progress in natural images but not in aerial images, due to the massive variations in the scale and orientation of objects caused by the bird’s-eye view of aerial images. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper, we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI. The proposed DOTA dataset contains 1,793,658 object instances of 18 categories of oriented-bounding-box annotations collected from 11,268 aerial images. Based on this large-scale and well-annotated dataset, we build baselines covering 10 state-of-the-art algorithms with over 70 configurations, where the speed and accuracy performances of each model have been evaluated. Furthermore, we provide a code library for ODAI and build a website for evaluating different algorithms. Previous challenges run on DOTA have attracted more than 1300 teams worldwide. We believe that the expanded large-scale DOTA dataset, the extensive baselines, the code library and the challenges can facilitate the designs of robust algorithms and reproducible research on the problem of object detection in aerial images.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides annotated very-high-resolution satellite RGB images extracted from Google Earth to train deep learning models to perform instance segmentation of Juniperus communis L. and Juniperus sabina L. shrubs. All images are from the high mountain of Sierra Nevada in Spain. The dataset contains 810 images (.jpg) of size 224x224 pixels. We also provide partitioning of the data into Train (567 images), Test (162 images), and Validation (81 images) subsets. Their annotations are provided in three different .json files following the COCO annotation format.
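Since the annotations follow the COCO format, they can be grouped per image with plain JSON parsing. The sketch below assumes the standard COCO keys ("images", "annotations", "categories"); the actual .json file names in this dataset are not shown here, so the example name is hypothetical.

```python
# Group COCO-format annotations per image, assuming the standard COCO keys.
# The example file name is hypothetical.
import json
from collections import defaultdict

def load_coco_annotations(json_path: str):
    """Return per-image lists of (category_name, [x, y, width, height]) entries."""
    with open(json_path) as f:
        coco = json.load(f)

    categories = {c["id"]: c["name"] for c in coco["categories"]}
    file_names = {im["id"]: im["file_name"] for im in coco["images"]}

    per_image = defaultdict(list)
    for ann in coco["annotations"]:
        per_image[file_names[ann["image_id"]]].append(
            (categories[ann["category_id"]], ann["bbox"])
        )
    return per_image

# shrubs = load_coco_annotations("train_annotations.json")
```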
This dataset includes aerial images and videos of dolphins taken by drones. The data was captured at a few locations along the Italian and Israeli coastlines.
The dataset was collected to perform automated dolphin detection in aerial images and dolphin tracking in aerial videos.
The project description and results are available via the GitHub link, which describes and visualizes the paper (link to the paper).
The dataset includes two zip files:
Detection.zip
Tracking.zip
For both files, the data annotation format is identical, and described below.
To view each file's content, use the preview option; a description also appears later in this section.
Annotations format
The data annotation format is inspired by the format required by the Keras RetinaNet implementation, which was used for training in the dolphin detection phase.
Each object is annotated by a bounding box. All data was annotated using the Labelbox application.
For each subset there are two corresponding CSV files:
Annotation file
Class mapping file
Each line in the Annotations CSV file contains an annotation (bounding box) in an image or frame. The format of each line of the CSV annotation is:
path/to/image.jpg,x1,y1,x2,y2,class_name
path/to/image.jpg - a path to the image/frame
x1, y1 - image coordinates of the left upper corner of the bounding box
x2, y2 - image coordinates of the right bottom corner of the bounding box
class_name - class name of the annotated object
An example from `train_set.csv`:
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,506,644,599,681,Dolphin
.\train_set\1146_20170730101_ce1_sc_GOPR3047 103.jpg,394,754,466,826,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,613,699,682,781,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,528,354,586,443,Dolphin
.\train_set\1147_20170730101_ce1_sc_GOPR3047 104.jpg,633,250,723,307,Dolphin
This defines a dataset with 2 images:
1146_20170730101_ce1_sc_GOPR3047 103.jpg contains 2 bounding boxes that contain dolphins.
1147_20170730101_ce1_sc_GOPR3047 104.jpg contains 3 bounding boxes that contain dolphins.
Each line in the Class Mapping CSV file contains a mapping:
class_name,id
An example:
Dolphin,0
Detection
The data for dolphin detection is separated into three sub-directories: train, validation and test sets.
Since all files contain only one class (Dolphin), there is a single class_mapping.csv that can be used for all three subsets.
Detection dataset folder includes:
A folder for each - train, validation and test sets, which includes the images
An annotations CSV file for each - train, validation and test sets
A class mapping csv file (for all the sets)
There is an annotation CSV file for each of the subsets.
Tracking
For the tracking phase, trackers were examined and evaluated on 5 videos. Each video has its own annotation and class mapping CSV files. In addition, each video's extracted frames are available in the frames directory.
Tracking dataset folder includes a folder for each video (5 videos), which contain:
frames directory, which includes extracted frames of the video
An annotations CSV
A class mapping csv file
The original video
The examined videos description and details:
Detection and Tracking dataset structure:
Detection
|——————train_set (images)
|——————train_set.csv
|——————validation_set (images)
|——————train_set.csv
|——————test_set (images)
|——————train_set.csv
└——————class_mapping.csv
Tracking
|——————DJI_0195_trim_0015_0045
| └——————frames (images)
| └——————annotations_DJI_0195_trim_0015_0045.csv
| └——————class_mapping_DJI_0195_trim_0015_0045.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_0010_0025
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0010_0025.csv
| └——————class_mapping_DJI_0395_trim_0010_0025.csv
| └——————DJI_0195_trim_0015_0045.MP4
|——————DJI_0395_trim_00140_00150
| └——————frames (images)
| └——————annotations_DJI_0395_trim_00140_00150.csv
| └——————class_mapping_DJI_0395_trim_00140_00150.csv
| └——————DJI_0395_trim_00140_00150.MP4
|——————DJI_0395_trim_0055_0085
| └——————frames (images)
| └——————annotations_DJI_0395_trim_0055_0085.csv
| └——————class_mapping_DJI_0395_trim_0055_0085.csv
| └——————DJI_0395_trim_0055_0085.MP4
└——————HighToLow_trim_0045_0070
└—————frames (images)
└—————annotations_HighToLow_trim_0045_0070.csv
└—————class_mapping_HighToLow_trim_0045_0070.csv
└—————HighToLow_trim_0045_0070.MP4