License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
We provide M3OT, a dataset for target detection and tracking in aerial imagery: a multimodal vehicle detection and tracking dataset acquired by two Unmanned Aerial Vehicles (UAVs) at high altitude. The dataset covers both RGB and infrared thermal (IR) modalities, with the two drones' altitudes ranging from 100 m to 120 m. It comprises 21,580 frames extracted from 8 hours of video, 10,790 paired RGB-IR images from the two UAVs, and 220,000 bounding boxes across varied settings, including suburban and urban scenes captured in daytime, at dusk, and at night. The dataset can serve as a benchmark for object detection, multiple object tracking, and other computer vision tasks.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The approach begins with UAV-mounted multi-modal sensors, including RGB cameras, infrared sensors, and LiDAR, to ensure reliable data collection across diverse operational conditions. To train and evaluate the proposed model, the UAV Small Object Detection Dataset is utilized, which includes aerial imagery labeled across 10 categories of small and weak objects. This dataset comprises 717 training samples, 84 validation samples, and 43 test samples, enabling robust model development under real-world aerial scenarios.
Dataset Structure and Splits
This version of the dataset is organized into standard training, validation, and test splits (717/84/43 samples, roughly 85%/10%/5%) to facilitate structured deep learning experimentation, ensuring unbiased model evaluation and comparability of results:
Training Set: Comprising 717 samples, this subset is dedicated to training deep learning models. It contains the largest portion of the data, allowing models to learn the diverse patterns, features, and contextual information necessary for small object detection. Each image in the training set is accompanied by precise bounding box annotations for the target objects.
Validation Set: Consisting of 84 samples, this set is used during model development for hyperparameter tuning, model selection, and early stopping. It provides an unbiased estimate of the model's performance on unseen data during training, preventing overfitting to the training set.
Test Set: With 43 samples, this independent set is reserved exclusively for the final evaluation of trained models. Performance metrics derived from the test set are considered the most reliable indicator of a model's generalization capability to real-world, completely unseen data.
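As a quick sanity check, a minimal Python sketch can verify that the on-disk splits match the documented 717/84/43 counts. The directory names and image extension below are illustrative assumptions, not taken from the dataset page:

```python
# Hypothetical sanity check of the documented 717/84/43 split.
# Directory layout and file extension are assumptions for illustration.
from pathlib import Path

EXPECTED = {"train": 717, "valid": 84, "test": 43}

def count_images(root: Path) -> dict[str, int]:
    """Count .jpg files under each split's images/ directory."""
    return {s: sum(1 for _ in (root / s / "images").glob("*.jpg"))
            for s in EXPECTED}

if __name__ == "__main__":
    counts = count_images(Path("uav_small_object_dataset"))
    total = sum(EXPECTED.values())  # 844 samples in all
    for split, expected in EXPECTED.items():
        print(f"{split:5s} expected {expected:3d} ({expected / total:.1%}), "
              f"found {counts[split]}")
```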
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
## Overview
Multi Drone Detection is a dataset for object detection tasks - it contains Drone annotations for 2,194 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
The dataset contains UAV footage of wild antelopes (blackbucks) in grassland habitats. It mainly supports two tasks: multi-object tracking (MOT) and re-identification (Re-ID). We provide annotations for the position of every animal in each frame, allowing us to offer very long videos (up to 3 min) that are completely annotated while maintaining the identity of each animal throughout. The Re-ID dataset offers pairs of videos that capture the movement of some animals simultaneously from two different UAVs; the Re-ID task is to find the same individual in the two videos, taken simultaneously from slightly different perspectives. The accompanying paper appears in the NeurIPS 2024 Datasets and Benchmarks Track: https://nips.cc/virtual/2024/poster/97563

Resolution: 5.4K
MOT: 12 videos (MOT17 format)
Re-ID: 6 sets, each with a pair of drones (custom format)
Detection: 320 images (COCO, YOLO)
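Since the MOT annotations follow the MOT17 format, a minimal parsing sketch may be useful. The `gt.txt` path below is a placeholder; the field order follows the public MOT17 ground-truth convention (frame, track id, bb_left, bb_top, bb_width, bb_height, flag, class, visibility):

```python
# Minimal sketch: parse MOT17-style ground-truth annotations.
import csv
from collections import defaultdict

def load_mot17_gt(path: str) -> dict[int, list]:
    """Map frame index -> list of (track_id, x, y, w, h)."""
    tracks = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame, track_id = int(row[0]), int(row[1])
            x, y, w, h = map(float, row[2:6])
            if int(row[6]) == 0:  # flag == 0 marks ignored entries
                continue
            tracks[frame].append((track_id, x, y, w, h))
    return tracks

# Example: all boxes annotated in the first frame of one video
# first_frame_boxes = load_mot17_gt("gt/gt.txt")[1]
```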
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
All datasets are derived from the official release of the 4th Anti-UAV Challenge (https://zenodo.org/records/15103888), featuring thermal infrared videos.
License: MIT License (https://opensource.org/licenses/MIT)
The VisDrone Dataset is a comprehensive benchmark developed by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China. Designed for various computer vision tasks associated with drone-based image and video analysis, the dataset serves as an essential resource for researchers and practitioners in the field.
This structured approach facilitates focused training and evaluation for distinct computer vision challenges. The VisDrone Dataset is widely used for training and evaluating deep learning models in various drone-based computer vision tasks, including object detection in images and videos, single-object tracking, multi-object tracking, and crowd counting.
The VisDrone Dataset stands out as a significant contribution to the field of drone-based computer vision. Its diverse sensor data, extensive annotations, and various task-focused subsets make it a valuable resource for advancing research and development in drone applications. Whether for academic research or practical implementations, the VisDrone Dataset is instrumental in fostering innovation in the rapidly evolving domain of drone technology.
License: Open Data Commons Attribution License (ODC-By) v1.0 (https://www.opendatacommons.org/licenses/by/1.0/)
The UAV low-altitude dataset for multi-target detection and tracking provides a specialized benchmark designed to support research in object detection and multi-object tracking under low-altitude unmanned aerial vehicle (UAV) surveillance scenarios. The dataset contains annotated images and video sequences of pedestrians, vehicles, and other common ground targets captured from low-flying UAV platforms, reflecting challenges such as small object size, frequent occlusions, scale variations, and complex backgrounds. All samples are labeled with bounding boxes and identity information to facilitate both detection and tracking tasks. This dataset aims to advance the development of lightweight and robust algorithms for real-time UAV-based monitoring applications, including public safety, traffic management, and intelligent surveillance.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
This dataset contains UAV RGB videos (MP4) recorded with a Phantom 4 RTK in a vineyard during the 2023 harvesting campaign. It also includes frames and annotations (PNG) to support object detection and tracking of grape bunches. There are two types of videos: (1) videos capturing the side of the canopy from a frontal point of view only, and (2) videos that collect data from multiple perspectives to avoid the leaf occlusion common in commercial vineyards. All flights were executed 3 meters above ground level, with a clear sky and wind speed below 0.5 m/s.
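For readers who want additional frames beyond those provided, a minimal OpenCV sketch could extract them from the MP4 videos; the file name and sampling interval below are illustrative assumptions:

```python
# Minimal sketch: sample frames from an MP4 video for detection/tracking work.
import cv2

def extract_frames(video_path: str, out_dir: str, every_n: int = 30) -> int:
    """Save every n-th frame as JPEG; return the number of frames written."""
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# extract_frames("vineyard_frontal_01.mp4", "frames", every_n=30)
```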
According to our latest research, the global Drone Video AI Object Tracking for Search market size reached USD 1.18 billion in 2024, and is projected to grow at a robust CAGR of 19.6% from 2025 to 2033. By the end of 2033, the market is forecasted to reach USD 5.50 billion. This impressive growth is primarily driven by the increasing adoption of advanced AI-driven drone technologies in critical search operations, enabling real-time object detection and tracking across diverse environments.
The primary growth factor fueling the Drone Video AI Object Tracking for Search market is the rising demand for rapid, accurate, and scalable search solutions in emergency and high-risk scenarios. As natural disasters, urban emergencies, and missing person incidents become more frequent, organizations are turning to drones equipped with AI object tracking to augment traditional search methods. These AI-powered drones can scan vast and challenging terrains, identifying objects of interest with remarkable speed and precision. The integration of deep learning algorithms and computer vision technologies has significantly improved the efficacy of search missions, reducing human error and response times. Furthermore, the ability of drones to operate in hazardous or inaccessible environments enhances safety for human personnel, making them indispensable tools for modern search and rescue operations.
Another key driver for market expansion is the increased investment by government and defense agencies worldwide. Public safety authorities are prioritizing the deployment of AI-enabled drone systems for surveillance, disaster response, and border security. The proliferation of high-resolution cameras, thermal imaging, and advanced sensors has made drones more versatile and effective in object tracking applications. Additionally, regulatory frameworks in several countries are evolving to support the safe integration of drones into national airspace, further accelerating market adoption. The synergy between public sector initiatives and private sector innovation is fostering a dynamic ecosystem where AI object tracking drones are rapidly becoming standard equipment for search and monitoring tasks.
Technological advancements in AI algorithms, edge computing, and cloud-based data processing are also transforming the landscape of drone video analytics. The continuous improvement of neural networks and real-time data transmission capabilities allows for more sophisticated object recognition and tracking, even in complex or cluttered environments. Cloud-based solutions enable collaborative search efforts, where data from multiple drones can be aggregated, analyzed, and visualized in real time. This technological leap is not only enhancing operational efficiency but also opening new avenues for commercial enterprises and environmental organizations to leverage drone video AI object tracking for diverse applications such as wildlife monitoring, infrastructure inspection, and environmental conservation.
Regionally, North America holds the largest share of the Drone Video AI Object Tracking for Search market, accounting for approximately 38% of the global market in 2024. This dominance is attributed to substantial investments in R&D, a strong presence of leading technology firms, and proactive government policies supporting drone integration. Europe follows closely, driven by stringent safety regulations and growing adoption in public safety and environmental monitoring. The Asia Pacific region is witnessing the fastest growth, with a projected CAGR of 22.1% during the forecast period, fueled by rapid urbanization, increasing disaster management needs, and supportive regulatory frameworks. Latin America and the Middle East & Africa are also emerging as promising markets, with growing interest in leveraging drone AI technologies for public safety and environmental applications.
The Drone Video AI Object Tracking for Search market is segmented by
License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/)
MMOT: The First Challenging Benchmark for Drone-based Multispectral Multi-Object Tracking
The MMOT dataset was presented in the paper MMOT: The First Challenging Benchmark for Drone-based Multispectral Multi-Object Tracking. The official code and further details can be found on the GitHub repository: https://github.com/Annzstbl/MMOT.
Introduction
MMOT is the first large-scale benchmark for drone-based multispectral multi-object tracking (MOT). It integrates spectral… See the full description on the dataset page: https://huggingface.co/datasets/Annzstbl/MMOT.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The Songdo Vision dataset provides high-resolution (4K, 3840×2160 pixels) RGB images annotated with categorized axis-aligned bounding boxes (BBs) for vehicle detection from a high-altitude bird's-eye view (BeV) perspective. Captured over Songdo International Business District, South Korea, the dataset consists of 5,419 annotated video frames featuring approximately 300,000 vehicle instances categorized into four classes: car, bus, truck, and motorcycle.
This dataset can serve as a benchmark for aerial vehicle detection, supporting research and real-world applications in intelligent transportation systems, traffic monitoring, and aerial vision-based mobility analytics. It was developed in the context of a multi-drone experiment aimed at enhancing geo-referenced vehicle trajectory extraction.
📌 Citation: If you use this dataset in your work, kindly acknowledge it by citing the following article:
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205.
🔗 Related dataset: For precisely georeferenced vehicle trajectories extracted from the same large-scale multi-drone experiment, see Songdo Traffic: 10.5281/zenodo.13828384.
Publicly available datasets for aerial vehicle detection often exhibit limitations, particularly in annotation quality and consistency. To address these challenges, Songdo Vision provides high-quality, human-annotated bounding boxes, with machine learning assistance used to enhance efficiency and consistency. This ensures accurate and reliable ground truth for training and evaluating detection models.
The dataset is randomly split into training (80%) and test (20%) subsets:
| Subset | Images | Car     | Bus   | Truck  | Motorcycle | Total Vehicles |
|--------|--------|---------|-------|--------|------------|----------------|
| Train  | 4,335  | 195,539 | 7,030 | 11,779 | 2,963      | 217,311        |
| Test   | 1,084  | 49,508  | 1,759 | 3,052  | 805        | 55,124         |
A subset of 5,274 frames was randomly sampled from drone video sequences, while an additional 145 frames were carefully selected to represent challenging cases, such as motorcycles at pedestrian crossings, in bicycle lanes, near traffic light poles, and around other distinctive road markers where they may blend into the urban environment.
The dataset was collected as part of a collaborative multi-drone experiment conducted by KAIST and EPFL in Songdo, South Korea, from October 4–7, 2022.
More details on the experimental setup and data processing pipeline are available in [1].
Annotations were generated using a semi-automated object detection annotation process in Azure ML Studio, leveraging machine learning-assisted bounding box detection with human verification to ensure precision.
Each annotated frame includes categorized, axis-aligned bounding boxes, stored in three widely used formats: COCO (JSON), YOLO (TXT), and Pascal VOC (XML). Example COCO annotation:
{
"images": [{"id": 1, "file_name": "0001.jpg", "width": 3840, "height": 2160}],
"annotations": [{"id": 1, "image_id": 1, "category_id": 2, "bbox": [500, 600, 200, 50], "area": 10000, "iscrowd": 0}],
"categories": [
{"id": 1, "name": "car"}, {"id": 2, "name": "bus"},
{"id": 3, "name": "truck"}, {"id": 4, "name": "motorcycle"}
]
}
Example YOLO annotations (class x_center y_center width height, normalized to [0, 1]):
0 0.52 0.63 0.10 0.05 # Car bounding box
2 0.25 0.40 0.15 0.08 # Truck bounding box
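To make the relationship between the two formats concrete, here is a small sketch that converts a COCO pixel box to a normalized YOLO box, assuming the 3840×2160 frame size from the COCO example and the 0-indexed class order from names.txt:

```python
# Minimal sketch: COCO bbox ([x_min, y_min, w, h], pixels) ->
# YOLO bbox ((x_center, y_center, w, h), normalized to [0, 1]).

def coco_to_yolo(bbox, img_w=3840, img_h=2160):
    """Return (x_center, y_center, width, height), all normalized."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)

# The bus annotation from the COCO example above:
print(coco_to_yolo([500, 600, 200, 50]))
# -> (0.15625, 0.2893..., 0.0520..., 0.0231...)
```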
The dataset is provided as two compressed archives:
1. Training Data (train.zip, 12.91 GB)
train/
├── coco_annotations.json    # COCO format
├── images/
│   ├── 0001.jpg
│   └── ...
└── labels/
    ├── 0001.txt             # YOLO format
    ├── 0001.xml             # Pascal VOC format
    └── ...
2. Testing Data (test.zip, 3.22 GB)
test/
├── coco_annotations.json
├── images/
│   ├── 00027.jpg
│   └── ...
└── labels/
    ├── 00027.txt
    ├── 00027.xml
    └── ...
The release also includes:
README.md – Dataset documentation (this description)
LICENSE.txt – Creative Commons Attribution 4.0 License
names.txt – Class names (one per line)
data.yaml – Example YOLO configuration file for training/testing

In addition to the funding sources listed in the metadata, the creators express their gratitude to Artem Vasilev for his dedicated efforts in data annotation. We also thank the research teams of Prof. Simon Oh (Korea University) and Prof. Minju Park (Hannam University) for their assistance during the data collection campaign, including the provision of drone equipment and student support.
Preferred Citation: If you use Songdo Vision for any purpose, whether academic research, commercial applications, open-source projects, or benchmarking efforts, please cite our accompanying article [1]:
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205
BibTeX entry:
@article{fonod2025advanced,
  title   = {Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery},
  author  = {Fonod, Robert and Cho, Haechan and Yeo, Hwasoo and Geroliminis, Nikolas},
  journal = {Transportation Research Part C: Emerging Technologies},
  volume  = {178},
  pages   = {105205},
  year    = {2025},
  doi     = {10.1016/j.trc.2025.105205}
}
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
In this short paper, we study platooning control of drones using only the information from a camera attached to each drone. For this, we adopt real-time object detection based on a deep learning model called YOLO (You Only Look Once). The YOLO object detector continuously estimates the relative position of the drone in front, from which each drone is controlled by a PD (proportional-derivative) feedback controller for platooning. The effectiveness of the proposed system is demonstrated by indoor experiments with three drones.
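As a rough illustration of the control loop described in the abstract, the sketch below pairs a PD law with a gap estimate that would come from the YOLO detector. The gains, setpoint, and interface are illustrative assumptions, not values from the paper:

```python
# Hedged sketch of PD platooning: a YOLO detector estimates the gap to the
# preceding drone, and a PD controller regulates it. Gains are illustrative.

class PDController:
    """PD feedback law: u = kp * e + kd * de/dt."""

    def __init__(self, kp: float, kd: float, setpoint: float):
        self.kp, self.kd, self.setpoint = kp, kd, setpoint
        self.prev_error = None

    def update(self, gap: float, dt: float) -> float:
        error = gap - self.setpoint  # positive -> too far behind the leader
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.kd * deriv

# gap would come from the YOLO estimate of the preceding drone's position
# (e.g., inferred from its bounding-box size in the image).
controller = PDController(kp=0.8, kd=0.2, setpoint=1.5)  # hold a 1.5 m gap
cmd = controller.update(gap=2.0, dt=0.05)  # 20 Hz control loop
print(cmd)  # positive command -> speed up to close the gap
```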
License: MIT License (https://opensource.org/licenses/MIT)
This dataset contains 74 aerial maritime photographs taken via a Mavic Air 2 drone, with 1,151 bounding boxes covering docks, boats, lifts, jet skis, and cars. It is a multi-class aerial maritime object detection dataset.
The drone was flown at 400 ft. No drones were harmed in the making of this dataset.
This dataset was collected and annotated by the Roboflow team, released with MIT license.
![Image example](https://i.imgur.com/9ZYLQSO.jpg)
This dataset is a great starter dataset for building an aerial object detection model with your drone.
Fork or download this dataset and follow our guide on how to train a state-of-the-art YOLOv4 object detector for more. Stay tuned for tutorials on teaching your UAV drone how to see, and for comparable airplane imagery and footage.
See here for how to use the CVAT annotation tool that was used to create this dataset.
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless. Developers reduce 50% of their boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.

The VisDrone-MOT dataset is a large-scale benchmark for multiple object tracking under drone scenes.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
The UAVDT dataset is an open-source dataset specifically designed for drone-based detection and tracking. It aims to provide researchers with high-quality, rich multi-task data to facilitate the application of drones in complex environments, particularly for tasks such as object detection, object tracking, and motion analysis. The dataset was released by the Computer Vision Laboratory at Shenzhen University in 2019 and has been widely used in areas such as drone video analysis, autonomous driving, and intelligent surveillance.
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
Enhanced animal welfare has emerged as a pivotal element in contemporary precision animal husbandry, with bovine monitoring constituting a significant facet of precision agriculture. The evolution of intelligent agriculture in recent years has significantly facilitated the integration of drone flight monitoring tools and innovative systems, leveraging deep learning to interpret bovine behavior. Smart drones, outfitted with monitoring systems, have evolved into viable solutions for wildlife protection and monitoring as well as animal husbandry. Nevertheless, challenges arise under actual and multifaceted ranch conditions, where scale changes, unpredictable movements, and occlusions invariably affect accurate tracking from unmanned aerial vehicles (UAVs). To address these challenges, this manuscript proposes a deep learning-based tracking algorithm that adheres to the Joint Detection and Tracking (JDT) paradigm established by the CenterTrack algorithm and is designed to satisfy the requirements of multi-object tracking in intricate practical scenarios. In comparison with several preeminent tracking algorithms, the proposed Multi-Object Tracking (MOT) algorithm demonstrates superior performance in Multiple Object Tracking Accuracy (MOTA), Multiple Object Tracking Precision (MOTP), and IDF1, and exhibits enhanced efficiency in managing Identity Switches (IDSW), False Positives (FP), and False Negatives (FN). The algorithm proficiently mitigates the inherent challenges of MOT in complex, livestock-dense scenarios.
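For reference, MOTA, one of the metrics cited above, follows the standard CLEAR MOT definition. The sketch below computes it from error counts; the numbers shown are made up for illustration, not results from the paper:

```python
# MOTA = 1 - (FN + FP + IDSW) / num_ground_truth_boxes (CLEAR MOT metrics).

def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """Multiple Object Tracking Accuracy (can be negative for poor trackers)."""
    return 1.0 - (fn + fp + idsw) / num_gt

print(mota(fn=120, fp=80, idsw=15, num_gt=2000))  # -> 0.8925
```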
SeaDronesSee is a large-scale dataset aimed at helping develop systems for Search and Rescue (SAR) using Unmanned Aerial Vehicles (UAVs) in maritime scenarios. Building highly complex autonomous UAV/drone systems that aid in SAR missions requires robust computer vision algorithms to detect and track objects or persons of interest. The dataset provides three tracks: object detection, single-object tracking, and multi-object tracking.
All datasets and other information can be found at: https://seadronessee.cs.uni-tuebingen.de/home
This dataset contains only the compressed version of the Object Detection v2 Dataset
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The UAV Aerial Photography Multi-Object Dataset is designed for smart transportation applications, featuring a collection of internet-collected UAV (Unmanned Aerial Vehicle) aerial photography images with a resolution of 1920 x 1080 pixels. This dataset predominantly covers large-scale scenes such as parking lots and highways, with each image containing over 200 vehicles. Every object within these images is meticulously annotated with a bounding box that aligns with the object's orientation, ensuring precise vehicle detection and tracking.
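Since each annotation here is a box aligned with the object's orientation (an oriented bounding box), the sketch below shows one common parameterization, (cx, cy, w, h, angle), expanded to corner points. This parameterization is an assumption for illustration; the dataset's exact annotation schema is not specified above:

```python
# Sketch: expand an oriented bounding box (cx, cy, w, h, angle) to its
# four corner points via a 2D rotation. Parameterization is assumed.
import math

def obb_corners(cx, cy, w, h, angle_rad):
    """Return the four (x, y) corners of an oriented bounding box."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(cx + c * dx - s * dy, cy + s * dx + c * dy) for dx, dy in half]

# A 40x20-pixel vehicle rotated 30 degrees in a 1920x1080 frame:
print(obb_corners(960, 540, 40, 20, math.radians(30)))
```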
License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Seraphim Drone Detection Dataset
Dataset Overview
This is a comprehensive drone image dataset curated from 23 open-source datasets and processed through a custom cleaning pipeline. The dataset is designed for training object detection models to identify drones in various environments and conditions. The majority of images feature rotary-wing (multi-rotor) unmanned aerial vehicles (UAVs), with a smaller portion representing fixed-wing and hybrid types.… See the full description on the dataset page: https://huggingface.co/datasets/lgrzybowski/seraphim-drone-detection-dataset.