VisDrone is a large-scale benchmark with carefully annotated ground truth for several important computer vision tasks, aimed at bringing vision and drones together. The VisDrone2019 dataset was collected by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China. The benchmark consists of 288 video clips comprising 261,908 frames and 10,209 static images, captured by various drone-mounted cameras and covering a wide range of aspects including location (14 different cities separated by thousands of kilometers in China), environment (urban and rural), objects (pedestrians, vehicles, bicycles, etc.), and density (sparse and crowded scenes). Note that the data was collected using various drone platforms (i.e., different drone models), in different scenarios, and under various weather and lighting conditions. The frames are manually annotated with more than 2.6 million bounding boxes of targets of frequent interest, such as pedestrians, cars, bicycles, and tricycles. Important attributes, including scene visibility, object class, and occlusion, are also provided for better data utilization.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Traffic Analysis: The "visdrone" model can be deployed to analyze traffic patterns by counting and tracking pedestrians in a street scene. This is useful for urban planning: deciding where to place crosswalks, estimating pedestrian traffic for retail locations, or improving public transportation routes and schedules.
Security Surveillance: This model can be used in surveillance systems for crowded areas like malls, airports, or city centers. By identifying pedestrian movements - for example, detecting unusual behavior or tracking a person across multiple camera views - it can help ensure public safety.
Autonomous Vehicles: In the domain of self-driving cars, "visdrone" can assist with pedestrian detection and thus help prevent accidents. Identifying and yielding to pedestrians is a critical requirement for these vehicles.
Enhanced Augmented Reality (AR): In AR applications, the model can identify pedestrians in real time and incorporate them into the AR environment for a more interactive experience - for instance, to avoid overlapping digital elements with real-world pedestrians.
Pedestrian-Friendly Urban Design: City planners can use "visdrone" to monitor pedestrian flow and density across a city. This could help in creating more pedestrian-friendly spaces, more green spaces, and other nature-inclusive urban designs.
Attribution-ShareAlike 3.0 (CC BY-SA 3.0): https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Dataset Card for VisDrone2019-DET
This is a FiftyOne version of the VisDrone2019-DET dataset with 8,629 samples.
Installation
If you haven't already, install FiftyOne:

```bash
pip install -U fiftyone
```
Usage
```python
import fiftyone as fo
import fiftyone.utils.huggingface as fouh

dataset = fouh.load_from_hub("Voxel51/VisDrone2019-DET")
```
session =… See the full description on the dataset page: https://huggingface.co/datasets/Voxel51/VisDrone2019-DET.
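The usage snippet is truncated above. Continuing from it, the typical FiftyOne pattern is to launch the app on the loaded dataset; this is standard FiftyOne API, not taken from the card:

```python
# Standard FiftyOne call to browse the samples interactively;
# the card's truncated "session = …" line presumably does the same.
session = fo.launch_app(dataset)
```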
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
VisDrone Video is a dataset for object detection tasks - it contains Car, People, Pedestrians, Van, and Motor annotations for 6,275 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by Минь Тиен Ха
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
VisDrone Dataset (YOLO Format)
Overview
This repository contains the VisDrone dataset converted into the YOLO (You Only Look Once) format. The VisDrone dataset is a large-scale benchmark for object detection, segmentation, and tracking in drone videos. The dataset includes a variety of challenging scenarios with diverse objects and backgrounds.
Dataset Details
Classes:
0: pedestrian
1: people
2: bicycle
3: car
4: van
5: truck
6: tricycle
7: awning-tricycle
8: … See the full description on the dataset page: https://huggingface.co/datasets/banu4prasad/VisDrone-Dataset.
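To illustrate the YOLO layout this conversion produces, here is a minimal sketch of parsing one label file. It assumes the standard YOLO convention of one `class x_center y_center width height` line per object with coordinates normalized to [0, 1]; the file name is hypothetical, and the class list is the (truncated) one shown above:

```python
from pathlib import Path

# Hypothetical label file; YOLO format stores one line per object:
# "<class_id> <x_center> <y_center> <width> <height>", normalized to [0, 1].
LABEL_FILE = Path("labels/0000001_00000_d_0000001.txt")

# First eight classes as listed above; the card truncates the full list.
CLASS_NAMES = ["pedestrian", "people", "bicycle", "car", "van",
               "truck", "tricycle", "awning-tricycle"]

for line in LABEL_FILE.read_text().splitlines():
    class_id, xc, yc, w, h = line.split()
    idx = int(class_id)
    name = CLASS_NAMES[idx] if idx < len(CLASS_NAMES) else f"class {idx}"
    print(f"{name}: center=({float(xc):.3f}, {float(yc):.3f}), "
          f"size=({float(w):.3f}, {float(h):.3f})")
```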
This dataset supports the manuscript titled "Tiny Object Detection in Aerial Traffic Surveillance using YOLOv8-Nano". It contains training and evaluation resources used to benchmark YOLOv8n and YOLO-MARS models on the VisDrone dataset for real-time object detection. The data includes:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In the field of UAV aerial image processing, ensuring accurate detection of tiny targets is essential. Current UAV aerial image target detection algorithms must satisfy competing requirements: low computational cost, high accuracy, and fast detection speed. To address these challenges, we propose an improved, lightweight algorithm: LCFF-Net. First, we propose the LFERELAN module, designed to enhance the extraction of tiny-target features and optimize the use of computational resources. Second, a lightweight cross-scale feature pyramid network (LC-FPN) is employed to further enrich feature information, integrate multi-level feature maps, and provide more comprehensive semantic information. Finally, to increase model training speed and achieve greater efficiency, we propose a lightweight, detail-enhanced, shared-convolution detection head (LDSCD-Head) to optimize the original detection head. Moreover, we present different scale versions of the LCFF-Net algorithm to suit various deployment environments. Empirical assessments conducted on the VisDrone dataset validate the efficacy of the proposed algorithm. Compared to the baseline-s model, LCFF-Net-n achieves a 2.8% increase in mAP50 and a 3.9% improvement in mAP50-95, while reducing parameters by 89.7%, FLOPs by 50.5%, and computation delay by 24.7%. Thus, LCFF-Net offers high accuracy and fast detection speeds for tiny-target detection in UAV aerial images, providing an effective lightweight solution.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the VisDrone dataset for object detection in videos.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The VisDrone2019 dataset was collected by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China. The dataset contains a large number of objects in urban and rural road scenes (10 categories, such as pedestrians, vehicles, and bicycles), covering a wide variety of scenes and containing many small objects. We selected the training set of the object detection task as our dataset and randomly divided it into new training, validation, and test sets in a ratio close to 7:2:1. The original dataset is available at https://github.com/VisDrone/VisDrone-Dataset
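A minimal sketch of the random 7:2:1 split described above, assuming a flat list of image file names (the names and count are hypothetical, not taken from the dataset):

```python
import random

# Hypothetical image file names standing in for the VisDrone-DET training set.
images = [f"img_{i:06d}.jpg" for i in range(10000)]

random.seed(42)        # fix the seed so the split is reproducible
random.shuffle(images)

n = len(images)
n_train = int(0.7 * n)
n_val = int(0.2 * n)

train = images[:n_train]
val = images[n_train:n_train + n_val]
test = images[n_train + n_val:]  # remaining ~10%

print(len(train), len(val), len(test))
```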
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Uit Flooded Visdrone is a dataset for object detection tasks - it contains Car, P0Xc, and XsqN annotations for 7,411 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
VisDrone Aug 3 is a dataset for object detection tasks - it contains Cars WfnA annotations for 8,497 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Visdrone Full is a dataset for object detection tasks - it contains Vehicles annotations for 6,213 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Dataset Card for Voxel51/VisDrone2019-DET
This is a FiftyOne dataset with 8,629 samples.
Installation
If you haven't already, install FiftyOne:

```bash
pip install -U fiftyone
```
Usage
```python
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub("dgural/Data-Curation-for-Visual-AI-Module-5-VisDrone")
```
session =… See the full description on the dataset page: https://huggingface.co/datasets/dgural/Data-Curation-for-Visual-AI-Module-5-VisDrone.
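As with the earlier card, the snippet is truncated. A hedged sketch of a typical next step in FiftyOne, filtering the samples by label: `filter_labels` and `launch_app` are standard FiftyOne API, while the field name `ground_truth` is an assumption about this dataset's schema:

```python
import fiftyone as fo
from fiftyone import ViewField as F

# Restrict the view to samples containing pedestrian detections.
# "ground_truth" is an assumed label field; check dataset.get_field_schema().
view = dataset.filter_labels("ground_truth", F("label") == "pedestrian")

session = fo.launch_app(view)  # browse the filtered view interactively
```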
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Experimental results in the embedded environment on VisDrone-val.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results of the attention ablation experiment on VisDrone-val.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by veerchheda
Released under MIT
The DroneVehicle dataset consists of 56,878 images collected by drone; half are RGB images and the rest are infrared images. We have made rich annotations with oriented bounding boxes for five categories:

| Category | RGB annotations | Infrared annotations |
|---|---|---|
| car | 389,779 | 428,086 |
| truck | 22,123 | 25,960 |
| bus | 15,333 | 16,590 |
| van | 11,935 | 12,708 |
| freight car | 13,400 | 17,173 |

This dataset is available on the download page.
In DroneVehicle, to annotate objects at the image boundaries, we added a white border 100 pixels wide on the top, bottom, left, and right of each image, so the downloaded images are 840 x 712. When training a detection network, the surrounding white border can be removed in pre-processing, restoring the image size to 640 x 512.
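A minimal sketch of that pre-processing step, assuming the 100-pixel border described above (file paths are hypothetical):

```python
from PIL import Image

BORDER = 100  # white border width on each side, per the dataset description

# Hypothetical file path; downloaded DroneVehicle images are 840 x 712.
img = Image.open("dronevehicle/rgb/00001.jpg")
assert img.size == (840, 712)

# Crop box is (left, upper, right, lower): 840 - 200 = 640, 712 - 200 = 512.
cropped = img.crop((BORDER, BORDER, img.width - BORDER, img.height - BORDER))
assert cropped.size == (640, 512)
cropped.save("dronevehicle/rgb_cropped/00001.jpg")
```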
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The VisDrone Dataset is a large-scale benchmark created by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China. It contains carefully annotated ground truth data for various computer vision tasks related to drone-based image and video analysis.