Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
With the advance of AI, road object detection has become a prominent topic in computer vision, mostly using perspective cameras. A fisheye lens provides omnidirectional wide coverage, allowing fewer cameras to monitor road intersections, but at the cost of view distortion. To our knowledge, there is no existing open dataset prepared for traffic surveillance with fisheye cameras. This paper introduces the open FishEye8K benchmark dataset for road object detection, which comprises 157K bounding boxes across five classes (Pedestrian, Bike, Car, Bus, and Truck). In addition, we present benchmark results for state-of-the-art (SoTA) models, including variants of YOLOv5, YOLOR, YOLOv7, and YOLOv8. The dataset comprises 8,000 images recorded in 22 videos using 18 fisheye cameras for traffic monitoring in Hsinchu, Taiwan, at resolutions of 1080x1080 and 1280x1280. The data annotation and validation process was arduous and time-consuming due to the ultra-wide panoramic and hemispherical fisheye images, which exhibit large distortion and contain numerous road participants, particularly people riding scooters. To avoid bias, frames from a particular camera were assigned to either the training or the test set, maintaining a ratio of about 70:30 for both the number of images and the number of bounding boxes in each class. Experimental results show that YOLOv8 and YOLOR perform best at input sizes 640x640 and 1280x1280, respectively.
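As a rough illustration of this camera-level split protocol, the sketch below greedily assigns whole cameras to the training set until about 70% of the frames are covered; the camera IDs and frame counts are hypothetical, and the published FishEye8K split was curated by the authors rather than generated this way.

```python
import random

def split_by_camera(frames_per_camera, train_ratio=0.7, seed=0):
    """Assign whole cameras to train or test so no camera appears in both splits."""
    cameras = list(frames_per_camera)
    random.Random(seed).shuffle(cameras)
    total = sum(frames_per_camera.values())
    train, test, train_frames = [], [], 0
    for cam in cameras:
        # Greedily fill the training split until roughly train_ratio of all frames is reached.
        if train_frames / total < train_ratio:
            train.append(cam)
            train_frames += frames_per_camera[cam]
        else:
            test.append(cam)
    return train, test

# Hypothetical frame counts for 18 cameras; the real FishEye8K counts differ.
counts = {f"cam{i:02d}": 400 + 10 * i for i in range(18)}
train_cams, test_cams = split_by_camera(counts)
print(len(train_cams), len(test_cams))
```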
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The PASCAL Visual Object Classes (VOC) 2012 dataset is a benchmark in object recognition, widely used for training and evaluating models in computer vision tasks.
The dataset has been modified to include only the image data and labels in YOLO format. The original annotation files have been removed, and the object labels were converted using scripts provided by Ultralytics so that they are compatible with YOLO-based object detection models.
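The conversion itself is simple: VOC stores absolute pixel corners in XML, while YOLO expects one text line per box with a class index and normalized center/size. The sketch below shows the idea using the standard 20-class VOC list; it is a minimal stand-in under those assumptions, not the exact Ultralytics script used for this dataset.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Standard 20 PASCAL VOC classes.
CLASSES = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
           "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
           "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

def voc_xml_to_yolo_txt(xml_path: Path, out_dir: Path) -> None:
    """Write a YOLO-format .txt label file for one VOC annotation XML."""
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in CLASSES:
            continue
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO format: class_id x_center y_center width height, all normalized to [0, 1].
        cx, cy = (xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h
        bw, bh = (xmax - xmin) / img_w, (ymax - ymin) / img_h
        lines.append(f"{CLASSES.index(name)} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{xml_path.stem}.txt").write_text("\n".join(lines))
```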
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Object Detection Bench
This dataset is a customized version of the RealworldQA dataset, specifically tailored for object detection and segmentation benchmarking tasks.
Dataset Description
This benchmark dataset contains real-world images with questions, answers, and custom prompts designed for evaluating object detection and segmentation models. Each sample includes:
Image: Real-world photographs
Question: Original question about the image content
Answer: Ground truth… See the full description on the dataset page: https://huggingface.co/datasets/JigsawStack/object-detection-bench.
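A minimal, hedged example of pulling the benchmark from the Hugging Face Hub with the `datasets` library; the split name is an assumption, and the exact column names should be checked on the dataset page linked above.

```python
from datasets import load_dataset

# Load the benchmark from the Hub; "train" is an assumed split name.
ds = load_dataset("JigsawStack/object-detection-bench", split="train")
print(ds[0].keys())  # inspect the fields available for one sample
```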
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The FAIR-CSAR-V1.0 dataset, constructed from single-look complex (SLC) images of the Gaofen-3 satellite, is the largest and most finely annotated SAR image dataset for fine-grained targets to date. FAIR-CSAR-V1.0 aims to advance technologies in SAR image object detection, recognition, and target characteristic understanding. The dataset is developed by the Key Laboratory of Target Cognition and Application Technology (TCAT) at the Aerospace Information Research Institute, Chinese Academy of Sciences.

FAIR-CSAR-V1.0 comprises 175 scenes of Gaofen-3 Level-1 SLC products, covering 32 global regions including airports, oil refineries, ports, and rivers. With a total data volume of 250 GB and over 340,000 instances, FAIR-CSAR-V1.0 covers 5 main categories and 22 subcategories, providing detailed annotations for imaging parameters (e.g., radar center frequency, pulse repetition frequency) and target characteristics (e.g., satellite-ground relative azimuthal angle, key scattering point distribution).

FAIR-CSAR-V1.0 consists of two sub-datasets: the SL dataset and the FSI dataset. The SL dataset, acquired in spotlight mode with a nominal resolution of 1 meter, contains 170,000 instances across 22 target classes. The FSI dataset, acquired in fine stripmap mode with a nominal resolution of 5 meters, includes 170,000 instances across 3 target classes. Figure 1 presents an overview of the dataset.

Data paper and citation format:
[1] Youming Wu, Wenhui Diao, Yuxi Suo, Xian Sun. A Benchmark Dataset for Fine-Grained Object Detection and Recognition Based on Single-Look Complex SAR Images (FAIR-CSAR-V1.0) [OL]. Journal of Radars, 2025. https://radars.ac.cn/web/data/getData?dataType=FAIR_CSAR_en&pageType=en.
[2] Y. Wu, Y. Suo, Q. Meng, W. Dai, T. Miao, W. Zhao, Z. Yan, W. Diao, G. Xie, Q. Ke, Y. Zhao, K. Fu and X. Sun. FAIR-CSAR: A Benchmark Dataset for Fine-Grained Object Detection and Recognition Based on Single-Look Complex SAR Images. IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-22, 2025, doi: 10.1109/TGRS.2024.3519891.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
PINTEL FireDet Benchmark Dataset is a dataset for object detection tasks - it contains Fire annotations for 390 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
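A minimal sketch of downloading a Roboflow-hosted dataset with the `roboflow` Python package; the API key, workspace, project slug, version number, and export format below are placeholders to be replaced with the values shown on this project's Roboflow page.

```python
from roboflow import Roboflow

# All identifiers here are placeholders; copy the real ones from the Roboflow project page.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-project-slug")
dataset = project.version(1).download("yolov8")  # export in YOLOv8 format
print(dataset.location)  # local folder containing images and labels
```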
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset for TiROD: Tiny Robotics Dataset and Benchmark for Continual Object Detection
Official Website -> https://pastifra.github.io/TiROD/
Code -> https://github.com/pastifra/TiROD_code
Video -> https://www.youtube.com/watch?v=e76m3ol1i4I
Paper -> https://arxiv.org/abs/2409.16215
Database Contents License (DbCL) v1.0: http://opendatacommons.org/licenses/dbcl/1.0/
Welcome to the CloudNet repository. This project provides a cloud detection dataset and a pre-trained model designed to enhance object detection accuracy in remote sensing aerial images, particularly in challenging cloud-covered scenarios. The dataset comprises two classes: cloud and non-cloud images, sourced from the publicly available Maxar "Hurricane Ian" repository.
The CloudNet dataset consists of cloud and non-cloud images, facilitating research in cloud detection for object detection in remote sensing imagery.
The CloudNet model is a pre-trained model specifically designed for cloud detection in remote sensing imagery. It is trained on the CloudNet dataset and serves as a valuable tool for enhancing object detection accuracy in the presence of clouds.
You can download the pre-trained CloudNet model weights from the following link: CloudNet Model Weights
If you find the CloudNet dataset or model useful in your research, please cite our work using the following BibTeX entry:
@INPROCEEDINGS{10747011,
author={Haque, Mohd Ariful and Rifat, Rakib Hossain and Kamal, Marufa and George, Roy and Gupta, Kishor Datta and Shujaee, Khalil},
booktitle={2024 Fifth International Conference on Intelligent Data Science Technologies and Applications (IDSTA)},
title={CDD \& CloudNet: A Benchmark Dataset \& Model for Object Detection Performance},
year={2024},
volume={},
number={},
pages={118-122},
abstract={Aerial imagery obtained through remote sensing is extensively utilized across diverse industries, particularly for object detection applications where it has demonstrated considerable efficacy. However, clouds in these images can obstruct evaluation and detection tasks. This study therefore involved the compilation of a cloud dataset, which categorized images into two classes: those containing clouds and those without. These images were sourced from the publicly available Maxar ‘Hurricane Ian’ repository, which contains images from various natural events. We demonstrated the impact of cloud removal during pre-processing on object detection using this dataset and employed six CNN models, including a custom model, for cloud detection benchmarking. These models were used to detect objects in aerial images from two other events in the Maxar dataset. Our results show significant improvements in precision, recall, and F1-score for CNN models, along with optimized training times for object detection in the CloudNet+YOLO combination. The findings demonstrate the effectiveness of our approach in improving object detection accuracy and efficiency in remote sensing imagery, particularly in challenging cloud-covered scenarios.},
keywords={Training;Industries;Accuracy;Object detection;Benchmark testing;Data science;Data models;Remote sensing;Cloud Detection;Dataset;Deep Learning;CNN;ResNet;Vgg16;DenseNet169;EfficientNet;MobileNet},
doi={10.1109/IDSTA62194.2024.10747011},
ISSN={},
month={Sep.},
}
The CloudNet dataset and model are released under the license noted above.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A collection of annotations for PACO images containing free-form fine-grained textual captions of objects, their parts, and their attributes. It also comprises several sets of negative captions that can be used to test and evaluate the fine-grained recognition ability of open-vocabulary models.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was originally created by Yimin Chen. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/workspace-txxpz/underwater-detection.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Tsinghua-Daimler Cyclist Detection Benchmark Dataset in YOLO format for object detection
I'm not the owner of this dataset; all credit goes to X. Li, F. Flohr, Y. Yang, H. Xiong, M. Braun, S. Pan, K. Li and D. M. Gavrila, the creators of this dataset.
Label format: id center_x center_y width height (relative to image width and height)
Example: 0 0.41015625 0.44140625 0.0341796875 0.11328125
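For reference, a minimal sketch of turning one such normalized label line back into pixel coordinates; the image size used in the example call is made up.

```python
def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Convert a YOLO label line to (class_id, xmin, ymin, xmax, ymax) in pixels."""
    cls, cx, cy, w, h = line.split()
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return int(cls), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

# Example with the annotation line shown above and a hypothetical 2048x1024 image.
print(yolo_to_pixels("0 0.41015625 0.44140625 0.0341796875 0.11328125", 2048, 1024))
```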
This dataset is made freely available for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use, copy, and distribute the data given that you agree:
X. Li, F. Flohr, Y. Yang, H. Xiong, M. Braun, S. Pan, K. Li and D. M. Gavrila. A New Benchmark for Vision-Based Cyclist Detection. In Proc. of the IEEE Intelligent Vehicles Symposium (IV), Gothenburg, Sweden, pp.1028-1033, 2016.
DUTS is a large-scale saliency detection dataset, containing 10,553 training images and 5,019 test images. All training images are collected from the ImageNet DET training/val sets, while test images are collected from the ImageNet DET test set and the SUN dataset. Both the training and test sets contain very challenging scenarios for saliency detection. Accurate pixel-level ground truths were manually annotated by 50 subjects.
This dataset is obtained from the official DUTS dataset homepage. Any work based on the dataset checkpoints should cite:
@inproceedings{wang2017,
title={Learning to Detect Salient Objects with Image-level Supervision},
author={Wang, Lijun and Lu, Huchuan and Wang, Yifan and Feng, Mengyang
and Wang, Dong and Yin, Baocai and Ruan, Xiang},
booktitle={CVPR},
year={2017}
}
All rights reserved by the original authors of DUTS Image Dataset.
The VIMER-UFO benchmark dataset consists of 8 computer vision tasks: CPLFW, Market1501, DukeMTMC, MSMT-17, Veri-776, VehicleId, VeriWild, and SOP.
GNU Affero General Public License v3.0 (AGPL-3.0): http://www.gnu.org/licenses/agpl-3.0.html
The VisDrone dataset is a large-scale visual object detection and tracking benchmark captured by drones. Developed by the AISKYEYE team at Tianjin University, it aims to facilitate research in computer vision tasks such as object detection, object tracking, and crowd analysis in aerial imagery.
The dataset consists of high-resolution images and videos collected using drones flying over urban and suburban environments across various cities in China. These scenes include pedestrians, vehicles, bicycles, and other common objects, captured under different lighting conditions, angles, and motion patterns.
The dataset has been modified to include only the image data and labels in YOLO format. The original annotation files have been removed, and the object labels were converted using scripts provided by Ultralytics so that they are compatible with YOLO-based object detection models.
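Since the labels are already in YOLO format, a model can be trained on them directly with the Ultralytics API. The sketch below uses the VisDrone.yaml dataset config that ships with Ultralytics; the model choice, epoch count, and image size are illustrative assumptions, not settings from this dataset card.

```python
from ultralytics import YOLO

# Start from a small pretrained checkpoint and fine-tune on VisDrone.
model = YOLO("yolov8n.pt")
model.train(data="VisDrone.yaml", epochs=50, imgsz=640)

metrics = model.val()     # evaluate on the validation split
print(metrics.box.map50)  # mAP at IoU 0.5
```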
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Testing Benchmark is a dataset for object detection tasks - it contains Label annotations for 504 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Website | Dataset video | Code | Paper
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection (2025)
Dataset description
M4-SAR: A Multi-Resolution, Multi-Polarization, Multi-Scene, Multi-Source Dataset and Benchmark for Optical-SAR Fusion Object Detection
Contact
If you have any questions, please feel free to contact me via email at wchao0601@163.com
Citation
If our work is helpful, you can cite our paper:… See the full description on the dataset page: https://huggingface.co/datasets/wchao0601/m4-sar.
"DIOR" is a large-scale benchmark dataset for object detection in optical remote sensing images, which consists of 23,463 images and 192,518 object instances annotated with horizontal bounding boxes.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The automatic, accurate perception of targets in space is a crucial prerequisite for many aerospace missions, such as on-orbit maintenance and target monitoring. Therefore, research on perception technologies for images from spaceborne cameras is of great significance. The recent, rapid development of deep learning has revealed its potential for application to space target perception. However, implementing deep learning models typically requires large-scale labeled datasets. In this study, we build a multitask benchmark space target dataset, NCSTP, to address the limitations of current datasets. First, we collect and modify various space target models for satellites, space debris, and space rocks. By importing them into a realistic space environment simulated in Blender, 200,000 images are generated with different target sizes, poses, lighting conditions, and backgrounds. Then, the data are annotated so that the dataset supports simultaneous space target detection, recognition, and component segmentation. NCSTP has 10 fine-grained classes of satellites, 6 classes of space debris, and 4 classes of space rocks. All the data can be used for training space target detection and recognition models. We further annotate the body, solar panels, antennas, and observation payloads of each satellite for component segmentation. Finally, we test a series of state-of-the-art object detection and semantic segmentation models on the dataset to establish a benchmark.

2025.6.16: A smaller version, NCSTP-10000, is available now.
The PASCAL Visual Object Classes Challenge (VOC) is a benchmark dataset for object detection and semantic segmentation.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note: To better find the files to download, select "Change View: Tree". The dataset contains:
- 2931 images from conventional pig farming with object detection annotations in YOLO and COCO format, with predefined training, validation and test splits
- Trained model weights for pig detection
A thorough explanation of all files contained in this data repository can be found in ReadMe.txt.