License: [Attribution 4.0 (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
## Overview
XML To COCO JSON MaskRCNN is a dataset for object detection tasks. It contains Door, Window, and Light annotations for 800 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: [CC0 1.0 Universal Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
## Overview
CantaloupeFastRCNN is a dataset for object detection tasks. It contains Cantaloupe annotations for 1,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC0 1.0 Public Domain license](https://creativecommons.org/publicdomain/zero/1.0/).
License: [Attribution 4.0 (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset was used in the AI4QC project (Artificial Intelligence for Quality Control), in the context of RFI detection framed as an object detection task. It consists of a set of labeled RFIs (radio frequency interferences). These interferences are caused by man-made sources and produce an artefact in the satellite image, typically a bright rectangular pattern. Bounding boxes were drawn around RFI artefacts in 3,940 Sentinel-1 quick-looks (PNG images). A few "other anomalies" were identified as well, for a total of 11,724 "RFI" bounding boxes and 301 "Other Anomalies" bounding boxes.
The labeled RFIs are available in three formats: PASCAL VOC (XML files), COCO (JSON files), and YOLO (TXT files), each contained in a separate zip file. The S1_images zip file contains the 3,940 Sentinel-1 quick-looks. One can combine the label files (in a chosen format) with the S1 images to train object detection algorithms to automatically detect RFIs in a satellite image; a minimal loading sketch follows. A predefined train/test split is available (80% training and 20% testing), with training and testing zip folders containing the images and labels for each subset. The data was split according to three criteria: RFI over land vs. sea, size of the RFI bounding boxes, and geographic location.
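For illustration, the sketch below pairs the COCO-format labels with the quick-look file names; the annotation file name is a placeholder, not necessarily the name used in the archive:

```python
import json
from collections import defaultdict

# Placeholder file name; use the JSON file shipped in the COCO zip.
with open("rfi_annotations_coco.json") as f:
    coco = json.load(f)

# Group bounding boxes by image so each quick-look can be paired with its RFIs.
# COCO bboxes are [x_min, y_min, width, height] in pixels.
boxes_by_image = defaultdict(list)
for ann in coco["annotations"]:
    boxes_by_image[ann["image_id"]].append((ann["category_id"], ann["bbox"]))

for img in coco["images"][:3]:
    print(img["file_name"], boxes_by_image[img["id"]])
```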
License: [MIT License](https://opensource.org/licenses/MIT)
License information was derived automatically
The "Udacity Self-Driving Car > fixed-small" dataset is a curated and re-labeled collection of images designed for object detection tasks in autonomous driving applications. It addresses the shortcomings of the original Udacity dataset by correcting missing labels for critical objects such as pedestrians, bikers, vehicles, and traffic lights. With 15,000 high-resolution images (1920x1200) and a total of 97,942 annotations spanning 11 classes, this dataset ensures high-quality labeling for training and evaluation.
The dataset is compatible with popular machine learning frameworks and is available in multiple formats, including COCO JSON, VOC XML, and TensorFlow TFRecords. A downsampled version (512x512 resolution) is also provided to accommodate models with computational constraints.
Annotations have been hand-verified for accuracy, making this dataset a reliable choice for building robust object detection and tracking models for autonomous vehicles. However, users should note that duplicate bounding boxes for certain classes may require preprocessing, such as IOU-based filtering, to ensure optimal model performance.
The dataset is released under the MIT License, promoting openness and accessibility for research and development in computer vision and autonomous driving technologies.
License: [Attribution 4.0 (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This dataset contains figure bounding boxes corresponding to the bioRxiv 10k dataset.
It provides annotations in two formats:
- COCO format (JSON)
- JATS XML with GROBID's "coords" attribute
The COCO format contains bounding boxes both in rendered pixel units and in PDF user units; the latter are stored in fields with the "pt_" prefix. The "coords" attribute in the JATS XML uses PDF user units.
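Since PDF user units are 1/72 of an inch, the two coordinate systems differ only by the rendering DPI. A minimal conversion sketch (the field name pt_bbox and the DPI value are assumptions for illustration, not the dataset's exact schema):

```python
# Convert PDF user units (1/72 inch) to rendered pixel units.
RENDER_DPI = 150  # placeholder; use the DPI at which the pages were rendered

def pt_to_px(value: float, dpi: int = RENDER_DPI) -> float:
    return value * dpi / 72.0

# Hypothetical annotation carrying both unit systems, as described above.
ann = {"bbox": [104.2, 208.3, 416.7, 312.5], "pt_bbox": [50.0, 100.0, 200.0, 150.0]}
print([pt_to_px(v) for v in ann["pt_bbox"]])  # matches "bbox" if the DPI is right
```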
The dataset was generated by using an algorithm to locate the figure images within the rendered PDF pages. The main algorithm used for that purpose is SIFT; OpenCV's template matching (with multi-scaling) was used as a fallback. There may be some error cases in the data. Very few documents, where neither algorithm was able to find any match for one of the figure images, were excluded (six documents in the train subset, two documents in the test subset). A simplified sketch of the SIFT-based matching appears below.
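The following is a minimal SIFT localization sketch in OpenCV; the file paths are placeholders, and this is illustrative, not the project's actual implementation:

```python
import cv2
import numpy as np

# Locate a figure image on a rendered page via SIFT keypoint matching.
page = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)      # placeholder path
figure = cv2.imread("figure.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

sift = cv2.SIFT_create()
kp_f, des_f = sift.detectAndCompute(figure, None)
kp_p, des_p = sift.detectAndCompute(page, None)

# Lowe's ratio test keeps only distinctive matches.
matches = cv2.BFMatcher().knnMatch(des_f, des_p, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

if len(good) >= 4:
    src = np.float32([kp_f[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_p[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # The homography maps figure coordinates onto the page; projecting the
    # figure's corners through it yields the figure's bounding box.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```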
Figure images may appear next to a figure description, but they may also appear as "attachments". The latter usually appears at the end of the document (but not always) and often on pages with dimensions different to the regular page size (but not always).
This dataset itself doesn't contain any images. The PDFs used to render the pages can be found in the bioRxiv 10k dataset.
The dataset is intended for training or evaluating semantic figure extraction. An evaluation score would be calculated by comparing the extracted bounding boxes with the ones from this dataset (see ScienceBeam Judge for an example implementation).
The dataset was created as part of eLife's ScienceBeam project.
License: [MIT License](https://opensource.org/licenses/MIT)
License information was derived automatically
The Udacity Self-Driving Car Dataset has been updated to address issues in the original annotations, which were missing labels for several key objects such as pedestrians, bikers, cars, and traffic lights. These omissions can negatively impact model performance and, in the context of self-driving cars, could lead to dangerous scenarios. To solve this, the dataset has been re-labeled with accurate annotations, ensuring improved performance for computer vision models used in autonomous vehicle systems.
This dataset includes various formats for ease of use, including VOC XML, COCO JSON, TensorFlow Object Detection TFRecords, and more.
A downsampled version is also available at 512x512 (approx. 580 MB), suitable for common machine learning models like YOLO v3, Mask R-CNN, SSD, and MobileNet.
License: [MIT License](https://opensource.org/licenses/MIT)
License information was derived automatically
The original Udacity Self Driving Car Dataset is missing labels for thousands of pedestrians, bikers, cars, and traffic lights. This will result in poor model performance. When used in the context of self-driving cars, this could even lead to human fatalities.
We re-labeled the dataset to correct errors and omissions. We have provided convenient downloads in many formats including VOC XML, COCO JSON, TensorFlow Object Detection TFRecords, and more.
Some examples of labels missing from the original dataset:
![Examples of Missing Labels](https://i.imgur.com/A5J3qSt.jpg)
The dataset contains 97,942 labels across 11 classes and 15,000 images. There are 1,720 null examples (images with no labels).
All images are 1920x1200 (download size ~3.1 GB). We have also provided a version downsampled to 512x512 (download size ~580 MB) that is suitable for most common machine learning models (including YOLO v3, Mask R-CNN, SSD, and MobileNet).
Annotations have been hand-checked for accuracy by Roboflow.
![Class Balance](https://i.imgur.com/bOFkueI.png)
Annotation Distribution:
![Annotation Heatmap](https://i.imgur.com/NwcrQKK.png)
Udacity is building an open source self driving car! You might also try using this dataset to do person-detection and tracking.
Our updates to the dataset are released under the MIT License (the same license as the original annotations and images).
Note: the dataset contains many duplicated bounding boxes for the same subject which we have not corrected. You will probably want to filter them by computing the IOU of boxes of the same class and dropping those that overlap completely, or it could affect your model performance (especially in stoplight detection, which seems to suffer from an especially severe case of duplicated bounding boxes). A minimal filtering sketch follows.
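A rough sketch of such IOU-based de-duplication (the 0.9 threshold is an assumption; tune it for your data):

```python
def iou(a, b):
    """IOU of two boxes in [x_min, y_min, x_max, y_max] format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def dedupe(boxes, threshold=0.9):
    """Keep one box per near-duplicate group; boxes is a list of (cls, box)."""
    kept = []
    for cls, box in boxes:
        if all(c != cls or iou(box, b) < threshold for c, b in kept):
            kept.append((cls, box))
    return kept
```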
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.

License: [Attribution 4.0 (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
The Songdo Vision dataset provides high-resolution (4K, 3840×2160 pixels) RGB images annotated with categorized axis-aligned bounding boxes (BBs) for vehicle detection from a high-altitude bird's-eye view (BeV) perspective. Captured over Songdo International Business District, South Korea, this dataset consists of 5,419 annotated video frames, featuring approximately 300,000 vehicle instances categorized into four classes: car, bus, truck, and motorcycle.
This dataset can serve as a benchmark for aerial vehicle detection, supporting research and real-world applications in intelligent transportation systems, traffic monitoring, and aerial vision-based mobility analytics. It was developed in the context of a multi-drone experiment aimed at enhancing geo-referenced vehicle trajectory extraction.
📌 Citation: If you use this dataset in your work, kindly acknowledge it by citing the following article:
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205.
🔗 Related dataset: For precisely georeferenced vehicle trajectories extracted from the same large-scale multi-drone experiment, see Songdo Traffic: 10.5281/zenodo.13828384.
Publicly available datasets for aerial vehicle detection often exhibit limitations, notably in the accuracy and consistency of their annotations.
To address these challenges, Songdo Vision provides high-quality human-annotated bounding boxes, with machine learning assistance used to enhance efficiency and consistency. This ensures accurate and reliable ground truth for training and evaluating detection models.
The dataset is randomly split into training (80%) and test (20%) subsets:
| Subset | Images | Car | Bus | Truck | Motorcycle | Total Vehicles |
|--------|--------|-----|-----|-------|------------|----------------|
| Train | 4,335 | 195,539 | 7,030 | 11,779 | 2,963 | 217,311 |
| Test | 1,084 | 49,508 | 1,759 | 3,052 | 805 | 55,124 |
A subset of 5,274 frames was randomly sampled from drone video sequences, while an additional 145 frames were carefully selected to represent challenging cases, such as motorcycles at pedestrian crossings, in bicycle lanes, near traffic light poles, and around other distinctive road markers where they may blend into the urban environment.
The dataset was collected as part of a collaborative multi-drone experiment conducted by KAIST and EPFL in Songdo, South Korea, from October 4–7, 2022.
More details on the experimental setup and data processing pipeline are available in [1].
Annotations were generated using a semi-automated object detection annotation process in Azure ML Studio, leveraging machine learning-assisted bounding box detection with human verification to ensure precision.
Each annotated frame includes categorized, axis-aligned bounding boxes, stored in three widely-used formats: COCO (JSON), YOLO (TXT), and Pascal VOC (XML).

COCO example:

```json
{
  "images": [{"id": 1, "file_name": "0001.jpg", "width": 3840, "height": 2160}],
  "annotations": [{"id": 1, "image_id": 1, "category_id": 2, "bbox": [500, 600, 200, 50], "area": 10000, "iscrowd": 0}],
  "categories": [
    {"id": 1, "name": "car"}, {"id": 2, "name": "bus"},
    {"id": 3, "name": "truck"}, {"id": 4, "name": "motorcycle"}
  ]
}
```

YOLO example (class x_center y_center width height, normalized to image size):

```text
0 0.52 0.63 0.10 0.05  # car bounding box
2 0.25 0.40 0.15 0.08  # truck bounding box
```
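Since YOLO coordinates are normalized, converting a label line to pixel coordinates only requires rescaling by the 3840×2160 frame size. A minimal sketch (not part of the dataset itself):

```python
def yolo_to_pixels(line: str, img_w: int = 3840, img_h: int = 2160):
    """Convert 'cls xc yc w h' (normalized) to (cls, x_min, y_min, x_max, y_max)."""
    cls, xc, yc, w, h = line.split()[:5]
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x_min = (xc - w / 2) * img_w
    y_min = (yc - h / 2) * img_h
    return int(cls), x_min, y_min, x_min + w * img_w, y_min + h * img_h

print(yolo_to_pixels("0 0.52 0.63 0.10 0.05"))  # the car box from the example above
```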
The dataset is provided as two compressed archives:

1. Training Data (train.zip, 12.91 GB)

```text
train/
├── coco_annotations.json    # COCO format
├── images/
│   ├── 0001.jpg
│   ├── ...
└── labels/
    ├── 0001.txt             # YOLO format
    ├── 0001.xml             # Pascal VOC format
    ├── ...
```

2. Testing Data (test.zip, 3.22 GB)

```text
test/
├── coco_annotations.json
├── images/
│   ├── 00027.jpg
│   ├── ...
└── labels/
    ├── 00027.txt
    ├── 00027.xml
    ├── ...
```
Additional files:

- README.md – Dataset documentation (this description)
- LICENSE.txt – Creative Commons Attribution 4.0 License
- names.txt – Class names (one per line)
- data.yaml – Example YOLO configuration file for training/testing

In addition to the funding sources listed in the metadata, the creators express their gratitude to Artem Vasilev for his dedicated efforts in data annotation. We also thank the research teams of Prof. Simon Oh (Korea University) and Prof. Minju Park (Hannam University) for their assistance during the data collection campaign, including the provision of drone equipment and student support.
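One way to consume the provided data.yaml is through the Ultralytics YOLO API, which accepts such a configuration file directly. A minimal sketch (the choice of Ultralytics and of the yolov8n.pt checkpoint are assumptions, not requirements of the dataset):

```python
from ultralytics import YOLO  # pip install ultralytics

# Train a small detector on Songdo Vision using the provided data.yaml.
model = YOLO("yolov8n.pt")  # pretrained checkpoint (assumption)
model.train(data="data.yaml", imgsz=1280, epochs=50)

# Evaluate on the split configured in data.yaml.
metrics = model.val()
print(metrics.box.map50)  # mAP@0.5
```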
Preferred Citation: If you use Songdo Vision for any purpose, whether academic research, commercial applications, open-source projects, or benchmarking efforts, please cite our accompanying article [1]:
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205
BibTeX entry:
```bibtex
@article{fonod2025advanced,
  title   = {Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery},
  author  = {Fonod, Robert and Cho, Haechan and Yeo, Hwasoo and Geroliminis, Nikolas},
  journal = {Transportation Research Part C: Emerging Technologies},
  volume  = {178},
  pages   = {105205},
  year    = {2025},
  doi     = {10.1016/j.trc.2025.105205}
}
```
License: [CC0 1.0 Universal Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
The original Udacity Self Driving Car Dataset is missing labels for thousands of pedestrians, bikers, cars, and traffic lights. This will result in poor model performance. When used in the context of self-driving cars, this could even lead to human fatalities.
We re-labeled the dataset to correct errors and omissions. We have provided convenient downloads in many formats including VOC XML, COCO JSON, TensorFlow Object Detection TFRecords, and more.
Some examples of labels missing from the original dataset:
![Examples of Missing Labels](https://i.imgur.com/A5J3qSt.jpg)
Udacity is building an open source self driving car! You might also try using this dataset to do person-detection and tracking.
Our updates to the dataset are released under the same license as the original.
Note: the dataset contains many duplicated bounding boxes for the same subject which we have not corrected. You will probably want to filter them by computing the IOU of boxes of the same class and dropping those that overlap completely, or it could affect your model performance (especially in stoplight detection, which seems to suffer from an especially severe case of duplicated bounding boxes).
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.

The "Udacity Self-Driving Car > fixed-small" dataset is a curated and re-labeled collection of images designed for object detection tasks in autonomous-driving applications. It addresses the shortcomings of the original Udacity dataset by correcting missing labels for key objects such as pedestrians, cyclists, vehicles, and traffic lights. With 15,000 high-resolution images (1920x1200) and a total of 97,942 annotations spanning 11 classes, this dataset ensures high-quality labeling for training and evaluation.

The dataset is compatible with popular machine learning frameworks and is available in multiple formats, including COCO JSON, VOC XML, and TensorFlow TFRecords. A downsampled version (512x512 resolution) is also provided to accommodate models with computational constraints.

Annotations have been hand-verified for accuracy, making this dataset a reliable choice for building robust object detection and tracking models for autonomous vehicles. However, users should note that duplicate bounding boxes for certain classes may require preprocessing, such as IOU-based filtering, to ensure optimal model performance.

The dataset is released under the MIT License, promoting openness and accessibility for research and development in computer vision and autonomous-driving technologies.