This dataset was created by Jeff Faudi.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Boots Oriented Bounding Box is a dataset for object detection tasks - it contains Box annotations for 509 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Vector dataset extracted using a deep learning oriented object detection model. The model is trained to identify and classify above-ground and below-ground swimming pools.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Oriented Bounding Boxes Dataset is a dataset for object detection tasks - it contains Robot O0Gq annotations for 563 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
A large-scale, merged dataset for oriented vehicle detection in aerial imagery, preformatted for YOLOv11-OBB models.
This dataset combines three distinct aerial imagery collections (VSAI, DroneVehicles, and DIOR-R) into a unified resource for training and benchmarking oriented object detection models. It has been specifically preprocessed and formatted for use with Ultralytics' YOLOv11-OBB models.
The primary goal is to provide a detailed dataset for tasks like aerial surveillance, traffic monitoring, and vehicle detection from a drone's perspective. All annotations have been converted to the YOLO OBB format, and the classes have been simplified for focused vehicle detection tasks.
- Classes: small-vehicle and large-vehicle.
- data.yaml configuration file for immediate use in YOLO training pipelines.
- Pre-split into train, validation, and test sets.
- Class mapping: all source classes were merged into small-vehicle and large-vehicle. The vehicle class from the DIOR-R dataset was mapped to large-vehicle.

| Class ID | Class Name | Source Dataset(s) |
|---|---|---|
| 0 | small-vehicle | VSAI, DroneVehicles |
| 1 | large-vehicle | VSAI, DroneVehicles, DIOR-R |
Each image has a corresponding .txt label file. Each line in the file represents one object in the YOLOv11-OBB format:
class_id x1 y1 x2 y2 x3 y3 x4 y4
- class_id: The class index (0 for small-vehicle, 1 for large-vehicle).
- (x1, y1)...(x4, y4): The four corner points of the oriented bounding box, with all coordinates normalized to the range [0, 1].

The dataset is organized into a standard YOLO directory structure for easy integration with training programs.
RoadVehiclesYOLOOBBDataset/
├── train/
│ ├── images/ #18,274 images
│ └── labels/ #18,274 labels
├── val/
│ ├── images/ #5,420 images
│ └── labels/ #5,420 labels
├── test/
│ ├── images/ #5,431 images
│ └── labels/ #5,431 labels
├── data.yaml #YOLO dataset configuration file.
└── ReadMe.md #Documentation
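The label format above can be sanity-checked with a short parser. A minimal sketch is given below; the example path in the usage comment is a placeholder, not a file guaranteed to exist:

```python
from pathlib import Path

def parse_obb_labels(label_path):
    """Yield (class_id, corners) pairs from a YOLO-OBB label file, where
    corners is a list of four (x, y) points normalized to [0, 1]."""
    for line in Path(label_path).read_text().splitlines():
        parts = line.split()
        if len(parts) != 9:
            continue  # skip blank or malformed lines
        class_id = int(parts[0])
        coords = list(map(float, parts[1:]))
        corners = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
        yield class_id, corners

# Example usage with a placeholder file name:
# for cls, pts in parse_obb_labels("train/labels/000001.txt"):
#     print(cls, pts)
```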
To use this dataset with YOLOv11 or other compatible frameworks, simply point your training script to the included data.yaml file.
data.yaml:
#Dataset configuration.
path: RoadVehiclesYOLOOBBDataset/
train: train/images
val: val/images
test: test/images
#Number of classes.
nc: 2
#Class names.
names:
0: small-vehicle
1: large-vehicle
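With data.yaml in place, a training run with the Ultralytics package typically looks like the sketch below; the checkpoint name, epoch count, and image size are illustrative values, not settings prescribed by this dataset:

```python
from ultralytics import YOLO  # assumes: pip install ultralytics

# Start from a pretrained OBB checkpoint (any YOLO11-OBB variant works).
model = YOLO("yolo11n-obb.pt")

# Point the trainer at this dataset's data.yaml; adjust epochs/imgsz as needed.
model.train(data="RoadVehiclesYOLOOBBDataset/data.yaml", epochs=100, imgsz=640)
```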
This merged dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0), which is the most restrictive license among its sources.
When using this dataset, please provide attribution to all original sources as follows:
- VSAI_Dataset: by DroneVision, licensed under CC BY-NC-SA 4.0.
- DroneVehicles Dataset: by Yiming Sun, Bing Cao, Pengfei Zhu, and Qinghua Hu, modified by Mridankan Mandal, licensed under CC BY-NC-SA 4.0.
- DIOR-R dataset: by the DIOR...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
License Plates OBB is a dataset for object detection tasks - it contains Cars License Plate 3Vv6 annotations for 434 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
High-resolution aerial imagery with 16,000+ oriented bounding boxes for vehicle detection, pre-formatted for Ultralytics YOLOv11.
This dataset is a ready-to-use version of the original Eagle Dataset from the German Aerospace Center (DLR). The original dataset was created to benchmark object detection models on challenging aerial imagery, featuring vehicles at various orientations.
This version has been converted to the YOLOv11-OBB (Oriented Bounding Box) format. The conversion makes the dataset directly compatible with modern deep learning frameworks like Ultralytics YOLO, allowing researchers and developers to train state-of-the-art object detectors with minimal setup.
The dataset is ideal for tasks requiring precise localization of rotated objects, such as vehicle detection in parking lots, traffic monitoring, and urban planning from aerial viewpoints.
The dataset is split into training, validation, and test sets, following a standard structure for computer vision tasks.
Dataset Split & Counts:
Directory Structure:
EagleDatasetYOLO/
├── train/
│ ├── images/ # 159 images
│ └── labels/ # 159 .txt obb labels
├── val/
│ ├── images/ # 53 images
│ └── labels/ # 53 .txt obb labels
├── test/
│ ├── images/ # 106 images
│ └── labels/ # 106 .txt obb labels
├── data.yaml
└── license.md
Annotation Format (YOLOv11-OBB):
Each .txt label file contains one object per line. The format for each object is:
<class_id> <x_center> <y_center> <width> <height> <angle>
- <class_id>: The class index (in this case, 0 for 'vehicle').
- <x_center> <y_center>: The normalized center coordinates of the bounding box.
- <width> <height>: The normalized width and height of the bounding box.
- <angle>: The rotation angle of the box in radians, from -π/2 to π/2.

data.yaml Configuration:
A data.yaml file is included for easy integration with the Ultralytics framework.
path: ../EagleDatasetYOLO
train: train/images
val: val/images
test: test/images
nc: 1
names: ['vehicle']
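Tools that expect four corner points instead of the center/size/angle layout described above need a conversion step first. The helper below is a minimal sketch of that conversion (the function name is illustrative and not part of the dataset):

```python
import math

def obb_to_corners(xc, yc, w, h, angle):
    """Convert a (center, size, angle-in-radians) OBB into four corner points.
    Coordinates stay in whatever units the inputs use; for non-square images,
    convert normalized values to pixels before rotating."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each half-extent offset around the center, then translate.
    return [(xc + dx * cos_a - dy * sin_a, yc + dx * sin_a + dy * cos_a)
            for dx, dy in offsets]

# Example: with angle = 0 the corners reduce to an axis-aligned box.
# obb_to_corners(0.5, 0.5, 0.2, 0.1, 0.0)
```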
This dataset is a conversion of the original work by the German Aerospace Center (DLR). The conversion to YOLOv11-OBB format was performed by Mridankan Mandal.
The dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).
If you use this dataset in your research, please cite the original creators and acknowledge the conversion work.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This is a ready-to-use dataset consisting of X-ray images of the human jaw, with corresponding annotations for individual teeth. Each tooth is labeled using oriented bounding box (OBB) coordinates, making the dataset well-suited for tasks that require precise object localization and orientation awareness. There are a total of 17 classes representing the teeth in the upper jaw.
The annotations are formatted specifically for compatibility with YOLO-OBB (Oriented Bounding Box) models, enabling seamless integration into training pipelines for dental detection and analysis tasks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Current remote sensing object detection frameworks often focus solely on the geometric relationship between true and predicted boxes, neglecting the intrinsic shapes of the boxes. In the field of remote sensing detection, there are numerous elongated bounding boxes. Variations in the shape and size of these boxes result in differences in their Intersection over Union (IoU) values, which is particularly noticeable when detecting small objects. Platforms with limited resources, such as satellites and unmanned drones, have strict requirements for detector storage space and computational complexity. This makes it challenging for existing methods to balance detection performance and computational demands. Therefore, this paper presents RS-YOLO, a lightweight framework that enhances You Only Look Once (YOLO) and is specifically designed for deployment on resource-limited platforms. RS-YOLO adopts a bounding box regression approach for remote sensing images that focuses on the shape and scale of the bounding boxes. Additionally, to improve the integration of multi-scale spatial features, RS-YOLO introduces a lightweight multi-scale hybrid attention module for cross-space fusion. The DOTA-v1.0 and HRSC2016 datasets were used to test our model, which was then compared to multiple state-of-the-art oriented object detection models. The results indicate that the detector introduced in this article achieves top performance while being lightweight and suitable for deployment on resource-limited platforms.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The DeepScoresV2 Dataset for Music Object Detection contains digitally rendered images of written sheet music, together with the corresponding ground truth to fit various types of machine learning models. A total of 151 million instances of music symbols, belonging to 135 different classes, are annotated. The full dataset contains 255,385 images. For most research purposes, the dense version, containing 1,714 of the most diverse and interesting images, should suffice.
The dataset contains ground truth in the form of:
Non-oriented bounding boxes
Oriented bounding boxes
Semantic segmentation
Instance segmentation
The accompanying paper, The DeepScoresV2 Dataset and Benchmark for Music Object Detection, published at ICPR 2020, can be found here:
https://digitalcollection.zhaw.ch/handle/11475/20647
A toolkit for convenient loading and inspection of the data can be found here:
https://github.com/yvan674/obb_anns
Code to train baseline models can be found here:
https://github.com/tuggeluk/mmdetection/tree/DSV2_Baseline_FasterRCNN
https://github.com/tuggeluk/DeepWatershedDetection/tree/dwd_old
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
ExposureEngine: Oriented Logo Detection & Sponsor Visibility Analytics (Dataset)
Paper | Project Page
Rotation-aware OBB annotations for sponsor logos in professional soccer broadcasts — built for sports analytics, YOLO OBB training, and sponsorship measurement.
Tags: Oriented Bounding Boxes, Sports Broadcasts, Sponsorship Analytics, YOLOv8/YOLOv11 OBB, Brand Visibility
ExposureEngine provides high-quality oriented bounding box (OBB) polygon… See the full description on the dataset page: https://huggingface.co/datasets/SimulaMet-HOST/ExposureEngine.
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
The paper for this dataset can be found here; the dataset was used in the Gaofen Challenge hosted by the Aerospace Information Research Institute, Chinese Academy of Sciences.
I have put this together because a few months ago I had a project that needed such a dataset for vehicle detection, and found there wasn't much out there with suitable resolution and quality. I ended up using the xView1 Dataset, which was pretty good, but noted at the time that FAIR1M had a lot of potential too.
FAIR1M's main points of difference compared to many others in this space are:
- Some geographical diversity: Asia, Europe, North America, Cape Town, Sydney. Mostly urban.
- Oriented bounding boxes.
- Most of the imagery is high resolution (0.3 m or 0.6 m), which makes it just enough for small-car detection.
For comparison, xView-1 is larger and more geographically diverse, but has flat bounding boxes. If you want to try oriented bounding boxes, FAIR1M is worth a try.
I could only find 240,852 spatially unique labels; the rest seem to be duplicates due to overlapping imagery. Though some of course would be in the hidden test set, which has not been made public. Anyway, that's still a lot of labels, so thanks to the organisers for making these available.
https://captain-whu.github.io/DOTA/dataset.html
In the past decade, significant progress in object detection has been made in natural images, but authors of the DOTA v2.0: Dataset of Object deTection in Aerial images note that this progress hasn't extended to aerial images. The main reason for this discrepancy is the substantial variations in object scale and orientation caused by the bird's-eye view of aerial images. One major obstacle to the development of object detection in aerial images (ODAI) is the lack of large-scale benchmark datasets. The DOTA dataset contains 1,793,658 object instances spanning 18 different categories, all annotated with oriented bounding box annotations (OBB). These annotations were collected from a total of 11,268 aerial images. Using this extensive and meticulously annotated dataset, the authors establish baselines covering ten state-of-the-art algorithms, each with over 70 different configurations. These configurations are evaluated for both speed and accuracy performance.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
A cleaned and reformatted version of the VSAI Dataset, specifically adapted for Oriented Bounding Box (OBB) vehicle detection using the YOLOv11 format.
This dataset is designed for aerial/drone-based vehicle detection tasks. It is a modified version of the original VSAI Dataset v1 by the DroneVision Team, adapted by Mridankan Mandal for ease of training object detection models such as YOLOv11-OBB.
The dataset is split into two classes: small-vehicle and large-vehicle. All annotations have been converted to the YOLOv11-OBB format, and the data is organized into training, validation, and testing sets.
This dataset improves upon the original by incorporating several key modifications to make it more accessible and useful for modern computer vision tasks:
The dataset is organized in a standard YOLO format for easy integration with popular training frameworks.
YOLOOBBVSAIDataset/
├── train/
│ ├── images/ #Contains 4,297 image files.
│ └── labels/ #Contains 4,297 .txt label files.
├── val/
│ ├── images/ #Contains 537 image files.
│ └── labels/ #Contains 537 .txt label files.
├── test/
│ ├── images/ #Contains 538 image files.
│ └── labels/ #Contains 538 .txt label files.
├── data.yaml #Dataset configuration file.
├── license.md #Full license details.
└── ReadMe.md #Dataset README file.
Each .txt label file contains one or more lines, with each line representing a single object in the YOLOv11-OBB format:
class_id x1 y1 x2 y2 x3 y3 x4 y4
- class_id: An integer representing the object class (0 for small-vehicle, 1 for large-vehicle).
- (x1, y1)...(x4, y4): The four corner points of the oriented bounding box, with coordinates normalized between 0 and 1.

data.yaml: To begin training a YOLO model with this dataset, you can use the provided data.yaml file. Simply update the path to the location of the dataset on your local machine.
#The path to the root dataset directory.
path: /path/to/YOLOOBBVSAIDataset/
train: train/images
val: val/images
test: test/images
#Number of classes.
nc: 2
#Class names.
names:
0: small-vehicle
1: large-vehicle
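As a quick check that the split and the two-class mapping look as expected, the labels in each split can be tallied per class. This is a minimal sketch; the directory path in the usage comment follows the structure shown above:

```python
from collections import Counter
from pathlib import Path

def class_counts(labels_dir):
    """Count annotated objects per class id across all YOLO-OBB label files in a split."""
    counts = Counter()
    for label_file in Path(labels_dir).glob("*.txt"):
        for line in label_file.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts

# Example: print(class_counts("YOLOOBBVSAIDataset/train/labels"))
```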
This dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
When using this dataset, attribute as follows:
If you use this dataset in your research, use the following BibTeX entry to cite it:
@dataset{vsai_yolo_obb_2025,
title={VSAI Dataset (YOLOv11-OBB Format)},
author={Mridankan Mandal},
year={2025},
note={Modified from original VSAI v1 dataset by DroneVision},
license={CC BY-NC-SA 4.0}
}
The crosswalk polygons can be utilized for safety, mobility, and other analyses. This model builds upon YOLOv8 and incorporates oriented bounding boxes (OBB), enhancing detection accuracy by precisely marking crosswalks regardless of their orientations. Various strategies are adopted to enhance the baseline YOLOv8 model, including Convolutional Block Attention, a dual-branch Spatial Pyramid Pooling-Fast module, and cosine annealing. We have also developed a dataset comprising over 23,000 annotated instances of crosswalks to train and validate the proposed model and its variations. The best-performing model achieves a precision of 96.5% and a recall of 93.3% on data collected in Massachusetts, demonstrating its accuracy and efficiency.
From the MassGIS website, we downloaded images for 2019 and 2021. The image dataset for each year comprises over 10,000 high-resolution images (tiles). Each image has 100 million pixels (10,000 x 10,000 pixels), and each pixel represents about 6 inches (15 centimeters) on the ground. This resolution provides sufficient detail for identifying small-sized features such as pedestrian crosswalks.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
A class-merged and cleaned version of the DroneVehicle dataset, specifically formatted for Oriented Bounding Box (OBB) detection using the YOLOv11 framework.
This dataset is designed for aerial and drone-based vehicle detection tasks that require identifying vehicles with precise rotation and orientation. It is a modified and restructured version of the original DroneVehicle dataset, which was introduced by Yiming Sun, Bing Cao, Pengfei Zhu, and Qinghua Hu. This version was adapted by Mridankan Mandal to facilitate easy training with YOLOv11-OBB models.
The original classes have been merged into two simplified categories: small-vehicle (car, van) and large-vehicle (bus, truck, freight car).
The dataset contains a total of 17,325 images. It is pre-split into training, validation, and test sets to ensure standardized evaluation.
- 0: small-vehicle
- 1: large-vehicle

The data is organized in the following directory structure:
DroneVehicleYOLOv11OBB/
├── train/
│ ├── images/ #12,118 image files.
│ └── labels/ #12,118 YOLOv11-OBB .txt files.
├── val/
│ ├── images/ #2,599 image files.
│ └── labels/ #2,599 label files.
├── test/
│ ├── images/ #2,608 image files.
│ └── labels/ #2,608 label files.
├── data.yaml #Dataset configuration file.
├── license.md #Full license terms.
└── ReadMe.md #This file.
Each .txt label file contains one or more lines, with each line representing a single object in the YOLOv11-OBB format:
class_id x1 y1 x2 y2 x3 y3 x4 y4
- class_id: An integer representing the object class (0 for small-vehicle, 1 for large-vehicle).
- (x1, y1)...(x4, y4): The four corner points of the oriented bounding box, with coordinates normalized between 0 and 1.

data.yaml: To use this dataset with a YOLO framework, you can use the provided data.yaml file. It specifies the dataset paths and class information.
path: DroneVehicleYOLOv11OBB/ #Path to the root dataset directory.
train: train/images #Training images subdirectory.
val: val/images #Validation images subdirectory.
test: test/images #Test images subdirectory.
#Number of classes.
nc: 2
#Class names.
names:
0: small-vehicle
1: large-vehicle
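For visually spot-checking annotations, the normalized corner points can be scaled back to pixels and drawn on the image. The sketch below assumes OpenCV and NumPy are installed; the file names in the usage comment are placeholders:

```python
import cv2
import numpy as np

def draw_obb(image_path, label_path, out_path="obb_preview.jpg"):
    """Draw the oriented boxes from a YOLO-OBB label file onto the matching image."""
    img = cv2.imread(image_path)
    height, width = img.shape[:2]
    for line in open(label_path).read().splitlines():
        parts = line.split()
        if len(parts) != 9:
            continue  # skip blank or malformed lines
        pts = np.array(list(map(float, parts[1:])), dtype=np.float32).reshape(4, 2)
        pts[:, 0] *= width   # de-normalize x
        pts[:, 1] *= height  # de-normalize y
        cv2.polylines(img, [pts.astype(np.int32)], isClosed=True,
                      color=(0, 255, 0), thickness=2)
    cv2.imwrite(out_path, img)

# draw_obb("test/images/000001.jpg", "test/labels/000001.txt")
```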
This dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.
When using this dataset, you must include the following attributions:
Special thanks to Yiming Sun, Bing Cao, Pengfei Zhu, and Qinghua Hu for creating and sharing the original DroneVehicle dataset, which formed the foundation for this work.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a project created to aid in land-use classification of properties based on their street-facing facades. It is an oriented bounding box object detection dataset, but the objective is to experiment with semi-supervised techniques that use as few annotated image examples as possible.
The HRSC2016 dataset is a high-resolution ship recognition dataset annotated with oriented bounding boxes. It contains 1,061 images, with image sizes ranging from 300 × 300 to 1500 × 900 pixels.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is Part 1/2 of the ActiveHuman dataset! Part 2 can be found here.
Dataset Description
ActiveHuman was generated using Unity's Perception package. It consists of 175,428 RGB images and their semantic segmentation counterparts, captured in different environments and lighting conditions and at different camera distances and angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1m-4m) and 36 camera angles (0-360 at 10-degree intervals). The dataset does not include images at every single combination of available camera distances and angles, since for some values the camera would collide with another object or go outside the confines of an environment. As a result, some combinations of camera distances and angles do not exist in the dataset. Alongside each image, 2D Bounding Box, 3D Bounding Box and Keypoint ground truth annotations are also generated via the use of Labelers and are stored as a JSON-based dataset. These Labelers are scripts that are responsible for capturing ground truth annotations for each captured image or frame. Keypoint annotations follow the COCO format defined by the COCO keypoint annotation template offered in the Perception package.
Folder configuration
The dataset consists of 3 folders:
- JSON Data: Contains all the generated JSON files.
- RGB Images: Contains the generated RGB images.
- Semantic Segmentation Images: Contains the generated semantic segmentation images.
Essential Terminology
- Annotation: Recorded data describing a single capture.
- Capture: One completed rendering process of a Unity sensor which stored the rendered result to data files (e.g. PNG, JPG, etc.).
- Ego: Object or person to which a collection of sensors is attached (e.g., if a drone has a camera attached to it, the drone would be the ego and the camera would be the sensor).
- Ego coordinate system: Coordinates with respect to the ego.
- Global coordinate system: Coordinates with respect to the global origin in Unity.
- Sensor: Device that captures the dataset (in this instance the sensor is a camera).
- Sensor coordinate system: Coordinates with respect to the sensor.
- Sequence: Time-ordered series of captures. This is very useful for video capture where the time-order relationship of two captures is vital.
- UUID: Universally Unique Identifier. It is a unique hexadecimal identifier that can represent an individual instance of a capture, ego, sensor, annotation, labeled object or keypoint, or keypoint template.
Dataset Data
The dataset includes 4 types of JSON annotation files:
annotation_definitions.json: Contains annotation definitions for all of the active Labelers of the simulation stored in an array. Each entry consists of a collection of key-value pairs which describe a particular type of annotation and contain information about that specific annotation describing how its data should be mapped back to labels or objects in the scene. Each entry contains the following key-value pairs:
- id: Integer identifier of the annotation's definition.
- name: Annotation name (e.g., keypoints, bounding box, bounding box 3D, semantic segmentation).
- description: Description of the annotation's specifications.
- format: Format of the file containing the annotation specifications (e.g., json, PNG).
- spec: Format-specific specifications for the annotation values generated by each Labeler.
Most Labelers generate different annotation specifications in the spec key-value pair:
BoundingBox2DLabeler/BoundingBox3DLabeler:
- label_id: Integer identifier of a label.
- label_name: String identifier of a label.

KeypointLabeler:
- template_id: Keypoint template UUID.
- template_name: Name of the keypoint template.
- key_points: Array containing all the joints defined by the keypoint template. This array includes the key-value pairs:
  - label: Joint label.
  - index: Joint index.
  - color: RGBA values of the keypoint.
  - color_code: Hex color code of the keypoint.
- skeleton: Array containing all the skeleton connections defined by the keypoint template. Each skeleton connection defines a connection between two different joints. This array includes the key-value pairs:
  - label1: Label of the first joint.
  - label2: Label of the second joint.
  - joint1: Index of the first joint.
  - joint2: Index of the second joint.
  - color: RGBA values of the connection.
  - color_code: Hex color code of the connection.

SemanticSegmentationLabeler:
- label_name: String identifier of a label.
- pixel_value: RGBA values of the label.
- color_code: Hex color code of the label.
captures_xyz.json: Each of these files contains an array of ground truth annotations generated by each active Labeler for each capture separately, as well as extra metadata that describe the state of each active sensor that is present in the scene. Each array entry contains the following key-value pairs:
- id: UUID of the capture.
- sequence_id: UUID of the sequence.
- step: Index of the capture within a sequence.
- timestamp: Timestamp (in ms) since the beginning of a sequence.
- sensor: Properties of the sensor. This entry contains a collection with the following key-value pairs:
  - sensor_id: Sensor UUID.
  - ego_id: Ego UUID.
  - modality: Modality of the sensor (e.g., camera, radar).
  - translation: 3D vector that describes the sensor's position (in meters) with respect to the global coordinate system.
  - rotation: Quaternion variable that describes the sensor's orientation with respect to the ego coordinate system.
  - camera_intrinsic: Matrix containing (if it exists) the camera's intrinsic calibration.
  - projection: Projection type used by the camera (e.g., orthographic, perspective).
- ego: Attributes of the ego. This entry contains a collection with the following key-value pairs:
  - ego_id: Ego UUID.
  - translation: 3D vector that describes the ego's position (in meters) with respect to the global coordinate system.
  - rotation: Quaternion variable containing the ego's orientation.
  - velocity: 3D vector containing the ego's velocity (in meters per second).
  - acceleration: 3D vector containing the ego's acceleration (in meters per second squared).
- format: Format of the file captured by the sensor (e.g., PNG, JPG).
- annotations: Key-value pair collections, one for each active Labeler. These key-value pairs are as follows:
  - id: Annotation UUID.
  - annotation_definition: Integer identifier of the annotation's definition.
  - filename: Name of the file generated by the Labeler. This entry is only present for Labelers that generate an image.
  - values: List of key-value pairs containing annotation data for the current Labeler.
Each Labeler generates different annotation specifications in the values key-value pair:
BoundingBox2DLabeler:
- label_id: Integer identifier of a label.
- label_name: String identifier of a label.
- instance_id: UUID of one instance of an object. Each object with the same label that is visible on the same capture has different instance_id values.
- x: Position of the 2D bounding box on the X axis.
- y: Position of the 2D bounding box on the Y axis.
- width: Width of the 2D bounding box.
- height: Height of the 2D bounding box.

BoundingBox3DLabeler:
- label_id: Integer identifier of a label.
- label_name: String identifier of a label.
- instance_id: UUID of one instance of an object. Each object with the same label that is visible on the same capture has different instance_id values.
- translation: 3D vector containing the location of the center of the 3D bounding box with respect to the sensor coordinate system (in meters).
- size: 3D vector containing the size of the 3D bounding box (in meters).
- rotation: Quaternion variable containing the orientation of the 3D bounding box.
- velocity: 3D vector containing the velocity of the 3D bounding box (in meters per second).
- acceleration: 3D vector containing the acceleration of the 3D bounding box (in meters per second squared).

KeypointLabeler:
- label_id: Integer identifier of a label.
- instance_id: UUID of one instance of a joint. Keypoints with the same joint label that are visible on the same capture have different instance_id values.
- template_id: UUID of the keypoint template.
- pose: Pose label for that particular capture.
- keypoints: Array containing the properties of each keypoint. Each keypoint that exists in the keypoint template file is one element of the array. Each entry's contents are as follows:
  - index: Index of the keypoint in the keypoint template file.
  - x: Pixel coordinates of the keypoint on the X axis.
  - y: Pixel coordinates of the keypoint on the Y axis.
  - state: State of the keypoint.
The SemanticSegmentationLabeler does not contain a values list.
egos.json: Contains collections of key-value pairs for each ego. These include:
- id: UUID of the ego.
- description: Description of the ego.

sensors.json: Contains collections of key-value pairs for all sensors of the simulation. These include:
- id: UUID of the sensor.
- ego_id: UUID of the ego on which the sensor is attached.
- modality: Modality of the sensor (e.g., camera, radar, sonar).
- description: Description of the sensor (e.g., camera, radar).
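To illustrate how these files fit together, the sketch below reads one captures file and prints its 2D bounding boxes. The file name is a placeholder, and the top-level "captures" key is assumed from Unity Perception's usual output layout rather than stated in this description:

```python
import json

# Placeholder file name; actual files follow the captures_xyz.json pattern described above.
with open("JSON Data/captures_000.json") as f:
    captures = json.load(f).get("captures", [])

for capture in captures:
    for annotation in capture.get("annotations", []):
        # SemanticSegmentationLabeler entries have no values list, hence the fallback.
        for value in annotation.get("values") or []:
            if {"x", "y", "width", "height"} <= set(value):
                # BoundingBox2DLabeler values carry label_name plus pixel box geometry.
                print(capture["id"], value.get("label_name"),
                      value["x"], value["y"], value["width"], value["height"])
```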
Image names
The RGB and semantic segmentation images share the same image naming convention. However, the semantic segmentation images also contain the string Semantic_ at the beginning of their filenames. Each RGB image is named "e_h_l_d_r.jpg", where:
- e denotes the id of the environment.
- h denotes the id of the person.
- l denotes the id of the lighting condition.
- d denotes the camera distance at which the image was captured.
- r denotes the camera angle at which the image was captured.
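A small helper can recover those fields from a file name. This is a hypothetical sketch based on the convention above; the example name in the comment is made up:

```python
import os

def parse_image_name(filename):
    """Split an image name of the form e_h_l_d_r.jpg into its five fields."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    # Semantic segmentation images carry a "Semantic_" prefix; strip it if present.
    if stem.startswith("Semantic_"):
        stem = stem[len("Semantic_"):]
    env, human, lighting, distance, angle = stem.split("_")
    return {"environment": env, "human": human, "lighting": lighting,
            "camera_distance": distance, "camera_angle": angle}

# parse_image_name("Semantic_2_7_1_3_120.jpg")  # made-up example name
```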
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
SemanticSugarBeets is a comprehensive dataset and framework designed for analyzing post-harvest and post-storage sugar beets using monocular RGB images. It supports three key tasks: instance segmentation to identify and delineate individual sugar beets, semantic segmentation to classify specific regions of each beet (e.g., damage, soil adhesion, vegetation, and rot), and oriented object detection to estimate the size and mass of beets using reference objects. The dataset includes 952 annotated images with 2,920 sugar-beet instances, captured both before and after storage. Accompanying the dataset is a demo application and processing code, available on GitHub. For more details, refer to the paper presented at the Agriculture-Vision Workshop at CVPR 2025.
The dataset supports three primary learning tasks, each designed to address specific aspects of sugar-beet analysis:
The dataset is organized into the following directories:
File names of images and annotations follow this format:
ssb-
If you use the SemanticSugarBeets dataset or source code in your research, please cite the following paper to acknowledge the authors' contributions:
Croonen, G., Trondl, A., Simon, J., Steininger, D., 2025. SemanticSugarBeets: A Multi-Task Framework and Dataset for Inspecting Harvest and Storage Characteristics of Sugar Beets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.