License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The Roboflow Packages dataset is a collection of packages located at the doors of various apartments and homes. Packages are flat envelopes, small boxes, and large boxes. Some images contain multiple annotated packages.
This dataset may be used as a good starter dataset to track and identify when a package has been delivered to a home. Perhaps you want to know when a package arrives to claim it quickly or prevent package theft.
If you plan to use this dataset and adapt it to your own front door, it is recommended that you capture and add images from the context of your specific camera position. You can easily add images to this dataset via the web UI or via the Roboflow Upload API.
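As a rough illustration of the programmatic route, here is a minimal upload sketch using the roboflow Python package; the API key, workspace, and project identifiers are placeholders you would replace with your own.

```python
# Minimal sketch of programmatic upload with the roboflow package.
# The key, workspace, and project identifiers below are placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-packages-project")
project.upload("front_door_package.jpg")
```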
Roboflow enables teams to build better computer vision models faster. We provide tools for image collection, organization, labeling, preprocessing, augmentation, training, and deployment. Developers reduce boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.

This dataset was created by Lyndia Lu.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
## Overview
Image Augmentation is a dataset for object detection tasks - it contains Fractured annotations for 702 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
If you use this dataset, please cite this paper: Puertas, E.; De-Las-Heras, G.; Sánchez-Soriano, J.; Fernández-Andrés, J. Dataset: Variable Message Signal Annotated Images for Object Detection. Data 2022, 7, 41. https://doi.org/10.3390/data7040041
This dataset consists of Spanish road images taken from inside a vehicle, together with annotations in XML files in PASCAL VOC format that indicate the locations of Variable Message Signals within them. A CSV file is also attached with information regarding the geographic position, the folder where each image is located, and the sign text in Spanish. The dataset can be used to train supervised computer vision algorithms, such as convolutional neural networks. The accompanying work details the process followed to obtain the dataset (image acquisition and labeling) and its specifications. The dataset comprises 1,216 instances (888 positive and 328 negative) in 1,152 JPG images with a resolution of 1280x720 pixels, divided into 576 real images and 576 images created with data augmentation. The purpose of this dataset is to support road computer vision research, since no dataset specifically for VMSs previously existed.
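As a quick orientation to the annotation format, the sketch below reads one PASCAL VOC XML file with Python's standard library; the file path is a hypothetical example, and the element names follow the standard VOC schema.

```python
# Minimal PASCAL VOC reader (standard library only); the path
# "annotations/image_0001.xml" is a hypothetical example.
import xml.etree.ElementTree as ET

root = ET.parse("annotations/image_0001.xml").getroot()
for obj in root.iter("object"):
    name = obj.find("name").text          # e.g. a Variable Message Signal
    box = obj.find("bndbox")
    coords = [int(float(box.find(tag).text))
              for tag in ("xmin", "ymin", "xmax", "ymax")]
    print(name, coords)
```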
The folder structure of the dataset is as follows:
In which:
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
This dataset is structured for underwater object detection tasks, following the COCO annotation format. It contains both real and augmented images of various underwater objects (e.g., fish, coral, ROVs). Images are grouped into classes, and all annotations are stored in a single JSON file for ease of access and compatibility with most object detection frameworks.
The dataset folder structure is as follows:
Underwater_Object_Detection_Dataset/
├── combined_images/
│ ├── animal_fish/
│ │ ├── real_and_augmented_image1.jpg
│ │ ├── real_and_augmented_image2.jpg
│ │ └── ...
│ ├── plant/
│ │ ├── real_and_augmented_image1.jpg
│ │ └── ...
│ ├── rov/
│ │ ├── real_and_augmented_image1.jpg
│ │ └── ...
│ ├── test/
│ │ ├── test_image1.jpg
│ │ ├── test_image2.jpg
│ │ └── ...
│ ├── mixed_categories/
│ │ ├── mixed_image1.jpg
│ │ ├── mixed_image2.jpg
│ │ └── ...
│ └── ...
├── combined_annotations.json
- combined_images/: Contains subfolders for each class, with each folder containing both real and augmented images for that class.
- test/: Contains images specifically for testing the model, kept separate from the main classes.
- mixed_categories/: Contains images with multiple object classes in a single image, allowing for multi-object detection tasks.
- combined_annotations.json: A single JSON file with all image and annotation information, formatted in COCO style for seamless integration with object detection models.

The combined_annotations.json file follows the COCO format, structured into three main sections: images, annotations, and categories.
{
"images": [
{
"id": 1,
"file_name": "vid_000159_frame0000008.jpg",
"width": 480,
"height": 270
},
{
"id": 2,
"file_name": "vid_000339_frame0000012.jpg",
"width": 480,
"height": 270
}
// Additional images
],
"annotations": [
{
"segmentation": [],
"area": 343.875,
"iscrowd": 0,
"image_id": 1,
"bbox": [238.0, 165.0, 18.0, 23.0],
"category_id": 1,
"id": 221
},
{
"segmentation": [],
"area": 500.25,
"iscrowd": 0,
"image_id": 2,
"bbox": [120.0, 140.0, 25.0, 20.0],
"category_id": 2,
"id": 222
}
// Additional annotations
],
"categories": [
{
"supercategory": "marine_life",
"id": 1,
"name": "fish"
},
{
"supercategory": "marine_life",
"id": 2,
"name": "coral"
},
{
"supercategory": "vehicle",
"id": 3,
"name": "rov"
}
// Additional categories
]
}
images: Contains metadata about each image:
"id": Unique identifier for the image."file_name": File name within its respective class folder."width" and "height": Dimensions of the image in pixels.annotations: Lists each object annotation with the following details:
"segmentation": For polygonal segmentation (empty here as we use bounding boxes only)."area": Area of the bounding box."iscrowd": Set to 0 for individual objects, 1 if dense clustering."image_id": Corresponds to the id in images, linking the annotation to its image."bbox": Bounding box in [x_min, y_min, width, height] format."category_id": Refers to the object’s class in categories."id": Unique ID for each annotation.categories: Lists unique object classes in the dataset:
"supercategory": High-level grouping for the class."id": Unique ID for each class."name": Name of the object class.This dataset is suitable for: - Training and validation for underwater object detection models. - Benchmarking and testing on object detection algorithms. - Exploring domain adaptation using real and augmented underwater images.
The test/ folder is intended exclusively for testing the model, helping to evaluate its performance on unseen data. The mixed_categories/ folder contains images with multiple object types, making it suitable for multi-object detection challenges, where models need to detect several classes in the same image.
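Because combined_annotations.json follows the COCO schema shown above, it can be consumed with nothing more than the standard json module; a minimal sketch:

```python
import json

# Load the single COCO-style annotation file described above.
with open("combined_annotations.json") as f:
    coco = json.load(f)

images = {img["id"]: img["file_name"] for img in coco["images"]}
names = {cat["id"]: cat["name"] for cat in coco["categories"]}

for ann in coco["annotations"]:
    x, y, w, h = ann["bbox"]  # [x_min, y_min, width, height]
    print(f"{images[ann['image_id']]}: {names[ann['category_id']]} "
          f"at ({x}, {y}), {w}x{h}px")
```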
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
This dataset is an extremely challenging set of over 3,000 images of excavator vehicles from multiple construction sites. The images were captured and crowdsourced from over 2,000 different locations, and each image was manually reviewed and verified by computer vision professionals at Datacluster Labs. It contains a wide variety of excavator images captured in diverse settings and can be used for scene classification and object detection.
Optimized for Generative AI, Visual Question Answering, Image Classification, and LMM development, this dataset provides a strong basis for achieving robust model performance.
Available annotation formats: COCO, YOLO, PASCAL VOC, TFRecord.
The images in this dataset are exclusively owned by Data Cluster Labs and were not downloaded from the internet. To access a larger portion of the training dataset for research and commercial purposes, a license can be purchased. Contact us at sales@datacluster.ai. Visit www.datacluster.ai to learn more.
License: CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
The Hard Hat dataset is an object detection dataset of workers in workplace settings that require a hard hat. Annotations also include examples of just "person" and "head," for when an individual may be present without a hard hat.
The original dataset has a 75/25 train-test split.
Example Image:
![Example Image](https://i.imgur.com/7spoIJT.png)
One could use this dataset to, for example, build a classifier that distinguishes workers abiding by the safety code within a workplace from those who may not be. It is also a good general dataset for practice.
Use the Fork or Download this Dataset button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or with additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.
Image Preprocessing | Image Augmentation | Modify Classes
* v1 (resize-416x416-reflect): generated with the original 75/25 train-test split | No augmentations | The reflect-resize preprocessing is sketched after this list
* v2 (raw_75-25_trainTestSplit): generated with the original 75/25 train-test split | These are the raw, original images
* v3 (v3): generated with the original 75/25 train-test split | Modify Classes used to drop person class | Preprocessing and Augmentation applied
* v5 (raw_HeadHelmetClasses): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop person class
* v8 (raw_HelmetClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop head and person classes
* v9 (raw_PersonClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop head and helmet classes
* v10 (raw_AllClasses): generated with a 70/20/10 train/valid/test split | These are the raw, original images
* v11 (augmented3x-AllClasses-FastModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied | 3x image generation | Trained with Roboflow's Fast Model
* v12 (augmented3x-HeadHelmetClasses-FastModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied, Modify Classes used to drop person class | 3x image generation | Trained with Roboflow's Fast Model
* v13 (augmented3x-HeadHelmetClasses-AccurateModel): generated with a 70/20/10 train/valid/test split | Preprocessing and Augmentation applied, Modify Classes used to drop person class | 3x image generation | Trained with Roboflow's Accurate Model
* v14 (raw_HeadClassOnly): generated with a 70/20/10 train/valid/test split | Modify Classes used to drop person class, and remap/relabel helmet class to head
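For readers reproducing the v1 preprocessing outside Roboflow, the sketch below approximates a 416x416 resize with reflected-edge padding using OpenCV; it is an approximation under stated assumptions, not Roboflow's exact implementation, and the file name is a placeholder.

```python
import cv2

def resize_reflect(img, size=416):
    # Fit the longer side to `size`, then pad the shorter side by
    # reflecting edge pixels so the output is exactly size x size.
    h, w = img.shape[:2]
    scale = size / max(h, w)
    img = cv2.resize(img, (round(w * scale), round(h * scale)))
    pad_h, pad_w = size - img.shape[0], size - img.shape[1]
    return cv2.copyMakeBorder(img,
                              pad_h // 2, pad_h - pad_h // 2,
                              pad_w // 2, pad_w - pad_w // 2,
                              cv2.BORDER_REFLECT)

square = resize_reflect(cv2.imread("worker.jpg"))  # placeholder file name
```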
Choosing Between Computer Vision Model Sizes | Roboflow Train
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce their code by 50% when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Accurate identification of small tea buds is a key technology for tea harvesting robots, which directly affects tea quality and yield. However, due to the complexity of the tea plantation environment and the diversity of tea buds, accurate identification remains an enormous challenge. Current methods based on traditional image processing and machine learning fail to effectively extract subtle features and morphology of small tea buds, resulting in low accuracy and robustness. To achieve accurate identification, this paper proposes a small object detection algorithm called STF-YOLO (Small Target Detection with Swin Transformer and Focused YOLO), which integrates the Swin Transformer module and the YOLOv8 network to improve the detection ability of small objects. The Swin Transformer module extracts visual features based on a self-attention mechanism, which captures global and local context information of small objects to enhance feature representation. The YOLOv8 network is an object detector based on deep convolutional neural networks, offering high speed and precision. Based on the YOLOv8 network, modules including Focus and Depthwise Convolution are introduced to reduce computation and parameters, increase the receptive field and feature channels, and improve feature fusion and transmission. Additionally, the Wise Intersection over Union loss is utilized to optimize the network. Experiments conducted on a self-created dataset of tea buds demonstrate that the STF-YOLO model achieves outstanding results, with an accuracy of 91.5% and a mean Average Precision of 89.4%. These results are significantly better than those of other detectors. Results show that, compared to mainstream algorithms (YOLOv8, YOLOv7, YOLOv5, and YOLOx), the model improves accuracy and F1 score by 5-20.22 percentage points and 0.03-0.13, respectively, proving its effectiveness in enhancing small object detection performance. This research provides technical means for the accurate identification of small tea buds in complex environments and offers insights into small object detection. Future research can further optimize model structures and parameters for more scenarios and tasks, as well as explore data augmentation and model fusion methods to improve generalization ability and robustness.
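The abstract credits depthwise convolution with cutting computation and parameters; the generic PyTorch sketch below illustrates that saving with a standard depthwise-separable convolution. It is not the paper's STF-YOLO module, only a common construction shown for comparison.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Generic depthwise-separable conv; not the paper's exact module."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # groups=in_ch: each filter sees a single input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1)  # 1x1 channel mixing

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(nn.Conv2d(64, 128, 3, padding=1)))   # 73,856 parameters
print(count(DepthwiseSeparableConv(64, 128)))    # 8,960 parameters
```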
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
## Overview
Image Augmentation And Annotation is a dataset for object detection tasks - it contains Objects annotations for 431 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
This dataset is specifically curated for object detection tasks aimed at identifying and classifying road damage and potholes. The original dataset on which this augmented dataset is based included images labeled with four distinct classes:
- Pothole
- Alligator Crack
- Long Crack
- Lat Crack

For training a road-damage detector, however, all four classes have been merged into a single class, "Pothole", which now also includes the alligator, longitudinal, and lateral cracks.
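A minimal sketch of that merge step for YOLO label files follows; the assumption that the original classes used IDs 0 through 3 is mine, not stated by the dataset.

```python
from pathlib import Path

# Rewrite every label so all four damage classes share ID 0 ("Pothole").
# Assumes YOLO .txt labels in labels/ with original class IDs 0-3.
for label_file in Path("labels").glob("*.txt"):
    merged = []
    for line in label_file.read_text().splitlines():
        parts = line.split()
        if parts:
            parts[0] = "0"  # any damage type -> class 0: Pothole
            merged.append(" ".join(parts))
    label_file.write_text("\n".join(merged) + "\n")
```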
To enhance the robustness and generalization capability of models trained on this dataset, extensive data augmentation techniques have been applied. The augmentation pipeline includes:
These augmentations ensure that models can learn to recognize road damages under various conditions and viewpoints, improving their detection performance.
Bounding boxes are provided in the YOLO format, ensuring easy integration with popular object detection frameworks. The bounding boxes are adjusted to correspond with the augmented images to maintain annotation accuracy.
The dataset includes the following class:
| Class ID | Class Name |
| -------- | ---------- |
| 0        | Pothole    |
The dataset is divided into training, validation, and testing sets with the following proportions:
This split ensures a sufficient amount of data for training the model while maintaining enough data for validation and testing to assess model performance accurately.
This dataset aims to aid researchers and developers in building and fine-tuning models for road damage detection, contributing to safer and more efficient road maintenance systems.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Performance comparison of the OFIDA and several SOTA data augmentation methods for image classification.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
The data was constructed for detecting window and blind states. All images were annotated in XML format using LabelImg for object detection tasks. The dataset also includes the results of applying a Faster R-CNN based model: detected images and loss graphs for both training and validation. Additionally, the raw data with other annotations can be used for applications such as semantic segmentation and image captioning.
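For context, a generic torchvision Faster R-CNN can be run as below; this is not the authors' trained model, only a sketch of the model family they report results for, and the image file name is hypothetical.

```python
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Pretrained COCO weights stand in for the authors' trained weights.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = convert_image_dtype(read_image("window.jpg"), torch.float)  # placeholder
with torch.no_grad():
    pred = model([img])[0]
print(pred["boxes"], pred["labels"], pred["scores"])
```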
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Vehicle Detection Dataset
This dataset is designed for vehicle detection tasks, featuring a comprehensive collection of images annotated for object detection. This dataset, originally sourced from Roboflow (https://universe.roboflow.com/object-detection-sn8ac/ai-traffic-system), was exported on May 29, 2025, at 4:59 PM GMT and is now publicly available on Kaggle under the CC BY 4.0 license.
- ../train/images
- ../valid/images
- ../test/images

This dataset was created and exported via Roboflow, an end-to-end computer vision platform that facilitates collaboration, image collection, annotation, dataset creation, model training, and deployment. The dataset is part of the ai-traffic-system project (version 1) under the workspace object-detection-sn8ac. For more details, visit: https://universe.roboflow.com/object-detection-sn8ac/ai-traffic-system/dataset/1.
This dataset is ideal for researchers, data scientists, and developers working on vehicle detection and traffic monitoring systems. It can be used to: - Train and evaluate deep learning models for object detection, particularly using the YOLOv11 framework. - Develop AI-powered traffic management systems, autonomous driving applications, or urban mobility solutions. - Explore computer vision techniques for real-world traffic scenarios.
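A minimal training sketch with the Ultralytics API is shown below; it assumes the ultralytics package and a hand-written data.yaml pointing at the train/valid/test folders above (both are assumptions, not files shipped with the dataset).

```python
from ultralytics import YOLO

# Assumes `pip install ultralytics` and a data.yaml that lists the
# train/valid/test image folders and the class names.
model = YOLO("yolo11n.pt")                 # small pretrained YOLO11 checkpoint
model.train(data="data.yaml", epochs=100, imgsz=640)
metrics = model.val()                      # evaluate on the validation split
```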
For advanced training notebooks compatible with this dataset, check out: https://github.com/roboflow/notebooks. To explore additional datasets and pre-trained models, visit: https://universe.roboflow.com.
The dataset is licensed under CC BY 4.0, allowing for flexible use, sharing, and adaptation, provided appropriate credit is given to the original source.
This dataset is a valuable resource for building robust vehicle detection models and advancing computer vision applications in traffic systems.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Image data augmentation plays a crucial role in data augmentation (DA) by increasing the quantity and diversity of labeled training data. However, existing methods have limitations. Notably, techniques like image manipulation, erasing, and mixing can distort images, compromising data quality. Accurate representation of objects without confusion is a challenge in methods like auto augment and feature augmentation. Preserving fine details and spatial relationships also proves difficult in certain techniques, as seen in deep generative models. To address these limitations, we propose OFIDA, an object-focused image data augmentation algorithm. OFIDA implements one-to-many enhancements that not only preserve essential target regions but also elevate the authenticity of simulating real-world settings and data distributions. Specifically, OFIDA utilizes a graph-based structure and object detection to streamline augmentation. By leveraging graph properties like connectivity and hierarchy, it captures object essence and context for improved comprehension in real-world scenarios. Then, we introduce DynamicFocusNet, a novel object detection algorithm built on the graph framework. DynamicFocusNet merges dynamic graph convolutions and attention mechanisms to flexibly adjust receptive fields. Finally, the detected target images are extracted to facilitate one-to-many data augmentation. Experimental results validate the superiority of our OFIDA method over state-of-the-art methods across six benchmark datasets.
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
This dataset contains images of lions and tigers sourced from the Open Images Dataset V6 and labeled specifically for object detection using the YOLO format. The dataset focuses on two classes: lion and tiger, with annotations provided for each image in a YOLO-compatible .txt file format. This dataset is ideal for training machine learning models for wildlife detection and classification tasks, particularly in distinguishing between these two majestic big cats.

Key Features:
Classes: Lion and Tiger
Annotations: YOLO format, with bounding box coordinates and class labels provided in separate .txt files for each image.
Source: Images sourced from Open Images Dataset V6, which is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Application: Suitable for object detection models like YOLO, SSD, or Faster R-CNN.
Usage:
The dataset can be used for training, validating, or testing object detection models. Each image is accompanied by a corresponding YOLO annotation file, making it easy to integrate into any YOLO-based pipeline.

Attribution:
This dataset is derived from the Open Images Dataset V6, and proper attribution must be given. Please credit the Open Images Dataset when using or sharing this dataset in any format.
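As a quick format illustration, the sketch below parses one YOLO label file and converts its normalized boxes to pixel corners; the file name and the class-ID order (0 = lion, 1 = tiger) are assumptions to verify against the dataset's own class list.

```python
from pathlib import Path

names = {0: "lion", 1: "tiger"}  # assumed order; check the dataset's class list

def yolo_to_corners(xc, yc, w, h, img_w, img_h):
    # YOLO stores normalized center/size; convert to pixel corners.
    x1, y1 = (xc - w / 2) * img_w, (yc - h / 2) * img_h
    return x1, y1, x1 + w * img_w, y1 + h * img_h

for line in Path("labels/lion_0001.txt").read_text().splitlines():  # example path
    cls, xc, yc, w, h = (float(v) for v in line.split())
    print(names[int(cls)], yolo_to_corners(xc, yc, w, h, 1024, 768))
```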
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
## Overview
Data Augmentation Data Adjust 5k is a dataset for object detection tasks - it contains Leaves annotations for 4,994 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
The data was used for "Impact of Traditional Augmentation Methods on Window States Detection", a conference paper at CLIMA 2022. The main purpose of this data is the reproducibility of the proposed methods. All images are annotated in XML format using LabelImg. Additionally, this dataset may be used for other object detection and segmentation tasks.
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
BRAGAN is a new dataset of Brazilian wildlife developed for object detection tasks, combining real images with synthetic samples generated by Generative Adversarial Networks (GANs). It focuses on five medium and large-sized mammal species frequently involved in roadkill incidents on Brazilian highways: lowland tapir (Tapirus terrestris), jaguarundi (Herpailurus yagouaroundi), maned wolf (Chrysocyon brachyurus), puma (Puma concolor), and giant anteater (Myrmecophaga tridactyla). Its primary goal is to provide a standardized and expanded resource for biodiversity conservation research, wildlife monitoring technologies, and computer vision applications, with an emphasis on automated wildlife detection.
The dataset builds upon the original BRA-Dataset by Ferrante et al. (2022), which was constructed from structured internet searches and manually curated with bounding box annotations. While the BRA-Dataset faced limitations in size and variability, BRAGAN introduces a new stage of dataset expansion through GAN-based synthetic image generation, substantially improving both the quantity and diversity of samples. In its final version, BRAGAN comprises 9,238 images, divided into three main groups:
Real images — original photographs from the BRA-Dataset. Total: 1,823.
Classically augmented images — transformations applied to real samples, including rotations (RT), horizontal flips (HF), vertical flips (VF), and horizontal (HS) and vertical shifts (VS). Total: 7,300.
GAN-generated images — synthetic samples created using WGAN-GP models trained separately for each species on preprocessed subsets of the original data. All generated images underwent visual inspection to ensure morphological fidelity and proper framing before inclusion. Total: 115.
The dataset follows an organized directory structure with images/ and labels/ folders, each divided into train/ and val/ subsets, following an 80–20 split. Images are provided in .jpg format, while annotations follow the YOLO standard in .txt files (class_id x_center y_center width height, with normalized coordinates). The file naming convention explicitly encodes the species and the augmentation type for reproducibility.
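To make the label bookkeeping concrete, here is a sketch of the horizontal-flip (HF) augmentation with the matching YOLO label update (mirroring only changes x_center); it is an illustration, not the authors' pipeline, and the file names are examples of the naming convention described above.

```python
from pathlib import Path
from PIL import Image

def hflip_label(line):
    # Mirroring the image horizontally maps x_center -> 1 - x_center.
    cls, xc, yc, w, h = line.split()
    return f"{cls} {1.0 - float(xc):.6f} {yc} {w} {h}"

src = Path("images/train/tapir_0001.jpg")          # example file names
Image.open(src).transpose(Image.FLIP_LEFT_RIGHT) \
     .save(src.with_name(src.stem + "_HF.jpg"))

label = Path("labels/train/tapir_0001.txt")
flipped = [hflip_label(l) for l in label.read_text().splitlines() if l.strip()]
label.with_name(label.stem + "_HF.txt").write_text("\n".join(flipped) + "\n")
```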
Designed to be compatible with multiple object detection architectures, BRAGAN has been evaluated on YOLOv5, YOLOv8, and YOLOv11 (variants n, s, and m), enabling the assessment of dataset expansion across different computational settings and performance requirements.
By combining real data, classical augmentations, and high-quality synthetic samples, BRAGAN provides a valuable resource for wildlife detection, environmental monitoring, and conservation research, especially in contexts where image availability for rare or threatened species is limited.
License: Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
The CADOT dataset is introduced as part of the Grand Challenge at IEEE ICIP 2025, aiming to push forward the development of advanced object detection techniques in remote sensing imagery, particularly focused on dense urban environments. The competition is organized by LabCom IRISER, in collaboration with IGN (Institut national de l'information géographique et forestière), and encourages the use of AI-based data augmentation to enhance model robustness.
The challenge calls for the detection of small objects in high-resolution optical satellite imagery, which is inherently complex due to occlusions, diverse object types, and varied urban layouts. Participants are expected to develop detection pipelines that are not only accurate but also robust under real-world remote sensing constraints.
The CADOT dataset comprises high-resolution aerial images captured over a dense urban area in the Île-de-France region, France. Each image is carefully annotated with 14 object categories including buildings, roads, vehicles, trees, and various other urban components. The imagery comes from IGN and reflects a realistic and challenging setting for object detection models due to factors like shadows, perspective distortion, and dense object arrangements.
To facilitate easier use of the dataset in machine learning workflows, I have reformatted the original data into the following versions:
- .jpg and .png format (cropped and full-frame)

For full licensing terms and official documentation, please refer to the official challenge page: 🔗 https://cadot.onrender.com/
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Performance evaluation of semantic segmentation on the CITYSCAPES validation set using mIoU.