This dataset was created by deepanshu
This dataset was created by Ari
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Load COCO 2017 dataset: loads any dataset in COCO format and converts it to Ikomia format. Any training algorithm from the Ikomia marketplace can then be connected to this converter...
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
A dataset of car dashcam-view images with instance segmentation annotations of road lanes.
Classes:
* divider-line
* dotted-line
* double-line
* random-line
* road-sign-line
* solid-line
Original dataset source => https://universe.roboflow.com/bestgetsbetter/jpj
License => CC BY 4.0
CC0 1.0 Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed for barcode detection in images. It combines 10+ publicly available datasets (including Roboflow collections, InventBar, and ParcelBar), carefully merged and deduplicated using an MD5 hashing algorithm to ensure unique images.
It is suitable for object detection tasks and comes in COCO JSON format, making it compatible with most modern detection frameworks. The dataset contains 18,697 images in total and has a single class (barcode). Images are kept at their original resolution; no resizing was applied.
Dataset composition:
- Train: 13,087 images
- Validation: 2,804 images
- Test: 2,806 images
I trained a YOLOv11n model and achieved the following results:

| Metric    | Score |
| --------- | ----- |
| Precision | 0.970 |
| Recall    | 0.951 |
| mAP@50    | 0.974 |
| mAP@50-95 | 0.860 |
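The deduplication step described above can be reproduced with a short script. This is a minimal sketch only (the images/ directory and .jpg extension are assumptions), not the exact pipeline used by the dataset authors:

```python
# Minimal sketch of MD5-based image deduplication; the directory layout and
# file extension are assumptions, not part of the original dataset tooling.
import hashlib
from pathlib import Path

seen = {}
duplicates = []
for path in sorted(Path("images").glob("*.jpg")):
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    if digest in seen:
        duplicates.append(path)      # identical bytes already kept under seen[digest]
    else:
        seen[digest] = path

print(f"{len(seen)} unique images, {len(duplicates)} duplicates")
```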
Custom license: https://dataverse.ird.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.23708/N2UY4C
There are already many datasets for computer vision tasks (ImageNet, MS COCO, Pascal VOC, OpenImages, and numerous others), but they all suffer from significant biases. One bias of particular importance to us is data origin: most datasets are composed of data coming from developed countries. Facing this situation, and the need for data with local context in developing countries, we try here to adapt a common data generation process to inclusive data, meaning data drawn from locations and cultural contexts that are unseen or poorly represented. We chose to replicate MS COCO's data generation process, as it is well documented and easy to implement. Data was collected from January to April 2022 through the Flickr platform. This dataset contains the results of our data collection process, as follows:
- 23 text files containing comma-separated URLs for each of the 23 geographic zones identified in the UN M49 standard. These text files are named according to the geographic zones they cover.
- Annotations for 400 images per geographic zone. The annotations are COCO-style and indicate the presence or absence of 91 categories of objects or concepts in the images. They are shared in JSON format.
- Licenses for the 400 annotations per geographic zone, based on the original licenses of the data and specified per image. These licenses are shared in CSV format.
- A document explaining the objectives and methodology underlying the data collection, and describing the different components of the dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The IMPTOX project has received funding from the EU's H2020 framework programme for research and innovation under grant agreement n. 965173. Imptox is part of the European MNP cluster on human health.
More information about the project here.
Description: This repository includes the trained weights and a custom COCO-formatted dataset used for developing and testing a Faster R-CNN R_50_FPN_3x object detector, specifically designed to identify particles in micro-FTIR filter images.
Contents:
Weights File (neuralNetWeights_V3.pth):
Format: .pth
Description: This file contains the trained weights for a Faster R-CNN model with a ResNet-50 backbone and a Feature Pyramid Network (FPN), trained with a 3x schedule. These weights are specifically tuned for detecting particles in micro-FTIR filter images.
Custom COCO Dataset (uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip):
Format: .zip
Description: This zip archive contains a custom COCO-formatted dataset, including JPEG images and their corresponding annotation file. The dataset consists of images of micro-FTIR filters with annotated particles.
Contents:
Images: JPEG format images of micro-FTIR filters.
Annotations: A JSON file in COCO format providing detailed annotations of the particles in the images.
Management: The dataset can be managed and manipulated using the Pycocotools library, facilitating easy integration with existing COCO tools and workflows.
Applications: The provided weights and dataset are intended for researchers and practitioners in the field of microscopy and particle detection. The dataset and model can be used for further training, validation, and fine-tuning of object detection models in similar domains.
Usage Notes:
The neuralNetWeights_V3.pth file should be loaded into a PyTorch model compatible with the Faster R-CNN architecture, such as Detectron2 (a minimal loading sketch follows these notes).
The contents of uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip should be extracted and can be used with any COCO-compatible object detection framework for training and evaluation purposes.
Code can be found in the related GitHub repository.
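As a rough guide, the weights can be loaded through Detectron2's standard Faster R-CNN R_50_FPN_3x configuration. The snippet below is a minimal sketch, not the authors' exact code; the single-class setting, score threshold, and image filename are assumptions:

```python
# Minimal sketch of loading neuralNetWeights_V3.pth into a Detectron2 Faster R-CNN
# R_50_FPN_3x model. NUM_CLASSES=1 (a single "particle" class) is an assumption.
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.MODEL.WEIGHTS = "neuralNetWeights_V3.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

predictor = DefaultPredictor(cfg)
image = cv2.imread("example_filter_image.jpg")   # hypothetical image from the dataset
outputs = predictor(image)
print(outputs["instances"].pred_boxes)
```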
This dataset was created by Jiří Raška
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset is designed for object detection tasks and follows the COCO format. It contains 300 images and corresponding annotation files in JSON format. The dataset is split into training, validation, and test sets, ensuring a balanced distribution for model evaluation.
train/ (70% - 210 images)
valid/ (15% - 45 images)
test/ (15% - 45 images)
Each split contains images in JPEG/PNG format and a corresponding _annotations.coco.json file with the bounding box annotations.
The dataset has undergone several preprocessing and augmentation steps to enhance model generalization:
Auto-orientation applied
Resized to 640x640 pixels (stretched)
Flip: Horizontal flipping
Crop: 0% minimum zoom, 5% maximum zoom
Rotation: Between -5° and +5°
Saturation: Adjusted between -4% and +4%
Brightness: Adjusted between -10% and +10%
Blur: Up to 0px
Noise: Up to 0.1% of pixels
Bounding Box Augmentations:
Flipping, cropping, rotation, brightness adjustments, blur, and noise applied accordingly to maintain annotation consistency.
The dataset follows the COCO (Common Objects in Context) format, which includes:
images section: Contains image metadata such as filename, width, and height.
annotations section: Includes bounding boxes, category IDs, and segmentation masks (if applicable).
categories section: Defines class labels.
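For reference, the annotation file of a split can be inspected with a few lines of Python. This is a minimal sketch; the train/ path below assumes the split layout listed above:

```python
# Minimal sketch of inspecting one split's COCO annotation file (path is an assumption).
import json

with open("train/_annotations.coco.json") as f:
    coco = json.load(f)

print(len(coco["images"]), "images")             # image metadata: file_name, width, height
print(len(coco["annotations"]), "annotations")   # bounding boxes and category ids
print([c["name"] for c in coco["categories"]])   # class labels
```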
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data abstract:
The YogDATA dataset contains images from an industrial laboratory production line operating on yogurt quality control. The case study of recognising yogurt cups requires training Mask R-CNN and YOLO v5.0 models with a set of corresponding images, so it is important to collect such images to train and evaluate the class. Specifically, the YogDATA dataset includes the same labeled data for the Mask R-CNN (COCO format) and YOLO models. For the YOLO architecture, the training and validation datasets include sets of images in JPG format and their annotations in TXT files. For the Mask R-CNN architecture, the annotations of the same sets of images are included in JSON format (80% of the images and annotations of each subset are in the training set and 20% of the images of each subset are in the test set).
Paper abstract:
The explosion of digitisation of traditional industrial processes and procedures is consolidating a positive impact on modern society by offering a critical contribution to its economic development. In particular, the dairy sector consists of various processes which are very demanding and thorough. It is crucial to leverage modern automation tools and through-engineering solutions to increase their efficiency and continuously meet challenging standards. Towards this end, in this work an intelligent algorithm based on machine vision and artificial intelligence, which identifies dairy products within production lines, is presented. Furthermore, in order to train and validate the model, the YogDATA dataset, which includes yogurt cups within a production line, was created. Specifically, we evaluate two deep learning models (Mask R-CNN and YOLO v5.0) to recognise and detect each yogurt cup in a production line, in order to automate the packaging processes of the products. According to our results, the performance precision of the two models is similar, estimated at 99%.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
DoPose (Dortmund Pose) is a dataset of highly cluttered and closely stacked objects, saved in the BOP format. The dataset includes RGB images, depth images, 6D poses of objects, segmentation masks (all and visible), COCO JSON annotations, camera transformations, and 3D models of all objects. It contains two different types of scenes (table and bin), and each scene contains different view angles. The bin data comprises 183 scenes with 2,150 image views: of these 183 scenes, 35 contain 2 views, 20 contain 3 views, and 128 contain 16 views. The table data comprises 118 scenes with 1,175 image views: of these 118 scenes, 20 contain 3 views, 50 contain 6 views, and 48 contain 17 views. In total, the data contains 301 scenes and 3,325 view images. Most scenes contain mixed objects, and the dataset contains 19 objects in total.
For more info about the dataset content and collection process, please refer to our arXiv preprint.
If you have any questions about the dataset, please contact anas.gouda@tu-dortmund.de
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
I wanted to train a custom YOLO object detection model, but the MS-COCO dataset was not in a suitable format, so I parsed the instances JSON files in the MS-COCO annotations and converted the dataset into a YOLO-friendly format.
I downloaded the dataset from the COCO website. You can download any split you need from the COCO dataset website.
Directory info:
1. test: contains only the test images
2. train: has two subfolders, images (the training images) and labels (one .txt label file per training image)
3. val: has two subfolders, images (the validation images) and labels (one .txt label file per validation image)
I do not own the dataset in any way; I merely parsed it into a ready-to-train YOLO format. Download the original dataset from the COCO website.
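For illustration, the COCO-to-YOLO conversion described above can be sketched as follows. This is not the exact script used for this dataset; the annotation path and output directory are assumptions:

```python
# Minimal sketch of converting COCO instances annotations to YOLO txt labels.
# Paths are assumptions; adjust them to the downloaded COCO split.
import json
from collections import defaultdict
from pathlib import Path

with open("annotations/instances_train2017.json") as f:
    coco = json.load(f)

# Map COCO category ids (non-contiguous) to 0-based YOLO class indices.
cat_ids = sorted(c["id"] for c in coco["categories"])
cat_to_yolo = {cid: i for i, cid in enumerate(cat_ids)}
img_info = {img["id"]: img for img in coco["images"]}

labels = defaultdict(list)
for ann in coco["annotations"]:
    img = img_info[ann["image_id"]]
    x, y, w, h = ann["bbox"]                     # COCO bbox: top-left x, y, width, height in pixels
    xc = (x + w / 2) / img["width"]              # YOLO: normalized box center x
    yc = (y + h / 2) / img["height"]             # YOLO: normalized box center y
    cls = cat_to_yolo[ann["category_id"]]
    labels[img["file_name"]].append(
        f"{cls} {xc:.6f} {yc:.6f} {w / img['width']:.6f} {h / img['height']:.6f}"
    )

out_dir = Path("train/labels")
out_dir.mkdir(parents=True, exist_ok=True)
for file_name, lines in labels.items():
    (out_dir / (Path(file_name).stem + ".txt")).write_text("\n".join(lines))
```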
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description from the SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery GitHub Repository * The "Note" was added by the Roboflow team.
This is a single-class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspected might be a paraglider wing, missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison. The missing wing, as it was found after 3 weeks, is shown below.
Image (anomaly): https://michaeltpublic.s3.amazonaws.com/images/anomaly_small.jpg
The dataset contains the following:
| Set | Images | Annotations |
|---|---|---|
| Train | 1808 | 3048 |
| Validate | 490 | 747 |
| Test | 254 | 411 |
| Total | 2552 | 4206 |
The data is in the COCO format and is directly compatible with Faster R-CNN as implemented in Facebook's Detectron2 (see the registration sketch after the download steps below).
Download the data here: sarnet.zip
Or follow these steps
# download the dataset
wget https://michaeltpublic.s3.amazonaws.com/sarnet.zip
# extract the files
unzip sarnet.zip
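Once extracted, the COCO-format annotations can be registered with Detectron2. The sketch below is only an example; the directory layout and annotation file name inside sarnet.zip are assumptions:

```python
# Minimal sketch of registering the extracted data with Detectron2.
# The paths below are assumptions; adjust them to the actual contents of sarnet.zip.
from detectron2.data.datasets import register_coco_instances

register_coco_instances(
    "sarnet_train",                   # dataset name referenced in Detectron2 configs
    {},                               # no extra metadata
    "sarnet/train/annotations.json",  # assumed COCO JSON path
    "sarnet/train/images",            # assumed image directory
)
```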
**Note**: with Roboflow, you can download the data (original, raw images, with annotations) here: https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/ (download v1, original_raw-images). Download the dataset in COCO JSON format, or another format of your choice, and import it to Roboflow after unzipping the folder to get started on your project.
Get started with a Faster R-CNN model pretrained on SaRNet: SaRNet_Demo.ipynb
Source code for the paper is located here: SaRNet_train_test.ipynb
@misc{thoreau2021sarnet,
title={SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery},
author={Michael Thoreau and Frazer Wilson},
year={2021},
eprint={2107.12469},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
The source data was generously provided by Planet Labs, Airbus Defence and Space, and Maxar Technologies.
License: https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
This file contains the annotations for the ConfLab dataset, including actions (speaking status), pose, and F-formations.
------------------
./actions/speaking_status:
./processed: the processed speaking status files, aggregated into a single data frame per segment. Skipped rows in the raw data (see https://josedvq.github.io/covfee/docs/output for details) have been imputed using the code at: https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/speaking_status
The processed annotations consist of:
./speaking: the first row contains person IDs matching the sensor IDs; the remaining rows contain binary speaking status annotations at 60 fps for the corresponding 2-minute video segment (7200 frames).
./confidence: Same as above. These annotations reflect the continuous-valued rating of confidence of the annotators in their speaking annotation.
To load these files with pandas: pd.read_csv(p, index_col=False)
./raw-covfee.zip: the raw outputs from speaking status annotation for each of the eight annotated 2-minute video segments. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)
Annotations were done at 60 fps.
--------------------
./pose:
./coco: the processed pose files in coco JSON format, aggregated into a single data frame per video segment. These files have been generated from the raw files using the code at: https://github.com/TUDelft-SPC-Lab/conflab-keypoints
To load in Python: f = json.load(open('/path/to/cam2_vid3_seg1_coco.json'))
The skeleton structure (limbs) is contained within each file in:
f['categories'][0]['skeleton']
and keypoint names at:
f['categories'][0]['keypoints']
./raw-covfee.zip: the raw outputs from continuous pose annotation. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)
Annotations were done at 60 fps.
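For convenience, the loading steps above can be combined into one short snippet (the file path is the example given above):

```python
# Minimal sketch consolidating the pose-loading steps described above.
import json

with open("/path/to/cam2_vid3_seg1_coco.json") as fh:
    f = json.load(fh)

keypoint_names = f["categories"][0]["keypoints"]   # names of the annotated keypoints
skeleton = f["categories"][0]["skeleton"]          # limb connectivity as keypoint index pairs
print(keypoint_names)
print(skeleton)
```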
---------------------
./f_formations:
seg 2: 14:00 onwards, for videos of the form x2xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).
seg 3: for videos of the form x3xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).
Note that camera 10 doesn't include meaningful subject information/body parts that are not already covered in camera 8.
First column: time stamp
Second column: "()" delineates groups, "<>" delineates subjects, cam X indicates the best camera view for which a particular group exists.
phone.csv: time stamp (pertaining to seg3), corresponding group, ID of person using the phone
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TexBiG (from the German Text-Bild-Gefüge, meaning text-image structure) is a document layout analysis dataset for historical documents of the late 19th and early 20th century. The dataset provides instance segmentation annotations (bounding boxes and polygons/masks) for 19 different classes with more than 52,000 instances. Annotations were created manually by experts and evaluated with Krippendorff's alpha; each document image was labeled by at least two different annotators. The dataset uses the common COCO JSON format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Esefjorden Marine Vegetation Segmentation Dataset (EMVSD): Comprising 17,000 meticulously labeled images, this dataset is suited for instance segmentation tasks and represents a significant leap forward for marine research in the region. The images are stored in YOLO and COCO formats, ensuring compatibility with widely recognized and adopted object detection frameworks. Our decision to make this dataset publicly accessible underscores our commitment to collaborative research and the advancement of the broader scientific community.
Dataset Structure:
- Images:
  - Organized into three subsets: train, val, and test, located under the images/ directory.
  - Each subset contains high-resolution images optimized for object detection and segmentation tasks.
- Annotations:
  - Available in YOLO txt and COCO formats for compatibility with major object detection frameworks.
  - Organized into three subsets: train, val, and test, located under the labels/ directory.
  - Additional metadata:
    - counts.txt: summary of label distributions.
    - Cache files (train.cache, val.cache, test.cache) for efficient dataset loading.
- Metadata:
  - classes.txt: definitions for all annotated classes in the dataset.
  - Detailed COCO-format annotations in:
    - train_annotations.json
    - val_annotations.json
    - test_annotations.json
- Configuration File:
  - EMVSD.yaml: configuration file for seamless integration with machine learning libraries.
Example Directory Structure:
EMVSD/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
├── labels/
│   ├── train/
│   ├── val/
│   ├── test/
│   ├── counts.txt
│   ├── train.cache
│   ├── val.cache
│   └── test.cache
├── classes.txt
├── train_annotations.json
├── val_annotations.json
├── test_annotations.json
└── EMVSD.yaml
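If EMVSD.yaml is an Ultralytics-style data configuration (an assumption based on the YOLO txt labels and .cache files listed above), training a segmentation model could be sketched as follows:

```python
# Minimal sketch, assuming EMVSD.yaml follows the Ultralytics dataset config format.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")                         # small segmentation model as a starting point
model.train(data="EMVSD.yaml", epochs=100, imgsz=640)  # train/val splits are taken from EMVSD.yaml
metrics = model.val()                                  # evaluate on the val split
```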
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Object Detection for Olfactory References (ODOR) Dataset
Real-world applications of computer vision in the humanities require algorithms to be robust against artistic abstraction, peripheral objects, and subtle differences between fine-grained target classes. Existing datasets provide instance-level annotations on artworks but are generally biased towards the image centre and limited with regard to detailed object classes. The ODOR dataset fills this gap, offering 38,116 object-level annotations across 4,712 images, spanning an extensive set of 139 fine-grained categories. It has challenging dataset properties, such as a detailed set of categories, dense and overlapping objects, and spatial distribution over the whole image canvas. Inspiring further research on artwork object detection and broader visual cultural heritage studies, the dataset challenges researchers to explore the intersection of object recognition and smell perception.
How to use
To download the dataset images, run the download_imgs.py script in the subfolder. The images will be downloaded to the imgs folder. The annotations are provided in COCO JSON format. To represent the two-level hierarchy of the object classes, we make use of the supercategory field in the categories array as defined by COCO. In addition to the object-level annotations, we provide an additional CSV file with image-level metadata, which includes content-related fields, such as Iconclass codes or image descriptions, as well as formal annotations, such as artist, license, or creation year. For the sake of license compliance, we do not publish the images directly (although most of the images are public domain). Instead, we provide links to their source collections in the metadata file (meta.csv) and a Python script to download the artwork images (download_images.py). The mapping between the images array of the annotations.json and the metadata.csv file can be accomplished via the file_name attribute of the elements of the images array and the unique File Name column of the metadata.csv file, respectively.
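A minimal sketch of the mapping described above, joining the COCO images array with the image-level metadata (file locations are assumptions):

```python
# Minimal sketch of joining annotations.json image entries with metadata.csv.
import json
import pandas as pd

with open("annotations.json") as f:
    coco = json.load(f)

images = pd.DataFrame(coco["images"])   # contains the file_name attribute
meta = pd.read_csv("metadata.csv")      # contains the "File Name" column

merged = images.merge(meta, left_on="file_name", right_on="File Name", how="left")
print(merged.head())
```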
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Odeuropa Dataset of Olfactory Objects
This dataset is released as part of the Odeuropa project. The annotations are identical to the training set of the ICPR2022-ODOR Challenge. It contains bounding box annotations for smell-active objects in historical artworks gathered from various digital collections. The smell-active objects annotated in the dataset either carry smells themselves or hint at the presence of smells. The dataset provides 15,823 bounding boxes on 2,192 artworks in 87 object categories. An additional CSV file contains further image-level metadata such as artist, collection, or year of creation.
How to use
Due to licensing issues, we cannot provide the images directly; instead, we provide a collection of links and a download script. To get the images, just run the download_imgs.py script, which loads the images using the links from the metadata.csv file. The downloaded images can then be found in the images subfolder. The bounding box annotations can be found in annotations.json. The annotations follow the COCO JSON format; the definition is available here. The mapping between the images array of the annotations.json and the metadata.csv file can be accomplished via the file_name attribute of the elements of the images array and the unique File Name column of the metadata.csv file, respectively. Additional image-level metadata is available in the metadata.csv file.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Songdo Vision dataset provides high-resolution (4K, 3840×2160 pixels) RGB images annotated with categorized axis-aligned bounding boxes (BBs) for vehicle detection from a high-altitude bird's-eye view (BeV) perspective. Captured over Songdo International Business District, South Korea, this dataset consists of 5,419 annotated video frames, featuring approximately 300,000 vehicle instances categorized into four classes: car, bus, truck, and motorcycle.
This dataset can serve as a benchmark for aerial vehicle detection, supporting research and real-world applications in intelligent transportation systems, traffic monitoring, and aerial vision-based mobility analytics. It was developed in the context of a multi-drone experiment aimed at enhancing geo-referenced vehicle trajectory extraction.
📌 Citation: If you use this dataset in your work, kindly acknowledge it by citing the following article:
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205.
🔗 Related dataset: For precisely georeferenced vehicle trajectories extracted from the same large-scale multi-drone experiment, see Songdo Traffic: 10.5281/zenodo.13828384.
Publicly available datasets for aerial vehicle detection often exhibit limitations such as:
To address these challenges, Songdo Vision provides high-quality human-annotated bounding boxes, with machine learning assistance used to enhance efficiency and consistency. This ensures accurate and reliable ground truth for training and evaluating detection models.
The dataset is randomly split into training (80%) and test (20%) subsets:
| Subset | Images | Car | Bus | Truck | Motorcycle | Total Vehicles |
|---|---|---|---|---|---|---|
| Train | 4,335 | 195,539 | 7,030 | 11,779 | 2,963 | 217,311 |
| Test | 1,084 | 49,508 | 1,759 | 3,052 | 805 | 55,124 |
A subset of 5,274 frames was randomly sampled from drone video sequences, while an additional 145 frames were carefully selected to represent challenging cases, such as motorcycles at pedestrian crossings, in bicycle lanes, near traffic light poles, and around other distinctive road markers where they may blend into the urban environment.
The dataset was collected as part of a collaborative multi-drone experiment conducted by KAIST and EPFL in Songdo, South Korea, from October 4–7, 2022.
More details on the experimental setup and data processing pipeline are available in [1].
Annotations were generated using a semi-automated object detection annotation process in Azure ML Studio, leveraging machine learning-assisted bounding box detection with human verification to ensure precision.
Each annotated frame includes categorized, axis-aligned bounding boxes, stored in three widely used formats: COCO JSON, YOLO txt, and Pascal VOC XML. An example COCO JSON annotation:
{
"images": [{"id": 1, "file_name": "0001.jpg", "width": 3840, "height": 2160}],
"annotations": [{"id": 1, "image_id": 1, "category_id": 2, "bbox": [500, 600, 200, 50], "area": 10000, "iscrowd": 0}],
"categories": [
{"id": 1, "name": "car"}, {"id": 2, "name": "bus"},
{"id": 3, "name": "truck"}, {"id": 4, "name": "motorcycle"}
]
}
Example YOLO txt labels (class x_center y_center width height, normalized):
0 0.52 0.63 0.10 0.05 # Car bounding box
2 0.25 0.40 0.15 0.08 # Truck bounding box
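As a quick reference, a YOLO label line can be converted back to pixel coordinates using the 4K frame size stated above. This is a minimal sketch; the label file name is an example:

```python
# Minimal sketch converting YOLO-format labels back to pixel coordinates
# for the 3840x2160 frames of Songdo Vision.
IMG_W, IMG_H = 3840, 2160
CLASSES = ["car", "bus", "truck", "motorcycle"]   # 0-indexed YOLO class names

with open("labels/0001.txt") as f:
    for line in f:
        cls, xc, yc, w, h = line.split()[:5]      # class id and normalized xc, yc, w, h
        w_px, h_px = float(w) * IMG_W, float(h) * IMG_H
        x_px = float(xc) * IMG_W - w_px / 2       # top-left corner x
        y_px = float(yc) * IMG_H - h_px / 2       # top-left corner y
        print(CLASSES[int(cls)], round(x_px), round(y_px), round(w_px), round(h_px))
```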
The dataset is provided as two compressed archives:
1. Training Data (train.zip, 12.91 GB)
train/
│── coco_annotations.json # COCO format
│── images/
│ ├── 0001.jpg
│ ├── ...
│── labels/
│ ├── 0001.txt # YOLO format
│ ├── 0001.xml # Pascal VOC format
│ ├── ...
2. Testing Data (test.zip, 3.22 GB)
test/
│── coco_annotations.json
│── images/
│ ├── 00027.jpg
│ ├── ...
│── labels/
│ ├── 00027.txt
│ ├── 00027.xml
│ ├── ...
README.md – Dataset documentation (this description)
LICENSE.txt – Creative Commons Attribution 4.0 License
names.txt – Class names (one per line)
data.yaml – Example YOLO configuration file for training/testing
In addition to the funding sources listed in the metadata, the creators express their gratitude to Artem Vasilev for his dedicated efforts in data annotation. We also thank the research teams of Prof. Simon Oh (Korea University) and Prof. Minju Park (Hannam University) for their assistance during the data collection campaign, including the provision of drone equipment and student support.
Preferred Citation: If you use Songdo Vision for any purpose, whether academic research, commercial applications, open-source projects, or benchmarking efforts, please cite our accompanying article [1]:
Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205
BibTeX entry:
@article{fonod2025advanced,
  title = {Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery},
  author = {Fonod, Robert and Cho, Haechan and Yeo, Hwasoo and Geroliminis, Nikolas},
  journal = {Transportation Research Part C: Emerging Technologies},
  volume = {178},
  pages = {105205},
  year = {2025},
  doi = {10.1016/j.trc.2025.105205}
}