88 datasets found
  1. Food Object Detection Dataset

    • kaggle.com
    zip
    Updated Jan 23, 2022
    Cite
    deepanshu (2022). Food Object Detection Dataset [Dataset]. https://www.kaggle.com/datasets/deepking/food-object-detection-dataset
    Explore at:
    zip (73580047 bytes)
    Dataset updated
    Jan 23, 2022
    Authors
    deepanshu
    Description

    Dataset

    This dataset was created by deepanshu: an object detection dataset in COCO JSON format, to localize food.


  2. Sartorius COCO Format Dataset

    • kaggle.com
    zip
    Updated Oct 28, 2021
    + more versions
    Cite
    Ari (2021). Sartorius COCO Format Dataset [Dataset]. https://www.kaggle.com/datasets/vexxingbanana/sartorius-coco-format-dataset
    Explore at:
    zip (9798602 bytes)
    Dataset updated
    Oct 28, 2021
    Authors
    Ari
    Description

    Dataset

    This dataset was created by Ari


  3. dataset_coco

    • app.ikomia.ai
    Updated Dec 19, 2023
    Cite
    Ikomia (2023). dataset_coco [Dataset]. https://app.ikomia.ai/hub/algorithms/dataset_coco/
    Explore at:
    Dataset updated
    Dec 19, 2023
    Dataset authored and provided by
    Ikomia
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Load the COCO 2017 dataset. Load any dataset in COCO format into Ikomia format; any training algorithm from the Ikomia marketplace can then be connected to this converter...
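
    A rough Python sketch of what using this converter might look like with the Ikomia API; the parameter keys below (json_file, image_folder, task) are assumptions and should be checked against the algorithm's documentation.

    from ikomia.dataprocess.workflow import Workflow

    # Build a workflow and add the COCO-format loader described above.
    wf = Workflow()
    dataset = wf.add_task(name="dataset_coco", auto_connect=False)
    dataset.set_parameters({
        "json_file": "path/to/annotations.json",   # assumed parameter name
        "image_folder": "path/to/images",          # assumed parameter name
        "task": "detection",                       # assumed parameter name
    })

    # Any training algorithm from the Ikomia marketplace could then be
    # connected to this converter before calling wf.run().
    wf.run()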

  4. Road Lane Instance Segmentation

    • kaggle.com
    zip
    Updated Jul 1, 2023
    Cite
    Sovit Ranjan Rath (2023). Road Lane Instance Segmentation [Dataset]. https://www.kaggle.com/datasets/sovitrath/road-lane-instance-segmentation/code
    Explore at:
    zip (48401668 bytes)
    Dataset updated
    Jul 1, 2023
    Authors
    Sovit Ranjan Rath
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    A dataset containing car dashcam images with instance segmentation samples of road lanes.

    Classes:
    • divider-line
    • dotted-line
    • double-line
    • random-line
    • road-sign-line
    • solid-line

    Original dataset source => https://universe.roboflow.com/bestgetsbetter/jpj

    License => CC BY 4.0

  5. Barcode Detection Dataset (COCO Format)

    • kaggle.com
    zip
    Updated Sep 27, 2025
    Cite
    Hammad Javaid (2025). Barcode Detection Dataset (COCO Format) [Dataset]. https://www.kaggle.com/datasets/hammadjavaid/barcode-detection-dataset-coco-format
    Explore at:
    zip (2563157311 bytes)
    Dataset updated
    Sep 27, 2025
    Authors
    Hammad Javaid
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is designed for barcode detection in images. It combines 10+ publicly available datasets (including Roboflow collections, InventBar, and ParcelBar), carefully merged and deduplicated using an MD5 hashing algorithm to ensure unique images.
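
    A minimal sketch of the kind of MD5-based deduplication described above (the folder name is hypothetical, not the author's actual script):

    import hashlib
    from pathlib import Path

    def md5_of_file(path: Path) -> str:
        # Hash the file in chunks so large images do not need to fit in memory.
        h = hashlib.md5()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                h.update(chunk)
        return h.hexdigest()

    seen, duplicates = {}, []
    for img in sorted(Path("merged_images").glob("*.jpg")):  # hypothetical folder
        digest = md5_of_file(img)
        if digest in seen:
            duplicates.append(img)   # byte-identical to an image already kept
        else:
            seen[digest] = img

    print(f"kept {len(seen)} unique images, flagged {len(duplicates)} duplicates")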

    It is suitable for object detection tasks and comes in COCO JSON format, making it compatible with most modern detection frameworks. The total number of images in the dataset is 18,697. The dataset has a single class (barcode). Images are kept at their original resolution; no resizing was applied.

    Dataset Composition:
    • Train: 13,087 images
    • Validation: 2,804 images
    • Test: 2,806 images

    I trained a YOLOv11n model and achieved the following results:

    Metric      Score
    Precision   0.970
    Recall      0.951
    mAP@50      0.974
    mAP@50-95   0.860

  6. COCO-style geographically unbiased image dataset for computer vision applications

    • dataverse.ird.fr
    pdf, txt, zip
    Updated Jan 13, 2023
    Cite
    Theophile Bayet; Theophile Bayet (2023). COCO-style geographically unbiased image dataset for computer vision applications [Dataset]. http://doi.org/10.23708/N2UY4C
    Explore at:
    zip (176316624), zip (218991), pdf (57252), txt (1731), pdf (83345), zip (308454)
    Dataset updated
    Jan 13, 2023
    Dataset provided by
    DataSuds
    Authors
    Theophile Bayet; Theophile Bayet
    License

    Custom licence: https://dataverse.ird.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.23708/N2UY4C

    Time period covered
    Jan 1, 2022 - Apr 1, 2022
    Description

    There are already many datasets for computer vision tasks (ImageNet, MS COCO, Pascal VOC, OpenImages, and numerous others), but they all suffer from significant biases. One bias of particular significance for us is data origin: most datasets are composed of data coming from developed countries. Facing this situation, and the need for data with local context in developing countries, we try here to adapt a common data generation process to inclusive data, meaning data drawn from locations and cultural contexts that are unseen or poorly represented. We chose to replicate MS COCO's data generation process, as it is well documented and easy to implement. Data was collected from January to April 2022 through the Flickr platform. This dataset contains the results of our data collection process, as follows:

    • 23 text files containing comma-separated URLs for each of the 23 geographic zones identified in the UN M49 norm. These text files are named according to the geographic zones they cover.
    • Annotations for 400 images per geographic zone. Those annotations are COCO-style and indicate the presence or absence of 91 categories of objects or concepts in the images. They are shared in JSON format.
    • Licenses for the 400 annotations per geographic zone, based on the original licenses of the data and specified per image. Those licenses are shared in CSV format.
    • A document explaining the objectives and methodology underlying the data collection, also describing the different components of the dataset.

  7. COCO dataset and neural network weights for micro-FTIR particle detection on filters

    • data.niaid.nih.gov
    Updated Aug 13, 2024
    Cite
    Schowing, Thibault (2024). COCO dataset and neural network weights for micro-FTIR particle detection on filters. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10839526
    Explore at:
    Dataset updated
    Aug 13, 2024
    Dataset provided by
    HES-SO Vaud
    Authors
    Schowing, Thibault
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The IMPTOX project has received funding from the EU's H2020 framework programme for research and innovation under grant agreement n. 965173. Imptox is part of the European MNP cluster on human health.

    More information about the project here.

    Description: This repository includes the trained weights and a custom COCO-formatted dataset used for developing and testing a Faster R-CNN R_50_FPN_3x object detector, specifically designed to identify particles in micro-FTIR filter images.

    Contents:

    Weights File (neuralNetWeights_V3.pth):

    Format: .pth

    Description: This file contains the trained weights for a Faster R-CNN model with a ResNet-50 backbone and a Feature Pyramid Network (FPN), trained for 3x schedule. These weights are specifically tuned for detecting particles in micro-FTIR filter images.

    Custom COCO Dataset (uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip):

    Format: .zip

    Description: This zip archive contains a custom COCO-formatted dataset, including JPEG images and their corresponding annotation file. The dataset consists of images of micro-FTIR filters with annotated particles.

    Contents:

    Images: JPEG format images of micro-FTIR filters.

    Annotations: A JSON file in COCO format providing detailed annotations of the particles in the images.

    Management: The dataset can be managed and manipulated using the Pycocotools library, facilitating easy integration with existing COCO tools and workflows.
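
    As a minimal illustration of that workflow (the annotation file name is a placeholder; use the JSON file shipped inside the zip):

    from pycocotools.coco import COCO

    coco = COCO("annotations.json")  # placeholder path to the COCO annotation file

    # List the categories and count annotations for a few images.
    cats = coco.loadCats(coco.getCatIds())
    print("categories:", [c["name"] for c in cats])

    for img_id in coco.getImgIds()[:5]:
        anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
        print(coco.loadImgs(img_id)[0]["file_name"], "->", len(anns), "annotated particles")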

    Applications: The provided weights and dataset are intended for researchers and practitioners in the field of microscopy and particle detection. The dataset and model can be used for further training, validation, and fine-tuning of object detection models in similar domains.

    Usage Notes:

    The neuralNetWeights_V3.pth file should be loaded into a PyTorch model compatible with the Faster R-CNN architecture, such as Detectron2.
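
    A minimal Detectron2 sketch along those lines; the class count and score threshold are assumptions (a single particle class is implied by the description above):

    import cv2
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    # Same architecture family as the trained weights: Faster R-CNN R_50_FPN_3x.
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.WEIGHTS = "neuralNetWeights_V3.pth"
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1          # assumption: one particle class
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # assumption: detection threshold
    cfg.MODEL.DEVICE = "cpu"                     # or "cuda" if a GPU is available

    predictor = DefaultPredictor(cfg)
    outputs = predictor(cv2.imread("filter_image.jpg"))  # placeholder image path
    print(outputs["instances"].pred_boxes)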

    The contents of uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip should be extracted and can be used with any COCO-compatible object detection framework for training and evaluation purposes.

    Code can be found on the related Github repository.

  8. Satellite Small Objects Dataset, COCO JSON format

    • kaggle.com
    zip
    Updated Oct 23, 2024
    Cite
    Jiří Raška (2024). Satellite Small Objects Dataset, COCO JSON format [Dataset]. https://www.kaggle.com/datasets/jraska1/satellite-small-objects-dataset-coco-json-format
    Explore at:
    zip (433112583 bytes)
    Dataset updated
    Oct 23, 2024
    Authors
    Jiří Raška
    Description

    Dataset

    This dataset was created by Jiří Raška


  9. Person-Collecting-Waste COCO Dataset

    • kaggle.com
    zip
    Updated Mar 31, 2025
    Cite
    Ashutosh Sharma (2025). Person-Collecting-Waste COCO Dataset [Dataset]. https://www.kaggle.com/datasets/ashu009/person-collecting-waste-coco-dataset/discussion
    Explore at:
    zip (19854259 bytes)
    Dataset updated
    Mar 31, 2025
    Authors
    Ashutosh Sharma
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset: COCO-Formatted Object Detection Dataset

    Overview

    This dataset is designed for object detection tasks and follows the COCO format. It contains 300 images and corresponding annotation files in JSON format. The dataset is split into training, validation, and test sets, ensuring a balanced distribution for model evaluation.

    Dataset Structure

    The dataset is organized into three main folders:

    train/ (70% - 210 images)

    valid/ (15% - 45 images)

    test/ (15% - 45 images)

    Each folder contains:

    Images in JPEG/PNG format.

    A corresponding _annotations.coco.json file that includes bounding box annotations.

    Preprocessing & Augmentations

    The dataset has undergone several preprocessing and augmentation steps to enhance model generalization:

    Image Preprocessing:

    Auto-orientation applied

    Resized to 640x640 pixels (stretched)

    Augmentation Techniques:

    Flip: Horizontal flipping

    Crop: 0% minimum zoom, 5% maximum zoom

    Rotation: Between -5° and +5°

    Saturation: Adjusted between -4% and +4%

    Brightness: Adjusted between -10% and +10%

    Blur: Up to 0px

    Noise: Up to 0.1% of pixels

    Bounding Box Augmentations:

    Flipping, cropping, rotation, brightness adjustments, blur, and noise applied accordingly to maintain annotation consistency.

    Annotation Format

    The dataset follows the COCO (Common Objects in Context) format, which includes:

    images section: Contains image metadata such as filename, width, and height.

    annotations section: Includes bounding boxes, category IDs, and segmentation masks (if applicable).

    categories section: Defines class labels.
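
    As a quick sanity check of that structure, a small sketch (assuming each split folder above holds its images next to _annotations.coco.json) that loads every split and prints the three sections:

    import json
    from pathlib import Path

    for split in ("train", "valid", "test"):
        data = json.loads((Path(split) / "_annotations.coco.json").read_text())
        print(split, "-",
              len(data["images"]), "images,",
              len(data["annotations"]), "boxes,",
              "classes:", [c["name"] for c in data["categories"]])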

  10. YOGData: Labelled data (YOLO and Mask R-CNN) for yogurt cup identification within production lines

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jun 29, 2022
    Cite
    Symeon Symeonidis; Vasiliki Balaska; Dimitrios Tsilis; Fotis K. Konstantinidis; Fotis K. Konstantinidis; Symeon Symeonidis; Vasiliki Balaska; Dimitrios Tsilis (2022). YOGData: Labelled data (YOLO and Mask R-CNN) for yogurt cup identification within production lines [Dataset]. http://doi.org/10.5281/zenodo.6773531
    Explore at:
    bin, zip
    Dataset updated
    Jun 29, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Symeon Symeonidis; Vasiliki Balaska; Dimitrios Tsilis; Fotis K. Konstantinidis; Fotis K. Konstantinidis; Symeon Symeonidis; Vasiliki Balaska; Dimitrios Tsilis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data abstract:
    The YogDATA dataset contains images from an industrial laboratory production line while it was operating to produce quality yogurts. The case study for the recognition of yogurt cups requires training Mask R-CNN and YOLO v5.0 models with a set of corresponding images; thus, it is important to collect the corresponding images to train and evaluate the models. Specifically, the YogDATA dataset includes the same labeled data for Mask R-CNN (COCO format) and YOLO models. For the YOLO architecture, the training and validation datasets include sets of images in jpg format and their annotations in txt files. For the Mask R-CNN architecture, the annotations of the same sets of images are included in JSON format (80% of the images and annotations of each subset are in the training set and 20% of the images of each subset are in the test set).

    Paper abstract:
    The explosion of the digitisation of traditional industrial processes and procedures is consolidating a positive impact on modern society by offering a critical contribution to its economic development. In particular, the dairy sector consists of various processes which are very demanding and thorough. It is crucial to leverage modern automation tools and through-engineering solutions to increase their efficiency and continuously meet challenging standards. Towards this end, in this work, an intelligent algorithm based on machine vision and artificial intelligence, which identifies dairy products within production lines, is presented. Furthermore, in order to train and validate the model, the YogDATA dataset was created, which includes yogurt cups within a production line. Specifically, we evaluate two deep learning models (Mask R-CNN and YOLO v5.0) to recognise and detect each yogurt cup in a production line, in order to automate the packaging processes of the products. According to our results, the performance precision of the two models is similar, estimated at 99%.

  11. DoPose: dataset for object segmentation and 6D pose estimation

    • zenodo.org
    zip
    Updated May 11, 2022
    Cite
    Anas Gouda; Anas Gouda; Ashwin Nedungadi; Anay Ghatpande; Christopher Reining; Christopher Reining; Hazem Youssef; Hazem Youssef; Moritz Roidl; Ashwin Nedungadi; Anay Ghatpande; Moritz Roidl (2022). DoPose: dataset for object segmentation and 6D pose estimation [Dataset]. http://doi.org/10.5281/zenodo.6103779
    Explore at:
    zip
    Dataset updated
    May 11, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Anas Gouda; Anas Gouda; Ashwin Nedungadi; Anay Ghatpande; Christopher Reining; Christopher Reining; Hazem Youssef; Hazem Youssef; Moritz Roidl; Ashwin Nedungadi; Anay Ghatpande; Moritz Roidl
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    DoPose (Dortmund Pose) is a dataset of highly cluttered and closely stacked objects. The dataset is saved in the BOP format. It includes RGB images, depth images, 6D poses of objects, segmentation masks (all and visible), COCO JSON annotations, camera transformations, and 3D models of all objects. The dataset contains 2 different types of scenes (table and bin), and each scene contains different view angles. For the bin scenes, the data contains 183 scenes with 2150 image views: of those 183 scenes, 35 contain 2 views, 20 contain 3 views, and 128 contain 16 views. For the table scenes, the data contains 118 scenes with 1175 image views: of those 118 scenes, 20 contain 3 views, 50 contain 6 views, and 48 contain 17 views. So in total, our data contains 301 scenes and 3325 view images. Most of the scenes contain mixed objects. The dataset contains 19 objects in total.

    For more info about the dataset content and collection process please refer to our Arxiv preprint

    If you have any questions about the dataset, please contact anas.gouda@tu-dortmund.de

  12. MS-COCO 2017 dataset - YOLO format

    • kaggle.com
    zip
    Updated Nov 1, 2025
    Cite
    Shahariar Alif (2025). MS-COCO 2017 dataset - YOLO format [Dataset]. https://www.kaggle.com/datasets/alifshahariar/ms-coco-2017-dataset-yolo-format
    Explore at:
    zip (26509567635 bytes)
    Dataset updated
    Nov 1, 2025
    Authors
    Shahariar Alif
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    I wanted to train a custom YOLO object detection model, but the MS-COCO dataset was not in a suitable format. So I parsed the instances JSON files in the MS-COCO annotations and processed the dataset into a YOLO-friendly format.

    I downloaded the dataset from the COCO website. You can download any split you need from the COCO dataset website.

    Directory info:
    1. test: only contains the test images
    2. train: has two sub-folders, images (the training images) and labels (one .txt label file per training image)
    3. val: has two sub-folders, images (the validation images) and labels (one .txt label file per validation image)

    I do not own the dataset in any way. I merely parsed it into a ready-to-train YOLO format. Download the original dataset from the COCO website.
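
    The conversion described above boils down to remapping category IDs and turning absolute COCO boxes [x, y, width, height] into normalized, center-based YOLO rows. A rough sketch of that step (paths and the class-index mapping are illustrative, not the author's exact script):

    import json
    from collections import defaultdict
    from pathlib import Path

    coco = json.loads(Path("annotations/instances_train2017.json").read_text())
    images = {img["id"]: img for img in coco["images"]}
    # COCO category IDs are not contiguous; remap them to 0..N-1 for YOLO.
    cat_to_idx = {c["id"]: i for i, c in enumerate(sorted(coco["categories"], key=lambda c: c["id"]))}

    rows = defaultdict(list)
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]  # absolute pixels, top-left origin
        cx, cy = (x + w / 2) / img["width"], (y + h / 2) / img["height"]
        rows[img["file_name"]].append(
            f"{cat_to_idx[ann['category_id']]} {cx:.6f} {cy:.6f} "
            f"{w / img['width']:.6f} {h / img['height']:.6f}"
        )

    out_dir = Path("train/labels")
    out_dir.mkdir(parents=True, exist_ok=True)
    for file_name, lines in rows.items():
        (out_dir / Path(file_name).with_suffix(".txt").name).write_text("\n".join(lines))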

  13. Sarnet Search And Rescue Dataset

    • universe.roboflow.com
    zip
    Updated Jun 16, 2022
    Cite
    Roboflow Public (2022). Sarnet Search And Rescue Dataset [Dataset]. https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/dataset/5
    Explore at:
    zip
    Dataset updated
    Jun 16, 2022
    Dataset provided by
    Roboflow (https://roboflow.com/)
    Authors
    Roboflow Public
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    SaR Bounding Boxes
    Description

    Description from the SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery GitHub repository. (The "Note" below was added by the Roboflow team.)

    Satellite Imagery for Search And Rescue Dataset - ArXiv

    This is a single-class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspected may be a paraglider wing, missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison. The missing wing, as it was found after 3 weeks, is shown below.

    Image: https://michaeltpublic.s3.amazonaws.com/images/anomaly_small.jpg (the anomaly)

    The dataset contains the following:

    Set        Images   Annotations
    Train      1,808    3,048
    Validate   490      747
    Test       254      411
    Total      2,552    4,206

    The data is in the COCO format, and is directly compatible with Faster R-CNN as implemented in Facebook's Detectron2.
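
    For example, a minimal Detectron2 registration sketch for this dataset (the folder and JSON names are placeholders for wherever sarnet.zip is extracted):

    from detectron2.data import DatasetCatalog, MetadataCatalog
    from detectron2.data.datasets import register_coco_instances

    # Register the SaRNet splits under placeholder paths.
    register_coco_instances("sarnet_train", {}, "sarnet/train/annotations.json", "sarnet/train/images")
    register_coco_instances("sarnet_val", {}, "sarnet/val/annotations.json", "sarnet/val/images")

    print(MetadataCatalog.get("sarnet_train"))
    print(len(DatasetCatalog.get("sarnet_train")), "training records")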

    Getting hold of the Data

    Download the data here: sarnet.zip

    Or follow these steps

    # download the dataset
    wget https://michaeltpublic.s3.amazonaws.com/sarnet.zip
    
    # extract the files
    unzip sarnet.zip
    

    Note: with Roboflow, you can download the data here (original, raw images, with annotations): https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/ (download v1, original_raw-images). Download the dataset in COCO JSON format, or another format of choice, and import it to Roboflow after unzipping the folder to get started on your project.

    Getting started

    Get started with a Faster R-CNN model pretrained on SaRNet: SaRNet_Demo.ipynb

    Source Code for Paper

    Source code for the paper is located here: SaRNet_train_test.ipynb

    Cite this dataset

    @misc{thoreau2021sarnet,
       title={SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery}, 
       author={Michael Thoreau and Frazer Wilson},
       year={2021},
       eprint={2107.12469},
       archivePrefix={arXiv},
       primaryClass={eess.IV}
    }
    

    Acknowledgment

    The source data was generously provided by Planet Labs, Airbus Defence and Space, and Maxar Technologies.

  14. Annotations for ConfLab A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions In-the-Wild

    • data.4tu.nl
    Updated Jun 9, 2022
    Cite
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung (2022). Annotations for ConfLab A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions In-the-Wild [Dataset]. http://doi.org/10.4121/20017664.v2
    Explore at:
    Dataset updated
    Jun 9, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung
    License

    4TU.ResearchData restricted data licence: https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf

    Description

    This file contains the annotations for the ConfLab dataset, including actions (speaking status), pose, and F-formations.

    ------------------

    ./actions/speaking_status:

    ./processed: the processed speaking status files, aggregated into a single data frame per segment. Skipped rows in the raw data (see https://josedvq.github.io/covfee/docs/output for details) have been imputed using the code at: https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/speaking_status

    The processed annotations consist of:

    ./speaking: The first row contains person IDs matching the sensor IDs,

    The rest of the rows contain binary speaking status annotations at 60 fps for the corresponding 2-min video segment (7200 frames).

    ./confidence: Same as above. These annotations reflect the continuous-valued rating of confidence of the annotators in their speaking annotation.

    To load these files with pandas: pd.read_csv(p, index_col=False)


    ./raw-covfee.zip: the raw outputs from speaking status annotation for each of the eight annotated 2-min video segments. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)

    Annotations were done at 60 fps.

    --------------------

    ./pose:

    ./coco: the processed pose files in coco JSON format, aggregated into a single data frame per video segment. These files have been generated from the raw files using the code at: https://github.com/TUDelft-SPC-Lab/conflab-keypoints

    To load in Python: f = json.load(open('/path/to/cam2_vid3_seg1_coco.json')) (see the sketch at the end of this section)

    The skeleton structure (limbs) is contained within each file in:

    f['categories'][0]['skeleton']

    and keypoint names at:

    f['categories'][0]['keypoints']

    ./raw-covfee.zip: the raw outputs from continuous pose annotation. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)

    Annotations were done at 60 fps.
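
    A small sketch combining the loading snippets above (the file name is the example path given earlier; the 1-indexed skeleton convention is an assumption inherited from COCO):

    import json

    with open('/path/to/cam2_vid3_seg1_coco.json') as fh:
        f = json.load(fh)

    keypoints = f['categories'][0]['keypoints']  # keypoint names
    skeleton = f['categories'][0]['skeleton']    # limbs as pairs of keypoint indices

    print(len(keypoints), "keypoints,", len(skeleton), "limbs")
    for a, b in skeleton[:5]:
        # COCO skeletons are conventionally 1-indexed; adjust if this file differs.
        print(keypoints[a - 1], "--", keypoints[b - 1])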

    ---------------------

    ./f_formations:

    seg 2: 14:00 onwards, for videos of the form x2xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).

    seg 3: for videos of the form x3xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).

    Note that camera 10 doesn't include meaningful subject information/body parts that are not already covered in camera 8.

    First column: time stamp

    Second column: "()" delineates groups, "<>" delineates subjects, cam X indicates the best camera view for which a particular group exists.


    phone.csv: time stamp (pertaining to seg3), corresponding group, ID of person using the phone

  15. Annotations for ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild

    • figshare.com
    Updated Oct 10, 2022
    Cite
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung (2022). Annotations for ConfLab: A Data Collection Concept, Dataset, and Benchmark for Machine Analysis of Free-Standing Social Interactions in the Wild [Dataset]. http://doi.org/10.4121/20017664.v3
    Explore at:
    Dataset updated
    Oct 10, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung
    License

    4TU.ResearchData restricted data licence: https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf

    Description

    This file contains the annotations for the ConfLab dataset, including actions (speaking status), pose, and F-formations.

    ./actions/speaking_status:

    ./processed: the processed speaking status files, aggregated into a single data frame per segment. Skipped rows in the raw data (see https://josedvq.github.io/covfee/docs/output for details) have been imputed using the code at: https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/speaking_status

    The processed annotations consist of:

    ./speaking: The first row contains person IDs matching the sensor IDs. The rest of the rows contain binary speaking status annotations at 60 fps for the corresponding 2-min video segment (7200 frames).

    ./confidence: Same as above. These annotations reflect the continuous-valued rating of confidence of the annotators in their speaking annotation.

    To load these files with pandas: pd.read_csv(p, index_col=False)

    ./raw-covfee.zip: the raw outputs from speaking status annotation for each of the eight annotated 2-min video segments. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)

    Annotations were done at 60 fps.

    ./pose:

    ./coco: the processed pose files in COCO JSON format, aggregated into a single data frame per video segment. These files have been generated from the raw files using the code at: https://github.com/TUDelft-SPC-Lab/conflab-keypoints

    To load in Python: f = json.load(open('/path/to/cam2_vid3_seg1_coco.json'))

    The skeleton structure (limbs) is contained within each file in f['categories'][0]['skeleton'], and keypoint names at f['categories'][0]['keypoints'].

    ./raw-covfee.zip: the raw outputs from continuous pose annotation. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)

    Annotations were done at 60 fps.

    ./f_formations:

    seg 2: 14:00 onwards, for videos of the form x2xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).

    seg 3: for videos of the form x3xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).

    Note that camera 10 doesn't include meaningful subject information/body parts that are not already covered in camera 8.

    First column: time stamp. Second column: "()" delineates groups, "<>" delineates subjects, cam X indicates the best camera view for which a particular group exists.

    phone.csv: time stamp (pertaining to seg3), corresponding group, ID of person using the phone

  16. TexBiG

    • anthology.aicmu.ac.cn
    • webis.de
    6885143
    Updated 2022
    + more versions
    Cite
    Volker Rodehorst; Benno Stein (2022). TexBiG [Dataset]. http://doi.org/10.5281/zenodo.6885143
    Explore at:
    6885143
    Dataset updated
    2022
    Dataset provided by
    Bauhaus-Universität Weimar
    The Web Technology & Information Systems Network
    Authors
    Volker Rodehorst; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TexBiG (from the German Text-Bild-Gefüge, meaning text-image-structure) is a document layout analysis dataset for historical documents of the late 19th and early 20th century. The dataset provides instance segmentation annotations (bounding boxes and polygons/masks) for 19 different classes with more than 52,000 instances. Annotations were created manually by experts and evaluated with Krippendorff's alpha; for each document image, at least two different annotators labeled the document. The dataset uses the common COCO JSON format.

  17. Esefjorden Marine Vegetation Segmentation Dataset (EMVSD)

    • figshare.com
    bin
    Updated Dec 9, 2024
    Cite
    Bjørn Christian Weinbach (2024). Esefjorden Marine Vegetation Segmentation Dataset (EMVSD) [Dataset]. http://doi.org/10.6084/m9.figshare.24072606.v4
    Explore at:
    bin
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Bjørn Christian Weinbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Esefjorden Marine Vegetation Segmentation Dataset (EMVSD): Comprising 17,000 meticulously labeled images, this dataset is suited for instance segmentation tasks and represents a significant leap forward for marine research in the region. The images are stored in YOLO and COCO formats, ensuring compatibility with widely recognized and adopted object detection frameworks. Our decision to make this dataset publicly accessible underscores our commitment to collaborative research and the advancement of the broader scientific community.

    Dataset Structure:
    • Images: organized into three subsets (train, val, test) located under the images/ directory. Each subset contains high-resolution images optimized for object detection and segmentation tasks.
    • Annotations: available in YOLO txt and COCO formats for compatibility with major object detection frameworks; organized into three subsets (train, val, test) located under the labels/ directory. Additional metadata: counts.txt (summary of label distributions) and cache files (train.cache, val.cache, test.cache) for efficient dataset loading.
    • Metadata: classes.txt (definitions for all annotated classes in the dataset) and detailed COCO-format annotations in train_annotations.json, val_annotations.json, and test_annotations.json.
    • Configuration file: EMVSD.yaml, for seamless integration with machine learning libraries.

    Example Directory Structure:

    EMVSD/
    ├── images/
    │   ├── train/
    │   ├── val/
    │   └── test/
    ├── labels/
    │   ├── train/
    │   ├── val/
    │   ├── test/
    │   ├── counts.txt
    │   ├── train.cache
    │   ├── val.cache
    │   └── test.cache
    ├── classes.txt
    ├── train_annotations.json
    ├── val_annotations.json
    ├── test_annotations.json
    └── EMVSD.yaml
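
    As a rough illustration of how the pieces above fit together, a sketch that reads classes.txt and one of the COCO annotation files (assuming the directory structure shown above):

    import json
    from pathlib import Path

    root = Path("EMVSD")
    classes = root.joinpath("classes.txt").read_text().splitlines()
    coco = json.loads(root.joinpath("train_annotations.json").read_text())

    print(len(classes), "classes:", classes[:5], "...")
    print(len(coco["images"]), "train images,", len(coco["annotations"]), "instance annotations")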

  18. The Object Detection for Olfactory References (ODOR) Dataset

    • data.europa.eu
    unknown
    Updated Oct 19, 2023
    + more versions
    Cite
    Zenodo (2023). The Object Detection for Olfactory References (ODOR) Dataset [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-10027116?locale=de
    Explore at:
    unknown (3926)
    Dataset updated
    Oct 19, 2023
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Real-world applications of computer vision in the humanities require algorithms to be robust against artistic abstraction, peripheral objects, and subtle differences between fine-grained target classes. Existing datasets provide instance-level annotations on artworks but are generally biased towards the image centre and limited with regard to detailed object classes. The ODOR dataset fills this gap, offering 38,116 object-level annotations across 4,712 images, spanning an extensive set of 139 fine-grained categories. It has challenging dataset properties, such as a detailed set of categories, dense and overlapping objects, and spatial distribution over the whole image canvas. Inspiring further research on artwork object detection and broader visual cultural heritage studies, the dataset challenges researchers to explore the intersection of object recognition and smell perception.

    How to use

    To download the dataset images, run the download_imgs.py script in the subfolder. The images will be downloaded to the imgs folder. The annotations are provided in COCO JSON format. To represent the two-level hierarchy of the object classes, we make use of the supercategory field in the categories array as defined by COCO. In addition to the object-level annotations, we provide an additional CSV file with image-level metadata, which includes content-related fields, such as Iconclass codes or image descriptions, as well as formal annotations, such as artist, license, or creation year.

    For the sake of license compliance, we do not publish the images directly (although most of the images are public domain). Instead, we provide links to their source collections in the metadata file (meta.csv) and a python script to download the artwork images (download_images.py). The mapping between the images array of the annotations.json and the metadata.csv file can be accomplished via the file_name attribute of the elements of the images array and the unique File Name column of the metadata.csv file, respectively.
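
    A minimal sketch of that mapping (the description refers to the metadata file as both meta.csv and metadata.csv; adjust the file name to whichever is shipped):

    import json
    import pandas as pd

    ann = json.load(open("annotations.json"))
    meta = pd.read_csv("meta.csv")  # image-level metadata table

    images = pd.DataFrame(ann["images"])
    # Join object-level image records to image-level metadata via the file name columns.
    joined = images.merge(meta, left_on="file_name", right_on="File Name", how="left")
    print(joined.head())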

  19. Odeuropa Dataset of Smell-Related Objects

    • data.europa.eu
    unknown
    Updated Jul 3, 2025
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Zenodo (2025). Odeuropa Dataset of Smell-Related Objects [Dataset]. https://data.europa.eu/data/datasets/oai-zenodo-org-6362952?locale=el
    Explore at:
    unknown (859886)
    Dataset updated
    Jul 3, 2025
    Dataset authored and provided by
    Zenodo (http://zenodo.org/)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Odeuropa Dataset of Olfactory Objects

    This dataset is released as part of the Odeuropa project. The annotations are identical to the training set of the ICPR2022-ODOR Challenge. It contains bounding box annotations for smell-active objects in historical artworks gathered from various digital collections. The smell-active objects annotated in the dataset either carry smells themselves or hint at the presence of smells. The dataset provides 15823 bounding boxes on 2192 artworks in 87 object categories. An additional csv file contains further image-level metadata such as artist, collection, or year of creation.

    How to use

    Due to licensing issues, we cannot provide the images directly, but instead provide a collection of links and a download script. To get the images, just run the download_imgs.py script, which loads the images using the links from the metadata.csv file. The downloaded images can then be found in the images subfolder. The bounding box annotations can be found in the annotations.json. The annotations follow the COCO JSON format; the definition is available here. The mapping between the images array of the annotations.json and the metadata.csv file can be accomplished via the file_name attribute of the elements of the images array and the unique File Name column of the metadata.csv file, respectively. Additional image-level metadata is available in the metadata.csv file.

  20. Songdo Vision: Vehicle Annotations from High-Altitude BeV Drone Imagery in a Smart City

    • zenodo.org
    • data-staging.niaid.nih.gov
    bin, txt, zip
    Updated Sep 10, 2025
    Cite
    Robert Fonod; Robert Fonod; Haechan Cho; Haechan Cho; Hwasoo Yeo; Hwasoo Yeo; Nikolas Geroliminis; Nikolas Geroliminis (2025). Songdo Vision: Vehicle Annotations from High-Altitude BeV Drone Imagery in a Smart City [Dataset]. http://doi.org/10.5281/zenodo.13828408
    Explore at:
    bin, txt, zip
    Dataset updated
    Sep 10, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Robert Fonod; Robert Fonod; Haechan Cho; Haechan Cho; Hwasoo Yeo; Hwasoo Yeo; Nikolas Geroliminis; Nikolas Geroliminis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Oct 4, 2022 - Oct 7, 2022
    Area covered
    Songdo-dong
    Description

    Overview

    The Songdo Vision dataset provides high-resolution (4K, 3840×2160 pixels) RGB images annotated with categorized axis-aligned bounding boxes (BBs) for vehicle detection from a high-altitude bird’s-eye view (BeV) perspective. Captured over Songdo International Business District, South Korea, this dataset consists of 5,419 annotated video frames, featuring approximately 300,000 vehicle instances categorized into four classes:

    • Car (including vans and light-duty vehicles)
    • Bus
    • Truck
    • Motorcycle

    This dataset can serve as a benchmark for aerial vehicle detection, supporting research and real-world applications in intelligent transportation systems, traffic monitoring, and aerial vision-based mobility analytics. It was developed in the context of a multi-drone experiment aimed at enhancing geo-referenced vehicle trajectory extraction.

    📌 Citation: If you use this dataset in your work, kindly acknowledge it by citing the following article:

    Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205.

    🔗 Related dataset: For precisely georeferenced vehicle trajectories extracted from the same large-scale multi-drone experiment, see Songdo Traffic: 10.5281/zenodo.13828384.

    Motivation

    Publicly available datasets for aerial vehicle detection often exhibit limitations such as:

    • Non-BeV perspectives with varying angles and distortions
    • Inconsistent annotation quality, with loose or missing bounding boxes
    • Lower-resolution imagery, reducing detection accuracy, particularly for smaller vehicles
    • Lack of annotation detail, especially for motorcycles in dense urban scenes with complex backgrounds

    To address these challenges, Songdo Vision provides high-quality human-annotated bounding boxes, with machine learning assistance used to enhance efficiency and consistency. This ensures accurate and reliable ground truth for training and evaluating detection models.

    Dataset Composition

    The dataset is randomly split into training (80%) and test (20%) subsets:

    Subset   Images   Car       Bus     Truck    Motorcycle   Total Vehicles
    Train    4,335    195,539   7,030   11,779   2,963        217,311
    Test     1,084    49,508    1,759   3,052    805          55,124

    A subset of 5,274 frames was randomly sampled from drone video sequences, while an additional 145 frames were carefully selected to represent challenging cases, such as motorcycles at pedestrian crossings, in bicycle lanes, near traffic light poles, and around other distinctive road markers where they may blend into the urban environment.

    Data Collection

    The dataset was collected as part of a collaborative multi-drone experiment conducted by KAIST and EPFL in Songdo, South Korea, from October 4–7, 2022.

    • A fleet of 10 drones monitored 20 busy intersections, executing advanced flight plans to optimize coverage.
    • 4K (3840×2160) RGB video footage was recorded at 29.97 FPS from altitudes of 140–150 meters.
    • Each drone flew 10 sessions per day, covering peak morning and afternoon periods.
    • The experiment resulted in 12TB of 4K raw video data.

    More details on the experimental setup and data processing pipeline are available in [1].

    Bounding Box Annotations & Formats

    Annotations were generated using a semi-automated object detection annotation process in Azure ML Studio, leveraging machine learning-assisted bounding box detection with human verification to ensure precision.

    Each annotated frame includes categorized, axis-aligned bounding boxes, stored in three widely-used formats:

    1. COCO JSON format

    • Single annotation file per dataset subset (i.e., one for training, one for testing).
    • Contains metadata such as image dimensions, bounding box coordinates, and class labels.
    • Example snippet:
    {
     "images": [{"id": 1, "file_name": "0001.jpg", "width": 3840, "height": 2160}],
     "annotations": [{"id": 1, "image_id": 1, "category_id": 2, "bbox": [500, 600, 200, 50], "area": 10000, "iscrowd": 0}],
     "categories": [
      {"id": 1, "name": "car"}, {"id": 2, "name": "bus"},
      {"id": 3, "name": "truck"}, {"id": 4, "name": "motorcycle"}
     ]
    }

    2. YOLO TXT format

    • One annotation file per image, with one line per box in the format: <class_id> <x_center> <y_center> <width> <height>
    • Bounding box values are normalized to [0,1], with the origin at the top-left corner.
    • Example snippet:
    0 0.52 0.63 0.10 0.05 # Car bounding box
    2 0.25 0.40 0.15 0.08 # Truck bounding box

    3. Pascal VOC XML format

    • One annotation file per image, structured in XML.
    • Contains image properties and absolute pixel coordinates for each bounding box.
    • Example snippet:
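    (The snippet did not survive extraction; the following is an illustrative annotation with hypothetical values, assuming the standard Pascal VOC layout.)

    <annotation>
     <filename>0001.jpg</filename>
     <size><width>3840</width><height>2160</height><depth>3</depth></size>
     <object>
      <name>car</name>
      <bndbox><xmin>500</xmin><ymin>600</ymin><xmax>700</xmax><ymax>650</ymax></bndbox>
     </object>
    </annotation>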

    File Structure

    The dataset is provided as two compressed archives:

    1. Training Data (train.zip, 12.91 GB)

    train/
    │── coco_annotations.json # COCO format
    │── images/
    │  ├── 0001.jpg
    │  ├── ...
    │── labels/
    │  ├── 0001.txt # YOLO format
    │  ├── 0001.xml # Pascal VOC format
    │  ├── ...

    2. Testing Data (test.zip, 3.22 GB)

    test/
    │── coco_annotations.json
    │── images/
    │  ├── 00027.jpg
    │  ├── ...
    │── labels/
    │  ├── 00027.txt
    │  ├── 00027.xml
    │  ├── ...

    Additional Files

    • README.md – Dataset documentation (this description)
    • LICENSE.txt – Creative Commons Attribution 4.0 License
    • names.txt – Class names (one per line)
    • data.yaml – Example YOLO configuration file for training/testing

    Acknowledgments

    In addition to the funding sources listed in the metadata, the creators express their gratitude to Artem Vasilev for his dedicated efforts in data annotation. We also thank the research teams of Prof. Simon Oh (Korea University) and Prof. Minju Park (Hannam University) for their assistance during the data collection campaign, including the provision of drone equipment and student support.

    Citation & Attribution

    Preferred Citation: If you use Songdo Vision for any purpose, whether academic research, commercial applications, open-source projects, or benchmarking efforts, please cite our accompanying article [1]:

    Robert Fonod, Haechan Cho, Hwasoo Yeo, Nikolas Geroliminis (2025). Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery, Transportation Research Part C: Emerging Technologies, vol. 178, 105205. DOI: 10.1016/j.trc.2025.105205

    BibTeX entry:

    @article{fonod2025advanced,
     title = {Advanced computer vision for extracting georeferenced vehicle trajectories from drone imagery},
     author = {Fonod, Robert and Cho, Haechan and Yeo, Hwasoo and Geroliminis, Nikolas},
     journal = {Transportation Research Part C: Emerging Technologies},
     volume = {178},
     pages = {105205},
     year = {2025},
     doi = {10.1016/j.trc.2025.105205}
    }
