This dataset was created by Ari
The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations including object categories, keypoints, and more, making it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.
While COCO is often described as comprising over 300k images, it is important to understand that this figure spans all annotation types (keypoints and others); the labeled dataset for object detection contains 123,272 images.
The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. Note, however, that COCO has not released its test set annotations, so the test data does not come with labels and is therefore not included in this dataset.
The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides annotated very-high-resolution satellite RGB images extracted from Google Earth to train deep learning models to perform instance segmentation of Juniperus communis L. and Juniperus sabina L. shrubs. All images are from the high mountain of Sierra Nevada in Spain. The dataset contains 810 images (.jpg) of size 224x224 pixels. We also provide partitioning of the data into Train (567 images), Test (162 images), and Validation (81 images) subsets. Their annotations are provided in three different .json files following the COCO annotation format.
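If helpful, the COCO-format annotation files can be inspected with the pycocotools API. A minimal sketch (the file name below is illustrative; substitute the actual .json provided for each split):

from pycocotools.coco import COCO

# Load one of the provided COCO-format annotation files (illustrative path).
coco = COCO("annotations_train.json")

img_ids = coco.getImgIds()                        # all 224x224 image entries in this split
first_img = coco.loadImgs(img_ids[0])[0]
ann_ids = coco.getAnnIds(imgIds=first_img["id"])
anns = coco.loadAnns(ann_ids)                     # instance annotations for the Juniperus shrubs
print(first_img["file_name"], len(anns), "annotated shrubs")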
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
More information about the project here.
Description: This repository includes the trained weights and a custom COCO-formatted dataset used for developing and testing a Faster R-CNN R_50_FPN_3x object detector, specifically designed to identify particles in micro-FTIR filter images.
Contents:
- Weights File: neuralNetWeights_V3.pth
- Custom COCO Dataset: uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip
Applications: The provided weights and dataset are intended for researchers and practitioners in the field of microscopy and particle detection. The dataset and model can be used for further training, validation, and fine-tuning of object detection models in similar domains.
Usage Notes:
- The neuralNetWeights_V3.pth file should be loaded into a PyTorch model compatible with the Faster R-CNN architecture, such as Detectron2 (a minimal loading sketch follows below).
- The uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip archive should be extracted and can be used with any COCO-compatible object detection framework for training and evaluation purposes.
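As a hedged sketch (not the authors' exact setup), the weights can be loaded into a Detectron2 Faster R-CNN R_50_FPN_3x configuration roughly as follows; the number of classes, score threshold, and image path are assumptions:

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1           # assumption: a single "particle" class
cfg.MODEL.WEIGHTS = "neuralNetWeights_V3.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # assumption: detection threshold
predictor = DefaultPredictor(cfg)

image = cv2.imread("filter_image.png")        # illustrative micro-FTIR filter image
outputs = predictor(image)
print(outputs["instances"].pred_boxes)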
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TACO is a growing image dataset of trash in the wild. It contains segmented images of litter taken in diverse environments: woods, roads, and beaches. These images are manually labeled according to a hierarchical taxonomy to train and evaluate object detection algorithms. Annotations are provided in a format similar to that of the COCO dataset.
[Figures: GIF of the model running inference (https://raw.githubusercontent.com/wiki/pedropro/TACO/images/teaser.gif); example images #2 and #5 from the dataset (https://raw.githubusercontent.com/wiki/pedropro/TACO/images/2.png, https://raw.githubusercontent.com/wiki/pedropro/TACO/images/5.png)]
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Esefjorden Marine Vegetation Segmentation Dataset (EMVSD): Comprising 17,000 meticulously labeled images, this dataset is suited for instance segmentation tasks and represents a significant leap forward for marine research in the region. The images are stored in YOLO and COCO formats, ensuring compatibility with widely recognized and adopted object detection frameworks. Our decision to make this dataset publicly accessible underscores our commitment to collaborative research and the advancement of the broader scientific community.
Dataset Structure:
- Images: organized into three subsets (train, val, and test) located under the images/ directory. Each subset contains high-resolution images optimized for object detection and segmentation tasks.
- Annotations: available in YOLO txt and COCO formats for compatibility with major object detection frameworks, organized into the same three subsets (train, val, and test) under the labels/ directory.
- Additional metadata: counts.txt (summary of label distributions) and cache files (train.cache, val.cache, test.cache) for efficient dataset loading.
- Metadata: classes.txt (definitions for all annotated classes in the dataset) and detailed COCO-format annotations in train_annotations.json, val_annotations.json, and test_annotations.json.
- Configuration File: EMVSD.yaml, a configuration file for seamless integration with machine learning libraries.
Example Directory Structure:
EMVSD/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
├── labels/
│   ├── train/
│   ├── val/
│   ├── test/
│   ├── counts.txt
│   ├── train.cache
│   ├── val.cache
│   └── test.cache
├── classes.txt
├── train_annotations.json
├── val_annotations.json
├── test_annotations.json
└── EMVSD.yaml
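For the YOLO-format copy of the labels, a minimal training sketch with the Ultralytics package (assuming EMVSD.yaml follows the Ultralytics data-config layout; the checkpoint and hyperparameters below are illustrative, not the authors' recipe):

from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")                   # illustrative segmentation checkpoint
model.train(data="EMVSD/EMVSD.yaml", epochs=100, imgsz=640)
metrics = model.val()                            # evaluate on the val split defined in the yaml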
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Real-world dataset of ~400 images of cuboid-shaped parcels with full 2D and 3D annotations in the COCO format.
Relevant computer vision tasks:
For details, see our paper and project page.
If you use this resource for scientific research, please consider citing:
@inproceedings{naumannScrapeCutPasteLearn2022,
title = {Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics},
author = {Naumann, Alexander and Hertlein, Felix and Zhou, Benchun and Dörr, Laura and Furmans, Kai},
booktitle = {{{IEEE Conference}} on {{Machine Learning}} and Applications ({{ICMLA}})},
date = 2022
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of five subsets with annotated images in COCO format, designed for object detection and tracking plant growth:
1. Cucumber_Train Dataset (for Faster R-CNN)
Includes training, validation, and test images of cucumbers from different angles.
Annotations: Bounding boxes in COCO format for object detection tasks.
Annotations: Bounding boxes in COCO format.
Pepper Dataset
Contains images of pepper plants captured at hourly intervals over 24 hours from a fixed angle.
Annotations: Bounding boxes in COCO format.
Cannabis Dataset
Contains images of cannabis plants captured at hourly intervals over 24 hours from a fixed angle.
Annotations: Bounding boxes in COCO format.
Cucumber Dataset
Contains images of cucumber plants captured at hourly intervals over 24 hours from a fixed angle.
Annotations: Bounding boxes in COCO format.
This dataset supports training and evaluation of object detection models across diverse crops.
TACO is a growing image dataset of waste in the wild. It contains images of litter taken under diverse environments: woods, roads and beaches. These images are manually labelled and segmented according to a hierarchical taxonomy to train and evaluate object detection algorithms. The annotations are provided in COCO format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is Part 1/2 of the ActiveHuman dataset! Part 2 can be found here.
Dataset Description
ActiveHuman was generated using Unity's Perception package.
It consists of 175428 RGB images and their semantic segmentation counterparts taken at different environments, lighting conditions, camera distances and angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1m-4m) and 36 camera angles (0-360 at 10-degree intervals).
The dataset does not include images at every single combination of available camera distances and angles, since for some values the camera would collide with another object or go outside the confines of an environment. As a result, some combinations of camera distances and angles do not exist in the dataset.
Alongside each image, 2D Bounding Box, 3D Bounding Box and Keypoint ground truth annotations are also generated via the use of Labelers and are stored as a JSON-based dataset. These Labelers are scripts that are responsible for capturing ground truth annotations for each captured image or frame. Keypoint annotations follow the COCO format defined by the COCO keypoint annotation template offered in the perception package.
Folder configuration
The dataset consists of 3 folders:
Essential Terminology
Dataset Data
The dataset includes four types of JSON annotation files:
Most Labelers generate different annotation specifications in the spec key-value pair:
Each Labeler generates different annotation specifications in the values key-value pair:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is a collection of raw and annotated Multispectral (MS) images acquired in a heterogeneous agricultural environment with a MicaSense RedEdge-M camera. The spectra, particularly Green, Blue, Red, Red Edge, and Near Infrared (NIR), were acquired at the sub-metre level. The MS images were labelled manually using VIA and automatically using Grounding DINO in combination with the Segment Anything Model. The segmentation masks obtained using these two annotation techniques, as well as the source code to perform the necessary image processing operations, are provided in the repository. The images are focused on Horseradish (Raphanus raphanistrum) infestations in Triticum aestivum (wheat) crops.
The nomenclature for sequencing and naming images and annotations follows this format: IMG_
This dataset, 'RafanoSet', is organized into 6 directories, namely 'Raw Images', 'Manual Annotations', 'Automated Annotations', 'Binary Masks - Manual', 'Binary Masks - Automated' and 'Codes'. The sub-directory 'Raw Images' consists of 85 manually acquired images in .PNG format over 17 different scenes. The sub-directory 'Manual Annotations' consists of the annotation file 'region_data' in COCO segmentation format. The sub-directory 'Automated Annotations' consists of 80 automatically annotated images in .JPG format and 80 .XML files in Pascal VOC annotation format.
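For reference, the Pascal VOC .XML files in 'Automated Annotations' can be read with the Python standard library; a minimal sketch (the file name below is illustrative):

import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    # Return (label, xmin, ymin, xmax, ymax) tuples from one Pascal VOC annotation file.
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        boxes.append((obj.findtext("name"),
                      int(float(bb.findtext("xmin"))), int(float(bb.findtext("ymin"))),
                      int(float(bb.findtext("xmax"))), int(float(bb.findtext("ymax")))))
    return boxes

print(read_voc_boxes("Automated Annotations/IMG_example.xml"))  # illustrative path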
The scientific framework of image acquisition and annotation is explained in the Data in Brief paper, which is in the course of peer review. This dataset is a prerequisite to the data article. Field experimentation roles:
The image acquisition was performed by Mariano Crimaldi, a researcher, on behalf of Department of Agriculture and the hosting institution University of Naples Federico II, Italy.
Shubham Rana has been the curator and analyst for the data under the supervision of his PhD supervisor, Prof. Salvatore Gerbino. They are affiliated with the Department of Engineering, University of Campania 'Luigi Vanvitelli'.
Domenico Barretta, Department of Engineering, has been involved in a consulting and brainstorming role, particularly with data validation, annotation management, and litmus testing of the datasets.
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
This file contains the annotations for the ConfLab dataset, including actions (speaking status), pose, and F-formations.
------------------
./actions/speaking_status:
./processed: the processed speaking status files, aggregated into a single data frame per segment. Skipped rows in the raw data (see https://josedvq.github.io/covfee/docs/output for details) have been imputed using the code at: https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/speaking_status
The processed annotations consist of:
./speaking: The first row contains person IDs matching the sensor IDs; the remaining rows contain binary speaking status annotations at 60 fps for the corresponding 2-min video segment (7200 frames).
./confidence: Same as above, except that these annotations reflect the annotators' continuous-valued confidence ratings for their speaking annotations.
To load these files with pandas: pd.read_csv(p, index_col=False)
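For example (the file name below is hypothetical; substitute an actual segment file from ./speaking or ./confidence):

import pandas as pd

p = "actions/speaking_status/processed/speaking/seg1.csv"  # hypothetical segment file
df = pd.read_csv(p, index_col=False)

# Columns are person IDs (matching the sensor IDs); each column holds the binary
# speaking status at 60 fps for the 2-min segment (7200 frames).
print(df.shape, list(df.columns)[:5])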
./raw.zip: the raw outputs from speaking status annotation for each of the eight annotated 2-min video segments. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)
Annotations were done at 60 fps.
--------------------
./pose:
./coco: the processed pose files in coco JSON format, aggregated into a single data frame per video segment. These files have been generated from the raw files using the code at: https://github.com/TUDelft-SPC-Lab/conflab-keypoints
To load in Python: f = json.load(open('/path/to/cam2_vid3_seg1_coco.json'))
The skeleton structure (limbs) is contained within each file in:
f['categories'][0]['skeleton']
and keypoint names at:
f['categories'][0]['keypoints']
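Putting the snippets above together (using the example file name from above):

import json

with open("/path/to/cam2_vid3_seg1_coco.json") as fp:
    f = json.load(fp)

keypoint_names = f['categories'][0]['keypoints']   # keypoint names
skeleton = f['categories'][0]['skeleton']          # limb connectivity as keypoint index pairs
print(len(keypoint_names), "keypoints;", len(skeleton), "limbs")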
./raw.zip: the raw outputs from continuous pose annotation. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)
Annotations were done at 60 fps.
---------------------
./f_formations:
seg 2: 14:00 onwards, for videos of the form x2xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).
seg 3: for videos of the form x3xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).
Note that camera 10 doesn't include meaningful subject information/body parts that are not already covered in camera 8.
First column: time stamp
Second column: "()" delineates groups, "<>" delineates subjects, cam X indicates the best camera view for which a particular group exists.
phone.csv: time stamp (pertaining to seg3), corresponding group, ID of person using the phone
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Cell microscopy images with cell and nucleus segmentations in COCO annotation format
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above the mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories --- person, boat, lifebuoy, surfboard, wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.
In this repository, we provide:
66 Full HD video clips (total size: 5.5 GB)
126,170 images extracted from the videos at a rate of 30 FPS (total size: 243 GB)
3 annotation files for the extracted images that follow the MS COCO data format (for more info see https://cocodataset.org/#format-data):
annotations_5_custom_classes.json: this file contains annotations concerning all five categories; please note that class ids do not correspond with the ones provided by the MS COCO standard since we account for two new classes not previously considered in the MS COCO dataset --- lifebuoy and wood
annotations_3_coco_classes.json: this file contains annotations concerning the three classes also accounted for by the MS COCO dataset --- person, boat, surfboard. Class ids correspond with the ones provided by the MS COCO standard.
annotations_person_coco_classes.json: this file contains annotations concerning only the 'person' class. Class id corresponds to the one provided by the MS COCO standard.
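As a small usage sketch, the per-category box counts can be checked with the standard json module (the file name below is one of the three listed above):

import json
from collections import Counter

with open("annotations_5_custom_classes.json") as fp:
    data = json.load(fp)

id_to_name = {c["id"]: c["name"] for c in data["categories"]}
counts = Counter(id_to_name[a["category_id"]] for a in data["annotations"])
print(counts)  # e.g. how many of the >180K boxes belong to the 'person' category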
The MOBDrone dataset is intended as a test data benchmark. However, for researchers interested in using our data also for training purposes, we provide training and test splits:
More details about data generation and the evaluation protocol can be found at our MOBDrone paper: https://arxiv.org/abs/2203.07973
The code to reproduce our results is available at this GitHub Repository: https://github.com/ciampluca/MOBDrone_eval
See also http://aimh.isti.cnr.it/dataset/MOBDrone
Citing the MOBDrone
The MOBDrone is released under a Creative Commons Attribution license, so please cite the MOBDrone if it is used in your work in any form.
Published academic papers should use the academic paper citation for our MOBDrone paper, in which we evaluated several pre-trained state-of-the-art object detectors, focusing on the detection of people overboard:
@inproceedings{MOBDrone2021, title={MOBDrone: a Drone Video Dataset for Man OverBoard Rescue}, author={Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi}, booktitle={ICIAP2021: 21st International Conference on Image Analysis and Processing}, year={2021} }
and this Zenodo Dataset
@dataset{donato_cafarelli_2022_5996890, author={Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi}, title = {{MOBDrone: a large-scale drone-view dataset for man overboard detection}}, month = feb, year = 2022, publisher = {Zenodo}, version = {1.0.0}, doi = {10.5281/zenodo.5996890}, url = {https://doi.org/10.5281/zenodo.5996890} }
Personal works, such as machine learning projects/blog posts, should provide a URL to the MOBDrone Zenodo page (https://doi.org/10.5281/zenodo.5996890), though a reference to our MOBDrone paper would also be appreciated.
Contact Information
If you would like further information about the MOBDrone or if you experience any issues downloading files, please contact us at mobdrone[at]isti.cnr.it
Acknowledgements
This work was partially supported by NAUSICAA - "NAUtical Safety by means of Integrated Computer-Assistance Appliances 4.0" project funded by the Tuscany region (CUP D44E20003410009). The data collection was carried out with the collaboration of the Fly&Sense Service of the CNR of Pisa - for the flight operations of remotely piloted aerial systems - and of the Institute of Clinical Physiology (IFC) of the CNR - for the water immersion operations.
Description:
This dataset has been specifically curated for cow pose estimation, designed to enhance animal behavior analysis and monitoring through computer vision techniques. The dataset is annotated with 12 keypoints on the cow’s body, enabling precise tracking of body movements and posture. It is structured in the COCO format, making it compatible with popular deep learning models like YOLOv8, OpenPose, and others designed for object detection and keypoint estimation tasks.
Applications:
This dataset is ideal for agricultural tech solutions, veterinary care, and animal behavior research. It can be used in various use cases such as health monitoring, activity tracking, and early disease detection in cattle. Accurate pose estimation can also assist in optimizing livestock management by understanding animal movement patterns and detecting anomalies in their gait or behavior.
Keypoint Annotations:
The dataset includes the following 12 keypoints, strategically marked to represent significant anatomical features of cows:
Nose: Essential for head orientation and overall movement tracking.
Right Eye: Helps in head pose estimation.
Left Eye: Complements the right eye for accurate head direction.
Neck (side): Marks the side of the neck, key for understanding head and body coordination.
Left Front Hoof: Tracks the front left leg movement.
Right Front Hoof: Tracks the front right leg movement.
Left Back Hoof: Important for understanding rear leg motion.
Right Back Hoof: Completes the leg movement tracking for both sides.
Backbone (side): Vital for posture and overall body orientation analysis.
Tail Root: Used for tracking tail movements and posture shifts.
Backpose Center (near tail’s midpoint): Marks the midpoint of the back, crucial for body stability and movement analysis.
Stomach (center of side pose): Helps in identifying body alignment and weight distribution.
Dataset Format:
The data is structured in the COCO format, with annotations that include image coordinates for each keypoint. This format is highly suitable for integration into popular deep learning frameworks. Additionally, the dataset includes metadata like bounding boxes, image sizes, and segmentation masks to provide detailed context for each cow in an image.
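As an illustrative sketch (the annotation file name is an assumption), the 12 keypoints of a cow can be read from the COCO-format JSON, where they are stored as flat [x, y, visibility] triplets:

import json

with open("annotations.json") as fp:   # assumed file name
    coco = json.load(fp)

names = coco["categories"][0]["keypoints"]   # the 12 keypoint names
kps = coco["annotations"][0]["keypoints"]    # flat list: x1, y1, v1, x2, y2, v2, ...
triplets = [kps[i:i + 3] for i in range(0, len(kps), 3)]
for name, (x, y, v) in zip(names, triplets):
    print(f"{name}: ({x}, {y}) visibility={v}")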
Compatibility:
This dataset is optimized for use with cutting-edge pose estimation models such as YOLOv8 and other keypoint detection models like DeepLabCut and HRNet, enabling efficient training and inference for cow pose tracking. It can be seamlessly integrated into existing machine learning pipelines for both real-time and post-processed analysis.
This dataset is sourced from Kaggle.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Abstract
This paper presents SubPipe, an underwater dataset for SLAM, object detection, and image segmentation. SubPipe has been recorded using a lightweight autonomous underwater vehicle (LAUV), operated by OceanScan MST, and carrying a sensor suite including two cameras, a side-scan sonar, and an inertial navigation system, among other sensors. The AUV has been deployed in a pipeline inspection environment with a submarine pipe partially covered by sand. The AUV's pose ground truth is estimated from the navigation sensors. The side-scan sonar and RGB images include object detection and segmentation annotations, respectively. State-of-the-art segmentation, object detection, and SLAM methods are benchmarked on SubPipe to demonstrate the dataset's challenges and opportunities for leveraging computer vision algorithms. To the authors' knowledge, this is the first annotated underwater dataset providing a real pipeline inspection scenario. The dataset and experiments are publicly available online.
On Zenodo we provide three versions of SubPipe. One is the full version (SubPipe.zip, ~80 GB unzipped) and two are subsamples: SubPipeMini.zip (~12 GB unzipped) and SubPipeMini2.zip (~16 GB unzipped). Both subsamples are only parts of the entire dataset (SubPipe.zip). SubPipeMini is a subset containing semantic segmentation data, and it has interesting camera data of the underwater pipeline. On the other hand, SubPipeMini2 is mainly focused on underwater side-scan sonar images of the seabed, including ground-truth object detection bounding boxes of the pipeline.
For (re-)using/publishing SubPipe, please include the following copyright text:
SubPipe is a public dataset of a submarine outfall pipeline, property of Oceanscan-MST. This dataset was acquired with a Light Autonomous Underwater Vehicle by Oceanscan-MST, within the scope of Challenge Camp 1 of the H2020 REMARO project.
More information about OceanScan-MST can be found at this link.
Cam0 — GoPro Hero 10
Camera parameters:
Resolution: 1520×2704
fx = 1612.36
fy = 1622.56
cx = 1365.43
cy = 741.27
k1,k2, p1, p2 = [−0.247, 0.0869, −0.006, 0.001]
Side-scan Sonars
Each sonar image was created after 20 pings (i.e., after every 20 new lines), which corresponds to approximately 1 image per second.
Regarding the object detection annotations, we provide both COCO and YOLO formats for each annotation. A single COCO annotation file is provided per chunk and per frequency (low frequency vs. high frequency), whereas YOLO annotations are provided for each SSS image file.
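For reference, the relationship between the two box formats is a simple coordinate conversion; a minimal sketch (not the conversion script used by the authors):

def coco_bbox_to_yolo(bbox, img_w, img_h):
    # COCO stores [x_min, y_min, width, height] in pixels; YOLO stores
    # [x_center, y_center, width, height] normalized by the image size.
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]

# Example with the low-frequency image size listed below (2500 x 500):
print(coco_bbox_to_yolo([100.0, 200.0, 50.0, 80.0], img_w=2500, img_h=500))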
Metadata about the side-scan sonar images contained in this dataset:
Images for object detection
- Low frequency (LF): 5000 images, LF image size: 2500 × 500
- High frequency (HF): 5030 images, HF image size: 5000 × 500
- Total number of images: 10030
Annotations
- Low frequency (LF): 3163
- High frequency (HF): 3172
- Total number of annotations: 6335
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Transverse Cirrus Bands (TCB) Dataset
Dataset Overview
This dataset contains manually annotated satellite imagery of Transverse Cirrus Bands (TCBs), a type of cloud formation often associated with atmospheric turbulence. The dataset is formatted for object detection tasks using the YOLO and COCO annotation formats, making it suitable for training deep learning models for automated TCB detection.
Data Collection
Source: NASA-IMPACT Data Share Satellite Sensors:… See the full description on the dataset page: https://huggingface.co/datasets/viknesh1211/NASA_IMPACT_TCB_OBJ.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used for our paper "WormSwin: Instance Segmentation of C. elegans using Vision Transformer".
This publication is divided into three parts:
The CSB-1 Dataset consists of frames extracted from videos of Caenorhabditis elegans (C. elegans) annotated with binary masks. Each C. elegans is separately annotated, providing accurate annotations even for overlapping instances. All annotations are provided in binary mask format and as COCO Annotation JSON files (see COCO website).
The videos are named after the following pattern:
<"worm age in hours"_"mutation"_"irradiated (binary)"_"video index (zero based)">
For mutation the following values are possible:
An example video name would be 24_1_1_2, meaning the video shows 24 h old C. elegans with the csb-1 mutation which were irradiated (video index 2).
Video data was provided by M. Rieckher; Instance Segmentation Annotations were created under supervision of K. Bozek and M. Deserno.
The Synthetic Images Dataset was created by cutting out C. elegans (foreground objects) from the CSB-1 Dataset and placing them randomly on background images also taken from the CSB-1 Dataset. Foreground objects were flipped, rotated, and slightly blurred before being placed on the background images.
The same was done with the binary mask annotations taken from CSB-1 Dataset so that they match the foreground objects in the synthetic images. Additionally, we added rings of random color, size, thickness and position to the background images to simulate petri-dish edges.
This synthetic dataset was generated by M. Deserno.
The Mating Dataset (MD) consists of 450 grayscale image patches of 1,012 x 1,012 px showing C. elegans with high overlap, crawling on a petri-dish.
We took the patches from a 10 min. long video of size 3,036 x 3,036 px. The video was downsampled from 25 fps to 5 fps before selecting 50 random frames for annotating and patching.
Like the other datasets, worms were annotated with binary masks and annotations are provided as COCO Annotation JSON files.
The video data was provided by X.-L. Chu; Instance Segmentation Annotations were created under supervision of K. Bozek and M. Deserno.
Further details about the datasets can be found in our paper.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data abstract: The YogDATA dataset contains images from an industrial laboratory production line while it is operating to produce quality yogurts. The case study for the recognition of yogurt cups requires training Mask R-CNN and YOLO v5.0 models with a set of corresponding images. Thus, it is important to collect the corresponding images to train and evaluate the class. Specifically, the YogDATA dataset includes the same labeled data for Mask R-CNN (COCO format) and YOLO models. For the YOLO architecture, the training and validation datasets include sets of images in jpg format and their annotations in txt file format. For the Mask R-CNN architecture, the annotations of the same sets of images are included in json file format (80% of the images and annotations of each subset are in the training set and 20% of the images of each subset are in the test set).
Paper abstract: The explosion of the digitisation of the traditional industrial processes and procedures is consolidating a positive impact on modern society by offering a critical contribution to its economic development. In particular, the dairy sector consists of various processes, which are very demanding and thorough. It is crucial to leverage modern automation tools and through-engineering solutions to increase their efficiency and continuously meet challenging standards. Towards this end, in this work, an intelligent algorithm based on machine vision and artificial intelligence, which identifies dairy products within production lines, is presented. Furthermore, in order to train and validate the model, the YogDATA dataset was created that includes yogurt cups within a production line. Specifically, we evaluate two deep learning models (Mask R-CNN and YOLO v5.0) to recognise and detect each yogurt cup in a production line, in order to automate the packaging processes of the products. According to our results, the performance precision of the two models is similar, estimated at 99%.
An RGB-D dataset converted from SUN-RGBD into COCO-style instance segmentation format. To transform SUN-RGBD into an instance segmentation benchmark (i.e., SUN-RGBDIS), we employed a pipeline similar to that of NYUDv2-IS. We selected 17 categories from the original 37 classes, carefully omitting non-instance categories like ceilings and walls. Images lacking any identifiable object instances were filtered out to maintain dataset relevance for instance segmentation tasks. We systematically converted the segmentation annotations into COCO format, generating precise bounding boxes, instance masks, and object attributes.
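As a hedged sketch of this kind of conversion (not the exact pipeline used for SUN-RGBDIS), one binary instance mask can be turned into a COCO-style annotation entry with pycocotools:

import numpy as np
from pycocotools import mask as mask_utils

def instance_to_coco_ann(binary_mask, image_id, ann_id, category_id):
    # Encode the H x W binary mask as RLE and derive the bounding box and area.
    rle = mask_utils.encode(np.asfortranarray(binary_mask.astype(np.uint8)))
    rle["counts"] = rle["counts"].decode("ascii")   # keep the dict JSON-serializable
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": rle,
        "bbox": mask_utils.toBbox(rle).tolist(),    # [x, y, width, height]
        "area": float(mask_utils.area(rle)),
        "iscrowd": 0,
    }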
This dataset was created by Ari