Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the full 2017 COCO object detection dataset (train and validation), which is a subset of the most recent 2020 COCO object detection dataset.
COCO is a large-scale object detection, segmentation, and captioning dataset of many object types easily recognizable by a 4-year-old. The data was originally collected and published by Microsoft; see the COCO website for the original source of the data and the paper introducing the dataset.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
I wanted to train a custom YOLO object detection model, but the MS-COCO dataset was not in a suitable format, so I parsed the instances JSON files in the MS-COCO annotations and processed the dataset into a YOLO-friendly format.
I downloaded the dataset from the COCO website; you can download any split you need there.
Directory info:
1. test: contains only the test images.
2. train: has two subfolders: images (the training images) and labels (one .txt label file per training image).
3. val: has two subfolders: images (the validation images) and labels (one .txt label file per validation image).
I do not own the dataset in any way; I merely parsed it into a ready-to-train YOLO format. Download the original dataset from the COCO website.
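As a rough illustration of the parsing described above, here is a minimal sketch of converting COCO instance annotations to YOLO label files; the file paths are hypothetical and the details of the original conversion may differ:

import json
from pathlib import Path

# Hypothetical paths; point these at the downloaded COCO annotations.
ANNOTATIONS = Path("annotations/instances_train2017.json")
LABELS_DIR = Path("train/labels")
LABELS_DIR.mkdir(parents=True, exist_ok=True)

coco = json.loads(ANNOTATIONS.read_text())
images = {img["id"]: img for img in coco["images"]}
# Map sparse COCO category ids (1-90) to contiguous YOLO class ids (0-79).
class_of = {cid: i for i, cid in enumerate(sorted(c["id"] for c in coco["categories"]))}

for ann in coco["annotations"]:
    img = images[ann["image_id"]]
    w, h = img["width"], img["height"]
    x, y, bw, bh = ann["bbox"]  # COCO boxes: top-left x, y, width, height in pixels
    # YOLO format: class cx cy w h, all normalized to [0, 1]
    line = (f"{class_of[ann['category_id']]} "
            f"{(x + bw / 2) / w:.6f} {(y + bh / 2) / h:.6f} "
            f"{bw / w:.6f} {bh / h:.6f}\n")
    with (LABELS_DIR / (Path(img["file_name"]).stem + ".txt")).open("a") as f:
        f.write(line)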
COCO is a large-scale object detection, segmentation, and captioning dataset.
Note:
* Some images from the train and validation sets don't have annotations.
* COCO 2014 and 2017 use the same images, but different train/val/test splits.
* The test split doesn't have any annotations (only images).
* COCO defines 91 classes, but the data only uses 80 of them.
* Panoptic annotations define 200 classes but only use 133.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('coco', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
![Visualization](https://storage.googleapis.com/tfds-data/visualization/fig/coco-2014-1.1.0.png)
This dataset contains all COCO 2017 images and annotations, split into training (118,287 images) and validation (5,000 images).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Load COCO 2017 dataset: loads any dataset in COCO format into Ikomia format. Any training algorithm from the Ikomia marketplace can then be connected to this converter...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The COCO dataset is a large collection of labeled images and annotations, and a popular dataset for machine learning and artificial intelligence research. It consists of 330,000 images and 500,000 object annotations. The annotations include the bounding boxes of objects in the images, as well as the labels of the objects.
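For reference, a single object annotation in COCO's JSON format looks roughly like the following; the field names follow the COCO format, while the concrete values are made up for illustration:

# One COCO object annotation (illustrative values)
annotation = {
    "id": 1,
    "image_id": 397133,
    "category_id": 18,                    # category ids are sparse (1-90)
    "bbox": [102.5, 48.0, 210.0, 320.0],  # [x, y, width, height] in pixels
    "area": 67200.0,
    "iscrowd": 0,
}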
COCO Object Detection Dataset | 2017
Downloaded from the COCO website; it includes the train images for now.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset is a subset of the COCO 2017 train images, filtered to the "crowd" and "person" labels, with the first caption of each image.
COCO Summary: The COCO dataset is a comprehensive collection designed for object detection, segmentation, and captioning tasks. It comprises over 200,000 images, encompassing a diverse array of everyday scenes and objects. Each image features multiple objects and scenes across 80 distinct object categories, all of which are annotated with descriptive image captions.
COCO minitrain is a curated mini training set (25K images, roughly 20% of train2017) for COCO.
@inproceedings{HoughNet,
author = {Nermin Samet and Samet Hicsonmez and Emre Akbas},
title = {HoughNet: Integrating near and long-range evidence for bottom-up object detection},
booktitle = {European Conference on Computer Vision (ECCV)},
year = {2020},
}
GNU AGPL-3.0: http://www.gnu.org/licenses/agpl-3.0.html
Ultralytics COCO8 is a small but versatile object detection dataset composed of the first 8 images of the COCO train 2017 set: 4 for training and 4 for validation. This dataset is ideal for testing and debugging object detection models, or for experimenting with new detection approaches. With 8 images, it is small enough to be easily manageable, yet diverse enough to exercise training pipelines for errors and act as a sanity check before training on larger datasets.
To train a YOLOv8n model on the COCO8 dataset for 100 epochs with an image size of 640, you can use the following code snippets. For a comprehensive list of available arguments, refer to the model Training page.
# Start training from a pretrained *.pt model
yolo detect train data=coco8.yaml model=yolov8n.pt epochs=100 imgsz=640
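The same run can also be launched from Python; a minimal sketch using the ultralytics package (assuming it is installed):

from ultralytics import YOLO

# Start from a pretrained checkpoint and train on COCO8.
model = YOLO("yolov8n.pt")
model.train(data="coco8.yaml", epochs=100, imgsz=640)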
A collection of 3 referring expression datasets based on images in the COCO dataset. A referring expression is a piece of text that describes a unique object in an image. These datasets were collected by asking human raters to disambiguate objects delineated by bounding boxes in the COCO dataset.
RefCoco and RefCoco+ are from Kazemzadeh et al. 2014. RefCoco+ expressions are strictly appearance-based descriptions, which they enforced by preventing raters from using location-based descriptions (e.g., "person to the right" is not a valid description for RefCoco+). RefCocoG is from Mao et al. 2016, and has richer descriptions of objects compared to RefCoco due to differences in the annotation process. In particular, RefCoco was collected in an interactive game-based setting, while RefCocoG was collected in a non-interactive setting. On average, RefCocoG has 8.4 words per expression while RefCoco has 3.5 words.
Each dataset has different split allocations that are typically all reported in papers. The "testA" and "testB" sets in RefCoco and RefCoco+ contain only people and only non-people, respectively. Images are partitioned into the various splits. In the "google" split, objects, not images, are partitioned between the train and non-train splits. This means that the same image can appear in both the train and validation splits, but the objects being referred to will differ between the two sets. In contrast, the "unc" and "umd" splits partition images between the train, validation, and test splits. In RefCocoG, the "google" split does not have a canonical test set, and the validation set is typically reported in papers as "val*".
Stats for each dataset and split ("refs" is the number of referring expressions, and "images" is the number of images):
| dataset | partition | split | refs | images |
|---|---|---|---|---|
| refcoco | google | train | 40000 | 19213 |
| refcoco | google | val | 5000 | 4559 |
| refcoco | google | test | 5000 | 4527 |
| refcoco | unc | train | 42404 | 16994 |
| refcoco | unc | val | 3811 | 1500 |
| refcoco | unc | testA | 1975 | 750 |
| refcoco | unc | testB | 1810 | 750 |
| refcoco+ | unc | train | 42278 | 16992 |
| refcoco+ | unc | val | 3805 | 1500 |
| refcoco+ | unc | testA | 1975 | 750 |
| refcoco+ | unc | testB | 1798 | 750 |
| refcocog | google | train | 44822 | 24698 |
| refcocog | google | val | 5000 | 4650 |
| refcocog | umd | train | 42226 | 21899 |
| refcocog | umd | val | 2573 | 1300 |
| refcocog | umd | test | 5023 | 2600 |
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('ref_coco', split='train')
for ex in ds.take(4):
  print(ex)
See the guide for more information on tensorflow_datasets.
![Visualization](https://storage.googleapis.com/tfds-data/visualization/fig/ref_coco-refcoco_unc-1.1.0.png)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
COCO Train Sample is a dataset for object detection tasks - it contains Bicycle, Car, and Person annotations for 8,057 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Val Dataset Creation For COCO + Landing Pad Image Dataset is a dataset for object detection tasks - it contains COCO and LandingPads annotations for 1,852 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
CC0 1.0 Public Domain: https://creativecommons.org/publicdomain/zero/1.0/
The dataset is structured with images split into directories; no downscaling was done.
The following notebook explains how to convert custom annotations to COCO format:
https://www.kaggle.com/sreevishnudamodaran/build-custom-coco-annotations-512x512-tiled
- coco_train
  - images (contains images in jpg format)
    - original_tiff_image_name
      - tile_column_number
        - image
        - …
  - train.json (contains all the segmentation annotations in COCO format, with proper relative paths to the images)
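As an illustrative sketch, the resulting train.json can be loaded with pycocotools; the path below is an assumption based on the layout above:

from pycocotools.coco import COCO

# Load the COCO-format annotation file described above.
coco = COCO("coco_train/train.json")
img_ids = coco.getImgIds()
print(f"{len(img_ids)} images, {len(coco.getAnnIds())} annotations")

# Fetch the segmentation annotations for the first image.
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_ids[:1]))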
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
COCO 2017_Train Image is a dataset for object detection tasks - it contains Person, Car, Dog, and Cake annotations for 300 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data abstract:
The YogDATA dataset contains images from an industrial laboratory production line during the quality inspection of yogurts. The case study of recognizing yogurt cups requires training Mask R-CNN and YOLO v5.0 models with a set of corresponding images, so it is important to collect such images to train and evaluate the class. Specifically, the YogDATA dataset includes the same labeled data for Mask R-CNN (COCO format) and YOLO models. For the YOLO architecture, the training and validation datasets include sets of images in jpg format and their annotations in txt file format. For the Mask R-CNN architecture, the annotations of the same sets of images are included in json file format (80% of the images and annotations of each subset are in the training set and 20% of the images of each subset are in the test set).
Paper abstract:
The explosion of the digitisation of traditional industrial processes and procedures is consolidating a positive impact on modern society by offering a critical contribution to its economic development. In particular, the dairy sector consists of various processes, which are very demanding and thorough. It is crucial to leverage modern automation tools and through-engineering solutions to increase their efficiency and continuously meet challenging standards. Towards this end, in this work, an intelligent algorithm based on machine vision and artificial intelligence, which identifies dairy products within production lines, is presented. Furthermore, in order to train and validate the model, the YogDATA dataset was created, which includes yogurt cups within a production line. Specifically, we evaluate two deep learning models (Mask R-CNN and YOLO v5.0) to recognise and detect each yogurt cup in a production line, in order to automate the packaging processes of the products. According to our results, the performance precision of the two models is similar, estimated at 99%.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
COCO 2017_train 300 is a dataset for instance segmentation tasks - it contains Dog, Person, Car, and Cake annotations for 300 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Overview
COCO-King is a large-scale dataset for reference-guided image completion tasks, derived from the COCO dataset. It features images with masked objects and corresponding reference images of those objects, enabling models to learn how to replace or complete masked regions with guidance from reference images.
Dataset Size and Structure
Total size: 690MB
Images: 9,558 total (8,134 training + 1,424 validation)
Categories: 170 diverse object categories
Directory structure:
coco-king/
├── train/
│   ├── images/     # Original images with objects to be masked
│   ├── mask/       # Binary masks (white background, black object)
│   └── reference/  # Augmented reference images of masked objects
├── val/
│   ├── images/     # Validation images
│   ├── mask/       # Validation masks
│   └── reference/  # Validation reference images
├── metadata.json            # Complete dataset metadata
├── train_annotations.json   # COCO-format training annotations
└── val_annotations.json     # COCO-format validation annotations
Unique Features
Specially Curated Masks
Smoothed Contours: Each mask features smooth, rounded edges to mimic human-drawn masks rather than pixel-perfect segmentations
Processing Pipeline: Masks underwent morphological operations and Gaussian blurring to create natural-looking boundaries (see the sketch after this list)
Single Masked Object per Image: Each image has one primary object masked (the largest that meets size criteria), despite containing multiple objects (avg. 7 objects per image)
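A minimal sketch of this kind of mask smoothing, assuming OpenCV; the file name, kernel sizes, and threshold here are illustrative guesses, not the dataset authors' values:

import cv2

# Masks are described as white background / black object, so invert first
# to make the object the white foreground that morphology operates on.
mask = cv2.imread("train/mask/example.png", cv2.IMREAD_GRAYSCALE)
obj = cv2.bitwise_not(mask)

# Morphological closing rounds off jagged contour pixels.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
obj = cv2.morphologyEx(obj, cv2.MORPH_CLOSE, kernel)

# Gaussian blur plus re-thresholding softens the boundary further.
obj = cv2.GaussianBlur(obj, (21, 21), 0)
_, obj = cv2.threshold(obj, 127, 255, cv2.THRESH_BINARY)

# Restore the white-background convention.
smoothed = cv2.bitwise_not(obj)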
Rich Reference Images
Paint by Example Style Augmentations: reference images are augmented similarly to the Paint by Example paper:
- Mild color jittering (brightness, contrast, saturation, hue)
- Random horizontal flips
- Small random rotations (up to 10 degrees)
- Mild perspective transformations
- Occasional equalization and auto-contrast
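A plausible recreation of this augmentation recipe with torchvision transforms; the exact magnitudes and probabilities are assumptions:

from torchvision import transforms

reference_aug = transforms.Compose([
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1, hue=0.05),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=10),
    transforms.RandomPerspective(distortion_scale=0.1, p=0.3),
    transforms.RandomEqualize(p=0.1),      # occasional histogram equalization
    transforms.RandomAutocontrast(p=0.1),  # occasional auto-contrast
])

# Usage: augmented = reference_aug(pil_image) for a PIL reference image.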
Balanced Object Selection
Size Range: objects cover 0.89% to 42% of image area (average: ~25%)
Multiple Objects: every image contains multiple objects (ranging from 2 to 29)
Diverse Categories: well-distributed across 170 object categories
Dataset Highlights
Applications
This dataset is ideal for:
- Exemplar-based image inpainting/completion: using reference images to guide the filling of masked regions
- Reference-guided object placement: learning to place objects in scenes with proper perspective and lighting
- Object replacement: replacing objects in images with new objects while maintaining scene coherence
- Style/appearance transfer: learning to transfer appearance characteristics to objects in new scenes
- Research on Paint by Example or similar architectures: models that aim to fill masked regions based on reference images
Data Processing
Derived from the COCO dataset with additional processing. Each image triplet (image, mask, reference) was processed to ensure:
- The masked object is of appropriate size
- Masks have smooth, natural contours
- Reference images maintain object identity while providing variation through augmentation
This dataset offers a unique resource for developing and benchmarking models that can intelligently replace or complete portions of images based on reference examples.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
A simple dataset for benchmarking CreateML object detection models. The images are sampled from the COCO dataset, with eyes and nose bounding boxes added. It's not meant to be serious or useful in a real application; the purpose is to look at how long it takes to train CreateML models with varying dataset and batch sizes.
Training performance is affected by model configuration, dataset size, and batch configuration. Larger models and batches require more memory. I used a CreateML object detection project to compare the performance.
Hardware
M1 MacBook Air: 8 GPU cores, 4/4 CPU cores, 16G memory, 512G SSD
M1 Max MacBook Pro: 24 GPU cores, 2/8 CPU cores, 32G memory, 2T SSD
Small Dataset: Train: 144, Valid: 16, Test: 8

Results:

| batch | M1 ET | M1Max ET | peak mem G |
|-------|:------|:---------|:-----------|
| 16    | 16    | 11       | 1.5        |
| 32    | 29    | 17       | 2.8        |
| 64    | 56    | 30       | 5.4        |
| 128   | 170   | 57       | 12         |
Larger Dataset: Train: 301, Valid: 29, Test: 18

Results:

| batch | M1 ET | M1Max ET | peak mem G |
|-------|:------|:---------|:-----------|
| 16    | 21    | 10       | 1.5        |
| 32    | 42    | 17       | 3.5        |
| 64    | 85    | 30       | 8.4        |
| 128   | 281   | 54       | 16.5       |
CreateML Settings
For all tests, training was set to Full Network. I closed CreateML between each run to make sure memory issues didn't cause a slowdown. There is a bug in Monterey as of 11/2021 that leads to a memory leak. I kept an eye on the memory usage; if it looked like there was a memory leak, I restarted macOS.
Observations
In general, the extra GPU cores and memory of the MBP reduce the training time, and having more memory lets you train with larger datasets. On the M1 MacBook Air, the practical limit is 12G before memory pressure impacts performance; on the M1 Max MBP, the practical limit is 26G. To work around memory pressure, use smaller batch sizes.
On the larger dataset with batch size 128, the M1 Max is 5x faster than the MacBook Air. Keep in mind a real dataset should have thousands of samples, like COCO or Pascal; ideally, you want a dataset with 100K images for experimentation and millions for the real training. The M1 Max MacBook is a cost-effective alternative to building a Windows/Linux workstation with an RTX 3090 24G. For most of 2021, the price of an RTX 3090 with 24G was around $3,000.00, which means an equivalent Windows workstation would cost about the same as the M1 Max MacBook Pro I used to run the benchmarks.
Full Network vs Transfer Learning
As of CreateML 3, training with Full Network doesn't fully utilize the GPU; I don't know why it works that way. You have to select transfer learning to fully use the GPU. The table below shows the results of transfer learning with the larger dataset. In general, the training time is faster and the loss is better.
| batch | ET min | Train Acc | Val Acc | Test Acc | Top IU Train | Top IU Valid | Top IU Test | Peak mem G | loss |
|---|---|---|---|---|---|---|---|---|---|
| 16 | 4 | 75 | 19 | 12 | 78 | 23 | 13 | 1.5 | 0.41 |
| 32 | 8 | 75 | 21 | 10 | 78 | 26 | 11 | 2.76 | 0.02 |
| 64 | 13 | 75 | 23 | 8 | 78 | 24 | 9 | 5.3 | 0.017 |
| 128 | 25 | 75 | 22 | 13 | 78 | 25 | 14 | 8.4 | 0.012 |
Github Project
The source code and full results are up on Github https://github.com/woolfel/createmlbench