COCO is a large-scale object detection, segmentation, and captioning dataset.
Note:
* Some images from the train and validation sets don't have annotations.
* COCO 2014 and 2017 use the same images, but different train/val/test splits.
* The test split doesn't have any annotations (only images).
* COCO defines 91 classes, but the data only uses 80 classes.
* Panoptic annotations define 200 classes but only use 133.
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split of the 'coco' dataset.
ds = tfds.load('coco', split='train')
# Print the first four examples.
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/coco-2014-1.1.0.png
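The first note above says that some train and validation images carry no annotations. When working from a raw COCO-format annotation file rather than through tensorflow_datasets, such images can be found by comparing image IDs against the `image_id` values of the annotations. The following is a minimal sketch, assuming the standard COCO top-level keys `images` and `annotations`; the function name is hypothetical and the small dictionary is made-up illustration data.

```python
def images_without_annotations(coco):
    """Return the sorted IDs of images that no annotation refers to."""
    annotated = {ann["image_id"] for ann in coco.get("annotations", [])}
    return sorted(
        img["id"] for img in coco.get("images", []) if img["id"] not in annotated
    )


# Made-up illustration data, not real COCO content.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [{"id": 10, "image_id": 1, "category_id": 1}],
}
print(images_without_annotations(toy))  # [2, 3]
```

Whether to drop such images or keep them as negative examples depends on the training setup.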
The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations, including object categories, keypoints, and more. This makes it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has established a standard by which architectures can be compared.
While COCO is often said to comprise over 300k images, it is important to understand that this number spans several annotation types, such as keypoints, among others. Specifically, the labeled dataset for object detection stands at 123,272 images.
The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. With that said, COCO has not released their test set annotations, meaning the test data doesn't come with labels. Thus, this data is not included in the dataset.
The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
COCO Dataset Only Person is a dataset for object detection tasks - it contains Person annotations for 2,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MJ-COCO-2025 is a modified version of the MS-COCO-2017 dataset in which annotation errors have been automatically corrected using model-driven methods. The name "MJ" originates from the initials of Min Je Kim, the individual who updated the dataset. "MJ" also stands for "Modification & Justification," emphasizing that the modifications were not manually edited but were systematically validated through machine learning models to increase reliability and quality. Thus, MJ-COCO-2025 reflects both a personal identity and a commitment to improving the dataset through thoughtful modification, ensuring improved accuracy, reliability, and consistency. The comparative results of the MS-COCO and MJ-COCO datasets are presented in Table 1 and Figure 1. The MJ-COCO-2025 dataset features several improvements, including fixes for group annotations, the addition of missing annotations, and the removal of redundant or overlapping labels. These refinements aim to improve training and evaluation performance in object detection tasks.
The re-labeled MJ-COCO-2025 dataset exhibits notable improvements in annotation quality compared to the original MS-COCO-2017 dataset. As shown in Table 1, it includes substantial increases in categories such as previously missing annotations and group annotations. At the same time, the dataset has been refined by reducing annotation noise through the removal of duplicates, resolution of challenging or debatable cases, and elimination of non-existent object annotations.
Table 1: Comparison of Class-wise Annotations: MS-COCO-2017 and MJ-COCO-2025.

Class Names | MS-COCO | MJ-COCO | Difference | Class Names | MS-COCO | MJ-COCO | Difference
---|---|---|---|---|---|---|---
Airplane | 5,135 | 5,810 | 675 | Kite | 9,076 | 15,092 | 6,016
Apple | 5,851 | 19,527 | 13,676 | Knife | 7,770 | 6,697 | -1,073
Backpack | 8,720 | 10,029 | 1,309 | Laptop | 4,970 | 5,280 | 310
Banana | 9,458 | 49,705 | 40,247 | Microwave | 1,673 | 1,755 | 82
Baseball Bat | 3,276 | 3,517 | 241 | Motorcycle | 8,725 | 10,045 | 1,320
Baseball Glove | 3,747 | 3,440 | -307 | Mouse | 2,262 | 2,377 | 115
Bear | 1,294 | 1,311 | 17 | Orange | 6,399 | 18,416 | 12,017
Bed | 4,192 | 4,177 | -15 | Oven | 3,334 | 4,310 | 976
Bench | 9,838 | 9,784 | -54 | Parking Meter | 1,285 | 1,355 | 70
Bicycle | 7,113 | 7,853 | 740 | Person | 262,465 | 435,252 | 172,787
Bird | 10,806 | 13,346 | 2,540 | Pizza | 5,821 | 6,049 | 228
Boat | 10,759 | 13,386 | 2,627 | Potted Plant | 8,652 | 11,252 | 2,600
Book | 24,715 | 35,712 | 10,997 | Refrigerator | 2,637 | 2,728 | 91
Bottle | 24,342 | 32,455 | 8,113 | Remote | 5,703 | 5,428 | -275
Bowl | 14,358 | 13,591 | -767 | Sandwich | 4,373 | 3,925 | -448
Broccoli | 7,308 | 14,275 | 6,967 | Scissors | 1,481 | 1,558 | 77
Bus | 6,069 | 7,132 | 1,063 | Sheep | 9,509 | 12,813 | 3,304
Cake | 6,353 | 8,968 | 2,615 | Sink | 5,610 | 5,969 | 359
Car | 43,867 | 51,662 | 7,795 | Skateboard | 5,543 | 5,761 | 218
Carrot | 7,852 | 15,411 | 7,559 | Skis | 6,646 | 8,945 | 2,299
Cat | 4,768 | 4,895 | 127 | Snowboard | 2,685 | 2,565 | -120
Cell Phone | 6,434 | 6,642 | 208 | Spoon | 6,165 | 6,156 | -9
Chair | 38,491 | 56,750 | 18,259 | Sports Ball | 6,347 | 6,060 | -287
Clock | 6,334 | 7,618 | 1,284 | Stop Sign | 1,983 | 2,684 | 701
Couch | 5,779 | 5,598 | -181 | Suitcase | 6,192 | 7,447 | 1,255
Cow | 8,147 | 8,990 | 843 | Surfboard | 6,126 | 6,175 | 49
Cup | 20,650 | 22,545 | 1,895 | Teddy Bear | 4,793 | 6,432 | 1,639
Dining Table | 15,714 | 16,569 | 855 | Tennis Racket | 4,812 | 4,932 | 120
Dog | 5,508 | 5,870 | 362 | Tie | 6,496 | 6,048 | -448
Donut | 7,179 | 11,622 | 4,443 | ...
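A class-wise comparison like Table 1 can be reproduced from two COCO-format annotation files by counting annotations per category and subtracting. The sketch below is only an illustration, not the authors' procedure: it assumes the standard COCO keys `categories` and `annotations`, the function names are hypothetical, and the two small dictionaries are made-up data.

```python
from collections import Counter


def class_counts(coco):
    """Count annotations per category name in a COCO-format dict."""
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(names[ann["category_id"]] for ann in coco["annotations"])


def count_differences(original, modified):
    """Return {class name: (original count, modified count, difference)}."""
    before, after = class_counts(original), class_counts(modified)
    return {
        name: (before[name], after[name], after[name] - before[name])
        for name in sorted(set(before) | set(after))
    }


# Made-up illustration data, not taken from either dataset.
categories = [{"id": 1, "name": "apple"}, {"id": 2, "name": "knife"}]
original = {
    "categories": categories,
    "annotations": [{"category_id": 1}, {"category_id": 2}, {"category_id": 2}],
}
modified = {
    "categories": categories,
    "annotations": [{"category_id": 1}] * 3 + [{"category_id": 2}],
}
print(count_differences(original, modified))
# {'apple': (1, 3, 2), 'knife': (2, 1, -1)}
```

A positive difference corresponds to added annotations and a negative one to removed annotations, matching the sign convention of the Difference column.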
This dataset was created by fsai236
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Yolov8n Coco is a dataset for object detection tasks - it contains Coco Dataset annotations for 372 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This publicly available Multitask COCO dataset has been preprocessed for seamless use in object detection, keypoint detection, and segmentation tasks. It enables multi-label annotations for COCO, ensuring robust performance across various vision applications. Special thanks to yermandy for providing access to multi-label annotations.
Optimized for deep learning models, this dataset is structured for easy integration into training pipelines, supporting diverse applications in computer vision research.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Enhance your AI-powered damage detection with our Coco Damage Detection Trained Models. Designed for precision and efficiency, these models are versatile and easily integrated into various applications.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Dataset Overview
COCO-King is a large-scale dataset for reference-guided image completion tasks, derived from the COCO dataset. It features images with masked objects and corresponding reference images of those objects, enabling models to learn how to replace or complete masked regions with guidance from reference images.
Dataset Size and Structure
Total size: 690MB
Images: 9,558 total images (8,134 training + 1,424 validation)
Categories: 170 diverse object categories
Directory structure:

coco-king/
├── train/
│   ├── images/      # Original images with objects to be masked
│   ├── mask/        # Binary masks (white background, black object)
│   └── reference/   # Augmented reference images of masked objects
├── val/
│   ├── images/      # Validation images
│   ├── mask/        # Validation masks
│   └── reference/   # Validation reference images
├── metadata.json            # Complete dataset metadata
├── train_annotations.json   # COCO-format training annotations
└── val_annotations.json     # COCO-format validation annotations
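Given the directory layout above, image, mask, and reference files can be paired up per split. The sketch below rests on an assumption that the description does not confirm: that the three files of a triplet share the same file stem across `images/`, `mask/`, and `reference/`. The function name is hypothetical.

```python
from pathlib import Path


def list_triplets(root, split="train"):
    """Yield (image, mask, reference) paths for stems present in all three folders.

    Assumption: a triplet shares one file stem across images/, mask/ and reference/.
    """
    split_dir = Path(root) / split
    masks = {p.stem: p for p in (split_dir / "mask").iterdir() if p.is_file()}
    refs = {p.stem: p for p in (split_dir / "reference").iterdir() if p.is_file()}
    for image in sorted((split_dir / "images").iterdir()):
        if image.is_file() and image.stem in masks and image.stem in refs:
            yield image, masks[image.stem], refs[image.stem]
```

For example, `list(list_triplets("coco-king", "val"))` would collect the validation triplets if the naming assumption holds; `metadata.json` may describe the actual pairing and should take precedence where it does.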
Unique Features
Specially Curated Masks
Smoothed Contours: Each mask features smooth, rounded edges to mimic human-drawn masks rather than pixel-perfect segmentations
Processing Pipeline: Masks underwent morphological operations and Gaussian blurring to create natural-looking boundaries
Single Masked Object per Image: Each image has one primary object masked (the largest that meets size criteria), despite containing multiple objects (avg. 7 objects per image)
Rich Reference Images
Paint by Example Style Augmentations: Reference images are augmented similar to the Paint by Example paper:
Mild color jittering (brightness, contrast, saturation, hue)
Random horizontal flips
Small random rotations (up to 10 degrees)
Mild perspective transformations
Occasional equalization and auto-contrast
Balanced Object Selection
Size Range: Objects cover 0.89% to 42% of image area (average: ~25%)
Multiple Objects: Every image contains multiple objects (ranging from 2 to 29)
Diverse Categories: Well-distributed across 170 object categories
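The rule that each image masks "the largest object that meets size criteria" can be sketched as a filter followed by a maximum. The exact criteria are not stated; the sketch below assumes, purely for illustration, that the reported size range (0.89% to 42% of image area) acts as the bounds. It relies on the standard COCO `area` field, and the function name and toy annotations are hypothetical.

```python
def pick_object_to_mask(annotations, image_width, image_height,
                        min_fraction=0.0089, max_fraction=0.42):
    """Return the largest annotation whose area fraction lies within the bounds.

    Returns None when no annotation qualifies. The default bounds are an
    assumption borrowed from the reported size range, not a documented rule.
    """
    image_area = image_width * image_height
    eligible = [
        ann for ann in annotations
        if min_fraction <= ann["area"] / image_area <= max_fraction
    ]
    return max(eligible, key=lambda ann: ann["area"], default=None)


# Made-up annotations for a 100x100 image.
toy_annotations = [
    {"id": 1, "area": 50},    # 0.5% of the image: too small
    {"id": 2, "area": 2000},  # 20%: eligible
    {"id": 3, "area": 4000},  # 40%: eligible and largest
    {"id": 4, "area": 6000},  # 60%: too large
]
print(pick_object_to_mask(toy_annotations, 100, 100)["id"])  # 3
```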
Dataset Highlights
Applications
This dataset is ideal for:
Exemplar-based image inpainting/completion: Using reference images to guide the filling of masked regions
Reference-guided object placement: Learning to place objects in scenes with proper perspective and lighting
Object replacement: Replacing objects in images with new objects while maintaining scene coherence
Style/appearance transfer: Learning to transfer appearance characteristics to objects in new scenes
Research on Paint by Example or similar architectures: Models that aim to fill masked regions based on reference images
Data Processing
Derived from COCO dataset with additional processing
Each image triplet (image, mask, reference) was processed to ensure:
The masked object is of appropriate size
Masks have smooth, natural contours
Reference images maintain object identity while providing variation through augmentation
This dataset offers a unique resource for developing and benchmarking models that can intelligently replace or complete portions of images based on reference examples.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
TACO COCO is a dataset for object detection tasks - it contains Recyclables annotations for 1,499 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset is a filtered subset of the COCO 2017 dataset containing only the 'cat' class. The images and annotations are optimized for training object detection models.
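A single-class subset of this kind can be derived from a COCO-format annotation file by keeping one category, its annotations, and only the images those annotations refer to. The following is a minimal sketch under that reading, not the procedure used to build this dataset; the function name is hypothetical and the toy dictionary is made-up data.

```python
def filter_by_category(coco, name):
    """Return a new COCO-format dict restricted to a single category name."""
    keep_ids = {cat["id"] for cat in coco["categories"] if cat["name"] == name}
    annotations = [a for a in coco["annotations"] if a["category_id"] in keep_ids]
    image_ids = {a["image_id"] for a in annotations}
    return {
        "images": [img for img in coco["images"] if img["id"] in image_ids],
        "annotations": annotations,
        "categories": [cat for cat in coco["categories"] if cat["id"] in keep_ids],
    }


# Made-up illustration data.
toy_coco = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "categories": [{"id": 17, "name": "cat"}, {"id": 18, "name": "dog"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 17},
        {"id": 2, "image_id": 2, "category_id": 18},
        {"id": 3, "image_id": 3, "category_id": 17},
        {"id": 4, "image_id": 3, "category_id": 18},
    ],
}
cats_only = filter_by_category(toy_coco, "cat")
print([img["id"] for img in cats_only["images"]])  # [1, 3]
```

Images that contain no instance of the chosen class are dropped here; keeping some of them as negatives is a separate design choice.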
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Object Detection Coco is a dataset for object detection tasks - it contains Coco annotations for 206 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Sangay Bhutia
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains a mapping between the classes of COCO, LVIS, and Open Images V4 datasets into a unique set of 1460 classes.
COCO [Lin et al. 2014] contains 80 classes, LVIS [Gupta et al. 2019] contains 1460 classes, and Open Images V4 [Kuznetsova et al. 2020] contains 601 classes.
We built a mapping of these classes using a semi-automatic procedure in order to obtain a single final list of 1460 classes. We also generated a hierarchy for each class using WordNet.
This repository contains the following files:
coco_classes_map.txt, contains the mapping for the 80 COCO classes
lvis_classes_map.txt, contains the mapping for the 1460 LVIS classes
openimages_classes_map.txt, contains the mapping for the 601 Open Images classes
classname_hyperset_definition.csv, contains the final set of 1460 classes, their definition and hierarchy
all-classnames.xlsx, contains a side-by-side view of all classes considered
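To illustrate how such a mapping might be used, the sketch below translates dataset-specific labels into a shared class name and groups detections from several detectors under it. The format of the `*_classes_map.txt` files is not described here, so the mappings are held in plain dictionaries; every entry shown is a hypothetical example, not a row from the actual files, and the function names are assumptions.

```python
# Hypothetical excerpts: the real files' format and entries are not shown here.
CLASS_MAPS = {
    "coco": {"couch": "sofa", "tv": "television"},
    "lvis": {"sofa": "sofa", "television_set": "television"},
    "openimages": {"Couch": "sofa", "Television": "television"},
}


def to_unified(dataset, label):
    """Map a dataset-specific label to the unified class name, or None if unmapped."""
    return CLASS_MAPS.get(dataset, {}).get(label)


def merge_detections(detections):
    """Group (dataset, label, score) detections by unified class name."""
    merged = {}
    for dataset, label, score in detections:
        unified = to_unified(dataset, label)
        if unified is not None:
            merged.setdefault(unified, []).append((dataset, score))
    return merged


detections = [("coco", "couch", 0.9), ("openimages", "Couch", 0.8)]
print(merge_detections(detections))
# {'sofa': [('coco', 0.9), ('openimages', 0.8)]}
```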
This mapping was used in VISIONE [Amato et al. 2021, Amato et al. 2022], a content-based retrieval system that supports various search functionalities (text search, object/color-based search, semantic and visual similarity search, temporal search). For object detection, VISIONE uses three pre-trained models: VfNet [Zhang et al. 2021], Mask R-CNN [He et al. 2017], and a Faster R-CNN+Inception ResNet (trained on Open Images V4).
This repository is released under a Creative Commons Attribution license. Please cite the following paper if you use it in your work in any form:
@inproceedings{amato2021visione,
  title={The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval},
  author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Debole, Franca and Falchi, Fabrizio and Gennaro, Claudio and Vadicamo, Lucia and Vairo, Claudio},
  journal={Journal of Imaging},
  volume={7},
  number={5},
  pages={76},
  year={2021},
  publisher={Multidisciplinary Digital Publishing Institute}
}
References:
[Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_52
[Amato et al. 2021] Amato, G., Bolettieri, P., Carrara, F., Debole, F., Falchi, F., Gennaro, C., Vadicamo, L. and Vairo, C., 2021. The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval. Journal of Imaging, 7(5), p.76.
[Gupta et al. 2019] Gupta, A., Dollar, P. and Girshick, R., 2019. Lvis: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5356-5364).
[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).
[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.
[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.
[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
The COCO dataset is a foundational large-scale benchmark for object detection, segmentation, captioning, and keypoint analysis. Created by Microsoft, it features complex everyday scenes with common objects in their natural contexts. With over 330,000 images and 2.5 million labeled instances, it has become the gold standard for training and evaluating computer vision models.
images/
Contains 2 subdirectories split by usage:
train2017/: Main training set (118K images)
val2017/: Validation set (5K images)
File Naming: 000000000009.jpg (12-digit zero-padded IDs)
Formats: JPEG images with varying resolutions (average 640×480)
annotations/
Contains task-specific JSON files with consistent naming:
captions_*.json: 5 human-generated descriptions per image
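The naming scheme above (12-digit zero-padded IDs inside a per-split folder) means an image path can be built directly from its integer ID. A minimal sketch follows; the root folder name `coco` and the function name are assumptions made for illustration.

```python
from pathlib import Path


def image_path(root, split, image_id):
    """Build the path of an image from its integer ID (12-digit zero-padded name)."""
    return Path(root) / "images" / split / f"{image_id:012d}.jpg"


print(image_path("coco", "train2017", 9).as_posix())
# coco/images/train2017/000000000009.jpg
```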
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
COCO Human Parts Dataset
This is a subset of the COCO dataset specifically designed for human body part detection. The dataset includes detailed annotations for each image, allowing for training models to not just detect humans but to also identify and localize specific parts of the human body.
Labels
The COCO Human Parts dataset contains the following labels:
person [0]
head [1]
face [2]
lefthand [3]
righthand [4]
leftfoot [5]
rightfoot [6]
These labels represent the… See the full description on the dataset page: https://huggingface.co/datasets/testdummyvt/cocohumanparts.
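For training or decoding predictions, the label list above can be kept as a simple lookup in both directions. This is a small sketch using only the seven names and indices listed; the variable and function names are hypothetical.

```python
# Class names in index order, as listed in the dataset description.
HUMAN_PART_LABELS = [
    "person", "head", "face", "lefthand", "righthand", "leftfoot", "rightfoot",
]
LABEL_TO_ID = {name: idx for idx, name in enumerate(HUMAN_PART_LABELS)}


def label_name(class_id):
    """Return the label name for a class index, raising IndexError if out of range."""
    if not 0 <= class_id < len(HUMAN_PART_LABELS):
        raise IndexError(f"unknown class id: {class_id}")
    return HUMAN_PART_LABELS[class_id]


print(label_name(2), LABEL_TO_ID["rightfoot"])  # face 6
```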
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Coco is a dataset for instance segmentation tasks - it contains Player annotations for 849 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by lachonman2
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The IMPTOX project has received funding from the EU's H2020 framework programme for research and innovation under grant agreement n. 965173. Imptox is part of the European MNP cluster on human health.
More information about the project here.
Description: This repository includes the trained weights and a custom COCO-formatted dataset used for developing and testing a Faster R-CNN R_50_FPN_3x object detector, specifically designed to identify particles in micro-FTIR filter images.
Contents:
Weights File (neuralNetWeights_V3.pth):
Format: .pth
Description: This file contains the trained weights for a Faster R-CNN model with a ResNet-50 backbone and a Feature Pyramid Network (FPN), trained for 3x schedule. These weights are specifically tuned for detecting particles in micro-FTIR filter images.
Custom COCO Dataset (uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip):
Format: .zip
Description: This zip archive contains a custom COCO-formatted dataset, including JPEG images and their corresponding annotation file. The dataset consists of images of micro-FTIR filters with annotated particles.
Contents:
Images: JPEG format images of micro-FTIR filters.
Annotations: A JSON file in COCO format providing detailed annotations of the particles in the images.
Management: The dataset can be managed and manipulated using the Pycocotools library, facilitating easy integration with existing COCO tools and workflows.
Applications: The provided weights and dataset are intended for researchers and practitioners in the field of microscopy and particle detection. The dataset and model can be used for further training, validation, and fine-tuning of object detection models in similar domains.
Usage Notes:
The neuralNetWeights_V3.pth file should be loaded into a PyTorch model compatible with the Faster R-CNN architecture, such as Detectron2.
The contents of uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip should be extracted and can be used with any COCO-compatible object detection framework for training and evaluation purposes.
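Before extracting the archive, it can help to check what the COCO annotation file inside it contains. The sketch below uses only the Python standard library and makes no assumption about member names: it scans the zip for JSON files that look like COCO annotations (having `images` and `annotations` keys) and reports their sizes. The function name is hypothetical.

```python
import json
import zipfile


def summarize_coco_zip(zip_path):
    """Return {member name: (n_images, n_annotations, n_categories)} for COCO-style JSON files."""
    summary = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".json"):
                continue
            data = json.loads(archive.read(name))
            # Only report files that look like COCO annotation files.
            if isinstance(data, dict) and "images" in data and "annotations" in data:
                summary[name] = (
                    len(data["images"]),
                    len(data["annotations"]),
                    len(data.get("categories", [])),
                )
    return summary
```

Calling `summarize_coco_zip` on the downloaded archive would list each annotation file with its image, annotation, and category counts, which is a quick sanity check before handing the data to pycocotools or Detectron2.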
Code can be found on the related Github repository.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
PPE COCO is a dataset for object detection tasks - it contains PPE Detector annotations for 7,579 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).