The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
Splits: The first version of the MS COCO dataset was released in 2014. It contains 164K images split into training (83K), validation (41K), and test (41K) sets. In 2015, an additional test set of 81K images was released, including all the previous test images and 40K new images.
Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a 41K-image subset of the 2015 test set. Additionally, the 2017 release contains a new unannotated set of 123K images.
Annotations: The dataset has annotations for:

- object detection: bounding boxes and per-instance segmentation masks with 80 object categories,
- captioning: natural language descriptions of the images (see MS COCO Captions),
- keypoint detection: more than 200,000 images and 250,000 person instances labeled with keypoints (17 possible keypoints, such as left eye, nose, right hip, right ankle),
- stuff image segmentation: per-pixel segmentation masks with 91 stuff categories, such as grass, wall, sky (see MS COCO Stuff),
- panoptic segmentation: full scene segmentation, with 80 thing categories (such as person, bicycle, elephant) and a subset of 91 stuff categories (grass, sky, road),
- dense pose: more than 39,000 images and 56,000 person instances labeled with DensePose annotations; each labeled person is annotated with an instance id and a mapping between image pixels that belong to that person's body and a template 3D model.

The annotations are publicly available only for training and validation images.
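As an illustration of how these annotations are typically consumed, here is a minimal sketch using the pycocotools library; the annotation file path follows the standard 2017 download layout and is an assumption about your local setup:

```python
from pycocotools.coco import COCO

# Path assumes the standard 2017 download layout; adjust as needed.
coco = COCO("annotations/instances_train2017.json")

# Find all training images containing at least one person.
cat_ids = coco.getCatIds(catNms=["person"])
img_ids = coco.getImgIds(catIds=cat_ids)
print(f"{len(img_ids)} training images contain a person")

# Load the bounding boxes for one of those images.
ann_ids = coco.getAnnIds(imgIds=img_ids[0], catIds=cat_ids, iscrowd=None)
for ann in coco.loadAnns(ann_ids):
    x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
    print(ann["category_id"], (x, y, w, h))
```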
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
## Overview
COCO Dataset Limited (Person Only) is a dataset for object detection tasks - it contains People annotations for 5,438 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations including object categories, keypoints, and more, making it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.
While COCO is often touted to comprise over 300k images, it is important to understand that this figure includes images labeled for diverse tasks, such as keypoints, among others. Specifically, the labeled dataset for object detection stands at 123,272 images.
The full labeled object detection dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. That said, COCO has not released its test set annotations, meaning the test data comes without labels; thus, this data is not included in the dataset.
The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset.
The detection-datasets/coco dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
This is an open source object detection model by TensorFlow in TensorFlow Lite format. While it is not recommended to use this model in production surveys, it can be useful for demonstration purposes and to get started with smart assistants in ArcGIS Survey123. You are responsible for the use of this model. When using Survey123, it is your responsibility to review and manually correct outputs.

This object detection model was trained using the Common Objects in Context (COCO) dataset. COCO is a large-scale object detection dataset that is available for use under the Creative Commons Attribution 4.0 License. The dataset contains 80 object categories and 1.5 million object instances that include people, animals, food items, vehicles, and household items. For a complete list of common objects this model can detect, see Classes. The model can be used in ArcGIS Survey123 to detect common objects in photos that are captured with the Survey123 field app.

Using the model: Follow the guide to use the model. You can use this model to detect or redact common objects in images captured with the Survey123 field app. The model must be configured for a survey in Survey123 Connect.

Fine-tuning the model: This model cannot be fine-tuned using ArcGIS tools.

Input: Camera feed (either low-resolution preview or high-resolution capture).

Output: Image with common object detections written to its EXIF metadata, or an image with detected objects redacted.

Model architecture: This is an open source object detection model by TensorFlow in TensorFlow Lite format with MobileNet architecture. The model is available for use under the Apache License 2.0.

Sample results: Here are a few results from the model.
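For readers who want to try a COCO-trained TensorFlow Lite detector outside Survey123, here is a minimal, hedged inference sketch; the file name detect.tflite is a placeholder, and the output tensor order (boxes, classes, scores, count) varies between model exports:

```python
import numpy as np
import tensorflow as tf

# "detect.tflite" is a placeholder for a COCO-trained SSD MobileNet export.
interpreter = tf.lite.Interpreter(model_path="detect.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
h, w = inp["shape"][1], inp["shape"][2]
image = np.zeros((1, h, w, 3), dtype=inp["dtype"])  # replace with a real photo

interpreter.set_tensor(inp["index"], image)
interpreter.invoke()

# Inspect the outputs; their order and meaning depend on the model export.
for out in interpreter.get_output_details():
    print(out["name"], interpreter.get_tensor(out["index"]).shape)
```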
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
MJ-COCO-2025 is a modified version of the MS-COCO-2017 dataset, in which annotation errors have been automatically corrected using model-driven methods. The name "MJ" originates from the initials of Min Je Kim, the individual who updated the dataset. "MJ" also stands for "Modification & Justification," emphasizing that the modifications were not manually edited but were systematically validated through machine learning models to increase reliability and quality. Thus, MJ-COCO-2025 reflects both a personal identity and a commitment to improving the dataset through thoughtful modification, ensuring improved accuracy, reliability, and consistency. The comparative results of the MS-COCO and MJ-COCO datasets are presented in Table 1 and Figure 1. The MJ-COCO-2025 dataset features several improvements, including fixes for group annotations, addition of missing annotations, and removal of redundant or overlapping labels. These refinements aim to improve training and evaluation performance in object detection tasks.
The re-labeled MJ-COCO-2025 dataset demonstrates substantial improvements in annotation quality, with significant increases in several categories and minor corrections in a few due to previous over-annotations or misclassifications, as shown in Table 1 when compared to the original MS-COCO-2017 dataset.
Table 1: Comparison of class-wise annotations: MS-COCO-2017 and MJ-COCO-2025.

| Class Name | MS-COCO | MJ-COCO | Difference | Class Name | MS-COCO | MJ-COCO | Difference |
|------------|---------|---------|------------|------------|---------|---------|------------|
| Airplane | 5,135 | 5,810 | 675 | Kite | 9,076 | 15,092 | 6,016 |
| Apple | 5,851 | 19,527 | 13,676 | Knife | 7,770 | 6,697 | -1,073 |
| Backpack | 8,720 | 10,029 | 1,309 | Laptop | 4,970 | 5,280 | 310 |
| Banana | 9,458 | 49,705 | 40,247 | Microwave | 1,673 | 1,755 | 82 |
| Baseball Bat | 3,276 | 3,517 | 241 | Motorcycle | 8,725 | 10,045 | 1,320 |
| Baseball Glove | 3,747 | 3,440 | -307 | Mouse | 2,262 | 2,377 | 115 |
| Bear | 1,294 | 1,311 | 17 | Orange | 6,399 | 18,416 | 12,017 |
| Bed | 4,192 | 4,177 | -15 | Oven | 3,334 | 4,310 | 976 |
| Bench | 9,838 | 9,784 | -54 | Parking Meter | 1,285 | 1,355 | 70 |
| Bicycle | 7,113 | 7,853 | 740 | Person | 262,465 | 435,252 | 172,787 |
| Bird | 10,806 | 13,346 | 2,540 | Pizza | 5,821 | 6,049 | 228 |
| Boat | 10,759 | 13,386 | 2,627 | Potted Plant | 8,652 | 11,252 | 2,600 |
| Book | 24,715 | 35,712 | 10,997 | Refrigerator | 2,637 | 2,728 | 91 |
| Bottle | 24,342 | 32,455 | 8,113 | Remote | 5,703 | 5,428 | -275 |
| Bowl | 14,358 | 13,591 | -767 | Sandwich | 4,373 | 3,925 | -448 |
| Broccoli | 7,308 | 14,275 | 6,967 | Scissors | 1,481 | 1,558 | 77 |
| Bus | 6,069 | 7,132 | 1,063 | Sheep | 9,509 | 12,813 | 3,304 |
| Cake | 6,353 | 8,968 | 2,615 | Sink | 5,610 | 5,969 | 359 |
| Car | 43,867 | 51,662 | 7,795 | Skateboard | 5,543 | 5,761 | 218 |
| Carrot | 7,852 | 15,411 | 7,559 | Skis | 6,646 | 8,945 | 2,299 |
| Cat | 4,768 | 4,895 | 127 | Snowboard | 2,685 | 2,565 | -120 |
| Cell Phone | 6,434 | 6,642 | 208 | Spoon | 6,165 | 6,156 | -9 |
| Chair | 38,491 | 56,750 | 18,259 | Sports Ball | 6,347 | 6,060 | -287 |
| Clock | 6,334 | 7,618 | 1,284 | Stop Sign | 1,983 | 2,684 | 701 |
| Couch | 5,779 | 5,598 | -181 | Suitcase | 6,192 | 7,447 | 1,255 |
| Cow | 8,147 | 8,990 | 843 | Surfboard | 6,126 | 6,175 | 49 |
| Cup | 20,650 | 22,545 | 1,895 | Teddy Bear | 4,793 | 6,432 | 1,639 |
| Dining Table | 15,714 | 16,569 | 855 | Tennis Racket | 4,812 | 4,932 | 120 |
| Dog | 5,508 | 5,870 | 362 | Tie | 6,496 | 6,048 | -448 |
| Donut | 7,179 | 11,622 | 4,443 | Toaster | 225 | 320 | 95 |
| Elephant | 5,513 | 6,233 | 720 | Toilet | 4,157 | 4,433 | 276 |
| Fire Hydrant | ... | | | | | | |
OpenRAIL license: https://choosealicense.com/licenses/openrail/
MS COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
## Overview
Tiny COCO is a dataset for object detection tasks - it contains Coco Objects annotations for 5,025 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This repository contains a mapping between the classes of COCO, LVIS, and Open Images V4 datasets into a unique set of 1460 classes.
COCO [Lin et al. 2014] contains 80 classes, LVIS [Gupta et al. 2019] contains 1460 classes, and Open Images V4 [Kuznetsova et al. 2020] contains 601 classes.
We built a mapping of these classes using a semi-automatic procedure in order to obtain a unique final list of 1460 classes. We also generated a hierarchy for each class using WordNet.
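For intuition, here is a small, purely illustrative sketch of how a hypernym chain can be read from WordNet via NLTK; this is not the repository's actual procedure, just a hedged example of the kind of hierarchy WordNet provides:

```python
from nltk.corpus import wordnet as wn  # requires a one-time nltk.download("wordnet")

# Illustration only: walk the first hypernym path for one class name,
# similar in spirit to deriving a class hierarchy from WordNet.
synset = wn.synsets("bicycle")[0]
chain = synset.hypernym_paths()[0]
print(" -> ".join(s.name().split(".")[0] for s in chain))
```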
This repository contains the following files:
coco_classes_map.txt, contains the mapping for the 80 COCO classes
lvis_classes_map.txt, contains the mapping for the 1460 LVIS classes
openimages_classes_map.txt, contains the mapping for the 601 Open Images V4 classes
classname_hyperset_definition.csv, contains the final set of 1460 classes, their definition and hierarchy
all-classnames.xlsx, contains a side-by-side view of all classes considered
This mapping was used in VISIONE [Amato et al. 2021, Amato et al. 2022], a content-based retrieval system that supports various search functionalities (text search, object/color-based search, semantic and visual similarity search, temporal search). For object detection, VISIONE uses three pre-trained models: VfNet [Zhang et al. 2021], Mask R-CNN [He et al. 2017], and a Faster R-CNN+Inception ResNet (trained on Open Images V4).
This repository is released under a Creative Commons Attribution license. Please cite the following paper if you use it in your work in any form:
@article{amato2021visione,
  title={The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval},
  author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Debole, Franca and Falchi, Fabrizio and Gennaro, Claudio and Vadicamo, Lucia and Vairo, Claudio},
  journal={Journal of Imaging},
  volume={7},
  number={5},
  pages={76},
  year={2021},
  publisher={Multidisciplinary Digital Publishing Institute}
}
References:
[Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_52
[Amato et al. 2021] Amato, G., Bolettieri, P., Carrara, F., Debole, F., Falchi, F., Gennaro, C., Vadicamo, L. and Vairo, C., 2021. The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval. Journal of Imaging, 7(5), p.76.
[Gupta et al. 2019] Gupta, A., Dollar, P. and Girshick, R., 2019. LVIS: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5356-5364).
[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961-2969).
[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.
[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.
[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. VarifocalNet: An IoU-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).
COCO is a large-scale object detection, segmentation, and captioning dataset.
COCO (Common Objects in COntext) is a popular dataset in computer vision. It contains annotations for computer vision tasks: object detection, segmentation, keypoint detection, stuff segmentation, panoptic segmentation, DensePose, and image captioning. For more details, visit the COCO Dataset page.
The Tensor Processing Unit (TPU) hardware accelerators are very fast. The challenge is often to feed them data fast enough to keep them busy. Google Cloud Storage (GCS) is capable of sustaining very high throughput but, as with all cloud storage systems, initiating a connection costs some network back and forth. Therefore, having our data stored as thousands of individual files is not ideal. This dataset contains the COCO dataset with object detection annotations repackaged into a smaller number of files, and you can use the power of tf.data.Dataset to read from multiple files in parallel.
TFRecord file format: TensorFlow's preferred file format for storing data is the protobuf-based TFRecord format. Other serialization formats would work too, but you can load a dataset from TFRecord files directly by writing:
```python
import tensorflow as tf

filenames = tf.io.gfile.glob(FILENAME_PATTERN)  # e.g. a gs://... wildcard pattern
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...)  # parse each serialized tf.train.Example here
```
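Because the whole point of sharding is parallel reads, a small variant of the snippet above (same assumed FILENAME_PATTERN) lets tf.data interleave reads across the shards:

```python
import tensorflow as tf

# AUTOTUNE lets tf.data choose how many shards to read concurrently.
filenames = tf.io.gfile.glob(FILENAME_PATTERN)
dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=tf.data.AUTOTUNE)
```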
For more details https://codelabs.developers.google.com/codelabs/keras-flowers-data/
You can use the following code in your Kaggle notebook to get the Google Cloud Storage (GCS) path of any public Kaggle dataset:
```python
from kaggle_datasets import KaggleDatasets

GCS_PATH = KaggleDatasets().get_gcs_path()
```
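Putting the two pieces together, the returned path can feed the glob shown earlier; the "*.tfrec" shard suffix is an assumption, so check the dataset's file listing:

```python
import tensorflow as tf
from kaggle_datasets import KaggleDatasets

gcs_path = KaggleDatasets().get_gcs_path()
filenames = tf.io.gfile.glob(gcs_path + "/*.tfrec")  # shard suffix is an assumption
dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads=tf.data.AUTOTUNE)
```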
View the notebook COCO Object Detection dataset in TFRecord to see how TFRecord files are created from the original COCO dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Experimental results of the object detection task on the COCO dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
## Overview
Vehicles Coco Dataset is a dataset for object detection tasks - it contains Vehicles annotations for 9,629 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Verbs in COCO (V-COCO) is a dataset that builds on COCO for human-object interaction detection. V-COCO provides 10,346 images (2,533 for training, 2,867 for validation and 4,946 for testing) and 16,199 person instances. Each person has annotations for 29 action categories, and there are no interaction labels involving objects.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
COCO-Stuff augments all 164K images of the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning.
This dataset was created by frosty_otter
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The IMPTOX project has received funding from the EU's H2020 framework programme for research and innovation under grant agreement No. 965173. IMPTOX is part of the European MNP cluster on human health.
More information about the project here.
Description: This repository includes the trained weights and a custom COCO-formatted dataset used for developing and testing a Faster R-CNN R_50_FPN_3x object detector, specifically designed to identify particles in micro-FTIR filter images.
Contents:
Weights File (neuralNetWeights_V3.pth):
Format: .pth
Description: This file contains the trained weights for a Faster R-CNN model with a ResNet-50 backbone and a Feature Pyramid Network (FPN), trained with a 3x schedule. These weights are specifically tuned for detecting particles in micro-FTIR filter images.
Custom COCO Dataset (uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip):
Format: .zip
Description: This zip archive contains a custom COCO-formatted dataset, including JPEG images and their corresponding annotation file. The dataset consists of images of micro-FTIR filters with annotated particles.
Contents:
Images: JPEG format images of micro-FTIR filters.
Annotations: A JSON file in COCO format providing detailed annotations of the particles in the images.
Management: The dataset can be managed and manipulated using the Pycocotools library, facilitating easy integration with existing COCO tools and workflows.
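For example, a hedged snippet for inspecting the extracted dataset with pycocotools; the annotation file path inside the archive is an assumption, so adjust it to the actual extracted layout:

```python
from pycocotools.coco import COCO

# Path is an assumption: point it at the COCO JSON inside the extracted zip.
coco = COCO("uFTIR_curated_square/annotations.json")
print(coco.loadCats(coco.getCatIds()))   # list the annotated particle categories
print(len(coco.getImgIds()), "images")   # count the filter images
```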
Applications: The provided weights and dataset are intended for researchers and practitioners in the field of microscopy and particle detection. The dataset and model can be used for further training, validation, and fine-tuning of object detection models in similar domains.
Usage Notes:
The neuralNetWeights_V3.pth file should be loaded into a PyTorch model compatible with the Faster R-CNN architecture, such as one built with Detectron2 (see the loading sketch after these notes).
The contents of uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip should be extracted and can be used with any COCO-compatible object detection framework for training and evaluation purposes.
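As a minimal sketch of the first note, assuming Detectron2 and its standard COCO-Detection R50-FPN 3x config; the score threshold, class count, and input file name are assumptions to verify against the repository:

```python
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = "neuralNetWeights_V3.pth"    # trained weights from this repository
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5      # confidence threshold (assumption)
# cfg.MODEL.ROI_HEADS.NUM_CLASSES may need to match the particle classes (assumption)

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("filter_image.jpg"))  # hypothetical input image
print(outputs["instances"].pred_boxes)
```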
Code can be found on the related GitHub repository.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
This is the full 2017 COCO object detection dataset (train and valid), which is a subset of the most recent 2020 COCO object detection dataset.
COCO is a large-scale object detection, segmentation, and captioning dataset of many object types easily recognizable by a 4-year-old. The data was initially collected and published by Microsoft. The original source of the data is here and the paper introducing the COCO dataset is here.