The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
Splits: The first version of MS COCO dataset was released in 2014. It contains 164K images split into training (83K), validation (41K) and test (41K) sets. In 2015 additional test set of 81K images was released, including all the previous test images and 40K new images.
Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 164K images.
This is the original version from 2014 made available here for easy access in Kaggle and because it does not seem to be still available on the COCO Dataset website. This has been retrieved from the mirror that Joseph Redmon has setup on this own website.
The 2014 version of the COCO dataset is an excellent object detection dataset with 80 classes, 82,783 training images and 40,504 validation images. This dataset contains all this imagery on two folders as well as the annotation with the class and location (bounding box) of the objects contained in each image.
The initial split provides training (83K), validation (41K) and test (41K) sets. Since the split between training and validation was not optimal in the original dataset, there is also two text (.part) files with a new split with only 5,000 images for validation and the rest for training. The test set has no labels and can be used for visual validation or pseudo-labelling.
This is mostly inspired by Erik Linder-Norén and [Joseph Redmon](https://pjreddie.com/darknet/yolo
detection-datasets/coco dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
MS COCO Object Detector is a dataset for object detection tasks - it contains Fruits annotations for 1,156 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations, including object categories, keypoints, and more. The model it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.
While COCO is often touted to comprise over 300k images, it's pivotal to understand that this number includes diverse formats like keypoints, among others. Specifically, the labeled dataset for object detection stands at 123,272 images.
The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. With that said, COCO has not released their test set annotations, meaning the test data doesn't come with labels. Thus, this data is not included in the dataset.
The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset is designed for object detection tasks and follows the COCO format. It contains 300 images and corresponding annotation files in JSON format. The dataset is split into training, validation, and test sets, ensuring a balanced distribution for model evaluation.
train/ (70% - 210 images)
valid/ (15% - 45 images)
test/ (15% - 45 images)
Images in JPEG/PNG format.
A corresponding _annotations.coco.json file that includes bounding box annotations.
The dataset has undergone several preprocessing and augmentation steps to enhance model generalization:
Auto-orientation applied
Resized to 640x640 pixels (stretched)
Flip: Horizontal flipping
Crop: 0% minimum zoom, 5% maximum zoom
Rotation: Between -5° and +5°
Saturation: Adjusted between -4% and +4%
Brightness: Adjusted between -10% and +10%
Blur: Up to 0px
Noise: Up to 0.1% of pixels
Bounding Box Augmentations:
Flipping, cropping, rotation, brightness adjustments, blur, and noise applied accordingly to maintain annotation consistency.
The dataset follows the COCO (Common Objects in Context) format, which includes:
images section: Contains image metadata such as filename, width, and height.
annotations section: Includes bounding boxes, category IDs, and segmentation masks (if applicable).
categories section: Defines class labels.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for MSCOCO
Dataset Summary
COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:
Object segmentation Recognition in context Superpixel stuff segmentation 330K images (>200K labeled) 1.5 million object instances 80 object categories 91 stuff categories 5 captions per image 250,000 people with keypoints
Supported Tasks and Leaderboards
[More Information Needed]
Languages
[More… See the full description on the dataset page: https://huggingface.co/datasets/shunk031/MSCOCO.
This is an open source object detection model by TensorFlow in TensorFlow Lite format. While it is not recommended to use this model in production surveys, it can be useful for demonstration purposes and to get started with smart assistants in ArcGIS Survey123. You are responsible for the use of this model. When using Survey123, it is your responsibility to review and manually correct outputs.This object detection model was trained using the Common Objects in Context (COCO) dataset. COCO is a large-scale object detection dataset that is available for use under the Creative Commons Attribution 4.0 License.The dataset contains 80 object categories and 1.5 million object instances that include people, animals, food items, vehicles, and household items. For a complete list of common objects this model can detect, see Classes.The model can be used in ArcGIS Survey123 to detect common objects in photos that are captured with the Survey123 field app. Using the modelFollow the guide to use the model. You can use this model to detect or redact common objects in images captured with the Survey123 field app. The model must be configured for a survey in Survey123 Connect.Fine-tuning the modelThis model cannot be fine-tuned using ArcGIS tools.InputCamera feed (either low-resolution preview or high-resolution capture).OutputImage with common object detections written to its EXIF metadata or an image with detected objects redacted.Model architectureThis is an open source object detection model by TensorFlow in TensorFlow Lite format with MobileNet architecture. The model is available for use under the Apache License 2.0.Sample resultsHere are a few results from the model.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains a mapping between the classes of COCO, LVIS, and Open Images V4 datasets into a unique set of 1460 classes. COCO [Lin et al 2014] contains 80 classes, LVIS [gupta2019lvis] contains 1460 classes, Open Images V4 [Kuznetsova et al. 2020] contains 601 classes. We built a mapping of these classes using a semi-automatic procedure in order to have a unique final list of 1460 classes. We also generated a hierarchy for each class, using wordnet This repository contains the following files: coco_classes_map.txt, contains the mapping for the 80 coco classes lvis_classes_map.txt, contains the mapping for the 1460 coco classes openimages_classes_map.txt, contains the mapping for the 601 coco classes classname_hyperset_definition.csv, contains the final set of 1460 classes, their definition and hierarchy all-classnames.xlsx, contains a side-by-side view of all classes considered This mapping was used in VISIONE [Amato et al. 2021, Amato et al. 2022] that is a content-based retrieval system that supports various search functionalities (text search, object/color-based search, semantic and visual similarity search, temporal search). For the object detection VISIONE uses three pre-trained models: VfNet Zhang et al. 2021, Mask R-CNN He et al. 2017, and a Faster R-CNN+Inception ResNet (trained on the Open Images V4). This is repository is released under a Creative Commons Attribution license, please cite the following paper if you use it in your work in any form: @inproceedings{amato2021visione, title={The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Debole, Franca and Falchi, Fabrizio and Gennaro, Claudio and Vadicamo, Lucia and Vairo, Claudio}, journal={Journal of Imaging}, volume={7}, number={5}, pages={76}, year={2021}, publisher={Multidisciplinary Digital Publishing Institute} } References: [Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_52 [Amato et al. 2021] Amato, G., Bolettieri, P., Carrara, F., Debole, F., Falchi, F., Gennaro, C., Vadicamo, L. and Vairo, C., 2021. The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval. Journal of Imaging, 7(5), p.76. [Gupta et al.2019] Gupta, A., Dollar, P. and Girshick, R., 2019. Lvis: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5356-5364). [He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969). [Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981. [Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham. [Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
COCO Object Detection Project is a dataset for object detection tasks - it contains Coco Minitrain Annotations annotations for 1,201 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
https://choosealicense.com/licenses/openrail/https://choosealicense.com/licenses/openrail/
MS COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.
COCO is a large-scale object detection, segmentation, and captioning dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Experimental results of the object detection task on the COCO dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COCO is a large-scale object detection, segmentation, and captioning dataset. Dataset contains 330K images (>200K labeled) 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Car_coco is a dataset for object detection tasks - it contains Car annotations for 3,199 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
https://www.kcl.ac.uk/researchsupport/assets/DataAccessAgreement-Description.pdfhttps://www.kcl.ac.uk/researchsupport/assets/DataAccessAgreement-Description.pdf
This dataset contains annotated images for object detection for containers and hands in a first-person view (egocentric view) during drinking activities. Both YOLOV8 format and COCO format are provided.Please refer to our paper for more details.Purpose: Training and testing the object detection model.Content: Videos from Session 1 of Subjects 1-20.Images: Extracted from the videos of Subjects 1-20 Session 1.Additional Images:~500 hand/container images from Roboflow Open Source data.~1500 null (background) images from VOC Dataset and MIT Indoor Scene Recognition Dataset:1000 indoor scenes from 'MIT Indoor Scene Recognition'400 other unrelated objects from VOC DatasetData Augmentation:Horizontal flipping±15% brightness change±10° rotationFormats Provided:COCO formatPyTorch YOLOV8 formatImage Size: 416x416 pixelsTotal Images: 16,834Training: 13,862Validation: 1,975Testing: 997Instance Numbers:Containers: Over 10,000Hands: Over 8,000
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
COCO MiniTrain Object Detection is a dataset for object detection tasks - it contains Coco annotations for 206 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation Recognition in context Superpixel stuff segmentation 330K images (>200K labeled) 1.5 million object instances 80 object categories 91 stuff categories 5 captions per image 250,000 people with keypoints
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COCO-Stuff augments all 164K images of the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of the state-of-the-art model on MS COCO data.
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
Splits: The first version of MS COCO dataset was released in 2014. It contains 164K images split into training (83K), validation (41K) and test (41K) sets. In 2015 additional test set of 81K images was released, including all the previous test images and 40K new images.
Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.