The COCO (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and pose estimation tasks.
This dataset contains a collection of 32,000 image-caption pairs sourced from:
Each entry is included in the JSON file train_mix_32000.json
, with the following fields:
- "filename"
: Image filename (relative to dataset structure)
- "caption"
: Image description
- "data"
: Source dataset ("coco"
or "cc12m"
)
train_mix_32000.json
: Metadata file with image paths and captions.images/
: Folder structure containing all 32,000 actual image files referenced in the JSON.đź’ˇ Image paths in the JSON have been adjusted to reflect the folder structure inside this Kaggle dataset.
This dataset includes images from:
COCO 2014
Licensed under Creative Commons Attribution 4.0.
CC12M
Provided by Google LLC under a permissive license:
The dataset may be freely used for any purpose, although acknowledgment of Google LLC as the data source would be appreciated.
The dataset is provided "AS IS" without any warranty, express or implied.
View License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the full 2017 COCO object detection dataset (train and valid), which is a subset of the most recent 2020 COCO object detection dataset.
COCO is a large-scale object detection, segmentation, and captioning dataset of many object types easily recognizable by a 4-year-old. The data is initially collected and published by Microsoft. The original source of the data is here and the paper introducing the COCO dataset is here.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:
Attribution-ShareAlike 3.0 (CC BY-SA 3.0)https://creativecommons.org/licenses/by-sa/3.0/
License information was derived automatically
Please note: this archive requires support for dangling symlinks, which excludes the Windows operating system.
To use this dataset, you will need to download the MS COCO 2017 detection images and expand them to a folder called coco17 in the train_val_combined directory. The download can be found here: https://cocodataset.org/#download You will also need to download the AI2D image description dataset and expand them to a folder called ai2d in the train_val_combined directory. The download can be found here: https://prior.allenai.org/projects/diagram-understanding
License Notes for Train and Val: Since the images in this dataset come from different sources, they are bound by different licenses.
Images for bar charts, x-y plots, maps, pie charts, tables, and technical drawings were downloaded directly from wikimedia commons. License and authorship information is stored independently for each image in these categories in the wikimedia_commons_licenses.csv file. Each row (note: some rows are multi-line) is formatted so:
Images in the slides category were taken from presentations which were downloaded from Wikimedia Commons. The names of the presentations on Wikimedia Commons omits the trailing underscore, number, and file extension, and ends with .pdf instead. The source materials' licenses are shown in source_slices_licenses.csv.
Wikimedia commons photos' information page can be found at "https://commons.wikimedia.org/wiki/File:
License Notes for Testing: The testing images have been uploaded to SlideWiki by SlideWiki users. The image authorship and copyright information is available in authors.csv.
Further information can be found for each image using the SlideWiki file service. Documentation is available at https://fileservice.slidewiki.org/documentation#/ and in particular: metadata is available at "https://fileservice.slidewiki.org/metadata/
This is the SlideImages dataset, which has been assembled for the SlideImages paper. If you find the dataset useful, please cite our paper: https://doi.org/10.1007/978-3-030-45442-5_36
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
COCO Dataset Limited (Person Only) is a dataset for object detection tasks - it contains People annotations for 5,438 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
coco2017
Image-text pairs from MS COCO2017.
Data origin
Data originates from cocodataset.org While coco-karpathy uses a dense format (with several sentences and sendids per row), coco-karpathy-long uses a long format with one sentence (aka caption) and sendid per row. coco-karpathy-long uses the first five sentences and therefore is five times as long as coco-karpathy. phiyodr/coco2017: One row corresponds one image with several sentences. phiyodr/coco2017-long: One row… See the full description on the dataset page: https://huggingface.co/datasets/phiyodr/coco2017.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COCO is a large-scale object detection, segmentation, and captioning dataset. http://cocodataset.org COCO has several features: Object segmentation Recognition in context Superpixel stuff segmentation 330K images (>200K labeled) 1.5 million object instances 80 object categories 91 stuff categories 5 captions per image * 250,000 people with keypoints
This dataset contains pickled Python objects with data from the annotations of the Microsoft (MS) COCO dataset. COCO is a large-scale object detection, segmentation, and captioning dataset.
Except for the objs file, which is a plain text file continuing a list of objects, the data in this dataset is all in the pickle format, a way of storing Python objects at binary data files.
Important: These pickles were pickled using Python 2. Since Kernels use Python 3, you will need to specify the encoding when unpickling these files. The Python utility scripts here have been updated to correctly unpickle these files.
# the correct syntax to read these pickled files into Python 3
pickle.load(open('file_path, 'rb'), encoding = "latin1")
As a derivative of the original COCO dataset, this dataset is distributed under a CC-BY 4.0 license. These files were distributed as part of the supporting materials for Zhao et al 2017. If you use these files in your work, please cite the following paper:
Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K. W. (2017). Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2979-2989).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Mini COCO Dataset is a dataset for instance segmentation tasks - it contains People annotations for 493 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations, including object categories, keypoints, and more. The model it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.
While COCO is often touted to comprise over 300k images, it's pivotal to understand that this number includes diverse formats like keypoints, among others. Specifically, the labeled dataset for object detection stands at 123,272 images.
The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. With that said, COCO has not released their test set annotations, meaning the test data doesn't come with labels. Thus, this data is not included in the dataset.
The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Vehicles Coco is a dataset for object detection tasks - it contains Vehicles annotations for 18,998 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Vehicles Coco Dataset is a dataset for object detection tasks - it contains Vehicles annotations for 9,629 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
## Overview
Test Cocodataset is a dataset for object detection tasks - it contains Fruits annotations for 2,010 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [Public Domain license](https://creativecommons.org/licenses/Public Domain).
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Download following datasets:
1. PASCAL-5i
Download PASCAL VOC2012 devkit (train/val data): wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
Download PASCAL VOC2012 SDS extended mask annotations.
2. COCO-20i
Download COCO2014 train/val images and annotations: wget http://images.cocodataset.org/zips/train2014.zip wget http://images.cocodataset.org/zips/val2014.zip wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip… See the full description on the dataset page: https://huggingface.co/datasets/zaplm/IC-VOS.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above the mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories --- person, boat, lifebuoy, surfboard, wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.
In this repository, we provide:
66 Full HD video clips (total size: 5.5 GB)
126,170 images extracted from the videos at a rate of 30 FPS (total size: 243 GB)
3 annotation files for the extracted images that follow the MS COCO data format (for more info see https://cocodataset.org/#format-data):
annotations_5_custom_classes.json: this file contains annotations concerning all five categories; please note that class ids do not correspond with the ones provided by the MS COCO standard since we account for two new classes not previously considered in the MS COCO dataset --- lifebuoy and wood
annotations_3_coco_classes.json: this file contains annotations concerning the three classes also accounted by the MS COCO dataset --- person, boat, surfboard. Class ids correspond with the ones provided by the MS COCO standard.
annotations_person_coco_classes.json: this file contains annotations concerning only the 'person' class. Class id corresponds to the one provided by the MS COCO standard.
The MOBDrone dataset is intended as a test data benchmark. However, for researchers interested in using our data also for training purposes, we provide training and test splits:
Test set: All the images whose filename starts with "DJI_0804" (total: 37,604 images)
Training set: All the images whose filename starts with "DJI_0915" (total: 88,568 images)
More details about data generation and the evaluation protocol can be found at our MOBDrone paper: https://arxiv.org/abs/2203.07973 The code to reproduce our results is available at this GitHub Repository: https://github.com/ciampluca/MOBDrone_eval See also http://aimh.isti.cnr.it/dataset/MOBDrone
Citing the MOBDrone
The MOBDrone is released under a Creative Commons Attribution license, so please cite the MOBDrone if it is used in your work in any form. Published academic papers should use the academic paper citation for our MOBDrone paper, where we evaluated several pre-trained state-of-the-art object detectors focusing on the detection of the overboard people
@inproceedings{MOBDrone2021, title={MOBDrone: a Drone Video Dataset for Man OverBoard Rescue}, author={Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi}, booktitle={ICIAP2021: 21th International Conference on Image Analysis and Processing}, year={2021} }
and this Zenodo Dataset
@dataset{donato_cafarelli_2022_5996890, author={Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi}, title = {{MOBDrone: a large-scale drone-view dataset for man overboard detection}}, month = feb, year = 2022, publisher = {Zenodo}, version = {1.0.0}, doi = {10.5281/zenodo.5996890}, url = {https://doi.org/10.5281/zenodo.5996890} }
Personal works, such as machine learning projects/blog posts, should provide a URL to the MOBDrone Zenodo page (https://doi.org/10.5281/zenodo.5996890), though a reference to our MOBDrone paper would also be appreciated.
Contact Information
If you would like further information about the MOBDrone or if you experience any issues downloading files, please contact us at mobdrone[at]isti.cnr.it
Acknowledgements
This work was partially supported by NAUSICAA - "NAUtical Safety by means of Integrated Computer-Assistance Appliances 4.0" project funded by the Tuscany region (CUP D44E20003410009). The data collection was carried out with the collaboration of the Fly&Sense Service of the CNR of Pisa - for the flight operations of remotely piloted aerial systems - and of the Institute of Clinical Physiology (IFC) of the CNR - for the water immersion operations.
Open Data Commons Attribution License (ODC-By) v1.0https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
The Dataset
A collection of images of parking lots for vehicle detection, segmentation, and counting.
Each image is manually labeled with pixel-wise masks and bounding boxes localizing vehicle instances.
The dataset includes about 250 images depicting several parking areas describing most of the problematic situations that we can find in a real scenario: seven different cameras capture the images under various weather conditions and viewing angles. Another challenging aspect is the presence of partial occlusion patterns in many scenes such as obstacles (trees, lampposts, other cars) and shadowed cars.
The main peculiarity is that images are taken during the day and the night, showing utterly different lighting conditions.
We suggest a three-way split (train-validation-test). The train split contains images taken during the daytime while validation and test splits include images gathered at night.
In line with these splits we provide some annotation files:
train_coco_annotations.json and val_coco_annotations.json --> JSON files that follow the golden standard MS COCO data format (for more info see https://cocodataset.org/#format-data) for the training and the validation splits, respectively. All the vehicles are labeled with the COCO category 'car'. They are suitable for vehicle detection and instance segmentation.
train_dot_annotations.csv and val_dot_annotations.csv --> CSV files that contain xy coordinates of the centroids of the vehicles for the training and the validation splits, respectively. Dot annotation is commonly used for the visual counting task.
ground_truth_test_counting.csv --> CSV file that contains the number of vehicles present in each image. It is only suitable for testing vehicle counting solutions.
Citing our work
If you found this dataset useful, please cite the following paper
@inproceedings{Ciampi_visapp_2021, doi = {10.5220/0010303401850195}, url = {https://doi.org/10.5220%2F0010303401850195}, year = 2021, publisher = {{SCITEPRESS} - Science and Technology Publications}, author = {Luca Ciampi and Carlos Santiago and Joao Costeira and Claudio Gennaro and Giuseppe Amato}, title = {Domain Adaptation for Traffic Density Estimation}, booktitle = {Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications} }
and this Zenodo Dataset
@dataset{ciampi_ndispark_6560823, author = {Luca Ciampi and Carlos Santiago and Joao Costeira and Claudio Gennaro and Giuseppe Amato}, title = {{Night and Day Instance Segmented Park (NDISPark) Dataset: a Collection of Images taken by Day and by Night for Vehicle Detection, Segmentation and Counting in Parking Areas}}, month = may, year = 2022, publisher = {Zenodo}, version = {1.0.0}, doi = {10.5281/zenodo.6560823}, url = {https://doi.org/10.5281/zenodo.6560823} }
Contact Information
If you would like further information about the dataset or if you experience any issues downloading files, please contact us at luca.ciampi@isti.cnr.it
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Weapon Object Coco is a dataset for instance segmentation tasks - it contains Object Person annotations for 1,775 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This dataset is designed for object detection tasks and follows the COCO format. It contains 300 images and corresponding annotation files in JSON format. The dataset is split into training, validation, and test sets, ensuring a balanced distribution for model evaluation.
train/ (70% - 210 images)
valid/ (15% - 45 images)
test/ (15% - 45 images)
Images in JPEG/PNG format.
A corresponding _annotations.coco.json file that includes bounding box annotations.
The dataset has undergone several preprocessing and augmentation steps to enhance model generalization:
Auto-orientation applied
Resized to 640x640 pixels (stretched)
Flip: Horizontal flipping
Crop: 0% minimum zoom, 5% maximum zoom
Rotation: Between -5° and +5°
Saturation: Adjusted between -4% and +4%
Brightness: Adjusted between -10% and +10%
Blur: Up to 0px
Noise: Up to 0.1% of pixels
Bounding Box Augmentations:
Flipping, cropping, rotation, brightness adjustments, blur, and noise applied accordingly to maintain annotation consistency.
The dataset follows the COCO (Common Objects in Context) format, which includes:
images section: Contains image metadata such as filename, width, and height.
annotations section: Includes bounding boxes, category IDs, and segmentation masks (if applicable).
categories section: Defines class labels.
The COCO (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and pose estimation tasks.