Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Tinsae Bahiru
Released under Apache 2.0
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This publicly available Multitask COCO dataset has been preprocessed for seamless use in object detection, keypoint detection, and segmentation tasks. It adds multi-label annotations to COCO, supporting robust performance across various vision applications. Special thanks to yermandy for providing access to the multi-label annotations.
Optimized for deep learning models, this dataset is structured for easy integration into training pipelines, supporting diverse applications in computer vision research.
The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations including object categories, keypoints, and more, making it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has established a standard system by which architectures can be compared.
While COCO is often described as comprising over 300k images, it is important to understand that this figure counts images across multiple annotation formats, keypoints among them. The labeled dataset for object detection specifically contains 123,272 images.
The full labeled object detection dataset is made available here, giving researchers access to the most comprehensive data for their experiments. That said, COCO has not released its test set annotations, so the test data does not come with labels and is therefore not included in this dataset.
The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The COCO dataset is a foundational large-scale benchmark for object detection, segmentation, captioning, and keypoint analysis. Created by Microsoft, it features complex everyday scenes with common objects in their natural contexts. With over 330,000 images and 2.5 million labeled instances, it has become the gold standard for training and evaluating computer vision models.
images/
Contains 2 subdirectories split by usage:
train2017/: Main training set (118K images)
val2017/: Validation set (5K images)
File Naming: 000000000009.jpg (12-digit zero-padded IDs)
Formats: JPEG images with varying resolutions (average 640×480)
annotations/
Contains task-specific JSON files with consistent naming:
instances_*.json: Object detection and segmentation annotations (80 object categories)
person_keypoints_*.json: Person keypoint annotations (17 keypoints per person)
captions_*.json: 5 human-generated descriptions per image
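For reference, a minimal sketch of loading these annotations with the pycocotools COCO API, assuming the val2017 layout above and the standard COCO 2017 annotation file name:

```python
from pycocotools.coco import COCO

# Load instance (detection/segmentation) annotations for the val2017 split.
coco = COCO("annotations/instances_val2017.json")

# Pick one image and fetch its annotations.
img_id = coco.getImgIds()[0]
img_info = coco.loadImgs(img_id)[0]                   # file_name, height, width, ...
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))   # bbox, category_id, segmentation

print(img_info["file_name"], "has", len(anns), "annotated objects")
```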
## Overview
Microsoft COCO Pose Detection is a dataset for computer vision tasks - it contains Objects annotations for 5,105 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MS COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset can be used for a variety of computer vision tasks, including object detection, instance segmentation, keypoint detection, semantic segmentation, and image captioning. Whether you're working on supervised or semi-supervised learning, this resource is designed to meet your needs.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Coco Kp is a dataset for computer vision tasks - it contains CuRQ annotations for 319 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: object segmentation, recognition in context, superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, and 250,000 people with keypoints.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a keypoint-only subset of the COCO 2017 dataset. You can access the original COCO dataset here.
This dataset contains three folders: annotations, val2017, and train2017.
- The annotations folder contains two JSON files, one for val and one for train. Each JSON contains information such as the image ID, bounding boxes, and keypoint locations.
- val2017 and train2017 contain the filtered images: those with num_keypoints > 0 according to the annotation file.
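A filter of the kind described above could look roughly like the following sketch, assuming the standard COCO 2017 person_keypoints JSON layout (the file name is an assumption):

```python
import json

# Assumed standard COCO 2017 keypoint annotation file name.
with open("annotations/person_keypoints_val2017.json") as f:
    coco = json.load(f)

# Keep only images that have at least one annotation with num_keypoints > 0.
keep_ids = {a["image_id"] for a in coco["annotations"] if a["num_keypoints"] > 0}
filtered_images = [img for img in coco["images"] if img["id"] in keep_ids]

print(f"{len(filtered_images)} of {len(coco['images'])} images contain keypoints")
```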
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
From_coco is a dataset for computer vision tasks - it contains Armor WTdO annotations for 9,293 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
COCO (Common Objects in COntext) is a popular dataset in Computer Vision. It contains annotations for Computer Vision tasks - object detection, segmentation, keypoint detection, stuff segmentation, panoptic segmentation, densepose, and image captioning. For more details visit COCO Dataset
The Tensor Processing Unit (TPU) hardware accelerators are very fast. The challenge is often to feed them data fast enough to keep them busy. Google Cloud Storage (GCS) is capable of sustaining very high throughput but as with all cloud storage systems, initiating a connection costs some network back and forth. Therefore, having our data stored as thousands of individual files is not ideal. This dataset contains COCO dataset with object detection annotations in a smaller number of files and you can use the power of tf.data.Dataset to read from multiple files in parallel.
TFRecord file format: TensorFlow's preferred file format for storing data is the protobuf-based TFRecord format. Other serialization formats would work too, but you can load a dataset from TFRecord files directly by writing:

filenames = tf.io.gfile.glob(FILENAME_PATTERN)
dataset = tf.data.TFRecordDataset(filenames)
dataset = dataset.map(...)
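A parsing function for such files might look like the sketch below; note that the feature keys and the GCS file pattern are assumptions and must match the schema used when the TFRecord files were written:

```python
import tensorflow as tf

# Assumed feature keys; adjust to the actual schema of the TFRecord files.
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),   # JPEG-encoded bytes
    "bboxes": tf.io.VarLenFeature(tf.float32),       # flattened [x, y, w, h] boxes
    "labels": tf.io.VarLenFeature(tf.int64),         # category ids
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_jpeg(example["image"], channels=3)
    bboxes = tf.reshape(tf.sparse.to_dense(example["bboxes"]), [-1, 4])
    labels = tf.sparse.to_dense(example["labels"])
    return image, (bboxes, labels)

filenames = tf.io.gfile.glob("gs://my-bucket/coco-*.tfrec")   # hypothetical pattern
dataset = (
    tf.data.TFRecordDataset(filenames, num_parallel_reads=tf.data.AUTOTUNE)
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
)
```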
For more details, see https://codelabs.developers.google.com/codelabs/keras-flowers-data/
You can use the following code in your Kaggle notebook to get the Google Cloud Storage (GCS) path of any public Kaggle dataset.
from kaggle_datasets import KaggleDatasets
GCS_PATH = KaggleDatasets().get_gcs_path()
View the notebook COCO Object Detection dataset in TFRecord to see how TFRecord files are created from the original COCO dataset.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains 26,768 images of hands annotated with keypoints, making it suitable for training models for hand detection and keypoint estimation. The annotations were generated using the MediaPipe library, ensuring high accuracy and consistency. The dataset is compatible with both COCO and YOLOv8 formats.
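For reference, hand keypoints of this kind can be extracted with MediaPipe roughly as in the sketch below (not necessarily the authors' exact pipeline; the image path is a hypothetical placeholder):

```python
import cv2
import mediapipe as mp

image = cv2.imread("example_hand.jpg")   # hypothetical image path
with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2) as hands:
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_hand_landmarks:
    h, w = image.shape[:2]
    for hand in results.multi_hand_landmarks:
        # MediaPipe returns 21 normalized landmarks per hand; convert to pixels.
        keypoints = [(lm.x * w, lm.y * h) for lm in hand.landmark]
        print(len(keypoints), "keypoints")
```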
The dataset is organized as follows:
hand_keypoint_dataset/
│
├── images/
│ ├── train/
│ ├── val/
│
├── coco_annotation/
│ ├── train/
│ │ ├── _annotations.coco.json
│ ├── val/
│ │ ├── _annotations.coco.json
│
├── labels/
│ ├── train/
│ ├── val/
│
└── README.md
images: Contains all the images, split into training and validation sets.
coco_annotation: Contains the annotations for the images in COCO format.
labels: Contains the annotations for the images in YOLO format.
The dataset includes keypoints for hand detection. The keypoints are annotated as follows:
Each hand has a total of 21 keypoints.
To use the dataset with COCO-compatible models, you can directly load the JSON files using COCO APIs available in various deep learning frameworks.
For YOLOv8, ensure you have the required environment set up. You can use the provided text files to train YOLOv8 models by specifying the dataset path in your configuration file.
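As an example, a YOLOv8 pose training run on this dataset might look like the sketch below; the data YAML name is a hypothetical placeholder that would point at images/train and images/val and declare kpt_shape: [21, 3] for the 21 hand keypoints:

```python
from ultralytics import YOLO

# Minimal training sketch; "hand_keypoints.yaml" is a hypothetical config file.
model = YOLO("yolov8n-pose.pt")
model.train(data="hand_keypoints.yaml", epochs=100, imgsz=640)

# Inference on a single image after training (path is a placeholder).
results = model("example_hand.jpg")
print(results[0].keypoints)
```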
We would like to thank the following sources for providing the images used in this dataset:
https://sites.google.com/view/11khands https://www.kaggle.com/datasets/ritikagiridhar/2000-hand-gestures https://www.kaggle.com/datasets/imsparsh/gesture-recognition
The images were collected and used under the respective licenses provided by each platform.
For any questions or issues, please contact its.riondsilva@gmail.com
Thank you for using the Hand Keypoint Dataset!
Garlic Keypoint Detection dataset
This dataset contains 1000 images of a single garlic clove in a presumably industrial setting. The annotations are COCO-formatted and consist of a bounding box and 2 keypoints: head and tail. The dataset was taken from https://universe.roboflow.com/gesture-recognition-dsn2n/garlic_keypoint/dataset/1. Refer to the original repo for licensing questions. The annotation JSON files were slightly modified (formatting, image base directory,..)… See the full description on the dataset page: https://huggingface.co/datasets/tlpss/roboflow-garlic.
Synthetic dataset of over 13,000 images of damaged and intact parcels with full 2D and 3D annotations in the COCO format. For details see our paper and for visual samples our project page.
Relevant computer vision tasks:
The dataset is for academic research use only, since it uses resources with restrictive licenses.
For a detailed description of how the resources are used, we refer to our paper and project page.
Licenses of the resources in detail:
You can use our textureless models (i.e. the obj files) of damaged parcels under CC BY 4.0 (note that this does not apply to the textures).
If you use this resource for scientific research, please consider citing
@inproceedings{naumannParcel3DShapeReconstruction2023,
author = {Naumann, Alexander and Hertlein, Felix and D\"orr, Laura and Furmans, Kai},
title = {Parcel3D: Shape Reconstruction From Single RGB Images for Applications in Transportation Logistics},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2023},
pages = {4402-4412}
}
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Big Pyramid is a dataset for computer vision tasks - it contains Pyramid annotations for 300 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains more than 8,000 video clips of 92 individual Amur tigers from 10 zoos in China. Around 9500 bounding boxes are provided along with pose keypoints, and around 3600 of those bounding boxes are associated with an individual tiger ID. This data set was originally published as part of the Re-identification challenge at the ICCV 2019 Workshop on Computer Vision for Wildlife Conservation; suggested train/val/test splits correspond to those used for the competition.
Data format: All annotation tar files include README.md files with detailed format information; this section provides a high-level summary only.
Detection: Bounding boxes are provided in Pascal VOC format.
Pose: Pose annotations are provided in COCO format, using the COCO "keypoint" annotation type, with categories like "left_ear", "right_ear", "nose", etc.
Re-identification: Identifications in the "train" set are provided as a .csv-formatted list of [ID, filename] pairs; the "test" set contains only a list of images requiring identification. Pose annotations are provided for both sets.
The competition for which this dataset was prepared divided re-identification into two tasks, one (“plain re-ID”) where pose and bounding box annotations were available, and one (“wild re-ID”) where annotations were not available.
Tracks
Tiger Detection: From images/videos captured by cameras, this task aims to place tight bounding boxes around tigers. As the detection may run on the edge (smart cameras), both the detection accuracy (in terms of AP) and the computing cost are used to measure the quality of the detector.
Tiger Pose Detection: From images/videos with detected tiger bounding boxes, this task aims to estimate tiger pose (i.e., keypoint landmarks) for tiger image alignment/normalization, so that pose variations are removed or alleviated in the tiger re-identification step. We will use mean average precision (mAP) and object keypoint similarity (OKS) to evaluate submissions.
Tiger Re-ID with Human Alignment (Plain Re-ID): We define a set of queries and a target database of Amur tigers. Both queries and targets in the database are already annotated with bounding boxes and pose information. Tiger re-identification aims to find all the database images containing the same tiger as the query. Both mAP and rank-1 accuracy will be used to evaluate accuracy.
Tiger Re-ID in the Wild: This track will evaluate the accuracy of tiger re-identification in the wild with a fully automated pipeline. To simulate the real use case, no annotations are provided. Submissions should automatically detect and identify tigers in all images in the test set. Both mAP and rank-1 accuracy will be used to evaluate the accuracy of different models.
Format Description
Detection: Data annotation is in Pascal VOC format; submissions are in COCO detection format. Train with the given training set; the testing set will be provided in the test stage.
Pose: Both data annotations and submissions are in COCO format. Train with the given training set; the testing set will be provided in the test stage.
Plain ReID: The dataset contains cropped images with manually annotated IDs and keypoints. Submissions should be a JSON file in the following format:
[ {"query_id":0, "ans_ids":[29,38,10,.......]}, {"query_id":3, "ans_ids":[95,18,20,.......]}, ... ] where the "query_id" is the id of query image, and each followed array "ans_ids" lists re-ID results (image ids) in the confidence descending order. Similar to most existing Re-ID tasks, the plain Re-ID task requires to build models on training-set, and evaluating on the test-set. During testing, each image will be taken as query image, while all the remained images in the test-set as "gallery" or "database", the query results should be rank-list of images in "gallery". The evaluation server will separate the test-set into two cases: single-camera and cross camera (see our arxiv report for more details) to measure performance. The evaluation metrics are mAP and top-k (k=1, 5).
ReID in the Wild: This task evaluates the performance of re-ID in a fully automatic way. Participants are required to build a tiger detector, a tiger pose estimator, and a re-ID module based on the provided training set, and to integrate them into a full pipeline that re-identifies each detected tiger in a set of wild input images. The test set is the same as that of the detection task. The re-ID evaluation uses all detected boxes as the "gallery", while the rest of the procedure is similar to the plain re-ID case. Submissions should be a JSON file with the following schema:
{ "bboxs":[bbox], "reid_result":[ {"query_id":int, "ans_ids":[int,int,...]} ] } where bbox{ "bbox_id": int, #used in reid_result "image_id": int, "pos": [x,y,w,h] #x,y is the top-left coord, all in pixels. } where the 'reid_result' is almost the same format as in Plain ReID, with only 'id' replac...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains 32,352 synthetic images of the asteroid Bennu, rendered in Blender from a variety of poses and illumination conditions. The dataset is split into clean images and augmented images; the poses and illumination conditions are the same for both, but the augmented images have a variety of augmentations applied, which are described in the paper (currently under review at AIAA). The annotations are the same for all images and are given in the COCO format, allowing easy use with a range of keypoint detection networks. Furthermore, the *.csv files can be used to train object detection networks using TensorFlow. The *.json files contain all information, i.e., pose, keypoints, and bounding box, for any given image. The dataset_utils contain a variety of functions for processing the dataset and plotting ground-truth keypoints and bounding boxes for a given image.
In perennial plant cultivation, a large variety of different plants, typically more than 2,000, are cultivated in small batches. In this domain, weed removal is a recurring task that is currently done manually on a weekly basis across the whole population. Since the labour is not only repetitive but also requires working in non-ergonomic positions, being able to have it done by automation seems beneficial.
In this notebook we seek to answer two questions:
- How reliably can weeds be detected using optical RGB inspection in combination with a state-of-the-art machine learning model?
- When building the dataset, does it help to additionally provide a keypoint that marks the centre of the plant, so that the centre can later be identified automatically?
While answering the first question makes weed removal possible in the first place, finding a response to the latter is crucial when it comes to removing the weed with precision, be it with mechanical, electrical or chemical methods.
We will pursue these two questions by building our own dataset and by training a detectron2 model. This notebook therefore also contains an in-detail introduction to detectron2; at the time of writing, detailed discussions of this framework are only sparsely found on the internet.
We built two datasets: one for weed classification and another one for keypoint detection. Some images are present in both datasets.
Since weeds grow at different times of the year, we took images over a period of seven weeks, starting in mid-April and finishing in June. The photos were taken at a plant nursery in southern Germany. In total we took 392 images.
We used a Fuji X-T2 camera with an image size of 24 MP (4000×6000 px). These images were labeled and fed directly into the model.
There are several standard annotation schemas, such as Pascal VOC or COCO. Since detectron2 has a built-in dataloader that works with the COCO-annotation format we decided to use COCO.
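For instance, registering a COCO-formatted dataset with detectron2's built-in loader looks roughly like the sketch below (dataset names and paths are hypothetical placeholders):

```python
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.data.datasets import register_coco_instances

# Hypothetical paths: point these at the exported COCO JSON and image folders.
register_coco_instances("weeds_train", {}, "annotations/train.json", "images/train")
register_coco_instances("weeds_val", {}, "annotations/val.json", "images/val")

# The registered datasets can then be referenced by name in a detectron2 config.
dataset_dicts = DatasetCatalog.get("weeds_train")
metadata = MetadataCatalog.get("weeds_train")
print(len(dataset_dicts), "training samples; classes:", metadata.thing_classes)
```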
This dataset and notebook are published under the MIT licence.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset comprises 4 different actions in tennis; each action has 500 images and a COCO-format JSON file. The images in the dataset were extracted frame by frame from self-recorded videos and manually classified according to the different tennis actions.
The actions in this dataset are listed below; the action category name used in the COCO-format files is given in brackets:
1. backhand shot (backhand)
2. forehand shot (forehand)
3. ready position (ready_position)
4. serve (serve)
The dataset is organized into two main directories: annotations and images.
- annotations: the JSON files of the actions (COCO format)
- images: the images of the actions (classified into four folders according to the four actions)
We used COCO-Annotator to annotate and categorize the human actions. The keypoints are annotated as follows (referring to OpenPose's annotation scheme): ["nose", "left_eye", "right_eye", "left_ear", "right_ear", "left_shoulder", "right_shoulder", "left_elbow", "right_elbow", "left_wrist", "right_wrist", "left_hip", "right_hip", "left_knee", "right_knee", "left_ankle", "right_ankle", "neck"]
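In a COCO-format keypoint file this list typically appears in the category entry, roughly as in the sketch below (the id/name values are assumptions and the skeleton edges are omitted):

```python
# Sketch of a COCO-format "categories" entry for these keypoints.
category = {
    "id": 1,                      # assumed category id
    "name": "person",             # assumed category name
    "keypoints": [
        "nose", "left_eye", "right_eye", "left_ear", "right_ear",
        "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
        "left_wrist", "right_wrist", "left_hip", "right_hip",
        "left_knee", "right_knee", "left_ankle", "right_ankle", "neck",
    ],
    "skeleton": [],               # pairs of keypoint indices, omitted here
}

# Each annotation then stores keypoints as [x1, y1, v1, x2, y2, v2, ...],
# where v is the COCO visibility flag (0: not labeled, 1: labeled but not
# visible, 2: labeled and visible).
```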
The dataset comprises 4 different actions in tennis; each action has 500 images and a COCO-format JSON file. Size on disk is 508 MB (533,372,928 bytes).
National Taichung University of Science and Technology, National Kaohsiung University of Science and Technology
Computer Vision, Image Processing, Tennis, Action Recognition