COCO-WholeBody is an extension of the COCO dataset with whole-body annotations. Each person in an image is annotated with 4 types of bounding boxes (person box, face box, left-hand box, and right-hand box) and 133 keypoints (17 for the body, 6 for the feet, 68 for the face, and 42 for the hands).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset.
The COCO keypoint-2017 dataset contains over 200,000 images and 250,000 human instances labeled with 17 keypoints.
The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations including object categories, keypoints, and more, making it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.
While COCO is often touted as comprising over 300k images, it is important to understand that this count spans all annotation tasks, keypoints among them. The labeled dataset for object detection specifically contains 123,272 images.
The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. That said, COCO has not released its test set annotations, meaning the test data comes without labels; it is therefore not included in this dataset.
The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for "COCO Keypoints"
Quick Start
Usage
from datasets import load_dataset

dataset = load_dataset('whyen-wang/coco_keypoints')
example = dataset['train'][0]
print(example)
# {'image': ...
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
MS COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.
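For instance, the keypoint annotations can be browsed with the pycocotools library. A minimal sketch, assuming a local copy of the 2017 keypoint annotations at the path shown:

from pycocotools.coco import COCO

# Path is an assumption; point it at your local copy of the 2017 annotations.
coco = COCO('annotations/person_keypoints_val2017.json')
person_id = coco.getCatIds(catNms=['person'])[0]
img_id = coco.getImgIds(catIds=[person_id])[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id, catIds=[person_id], iscrowd=False))
# Each annotation stores the 17 keypoints as flattened (x, y, visibility) triples.
print(anns[0]['num_keypoints'], len(anns[0]['keypoints']))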
GPL 3.0: https://choosealicense.com/licenses/gpl-3.0/
CropCOCO Dataset
CropCOCO is a validation-only dataset of COCO val 2017 images cropped such that some keypoint annotations fall outside the image. It can be used for keypoint detection, out-of-image keypoint detection and localization, person detection, and amodal person detection.
📦 Dataset Details
Total images: 4,114
Annotations: COCO-style (bounding boxes, human keypoints, both in- and out-of-image)
Resolution: varies
Format: JSON annotations + JPG images
See the full description on the dataset page: https://huggingface.co/datasets/vrg-prague/CropCOCO.
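Since the annotations are COCO-style, counting the out-of-image keypoints could look like the sketch below; the annotation file name is hypothetical, and "out-of-image" is taken to mean coordinates outside the image bounds:

import json

# Hypothetical file name; use the actual CropCOCO annotation JSON.
coco = json.load(open('annotations/cropcoco_val.json'))
images = {im['id']: im for im in coco['images']}
outside = 0
for ann in coco['annotations']:
    im = images[ann['image_id']]
    kps = ann.get('keypoints', [])
    # COCO keypoints are flattened (x, y, visibility) triples.
    for x, y, v in zip(kps[0::3], kps[1::3], kps[2::3]):
        if v > 0 and not (0 <= x < im['width'] and 0 <= y < im['height']):
            outside += 1
print(outside, 'labeled keypoints fall outside their image')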
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
COCO is a large-scale object detection, segmentation, and captioning dataset (http://cocodataset.org). COCO has several features: object segmentation, recognition in context, superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, and 250,000 people with keypoints.
Description:
👉 Download the dataset here
This dataset has been specifically curated for cow pose estimation, designed to enhance animal behavior analysis and monitoring through computer vision techniques. The dataset is annotated with 12 keypoints on the cow’s body, enabling precise tracking of body movements and posture. It is structured in the COCO format, making it compatible with popular deep learning models like YOLOv8, OpenPose, and others designed for object detection and keypoint estimation tasks.
Applications:
This dataset is ideal for agricultural tech solutions, veterinary care, and animal behavior research. It can be used in various use cases such as health monitoring, activity tracking, and early disease detection in cattle. Accurate pose estimation can also assist in optimizing livestock management by understanding animal movement patterns and detecting anomalies in their gait or behavior.
Download Dataset
Keypoint Annotations:
The dataset includes the following 12 keypoints, strategically marked to represent significant anatomical features of cows:
Nose: Essential for head orientation and overall movement tracking.
Right Eye: Helps in head pose estimation.
Left Eye: Complements the right eye for accurate head direction.
Neck (side): Marks the side of the neck, key for understanding head and body coordination.
Left Front Hoof: Tracks the front left leg movement.
Right Front Hoof: Tracks the front right leg movement.
Left Back Hoof: Important for understanding rear leg motion.
Right Back Hoof: Completes the leg movement tracking for both sides.
Backbone (side): Vital for posture and overall body orientation analysis.
Tail Root: Used for tracking tail movements and posture shifts.
Backpose Center (near tail’s midpoint): Marks the midpoint of the back, crucial for body stability and movement analysis.
Stomach (center of side pose): Helps in identifying body alignment and weight distribution.
Dataset Format:
The data is structured in the COCO format, with annotations that include image coordinates for each keypoint. This format is highly suitable for integration into popular deep learning frameworks. Additionally, the dataset includes metadata such as bounding boxes, image sizes, and segmentation masks to provide detailed context for each cow in an image.
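As an illustration of that structure, here is a minimal, hypothetical COCO-style record for one cow; the field names follow the standard COCO keypoint schema, while all values and the keypoint identifier strings are placeholders:

import json

# Hypothetical values; the schema follows the standard COCO keypoint format.
category = {
    'id': 1,
    'name': 'cow',
    'keypoints': ['nose', 'right_eye', 'left_eye', 'neck_side',
                  'left_front_hoof', 'right_front_hoof', 'left_back_hoof',
                  'right_back_hoof', 'backbone_side', 'tail_root',
                  'backpose_center', 'stomach'],
}
annotation = {
    'image_id': 1,
    'category_id': 1,
    'bbox': [120.0, 80.0, 400.0, 260.0],  # [x, y, width, height]
    'num_keypoints': 12,
    # Flattened (x, y, visibility) triples, one per keypoint above;
    # only the first triple is filled in here for brevity.
    'keypoints': [150, 95, 2] + [0, 0, 0] * 11,
}
print(json.dumps(annotation, indent=2))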
Compatibility:
This dataset is optimized for use with cutting-edge pose estimation models such as YOLOv8 and other keypoint detection models like DeepLabCut and HRNet, enabling efficient training and inference for cow pose tracking. It can be seamlessly integrated into existing machine learning pipelines for both real-time and post-processed analysis.
This dataset is sourced from Kaggle.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is Part 2/2 of the ActiveHuman dataset! Part 1 can be found here.
Dataset Description
ActiveHuman was generated using Unity's Perception package.
It consists of 175,428 RGB images and their semantic segmentation counterparts, taken in different environments and lighting conditions and at different camera distances and angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1m-4m) and 36 camera angles (0-360 degrees at 10-degree intervals).
The dataset does not include images at every single combination of available camera distances and angles, since for some values the camera would collide with another object or go outside the confines of an environment. As a result, some combinations of camera distances and angles do not exist in the dataset.
Alongside each image, 2D bounding box, 3D bounding box and keypoint ground truth annotations are also generated using Labelers and are stored as a JSON-based dataset. These Labelers are scripts responsible for capturing ground truth annotations for each captured image or frame. Keypoint annotations follow the COCO format defined by the COCO keypoint annotation template offered in the Perception package.
Folder configuration
The dataset consists of 3 folders:
Essential Terminology
Dataset Data
The dataset includes 4 types of JSON annotation files:
Most Labelers generate different annotation specifications in the spec key-value pair:
Each Labeler generates different annotation specifications in the values key-value pair:
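A hedged sketch of reading one of these files, assuming the captures_*.json layout used by the Perception package (a top-level 'captures' array whose entries hold each Labeler's annotations); adjust the keys if your export differs:

import json

# File name follows Unity Perception's captures_*.json convention (an assumption).
with open('Dataset/captures_000.json') as f:
    data = json.load(f)
for capture in data['captures']:
    for ann in capture.get('annotations', []):
        # Each Labeler's output carries its own 'values' payload.
        print(ann.get('annotation_definition'), len(ann.get('values', [])))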
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Real-world dataset of ~400 images of cuboid-shaped parcels with full 2D and 3D annotations in the COCO format.
Relevant computer vision tasks:
bounding box detection
instance segmentation
keypoint estimation
3D bounding box estimation
3D voxel reconstruction (.binvox files)
3D reconstruction (.obj files)
For details, see our paper and project page.
If you use this resource for scientific research, please consider citing
@inproceedings{naumannScrapeCutPasteLearn2022,
  title = {Scrape, Cut, Paste and Learn: Automated Dataset Generation Applied to Parcel Logistics},
  author = {Naumann, Alexander and Hertlein, Felix and Zhou, Benchun and Dörr, Laura and Furmans, Kai},
  booktitle = {IEEE Conference on Machine Learning and Applications (ICMLA)},
  year = {2022}
}
Synthetic dataset of over 13,000 images of damaged and intact parcels with full 2D and 3D annotations in the COCO format. For details see our paper and for visual samples our project page.
Relevant computer vision tasks:
The dataset is for academic research use only, since it uses resources with restrictive licenses.
For a detailed description of how the resources are used, we refer to our paper and project page.
Licenses of the resources in detail:
You can use our textureless models (i.e. the obj files) of damaged parcels under CC BY 4.0 (note that this does not apply to the textures).
If you use this resource for scientific research, please consider citing
@inproceedings{naumannParcel3DShapeReconstruction2023,
author = {Naumann, Alexander and Hertlein, Felix and D\"orr, Laura and Furmans, Kai},
title = {Parcel3D: Shape Reconstruction From Single RGB Images for Applications in Transportation Logistics},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2023},
pages = {4402-4412}
}
OccludedPASCAL3D+ is a dataset designed to evaluate robustness to occlusion for a number of computer vision tasks, such as object detection, keypoint detection and pose estimation. In the OccludedPASCAL3D+ dataset, we simulate partial occlusion by superimposing objects cropped from the MS-COCO dataset on top of objects from the PASCAL3D+ dataset. We only use the ImageNet subset of PASCAL3D+, which has 10,812 test images.
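The superimposition step can be pictured with a short Pillow sketch; the file names and paste position are illustrative, not the dataset's actual generation code:

from PIL import Image

# Illustrative file names; the occluder is an RGBA crop with transparency.
target = Image.open('pascal3d_car.jpg').convert('RGBA')
occluder = Image.open('coco_occluder.png').convert('RGBA')
# Paste the occluder onto the object, using its alpha channel as the mask.
target.paste(occluder, (100, 80), mask=occluder)
target.convert('RGB').save('occluded.jpg')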
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Kwon-Young Choi
Released under MIT
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
This file contains the annotations for the ConfLab dataset, including actions (speaking status), pose, and F-formations.
------------------
./actions/speaking_status:
./processed: the processed speaking status files, aggregated into a single data frame per segment. Skipped rows in the raw data (see https://josedvq.github.io/covfee/docs/output for details) have been imputed using the code at: https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/speaking_status
The processed annotations consist of:
./speaking: the first row contains person IDs matching the sensor IDs;
the remaining rows contain binary speaking status annotations at 60 fps for the corresponding 2-min video segment (7,200 frames).
./confidence: same structure as above. These annotations reflect the annotators' continuous-valued confidence in their speaking status annotations.
To load these files with pandas: pd.read_csv(p, index_col=False)
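For example (the file name is hypothetical; per the description above, the first row holds person IDs and the remaining rows binary speaking status at 60 fps):

import pandas as pd

# Hypothetical file name; columns end up labeled by person ID because the
# first row (person IDs) is read as the header.
df = pd.read_csv('actions/speaking_status/processed/speaking/seg1.csv', index_col=False)
print(df.shape)  # roughly (7200, number_of_persons) for a 2-min segment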
./raw.zip: the raw outputs from speaking status annotation for each of the eight annotated 2-min video segments, as output by the covfee annotation tool (https://github.com/josedvq/covfee)
Annotations were done at 60 fps.
--------------------
./pose:
./coco: the processed pose files in coco JSON format, aggregated into a single data frame per video segment. These files have been generated from the raw files using the code at: https://github.com/TUDelft-SPC-Lab/conflab-keypoints
To load in Python: f = json.load(open('/path/to/cam2_vid3_seg1_coco.json'))
The skeleton structure (limbs) is contained within each file in:
f['categories'][0]['skeleton']
and keypoint names at:
f['categories'][0]['keypoints']
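Putting the snippets above together into one runnable example (the file name follows the cam/vid/seg pattern mentioned earlier):

import json

with open('pose/coco/cam2_vid3_seg1_coco.json') as f:
    coco = json.load(f)
print(coco['categories'][0]['keypoints'])  # keypoint names
print(coco['categories'][0]['skeleton'])   # limb definitions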
./raw.zip: the raw outputs from continuous pose annotation, as output by the covfee annotation tool (https://github.com/josedvq/covfee)
Annotations were done at 60 fps.
---------------------
./f_formations:
seg 2: 14:00 onwards, for videos of the form x2xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).
seg 3: for videos of the form x3xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).
Note that camera 10 does not capture any meaningful subject information or body parts that are not already covered by camera 8.
First column: time stamp
Second column: "()" delineates groups, "<>" delineates subjects, cam X indicates the best camera view for which a particular group exists.
phone.csv: time stamp (pertaining to seg3), corresponding group, ID of person using the phone
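A speculative parsing sketch for one annotation row, based only on the "()" group and "<>" subject convention described above; the sample line is invented:

import re

line = '(<3><12><25>) cam 4 (<7><9>) cam 8'  # invented example row
# Extract each parenthesized group, then the subject IDs inside it.
for group in re.findall(r'\(((?:<[^>]*>)+)\)', line):
    subjects = re.findall(r'<([^>]*)>', group)
    print(subjects)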
A large-scale dataset named AIC (AI Challenger) with three sub-datasets: human keypoint detection (HKD), large-scale attribute dataset (LAD), and image Chinese captioning (ICC).
Apache License 2.0: http://www.apache.org/licenses/LICENSE-2.0
Description: The "iRodent" dataset contains rodent species observations obtained using the iNaturalist API, with a focus on Suborder Myomorpha (Taxon ID: 16). The dataset features prominent rodent species like Muskrat, Brown Rat, House Mouse, Black Rat, Hispid Cotton Rat, Meadow Vole, Bank Vole, Deer Mouse, White-footed Mouse, and Striped Field Mouse. The dataset provides manually labeled keypoints for pose estimation and, for a subset of images, segmentation masks generated using a Mask R-CNN model.
Creator: Adaptive Motor Control Lab
Data Format: COCO format
Number of Images: 443
Species: Muskrat, Brown Rat, House Mouse, Black Rat, Hispid Cotton Rat, Meadow Vole, Bank Vole, Deer Mouse, White-footed Mouse, Striped Field Mouse
Image Resolution: Varied (800x600 to 5184x3456 pixels)
Annotations: Pose keypoints and generated segmentation masks by Tian Qiu and Mackenzie Mathis.
License: Apache 2.0
Keywords: animal pose estimation, behaviour analysis, keypoints, rodent
Contact: Mackenzie Mathis
Email: mackenzie.mathis@epfl.ch
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TAMPAR is a real-world dataset of parcel photos for tampering detection with annotations in COCO format. For details see our paper and for visual samples our project page. Features are:
Relevant computer vision tasks:
If you use this resource for scientific research, please consider citing our WACV 2024 paper "TAMPAR: Visual Tampering Detection for Parcel Logistics in Postal Supply Chains".
AGPL 3.0: https://choosealicense.com/licenses/agpl-3.0/
Human keypoint dataset of anime/manga-style character illustrations. Extension of the AnimeDrawingsDataset, with additional features: all 17 COCO-compliant human keypoints, character bounding boxes, and 2000 additional samples (4000 total) from Danbooru with difficult tags. Useful for pose estimation of illustrated characters, which enables downstream tasks such as pose-guided reference drawing retrieval (e.g. Hermit Purple).
Garlic Keypoint Detection dataset
This dataset contains 1000 images of a single garlic clove in a presumably industrial setting. The annotations are COCO-formatted and are composed of a bounding box and 2 keypoints: head and tail. The dataset was taken from https://universe.roboflow.com/gesture-recognition-dsn2n/garlic_keypoint/dataset/1. Refer to the original repo for licensing questions. The annotation JSON files were slightly modified (formatting, image base directory, ...). See the full description on the dataset page: https://huggingface.co/datasets/tlpss/roboflow-garlic.