38 datasets found
  1. COCO Dataset 2017

    • gts.ai
    json
    Cite
    GTS, COCO Dataset 2017 [Dataset]. https://gts.ai/dataset-download/coco-dataset-2017/
    Explore at:
    Available download formats: json
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The MS COCO (Microsoft Common Objects in Context) 2017 dataset is a large-scale benchmark for object detection, segmentation, key-point detection, and image captioning. It includes over 328K images with comprehensive annotations that drive advancements in computer vision research.
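
    As a quick orientation, here is a minimal sketch of inspecting the 2017 keypoint annotations with pycocotools, assuming the official annotation archive from cocodataset.org has been downloaded and unpacked locally (the path is an assumption):

    from pycocotools.coco import COCO

    # Assumes annotations/person_keypoints_val2017.json was downloaded and
    # unpacked from the official COCO 2017 annotation archive.
    coco = COCO("annotations/person_keypoints_val2017.json")

    print(len(coco.getImgIds()), "images,", len(coco.getAnnIds()), "person annotations")

    # Each person annotation stores 17 keypoints as flat [x, y, visibility] triplets.
    ann = coco.loadAnns(coco.getAnnIds()[:1])[0]
    print(ann["num_keypoints"], ann["keypoints"][:6])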

  2. keypoint-detection coco dataset

    • kaggle.com
    zip
    Updated Nov 6, 2024
    Cite
    Tinsae Bahiru (2024). keypoint-detection coco dataset [Dataset]. https://www.kaggle.com/datasets/tinsaiebahiru/keypoint-detection-coco-dataset
    Explore at:
    Available download formats: zip (9555 bytes)
    Dataset updated
    Nov 6, 2024
    Authors
    Tinsae Bahiru
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Tinsae Bahiru

    Released under Apache 2.0


  3. Multi Task COCO 2012

    • kaggle.com
    Updated Jan 15, 2025
    Cite
    Nikdintel (2025). Multi Task COCO 2012 [Dataset]. https://www.kaggle.com/datasets/snikhilrao/coco-multitask-dataset-2024
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Jan 15, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Nikdintel
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Multitask COCO Dataset – Detection, Keypoints, and Segmentation

    This publicly available Multitask COCO dataset has been preprocessed for seamless use in object detection, keypoint detection, and segmentation tasks. It enables multi-label annotations for COCO, ensuring robust performance across various vision applications. Special thanks to yermandy for providing access to multi-label annotations.

    Optimized for deep learning models, this dataset is structured for easy integration into training pipelines, supporting diverse applications in computer vision research.

  4. coco dataset

    • kaggle.com
    zip
    Updated Jul 5, 2025
    Cite
    ProgramerSalar (2025). coco dataset [Dataset]. https://www.kaggle.com/datasets/salargamer/coco-dataset
    Explore at:
    Available download formats: zip (20043918455 bytes)
    Dataset updated
    Jul 5, 2025
    Authors
    ProgramerSalar
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The COCO dataset is a foundational large-scale benchmark for object detection, segmentation, captioning, and keypoint analysis. Created by Microsoft, it features complex everyday scenes with common objects in their natural contexts. With over 330,000 images and 2.5 million labeled instances, it has become the gold standard for training and evaluating computer vision models.

    File Information

    images/
    Contains 2 subdirectories split by usage:
    train2017/: Main training set (118K images)
    val2017/: Validation set (5K images)
    File Naming: 000000000009.jpg (12-digit zero-padded IDs)
    Formats: JPEG images with varying resolutions (average 640×480)
    
    annotations/
    Contains task-specific JSON files with consistent naming:
    captions_*.json: 5 human-generated descriptions per image
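
    The naming and caption layout above translate directly to code. A short sketch, with paths assumed to match this listing:

    import json

    # Image ids map to 12-digit zero-padded filenames, as described above.
    def coco_image_path(image_id: int, split: str = "train2017") -> str:
        return f"images/{split}/{image_id:012d}.jpg"

    print(coco_image_path(9))  # images/train2017/000000000009.jpg

    # Caption files are plain JSON; each image has ~5 human-written captions.
    with open("annotations/captions_val2017.json") as f:  # assumed path
        caps = json.load(f)
    print(caps["annotations"][0]["caption"])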
    
  5. Microsoft Coco Dataset

    • universe.roboflow.com
    zip
    Updated Jul 23, 2025
    Cite
    Microsoft (2025). Microsoft Coco Dataset [Dataset]. https://universe.roboflow.com/microsoft/coco/model/3
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 23, 2025
    Dataset authored and provided by
    Microsoft
    Variables measured
    Object Bounding Boxes
    Description

    Microsoft Common Objects in Context (COCO) Dataset

    The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations including object categories, keypoints, and more, making it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.

    While COCO is often touted as comprising over 300k images, it's important to understand that this figure spans multiple annotation types (keypoints, among others). Specifically, the labeled dataset for object detection stands at 123,272 images.

    The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. With that said, COCO has not released their test set annotations, meaning the test data doesn't come with labels. Thus, this data is not included in the dataset.

    The Roboflow team has worked extensively with COCO and provides additional resources for getting started with this dataset.

  6. coco formatted infant pose estimates for: Computer vision to automatically...

    • figshare.com
    txt
    Updated Nov 6, 2024
    Cite
    Melanie Segado; Claire Chambers; Nidhi Seethapathi; Rachit Saluja; Michelle J. Johnson; Konrad Paul Kording (2024). coco formatted infant pose estimates for: Computer vision to automatically assess infant neuromotor risk [Dataset]. http://doi.org/10.6084/m9.figshare.25316500.v1
    Explore at:
    Available download formats: txt
    Dataset updated
    Nov 6, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Melanie Segado; Claire Chambers; Nidhi Seethapathi; Rachit Saluja; Michelle J. Johnson; Konrad Paul Kording
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COCO formatted 2D skeletal keypoints for YouTube and Clinical datasets from: Computer vision to automatically assess infant neuromotor risk (Chambers et al. 2020), extracted with ViTPose-H implemented in MMPose. For the original dataset and data see: https://figshare.com/s/10034c230ad9b2b2a6a4

    Includes:

    • JSON files with bounding boxes and 2D keypoints/confidences
    • Video metadata (fps, original dimensions)

    Data descriptions:

    YouTube Dataset: 94 infants, 19 excluded. 19 annotations were removed for meeting one or more of the following exclusion criteria:

    • 8 partially overlapping twins
    • 4 NICU and/or hospital settings
    • 1 in water
    • 1 on a rocker
    • 2 face occluded by caregiver and/or toy
    • 3 low contrast/very poor video quality

    Clinical Dataset: 19 infants (31 videos total). Additional data available in the original figshare.
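
    Because the files are plain COCO-style JSON, they can be read without special tooling. A hedged sketch (the filename is a placeholder; actual names come from the figshare archive):

    import json

    with open("infant_keypoints.json") as f:  # placeholder filename
        data = json.load(f)

    # COCO-style keypoints are flat [x, y, confidence] triplets per joint.
    for ann in data["annotations"][:3]:
        kps = ann["keypoints"]
        joints = [(kps[i], kps[i + 1], kps[i + 2]) for i in range(0, len(kps), 3)]
        print(ann["image_id"], ann.get("bbox"), joints[:2])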

  7. Microsoft Coco Pose Detection Dataset

    • universe.roboflow.com
    zip
    Updated Jul 31, 2025
    Cite
    Microsoft (2025). Microsoft Coco Pose Detection Dataset [Dataset]. https://universe.roboflow.com/microsoft/coco-pose-detection/model/3
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 31, 2025
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    Variables measured
    Objects
    Description

    Microsoft COCO Pose Detection

    ## Overview
    
    Microsoft COCO Pose Detection is a dataset for computer vision tasks - it contains Objects annotations for 5,105 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
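    
    A minimal download sketch with the roboflow package, using the workspace, project, and version slugs from the URL above (the API key is yours to supply; the export format is an assumption):
    
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")  # your Roboflow API key
    project = rf.workspace("microsoft").project("coco-pose-detection")
    dataset = project.version(3).download("coco")  # assumed export format
    print(dataset.location)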
    
  8. COCO 2017 Keypoints

    • kaggle.com
    zip
    Updated Nov 22, 2023
    Cite
    Muhammad Asaduddin (2023). COCO 2017 Keypoints [Dataset]. https://www.kaggle.com/asad11914/coco-2017-keypoints
    Explore at:
    Available download formats: zip (9604631984 bytes)
    Dataset updated
    Nov 22, 2023
    Authors
    Muhammad Asaduddin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is a keypoint-only subset of the COCO 2017 dataset. The original COCO dataset is available from the official COCO website (cocodataset.org).

    This dataset contains three folders: annotations, val2017, and train2017. The annotations folder holds two JSON files, one for val and one for train; each contains various information, such as image ids, bounding boxes, and keypoint locations. val2017 and train2017 contain the filtered images: only images with num_keypoints > 0 according to the annotation file are included.
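
    The stated filter (num_keypoints > 0) is easy to reproduce against the full COCO 2017 files. A sketch, assuming the standard annotation paths:

    from pycocotools.coco import COCO

    coco = COCO("annotations/person_keypoints_train2017.json")  # assumed path

    # Keep image ids whose annotations carry at least one labeled keypoint,
    # mirroring the num_keypoints > 0 rule this subset describes.
    keep = {
        ann["image_id"]
        for ann in coco.loadAnns(coco.getAnnIds())
        if ann["num_keypoints"] > 0
    }
    print(len(keep), "images contain at least one annotated keypoint")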

  9. COCO

    • huggingface.co
    • datasets.activeloop.ai
    Updated Feb 6, 2023
    + more versions
    Cite
    HuggingFaceM4 (2023). COCO [Dataset]. https://huggingface.co/datasets/HuggingFaceM4/COCO
    Explore at:
    Dataset updated
    Feb 6, 2023
    Dataset authored and provided by
    HuggingFaceM4
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MS COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.
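
    A hedged loading sketch with the datasets library; the split name and any required configuration are assumptions, so check the dataset page if load_dataset asks for a config (script-based datasets may also need trust_remote_code=True on recent versions):

    from datasets import load_dataset

    # Loads the Hugging Face mirror; split/config names are assumptions.
    ds = load_dataset("HuggingFaceM4/COCO", split="train", trust_remote_code=True)
    print(ds[0].keys())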

  10. ActiveHuman Part 2

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Nov 14, 2023
    Cite
    Charalampos Georgiadis (2023). ActiveHuman Part 2 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8361113
    Explore at:
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Aristotle University of Thessaloniki (AUTh)
    Authors
    Charalampos Georgiadis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is Part 2/2 of the ActiveHuman dataset! Part 1 can be found here.

    Dataset Description

    ActiveHuman was generated using Unity's Perception package. It consists of 175,428 RGB images and their semantic segmentation counterparts taken at different environments, lighting conditions, camera distances, and angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1m-4m), and 36 camera angles (0-360 at 10-degree intervals). The dataset does not include images at every single combination of available camera distances and angles, since for some values the camera would collide with another object or go outside the confines of an environment. As a result, some combinations of camera distances and angles do not exist in the dataset.

    Alongside each image, 2D Bounding Box, 3D Bounding Box, and Keypoint ground truth annotations are also generated via the use of Labelers and are stored as a JSON-based dataset. These Labelers are scripts responsible for capturing ground truth annotations for each captured image or frame. Keypoint annotations follow the COCO format defined by the COCO keypoint annotation template offered in the Perception package.

    Folder configuration

    The dataset consists of 3 folders:

    • JSON Data: Contains all the generated JSON files.
    • RGB Images: Contains the generated RGB images.
    • Semantic Segmentation Images: Contains the generated semantic segmentation images.

    Essential Terminology

    • Annotation: Recorded data describing a single capture.
    • Capture: One completed rendering process of a Unity sensor which stored the rendered result to data files (e.g., PNG, JPG).
    • Ego: Object or person on which a collection of sensors is attached (e.g., if a drone has a camera attached to it, the drone is the ego and the camera is the sensor).
    • Ego coordinate system: Coordinates with respect to the ego.
    • Global coordinate system: Coordinates with respect to the global origin in Unity.
    • Sensor: Device that captures the dataset (in this instance the sensor is a camera).
    • Sensor coordinate system: Coordinates with respect to the sensor.
    • Sequence: Time-ordered series of captures. This is very useful for video capture, where the time-order relationship of two captures is vital.
    • UUID: Universally Unique Identifier. A unique hexadecimal identifier that can represent an individual instance of a capture, ego, sensor, annotation, labeled object or keypoint, or keypoint template.

    Dataset Data

    The dataset includes 4 types of JSON annotation files:

    annotation_definitions.json: Contains annotation definitions for all of the active Labelers of the simulation, stored in an array. Each entry is a collection of key-value pairs which describe a particular type of annotation and how its data should be mapped back to labels or objects in the scene. Each entry contains the following key-value pairs:

    • id: Integer identifier of the annotation's definition.
    • name: Annotation name (e.g., keypoints, bounding box, bounding box 3D, semantic segmentation).
    • description: Description of the annotation's specifications.
    • format: Format of the file containing the annotation specifications (e.g., json, PNG).
    • spec: Format-specific specifications for the annotation values generated by each Labeler.

    Most Labelers generate different annotation specifications in the spec key-value pair:

    BoundingBox2DLabeler/BoundingBox3DLabeler:

    • label_id: Integer identifier of a label.
    • label_name: String identifier of a label.

    KeypointLabeler:

    • template_id: Keypoint template UUID.
    • template_name: Name of the keypoint template.
    • key_points: Array containing all the joints defined by the keypoint template. Each element includes:
      • label: Joint label.
      • index: Joint index.
      • color: RGBA values of the keypoint.
      • color_code: Hex color code of the keypoint.
    • skeleton: Array containing all the skeleton connections defined by the keypoint template. Each skeleton connection defines a connection between two different joints. Each element includes:
      • label1: Label of the first joint.
      • label2: Label of the second joint.
      • joint1: Index of the first joint.
      • joint2: Index of the second joint.
      • color: RGBA values of the connection.
      • color_code: Hex color code of the connection.

    SemanticSegmentationLabeler:

    • label_name: String identifier of a label.
    • pixel_value: RGBA values of the label.
    • color_code: Hex color code of the label.

    captures_xyz.json: Each of these files contains an array of ground truth annotations generated by each active Labeler for each capture separately, as well as extra metadata describing the state of each active sensor present in the scene. Each array entry contains the following key-value pairs:

    • id: UUID of the capture.
    • sequence_id: UUID of the sequence.
    • step: Index of the capture within a sequence.
    • timestamp: Timestamp (in ms) since the beginning of a sequence.
    • sensor: Properties of the sensor, a collection with the following key-value pairs:
      • sensor_id: Sensor UUID.
      • ego_id: Ego UUID.
      • modality: Modality of the sensor (e.g., camera, radar).
      • translation: 3D vector describing the sensor's position (in meters) with respect to the global coordinate system.
      • rotation: Quaternion describing the sensor's orientation with respect to the ego coordinate system.
      • camera_intrinsic: Matrix containing (if it exists) the camera's intrinsic calibration.
      • projection: Projection type used by the camera (e.g., orthographic, perspective).
    • ego: Attributes of the ego, a collection with the following key-value pairs:
      • ego_id: Ego UUID.
      • translation: 3D vector describing the ego's position (in meters) with respect to the global coordinate system.
      • rotation: Quaternion containing the ego's orientation.
      • velocity: 3D vector containing the ego's velocity (in meters per second).
      • acceleration: 3D vector containing the ego's acceleration.
    • format: Format of the file captured by the sensor (e.g., PNG, JPG).
    • annotations: Key-value pair collections, one for each active Labeler:
      • id: Annotation UUID.
      • annotation_definition: Integer identifier of the annotation's definition.
      • filename: Name of the file generated by the Labeler. This entry is only present for Labelers that generate an image.
      • values: List of key-value pairs containing annotation data for the current Labeler.

    Each Labeler generates different annotation specifications in the values key-value pair:

    BoundingBox2DLabeler:

    • label_id: Integer identifier of a label.
    • label_name: String identifier of a label.
    • instance_id: UUID of one instance of an object. Objects with the same label that are visible in the same capture have different instance_id values.
    • x: Position of the 2D bounding box on the X axis.
    • y: Position of the 2D bounding box on the Y axis.
    • width: Width of the 2D bounding box.
    • height: Height of the 2D bounding box.

    BoundingBox3DLabeler:

    • label_id: Integer identifier of a label.
    • label_name: String identifier of a label.
    • instance_id: UUID of one instance of an object. Objects with the same label that are visible in the same capture have different instance_id values.
    • translation: 3D vector containing the location of the center of the 3D bounding box with respect to the sensor coordinate system (in meters).
    • size: 3D vector containing the size of the 3D bounding box (in meters).
    • rotation: Quaternion containing the orientation of the 3D bounding box.
    • velocity: 3D vector containing the velocity of the 3D bounding box (in meters per second).
    • acceleration: 3D vector containing the acceleration of the 3D bounding box.

    KeypointLabeler:

    • label_id: Integer identifier of a label.
    • instance_id: UUID of one instance of a joint. Keypoints with the same joint label that are visible in the same capture have different instance_id values.
    • template_id: UUID of the keypoint template.
    • pose: Pose label for that particular capture.
    • keypoints: Array containing the properties of each keypoint. Each keypoint defined in the keypoint template file is one element of the array, with the following contents:
      • index: Index of the keypoint in the keypoint template file.
      • x: Pixel coordinate of the keypoint on the X axis.
      • y: Pixel coordinate of the keypoint on the Y axis.
      • state: State of the keypoint.

    The SemanticSegmentationLabeler does not contain a values list.

    egos.json: Contains collections of key-value pairs for each ego. These include:

    • id: UUID of the ego.
    • description: Description of the ego.

    sensors.json: Contains collections of key-value pairs for all sensors of the simulation. These include:

    • id: UUID of the sensor.
    • ego_id: UUID of the ego on which the sensor is attached.
    • modality: Modality of the sensor (e.g., camera, radar, sonar).
    • description: Description of the sensor.
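
    The schema above is plain JSON, so it can be walked directly. A minimal sketch of pulling keypoint entries, assuming the captures files sit in the "JSON Data" folder and follow the array layout described here (the glob pattern and top-level shape are assumptions):

    import glob
    import json

    # Walk every captures_*.json file and print keypoint entries, following
    # the KeypointLabeler schema described above. Paths are assumptions.
    for path in glob.glob("JSON Data/captures_*.json"):
        with open(path) as f:
            data = json.load(f)
        captures = data if isinstance(data, list) else data.get("captures", [])
        for capture in captures:
            for ann in capture.get("annotations", []):
                for value in ann.get("values", []):
                    for kp in value.get("keypoints", []):  # KeypointLabeler only
                        print(kp["index"], kp["x"], kp["y"], kp["state"])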

    Image names

    The RGB and semantic segmentation images share the same naming convention; the semantic segmentation images additionally carry the string Semantic_ at the beginning of their filenames. Each RGB image is named "e_h_l_d_r.jpg", where:

    • e denotes the id of the environment.
    • h denotes the id of the person.
    • l denotes the id of the lighting condition.
    • d denotes the camera distance at which the image was captured.
    • r denotes the camera angle at which the image was captured.
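
    A small sketch of unpacking that convention (the sample filename is invented for illustration; requires Python 3.9+ for removeprefix/removesuffix):

    # Parse the e_h_l_d_r naming convention described above.
    def parse_activehuman_name(filename: str) -> dict:
        stem = filename.removeprefix("Semantic_").removesuffix(".jpg")
        e, h, l, d, r = stem.split("_")
        return {
            "environment": e,
            "human": h,
            "lighting": l,
            "camera_distance": d,
            "camera_angle": r,
        }

    print(parse_activehuman_name("3_12_2_2_180.jpg"))  # invented example name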

  11. Coco Kp Dataset

    • universe.roboflow.com
    zip
    Updated Sep 3, 2024
    Cite
    Tapway (2024). Coco Kp Dataset [Dataset]. https://universe.roboflow.com/tapway-fyofi/coco-kp
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 3, 2024
    Dataset authored and provided by
    Tapway
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    CuRQ
    Description

    Coco Kp

    ## Overview
    
    Coco Kp is a dataset for computer vision tasks - it contains  CuRQ annotations for 319 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  12. COCO 2017

    • kaggle.com
    zip
    Updated Nov 14, 2024
    Cite
    Nikdintel (2024). COCO 2017 [Dataset]. https://www.kaggle.com/datasets/snikhilrao/coco-2017
    Explore at:
    Available download formats: zip (26884588931 bytes)
    Dataset updated
    Nov 14, 2024
    Authors
    Nikdintel
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    📌 What's Included:

    • Training Set: 118K images with annotations for detection, segmentation, and keypoints.
    • Validation Set: 5K images with full annotations for validation.
    • Testing Set: Images are divided into two splits—dev and challenge—replacing the four splits (dev, standard, reserve, challenge) used in previous years.
    • Stuff Annotations: Available for 40K images in the training set and 5K validation images, enabling semantic segmentation research.
    • Unlabeled Data: A set of 120K images with no annotations, mirroring the class distribution of the labeled data. This is ideal for exploring semi-supervised learning techniques.

    🔍 Key Changes in COCO 2017:

    • The train/val split was updated based on community feedback, now featuring 118K/5K images instead of the previous 83K/41K split.
    • While the annotations for detection and keypoints are consistent with previous years, additional stuff annotations were introduced in 2017.
    • Unlabeled data is now available for semi-supervised learning tasks, opening new avenues for experimentation.

    📂 Dataset Structure:

    • train2017: Images and annotations
    • val2017: Images and annotations
    • test2017: Images (no annotations provided)
    • unlabeled2017: Unlabeled images

    This dataset can be used for a variety of computer vision tasks, including object detection, instance segmentation, keypoint detection, semantic segmentation, and image captioning. Whether you're working on supervised or semi-supervised learning, this resource is designed to meet your needs.

  13. From_coco Dataset

    • universe.roboflow.com
    zip
    Updated Sep 2, 2025
    Cite
    CV (2025). From_coco Dataset [Dataset]. https://universe.roboflow.com/cv-7ypnf/from_coco-cmj7b
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 2, 2025
    Dataset authored and provided by
    CV
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Armor WTdO
    Description

    From_coco

    ## Overview
    
    From_coco is a dataset for computer vision tasks - it contains Armor WTdO annotations for 9,293 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  14. ActiveHuman Part 1

    • data.niaid.nih.gov
    Updated Nov 14, 2023
    + more versions
    Cite
    Charalampos Georgiadis (2023). ActiveHuman Part 1 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8359765
    Explore at:
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Aristotle University of Thessaloniki (AUTh)
    Authors
    Charalampos Georgiadis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is Part 1/2 of the ActiveHuman dataset! Part 2 can be found here.

    The dataset description, folder configuration, essential terminology, JSON annotation schema, and image naming convention are identical to those given for ActiveHuman Part 2 (entry 10 above).

  15. COCO

    • scidm.nchc.org.tw
    Updated Oct 10, 2020
    + more versions
    Cite
    (2020). COCO [Dataset]. https://scidm.nchc.org.tw/dataset/coco
    Explore at:
    Dataset updated
    Oct 10, 2020
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COCO is a large-scale object detection, segmentation, and captioning dataset (http://cocodataset.org). COCO has several features:

    • Object segmentation
    • Recognition in context
    • Superpixel stuff segmentation
    • 330K images (>200K labeled)
    • 1.5 million object instances
    • 80 object categories
    • 91 stuff categories
    • 5 captions per image
    • 250,000 people with keypoints

  16. Mini Human Pose Dataset

    • kaggle.com
    zip
    Updated Apr 15, 2024
    Cite
    Legosy7 (2024). Mini Human Pose Dataset [Dataset]. https://www.kaggle.com/datasets/legosy7/mini-human-pose-dataset/code
    Explore at:
    Available download formats: zip (463579720 bytes)
    Dataset updated
    Apr 15, 2024
    Authors
    Legosy7
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This dataset is extracted from the 2017 COCO keypoints dataset. I filtered it down to images with only one person in the frame for my specific project. Because it is derived from the COCO keypoint dataset, its annotation format is the same as COCO's. Feel free to use it 😁.
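
    The author's selection rule (exactly one person per frame) can be reproduced against the original COCO keypoint annotations. A sketch under assumed paths:

    from pycocotools.coco import COCO

    coco = COCO("annotations/person_keypoints_train2017.json")  # assumed path

    # Keep images with exactly one person annotation, matching the
    # single-person filter this dataset describes.
    single_person = [
        img_id for img_id in coco.getImgIds()
        if len(coco.getAnnIds(imgIds=img_id)) == 1
    ]
    print(len(single_person), "images contain exactly one person")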

  17. Hand Keypoint Dataset 26K

    • kaggle.com
    zip
    Updated May 26, 2024
    Cite
    RION DSILVA (2024). Hand Keypoint Dataset 26K [Dataset]. https://www.kaggle.com/datasets/riondsilva21/hand-keypoint-dataset-26k/code
    Explore at:
    Available download formats: zip (393611785 bytes)
    Dataset updated
    May 26, 2024
    Authors
    RION DSILVA
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Hand Keypoint Dataset

    Overview

    This dataset contains 26,768 images of hands annotated with keypoints, making it suitable for training models for hand detection and keypoint estimation. The annotations were generated using the MediaPipe library, ensuring high accuracy and consistency. The dataset is compatible with both COCO and YOLOv8 formats.

    Dataset Structure

    The dataset is organized as follows:

    hand_keypoint_dataset/
    ├── images/
    │   ├── train/
    │   └── val/
    ├── coco_annotation/
    │   ├── train/
    │   │   └── _annotations.coco.json
    │   └── val/
    │       └── _annotations.coco.json
    ├── labels/
    │   ├── train/
    │   └── val/
    └── README.md

    • images: Contains all the images, divided into training and validation.
    • coco_annotation: Contains the annotations for the images in COCO format.
    • labels: Contains the annotations for the images in YOLO format.

    Keypoints

    The dataset includes keypoints for hand detection. The keypoints are annotated as follows:

    1. Wrist
    2. Thumb (4 points)
    3. Index finger (4 points)
    4. Middle finger (4 points)
    5. Ring finger (4 points)
    6. Little finger (4 points)

    Each hand has a total of 21 keypoints.

    Usage

    COCO Format

    To use the dataset with COCO-compatible models, you can directly load the JSON files using COCO APIs available in various deep learning frameworks.
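
    For example, a minimal pycocotools sketch following the directory layout shown above (paths assumed):

    from pycocotools.coco import COCO

    coco = COCO("hand_keypoint_dataset/coco_annotation/train/_annotations.coco.json")

    # Keypoint categories usually list their joint names; here we expect
    # the 21 hand keypoints enumerated above.
    cat = coco.loadCats(coco.getCatIds())[0]
    print(cat.get("keypoints"))
    print(len(coco.getImgIds()), "training images")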

    YOLOv8 Format

    For YOLOv8, ensure you have the required environment set up. You can use the provided text files to train YOLOv8 models by specifying the dataset path in your configuration file.
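
    A hedged training sketch with the ultralytics package; "hand-keypoints.yaml" is a placeholder config you would write yourself, pointing at the images/ and labels/ folders and declaring the 21-keypoint shape:

    from ultralytics import YOLO

    model = YOLO("yolov8n-pose.pt")  # pretrained pose checkpoint
    model.train(
        data="hand-keypoints.yaml",  # placeholder config (e.g. kpt_shape: [21, 3])
        epochs=50,
        imgsz=640,
    )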

    Credits

    We would like to thank the following sources for providing the images used in this dataset:

    https://sites.google.com/view/11khands https://www.kaggle.com/datasets/ritikagiridhar/2000-hand-gestures https://www.kaggle.com/datasets/imsparsh/gesture-recognition

    The images were collected and used under the respective licenses provided by each platform.

    For any questions or issues, please contact its.riondsilva@gmail.com

    Thank you for using the Hand Keypoint Dataset!

  18. Parcel3D - A Synthetic Dataset of Damaged and Intact Parcel Images with 2D...

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Jul 13, 2023
    Cite
    Alexander Naumann; Felix Hertlein; Laura Dörr; Kai Furmans (2023). Parcel3D - A Synthetic Dataset of Damaged and Intact Parcel Images with 2D and 3D Annotations [Dataset]. http://doi.org/10.5281/zenodo.8032204
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 13, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alexander Naumann; Felix Hertlein; Laura Dörr; Kai Furmans
    Description

    Synthetic dataset of over 13,000 images of damaged and intact parcels with full 2D and 3D annotations in the COCO format. For details see our paper and for visual samples our project page.


    Relevant computer vision tasks:

    • bounding box detection
    • classification
    • instance segmentation
    • keypoint estimation
    • 3D bounding box estimation
    • 3D voxel reconstruction
    • 3D reconstruction

    The dataset is for academic research use only, since it uses resources with restrictive licenses.
    For a detailed description of how the resources are used, we refer to our paper and project page.

    Licenses of the resources in detail:

    You can use our textureless models (i.e. the obj files) of damaged parcels under CC BY 4.0 (note that this does not apply to the textures).

    If you use this resource for scientific research, please consider citing

    @inproceedings{naumannParcel3DShapeReconstruction2023,
      author  = {Naumann, Alexander and Hertlein, Felix and D\"orr, Laura and Furmans, Kai},
      title   = {Parcel3D: Shape Reconstruction From Single RGB Images for Applications in Transportation Logistics},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
      month   = {June},
      year   = {2023},
      pages   = {4402-4412}
    }
  19. roboflow-garlic

    • huggingface.co
    Updated Dec 17, 2024
    + more versions
    Cite
    Thomas Lips (2024). roboflow-garlic [Dataset]. https://huggingface.co/datasets/tlpss/roboflow-garlic
    Explore at:
    Available download formats: Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Dec 17, 2024
    Authors
    Thomas Lips
    Description

    Garlic Keypoint Detection dataset

    This dataset contains 1000 images of a single garlic clove in a presumably industrial setting. The annotations are COCO-formatted and are composed of a bounding box and 2 keypoints: head and tail. The dataset was taken from https://universe.roboflow.com/gesture-recognition-dsn2n/garlic_keypoint/dataset/1. Refer to the original repo for licensing questions. The annotation JSON files were slightly modified (formatting, image base directory, ..)… See the full description on the dataset page: https://huggingface.co/datasets/tlpss/roboflow-garlic.

  20. Ear210_Dataset_coco

    • kaggle.com
    zip
    Updated Jun 2, 2023
    Cite
    HongCheng (2023). Ear210_Dataset_coco [Dataset]. https://www.kaggle.com/datasets/chg0901/ear210-dataset-coco/discussion
    Explore at:
    Available download formats: zip (712272354 bytes)
    Dataset updated
    Jun 2, 2023
    Authors
    HongCheng
    Description

    Ear acupoint keypoint detection dataset in MS COCO format, split into training and test sets, with a sample config file written for OpenMMLab MMPose and MMDetection. Annotators: Zhang Zihao, Tian Wenbo.

    Download link (Baidu Pan): https://pan.baidu.com/s/1swTLpArj7XEDXW4d0lo7Mg Extraction code: 741p

    I am sharing this dataset for the OpenMMLab 2nd AI Camp.
