44 datasets found
  1. Yolo To Coco Json Dataset

    • universe.roboflow.com
    zip
    Updated Feb 24, 2025
    Cite
    cocoforrcnn (2025). Yolo To Coco Json Dataset [Dataset]. https://universe.roboflow.com/cocoforrcnn/yolo-to-coco-json-7ot5m/model/2
    Available download formats: zip
    Dataset updated
    Feb 24, 2025
    Dataset authored and provided by
    cocoforrcnn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Objects Bounding Boxes
    Description

    Yolo To Coco Json

    ## Overview
    
    Yolo To Coco Json is a dataset for object detection tasks - it contains Objects annotations for 1,954 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
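
    A minimal, hedged sketch of the Roboflow download path mentioned above, using the Roboflow Python SDK (pip install roboflow). The workspace, project, and version slugs are read off this dataset's URL; the API key is a placeholder, and "coco" is one of the export formats the platform offers.

    ```python
    # Sketch only: download this dataset in COCO JSON format via the Roboflow SDK.
    # Slugs taken from https://universe.roboflow.com/cocoforrcnn/yolo-to-coco-json-7ot5m/model/2
    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_API_KEY")                        # placeholder key
    project = rf.workspace("cocoforrcnn").project("yolo-to-coco-json-7ot5m")
    dataset = project.version(2).download("coco")                # writes images + COCO JSON locally
    print(dataset.location)
    ```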
    
      ## License
    
      This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  2. Sartorius COCO Format Dataset

    • kaggle.com
    zip
    Updated Oct 28, 2021
    Cite
    Ari (2021). Sartorius COCO Format Dataset [Dataset]. https://www.kaggle.com/vexxingbanana/sartorius-coco-format-dataset
    Available download formats: zip (9798602 bytes)
    Dataset updated
    Oct 28, 2021
    Authors
    Ari
    Description

    Dataset

    This dataset was created by Ari


  3. Databases in MS COCO (json) format

    • figshare.com
    • springernature.figshare.com
    zip
    Updated Nov 20, 2020
    Cite
    Robert Klopfleisch; Andreas Maier; Marc Aubreville; Christof Bertram; Christian Marzahl (2020). Databases in MS COCO (json) format [Dataset]. http://doi.org/10.6084/m9.figshare.12805244.v1
    Available download formats: zip
    Dataset updated
    Nov 20, 2020
    Dataset provided by
    figshare
    Authors
    Robert Klopfleisch; Andreas Maier; Marc Aubreville; Christof Bertram; Christian Marzahl
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Databases in MS COCO (json) format

  4. coco json file of mtsd train and val

    • kaggle.com
    zip
    Updated Jul 17, 2024
    Cite
    GREAT23U5 (2024). coco json file of mtsd train and val [Dataset]. https://www.kaggle.com/datasets/great23u5/coco-json-file-of-mtsd-train-and-val/code
    Available download formats: zip (16813160 bytes)
    Dataset updated
    Jul 17, 2024
    Authors
    GREAT23U5
    Description

    Dataset

    This dataset was created by GREAT23U5


  5. COCO-JSON Annotated Wind Turbine Surface Damage

    • kaggle.com
    zip
    Updated May 11, 2022
    Cite
    Ajifoster3 (2022). COCO-JSON Annotated Wind Turbine Surface Damage [Dataset]. https://www.kaggle.com/datasets/ajifoster3/cocojson-annotated-wind-turbine-surface-damage/data
    Available download formats: zip (300522633 bytes)
    Dataset updated
    May 11, 2022
    Authors
    Ajifoster3
    Description

    Dataset

    This dataset was created by Ajifoster3


  6. COCO dataset and neural network weights for micro-FTIR particle detection on filters

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 13, 2024
    Cite
    Schowing, Thibault (2024). COCO dataset and neural network weights for micro-FTIR particle detection on filters. [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10839526
    Dataset updated
    Aug 13, 2024
    Dataset authored and provided by
    Schowing, Thibault
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The IMPTOX project has received funding from the EU's H2020 framework programme for research and innovation under grant agreement n. 965173. Imptox is part of the European MNP cluster on human health.

    More information about the project here.

    Description: This repository includes the trained weights and a custom COCO-formatted dataset used for developing and testing a Faster R-CNN R_50_FPN_3x object detector, specifically designed to identify particles in micro-FTIR filter images.

    Contents:

    Weights File (neuralNetWeights_V3.pth):

    Format: .pth

    Description: This file contains the trained weights for a Faster R-CNN model with a ResNet-50 backbone and a Feature Pyramid Network (FPN), trained for 3x schedule. These weights are specifically tuned for detecting particles in micro-FTIR filter images.

    Custom COCO Dataset (uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip):

    Format: .zip

    Description: This zip archive contains a custom COCO-formatted dataset, including JPEG images and their corresponding annotation file. The dataset consists of images of micro-FTIR filters with annotated particles.

    Contents:

    Images: JPEG format images of micro-FTIR filters.

    Annotations: A JSON file in COCO format providing detailed annotations of the particles in the images.

    Management: The dataset can be managed and manipulated using the Pycocotools library, facilitating easy integration with existing COCO tools and workflows.

    Applications: The provided weights and dataset are intended for researchers and practitioners in the field of microscopy and particle detection. The dataset and model can be used for further training, validation, and fine-tuning of object detection models in similar domains.

    Usage Notes:

    The neuralNetWeights_V3.pth file should be loaded into a PyTorch model compatible with the Faster R-CNN architecture, such as Detectron2.

    The contents of uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip should be extracted and can be used with any COCO-compatible object detection framework for training and evaluation purposes.

    Code can be found on the related Github repository.
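
    As a hedged sketch (not the authors' code), the weights could be loaded into a Detectron2 predictor roughly as follows; the faster_rcnn_R_50_FPN_3x config matches the architecture described above, while the single "particle" class count and the score threshold are assumptions.

    ```python
    # Sketch: load neuralNetWeights_V3.pth into a Detectron2 Faster R-CNN R_50_FPN_3x predictor.
    from detectron2 import model_zoo
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1           # assumption: one "particle" class
    cfg.MODEL.WEIGHTS = "neuralNetWeights_V3.pth"
    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5   # arbitrary confidence threshold

    predictor = DefaultPredictor(cfg)
    # outputs = predictor(cv2.imread("filter_image.jpg"))  # Instances with boxes and scores
    ```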

  7. Esefjorden Marine Vegetation Segmentation Dataset (EMVSD)

    • figshare.com
    bin
    Updated Dec 9, 2024
    Cite
    Bjørn Christian Weinbach (2024). Esefjorden Marine Vegetation Segmentation Dataset (EMVSD) [Dataset]. http://doi.org/10.6084/m9.figshare.24072606.v4
    Available download formats: bin
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    figshare
    Authors
    Bjørn Christian Weinbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Esefjorden Marine Vegetation Segmentation Dataset (EMVSD): Comprising 17,000 meticulously labeled images, this dataset is suited for instance segmentation tasks and represents a significant leap forward for marine research in the region. The images are stored in YOLO and COCO formats, ensuring compatibility with widely recognized and adopted object detection frameworks. Our decision to make this dataset publicly accessible underscores our commitment to collaborative research and the advancement of the broader scientific community.

    Dataset Structure:
    - Images:
      - Organized into three subsets: train, val, and test, located under the images/ directory.
      - Each subset contains high-resolution images optimized for object detection and segmentation tasks.
    - Annotations:
      - Available in YOLO txt and COCO formats for compatibility with major object detection frameworks.
      - Organized into three subsets: train, val, and test, located under the labels/ directory.
      - Additional metadata:
        - counts.txt: Summary of label distributions.
        - Cache files (train.cache, val.cache, test.cache) for efficient dataset loading.
    - Metadata:
      - classes.txt: Definitions for all annotated classes in the dataset.
      - Detailed COCO-format annotations in:
        - train_annotations.json
        - val_annotations.json
        - test_annotations.json
    - Configuration File:
      - EMVSD.yaml: Configuration file for seamless integration with machine learning libraries.

    Example Directory Structure:

    EMVSD/
    ├── images/
    │   ├── train/
    │   ├── val/
    │   └── test/
    ├── labels/
    │   ├── train/
    │   ├── val/
    │   ├── test/
    │   ├── counts.txt
    │   ├── train.cache
    │   ├── val.cache
    │   └── test.cache
    ├── classes.txt
    ├── train_annotations.json
    ├── val_annotations.json
    ├── test_annotations.json
    └── EMVSD.yaml

  8. MOBDrone: a large-scale drone-view dataset for man overboard detection

    • zenodo.org
    • data.niaid.nih.gov
    json, pdf, zip
    Updated Jul 17, 2024
    Cite
    Donato Cafarelli; Luca Ciampi; Lucia Vadicamo; Claudio Gennaro; Andrea Berton; Marco Paterni; Chiara Benvenuti; Mirko Passera; Fabrizio Falchi (2024). MOBDrone: a large-scale drone-view dataset for man overboard detection [Dataset]. http://doi.org/10.5281/zenodo.5996890
    Available download formats: json, zip, pdf
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Donato Cafarelli; Luca Ciampi; Lucia Vadicamo; Claudio Gennaro; Andrea Berton; Marco Paterni; Chiara Benvenuti; Mirko Passera; Fabrizio Falchi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above the mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories --- person, boat, lifebuoy, surfboard, wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.

    In this repository, we provide:

    • 66 Full HD video clips (total size: 5.5 GB)

    • 126,170 images extracted from the videos at a rate of 30 FPS (total size: 243 GB)

    • 3 annotation files for the extracted images that follow the MS COCO data format (for more info see https://cocodataset.org/#format-data):

      • annotations_5_custom_classes.json: this file contains annotations concerning all five categories; please note that class ids do not correspond with the ones provided by the MS COCO standard since we account for two new classes not previously considered in the MS COCO dataset --- lifebuoy and wood

      • annotations_3_coco_classes.json: this file contains annotations concerning the three classes also accounted by the MS COCO dataset --- person, boat, surfboard. Class ids correspond with the ones provided by the MS COCO standard.

      • annotations_person_coco_classes.json: this file contains annotations concerning only the 'person' class. Class id corresponds to the one provided by the MS COCO standard.

    The MOBDrone dataset is intended as a test data benchmark. However, for researchers interested in using our data also for training purposes, we provide training and test splits (a short loading sketch follows the list):

    • Test set: All the images whose filename starts with "DJI_0804" (total: 37,604 images)
    • Training set: All the images whose filename starts with "DJI_0915" (total: 88,568 images)
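
    Under the filename convention above, the provided COCO files can be split with pycocotools; a minimal sketch (file and prefix names as listed above):

    ```python
    # Sketch: select test/training image ids by filename prefix from one annotation file.
    from pycocotools.coco import COCO

    coco = COCO("annotations_5_custom_classes.json")
    test_ids = [i for i in coco.getImgIds()
                if coco.loadImgs(i)[0]["file_name"].startswith("DJI_0804")]
    train_ids = [i for i in coco.getImgIds()
                 if coco.loadImgs(i)[0]["file_name"].startswith("DJI_0915")]

    test_anns = coco.loadAnns(coco.getAnnIds(imgIds=test_ids))   # boxes for the test split
    print(len(test_ids), len(train_ids), len(test_anns))
    ```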

    More details about data generation and the evaluation protocol can be found at our MOBDrone paper: https://arxiv.org/abs/2203.07973
    The code to reproduce our results is available at this GitHub Repository: https://github.com/ciampluca/MOBDrone_eval
    See also http://aimh.isti.cnr.it/dataset/MOBDrone

    Citing the MOBDrone

    The MOBDrone is released under a Creative Commons Attribution license, so please cite the MOBDrone if it is used in your work in any form.
    Published academic papers should use the academic paper citation for our MOBDrone paper, in which we evaluated several pre-trained state-of-the-art object detectors focusing on the detection of overboard people:

    @inproceedings{MOBDrone2021,
      title     = {MOBDrone: a Drone Video Dataset for Man OverBoard Rescue},
      author    = {Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
      booktitle = {ICIAP2021: 21st International Conference on Image Analysis and Processing},
      year      = {2021}
    }
    

    and this Zenodo Dataset

    @dataset{donato_cafarelli_2022_5996890,
      author    = {Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
      title     = {{MOBDrone: a large-scale drone-view dataset for man overboard detection}},
      month     = feb,
      year      = 2022,
      publisher = {Zenodo},
      version   = {1.0.0},
      doi       = {10.5281/zenodo.5996890},
      url       = {https://doi.org/10.5281/zenodo.5996890}
    }

    Personal works, such as machine learning projects/blog posts, should provide a URL to the MOBDrone Zenodo page (https://doi.org/10.5281/zenodo.5996890), though a reference to our MOBDrone paper would also be appreciated.

    Contact Information

    If you would like further information about the MOBDrone or if you experience any issues downloading files, please contact us at mobdrone[at]isti.cnr.it

    Acknowledgements

    This work was partially supported by NAUSICAA - "NAUtical Safety by means of Integrated Computer-Assistance Appliances 4.0" project funded by the Tuscany region (CUP D44E20003410009). The data collection was carried out with the collaboration of the Fly&Sense Service of the CNR of Pisa - for the flight operations of remotely piloted aerial systems - and of the Institute of Clinical Physiology (IFC) of the CNR - for the water immersion operations.

  9. Data from: Life beneath the ice: jellyfish and ctenophores from the Ross Sea, Antarctica, with an image-based training set for machine learning

    • zenodo.org
    • data.niaid.nih.gov
    Updated Jul 30, 2021
    Cite
    Gerlien Verhaegen; Emiliano Cimoli; Dhugal J Lindsay (2021). Life beneath the ice: jellyfish and ctenophores from the Ross Sea, Antarctica, with an image-based training set for machine learning [Dataset]. http://doi.org/10.5281/zenodo.5118013
    Dataset updated
    Jul 30, 2021
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Gerlien Verhaegen; Emiliano Cimoli; Dhugal J Lindsay
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Ross Sea, Antarctica
    Description

    This Zenodo dataset contains the Common Objects in Context (COCO) files linked to the following publication:

    Verhaegen, G, Cimoli, E, & Lindsay, D (2021). Life beneath the ice: jellyfish and ctenophores from the Ross Sea, Antarctica, with an image-based training set for machine learning. Biodiversity Data Journal.

    Each COCO zip folder contains an "annotations" folder including a json file and an "images" folder containing the annotated images.

    Details on each COCO zip folder:

    • Beroe_sp_A_images-coco 1.0.zip

    COCO annotations of Beroe sp. A for the following 114 images:

    MCMEC2018_20181116_NIKON_Beroe_sp_A_c_1 to MCMEC2018_20181116_NIKON_Beroe_sp_A_c_16, MCMEC2018_20181125_NIKON_Beroe_sp_A_d_1 to MCMEC2018_20181125_NIKON_Beroe_sp_A_d_57, MCMEC2018_20181127_NIKON_Beroe_sp_A_e_1 to MCMEC2018_20181127_NIKON_Beroe_sp_A_e_2, MCMEC2019_20191116_SONY_Beroe_sp_A_a_1 to MCMEC2019_20191116_SONY_Beroe_sp_A_a_28, and MCMEC2019_20191127_SONY_Beroe_sp_A_f_1 to MCMEC2019_20191127_SONY_Beroe_sp_A_f_12

    • Beroe_sp_B_images-coco 1.0.zip

    COCO annotations of Beroe sp. B for the following 2 images:

    MCMEC2019_20191115_SONY_Beroe_sp_B_a_1 and MCMEC2019_20191115_SONY_Beroe_sp_B_a_2

    • Callianira_cristata_images-coco 1.0.zip

    COCO annotations of Callianira cristata for the following 21 images:

    MCMEC2019_20191120_SONY_Callianira_cristata_b_1 to MCMEC2019_20191120_SONY_Callianira_cristata_b_21

    • Diplulmaris_antarctica_images-coco 1.0.zip

    COCO annotations of Diplulmaris antarctica for the following 83 images:

    MCMEC2019_20191116_SONY_Diplulmaris_antarctica_a_1 to MCMEC2019_20191116_SONY_Diplulmaris_antarctica_a_9, and MCMEC2019_20191201_SONY_Diplulmaris_antarctica_c_1 to MCMEC2019_20191201_SONY_Diplulmaris_antarctica_c_74

    • Koellikerina_maasi_images-coco 1.0.zip

    COCO annotations of Koellikerina maasi for the following 49 images:

    MCMEC2018_20181127_NIKON_Koellikerina_maasi_b_1 to MCMEC2018_20181127_NIKON_Koellikerina_maasi_b_4, MCMEC2018_20181129_NIKON_Koellikerina_maasi_c_1 to MCMEC2018_20181129_NIKON_Koellikerina_maasi_c_29, and MCMEC2019_20191126_SONY_Koellikerina_maasi_a_1 to MCMEC2019_20191126_SONY_Koellikerina_maasi_a_16

    • Leptomedusa_sp_A-coco 1.0.zip

    COCO annotations of Leptomedusa sp. A for Figure 5 (see paper).

    • Leuckartiara_brownei_images-coco 1.0.zip

    COCO annotations of Leuckartiara brownei for the following 48 images:

    MCMEC2018_20181129_NIKON_Leuckartiara_brownei_b_1 to MCMEC2018_20181129_NIKON_Leuckartiara_brownei_b_27, MCMEC2018_20181129_NIKON_Leuckartiara_brownei_c_1 to MCMEC2018_20181129_NIKON_Leuckartiara_brownei_c_6, and MCMEC2019_20191116_SONY_Leuckartiara_brownei_a_1 to MCMEC2019_20191116_SONY_Leuckartiara_brownei_a_15

    • MCMEC2019_20191115_SONY_Mertensiidae_sp_A_a_3-coco 1.0.zip

    COCO annotations of Mertensiidae sp. A for the following video (total of 1847 frames): MCMEC2019_20191115_SONY_Mertensiidae_sp_A_a_3 (https://youtu.be/0W2HHLW71Pw)

    • MCMEC2019_20191116_SONY_Leuckartiara_brownei_a_3-coco 1.0.zip

    COCO annotations of Leuckartiara brownei for the following video (total of 1367 frames): MCMEC2019_20191116_SONY_Leuckartiara_brownei_a_3 (https://youtu.be/dEIbVYlF_TQ)

    • MCMEC2019_20191122_SONY_Callianira_cristata_a_1-coco 1.0.zip

    COCO annotations of Callianira cristata for the following video (total of 2423 frames): MCMEC2019_20191122_SONY_Callianira_cristata_a_1 (https://youtu.be/30g9CvYh5JE)

    • MCMEC2019_20191122_SONY_Leptomedusa_sp_B_a_1-coco 1.0.zip

    COCO annotations of Leptomedusa sp. B for the following video (total of 1164 frames): MCMEC2019_20191122_SONY_Leptomedusa_sp_B_a_1 (https://youtu.be/hrufuPQ7F8U)

    • MCMEC2019_20191126_SONY_Koellikerina_maasi_a_1-coco 1.0.zip

    COCO annotations of Koellikerina maasi for the following video (total of 1643 frames): MCMEC2019_20191126_SONY_Koellikerina_maasi_a_1 (https://youtu.be/QiBPf_HYrQ8)

    • MCMEC2019_20191129_SONY_Mertensiidae_sp_A_b_1-coco 1.0.zip

    COCO annotations of Mertensiidae sp. A for the following video (total of 239 frames): MCMEC2019_20191129_SONY_Mertensiidae_sp_A_b_1 (https://youtu.be/pvXYlQGZIVg)

    • MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_2-coco 1.0.zip

    COCO annotations of Pyrostephos vanhoeffeni for the following video (total of 444 frames): MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_2 (https://youtu.be/2rrQCybEg0Q)

    • MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_3-coco 1.0.zip

    COCO annotations of Pyrostephos vanhoeffeni for the following video (total of 683 frames): MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_3 (https://youtu.be/G9tev_gdUvQ)

    • MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_4-coco 1.0.zip

    COCO annotations of Pyrostephos vanhoeffeni for the following video (total of 1127 frames): MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_4 (https://youtu.be/NfJjKBRh5Hs)

    • MCMEC2019_20191130_SONY_Beroe_sp_A_b_1-coco 1.0.zip

    COCO annotations of Beroe sp. A for the following video (total of 2171 frames): MCMEC2019_20191130_SONY_Beroe_sp_A_b_1 (https://youtu.be/kGBUQ7ZtH9U)

    • MCMEC2019_20191130_SONY_Beroe_sp_A_b_2-coco 1.0.zip

    COCO annotations of Beroe sp. A for the following video (total of 359 frames): MCMEC2019_20191130_SONY_Beroe_sp_A_b_2 (https://youtu.be/Vbl_KEmPNmU)

    • Mertensiidae_sp_A_images-coco 1.0.zip

    COCO annotations of Mertensiidae sp. A for the following 49 images:

    MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_c_1 to MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_c_2, MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_f_1 to MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_f_8, MCMEC2018_20181129_NIKON_Mertensiidae_sp_A_d_1 to MCMEC2018_20181129_NIKON_Mertensiidae_sp_A_d_13, MCMEC2018_20181201_ROV_Mertensiidae_sp_A_e_1 to MCMEC2018_20181201_ROV_Mertensiidae_sp_A_e_15, and MCMEC2019_20191115_SONY_Mertensiidae_sp_A_a_1 to MCMEC2019_20191115_SONY_Mertensiidae_sp_A_a_11

    • Pyrostephos_vanhoeffeni_images-coco 1.0.zip

    COCO annotations of Pyrostephos vanhoeffeni for the following 14 images: MCMEC2019_20191125_SONY_Pyrostephos_vanhoeffeni_a_1 to MCMEC2019_20191125_SONY_Pyrostephos_vanhoeffeni_a_8, MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_1 to MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_6

    • Solmundella_bitentaculata_images-coco 1.0.zip

    COCO annotations of Solmundella bitentaculata for the following 13 images: MCMEC2018_20181127_NIKON_Solmundella_bitentaculata_a_1 to MCMEC2018_20181127_NIKON_Solmundella_bitentaculata_a_13

  10. Cash Counter Dataset

    • universe.roboflow.com
    zip
    Updated Mar 11, 2025
    Cite
    Alex Hyams (2025). Cash Counter Dataset [Dataset]. https://universe.roboflow.com/alex-hyams-cosqx/cash-counter/model/3
    Available download formats: zip
    Dataset updated
    Mar 11, 2025
    Dataset authored and provided by
    Alex Hyams
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Money Bounding Boxes
    Description

    This project combines the Dollar Bill Detection project from Alex Hyams (v13 of the project was exported in COCO JSON format for import to this project) and the Final Counter, or Coin Counter, project from Dawson Mcgee (v6 of the project was exported in COCO JSON format for import to this project).

    v1 contains the original imported images, without augmentations. This is the version to download and import to your own project if you'd like to add your own augmentations.

    This dataset can be used to create computer vision applications in the banking and finance industry for use cases like detecting and counting US currency.

  11. SPEECH-COCO

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 24, 2020
    Cite
    Laurent Besacier (2020). SPEECH-COCO [Dataset]. https://data.niaid.nih.gov/resources?id=ZENODO_4282266
    Dataset updated
    Nov 24, 2020
    Dataset provided by
    Laurent Besacier
    William N. Havard
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    SpeechCoco

    Introduction

    Our corpus is an extension of the MS COCO image recognition and captioning dataset. MS COCO comprises images paired with a set of five captions. Yet, it does not include any speech. Therefore, we used Voxygen's text-to-speech system to synthesise the available captions.

    The addition of speech as a new modality enables MSCOCO to be used for research in the fields of language acquisition, unsupervised term discovery, keyword spotting, and semantic embedding using speech and vision.

    Our corpus is licensed under a Creative Commons Attribution 4.0 License.

    Data Set

    This corpus contains 616,767 spoken captions from MSCOCO's val2014 and train2014 subsets (respectively 414,113 for train2014 and 202,654 for val2014).

    We used 8 different voices. 4 of them have a British accent (Paul, Bronwen, Judith, and Elizabeth) and the 4 others have an American accent (Phil, Bruce, Amanda, Jenny).

    In order to make the captions sound more natural, we used the SoX tempo command, which lets us change the speed without changing the pitch. One third of the captions are 10% slower than the original pace, one third are 10% faster, and the last third was kept untouched.

    We also modified approximately 30% of the original captions and added disfluencies such as "um", "uh", "er" so that the captions would sound more natural.

    Each WAV file is paired with a JSON file containing various information: timecode of each word in the caption, name of the speaker, name of the WAV file, etc. The JSON files have the following data structure:

    { "duration": float, "speaker": string, "synthesisedCaption": string, "timecode": list, "speed": float, "wavFilename": string, "captionID": int, "imgID": int, "disfluency": list }

    On average, each caption comprises 10.79 tokens, disfluencies included. The WAV files are on average 3.52 seconds long.

    Repository

    The repository is organized as follows:

    CORPUS-MSCOCO (~75GB once decompressed)

    train2014/ : folder contains 413,915 captions

    json/

    wav/

    translations/

    train_en_ja.txt

    train_translate.sqlite3

    train_2014.sqlite3

    val2014/ : folder contains 202,520 captions

    json/

    wav/

    translations/

    train_en_ja.txt

    train_translate.sqlite3

    val_2014.sqlite3

    speechcoco_API/

    speechcoco/

    __init__.py

    speechcoco.py

    setup.py

    Filenames

    .wav files contain the spoken version of a caption

    .json files contain all the metadata of a given WAV file

    .sqlite3 files are SQLite databases containing all the information contained in the JSON files

    We adopted the following naming convention for both the WAV and JSON files:

    imageID_captionID_Speaker_DisfluencyPosition_Speed[.wav/.json]

    Script

    We created a script called speechcoco.py in order to handle the metadata and allow the user to easily find captions according to specific filters. The script uses the *.db files.

    Features:

    Aggregate all the information in the JSON files into a single SQLite database

    Find captions according to specific filters (name, gender and nationality of the speaker, disfluency position, speed, duration, and words in the caption). The script automatically builds the SQLite query. The user can also provide his own SQLite query.

    The following Python code returns all the captions spoken by a male with an American accent for which the speed was slowed down by 10% and that contain "keys" at any position:

    # create SpeechCoco object
    db = SpeechCoco('train_2014.sqlite3', 'train_translate.sqlite3', verbose=True)

    # filter captions (returns Caption objects)
    captions = db.filterCaptions(gender="Male", nationality="US", speed=0.9, text='%keys%')
    for caption in captions:
        print('{}\t{}\t{}\t{}\t{}\t{}\t\t{}'.format(caption.imageID, caption.captionID,
                                                    caption.speaker.name, caption.speaker.nationality,
                                                    caption.speed, caption.filename, caption.text))

    ...
    298817   26763   Phil    0.9   298817_26763_Phil_None_0-9.wav      A group of turkeys with bushes in the background.
    108505   147972  Phil    0.9   108505_147972_Phil_Middle_0-9.wav   Person using a, um, slider cell phone with blue backlit keys.
    258289   154380  Bruce   0.9   258289_154380_Bruce_None_0-9.wav    Some donkeys and sheep are in their green pens.
    545312   201303  Phil    0.9   545312_201303_Phil_None_0-9.wav     A man walking next to a couple of donkeys.
    ...

    Find all the captions belonging to a specific image:

    captions = db.getImgCaptions(298817)
    for caption in captions:
        print('{}'.format(caption.text))

    Birds wondering through grassy ground next to bushes.
    A flock of turkeys are making their way up a hill.
    Um, ah. Two wild turkeys in a field walking around.
    Four wild turkeys and some bushes trees and weeds.
    A group of turkeys with bushes in the background.

    Parse the timecodes and have them structured:

    input:

    ...
    [1926.3068, "SYL", ""],
    [1926.3068, "SEPR", " "],
    [1926.3068, "WORD", "white"],
    [1926.3068, "PHO", "w"],
    [2050.7955, "PHO", "ai"],
    [2144.6591, "PHO", "t"],
    [2179.3182, "SYL", ""],
    [2179.3182, "SEPR", " "]
    ...

    output:

    print(caption.timecode.parse())

    ...
    {
        'begin': 1926.3068,
        'end': 2179.3182,
        'syllable': [{'begin': 1926.3068,
                      'end': 2179.3182,
                      'phoneme': [{'begin': 1926.3068, 'end': 2050.7955, 'value': 'w'},
                                  {'begin': 2050.7955, 'end': 2144.6591, 'value': 'ai'},
                                  {'begin': 2144.6591, 'end': 2179.3182, 'value': 't'}],
                      'value': 'wait'}],
        'value': 'white'
    },
    ...

    Convert the timecodes to Praat TextGrid files

    caption.timecode.toTextgrid(outputDir, level=3)

    Get the words, syllables and phonemes between n seconds/milliseconds.

    The following Python code returns all the words between 0.2 and 0.6 seconds for which at least 50% of the word's total length is within the specified interval:

    pprint(caption.getWords(0.20, 0.60, seconds=True, level=1, olapthr=50))

    ...
    404537  827239  Bruce  US  0.9  404537_827239_Bruce_None_0-9.wav  Eyeglasses, a cellphone, some keys and other pocket items are all laid out on the cloth.
    [
        {
            'begin': 0.0,
            'end': 0.7202778,
            'overlapPercentage': 55.53412863758955,
            'word': 'eyeglasses'
        }
    ]
    ...

    Get the translations of the selected captions.

    As of now, only Japanese translations are available. We also used Kytea to tokenize and tag the captions translated with Google Translate:

    captions = db.getImgCaptions(298817)
    for caption in captions:
        print('{}'.format(caption.text))

        # Get translations and POS
        print('\tja_google: {}'.format(db.getTranslation(caption.captionID, "ja_google")))
        print('\t\tja_google_tokens: {}'.format(db.getTokens(caption.captionID, "ja_google")))
        print('\t\tja_google_pos: {}'.format(db.getPOS(caption.captionID, "ja_google")))
        print('\tja_excite: {}'.format(db.getTranslation(caption.captionID, "ja_excite")))
    

    Birds wondering through grassy ground next to bushes.
        ja_google: 鳥は茂みの下に茂った地面を抱えています。
        ja_google_tokens: 鳥 は 茂み の 下 に 茂 っ た 地面 を 抱え て い ま す 。
        ja_google_pos: 鳥/名詞/とり は/助詞/は 茂み/名詞/しげみ の/助詞/の 下/名詞/した に/助詞/に 茂/動詞/しげ っ/語尾/っ た/助動詞/た 地面/名詞/じめん を/助詞/を 抱え/動詞/かかえ て/助詞/て い/動詞/い ま/助動詞/ま す/語尾/す 。/補助記号/。
        ja_excite: 低木と隣接した草深いグラウンドを通って疑う鳥。

    A flock of turkeys are making their way up a hill.
        ja_google: 七面鳥の群れが丘を上っています。
        ja_google_tokens: 七 面 鳥 の 群れ が 丘 を 上 っ て い ま す 。
        ja_google_pos: 七/名詞/なな 面/名詞/めん 鳥/名詞/とり の/助詞/の 群れ/名詞/むれ が/助詞/が 丘/名詞/おか を/助詞/を 上/動詞/のぼ っ/語尾/っ て/助詞/て い/動詞/い ま/助動詞/ま す/語尾/す 。/補助記号/。
        ja_excite: 七面鳥の群れは丘の上で進んでいる。

    Um, ah. Two wild turkeys in a field walking around.
        ja_google: 野生のシチメンチョウ、野生の七面鳥
        ja_google_tokens: 野生 の シチメンチョウ 、 野生 の 七 面 鳥
        ja_google_pos: 野生/名詞/やせい の/助詞/の シチメンチョウ/名詞/しちめんちょう 、/補助記号/、 野生/名詞/やせい の/助詞/の 七/名詞/なな 面/名詞/めん 鳥/名詞/ちょう
        ja_excite: まわりで移動しているフィールドの2羽の野生の七面鳥

    Four wild turkeys and some bushes trees and weeds.
        ja_google: 4本の野生のシチメンチョウといくつかの茂みの木と雑草
        ja_google_tokens: 4 本 の 野生 の シチメンチョウ と いく つ か の 茂み の 木 と 雑草
        ja_google_pos: 4/名詞/4 本/接尾辞/ほん の/助詞/の 野生/名詞/やせい の/助詞/の シチメンチョウ/名詞/しちめんちょう と/助詞/と

  12. Replication Data for: Training Deep Convolutional Object Detectors for Images Affected by Lossy Compression

    • dataverse.harvard.edu
    Updated Apr 16, 2022
    Cite
    Tomasz Gandor (2022). Replication Data for: Training Deep Convolutional Object Detectors for Images Affected by Lossy Compression [Dataset]. http://doi.org/10.7910/DVN/UHEP3C
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 16, 2022
    Dataset provided by
    Harvard Dataverse
    Authors
    Tomasz Gandor
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This collection contains the trained models and object detection results of 2 architectures found in the Detectron2 library, on the MS COCO val2017 dataset, under different JPEG compression levels Q = {5, 12, 19, 26, 33, 40, 47, 54, 61, 68, 75, 82, 89, 96} (14 levels per trained model).

    Architectures:

    - F50 – Faster R-CNN on ResNet-50 with FPN
    - R50 – RetinaNet on ResNet-50 with FPN

    Training types:

    - D2 – Detectron2 Model Zoo pre-trained 1x model (90,000 iterations, batch 16)
    - STD – standard 1x training (90,000 iterations) on the original train2017 dataset
    - Q20 – 1x training (90,000 iterations) on the train2017 dataset degraded to Q=20
    - Q40 – 1x training (90,000 iterations) on the train2017 dataset degraded to Q=40
    - T20 – extra 1x training on top of D2 on the train2017 dataset degraded to Q=20
    - T40 – extra 1x training on top of D2 on the train2017 dataset degraded to Q=40

    Model and metrics files:

    - models_FasterRCNN.tar.gz (F50-STD, F50-Q20, …)
    - models_RetinaNet.tar.gz (R50-STD, R50-Q20, …)

    For every model there are 3 files:

    - config.yaml – the Detectron2 config of the model.
    - model_final.pth – the weights (training snapshot) in PyTorch format.
    - metrics.json – training metrics (like time, total loss, etc.) every 20 iterations.

    The D2 models were not included, because they are available from the Detectron2 Model Zoo as faster_rcnn_R_50_FPN_1x (F50-D2) and retinanet_R_50_FPN_1x (R50-D2).

    Result files:

    - F50-results.tar.gz – results for Faster R-CNN models (including D2).
    - R50-results.tar.gz – results for RetinaNet models (including D2).

    For every model there are 14 subdirectories, e.g. evaluator_dump_R50x1_005 through evaluator_dump_R50x1_096, one for each of the JPEG Q values. Each such folder contains:

    - coco_instances_results.json – all detected objects (image id, bounding box, class index and confidence).
    - results.json – AP metrics as computed by the COCO API.

    Source code for processing the data: the data can be processed using our code, published at https://github.com/tgandor/urban_oculus. Additional dependencies for the source code: COCO API, Detectron2.
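
    As a hedged sketch of how one of these dumps can be re-scored with the COCO API (pycocotools): the paths are illustrative, and the standard val2017 ground-truth file is assumed to be available separately.

    ```python
    # Sketch: recompute bbox AP for one evaluator dump against MS COCO val2017.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    coco_gt = COCO("annotations/instances_val2017.json")   # ground truth (not part of this collection)
    coco_dt = coco_gt.loadRes("evaluator_dump_R50x1_075/coco_instances_results.json")

    coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
    coco_eval.evaluate()
    coco_eval.accumulate()
    coco_eval.summarize()   # prints the AP/AR table summarized in results.json
    ```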

  13. WormSwin: C. elegans Video Datasets

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 31, 2024
    Cite
    Deserno, Maurice (2024). WormSwin: C. elegans Video Datasets [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7456802
    Dataset updated
    Jan 31, 2024
    Dataset provided by
    Deserno, Maurice
    Bozek, Katarzyna
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data used for our paper "WormSwin: Instance Segmentation of C. elegans using Vision Transformer". This publication is divided into three parts:

    CSB-1 Dataset

    Synthetic Images Dataset

    MD Dataset

    The CSB-1 Dataset consists of frames extracted from videos of Caenorhabditis elegans (C. elegans) annotated with binary masks. Each C. elegans is separately annotated, providing accurate annotations even for overlapping instances. All annotations are provided in binary mask format and as COCO Annotation JSON files (see COCO website).

    The videos are named after the following pattern:

    <"worm age in hours"_"mutation"_"irradiated (binary)"_"video index (zero based)">

    For mutation the following values are possible:

    wild type

    csb-1 mutant

    csb-1 with rescue mutation

    An example video name would be 24_1_1_2, meaning it shows 24-hour-old, irradiated C. elegans with the csb-1 mutation (video index 2).
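
    A small hedged helper for the naming pattern; the 0/1/2 encoding of the mutation field follows the order listed above and is an assumption (the 24_1_1_2 example only confirms that 1 means csb-1 mutant).

    ```python
    # Sketch: parse <age>_<mutation>_<irradiated>_<index> video names.
    MUTATIONS = {0: "wild type", 1: "csb-1 mutant", 2: "csb-1 with rescue mutation"}  # assumed encoding

    def parse_video_name(name: str) -> dict:
        age, mutation, irradiated, index = (int(x) for x in name.split("_"))
        return {
            "age_hours": age,
            "mutation": MUTATIONS.get(mutation, f"unknown ({mutation})"),
            "irradiated": bool(irradiated),
            "video_index": index,
        }

    print(parse_video_name("24_1_1_2"))
    # {'age_hours': 24, 'mutation': 'csb-1 mutant', 'irradiated': True, 'video_index': 2}
    ```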

    Video data was provided by M. Rieckher; Instance Segmentation Annotations were created under supervision of K. Bozek and M. Deserno. The Synthetic Images Dataset was created by cutting out C. elegans (foreground objects) from the CSB-1 Dataset and placing them randomly on background images also taken from the CSB-1 Dataset. Foreground objects were flipped, rotated and slightly blurred before being placed on the background images. The same was done with the binary mask annotations taken from the CSB-1 Dataset so that they match the foreground objects in the synthetic images. Additionally, we added rings of random color, size, thickness and position to the background images to simulate petri-dish edges.

    This synthetic dataset was generated by M. Deserno. The Mating Dataset (MD) consists of 450 grayscale image patches of 1,012 x 1,012 px showing C. elegans with high overlap, crawling on a petri dish. We took the patches from a 10 min long video of size 3,036 x 3,036 px. The video was downsampled from 25 fps to 5 fps before selecting 50 random frames for annotating and patching. Like the other datasets, worms were annotated with binary masks and annotations are provided as COCO Annotation JSON files.

    The video data was provided by X.-L. Chu; Instance Segmentation Annotations were created under supervision of K. Bozek and M. Deserno.

    Further details about the datasets can be found in our paper.

  14. Annotations for ConfLab A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions In-the-Wild

    • data.4tu.nl
    Updated Jun 8, 2022
    Cite
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung (2022). Annotations for ConfLab A Rich Multimodal Multisensor Dataset of Free-Standing Social Interactions In-the-Wild [Dataset]. http://doi.org/10.4121/20017664.v1
    Dataset updated
    Jun 8, 2022
    Dataset provided by
    4TU.ResearchData
    Authors
    Chirag Raman; Jose Vargas Quiros; Stephanie Tan; Ashraful Islam; Ekin Gedik; Hayley Hung
    License

    https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf

    Description

    This file contains the annotations for the ConfLab dataset, including actions (speaking status), pose, and F-formations.

    ------------------

    ./actions/speaking_status:

    ./processed: the processed speaking status files, aggregated into a single data frame per segment. Skipped rows in the raw data (see https://josedvq.github.io/covfee/docs/output for details) have been imputed using the code at: https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/speaking_status

    The processed annotations consist of:

    ./speaking: The first row contains person IDs matching the sensor IDs; the rest of the rows contain binary speaking status annotations at 60 fps for the corresponding 2 min video segment (7200 frames).

    ./confidence: Same as above. These annotations reflect the continuous-valued rating of confidence of the annotators in their speaking annotation.

    To load these files with pandas: pd.read_csv(p, index_col=False)


    ./raw.zip: the raw outputs from speaking status annotation for each of the eight annotated 2-min video segments. These were output by the covfee annotation tool (https://github.com/josedvq/covfee).

    Annotations were done at 60 fps.

    --------------------

    ./pose:

    ./coco: the processed pose files in coco JSON format, aggregated into a single data frame per video segment. These files have been generated from the raw files using the code at: https://github.com/TUDelft-SPC-Lab/conflab-keypoints

    To load in Python: f = json.load(open('/path/to/cam2_vid3_seg1_coco.json'))

    The skeleton structure (limbs) is contained within each file in:

    f['categories'][0]['skeleton']

    and keypoint names at:

    f['categories'][0]['keypoints']
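
    A short hedged sketch tying the snippets above together: load one processed pose file and pair the skeleton connections with the keypoint names (COCO skeletons are conventionally 1-indexed; adjust if these files use 0-based indices).

    ```python
    # Sketch: read keypoint names and skeleton (limb) connections from a ConfLab pose file.
    import json

    with open('/path/to/cam2_vid3_seg1_coco.json') as fh:
        f = json.load(fh)

    keypoint_names = f['categories'][0]['keypoints']
    skeleton = f['categories'][0]['skeleton']

    for a, b in skeleton:
        print(keypoint_names[a - 1], '<->', keypoint_names[b - 1])  # assumes 1-indexed joints
    ```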

    ./raw.zip: the raw outputs from continuous pose annotation. These were output by the covfee annotation tool (https://github.com/josedvq/covfee).

    Annotations were done at 60 fps.

    ---------------------

    ./f_formations:

    seg 2: 14:00 onwards, for videos of the form x2xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).

    seg 3: for videos of the form x3xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).

    Note that camera 10 doesn't include meaningful subject information/body parts that are not already covered in camera 8.

    First column: time stamp

    Second column: "()" delineates groups, "<>" delineates subjects, cam X indicates the best camera view for which a particular group exists.


    phone.csv: time stamp (pertaining to seg3), corresponding group, ID of person using the phone

  15. The Object Detection for Olfactory References (ODOR) Dataset

    • zenodo.org
    • data.niaid.nih.gov
    csv, json +2
    Updated Apr 26, 2024
    Cite
    Mathias Zinnen; Prathmesh Madhu; Andreas Maier; Peter Bell; Vincent Christlein (2024). The Object Detection for Olfactory References (ODOR) Dataset [Dataset]. http://doi.org/10.5281/zenodo.11070878
    Available download formats: json, zip, csv, text/x-python
    Dataset updated
    Apr 26, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mathias Zinnen; Prathmesh Madhu; Andreas Maier; Peter Bell; Vincent Christlein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Object Detection for Olfactory References (ODOR) Dataset

    Real-world applications of computer vision in the humanities require algorithms to be robust against artistic abstraction, peripheral objects, and subtle differences between fine-grained target classes.

    Existing datasets provide instance-level annotations on artworks but are generally biased towards the image centre and limited with regard to detailed object classes. The ODOR dataset fills this gap, offering 38,116 object-level annotations across 4,712 images, spanning an extensive set of 139 fine-grained categories.

    It has challenging dataset properties, such as a detailed set of categories, dense and overlapping objects, and spatial distribution over the whole image canvas.

    Inspiring further research on artwork object detection and broader visual cultural heritage studies, the dataset challenges researchers to explore the intersection of object recognition and smell perception.

    How to use

    The annotations are provided in COCO JSON format. To represent the two-level hierarchy of the object classes, we make use of the supercategory field in the categories array as defined by COCO. In addition to the object-level annotations, we provide an additional CSV file with image-level metadata, which includes content-related fields, such as Iconclass codes or image descriptions, as well as formal annotations, such as artist, license, or creation year.

    In addition to a zip containing the dataset images, we provide links to their source collections in the metadata file and a Python script to conveniently download the artwork images (`download_imgs.py`).

    The mapping between the `images` array of the `annotations.json` and the `metadata.csv` file can be accomplished via the `file_name` attribute of the elements of the `images` array and the unique `File Name` column of the `metadata.csv` file, respectively.
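
    A minimal hedged sketch of that mapping (file names as described above; no metadata columns beyond `File Name` are assumed):

    ```python
    # Sketch: attach image-level metadata to the COCO `images` entries via file_name / File Name.
    import json
    import pandas as pd

    with open("annotations.json") as fh:
        coco = json.load(fh)

    images = pd.DataFrame(coco["images"])      # has a `file_name` column
    metadata = pd.read_csv("metadata.csv")     # has a unique `File Name` column

    merged = images.merge(metadata, left_on="file_name", right_on="File Name", how="left")
    print(len(merged), "images with metadata attached;", list(merged.columns)[:5])
    ```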

  16. Synthetically Spoken COCO

    • zenodo.org
    application/gzip, bin +2
    Updated Jan 24, 2020
    Cite
    Grzegorz Chrupała; Lieke Gelderloos; Afra Alishahi (2020). Synthetically Spoken COCO [Dataset]. http://doi.org/10.5281/zenodo.400926
    Available download formats: txt, json, bin, application/gzip
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Grzegorz Chrupała; Lieke Gelderloos; Afra Alishahi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Synthetically Spoken COCO

    Version 1.0

    This dataset contains synthetically generated spoken versions of MS COCO [1] captions. This
    dataset was created as part of the research reported in [5].
    The speech was generated using gTTS [2]. The dataset consists of the following files:

    - dataset.json: Captions associated with MS COCO images. This information comes from [3].
    - sentid.txt: List of caption IDs. This file can be used to locate MFCC features of the MP3 files
    in the numpy array stored in dataset.mfcc.npy.
    - mp3.tgz: MP3 files with the audio. Each file name corresponds to a caption ID in dataset.json
    and in sentid.txt.
    - dataset.mfcc.npy: Numpy array with the Mel Frequency Cepstral Coefficients extracted from
    the audio. Each row corresponds to a caption. The order of the captions corresponds to the
    ordering in the file sentid.txt. MFCCs were extracted using [4].

    [1] http://mscoco.org/dataset/#overview
    [2] https://pypi.python.org/pypi/gTTS
    [3] https://github.com/karpathy/neuraltalk
    [4] https://github.com/jameslyons/python_speech_features
    [5] https://arxiv.org/abs/1702.01991
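
    A hedged sketch of the alignment described above, mapping caption IDs (sentid.txt line order) to rows of dataset.mfcc.npy:

    ```python
    # Sketch: index MFCC rows by caption ID via the line order of sentid.txt.
    import numpy as np

    mfcc = np.load("dataset.mfcc.npy", allow_pickle=True)   # allow_pickle in case rows are variable-length
    with open("sentid.txt") as fh:
        caption_ids = [line.strip() for line in fh if line.strip()]

    assert len(caption_ids) == len(mfcc), "expected one MFCC row per caption ID"
    features_by_caption = dict(zip(caption_ids, mfcc))
    print(len(features_by_caption), "captions indexed; first ID:", caption_ids[0])
    ```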

  17. ActiveHuman Part 1

    • zenodo.org
    • data.niaid.nih.gov
    Updated Nov 14, 2023
    Cite
    Charalampos Georgiadis (2023). ActiveHuman Part 1 [Dataset]. http://doi.org/10.5281/zenodo.8359766
    Dataset updated
    Nov 14, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Charalampos Georgiadis
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is Part 1/2 of the ActiveHuman dataset! Part 2 can be found here.

    Dataset Description

    ActiveHuman was generated using Unity's Perception package.

    It consists of 175428 RGB images and their semantic segmentation counterparts taken at different environments, lighting conditions, camera distances and angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1m-4m) and 36 camera angles (0-360 at 10-degree intervals).

    The dataset does not include images at every single combination of available camera distances and angles, since for some values the camera would collide with another object or go outside the confines of an environment. As a result, some combinations of camera distances and angles do not exist in the dataset.

    Alongside each image, 2D Bounding Box, 3D Bounding Box and Keypoint ground truth annotations are also generated via the use of Labelers and are stored as a JSON-based dataset. These Labelers are scripts that are responsible for capturing ground truth annotations for each captured image or frame. Keypoint annotations follow the COCO format defined by the COCO keypoint annotation template offered in the perception package.

    Folder configuration

    The dataset consists of 3 folders:

    • JSON Data: Contains all the generated JSON files.
    • RGB Images: Contains the generated RGB images.
    • Semantic Segmentation Images: Contains the generated semantic segmentation images.

    Essential Terminology

    • Annotation: Recorded data describing a single capture.
    • Capture: One completed rendering process of a Unity sensor which stored the rendered result to data files (e.g. PNG, JPG, etc.).
    • Ego: Object or person on which a collection of sensors is attached to (e.g., if a drone has a camera attached to it, the drone would be the ego and the camera would be the sensor).
    • Ego coordinate system: Coordinates with respect to the ego.
    • Global coordinate system: Coordinates with respect to the global origin in Unity.
    • Sensor: Device that captures the dataset (in this instance the sensor is a camera).
    • Sensor coordinate system: Coordinates with respect to the sensor.
    • Sequence: Time-ordered series of captures. This is very useful for video capture where the time-order relationship of two captures is vital.
    • UUID: Universally Unique Identifier. It is a unique hexadecimal identifier that can represent an individual instance of a capture, ego, sensor, annotation, labeled object or keypoint, or keypoint template.

    Dataset Data

    The dataset includes 4 types of JSON annotation files:

    • annotation_definitions.json: Contains annotation definitions for all of the active Labelers of the simulation stored in an array. Each entry consists of a collection of key-value pairs which describe a particular type of annotation and contain information about that specific annotation describing how its data should be mapped back to labels or objects in the scene. Each entry contains the following key-value pairs:
      • id: Integer identifier of the annotation's definition.
      • name: Annotation name (e.g., keypoints, bounding box, bounding box 3D, semantic segmentation).
      • description: Description of the annotation's specifications.
      • format: Format of the file containing the annotation specifications (e.g., json, PNG).
      • spec: Format-specific specifications for the annotation values generated by each Labeler.

    Most Labelers generate different annotation specifications in the spec key-value pair:

    • BoundingBox2DLabeler/BoundingBox3DLabeler:
      • label_id: Integer identifier of a label.
      • label_name: String identifier of a label.
    • KeypointLabeler:
      • template_id: Keypoint template UUID.
      • template_name: Name of the keypoint template.
      • key_points: Array containing all the joints defined by the keypoint template. This array includes the key-value pairs:
        • label: Joint label.
        • index: Joint index.
        • color: RGBA values of the keypoint.
        • color_code: Hex color code of the keypoint
      • skeleton: Array containing all the skeleton connections defined by the keypoint template. Each skeleton connection defines a connection between two different joints. This array includes the key-value pairs:
        • label1: Label of the first joint.
        • label2: Label of the second joint.
        • joint1: Index of the first joint.
        • joint2: Index of the second joint.
        • color: RGBA values of the connection.
        • color_code: Hex color code of the connection.
    • SemanticSegmentationLabeler:
      • label_name: String identifier of a label.
      • pixel_value: RGBA values of the label.
      • color_code: Hex color code of the label.

    • captures_xyz.json: Each of these files contains an array of ground truth annotations generated by each active Labeler for each capture separately, as well as extra metadata that describe the state of each active sensor that is present in the scene. Each array entry contains the following key-value pairs:
      • id: UUID of the capture.
      • sequence_id: UUID of the sequence.
      • step: Index of the capture within a sequence.
      • timestamp: Timestamp (in ms) since the beginning of a sequence.
      • sensor: Properties of the sensor. This entry contains a collection with the following key-value pairs:
        • sensor_id: Sensor UUID.
        • ego_id: Ego UUID.
        • modality: Modality of the sensor (e.g., camera, radar).
        • translation: 3D vector that describes the sensor's position (in meters) with respect to the global coordinate system.
        • rotation: Quaternion variable that describes the sensor's orientation with respect to the ego coordinate system.
        • camera_intrinsic: matrix containing (if it exists) the camera's intrinsic calibration.
        • projection: Projection type used by the camera (e.g., orthographic, perspective).
      • ego: Attributes of the ego. This entry contains a collection with the following key-value pairs:
        • ego_id: Ego UUID.
        • translation: 3D vector that describes the ego's position (in meters) with respect to the global coordinate system.
        • rotation: Quaternion variable containing the ego's orientation.
        • velocity: 3D vector containing the ego's velocity (in meters per second).
        • acceleration: 3D vector containing the ego's acceleration (in meters per second squared).
      • format: Format of the file captured by the sensor (e.g., PNG, JPG).
      • annotations: Key-value pair collections, one for each active Labeler. These key-value pairs are as follows:
        • id: Annotation UUID .
        • annotation_definition: Integer identifier of the annotation's definition.
        • filename: Name of the file generated by the Labeler. This entry is only present for Labelers that generate an image.
        • values: List of key-value pairs containing annotation data for the current Labeler.

    Each Labeler generates different annotation specifications in the values key-value pair:

    • BoundingBox2DLabeler:
      • label_id: Integer identifier of a label.
      • label_name: String identifier of a label.
      • instance_id: UUID of one instance of an object. Each object with the same label that is visible on the same capture has different instance_id values.
      • x: Position of the 2D bounding box on the X axis.
      • y: Position of the 2D bounding box position on the Y axis.
      • width: Width of the 2D bounding box.
      • height: Height of the 2D bounding box.
    • BoundingBox3DLabeler:
      • label_id: Integer identifier of a label.
      • label_name: String identifier of a label.
      • instance_id: UUID of one instance of an object. Each object with the same label that is visible on the same capture has different instance_id values.
      • translation: 3D vector containing the location of the center of the 3D bounding box with respect to the sensor coordinate system (in meters).
      • size: 3D
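
    As a rough, hedged sketch of reading the 2D bounding box annotations described above from one captures_*.json file: the top-level "captures" key is an assumption based on Unity Perception's usual output layout, and the filename is illustrative.

    ```python
    # Sketch: print BoundingBox2DLabeler values from one captures file.
    import json

    with open("JSON Data/captures_000.json") as fh:   # illustrative filename
        data = json.load(fh)

    for capture in data.get("captures", []):          # assumed top-level key
        for annotation in capture.get("annotations", []):
            for value in annotation.get("values", []):
                if {"x", "y", "width", "height"} <= value.keys():   # 2D bounding box entries
                    print(value["label_name"], value["instance_id"],
                          value["x"], value["y"], value["width"], value["height"])
    ```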

  18. Udacity Self Driving Car Dataset

    • universe.roboflow.com
    • kaggle.com
    zip
    Updated Aug 8, 2022
    Cite
    Roboflow (2022). Udacity Self Driving Car Dataset [Dataset]. https://universe.roboflow.com/roboflow-gw7yv/self-driving-car/dataset/1
    Available download formats: zip
    Dataset updated
    Aug 8, 2022
    Dataset authored and provided by
    Roboflow
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Variables measured
    Obstacles
    Description

    Overview

    The original Udacity Self Driving Car Dataset is missing labels for thousands of pedestrians, bikers, cars, and traffic lights. This will result in poor model performance. When used in the context of self driving cars, this could even lead to human fatalities.

    We re-labeled the dataset to correct errors and omissions. We have provided convenient downloads in many formats including VOC XML, COCO JSON, Tensorflow Object Detection TFRecords, and more.

    Some examples of labels missing from the original dataset: https://i.imgur.com/A5J3qSt.jpg (Examples of Missing Labels)

    Stats

    The dataset contains 97,942 labels across 11 classes and 15,000 images. There are 1,720 null examples (images with no labels).

    All images are 1920x1200 (download size ~3.1 GB). We have also provided a version downsampled to 512x512 (download size ~580 MB) that is suitable for most common machine learning models (including YOLO v3, Mask R-CNN, SSD, and mobilenet).

    Annotations have been hand-checked for accuracy by Roboflow.

    Class Balance: https://i.imgur.com/bOFkueI.png

    Annotation Distribution (heatmap): https://i.imgur.com/NwcrQKK.png

    Use Cases

    Udacity is building an open source self driving car! You might also try using this dataset to do person-detection and tracking.

    Using this Dataset

    Our updates to the dataset are released under the MIT License (the same license as the original annotations and images).

    Note: the dataset contains many duplicated bounding boxes for the same subject which we have not corrected. You will probably want to filter them by taking the IoU of same-class boxes that are (nearly) 100% overlapping, or it could affect your model performance (especially in stoplight detection, which seems to suffer from an especially severe case of duplicated bounding boxes).
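
    A hedged sketch of such a filter, assuming COCO-style [x, y, width, height] boxes; the 0.95 IoU threshold is an arbitrary choice:

    ```python
    # Sketch: drop near-duplicate boxes of the same class using IoU.
    def iou(a, b):
        ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
        bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = a[2] * a[3] + b[2] * b[3] - inter
        return inter / union if union > 0 else 0.0

    def drop_duplicates(annotations, thresh=0.95):
        """annotations: dicts with 'category_id' and 'bbox' as [x, y, w, h]."""
        kept = []
        for ann in annotations:
            if not any(k["category_id"] == ann["category_id"] and
                       iou(k["bbox"], ann["bbox"]) >= thresh for k in kept):
                kept.append(ann)
        return kept
    ```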

    About Roboflow

    Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

    Developers reduce 50% of their boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.


  19. TexBiG

    • webis.de
    • anthology.aicmu.ac.cn
    Updated 2022
    Cite
    Volker Rodehorst; Benno Stein (2022). TexBiG [Dataset]. http://doi.org/10.5281/zenodo.6885143
    Dataset updated
    2022
    Dataset provided by
    The Web Technology & Information Systems Network
    Bauhaus-Universität Weimar
    Authors
    Volker Rodehorst; Benno Stein
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    TexBiG (from the German Text-Bild-Gefüge, meaning text-image-structure) is a document layout analysis dataset for historical documents from the late 19th and early 20th century. The dataset provides instance segmentation annotations (bounding boxes and polygons/masks) for 19 different classes with more than 52,000 instances. Annotations were created manually by experts and evaluated with Krippendorff's alpha; each document image was labeled by at least two different annotators. The dataset uses the common COCO-JSON format.

  20. Pascal VOC 2012 Object Detection Dataset - raw

    • public.roboflow.com
    zip
    Updated May 23, 2024
    Cite
    PASCAL (2024). Pascal VOC 2012 Object Detection Dataset - raw [Dataset]. https://public.roboflow.com/object-detection/pascal-voc-2012/1
    Available download formats: zip
    Dataset updated
    May 23, 2024
    Dataset authored and provided by
    PASCAL
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Bounding Boxes of VOC
    Description

    Pascal VOC 2012 is a common benchmark for object detection. It contains common objects that one might find in images on the web.

    Image example: https://i.imgur.com/y2sB9fD.png

    Note: the test set is withheld, as is common with benchmark datasets.

    You can think of it sort of like a baby COCO.
