Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Yolo To Coco Json is a dataset for object detection tasks - it contains Objects annotations for 1,954 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by Ari
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Databases in MS COCO (json) format
This dataset was created by GREAT23U5
This dataset was created by Ajifoster3
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The IMPTOX project has received funding from the EU's H2020 framework programme for research and innovation under grant agreement no. 965173. IMPTOX is part of the European MNP cluster on human health.
More information about the project here.
Description: This repository includes the trained weights and a custom COCO-formatted dataset used for developing and testing a Faster R-CNN R_50_FPN_3x object detector, specifically designed to identify particles in micro-FTIR filter images.
Contents:
Weights File (neuralNetWeights_V3.pth):
Format: .pth
Description: This file contains the trained weights for a Faster R-CNN model with a ResNet-50 backbone and a Feature Pyramid Network (FPN), trained for 3x schedule. These weights are specifically tuned for detecting particles in micro-FTIR filter images.
Custom COCO Dataset (uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip):
Format: .zip
Description: This zip archive contains a custom COCO-formatted dataset, including JPEG images and their corresponding annotation file. The dataset consists of images of micro-FTIR filters with annotated particles.
Contents:
Images: JPEG format images of micro-FTIR filters.
Annotations: A JSON file in COCO format providing detailed annotations of the particles in the images.
Management: The dataset can be managed and manipulated using the Pycocotools library, facilitating easy integration with existing COCO tools and workflows.
Applications: The provided weights and dataset are intended for researchers and practitioners in the field of microscopy and particle detection. The dataset and model can be used for further training, validation, and fine-tuning of object detection models in similar domains.
Usage Notes:
The neuralNetWeights_V3.pth file should be loaded into a PyTorch implementation of the Faster R-CNN architecture, for example via Detectron2 (see the loading sketch below).
The contents of uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip should be extracted and can be used with any COCO-compatible object detection framework for training and evaluation purposes.
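As a minimal sketch (not the authors' code), the weights could be loaded for inference with Detectron2 roughly as follows; the single particle class and the score threshold are assumptions:

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1            # assumption: a single "particle" category
cfg.MODEL.WEIGHTS = "neuralNetWeights_V3.pth"  # weights file provided in this repository
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5    # assumption: detection confidence threshold
predictor = DefaultPredictor(cfg)

outputs = predictor(cv2.imread("example_filter_image.jpg"))  # hypothetical micro-FTIR filter image
print(outputs["instances"].pred_boxes)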
Code can be found in the related GitHub repository.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Esefjorden Marine Vegetation Segmentation Dataset (EMVSD): Comprising 17,000 meticulously labeled images, this dataset is suited for instance segmentation tasks and represents a significant leap forward for marine research in the region. The images are stored in YOLO and COCO formats, ensuring compatibility with widely recognized and adopted object detection frameworks. Our decision to make this dataset publicly accessible underscores our commitment to collaborative research and the advancement of the broader scientific community.
Dataset Structure:
- Images:
  - Organized into three subsets: train, val, and test, located under the images/ directory.
  - Each subset contains high-resolution images optimized for object detection and segmentation tasks.
- Annotations:
  - Available in YOLO txt and COCO formats for compatibility with major object detection frameworks.
  - Organized into three subsets: train, val, and test, located under the labels/ directory.
  - Additional metadata:
    - counts.txt: Summary of label distributions.
    - Cache files (train.cache, val.cache, test.cache) for efficient dataset loading.
- Metadata:
  - classes.txt: Definitions for all annotated classes in the dataset.
  - Detailed COCO-format annotations in:
    - train_annotations.json
    - val_annotations.json
    - test_annotations.json
- Configuration File:
  - EMVSD.yaml: Configuration file for seamless integration with machine learning libraries.
Example Directory Structure:
EMVSD/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
├── labels/
│   ├── train/
│   ├── val/
│   ├── test/
│   ├── counts.txt
│   ├── train.cache
│   ├── val.cache
│   └── test.cache
├── classes.txt
├── train_annotations.json
├── val_annotations.json
├── test_annotations.json
└── EMVSD.yaml
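As a minimal sketch (not the dataset authors' own code), and assuming EMVSD.yaml follows the Ultralytics data-configuration convention implied by the directory layout above, a segmentation model could be trained roughly as follows:

from ultralytics import YOLO

# Assumed: EMVSD.yaml points at images/{train,val,test} and lists the classes from classes.txt
model = YOLO("yolov8n-seg.pt")                 # any YOLO segmentation checkpoint
model.train(data="EMVSD.yaml", epochs=100, imgsz=640)
metrics = model.val(split="test")              # evaluate on the test subset
print(metrics)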
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above the mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories --- person, boat, lifebuoy, surfboard, wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.
In this repository, we provide:
66 Full HD video clips (total size: 5.5 GB)
126,170 images extracted from the videos at a rate of 30 FPS (total size: 243 GB)
3 annotation files for the extracted images that follow the MS COCO data format (for more info see https://cocodataset.org/#format-data):
annotations_5_custom_classes.json: this file contains annotations concerning all five categories; please note that class ids do not correspond with the ones provided by the MS COCO standard since we account for two new classes not previously considered in the MS COCO dataset --- lifebuoy and wood
annotations_3_coco_classes.json: this file contains annotations concerning the three classes also accounted for in the MS COCO dataset --- person, boat, surfboard. Class ids correspond with the ones provided by the MS COCO standard.
annotations_person_coco_classes.json: this file contains annotations concerning only the 'person' class. Class id corresponds to the one provided by the MS COCO standard.
The MOBDrone dataset is intended as a test data benchmark. However, for researchers interested in using our data also for training purposes, we provide training and test splits.
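As a minimal sketch (assumed local paths, not part of the official code), the COCO-format annotation files can be browsed with pycocotools:

from pycocotools.coco import COCO

coco = COCO("annotations_5_custom_classes.json")        # one of the three annotation files above
person_cat_ids = coco.getCatIds(catNms=["person"])
person_img_ids = coco.getImgIds(catIds=person_cat_ids)  # images containing people in the water
ann_ids = coco.getAnnIds(imgIds=person_img_ids[:1], catIds=person_cat_ids)
anns = coco.loadAnns(ann_ids)
print(len(person_img_ids), "images contain 'person' boxes; first box:", anns[0]["bbox"])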
More details about data generation and the evaluation protocol can be found at our MOBDrone paper: https://arxiv.org/abs/2203.07973
The code to reproduce our results is available at this GitHub Repository: https://github.com/ciampluca/MOBDrone_eval
See also http://aimh.isti.cnr.it/dataset/MOBDrone
Citing the MOBDrone
The MOBDrone is released under a Creative Commons Attribution license, so please cite the MOBDrone if it is used in your work in any form.
Published academic papers should use the academic paper citation for our MOBDrone paper, in which we evaluated several pre-trained state-of-the-art object detectors, focusing on the detection of overboard people:
@inproceedings{MOBDrone2021,
  title={MOBDrone: a Drone Video Dataset for Man OverBoard Rescue},
  author={Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
  booktitle={ICIAP2021: 21th International Conference on Image Analysis and Processing},
  year={2021}
}
and this Zenodo Dataset
@dataset{donato_cafarelli_2022_5996890,
  author={Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
  title={{MOBDrone: a large-scale drone-view dataset for man overboard detection}},
  month=feb,
  year=2022,
  publisher={Zenodo},
  version={1.0.0},
  doi={10.5281/zenodo.5996890},
  url={https://doi.org/10.5281/zenodo.5996890}
}
Personal works, such as machine learning projects/blog posts, should provide a URL to the MOBDrone Zenodo page (https://doi.org/10.5281/zenodo.5996890), though a reference to our MOBDrone paper would also be appreciated.
Contact Information
If you would like further information about the MOBDrone or if you experience any issues downloading files, please contact us at mobdrone[at]isti.cnr.it
Acknowledgements
This work was partially supported by NAUSICAA - "NAUtical Safety by means of Integrated Computer-Assistance Appliances 4.0" project funded by the Tuscany region (CUP D44E20003410009). The data collection was carried out with the collaboration of the Fly&Sense Service of the CNR of Pisa - for the flight operations of remotely piloted aerial systems - and of the Institute of Clinical Physiology (IFC) of the CNR - for the water immersion operations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Zenodo dataset contains the Common Objects in Context (COCO) files linked to the following publication:
Verhaegen, G., Cimoli, E., & Lindsay, D. (2021). Life beneath the ice: jellyfish and ctenophores from the Ross Sea, Antarctica, with an image-based training set for machine learning. Biodiversity Data Journal.
Each COCO zip folder contains an "annotations" folder including a json file and an "images" folder containing the annotated images.
Details on each COCO zip folder:
COCO annotations of Beroe sp. A for the following 114 images:
MCMEC2018_20181116_NIKON_Beroe_sp_A_c_1 to MCMEC2018_20181116_NIKON_Beroe_sp_A_c_16, MCMEC2018_20181125_NIKON_Beroe_sp_A_d_1 to MCMEC2018_20181125_NIKON_Beroe_sp_A_d_57, MCMEC2018_20181127_NIKON_Beroe_sp_A_e_1 to MCMEC2018_20181127_NIKON_Beroe_sp_A_e_2, MCMEC2019_20191116_SONY_Beroe_sp_A_a_1 to MCMEC2019_20191116_SONY_Beroe_sp_A_a_28, and MCMEC2019_20191127_SONY_Beroe_sp_A_f_1 to MCMEC2019_20191127_SONY_Beroe_sp_A_f_12
COCO annotations of Beroe sp. B for the following 2 images:
MCMEC2019_20191115_SONY_Beroe_sp_B_a_1 and MCMEC2019_20191115_SONY_Beroe_sp_B_a_2
COCO annotations of Callianira cristata for the following 21 images:
MCMEC2019_20191120_SONY_Callianira_cristata_b_1 to MCMEC2019_20191120_SONY_Callianira_cristata_b_21
COCO annotations of Diplulmaris antarctica for the following 83 images:
MCMEC2019_20191116_SONY_Diplulmaris_antarctica_a_1 to MCMEC2019_20191116_SONY_Diplulmaris_antarctica_a_9, and MCMEC2019_20191201_SONY_Diplulmaris_antarctica_c_1 to MCMEC2019_20191201_SONY_Diplulmaris_antarctica_c_74
COCO annotations of Koellikerina maasi for the following 49 images:
MCMEC2018_20181127_NIKON_Koellikerina_maasi_b_1 to MCMEC2018_20181127_NIKON_Koellikerina_maasi_b_4, MCMEC2018_20181129_NIKON_Koellikerina_maasi_c_1 to MCMEC2018_20181129_NIKON_Koellikerina_maasi_c_29, and MCMEC2019_20191126_SONY_Koellikerina_maasi_a_1 to MCMEC2019_20191126_SONY_Koellikerina_maasi_a_16
COCO annotations of Leptomedusa sp. A for Figure 5 (see paper).
COCO annotations of Leuckartiara brownei for the following 48 images:
MCMEC2018_20181129_NIKON_Leuckartiara_brownei_b_1 to MCMEC2018_20181129_NIKON_Leuckartiara_brownei_b_27, MCMEC2018_20181129_NIKON_Leuckartiara_brownei_c_1 to MCMEC2018_20181129_NIKON_Leuckartiara_brownei_c_6, and MCMEC2019_20191116_SONY_Leuckartiara_brownei_a_1 to MCMEC2019_20191116_SONY_Leuckartiara_brownei_a_15
COCO annotations of Mertensiidae sp. A for the following video (total of 1847 frames): MCMEC2019_20191115_SONY_Mertensiidae_sp_A_a_3 (https://youtu.be/0W2HHLW71Pw)
COCO annotations of Leuckartiara brownei for the following video (total of 1367 frames): MCMEC2019_20191116_SONY_Leuckartiara_brownei_a_3 (https://youtu.be/dEIbVYlF_TQ)
COCO annotations of Callianira cristata for the following video (total of 2423 frames): MCMEC2019_20191122_SONY_Callianira_cristata_a_1 (https://youtu.be/30g9CvYh5JE)
COCO annotations of Leptomedusa sp. B for the following video (total of 1164 frames): MCMEC2019_20191122_SONY_Leptomedusa_sp_B_a_1 (https://youtu.be/hrufuPQ7F8U)
COCO annotations of Koellikerina maasi for the following video (total of 1643 frames): MCMEC2019_20191126_SONY_Koellikerina_maasi_a_1 (https://youtu.be/QiBPf_HYrQ8)
COCO annotations of Mertensiidae sp. A for the following video (total of 239 frames): MCMEC2019_20191129_SONY_Mertensiidae_sp_A_b_1 (https://youtu.be/pvXYlQGZIVg)
COCO annotations of Pyrostephos vanhoeffeni for the following video (total of 444 frames): MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_2 (https://youtu.be/2rrQCybEg0Q)
COCO annotations of Pyrostephos vanhoeffeni for the following video (total of 683 frames): MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_3 (https://youtu.be/G9tev_gdUvQ)
COCO annotations of Pyrostephos vanhoeffeni for the following video (total of 1127 frames): MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_4 (https://youtu.be/NfJjKBRh5Hs)
COCO annotations of Beroe sp. A for the following video (total of 2171 frames): MCMEC2019_20191130_SONY_Beroe_sp_A_b_1 (https://youtu.be/kGBUQ7ZtH9U)
COCO annotations of Beroe sp. A for the following video (total of 359 frames): MCMEC2019_20191130_SONY_Beroe_sp_A_b_2 (https://youtu.be/Vbl_KEmPNmU)
COCO annotations of Mertensiidae sp. A for the following 49 images:
MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_c_1 to MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_c_2, MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_f_1 to MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_f_8, MCMEC2018_20181129_NIKON_Mertensiidae_sp_A_d_1 to MCMEC2018_20181129_NIKON_Mertensiidae_sp_A_d_13, MCMEC2018_20181201_ROV_Mertensiidae_sp_A_e_1 to MCMEC2018_20181201_ROV_Mertensiidae_sp_A_e_15, and MCMEC2019_20191115_SONY_Mertensiidae_sp_A_a_1 to MCMEC2019_20191115_SONY_Mertensiidae_sp_A_a_11
COCO annotations of Pyrostephos vanhoeffeni for the following 14 images: MCMEC2019_20191125_SONY_Pyrostephos_vanhoeffeni_a_1 to MCMEC2019_20191125_SONY_Pyrostephos_vanhoeffeni_a_8, MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_1 to MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_6
COCO annotations of Solmundella bitentaculata for the following 13 images: MCMEC2018_20181127_NIKON_Solmundella_bitentaculata_a_1 to MCMEC2018_20181127_NIKON_Solmundella_bitentaculata_a_13
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project combines the Dollar Bill Detection project from Alex Hyams (v13 of the project was exported in COCO JSON format for import to this project) and the Final Counter, or Coin Counter, project from Dawson Mcgee (v6 of the project was exported in COCO JSON format for import to this project).
v1 contains the original imported images, without augmentations. This is the version to download and import into your own project if you'd like to add your own augmentations.
This dataset can be used to create computer vision applications in the banking and finance industry for use cases like detecting and counting US currency.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SpeechCoco
Introduction
Our corpus is an extension of the MS COCO image recognition and captioning dataset. MS COCO comprises images paired with a set of five captions. Yet, it does not include any speech. Therefore, we used Voxygen's text-to-speech system to synthesise the available captions.
The addition of speech as a new modality enables MSCOCO to be used for research in fields such as language acquisition, unsupervised term discovery, keyword spotting, and semantic embedding using speech and vision.
Our corpus is licensed under a Creative Commons Attribution 4.0 License.
Data Set
This corpus contains 616,767 spoken captions from MSCOCO's val2014 and train2014 subsets (respectively 414,113 for train2014 and 202,654 for val2014).
We used 8 different voices. 4 of them have a British accent (Paul, Bronwen, Judith, and Elizabeth) and the 4 others have an American accent (Phil, Bruce, Amanda, Jenny).
In order to make the captions sound more natural, we used the SoX tempo command, enabling us to change the speed without changing the pitch. One third of the captions are 10% slower than the original pace, one third are 10% faster, and the last third was kept untouched.
We also modified approximately 30% of the original captions and added disfluencies such as "um", "uh", "er" so that the captions would sound more natural.
Each WAV file is paired with a JSON file containing various information: timecode of each word in the caption, name of the speaker, name of the WAV file, etc. The JSON files have the following data structure:
{ "duration": float, "speaker": string, "synthesisedCaption": string, "timecode": list, "speed": float, "wavFilename": string, "captionID": int, "imgID": int, "disfluency": list }
On average, each caption comprises 10.79 tokens, disfluencies included. The WAV files are on average 3.52 seconds long.
Repository
The repository is organized as follows:
CORPUS-MSCOCO (~75GB once decompressed)
train2014/ : folder contains 413,915 captions
json/
wav/
translations/
train_en_ja.txt
train_translate.sqlite3
train_2014.sqlite3
val2014/ : folder contains 202,520 captions
json/
wav/
translations/
train_en_ja.txt
train_translate.sqlite3
val_2014.sqlite3
speechcoco_API/
speechcoco/
__init__.py
speechcoco.py
setup.py
Filenames
.wav files contain the spoken version of a caption
.json files contain all the metadata of a given WAV file
.sqlite3 files are SQLite databases containing all the information contained in the JSON files
We adopted the following naming convention for both the WAV and JSON files:
imageID_captionID_Speaker_DisfluencyPosition_Speed[.wav/.json]
Script
We created a script called speechcoco.py in order to handle the metadata and allow the user to easily find captions according to specific filters. The script uses the *.sqlite3 database files.
Features:
Aggregate all the information in the JSON files into a single SQLite database
Find captions according to specific filters (name, gender and nationality of the speaker, disfluency position, speed, duration, and words in the caption). The script automatically builds the SQLite query. The user can also provide his own SQLite query.
The following Python code returns all the captions spoken by a male with an American accent for which the speed was slowed down by 10% and that contain "keys" at any position:
db = SpeechCoco('train_2014.sqlite3', 'train_translate.sqlite3', verbose=True)
captions = db.filterCaptions(gender="Male", nationality="US", speed=0.9, text='%keys%')
for caption in captions:
    print(' {}\t{}\t{}\t{}\t{}\t{}\t\t{}'.format(caption.imageID, caption.captionID, caption.speaker.name, caption.speaker.nationality, caption.speed, caption.filename, caption.text))
...
298817   26763    Phil   0.9   298817_26763_Phil_None_0-9.wav      A group of turkeys with bushes in the background.
108505   147972   Phil   0.9   108505_147972_Phil_Middle_0-9.wav   Person using a, um, slider cell phone with blue backlit keys.
258289   154380   Bruce  0.9   258289_154380_Bruce_None_0-9.wav    Some donkeys and sheep are in their green pens.
545312   201303   Phil   0.9   545312_201303_Phil_None_0-9.wav     A man walking next to a couple of donkeys.
...
Find all the captions belonging to a specific image
captions = db.getImgCaptions(298817)
for caption in captions:
    print(' {}'.format(caption.text))
Birds wondering through grassy ground next to bushes.
A flock of turkeys are making their way up a hill.
Um, ah. Two wild turkeys in a field walking around.
Four wild turkeys and some bushes trees and weeds.
A group of turkeys with bushes in the background.
Parse the timecodes and have them structured
input:
... [1926.3068, "SYL", ""], [1926.3068, "SEPR", " "], [1926.3068, "WORD", "white"], [1926.3068, "PHO", "w"], [2050.7955, "PHO", "ai"], [2144.6591, "PHO", "t"], [2179.3182, "SYL", ""], [2179.3182, "SEPR", " "] ...
output:
print(caption.timecode.parse())
...
{
    'begin': 1926.3068,
    'end': 2179.3182,
    'syllable': [{
        'begin': 1926.3068,
        'end': 2179.3182,
        'phoneme': [
            {'begin': 1926.3068, 'end': 2050.7955, 'value': 'w'},
            {'begin': 2050.7955, 'end': 2144.6591, 'value': 'ai'},
            {'begin': 2144.6591, 'end': 2179.3182, 'value': 't'}
        ],
        'value': 'wait'
    }],
    'value': 'white'
},
...
Convert the timecodes to Praat TextGrid files
caption.timecode.toTextgrid(outputDir, level=3)
Get the words, syllables and phonemes between n seconds/milliseconds
The following Python code returns all the words between 0.2 and 0.6 seconds for which at least 50% of the word's total length is within the specified interval:
pprint(caption.getWords(0.20, 0.60, seconds=True, level=1, olapthr=50))
...
404537  827239  Bruce  US  0.9  404537_827239_Bruce_None_0-9.wav  Eyeglasses, a cellphone, some keys and other pocket items are all laid out on the cloth.
[
    {'begin': 0.0, 'end': 0.7202778, 'overlapPercentage': 55.53412863758955, 'word': 'eyeglasses'}
]
...
Get the translations of the selected captions
As of now, only Japanese translations are available. We also used KyTea to tokenize and tag the captions translated with Google Translate:
captions = db.getImgCaptions(298817)
for caption in captions:
    print(' {}'.format(caption.text))

    # Get translations and POS
    print('\tja_google: {}'.format(db.getTranslation(caption.captionID, "ja_google")))
    print('\t\tja_google_tokens: {}'.format(db.getTokens(caption.captionID, "ja_google")))
    print('\t\tja_google_pos: {}'.format(db.getPOS(caption.captionID, "ja_google")))
    print('\tja_excite: {}'.format(db.getTranslation(caption.captionID, "ja_excite")))
Birds wondering through grassy ground next to bushes. ja_google: 鳥は茂みの下に茂った地面を抱えています。 ja_google_tokens: 鳥 は 茂み の 下 に 茂 っ た 地面 を 抱え て い ま す 。 ja_google_pos: 鳥/名詞/とり は/助詞/は 茂み/名詞/しげみ の/助詞/の 下/名詞/した に/助詞/に 茂/動詞/しげ っ/語尾/っ た/助動詞/た 地面/名詞/じめん を/助詞/を 抱え/動詞/かかえ て/助詞/て い/動詞/い ま/助動詞/ま す/語尾/す 。/補助記号/。 ja_excite: 低木と隣接した草深いグラウンドを通って疑う鳥。
A flock of turkeys are making their way up a hill. ja_google: 七面鳥の群れが丘を上っています。 ja_google_tokens: 七 面 鳥 の 群れ が 丘 を 上 っ て い ま す 。 ja_google_pos: 七/名詞/なな 面/名詞/めん 鳥/名詞/とり の/助詞/の 群れ/名詞/むれ が/助詞/が 丘/名詞/おか を/助詞/を 上/動詞/のぼ っ/語尾/っ て/助詞/て い/動詞/い ま/助動詞/ま す/語尾/す 。/補助記号/。 ja_excite: 七面鳥の群れは丘の上で進んでいる。
Um, ah. Two wild turkeys in a field walking around. ja_google: 野生のシチメンチョウ、野生の七面鳥 ja_google_tokens: 野生 の シチメンチョウ 、 野生 の 七 面 鳥 ja_google_pos: 野生/名詞/やせい の/助詞/の シチメンチョウ/名詞/しちめんちょう 、/補助記号/、 野生/名詞/やせい の/助詞/の 七/名詞/なな 面/名詞/めん 鳥/名詞/ちょう ja_excite: まわりで移動しているフィールドの2羽の野生の七面鳥
Four wild turkeys and some bushes trees and weeds. ja_google: 4本の野生のシチメンチョウといくつかの茂みの木と雑草 ja_google_tokens: 4 本 の 野生 の シチメンチョウ と いく つ か の 茂み の 木 と 雑草 ja_google_pos: 4/名詞/4 本/接尾辞/ほん の/助詞/の 野生/名詞/やせい の/助詞/の シチメンチョウ/名詞/しちめんちょう と/助詞/と
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This collection contains the trained models and object detection results of 2 architectures found in the Detectron2 library, on the MS COCO val2017 dataset, under different JPEG compression levels Q = {5, 12, 19, 26, 33, 40, 47, 54, 61, 68, 75, 82, 89, 96} (14 levels per trained model).
Architectures:
F50 – Faster R-CNN on ResNet-50 with FPN
R50 – RetinaNet on ResNet-50 with FPN
Training type:
D2 – Detectron2 Model Zoo pre-trained 1x model (90,000 iterations, batch 16)
STD – standard 1x training (90,000 iterations) on the original train2017 dataset
Q20 – 1x training (90,000 iterations) on the train2017 dataset degraded to Q=20
Q40 – 1x training (90,000 iterations) on the train2017 dataset degraded to Q=40
T20 – extra 1x training on top of D2 on the train2017 dataset degraded to Q=20
T40 – extra 1x training on top of D2 on the train2017 dataset degraded to Q=40
Model and metrics files:
models_FasterRCNN.tar.gz (F50-STD, F50-Q20, …)
models_RetinaNet.tar.gz (R50-STD, R50-Q20, …)
For every model there are 3 files:
config.yaml – the Detectron2 config of the model.
model_final.pth – the weights (training snapshot) in PyTorch format.
metrics.json – training metrics (like time, total loss, etc.) every 20 iterations.
The D2 models were not included, because they are available from the Detectron2 Model Zoo as faster_rcnn_R_50_FPN_1x (F50-D2) and retinanet_R_50_FPN_1x (R50-D2).
Result files:
F50-results.tar.gz – results for Faster R-CNN models (including D2).
R50-results.tar.gz – results for RetinaNet models (including D2).
For every model there are 14 subdirectories, e.g. evaluator_dump_R50x1_005 through evaluator_dump_R50x1_096, one for each of the JPEG Q values. Each such folder contains:
coco_instances_results.json – all detected objects (image id, bounding box, class index and confidence).
results.json – AP metrics as computed by the COCO API.
Source code for processing the data:
The data can be processed using our code, published at https://github.com/tgandor/urban_oculus. Additional dependencies for the source code: COCO API, Detectron2.
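As a minimal sketch (not part of the released code, paths assumed), the per-quality detection files can be re-scored against the MS COCO val2017 ground truth with the COCO API:

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO("annotations/instances_val2017.json")   # standard MS COCO val2017 ground truth (path assumed)
dt = gt.loadRes("evaluator_dump_R50x1_075/coco_instances_results.json")  # one Q level of one model
ev = COCOeval(gt, dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()                                    # prints the AP metrics also stored in results.json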
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used for our paper "WormSwin: Instance Segmentation of C. elegans using Vision Transformer".This publication is divided into three parts:
CSB-1 Dataset
Synthetic Images Dataset
MD Dataset
The CSB-1 Dataset consists of frames extracted from videos of Caenorhabditis elegans (C. elegans) annotated with binary masks. Each C. elegans is separately annotated, providing accurate annotations even for overlapping instances. All annotations are provided in binary mask format and as COCO Annotation JSON files (see COCO website).
The videos are named after the following pattern:
<"worm age in hours"_"mutation"_"irradiated (binary)"_"video index (zero based)">
For mutation the following values are possible:
wild type
csb-1 mutant
csb-1 with rescue mutation
An example video name would be 24_1_1_2, meaning it shows irradiated, 24 h old C. elegans carrying the csb-1 mutation.
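A hypothetical helper (not part of the dataset tooling) illustrating how the naming pattern can be parsed; the field meanings come from the pattern above, while the numeric coding of the mutation values is only known from the example:

def parse_video_name(name):
    # <worm age in hours>_<mutation>_<irradiated (binary)>_<video index (zero based)>
    age, mutation, irradiated, index = name.split("_")
    return {
        "age_hours": int(age),
        "mutation_code": mutation,        # "1" corresponds to the csb-1 mutant in the example above
        "irradiated": irradiated == "1",
        "video_index": int(index),
    }

print(parse_video_name("24_1_1_2"))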
Video data was provided by M. Rieckher; instance segmentation annotations were created under the supervision of K. Bozek and M. Deserno.
The Synthetic Images Dataset was created by cutting out C. elegans (foreground objects) from the CSB-1 Dataset and placing them randomly on background images also taken from the CSB-1 Dataset. Foreground objects were flipped, rotated and slightly blurred before being placed on the background images. The same was done with the binary mask annotations taken from the CSB-1 Dataset so that they match the foreground objects in the synthetic images. Additionally, we added rings of random color, size, thickness and position to the background images to simulate petri-dish edges. This synthetic dataset was generated by M. Deserno.
The Mating Dataset (MD) consists of 450 grayscale image patches of 1,012 x 1,012 px showing C. elegans with high overlap, crawling on a petri dish. We took the patches from a 10 min long video of size 3,036 x 3,036 px. The video was downsampled from 25 fps to 5 fps before selecting 50 random frames for annotating and patching. Like the other datasets, worms were annotated with binary masks and annotations are provided as COCO Annotation JSON files.
The video data was provided by X.-L. Chu; Instance Segmentation Annotations were created under supervision of K. Bozek and M. Deserno.
Further details about the datasets can be found in our paper.
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
This file contains the annotations for the ConfLab dataset, including actions (speaking status), pose, and F-formations.
------------------
./actions/speaking_status:
./processed: the processed speaking status files, aggregated into a single data frame per segment. Skipped rows in the raw data (see https://josedvq.github.io/covfee/docs/output for details) have been imputed using the code at: https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/speaking_status
The processed annotations consist of:
./speaking: The first row contains person IDs matching the sensor IDs; the remaining rows contain binary speaking status annotations at 60 fps for the corresponding 2 min video segment (7200 frames).
./confidence: Same as above. These annotations reflect the continuous-valued rating of confidence of the annotators in their speaking annotation.
To load these files with pandas: pd.read_csv(p, index_col=False)
./raw.zip: the raw outputs from speaking status annotation for each of the eight annotated 2-min video segments. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)
Annotations were done at 60 fps.
--------------------
./pose:
./coco: the processed pose files in coco JSON format, aggregated into a single data frame per video segment. These files have been generated from the raw files using the code at: https://github.com/TUDelft-SPC-Lab/conflab-keypoints
To load in Python: f = json.load(open('/path/to/cam2_vid3_seg1_coco.json'))
The skeleton structure (limbs) is contained within each file in:
f['categories'][0]['skeleton']
and keypoint names at:
f['categories'][0]['keypoints']
./raw.zip: the raw outputs from continuous pose annotation. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)
Annotations were done at 60 fps.
---------------------
./f_formations:
seg 2: 14:00 onwards, for videos of the form x2xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).
seg 3: for videos of the form x3xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).
Note that camera 10 doesn't include meaningful subject information/body parts that are not already covered in camera 8.
First column: time stamp
Second column: "()" delineates groups, "<>" delineates subjects, cam X indicates the best camera view for which a particular group exists.
phone.csv: time stamp (pertaining to seg3), corresponding group, ID of person using the phone
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Object Detection for Olfactory References (ODOR) Dataset
Real-world applications of computer vision in the humanities require algorithms to be robust against artistic abstraction, peripheral objects, and subtle differences between fine-grained target classes.
Existing datasets provide instance-level annotations on artworks but are generally biased towards the image centre and limited with regard to detailed object classes. The ODOR dataset fills this gap, offering 38,116 object-level annotations across 4,712 images, spanning an extensive set of 139 fine-grained categories.
It has challenging dataset properties, such as a detailed set of categories, dense and overlapping objects, and spatial distribution over the whole image canvas.
Inspiring further research on artwork object detection and broader visual cultural heritage studies, the dataset challenges researchers to explore the intersection of object recognition and smell perception.
How to use
The annotations are provided in COCO JSON format. To represent the two-level hierarchy of the object classes, we make use of the supercategory field in the categories array as defined by COCO. In addition to the object-level annotations, we provide an additional CSV file with image-level metadata, which includes content-related fields, such as Iconclass codes or image descriptions, as well as formal annotations, such as artist, license, or creation year.
In addition to a zip containing the dataset images, we provide links to their source collections in the metadata file and a Python script to conveniently download the artwork images (`download_imgs.py`).
The mapping between the `images` array of the `annotations.json` and the `metadata.csv` file can be accomplished via the `file_name` attribute of the elements of the `images` array and the unique `File Name` column of the `metadata.csv` file, respectively.
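A minimal sketch (assuming local file names matching the description above, not the authors' own tooling) of joining the COCO `images` array with the image-level metadata:

import json
import pandas as pd

with open("annotations.json") as f:
    coco = json.load(f)
metadata = pd.read_csv("metadata.csv")

# Join object-level image records with image-level metadata via the file name
images = pd.DataFrame(coco["images"])
merged = images.merge(metadata, left_on="file_name", right_on="File Name", how="left")
print(merged[["id", "file_name"]].head())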
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetically Spoken COCO
Version 1.0
This dataset contains synthetically generated spoken versions of MS COCO [1] captions. This
dataset was created as part of the research reported in [5].
The speech was generated using gTTS [2]. The dataset consists of the following files:
- dataset.json: Captions associated with MS COCO images. This information comes from [3].
- sentid.txt: List of caption IDs. This file can be used to locate MFCC features of the MP3 files
in the numpy array stored in dataset.mfcc.npy.
- mp3.tgz: MP3 files with the audio. Each file name corresponds to caption ID in dataset.json
and in sentid.txt.
- dataset.mfcc.npy: Numpy array with the Mel-Frequency Cepstral Coefficients extracted from the audio. Each row corresponds to a caption. The order of the captions corresponds to the ordering in the file sentid.txt. MFCCs were extracted using [4].
[1] http://mscoco.org/dataset/#overview
[2] https://pypi.python.org/pypi/gTTS
[3] https://github.com/karpathy/neuraltalk
[4] https://github.com/jameslyons/python_speech_features
[5] https://arxiv.org/abs/1702.01991
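A minimal sketch (using the file names listed above; not part of the original release) of how the MFCC rows map back to caption IDs:

import json
import numpy as np

mfcc = np.load("dataset.mfcc.npy")                       # one row of MFCC features per caption
sent_ids = [line.strip() for line in open("sentid.txt")]
with open("dataset.json") as f:
    dataset = json.load(f)                               # captions associated with MS COCO images

# Row i of the MFCC array corresponds to caption ID sent_ids[i],
# which also names the matching MP3 file in mp3.tgz.
print(sent_ids[0], mfcc[0].shape)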
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is Part 1/2 of the ActiveHuman dataset! Part 2 can be found here.
Dataset Description
ActiveHuman was generated using Unity's Perception package.
It consists of 175,428 RGB images and their semantic segmentation counterparts, taken in different environments and lighting conditions and at different camera distances and angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1m-4m) and 36 camera angles (0-360 degrees at 10-degree intervals).
The dataset does not include images at every single combination of available camera distances and angles, since for some values the camera would collide with another object or go outside the confines of an environment. As a result, some combinations of camera distances and angles do not exist in the dataset.
Alongside each image, 2D Bounding Box, 3D Bounding Box and Keypoint ground truth annotations are also generated via the use of Labelers and are stored as a JSON-based dataset. These Labelers are scripts that are responsible for capturing ground truth annotations for each captured image or frame. Keypoint annotations follow the COCO format defined by the COCO keypoint annotation template offered in the perception package.
Folder configuration
The dataset consists of 3 folders:
Essential Terminology
Dataset Data
The dataset includes 4 types of JSON annotation files:
Most Labelers generate different annotation specifications in the spec key-value pair:
Each Labeler generates different annotation specifications in the values key-value pair:
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The original Udacity Self Driving Car Dataset is missing labels for thousands of pedestrians, bikers, cars, and traffic lights. This will result in poor model performance. When used in the context of self driving cars, this could even lead to human fatalities.
We re-labeled the dataset to correct errors and omissions. We have provided convenient downloads in many formats including VOC XML, COCO JSON, Tensorflow Object Detection TFRecords, and more.
Some examples of labels missing from the original dataset:
![Examples of Missing Labels](https://i.imgur.com/A5J3qSt.jpg)
The dataset contains 97,942 labels across 11 classes and 15,000 images. There are 1,720 null examples (images with no labels).
All images are 1920x1200 (download size ~3.1 GB). We have also provided a version downsampled to 512x512 (download size ~580 MB) that is suitable for most common machine learning models (including YOLO v3, Mask R-CNN, SSD, and mobilenet).
Annotations have been hand-checked for accuracy by Roboflow.
![Class Balance](https://i.imgur.com/bOFkueI.png)
Annotation Distribution:
![Annotation Heatmap](https://i.imgur.com/NwcrQKK.png)
Udacity is building an open source self driving car! You might also try using this dataset to do person-detection and tracking.
Our updates to the dataset are released under the MIT License (the same license as the original annotations and images).
Note: the dataset contains many duplicated bounding boxes for the same subject, which we have not corrected. You will probably want to filter them out by checking the IoU of same-class boxes that are nearly 100% overlapping, or they could affect your model performance (especially in stoplight detection, which seems to suffer from an especially severe case of duplicated bounding boxes).
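A minimal sketch (not Roboflow's code) of the kind of IoU-based filtering described above; boxes are assumed to be [x1, y1, x2, y2] in pixels:

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def drop_duplicate_boxes(labels, thr=0.99):
    # labels: list of (class_name, box); keeps the first of each near-identical same-class pair
    kept = []
    for cls, box in labels:
        if not any(c == cls and iou(box, b) >= thr for c, b in kept):
            kept.append((cls, box))
    return kept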
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TexBiG (from the German Text-Bild-Gefüge, meaning Text-Image-Structure) is a document layout analysis dataset for historical documents of the late 19th and early 20th century. The dataset provides instance segmentation annotations (bounding boxes and polygons/masks) for 19 different classes with more than 52,000 instances. Annotations were created manually by experts and evaluated with Krippendorff's Alpha; for each document image, at least two different annotators have labeled the document. The dataset uses the common COCO-JSON format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pascal VOC 2012 is a common benchmark for object detection. It contains common objects that one might find in images on the web.
![Image example](https://i.imgur.com/y2sB9fD.png)
Note: the test set is withheld, as is common with benchmark datasets.
You can think of it sort of like a baby COCO.