Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Yolo To Coco Json is a dataset for object detection tasks - it contains Objects annotations for 1,954 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
This dataset was created by Ari
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Databases in MS COCO (json) format
This dataset was created by GREAT23U5
This dataset was created by Ajifoster3
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The IMPTOX project has received funding from the EU's H2020 framework programme for research and innovation under grant agreement no. 965173. IMPTOX is part of the European MNP cluster on human health.
More information about the project here.
Description: This repository includes the trained weights and a custom COCO-formatted dataset used for developing and testing a Faster R-CNN R_50_FPN_3x object detector, specifically designed to identify particles in micro-FTIR filter images.
Contents:
Weights File (neuralNetWeights_V3.pth):
Format: .pth
Description: This file contains the trained weights for a Faster R-CNN model with a ResNet-50 backbone and a Feature Pyramid Network (FPN), trained for 3x schedule. These weights are specifically tuned for detecting particles in micro-FTIR filter images.
Custom COCO Dataset (uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip):
Format: .zip
Description: This zip archive contains a custom COCO-formatted dataset, including JPEG images and their corresponding annotation file. The dataset consists of images of micro-FTIR filters with annotated particles.
Contents:
Images: JPEG format images of micro-FTIR filters.
Annotations: A JSON file in COCO format providing detailed annotations of the particles in the images.
Management: The dataset can be managed and manipulated using the Pycocotools library, facilitating easy integration with existing COCO tools and workflows.
Applications: The provided weights and dataset are intended for researchers and practitioners in the field of microscopy and particle detection. The dataset and model can be used for further training, validation, and fine-tuning of object detection models in similar domains.
Usage Notes:
The neuralNetWeights_V3.pth file should be loaded into a PyTorch implementation of the Faster R-CNN architecture, for example via Detectron2 (see the loading sketch below).
The contents of uFTIR_curated_square.v5-uftir_curated_square_2024-03-14.coco-segmentation.zip should be extracted and can be used with any COCO-compatible object detection framework for training and evaluation purposes.
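As a minimal sketch (not the authors' code), the weights could be loaded for inference with Detectron2 roughly as follows; the single particle class and the score threshold are assumptions:

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1            # assumption: a single "particle" category
cfg.MODEL.WEIGHTS = "neuralNetWeights_V3.pth"  # weights file provided in this repository
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5    # assumption: detection confidence threshold
predictor = DefaultPredictor(cfg)

outputs = predictor(cv2.imread("example_filter_image.jpg"))  # hypothetical micro-FTIR filter image
print(outputs["instances"].pred_boxes)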
Code can be found in the related GitHub repository.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Esefjorden Marine Vegetation Segmentation Dataset (EMVSD): Comprising 17,000 meticulously labeled images, this dataset is suited for instance segmentation tasks and represents a significant leap forward for marine research in the region. The images are stored in YOLO and COCO formats, ensuring compatibility with widely recognized and adopted object detection frameworks. Our decision to make this dataset publicly accessible underscores our commitment to collaborative research and the advancement of the broader scientific community.
Dataset Structure:
- Images:
  - Organized into three subsets: train, val, and test, located under the images/ directory.
  - Each subset contains high-resolution images optimized for object detection and segmentation tasks.
- Annotations:
  - Available in YOLO txt and COCO formats for compatibility with major object detection frameworks.
  - Organized into three subsets: train, val, and test, located under the labels/ directory.
  - Additional metadata:
    - counts.txt: Summary of label distributions.
    - Cache files (train.cache, val.cache, test.cache) for efficient dataset loading.
- Metadata:
  - classes.txt: Definitions for all annotated classes in the dataset.
  - Detailed COCO-format annotations in:
    - train_annotations.json
    - val_annotations.json
    - test_annotations.json
- Configuration File:
  - EMVSD.yaml: Configuration file for seamless integration with machine learning libraries.
Example Directory Structure:
EMVSD/
├── images/
│   ├── train/
│   ├── val/
│   └── test/
├── labels/
│   ├── train/
│   ├── val/
│   ├── test/
│   ├── counts.txt
│   ├── train.cache
│   ├── val.cache
│   └── test.cache
├── classes.txt
├── train_annotations.json
├── val_annotations.json
├── test_annotations.json
└── EMVSD.yaml
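As a minimal sketch (not the dataset authors' own code), and assuming EMVSD.yaml follows the Ultralytics data-configuration convention implied by the directory layout above, a segmentation model could be trained roughly as follows:

from ultralytics import YOLO

# Assumed: EMVSD.yaml points at images/{train,val,test} and lists the classes from classes.txt
model = YOLO("yolov8n-seg.pt")                 # any YOLO segmentation checkpoint
model.train(data="EMVSD.yaml", epochs=100, imgsz=640)
metrics = model.val(split="test")              # evaluate on the test subset
print(metrics)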
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset
The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above the mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories --- person, boat, lifebuoy, surfboard, wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.
In this repository, we provide:
66 Full HD video clips (total size: 5.5 GB)
126,170 images extracted from the videos at a rate of 30 FPS (total size: 243 GB)
3 annotation files for the extracted images that follow the MS COCO data format (for more info see https://cocodataset.org/#format-data):
annotations_5_custom_classes.json: this file contains annotations concerning all five categories; please note that class ids do not correspond with the ones provided by the MS COCO standard since we account for two new classes not previously considered in the MS COCO dataset --- lifebuoy and wood
annotations_3_coco_classes.json: this file contains annotations concerning the three classes also accounted for in the MS COCO dataset --- person, boat, surfboard. Class ids correspond with the ones provided by the MS COCO standard.
annotations_person_coco_classes.json: this file contains annotations concerning only the 'person' class. Class id corresponds to the one provided by the MS COCO standard.
The MOBDrone dataset is intended as a test data benchmark. However, for researchers interested in using our data also for training purposes, we provide training and test splits.
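As a minimal sketch (assumed local paths, not part of the official code), the COCO-format annotation files can be browsed with pycocotools:

from pycocotools.coco import COCO

coco = COCO("annotations_5_custom_classes.json")        # one of the three annotation files above
person_cat_ids = coco.getCatIds(catNms=["person"])
person_img_ids = coco.getImgIds(catIds=person_cat_ids)  # images containing people in the water
ann_ids = coco.getAnnIds(imgIds=person_img_ids[:1], catIds=person_cat_ids)
anns = coco.loadAnns(ann_ids)
print(len(person_img_ids), "images contain 'person' boxes; first box:", anns[0]["bbox"])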
More details about data generation and the evaluation protocol can be found at our MOBDrone paper: https://arxiv.org/abs/2203.07973
The code to reproduce our results is available at this GitHub Repository: https://github.com/ciampluca/MOBDrone_eval
See also http://aimh.isti.cnr.it/dataset/MOBDrone
Citing the MOBDrone
The MOBDrone is released under a Creative Commons Attribution license, so please cite the MOBDrone if it is used in your work in any form.
Published academic papers should use the academic paper citation for our MOBDrone paper, in which we evaluated several pre-trained state-of-the-art object detectors, focusing on the detection of overboard people:
@inproceedings{MOBDrone2021,
  title={MOBDrone: a Drone Video Dataset for Man OverBoard Rescue},
  author={Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
  booktitle={ICIAP2021: 21th International Conference on Image Analysis and Processing},
  year={2021}
}
and this Zenodo Dataset
@dataset{donato_cafarelli_2022_5996890,
  author={Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
  title={{MOBDrone: a large-scale drone-view dataset for man overboard detection}},
  month=feb,
  year=2022,
  publisher={Zenodo},
  version={1.0.0},
  doi={10.5281/zenodo.5996890},
  url={https://doi.org/10.5281/zenodo.5996890}
}
Personal works, such as machine learning projects/blog posts, should provide a URL to the MOBDrone Zenodo page (https://doi.org/10.5281/zenodo.5996890), though a reference to our MOBDrone paper would also be appreciated.
Contact Information
If you would like further information about the MOBDrone or if you experience any issues downloading files, please contact us at mobdrone[at]isti.cnr.it
Acknowledgements
This work was partially supported by NAUSICAA - "NAUtical Safety by means of Integrated Computer-Assistance Appliances 4.0" project funded by the Tuscany region (CUP D44E20003410009). The data collection was carried out with the collaboration of the Fly&Sense Service of the CNR of Pisa - for the flight operations of remotely piloted aerial systems - and of the Institute of Clinical Physiology (IFC) of the CNR - for the water immersion operations.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This Zenodo dataset contains the Common Objects in Context (COCO) files linked to the following publication:
Verhaegen, G., Cimoli, E., & Lindsay, D. (2021). Life beneath the ice: jellyfish and ctenophores from the Ross Sea, Antarctica, with an image-based training set for machine learning. Biodiversity Data Journal.
Each COCO zip folder contains an "annotations" folder including a json file and an "images" folder containing the annotated images.
Details on each COCO zip folder:
COCO annotations of Beroe sp. A for the following 114 images:
MCMEC2018_20181116_NIKON_Beroe_sp_A_c_1 to MCMEC2018_20181116_NIKON_Beroe_sp_A_c_16, MCMEC2018_20181125_NIKON_Beroe_sp_A_d_1 to MCMEC2018_20181125_NIKON_Beroe_sp_A_d_57, MCMEC2018_20181127_NIKON_Beroe_sp_A_e_1 to MCMEC2018_20181127_NIKON_Beroe_sp_A_e_2, MCMEC2019_20191116_SONY_Beroe_sp_A_a_1 to MCMEC2019_20191116_SONY_Beroe_sp_A_a_28, and MCMEC2019_20191127_SONY_Beroe_sp_A_f_1 to MCMEC2019_20191127_SONY_Beroe_sp_A_f_12
COCO annotations of Beroe sp. B for the following 2 images:
MCMEC2019_20191115_SONY_Beroe_sp_B_a_1 and MCMEC2019_20191115_SONY_Beroe_sp_B_a_2
COCO annotations of Callianira cristata for the following 21 images:
MCMEC2019_20191120_SONY_Callianira_cristata_b_1 to MCMEC2019_20191120_SONY_Callianira_cristata_b_21
COCO annotations of Diplulmaris antarctica for the following 83 images:
MCMEC2019_20191116_SONY_Diplulmaris_antarctica_a_1 to MCMEC2019_20191116_SONY_Diplulmaris_antarctica_a_9, and MCMEC2019_20191201_SONY_Diplulmaris_antarctica_c_1 to MCMEC2019_20191201_SONY_Diplulmaris_antarctica_c_74
COCO annotations of Koellikerina maasi for the following 49 images:
MCMEC2018_20181127_NIKON_Koellikerina_maasi_b_1 to MCMEC2018_20181127_NIKON_Koellikerina_maasi_b_4, MCMEC2018_20181129_NIKON_Koellikerina_maasi_c_1 to MCMEC2018_20181129_NIKON_Koellikerina_maasi_c_29, and MCMEC2019_20191126_SONY_Koellikerina_maasi_a_1 to MCMEC2019_20191126_SONY_Koellikerina_maasi_a_16
COCO annotations of Leptomedusa sp. A for Figure 5 (see paper).
COCO annotations of Leuckartiara brownei for the following 48 images:
MCMEC2018_20181129_NIKON_Leuckartiara_brownei_b_1 to MCMEC2018_20181129_NIKON_Leuckartiara_brownei_b_27, MCMEC2018_20181129_NIKON_Leuckartiara_brownei_c_1 to MCMEC2018_20181129_NIKON_Leuckartiara_brownei_c_6, and MCMEC2019_20191116_SONY_Leuckartiara_brownei_a_1 to MCMEC2019_20191116_SONY_Leuckartiara_brownei_a_15
COCO annotations of Mertensiidae sp. A for the following video (total of 1847 frames): MCMEC2019_20191115_SONY_Mertensiidae_sp_A_a_3 (https://youtu.be/0W2HHLW71Pw)
COCO annotations of Leuckartiara brownei for the following video (total of 1367 frames): MCMEC2019_20191116_SONY_Leuckartiara_brownei_a_3 (https://youtu.be/dEIbVYlF_TQ)
COCO annotations of Callianira cristata for the following video (total of 2423 frames): MCMEC2019_20191122_SONY_Callianira_cristata_a_1 (https://youtu.be/30g9CvYh5JE)
COCO annotations of Leptomedusa sp. B for the following video (total of 1164 frames): MCMEC2019_20191122_SONY_Leptomedusa_sp_B_a_1 (https://youtu.be/hrufuPQ7F8U)
COCO annotations of Koellikerina maasi for the following video (total of 1643 frames): MCMEC2019_20191126_SONY_Koellikerina_maasi_a_1 (https://youtu.be/QiBPf_HYrQ8)
COCO annotations of Mertensiidae sp. A for the following video (total of 239 frames): MCMEC2019_20191129_SONY_Mertensiidae_sp_A_b_1 (https://youtu.be/pvXYlQGZIVg)
COCO annotations of Pyrostephos vanhoeffeni for the following video (total of 444 frames): MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_2 (https://youtu.be/2rrQCybEg0Q)
COCO annotations of Pyrostephos vanhoeffeni for the following video (total of 683 frames): MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_3 (https://youtu.be/G9tev_gdUvQ)
COCO annotations of Pyrostephos vanhoeffeni for the following video (total of 1127 frames): MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_4 (https://youtu.be/NfJjKBRh5Hs)
COCO annotations of Beroe sp. A for the following video (total of 2171 frames): MCMEC2019_20191130_SONY_Beroe_sp_A_b_1 (https://youtu.be/kGBUQ7ZtH9U)
COCO annotations of Beroe sp. A for the following video (total of 359 frames): MCMEC2019_20191130_SONY_Beroe_sp_A_b_2 (https://youtu.be/Vbl_KEmPNmU)
COCO annotations of Mertensiidae sp. A for the following 49 images:
MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_c_1 to MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_c_2, MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_f_1 to MCMEC2018_20181127_NIKON_Mertensiidae_sp_A_f_8, MCMEC2018_20181129_NIKON_Mertensiidae_sp_A_d_1 to MCMEC2018_20181129_NIKON_Mertensiidae_sp_A_d_13, MCMEC2018_20181201_ROV_Mertensiidae_sp_A_e_1 to MCMEC2018_20181201_ROV_Mertensiidae_sp_A_e_15, and MCMEC2019_20191115_SONY_Mertensiidae_sp_A_a_1 to MCMEC2019_20191115_SONY_Mertensiidae_sp_A_a_11
COCO annotations of Pyrostephos vanhoeffeni for the following 14 images: MCMEC2019_20191125_SONY_Pyrostephos_vanhoeffeni_a_1 to MCMEC2019_20191125_SONY_Pyrostephos_vanhoeffeni_a_8, MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_1 to MCMEC2019_20191129_SONY_Pyrostephos_vanhoeffeni_b_6
COCO annotations of Solmundella bitentaculata for the following 13 images: MCMEC2018_20181127_NIKON_Solmundella_bitentaculata_a_1 to MCMEC2018_20181127_NIKON_Solmundella_bitentaculata_a_13
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project combines the Dollar Bill Detection project from Alex Hyams (v13 of the project was exported in COCO JSON format for import to this project) and the Final Counter, or Coin Counter, project from Dawson Mcgee (v6 of the project was exported in COCO JSON format for import to this project).
v1 contains the original imported images, without augmentations. This is the version to download and import into your own project if you'd like to add your own augmentations.
This dataset can be used to create computer vision applications in the banking and finance industry for use cases like detecting and counting US currency.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SpeechCoco
Introduction
Our corpus is an extension of the MS COCO image recognition and captioning dataset. MS COCO comprises images paired with a set of five captions. Yet, it does not include any speech. Therefore, we used Voxygen's text-to-speech system to synthesise the available captions.
The addition of speech as a new modality enables MSCOCO to be used for research in fields such as language acquisition, unsupervised term discovery, keyword spotting, and semantic embedding using speech and vision.
Our corpus is licensed under a Creative Commons Attribution 4.0 License.
Data Set
This corpus contains 616,767 spoken captions from MSCOCO's val2014 and train2014 subsets (respectively 414,113 for train2014 and 202,654 for val2014).
We used 8 different voices. 4 of them have a British accent (Paul, Bronwen, Judith, and Elizabeth) and the 4 others have an American accent (Phil, Bruce, Amanda, Jenny).
In order to make the captions sound more natural, we used the SoX tempo command, enabling us to change the speed without changing the pitch. One third of the captions are 10% slower than the original pace, one third are 10% faster, and the last third was kept untouched.
We also modified approximately 30% of the original captions and added disfluencies such as "um", "uh", "er" so that the captions would sound more natural.
Each WAV file is paired with a JSON file containing various information: timecode of each word in the caption, name of the speaker, name of the WAV file, etc. The JSON files have the following data structure:
{ "duration": float, "speaker": string, "synthesisedCaption": string, "timecode": list, "speed": float, "wavFilename": string, "captionID": int, "imgID": int, "disfluency": list }
On average, each caption comprises 10.79 tokens, disfluencies included. The WAV files are on average 3.52 seconds long.
Repository
The repository is organized as follows:
CORPUS-MSCOCO (~75GB once decompressed)
train2014/ : folder contains 413,915 captions
json/
wav/
translations/
train_en_ja.txt
train_translate.sqlite3
train_2014.sqlite3
val2014/ : folder contains 202,520 captions
json/
wav/
translations/
train_en_ja.txt
train_translate.sqlite3
val_2014.sqlite3
speechcoco_API/
speechcoco/
__init__.py
speechcoco.py
setup.py
Filenames
.wav files contain the spoken version of a caption
.json files contain all the metadata of a given WAV file
.sqlite3 files are SQLite databases containing all the information contained in the JSON files
We adopted the following naming convention for both the WAV and JSON files:
imageID_captionID_Speaker_DisfluencyPosition_Speed[.wav/.json]
Script
We created a script called speechcoco.py in order to handle the metadata and allow the user to easily find captions according to specific filters. The script uses the *.sqlite3 database files.
Features:
Aggregate all the information in the JSON files into a single SQLite database
Find captions according to specific filters (name, gender and nationality of the speaker, disfluency position, speed, duration, and words in the caption). The script automatically builds the SQLite query. The user can also provide his own SQLite query.
The following Python code returns all the captions spoken by a male with an American accent for which the speed was slowed down by 10% and that contain "keys" at any position:
db = SpeechCoco('train_2014.sqlite3', 'train_translate.sqlite3', verbose=True)
captions = db.filterCaptions(gender="Male", nationality="US", speed=0.9, text='%keys%')
for caption in captions:
    print(' {}\t{}\t{}\t{}\t{}\t{}\t\t{}'.format(caption.imageID, caption.captionID, caption.speaker.name, caption.speaker.nationality, caption.speed, caption.filename, caption.text))
...
298817   26763    Phil   0.9   298817_26763_Phil_None_0-9.wav      A group of turkeys with bushes in the background.
108505   147972   Phil   0.9   108505_147972_Phil_Middle_0-9.wav   Person using a, um, slider cell phone with blue backlit keys.
258289   154380   Bruce  0.9   258289_154380_Bruce_None_0-9.wav    Some donkeys and sheep are in their green pens.
545312   201303   Phil   0.9   545312_201303_Phil_None_0-9.wav     A man walking next to a couple of donkeys.
...
Find all the captions belonging to a specific image
captions = db.getImgCaptions(298817)
for caption in captions:
    print(' {}'.format(caption.text))
Birds wondering through grassy ground next to bushes.
A flock of turkeys are making their way up a hill.
Um, ah. Two wild turkeys in a field walking around.
Four wild turkeys and some bushes trees and weeds.
A group of turkeys with bushes in the background.
Parse the timecodes and have them structured
input:
... [1926.3068, "SYL", ""], [1926.3068, "SEPR", " "], [1926.3068, "WORD", "white"], [1926.3068, "PHO", "w"], [2050.7955, "PHO", "ai"], [2144.6591, "PHO", "t"], [2179.3182, "SYL", ""], [2179.3182, "SEPR", " "] ...
output:
print(caption.timecode.parse())
...
{
    'begin': 1926.3068,
    'end': 2179.3182,
    'syllable': [{
        'begin': 1926.3068,
        'end': 2179.3182,
        'phoneme': [
            {'begin': 1926.3068, 'end': 2050.7955, 'value': 'w'},
            {'begin': 2050.7955, 'end': 2144.6591, 'value': 'ai'},
            {'begin': 2144.6591, 'end': 2179.3182, 'value': 't'}
        ],
        'value': 'wait'
    }],
    'value': 'white'
},
...
Convert the timecodes to Praat TextGrid files
caption.timecode.toTextgrid(outputDir, level=3)
Get the words, syllables and phonemes between n seconds/milliseconds
The following Python code returns all the words between 0.2 and 0.6 seconds for which at least 50% of the word's total length is within the specified interval:
pprint(caption.getWords(0.20, 0.60, seconds=True, level=1, olapthr=50))
...
404537  827239  Bruce  US  0.9  404537_827239_Bruce_None_0-9.wav  Eyeglasses, a cellphone, some keys and other pocket items are all laid out on the cloth.
[
    {'begin': 0.0, 'end': 0.7202778, 'overlapPercentage': 55.53412863758955, 'word': 'eyeglasses'}
]
...
Get the translations of the selected captions
As of now, only Japanese translations are available. We also used KyTea to tokenize and tag the captions translated with Google Translate:
captions = db.getImgCaptions(298817)
for caption in captions:
    print(' {}'.format(caption.text))

    # Get translations and POS
    print('\tja_google: {}'.format(db.getTranslation(caption.captionID, "ja_google")))
    print('\t\tja_google_tokens: {}'.format(db.getTokens(caption.captionID, "ja_google")))
    print('\t\tja_google_pos: {}'.format(db.getPOS(caption.captionID, "ja_google")))
    print('\tja_excite: {}'.format(db.getTranslation(caption.captionID, "ja_excite")))
Birds wondering through grassy ground next to bushes. ja_google: 鳥は茂みの下に茂った地面を抱えています。 ja_google_tokens: 鳥 は 茂み の 下 に 茂 っ た 地面 を 抱え て い ま す 。 ja_google_pos: 鳥/名詞/とり は/助詞/は 茂み/名詞/しげみ の/助詞/の 下/名詞/した に/助詞/に 茂/動詞/しげ っ/語尾/っ た/助動詞/た 地面/名詞/じめん を/助詞/を 抱え/動詞/かかえ て/助詞/て い/動詞/い ま/助動詞/ま す/語尾/す 。/補助記号/。 ja_excite: 低木と隣接した草深いグラウンドを通って疑う鳥。
A flock of turkeys are making their way up a hill. ja_google: 七面鳥の群れが丘を上っています。 ja_google_tokens: 七 面 鳥 の 群れ が 丘 を 上 っ て い ま す 。 ja_google_pos: 七/名詞/なな 面/名詞/めん 鳥/名詞/とり の/助詞/の 群れ/名詞/むれ が/助詞/が 丘/名詞/おか を/助詞/を 上/動詞/のぼ っ/語尾/っ て/助詞/て い/動詞/い ま/助動詞/ま す/語尾/す 。/補助記号/。 ja_excite: 七面鳥の群れは丘の上で進んでいる。
Um, ah. Two wild turkeys in a field walking around. ja_google: 野生のシチメンチョウ、野生の七面鳥 ja_google_tokens: 野生 の シチメンチョウ 、 野生 の 七 面 鳥 ja_google_pos: 野生/名詞/やせい の/助詞/の シチメンチョウ/名詞/しちめんちょう 、/補助記号/、 野生/名詞/やせい の/助詞/の 七/名詞/なな 面/名詞/めん 鳥/名詞/ちょう ja_excite: まわりで移動しているフィールドの2羽の野生の七面鳥
Four wild turkeys and some bushes trees and weeds. ja_google: 4本の野生のシチメンチョウといくつかの茂みの木と雑草 ja_google_tokens: 4 本 の 野生 の シチメンチョウ と いく つ か の 茂み の 木 と 雑草 ja_google_pos: 4/名詞/4 本/接尾辞/ほん の/助詞/の 野生/名詞/やせい の/助詞/の シチメンチョウ/名詞/しちめんちょう と/助詞/と
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This collection contains the trained models and object detection results of 2 architectures found in the Detectron2 library, on the MS COCO val2017 dataset, under different JPEG compression levels Q = {5, 12, 19, 26, 33, 40, 47, 54, 61, 68, 75, 82, 89, 96} (14 levels per trained model).
Architectures:
F50 – Faster R-CNN on ResNet-50 with FPN
R50 – RetinaNet on ResNet-50 with FPN
Training type:
D2 – Detectron2 Model Zoo pre-trained 1x model (90,000 iterations, batch 16)
STD – standard 1x training (90,000 iterations) on the original train2017 dataset
Q20 – 1x training (90,000 iterations) on the train2017 dataset degraded to Q=20
Q40 – 1x training (90,000 iterations) on the train2017 dataset degraded to Q=40
T20 – extra 1x training on top of D2 on the train2017 dataset degraded to Q=20
T40 – extra 1x training on top of D2 on the train2017 dataset degraded to Q=40
Model and metrics files:
models_FasterRCNN.tar.gz (F50-STD, F50-Q20, …)
models_RetinaNet.tar.gz (R50-STD, R50-Q20, …)
For every model there are 3 files:
config.yaml – the Detectron2 config of the model.
model_final.pth – the weights (training snapshot) in PyTorch format.
metrics.json – training metrics (like time, total loss, etc.) every 20 iterations.
The D2 models were not included, because they are available from the Detectron2 Model Zoo as faster_rcnn_R_50_FPN_1x (F50-D2) and retinanet_R_50_FPN_1x (R50-D2).
Result files:
F50-results.tar.gz – results for Faster R-CNN models (including D2).
R50-results.tar.gz – results for RetinaNet models (including D2).
For every model there are 14 subdirectories, e.g. evaluator_dump_R50x1_005 through evaluator_dump_R50x1_096, one for each of the JPEG Q values. Each such folder contains:
coco_instances_results.json – all detected objects (image id, bounding box, class index and confidence).
results.json – AP metrics as computed by the COCO API.
Source code for processing the data:
The data can be processed using our code, published at https://github.com/tgandor/urban_oculus. Additional dependencies for the source code: COCO API, Detectron2.
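As a minimal sketch (not part of the released code, paths assumed), the per-quality detection files can be re-scored against the MS COCO val2017 ground truth with the COCO API:

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO("annotations/instances_val2017.json")   # standard MS COCO val2017 ground truth (path assumed)
dt = gt.loadRes("evaluator_dump_R50x1_075/coco_instances_results.json")  # one Q level of one model
ev = COCOeval(gt, dt, iouType="bbox")
ev.evaluate()
ev.accumulate()
ev.summarize()                                    # prints the AP metrics also stored in results.json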
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data used for our paper "WormSwin: Instance Segmentation of C. elegans using Vision Transformer".This publication is divided into three parts:
CSB-1 Dataset
Synthetic Images Dataset
MD Dataset
The CSB-1 Dataset consists of frames extracted from videos of Caenorhabditis elegans (C. elegans) annotated with binary masks. Each C. elegans is separately annotated, providing accurate annotations even for overlapping instances. All annotations are provided in binary mask format and as COCO Annotation JSON files (see COCO website).
The videos are named after the following pattern:
<"worm age in hours"_"mutation"_"irradiated (binary)"_"video index (zero based)">
For mutation the following values are possible:
wild type
csb-1 mutant
csb-1 with rescue mutation
An example video name would be 24_1_1_2, meaning it shows irradiated, 24 h old C. elegans carrying the csb-1 mutation.
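A hypothetical helper (not part of the dataset tooling) illustrating how the naming pattern can be parsed; the field meanings come from the pattern above, while the numeric coding of the mutation values is only known from the example:

def parse_video_name(name):
    # <worm age in hours>_<mutation>_<irradiated (binary)>_<video index (zero based)>
    age, mutation, irradiated, index = name.split("_")
    return {
        "age_hours": int(age),
        "mutation_code": mutation,        # "1" corresponds to the csb-1 mutant in the example above
        "irradiated": irradiated == "1",
        "video_index": int(index),
    }

print(parse_video_name("24_1_1_2"))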
Video data was provided by M. Rieckher; instance segmentation annotations were created under the supervision of K. Bozek and M. Deserno.
The Synthetic Images Dataset was created by cutting out C. elegans (foreground objects) from the CSB-1 Dataset and placing them randomly on background images also taken from the CSB-1 Dataset. Foreground objects were flipped, rotated and slightly blurred before being placed on the background images. The same was done with the binary mask annotations taken from the CSB-1 Dataset so that they match the foreground objects in the synthetic images. Additionally, we added rings of random color, size, thickness and position to the background images to simulate petri-dish edges. This synthetic dataset was generated by M. Deserno.
The Mating Dataset (MD) consists of 450 grayscale image patches of 1,012 x 1,012 px showing C. elegans with high overlap, crawling on a petri dish. We took the patches from a 10 min long video of size 3,036 x 3,036 px. The video was downsampled from 25 fps to 5 fps before selecting 50 random frames for annotating and patching. Like the other datasets, worms were annotated with binary masks and annotations are provided as COCO Annotation JSON files.
The video data was provided by X.-L. Chu; Instance Segmentation Annotations were created under supervision of K. Bozek and M. Deserno.
Further details about the datasets can be found in our paper.
https://data.4tu.nl/info/fileadmin/user_upload/Documenten/4TU.ResearchData_Restricted_Data_2022.pdf
This file contains the annotations for the ConfLab dataset, including actions (speaking status), pose, and F-formations.
------------------
./actions/speaking_status:
./processed: the processed speaking status files, aggregated into a single data frame per segment. Skipped rows in the raw data (see https://josedvq.github.io/covfee/docs/output for details) have been imputed using the code at: https://github.com/TUDelft-SPC-Lab/conflab/tree/master/preprocessing/speaking_status
The processed annotations consist of:
./speaking: The first row contains person IDs matching the sensor IDs; the remaining rows contain binary speaking status annotations at 60 fps for the corresponding 2 min video segment (7200 frames).
./confidence: Same as above. These annotations reflect the continuous-valued rating of confidence of the annotators in their speaking annotation.
To load these files with pandas: pd.read_csv(p, index_col=False)
./raw.zip: the raw outputs from speaking status annotation for each of the eight annotated 2-min video segments. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)
Annotations were done at 60 fps.
--------------------
./pose:
./coco: the processed pose files in coco JSON format, aggregated into a single data frame per video segment. These files have been generated from the raw files using the code at: https://github.com/TUDelft-SPC-Lab/conflab-keypoints
To load in Python: f = json.load(open('/path/to/cam2_vid3_seg1_coco.json'))
The skeleton structure (limbs) is contained within each file in:
f['categories'][0]['skeleton']
and keypoint names at:
f['categories'][0]['keypoints']
./raw.zip: the raw outputs from continuous pose annotation. These were output by the covfee annotation tool (https://github.com/josedvq/covfee)
Annotations were done at 60 fps.
---------------------
./f_formations:
seg 2: 14:00 onwards, for videos of the form x2xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).
seg 3: for videos of the form x3xxx.MP4 in /video/raw/ for the relevant cameras (2,4,6,8,10).
Note that camera 10 doesn't include meaningful subject information/body parts that are not already covered in camera 8.
First column: time stamp
Second column: "()" delineates groups, "<>" delineates subjects, cam X indicates the best camera view for which a particular group exists.
phone.csv: time stamp (pertaining to seg3), corresponding group, ID of person using the phone
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Object Detection for Olfactory References (ODOR) Dataset
Real-world applications of computer vision in the humanities require algorithms to be robust against artistic abstraction, peripheral objects, and subtle differences between fine-grained target classes.
Existing datasets provide instance-level annotations on artworks but are generally biased towards the image centre and limited with regard to detailed object classes. The ODOR dataset fills this gap, offering 38,116 object-level annotations across 4,712 images, spanning an extensive set of 139 fine-grained categories.
It has challenging dataset properties, such as a detailed set of categories, dense and overlapping objects, and spatial distribution over the whole image canvas.
Inspiring further research on artwork object detection and broader visual cultural heritage studies, the dataset challenges researchers to explore the intersection of object recognition and smell perception.
How to use
The annotations are provided in COCO JSON format. To represent the two-level hierarchy of the object classes, we make use of the supercategory field in the categories array as defined by COCO. In addition to the object-level annotations, we provide an additional CSV file with image-level metadata, which includes content-related fields, such as Iconclass codes or image descriptions, as well as formal annotations, such as artist, license, or creation year.
In addition to a zip containing the dataset images, we provide links to their source collections in the metadata file and a Python script to conveniently download the artwork images (`download_imgs.py`).
The mapping between the `images` array of the `annotations.json` and the `metadata.csv` file can be accomplished via the `file_name` attribute of the elements of the `images` array and the unique `File Name` column of the `metadata.csv` file, respectively.
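A minimal sketch (assuming local file names matching the description above, not the authors' own tooling) of joining the COCO `images` array with the image-level metadata:

import json
import pandas as pd

with open("annotations.json") as f:
    coco = json.load(f)
metadata = pd.read_csv("metadata.csv")

# Join object-level image records with image-level metadata via the file name
images = pd.DataFrame(coco["images"])
merged = images.merge(metadata, left_on="file_name", right_on="File Name", how="left")
print(merged[["id", "file_name"]].head())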
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Synthetically Spoken COCO
Version 1.0
This dataset contains synthetically generated spoken versions of MS COCO [1] captions. This
dataset was created as part of the research reported in [5].
The speech was generated using gTTS [2]. The dataset consists of the following files:
- dataset.json: Captions associated with MS COCO images. This information comes from [3].
- sentid.txt: List of caption IDs. This file can be used to locate MFCC features of the MP3 files
in the numpy array stored in dataset.mfcc.npy.
- mp3.tgz: MP3 files with the audio. Each file name corresponds to caption ID in dataset.json
and in sentid.txt.
- dataset.mfcc.npy: Numpy array with the Mel-Frequency Cepstral Coefficients extracted from the audio. Each row corresponds to a caption. The order of the captions corresponds to the ordering in the file sentid.txt. MFCCs were extracted using [4].
[1] http://mscoco.org/dataset/#overview
[2] https://pypi.python.org/pypi/gTTS
[3] https://github.com/karpathy/neuraltalk
[4] https://github.com/jameslyons/python_speech_features
[5] https://arxiv.org/abs/1702.01991
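A minimal sketch (using the file names listed above; not part of the original release) of how the MFCC rows map back to caption IDs:

import json
import numpy as np

mfcc = np.load("dataset.mfcc.npy")                       # one row of MFCC features per caption
sent_ids = [line.strip() for line in open("sentid.txt")]
with open("dataset.json") as f:
    dataset = json.load(f)                               # captions associated with MS COCO images

# Row i of the MFCC array corresponds to caption ID sent_ids[i],
# which also names the matching MP3 file in mp3.tgz.
print(sent_ids[0], mfcc[0].shape)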
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is Part 1/2 of the ActiveHuman dataset! Part 2 can be found here.
Dataset Description
ActiveHuman was generated using Unity's Perception package.
It consists of 175,428 RGB images and their semantic segmentation counterparts, taken in different environments and lighting conditions and at different camera distances and angles. In total, the dataset contains images for 8 environments, 33 humans, 4 lighting conditions, 7 camera distances (1m-4m) and 36 camera angles (0-360 degrees at 10-degree intervals).
The dataset does not include images at every single combination of available camera distances and angles, since for some values the camera would collide with another object or go outside the confines of an environment. As a result, some combinations of camera distances and angles do not exist in the dataset.
Alongside each image, 2D Bounding Box, 3D Bounding Box and Keypoint ground truth annotations are also generated via the use of Labelers and are stored as a JSON-based dataset. These Labelers are scripts that are responsible for capturing ground truth annotations for each captured image or frame. Keypoint annotations follow the COCO format defined by the COCO keypoint annotation template offered in the perception package.
Folder configuration
The dataset consists of 3 folders:
Essential Terminology
Dataset Data
The dataset includes 4 types of JSON annotation files:
Most Labelers generate different annotation specifications in the spec key-value pair:
Each Labeler generates different annotation specifications in the values key-value pair:
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
The original Udacity Self Driving Car Dataset is missing labels for thousands of pedestrians, bikers, cars, and traffic lights. This will result in poor model performance. When used in the context of self driving cars, this could even lead to human fatalities.
We re-labeled the dataset to correct errors and omissions. We have provided convenient downloads in many formats including VOC XML, COCO JSON, Tensorflow Object Detection TFRecords, and more.
Some examples of labels missing from the original dataset:
![Examples of Missing Labels](https://i.imgur.com/A5J3qSt.jpg)
The dataset contains 97,942 labels across 11 classes and 15,000 images. There are 1,720 null examples (images with no labels).
All images are 1920x1200 (download size ~3.1 GB). We have also provided a version downsampled to 512x512 (download size ~580 MB) that is suitable for most common machine learning models (including YOLO v3, Mask R-CNN, SSD, and mobilenet).
Annotations have been hand-checked for accuracy by Roboflow.
![Class Balance](https://i.imgur.com/bOFkueI.png)
Annotation Distribution:
![Annotation Heatmap](https://i.imgur.com/NwcrQKK.png)
Udacity is building an open source self driving car! You might also try using this dataset to do person-detection and tracking.
Our updates to the dataset are released under the MIT License (the same license as the original annotations and images).
Note: the dataset contains many duplicated bounding boxes for the same subject, which we have not corrected. You will probably want to filter them out by checking the IoU of same-class boxes that are nearly 100% overlapping, or they could affect your model performance (especially in stoplight detection, which seems to suffer from an especially severe case of duplicated bounding boxes).
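A minimal sketch (not Roboflow's code) of the kind of IoU-based filtering described above; boxes are assumed to be [x1, y1, x2, y2] in pixels:

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def drop_duplicate_boxes(labels, thr=0.99):
    # labels: list of (class_name, box); keeps the first of each near-identical same-class pair
    kept = []
    for cls, box in labels:
        if not any(c == cls and iou(box, b) >= thr for c, b in kept):
            kept.append((cls, box))
    return kept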
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
TexBiG (from the German Text-Bild-Gefüge, meaning Text-Image-Structure) is a document layout analysis dataset for historical documents of the late 19th and early 20th century. The dataset provides instance segmentation annotations (bounding boxes and polygons/masks) for 19 different classes with more than 52,000 instances. Annotations were created manually by experts and evaluated with Krippendorff's Alpha; for each document image, at least two different annotators have labeled the document. The dataset uses the common COCO-JSON format.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Pascal VOC 2012 is a common benchmark for object detection. It contains common objects that one might find in images on the web.
![Image example](https://i.imgur.com/y2sB9fD.png)
Note: the test set is withheld, as is common with benchmark datasets.
You can think of it sort of like a baby COCO.