19 datasets found

P
COCO (Common Objects in Context) Dataset
paperswithcode.com
Updated Dec 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2023). COCO (Common Objects in Context) Dataset [Dataset]. https://paperswithcode.com/dataset/coco
Explore at:
Dataset updated
Dec 10, 2023
Description
The COCO (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and pose estimation tasks.
a
MS Common Objects in Context (COCO2014)
academictorrents.com
bittorrent
Updated Oct 9, 2015
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MS Common Objects in Context (COCO2014) [Dataset]. https://academictorrents.com/details/f993c01f3c268b5d57219a38f8ec73ee7524421a
Explore at:
bittorrent(26815885986)Available download formats
Dataset updated
Oct 9, 2015
Dataset authored and provided by
Microsoft
License
https://academictorrents.com/nolicensespecifiedhttps://academictorrents.com/nolicensespecified
Description
Microsoft COCO is a new image recognition, segmentation, and captioning dataset. Microsoft COCO has several features: Object segmentation Recognition in Context Multiple objects per image More than 300,000 images More than 2 Million instances 80 object categories 5 captions per image The 2014 Testing Images are for the MS COCO Captioning Challenge, while the 2015 Testing Images are for the MS COCO Detection Challenge. The train and val data are common to both challenges. Note also that as an alternative to downloading the large image zip files, individual images may be downloaded from the COCO website using the "coco_url" field specified in the image info struct.
g
COCO Dataset 2017
gts.ai
json
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
GTS, COCO Dataset 2017 [Dataset]. https://gts.ai/dataset-download/coco-dataset-2017/
Explore at:
jsonAvailable download formats
Dataset provided by
GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
Authors
GTS
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset.
R
Microsoft Coco Dataset
universe.roboflow.com
zip
Updated Apr 4, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Microsoft (2025). Microsoft Coco Dataset [Dataset]. https://universe.roboflow.com/microsoft/coco/model/3
Explore at:
zipAvailable download formats
Dataset updated
Apr 4, 2025
Dataset authored and provided by
Microsoft
Variables measured
Object Bounding Boxes
Description
Microsoft Common Objects in Context (COCO) Dataset

The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations, including object categories, keypoints, and more. The model it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.

While COCO is often touted to comprise over 300k images, it's pivotal to understand that this number includes diverse formats like keypoints, among others. Specifically, the labeled dataset for object detection stands at 123,272 images.

The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. With that said, COCO has not released their test set annotations, meaning the test data doesn't come with labels. Thus, this data is not included in the dataset.

The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:

An introduction to the COCO dataset

Weird images in COCO, and what that tells us about the utility and limits of COCO
COCO 2014 test
kaggle.com
zip
Updated Jul 23, 2019
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
akashdeepjassal (2019). COCO 2014 test [Dataset]. https://www.kaggle.com/akashdeepjassal/coco-2014-test
Explore at:
zip(0 bytes)Available download formats
Dataset updated
Jul 23, 2019
Authors
akashdeepjassal
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
What is COCO?

COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:

MS-COCO website
P
COCO-Noisy Dataset
paperswithcode.com
Updated Mar 24, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhenyu Huang; guocheng niu; Xiao Liu; Wenbiao Ding; Xinyan Xiao; Hua Wu; Xi Peng (2024). COCO-Noisy Dataset [Dataset]. https://paperswithcode.com/dataset/coco-noisy
Explore at:
Dataset updated
Mar 24, 2024
Authors
Zhenyu Huang; guocheng niu; Xiao Liu; Wenbiao Ding; Xinyan Xiao; Hua Wu; Xi Peng
Description
This dataset is based on MS COCO that have 20% of data randomly shuffled to simulate noisy correspondence.
h
COCO
huggingface.co
datasets.activeloop.ai
Updated Feb 6, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
HuggingFaceM4 (2023). COCO [Dataset]. https://huggingface.co/datasets/HuggingFaceM4/COCO
Explore at:
Dataset updated
Feb 6, 2023
Dataset authored and provided by
HuggingFaceM4
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
MS COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.
COCO, LVIS, Open Images V4 classes mapping
zenodo.org
bin, csv, txt
Updated Oct 13, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Giuseppe Amato; Giuseppe Amato; Paolo Bolettieri; Paolo Bolettieri; Fabio Carrara; Fabio Carrara; Fabrizio Falchi; Fabrizio Falchi; Claudio Gennaro; Claudio Gennaro; Nicola Messina; Nicola Messina; Lucia Vadicamo; Lucia Vadicamo; Claudio Vairo; Claudio Vairo (2022). COCO, LVIS, Open Images V4 classes mapping [Dataset]. http://doi.org/10.5281/zenodo.7194300
Explore at:
csv, txt, binAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7194300
Dataset updated
Oct 13, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Giuseppe Amato; Giuseppe Amato; Paolo Bolettieri; Paolo Bolettieri; Fabio Carrara; Fabio Carrara; Fabrizio Falchi; Fabrizio Falchi; Claudio Gennaro; Claudio Gennaro; Nicola Messina; Nicola Messina; Lucia Vadicamo; Lucia Vadicamo; Claudio Vairo; Claudio Vairo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains a mapping between the classes of COCO, LVIS, and Open Images V4 datasets into a unique set of 1460 classes.

COCO [Lin et al 2014] contains 80 classes, LVIS [gupta2019lvis] contains 1460 classes, Open Images V4 [Kuznetsova et al. 2020] contains 601 classes.

We built a mapping of these classes using a semi-automatic procedure in order to have a unique final list of 1460 classes. We also generated a hierarchy for each class, using wordnet

This repository contains the following files:

coco_classes_map.txt, contains the mapping for the 80 coco classes

lvis_classes_map.txt, contains the mapping for the 1460 coco classes

openimages_classes_map.txt, contains the mapping for the 601 coco classes

classname_hyperset_definition.csv, contains the final set of 1460 classes, their definition and hierarchy

all-classnames.xlsx, contains a side-by-side view of all classes considered

This mapping was used in VISIONE [Amato et al. 2021, Amato et al. 2022] that is a content-based retrieval system that supports various search functionalities (text search, object/color-based search, semantic and visual similarity search, temporal search). For the object detection VISIONE uses three pre-trained models: VfNet [Zhang et al. 2021] (trained on COCO dataset), Mask R-CNN [He et al. 2017] (trained on LVIS), and a Faster R-CNN+Inception ResNet (trained on the Open Images V4).

This is repository is released under a Creative Commons Attribution license, please cite the following paper if you use it in your work in any form:

@inproceedings{amato2021visione, title={The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Debole, Franca and Falchi, Fabrizio and Gennaro, Claudio and Vadicamo, Lucia and Vairo, Claudio}, journal={Journal of Imaging}, volume={7}, number={5}, pages={76}, year={2021}, publisher={Multidisciplinary Digital Publishing Institute} }

References:

[Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_52

[Amato et al. 2021] Amato, G., Bolettieri, P., Carrara, F., Debole, F., Falchi, F., Gennaro, C., Vadicamo, L. and Vairo, C., 2021. The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval. Journal of Imaging, 7(5), p.76.

[Gupta et al.2019] Gupta, A., Dollar, P. and Girshick, R., 2019. Lvis: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5356-5364).

[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).
h
COCO-Caption
huggingface.co
Updated Sep 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
LMMs-Lab (2024). COCO-Caption [Dataset]. https://huggingface.co/datasets/lmms-lab/COCO-Caption
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 1, 2024
Dataset authored and provided by
LMMs-Lab
Description
Large-scale Multi-modality Models Evaluation Suite

Accelerating the development of large-scale multi-modality models (LMMs) with lmms-eval

🏠 Homepage | 📚 Documentation | 🤗 Huggingface Datasets

This Dataset

This is a formatted version of COCO-Caption-2014-version. It is used in our lmms-eval pipeline to allow for one-click evaluations of large multi-modality models. @misc{lin2015microsoft, title={Microsoft COCO: Common Objects in Context}… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/COCO-Caption.
f
Activities of Daily Living Object Dataset
figshare.com
bin
Updated Nov 28, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Md Tanzil Shahria; Mohammad H Rahman (2024). Activities of Daily Living Object Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27263424.v3
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.27263424.v3
Dataset updated
Nov 28, 2024
Dataset provided by
figshare
Authors
Md Tanzil Shahria; Mohammad H Rahman
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Activities of Daily Living Object DatasetOverviewThe ADL (Activities of Daily Living) Object Dataset is a curated collection of images and annotations specifically focusing on objects commonly interacted with during daily living activities. This dataset is designed to facilitate research and development in assistive robotics in home environments.Data Sources and LicensingThe dataset comprises images and annotations sourced from four publicly available datasets:COCO DatasetLicense: Creative Commons Attribution 4.0 International (CC BY 4.0)License Link: https://creativecommons.org/licenses/by/4.0/Citation:Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. European Conference on Computer Vision (ECCV), 740–755.Open Images DatasetLicense: Creative Commons Attribution 4.0 International (CC BY 4.0)License Link: https://creativecommons.org/licenses/by/4.0/Citation:Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., & Ferrari, V. (2020). The Open Images Dataset V6: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale. International Journal of Computer Vision, 128(7), 1956–1981.LVIS DatasetLicense: Creative Commons Attribution 4.0 International (CC BY 4.0)License Link: https://creativecommons.org/licenses/by/4.0/Citation:Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A Dataset for Large Vocabulary Instance Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5356–5364.Roboflow UniverseLicense: Creative Commons Attribution 4.0 International (CC BY 4.0)License Link: https://creativecommons.org/licenses/by/4.0/Citation: The following repositories from Roboflow Universe were used in compiling this dataset:Work, U. AI Based Automatic Stationery Billing System Data Dataset. 2022. Accessible at: https://universe.roboflow.com/university-work/ai-based-automatic-stationery-billing-system-data (accessed on 11 October 2024).Destruction, P.M. Pencilcase Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/pencilcase-se7nb (accessed on 11 October 2024).Destruction, P.M. Final Project Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/final-project-wsuvj (accessed on 11 October 2024).Personal. CSST106 Dataset. 2024. Accessible at: https://universe.roboflow.com/personal-pgkq6/csst106 (accessed on 11 October 2024).New-Workspace-kubz3. Pencilcase Dataset. 2022. Accessible at: https://universe.roboflow.com/new-workspace-kubz3/pencilcase-s9ag9 (accessed on 11 October 2024).Finespiralnotebook. Spiral Notebook Dataset. 2024. Accessible at: https://universe.roboflow.com/finespiralnotebook/spiral_notebook (accessed on 11 October 2024).Dairymilk. Classmate Dataset. 2024. Accessible at: https://universe.roboflow.com/dairymilk/classmate (accessed on 11 October 2024).Dziubatyi, M. Domace Zadanie Notebook Dataset. 2023. Accessible at: https://universe.roboflow.com/maksym-dziubatyi/domace-zadanie-notebook (accessed on 11 October 2024).One. Stationery Dataset. 2024. Accessible at: https://universe.roboflow.com/one-vrmjr/stationery-mxtt2 (accessed on 11 October 2024).jk001226. Liplip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/liplip (accessed on 11 October 2024).jk001226. Lip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/lip-uteep (accessed on 11 October 2024).Upwork5. Socks3 Dataset. 2022. Accessible at: https://universe.roboflow.com/upwork5/socks3 (accessed on 11 October 2024).Book. DeskTableLamps Material Dataset. 2024. Accessible at: https://universe.roboflow.com/book-mxasl/desktablelamps-material-rjbgd (accessed on 11 October 2024).Gary. Medicine Jar Dataset. 2024. Accessible at: https://universe.roboflow.com/gary-ofgwc/medicine-jar (accessed on 11 October 2024).TEST. Kolmarbnh Dataset. 2023. Accessible at: https://universe.roboflow.com/test-wj4qi/kolmarbnh (accessed on 11 October 2024).Tube. Tube Dataset. 2024. Accessible at: https://universe.roboflow.com/tube-nv2vt/tube-9ah9t (accessed on 11 October 2024). Staj. Canned Goods Dataset. 2024. Accessible at: https://universe.roboflow.com/staj-2ipmz/canned-goods-isxbi (accessed on 11 October 2024).Hussam, M. Wallet Dataset. 2024. Accessible at: https://universe.roboflow.com/mohamed-hussam-cq81o/wallet-sn9n2 (accessed on 14 October 2024).Training, K. Perfume Dataset. 2022. Accessible at: https://universe.roboflow.com/kdigital-training/perfume (accessed on 14 October 2024).Keyboards. Shoe-Walking Dataset. 2024. Accessible at: https://universe.roboflow.com/keyboards-tjtri/shoe-walking (accessed on 14 October 2024).MOMO. Toilet Paper Dataset. 2024. Accessible at: https://universe.roboflow.com/momo-nutwk/toilet-paper-wehrw (accessed on 14 October 2024).Project-zlrja. Toilet Paper Detection Dataset. 2024. Accessible at: https://universe.roboflow.com/project-zlrja/toilet-paper-detection (accessed on 14 October 2024).Govorkov, Y. Highlighter Detection Dataset. 2023. Accessible at: https://universe.roboflow.com/yuriy-govorkov-j9qrv/highlighter_detection (accessed on 14 October 2024).Stock. Plum Dataset. 2024. Accessible at: https://universe.roboflow.com/stock-qxdzf/plum-kdznw (accessed on 14 October 2024).Ibnu. Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/ibnu-h3cda/avocado-g9fsl (accessed on 14 October 2024).Molina, N. Detection Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/norberto-molina-zakki/detection-avocado (accessed on 14 October 2024).in Lab, V.F. Peach Dataset. 2023. Accessible at: https://universe.roboflow.com/vietnam-fruit-in-lab/peach-ejdry (accessed on 14 October 2024).Group, K. Tomato Detection 4 Dataset. 2023. Accessible at: https://universe.roboflow.com/kkabs-group-dkcni/tomato-detection-4 (accessed on 14 October 2024).Detection, M. Tomato Checker Dataset. 2024. Accessible at: https://universe.roboflow.com/money-detection-xez0r/tomato-checker (accessed on 14 October 2024).University, A.S. Smart Cam V1 Dataset. 2023. Accessible at: https://universe.roboflow.com/ain-shams-university-byja6/smart_cam_v1 (accessed on 14 October 2024).EMAD, S. Keysdetection Dataset. 2023. Accessible at: https://universe.roboflow.com/shehab-emad-n2q9i/keysdetection (accessed on 14 October 2024).Roads. Chips Dataset. 2024. Accessible at: https://universe.roboflow.com/roads-rvmaq/chips-a0us5 (accessed on 14 October 2024).workspace bgkzo, N. Object Dataset. 2021. Accessible at: https://universe.roboflow.com/new-workspace-bgkzo/object-eidim (accessed on 14 October 2024).Watch, W. Wrist Watch Dataset. 2024. Accessible at: https://universe.roboflow.com/wrist-watch/wrist-watch-0l25c (accessed on 14 October 2024).WYZUP. Milk Dataset. 2024. Accessible at: https://universe.roboflow.com/wyzup/milk-onbxt (accessed on 14 October 2024).AussieStuff. Food Dataset. 2024. Accessible at: https://universe.roboflow.com/aussiestuff/food-al9wr (accessed on 14 October 2024).Almukhametov, A. Pencils Color Dataset. 2023. Accessible at: https://universe.roboflow.com/almas-almukhametov-hs5jk/pencils-color (accessed on 14 October 2024).All images and annotations obtained from these datasets are released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits sharing and adaptation of the material in any medium or format, for any purpose, even commercially, provided that appropriate credit is given, a link to the license is provided, and any changes made are indicated.Redistribution Permission:As all images and annotations are under the CC BY 4.0 license, we are legally permitted to redistribute this data within our dataset. We have complied with the license terms by:Providing appropriate attribution to the original creators.Including links to the CC BY 4.0 license.Indicating any changes made to the original material.Dataset StructureThe dataset includes:Images: High-quality images featuring ADL objects suitable for robotic manipulation.Annotations: Bounding boxes and class labels formatted in the YOLO (You Only Look Once) Darknet format.ClassesThe dataset focuses on objects commonly involved in daily living activities. A full list of object classes is provided in the classes.txt file.FormatImages: JPEG format.Annotations: Text files corresponding to each image, containing bounding box coordinates and class labels in YOLO Darknet format.How to Use the DatasetDownload the DatasetUnpack the Datasetunzip ADL_Object_Dataset.zipHow to Cite This DatasetIf you use this dataset in your research, please cite our paper:@article{shahria2024activities, title={Activities of Daily Living Object Dataset: Advancing Assistive Robotic Manipulation with a Tailored Dataset}, author={Shahria, Md Tanzil and Rahman, Mohammad H.}, journal={Sensors}, volume={24}, number={23}, pages={7566}, year={2024}, publisher={MDPI}}LicenseThis dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0).License Link: https://creativecommons.org/licenses/by/4.0/By using this dataset, you agree to provide appropriate credit, indicate if changes were made, and not impose additional restrictions beyond those of the original licenses.AcknowledgmentsWe gratefully acknowledge the use of data from the following open-source datasets, which were instrumental in the creation of our specialized ADL object dataset:COCO Dataset: We thank the creators and contributors of the COCO dataset for making their images and annotations publicly available under the CC BY 4.0 license.Open Images Dataset: We express our gratitude to the Open Images team for providing a comprehensive dataset of annotated images under the CC BY 4.0 license.LVIS Dataset: We appreciate the efforts of the LVIS dataset creators for releasing their extensive dataset under the CC BY 4.0 license.Roboflow Universe:
m
GF-FRCNN MSCOCO
data.mendeley.com
Updated Jul 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MD SHAMIM HOSSAIN (2024). GF-FRCNN MSCOCO [Dataset]. http://doi.org/10.17632/sf238jg557.1
Explore at:
Unique identifier
https://doi.org/10.17632/sf238jg557.1
Dataset updated
Jul 1, 2024
Authors
MD SHAMIM HOSSAIN
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Geometric features play an important role in image captioning. Nowadays, many researchers focus on objects' geometric aspects and their interrelationships. However, all possible geometric features of MSCOCO (Lin et al., 2014) are not publicly available in an integrated form. To address this gap, a dataset named "GF-FRCNN MSCOCO" has been created, which stands for Geometric Features Extracted from 36 Bounding Boxes (Anderson et al., 2018) of Faster R-CNN (Ren et al., 2015) for each image in the MSCOCO (Lin et al., 2014) Image Captioning Dataset. This dataset, containing all the possible geometric features extracted from 123,287 images in the MSCOCO image captioning dataset (Lin et al., 2014) , provides essential spatial information about each object.

To ensure scale-invariant models that generalize across different image sizes, we normalized the bounding box features relative to image dimensions. This involves computing relative values for the top-left and bottom-right coordinates, width, height, area, and center coordinates of the bounding box. Additional derived features, such as the perimeter, diagonal length, margins from the image edges, and distance to the image center, are also normalized. These relative features are then aggregated into a comprehensive set and clipped to a range of [0, 1] to ensure valid values. The normalized and clipped features provide a robust representation of the geometric properties of objects within the images. For more details about the dataset structure and the content of each folder, please refer to the readme.pdf file included with the dataset.

References Ren, S., He, K., Girshick, R., Sun, J., 2015. Faster r-cnn: Towards real time object detection with region proposal networks. Advances in neural information processing systems 28.

Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., 2014. Microsoft coco: Common objects in context, in: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, Springer. pp. 740–755.

Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., Zhang, L., 2018. Bottom-up and top-down attention for image captioning and visual question answering, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6077–6086.
Z
VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...
data.niaid.nih.gov
Updated Jan 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Giuseppe Amato (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from MVK Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8355036
Explore at:
Dataset updated
Jan 25, 2024
Dataset provided by
Giuseppe Amato
Paolo Bolettieri
Claudio Gennaro
Lucia Vadicamo
Fabio Carrara
Nicola Messina
Fabrizio Falchi
Claudio Vairo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains a diverse set of features extracted from the marine video (underwater) dataset (MVK) . These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

We used a snapshot of the MVK dataset from 2023, that can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 1,372 video files. We divided each video into 1 second segments.

This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

@inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }

This repository comprises the following files:

msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original MVK videos available.

features-aladin.tar.gz† contains ALADIN [Messina N. et al. 2022] features extracted for all the segment's middle frames.

features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for all the segment's middle frames.

features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for all the segment's middle frames.

features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] extracted for all the 1s video segments.

objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on the Open Images V4 [Kuznetsova et al. 2020]).

objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN He et al. 2017.

objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet Zhang et al. 2021.

*Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected

*Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in the msb.tar.gz) . Additionally, there are three arrays representing the objects detected, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

"object_class_names": vector with the class name of each detected object.

"object_scores": scores corresponding to each detected object.

"object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

†Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the MVK dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.

We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

References:

[Amato et al. 2023] Amato, G.et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

[Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CV
VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...
zenodo.org
application/gzip
Updated Feb 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Giuseppe Amato; Giuseppe Amato; Paolo Bolettieri; Paolo Bolettieri; Fabio Carrara; Fabio Carrara; Fabrizio Falchi; Fabrizio Falchi; Claudio Gennaro; Claudio Gennaro; Nicola Messina; Nicola Messina; Lucia Vadicamo; Lucia Vadicamo; Claudio Vairo; Claudio Vairo (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from V3C1+V3C2 Dataset [Dataset]. http://doi.org/10.5281/zenodo.8188570
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.8188570
Dataset updated
Feb 12, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Giuseppe Amato; Giuseppe Amato; Paolo Bolettieri; Paolo Bolettieri; Fabio Carrara; Fabio Carrara; Fabrizio Falchi; Fabrizio Falchi; Claudio Gennaro; Claudio Gennaro; Nicola Messina; Nicola Messina; Lucia Vadicamo; Lucia Vadicamo; Claudio Vairo; Claudio Vairo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains a diverse set of features extracted from the V3C1+V3C2 dataset, sourced from the Vimeo Creative Commons Collection. These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

The original V3C1+V3C2 dataset, provided by NIST, can be downloaded using the instructions provided at https://videobrowsershowdown.org/about-vbs/existing-data-and-tools/.

It comprises 7,235 video files, amounting for 2,300h of video content and encompassing 2,508,113 predefined video segments.

We subdivided the predefined video segments longer than 10 seconds into multiple segments, with each segment spanning no longer than 16 seconds. As a result, we obtained a total of 2,648,219 segments. For each segment, we extracted one frame, specifically the middle one, and computed several features, which are described in detail below.

This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

@inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }

This repository comprises the following files:

msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original V3C videos available.

features-aladin.tar.gz^† contains ALADIN [Messina N. et al. 2022] features extracted for all the segment's middle frames.

features-clip-laion.tar.gz^† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for all the segment's middle frames.

features-clip-openai.tar.gz^† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for all the segment's middle frames.

features-clip2video.tar.gz^† contains CLIP2Video [Fang H. et al. 2021] extracted for all the video segments. In particular 1) we concatenate consecutive short segments so to create segments at least 3 seconds long; 2) we downsample the obtained segments to 2.5 fps; 3) we feed the network with the first min(36, n) frames, where n is the number of frames of the segment. Notice that the minimum processed length consists of 7 frames, given that the segment is no shorter than 3s.

objects-frcnn-oiv4.tar.gz^* contains the objects detected using Faster R-CNN+Inception ResNet (trained on the Open Images V4 [Kuznetsova et al. 2020]).

objects-mrcnn-lvis.tar.gz^* contains the objects detected using Mask R-CNN [He et al. 2017] (trained on LVIS).

objects-vfnet64-coco.tar.gz^* contains the objects detected using VfNet [Zhang et al. 2021] (trained on COCO dataset).

*Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected

*Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in the msb.tar.gz) . Additionally, there are three arrays representing the objects detected, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

"object_class_names": vector with the class name of each detected object.

"object_scores": scores corresponding to each detected object.

"object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

^†Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the V3C1+V3C2 dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used. Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.

We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

References:

[Amato et al. 2023] Amato, G.et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

[Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).
Z
VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...
data.niaid.nih.gov
zenodo.org
Updated Jan 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Bolettieri, Paolo (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from VBSLHE Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10013328
Explore at:
Dataset updated
Jan 25, 2024
Dataset provided by
Claudio Gennaro
Lucia Vadicamo
Fabio Carrara
Nicola Messina
Amato, Giuseppe
Fabrizio Falchi
Claudio Vairo
Bolettieri, Paolo
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This repository contains a diverse set of features extracted from the VBSLHE dataset (laparoscopic gynecology) . These features will be utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] in the next editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

We used a snapshot of the dataset provided by the Medical University of Vienna and Toronto that can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 75 video files. We divided each video into video shots with a maximum duration of 5 seconds.

This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

@inproceedings{amato2023visione, title={VISIONE at Video Browser Showdown 2023}, author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, booktitle={International Conference on Multimedia Modeling}, pages={615--621}, year={2023}, organization={Springer} }

This repository (v2) comprises the following files:

msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original VBSLHE videos available.

features-aladin.tar.gz† contains ALADIN [Messina N. et al. 2022] features extracted for all the segment's middle frames.

features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for all the segment's middle frames.

features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for all the segment's middle frames.

features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] extracted for all the video segments.

objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on the Open Images V4 [Kuznetsova et al. 2020]).

objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN He et al. 2017.

objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet Zhang et al. 2021.

*Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected

*Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in the msb.tar.gz) . Additionally, there are three arrays representing the objects detected, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

"object_class_names": vector with the class name of each detected object.

"object_scores": scores corresponding to each detected object.

"object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

†Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the VBSLHE dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.

We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

References:

[Amato et al. 2023] Amato, G.et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

[Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

[Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

[He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

[Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

[Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

[Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

[Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

[Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

[Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CV
SiDroForest: Synthetic Siberian Larch Tree Crown Dataset of 10.000 instances...
doi.pangaea.de
html, tsv
Updated Jun 18, 2021
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Femke van Geffen; Frederic Brieger; Luidmila A Pestryakova; Ulrike Herzschuh; Stefan Kruse; Evgenii S Zakharov (2021). SiDroForest: Synthetic Siberian Larch Tree Crown Dataset of 10.000 instances in the Microsoft's Common Objects in Context dataset (coco) format [Dataset]. http://doi.org/10.1594/PANGAEA.932795
Explore at:
tsv, htmlAvailable download formats
Unique identifier
https://doi.org/10.1594/PANGAEA.932795
Dataset updated
Jun 18, 2021
Dataset provided by
PANGAEA
Authors
Femke van Geffen; Frederic Brieger; Luidmila A Pestryakova; Ulrike Herzschuh; Stefan Kruse; Evgenii S Zakharov
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 25, 2018 - Aug 21, 2018
Area covered

Variables measured
Binary Object, Binary Object (MD5 Hash), Binary Object (File Size)
Description
This synthetic Siberian Larch tree crown dataset was created for upscaling and machine learning purposes as a part of the SiDroForest (Siberia Drone Forest Inventory) project. The SiDroForest data collection (https://www.pangaea.de/?q=keyword%3A%22SiDroForest%22) consists of vegetation plots covered in Siberia during a 2-month fieldwork expedition in 2018 by the Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research in Germany. During fieldwork fifty-six, 50*50-meter vegetation plots were covered by Unmanned Aerial Vehicle (UAV) flights and Red Green Blue (RGB) and Red Green Near Infrared (RGNIR) photographs were taken with a consumer grade DJI Phantom 4 quadcopter. The synthetic dataset provided here contains Larch (Larix gmelinii (Rupr.) Rupr. and Larix cajanderi Mayr.) tree crowns extracted from the onboard camera RGB UAV images of five selected vegetation plots from this expedition, placed on top of full-resized images from the same RGB flights. […]
P
SuperAnimal-Quadruped Dataset
paperswithcode.com
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Shaokai Ye; Anastasiia Filippova; Jessy Lauer; Steffen Schneider; Maxime Vidal; Tian Qiu; Alexander Mathis; Mackenzie Weygandt Mathis, SuperAnimal-Quadruped Dataset [Dataset]. https://paperswithcode.com/dataset/superanimal-quadruped
Explore at:
Authors
Shaokai Ye; Anastasiia Filippova; Jessy Lauer; Steffen Schneider; Maxime Vidal; Tian Qiu; Alexander Mathis; Mackenzie Weygandt Mathis
Description
Introduction This dataset supports Ye et al. 2024 Nature Communications.

Ye, S., Filippova, A., Lauer, J. et al. SuperAnimal pretrained pose estimation models for behavioral analysis. Nat Commun 15, 5165 (2024). https://doi.org/10.1038/s41467-024-48792-2

Please cite this dataset and paper if you use this resource. Please also see Ye et al. 2024 for the full DataSheet accompanying this download, including the metadata for how to use this data if you want to compare model results on benchmark tasks. Below is just a summary. Also see the dataset licensing below.

Training Data It consists of being trained together on the following datasets:

AwA-Pose Quadruped dataset, see full details at (1). AnimalPose See full details at (2). AcinoSet See full details at (3). Horse-30 Horse-30 dataset, benchmark task is called Horse-10; See full details at (4). StanfordDogs See full details at (5, 6). AP-10K See full details at (7). iRodent We utilized the iNaturalist API functions for scraping observations with the taxon ID of Suborder Myomorpha (8). The functions allowed us to filter the large amount of observations down to the ones with photos under the CC BY-NC creative license. The most common types of rodents from the collected observations are Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse (Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), Striped Field Mouse (Apodemus agrarius). We then generated segmentation masks over target animals in the data by processing the media through an algorithm we designed that uses a Mask Region Based Convolutional Neural Networks(Mask R-CNN) (9) model with a ResNet-50-FPN backbone (10), pretrained on the COCO datasets (11). The processed 443 images were then manually labeled with both pose annotations and segmentation masks. iRodent data is banked at https://zenodo.org/record/8250392. APT-36K See full details at (12). Here is an image with a keypoint guide.

Ethical Considerations • No experimental data was collected for this model; all datasets used are cited above.

Caveats and Recommendations • Please note that each dataest was labeled by separate labs & separate individuals, therefore while we map names to a unified pose vocabulary, there will be annotator bias in keypoint placement (See Ye et al. 2024 for our Supplementary Note on annotator bias). You will also note the dataset is highly diverse across species, but collectively has more representation of domesticated animals like dogs, cats, horses, and cattle. We recommend if performance of a model trained on this data is not as good as you need it to be, first try video adaptation (see Ye et al. 2024), or fine-tune the weights with your own labeling.

License Modified MIT.

Copyright 2023-present by Mackenzie Mathis, Shaokai Ye, and contributors.

Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive, and non-transferable license for academic, non-commercial purposes only (hereafter “LICENSE”) to use the "DATASET" subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software:

This data or resulting software may not be used to harm any animal deliberately.

LICENSEE acknowledges that the DATASET is a research tool. THE DATASET IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATASET OR THE USE OR OTHER DEALINGS IN THE DATASET.

If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis (mackenzie@post.harvard.edu) for a commercial use license.

Please cite Ye et al 2024 if you use this DATASET in your work.

References Prianka Banik, Lin Li, and Xishuang Dong. A novel dataset for keypoint detection of quadruped animals from images. ArXiv, abs/2108.13958, 2021

Jinkun Cao, Hongyang Tang, Haoshu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9497–9506, 2019.

Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fred Nicolls, Alexander Mathis, Mackenzie W. Mathis, and Amir Patel. Acinoset: A 3d pose estimation dataset and baseline models for cheetahs in the wild. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13901–13908, 2021.

Alexander Mathis, Thomas Biasi, Steffen Schneider, Mert Yuksekgonul, Byron Rogers, Matthias Bethge, and Mackenzie W Mathis. Pretraining boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1859–1868, 2021.

Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.

Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, and Roberto Cipolla. Creatures great and smal: Recovering the shape and motion of animals from video. In Asian Conference on Computer Vision, pages 3–19. Springer, 2018. Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, and Dacheng Tao. Ap-10k: A benchmark for animal pose estimation in the wild. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.

iNaturalist. OGBIF Occurrence Download. https://doi.org/10.15468/dl.p7nbxt. iNaturalist, July 2020

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.

Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.

Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll’ar, and C. Lawrence Zitnick. Microsoft COCO: common objects in context. CoRR, abs/1405.0312, 2014

Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, and Dacheng Tao. Apt-36k: A large-scale benchmark for animal pose estimation and tracking. Advances in Neural Information Processing Systems, 35:17301–17313, 2022
Z
Toloka Visual Question Answering Dataset
data.niaid.nih.gov
Updated Oct 10, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Ustalov, Dmitry (2023). Toloka Visual Question Answering Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7057740
Explore at:
Dataset updated
Oct 10, 2023
Dataset authored and provided by
Ustalov, Dmitry
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Our dataset consists of the images associated with textual questions. One entry (instance) in our dataset is a question-image pair labeled with the ground truth coordinates of a bounding box containing the visual answer to the given question. The images were obtained from a CC BY-licensed subset of the Microsoft Common Objects in Context dataset, MS COCO. All data labeling was performed on the Toloka crowdsourcing platform, https://toloka.ai/.

Our dataset has 45,199 instances split among three subsets: train (38,990 instances), public test (1,705 instances), and private test (4,504 instances). The entire train dataset was available for everyone since the start of the challenge. The public test dataset was available since the evaluation phase of the competition, but without any ground truth labels. After the end of the competition, public and private sets were released.

The datasets will be provided as files in the comma-separated values (CSV) format containing the following columns.

Column Type Description image string URL of an image on a public content delivery network width integer image width height integer image height left integer bounding box coordinate: left top integer bounding box coordinate: top right integer bounding box coordinate: right bottom integer bounding box coordinate: bottom question string question in English

This upload also contains a ZIP file with the images from MS COCO.
BePLi Dataset v2: Beach Plastic Litter Dataset version 2
seanoe.org
bin
Updated Dec 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Hidaka Mitsuko; Murakami Koshiro; Kawahara Shintaro; Sugiyama Daisuke; Nakagawa Yujin; Kako Shin’ichiro; Matsuoka Daisuke (2023). BePLi Dataset v2: Beach Plastic Litter Dataset version 2 [Dataset]. http://doi.org/10.17882/96963
Explore at:
binAvailable download formats
Unique identifier
https://doi.org/10.17882/96963
Dataset updated
Dec 1, 2023
Dataset provided by
SEANOE
Authors
Hidaka Mitsuko; Murakami Koshiro; Kawahara Shintaro; Sugiyama Daisuke; Nakagawa Yujin; Kako Shin’ichiro; Matsuoka Daisuke
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Time period covered
Dec 31, 2010 - Dec 31, 2018
Area covered

Description
the beach plastic litter dataset version 2: bepli dataset v2 is an updated version of bepli dataset v1. the bepli dataset v2 comprises 3722 original beach plastic litter images and corresponding instance segmentation annotations that were manually annotated. the annotations are provided as ms coco (microsoft common objects in context) object detection task formatted json file. plastic litter object classes are defined as 13 classes: "pet_bottle", "other_bottle", "plastic_bag", "box_shaped_case", "other_container", "rope", "other_string", "fishing_net", "buoy", "other_fishing_gear", "styrene_foam", "others", "fragment". all target objects are made from plastics. this dataset employed rle encoding for the instance segmentation for a single object and all the annotations are given for single objects.
SuperAnimal-Quadruped-80K
zenodo.org
application/gzip
Updated Nov 1, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zenodo (2024). SuperAnimal-Quadruped-80K [Dataset]. http://doi.org/10.5281/zenodo.14016777
Explore at:
application/gzipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.14016777
Dataset updated
Nov 1, 2024
Dataset provided by
Zenodohttp://zenodo.org/
Time period covered
Jun 9, 2024
Description
Introduction

This dataset supports Ye et al. 2024 Nature Communications. Please cite this dataset and paper if you use this resource. Please also see Ye et al. 2024 for the full DataSheet that accompanies this download, including the meta data for how to use this data is you want to compare model results on benchmark tasks. Below is just a summary. Also see the dataset licensing below.

Training Data

It consists of being trained together on the following datasets:

AwA-Pose Quadruped dataset, see full details at (1).

AnimalPose See full details at (2).

AcinoSet See full details at (3).

Horse-30 Horse-30 dataset, benchmark task is called Horse-10; See full details at (4).

StanfordDogs See full details at (5, 6).

AP-10K See full details at (7).

iRodent We utilized the iNaturalist API functions for scraping observations with the taxon ID of Suborder Myomorpha (8). The functions allowed us to filter the large amount of observations down to the ones with photos under the CC BY-NC creative license. The most common types of rodents from the collected observations are Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse (Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), Striped Field Mouse (Apodemus agrarius). We then generated segmentation masks over target animals in the data by processing the media through an algorithm we designed that uses a Mask Region Based Convolutional Neural Networks(Mask R-CNN) (9) model with a ResNet-50-FPN backbone (10), pretrained on the COCO datasets (11). The processed 443 images were then manually labeled with both pose annotations and segmentation masks. iRodent data is banked at https://zenodo.org/record/8250392.

APT-36K See full details at (12).

https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1690988780004-AG00N6OU1R21MZ0AU9RE/modelcard-SAQ.png?format=1500w" target="_blank" rel="noopener">Here is an image with a keypoint guide.

Ethical Considerations

• No experimental data was collected for this model; all datasets used are cited above.

Caveats and Recommendations

• Please note that each dataest was labeled by separate labs & separate individuals, therefore while we map names to a unified pose vocabulary, there will be annotator bias in keypoint placement (See Ye et al. 2024 for our Supplementary Note on annotator bias). You will also note the dataset is highly diverse across species, but collectively has more representation of domesticated animals like dogs, cats, horses, and cattle. We recommend if performance of a model trained on this data is not as good as you need it to be, first try video adaptation (see Ye et al. 2024), or fine-tune the weights with your own labeling.

License

Modified MIT.

Copyright 2023-present by Mackenzie Mathis, Shaokai Ye, and contributors.

Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive,
and non-transferable license for academic, non-commercial purposes only (hereafter “LICENSE”)
to use the "DATASET" subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial
portions of the Software:

This data or resulting software may not be used to harm any animal deliberately.

LICENSEE acknowledges that the DATASET is a research tool.
THE DATASET IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATASET OR THE USE OR OTHER DEALINGS IN THE DATASET.

If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis
(mackenzie@post.harvard.edu) for a commercial use license.

Please cite Ye et al if you use this DATASET in your work.

References

Prianka Banik, Lin Li, and Xishuang Dong. A novel dataset for keypoint detection of quadruped animals from images. ArXiv, abs/2108.13958, 2021

Jinkun Cao, Hongyang Tang, Haoshu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9497–9506, 2019.

Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fred Nicolls, Alexander Mathis, Mackenzie W. Mathis, and Amir Patel. Acinoset: A 3d pose estimation dataset and baseline models for cheetahs in the wild. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13901–13908, 2021.

Alexander Mathis, Thomas Biasi, Steffen Schneider, Mert Yuksekgonul, Byron Rogers, Matthias Bethge, and Mackenzie W Mathis. Pretraining boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1859–1868, 2021.

Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.

Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, and Roberto Cipolla. Creatures great and smal: Recovering the shape and motion of animals from video. In Asian Conference on Computer Vision, pages 3–19. Springer, 2018.

Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, and Dacheng Tao. Ap-10k: A benchmark for animal pose estimation in the wild. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.

iNaturalist. OGBIF Occurrence Download. https://doi.org/10.15468/dl.p7nbxt. iNaturalist, July 2020

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.

Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.

Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll’ar, and C. Lawrence Zitnick. Microsoft COCO: common objects in context. CoRR, abs/1405.0312, 2014

Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, and Dacheng Tao. Apt-36k: A large-scale benchmark for animal pose estimation and tracking. Advances in Neural Information Processing Systems, 35:17301–17313, 2022

Versioning Note:

- V2 includes fixes to Stanford Dog data; it affected less than 1% of the data.
Not seeing a result you expected?
Learn how you can add new datasets to our index.

Facebook

Twitter

Click to copy link

Link copied

Cite

(2023). COCO (Common Objects in Context) Dataset [Dataset]. https://paperswithcode.com/dataset/coco

COCO (Common Objects in Context) Dataset

Common Objects in Context

Explore at:

42 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Dec 10, 2023

Description

The COCO (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and pose estimation tasks.

Clear search

Close search

Google apps

Main menu

COCO (Common Objects in Context) Dataset

MS Common Objects in Context (COCO2014)

COCO Dataset 2017

Microsoft Coco Dataset

Microsoft Common Objects in Context (COCO) Dataset

COCO 2014 test

What is COCO?

COCO-Noisy Dataset

COCO

COCO, LVIS, Open Images V4 classes mapping

COCO-Caption

Activities of Daily Living Object Dataset

GF-FRCNN MSCOCO

VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...

VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...

VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...

SiDroForest: Synthetic Siberian Larch Tree Crown Dataset of 10.000 instances...

SuperAnimal-Quadruped Dataset

Toloka Visual Question Answering Dataset

BePLi Dataset v2: Beach Plastic Litter Dataset version 2

SuperAnimal-Quadruped-80K

Introduction

Training Data

Ethical Considerations

Caveats and Recommendations

License

References

COCO (Common Objects in Context) Dataset

Common Objects in Context