19 datasets found
  1. COCO (Common Objects in Context) Dataset

    • paperswithcode.com
    Updated Dec 10, 2023
    Cite
    (2023). COCO (Common Objects in Context) Dataset [Dataset]. https://paperswithcode.com/dataset/coco
    Dataset updated
    Dec 10, 2023
    Description

    The COCO (Common Objects in Context) dataset is a large-scale object detection, segmentation, and captioning dataset. It is designed to encourage research on a wide variety of object categories and is commonly used for benchmarking computer vision models. It is an essential dataset for researchers and developers working on object detection, segmentation, and pose estimation tasks.

  2. MS Common Objects in Context (COCO2014)

    • academictorrents.com
    bittorrent
    Updated Oct 9, 2015
    Cite
    MS Common Objects in Context (COCO2014) [Dataset]. https://academictorrents.com/details/f993c01f3c268b5d57219a38f8ec73ee7524421a
    Explore at:
    bittorrent (26815885986 bytes). Available download formats.
    Dataset updated
    Oct 9, 2015
    Dataset authored and provided by
    Microsoft
    License

    No license specified (https://academictorrents.com/nolicensespecified)

    Description

    Microsoft COCO is a new image recognition, segmentation, and captioning dataset. Microsoft COCO has several features:

    • Object segmentation
    • Recognition in context
    • Multiple objects per image
    • More than 300,000 images
    • More than 2 million instances
    • 80 object categories
    • 5 captions per image

    The 2014 Testing Images are for the MS COCO Captioning Challenge, while the 2015 Testing Images are for the MS COCO Detection Challenge. The train and val data are common to both challenges. Note also that, as an alternative to downloading the large image zip files, individual images may be downloaded from the COCO website using the "coco_url" field specified in the image info struct.
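
    As a hedged illustration of that last point (the annotation file path and output name are assumptions), the sketch below reads a COCO annotation file and fetches a single image by its "coco_url" field:

    import json
    import urllib.request

    # Load the image info structs from a COCO annotation file.
    with open("annotations/instances_val2014.json") as f:
        coco = json.load(f)

    # Each entry in coco["images"] carries a "coco_url" for the hosted image.
    img_info = coco["images"][0]
    print(img_info["file_name"], img_info["coco_url"])

    # Download that one image instead of the full zip.
    urllib.request.urlretrieve(img_info["coco_url"], img_info["file_name"])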

  3. COCO Dataset 2017

    • gts.ai
    json
    + more versions
    Cite
    GTS, COCO Dataset 2017 [Dataset]. https://gts.ai/dataset-download/coco-dataset-2017/
    Explore at:
    json. Available download formats.
    Dataset provided by
    GLOBOSE TECHNOLOGY SOLUTIONS PRIVATE LIMITED
    Authors
    GTS
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset.

  4. Microsoft Coco Dataset

    • universe.roboflow.com
    zip
    Updated Apr 4, 2025
    Cite
    Microsoft (2025). Microsoft Coco Dataset [Dataset]. https://universe.roboflow.com/microsoft/coco/model/3
    Explore at:
    zip. Available download formats.
    Dataset updated
    Apr 4, 2025
    Dataset authored and provided by
    Microsoft
    Variables measured
    Object Bounding Boxes
    Description

    Microsoft Common Objects in Context (COCO) Dataset

    The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations including object categories, keypoints, and more, making it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has established a standard system by which architectures can be compared.

    While COCO is often said to comprise over 300k images, it is important to understand that this figure spans multiple annotation types, such as keypoints. The labeled dataset for object detection specifically contains 123,272 images.

    The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. That said, COCO has not released its test set annotations, meaning the test data does not come with labels; that data is therefore not included in this dataset.
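
    As a hedged sketch of pulling this labeled data programmatically (the roboflow pip package is assumed, with workspace, project, and version names read off the URL above; check the Universe page for the exact download snippet):

    from roboflow import Roboflow

    # Authenticate and point at the microsoft/coco project on Roboflow Universe.
    rf = Roboflow(api_key="YOUR_API_KEY")  # key placeholder
    project = rf.workspace("microsoft").project("coco")

    # Download version 3 in COCO JSON format (format name is an assumption).
    dataset = project.version(3).download("coco")
    print(dataset.location)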

    The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:

  5. COCO 2014 test

    • kaggle.com
    zip
    Updated Jul 23, 2019
    Cite
    akashdeepjassal (2019). COCO 2014 test [Dataset]. https://www.kaggle.com/akashdeepjassal/coco-2014-test
    Explore at:
    zip (0 bytes). Available download formats.
    Dataset updated
    Jul 23, 2019
    Authors
    akashdeepjassal
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    What is COCO?

    COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:

    MS-COCO website

  6. COCO-Noisy Dataset

    • paperswithcode.com
    Updated Mar 24, 2024
    Cite
    Zhenyu Huang; guocheng niu; Xiao Liu; Wenbiao Ding; Xinyan Xiao; Hua Wu; Xi Peng (2024). COCO-Noisy Dataset [Dataset]. https://paperswithcode.com/dataset/coco-noisy
    Dataset updated
    Mar 24, 2024
    Authors
    Zhenyu Huang; guocheng niu; Xiao Liu; Wenbiao Ding; Xinyan Xiao; Hua Wu; Xi Peng
    Description

    This dataset is based on MS COCO, with 20% of the data randomly shuffled to simulate noisy correspondence.
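
    A minimal sketch of the idea (the authors' exact procedure may differ): randomly pick 20% of image-caption pairs and permute their captions so those pairs no longer correspond.

    import random

    def add_noisy_correspondence(pairs, noise_ratio=0.2, seed=0):
        # pairs: list of (image_id, caption) tuples.
        rng = random.Random(seed)
        pairs = list(pairs)
        idx = rng.sample(range(len(pairs)), int(len(pairs) * noise_ratio))
        captions = [pairs[i][1] for i in idx]
        rng.shuffle(captions)  # permute captions among the selected pairs
        for i, cap in zip(idx, captions):
            pairs[i] = (pairs[i][0], cap)
        return pairs

    noisy = add_noisy_correspondence(
        [(1, "a dog"), (2, "a cat"), (3, "a car"), (4, "a bus"), (5, "a bike")])
    print(noisy)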

  7. COCO

    • huggingface.co
    • datasets.activeloop.ai
    Updated Feb 6, 2023
    + more versions
    Cite
    HuggingFaceM4 (2023). COCO [Dataset]. https://huggingface.co/datasets/HuggingFaceM4/COCO
    Dataset updated
    Feb 6, 2023
    Dataset authored and provided by
    HuggingFaceM4
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    MS COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features: Object segmentation, Recognition in context, Superpixel stuff segmentation, 330K images (>200K labeled), 1.5 million object instances, 80 object categories, 91 stuff categories, 5 captions per image, 250,000 people with keypoints.
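
    As a hedged sketch of loading this card with the Hugging Face datasets library (the split name and streaming support are assumptions; check the dataset card):

    from datasets import load_dataset

    # Stream examples instead of downloading the full dataset up front.
    ds = load_dataset("HuggingFaceM4/COCO", split="train", streaming=True)

    for example in ds.take(3):
        print(sorted(example.keys()))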

  8. COCO, LVIS, Open Images V4 classes mapping

    • zenodo.org
    bin, csv, txt
    Updated Oct 13, 2022
    Cite
    Giuseppe Amato; Giuseppe Amato; Paolo Bolettieri; Paolo Bolettieri; Fabio Carrara; Fabio Carrara; Fabrizio Falchi; Fabrizio Falchi; Claudio Gennaro; Claudio Gennaro; Nicola Messina; Nicola Messina; Lucia Vadicamo; Lucia Vadicamo; Claudio Vairo; Claudio Vairo (2022). COCO, LVIS, Open Images V4 classes mapping [Dataset]. http://doi.org/10.5281/zenodo.7194300
    Explore at:
    csv, txt, bin. Available download formats.
    Dataset updated
    Oct 13, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Giuseppe Amato; Giuseppe Amato; Paolo Bolettieri; Paolo Bolettieri; Fabio Carrara; Fabio Carrara; Fabrizio Falchi; Fabrizio Falchi; Claudio Gennaro; Claudio Gennaro; Nicola Messina; Nicola Messina; Lucia Vadicamo; Lucia Vadicamo; Claudio Vairo; Claudio Vairo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains a mapping between the classes of COCO, LVIS, and Open Images V4 datasets into a unique set of 1460 classes.

    COCO [Lin et al. 2014] contains 80 classes, LVIS [Gupta et al. 2019] contains 1460 classes, and Open Images V4 [Kuznetsova et al. 2020] contains 601 classes.

    We built a mapping of these classes using a semi-automatic procedure in order to obtain a unique final list of 1460 classes. We also generated a hierarchy for each class using WordNet.

    This repository contains the following files:

    • coco_classes_map.txt, contains the mapping for the 80 COCO classes
    • lvis_classes_map.txt, contains the mapping for the 1460 LVIS classes
    • openimages_classes_map.txt, contains the mapping for the 601 Open Images V4 classes
    • classname_hyperset_definition.csv, contains the final set of 1460 classes, their definition and hierarchy
    • all-classnames.xlsx, contains a side-by-side view of all classes considered
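
    A hedged sketch for loading these files follows; the column layout inside the .txt mappings is an assumption (inspect the files first), here taken as one "original_class<TAB>mapped_class" pair per line.

    import csv

    def load_map(path, sep="\t"):
        # Parse "original_class<TAB>mapped_class" lines into a dict (layout assumed).
        mapping = {}
        with open(path) as f:
            for line in f:
                parts = line.rstrip("\n").split(sep)
                if len(parts) >= 2:
                    mapping[parts[0]] = parts[1]
        return mapping

    coco_map = load_map("coco_classes_map.txt")
    lvis_map = load_map("lvis_classes_map.txt")
    oi_map = load_map("openimages_classes_map.txt")

    # The csv holds the final 1460 classes with their definition and hierarchy.
    with open("classname_hyperset_definition.csv") as f:
        hyperset = list(csv.DictReader(f))
    print(len(coco_map), len(lvis_map), len(oi_map), len(hyperset))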

    This mapping was used in VISIONE [Amato et al. 2021, Amato et al. 2022] that is a content-based retrieval system that supports various search functionalities (text search, object/color-based search, semantic and visual similarity search, temporal search). For the object detection VISIONE uses three pre-trained models: VfNet [Zhang et al. 2021] (trained on COCO dataset), Mask R-CNN [He et al. 2017] (trained on LVIS), and a Faster R-CNN+Inception ResNet (trained on the Open Images V4).

    This repository is released under a Creative Commons Attribution license; please cite the following paper if you use it in any form in your work:

    @article{amato2021visione,
     title={The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval},
     author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Debole, Franca and Falchi, Fabrizio and Gennaro, Claudio and Vadicamo, Lucia and Vairo, Claudio},
     journal={Journal of Imaging},
     volume={7},
     number={5},
     pages={76},
     year={2021},
     publisher={Multidisciplinary Digital Publishing Institute}
    }
    

    References:

    [Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham. https://doi.org/10.1007/978-3-030-98355-0_52

    [Amato et al. 2021] Amato, G., Bolettieri, P., Carrara, F., Debole, F., Falchi, F., Gennaro, C., Vadicamo, L. and Vairo, C., 2021. The visione video search system: exploiting off-the-shelf text search engines for large-scale video retrieval. Journal of Imaging, 7(5), p.76.

    [Gupta et al. 2019] Gupta, A., Dollar, P. and Girshick, R., 2019. LVIS: A dataset for large vocabulary instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5356-5364).

    [He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

    [Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

    [Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

    [Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).

  9. COCO-Caption

    • huggingface.co
    Updated Sep 1, 2024
    Cite
    LMMs-Lab (2024). COCO-Caption [Dataset]. https://huggingface.co/datasets/lmms-lab/COCO-Caption
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant).
    Dataset updated
    Sep 1, 2024
    Dataset authored and provided by
    LMMs-Lab
    Description

    Large-scale Multi-modality Models Evaluation Suite

    Accelerating the development of large-scale multi-modality models (LMMs) with lmms-eval

    🏠 Homepage | 📚 Documentation | 🤗 Huggingface Datasets

      This Dataset
    

    This is a formatted version of COCO-Caption-2014-version. It is used in our lmms-eval pipeline to allow for one-click evaluations of large multi-modality models. @misc{lin2015microsoft, title={Microsoft COCO: Common Objects in Context}… See the full description on the dataset page: https://huggingface.co/datasets/lmms-lab/COCO-Caption.

  10. Activities of Daily Living Object Dataset

    • figshare.com
    bin
    Updated Nov 28, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Md Tanzil Shahria; Mohammad H Rahman (2024). Activities of Daily Living Object Dataset [Dataset]. http://doi.org/10.6084/m9.figshare.27263424.v3
    Explore at:
    bin. Available download formats.
    Dataset updated
    Nov 28, 2024
    Dataset provided by
    figshare
    Authors
    Md Tanzil Shahria; Mohammad H Rahman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Activities of Daily Living Object Dataset

    Overview

    The ADL (Activities of Daily Living) Object Dataset is a curated collection of images and annotations specifically focusing on objects commonly interacted with during daily living activities. This dataset is designed to facilitate research and development in assistive robotics in home environments.

    Data Sources and Licensing

    The dataset comprises images and annotations sourced from four publicly available datasets:

    COCO Dataset. License: Creative Commons Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/. Citation: Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common Objects in Context. European Conference on Computer Vision (ECCV), 740–755.

    Open Images Dataset. License: Creative Commons Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/. Citation: Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Duerig, T., & Ferrari, V. (2020). The Open Images Dataset V6: Unified Image Classification, Object Detection, and Visual Relationship Detection at Scale. International Journal of Computer Vision, 128(7), 1956–1981.

    LVIS Dataset. License: Creative Commons Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/. Citation: Gupta, A., Dollar, P., & Girshick, R. (2019). LVIS: A Dataset for Large Vocabulary Instance Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5356–5364.

    Roboflow Universe. License: Creative Commons Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/. Citation: the following repositories from Roboflow Universe were used in compiling this dataset:

    • Work, U. AI Based Automatic Stationery Billing System Data Dataset. 2022. Accessible at: https://universe.roboflow.com/university-work/ai-based-automatic-stationery-billing-system-data (accessed on 11 October 2024).
    • Destruction, P.M. Pencilcase Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/pencilcase-se7nb (accessed on 11 October 2024).
    • Destruction, P.M. Final Project Dataset. 2023. Accessible at: https://universe.roboflow.com/project-mental-destruction/final-project-wsuvj (accessed on 11 October 2024).
    • Personal. CSST106 Dataset. 2024. Accessible at: https://universe.roboflow.com/personal-pgkq6/csst106 (accessed on 11 October 2024).
    • New-Workspace-kubz3. Pencilcase Dataset. 2022. Accessible at: https://universe.roboflow.com/new-workspace-kubz3/pencilcase-s9ag9 (accessed on 11 October 2024).
    • Finespiralnotebook. Spiral Notebook Dataset. 2024. Accessible at: https://universe.roboflow.com/finespiralnotebook/spiral_notebook (accessed on 11 October 2024).
    • Dairymilk. Classmate Dataset. 2024. Accessible at: https://universe.roboflow.com/dairymilk/classmate (accessed on 11 October 2024).
    • Dziubatyi, M. Domace Zadanie Notebook Dataset. 2023. Accessible at: https://universe.roboflow.com/maksym-dziubatyi/domace-zadanie-notebook (accessed on 11 October 2024).
    • One. Stationery Dataset. 2024. Accessible at: https://universe.roboflow.com/one-vrmjr/stationery-mxtt2 (accessed on 11 October 2024).
    • jk001226. Liplip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/liplip (accessed on 11 October 2024).
    • jk001226. Lip Dataset. 2024. Accessible at: https://universe.roboflow.com/jk001226/lip-uteep (accessed on 11 October 2024).
    • Upwork5. Socks3 Dataset. 2022. Accessible at: https://universe.roboflow.com/upwork5/socks3 (accessed on 11 October 2024).
    • Book. DeskTableLamps Material Dataset. 2024. Accessible at: https://universe.roboflow.com/book-mxasl/desktablelamps-material-rjbgd (accessed on 11 October 2024).
    • Gary. Medicine Jar Dataset. 2024. Accessible at: https://universe.roboflow.com/gary-ofgwc/medicine-jar (accessed on 11 October 2024).
    • TEST. Kolmarbnh Dataset. 2023. Accessible at: https://universe.roboflow.com/test-wj4qi/kolmarbnh (accessed on 11 October 2024).
    • Tube. Tube Dataset. 2024. Accessible at: https://universe.roboflow.com/tube-nv2vt/tube-9ah9t (accessed on 11 October 2024).
    • Staj. Canned Goods Dataset. 2024. Accessible at: https://universe.roboflow.com/staj-2ipmz/canned-goods-isxbi (accessed on 11 October 2024).
    • Hussam, M. Wallet Dataset. 2024. Accessible at: https://universe.roboflow.com/mohamed-hussam-cq81o/wallet-sn9n2 (accessed on 14 October 2024).
    • Training, K. Perfume Dataset. 2022. Accessible at: https://universe.roboflow.com/kdigital-training/perfume (accessed on 14 October 2024).
    • Keyboards. Shoe-Walking Dataset. 2024. Accessible at: https://universe.roboflow.com/keyboards-tjtri/shoe-walking (accessed on 14 October 2024).
    • MOMO. Toilet Paper Dataset. 2024. Accessible at: https://universe.roboflow.com/momo-nutwk/toilet-paper-wehrw (accessed on 14 October 2024).
    • Project-zlrja. Toilet Paper Detection Dataset. 2024. Accessible at: https://universe.roboflow.com/project-zlrja/toilet-paper-detection (accessed on 14 October 2024).
    • Govorkov, Y. Highlighter Detection Dataset. 2023. Accessible at: https://universe.roboflow.com/yuriy-govorkov-j9qrv/highlighter_detection (accessed on 14 October 2024).
    • Stock. Plum Dataset. 2024. Accessible at: https://universe.roboflow.com/stock-qxdzf/plum-kdznw (accessed on 14 October 2024).
    • Ibnu. Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/ibnu-h3cda/avocado-g9fsl (accessed on 14 October 2024).
    • Molina, N. Detection Avocado Dataset. 2024. Accessible at: https://universe.roboflow.com/norberto-molina-zakki/detection-avocado (accessed on 14 October 2024).
    • in Lab, V.F. Peach Dataset. 2023. Accessible at: https://universe.roboflow.com/vietnam-fruit-in-lab/peach-ejdry (accessed on 14 October 2024).
    • Group, K. Tomato Detection 4 Dataset. 2023. Accessible at: https://universe.roboflow.com/kkabs-group-dkcni/tomato-detection-4 (accessed on 14 October 2024).
    • Detection, M. Tomato Checker Dataset. 2024. Accessible at: https://universe.roboflow.com/money-detection-xez0r/tomato-checker (accessed on 14 October 2024).
    • University, A.S. Smart Cam V1 Dataset. 2023. Accessible at: https://universe.roboflow.com/ain-shams-university-byja6/smart_cam_v1 (accessed on 14 October 2024).
    • EMAD, S. Keysdetection Dataset. 2023. Accessible at: https://universe.roboflow.com/shehab-emad-n2q9i/keysdetection (accessed on 14 October 2024).
    • Roads. Chips Dataset. 2024. Accessible at: https://universe.roboflow.com/roads-rvmaq/chips-a0us5 (accessed on 14 October 2024).
    • workspace bgkzo, N. Object Dataset. 2021. Accessible at: https://universe.roboflow.com/new-workspace-bgkzo/object-eidim (accessed on 14 October 2024).
    • Watch, W. Wrist Watch Dataset. 2024. Accessible at: https://universe.roboflow.com/wrist-watch/wrist-watch-0l25c (accessed on 14 October 2024).
    • WYZUP. Milk Dataset. 2024. Accessible at: https://universe.roboflow.com/wyzup/milk-onbxt (accessed on 14 October 2024).
    • AussieStuff. Food Dataset. 2024. Accessible at: https://universe.roboflow.com/aussiestuff/food-al9wr (accessed on 14 October 2024).
    • Almukhametov, A. Pencils Color Dataset. 2023. Accessible at: https://universe.roboflow.com/almas-almukhametov-hs5jk/pencils-color (accessed on 14 October 2024).

    All images and annotations obtained from these datasets are released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). This license permits sharing and adaptation of the material in any medium or format, for any purpose, even commercially, provided that appropriate credit is given, a link to the license is provided, and any changes made are indicated.

    Redistribution Permission

    As all images and annotations are under the CC BY 4.0 license, we are legally permitted to redistribute this data within our dataset. We have complied with the license terms by:

    • Providing appropriate attribution to the original creators.
    • Including links to the CC BY 4.0 license.
    • Indicating any changes made to the original material.

    Dataset Structure

    The dataset includes:

    • Images: high-quality images featuring ADL objects suitable for robotic manipulation.
    • Annotations: bounding boxes and class labels formatted in the YOLO (You Only Look Once) Darknet format.

    Classes

    The dataset focuses on objects commonly involved in daily living activities. A full list of object classes is provided in the classes.txt file.

    Format

    • Images: JPEG format.
    • Annotations: text files corresponding to each image, containing bounding box coordinates and class labels in YOLO Darknet format.

    How to Use the Dataset

    • Download the dataset.
    • Unpack the dataset: unzip ADL_Object_Dataset.zip

    How to Cite This Dataset

    If you use this dataset in your research, please cite our paper:

    @article{shahria2024activities,
     title={Activities of Daily Living Object Dataset: Advancing Assistive Robotic Manipulation with a Tailored Dataset},
     author={Shahria, Md Tanzil and Rahman, Mohammad H.},
     journal={Sensors},
     volume={24},
     number={23},
     pages={7566},
     year={2024},
     publisher={MDPI}
    }

    License

    This dataset is released under the Creative Commons Attribution 4.0 International License (CC BY 4.0). License link: https://creativecommons.org/licenses/by/4.0/. By using this dataset, you agree to provide appropriate credit, indicate if changes were made, and not impose additional restrictions beyond those of the original licenses.

    Acknowledgments

    We gratefully acknowledge the use of data from the following open-source datasets, which were instrumental in the creation of our specialized ADL object dataset:

    • COCO Dataset: we thank the creators and contributors of the COCO dataset for making their images and annotations publicly available under the CC BY 4.0 license.
    • Open Images Dataset: we express our gratitude to the Open Images team for providing a comprehensive dataset of annotated images under the CC BY 4.0 license.
    • LVIS Dataset: we appreciate the efforts of the LVIS dataset creators for releasing their extensive dataset under the CC BY 4.0 license.
    • Roboflow Universe:
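
    As a hedged sketch of reading the YOLO Darknet annotations described above (one .txt file per image, each line "class_id x_center y_center width height" with coordinates normalized to [0, 1]; the labels/ folder name is an assumption):

    from pathlib import Path

    classes = Path("classes.txt").read_text().splitlines()

    for label_file in sorted(Path("labels").glob("*.txt")):
        for line in label_file.read_text().splitlines():
            class_id, xc, yc, w, h = line.split()
            print(label_file.stem, classes[int(class_id)],
                  float(xc), float(yc), float(w), float(h))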

  11. GF-FRCNN MSCOCO

    • data.mendeley.com
    Updated Jul 1, 2024
    Cite
    MD SHAMIM HOSSAIN (2024). GF-FRCNN MSCOCO [Dataset]. http://doi.org/10.17632/sf238jg557.1
    Dataset updated
    Jul 1, 2024
    Authors
    MD SHAMIM HOSSAIN
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Geometric features play an important role in image captioning. Nowadays, many researchers focus on objects' geometric aspects and their interrelationships. However, all possible geometric features of MSCOCO (Lin et al., 2014) are not publicly available in an integrated form. To address this gap, a dataset named "GF-FRCNN MSCOCO" has been created, which stands for Geometric Features Extracted from 36 Bounding Boxes (Anderson et al., 2018) of Faster R-CNN (Ren et al., 2015) for each image in the MSCOCO (Lin et al., 2014) Image Captioning Dataset. This dataset, containing all the possible geometric features extracted from 123,287 images in the MSCOCO image captioning dataset (Lin et al., 2014), provides essential spatial information about each object.

    To ensure scale-invariant models that generalize across different image sizes, we normalized the bounding box features relative to image dimensions. This involves computing relative values for the top-left and bottom-right coordinates, width, height, area, and center coordinates of the bounding box. Additional derived features, such as the perimeter, diagonal length, margins from the image edges, and distance to the image center, are also normalized. These relative features are then aggregated into a comprehensive set and clipped to a range of [0, 1] to ensure valid values. The normalized and clipped features provide a robust representation of the geometric properties of objects within the images. For more details about the dataset structure and the content of each folder, please refer to the readme.pdf file included with the dataset.
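
    A minimal sketch of this normalization, assuming an (x1, y1, x2, y2) pixel-coordinate box; the exact feature set and ordering in GF-FRCNN MSCOCO may differ (see the dataset's readme.pdf), and edge margins are omitted here for brevity:

    import numpy as np

    def relative_bbox_features(box, img_w, img_h):
        x1, y1, x2, y2 = box
        w, h = x2 - x1, y2 - y1
        cx, cy = x1 + w / 2, y1 + h / 2
        feats = np.array([
            x1 / img_w, y1 / img_h,                   # relative top-left
            x2 / img_w, y2 / img_h,                   # relative bottom-right
            w / img_w, h / img_h,                     # relative width / height
            (w * h) / (img_w * img_h),                # relative area
            cx / img_w, cy / img_h,                   # relative center
            (w + h) / (img_w + img_h),                # relative perimeter
            np.hypot(w, h) / np.hypot(img_w, img_h),  # relative diagonal
            np.hypot(cx - img_w / 2, cy - img_h / 2)
                / np.hypot(img_w / 2, img_h / 2),     # distance to image center
        ])
        return np.clip(feats, 0.0, 1.0)               # clip to [0, 1] as described

    print(relative_bbox_features((40, 30, 200, 160), img_w=640, img_h=480))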

    References:

    Ren, S., He, K., Girshick, R., Sun, J., 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28.

    Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., 2014. Microsoft coco: Common objects in context, in: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, Springer. pp. 740–755.

    Anderson, P., He, X., Buehler, C., Teney, D., Johnson, M., Gould, S., Zhang, L., 2018. Bottom-up and top-down attention for image captioning and visual question answering, in: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6077–6086.

  12. VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...

    • data.niaid.nih.gov
    Updated Jan 25, 2024
    Cite
    Giuseppe Amato (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from MVK Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8355036
    Dataset updated
    Jan 25, 2024
    Dataset provided by
    Giuseppe Amato
    Paolo Bolettieri
    Claudio Gennaro
    Lucia Vadicamo
    Fabio Carrara
    Nicola Messina
    Fabrizio Falchi
    Claudio Vairo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains a diverse set of features extracted from the marine video (underwater) dataset (MVK). These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

    We used a snapshot of the MVK dataset from 2023, which can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 1,372 video files. We divided each video into 1-second segments.

    This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

    @inproceedings{amato2023visione,
     title={VISIONE at Video Browser Showdown 2023},
     author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio},
     booktitle={International Conference on Multimedia Modeling},
     pages={615--621},
     year={2023},
     organization={Springer}
    }

    This repository comprises the following files:

    msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

    extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original MVK videos available.

    features-aladin.tar.gz† contains ALADIN [Messina N. et al. 2022] features extracted for all the segment's middle frames.

    features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for all the segment's middle frames.

    features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for all the segment's middle frames.

    features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the 1s video segments.

    objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on the Open Images V4 [Kuznetsova et al. 2020]).

    objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017] (trained on LVIS).

    objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021] (trained on the COCO dataset).

    *Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.

    *Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in the msb.tar.gz). Additionally, there are three arrays representing the objects detected, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

    "object_class_names": vector with the class name of each detected object.

    "object_scores": scores corresponding to each detected object.

    "object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

    †Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the MVK dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.

    We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

    References:

    [Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

    [Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

    [Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

    [He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

    [Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

    [Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

    [Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

    [Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

    [Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

    [Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).

  13. VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...

    • zenodo.org
    application/gzip
    Updated Feb 12, 2024
    Cite
    Giuseppe Amato; Giuseppe Amato; Paolo Bolettieri; Paolo Bolettieri; Fabio Carrara; Fabio Carrara; Fabrizio Falchi; Fabrizio Falchi; Claudio Gennaro; Claudio Gennaro; Nicola Messina; Nicola Messina; Lucia Vadicamo; Lucia Vadicamo; Claudio Vairo; Claudio Vairo (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from V3C1+V3C2 Dataset [Dataset]. http://doi.org/10.5281/zenodo.8188570
    Explore at:
    application/gzip. Available download formats.
    Dataset updated
    Feb 12, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Giuseppe Amato; Giuseppe Amato; Paolo Bolettieri; Paolo Bolettieri; Fabio Carrara; Fabio Carrara; Fabrizio Falchi; Fabrizio Falchi; Claudio Gennaro; Claudio Gennaro; Nicola Messina; Nicola Messina; Lucia Vadicamo; Lucia Vadicamo; Claudio Vairo; Claudio Vairo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains a diverse set of features extracted from the V3C1+V3C2 dataset, sourced from the Vimeo Creative Commons Collection. These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

    The original V3C1+V3C2 dataset, provided by NIST, can be downloaded using the instructions provided at https://videobrowsershowdown.org/about-vbs/existing-data-and-tools/.

    It comprises 7,235 video files, amounting to 2,300 hours of video content and encompassing 2,508,113 predefined video segments.

    We subdivided the predefined video segments longer than 10 seconds into multiple segments, with each segment spanning no longer than 16 seconds. As a result, we obtained a total of 2,648,219 segments. For each segment, we extracted one frame, specifically the middle one, and computed several features, which are described in detail below.

    This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

    @inproceedings{amato2023visione, 
    title={VISIONE at Video Browser Showdown 2023}, 
    author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, 
    booktitle={International Conference on Multimedia Modeling}, 
    pages={615--621}, 
    year={2023}, 
    organization={Springer} 
    }

    This repository comprises the following files:

    • msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").
    • extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original V3C videos available.
    • features-aladin.tar.gz contains ALADIN [Messina N. et al. 2022] features extracted for all the segment's middle frames.
    • features-clip-laion.tar.gz contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for all the segment's middle frames.
    • features-clip-openai.tar.gz contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for all the segment's middle frames.
    • features-clip2video.tar.gz contains CLIP2Video [Fang H. et al. 2021] features extracted for all the video segments. In particular, 1) we concatenate consecutive short segments so as to create segments at least 3 seconds long; 2) we downsample the obtained segments to 2.5 fps; 3) we feed the network with the first min(36, n) frames, where n is the number of frames of the segment. Notice that the minimum processed length is 7 frames, given that a segment is no shorter than 3 s.
    • objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on the Open Images V4 [Kuznetsova et al. 2020]).
    • objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017] (trained on LVIS).
    • objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021] (trained on COCO dataset).

    *Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.

    *Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in the msb.tar.gz). Additionally, there are three arrays representing the objects detected, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

    • "object_class_names": vector with the class name of each detected object.
    • "object_scores": scores corresponding to each detected object.
    • "object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

    Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the V3C1+V3C2 dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used. Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.
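
    As a hedged sketch of the query-by-image comparison described above (how the archives store features is an assumption; here they are pictured as a dense matrix with one row per segment):

    import numpy as np

    features = np.load("features-clip-laion.npy")  # assumed shape: (num_segments, dim)
    segment_ids = np.load("segment_ids.npy")       # assumed "id_visione" per row

    query = features[0]               # any segment's vector can serve as the query
    scores = features @ query         # dot-product similarity against all segments
    top_k = np.argsort(-scores)[:10]  # ten most similar segments
    for i in top_k:
        print(segment_ids[i], float(scores[i]))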

    We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

    References:

    [Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

    [Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

    [Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

    [He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

    [Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

    [Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

    [Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

    [Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

    [Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

    [Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).

  14. VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jan 25, 2024
    Cite
    Bolettieri, Paolo (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from VBSLHE Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10013328
    Dataset updated
    Jan 25, 2024
    Dataset provided by
    Claudio Gennaro
    Lucia Vadicamo
    Fabio Carrara
    Nicola Messina
    Amato, Giuseppe
    Fabrizio Falchi
    Claudio Vairo
    Bolettieri, Paolo
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains a diverse set of features extracted from the VBSLHE dataset (laparoscopic gynecology). These features will be utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] in the next editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

    We used a snapshot of the dataset provided by the Medical University of Vienna and Toronto, which can be downloaded using the instructions provided at https://download-dbis.dmi.unibas.ch/mvk/. It comprises 75 video files. We divided each video into video shots with a maximum duration of 5 seconds.

    This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

    @inproceedings{amato2023visione,
     title={VISIONE at Video Browser Showdown 2023},
     author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio},
     booktitle={International Conference on Multimedia Modeling},
     pages={615--621},
     year={2023},
     organization={Springer}
    }

    This repository (v2) comprises the following files:

    msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").

    extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original VBSLHE videos available.

    features-aladin.tar.gz† contains ALADIN [Messina N. et al. 2022] features extracted for all the segment's middle frames.

    features-clip-laion.tar.gz† contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for all the segment's middle frames.

    features-clip-openai.tar.gz† contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for all the segment's middle frames.

    features-clip2video.tar.gz† contains CLIP2Video [Fang H. et al. 2021] features extracted for all the video segments.

    objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on the Open Images V4 [Kuznetsova et al. 2020]).

    objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017] (trained on LVIS).

    objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021] (trained on the COCO dataset).

    *Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.

    *Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in the msb.tar.gz). Additionally, there are three arrays representing the objects detected, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

    "object_class_names": vector with the class name of each detected object.

    "object_scores": scores corresponding to each detected object.

    "object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

    †Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the VBSLHE dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used (see links above). Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.

    We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

    References:

    [Amato et al. 2023] Amato, G. et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

    [Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

    [Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

    [He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

    [Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

    [Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

    [Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

    [Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

    [Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

    [Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).

  15. SiDroForest: Synthetic Siberian Larch Tree Crown Dataset of 10.000 instances...

    • doi.pangaea.de
    html, tsv
    Updated Jun 18, 2021
    + more versions
    Cite
    Femke van Geffen; Frederic Brieger; Luidmila A Pestryakova; Ulrike Herzschuh; Stefan Kruse; Evgenii S Zakharov (2021). SiDroForest: Synthetic Siberian Larch Tree Crown Dataset of 10.000 instances in the Microsoft's Common Objects in Context dataset (coco) format [Dataset]. http://doi.org/10.1594/PANGAEA.932795
    Explore at:
    tsv, html. Available download formats.
    Dataset updated
    Jun 18, 2021
    Dataset provided by
    PANGAEA
    Authors
    Femke van Geffen; Frederic Brieger; Luidmila A Pestryakova; Ulrike Herzschuh; Stefan Kruse; Evgenii S Zakharov
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 25, 2018 - Aug 21, 2018
    Variables measured
    Binary Object, Binary Object (MD5 Hash), Binary Object (File Size)
    Description

    This synthetic Siberian Larch tree crown dataset was created for upscaling and machine learning purposes as a part of the SiDroForest (Siberia Drone Forest Inventory) project. The SiDroForest data collection (https://www.pangaea.de/?q=keyword%3A%22SiDroForest%22) consists of vegetation plots covered in Siberia during a 2-month fieldwork expedition in 2018 by the Alfred Wegener Institute Helmholtz Centre for Polar and Marine Research in Germany. During fieldwork, fifty-six 50×50-meter vegetation plots were covered by Unmanned Aerial Vehicle (UAV) flights, and Red Green Blue (RGB) and Red Green Near Infrared (RGNIR) photographs were taken with a consumer-grade DJI Phantom 4 quadcopter. The synthetic dataset provided here contains Larch (Larix gmelinii (Rupr.) Rupr. and Larix cajanderi Mayr.) tree crowns extracted from the onboard camera RGB UAV images of five selected vegetation plots from this expedition, placed on top of full-resized images from the same RGB flights. […]

  16. SuperAnimal-Quadruped Dataset

    • paperswithcode.com
    + more versions
    Cite
    Shaokai Ye; Anastasiia Filippova; Jessy Lauer; Steffen Schneider; Maxime Vidal; Tian Qiu; Alexander Mathis; Mackenzie Weygandt Mathis, SuperAnimal-Quadruped Dataset [Dataset]. https://paperswithcode.com/dataset/superanimal-quadruped
    Authors
    Shaokai Ye; Anastasiia Filippova; Jessy Lauer; Steffen Schneider; Maxime Vidal; Tian Qiu; Alexander Mathis; Mackenzie Weygandt Mathis
    Description

    Introduction

    This dataset supports Ye et al. 2024, Nature Communications.

    Ye, S., Filippova, A., Lauer, J. et al. SuperAnimal pretrained pose estimation models for behavioral analysis. Nat Commun 15, 5165 (2024). https://doi.org/10.1038/s41467-024-48792-2

    Please cite this dataset and paper if you use this resource. Please also see Ye et al. 2024 for the full DataSheet accompanying this download, including the metadata for how to use this data if you want to compare model results on benchmark tasks. Below is just a summary. Also see the dataset licensing below.

    Training Data

    The models were trained together on the following datasets:

    • AwA-Pose: quadruped dataset; see full details at (1).
    • AnimalPose: see full details at (2).
    • AcinoSet: see full details at (3).
    • Horse-30: Horse-30 dataset; the benchmark task is called Horse-10. See full details at (4).
    • StanfordDogs: see full details at (5, 6).
    • AP-10K: see full details at (7).
    • iRodent: we utilized the iNaturalist API functions for scraping observations with the taxon ID of Suborder Myomorpha (8). The functions allowed us to filter the large number of observations down to the ones with photos under the CC BY-NC creative license. The most common types of rodents from the collected observations are Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse (Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), and Striped Field Mouse (Apodemus agrarius). We then generated segmentation masks over target animals by processing the media with an algorithm we designed that uses a Mask Region-Based Convolutional Neural Network (Mask R-CNN) (9) model with a ResNet-50-FPN backbone (10), pretrained on the COCO datasets (11); a sketch of this step appears after this list. The processed 443 images were then manually labeled with both pose annotations and segmentation masks. iRodent data is banked at https://zenodo.org/record/8250392.
    • APT-36K: see full details at (12).

    (The original dataset page includes an image with a keypoint guide.)
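
    As a hedged illustration of the iRodent mask-generation step referenced above, the sketch below runs torchvision's stock COCO-pretrained Mask R-CNN (ResNet-50-FPN backbone) on an image; it stands in for, and is not, the authors' own pipeline, and the score threshold is an assumption.

    import torch
    import torchvision

    # Stock COCO-pretrained Mask R-CNN with a ResNet-50-FPN backbone.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    image = torch.rand(3, 480, 640)  # stand-in for a real RGB image tensor in [0, 1]
    with torch.no_grad():
        output = model([image])[0]

    # Keep confident detections; each mask is a per-instance probability map.
    keep = output["scores"] > 0.5  # threshold is an assumption
    masks = output["masks"][keep]  # shape: (n, 1, H, W)
    print(masks.shape)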

    Ethical Considerations

    • No experimental data was collected for this model; all datasets used are cited above.

    Caveats and Recommendations

    • Please note that each dataset was labeled by separate labs and separate individuals; therefore, while we map names to a unified pose vocabulary, there will be annotator bias in keypoint placement (see Ye et al. 2024 for our Supplementary Note on annotator bias). You will also note that the dataset is highly diverse across species but collectively has more representation of domesticated animals like dogs, cats, horses, and cattle. If the performance of a model trained on this data is not as good as you need, we recommend first trying video adaptation (see Ye et al. 2024) or fine-tuning the weights with your own labeling.

    License

    Modified MIT.

    Copyright 2023-present by Mackenzie Mathis, Shaokai Ye, and contributors.

    Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive, and non-transferable license for academic, non-commercial purposes only (hereafter “LICENSE”) to use the "DATASET" subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software:

    This data or resulting software may not be used to harm any animal deliberately.

    LICENSEE acknowledges that the DATASET is a research tool. THE DATASET IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATASET OR THE USE OR OTHER DEALINGS IN THE DATASET.

    If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis (mackenzie@post.harvard.edu) for a commercial use license.

    Please cite Ye et al 2024 if you use this DATASET in your work.

    References:

    Prianka Banik, Lin Li, and Xishuang Dong. A novel dataset for keypoint detection of quadruped animals from images. ArXiv, abs/2108.13958, 2021.

    Jinkun Cao, Hongyang Tang, Haoshu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9497–9506, 2019.

    Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fred Nicolls, Alexander Mathis, Mackenzie W. Mathis, and Amir Patel. Acinoset: A 3d pose estimation dataset and baseline models for cheetahs in the wild. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13901–13908, 2021.

    Alexander Mathis, Thomas Biasi, Steffen Schneider, Mert Yuksekgonul, Byron Rogers, Matthias Bethge, and Mackenzie W Mathis. Pretraining boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1859–1868, 2021.

    Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.

    Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, and Roberto Cipolla. Creatures great and smal: Recovering the shape and motion of animals from video. In Asian Conference on Computer Vision, pages 3–19. Springer, 2018. Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, and Dacheng Tao. Ap-10k: A benchmark for animal pose estimation in the wild. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.

    iNaturalist. OGBIF Occurrence Download. https://doi.org/10.15468/dl.p7nbxt. iNaturalist, July 2020

    Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.

    Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.

    Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll’ar, and C. Lawrence Zitnick. Microsoft COCO: common objects in context. CoRR, abs/1405.0312, 2014

    Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, and Dacheng Tao. Apt-36k: A large-scale benchmark for animal pose estimation and tracking. Advances in Neural Information Processing Systems, 35:17301–17313, 2022

  17. Z

    Toloka Visual Question Answering Dataset

    • data.niaid.nih.gov
    Updated Oct 10, 2023
    Cite
    Ustalov, Dmitry (2023). Toloka Visual Question Answering Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7057740
    Explore at:
    Dataset updated
    Oct 10, 2023
    Dataset authored and provided by
    Ustalov, Dmitry
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Our dataset consists of images associated with textual questions. One entry (instance) in our dataset is a question-image pair labeled with the ground-truth coordinates of a bounding box containing the visual answer to the given question. The images were obtained from a CC BY-licensed subset of the Microsoft Common Objects in Context dataset, MS COCO. All data labeling was performed on the Toloka crowdsourcing platform, https://toloka.ai/.

Our dataset has 45,199 instances split among three subsets: train (38,990 instances), public test (1,705 instances), and private test (4,504 instances). The entire train set was available to everyone from the start of the challenge. The public test set became available at the evaluation phase of the competition, but without ground-truth labels. After the end of the competition, both the public and private test sets were released.

The datasets are provided as files in the comma-separated values (CSV) format with the following columns.

Column   | Type    | Description
-------- | ------- | -----------
image    | string  | URL of an image on a public content delivery network
width    | integer | image width
height   | integer | image height
left     | integer | bounding box coordinate: left
top      | integer | bounding box coordinate: top
right    | integer | bounding box coordinate: right
bottom   | integer | bounding box coordinate: bottom
question | string  | question in English
    This upload also contains a ZIP file with the images from MS COCO.
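To make the schema concrete, here is a minimal sketch of reading one of the CSV splits and cropping the ground-truth answer region. The filename train.csv is hypothetical; the column names follow the table above.

```python
# Minimal sketch: load one instance and crop the visual answer box.
from io import BytesIO

import pandas as pd
import requests
from PIL import Image

df = pd.read_csv("train.csv")  # hypothetical filename for the train split

row = df.iloc[0]
image = Image.open(BytesIO(requests.get(row["image"], timeout=30).content))
assert image.size == (row["width"], row["height"])  # sanity-check dimensions

# PIL's crop takes (left, upper, right, lower), matching the CSV columns.
answer = image.crop((row["left"], row["top"], row["right"], row["bottom"]))
print(row["question"])
answer.save("answer.png")
```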

  18. BePLi Dataset v2: Beach Plastic Litter Dataset version 2

    • seanoe.org
    bin
    Updated Dec 1, 2023
    Cite
    Hidaka Mitsuko; Murakami Koshiro; Kawahara Shintaro; Sugiyama Daisuke; Nakagawa Yujin; Kako Shin’ichiro; Matsuoka Daisuke (2023). BePLi Dataset v2: Beach Plastic Litter Dataset version 2 [Dataset]. http://doi.org/10.17882/96963
    Explore at:
    binAvailable download formats
    Dataset updated
    Dec 1, 2023
    Dataset provided by
    SEANOE
    Authors
    Hidaka Mitsuko; Murakami Koshiro; Kawahara Shintaro; Sugiyama Daisuke; Nakagawa Yujin; Kako Shin’ichiro; Matsuoka Daisuke
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Time period covered
    Dec 31, 2010 - Dec 31, 2018
    Area covered
    Description

The Beach Plastic Litter Dataset version 2 (BePLi Dataset v2) is an updated version of BePLi Dataset v1. It comprises 3,722 original beach plastic litter images and corresponding manually annotated instance segmentation annotations. The annotations are provided as a JSON file in the MS COCO (Microsoft Common Objects in Context) object detection task format. Plastic litter objects are defined in 13 classes: "pet_bottle", "other_bottle", "plastic_bag", "box_shaped_case", "other_container", "rope", "other_string", "fishing_net", "buoy", "other_fishing_gear", "styrene_foam", "others", and "fragment". All target objects are made of plastic. The dataset employs RLE encoding for the instance segmentation of a single object, and all annotations are given for single objects.
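Because the annotations follow the MS COCO object detection format with RLE-encoded instance masks, they should be readable with the standard pycocotools API; here is a minimal sketch, with a hypothetical annotation filename.

```python
# Minimal sketch: load COCO-format annotations and decode one RLE mask.
from pycocotools.coco import COCO

coco = COCO("bepli_v2_annotations.json")  # hypothetical filename

# Look up one of the 13 plastic litter classes and its annotations.
cat_ids = coco.getCatIds(catNms=["pet_bottle"])
ann_ids = coco.getAnnIds(catIds=cat_ids)
anns = coco.loadAnns(ann_ids)

# annToMask decodes the segmentation of a single object into a
# binary numpy array of shape (height, width).
mask = coco.annToMask(anns[0])
print(mask.shape, mask.sum())  # mask size and number of foreground pixels
```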

  19. SuperAnimal-Quadruped-80K

    • zenodo.org
    application/gzip
    Updated Nov 1, 2024
    Cite
    Zenodo (2024). SuperAnimal-Quadruped-80K [Dataset]. http://doi.org/10.5281/zenodo.14016777
    Explore at:
    application/gzipAvailable download formats
    Dataset updated
    Nov 1, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Time period covered
    Jun 9, 2024
    Description

    Introduction

This dataset supports Ye et al. 2024, Nature Communications. Please cite this dataset and paper if you use this resource. Please also see Ye et al. 2024 for the full DataSheet that accompanies this download, including the metadata for how to use this data if you want to compare model results on benchmark tasks. Below is just a summary. Also see the dataset licensing below.

    Training Data

It consists of the following datasets, which are used together for training:

    • AwA-Pose Quadruped dataset, see full details at (1).
    • AnimalPose See full details at (2).
    • AcinoSet See full details at (3).
    • Horse-30 Horse-30 dataset, benchmark task is called Horse-10; See full details at (4).
    • StanfordDogs See full details at (5, 6).
    • AP-10K See full details at (7).
• iRodent We utilized the iNaturalist API functions for scraping observations with the taxon ID of Suborder Myomorpha (8). The functions allowed us to filter the large number of observations down to the ones with photos under the CC BY-NC creative license. The most common types of rodents from the collected observations are Muskrat (Ondatra zibethicus), Brown Rat (Rattus norvegicus), House Mouse (Mus musculus), Black Rat (Rattus rattus), Hispid Cotton Rat (Sigmodon hispidus), Meadow Vole (Microtus pennsylvanicus), Bank Vole (Clethrionomys glareolus), Deer Mouse (Peromyscus maniculatus), White-footed Mouse (Peromyscus leucopus), and Striped Field Mouse (Apodemus agrarius). We then generated segmentation masks over target animals in the data by processing the media through an algorithm we designed that uses a Mask Region-Based Convolutional Neural Network (Mask R-CNN) (9) model with a ResNet-50-FPN backbone (10), pretrained on the COCO dataset (11). The 443 processed images were then manually labeled with both pose annotations and segmentation masks. iRodent data is banked at https://zenodo.org/record/8250392. A sketch of this mask-generation step appears after this list.
    • APT-36K See full details at (12).
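For illustration, here is a minimal sketch of this kind of mask-generation step using torchvision's off-the-shelf Mask R-CNN with a ResNet-50-FPN backbone pretrained on COCO. This is a stand-in under stated assumptions (a recent torchvision, a hypothetical image path), not the authors' exact pipeline; the resulting masks would still require the manual review described above.

```python
# Minimal sketch: propose instance masks with a COCO-pretrained Mask R-CNN.
import torch
import torchvision
from PIL import Image
from torchvision.transforms.functional import to_tensor

# ResNet-50-FPN backbone, COCO-pretrained weights (torchvision >= 0.13 API).
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("rodent.jpg").convert("RGB"))  # hypothetical path

with torch.no_grad():
    output = model([image])[0]  # dict with "boxes", "labels", "scores", "masks"

# Keep confident detections and binarize the soft masks for manual review.
keep = output["scores"] > 0.5
masks = output["masks"][keep, 0] > 0.5  # (N, H, W) boolean masks
print(f"{int(keep.sum())} candidate masks proposed")
```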

Here is an image with a keypoint guide: https://images.squarespace-cdn.com/content/v1/57f6d51c9f74566f55ecf271/1690988780004-AG00N6OU1R21MZ0AU9RE/modelcard-SAQ.png

    Ethical Considerations

    • No experimental data was collected for this model; all datasets used are cited above.

    Caveats and Recommendations

• Please note that each dataset was labeled by separate labs and separate individuals; therefore, while we map names to a unified pose vocabulary, there will be annotator bias in keypoint placement (see Ye et al. 2024 for our Supplementary Note on annotator bias). You will also note the dataset is highly diverse across species, but collectively has more representation of domesticated animals like dogs, cats, horses, and cattle. If a model trained on this data does not perform as well as you need, we recommend first trying video adaptation (see Ye et al. 2024) or fine-tuning the weights with your own labeling.

    License

    Modified MIT.

    Copyright 2023-present by Mackenzie Mathis, Shaokai Ye, and contributors.

    Permission is hereby granted to you (hereafter "LICENSEE") a fully-paid, non-exclusive,
    and non-transferable license for academic, non-commercial purposes only (hereafter “LICENSE”)
    to use the "DATASET" subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all copies or substantial
    portions of the Software:

    This data or resulting software may not be used to harm any animal deliberately.

    LICENSEE acknowledges that the DATASET is a research tool.
    THE DATASET IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
    BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE DATASET OR THE USE OR OTHER DEALINGS IN THE DATASET.

    If this license is not appropriate for your application, please contact Prof. Mackenzie W. Mathis
    (mackenzie@post.harvard.edu) for a commercial use license.

Please cite Ye et al. 2024 if you use this DATASET in your work.

    References

1. Prianka Banik, Lin Li, and Xishuang Dong. A novel dataset for keypoint detection of quadruped animals from images. ArXiv, abs/2108.13958, 2021.
2. Jinkun Cao, Hongyang Tang, Haoshu Fang, Xiaoyong Shen, Cewu Lu, and Yu-Wing Tai. Cross-domain adaptation for animal pose estimation. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9497–9506, 2019.
3. Daniel Joska, Liam Clark, Naoya Muramatsu, Ricardo Jericevich, Fred Nicolls, Alexander Mathis, Mackenzie W. Mathis, and Amir Patel. AcinoSet: A 3D pose estimation dataset and baseline models for cheetahs in the wild. 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 13901–13908, 2021.
4. Alexander Mathis, Thomas Biasi, Steffen Schneider, Mert Yuksekgonul, Byron Rogers, Matthias Bethge, and Mackenzie W. Mathis. Pretraining boosts out-of-domain robustness for pose estimation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1859–1868, 2021.
5. Aditya Khosla, Nityananda Jayadevaprakash, Bangpeng Yao, and Li Fei-Fei. Novel dataset for fine-grained image categorization. In First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, June 2011.
6. Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, and Roberto Cipolla. Creatures great and SMAL: Recovering the shape and motion of animals from video. In Asian Conference on Computer Vision, pages 3–19. Springer, 2018.
7. Hang Yu, Yufei Xu, Jing Zhang, Wei Zhao, Ziyu Guan, and Dacheng Tao. AP-10K: A benchmark for animal pose estimation in the wild. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
8. iNaturalist. GBIF Occurrence Download. https://doi.org/10.15468/dl.p7nbxt. iNaturalist, July 2020.
9. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 2961–2969, 2017.
10. Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection, 2016.
11. Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. CoRR, abs/1405.0312, 2014.
12. Yuxiang Yang, Junjie Yang, Yufei Xu, Jing Zhang, Long Lan, and Dacheng Tao. APT-36K: A large-scale benchmark for animal pose estimation and tracking. Advances in Neural Information Processing Systems, 35:17301–17313, 2022.

    Versioning Note:

- V2 includes fixes to the Stanford Dogs data; this affected less than 1% of the data.

