53 datasets found
  1. coco

    • tensorflow.org
    • huggingface.co
    Updated Jun 1, 2024
    + more versions
    Cite
    (2024). coco [Dataset]. https://www.tensorflow.org/datasets/catalog/coco
    Explore at:
    Dataset updated
    Jun 1, 2024
    Description

    COCO is a large-scale object detection, segmentation, and captioning dataset.

    Note:

    • Some images from the train and validation sets don't have annotations.
    • COCO 2014 and 2017 use the same images, but different train/val/test splits.
    • The test split doesn't have any annotations (only images).
    • COCO defines 91 classes, but the data only uses 80 of them.
    • Panoptic annotations define 200 classes, but only 133 are used.

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('coco', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/coco-2014-1.1.0.png

  2. MS COCO Dataset

    • paperswithcode.com
    Updated Apr 15, 2024
    Cite
    Tsung-Yi Lin; Michael Maire; Serge Belongie; Lubomir Bourdev; Ross Girshick; James Hays; Pietro Perona; Deva Ramanan; C. Lawrence Zitnick; Piotr Dollár, MS COCO Dataset [Dataset]. https://paperswithcode.com/dataset/coco
    Explore at:
    Dataset updated
    Apr 15, 2024
    Authors
    Tsung-Yi Lin; Michael Maire; Serge Belongie; Lubomir Bourdev; Ross Girshick; James Hays; Pietro Perona; Deva Ramanan; C. Lawrence Zitnick; Piotr Dollár
    Description

    The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.

    Splits: The first version of the MS COCO dataset was released in 2014. It contains 164K images split into training (83K), validation (41K) and test (41K) sets. In 2015, an additional test set of 81K images was released, including all the previous test images and 40K new images.

    Based on community feedback, in 2017 the training/validation split was changed from 83K/41K to 118K/5K. The new split uses the same images and annotations. The 2017 test set is a subset of 41K images of the 2015 test set. Additionally, the 2017 release contains a new unannotated dataset of 123K images.

    Annotations: The dataset has annotations for the following tasks (a minimal loading sketch follows this list):

    • object detection: bounding boxes and per-instance segmentation masks with 80 object categories,
    • captioning: natural language descriptions of the images (see MS COCO Captions),
    • keypoint detection: more than 200,000 images and 250,000 person instances labeled with keypoints (17 possible keypoints, such as left eye, nose, right hip, right ankle),
    • stuff image segmentation: per-pixel segmentation masks with 91 stuff categories, such as grass, wall, sky (see MS COCO Stuff),
    • panoptic: full scene segmentation, with 80 thing categories (such as person, bicycle, elephant) and a subset of 91 stuff categories (grass, sky, road),
    • dense pose: more than 39,000 images and 56,000 person instances labeled with DensePose annotations; each labeled person is annotated with an instance id and a mapping between the image pixels that belong to that person's body and a template 3D model.

    The annotations are publicly available only for training and validation images.
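    Annotations in this format are typically read with the pycocotools package. Below is a minimal, hedged sketch; the annotation file name follows the official 2017 release (instances_val2017.json), and the local path is only illustrative:

    from pycocotools.coco import COCO

    # Path is illustrative; instances_val2017.json ships with the official 2017 annotation zip.
    coco = COCO("annotations/instances_val2017.json")

    # The 80 object-detection categories (out of the 91 defined category IDs).
    cats = coco.loadCats(coco.getCatIds())
    print(len(cats), "categories, e.g.", [c["name"] for c in cats[:5]])

    # Bounding boxes and segmentation masks for one image.
    img_id = coco.getImgIds()[0]
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    for ann in anns:
        print(ann["category_id"], ann["bbox"])  # bbox is [x, y, width, height]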

  3. Microsoft Coco Dataset

    • universe.roboflow.com
    zip
    Updated Mar 23, 2025
    Cite
    Microsoft (2025). Microsoft Coco Dataset [Dataset]. https://universe.roboflow.com/microsoft/coco/model/3
    Explore at:
    Available download formats: zip
    Dataset updated
    Mar 23, 2025
    Dataset authored and provided by
    Microsoft (http://microsoft.com/)
    Variables measured
    Object Bounding Boxes
    Description

    Microsoft Common Objects in Context (COCO) Dataset

    The Common Objects in Context (COCO) dataset is a widely recognized collection designed to spur object detection, segmentation, and captioning research. Created by Microsoft, COCO provides annotations, including object categories, keypoints, and more. This makes it a valuable asset for machine learning practitioners and researchers. Today, many model architectures are benchmarked against COCO, which has enabled a standard system by which architectures can be compared.

    While COCO is often touted to comprise over 300k images, it's pivotal to understand that this number includes diverse formats like keypoints, among others. Specifically, the labeled dataset for object detection stands at 123,272 images.

    The full object detection labeled dataset is made available here, ensuring researchers have access to the most comprehensive data for their experiments. With that said, COCO has not released their test set annotations, meaning the test data doesn't come with labels. Thus, this data is not included in the dataset.
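    For reference, here is a hedged sketch of pulling this dataset with the Roboflow Python client. The workspace, project, and version are read off the citation URL above, and the API key is a placeholder:

    from roboflow import Roboflow

    rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")  # placeholder key
    # Workspace/project/version inferred from https://universe.roboflow.com/microsoft/coco/model/3;
    # adjust if the project page lists different identifiers.
    project = rf.workspace("microsoft").project("coco")
    dataset = project.version(3).download("coco")  # download annotations in COCO JSON format
    print(dataset.location)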

    The Roboflow team has worked extensively with COCO. Here are a few links that may be helpful as you get started working with this dataset:

  4. Common Object Detection

    • hub.arcgis.com
    • sdiinnovation-geoplatform.hub.arcgis.com
    Updated Feb 28, 2023
    Cite
    Esri (2023). Common Object Detection [Dataset]. https://hub.arcgis.com/content/a91bed8bc0fe4e1bb8db45c23959e5f1
    Explore at:
    Dataset updated
    Feb 28, 2023
    Dataset authored and provided by
    Esri (http://esri.com/)
    Description

    This is an open source object detection model by TensorFlow in TensorFlow Lite format. While it is not recommended to use this model in production surveys, it can be useful for demonstration purposes and to get started with smart assistants in ArcGIS Survey123. You are responsible for the use of this model. When using Survey123, it is your responsibility to review and manually correct outputs.

    This object detection model was trained using the Common Objects in Context (COCO) dataset. COCO is a large-scale object detection dataset that is available for use under the Creative Commons Attribution 4.0 License. The dataset contains 80 object categories and 1.5 million object instances that include people, animals, food items, vehicles, and household items. For a complete list of common objects this model can detect, see Classes.

    The model can be used in ArcGIS Survey123 to detect common objects in photos that are captured with the Survey123 field app.

    Using the model: Follow the guide to use the model. You can use this model to detect or redact common objects in images captured with the Survey123 field app. The model must be configured for a survey in Survey123 Connect.

    Fine-tuning the model: This model cannot be fine-tuned using ArcGIS tools.

    Input: Camera feed (either low-resolution preview or high-resolution capture).

    Output: Image with common object detections written to its EXIF metadata, or an image with detected objects redacted.

    Model architecture: This is an open source object detection model by TensorFlow in TensorFlow Lite format with MobileNet architecture. The model is available for use under the Apache License 2.0.

    Sample results: Here are a few results from the model.
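    For orientation, a generic TensorFlow Lite inference sketch is shown below. The .tflite file name is an assumption, and the exact input size and output tensor layout depend on the downloaded model, so the script simply reports what the interpreter exposes:

    import numpy as np
    import tensorflow as tf

    # File name is an assumption; point this at the .tflite file from the item above.
    interpreter = tf.lite.Interpreter(model_path="common_object_detection.tflite")
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Feed a dummy frame shaped to whatever the model expects (SSD MobileNet models often take 300x300x3).
    _, height, width, channels = input_details[0]["shape"]
    dummy = np.zeros((1, height, width, channels), dtype=input_details[0]["dtype"])
    interpreter.set_tensor(input_details[0]["index"], dummy)
    interpreter.invoke()

    # SSD-style detection models typically emit boxes, class indices, scores, and a detection count.
    for out in output_details:
        print(out["name"], interpreter.get_tensor(out["index"]).shape)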

  5. Coco Val Dataset

    • universe.roboflow.com
    zip
    Updated May 7, 2024
    Cite
    Radhe Radhe (2024). Coco Val Dataset [Dataset]. https://universe.roboflow.com/radhe-radhe-yrigi/coco-val-o7nn2/dataset/4
    Explore at:
    Available download formats: zip
    Dataset updated
    May 7, 2024
    Dataset authored and provided by
    Radhe Radhe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    All Coco Dataset Classes Box Bounding Boxes
    Description

    CoCo Val

    ## Overview
    
    CoCo Val is a dataset for object detection tasks - it contains All Coco Dataset Classes Box annotations for 9,419 images.
    
    ## Getting Started
    
    You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
    
    ## License
    
    This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
    
  6. cocostuff

    • huggingface.co
    • opendatalab.com
    Updated Apr 20, 2023
    Cite
    Shunsuke Kitada (2023). cocostuff [Dataset]. https://huggingface.co/datasets/shunk031/cocostuff
    Explore at:
    Dataset updated
    Apr 20, 2023
    Authors
    Shunsuke Kitada
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COCO-Stuff augments all 164K images of the popular COCO dataset with pixel-level stuff annotations. These annotations can be used for scene understanding tasks like semantic segmentation, object detection and image captioning.
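    A minimal loading sketch with the Hugging Face datasets library; the available config names and feature keys are documented on the dataset card, so treat this as an assumption-laden starting point:

    from datasets import load_dataset

    # A config name may need to be passed if the dataset card lists several;
    # trust_remote_code is needed for script-based datasets in recent library versions.
    ds = load_dataset("shunk031/cocostuff", split="train", trust_remote_code=True)

    example = ds[0]
    print(example.keys())  # typically an image plus its pixel-level stuff annotation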

  7. COCO Captions Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Sep 13, 2022
    Cite
    Xinlei Chen; Hao Fang; Tsung-Yi Lin; Ramakrishna Vedantam; Saurabh Gupta; Piotr Dollar; C. Lawrence Zitnick (2022). COCO Captions Dataset [Dataset]. https://paperswithcode.com/dataset/coco-captions
    Explore at:
    Dataset updated
    Sep 13, 2022
    Authors
    Xinlei Chen; Hao Fang; Tsung-Yi Lin; Ramakrishna Vedantam; Saurabh Gupta; Piotr Dollar; C. Lawrence Zitnick
    Description

    COCO Captions contains over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human-generated captions are provided for each image.
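    The captions are distributed in the standard COCO annotation format, so they can be read with pycocotools; a hedged sketch follows (file name per the official 2017 release, path illustrative):

    from pycocotools.coco import COCO

    # captions_val2017.json ships with the official 2017 annotation zip (path is illustrative).
    coco_caps = COCO("annotations/captions_val2017.json")

    img_id = coco_caps.getImgIds()[0]
    anns = coco_caps.loadAnns(coco_caps.getAnnIds(imgIds=img_id))
    for ann in anns:  # five human-written captions per train/val image
        print(ann["caption"])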

  8. coco-human-inpainted-objects

    • huggingface.co
    Updated Nov 8, 2024
    Cite
    Rapidata (2024). coco-human-inpainted-objects [Dataset]. https://huggingface.co/datasets/Rapidata/coco-human-inpainted-objects
    Explore at:
    Available download formats: Croissant. Croissant is a format for machine-learning datasets; learn more about it at mlcommons.org/croissant.
    Dataset updated
    Nov 8, 2024
    Dataset provided by
    Rapidata AG
    Authors
    Rapidata
    License

    CDLA-Permissive-2.0: https://choosealicense.com/licenses/cdla-permissive-2.0/

    Description

    About:

    The dataset was collected on the https://www.rapidata.ai platform and contains tens of thousands of human annotations of 70+ different kinds of objects. Rapidata makes it easy to collect manual labels in several data modalities with this repository containing freehand drawings on ~2000 images from the COCO dataset. Users are shown an image and are asked to paint a class of objects with a brush tool - there is always a single such object on the image, so the task is not… See the full description on the dataset page: https://huggingface.co/datasets/Rapidata/coco-human-inpainted-objects.

  9. Esefjorden Marine Vegetation Segmentation Dataset (EMVSD)

    • figshare.com
    bin
    Updated Dec 9, 2024
    Cite
    Bjørn Christian Weinbach (2024). Esefjorden Marine Vegetation Segmentation Dataset (EMVSD) [Dataset]. http://doi.org/10.6084/m9.figshare.24072606.v4
    Explore at:
    Available download formats: bin
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    figshare
    Authors
    Bjørn Christian Weinbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Esefjorden Marine Vegetation Segmentation Dataset (EMVSD): Comprising 17,000 meticulously labeled images, this dataset is suited for instance segmentation tasks and represents a significant leap forward for marine research in the region. The images are stored in YOLO and COCO formats, ensuring compatibility with widely recognized and adopted object detection frameworks. Our decision to make this dataset publicly accessible underscores our commitment to collaborative research and the advancement of the broader scientific community.

    Dataset Structure:

    • Images: organized into three subsets (train, val, and test) located under the images/ directory; each subset contains high-resolution images optimized for object detection and segmentation tasks.
    • Annotations: available in YOLO txt and COCO formats for compatibility with major object detection frameworks; organized into three subsets (train, val, and test) located under the labels/ directory. Additional metadata: counts.txt (summary of label distributions) and cache files (train.cache, val.cache, test.cache) for efficient dataset loading.
    • Metadata: classes.txt (definitions for all annotated classes in the dataset) and detailed COCO-format annotations in train_annotations.json, val_annotations.json, and test_annotations.json.
    • Configuration file: EMVSD.yaml, for seamless integration with machine learning libraries.

    Example Directory Structure:

    EMVSD/
    ├── images/
    │   ├── train/
    │   ├── val/
    │   └── test/
    ├── labels/
    │   ├── train/
    │   ├── val/
    │   ├── test/
    │   ├── counts.txt
    │   ├── train.cache
    │   ├── val.cache
    │   └── test.cache
    ├── classes.txt
    ├── train_annotations.json
    ├── val_annotations.json
    ├── test_annotations.json
    └── EMVSD.yaml
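    Since the dataset ships a YOLO-style EMVSD.yaml, one way to consume it is through a YOLO-compatible training framework. The sketch below uses the Ultralytics package and a yolov8n-seg checkpoint, both of which are assumptions rather than anything the dataset prescribes:

    from ultralytics import YOLO

    # Model choice is an assumption; any segmentation-capable YOLO variant that
    # reads a dataset YAML should work similarly.
    model = YOLO("yolov8n-seg.pt")
    model.train(data="EMVSD.yaml", epochs=50, imgsz=640)  # image/label paths are resolved from the YAML

    metrics = model.val()  # evaluate on the val split declared in EMVSD.yaml
    print(metrics)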

  10. COCO panoptic validation set - Dataset - LDM

    • service.tib.eu
    Updated Dec 3, 2024
    + more versions
    Cite
    (2024). COCO panoptic validation set - Dataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/coco-panoptic-validation-set
    Explore at:
    Dataset updated
    Dec 3, 2024
    Description

    Panoptic segmentation aims to unify instance and semantic segmentation in the same framework. Existing works propose to merge instance and semantic segmentation using post-processing layers. Recent works unify both segmentation tasks by producing binary masks and class scores for both things and stuff classes.

  11. SketchyCOCO

    • opendatalab.com
    • paperswithcode.com
    zip
    Updated Mar 17, 2023
    Cite
    Nanjing University (2023). SketchyCOCO [Dataset]. https://opendatalab.com/OpenDataLab/SketchyCOCO
    Explore at:
    Available download formats: zip (12,051,986,316 bytes)
    Dataset updated
    Mar 17, 2023
    Dataset provided by
    Huawei Noah’s Ark Lab
    Sun Yat-sen University
    Nanjing University
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The SketchyCOCO dataset consists of two parts:

    • Object-level data: 20,198 triplets (train 18,869 + val 1,329) of {foreground sketch, foreground image, foreground edge map} examples covering 14 classes, and 27,683 pairs (train 22,171 + val 5,512) of {background sketch, background image} examples covering 3 classes.
    • Scene-level data: 14,081 pairs (train 11,265 + val 2,816) of {foreground image & background sketch, scene image} examples, 14,081 pairs (train 11,265 + val 2,816) of {scene sketch, scene image} examples, and the segmentation ground truth for 14,081 (train 11,265 + val 2,816) scene sketches.

    Some val scene images come from the train images of the COCO-Stuff dataset to increase the number of val images in SketchyCOCO.

  12. MOBDrone: a large-scale drone-view dataset for man overboard detection

    • zenodo.org
    • data.niaid.nih.gov
    json, pdf, zip
    Updated Jul 17, 2024
    Cite
    Donato Cafarelli; Luca Ciampi; Lucia Vadicamo; Claudio Gennaro; Andrea Berton; Marco Paterni; Chiara Benvenuti; Mirko Passera; Fabrizio Falchi (2024). MOBDrone: a large-scale drone-view dataset for man overboard detection [Dataset]. http://doi.org/10.5281/zenodo.5996890
    Explore at:
    Available download formats: json, zip, pdf
    Dataset updated
    Jul 17, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Donato Cafarelli; Luca Ciampi; Lucia Vadicamo; Claudio Gennaro; Andrea Berton; Marco Paterni; Chiara Benvenuti; Mirko Passera; Fabrizio Falchi
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    The Man OverBoard Drone (MOBDrone) dataset is a large-scale collection of aerial footage images. It contains 126,170 frames extracted from 66 video clips gathered from one UAV flying at an altitude of 10 to 60 meters above the mean sea level. Images are manually annotated with more than 180K bounding boxes localizing objects belonging to 5 categories --- person, boat, lifebuoy, surfboard, wood. More than 113K of these bounding boxes belong to the person category and localize people in the water simulating the need to be rescued.

    In this repository, we provide:

    • 66 Full HD video clips (total size: 5.5 GB)

    • 126,170 images extracted from the videos at a rate of 30 FPS (total size: 243 GB)

    • 3 annotation files for the extracted images that follow the MS COCO data format (for more info see https://cocodataset.org/#format-data):

      • annotations_5_custom_classes.json: this file contains annotations concerning all five categories; please note that class ids do not correspond with the ones provided by the MS COCO standard since we account for two new classes not previously considered in the MS COCO dataset --- lifebuoy and wood

      • annotations_3_coco_classes.json: this file contains annotations concerning the three classes also accounted by the MS COCO dataset --- person, boat, surfboard. Class ids correspond with the ones provided by the MS COCO standard.

      • annotations_person_coco_classes.json: this file contains annotations concerning only the 'person' class. Class id corresponds to the one provided by the MS COCO standard.

    The MOBDrone dataset is intended as a test data benchmark. However, for researchers interested in also using our data for training purposes, we provide training and test splits; a small filtering sketch follows the list below:

    • Test set: All the images whose filename starts with "DJI_0804" (total: 37,604 images)
    • Training set: All the images whose filename starts with "DJI_0915" (total: 88,568 images)
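    A small sketch of reproducing that split from a folder of extracted frames (the directory name and .jpg extension are assumptions):

    from pathlib import Path

    frames_dir = Path("images")  # wherever the extracted frames were unpacked
    all_frames = sorted(frames_dir.glob("*.jpg"))

    # Split by filename prefix, as described above.
    test_set = [p for p in all_frames if p.name.startswith("DJI_0804")]
    train_set = [p for p in all_frames if p.name.startswith("DJI_0915")]

    print(f"{len(train_set)} training frames, {len(test_set)} test frames")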

    More details about data generation and the evaluation protocol can be found at our MOBDrone paper: https://arxiv.org/abs/2203.07973
    The code to reproduce our results is available at this GitHub Repository: https://github.com/ciampluca/MOBDrone_eval
    See also http://aimh.isti.cnr.it/dataset/MOBDrone

    Citing the MOBDrone

    The MOBDrone is released under a Creative Commons Attribution license, so please cite the MOBDrone if it is used in your work in any form.
    Published academic papers should use the academic paper citation for our MOBDrone paper, in which we evaluated several pre-trained state-of-the-art object detectors, focusing on the detection of overboard people:

    @inproceedings{MOBDrone2021,
      title     = {MOBDrone: a Drone Video Dataset for Man OverBoard Rescue},
      author    = {Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
      booktitle = {ICIAP 2021: 21st International Conference on Image Analysis and Processing},
      year      = {2021}
    }
    

    and this Zenodo Dataset

    @dataset{donato_cafarelli_2022_5996890,
      author    = {Donato Cafarelli and Luca Ciampi and Lucia Vadicamo and Claudio Gennaro and Andrea Berton and Marco Paterni and Chiara Benvenuti and Mirko Passera and Fabrizio Falchi},
      title     = {{MOBDrone: a large-scale drone-view dataset for man overboard detection}},
      month     = feb,
      year      = 2022,
      publisher = {Zenodo},
      version   = {1.0.0},
      doi       = {10.5281/zenodo.5996890},
      url       = {https://doi.org/10.5281/zenodo.5996890}
    }

    Personal works, such as machine learning projects/blog posts, should provide a URL to the MOBDrone Zenodo page (https://doi.org/10.5281/zenodo.5996890), though a reference to our MOBDrone paper would also be appreciated.

    Contact Information

    If you would like further information about the MOBDrone or if you experience any issues downloading files, please contact us at mobdrone[at]isti.cnr.it

    Acknowledgements

    This work was partially supported by NAUSICAA - "NAUtical Safety by means of Integrated Computer-Assistance Appliances 4.0" project funded by the Tuscany region (CUP D44E20003410009). The data collection was carried out with the collaboration of the Fly&Sense Service of the CNR of Pisa - for the flight operations of remotely piloted aerial systems - and of the Institute of Clinical Physiology (IFC) of the CNR - for the water immersion operations.

  13. Custom Yolov7 On Kaggle On Custom Dataset

    • universe.roboflow.com
    zip
    Updated Jan 29, 2023
    Cite
    Owais Ahmad (2023). Custom Yolov7 On Kaggle On Custom Dataset [Dataset]. https://universe.roboflow.com/owais-ahmad/custom-yolov7-on-kaggle-on-custom-dataset-rakiq/dataset/2
    Explore at:
    Available download formats: zip
    Dataset updated
    Jan 29, 2023
    Dataset authored and provided by
    Owais Ahmad
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Person Car Bounding Boxes
    Description

    Custom Training with YOLOv7 🔥

    Some Important links

    Contact Information

    Objective

    To showcase custom object detection on the given dataset by training and running inference with the newly launched YOLOv7.

    Data Acquisition

    The goal of this task is to train a model that can localize and classify each instance of Person and Car as accurately as possible.

    from IPython.display import Markdown, display
    
    # Render the Roboflow-generated README rather than the bare path string.
    display(Markdown(open("../input/Car-Person-v2-Roboflow/README.roboflow.txt").read()))
    

    Custom Training with YOLOv7 🔥

    In this notebook, I have processed the images with Roboflow because the COCO-formatted dataset had images of different dimensions and was not split into the required format. To train a custom YOLOv7 model, we need to recognize the objects in the dataset. To do so, I have taken the following steps:

    • Export the dataset to YOLOv7
    • Train YOLOv7 to recognize the objects in our dataset
    • Evaluate our YOLOv7 model's performance
    • Run test inference to view performance of YOLOv7 model at work

    📦 YOLOv7

    Image: https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/car-person-2.PNG

    Image Credit - jinfagang

    Step 1: Install Requirements

    !git clone https://github.com/WongKinYiu/yolov7 # Downloading YOLOv7 repository and installing requirements
    %cd yolov7
    !pip install -qr requirements.txt
    !pip install -q roboflow
    

    Downloading YOLOV7 starting checkpoint

    !wget "https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7.pt"
    
    import os
    import glob
    import wandb
    import torch
    from roboflow import Roboflow
    from kaggle_secrets import UserSecretsClient
    from IPython.display import Image, clear_output, display # to display images
    
    
    
    print(f"Setup complete. Using torch {torch.__version__} ({torch.cuda.get_device_properties(0).name if torch.cuda.is_available() else 'CPU'})")
    

    Image: https://camo.githubusercontent.com/dd842f7b0be57140e68b2ab9cb007992acd131c48284eaf6b1aca758bfea358b/68747470733a2f2f692e696d6775722e636f6d2f52557469567a482e706e67

    I will be integrating W&B for visualizations and logging artifacts and comparisons of different models!

    YOLOv7-Car-Person-Custom

    try:
      user_secrets = UserSecretsClient()
      wandb_api_key = user_secrets.get_secret("wandb_api")
      wandb.login(key=wandb_api_key)
      anonymous = None
    except:
      wandb.login(anonymous='must')
      print('To use your W&B account, go to Add-ons -> Secrets and provide your W&B access token. '
            'Use the label name WANDB. Get your W&B access token from here: https://wandb.ai/authorize')
    
    wandb.init(project="YOLOvR", name="7. YOLOv7-Car-Person-Custom-Run-7")
    

    Step 2: Assemble Our Dataset

    Image: https://uploads-ssl.webflow.com/5f6bc60e665f54545a1e52a5/615627e5824c9c6195abfda9_computer-vision-cycle.png

    In order to train our custom model, we need to assemble a dataset of representative images with bounding box annotations around the objects that we want to detect. And we need our dataset to be in YOLOv7 format.

    In Roboflow, we can choose between two paths:

    Version v2 (Aug 12, 2022) looks like this:

    Image: https://raw.githubusercontent.com/Owaiskhan9654/Yolo-V7-Custom-Dataset-Train-on-Kaggle/main/Roboflow.PNG

    user_secrets = UserSecretsClient()
    roboflow_api_key = user_secrets.get_secret("roboflow_api")
    
    rf = Roboflow(api_key=roboflow_api_key)
    project = rf.workspace("owais-ahmad").project("custom-yolov7-on-kaggle-on-custom-dataset-rakiq")
    dataset = project.version(2).download("yolov7")
    

    Step 3: Training the Custom Pretrained YOLOv7 Model

    Here, I am able to pass a number of arguments:

    • img: define input image size
    • batch: determine

  14. ConQA Dataset

    • paperswithcode.com
    Updated Apr 2, 2024
    Cite
    ConQA Dataset [Dataset]. https://paperswithcode.com/dataset/conqa
    Explore at:
    Dataset updated
    Apr 2, 2024
    Authors
    Juan Manuel Rodriguez; Nima Tavassoli; Eliezer Levy; Gil Lederman; Dima Sivov; Matteo Lissandrini; Davide Mottin
    Description

    ConQA is a dataset created using the intersection between Visual Genome and MS-COCO. The goal of this dataset is to provide a new benchmark for text-to-image retrieval using short and less descriptive queries than the commonly used captions from MS-COCO or Flickr. ConQA consists of 80 queries divided into 50 conceptual and 30 descriptive queries. A descriptive query mentions some of the objects in the image, for instance, "people chopping vegetables". In contrast, a conceptual query does not mention objects or only refers to objects in a general context, e.g., "working class life".

    Dataset generation: For the dataset generation, we followed a 3-step workflow: filtering images, generating queries and seeding relevant images, and crowd-sourcing extended annotations.

    Filtering images: The first step is focused on filtering images that have meaningful scene graphs and captions. To filter the images, we used the following procedure:

    1. The image should have one or more captions. Hence, we discarded the YFCC images with no caption, obtaining images from the MS-COCO subset of Visual Genome.
    2. The image should describe a complex scene with multiple objects. We filtered out all the scene graphs that did not contain any edges; images pass this filter.
    3. The relationships should be verbs and should not contain nouns or pronouns. To detect this, we generated a sentence for each edge as a concatenation of the words on the labels of the nodes and the relationship, and applied part-of-speech tagging using the en_core_web_sm model provided by SpaCy. We filtered out all scene graphs containing an edge that is neither tagged as a verb nor covered by an ad-hoc list of allowed non-verb keywords; a rough tagging sketch follows this list. The allowed keywords are top, front, end, side, edge, middle, rear, part, bottom, under, next, left, right, center, background, back, and parallel. We allowed these keywords because they represent positional relationships between objects.

    After filtering, we obtain images.
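    Below is a rough sketch of that tagging heuristic with spaCy; the exact matching of relationship tokens is a simplification, not the authors' implementation:

    import spacy

    # Requires: python -m spacy download en_core_web_sm
    nlp = spacy.load("en_core_web_sm")

    # Positional keywords allowed even though they are not verbs.
    ALLOWED = {"top", "front", "end", "side", "edge", "middle", "rear", "part", "bottom",
               "under", "next", "left", "right", "center", "background", "back", "parallel"}

    def keep_edge(subject: str, relation: str, obj: str) -> bool:
        """POS-tag the edge sentence and keep it only if every relationship word
        is tagged as a verb or appears in the allowed keyword list."""
        doc = nlp(f"{subject} {relation} {obj}")
        rel_words = set(relation.lower().split())
        rel_tokens = [tok for tok in doc if tok.text.lower() in rel_words]
        return all(tok.pos_ == "VERB" or tok.text.lower() in ALLOWED for tok in rel_tokens)

    print(keep_edge("person", "chopping", "vegetables"))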

    Generating Queries: To generate ConQA, the dataset authors worked in three pairs and acted as annotators to manually design the queries, namely 50 conceptual and 30 descriptive queries. After that, we proceeded to use the model "ViT-B/32" from CLIP to find relevant images. For conceptual queries, it was challenging to find relevant images directly, so alternative proxy queries were used to identify an initial set of relevant images. These images are the seed for finding other relevant images that were annotated through Amazon Mechanical Turk.

    Annotation crowdsourcing: Having the initial relevant set defined by the dataset authors, we expanded the relevant candidates by looking into the top-100 visually closest images according to a pre-trained ResNet152 model for each query. As a result, we increase the number of potentially relevant images to analyze without adding human bias to the task.

    After selecting the images to annotate, we set up a set of Human Intelligence Tasks (HITs) on Amazon Mechanical Turk. Each task consisted of a query and 5 potentially relevant images. Then, the workers were instructed to determine whether each image is relevant for the given query. If they were not sure, they could alternatively mark the image as “Unsure”. To reduce presentation bias, we randomize the order of images and the options. Additionally, we include validation tasks with control images to ensure a minimum quality in the annotation process, so workers failing 70% or more of validation queries were excluded.

  15. COCOPlaces

    • data.niaid.nih.gov
    • zenodo.org
    Updated Feb 8, 2023
    Cite
    Vít Škvára (2023). COCOPlaces [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_7612052
    Explore at:
    Dataset updated
    Feb 8, 2023
    Dataset authored and provided by
    Vít Škvára
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The COCOPlaces dataset contains foreground objects from the COCO dataset overlaid on backgrounds from the Places dataset. It contains annotations and is suitable for benchmarking disentangling or factor identification. It was originally used for the project https://github.com/vitskvara/sgad. There are two versions: non-mixed and mixed. In the non-mixed version (uniform_data_64.npy and uniform_labels_64.npy), a total of 10 classes of images were created, where within a single class the background and object labels are the same. In the mixed version (mashed_data_64.npy and mashed_labels_64.npy), each image has a random object and background (out of 10 possible classes), and the label is a tuple of two numbers describing the individual (object, background) labels.
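    The .npy files named above load directly with NumPy; a minimal sketch (array shapes and dtypes are not documented here, so the script just inspects them):

    import numpy as np

    # Non-mixed version: object and background labels coincide.
    data = np.load("uniform_data_64.npy")
    labels = np.load("uniform_labels_64.npy")
    print(data.shape, labels.shape)

    # Mixed version: each label is an (object, background) pair.
    mixed_data = np.load("mashed_data_64.npy")
    mixed_labels = np.load("mashed_labels_64.npy")
    print(mixed_data.shape, mixed_labels[0])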

  16. ref_coco

    • tensorflow.org
    • opendatalab.com
    Updated May 31, 2024
    Cite
    (2024). ref_coco [Dataset]. https://www.tensorflow.org/datasets/catalog/ref_coco
    Explore at:
    Dataset updated
    May 31, 2024
    Description

    A collection of 3 referring expression datasets based off images in the COCO dataset. A referring expression is a piece of text that describes a unique object in an image. These datasets are collected by asking human raters to disambiguate objects delineated by bounding boxes in the COCO dataset.

    RefCoco and RefCoco+ are from Kazemzadeh et al. 2014. RefCoco+ expressions are strictly appearance-based descriptions, which they enforced by preventing raters from using location-based descriptions (e.g., "person to the right" is not a valid description for RefCoco+). RefCocoG is from Mao et al. 2016, and has richer descriptions of objects compared to RefCoco due to differences in the annotation process. In particular, RefCoco was collected in an interactive game-based setting, while RefCocoG was collected in a non-interactive setting. On average, RefCocoG has 8.4 words per expression while RefCoco has 3.5 words.

    Each dataset has different split allocations that are typically all reported in papers. The "testA" and "testB" sets in RefCoco and RefCoco+ contain only people and only non-people respectively. Images are partitioned into the various splits. In the "google" split, objects, not images, are partitioned between the train and non-train splits. This means that the same image can appear in both the train and validation split, but the objects being referred to in the image will be different between the two sets. In contrast, the "unc" and "umd" splits partition images between the train, validation, and test split. In RefCocoG, the "google" split does not have a canonical test set, and the validation set is typically reported in papers as "val*".

    Stats for each dataset and split ("refs" is the number of referring expressions, and "images" is the number of images):

    dataset    partition  split   refs    images
    refcoco    google     train   40000   19213
    refcoco    google     val      5000    4559
    refcoco    google     test     5000    4527
    refcoco    unc        train   42404   16994
    refcoco    unc        val      3811    1500
    refcoco    unc        testA    1975     750
    refcoco    unc        testB    1810     750
    refcoco+   unc        train   42278   16992
    refcoco+   unc        val      3805    1500
    refcoco+   unc        testA    1975     750
    refcoco+   unc        testB    1798     750
    refcocog   google     train   44822   24698
    refcocog   google     val      5000    4650
    refcocog   umd        train   42226   21899
    refcocog   umd        val      2573    1300
    refcocog   umd        test     5023    2600

    To use this dataset:

    import tensorflow_datasets as tfds
    
    ds = tfds.load('ref_coco', split='train')
    for ex in ds.take(4):
        print(ex)
    

    See the guide for more information on tensorflow_datasets.

    Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/ref_coco-refcoco_unc-1.1.0.png

  17. Characteristics of COCO data-set.

    • plos.figshare.com
    xls
    Updated May 31, 2023
    Cite
    Asra Khalid; Karsten Lundqvist; Anne Yates; Mustansar Ali Ghzanfar (2023). Characteristics of COCO data-set. [Dataset]. http://doi.org/10.1371/journal.pone.0245485.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Asra Khalid; Karsten Lundqvist; Anne Yates; Mustansar Ali Ghzanfar
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Characteristics of COCO data-set.

  18. Mechanical Parts Dataset 2022

    • zenodo.org
    Updated Jan 5, 2023
    Cite
    Mübarek Mazhar Çakır; Mübarek Mazhar Çakır (2023). Mechanical Parts Dataset 2022 [Dataset]. http://doi.org/10.5281/zenodo.7504801
    Explore at:
    Dataset updated
    Jan 5, 2023
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Mübarek Mazhar Çakır; Mübarek Mazhar Çakır
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Mechanical Parts Dataset

    The dataset consists of a total of 2250 images obtained by downloading from various internet platforms. Among the images in the dataset, there are 714 images with bearings, 632 images with bolts, 616 images with gears and 586 images with nuts. A total of 10597 manual labeling processes were carried out in the dataset, including 2099 labels belonging to the bearing class, 2734 labels belonging to the bolt class, 2662 labels belonging to the gear class and 3102 labels belonging to the nut class.

    Folder Content

    The dataset is split into 80% train, 10% validation and 10% test. In the "Mechanical Parts Dataset" folder, there are three separate folders named "train", "test" and "val". Each of these three folders contains folders named "images" and "labels". Images are kept in the "images" folder and label information is kept in the "labels" folder.

    Finally, inside the folder there is a yaml file named "mech_parts_data" for the Yolo algorithm. This file contains the number of classes and class names.

    Images and Labels

    The dataset was prepared in accordance with the YOLOv5 algorithm.
    For example, the label information for the image named "2a0xhkr_jpg.rf.45a11bf63c40ad6e47da384fdf6bb7a1.jpg" is stored in the txt file with the same name. The label information (coordinates) in the txt file is as follows: "class x_center y_center width height".
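    A small sketch of reading one of those label files (the relative path is illustrative; class IDs index into the names listed in the bundled yaml file):

    from pathlib import Path

    # Label file named after its image, as described above (path is illustrative).
    label_file = Path("Mechanical Parts Dataset/train/labels/2a0xhkr_jpg.rf.45a11bf63c40ad6e47da384fdf6bb7a1.txt")

    for line in label_file.read_text().splitlines():
        class_id, x_center, y_center, width, height = line.split()
        # YOLO coordinates are normalized to [0, 1] relative to the image size.
        print(int(class_id), float(x_center), float(y_center), float(width), float(height))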

    Update 05.01.2023

    ***Pascal VOC and COCO JSON formats have been added.***

    Related paper: doi.org/10.5281/zenodo.7496767

  19. One-hot endcoding for flickr_imagenet_coco

    • figshare.com
    bin
    Updated May 24, 2020
    Cite
    Shahi Dost (2020). One-hot endcoding for flickr_imagenet_coco [Dataset]. http://doi.org/10.6084/m9.figshare.12363965.v1
    Explore at:
    Available download formats: bin
    Dataset updated
    May 24, 2020
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Shahi Dost
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dictionary files for one-hot encoding of flickr, imagenet and coco classes.

  20. CoCo

    • huggingface.co
    Updated Mar 23, 2025
    Cite
    Yuzhe Gu (2025). CoCo [Dataset]. https://huggingface.co/datasets/Tracygu/CoCo
    Explore at:
    Dataset updated
    Mar 23, 2025
    Authors
    Yuzhe Gu
    Description

    This is a subset of the CoCo2017 dataset. It is used for image classification and class-conditional image generation.

    task_categories: image-classification
    size_categories: 10K<n<100K