License: https://choosealicense.com/licenses/cc/
Dataset Card for paper-parts
**The original COCO dataset is stored at dataset.tar.gz**
Dataset Summary
paper-parts
Supported Tasks and Leaderboards
object-detection: The dataset can be used to train a model for Object Detection.
Languages
English
Dataset Structure
Data Instances
A data point comprises an image and its object annotations. { 'image_id': 15, 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB… See the full description on the dataset page: https://huggingface.co/datasets/Francesco/paper-parts.
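A minimal loading sketch, assuming the standard Hugging Face `datasets` API; the repository ID comes from the page URL above, and the split name is an assumption:

```python
from datasets import load_dataset

# Repository ID taken from the dataset page URL; the split name is an assumption.
ds = load_dataset("Francesco/paper-parts", split="train")

example = ds[0]
print(example["image_id"], example["image"].size)
```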
This dataset is part of the paper "Zero-shot Instance Segmentation (ZSI)" (paper link: ZSI).
The dataset consists of the following items:
1. MS COCO 2014 (training and validation images only)
2. MS COCO 2014 annotations (the special split annotations described in the paper)
I uploaded the dataset to run the implementation of this paper. The GitHub code of the paper: ZSI
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Coco Rock Paper Scissor is a dataset for object detection tasks - it contains Hands annotations for 806 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
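A download sketch using the Roboflow Python package; the API key, workspace, and project identifiers below are placeholders, not values taken from this dataset's page:

```python
from roboflow import Roboflow

# All identifiers here are placeholders -- substitute your own API key and
# the workspace/project slugs shown on the dataset's Roboflow page.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("coco-rock-paper-scissor")
dataset = project.version(1).download("coco")  # download in COCO format
print(dataset.location)
```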
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
A collection of 3 referring expression datasets based on images in the COCO dataset. A referring expression is a piece of text that describes a unique object in an image. These datasets are collected by asking human raters to disambiguate objects delineated by bounding boxes in the COCO dataset.
RefCoco and RefCoco+ are from Kazemzadeh et al. 2014. RefCoco+ expressions are strictly appearance-based descriptions, which the authors enforced by preventing raters from using location-based descriptions (e.g., "person to the right" is not a valid description for RefCoco+). RefCocoG is from Mao et al. 2016 and has richer descriptions of objects than RefCoco due to differences in the annotation process. In particular, RefCoco was collected in an interactive game-based setting, while RefCocoG was collected in a non-interactive setting. On average, RefCocoG has 8.4 words per expression while RefCoco has 3.5 words.
Each dataset has different split allocations that are typically all reported in papers. The "testA" and "testB" sets in RefCoco and RefCoco+ contain only people and only non-people respectively. Images are partitioned into the various splits. In the "google" split, objects, not images, are partitioned between the train and non-train splits. This means that the same image can appear in both the train and validation split, but the objects being referred to in the image will be different between the two sets. In contrast, the "unc" and "umd" splits partition images between the train, validation, and test split. In RefCocoG, the "google" split does not have a canonical test set, and the validation set is typically reported in papers as "val*".
Stats for each dataset and split ("refs" is the number of referring expressions, and "images" is the number of images):
| dataset | partition | split | refs | images |
|---|---|---|---|---|
| refcoco | google | train | 40000 | 19213 |
| refcoco | google | val | 5000 | 4559 |
| refcoco | google | test | 5000 | 4527 |
| refcoco | unc | train | 42404 | 16994 |
| refcoco | unc | val | 3811 | 1500 |
| refcoco | unc | testA | 1975 | 750 |
| refcoco | unc | testB | 1810 | 750 |
| refcoco+ | unc | train | 42278 | 16992 |
| refcoco+ | unc | val | 3805 | 1500 |
| refcoco+ | unc | testA | 1975 | 750 |
| refcoco+ | unc | testB | 1798 | 750 |
| refcocog | google | train | 44822 | 24698 |
| refcocog | google | val | 5000 | 4650 |
| refcocog | umd | train | 42226 | 21899 |
| refcocog | umd | val | 2573 | 1300 |
| refcocog | umd | test | 5023 | 2600 |
To use this dataset:

```python
import tensorflow_datasets as tfds

ds = tfds.load('ref_coco', split='train')
for ex in ds.take(4):
    print(ex)
```
See the guide for more information on tensorflow_datasets.
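The partitions in the table above correspond to builder configs. A minimal sketch for loading one of them explicitly; the config name `refcoco_unc` is assumed from the TFDS catalog naming, and the feature structure is inspected rather than assumed:

```python
import tensorflow_datasets as tfds

# Load the UNC partition of RefCoco and its people-only test split ('testA',
# per the split table above). Config name assumed from the TFDS catalog.
ds = tfds.load('ref_coco/refcoco_unc', split='testA')

for ex in ds.take(1):
    # Print the feature keys rather than assuming their exact names.
    print(sorted(ex.keys()))
```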
<img src="https://storage.googleapis.com/tfds-data/visualization/fig/ref_coco-refcoco_unc-1.1.0.png" alt="Visualization" width="500px">
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
MTTN (read: mutton) is a dataset aimed at text-to-text generation, with a focus on diffusion-model prompts. A model trained on this data will be able to fill gaps and create natural prompts that can be used to generate images directly.
MTTN is derived from several popular text datasets, including MS-COCO and Flickr.
The increased interest in diffusion models has opened up opportunities for advancements in generative text modeling. These models can produce impressive images when given a well-crafted prompt, but creating a powerful or meaningful prompt can be hit-or-miss. To address this, we have created a large-scale dataset that is derived and synthesized from real prompts and indexed with popular image-text datasets such as MS-COCO and Flickr. We have also implemented stages that gradually reduce context and increase complexity, which will further enhance the output due to the complex annotations created. The dataset, called MTTN, includes over 2.4 million sentences divided into 5 stages, resulting in a total of over 12 million pairs, and a vocabulary of over 300,000 unique words, providing ample variation. The original 2.4 million pairs are designed to reflect the way language is used on the internet globally, making the dataset more robust for any model trained on it.
To form MTTN, the data was cleaned of any trailing ASCII values of special characters, after which emojis were removed. Finally, the dataset was stripped step by step until only the subjects and objects of the sentences remained.
The MTTN paper can be accessed from here; see also GitHub and Papers With Code.
All the subsets are available in JSON format and can be loaded in Python as follows:

```python
import pandas as pd

df = pd.read_json('downloaded_json_file_path', orient='split', compression='infer')
```
@misc{https://doi.org/10.48550/arxiv.2301.10172,
  doi = {10.48550/ARXIV.2301.10172},
  url = {https://arxiv.org/abs/2301.10172},
  author = {Ghosh, Archan and Ghosh, Debgandhar and Maji, Madhurima and Chanda, Suchinta and Goswami, Kalporup},
  keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences},
  title = {MTTN: Multi-Pair Text to Text Narratives for Prompt Generation},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution Share Alike 4.0 International}
}
This dataset contains preprocessed annotations for the IP102 Insect Pest Recognition Dataset converted to COCO format, making it ready for object detection models like DETR, Faster R-CNN, YOLO, and other modern detectors.
IP102 is a large-scale benchmark dataset for insect pest recognition containing:
- 75,222 images of insect pests
- 102 categories of agricultural pests
- Images collected from real agricultural scenarios
This dataset provides:
- train_annotations.json - Training set annotations in COCO format
- val_annotations.json - Validation set annotations in COCO format
- test_annotations.json (optional) - Test set annotations
Annotations follow the standard COCO Object Detection format:
```json
{
"images": [
{
"id": 1,
"file_name": "image_001.jpg",
"width": 640,
"height": 480
}
],
"annotations": [
{
"id": 1,
"image_id": 1,
"category_id": 5,
"bbox": [x, y, width, height],
"area": 12345,
"iscrowd": 0
}
],
"categories": [
{
"id": 1,
"name": "rice_leaf_roller",
"supercategory": "insect"
}
]
}
```
```python
import json
from pycocotools.coco import COCO

# Load annotations directly from JSON
with open('/kaggle/input/ip102-coco-annotations/train_annotations.json') as f:
    coco_data = json.load(f)

# Or use the COCO API
coco = COCO('/kaggle/input/ip102-coco-annotations/train_annotations.json')

print(f"Number of images: {len(coco_data['images'])}")
print(f"Number of annotations: {len(coco_data['annotations'])}")
print(f"Number of categories: {len(coco_data['categories'])}")
```
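A short sketch of browsing individual annotations with the COCO API; it reuses the same annotation file path as above, and category names come from the annotation file itself:

```python
from pycocotools.coco import COCO

coco = COCO('/kaggle/input/ip102-coco-annotations/train_annotations.json')

# Pick the first image and print its annotations with category names.
img_id = coco.getImgIds()[0]
ann_ids = coco.getAnnIds(imgIds=[img_id])
for ann in coco.loadAnns(ann_ids):
    cat = coco.loadCats([ann['category_id']])[0]
    x, y, w, h = ann['bbox']
    print(f"{cat['name']}: bbox=({x:.0f}, {y:.0f}, {w:.0f}, {h:.0f})")
```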
If you use this dataset, please cite the original IP102 paper:
@article{wu2019ip102,
title={IP102: A Large-Scale Benchmark Dataset for Insect Pest Recognition},
author={Wu, Xiaoping and Zhan, Chi and Lai, Yu-Kun and Cheng, Ming-Ming and Yang, Jufeng},
journal={CVPR},
year={2019}
}
Original dataset by Wu et al. (CVPR 2019). This is a format conversion for easier integration with modern detection frameworks.
Ready to train your insect detection model! 🐛🔍
Tags: object detection, computer vision, agriculture, coco format, insect recognition, pest detection, deep learning, detr, dataset, annotation
License: CC BY-NC-SA 4.0 (same as original IP102)
Database: Open Database, Contents: © Original Authors
COCO 2014 DensePose Relabeling with Body Parts
This dataset is formatted for Ultralytics YOLO and is ready for training. Important: update the paths in the YAML file inside the dataset folder.
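A minimal training sketch with the Ultralytics API, assuming the dataset's YAML has already been edited as noted above; the YAML path and the choice of base model are placeholders:

```python
from ultralytics import YOLO

# Base model and YAML path are placeholders -- point them at your local copies.
model = YOLO("yolov8n.pt")
model.train(data="path/to/dataset/data.yaml", epochs=100, imgsz=640)
```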
Demo
Here is what inference looks like:
Based on:
GitHub Repository Paper
Classes:
{ 1: "Person", 2: "Torso", 3: "Hand", 4: "Foot", 5: "Upper Leg", 6: "Lower Leg", 7: "Upper Arm", 8: "Lower Arm", 9: "Head" }… See the full description on the dataset page: https://huggingface.co/datasets/Xuban/coco_body_part.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
ComCo & SimCo Datasets
🔗 GitHub Project Page | 📄 arXiv Paper
Overview
This repository contains two datasets, ComCo and SimCo, designed for evaluating multi-object representation in Vision-Language Models (VLMs). These datasets provide controlled environments for analyzing model biases, object recognition, and compositionality in multi-object scenarios.
ComCo: Composed of real-world objects derived from the COCO dataset. SimCo: Contains simple geometric shapes in… See the full description on the dataset page: https://huggingface.co/datasets/clip-oscope/simco-comco.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description from the SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery GitHub repository (* the "Note" was added by the Roboflow team).
This is a single-class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspected may be a paraglider wing, missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison. The missing wing, as it was found after 3 weeks, is shown below.
<img src="https://michaeltpublic.s3.amazonaws.com/images/anomaly_small.jpg" alt="anomaly">
The dataset contains the following:
| Set | Images | Annotations |
|---|---|---|
| Train | 1808 | 3048 |
| Validate | 490 | 747 |
| Test | 254 | 411 |
| Total | 2552 | 4206 |
The data is in the COCO format and is directly compatible with Faster R-CNN as implemented in Facebook's Detectron2 (see the registration sketch after the download steps below).
Download the data here: sarnet.zip
Or follow these steps
```bash
# download the dataset
wget https://michaeltpublic.s3.amazonaws.com/sarnet.zip

# extract the files
unzip sarnet.zip
```
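Once extracted, the COCO-format annotations can be registered with Detectron2. A minimal sketch, with hypothetical file and folder names (adjust them to the actual layout of sarnet.zip):

```python
from detectron2.data.datasets import register_coco_instances

# Paths below are assumptions about the archive layout, not verified names.
register_coco_instances(
    "sarnet_train",                    # dataset name to use in the Detectron2 config
    {},                                # no extra metadata
    "sarnet/annotations/train.json",   # assumed COCO-format annotation file
    "sarnet/images/train",             # assumed image folder
)
```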
**Note:** with Roboflow, you can download the data here (original, raw images, with annotations): https://universe.roboflow.com/roboflow-public/sarnet-search-and-rescue/ (download v1, original_raw-images). Download the dataset in COCO JSON format, or another format of your choice, and import it into Roboflow after unzipping the folder to get started on your project.
Get started with a Faster R-CNN model pretrained on SaRNet: SaRNet_Demo.ipynb
Source code for the paper is located here: SaRNet_train_test.ipynb
@misc{thoreau2021sarnet,
title={SaRNet: A Dataset for Deep Learning Assisted Search and Rescue with Satellite Imagery},
author={Michael Thoreau and Frazer Wilson},
year={2021},
eprint={2107.12469},
archivePrefix={arXiv},
primaryClass={eess.IV}
}
The source data was generously provided by Planet Labs, Airbus Defence and Space, and Maxar Technologies.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
TF-ID arXiv papers dataset
This is the dataset for finetuning TF-ID models. It contains about 4,600 images (academic paper pages) with bounding boxes of tables and figures in COCO format. The papers are selected from Hugging Face Daily Papers, covering mostly AI/ML/DL-related topics. You can use this dataset to reproduce all TF-ID models. All bounding boxes were annotated manually by Yifei Hu.
Project Repo
github.com/ai8hyf/TF-ID
Variants
Unzip the… See the full description on the dataset page: https://huggingface.co/datasets/yifeihu/TF-ID-arxiv-papers.
Digital image forensics has gained a lot of attention as it is becoming easier for anyone to make forged images. Several areas are affected by image manipulation: a doctored image can increase the credibility of fake news, and impostors can use morphed images to pretend to be someone else.
It has become critically important to be able to recognize the manipulations an image has undergone. To do this, the first requirement is reliable and controlled datasets representing the most characteristic cases encountered. The purpose of this work is to lay the foundations of a body of tests allowing both the qualification of automatic methods for authentication and manipulation detection and the training of these methods.
This dataset contains about 25,000 object-removal forgeries, available under the inpainting directory. Each object removal is accompanied by two binary masks: one under the probe_mask subdirectory indicates the location of the forgery, and one under the inpaint_mask subdirectory is the mask used by the inpainting algorithm.
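A minimal sketch of pairing a forgery with its probe mask; the file names below are hypothetical and only illustrate the directory structure described above:

```python
from pathlib import Path
from PIL import Image

# Hypothetical file names -- only the inpainting/ and probe_mask/ layout
# comes from the dataset description above.
forgery_path = Path("inpainting/img/example_forgery.tif")
mask_path = Path("inpainting/probe_mask/example_forgery.tif")

forgery = Image.open(forgery_path)
mask = Image.open(mask_path).convert("L")  # binary mask marking the forged region
print(forgery.size, mask.size)
```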
If you use this dataset for your research, please refer to the original paper :
@INPROCEEDINGS{DEFACTODataset,
  author = {Gaël Mahfoudi and Badr Tajini and Florent Retraint and Fr{\'e}d{\'e}ric Morain-Nicolier and Jean Luc Dugelay and Marc Pic},
  title = {{DEFACTO:} Image and Face Manipulation Dataset},
  booktitle = {27th European Signal Processing Conference (EUSIPCO 2019)},
  address = {A Coruña, Spain},
  days = 1,
  month = sep,
  year = 2019
}
and to the MSCOCO dataset
The DEFACTO Consortium does not own the copyright of those images. Please refer to the MSCOCO terms of use for all images based on their Dataset.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This repository stores the COCO version of the DWTAL (Deformable Wireframe) dataset described in the paper at https://arxiv.org/abs/2504.20682. DWTAL-s is a smaller-scale version in which the deformation table is simpler, while DWTAL-l is a larger-scale version in which the deformation table is more complex. The code from the paper is open source at https://github.com/justliulong/OGHFYOLO.
Digital image forensics has gained a lot of attention as it is becoming easier for anyone to make forged images. Several areas are affected by image manipulation: a doctored image can increase the credibility of fake news, and impostors can use morphed images to pretend to be someone else.
It has become critically important to be able to recognize the manipulations an image has undergone. To do this, the first requirement is reliable and controlled datasets representing the most characteristic cases encountered. The purpose of this work is to lay the foundations of a body of tests allowing both the qualification of automatic methods for authentication and manipulation detection and the training of these methods.
This dataset contains about 40,000 face-morphing and 40,000 face-swapping forgeries, available under the face morphing directory. Each face morphing and swapping is accompanied by two binary masks: one under the probe_mask subdirectory indicates the location of the forgery, and one under the donor_mask subdirectory indicates the location of the source. The external image can be found in the JSON file under the graph subdirectory.
If you use this dataset for your research, please refer to the original paper :
@INPROCEEDINGS{DEFACTODataset,
  author = {Gaël Mahfoudi and Badr Tajini and Florent Retraint and Fr{\'e}d{\'e}ric Morain-Nicolier and Jean Luc Dugelay and Marc Pic},
  title = {{DEFACTO:} Image and Face Manipulation Dataset},
  booktitle = {27th European Signal Processing Conference (EUSIPCO 2019)},
  address = {A Coruña, Spain},
  days = 1,
  month = sep,
  year = 2019
}
and to the MSCOCO dataset
The DEFACTO Consortium does not own the copyright of those images. This dataset contains images of persons gathered from IMDb. If any of these images belongs to you and you wish it to be removed, contact us at defacto.dataset@gmail.com.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
FMIYC (Find Me If You Can) Dataset
Dataset Description
Paper: FindMeIfYouCan: Bringing Open Set metrics to near, far and farther Out-of-Distribution Object detection The FMIYC (Find Me If You Can) dataset is designed for Out-Of-Distribution (OOD) Object Detection tasks. It comprises images and annotations derived and adapted from the COCO (Common Objects in Context) and OpenImages datasets. The FMIYC dataset curates these sources into new evaluation splits categorized as… See the full description on the dataset page: https://huggingface.co/datasets/CEAai/FindMeIfYouCan.
Digital image forensics has gained a lot of attention as it is becoming easier for anyone to make forged images. Several areas are affected by image manipulation: a doctored image can increase the credibility of fake news, and impostors can use morphed images to pretend to be someone else.
It has become critically important to be able to recognize the manipulations an image has undergone. To do this, the first requirement is reliable and controlled datasets representing the most characteristic cases encountered. The purpose of this work is to lay the foundations of a body of tests allowing both the qualification of automatic methods for authentication and manipulation detection and the training of these methods.
This dataset contains about 105,000 splicing forgeries, available under the splicing directory. Each splicing is accompanied by two binary masks: one under the probe_mask subdirectory indicates the location of the forgery, and one under the donor_mask subdirectory indicates the location of the source. The external image can be found in the JSON file under the graph subdirectory.
If you use this dataset for your research, please refer to the original paper :
@INPROCEEDINGS{DEFACTODataset,
  author = {Gaël Mahfoudi and Badr Tajini and Florent Retraint and Fr{\'e}d{\'e}ric Morain-Nicolier and Jean Luc Dugelay and Marc Pic},
  title = {{DEFACTO:} Image and Face Manipulation Dataset},
  booktitle = {27th European Signal Processing Conference (EUSIPCO 2019)},
  address = {A Coruña, Spain},
  days = 1,
  month = sep,
  year = 2019
}
and to the MSCOCO dataset
The DEFACTO Consortium does not own the copyright of those images. Please refer to the MSCOCO terms of use for all images based on their Dataset.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
We have published a MOT expansion to the Brackish Dataset with 9 new sequences, a framework for generating synthetic training sequences, and ground truth annotations in MOTChallenge format. The dataset will be presented at the Scandinavian Conference on Image Processing in Levi, Finland, in April 2023. You can read more about the dataset in the paper BrackishMOT: The Brackish Multi-Object Tracking Dataset
The dataset was updated on August 25, 2020, to fix a range of false-negative annotations. Approximately 14,000 new annotations have been added. Choose Version 1 if you want the old version of the dataset used in the CVPRW paper.
This is the first publicly available European underwater image dataset with bounding box annotations of fish, crabs, and other marine organisms. It has been recorded in Limfjorden, which is a brackish strait that runs through Aalborg in the northern part of Denmark.
The camera setup used for capturing the data consists of three cameras and three LED lights mounted permanently on a concrete pillar of the Limfjords bridge. However, only data from a single camera has so far been annotated and published; more will be added during 2019.
The setup is located 9 m below the surface, and a single LED light has been turned on during all the recordings, which explains some slightly odd behaviors of the animals, such as the schooling of the sticklebacks directly in front of the camera.
The videos contain two objects that have been used for other research purposes, however, they have not been annotated:
<img src="https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3385658%2F049ce4a001752ad74f29909fed4e6f85%2Fbuoy.jpg?generation=1561548298721139&alt=media" alt="buoy">
<img src="https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3385658%2Fdf66ac12dc0281c7ef6f230c183f8cc0%2Ftest_pattern.jpg?generation=1561549026868633&alt=media" alt="test pattern">
For more information about the camera setup and dataset see our paper Detection of Marine Animals in a New Underwater Dataset with Varying Visibility from the workshop on Automated Analysis of Marine Video for Environmental Monitoring CVPR2019.
89 videos are provided with annotations in the AAU Bounding Box, YOLO Darknet, and MS COCO formats. Fish are annotated in 6 different coarse categories
fish
small_fish
crab
shrimp
jellyfish
starfish
The videos are separated into folders based on their predominant labeled occurrence, but this does not mean that only that label is present in the videos.
NOTICE: Only the first 200 frames in 2019-03-19_17-07-53to2019-03-19_17-08-34_1.avi and first 100 frames in 2019-03-19_18-01-56to2019-03-19_18-02-13_1.avi are annotated. All other videos are fully annotated.
The data is split into training, validation, and test sets following an 80/10/10 split. The splits are provided in text files (train.txt, valid.txt, test.txt), each containing the filenames of the frames in that split.
Scripts are provided to convert the videos into frames (see the sketch below). The frames are extracted using ffmpeg, and the width and height of the frames are halved. Scripts for converting from AAU Bounding Box to MS COCO, from VIAME CSV to MS COCO, and from AAU Bounding Box to YOLO are also provided.
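A rough equivalent of the frame-extraction step, assuming ffmpeg is installed; this is a sketch, not the dataset's own script, and the output layout is an assumption:

```python
import subprocess
from pathlib import Path

# Video file name taken from the annotation notice above; output layout is an assumption.
video = Path("videos/2019-03-19_17-07-53to2019-03-19_17-08-34_1.avi")
out_dir = Path("frames") / video.stem
out_dir.mkdir(parents=True, exist_ok=True)

subprocess.run([
    "ffmpeg", "-i", str(video),
    "-vf", "scale=iw/2:ih/2",       # halve the frame width and height
    str(out_dir / "%06d.png"),
], check=True)
```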
Code and scripts used for analysis of the data can be found at our bitbucket.
Please cite the following paper if you find the dataset useful:
@InProceedings{pedersen2019brackish,
title={Detection of Marine Animals in a New Underwater Dataset with Varying Visibility},
author={Pedersen, Malte and Haurum, Joakim Bruslund and Gade, Rikke and Moeslund, Thomas B. and Madsen, Niels},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}
}