## Overview
Ball Video Segmentation is a dataset for object detection tasks; it contains Ball Pitch Net annotations for 498 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
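If you prefer scripting the download, here is a minimal sketch using the `roboflow` Python package; the API key, workspace and project slugs, version number, and export format are placeholders to replace with the values shown on the dataset's Roboflow page.

```python
from roboflow import Roboflow

# Placeholder credentials and slugs -- substitute your own.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("ball-video-segmentation")

# Download one version of the dataset in a chosen export format (e.g. COCO).
dataset = project.version(1).download("coco")
print(dataset.location)  # local directory with images and annotations
```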
The Densely Annotated VIdeo Segmentation (DAVIS) dataset is a high-quality, high-resolution, densely annotated video segmentation dataset available in two resolutions, 480p and 1080p. It contains 50 video sequences with 3,455 frames densely annotated at the pixel level: 30 videos with 2,079 frames for training and 20 videos with 1,376 frames for validation.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
Semi-supervised video object segmentation aims to leverage the ground-truth object mask given for the first frame to segment the rest of the video sequence at the pixel level. OVOS is a dataset for evaluating video object segmentation performance under occlusions.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
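For reference, segmentation quality in this semi-supervised setting is commonly reported with the region-similarity measure J (mask IoU). Below is a minimal per-frame sketch; the array names are illustrative and not from the dataset's own tooling.

```python
import numpy as np

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """Jaccard index (J) between two boolean masks of the same shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else float(inter) / float(union)

# The first-frame mask is given in the semi-supervised protocol, so
# scoring typically starts from the second frame of each sequence:
# scores = [region_similarity(p, g) for p, g in zip(preds[1:], gts[1:])]
```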
This dataset contains flood data from the city of Parepare, South Sulawesi Province, collected as video from the social media platform Instagram. It was created to support deep learning methods for recognizing floods and surrounding objects, with a focus on semantic segmentation. The dataset consists of three folders:
- raw video data collected from Instagram,
- image data produced by splitting the videos into frames, and
- annotation data containing images color-labeled by object.

There are six object classes, identified by color label: floods (light blue), buildings (red), plants (green), people (sage), vehicles (orange), and sky (dark blue). Data are provided in image (JPEG/PNG) and video (MP4) formats. The dataset is suited to object recognition with semantic segmentation, and because it includes the original videos and images, it can be extended for other purposes. If you intend to use this dataset, please ensure that you comply with applicable copyright, privacy, and regulatory requirements. The accompanying paper is available at https://doi.org/10.1016/j.dib.2023.109768
License: MIT License, https://opensource.org/licenses/MIT
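To train on the color-labeled annotations above, a common first step is converting each RGB mask into a class-index map. The sketch below assumes hypothetical RGB values for the six colors named in the description; verify them against the actual annotation files before use.

```python
import numpy as np
from PIL import Image

# Assumed RGB values for the six color labels -- verify against the data.
COLOR_TO_CLASS = {
    (173, 216, 230): 0,  # flood (light blue)
    (255, 0, 0):     1,  # building (red)
    (0, 128, 0):     2,  # plant (green)
    (178, 190, 138): 3,  # person (sage)
    (255, 165, 0):   4,  # vehicle (orange)
    (0, 0, 139):     5,  # sky (dark blue)
}

def color_mask_to_indices(path: str) -> np.ndarray:
    """Convert a color-labeled annotation image to a class-index map."""
    rgb = np.array(Image.open(path).convert("RGB"))
    out = np.full(rgb.shape[:2], 255, dtype=np.uint8)  # 255 = unlabeled
    for color, idx in COLOR_TO_CLASS.items():
        out[np.all(rgb == np.array(color), axis=-1)] = idx
    return out
```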
We propose a new benchmark called Human Video Instance Segmentation (HVIS), which focuses on complex real-world scenarios with a rich supply of human instance masks and identities. The dataset contains 805 videos with 1,447 human instances annotated in detail, and it includes a variety of overlapping scenes, making it the most challenging human-centric video dataset to date.
YouTube-VIS is a video instance segmentation dataset. It contains 2,883 high-resolution YouTube videos, a per-pixel category label set covering 40 common objects such as people, animals, and vehicles, 4,883 unique video instances, and 131k high-quality manual annotations.
The YouTube-VIS dataset is split into 2,238 training videos, 302 validation videos and 343 test videos.
No files were removed or altered during preprocessing.
To use this dataset:

```python
import tensorflow_datasets as tfds

# Load the training split of YouTube-VIS.
ds = tfds.load('youtube_vis', split='train')

# Inspect the first few examples.
for ex in ds.take(4):
    print(ex)
```

See the guide for more information on tensorflow_datasets.
Panoramic Video Panoptic Segmentation Dataset is a large-scale dataset that offers high-quality panoptic segmentation labels for autonomous driving. The dataset has labels for 28 semantic categories and 2,860 temporal sequences that were captured by five cameras mounted on autonomous vehicles driving in three different geographical locations, leading to a total of 100k labeled camera images.
YouTube-VOS is a video object segmentation dataset that contains 4,453 videos: 3,471 for training, 474 for validation, and 508 for testing. The training and validation videos have pixel-level ground-truth annotations for every 5th frame (6 fps), and instance segmentation annotations are also provided. The dataset has more than 7,800 unique objects, 190k high-quality manual annotations, and more than 340 minutes of video.
A semi-supervised video object segmentation dataset containing long videos. Released with NeurIPS 2020 paper "Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement".
We randomly selected three videos from the Internet that are longer than 1.5K frames and whose main objects appear continuously. Each video has 20 uniformly sampled frames manually annotated for evaluation:
- blueboy: 2,406 frames
- dressage: 3,589 frames
- rat: 1,416 frames
License: CC0 1.0 Universal Public Domain Dedication, https://creativecommons.org/publicdomain/zero/1.0/
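As a sketch of what "uniformly sampled" can mean here (the authors' exact sampling scheme may differ), evenly spaced frame indices can be drawn with `np.linspace`:

```python
import numpy as np

def uniform_sample_indices(n_frames: int, k: int = 20) -> np.ndarray:
    """k frame indices spread evenly across a video of n_frames."""
    return np.linspace(0, n_frames - 1, num=k).round().astype(int)

# Frame counts from the dataset description above.
for name, n in [("blueboy", 2406), ("dressage", 3589), ("rat", 1416)]:
    print(name, uniform_sample_indices(n)[:4], "...")
```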
The videos were collected from the Internet, with an average length of around 10 s and a resolution greater than 1920 x 1080.
This dataset is designed for water segmentation in images and videos. We follow the format of the DAVIS dataset, which has been widely adopted in video object segmentation (VOS) benchmarks.
Specifically, JPEGImages contains the water-related frames and Annotations contains the corresponding ground-truth masks; train.txt lists the names of the training videos, and val.txt lists the names of the evaluation videos.
There are two versions, water_v1 and water_v2; we recommend the newer water_v2.
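A minimal sketch for walking the DAVIS-style layout described above, assuming the dataset root is `water_v2` and that val.txt sits at the root (in the original DAVIS layout the split lists live under ImageSets; adjust the path if needed):

```python
from pathlib import Path

root = Path("water_v2")  # placeholder for wherever the dataset is extracted
val_videos = (root / "val.txt").read_text().split()

for video in val_videos:
    frames = sorted((root / "JPEGImages" / video).glob("*.jpg"))
    masks = sorted((root / "Annotations" / video).glob("*.png"))
    print(video, len(frames), "frames,", len(masks), "masks")
```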
@article{liang2020waternet, title={WaterNet: An adaptive matching pipeline for segmenting water with volatile appearance}, author={Liang, Yongqing and Jafari, Navid and Luo, Xing and Chen, Qin and Cao, Yanpeng and Li, Xin}, journal={Computational Visual Media}, pages={1--14}, year={2020}, publisher={Springer} }
VISOR is a dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, and it contains 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, and 67K hand-object relations, covering 36 hours of 179 untrimmed videos.
The YouTube-VOS dataset is a sequence-to-sequence video object segmentation dataset, with 1000 videos and 1000 frames per video.
The 2018 DAVIS Challenge on Video Object Segmentation
MeViS is a large-scale dataset for motion expressions guided video segmentation, which focuses on segmenting objects in video content based on a sentence describing the motion of the objects. The dataset contains numerous motion expressions to indicate target objects in complex environments.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
A subset of video frames captured by the SOCRATES stereo camera trap in a wildlife park in Bonn, Germany between February and July of 2022, with corresponding instance segmentation annotations in the COCO format.
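Since the annotations follow the COCO format, they can be read with the standard `pycocotools` API; the annotation filename below is a placeholder:

```python
from pycocotools.coco import COCO

coco = COCO("instances.json")  # placeholder path to the COCO annotation file
img_id = coco.getImgIds()[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))

# Decode each instance's segmentation into a binary H x W mask.
masks = [coco.annToMask(a) for a in anns]
print(len(masks), "instances in image", img_id)
```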
YouTube-VIS is a new dataset tailored for tasks such as simultaneous detection, segmentation, and tracking of object instances in videos. It is built on YouTube-VOS, currently the largest video object segmentation dataset.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
This repository includes supplementary data supporting our scoping review on deep learning for surgical video segmentation and object detection.
Occluded video instance segmentation requires consistently segmenting and tracking objects over time. Because the cost of self-attention grows quadratically with input size, applying it directly to high-resolution input features poses significant challenges and often exhausts GPU memory.
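A back-of-the-envelope calculation makes the quadratic cost concrete; the feature-map size below is illustrative, not taken from any particular model:

```python
# Memory for one dense self-attention matrix over a high-resolution
# feature map: N = H * W tokens gives an N x N matrix.
h, w = 120, 216                   # e.g. a stride-8 map of a 960x1728 frame
n = h * w                         # 25,920 tokens
attn_bytes = n * n * 4            # fp32, a single head
print(f"{n} tokens -> {attn_bytes / 1e9:.1f} GB per attention matrix")
# ~2.7 GB for one head of one frame; heads, frames, and layers multiply
# this, which is why restricted or linear attention variants are used.
```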