11 datasets found
  1. Long-RVOS

    • huggingface.co
    Updated Nov 2, 2025
    Cite
    iSEE-Laboratory (2025). Long-RVOS [Dataset]. https://huggingface.co/datasets/iSEE-Laboratory/Long-RVOS
    Dataset updated
    Nov 2, 2025
    Dataset authored and provided by
    iSEE-Laboratory
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation

    GitHub Repository: https://github.com/iSEE-Laboratory/Long_RVOS
    Project Page: https://isee-laboratory.github.io/Long-RVOS/
    Paper: arXiv:2505.12702

      Dataset Description

      Dataset Summary

    Long-RVOS is the first large-scale long-term referring video object segmentation benchmark, containing 2,000+ videos with an average duration exceeding 60 seconds. The dataset addresses… See the full description on the dataset page: https://huggingface.co/datasets/iSEE-Laboratory/Long-RVOS.

  2. MOVE

    • huggingface.co
    Updated Sep 28, 2025
    Cite
    FudanCVL (2025). MOVE [Dataset]. https://huggingface.co/datasets/FudanCVL/MOVE
    Dataset updated
    Sep 28, 2025
    Dataset authored and provided by
    FudanCVL
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    MOVE: Motion-Guided Few-Shot Video Object Segmentation

    🏠 Homepage | 📄 Paper | 🔗 GitHub

      Abstract
    

    This work addresses motion-guided few-shot video object segmentation (FSVOS), which aims to segment dynamic objects in videos based on a few annotated examples with the same motion patterns. Existing FSVOS datasets and methods typically focus on object categories, which are static attributes that ignore the rich temporal dynamics in videos, limiting their application in… See the full description on the dataset page: https://huggingface.co/datasets/FudanCVL/MOVE.

  3. Multiple Lego Tracking Dataset

    • kaggle.com
    zip
    Updated Dec 27, 2024
    Cite
    Bahruz Huseynov (2024). Multiple Lego Tracking Dataset [Dataset]. https://www.kaggle.com/datasets/hbahruz/multiple-lego-tracking-dataset
    Available download formats: zip (1816930688 bytes)
    Dataset updated
    Dec 27, 2024
    Authors
    Bahruz Huseynov
    Description

    The dataset (Lego_Tracking folder) was created manually by recording 12 videos with a smartphone. Of these, 10 were designated for training and 2 for testing. The videos show conveyor belts transporting LEGO bricks, captured from various perspectives (top, front, and diagonal) to provide diverse viewpoints. The videos were recorded in the AI laboratory of Eötvös Loránd University (https://github.com/BahruzHuseynov/Object-Tracking-AI_Lab), and the dataset has been used for research on a detection-segmentation-tracking pipeline as part of the AI laboratory work. The dataset includes videos of differing complexity, classified as "Overlapping," "Normal," or "Simple," with durations ranging from short to long shots. Additionally, the LEGO bricks were annotated frame by frame using the RoboFlow web application (https://roboflow.com/).

    In addition, you can use the dataset prepared by applying systematic sampling to the training videos in order to train and validate YOLOv8 and RT-DETR models from ultralytics (see the sketch below): https://www.kaggle.com/datasets/hbahruz/multiple-lego/data
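
    A minimal sketch of what such systematic frame sampling can look like with OpenCV (the stride and paths below are hypothetical, not the values used to build the linked dataset):

    import os
    import cv2

    def sample_frames(video_path, out_dir, stride=10):
        """Systematic sampling: keep every `stride`-th frame of a video."""
        os.makedirs(out_dir, exist_ok=True)
        cap = cv2.VideoCapture(video_path)
        index = saved = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % stride == 0:
                cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
                saved += 1
            index += 1
        cap.release()
        return saved

    # e.g. sample_frames("Lego_Tracking/train/video_1.mp4", "sampled/video_1")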

    Another prepared dataset can be used for semantic segmentation: https://www.kaggle.com/datasets/hbahruz/lego-semantic-segmentation/

    Test videos:

    | Video    | View     | Complexity  | Only Lego | Frames per second | Approximate duration (s) | Num. frames |
    |----------|----------|-------------|-----------|-------------------|--------------------------|-------------|
    | Video 1  | Top      | Normal      | +         | 20                | 68                       | 1401        |
    | Video 2  | Diagonal | Normal      | +         | 25                | 57                       | 1444        |

    Training videos:

    | Video    | View     | Complexity  | Only Lego | Frames per second | Approximate duration (s) | Num. frames |
    |----------|----------|-------------|-----------|-------------------|--------------------------|-------------|
    | Video 1  | Top      | Overlapping | +         | 16                | 19                       | 300         |
    | Video 2  | Front    | Overlapping | +         | 13                | 16                       | 196         |
    | Video 3  | Diagonal | Normal      | +         | 20                | 56                       | 1136        |
    | Video 4  | Top      | Overlapping | +         | 20                | 42                       | 839         |
    | Video 5  | Diagonal | Overlapping | +         | 21                | 14                       | 839         |
    | Video 6  | Top      | Normal      | -         | 20                | 50                       | 1000        |
    | Video 7  | Top      | Simple      | +         | 15                | 20                       | 303         |
    | Video 8  | Diagonal | Normal      | +         | 13                | 13                       | 277         |
    | Video 9  | Top      | Normal      | +         | 19                | 28                       | 537         |
    | Video 10 | Front    | Normal      | -         | 20                | 58                       | 1162        |
  4. MOSEv2

    • huggingface.co
    Updated Sep 28, 2025
    Cite
    FudanCVL (2025). MOSEv2 [Dataset]. https://huggingface.co/datasets/FudanCVL/MOSEv2
    Dataset updated
    Sep 28, 2025
    Dataset authored and provided by
    FudanCVL
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    MOSEv2: A More Challenging Dataset for Video Object Segmentation in Complex Scenes

    🔥 Evaluation Server | 🏠 Homepage | 📄 Paper | 🔗 GitHub

      Download
    

    We recommend using huggingface-cli to download:

    pip install -U "huggingface_hub[cli]"
    huggingface-cli download FudanCVL/MOSEv2 --repo-type dataset --local-dir ./MOSEv2 --local-dir-use-symlinks False --max-workers 16
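
    If you prefer scripting the download, the same can be done with the huggingface_hub Python API (a sketch; the target directory is your choice):

    from huggingface_hub import snapshot_download

    # Download the full MOSEv2 dataset repository to ./MOSEv2
    snapshot_download(
        repo_id="FudanCVL/MOSEv2",
        repo_type="dataset",
        local_dir="./MOSEv2",
        max_workers=16,  # parallel download workers
    )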

      Dataset Summary
    

    MOSEv2 is a comprehensive video object segmentation dataset designed to advance… See the full description on the dataset page: https://huggingface.co/datasets/FudanCVL/MOSEv2.

  5. mmtracking

    • kaggle.com
    zip
    Updated Feb 8, 2022
    Cite
    Araik Tamazian (2022). mmtracking [Dataset]. https://www.kaggle.com/atamazian/mmtracking
    Available download formats: zip (3454866 bytes)
    Dataset updated
    Feb 8, 2022
    Authors
    Araik Tamazian
    Description

    This dataset contains the mmtracking package and its dependencies for an offline installation. Use the mmdetection dataset to install mmdetection (it is also required by mmtracking).
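
    For instance, an offline install from a local copy of this dataset might look like the following (a sketch; the /kaggle/input mount paths are assumptions about where the datasets are attached):

    import subprocess
    import sys

    # Install mmtrack from locally stored packages without contacting PyPI.
    # Assumes this dataset is mounted at /kaggle/input/mmtracking and that
    # mmdetection has already been installed from its own offline dataset.
    subprocess.check_call([
        sys.executable, "-m", "pip", "install",
        "--no-index", "--find-links", "/kaggle/input/mmtracking",
        "mmtrack",
    ])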

    What's MMTracking?

    OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.

    Source

    https://github.com/open-mmlab/mmtracking

    Documentation

    https://mmtracking.readthedocs.io/

  6. Data from: Hierarchical Deep Learning Framework for Automated Marine Vegetation and Fauna Analysis Using ROV Video Data

    • figshare.com
    bin
    Updated Dec 9, 2024
    Cite
    Bjørn Christian Weinbach (2024). Hierarchical Deep Learning Framework for Automated Marine Vegetation and Fauna Analysis Using ROV Video Data [Dataset]. http://doi.org/10.6084/m9.figshare.25688718.v4
    Available download formats: bin
    Dataset updated
    Dec 9, 2024
    Dataset provided by
    figshare
    Authors
    Bjørn Christian Weinbach
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Experimental data for the paper "Hierarchical Deep Learning Framework for Automated Marine Vegetation and Fauna Analysis Using ROV Video Data." This dataset supports the study by providing resources essential for reproducing and validating the research findings.

    Dataset Contents and Structure:

    • Hierarchical Model Weights: .pth files containing trained weights for all alpha regularization values used in the hierarchical classification models.
    • MaskRCNN-Segmented Objects: .jpg files representing segmented objects detected by the MaskRCNN model, accompanied by maskrcnn-segmented-objects-dataset.parquet, which includes metadata and classifications with the columns:
      • masked_image: path to the segmented image file.
      • confidence: confidence score for the prediction.
      • predicted_species: predicted species label.
      • species: true species label.
    • MaskRCNN Weights: trained MaskRCNN model weights, including the hierarchical CNN models integrated with MaskRCNN in the processing pipeline.
    • Pre-Trained Models: .pt files for all object detectors trained on the Esefjorden Marine Vegetation Segmentation Dataset (EMVSD) in YOLO txt format.
    • Segmented Object Outputs: segmentation outputs and datasets for the following models:
      • RT-DETR: rtdetr-segmented-objects/ and rtdetr-segmented-objects-dataset.parquet
      • YOLO-SAG: yolosag-segmented-objects/ and yolosag-segmented-objects-dataset.parquet
      • YOLOv11: yolov11-segmented-objects/ and yolov11-segmented-objects-dataset.parquet
      • YOLOv8: yolov8-segmented-objects/ and yolov8-segmented-objects-dataset.parquet
      • YOLOv9: yolov9-segmented-objects/ and yolov9-segmented-objects-dataset.parquet

    Usage Instructions:
    1. Download and extract the dataset.
    2. Use the Python scripts provided in the associated GitHub repository for evaluation and inference: https://github.com/Ci2Lab/FjordVision

    Reproducibility: The dataset includes pre-trained weights, segmentation outputs, and experimental results to facilitate reproducibility. The .parquet files and segmented-object directories follow a standardized format to ensure consistency.

    Licensing: This dataset is released under the CC-BY 4.0 license, permitting reuse with proper attribution.

    Related Materials: GitHub Repository: https://github.com/Ci2Lab/FjordVision
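
    As an illustration, the per-model .parquet files can be inspected with pandas using the columns documented above (a sketch; the file path and confidence threshold are illustrative):

    import pandas as pd

    # Columns per the dataset description: masked_image, confidence,
    # predicted_species, species (the true label).
    df = pd.read_parquet("maskrcnn-segmented-objects-dataset.parquet")

    # Overall accuracy of the predicted species labels
    accuracy = (df["predicted_species"] == df["species"]).mean()

    # Accuracy restricted to confident predictions (threshold is arbitrary)
    confident = df[df["confidence"] >= 0.8]
    confident_accuracy = (confident["predicted_species"] == confident["species"]).mean()
    print(f"accuracy={accuracy:.3f}, confident-subset accuracy={confident_accuracy:.3f}")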

  7. SeCVOS

    • huggingface.co
    Updated Jul 19, 2025
    Cite
    IXCLab@Shanghai AI Lab (2025). SeCVOS [Dataset]. https://huggingface.co/datasets/OpenIXCLab/SeCVOS
    Dataset updated
    Jul 19, 2025
    Dataset authored and provided by
    IXCLab@Shanghai AI Lab
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

    [📂 GitHub] [📦 Model] [🌐 Homepage] [📄 Paper]

      Highlights
    

    🔥We introduce Segment Concept (SeC), a concept-driven segmentation framework for video object segmentation that integrates Large Vision-Language Models (LVLMs) for robust, object-centric representations. 🔥SeC dynamically balances semantic reasoning with feature matching, adaptively adjusting computational efforts based on… See the full description on the dataset page: https://huggingface.co/datasets/OpenIXCLab/SeCVOS.

  8. EgoHands Object Detection Dataset - specific

    • public.roboflow.com
    zip
    Updated Apr 22, 2022
    Cite
    IU Computer Vision Lab (2022). EgoHands Object Detection Dataset - specific [Dataset]. https://public.roboflow.com/object-detection/hands/1
    Available download formats: zip
    Dataset updated
    Apr 22, 2022
    Dataset authored and provided by
    IU Computer Vision Lab
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Variables measured
    Bounding Boxes of hands
    Description

    [EgoHands Dataset sample image: https://i.imgur.com/eEWi4PT.png]

    About this dataset

    The EgoHands dataset is a collection of 4800 annotated images of human hands from a first-person view originally collected and labeled by Sven Bambach, Stefan Lee, David Crandall, and Chen Yu of Indiana University.

    The dataset was captured via frames extracted from video recorded through head-mounted cameras on a Google Glass headset while performing four activities: building a puzzle, playing chess, playing Jenga, and playing cards. There are 100 labeled frames for each of 48 video clips.

    Our modifications

    The original EgoHands dataset was labeled with polygons for segmentation and released in a Matlab binary format. We converted it to an object detection dataset using a modified version of this script from @molyswu and have archived it in many popular formats for use with your computer vision models.
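
    The polygon-to-box conversion itself reduces to taking the extrema of each polygon's vertices; a minimal sketch (not the actual conversion script):

    def polygon_to_bbox(points):
        """Convert a segmentation polygon [(x, y), ...] to an axis-aligned
        bounding box (xmin, ymin, xmax, ymax)."""
        xs, ys = zip(*points)
        return min(xs), min(ys), max(xs), max(ys)

    # e.g. polygon_to_bbox([(120, 80), (200, 75), (210, 160), (130, 170)])
    # -> (120, 75, 210, 170)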

    After converting to bounding boxes for object detection, we noticed that there were several dozen unlabeled hands. We added these by hand and improved several hundred of the other labels that did not fully encompass the hands (usually to include omitted fingertips, knuckles, or thumbs). In total, 344 images' annotations were edited manually.

    We chose a new random train/test split of 80% training, 10% validation, and 10% testing. Notably, this is not the same split as in the original EgoHands paper.

    There are two versions of the converted dataset available:

    * specific is labeled with four classes: myleft, myright, yourleft, yourright, representing which hand of which person (the viewer or the opponent across the table) is contained in the bounding box.
    * generic contains the same boxes but with a single hand class.

    Using this dataset

    The authors have graciously allowed Roboflow to re-host this derivative dataset. It is released under a Creative Commons by Attribution 4.0 license. You may use it for academic or commercial purposes but must cite the original paper.

    Please use the following BibTeX:

    @inproceedings{egohands2015iccv,
      title = {Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions},
      author = {Sven Bambach and Stefan Lee and David Crandall and Chen Yu},
      booktitle = {IEEE International Conference on Computer Vision (ICCV)},
      year = {2015}
    }

  9. M3_VOS

    • huggingface.co
    Updated Aug 8, 2025
    Cite
    SII-Jiaxin Li (2025). M3_VOS [Dataset]. https://huggingface.co/datasets/Lijiaxin0111/M3_VOS
    Dataset updated
    Aug 8, 2025
    Authors
    SII-Jiaxin Li
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    [CVPR 2025] M3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation

    If you like our project, please give us a star ⭐ on GitHub for the latest update.

      💡 Description
    

    Venue: CVPR 2025
    Repository: 🛠️ Tool, 🏠 Page
    Paper: arxiv.org/html/2412.13803v2
    Point of Contact: Jiaxin Li, Zixuan Chen

      📁 Structure
    

    This dataset contains annotated videos and images for object segmentation tasks with phase transition information. The directory… See the full description on the dataset page: https://huggingface.co/datasets/Lijiaxin0111/M3_VOS.

  10. VISIONE Feature Repository for VBS: Multi-Modal Features and Detected...

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    application/gzip
    Updated Feb 12, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Giuseppe Amato; Paolo Bolettieri; Fabio Carrara; Fabrizio Falchi; Claudio Gennaro; Nicola Messina; Lucia Vadicamo; Claudio Vairo (2024). VISIONE Feature Repository for VBS: Multi-Modal Features and Detected Objects from V3C1+V3C2 Dataset [Dataset]. http://doi.org/10.5281/zenodo.8188570
    Available download formats: application/gzip
    Dataset updated
    Feb 12, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Giuseppe Amato; Paolo Bolettieri; Fabio Carrara; Fabrizio Falchi; Claudio Gennaro; Nicola Messina; Lucia Vadicamo; Claudio Vairo
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains a diverse set of features extracted from the V3C1+V3C2 dataset, sourced from the Vimeo Creative Commons Collection. These features were utilized in the VISIONE system [Amato et al. 2023, Amato et al. 2022] during the latest editions of the Video Browser Showdown (VBS) competition (https://www.videobrowsershowdown.org/).

    The original V3C1+V3C2 dataset, provided by NIST, can be downloaded using the instructions provided at https://videobrowsershowdown.org/about-vbs/existing-data-and-tools/.

    It comprises 7,235 video files, amounting to 2,300 hours of video content and encompassing 2,508,113 predefined video segments.

    We subdivided the predefined video segments longer than 10 seconds into multiple segments, with each segment spanning no longer than 16 seconds. As a result, we obtained a total of 2,648,219 segments. For each segment, we extracted one frame, specifically the middle one, and computed several features, which are described in detail below.

    This repository is released under a Creative Commons Attribution license. If you use it in any form for your work, please cite the following paper:

    @inproceedings{amato2023visione, 
    title={VISIONE at Video Browser Showdown 2023}, 
    author={Amato, Giuseppe and Bolettieri, Paolo and Carrara, Fabio and Falchi, Fabrizio and Gennaro, Claudio and Messina, Nicola and Vadicamo, Lucia and Vairo, Claudio}, 
    booktitle={International Conference on Multimedia Modeling}, 
    pages={615--621}, 
    year={2023}, 
    organization={Springer} 
    }

    This repository comprises the following files:

    • msb.tar.gz contains tab-separated files (.tsv) for each video. Each tsv file reports, for each video segment, the timestamp and frame number marking the start/end of the video segment, along with the timestamp of the extracted middle frame and the associated identifier ("id_visione").
    • extract-keyframes-from-msb.tar.gz contains a Python script designed to extract the middle frame of each video segment from the MSB files. To run the script successfully, please ensure that you have the original V3C videos available.
    • features-aladin.tar.gz contains ALADIN [Messina N. et al. 2022] features extracted for all the segment's middle frames.
    • features-clip-laion.tar.gz contains CLIP ViT-H/14 - LAION-2B [Schuhmann et al. 2022] features extracted for all the segment's middle frames.
    • features-clip-openai.tar.gz contains CLIP ViT-L/14 [Radford et al. 2021] features extracted for all the segment's middle frames.
    • features-clip2video.tar.gz contains CLIP2Video [Fang H. et al. 2021] features extracted for all the video segments. In particular: 1) we concatenate consecutive short segments to create segments at least 3 seconds long; 2) we downsample the obtained segments to 2.5 fps; 3) we feed the network with the first min(36, n) frames, where n is the number of frames of the segment. Notice that the minimum processed length is 7 frames, given that no segment is shorter than 3 s (see the sketch after this list).
    • objects-frcnn-oiv4.tar.gz* contains the objects detected using Faster R-CNN+Inception ResNet (trained on the Open Images V4 [Kuznetsova et al. 2020]).
    • objects-mrcnn-lvis.tar.gz* contains the objects detected using Mask R-CNN [He et al. 2017] (trained on LVIS).
    • objects-vfnet64-coco.tar.gz* contains the objects detected using VfNet [Zhang et al. 2021] (trained on COCO dataset).
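
    The frame-count rule in the CLIP2Video item above works out as follows (a sketch of the stated arithmetic, not the actual extraction code):

    import math

    def n_processed_frames(duration_s, fps=2.5, cap=36):
        """Frames fed to CLIP2Video after downsampling a segment of
        `duration_s` seconds to 2.5 fps and keeping the first 36 frames."""
        return min(cap, math.floor(duration_s * fps))

    # Minimum segment length is 3 s: floor(3 * 2.5) = 7 frames,
    # matching the stated 7-frame minimum.
    assert n_processed_frames(3.0) == 7
    # Segments of about 14.4 s or longer hit the 36-frame cap.
    assert n_processed_frames(20.0) == 36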

    *Please be sure to use the v2 version of this repository, since v1 feature files may contain inconsistencies that have now been corrected.

    *Note on the object annotations: Within an object archive, there is a jsonl file for each video, where each row contains a record of a video segment (the "_id" corresponds to the "id_visione" used in msb.tar.gz). Additionally, there are three arrays representing the detected objects, the corresponding scores, and the bounding boxes. The format of these arrays is as follows:

    • "object_class_names": vector with the class name of each detected object.
    • "object_scores": scores corresponding to each detected object.
    • "object_boxes_yxyx": bounding boxes of the detected objects in the format (ymin, xmin, ymax, xmax).

    Note on the cross-modal features: The extracted multi-modal features (ALADIN, CLIPs, CLIP2Video) enable internal searches within the V3C1+V3C2 dataset using the query-by-image approach (features can be compared with the dot product). However, to perform searches based on free text, the text needs to be transformed into the joint embedding space according to the specific network being used. Please be aware that the service for transforming text into features is not provided within this repository and should be developed independently using the original feature repositories linked above.
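
    Query-by-image then reduces to a dot product between the query feature and the stored segment features; a minimal numpy sketch (array shapes and the loading step are assumptions):

    import numpy as np

    def top_k_segments(query_feat, segment_feats, k=10):
        """Rank segments by dot-product similarity to a query feature.

        query_feat: (d,) feature vector of the query image
        segment_feats: (n, d) matrix of per-segment features
        """
        scores = segment_feats @ query_feat  # (n,) dot products
        top = np.argsort(-scores)[:k]        # indices of the best matches
        return top, scores[top]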

    We have plans to release the code in the future, allowing the reproduction of the VISIONE system, including the instantiation of all the services to transform text into cross-modal features. However, this work is still in progress, and the code is not currently available.

    References:

    [Amato et al. 2023] Amato, G.et al., 2023, January. VISIONE at Video Browser Showdown 2023. In International Conference on Multimedia Modeling (pp. 615-621). Cham: Springer International Publishing.

    [Amato et al. 2022] Amato, G. et al. (2022). VISIONE at Video Browser Showdown 2022. In: , et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13142. Springer, Cham.

    [Fang H. et al. 2021] Fang H. et al., 2021. Clip2video: Mastering video-text retrieval via image clip. arXiv preprint arXiv:2106.11097.

    [He et al. 2017] He, K., Gkioxari, G., Dollár, P. and Girshick, R., 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision (pp. 2961-2969).

    [Kuznetsova et al. 2020] Kuznetsova, A., Rom, H., Alldrin, N., Uijlings, J., Krasin, I., Pont-Tuset, J., Kamali, S., Popov, S., Malloci, M., Kolesnikov, A. and Duerig, T., 2020. The open images dataset v4. International Journal of Computer Vision, 128(7), pp.1956-1981.

    [Lin et al. 2014] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P. and Zitnick, C.L., 2014, September. Microsoft coco: Common objects in context. In European conference on computer vision (pp. 740-755). Springer, Cham.

    [Messina et al. 2022] Messina N. et al., 2022, September. Aladin: distilling fine-grained alignment scores for efficient image-text matching and retrieval. In Proceedings of the 19th International Conference on Content-based Multimedia Indexing (pp. 64-70).

    [Radford et al. 2021] Radford A. et al., 2021, July. Learning transferable visual models from natural language supervision. In International conference on machine learning (pp. 8748-8763). PMLR.

    [Schuhmann et al. 2022] Schuhmann C. et al., 2022. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35, pp.25278-25294.

    [Zhang et al. 2021] Zhang, H., Wang, Y., Dayoub, F. and Sunderhauf, N., 2021. Varifocalnet: An iou-aware dense object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8514-8523).

  11. CamSeq 2007 (Semantic Segmentation)

    • kaggle.com
    Updated May 3, 2020
    Cite
    Carlo Lepelaars (2020). CamSeq 2007 (Semantic Segmentation) [Dataset]. https://www.kaggle.com/carlolepelaars/camseq-semantic-segmentation/activity
    Available download formats: Croissant (a machine-learning dataset format; see mlcommons.org/croissant)
    Dataset updated
    May 3, 2020
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Carlo Lepelaars
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    N.B. The owner of this Dataset is The University of Cambridge (2007).

    I'm not in any way affiliated with The University of Cambridge. I just thought it would be nice for people to have this dataset available on Kaggle.

    Context

    CamSeq01 is a ground-truth dataset that can be freely used for research work in object recognition in video.

    Content

    This dataset contains 101 image pairs at 960x720 pixels. Every mask is designated by an "_L" suffix in the file name. All images (original and ground truth) are in uncompressed 24-bit color PNG format.
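
    A sketch of pairing each original frame with its "_L" mask using pathlib (the flat directory layout is an assumption):

    from pathlib import Path

    def image_mask_pairs(root):
        """Yield (image, mask) path pairs; masks carry an '_L' suffix."""
        root = Path(root)
        for img in sorted(root.glob("*.png")):
            if img.stem.endswith("_L"):
                continue  # skip the masks themselves
            mask = img.with_name(img.stem + "_L" + img.suffix)
            if mask.exists():
                yield img, mask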

    Citation

    Julien Fauqueur, Gabriel Brostow, Roberto Cipolla, Assisted Video Object Labeling By Joint Tracking of Regions and Keypoints, IEEE International Conference on Computer Vision (ICCV'2007) Interactive Computer Vision Workshop. Rio de Janeiro, Brazil, October 2007

    Acknowledgements

    This work has been carried out with the support of Toyota Motor Europe.

    The original dataset can be found here: http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamSeq01

    Source / Contact

    http://mi.eng.cam.ac.uk/research/projects/VideoRec/CamSeq01

    Image Source

    https://sthalles.github.io/deep_segmentation_network
