15 datasets found
  1. MSR-VTT

    • huggingface.co
    Updated Feb 28, 2025
    + more versions
    Cite
    Kong (2025). MSR-VTT [Dataset]. https://huggingface.co/datasets/friedrichor/MSR-VTT
    Dataset updated
    Feb 28, 2025
    Authors
    Kong
    Description

    MSRVTT contains 10K video clips and 200K captions. We adopt the standard 1K-A split protocol, which was introduced in JSFusion and has since become the de facto benchmark split in the Text-Video Retrieval field.

    Train:

    train_7k: 7,010 videos, 140,200 captions
    train_9k: 9,000 videos, 180,000 captions

    Test:

    test_1k: 1,000 videos, 1,000 captions
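
The split sizes listed above imply a fixed caption-to-video ratio, which is easy to check. The following is a minimal sketch that uses only the counts quoted in this description; the dictionary layout and the helper name `captions_per_video` are illustrative assumptions, not part of the dataset's own tooling.

```python
# Split sizes as quoted in the dataset description above.
splits = {
    "train_7k": {"videos": 7_010, "captions": 140_200},
    "train_9k": {"videos": 9_000, "captions": 180_000},
    "test_1k": {"videos": 1_000, "captions": 1_000},
}


def captions_per_video(split):
    """Return the average number of captions per video for one split."""
    return split["captions"] / split["videos"]


ratios = {name: captions_per_video(s) for name, s in splits.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:g} captions per video")

# The 9k training videos and the 1k test videos add up to the quoted 10K total.
total_videos = splits["train_9k"]["videos"] + splits["test_1k"]["videos"]
print(f"train_9k + test_1k: {total_videos} videos")
```

Both training splits work out to 20 captions per video, while test_1k pairs each video with a single caption.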

      🌟 Citation
    

    @inproceedings{xu2016msrvtt, title={Msr-vtt: A large video description dataset for bridging video and language}… See the full description on the dataset page: https://huggingface.co/datasets/friedrichor/MSR-VTT.

  2. CLIP-MSR-VTT

    • huggingface.co
    Updated Apr 20, 2023
    Cite
    Maciej Kilian (2023). CLIP-MSR-VTT [Dataset]. https://huggingface.co/datasets/iejMac/CLIP-MSR-VTT
    Dataset updated
    Apr 20, 2023
    Authors
    Maciej Kilian
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    iejMac/CLIP-MSR-VTT dataset hosted on Hugging Face and contributed by the HF Datasets community

  3. MSR-VTT

    • opendatalab.com
    zip
    Updated Mar 21, 2023
    Cite
    Microsoft Research (2023). MSR-VTT [Dataset]. https://opendatalab.com/OpenDataLab/MSR-VTT
    Available download formats: zip (463724383 bytes)
    Dataset updated
    Mar 21, 2023
    Dataset provided by
    Microsoft Research
    Description

    MSR-VTT (Microsoft Research Video to Text) is a large-scale dataset for open-domain video captioning. It consists of 10,000 video clips from 20 categories, and each clip is annotated with 20 English sentences by Amazon Mechanical Turk workers. There are about 29,000 unique words across all captions. The standard split uses 6,513 clips for training, 497 clips for validation, and 2,990 clips for testing.
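
The figures in this description can be cross-checked with simple arithmetic. This is a small sketch using only the numbers quoted above; the variable names are illustrative.

```python
# Clip counts and captions per clip as quoted in the description above.
train, val, test = 6_513, 497, 2_990
captions_per_clip = 20

total_clips = train + val + test
total_captions = total_clips * captions_per_clip
train_plus_val = train + val

print(f"total clips: {total_clips}")
print(f"total captions: {total_captions}")
print(f"train + validation clips: {train_plus_val}")
```

The three parts sum to the stated 10,000 clips, which gives 200,000 captions at 20 sentences per clip. The training and validation clips together number 7,010, the same count reported for the train_7k split in the first listing on this page. Whether the two sets contain the same videos is not stated here.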

  4. MSRVTT-CTN

    • huggingface.co
    Updated May 27, 2024
    + more versions
    Cite
    Narrative Bridge (2024). MSRVTT-CTN [Dataset]. http://doi.org/10.57967/hf/2477
    Dataset updated
    May 27, 2024
    Authors
    Narrative Bridge
    License

    Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0)https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    MSRVTT-CTN Dataset

    This dataset contains CTN annotations for the MSRVTT-CTN benchmark dataset in JSON format. It has three files for the train, test, and validation splits. For project details, visit https://narrativebridge.github.io/.

      Dataset Structure
    

    Each JSON file contains a dictionary where the keys are the video IDs and the values are the corresponding Causal-Temporal Narrative (CTN) captions. The CTN captions are represented as a dictionary with two keys: "Cause"… See the full description on the dataset page: https://huggingface.co/datasets/narrativebridge/MSRVTT-CTN.
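
The structure described above (video IDs mapped to caption dictionaries) can be read with the standard `json` module. This is a rough sketch under stated assumptions: the video IDs and caption text are hypothetical placeholders, and only the "Cause" key appears because the description is truncated before naming the second key.

```python
import json

# Hypothetical sample mirroring the described layout. The IDs and text are
# placeholders. Only "Cause" is named in the description; the second key is
# elided there, so it is deliberately left out here.
sample = """
{
  "video0": {"Cause": "placeholder cause text"},
  "video1": {"Cause": "another placeholder cause"}
}
"""

annotations = json.loads(sample)


def caption_keys(annotations):
    """Collect every key used across the CTN caption dictionaries."""
    keys = set()
    for caption in annotations.values():
        keys.update(caption)
    return keys


# Walk the mapping: outer keys are video IDs, inner keys are caption parts.
for video_id, caption in annotations.items():
    for part, text in caption.items():
        print(f"{video_id} [{part}]: {text}")

print(sorted(caption_keys(annotations)))
```

Running `caption_keys` on a real split file would list the caption keys actually present, without having to assume them in advance.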

  5. msr-vtt-clipped-large-embedded-test

    • huggingface.co
    Updated Mar 15, 2025
    + more versions
    Cite
    Arihan Yadav (2025). msr-vtt-clipped-large-embedded-test [Dataset]. https://huggingface.co/datasets/aircrypto/msr-vtt-clipped-large-embedded-test
    Dataset updated
    Mar 15, 2025
    Authors
    Arihan Yadav
    Description

    aircrypto/msr-vtt-clipped-large-embedded-test dataset hosted on Hugging Face and contributed by the HF Datasets community

  6. MSVD - MSR-VTT frames

    • kaggle.com
    Updated Jul 8, 2025
    Cite
    khoa Doãn (2025). MSVD - MSR-VTT frames [Dataset]. https://www.kaggle.com/datasets/cainachchuale/msvd-dataset
    Dataset updated
    Jul 8, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    khoa Doãn
    Description

    Dataset

    This dataset was created by khoa Doãn

  7. MSR-VTT_features_expert

    • kaggle.com
    Updated Jan 21, 2024
    Cite
    ody kon (2024). MSR-VTT_features_expert [Dataset]. https://www.kaggle.com/odykon/msr-vtt-features-expert/discussion
    Dataset updated
    Jan 21, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    ody kon
    Description

    Dataset

    This dataset was created by ody kon

  8. MSRVTT-captionDataset

    • kaggle.com
    Updated Nov 30, 2022
    Cite
    Vishnutheep B (2022). MSRVTT-captionDataset [Dataset]. https://www.kaggle.com/datasets/vishnutheepb/msrvttcaptiondataset/suggestions?status=pending&yourSuggestions=true
    Dataset updated
    Nov 30, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Vishnutheep B
    Description

    Dataset

    This dataset was created by Vishnutheep B

  9. msrvtt

    • huggingface.co
    Updated Aug 28, 2024
    + more versions
    Cite
    morpheushoc (2024). msrvtt [Dataset]. https://huggingface.co/datasets/morpheushoc/msrvtt
    Dataset updated
    Aug 28, 2024
    Authors
    morpheushoc
    Description

    morpheushoc/msrvtt dataset hosted on Hugging Face and contributed by the HF Datasets community

  10. msrvtt

    • huggingface.co
    Updated Jun 12, 2025
    + more versions
    Cite
    Tevatron (2025). msrvtt [Dataset]. https://huggingface.co/datasets/Tevatron/msrvtt
    Dataset updated
    Jun 12, 2025
    Dataset authored and provided by
    Tevatron
    Description

    Tevatron/msrvtt dataset hosted on Hugging Face and contributed by the HF Datasets community

  11. msrvtt

    • huggingface.co
    Updated Aug 13, 2024
    Cite
    minghao qin (2024). msrvtt [Dataset]. https://huggingface.co/datasets/CharmingDog/msrvtt
    Dataset updated
    Aug 13, 2024
    Authors
    minghao qin
    Description

    CharmingDog/msrvtt dataset hosted on Hugging Face and contributed by the HF Datasets community

  12. MSRVTT-Personalization

    • huggingface.co
    Updated Jun 26, 2025
    Cite
    LIMinghan (2025). MSRVTT-Personalization [Dataset]. https://huggingface.co/datasets/LIMinghan/MSRVTT-Personalization
    Dataset updated
    Jun 26, 2025
    Authors
    LIMinghan
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    MSRVTT-Personalization

    Follow the instructions to get the msrvtt-personalization data.

      LICENSE
    

    See License of MSRVTT-Personalization

  13. msrvtt-qa

    • huggingface.co
    Updated Aug 28, 2024
    + more versions
    Cite
    morpheushoc (2024). msrvtt-qa [Dataset]. https://huggingface.co/datasets/morpheushoc/msrvtt-qa
    Dataset updated
    Aug 28, 2024
    Authors
    morpheushoc
    Description

    morpheushoc/msrvtt-qa dataset hosted on Hugging Face and contributed by the HF Datasets community

  14. Multi-Source-Video-Captioning

    • huggingface.co
    Updated Jun 13, 2024
    Cite
    Language Technology Lab at Alibaba DAMO Academy (2024). Multi-Source-Video-Captioning [Dataset]. https://huggingface.co/datasets/DAMO-NLP-SG/Multi-Source-Video-Captioning
    Dataset updated
    Jun 13, 2024
    Dataset authored and provided by
    Language Technology Lab at Alibaba DAMO Academy
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Multi-source Video Captioning (MSVC) Dataset Card

      Dataset details
    

    Dataset type: MSVC is a set of collected video captioning data. It is constructed to ensure a robust and thorough evaluation of Video-LLMs' video-captioning capabilities.
    Dataset detail: MSVC is introduced to address limitations in existing video caption benchmarks. It samples a total of 1,500 videos with human-annotated captions from MSVD, MSRVTT, and VATEX, ensuring diverse scenarios and domains.… See the full description on the dataset page: https://huggingface.co/datasets/DAMO-NLP-SG/Multi-Source-Video-Captioning.

  15. iVQA (Instructional Video Question Answering)

    • opendatalab.com
    zip
    Updated Apr 1, 2023
    Cite
    Czech Technical University in Prague (2023). iVQA (Instructional Video Question Answering) [Dataset]. https://opendatalab.com/OpenDataLab/iVQA
    Available download formats: zip
    Dataset updated
    Apr 1, 2023
    Dataset provided by
    PSL Research University
    Czech Technical University in Prague
    Institut national de recherche en informatique et en automatique
    License

    https://antoyang.github.io/just-ask.html#ivqa

    Description

    Recent methods for visual question answering rely on large-scale annotated datasets. Manual annotation of questions and answers for videos, however, is tedious, expensive and prevents scalability. In this work, we propose to avoid manual annotation and generate a large-scale training dataset for video question answering making use of automatic cross-modal supervision. We leverage a question generation transformer trained on text data and use it to generate question-answer pairs from transcribed video narrations. Given narrated videos, we then automatically generate the HowToVQA69M dataset with 69M video-question-answer triplets. To handle the open vocabulary of diverse answers in this dataset, we propose a training procedure based on a contrastive loss between a video-question multi-modal transformer and an answer transformer. We introduce the zero-shot VideoQA task and show excellent results, in particular for rare answers. Furthermore, we demonstrate our method to significantly outperform the state of the art on MSRVTT-QA, MSVD-QA, ActivityNet-QA and How2QA. Finally, for a detailed evaluation we introduce iVQA, a new VideoQA dataset with reduced language biases and high-quality redundant manual annotations.

