MSRVTT contains 10K video clips and 200K captions. We adopt the standard 1K-A split protocol, which was introduced in JSFusion and has since become the de facto benchmark split for text-video retrieval.
Train:
- train_7k: 7,010 videos, 140,200 captions
- train_9k: 9,000 videos, 180,000 captions
Test:
- test_1k: 1,000 videos, 1,000 captions
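A minimal loading sketch with the Hugging Face `datasets` library is shown below; the split names mirror the list above, but the exact split/config names exposed by this repository are an assumption and should be checked on the dataset card.

```python
# Sketch: load the MSR-VTT retrieval splits from the Hugging Face Hub.
# Assumption: the repository exposes splits named like the ones listed above
# (e.g. "train_9k" and "test_1k"); verify the names on the dataset card.
from datasets import load_dataset

train = load_dataset("friedrichor/MSR-VTT", split="train_9k")  # 9,000 videos / 180,000 captions
test = load_dataset("friedrichor/MSR-VTT", split="test_1k")    # 1,000 videos / 1,000 captions

print(train)    # inspect the features (video id, caption, ...)
print(test[0])  # one video-caption pair from the 1K-A test split
```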
🌟 Citation
@inproceedings{xu2016msrvtt,
  title={Msr-vtt: A large video description dataset for bridging video and language},
  author={Xu, Jun and Mei, Tao and Yao, Ting and Rui, Yong},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016}
}
See the full description on the dataset page: https://huggingface.co/datasets/friedrichor/MSR-VTT.
MIT License (https://opensource.org/licenses/MIT)
iejMac/CLIP-MSR-VTT dataset hosted on Hugging Face and contributed by the HF Datasets community
MSR-VTT (Microsoft Research Video to Text) is a large-scale dataset for open-domain video captioning. It consists of 10,000 video clips from 20 categories, and each clip is annotated with 20 English sentences by Amazon Mechanical Turk workers, giving about 29,000 unique words across all captions. The standard split uses 6,513 clips for training, 497 clips for validation, and 2,990 clips for testing.
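As a quick sanity check on these numbers, the sketch below computes the caption counts implied by the standard split, assuming all 20 captions per clip are kept in every split.

```python
# Sketch: caption counts implied by the standard MSR-VTT split
# (each of the 10,000 clips carries 20 crowd-sourced captions).
CAPTIONS_PER_CLIP = 20
splits = {"train": 6513, "val": 497, "test": 2990}

for name, clips in splits.items():
    print(f"{name}: {clips} clips -> {clips * CAPTIONS_PER_CLIP} captions")

print("total clips:", sum(splits.values()))                          # 10000
print("total captions:", sum(splits.values()) * CAPTIONS_PER_CLIP)   # 200000
```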
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
MSRVTT-CTN Dataset
This dataset contains Causal-Temporal Narrative (CTN) annotations for the MSRVTT-CTN benchmark in JSON format, with one file each for the train, test, and validation splits. For project details, visit https://narrativebridge.github.io/.
Dataset Structure
Each JSON file contains a dictionary where the keys are the video IDs and the values are the corresponding CTN captions. Each CTN caption is itself a dictionary with two keys: "Cause"… See the full description on the dataset page: https://huggingface.co/datasets/narrativebridge/MSRVTT-CTN.
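A minimal sketch of inspecting one of these files is given below; the local filename "train.json" is an assumption and should be checked against the actual filenames in the repository.

```python
# Sketch: inspect the MSRVTT-CTN annotations for one split.
# Assumption: the train split has been downloaded locally as "train.json";
# check the dataset repository for the actual filenames.
import json

with open("train.json", "r", encoding="utf-8") as f:
    annotations = json.load(f)  # dict: video ID -> CTN caption dictionary

for video_id, ctn in list(annotations.items())[:3]:
    print(video_id)
    for key, text in ctn.items():  # e.g. the "Cause" field described above
        print(f"  {key}: {text}")
```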
aircrypto/msr-vtt-clipped-large-embedded-test dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by khoa Doãn
This dataset was created by ody kon
This dataset was created by Vishnutheep B
morpheushoc/msrvtt dataset hosted on Hugging Face and contributed by the HF Datasets community
Tevatron/msrvtt dataset hosted on Hugging Face and contributed by the HF Datasets community
CharmingDog/msrvtt dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License (https://opensource.org/licenses/MIT)
MSRVTT-Personalization
Follow the instructions to obtain the MSRVTT-Personalization data.
LICENSE
See the license of MSRVTT-Personalization.
morpheushoc/msrvtt-qa dataset hosted on Hugging Face and contributed by the HF Datasets community
MIT License (https://opensource.org/licenses/MIT)
Multi-source Video Captioning (MSVC) Dataset Card
Dataset details
Dataset type: MSVC is a collection of video captioning data constructed to ensure a robust and thorough evaluation of Video-LLMs' video-captioning capabilities. Dataset detail: MSVC is introduced to address limitations in existing video captioning benchmarks; it samples a total of 1,500 videos with human-annotated captions from MSVD, MSRVTT, and VATEX, ensuring diverse scenarios and domains. See the full description on the dataset page: https://huggingface.co/datasets/DAMO-NLP-SG/Multi-Source-Video-Captioning.
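A minimal loading sketch for this benchmark is shown below; the split name "test" is an assumption and should be verified on the dataset card.

```python
# Sketch: load the MSVC benchmark and inspect its size and schema.
# Assumption: the data is exposed as a "test" split; verify on the dataset card.
from datasets import load_dataset

msvc = load_dataset("DAMO-NLP-SG/Multi-Source-Video-Captioning", split="test")
print(len(msvc))      # expected to be on the order of the 1,500 sampled videos
print(msvc.features)  # video / caption fields as defined by the card
```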
https://antoyang.github.io/just-ask.html#ivqa
Recent methods for visual question answering rely on large-scale annotated datasets. Manual annotation of questions and answers for videos, however, is tedious, expensive and prevents scalability. In this work, we propose to avoid manual annotation and generate a large-scale training dataset for video question answering by making use of automatic cross-modal supervision. We leverage a question generation transformer trained on text data and use it to generate question-answer pairs from transcribed video narrations. Given narrated videos, we then automatically generate the HowToVQA69M dataset with 69M video-question-answer triplets. To handle the open vocabulary of diverse answers in this dataset, we propose a training procedure based on a contrastive loss between a video-question multi-modal transformer and an answer transformer. We introduce the zero-shot VideoQA task and show excellent results, in particular for rare answers. Furthermore, we demonstrate that our method significantly outperforms the state of the art on MSRVTT-QA, MSVD-QA, ActivityNet-QA and How2QA. Finally, for a detailed evaluation we introduce iVQA, a new VideoQA dataset with reduced language biases and high-quality redundant manual annotations.
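The training objective described above, a contrastive loss between a video-question encoder and an answer encoder, can be sketched roughly as follows. This is a minimal InfoNCE-style sketch in PyTorch, not the authors' implementation; the encoder outputs, embedding dimension, and temperature value are assumptions.

```python
# Sketch: contrastive objective between video-question embeddings and answer
# embeddings, in the spirit of the training procedure described above.
# Encoder architectures, dimensions, and temperature are assumptions.
import torch
import torch.nn.functional as F

def contrastive_vqa_loss(vq_emb: torch.Tensor, ans_emb: torch.Tensor, temperature: float = 0.07):
    """vq_emb: (B, D) outputs of a video-question multi-modal encoder.
    ans_emb: (B, D) outputs of an answer encoder for the matching answers."""
    vq = F.normalize(vq_emb, dim=-1)
    ans = F.normalize(ans_emb, dim=-1)
    logits = vq @ ans.t() / temperature                    # (B, B) similarity matrix
    targets = torch.arange(vq.size(0), device=vq.device)   # matching pairs lie on the diagonal
    # Each video-question pair should score highest with its own answer, using the
    # other answers in the batch as negatives (and symmetrically for answers).
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Usage with random embeddings:
loss = contrastive_vqa_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```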