WebVid contains 10 million video clips with captions, sourced from the web. The videos are diverse and rich in their content.
Both the full 10M set and a 2.5M subset are available for download: https://github.com/m-bain/webvid-dataset
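If you only need the clip metadata (captions and video identifiers), the downloaded release can be inspected with pandas. This is a minimal sketch, assuming the release has been downloaded as a CSV; the filename below (results_2M_train.csv) is an assumption and may differ between releases.

# Minimal sketch: inspect the downloaded WebVid metadata CSV.
# The filename is an assumption; check the release you downloaded.
import pandas as pd

df = pd.read_csv("results_2M_train.csv")
print(len(df), "clip-caption pairs")
print(df.columns.tolist())  # verify the actual column names before relying on them
print(df.head())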
The WebVid-CoVR dataset is a collection of video-text-video triplets for the task of composed video retrieval (CoVR). CoVR is the task of retrieving videos that match both a query image and a query text, where the text typically specifies the desired modification to the query image.
The WebVid-CoVR dataset is automatically generated from web-scraped video-caption pairs, using a language model to generate the modification text. The dataset contains 1.6 million triplets, with diverse content and variations. The dataset also includes a manually annotated test set of 2.5K triplets, which can be used to evaluate CoVR models.
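As a rough illustration of how such triplets could be consumed, the sketch below iterates over a triplet file. It assumes the triplets are distributed as a CSV with one row per (source video, modification text, target video); the filename and column names are hypothetical placeholders, not the dataset's actual schema.

# Minimal sketch: iterate over CoVR triplets from a hypothetical CSV layout.
import csv

with open("webvid_covr_triplets.csv", newline="") as f:  # hypothetical filename
    for row in csv.DictReader(f):
        source_video = row["source_video"]   # hypothetical column name
        modification = row["modification"]   # hypothetical column name
        target_video = row["target_video"]   # hypothetical column name
        # A CoVR model scores candidate videos against (source_video, modification)
        # and should rank target_video first.
        break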
WebVid 10M Classified (100k)
Each description from the WebVid 10M dataset is passed through Llama 3.3 70B, which classifies it as either action or no_action.
Descriptions classified as action are rewritten more clearly; otherwise, the rewritten description is none.
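A minimal sketch of how the classified captions might be filtered, assuming the data is published as a Hugging Face dataset; the dataset identifier and the column names ("classification", "rewritten") are hypothetical placeholders and should be checked against the actual dataset card.

# Minimal sketch: keep only descriptions classified as actions.
from datasets import load_dataset

ds = load_dataset("user/webvid-10m-classified", split="train")  # hypothetical id
print(ds.column_names)  # verify the real field names first
actions = ds.filter(lambda ex: ex.get("classification") == "action")  # hypothetical field
print(len(actions), "action descriptions with rewritten captions")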
qingy2024/webvid-10M-pro-scored dataset hosted on Hugging Face and contributed by the HF Datasets community
A dataset of 3M video-question-answer triplets, automatically generated by applying question-generation neural models to alt-text video captions from the WebVid dataset.
3it/TransVerse-webvid-v1 dataset hosted on Hugging Face and contributed by the HF Datasets community
qingy2024/webvid-mini-100k-scored dataset hosted on Hugging Face and contributed by the HF Datasets community
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
To obtain the WebVid-2M videos, refer to this issue.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
luoruipu1/Valley-webvid2M-Pretrain-703K dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal (CC0 1.0): https://choosealicense.com/licenses/cc0-1.0/
[ECCV 2024] Enhancing Cross-Subject fMRI-to-Video Decoding with Global-Local Functional Alignment
Introduction
An fMRI-video dataset in which 8 subjects participated (6 male and 2 female, aged 23-27; 3 viewed FCVID stimuli and 5 viewed WebVid stimuli). fMRI data were acquired with a 3T scanner and a 32-channel RF head coil, sampled at 1 frame per 0.8 seconds. Stimulus videos of dimensions 256×256 and 596×336 are sourced from the FCVID video dataset and WebVid… See the full description on the dataset page: https://huggingface.co/datasets/Fudan-fMRI/fMRI-Video.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Introduction
A camera motion annotation dataset based on WebVid, containing a total of 36k videos.
Usage
Reassemble the split archive: cat webmotion.tar.gz.part_* > webmotion.tar.gz
Then extract it: tar -xzf webmotion.tar.gz
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
CamVid-30K
Summary
This is the CamVid-30K dataset introduced in our paper, "GenXD: Generating Any 3D and 4D Scenes." CamVid-30K is the first open-sourced, large-scale 4D dataset, designed to support various dynamic 3D tasks. It includes videos sourced from VIPSeg, OpenVid-1M, and WebVid-10M, with camera annotations curated using our data curation pipeline.
Project: https://gen-x-d.github.io/
Paper: https://arxiv.org/pdf/2411.02319
Code:… See the full description on the dataset page: https://huggingface.co/datasets/Yuyang-z/CamVid-30K.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Video sources
In the json files, src indicates the video source; the corresponding videos can be downloaded from the sources listed below (a short reading sketch follows the list).
video-vqa-webvid_qa: WebVid
video-conversation-videochat2: VideoChat2
video-classification-ssv2: SSv2
video-reasoning-clevrer_qa: CLEVRER
video-vqa-tgif_frame_qa: TGIF
video-reasoning-next_qa: NExTQA
video-conversation-videochat1: VideoChat
video-vqa-tgif_transition_qa: TGIF
video-reasoning-clevrer_mc: CLEVRER
video-vqa-ego_qa: EgoQA
video-classification-k710:… See the full description on the dataset page: https://huggingface.co/datasets/pritamqu/self-alignment.
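A minimal sketch of reading one of the json files and counting samples per source, assuming each entry is a dict containing a src key as described above; the filename is a hypothetical placeholder.

# Minimal sketch: count samples per video source via the src field.
import json
from collections import Counter

with open("annotations.json") as f:  # hypothetical filename
    entries = json.load(f)

counts = Counter(entry["src"] for entry in entries)
for src, n in counts.most_common():
    print(src, n)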