9 datasets found
  1. HowTo100M-subtitles-small

    • huggingface.co
    Updated Nov 2, 2023
    Cite
    Diyar Hamedi (2023). HowTo100M-subtitles-small [Dataset]. https://huggingface.co/datasets/diyarhamedi/HowTo100M-subtitles-small
    Explore at: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Authors: Diyar Hamedi
    Description: The subtitles from a subset of the HowTo100M dataset.

  2. ego4d_train_pair_howto100m

    • huggingface.co
    Updated May 7, 2024
    Cite
    Jilan Xu (2024). ego4d_train_pair_howto100m [Dataset]. https://huggingface.co/datasets/Jazzcharles/ego4d_train_pair_howto100m
    Authors: Jilan Xu
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    📙 Overview: the metadata for the Ego4d training set, with paired HowTo100M video clips. Ego-exo pairs are constructed by choosing clips with shared nouns/verbs. Each sample represents a short video clip and consists of:

    • vid: the original video id
    • start_second: the start timestamp of the narration
    • end_second: the end timestamp of the narration
    • text: the original narration
    • noun: a list containing the indices of nouns in the Ego4d noun vocabulary
    • verb: a list containing the… See the full description on the dataset page: https://huggingface.co/datasets/Jazzcharles/ego4d_train_pair_howto100m.
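    The per-sample schema above can be sketched as a plain Python record. All values below are invented for illustration only; real records carry actual Ego4d video ids and vocabulary indices.

    ```python
    # Hypothetical sample record mirroring the fields listed on the dataset card.
    # Every value here is made up; it only illustrates the field types.
    sample = {
        "vid": "video_0001",       # original video id (hypothetical)
        "start_second": 12.0,      # narration start timestamp
        "end_second": 15.5,        # narration end timestamp
        "text": "chop the onion",  # original narration (hypothetical)
        "noun": [42],              # indices into the Ego4d noun vocabulary
        "verb": [7],               # indices into the verb vocabulary
    }

    # The clip duration falls out of the two timestamps.
    duration = sample["end_second"] - sample["start_second"]
    print(f"clip duration: {duration:.1f}s")  # prints: clip duration: 3.5s
    ```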

  3. HowTo100M_llama3_refined_caption

    • huggingface.co
    Updated May 7, 2024
    Cite
    Jilan Xu (2024). HowTo100M_llama3_refined_caption [Dataset]. https://huggingface.co/datasets/Jazzcharles/HowTo100M_llama3_refined_caption
    Authors: Jilan Xu
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    📙 Overview: the metadata for HowTo100M. The original ASR transcripts are refined by the LLAMA-3 language model. Each sample represents a short video clip and consists of:

    • vid: the original video id
    • uid: a unique id assigned to index the clip
    • start_second: the start timestamp of the narration
    • end_second: the end timestamp of the narration (simply set to start + 1)
    • text: the original ASR transcript
    • noun: a list containing the indices of nouns in the noun vocabulary
    • verb: a list containing the… See the full description on the dataset page: https://huggingface.co/datasets/Jazzcharles/HowTo100M_llama3_refined_caption.

  4. ACAV100M (Automatically Curated Audio-Visual)

    • opendatalab.com
    zip
    Updated Sep 2, 2022
    Cite
    Microsoft Research (2022). ACAV100M (Automatically Curated Audio-Visual) [Dataset]. https://opendatalab.com/OpenDataLab/ACAV100M
    Dataset provided by: Microsoft Research, NVIDIA Research
    License: MIT License (https://opensource.org/licenses/MIT); license information was derived automatically

    Description: ACAV100M processes 140 million full-length videos (total duration 1,030 years) to produce a dataset of 100 million 10-second clips (31 years) with high audio-visual correspondence. This is two orders of magnitude larger than the largest video dataset currently used in the audio-visual learning literature, AudioSet (8 months), and twice as large as the largest video dataset in the literature, HowTo100M (15 years).
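    As a quick sanity check of the figures above, 100 million 10-second clips do indeed total roughly 31 years:

    ```python
    # Verify the clip-count arithmetic stated in the ACAV100M description:
    # 100 million clips of 10 seconds each should come to about 31 years.
    SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, ~31.56 million seconds

    total_seconds = 100_000_000 * 10
    total_years = total_seconds / SECONDS_PER_YEAR
    print(f"{total_years:.1f} years")  # prints: 31.7 years
    ```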

  5. MSR-VTT Adverbs

    • opendatalab.com
    zip
    Updated Mar 24, 2023
    Cite
    University of Amsterdam (2023). MSR-VTT Adverbs [Dataset]. https://opendatalab.com/OpenDataLab/MSR-VTT_Adverbs
    Dataset provided by: University of Amsterdam
    Description: We evaluate our approach on HowTo100M Adverbs, which mined adverbs from 83 tasks in HowTo100M. Since the annotations were obtained from automatically transcribed narrations of instructional videos, they are noisy: ~44% of the annotated action-adverb pairs are not visible in the video clip. The dataset contains 5,824 clips annotated with action-adverb pairs from 72 verbs and 6 adverbs. A clear limitation of this dataset is the small number of adverbs it contains; we thus create three new adverb datasets from existing video retrieval datasets: VATEX Adverbs, MSR-VTT Adverbs and ActivityNet Adverbs. These contain less noise and a greater variety of adverbs.

  6. dibs-feature

    • huggingface.co
    Updated Dec 15, 2024
    Cite
    Exclibur (2024). dibs-feature [Dataset]. https://huggingface.co/datasets/Exclibur/dibs-feature
    Authors: Exclibur
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description

    DIBS Features: pre-extracted CLIP and UniVL features of the YouCook2, ActivityNet and HowTo100M custom subset used in DIBS. To process the HowTo100M subset features, first combine all the split files, then extract the combined archive:

        # Combine the split files
        cat howto_subset_features.tar.gz.part* > howto_subset_features.tar.gz

        # Uncompress the combined file
        tar -xvzf howto_subset_features.tar.gz
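    The same two steps can also be done in Python. This is a sketch assuming the part files sit in the current directory under the names from the dataset card:

    ```python
    # Sketch: combine the split archive parts and extract them without shell
    # tools. File names follow the dataset card; paths are assumptions.
    import glob
    import shutil
    import tarfile

    def combine_parts(pattern: str, out_path: str) -> str:
        """Concatenate split files (sorted by name) into a single archive."""
        with open(out_path, "wb") as out:
            for part in sorted(glob.glob(pattern)):
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)
        return out_path

    if __name__ == "__main__":
        archive = combine_parts("howto_subset_features.tar.gz.part*",
                                "howto_subset_features.tar.gz")
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(path=".")
    ```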

    File structure:

        ├── yc2
        │   ├── clip_features
        │   │   ├── video
        │   │   …

    See the full description on the dataset page: https://huggingface.co/datasets/Exclibur/dibs-feature.

  7. DenseStep200K

    • huggingface.co
    Cite
    Anonymous, DenseStep200K [Dataset]. https://huggingface.co/datasets/gmj03/DenseStep200K
    Authors: Anonymous
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically

    Description

    This repository contains two datasets for instructional video analysis tasks:

      1. DenseStep200K.json: a large-scale dataset containing 222,000 detailed, temporally grounded instructional steps annotated across 10,000 high-quality instructional videos (totaling 732 hours). It was constructed through a training-free automated pipeline that leverages multimodal foundation models (Qwen2.5-VL-72B and DeepSeek-R1-671B) to process noisy HowTo100M videos, achieving precise… See the full description on the dataset page: https://huggingface.co/datasets/gmj03/DenseStep200K.

  8. RareAct

    • opendatalab.com
    zip
    Updated Mar 17, 2023
    Cite
    University of Oxford (2023). RareAct [Dataset]. https://opendatalab.com/OpenDataLab/RareAct
    Available download formats: zip (8,904,008,493 bytes)
    Dataset provided by: University of Oxford; Czech Institute of Informatics, Robotics and Cybernetics; Institut national de recherche en informatique et en automatique
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0); license information was derived automatically

    Description: RareAct is a video dataset of unusual actions, including actions like "blend phone", "cut keyboard" and "microwave shoes". It aims at evaluating the zero-shot and few-shot compositionality of action recognition models on unlikely compositions of common action verbs and object nouns. It contains 122 different actions, obtained by combining verbs and nouns that rarely co-occur in the large-scale textual corpus of HowTo100M but frequently appear separately.

  9. EgoThinker-SFT-Dataset

    • huggingface.co
    Updated Oct 30, 2025
    Cite
    hyf (2025). EgoThinker-SFT-Dataset [Dataset]. https://huggingface.co/datasets/hyf015/EgoThinker-SFT-Dataset
    Authors: hyf
    License: Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/); license information was derived automatically

    Description

    The data format is a pair of video and text annotations. Our dataset comprises four categories:

    • EgoRe: QA pairs annotated on our egocentric videos, comprising three kinds of data (short, long, and chain-of-thought (CoT)), with video sources derived from Ego4D and HowTo100M.

    • General: a comprehensive collection of general-purpose image and video datasets, including K400, NextQA, SSV2, VideoChatGPT, and GPT-4o-annotated QA data.

    • Ego-Related: a collection of publicly released… See the full description on the dataset page: https://huggingface.co/datasets/hyf015/EgoThinker-SFT-Dataset.

