10 datasets found
  1. howto100m

    • huggingface.co
    Updated Jun 30, 2022
    Cite
    HuggingFaceM4 (2022). howto100m [Dataset]. https://huggingface.co/datasets/HuggingFaceM4/howto100m
    Dataset authored and provided by
    HuggingFaceM4
    Description

    HowTo100M is a large-scale dataset of narrated videos with an emphasis on instructional videos where content creators teach complex tasks with an explicit intention of explaining the visual content on screen. HowTo100M features a total of:
    - 136M video clips with captions sourced from 1.2M YouTube videos (15 years of video)
    - 23k activities from domains such as cooking, hand crafting, personal care, gardening, or fitness

    Each video is associated with a narration available as subtitles automatically downloaded from YouTube.
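
    As a minimal sketch, the metadata can be loaded with the Hugging Face datasets library; the split name and the availability of a default configuration are assumptions, not confirmed by the card:

    # Sketch only: the 'train' split name is an assumption.
    from datasets import load_dataset

    howto = load_dataset("HuggingFaceM4/howto100m", split="train")
    print(howto[0])  # inspect the fields of one record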

  2. youtube_subs_howto100M

    • huggingface.co
    Updated Mar 31, 2023
    Cite
    Wonchang Chung (2023). youtube_subs_howto100M [Dataset]. https://huggingface.co/datasets/totuta/youtube_subs_howto100M
    Authors
    Wonchang Chung
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Area covered
    YouTube
    Description

    Dataset Card for youtube_subs_howto100M

      Dataset Summary
    

    The youtube_subs_howto100M dataset is an English-language dataset of instruction-response pairs extracted from 309,136 YouTube videos. The dataset was originally inspired by and sourced from the HowTo100M dataset, which was developed for natural-language search for video clips.

      Supported Tasks and Leaderboards
    

    conversational: the dataset can be used to train a model for instruction (request) and a long form… See the full description on the dataset page: https://huggingface.co/datasets/totuta/youtube_subs_howto100M.
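
    As a sketch, the pairs can be loaded with the datasets library; split and column names are assumptions until confirmed on the dataset page:

    # Sketch only: inspect splits/columns before relying on specific names.
    from datasets import load_dataset

    ds = load_dataset("totuta/youtube_subs_howto100M")
    print(ds)  # shows the available splits and column names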

  3. ego4d_train_pair_howto100m

    • huggingface.co
    Updated May 7, 2024
    Cite
    Jilan Xu (2024). ego4d_train_pair_howto100m [Dataset]. https://huggingface.co/datasets/Jazzcharles/ego4d_train_pair_howto100m
    Authors
    Jilan Xu
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    📙 Overview

    Metadata for the Ego4d training set, with paired HowTo100M video clips. The ego-exo pairs are constructed by choosing clips with shared nouns/verbs.
    Each sample represents a short video clip, which consists of the fields below (a reading sketch follows the list):

    - vid: the initial video id.
    - start_second: the start timestamp of the narration.
    - end_second: the end timestamp of the narration.
    - text: the original narration.
    - noun: a list containing the index of nouns in the Ego4d noun vocabulary.
    - verb: a list containing the… See the full description on the dataset page: https://huggingface.co/datasets/Jazzcharles/ego4d_train_pair_howto100m.
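
    As a reading sketch for this schema, assuming the metadata is stored as JSON lines (the file name and format here are assumptions; the actual layout on the repo may differ):

    # Sketch only: the file name and JSON-lines layout are assumptions.
    import json

    with open("ego4d_train_pair_howto100m.jsonl") as f:
        for line in f:
            clip = json.loads(line)
            duration = clip["end_second"] - clip["start_second"]
            print(clip["vid"], duration, clip["text"])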

  4. HowTo100M_llama3_refined_caption

    • huggingface.co
    Updated May 7, 2024
    Cite
    Jilan Xu (2024). HowTo100M_llama3_refined_caption [Dataset]. https://huggingface.co/datasets/Jazzcharles/HowTo100M_llama3_refined_caption
    Authors
    Jilan Xu
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    📙 Overview

    The metadata for HowTo100M. The original ASR transcripts are refined by the LLaMA-3 language model.
    Each sample represents a short video clip, which consists of the fields below (an illustrative record follows the list):

    - vid: the initial video id.
    - uid: a given unique id to index the clip.
    - start_second: the start timestamp of the narration.
    - end_second: the end timestamp of the narration (which is simply set to start + 1).
    - text: the original ASR transcript.
    - noun: a list containing the index of nouns in the noun vocabulary.
    - verb: a list containing the… See the full description on the dataset page: https://huggingface.co/datasets/Jazzcharles/HowTo100M_llama3_refined_caption.
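
    To make the schema concrete, a hypothetical record might look like the following (every value is made up for illustration, not taken from the dataset):

    # Sketch only: all values below are illustrative.
    clip = {
        "vid": "abc123",           # initial video id
        "uid": 0,                  # unique id indexing the clip
        "start_second": 42.0,      # start timestamp of the narration
        "end_second": 43.0,        # set to start_second + 1 per the card
        "text": "add the flour",   # original ASR transcript
        "noun": [17],              # indices into the noun vocabulary
        "verb": [3],               # indices into the verb vocabulary
    }
    assert clip["end_second"] == clip["start_second"] + 1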

  5. tigerbot-youtube-howto-en-50k

    • huggingface.co
    Updated Jun 25, 2023
    Cite
    Tiger Research (2023). tigerbot-youtube-howto-en-50k [Dataset]. https://huggingface.co/datasets/TigerResearch/tigerbot-youtube-howto-en-50k
    Dataset authored and provided by
    Tiger Research
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Tigerbot SFT data built from open-source data: the YouTube "how to" (howto) series. Original source: https://www.di.ens.fr/willow/research/howto100m/

      Usage
    

    import datasets

    ds_sft = datasets.load_dataset('TigerResearch/tigerbot-youtube-howto-en-50k')
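
    To spot-check the loaded data (the 'train' split name is an assumption; print(ds_sft) shows the actual splits):

    print(ds_sft['train'][0])  # inspect one SFT example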

  6. How2R

    • opendatalab.com
    zip
    Updated Feb 2, 2021
    Cite
    Microsoft (2021). How2R [Dataset]. https://opendatalab.com/OpenDataLab/How2R
    Dataset provided by
    Microsoft
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Amazon Mechanical Turk (AMT) is used to collect annotations on HowTo100M videos. 30k 60-second clips are randomly sampled from 9,421 videos, and each clip is presented to turkers, who are asked to select a video segment containing a single, self-contained scene. After this segment-selection step, another group of workers is asked to write descriptions for each displayed segment. Narrations are not provided to the workers, to ensure that their written queries are based on visual content only. The final video segments are 10-20 seconds long on average, and query length ranges from 8 to 20 words. From this process, 51,390 queries are collected for 24k 60-second clips from 9,371 videos in HowTo100M, on average 2-3 queries per clip. The video clips and their associated queries are split into 80% train, 10% val, and 10% test.

  7. ACAV100M (Automatically Curated Audio-Visual)

    • opendatalab.com
    zip
    Updated Sep 2, 2022
    Cite
    Microsoft Research (2022). ACAV100M (Automatically Curated Audio-Visual) [Dataset]. https://opendatalab.com/OpenDataLab/ACAV100M
    Dataset provided by
    NVIDIA Research
    Microsoft Research
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    The ACAV100M pipeline processes 140 million full-length videos (total duration 1,030 years) to produce a dataset of 100 million 10-second clips (31 years) with high audio-visual correspondence. This is two orders of magnitude larger than the largest video dataset previously used in the audio-visual learning literature, AudioSet (8 months), and twice as large as the largest video dataset in the literature, HowTo100M (15 years).

  8. dibs-feature

    • huggingface.co
    Updated Dec 15, 2024
    Cite
    Exclibur (2024). dibs-feature [Dataset]. https://huggingface.co/datasets/Exclibur/dibs-feature
    Authors
    Exclibur
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    DIBS Features

    Pre-extracted CLIP and UniVL features of the YouCook2, ActivityNet and HowTo100M custom subset used in DIBS. To process the HowTo100M subset features, first combine all the split files and then extract them using the following commands:

    # Combine the split files
    cat howto_subset_features.tar.gz.part* > howto_subset_features.tar.gz

    # Uncompress the combined file
    tar -xvzf howto_subset_features.tar.gz

    File Structure
    ├── yc2
    │   ├── clip_features
    │   │   ├── video
    │   │   …
    See the full description on the dataset page: https://huggingface.co/datasets/Exclibur/dibs-feature.

  9. DenseStep200K

    • huggingface.co
    Cite
    Anonymous, DenseStep200K [Dataset]. https://huggingface.co/datasets/gmj03/DenseStep200K
    Authors
    Anonymous
    License

    CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This repository contains two datasets for instructional video analysis tasks:

      1. DenseStep200K.json

      Description

    A large-scale dataset containing 222,000 detailed, temporally grounded instructional steps annotated across 10,000 high-quality instructional videos (totaling 732 hours). Constructed through a training-free automated pipeline leveraging multimodal foundation models (Qwen2.5-VL-72B and DeepSeek-R1-671B) to process noisy HowTo100M videos, achieving precise… See the full description on the dataset page: https://huggingface.co/datasets/gmj03/DenseStep200K.
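
    As a minimal sketch for loading the annotations, assuming the JSON file holds a top-level list of step records (the actual structure is not confirmed by the card):

    # Sketch only: the top-level JSON structure is an assumption.
    import json

    with open("DenseStep200K.json") as f:
        steps = json.load(f)
    print(len(steps))  # expected on the order of 222,000 steps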

  10. RareAct

    • opendatalab.com
    zip
    Updated Mar 17, 2023
    Cite
    University of Oxford (2023). RareAct [Dataset]. https://opendatalab.com/OpenDataLab/RareAct
    Available download formats: zip (8,904,008,493 bytes)
    Dataset provided by
    University of Oxford
    Institut national de recherche en informatique et en automatique
    Czech Institute of Informatics, Robotics and Cybernetics
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    RareAct is a video dataset of unusual actions, including actions like “blend phone”, “cut keyboard”, and “microwave shoes”. It aims to evaluate the zero-shot and few-shot compositionality of action-recognition models on unlikely compositions of common action verbs and object nouns. It contains 122 different actions, obtained by combining verbs and nouns that rarely co-occur in the large-scale textual corpus of HowTo100M but frequently appear separately.

  Not seeing a result you expected?
    Learn how you can add new datasets to our index.
