12 datasets found
  1. howto100m

    • huggingface.co
    • paperswithcode.com
    Updated Jun 30, 2022
    Cite
    HuggingFaceM4 (2022). howto100m [Dataset]. https://huggingface.co/datasets/HuggingFaceM4/howto100m
    Explore at:
    Dataset updated
    Jun 30, 2022
    Dataset authored and provided by
    HuggingFaceM4
    Description

    HowTo100M is a large-scale dataset of narrated videos with an emphasis on instructional videos where content creators teach complex tasks with an explicit intention of explaining the visual content on screen. HowTo100M features a total of:
    • 136M video clips with captions sourced from 1.2M YouTube videos (15 years of video)
    • 23k activities from domains such as cooking, hand crafting, personal care, gardening or fitness

    Each video is associated with a narration available as subtitles automatically downloaded from YouTube.
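The per-video clip/caption layout described above (captioned clips cut from a longer video) can be sketched as follows. The field names (`starts`, `ends`, `captions`) are hypothetical placeholders for illustration, not the dataset's actual schema.

```python
def iter_clips(record):
    """Yield (start, end, text) tuples for one video record.

    Assumes a hypothetical record layout with parallel lists of clip
    boundaries (seconds) and ASR caption strings.
    """
    for start, end, text in zip(record["starts"], record["ends"], record["captions"]):
        yield (start, end, text)

# Toy example record in the assumed layout.
video = {
    "video_id": "abc123",
    "starts": [0.0, 4.2],
    "ends": [4.2, 9.8],
    "captions": ["today we are making bread", "first, mix the flour and water"],
}

clips = list(iter_clips(video))
```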

  2. youtube_subs_howto100M

    • huggingface.co
    Updated Mar 31, 2023
    Cite
    Wonchang Chung (2023). youtube_subs_howto100M [Dataset]. https://huggingface.co/datasets/totuta/youtube_subs_howto100M
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 31, 2023
    Authors
    Wonchang Chung
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    YouTube
    Description

    Dataset Card for youtube_subs_howto100M

      Dataset Summary
    

    The youtube_subs_howto100M dataset is an English-language dataset of instruction-response pairs extracted from 309,136 YouTube videos. The dataset was originally inspired by and sourced from the HowTo100M dataset, which was developed for natural-language search of video clips.

      Supported Tasks and Leaderboards
    

    conversational: the dataset can be used to train a model for instruction (request) and long-form… See the full description on the dataset page: https://huggingface.co/datasets/totuta/youtube_subs_howto100M.

  3. HowTo100M-subtitles-small

    • huggingface.co
    Updated Nov 2, 2023
    Cite
    Diyar Hamedi (2023). HowTo100M-subtitles-small [Dataset]. https://huggingface.co/datasets/diyarhamedi/HowTo100M-subtitles-small
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 2, 2023
    Authors
    Diyar Hamedi
    Description

    HowTo100M-subtitles-small

    The subtitles from a subset of the HowTo100M dataset.

  4. Sieve & Swap - HowTo100M (Cooking) Dataset

    • paperswithcode.com
    Updated Jul 23, 2024
    Cite
    Anil Batra; Davide Moltisanti; Laura Sevilla-Lara; Marcus Rohrbach; Frank Keller (2024). Sieve & Swap - HowTo100M (Cooking) Dataset [Dataset]. https://paperswithcode.com/dataset/sieve-swap-howto100m-cooking
    Explore at:
    Dataset updated
    Jul 23, 2024
    Authors
    Anil Batra; Davide Moltisanti; Laura Sevilla-Lara; Marcus Rohrbach; Frank Keller
    Description

    Procedural videos show step-by-step demonstrations of tasks like recipe preparation. Understanding such videos is challenging, involving the precise localization of steps and the generation of textual instructions. Manually annotating steps and writing instructions is costly, which limits the size of current datasets and hinders effective learning. Leveraging large but noisy video-transcript datasets for pre-training can boost performance, but demands significant computational resources. Furthermore, transcripts contain irrelevant content and exhibit style variation compared to instructions written by human annotators. To mitigate both issues, we propose a technique, Sieve-&-Swap, to automatically curate a smaller dataset: (i) Sieve filters irrelevant transcripts and (ii) Swap enhances the quality of the text instruction by automatically replacing the transcripts with human-written instructions from a text-only recipe dataset. The curated dataset, three orders of magnitude smaller than current webscale datasets, enables efficient training of large-scale models with competitive performance.
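The two-stage curation described above can be illustrated with a toy sketch: "Sieve" drops transcript sentences that share too little vocabulary with any human-written recipe instruction, and "Swap" replaces each surviving sentence with its closest instruction. Here similarity is plain token overlap; this is an assumed simplification, not the authors' implementation.

```python
def token_set(s):
    return set(s.lower().split())

def sieve_and_swap(transcript, recipe_instructions, min_overlap=2):
    """Toy Sieve-&-Swap: filter noisy transcript lines, then substitute
    the nearest human-written instruction (by token overlap)."""
    curated = []
    for sent in transcript:
        t = token_set(sent)
        # Find the instruction with the largest token overlap.
        best = max(recipe_instructions, key=lambda r: len(t & token_set(r)))
        # Sieve: keep only sentences sharing enough tokens with an instruction.
        if len(t & token_set(best)) >= min_overlap:
            # Swap: emit the human-written instruction instead of the transcript.
            curated.append(best)
    return curated

transcript = [
    "so um welcome back to my channel",
    "now chop the onion finely and add it",
]
recipes = ["Chop the onion finely.", "Simmer the sauce for ten minutes."]
```

Running `sieve_and_swap(transcript, recipes)` drops the chatty first line and swaps the second for the clean recipe instruction.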

  5. ACAV100M (Automatically Curated Audio-Visual)

    • opendatalab.com
    zip
    Updated Sep 2, 2022
    + more versions
    Cite
    Microsoft Research (2022). ACAV100M (Automatically Curated Audio-Visual) [Dataset]. https://opendatalab.com/OpenDataLab/ACAV100M
    Explore at:
    Available download formats: zip
    Dataset updated
    Sep 2, 2022
    Dataset provided by
    Microsoft Research
    NVIDIA Research
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    ACAV100M processes 140 million full-length videos (total duration 1,030 years) which are used to produce a dataset of 100 million 10-second clips (31 years) with high audio-visual correspondence. This is two orders of magnitude larger than the current largest video dataset used in the audio-visual learning literature, i.e., AudioSet (8 months), and twice as large as the largest video dataset in the literature, i.e., HowTo100M (15 years).
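The durations quoted above are easy to sanity-check: 100 million 10-second clips come to roughly 31 years of video.

```python
# Sanity check: 100M clips x 10 s each, expressed in years of footage.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, 31,557,600 s

clip_years = 100_000_000 * 10 / SECONDS_PER_YEAR
print(round(clip_years, 1))  # ~31.7, matching the "31 years" figure above
```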

  6. tigerbot-youtube-howto-en-50k

    • huggingface.co
    Updated Jun 25, 2023
    Cite
    Tiger Research (2023). tigerbot-youtube-howto-en-50k [Dataset]. https://huggingface.co/datasets/TigerResearch/tigerbot-youtube-howto-en-50k
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 25, 2023
    Dataset authored and provided by
    Tiger Research
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Tigerbot SFT data processed from open-source data: the YouTube "how-to" series. Original source: https://www.di.ens.fr/willow/research/howto100m/

      Usage
    

    import datasets

    ds_sft = datasets.load_dataset('TigerResearch/tigerbot-youtube-howto-en-50k')

  7. AIR Dataset

    • paperswithcode.com
    Cite
    Davide Moltisanti; Frank Keller; Hakan Bilen; Laura Sevilla-Lara, AIR Dataset [Dataset]. https://paperswithcode.com/dataset/air
    Explore at:
    Authors
    Davide Moltisanti; Frank Keller; Hakan Bilen; Laura Sevilla-Lara
    Description

    Adverbs in Recipes (AIR) is a dataset specifically collected for adverb recognition. AIR is a subset of HowTo100M where recipe videos show actions performed in ways that change according to an adverb (e.g. chop thinly/coarsely). AIR was carefully reviewed to ensure reliable annotations.

  8. Howto-Interlink7M

    • huggingface.co
    Updated Jan 4, 2024
    Cite
    Alex Jinpeng Wang (2024). Howto-Interlink7M [Dataset]. https://huggingface.co/datasets/Awiny/Howto-Interlink7M
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 4, 2024
    Authors
    Alex Jinpeng Wang
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Howto-Interlink7M

      📙 Overview
    

    Howto-Interlink7M presents a unique interleaved video-text dataset, carefully derived from the raw video content of Howto100M.

    In the creation of this dataset, we turn long videos into vision-text interleaved documents using BLIP2 (image captioner), GRIT (image detector), and Whisper (ASR), similar to VLog. We then employed GPT-4 to build an extensive set of 7 million high-quality pretraining examples. During this process, we meticulously filtered out clips… See the full description on the dataset page: https://huggingface.co/datasets/Awiny/Howto-Interlink7M.

  9. youcook2_features_howto100m

    • academictorrents.com
    bittorrent
    Updated Nov 25, 2020
    Cite
    None (2020). youcook2_features_howto100m [Dataset]. https://academictorrents.com/details/70417e3793dbbb03ca68981307860254766d5a1d
    Explore at:
    Available download formats: bittorrent (662214346)
    Dataset updated
    Nov 25, 2020
    Authors
    None
    License

    No license specified: https://academictorrents.com/nolicensespecified

    Description

    A BitTorrent file to download data with the title 'youcook2_features_howto100m'

  10. RareAct Dataset

    • library.toponeai.link
    • paperswithcode.com
    • +1 more
    Updated Nov 27, 2024
    Cite
    Antoine Miech; Jean-Baptiste Alayrac; Ivan Laptev; Josef Sivic; Andrew Zisserman (2024). RareAct Dataset [Dataset]. https://library.toponeai.link/dataset/rareact
    Explore at:
    Dataset updated
    Nov 27, 2024
    Authors
    Antoine Miech; Jean-Baptiste Alayrac; Ivan Laptev; Josef Sivic; Andrew Zisserman
    Description

    RareAct is a video dataset of unusual actions, including actions like “blend phone”, “cut keyboard” and “microwave shoes”. It aims at evaluating the zero-shot and few-shot compositionality of action recognition models for unlikely compositions of common action verbs and object nouns. It contains 122 different actions which were obtained by combining verbs and nouns rarely co-occurring together in the large-scale textual corpus from HowTo100M, but that frequently appear separately.
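The idea of "rarely co-occurring" verb-noun pairs can be illustrated with a small sketch: mine pairs where both words are individually frequent in a corpus but the pair itself almost never appears. Toy data and thresholds below are assumptions for illustration, not the RareAct pipeline.

```python
from collections import Counter

def rare_compositions(pairs, min_word_count=2, max_pair_count=0):
    """Return verb-noun pairs whose words are common but whose
    combination is rare in the observed (verb, noun) corpus."""
    verbs = Counter(v for v, _ in pairs)
    nouns = Counter(n for _, n in pairs)
    pair_counts = Counter(pairs)
    candidates = set()
    for v in verbs:
        for n in nouns:
            if (verbs[v] >= min_word_count and nouns[n] >= min_word_count
                    and pair_counts[(v, n)] <= max_pair_count):
                candidates.add((v, n))
    return candidates

# Toy corpus of observed verb-noun co-occurrences.
corpus_pairs = [
    ("cut", "bread"), ("cut", "bread"), ("blend", "fruit"),
    ("blend", "fruit"), ("cut", "fruit"),
]
```

Here both "blend" and "bread" are frequent, but "blend bread" never co-occurs, so it surfaces as a rare composition.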

  11. How2R Dataset

    • paperswithcode.com
    • opendatalab.com
    Updated Feb 2, 2021
    Cite
    Linjie Li; Yen-Chun Chen; Yu Cheng; Zhe Gan; Licheng Yu; Jingjing Liu (2021). How2R Dataset [Dataset]. https://paperswithcode.com/dataset/how2r
    Explore at:
    Dataset updated
    Feb 2, 2021
    Authors
    Linjie Li; Yen-Chun Chen; Yu Cheng; Zhe Gan; Licheng Yu; Jingjing Liu
    Description

    Amazon Mechanical Turk (AMT) is used to collect annotations on HowTo100M videos. 30k 60-second clips are randomly sampled from 9,421 videos, and each clip is presented to turkers, who are asked to select a video segment containing a single, self-contained scene. After this segment-selection step, another group of workers is asked to write descriptions for each displayed segment. Narrations are not provided to the workers, to ensure that their written queries are based on visual content only. The final video segments are 10-20 seconds long on average, and query lengths range from 8 to 20 words. From this process, 51,390 queries are collected for 24k 60-second clips from 9,371 videos in HowTo100M, on average 2-3 queries per clip. The video clips and their associated queries are split into 80% train, 10% val and 10% test.
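An 80/10/10 split like the one described can be sketched at the video level, so that clips from the same video never leak across splits (a common convention; the exact How2R procedure is not specified here).

```python
def split_by_video(video_ids, train=0.8, val=0.1):
    """Deterministic 80/10/10 split over unique, sorted video ids."""
    ids = sorted(set(video_ids))
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# Hypothetical ids standing in for HowTo100M video identifiers.
tr, va, te = split_by_video([f"vid{i:03d}" for i in range(100)])
```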

  12. DenseStep200K

    • huggingface.co
    Cite
    Anonymous, DenseStep200K [Dataset]. https://huggingface.co/datasets/gmj03/DenseStep200K
    Explore at:
    Authors
    Anonymous
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains two datasets for instructional video analysis tasks:

      1. DenseStep200K.json

      Description
    A large-scale dataset containing 222,000 detailed, temporally grounded instructional steps annotated across 10,000 high-quality instructional videos (totaling 732 hours). Constructed through a training-free automated pipeline leveraging multimodal foundation models (Qwen2.5-VL-72B and DeepSeek-R1-671B) to process noisy HowTo100M videos, achieving precise… See the full description on the dataset page: https://huggingface.co/datasets/gmj03/DenseStep200K.
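Temporally grounded step annotations like those described above might look like the JSON below. The schema (`video_id`, `steps` with `text`/`start`/`end`) is a hypothetical illustration, not the file's actual layout; the helper sums annotated step durations for one video.

```python
import json

# Assumed annotation shape for one video; field names are illustrative.
sample = json.loads("""
{
  "video_id": "xyz789",
  "steps": [
    {"text": "Preheat the oven to 180C", "start": 3.0, "end": 8.5},
    {"text": "Mix flour and sugar", "start": 10.0, "end": 21.0}
  ]
}
""")

def annotated_seconds(video):
    """Total seconds covered by a video's grounded steps."""
    return sum(s["end"] - s["start"] for s in video["steps"])
```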

