HowTo100M is a large-scale dataset of narrated videos with an emphasis on instructional videos, where content creators teach complex tasks with the explicit intention of explaining the visual content on screen. HowTo100M features a total of:
- 136M video clips with captions sourced from 1.2M YouTube videos (15 years of video)
- 23k activities from domains such as cooking, hand crafting, personal care, gardening, and fitness
Each video is associated with a narration available as subtitles automatically downloaded from YouTube.
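As a rough illustration, here is a minimal sketch of reading such caption data, assuming a JSON file that maps each video id to parallel lists of segment start times, end times, and narration text (the file name and schema are assumptions, not the official release format):

import json

# Sketch only: the file name "caption.json" and its schema are assumptions.
with open("caption.json") as f:
    captions = json.load(f)

# Print the narrated segments of one video.
vid, segs = next(iter(captions.items()))
for start, end, text in zip(segs["start"], segs["end"], segs["text"]):
    print(f"{vid} [{start:.1f}-{end:.1f} s]: {text}")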
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Dataset Card for youtube_subs_howto100M
Dataset Summary
The youtube_subs_howto100M dataset is an English-language dataset of instruction-response pairs extracted from 309,136 YouTube videos. The dataset was originally inspired by and sourced from the HowTo100M dataset, which was developed for natural language search for video clips.
Supported Tasks and Leaderboards
conversational: The dataset can be used to train a model for instruction (request) and long-form… See the full description on the dataset page: https://huggingface.co/datasets/totuta/youtube_subs_howto100M.
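A minimal sketch of loading the dataset with the Hugging Face datasets library; the split name and column layout are assumptions, so consult the dataset page for the actual schema:

from datasets import load_dataset

# Sketch only: the split name is an assumption.
ds = load_dataset("totuta/youtube_subs_howto100M", split="train")
print(ds[0])  # one instruction-response pair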
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
📙 Overview
Metadata for the Ego4D training set, with paired HowTo100M video clips. Ego-exo pairs are constructed by choosing clips that share nouns/verbs.
Each sample represents a short video clip and consists of:
- vid: the initial video id
- start_second: the start timestamp of the narration
- end_second: the end timestamp of the narration
- text: the original narration
- noun: a list containing the index of nouns in the Ego4d noun vocabulary
- verb: a list containing the… See the full description on the dataset page: https://huggingface.co/datasets/Jazzcharles/ego4d_train_pair_howto100m.
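A minimal sketch of inspecting one sample, using the field names listed above (the split name is an assumption):

from datasets import load_dataset

# Sketch only: split name is assumed to be "train".
ds = load_dataset("Jazzcharles/ego4d_train_pair_howto100m", split="train")
sample = ds[0]
print(sample["vid"], sample["start_second"], sample["end_second"])
print(sample["text"], sample["noun"], sample["verb"])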
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
📙 Overview
Metadata for HowTo100M. The original ASR transcripts are refined with the LLaMA-3 language model.
Each sample represents a short video clip and consists of:
- vid: the initial video id
- uid: a unique id assigned to index the clip
- start_second: the start timestamp of the narration
- end_second: the end timestamp of the narration (simply set to start + 1)
- text: the original ASR transcript
- noun: a list containing the index of nouns in the noun vocabulary
- verb: a list containing the… See the full description on the dataset page: https://huggingface.co/datasets/Jazzcharles/HowTo100M_llama3_refined_caption.
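A minimal sketch of indexing the refined captions by clip uid, using the field names listed above; the split name and the small slice are assumptions for illustration:

from datasets import load_dataset

ds = load_dataset("Jazzcharles/HowTo100M_llama3_refined_caption", split="train")
subset = ds.select(range(1000))  # small slice for illustration only
by_uid = {row["uid"]: row for row in subset}
clip = next(iter(by_uid.values()))
print(clip["vid"], clip["start_second"], clip["end_second"], clip["text"])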
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
TigerBot SFT data built from open-source sources: the YouTube how-to (HowTo) series. Original source: https://www.di.ens.fr/willow/research/howto100m/
Usage
import datasets
ds_sft = datasets.load_dataset('TigerResearch/tigerbot-youtube-howto-en-50k')
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Amazon Mechanical Turk (AMT) is used to collect annotations on HowTo100M videos. 30k 60-second clips are randomly sampled from 9,421 videos, and each clip is presented to turkers who are asked to select a video segment containing a single, self-contained scene. After this segment-selection step, another group of workers is asked to write descriptions for each displayed segment. Narrations are not provided to the workers, to ensure that their written queries are based on visual content only. The final video segments are 10-20 seconds long on average, and query length ranges from 8 to 20 words. From this process, 51,390 queries are collected for 24k 60-second clips from 9,371 videos in HowTo100M, on average 2-3 queries per clip. The video clips and their associated queries are split into 80% train, 10% val, and 10% test.
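For illustration, a sketch of an 80/10/10 split performed at the video level so that all queries from one video land in a single split; the grouping rule is an assumption, since the card only states the split ratios:

import random

def split_videos(video_ids, seed=0):
    # Shuffle the unique video ids deterministically, then cut 80/10/10.
    ids = sorted(set(video_ids))
    random.Random(seed).shuffle(ids)
    n = len(ids)
    train = set(ids[: int(0.8 * n)])
    val = set(ids[int(0.8 * n) : int(0.9 * n)])
    test = set(ids[int(0.9 * n) :])
    return train, val, test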
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
ACAV100M processes 140 million full-length videos (total duration 1,030 years), which are used to produce a dataset of 100 million 10-second clips (31 years) with high audio-visual correspondence. This is two orders of magnitude larger than the current largest video dataset used in the audio-visual learning literature, i.e., AudioSet (8 months), and twice as large as the largest video dataset in the literature, i.e., HowTo100M (15 years).
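A quick arithmetic check of the reported clip duration:

# 100 million clips of 10 seconds each.
total_seconds = 100_000_000 * 10
years = total_seconds / (365.25 * 24 * 3600)
print(f"{years:.1f} years")  # ~31.7 years, consistent with the quoted 31 years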
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
DIBS Features
Pre-extracted CLIP and UniVL features for YouCook2, ActivityNet, and the custom HowTo100M subset used in DIBS. To process the HowTo100M subset features, first combine all the split files and then extract them with the following commands:
cat howto_subset_features.tar.gz.part* > howto_subset_features.tar.gz
tar -xvzf howto_subset_features.tar.gz
File Structure
├── yc2
│   ├── clip_features
│   │   ├── video
│   │   …
See the full description on the dataset page: https://huggingface.co/datasets/Exclibur/dibs-feature.
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
This repository contains two datasets for instructional video analysis tasks:
1. DenseStep200K.json
Description
A large-scale dataset containing 222,000 detailed, temporally grounded instructional steps annotated across 10,000 high-quality instructional videos (totaling 732 hours). It is constructed through a training-free automated pipeline that leverages multimodal foundation models (Qwen2.5-VL-72B and DeepSeek-R1-671B) to process noisy HowTo100M videos, achieving precise… See the full description on the dataset page: https://huggingface.co/datasets/gmj03/DenseStep200K.
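A minimal sketch of opening the annotation file; the top-level structure and any per-step fields are assumptions, so consult the dataset page for the actual schema:

import json

# Sketch only: the structure of DenseStep200K.json is not documented here.
with open("DenseStep200K.json") as f:
    data = json.load(f)
print(type(data), len(data))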
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
RareAct is a video dataset of unusual actions, including actions like “blend phone”, “cut keyboard”, and “microwave shoes”. It aims to evaluate the zero-shot and few-shot compositionality of action recognition models for unlikely compositions of common action verbs and object nouns. It contains 122 different actions, obtained by combining verbs and nouns that rarely co-occur in the large-scale textual corpus of HowTo100M but frequently appear separately.
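As a sketch of the construction idea described above (not the authors' actual pipeline), one can count verb-noun co-occurrences in a caption corpus and keep pairs whose words are individually frequent but rarely appear together; the thresholds below are placeholder assumptions:

from collections import Counter
from itertools import product

def rare_pairs(captions, verbs, nouns, max_cooccur=2, min_freq=100):
    # Individual word frequencies across the corpus.
    word_freq = Counter(w for c in captions for w in c.split())
    # Co-occurrence counts of each candidate verb-noun pair within a caption.
    pair_freq = Counter()
    for c in captions:
        tokens = set(c.split())
        for v, n in product(verbs, nouns):
            if v in tokens and n in tokens:
                pair_freq[(v, n)] += 1
    return [(v, n) for v, n in product(verbs, nouns)
            if word_freq[v] >= min_freq and word_freq[n] >= min_freq
            and pair_freq[(v, n)] <= max_cooccur]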