Kinetics-600 is a large-scale action recognition dataset consisting of around 480K videos from 600 action categories. The videos are split into 390K, 30K, and 60K for the training, validation, and test sets, respectively. Each video is a 10-second clip of an action moment annotated from a raw YouTube video. It is an extension of the Kinetics-400 dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Kinetics dataset is a large-scale, high-quality dataset for human action recognition in videos. The dataset consists of around 500,000 video clips covering 600 human action classes with at least 600 video clips for each action class. Each video clip lasts around 10 seconds and is labeled with a single action class. The videos are collected from YouTube.
Kinetics-700 is a video dataset of 650,000 clips that covers 700 human action classes. The videos include human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands and hugging. Each action class has at least 700 video clips. Each clip is annotated with an action class and lasts approximately 10 seconds.
This dataset was created by Nikiforos Vagenas
This dataset contains both 8 and 16 sampled frames of the "eating-spaghetti" video of the Kinetics-400 dataset, with the following frame indices being used:
8 frames (eating_spaghetti_8_frames.npy): [97, 98, 99, 100, 101, 102, 103, 104]
16 frames (eating_spaghetti.npy): [164, 168, 172, 176, 181, 185, 189, 193, 198, 202, 206, 210, 215, 219, 223, 227]
32 frames (eating_spaghetti_32_frames.npy): [47, 51, 55…
See the full description on the dataset page: https://huggingface.co/datasets/hf-internal-testing/spaghetti-video.
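The 8-frame and 16-frame index lists above are consistent with a common "evenly spaced, then clipped" sampling scheme. The sketch below reproduces them in plain Python. It is an assumption that the dataset was built this way; the function name `sample_frame_indices`, its parameters, and the chosen start indices and sampling rates are illustrative, not taken from the dataset page.

```python
def sample_frame_indices(start_idx, clip_len, frame_sample_rate):
    """Return clip_len frame indices spread evenly over a window.

    The window covers clip_len * frame_sample_rate frames starting at
    start_idx. Points are spaced linearly from the window start to its
    end, clipped to the last valid frame, then truncated to integers.
    """
    end_idx = start_idx + clip_len * frame_sample_rate
    indices = []
    for i in range(clip_len):
        # Linear interpolation between start_idx and end_idx (inclusive).
        position = start_idx + i * (end_idx - start_idx) / (clip_len - 1)
        # Keep the index strictly inside the window, then truncate.
        indices.append(int(min(position, end_idx - 1)))
    return indices


# Assumed settings: 8 frames at rate 1 from frame 97,
# and 16 frames at rate 4 from frame 164.
eight = sample_frame_indices(start_idx=97, clip_len=8, frame_sample_rate=1)
sixteen = sample_frame_indices(start_idx=164, clip_len=16, frame_sample_rate=4)
print(eight)
print(sixteen)
```

With these assumed settings the two printed lists match the 8-frame and 16-frame indices listed above. The 32-frame list is truncated in the listing, so it is not reproduced here.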
This repository contains the mapping from integer IDs to actual label names (in HuggingFace Transformers typically called id2label) for several datasets. Current datasets include:
ImageNet-1k; ImageNet-22k (also called ImageNet-21k as there are 21,843 classes); COCO detection 2017; COCO panoptic 2017; ADE20k (actually, the MIT Scene Parsing benchmark, which is a subset of ADE20k); Cityscapes; VQAv2; Kinetics-700; RVL-CDIP; PASCAL VOC; Kinetics-400; ...
You can read in a label file as follows (using… See the full description on the dataset page: https://huggingface.co/datasets/huggingface/label-files.
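The description is cut off before its own snippet, so the following is not the repository's code. It is a minimal sketch of reading an id2label file, assuming the file is a JSON object that maps string IDs to label names. The helper names `load_id2label` and `invert_labels`, the file name `demo-id2label.json`, and the two placeholder labels are all hypothetical.

```python
import json
import os
import tempfile


def load_id2label(path):
    """Read a JSON label file and return a dict of {int id: label name}."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    # JSON object keys are always strings, so convert them back to integers.
    return {int(key): value for key, value in raw.items()}


def invert_labels(id2label):
    """Build the reverse mapping, {label name: int id}."""
    return {label: idx for idx, label in id2label.items()}


# Hypothetical two-entry file, written only to exercise the loader.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo-id2label.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"0": "placeholder_a", "1": "placeholder_b"}, f)
    id2label = load_id2label(demo_path)

label2id = invert_labels(id2label)
print(id2label)
print(label2id)
```

Converting the keys to integers matters because a model's predicted class index is an integer, and a lookup with an integer key would miss a string-keyed dictionary.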
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Victor Efstatevits
Released under MIT