This paper introduces the pipeline used to scale EPIC-KITCHENS, the largest dataset in egocentric vision. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames and 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments using head-mounted cameras. Compared to its previous version (EPIC-KITCHENS-55), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete (+128% more action segments) annotations of fine-grained actions. This collection also enables evaluating the "test of time" - i.e. whether models trained on data collected in 2018 can generalise to new footage collected under the same hypotheses albeit "two years on". The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), and unsupervised domain adaptation for action recognition. For each challenge, we define the task and provide baselines and evaluation metrics.
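A minimal sketch of inspecting annotations at this scale, assuming the publicly released annotation CSVs with columns such as `video_id`, `start_timestamp` and `stop_timestamp` (column names and file path are assumptions, not guaranteed by this description); it estimates the "actions per minute" density mentioned above:

```python
# Sketch only: estimate action density from an EPIC-KITCHENS-100-style CSV.
import pandas as pd

def to_seconds(ts: str) -> float:
    """Convert an 'HH:MM:SS.ff' timestamp string to seconds."""
    h, m, s = ts.split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

# Hypothetical path to a train annotations file.
annotations = pd.read_csv("EPIC_100_train.csv")
annotations["stop_sec"] = annotations["stop_timestamp"].map(to_seconds)

per_video = annotations.groupby("video_id")
actions_per_minute = per_video.size() / (per_video["stop_sec"].max() / 60.0)
print(actions_per_minute.describe())
```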
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
First-person (egocentric) video dataset; multi-faceted non-scripted recordings in the wearers' homes, capturing all daily activities in the kitchen over multiple days. Annotations are collected using a novel live audio commentary approach.
The EPIC-KITCHENS-55 dataset comprises a set of 432 egocentric videos recorded by 32 participants in their kitchens at 60fps with a head-mounted camera. There is no guiding script: participants freely perform kitchen activities such as cooking, food preparation and washing up. Each video is split into short action segments (mean duration 3.7s) with specific start and end times and a verb-noun annotation describing the action (e.g. 'open fridge'). There are 125 verb classes and 331 noun classes. The dataset is divided into one train split and two test splits.
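A minimal sketch of how such an action segment could be represented in code; the field names are illustrative, not the dataset's actual schema:

```python
# Sketch only: an illustrative container for a verb-noun action segment.
from dataclasses import dataclass

@dataclass
class ActionSegment:
    video_id: str
    start: float   # start time in seconds
    stop: float    # stop time in seconds
    verb: str      # e.g. 'open' (one of 125 verb classes)
    noun: str      # e.g. 'fridge' (one of 331 noun classes)

    @property
    def duration(self) -> float:
        return self.stop - self.start

    @property
    def narration(self) -> str:
        return f"{self.verb} {self.noun}"

segment = ActionSegment("P01_01", 12.4, 16.1, "open", "fridge")
print(segment.narration, round(segment.duration, 1))  # open fridge 3.7
```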
The kiyoonkim/EPIC-KITCHENS-100-trimmed dataset is hosted on Hugging Face and was contributed by the HF Datasets community.
We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which brings a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked, and we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. Data published under the Creative Commons Attribution-NonCommercial 4.0 International License.
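A minimal sketch of mask intersection-over-union, a common building block when comparing predicted and ground-truth segmentations such as VISOR's hand and active-object masks; this is illustrative and not the official VISOR benchmark code:

```python
# Sketch only: IoU between two boolean segmentation masks.
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union between two boolean masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return np.logical_and(pred, gt).sum() / union

# Toy example: two overlapping square masks on a 100x100 frame.
pred = np.zeros((100, 100), dtype=bool)
gt = np.zeros((100, 100), dtype=bool)
pred[10:50, 10:50] = True
gt[20:60, 20:60] = True
print(f"IoU = {mask_iou(pred, gt):.3f}")
```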
Non-Commercial Government Licence (version 2.0): http://www.nationalarchives.gov.uk/doc/non-commercial-government-licence/version/2/
EPIC-KITCHENS-100: Extended Footage for the EPIC-KITCHENS dataset, to 100 hours of footage. DOI: 10.5523/bris.2g1n6qdydwa9u22shpxqzp0t8m (2020-09-10). N.B. please also see the erratum published on 2021-05-25: this release supersedes the original torrent, to which a small change was made.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
EPIC-KITCHENS-100 is a large-scale dataset in first-person (egocentric) vision; multi-faceted, audio-visual, non-scripted recordings in native environments - i.e. the wearers' homes, capturing all daily activities in the kitchen over multiple days. Annotations are collected using a novel 'Pause-and-Talk' narration interface.
EPIC-KITCHENS-100 is an extension of the EPIC-KITCHENS dataset released in 2018, to 100 hours of footage.
EPIC-ROI builds on top of the EPIC-KITCHENS dataset and consists of 103 diverse images with pixel-level annotations for regions where human hands frequently touch in everyday interaction. Specifically, image regions that afford any of the most frequent actions (take, open, close, press, dry, turn, peel) are considered positives. We manually watched videos from multiple participants to define a) object categories, and b) specific regions within each category where participants interacted while performing any of the 7 selected actions. These 103 images were sampled from 9 different kitchens (7 to 15 images, with minimal overlap, from each kitchen). EPIC-ROI is used only for evaluation and contains 32 val images and 71 test images; images from the same kitchen are in the same split. The Regions-of-Interaction task is to score each pixel in the image with the probability of a hand interacting with it. Performance is measured using average precision.
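A minimal sketch of the evaluation just described: each pixel receives a predicted interaction probability, and performance is summarised by average precision over all pixels. This is an illustrative re-implementation using scikit-learn, not the official EPIC-ROI evaluation script, and the toy data below is synthetic:

```python
# Sketch only: pixel-wise average precision for a Regions-of-Interaction map.
import numpy as np
from sklearn.metrics import average_precision_score

def pixelwise_ap(prob_map: np.ndarray, gt_mask: np.ndarray) -> float:
    """Average precision of a per-pixel probability map against a binary mask."""
    return average_precision_score(gt_mask.reshape(-1).astype(int),
                                   prob_map.reshape(-1))

# Toy example on a random 64x64 image.
rng = np.random.default_rng(0)
gt = rng.random((64, 64)) > 0.8                   # hypothetical ground-truth ROI mask
scores = 0.7 * gt + 0.3 * rng.random((64, 64))    # noisy predictions correlated with gt
print(f"AP = {pixelwise_ap(scores, gt):.3f}")
```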
EPIC-SOUNDS is a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos from EPIC-KITCHENS-100. EPIC-SOUNDS includes 78.4k categorised and 39.2k non-categorised segments of audible events and actions, distributed across 44 classes.
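A minimal sketch of extracting one labelled audio segment from a video's audio track given start/stop times in seconds, in the spirit of EPIC-SOUNDS-style temporal annotations; the file path, timings and annotation format here are assumptions, not the official data layout:

```python
# Sketch only: slice an audio segment by start/stop time using soundfile.
import soundfile as sf

def load_segment(audio_path: str, start_s: float, stop_s: float):
    """Return the samples (and sample rate) between start_s and stop_s."""
    info = sf.info(audio_path)
    start_frame = int(start_s * info.samplerate)
    stop_frame = int(stop_s * info.samplerate)
    samples, sr = sf.read(audio_path, start=start_frame, stop=stop_frame)
    return samples, sr

# Hypothetical extracted audio track and segment boundaries.
samples, sr = load_segment("P01_01.wav", start_s=12.4, stop_s=14.0)
print(samples.shape, sr)
```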
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
📙 Overview
EPIC-KITCHENS-100 video features extracted by VideoMAE_L14 at 8 fps. They are used for evaluating the video-text retrieval ability of EgoInstructor. The release contains 700 files; each file (e.g. P01_01.pth.tar) holds a T×D feature matrix, where T depends on the length of the video and D is 768.
🏋️ How-To-Use
Please refer to the EgoInstructor code for details.
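A minimal sketch (with assumptions labelled) of using these per-video features for video-text retrieval: load a T×768 feature file, mean-pool over time, and rank candidate caption embeddings by cosine similarity. The text embeddings below are random placeholders; see the EgoInstructor code for the actual retrieval setup.

```python
# Sketch only: rank hypothetical caption embeddings against one video's features.
import torch
import torch.nn.functional as F

video_feat = torch.load("P01_01.pth.tar")                 # tensor of shape (T, 768)
video_emb = F.normalize(video_feat.mean(dim=0), dim=-1)   # time-pooled, (768,)

# Placeholder caption embeddings from a matching 768-d text encoder.
text_embs = F.normalize(torch.randn(5, 768), dim=-1)      # (num_captions, 768)

similarities = text_embs @ video_emb                      # (num_captions,)
best_caption = similarities.argmax().item()
print(similarities, best_caption)
```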
🎓 Citation
@article{xu2024retrieval, title={Retrieval-augmented egocentric video captioning}, author={Xu… See the full description on the dataset page: https://huggingface.co/datasets/Jazzcharles/epic_kitchen_videomae_L14_feature_fps8.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This material is part of the free Environmental Performance in Construction (EPiC) Database. The EPiC Database contains embodied environmental flow coefficients for 250+ construction materials using a comprehensive hybrid life cycle inventory approach. Steel is a ferrous metal: an alloy of iron and carbon, potentially with other elements, with a very high tensile strength. Steel has been used in the construction industry for over a century, and stainless steel is extremely resistant to corrosion. The core material for making steel is iron, which is found in iron ore. Iron is extracted from iron ore in blast furnaces through the smelting process, while controlling the carbon content. To render the steel stainless, chromium is needed and is typically added as stainless steel scrap. The molten steel is usually further processed before being cast for its final use. Steel is commonly used in the construction industry, mainly as a structural material. Stainless steel sheets are used to produce a range of finishing materials, such as high-durability cladding, roofing (mainly for airports), and kitchen surfaces.
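A minimal sketch of how an embodied flow coefficient from such a database would typically be applied: multiply a material quantity by its per-unit coefficient. The coefficient value below is hypothetical and not taken from the EPiC Database.

```python
# Sketch only: applying a hypothetical embodied energy coefficient.
stainless_steel_kg = 1200.0    # quantity of material in a design, kg
embodied_energy_coeff = 85.0   # hypothetical coefficient, MJ per kg

embodied_energy = stainless_steel_kg * embodied_energy_coeff
print(f"Embodied energy: {embodied_energy:,.0f} MJ")
```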