This dataset was created by Ismail Hossain
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A thorough analysis of the existing human action recognition datasets demonstrates that only a few HRI datasets are available that target real-world applications, all of which are adapted to home settings. Therefore, given the shortage of datasets in industrial tasks, we aim to provide the community with a dataset created in a laboratory setting that includes actions commonly performed within manufacturing and service industries. In addition, the proposed dataset meets the requirements of deep learning algorithms for the development of intelligent learning models for action recognition and imitation in HRI applications.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Intelligent Sensing Lab Dataset (ISLD): MATLAB sequences for Human Action Recognition based on video sequences. This dataset contains pose data recorded in the Intelligent Sensing Lab, Newcastle University, for posture-based Human Action Recognition (HAR), along with the original RGB data. The dataset has been used to support the results in [1]. Poses are obtained using OpenPose. [1] F. Angelini, Z. Fu, Y. Long, L. Shao and S. M. Naqvi, "2D Pose-Based Real-Time Human Action Recognition With Occlusion-Handling," IEEE Transactions on Multimedia, vol. 22, no. 6, pp. 1433-1446, June 2020, doi: 10.1109/TMM.2019.2944745.
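Since the poses are distributed as MATLAB sequences, a first step is typically to load one sequence into Python (or MATLAB) and inspect its shape. A minimal sketch, assuming SciPy is available; the file name and the variable name inside the .mat file are illustrative guesses, not the dataset's documented layout:

```python
# Minimal sketch: loading an ISLD pose sequence stored as a MATLAB .mat file.
# The file name and the variable name "poses" are assumptions for illustration;
# check the dataset's own documentation for the actual layout.
import numpy as np
from scipy.io import loadmat

mat = loadmat("isld_sequence_001.mat")     # hypothetical file name
poses = np.asarray(mat["poses"])           # hypothetical variable name
# OpenPose BODY_25 output is commonly (frames, 25 joints, 3): x, y, confidence.
print("sequence shape:", poses.shape)

# Drop low-confidence joints before feeding a pose-based HAR model.
confident = poses[..., 2] > 0.3
print("fraction of confident detections:", confident.mean())
```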
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Human Activity Detection 2 is a dataset for object detection tasks - it contains Human Activity annotations for 827 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
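For the Roboflow-hosted datasets listed here, a typical download path is the `roboflow` Python package. A minimal sketch, in which the API key, workspace, project slug, version number, and export format are placeholders rather than this dataset's actual identifiers:

```python
# Minimal sketch of downloading a Roboflow-hosted dataset with the `roboflow`
# package (pip install roboflow). API key, workspace, project slug, version,
# and export format below are placeholders, not this dataset's real values.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("human-activity-detection-2")
dataset = project.version(1).download("coco")   # export format, e.g. "coco" or "yolov8"
print("dataset downloaded to:", dataset.location)
```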
https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Alireza Keshavarzian
Released under CC0: Public Domain
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Human Activity Recognition is a dataset for computer vision tasks - it contains Humans annotations for 309 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains Axivity AX3 wrist-worn activity tracker data that were collected from 151 participants in 2014-2016 around the Oxfordshire area. Participants were asked to wear the device in daily living for a period of roughly 24 hours, amounting to a total of almost 4,000 hours. Vicon Autographer wearable cameras and Whitehall II sleep diaries were used to obtain the ground truth activities performed during the period (e.g. sitting watching TV, walking the dog, washing dishes, sleeping), resulting in more than 2,500 hours of labelled data. Accompanying code to analyse this data is available at https://github.com/activityMonitoring/capture24. The following papers describe the data collection protocol in full: i.) Gershuny J, Harms T, Doherty A, Thomas E, Milton K, Kelly P, Foster C (2020) Testing self-report time-use diaries against objective instruments in real time. Sociological Methodology doi: 10.1177/0081175019884591; ii.) Willetts M, Hollowell S, Aslett L, Holmes C, Doherty A. (2018) Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants. Scientific Reports. 8(1):7961. Regarding Data Protection, the Clinical Data Set will not include any direct subject identifiers. However, it is possible that the Data Set may contain certain information that could be used in combination with other information to identify a specific individual, such as a combination of activities specific to that individual ("Personal Data"). Accordingly, in the conduct of the Analysis, users will comply with all applicable laws and regulations relating to information privacy. Further, the user agrees to preserve the confidentiality of, and not attempt to identify, individuals in the Data Set.
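A common first step with this kind of raw wrist-accelerometer data is to slice the continuous signal into fixed-length windows before classification. A minimal sketch, assuming a 100 Hz tri-axial signal and 30-second windows, both of which are illustrative choices rather than values stated above:

```python
# Minimal sketch: slicing a continuous tri-axial accelerometer signal into
# fixed-length windows for activity classification. The 100 Hz sampling rate
# and 30 s window length are assumptions for illustration.
import numpy as np

def make_windows(signal: np.ndarray, fs: int = 100, window_s: int = 30) -> np.ndarray:
    """signal: array of shape (n_samples, 3) with x, y, z acceleration."""
    win = fs * window_s                      # samples per window
    n_windows = len(signal) // win           # drop the trailing partial window
    return signal[: n_windows * win].reshape(n_windows, win, 3)

# Example with synthetic data standing in for one hour of recording.
one_hour = np.random.randn(100 * 3600, 3)
windows = make_windows(one_hour)
print(windows.shape)                         # (120, 3000, 3)
```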
https://rose1.ntu.edu.sg/dataset/actionRecognition/
NTU RGB+D is a large-scale dataset for RGB-D human action recognition. It comprises 56,880 samples of 60 action classes collected from 40 subjects. The actions can be broadly divided into three categories: 40 daily actions (e.g., drinking, eating, reading), nine health-related actions (e.g., sneezing, staggering, falling down), and 11 mutual actions (e.g., punching, kicking, hugging). These actions take place under 17 different scene setups (i.e., S001–S017) and were captured using three cameras with different horizontal imaging viewpoints, namely −45°, 0°, and +45°. Multi-modality information is provided for action characterization, including depth maps, 3D skeleton joint positions, RGB frames, and infrared sequences. Performance evaluation is performed by a cross-subject test that splits the 40 subjects into training and test groups, and by a cross-view test that uses one camera (+45°) for testing and the other two cameras for training.
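NTU RGB+D sample names encode the setup, camera, performer, replication, and action IDs, which is enough to build the two evaluation splits described above. A minimal sketch, assuming the usual SsssCcccPpppRrrrAaaa naming convention; the training-subject set shown is a placeholder, not the official benchmark list:

```python
# Minimal sketch: parsing NTU RGB+D sample names of the form
# SsssCcccPpppRrrrAaaa (setup, camera, performer, replication, action)
# to build cross-subject and cross-view splits.
import re

PATTERN = re.compile(r"S(\d{3})C(\d{3})P(\d{3})R(\d{3})A(\d{3})")

def parse(name: str) -> dict:
    setup, camera, performer, replication, action = map(int, PATTERN.match(name).groups())
    return {"setup": setup, "camera": camera, "performer": performer,
            "replication": replication, "action": action}

sample = parse("S001C002P003R002A013")
print(sample)                                  # {'setup': 1, 'camera': 2, ...}

# Cross-view split: hold out one camera for testing, train on the other two.
test_camera = 1
is_test = sample["camera"] == test_camera

# Cross-subject split: keep subjects from a fixed training list for training.
train_subjects = {1, 2, 4, 5, 8}               # placeholder subset, not the official list
is_train = sample["performer"] in train_subjects
```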
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This paper explores the use of a Digital Twin (DT) of a real industrial workstation, involving assembly tasks with a robotic arm and interfaced with Virtual Reality (VR), to extract a digital human model. The DT simulates assembly operations performed by humans with the aim of generating self-labeled data. A Human Action Recognition dataset named InHARD-DT was thereby created to validate a real use case in which the acquired auto-labeled DT data of the virtual representation of the InHARD dataset are used to train a Spatial-Temporal Graph Convolutional Neural Network on skeletal data, while the Physical Twin (PT) data of the InHARD dataset are used for testing. We therefore introduce an RGB+S dataset named "Industrial Human Action Recognition Dataset - Digital Twin" (InHARD-DT), from a real-world setting, for industrial human action recognition.
For the DT data collection, we invited 12 distinct subjects from the LINEACT laboratory (4 female and 8 male) to perform the same assembly tasks as in the InHARD dataset (link below) in Virtual Reality, via a VR application of a real industrial workstation. The dataset contains 13 industrial action classes and over 4,800 action samples. It should enable the study and development of learning techniques for human action analysis in industrial environments involving human-robot collaboration. It can also be used in cross-validation scenarios where training is done on the Physical Twin (PT) data of the InHARD dataset (real-world recordings) and testing on the Digital Twin (DT) data of the InHARD-DT dataset, which is the main objective of this paper.
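The cross-validation protocol described above (train on one twin, test on the other) can be sketched with a generic classifier. Everything below is a hypothetical stand-in: `load_skeleton_split` and the k-NN classifier merely illustrate the train/test domain split; they are not the InHARD tooling or the ST-GCN used in the paper.

```python
# Minimal sketch of the cross-domain protocol: train an action classifier on
# one twin (e.g. DT skeleton data) and test on the other (PT). The loader and
# classifier are placeholders for illustration only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def load_skeleton_split(domain: str):
    """Placeholder loader returning (features, labels) for 'DT' or 'PT' data."""
    rng = np.random.default_rng(0 if domain == "DT" else 1)
    X = rng.normal(size=(200, 64))           # e.g. flattened skeleton features
    y = rng.integers(0, 13, size=200)        # 13 industrial action classes
    return X, y

X_train, y_train = load_skeleton_split("DT")  # train on Digital Twin data
X_test, y_test = load_skeleton_split("PT")    # test on Physical Twin data

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("cross-domain accuracy:", clf.score(X_test, y_test))
```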
The HMDB51 dataset is a large-scale video dataset for human action recognition.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Human Action Detection Video Surveillance is a dataset for object detection tasks - it contains Action annotations for 269 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/by/4.0/).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
WiFi CSI-Based Long-Range Through-Wall Human Activity Recognition with the ESP32
This repository contains the WiFi CSI human presence detection and activity recognition datasets proposed in [1].
Datasets
DP_LOS - Line-of-sight (LOS) presence detection dataset, comprised of 392 CSI amplitude spectrograms.
DP_NLOS - Non-line-of-sight (NLOS) presence detection dataset, comprised of 384 CSI amplitude spectrograms.
DA_LOS - LOS activity recognition dataset, comprised of 392 CSI amplitude spectrograms.
DA_NLOS - NLOS activity recognition dataset, comprised of 384 CSI amplitude spectrograms.
Table 1: Characteristics of presence detection and activity recognition datasets.

| Dataset | Scenario | Rooms | Persons | Classes | Packet Sending Rate | Interval | Spectrograms |
|---------|----------|-------|---------|---------|---------------------|----------|--------------|
| DP_LOS  | LOS      | 1     | 1       | 6       | 100 Hz              | 4 s (400 packets) | 392 |
| DP_NLOS | NLOS     | 5     | 1       | 6       | 100 Hz              | 4 s (400 packets) | 384 |
| DA_LOS  | LOS      | 1     | 1       | 3       | 100 Hz              | 4 s (400 packets) | 392 |
| DA_NLOS | NLOS     | 5     | 1       | 3       | 100 Hz              | 4 s (400 packets) | 384 |
Data Format
Each dataset employs an 8:1:1 training-validation-test split, defined in the provided label files trainLabels.csv, validationLabels.csv, and testLabels.csv. Label files use the sample format [i c], with i corresponding to the spectrogram index (i.png) and c corresponding to the class. For the presence detection datasets (DP_LOS, DP_NLOS), c in {0 = "no presence", 1 = "presence in room 1", ..., 5 = "presence in room 5"}. For the activity recognition datasets (DA_LOS, DA_NLOS), c in {0 = "no activity", 1 = "walking", 2 = "walking + arm-waving"}. Furthermore, the mean and standard deviation of a given dataset are provided in meanStd.csv.
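A minimal loading sketch based on the format described above; the exact CSV formatting (no header row, comma-delimited) is an assumption to be checked against the actual files:

```python
# Minimal sketch: reading a label file and the corresponding spectrogram
# images, then normalizing them with the per-dataset mean and standard
# deviation from meanStd.csv. Column layout follows the description above;
# the exact CSV formatting is an assumption.
import csv
import numpy as np
from PIL import Image

def load_split(label_file: str, image_dir: str, mean: float, std: float):
    samples = []
    with open(label_file, newline="") as f:
        for i, c in csv.reader(f):                    # rows of the form [i c]
            img = Image.open(f"{image_dir}/{i}.png")  # spectrogram i.png
            x = (np.asarray(img, dtype=np.float32) - mean) / std
            samples.append((x, int(c)))
    return samples

# mean/std come from meanStd.csv of the chosen dataset (values are placeholders).
train = load_split("trainLabels.csv", "DA_LOS", mean=0.5, std=0.25)
print(len(train), train[0][0].shape)
```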
Download and Use
This data may be used for non-commercial research purposes only. If you publish material based on this data, we request that you include a reference to our paper [1].
[1] Strohmayer, Julian, and Martin Kampel. "WiFi CSI-Based Long-Range Through-Wall Human Activity Recognition with the ESP32." International Conference on Computer Vision Systems. Cham: Springer Nature Switzerland, 2023.
BibTeX citation:
@inproceedings{strohmayer2023wifi,
  title={WiFi CSI-Based Long-Range Through-Wall Human Activity Recognition with the ESP32},
  author={Strohmayer, Julian and Kampel, Martin},
  booktitle={International Conference on Computer Vision Systems},
  pages={41--50},
  year={2023},
  organization={Springer}
}
A new video dataset for aerial view concurrent human action detection. It consists of 43 minute-long fully-annotated sequences with 12 action classes. Okutama-Action features many challenges missing in current datasets, including dynamic transition of actions, significant changes in scale and aspect ratio, abrupt camera movement, as well as multi-labeled actors.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset consists of WiFi Channel State Information (CSI) and Ultra-Wideband (UWB) Channel Impulse Response (CIR) data collected for human activity recognition (wireless sensing of human activities). The main dataset folder "Wireless_sensing_human_activity_recognition" consists of 2 subfolders: (1) "WiFi_CSI", containing WiFi channel state information (CSI) data collected across 3 rooms; (2) "UWB", containing Ultra-Wideband (UWB) channel impulse response (CIR) data collected in one room.
(1) The "WiFi_CSI" folder contains the following subfolders: (i) Room_1 (ii) Room_2 (iii) Room_3
(2) The "UWB" folder contains no subfolders.
The interested user is kindly referred to the "readme.txt" file for more information on the dataset in terms of directory details, equipment and parameters used, experimental setup, detailed description of each file, human activities performed, etc.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Action recognition has received increasing attention from the computer vision and machine learning communities in recent decades. The recognition task has evolved from single-view recordings under controlled laboratory environments to unconstrained environments (i.e., surveillance environments or user-generated videos). Furthermore, recent work has focused on other aspects of the action recognition problem, such as cross-view classification, cross-domain learning, multi-modality learning, and action localization. Despite the large variety of studies, we observed limited work exploring the open-set and open-view classification problem, which is an inherent property of action recognition. In other words, a well-designed algorithm should robustly identify an unfamiliar action as "unknown" and achieve similar performance across sensors with similar fields of view. The Multi-Camera Action Dataset (MCAD) is designed to evaluate the open-view classification problem under a surveillance environment.
In our multi-camera action dataset, unlike common action datasets, we use a total of five cameras of two types (Static and PTZ) to record actions. In particular, there are three Static cameras (Cam04, Cam05, and Cam06) with a fish-eye effect and two Pan-Tilt-Zoom (PTZ) cameras (PTZ04 and PTZ06). The Static cameras have a resolution of 1280×960 pixels, while the PTZ cameras have a resolution of 704×576 pixels and a smaller field of view. Moreover, we did not control the illumination: recordings were made under two contrasting conditions (daytime and nighttime), which makes our dataset more challenging than many datasets with strongly controlled illumination.
We identified 18 single-person daily actions, with and without objects, inherited from the KTH, IXMAS, and TRECVID datasets, among others. The list and definitions of the actions are shown in the table. These actions can be divided into four types: micro actions without object (action IDs 01, 02, 05) and with object (action IDs 10, 11, 12, 13), and intense actions without object (action IDs 03, 04, 06, 07, 08, 09) and with object (action IDs 14, 15, 16, 17, 18). We recruited a total of 20 human subjects. Each subject repeats each action 8 times (4 times during the day and 4 times in the evening) under one camera, and the five cameras record each action sample separately. During recording, subjects were told only the action name and could then perform the action freely in their own style, provided they stayed within the field of view of the current camera. This makes the dataset much closer to reality, and as a result there is high intra-class variation among the action samples.
URL: http://mmas.comp.nus.edu.sg/MCAD/MCAD.html
How to Cite:
Please cite the following paper if you use the MCAD dataset in your work (papers, articles, reports, books, software, etc.):
The Berkeley Multimodal Human Action Database (MHAD) contains 11 actions performed by 7 male and 5 female subjects in the 23-30 age range, except for one elderly subject. All subjects performed 5 repetitions of each action, yielding about 660 action sequences, which correspond to about 82 minutes of total recording time. In addition, a T-pose was recorded for each subject, which can be used for skeleton extraction, as well as background data (with and without the chair used in some of the activities). Figure 1 shows snapshots of all the actions taken by the front-facing camera and the corresponding point clouds extracted from the Kinect data. The specified set of actions comprises: (1) actions with movement in both upper and lower extremities, e.g., jumping in place, jumping jacks, throwing; (2) actions with high dynamics in upper extremities, e.g., waving hands, clapping hands; and (3) actions with high dynamics in lower extremities, e.g., sit down, stand up. Prior to each recording, the subjects were given instructions on what action to perform; however, no specific details were given on how the action should be executed (i.e., performance style or speed). The subjects thus incorporated different styles in performing some of the actions (e.g., punching, throwing). Figure 2 shows a snapshot of the throwing action from the reference camera of each camera cluster and from the two Kinect cameras, demonstrating the amount of information that can be obtained from multi-view and depth observations compared to a single viewpoint.
The actions are: 1. Jumping in place; 2. Jumping jacks; 3. Bending; 4. Punching; 5. Waving (two hands); 6. Waving (one hand); 7. Clapping hands; 9. Throwing a ball; 10. Sit down; 11. Stand up; 12. T-pose.
The effort to create a non-trivial and publicly available dataset for action recognition was initiated at the KTH Royal Institute of Technology in 2004. The KTH dataset is one of the most standard datasets; it contains six actions: walking, jogging, running, boxing, hand-waving, and hand-clapping. To account for performance nuances, each action is performed by 25 different individuals, and the setting is systematically varied for each action per actor. The setting variations are: outdoors (s1), outdoors with scale variation (s2), outdoors with different clothes (s3), and indoors (s4). These variations test the ability of an algorithm to identify actions independently of the background, the appearance of the actors, and the scale of the actors.
MetaVD is a Meta Video Dataset for enhancing human action recognition datasets. It provides human-annotated relationship labels between action classes across human action recognition datasets. MetaVD is proposed in the following paper: Yuya Yoshikawa, Yutaro Shigeto, and Akikazu Takeuchi. "MetaVD: A Meta Video Dataset for enhancing human action recognition datasets." Computer Vision and Image Understanding 212 (2021): 103276.
MetaVD integrates the following datasets: UCF101, HMDB51, ActivityNet, STAIR Actions, Charades, Kinetics-700
This repository does NOT provide videos in the datasets. For information on how to download the videos, please refer to the website of each dataset.
Kinetics-700 is a video dataset of 650,000 clips that covers 700 human action classes. The videos include human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands and hugging. Each action class has at least 700 video clips. Each clip is annotated with an action class and lasts approximately 10 seconds.
https://www.wiseguyreports.com/pages/privacy-policy
| Attribute | Detail |
|-----------|--------|
| BASE YEAR | 2024 |
| HISTORICAL DATA | 2019 - 2024 |
| REPORT COVERAGE | Revenue Forecast, Competitive Landscape, Growth Factors, and Trends |
| MARKET SIZE 2023 | 16.88 (USD Billion) |
| MARKET SIZE 2024 | 20.89 (USD Billion) |
| MARKET SIZE 2032 | 114.8 (USD Billion) |
| SEGMENTS COVERED | Recognition Type, Application, Technology, Deployment Mode, End-User, Regional |
| COUNTRIES COVERED | North America, Europe, APAC, South America, MEA |
| KEY MARKET DYNAMICS | AI-powered surveillance systems; adoption in healthcare and sports; edge computing advancements |
| MARKET FORECAST UNITS | USD Billion |
| KEY COMPANIES PROFILED | Sony, Microsoft, Alibaba, Huawei, Apple, Tencent, NVIDIA, Samsung, Qualcomm, LG, Intel, Google, Amazon, Baidu, Panasonic |
| MARKET FORECAST PERIOD | 2025 - 2032 |
| KEY MARKET OPPORTUNITIES | Edge computing and 5G network integration; contactless gesture recognition applications; healthcare and rehabilitation technologies; industrial automation and robotics; retail and customer experience enhancements |
| COMPOUND ANNUAL GROWTH RATE (CAGR) | 23.74% (2025 - 2032) |
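As a quick sanity check, the 2032 market size follows from compounding the 2024 figure at the stated CAGR over the eight-year forecast period:

```python
# Quick consistency check of the forecast figures above: compounding the 2024
# market size at the stated CAGR over the 8-year forecast period (2025-2032)
# should reproduce the 2032 figure.
size_2024 = 20.89          # USD billion
size_2032 = 114.8          # USD billion
years = 2032 - 2024        # 8 compounding periods

cagr = (size_2032 / size_2024) ** (1 / years) - 1
print(f"implied CAGR: {cagr:.2%}")                                      # ~23.74%
print(f"projected 2032 size: {size_2024 * (1 + 0.2374) ** years:.1f}")  # ~114.8
```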