Dataset with phone and webcam videos for anti-spoofing, biometric verification, facial recognition, and access control security
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The dataset contains a comprehensive collection of human activity videos spanning 7 distinct classes: clapping, meeting and splitting, sitting, standing still, walking, walking while reading a book, and walking while using a phone.
Each video clip in the dataset showcases a specific human activity and has been labeled with the corresponding class to facilitate supervised learning.
The primary inspiration behind creating this dataset is to enable machines to recognize and classify human activities accurately. With the advent of computer vision and deep learning techniques, it has become increasingly important to train machine learning models on large and diverse datasets to improve their accuracy and robustness.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Dataset comprises 1,300+ videos featuring over 300 individuals who each recorded 4 videos while pronouncing a set of numbers. This dataset is designed to facilitate research in biometric verification, face recognition, and action recognition. The videos include 2 mobile-device recordings, lasting approximately 30 seconds and 8 seconds respectively, and 2 webcam recordings of the same durations.
By utilizing this dataset, developers and researchers can enhance their understanding of human activity and improve object detection. - Get the data
This extensive collection provides high-quality video recordings that are ideal for training and testing various vision models and learning algorithms. The dataset is a valuable resource for developing and evaluating detection algorithms and object tracking systems.
This dataset is an invaluable asset for researchers aiming to achieve higher detection accuracy and improve the performance of face recognition systems, ultimately contributing to advancements in biometric security and liveness detection technologies.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Video Dataset - 1,300+ files
The dataset comprises 1,300+ videos of 300+ people captured using mobile phones (including Android devices and iPhones) and webcams under varying lighting conditions. It is designed for research in face detection, object recognition, and event detection, leveraging high-quality videos from smartphone cameras and webcam streams. - Get the data
Dataset characteristics:
Description: Each person recorded 4 videos… See the full description on the dataset page: https://huggingface.co/datasets/ud-biometrics/phone-and-webcam-dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Streaming is by far the predominant type of traffic in communication networks. With this public dataset, we provide 1,081 hours of time-synchronous video measurements at the network, transport, and application layers with the native YouTube streaming client on mobile devices. The dataset includes 80 network scenarios with 171 different individual bandwidth settings, measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3G/4G traces, and 4,022 runs with pre-defined bandwidth changes. This corresponds to 332 GB of video payload. We present the most relevant quality indicators for scientific use, i.e., initial playback delay, streaming video quality, adaptive video quality changes, video rebuffering events, and streaming phases.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We publish a data set for YouTube's mobile streaming client, which follows the popular Dynamic Adaptive Streaming over HTTP (DASH) standard. The data was measured over 4 months, at 2 separate locations in Europe, at the network, transport and application layer for DASH.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset provides an end-to-end (E2E) perspective of the performance of 360-video services over mobile networks. The data was collected using a network-in-a-box setup in conjunction with a Meta Quest 2 head-mounted display (HMD) and a customer premises equipment (CPE) unit providing 5G connectivity to the WiFi-native headset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It is imperative to acquire brain data from freely-behaving children to assay the variability and individuality of neural patterns across gender and age.
The pervasive nature of short-form video platforms has seamlessly integrated into daily routines, yet it is important to recognize their potential adverse effects on both physical and mental health. Prior research has identified a detrimental impact of excessive short-form video consumption on attentional behavior, but the underlying neural mechanisms remain unexplored. In the current study, we aimed to investigate the effect of short-form video use on attentional functions, measured through the attention network test (ANT). A total of 48 participants, consisting of 35 females and 13 males, with a mean age of 21.8 years, were recruited. The mobile phone short video addiction tendency questionnaire (MPSVATQ) and self-control scale (SCS) were administered to assess short video usage behavior and self-control ability. Electroencephalogram (EEG) data were recorded during the completion of the ANT task. The correlation analysis showed a significant negative relationship between MPSVATQ and the theta power index reflecting executive control in the prefrontal region (r = −0.395, p = 0.007); this result was not observed when using the theta power index of the resting-state EEG data. Furthermore, a significant negative correlation was identified between MPSVATQ and SCS outcomes (r = −0.320, p = 0.026). These results suggest that an increased tendency toward mobile phone short video addiction could negatively impact self-control and diminish executive control within the realm of attentional functions. This study sheds light on the adverse consequences stemming from short video consumption and underscores the importance of developing interventions to mitigate short video addiction.
This dataset was used to perform the experiments reported in the IJCB 2024 paper: "A novel and responsible dataset for presentation attack detection on mobile devices".
The dataset consists of face videos captured using two cameras (main and front) of nine different smartphones: Apple iPhone 12, Apple iPhone 6s, Xiaomi Redmi 6 Pro, Xiaomi Redmi 9A, Samsung Galaxy S9, Google Pixel 3, Samsung Galaxy S8, iPhone 7 Plus, and iPhone 12 Mini. The dataset contains:
Bona-fide face videos: 8400 videos of bona-fide (real, non-attack) faces, with and without hygienic masks. In total, there are 70 identities (data subjects). Each video is 10 seconds long, where for the first 5 seconds the data subject was required to stay still and look at the camera, then for the last 5 seconds the subject was asked to turn their head from one side to the other (such that profile views could be captured). The videos were acquired under different lighting conditions, including normal office lighting, low lighting, and outdoor lateral lighting. The data subjects were consenting volunteers, who were required to be present during two recording sessions, which on average were separated by about three weeks. In each recording session, the volunteers were asked to record a video of their own face using the front (i.e., selfie) camera of each of the five smartphones mentioned earlier. The face data was additionally captured while the data subjects wore plain (not personalised) hygienic masks, to simulate the scenario where face recognition might need to be performed on a masked face (e.g., during a pandemic like COVID-19).
Attacks:
If you use this dataset, please cite the following publication:
N. Ramoly, A. Komaty, V. K. Hahn, L. Younes, A. -M. Awal and S. Marcel, "A Novel and Responsible Dataset for Face Presentation Attack Detection on Mobile Devices," 2024 IEEE International Joint Conference on Biometrics (IJCB), Buffalo, NY, USA, 2024, pp. 1-9, doi: 10.1109/IJCB62174.2024.10744500.
Replay-Mobile is a dataset for face recognition and presentation attack detection (anti-spoofing). The dataset consists of 1190 video clips of photo and video presentation attacks (spoofing attacks) to 40 clients, under different lighting conditions. These videos were recorded with an iPad Mini 2 (running iOS) and an LG-G4 smartphone (running Android).
Database Description
All videos have been captured using the front-camera of the mobile device (tablet or phone). The front-camera produces colour videos with a resolution of 720 pixels (width) by 1280 pixels (height) and saved in ".mov" file-format. The frame rate is about 25 Hz. Real-accesses have been performed by the genuine user (presenting one's true face to the device). Attack-accesses have been performed by displaying a photo or a video recording of the attacked client, for at least 10 seconds.
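For illustration only, the clips can be read frame by frame with standard video tooling; the sketch below uses OpenCV and a made-up file path (the actual directory layout is defined by the database documentation).

```python
# Minimal sketch: iterate over the frames of one Replay-Mobile clip with OpenCV.
# The file path below is a placeholder, not an actual path from the database.
import cv2

video_path = "replay-mobile/real/client001_controlled.mov"  # hypothetical path

cap = cv2.VideoCapture(video_path)
print("fps:", cap.get(cv2.CAP_PROP_FPS))  # expected to be about 25
print("size:", int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), "x",
      int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))  # expected 720 x 1280

frames = []
while True:
    ok, frame = cap.read()  # frame is a BGR numpy array
    if not ok:
        break
    frames.append(frame)
cap.release()
print("frames read:", len(frames))
```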
Real client accesses have been recorded under five different lighting conditions (controlled, adverse, direct, lateral and diffuse). In addition, to produce the attacks, high-resolution photos and videos from each client were taken under conditions similar to those in their authentication sessions (lighton, lightoff).
The 1190 real-access and attack videos were then grouped as follows:
Training set: contains 120 real-accesses and 192 attacks under different lighting conditions;
Development set: contains 160 real-accesses and 256 attacks under different lighting conditions;
Test set: contains 110 real-accesses and 192 attacks under different lighting conditions;
Enrollment set: contains 160 real-accesses under different lighting conditions, to be used exclusively for studying the baseline performance of face recognition systems. (This set is again partitioned into 'Training', 'Development' and 'Test' sets.)
Attacks
For photo attacks, a Nikon Coolpix P520 camera, which records 18 Mpixel photographs, was used. Video attacks were captured using the back camera of an LG-G4 smartphone, which records 1080p FHD video clips with its 16 Mpixel sensor.
Attacks have been performed in two ways:
Matte-screen attacks. A matte screen was used to display the digital photo or video of the attacked identity. For all such (matte-screen) attacks, a stand was used to hold the capturing devices.
Print attacks. For "fixed" attacks, both capturing devices were supported on a stand (as for matte-screen attacks). For "hand" attacks, the spoofer held the capturing device in his/her own hands while the spoof-resource (printed photo) was stationary.
In total, 16 attack videos were registered for each client, 8 for each of the attacking modes described above.
4 x mobile attacks using a Philips 227ELH screen (with resolution 1920x1080 pixels)
4 x tablet attacks using a Philips 227ELH screen (with resolution 1920x1080 pixels)
2 x mobile attacks using fixed hard-copy print attacks (produced on a Konica Minolta ineo+ 224e color laser printer) occupying the whole available printing surface on A4 paper
2 x tablet attacks using fixed hard-copy print attacks (produced on a Konica Minolta ineo+ 224e color laser printer) occupying the whole available printing surface on A4 paper
2 x mobile attacks using hand-held hard-copy print attacks (produced on a Konica Minolta ineo+ 224e color laser printer) occupying the whole available printing surface on A4 paper
2 x tablet attacks using hand-held hard-copy print attacks (produced on a Konica Minolta ineo+ 224e color laser printer) occupying the whole available printing surface on A4 paper
Reference
If you use this database, please cite the following publication:
Artur Costa-Pazo, Sushil Bhattacharjee, Esteban Vazquez-Fernandez and Sébastien Marcel, "The REPLAY-MOBILE Face Presentation-Attack Database", IEEE BIOSIG 2016. doi: 10.1109/BIOSIG.2016.7736936. http://publications.idiap.ch/index.php/publications/show/3477
We introduce the Stanford Streaming MAR dataset. The dataset contains 23 different objects of interest, divided into four categories: Books, CD covers, DVD covers, and Common Objects. We first record one video for each object in which the object is in a static position while the camera is moving. These videos are recorded with a hand-held mobile phone with varying amounts of camera motion, glare, blur, zoom, rotation, and perspective change. Each video is 100 frames long, recorded at 30 fps at a resolution of 640 x 480. For each video, we provide a clean database image (no background noise) of the corresponding object of interest. We also provide 5 more videos of moving objects recorded with a moving camera. These videos help to study the effect of background clutter when there is relative motion between the object and the background. Finally, we record 4 videos that contain multiple objects from the dataset. Each video is 200 frames long and contains 3 objects of interest which the camera captures one after the other. We provide ground-truth localization information for 14 videos, where we manually define a bounding quadrilateral around the object of interest in each video frame. This localization information is used in the calculation of the Jaccard index.
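Since the ground truth is a bounding quadrilateral rather than an axis-aligned box, the Jaccard index amounts to a polygon intersection-over-union. A minimal sketch using shapely, with made-up corner coordinates:

```python
# Minimal sketch: Jaccard index (intersection over union) between a detected
# quadrilateral and a ground-truth quadrilateral, using shapely polygons.
from shapely.geometry import Polygon

# Hypothetical corner coordinates (x, y), listed in order around each quadrilateral.
ground_truth = Polygon([(100, 80), (420, 90), (410, 360), (95, 350)])
detection = Polygon([(120, 100), (430, 95), (420, 340), (110, 345)])

intersection = ground_truth.intersection(detection).area
union = ground_truth.union(detection).area
jaccard = intersection / union if union > 0 else 0.0
print(f"Jaccard index: {jaccard:.3f}")
```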
Static single object: 1.a. Books: Automata Theory, Computer Architecture, OpenCV, Wang Book. 1.b. CD Covers: Barry White, Chris Brown, Janet Jackson, Rascal Flatts, Sheryl Crow. 1.c. DVD Covers: Finding Nemo, Monsters Inc, Mummy Returns, Private Ryan, Rush Hour, Shrek, Titanic, Toy Story. 1.d. Common Objects: Bleach, Glade, Oreo, Polish, Tide, Tuna.
Moving object, moving camera: Barry White Moving, Chris Brown Moving, Titanic Moving, Titanic Moving - Second, Toy Story Moving.
Multiple objects: 3.a. Multiple Objects 1: Polish, Wang Book, Monsters Inc. 3.b. Multiple Objects 2: OpenCV, Barry White, Titanic. 3.c. Multiple Objects 3: Monsters Inc, Toy Story, Titanic. 3.d. Multiple Objects 4: Wang Book, Barry White, OpenCV.
Replay-Attack is a dataset for face recognition and presentation attack detection (anti-spoofing). The dataset consists of 1300 video clips of photo and video presentation attacks (spoofing attacks) to 50 clients, under different lighting conditions.
Spoofing Attacks Description
The 2D face spoofing attack database consists of 1,300 video clips of photo and video attack attempts of 50 clients, under different lighting conditions.
The data is split into 4 sub-groups comprising:
Training data ("train"), to be used for training your anti-spoof classifier;
Development data ("devel"), to be used for threshold estimation;
Test data ("test"), with which to report error figures;
Enrollment data ("enroll"), which can be used to verify spoofing sensitivity on face detection algorithms.
Clients that appear in one of the data sets (train, devel or test) do not appear in any other set.
Database Description
All videos are generated by either having a (real) client trying to access a laptop through a built-in webcam or by displaying a photo or a video recording of the same client for at least 9 seconds. The webcam produces colour videos with a resolution of 320 pixels (width) by 240 pixels (height). The movies were recorded on a Macbook laptop using the QuickTime framework (codec: Motion JPEG) and saved into ".mov" files. The frame rate is about 25 Hz. Besides the native support on Apple computers, these files are easily readable using mplayer, ffmpeg or any other video utilities available under Linux or MS Windows systems.
Real client accesses as well as data collected for the attacks are taken under two different lighting conditions: controlled and adverse.
To produce the attacks, high-resolution photos and videos from each client were taken under the same conditions as in their authentication sessions, using a Canon PowerShot SX150 IS camera, which records both 12.1 Mpixel photographs and 720p high-definition video clips. The way to perform the attacks can be divided into two subsets: the first subset is composed of videos generated using a stand to hold the client biometry ("fixed"). For the second set, the attacker holds the device used for the attack with their own hands. In total, 20 attack videos were registered for each client, 10 for each of the attacking modes just described:
4 x mobile attacks using an iPhone 3GS screen (with resolution 480x320 pixels) displaying:
1 x mobile photo/controlled
1 x mobile photo/adverse
1 x mobile video/controlled
1 x mobile video/adverse
4 x high-resolution screen attacks using an iPad (first generation, with a screen resolution of 1024x768 pixels) displaying:
1 x high-resolution photo/controlled
1 x high-resolution photo/adverse
1 x high-resolution video/controlled
1 x high-resolution video/adverse
2 x hard-copy print attacks (produced on a Triumph-Adler DCC 2520 color laser printer) occupying the whole available printing surface on A4 paper for the following samples:
1 x high-resolution print of photo/controlled
1 x high-resolution print of photo/adverse
The 1300 real-access and attack videos were then divided as follows:
Training set: contains 60 real-accesses and 300 attacks under different lighting conditions;
Development set: contains 60 real-accesses and 300 attacks under different lighting conditions;
Test set: contains 80 real-accesses and 400 attacks under different lighting conditions;
Face Locations
We also provide face locations automatically annotated by a cascade of classifiers based on a variant of Local Binary Patterns (LBP) referred to as the Modified Census Transform (MCT) [Face Detection with the Modified Census Transform, Froba, B. and Ernst, A., 2004, IEEE International Conference on Automatic Face and Gesture Recognition, pp. 91-96]. The automatic face localisation procedure succeeds in more than 99% of the total number of frames acquired, which means that less than 1% of the frames across all videos lack annotated faces. User algorithms must account for this fact.
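As an illustration only (the actual annotation file format is defined in the database documentation), one way to account for this is to drop frames whose annotation is missing, here assumed to be marked with an all-zero bounding box:

```python
# Illustrative sketch only: we assume a per-frame list of (x, y, w, h) boxes in
# which frames without a detected face are marked with all zeros. Check the
# database documentation for the real annotation format.
import numpy as np

def valid_frame_indices(face_locations):
    """Return indices of frames that have a usable face annotation."""
    boxes = np.asarray(face_locations)        # shape: (n_frames, 4)
    return np.flatnonzero(boxes.any(axis=1))  # drop all-zero (missing) rows

# Example with hypothetical data: frame 1 has no annotation.
locations = [(120, 80, 160, 160), (0, 0, 0, 0), (118, 82, 158, 162)]
print(valid_frame_indices(locations))  # -> [0 2]
```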
Protocol for Licit Biometric Transactions
It is possible to measure the performance of baseline face recognition systems on the 2D Face spoofing database and to evaluate how well the attacks pass such systems or, conversely, how robust the systems are to attacks. Here we describe how to use the data in the enrollment set to create a background model and client models, and how to perform scoring.
Universal Background Model (UBM): To generate the UBM, subselect the training-set client videos from the enrollment videos. There should be 2 per client, which means you get 30 videos, each with 375 frames to create the model;
Client models: To generate client models, use the enrollment data for clients at the development and test groups. There should be 2 videos per client (one for each light condition) once more. At the end of the enrollment procedure, the development set must have 1 model for each of the 15 clients available in that set. Similarly, for the test set, 1 model for each of the 20 clients available;
For a simple baseline verification, generate scores exhaustively for all real-access videos from the development and test sets respectively, without intermixing across the development and test sets. The scores generated against matched client videos and models (within the same subset, i.e. development or test) should be considered true client accesses, while all others are impostor accesses;
If you are looking for a single number to report on the performance, do the following: using only the scores from the development set, tune your baseline face recognition system at the EER of the development set, then use this threshold to compute the HTER on the test-set scores.
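A minimal sketch of this evaluation step, assuming genuine and impostor scores are already available as arrays (higher scores meaning more likely genuine):

```python
# Minimal sketch: tune a threshold at the EER on the development scores and
# report the HTER on the test scores. The score arrays here are placeholders.
import numpy as np

def far_frr(threshold, genuine, impostor):
    far = np.mean(impostor >= threshold)  # false acceptance rate
    frr = np.mean(genuine < threshold)    # false rejection rate
    return far, frr

def eer_threshold(genuine, impostor):
    """Threshold where FAR and FRR are (approximately) equal on the dev set."""
    candidates = np.sort(np.concatenate([genuine, impostor]))
    gaps = [abs(np.subtract(*far_frr(t, genuine, impostor))) for t in candidates]
    return candidates[int(np.argmin(gaps))]

def hter(threshold, genuine, impostor):
    far, frr = far_frr(threshold, genuine, impostor)
    return 0.5 * (far + frr)

# Hypothetical scores for illustration.
rng = np.random.default_rng(0)
dev_gen, dev_imp = rng.normal(2, 1, 200), rng.normal(0, 1, 800)
tst_gen, tst_imp = rng.normal(2, 1, 200), rng.normal(0, 1, 800)

thr = eer_threshold(dev_gen, dev_imp)
print(f"EER threshold (dev): {thr:.3f}  HTER (test): {hter(thr, tst_gen, tst_imp):.3f}")
```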
Protocols for Spoofing Attacks
Attack protocols are used to evaluate the (binary classification) performance of counter-measures to spoof attacks. The database can be split into 6 different protocols according to the type of device used to generate the attack: print, mobile (phone), high-definition (tablet), photo, video, or grand test (all types). Furthermore, subsetting can be applied on top of the previous 6 groups by classifying attacks as performed with the attacker's bare hands or with a fixed support. This classification scheme makes up a total of 18 protocols that can be used for studying the performance of counter-measures to 2D face spoofing attacks. The table below details the number of video clips in each protocol.
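For illustration, the 18 protocols can be enumerated as the cross product of the six attack groupings and three support variants; the identifiers below are assumptions, not the official protocol names:

```python
# Illustrative enumeration of the 6 x 3 = 18 attack protocols.
# Protocol identifiers here are assumed for illustration; the database
# documentation defines the official names.
from itertools import product

attack_types = ["print", "mobile", "highdef", "photo", "video", "grandtest"]
supports = ["all", "hand", "fixed"]

protocols = [f"{attack}-{support}" for attack, support in product(attack_types, supports)]
print(len(protocols))   # 18
print(protocols[:4])
```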
Acknowledgements
If you use this database, please cite the following publication:
I. Chingovska, A. Anjos, S. Marcel,"On the Effectiveness of Local Binary Patterns in Face Anti-spoofing"; IEEE BIOSIG, 2012. https://ieeexplore.ieee.org/document/6313548 http://publications.idiap.ch/index.php/publications/show/2447
Video instance segmentation on mobile devices is an important yet very challenging edge AI problem. It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level instance perception and (2) complicated heuristics for tracking objects.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Selfies and video dataset
The dataset includes 4,000 people. Each person took a selfie with a webcam and a selfie with a mobile phone. In addition, each person recorded a video from the phone and from the webcam while pronouncing a given set of numbers. The dataset contains one folder per person; each folder includes 8 files (4 images and 4 videos).
Get the dataset
This is just an example of the data
Leave a request on https://trainingdata.pro/datasets to… See the full description on the dataset page: https://huggingface.co/datasets/UniqueData/selfie_and_video.
This dataset presents data from a study conducted between November 2015 and March 2016 at the Department for Molecular and Applied Nutritional Psychology (180d) at the University of Hohenheim. The aim of the experimental study was to examine the influence of wearing headphones while watching a video on individual snack intake. The uploaded dataset contains the raw data of the study relevant to the corresponding published article, available as an IBM SPSS data file.
https://github.com/Li-Chongyi/Lighting-the-Darkness-in-the-Deep-Learning-Era-Open#LLIVPhone
LoLi-Phone is a large-scale low-light image and video dataset for low-light image enhancement (LLIE). The images and videos were taken with the cameras of different mobile phones under diverse illumination conditions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Here are a few use cases for this project:
Remote Learning/Teaching: The model can be used in remote learning platforms for teaching or learning American Sign Language (ASL). It can analyze teachers' or students' hand gestures in real time, confirming whether the produced signs are accurate.
Video Communication for Deaf individuals: Video calling platforms can use the model to interpret hand signs to provide real-time translation during a call. This can enable effective communication for users who are deaf or are hard of hearing.
Virtual ASL Tutors: It can support the development of interactive virtual ASL tutorial systems, enabling users to practice and get instant feedback on their sign usage.
AI Assisted Speech Therapists: The model could assist therapists working remotely with clients who have speech disorders. It can help in interpreting signs to reinforce communication between the therapist and client.
Accessibility in entertainment/media: Streaming platforms can use the model to provide real-time or pre-processed ASL translations of movies, TV shows, or webinars for viewers who rely on sign language to communicate.
Data Set Information:
The experiments were carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50 Hz. The experiments were video-recorded to label the data manually. The obtained dataset was randomly partitioned into two sets, with 70% of the volunteers selected for generating the training data and 30% for the test data.
The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.
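A minimal sketch of this preprocessing, assuming raw triaxial acceleration sampled at 50 Hz is available as a NumPy array (the filter order is an assumption; the published dataset already ships the processed windows):

```python
# Minimal sketch: split a 50 Hz acceleration signal into body and gravity
# components with a low-pass Butterworth filter (0.3 Hz cutoff), then cut it
# into 2.56 s windows with 50% overlap (128 samples per window, step 64).
import numpy as np
from scipy.signal import butter, filtfilt

FS = 50.0       # sampling rate in Hz
CUTOFF = 0.3    # gravity is assumed to live below 0.3 Hz
WINDOW, STEP = 128, 64

def separate_gravity(acc, order=3):
    """acc: array of shape (n_samples, 3). Returns (body, gravity)."""
    b, a = butter(order, CUTOFF / (FS / 2), btype="low")
    gravity = filtfilt(b, a, acc, axis=0)
    return acc - gravity, gravity

def sliding_windows(signal):
    """Yield fixed-width windows of 128 samples with 50% overlap."""
    for start in range(0, len(signal) - WINDOW + 1, STEP):
        yield signal[start:start + WINDOW]

# Hypothetical raw signal for illustration.
acc = np.random.default_rng(0).normal(size=(1000, 3))
body, gravity = separate_gravity(acc)
windows = list(sliding_windows(body))
print(len(windows), windows[0].shape)  # 14 windows of shape (128, 3)
```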
Check the README.txt file for further details about this dataset.
A video of the experiment including an example of the 6 recorded activities with one of the participants can be seen in the following link: https://www.youtube.com/watch?v=XOEN9W05_4A
An updated version of this dataset can be found at https://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones# . It includes labels of postural transition between activities and also the full raw inertial signals instead of the ones pre-processed into windows.
Attribute Information:
For each record in the dataset, the following is provided:
- Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
- Triaxial angular velocity from the gyroscope.
- A 561-feature vector with time- and frequency-domain variables.
- Its activity label.
- An identifier of the subject who carried out the experiment.
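A minimal loading sketch, assuming the standard file layout shipped with the dataset (see its README.txt), e.g. train/X_train.txt, train/y_train.txt, and train/subject_train.txt:

```python
# Minimal sketch: load the 561-dimensional feature vectors, activity labels and
# subject identifiers for the training split. File names follow the layout
# described in the dataset's README.txt (verify against your local copy).
import numpy as np

X_train = np.loadtxt("UCI HAR Dataset/train/X_train.txt")                     # (n, 561) features
y_train = np.loadtxt("UCI HAR Dataset/train/y_train.txt", dtype=int)          # activity labels 1-6
subjects = np.loadtxt("UCI HAR Dataset/train/subject_train.txt", dtype=int)   # subject IDs

print(X_train.shape, y_train.shape, subjects.shape)
```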
The Replay-Mobile dataset consists of 1190 video clips of 40 subjects. It contains paper and replay presentation attacks under five different lighting conditions.