Dataset with phone and webcam videos for anti-spoofing, biometric verification, facial recognition, and access control security
https://github.com/Li-Chongyi/Lighting-the-Darkness-in-the-Deep-Learning-Era-Open#LLIVPhone
LoLi-Phone is a large-scale low-light image and video dataset for low-light image enhancement (LLIE). The images and videos were captured with the cameras of different mobile phones under diverse illumination conditions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Streaming is by far the predominant type of traffic in communication networks. With this public dataset, we provide 1,081 hours of time-synchronous video measurements at the network, transport, and application layer with the native YouTube streaming client on mobile devices. The dataset includes 80 network scenarios with 171 different individual bandwidth settings, measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3G/4G traces, and 4,022 runs with pre-defined bandwidth changes. This corresponds to 332 GB of video payload. We present the most relevant quality indicators for scientific use, i.e., initial playback delay, streaming video quality, adaptive video quality changes, video rebuffering events, and streaming phases.
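As a rough illustration of how such quality indicators can be derived, the sketch below computes the initial playback delay and total rebuffering time from a hypothetical per-run event log; the column names (`timestamp`, `player_state`) and state values are assumptions for illustration, not the dataset's actual schema.

```python
import pandas as pd

# Minimal sketch, assuming a per-run log with a timestamp (seconds) and a
# player_state column taking values such as "loading", "playing", "buffering".
# These column names and states are illustrative; the published dataset may
# use a different schema.
def playback_indicators(log: pd.DataFrame) -> dict:
    log = log.sort_values("timestamp")
    start = log["timestamp"].iloc[0]

    # Initial playback delay: time from the first event until playback starts.
    first_play = log.loc[log["player_state"] == "playing", "timestamp"].min()
    initial_delay = first_play - start

    # Total rebuffering time: time spent in the "buffering" state after
    # playback has started.
    durations = log["timestamp"].diff().shift(-1).fillna(0.0)
    rebuffering = durations[(log["player_state"] == "buffering")
                            & (log["timestamp"] >= first_play)].sum()

    return {"initial_delay_s": initial_delay, "rebuffering_s": rebuffering}
```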
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Phone and Webcam videos
Dataset comprises 1,300+ videos featuring over 300 individuals, each of whom recorded 4 videos while pronouncing a set of numbers. This dataset is designed to facilitate research in biometric verification, face recognition, and action recognition. For each individual, the videos include 2 recordings on a mobile device (approximately 30 seconds and 8 seconds long) and 2 webcam recordings of the same durations. By utilizing this dataset, developers and researchers can enhance… See the full description on the dataset page: https://huggingface.co/datasets/UniDataPro/phone-and-webcam-video.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset provides an end-to-end (E2E) perspective of the performance of 360-video services over mobile networks. The data was collected using a network-in-a-box setup in conjunction with a Meta Quest 2 head-mounted display (HMD) and customer premises equipment (CPE) that provides 5G connectivity to the WiFi-native headset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We publish a data set for YouTube's mobile streaming client, which follows the popular Dynamic Adaptive Streaming over HTTP (DASH) standard. The data was measured over 4 months, at 2 separate locations in Europe, at the network, transport and application layer for DASH.
The Multi-domain Mobile Video Physiology Dataset (MMPD) comprises 11 hours (1,152K frames) of mobile-phone recordings of 33 subjects. The dataset was designed to capture videos with greater representation across skin tone, body motion, and lighting conditions. MMPD is comprehensive, with eight descriptive labels, and can be used in conjunction with the rPPG-toolbox and PhysBench. MMPD is widely used for rPPG tasks and remote heart rate estimation. To access the dataset, download the data release agreement and request access by email.
The Human3.6M dataset is one of the largest motion capture datasets, which consists of 3.6 million human poses and corresponding images captured by a high-speed motion capture system. There are 4 high-resolution progressive scan cameras to acquire video data at 50 Hz. The dataset contains activities by 11 professional actors in 17 scenarios: discussion, smoking, taking photo, talking on the phone, etc., as well as provides accurate 3D joint positions and high-resolution videos.
Replay-Mobile is a dataset for face recognition and presentation attack detection (anti-spoofing). The dataset consists of 1,190 video clips of real accesses and of photo and video presentation attacks (spoofing attacks) on 40 clients, under different lighting conditions. These videos were recorded with an iPad Mini2 (running iOS) and an LG-G4 smartphone (running Android).
Database Description
All videos have been captured using the front camera of the mobile device (tablet or phone). The front camera produces colour videos with a resolution of 720 pixels (width) by 1280 pixels (height), saved in the ".mov" file format. The frame rate is about 25 Hz. Real accesses have been performed by the genuine user (presenting one's true face to the device). Attack accesses have been performed by displaying a photo or a video recording of the attacked client for at least 10 seconds.
Real client accesses have been recorded under five different lighting conditions (controlled, adverse, direct, lateral and diffuse). In addition, to produce the attacks, high-resolution photos and videos from each client were taken under conditions similar to those in their authentication sessions (lighton, lightoff).
The 1,190 real-access and attack videos were then grouped in the following way:
Attacks
For photo attacks, a Nikon Coolpix P520 camera, which records 18 Mpixel photographs, has been used. Video attacks were captured using the back camera of an LG-G4 smartphone, which records 1080p FHD video clips using its 16 Mpixel camera.
Attacks have been performed in two ways:
In total, 16 attack videos were registered for each client, 8 for each of the attacking modes described above.
Reference
If you use this database, please cite the following publication:
Artur Costa-Pazo, Sushil Bhattacharjee, Esteban Vazquez-Fernandez and Sébastien Marcel, "The REPLAY-MOBILE Face Presentation-Attack Database", IEEE BIOSIG 2016.
10.1109/BIOSIG.2016.7736936
http://publications.idiap.ch/index.php/publications/show/3477
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It is imperative to acquire brain data from freely-behaving children to assay the variability and individuality of neural patterns across gender and age.
Replay-Attack is a dataset for face recognition and presentation attack detection (anti-spoofing). The dataset consists of 1,300 video clips of real accesses and of photo and video presentation attacks (spoofing attacks) on 50 clients, under different lighting conditions.
Spoofing Attacks Description
The 2D face spoofing attack database consists of 1,300 video clips of photo and video attack attempts of 50 clients, under different lighting conditions.
The data is split into 4 sub-groups comprising:
Training data ("train"), to be used for training your anti-spoof classifier;
Development data ("devel"), to be used for threshold estimation;
Test data ("test"), with which to report error figures;
Enrollment data ("enroll"), that can be used to verify spoofing sensitivity on face detection algorithms.
Clients that appear in one of the data sets (train, devel or test) do not appear in any other set.
Database Description
All videos are generated by either having a (real) client trying to access a laptop through a built-in webcam or by displaying a photo or a video recording of the same client for at least 9 seconds. The webcam produces colour videos with a resolution of 320 pixels (width) by 240 pixels (height). The movies were recorded on a MacBook laptop using the QuickTime framework (codec: Motion JPEG) and saved into ".mov" files. The frame rate is about 25 Hz. Besides the native support on Apple computers, these files are easily readable using mplayer, ffmpeg or any other video utilities available under Linux or MS Windows systems.
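As a minimal illustration of working with these files, the sketch below iterates over the frames of one ".mov" clip with OpenCV, whose FFmpeg backend handles Motion JPEG; the file name is a placeholder, not an actual path from the database.

```python
import cv2

# Minimal sketch: decode one Replay-Attack clip frame by frame.
# The path below is a placeholder, not a real file name from the database.
cap = cv2.VideoCapture("real/client001_webcam_authenticate_controlled.mov")

frames = []
while True:
    ok, frame = cap.read()   # frame is an HxWx3 BGR array (240x320x3 here)
    if not ok:
        break                # end of clip
    frames.append(frame)

cap.release()
print(f"decoded {len(frames)} frames at ~25 Hz")
```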
Real client accesses as well as data collected for the attacks are taken under two different lighting conditions: controlled and adverse.
To produce the attacks, high-resolution photos and videos from each client were taken under the same conditions as in their authentication sessions, using a Canon PowerShot SX150 IS camera, which records both 12.1 Mpixel photographs and 720p high-definition video clips. The way to perform the attacks can be divided into two subsets: the first subset is composed of videos generated using a stand to hold the client biometry ("fixed"). For the second set, the attacker holds the device used for the attack with their own hands. In total, 20 attack videos were registered for each client, 10 for each of the attacking modes just described:
4 x mobile attacks using an iPhone 3GS screen (with resolution 480x320 pixels) displaying:
1 x mobile photo/controlled
1 x mobile photo/adverse
1 x mobile video/controlled
1 x mobile video/adverse
4 x high-resolution screen attacks using an iPad (first generation, with a screen resolution of 1024x768 pixels) displaying:
1 x high-resolution photo/controlled
1 x high-resolution photo/adverse
1 x high-resolution video/controlled
1 x high-resolution video/adverse
2 x hard-copy print attacks (produced on a Triumph-Adler DCC 2520 color laser printer) occupying the whole available printing surface on A4 paper for the following samples:
1 x high-resolution print of photo/controlled
1 x high-resolution print of photo/adverse
The 1,300 real-access and attack videos were then divided in the following way:
Training set: contains 60 real-accesses and 300 attacks under different lighting conditions;
Development set: contains 60 real-accesses and 300 attacks under different lighting conditions;
Test set: contains 80 real-accesses and 400 attacks under different lighting conditions;
Face Locations
We also provide face locations automatically annotated by a cascade of classifiers based on a variant of Local Binary Patterns (LBP) referred to as the Modified Census Transform (MCT) [Face Detection with the Modified Census Transform, Froba, B. and Ernst, A., 2004, IEEE International Conference on Automatic Face and Gesture Recognition, pp. 91-96]. The automatic face localisation procedure succeeds in more than 99% of the total number of frames acquired; in other words, less than 1% of the frames across all videos lack an annotated face. User algorithms must account for this fact.
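A minimal sketch of how a user algorithm might account for missing detections, assuming a hypothetical per-video annotation array in which a failed detection appears as an all-zero bounding box; the path and file layout are placeholders and should be checked against the annotation files actually shipped with the database.

```python
import numpy as np

# Hypothetical annotations: one row per frame, columns (x, y, width, height).
# A row of zeros marks a frame where the MCT-based detector found no face.
annotations = np.loadtxt("face-locations/real/client001_controlled.face")  # placeholder path

valid = annotations.any(axis=1)   # frames with a usable bounding box
print(f"{(~valid).sum()} of {len(valid)} frames lack a face annotation")

# Keep only frames with a detected face for feature extraction / scoring.
usable_boxes = annotations[valid]
```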
Protocol for Licit Biometric Transactions
It is possible to measure the performance of baseline face recognition systems on the 2D face spoofing database and to evaluate how well the attacks pass such systems or, conversely, how robust the systems are to attacks. Here we describe how to use the data available in the enrolment set to create a background model and client models, and how to perform scoring using the available data.
Universal Background Model (UBM): To generate the UBM, subselect the training-set client videos from the enrollment videos. There should be 2 per client, which means you get 30 videos, each with 375 frames to create the model;
Client models: To generate client models, use the enrollment data for clients at the development and test groups. There should be 2 videos per client (one for each light condition) once more. At the end of the enrollment procedure, the development set must have 1 model for each of the 15 clients available in that set. Similarly, for the test set, 1 model for each of the 20 clients available;
For a simple baseline verification, generate scores exhaustively for all videos from the development and test real accesses respectively, but without intermixing across the development and test sets. The scores generated against matched client videos and models (within the subset, i.e. development or test) should be considered true client accesses, while all others are impostors;
If you are looking for a single number to report on the performance, do the following: exclusively using the scores from the development set, tune your baseline face recognition system at the EER (equal error rate) of the development set, and use this threshold to compute the HTER (half total error rate) on the test set scores.
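A minimal sketch of this evaluation step, assuming genuine and impostor score arrays for each set; the helper names and the simple threshold sweep are illustrative, not part of the database's tooling.

```python
import numpy as np

def far_frr(threshold, genuine, impostor):
    """False accept / false reject rates at a given decision threshold."""
    far = np.mean(impostor >= threshold)   # impostors wrongly accepted
    frr = np.mean(genuine < threshold)     # genuine accesses wrongly rejected
    return far, frr

def eer_threshold(genuine, impostor):
    """Pick the development-set threshold where FAR and FRR are closest (EER)."""
    candidates = np.sort(np.concatenate([genuine, impostor]))
    gaps = [abs(np.subtract(*far_frr(t, genuine, impostor))) for t in candidates]
    return candidates[int(np.argmin(gaps))]

# dev_genuine, dev_impostor, test_genuine, test_impostor are score arrays
# produced by the baseline verification protocol described above.
# threshold = eer_threshold(dev_genuine, dev_impostor)
# far, frr = far_frr(threshold, test_genuine, test_impostor)
# hter = (far + frr) / 2.0   # half total error rate reported on the test set
```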
Protocols for Spoofing Attacks
Attack protocols are used to evaluate the (binary classification) performance of counter-measures to spoof attacks. The database can be split into 6 different protocols according to the type of device used to generate the attack: print, mobile (phone), high-definition (tablet), photo, video or grand test (all types). Furthermore, on top of the previous 6 groups, attacks can be subset according to whether they were performed with the attacker's bare hands or using a fixed support. Combining each of the 6 attack-type protocols with the two support conditions plus the unrestricted case makes up a total of 18 protocols that can be used for studying the performance of counter-measures to 2D face spoofing attacks. The table below details the number of video clips in each protocol.
Acknowledgements
If you use this database, please cite the following publication:
I. Chingovska, A. Anjos, S. Marcel, "On the Effectiveness of Local Binary Patterns in Face Anti-spoofing", IEEE BIOSIG, 2012. https://ieeexplore.ieee.org/document/6313548 http://publications.idiap.ch/index.php/publications/show/2447
We introduce the Stanford Streaming MAR dataset. The dataset contains 23 different objects of interest, divided into four categories: Books, CD covers, DVD covers and Common Objects. We first record one video for each object where the object is in a static position while the camera is moving. These videos are recorded with a hand-held mobile phone with different amounts of camera motion, glare, blur, zoom, rotation and perspective changes. Each video is 100 frames long, recorded at 30 fps with a resolution of 640 x 480. For each video, we provide a clean database image (no background noise) for the corresponding object of interest. We also provide 5 more videos of moving objects recorded with a moving camera. These videos help to study the effect of background clutter when there is relative motion between the object and the background. Finally, we record 4 videos that contain multiple objects from the dataset. Each video is 200 frames long and contains 3 objects of interest which the camera captures one after the other. We provide ground-truth localization information for 14 videos, where we manually define a bounding quadrilateral around the object of interest in each video frame. This localization information is used in the calculation of the Jaccard index (see the sketch after the object lists below).
Static single object: 1.a. Books: Automata Theory, Computer Architecture, OpenCV, Wang Book. 1.b. CD Covers: Barry White, Chris Brown, Janet Jackson, Rascal Flatts, Sheryl Crow. 1.c. DVD Covers: Finding Nemo, Monsters Inc, Mummy Returns, Private Ryan, Rush Hour, Shrek, Titanic, Toy Story. 1.d. Common Objects: Bleach, Glade, Oreo, Polish, Tide, Tuna.
Moving object, moving camera: Barry White Moving, Chris Brown Moving, Titanic Moving, Titanic Moving - Second, Toy Story Moving.
Multiple objects: 3.a. Multiple Objects 1: Polish, Wang Book, Monsters Inc. 3.b. Multiple Objects 2: OpenCV, Barry White, Titanic. 3.c. Multiple Objects 3: Monsters Inc, Toy Story, Titanic. 3.d. Multiple Objects 4: Wang Book, Barry White, OpenCV.
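The Jaccard index mentioned above is the ratio of intersection area to union area between a detected region and the ground-truth quadrilateral. A minimal sketch using shapely; the quadrilateral coordinates are made up for illustration.

```python
from shapely.geometry import Polygon

def jaccard_index(quad_a, quad_b):
    """Intersection-over-union of two quadrilaterals given as 4 (x, y) corners."""
    a, b = Polygon(quad_a), Polygon(quad_b)
    inter = a.intersection(b).area
    union = a.union(b).area
    return inter / union if union > 0 else 0.0

# Example with made-up coordinates: ground-truth vs. detected quadrilateral.
gt = [(100, 80), (420, 90), (430, 380), (95, 370)]
detected = [(110, 100), (400, 95), (410, 360), (105, 350)]
print(f"Jaccard index: {jaccard_index(gt, detected):.3f}")
```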
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Selfies and video dataset
The dataset contains 4,000 people. Each person took a selfie with a webcam and a selfie with a mobile phone. In addition, each person recorded a video from the phone and from the webcam while pronouncing a given set of numbers. The dataset includes one folder per person, each containing 8 files (4 images and 4 videos).
Get the dataset
This is just an example of the data
Leave a request on https://trainingdata.pro/datasets to… See the full description on the dataset page: https://huggingface.co/datasets/TrainingDataPro/selfie_and_video.
RareAct is a video dataset of unusual actions, including actions like “blend phone”, “cut keyboard” and “microwave shoes”. It aims at evaluating the zero-shot and few-shot compositionality of action recognition models for unlikely compositions of common action verbs and object nouns. It contains 122 different actions which were obtained by combining verbs and nouns rarely co-occurring together in the large-scale textual corpus from HowTo100M, but that frequently appear separately.
A video dataset for benchmarking upsampling methods. Inter4K contains 1,000 ultra-high-resolution videos at 60 frames per second (fps) from online resources. The dataset provides standardized video resolutions at ultra-high definition (UHD/4K), quad high definition (QHD/2K), full high definition (FHD/1080p), (standard) high definition (HD/720p), one quarter of full HD (qHD/540p) and one ninth of full HD (nHD/360p). Frame rates of 60, 50, 30, 24 and 15 fps are provided for each resolution. Based on this standardization, both super-resolution and frame interpolation tests can be performed for different scaling factors ($\times 2$, $\times 3$ and $\times 4$). In this paper, we use Inter4K to address frame upsampling and interpolation. Inter4K provides both standardized UHD resolution and 60 fps for all videos, while also containing a diverse set of 1,000 5-second videos. Differences between scenes originate from the equipment (e.g., professional 4K cameras or phones), lighting conditions, variations in movements, actions or objects. The dataset is divided into 800 videos for training, 100 videos for validation and 100 videos for testing.
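As a rough sketch of how such a benchmark is typically used, the snippet below produces low-resolution inputs by bicubic downscaling of a high-resolution frame at the listed scaling factors; this is a generic illustration, not the Inter4K authors' preprocessing pipeline, and the file path is a placeholder.

```python
import cv2

# Generic super-resolution evaluation setup: downscale a high-resolution frame
# by x2 / x3 / x4 with bicubic interpolation to obtain the model input, keeping
# the original frame as the ground truth. Not the official Inter4K pipeline.
hr_frame = cv2.imread("frame_0001.png")   # placeholder path to a FHD/1080p frame
h, w = hr_frame.shape[:2]

for scale in (2, 3, 4):
    lr_frame = cv2.resize(hr_frame, (w // scale, h // scale),
                          interpolation=cv2.INTER_CUBIC)
    cv2.imwrite(f"frame_0001_x{scale}_lr.png", lr_frame)
```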
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The in-house Yoga pose dataset consists of 27 individuals, comprising nineteen females and eight males, each performing ten Yoga poses, namely Malasana, Ananda Balasana, Janu Sirsasana, Anjaneyasana, Tadasana, Kumbhakasana, Hasta Uttanasana, Paschimottanasana, Uttanasana, and Dandasana. The videos of the Yoga poses are collected in both 1080p and 4K resolution at a rate of 30 frames per second using MI Max and OnePlus 5T mobile phones. We captured the videos at various locations such as gardens, rooms, certified Yoga centers, and terraces to increase the generality of the dataset and make it more realistic, so that models trained on the dataset would work in complex real-world environments. It is worth mentioning that our in-house dataset does not contain any video sample created in a controlled, laboratory-like environment with proper illumination. We did this deliberately to enhance the generalization ability of models trained on our dataset. The individuals voluntarily participated in data collection and performed the ten Yoga poses with possible variations.
Population distribution : race distribution: Asians, Caucasians, black people; gender distribution: gender balance; age distribution: from children to the elderly, with young and middle-aged people forming the majority
Collection environment : indoor scenes, outdoor scenes
Collection diversity : various postures, expressions, lighting conditions, scenes, time periods and distances
Collection device : iPhone, Android phone, iPad
Collection time : daytime, night
Image parameters : the video format is .mov or .mp4, the image format is .jpg
Accuracy : the accuracy of actions exceeds 97%
Comprehensive dataset of 1,690 Video editing services in Brazil as of July, 2025. Includes verified contact information (email, phone), geocoded addresses, customer ratings, reviews, business categories, and operational details. Perfect for market research, lead generation, competitive analysis, and business intelligence. Download a complimentary sample to evaluate data quality and completeness.
The 3D Poses in the Wild dataset is the first dataset in the wild with accurate 3D poses for evaluation. While other datasets outdoors exist, they are all restricted to a small recording volume. 3DPW is the first one that includes video footage taken from a moving phone camera.
The dataset includes:
60 video sequences.
2D pose annotations.
3D poses obtained with the method introduced in the paper.
Camera poses for every frame in the sequences.
3D body scans and 3D people models (re-poseable and re-shapeable); each sequence contains its corresponding models.
18 3D models in different clothing variations.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Short-form video platforms have become seamlessly integrated into daily routines, yet it is important to recognize their potential adverse effects on both physical and mental health. Prior research has identified a detrimental impact of excessive short-form video consumption on attentional behavior, but the underlying neural mechanisms remain unexplored. In the current study, we aimed to investigate the effect of short-form video use on attentional functions, measured through the attention network test (ANT). A total of 48 participants, consisting of 35 females and 13 males, with a mean age of 21.8 years, were recruited. The mobile phone short video addiction tendency questionnaire (MPSVATQ) and the self-control scale (SCS) were administered to assess short video usage behavior and self-control ability. Electroencephalogram (EEG) data were recorded during completion of the ANT task. The correlation analysis showed a significant negative relationship between MPSVATQ and the theta power index reflecting executive control in the prefrontal region (r = −0.395, p = 0.007); this result was not observed when using the theta power index of the resting-state EEG data. Furthermore, a significant negative correlation was identified between MPSVATQ and SCS outcomes (r = −0.320, p = 0.026). These results suggest that an increased tendency toward mobile phone short video addiction could negatively impact self-control and diminish executive control within the realm of attentional functions. This study sheds light on the adverse consequences of short video consumption and underscores the importance of developing interventions to mitigate short video addiction.
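A minimal sketch of the kind of correlation analysis reported above, assuming per-participant arrays of questionnaire scores and prefrontal theta power; the variable names and random placeholder data are illustrative only.

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative placeholder data: one value per participant (n = 48 in the study).
mpsvatq_scores = np.random.default_rng(0).normal(size=48)    # questionnaire scores
prefrontal_theta = np.random.default_rng(1).normal(size=48)  # ANT-task theta power index

r, p = pearsonr(mpsvatq_scores, prefrontal_theta)
print(f"Pearson r = {r:.3f}, p = {p:.3f}")
```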