Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The FaciaVox dataset is an extensive multimodal biometric resource designed to enable in-depth exploration of face-image and voice recording research areas in both masked and unmasked scenarios.
Features of the Dataset:
1. Multimodal Data: A total of 1,800 face images (JPG) and 6,000 audio recordings (WAV) were collected, enabling cross-domain analysis of visual and auditory biometrics.
2. Participants were categorized into four age groups for structured labeling:
Label 1: Under 16 years
Label 2: 16 to less than 31 years
Label 3: 31 to less than 46 years
Label 4: 46 years and above
3. Sibling Data: Some participants are siblings, adding a challenging layer for speaker identification and facial recognition tasks due to genetic similarities in vocal and facial features. Sibling relationships are documented in the accompanying "FaciaVox List" data file.
4. Standardized Filenames: The dataset uses a consistent, intuitive naming convention for both facial images and voice recordings (a filename-parsing sketch follows this list). Each filename includes:
Type (F: Face Image, V: Voice Recording)
Participant ID (e.g., sub001)
Mask Type (e.g., a: unmasked, b: disposable mask, etc.)
Zoom Level or Sentence ID (e.g., 1x, 3x, 5x for images or specific sentence identifier {01, 02, 03, ..., 10} for recordings)
5. Diverse Demographics: Participants from 19 different countries.
6. Challenging Conditions: Face images captured with reflective mask shields and under severe lighting conditions pose a difficult face recognition problem.
7. Multilingual Speech: Each participant uttered 7 English statements and 3 Arabic statements, regardless of their native language, adding a challenge for speaker identification.
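As a quick illustration of the naming convention in item 4, the short Python sketch below splits a filename into its labeled fields. The underscore-separated layout and the example names are assumptions for illustration; the authoritative field layout is the one documented in the FaciaVox List file.

```python
import re

# Hypothetical filename layout inferred from the convention described above,
# e.g. "F_sub001_a_3x.jpg" or "V_sub014_b_07.wav". Check the actual separator
# and field order against the FaciaVox List file.
PATTERN = re.compile(
    r"(?P<type>[FV])_(?P<subject>sub\d{3})_(?P<mask>[a-z])_(?P<variant>\d+x|\d{2})\.(jpg|wav)$",
    re.IGNORECASE,
)

def parse_faciavox_name(filename: str) -> dict:
    """Split a FaciaVox filename into its labeled fields."""
    m = PATTERN.match(filename)
    if m is None:
        raise ValueError(f"unrecognized FaciaVox filename: {filename}")
    fields = m.groupdict()
    fields["modality"] = "face image" if fields["type"].upper() == "F" else "voice recording"
    return fields

print(parse_faciavox_name("F_sub001_a_3x.jpg"))   # zoom-level variant
print(parse_faciavox_name("V_sub014_b_07.wav"))   # sentence-ID variant
```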
Research Applications
FaciaVox is a versatile dataset supporting a wide range of research domains, including but not limited to:
• Speaker Identification (SI) and Face Recognition (FR): Evaluating biometric systems under varying conditions.
• Impact of Masks on Biometrics: Investigating how different facial coverings affect recognition performance.
• Language Impact on SI: Exploring the effects of native and non-native speech on speaker identification.
• Age and Gender Estimation: Inferring demographic information from voice and facial features.
• Race and Ethnicity Matching: Studying biometrics across diverse populations.
• Synthetic Voice and Deepfake Detection: Detecting cloned or generated speech.
• Cross-Domain Biometric Fusion: Combining facial and vocal data for robust authentication.
• Speech Intelligibility: Assessing how masks influence speech clarity.
• Image Inpainting: Reconstructing occluded facial regions for improved recognition.
Researchers can use the facial images and voice recordings independently or in combination to explore multimodal biometric systems. The standardized filenames and accompanying metadata make it easy to align visual and auditory data for cross-domain analyses. Sibling relationships and demographic labels add depth for tasks such as familial voice recognition, demographic profiling, and model bias evaluation.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Heartprint is a large biometric database of multisession ECG signals comprising 1,539 records of 15 seconds each, collected from 199 subjects. The signals were collected in multiple sessions over ten years, starting in 2012, with subjects in resting and reading conditions, and are organized into a multisession database. The dataset also covers several demographic classes such as gender, ethnicity, and age group. The Heartprint dataset can be a valuable resource for the development and evaluation of biometric recognition algorithms.
Please cite the following article if you use this dataset: Islam, M.S.; Alhichri, H.; Bazi, Y.; Ammour, N.; Alajlan, N.; Jomaa, R.M. Heartprint: A Dataset of Multisession ECG Signal with Long Interval Captured from Fingers for Biometric Recognition. Data 2022, 7, 141. https://doi.org/10.3390/data7100141
https://dataintelo.com/privacy-and-policy
The global gait biometrics market size was valued at USD 0.9 billion in 2023 and is expected to reach approximately USD 2.5 billion by 2032, growing at a compound annual growth rate (CAGR) of 11.5% during the forecast period. This impressive growth is driven by advancements in sensor technology, increasing adoption in healthcare and security applications, and rising awareness of the benefits of gait biometrics in various sectors.
One of the primary growth factors of the gait biometrics market is the increasing prevalence of chronic diseases and conditions that impair mobility, such as Parkinson’s disease, arthritis, and stroke. These conditions necessitate advanced diagnostic and monitoring tools, propelling the demand for gait biometrics in healthcare settings. Healthcare professionals are increasingly utilizing gait analysis for early diagnosis, rehabilitation, and treatment effectiveness, which is significantly boosting market growth.
Technological advancements in the development of sophisticated sensors and machine learning algorithms are also major drivers of the gait biometrics market. Enhanced accuracy and reliability of the data collected through wearable and non-wearable sensors have broadened the scope of gait biometrics applications. Integration with artificial intelligence allows for more precise analysis and predictions, making gait biometrics a valuable tool in both clinical and non-clinical settings.
Security and surveillance are other areas witnessing significant adoption of gait biometrics. As global security concerns rise, there is growing interest in non-invasive and unobtrusive identification methods. Gait biometrics offers a unique advantage in this aspect, as gait patterns are difficult to disguise or replicate. Governments and private security agencies are increasingly investing in gait biometric systems for enhanced security measures, thus fostering market expansion.
Regionally, North America and Europe are the leading markets for gait biometrics due to the presence of advanced healthcare infrastructure, high adoption of innovative technologies, and a strong focus on research and development. However, the Asia Pacific region is expected to exhibit the highest growth rate during the forecast period, fueled by increasing healthcare investments and rising awareness about biometric technologies.
The gait biometrics market by component is segmented into hardware, software, and services. The hardware segment includes sensors, cameras, and other physical devices used in capturing gait data. This segment is experiencing robust growth due to continuous advancements in sensor technology, providing higher accuracy and reliability in data collection. Moreover, the decreasing cost of sensors is making hardware more accessible to a broader range of applications, including consumer health and fitness devices.
Software is another crucial component of the gait biometrics market, encompassing gait analysis software and machine learning algorithms that process and interpret the collected data. This segment is expected to witness significant growth due to the increasing demand for sophisticated software solutions that can handle large datasets and provide detailed analysis. The integration of cutting-edge technologies such as artificial intelligence and cloud computing is further enhancing the capabilities of gait biometrics software.
Services form an integral part of the gait biometrics ecosystem, including installation, maintenance, and training services. As gait biometrics systems become more complex, the demand for specialized services is rising. Service providers play a crucial role in ensuring the smooth operation and optimal performance of gait biometrics systems. The increasing adoption of these systems across various sectors is driving the demand for comprehensive service packages.
The synergetic interaction between hardware and software components is vital for the effective functioning of gait biometrics systems. Advanced hardware captures high-quality data, while sophisticated software processes this data to generate actionable insights. This seamless integration is essential for the widespread adoption and success of gait biometrics technology across different applications.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset provides a collection of behavioural biometrics data (commonly known as Keyboard, Mouse and Touchscreen (KMT) dynamics). The data was collected for use in a FinTech research project undertaken by academics and researchers at the Computer Science Department, Edge Hill University, United Kingdom. The project, called CyberSIgnature, uses KMT dynamics data to distinguish between legitimate card owners and fraudsters. An application was developed with a graphical user interface (GUI) similar to a standard online card payment form, including fields for card type, name, card number, card verification code (cvc) and expiry date. User KMT dynamics were then captured while participants entered fictitious card information on the GUI application.
The dataset consists of 1,760 KMT dynamics instances collected over 88 user sessions on the GUI application. Each user session involves 20 data-entry iterations: the user is assigned a set of fictitious card information (drawn at random from a pool) to enter 10 times, and is subsequently presented with 10 additional sets of card information, each to be entered once. The 10 additional sets are drawn from a pool assigned, or to be assigned, to other users. A KMT data instance is collected during each data-entry iteration, so a total of 20 KMT data instances (10 legitimate and 10 illegitimate) were collected during each user session on the GUI application.
The raw dataset is stored in .json format within 88 separate files. The root folder, named `behaviour_biometrics_dataset`, consists of two sub-folders, `raw_kmt_dataset` and `feature_kmt_dataset`, and a Jupyter notebook file (`kmt_feature_classification.ipynb`). Their contents are described below:
-- `raw_kmt_dataset`: this folder contains 88 files, each named `raw_kmt_user_n.json`, where n is a number from 0001 to 0088. Each file contains 20 instances of KMT dynamics data corresponding to a given fictitious card, and the data instances are equally split between legitimate (n = 10) and illegitimate (n = 10) classes. The legitimate class corresponds to KMT dynamics captured from the user assigned to the card detail, while the illegitimate class corresponds to KMT dynamics collected from other users entering the same card detail.
-- `feature_kmt_dataset`: this folder contains two sub-folders, `feature_kmt_json` and `feature_kmt_xlsx`. Each contains 88 files (in the relevant format: .json or .xlsx), each named `feature_kmt_user_n`, where n is a number from 0001 to 0088. Each file contains 20 instances of features extracted from the corresponding `raw_kmt_user_n` file, including the class labels (legitimate = 1, illegitimate = 0).
-- `kmt_feature_classification.ipynb`: this file contains the Python code necessary to generate features from the raw KMT files and apply a simple machine learning classification task to generate results. The code is designed to run with minimal effort from the user.
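As a minimal complement to the notebook, the sketch below iterates over the raw KMT files and counts the instances per user. The folder layout follows the description above; the per-instance JSON schema is not spelled out here, so the sketch only inspects the top-level structure.

```python
import json
from pathlib import Path

# Minimal sketch for iterating over the raw KMT files. The top-level layout
# (88 files, 20 instances each, legitimate/illegitimate split) follows the
# dataset description; the per-instance field names are documented in the
# dataset itself, so only the top-level structure is inspected here.
root = Path("behaviour_biometrics_dataset/raw_kmt_dataset")

for user_file in sorted(root.glob("raw_kmt_user_*.json")):
    with user_file.open() as fh:
        instances = json.load(fh)  # expected: 20 KMT instances per user
    print(f"{user_file.name}: {len(instances)} instances")
```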
The BED dataset
Version 1.0.0
Please cite as: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219 - 12230, 2021.
Disclaimer
While every care has been taken to ensure the accuracy of the data included in the BED dataset, the authors and the University of the West of Scotland, Durham University, and Universitat de València do not provide any guarantees and disclaim all responsibility and all liability (including, without limitation, liability in negligence) for all expenses, losses, damages (including indirect or consequential damage) and costs which you might incur as a result of the provided data being inaccurate or incomplete in any way and for any reason. © 2020, University of the West of Scotland, Scotland, United Kingdom.
Contact
For inquiries regarding the BED dataset, please contact:
Dataset summary
BED (Biometric EEG Dataset) is a dataset specifically designed to test EEG-based biometric approaches that use relatively inexpensive consumer-grade devices, more specifically the Emotiv EPOC+ in this case. This dataset includes EEG responses from 21 subjects to 12 different stimuli, across 3 different chronologically disjointed sessions. We have also considered stimuli aimed to elicit different affective states, so as to facilitate future research on the influence of emotions on EEG-based biometric tasks. In addition, we provide a baseline performance analysis to outline the potential of consumer-grade EEG devices for subject identification and verification. It must be noted that, in this work, EEG data were acquired in a controlled environment in order to reduce the variability in the acquired data stemming from external conditions.
The stimuli include:
For more details regarding the experimental protocol and the design of the dataset, please refer to the associated publication: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219 - 12230, 2021.
Dataset structure and contents
The BED dataset contains EEG recordings from 21 subjects, acquired during 3 similar sessions for each subject. The sessions were spaced one week apart from each other.
The BED dataset includes:
The dataset is organised in 3 folders:
RAW/ Contains the RAW files
RAW/sN/ Contains the RAW files associated with subject N
Each folder sN is composed of the following files:
- sN_s1.csv, sN_s2.csv, sN_s3.csv -- Files containing the EEG recordings for subject N and session 1, 2, and 3, respectively. These files contain 39 columns:
COUNTER INTERPOLATED F3 FC5 AF3 F7 T7 P7 O1 O2 P8 T8 F8 AF4 FC6 F4 ...UNUSED DATA... UNIX_TIMESTAMP
- subject_N_session_1_time_X.log, subject_N_session_2_time_X.log, subject_N_session_3_time_X.log -- Log files containing the sequence of events for subject N and sessions 1, 2, and 3, respectively.
RAW_PARSED/
Contains Matlab files named sN_sM.mat. Each file contains the recordings for subject N in session M and is composed of two variables:
- recording: size (time@256Hz x 17), Columns: COUNTER INTERPOLATED F3 FC5 AF3 F7 T7 P7 O1 O2 P8 T8 F8 AF4 FC6 F4 UNIX_TIMESTAMP
- events: cell array with size (events x 3) START_UNIX END_UNIX ADDITIONAL_INFO
START_UNIX is the UNIX timestamp in which the event starts
END_UNIX is the UNIX timestamp in which the event ends
ADDITIONAL_INFO contains a struct with additional information regarding the specific event: for the images, the expected score and the voted score; for the cognitive task, the input; for the VEP, the pattern and the frequency; etc.
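A minimal Python sketch for working with the RAW_PARSED files is shown below, assuming the `recording` and `events` variables described above; the exact unpacking of the Matlab cell array may need adjustment, and the file name is illustrative.

```python
from scipy.io import loadmat
import numpy as np

# Minimal sketch, assuming the variable names given above ("recording",
# "events"); exact cell-array unpacking may vary with the SciPy version.
mat = loadmat("RAW_PARSED/s1_s1.mat")
recording = mat["recording"]          # (samples @ 256 Hz, 17 columns)
events = mat["events"]                # (n_events, 3) cell array

timestamps = recording[:, -1]         # last column: UNIX_TIMESTAMP

# Extract the EEG samples that fall inside the first event window.
start_unix = float(np.squeeze(events[0, 0]))
end_unix = float(np.squeeze(events[0, 1]))
mask = (timestamps >= start_unix) & (timestamps <= end_unix)
event_eeg = recording[mask, 2:16]     # the 14 EEG channels F3..F4
print(event_eeg.shape)
```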
Features/
Features/Identification
Features/Identification/[ARRC|MFCC|SPEC]/: Each of these folders contains the extracted features, ready for classification, for each of the stimuli. Each file is composed of the following variables:
- feat: N x number of features
- Y: N x 2 (the #subject and the #session)
- INFO: contains details about the event, same as ADDITIONAL_INFO above
Features/Verification: This folder is composed of 3 files, each containing a different set of extracted features. Each file is composed of one struct array with the following fields:
- data: the time-series features, as described in the paper
- y: the #subject
- stimuli: the stimuli by name
- session: the #session
- INFO: Contains details about the event
The features provided are in sequential order, so index 1 and index 2, etc. are sequential in time if they belong to the same stimulus.
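The sketch below shows one way the identification feature files might be used, training on two sessions and testing on the third; the file name and the 1-NN classifier are illustrative assumptions, not the baseline from the paper.

```python
from scipy.io import loadmat
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Sketch of a subject-identification run on one feature file, assuming the
# "feat"/"Y" variables described above. The file name is hypothetical.
data = loadmat("Features/Identification/MFCC/stimulus_01.mat")
feat, Y = data["feat"], data["Y"]        # Y columns: subject id, session id

train = Y[:, 1] != 3                     # train on sessions 1-2,
test = ~train                            # test on session 3

clf = KNeighborsClassifier(n_neighbors=1).fit(feat[train], Y[train, 0])
print("accuracy:", accuracy_score(Y[test, 0], clf.predict(feat[test])))
```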
Additional information
For additional information regarding the creation of the BED dataset, please refer to the associated publication: Arnau-González, P., Katsigiannis, S., Arevalillo-Herráez, M., Ramzan, N., "BED: A new dataset for EEG-based biometrics", IEEE Internet of Things Journal, vol. 8, no. 15, pp. 12219 - 12230, 2021.
https://dataintelo.com/privacy-and-policy
The global biometrics in transportation market was valued at approximately USD 4.5 billion in 2023 and is projected to reach USD 12.9 billion by 2032, growing at a compound annual growth rate (CAGR) of 12.3% during the forecast period. The market size growth is being driven by increasing security demands, advancements in biometric technologies, and the integration of AI and IoT in transportation systems. The need for enhanced security measures in public transport systems and border controls has significantly spurred the adoption of biometric solutions, which are renowned for their accuracy and efficiency in identifying individuals.
A prominent growth factor for the biometrics in transportation market is the heightened emphasis on security across global transportation networks. As the frequency and sophistication of security threats increase, transportation authorities are prioritizing the implementation of robust security protocols to safeguard passengers and cargo. Biometric technologies, offering unique identification and verification capabilities, are being increasingly adopted in airports, railways, and other transportation hubs worldwide. This trend is further fueled by government initiatives mandating stringent security measures at critical points of infrastructure, as well as growing public acceptance of biometric solutions as a means of enhancing safety and convenience.
Another significant driver of market growth is the rapid technological advancements in biometric systems. The evolution of technologies such as fingerprint recognition, facial recognition, and iris recognition has led to more accurate, reliable, and user-friendly solutions. These innovations have expanded the potential applications of biometrics within the transportation sector, from securing access to restricted areas to streamlining passenger identification and boarding processes. Furthermore, the integration of artificial intelligence and machine learning into biometric systems has enhanced their capability to process and analyze massive datasets, thereby improving the speed and accuracy of identity verification.
The growing demand for seamless and contactless travel experiences is also contributing to the expansion of the biometrics in transportation market. In the wake of the COVID-19 pandemic, there has been an accelerated shift towards touchless solutions to minimize physical contact and reduce health risks. Biometric technologies are ideally suited to meet this demand, as they enable secure identification without the need for physical interaction. This is particularly relevant in airports, where biometric systems are increasingly being deployed to expedite the check-in, security screening, and boarding processes, thereby enhancing the overall passenger experience.
Regionally, North America holds a significant share of the biometrics in transportation market, driven by a strong focus on advanced security solutions and substantial investments in transportation infrastructure. The region's early adoption of biometric technologies in airports and border control applications has set a benchmark for other regions to follow. However, the Asia Pacific region is expected to witness the highest growth rate over the forecast period, propelled by rapid urbanization, increasing passenger traffic, and government initiatives to enhance transportation and security infrastructure. The growing economies in this region, such as China and India, are investing heavily in modernizing their transportation systems, creating lucrative opportunities for market expansion.
The technology segment within the biometrics in transportation market comprises fingerprint recognition, facial recognition, iris recognition, voice recognition, and other emerging technologies. Fingerprint recognition has traditionally been the most widely used biometric technology due to its cost-effectiveness, ease of use, and high accuracy. It is commonly employed in transportation settings for access control and verification purposes. However, its adoption is now giving way to more advanced technologies such as facial and iris recognition, which offer improved security and user experience. These technologies are increasingly being integrated into systems where rapid and contactless verification is a priority, such as airport check-ins and border control points.
Facial recognition technology has gained significant traction in recent years, becoming a preferred choice for many transportation authorities due to its non-intrusive nature and ability to quickly process large volumes of passengers.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Project Overview
This study aims to assess and compare stress responses in healthcare professionals during simulated pericardiocentesis, using both a conventional mannequin and a virtual reality (VR) environment. Pericardiocentesis, crucial in managing cardiac tamponade, demands precision and decision-making under pressure, potentially inducing significant stress.
Methodology
Participants will perform the procedure in both simulated environments, with biometric indicators such as heart rate, skin conductance, and electromyography recorded with a biosignal plux device to evaluate stress intensity. Subjective stress assessments will complement the biometric data, with statistical analysis identifying significant differences between the simulation methods.
Significance
The findings will elucidate the effectiveness of VR simulation in medical training compared to traditional mannequin-based methods, potentially guiding training decisions and improving healthcare professionals' readiness for high-stress situations, thereby enhancing patient safety and treatment efficacy.
Conclusion
This research aims to inform the optimal use of VR technology in medical education and to understand the psychological impacts of emergency medical procedures on healthcare providers, supporting technological innovation and psychological insight in medical education.
Files
The files were acquired during the performance of pericardiocentesis in both scenarios using a biosignal plux device and are in .txt format. These are the first 10 subjects of the project. The entire dataset comprises 65 students, although some files were recorded with some errors. If you need the complete dataset, please let me know. Thanks!
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
The dataset consists of 98,000 videos and selfies from 170 countries, providing a foundation for developing robust security systems and facial recognition algorithms.
While the dataset itself doesn't contain spoofing attacks, it's a valuable resource for testing liveness detection systems, allowing researchers to simulate attacks and evaluate how effectively their systems can distinguish between real faces and various forms of spoofing.
By utilizing this dataset, researchers can contribute to the development of advanced security solutions, enabling the safe and reliable use of biometric technologies for authentication and verification.
The dataset offers a high-quality collection of videos and photos, including selfies taken with a range of popular smartphones, like iPhone, Xiaomi, Samsung, and more. The videos showcase individuals turning their heads in various directions, providing a natural range of movements for liveness detection training.
Furthermore, the dataset provides detailed metadata for each set, including information like gender, age, ethnicity, video resolution, duration, and frames per second. This rich metadata provides crucial context for analysis and model development.
Researchers can develop more accurate liveness detection algorithms, which is crucial for achieving the iBeta Level 2 certification, a benchmark for robust and reliable biometric systems that prevent fraud.
https://dataintelo.com/privacy-and-policy
According to our latest research, the global Responsible Gaming Biometric System market size reached USD 1.18 billion in 2024, with a robust year-over-year expansion driven by regulatory mandates and mounting social responsibility requirements. The market is expected to grow at a CAGR of 18.2% from 2025 to 2033, projecting a substantial increase to USD 5.26 billion by 2033. This growth is fueled by the rapid adoption of biometric technologies across both brick-and-mortar and digital gambling platforms, as well as the increasing emphasis on player protection and responsible gaming practices globally.
A key growth factor propelling the Responsible Gaming Biometric System market is the tightening of regulatory frameworks worldwide. As governments and regulatory bodies intensify their scrutiny of the gambling industry, operators are now mandated to implement robust identity verification and player monitoring systems. Biometric solutions, including facial recognition, fingerprint scanning, and voice authentication, are increasingly seen as essential tools for ensuring compliance with anti-money laundering (AML) and Know Your Customer (KYC) regulations. These systems not only help in preventing underage gambling and self-exclusion circumvention but also enable real-time monitoring of player behavior to detect signs of problem gambling, thereby fostering a safer gaming environment.
Technological advancements are another significant driver of the Responsible Gaming Biometric System market. The integration of artificial intelligence (AI) and machine learning with biometrics has enhanced the accuracy and reliability of player identification and behavioral analytics. This has enabled gaming operators to deploy sophisticated systems capable of analyzing vast datasets to identify risky behaviors, flag anomalies, and provide timely interventions. Furthermore, the proliferation of smartphones and advancements in cloud computing have made biometric solutions more accessible and scalable, allowing both large-scale casinos and online gambling platforms to implement these systems efficiently and cost-effectively.
Consumer awareness and demand for responsible gambling measures are also accelerating the adoption of biometric systems in the gaming sector. With the rising incidence of gambling addiction and its associated social costs, there is growing public pressure on operators to prioritize player welfare. Modern biometric systems offer a seamless and user-friendly approach to safeguarding players, enabling features such as self-exclusion, time and spending limits, and real-time alerts. As a result, gaming operators are increasingly investing in biometric technologies not only to comply with regulations but also to enhance their brand reputation and foster long-term customer loyalty.
Regionally, North America and Europe are at the forefront of the Responsible Gaming Biometric System market, accounting for the largest market shares due to their mature regulatory landscapes and high adoption rates of advanced technologies. The Asia Pacific region, however, is witnessing the fastest growth, driven by the expansion of the gambling industry and the increasing digitalization of gaming platforms. Countries like Australia, Singapore, and Japan are actively implementing responsible gaming measures, while emerging markets in Latin America and the Middle East & Africa are gradually following suit. This regional diversification is expected to create new opportunities for market players and further stimulate global market growth over the forecast period.
The Responsible Gaming Biometric System market is segmented by component into hardware, software, and services, each playing a crucial role in the deployment and effectiveness of biometric solutions. Hardware components include biometric scanners, cameras, and sensors that capture and process unique physiological or behavioral characteristics of players. These devices form the foundation of any biometric system, ensuring accurate and real-time data collection. The hardware segment remains dominant in physical casinos and gaming venues, where robust and tamper-proof devices are essential for on-site identity verification and access control. As the market matures, hardware innovations such as contactless sensors and multi-modal biometric devices are gaining traction, enabling enhanced security and user convenience.
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Dataset Description:
The dataset comprises a collection of photos of people, organized into folders labeled "women" and "men." Each folder contains a significant number of images to facilitate training and testing of gender detection algorithms or models.
The dataset contains a variety of images capturing female and male individuals from diverse backgrounds, age groups, and ethnicities.
This labeled dataset can be utilized as training data for machine learning models, computer vision applications, and gender detection algorithms.
The dataset is split into train and test folders. Each folder includes:
- women and men folders: images of people of the corresponding gender
- a .csv file: information about the images and people in the dataset
keywords: biometric system, biometric system attacks, biometric dataset, face recognition database, face recognition dataset, face detection dataset, facial analysis, gender detection, supervised learning dataset, gender classification dataset, gender recognition dataset
Description:
The Printed Photos Attacks Dataset is a specialized resource designed for the development and evaluation of liveness detection systems aimed at combating facial spoofing attempts. This dataset includes a comprehensive collection of videos that feature both authentic facial presentations and spoof attempts using printed 2D photographs. By incorporating both real and fake faces, it provides a robust foundation for training and testing advanced facial recognition and anti-spoofing algorithms.
This dataset is particularly valuable for researchers and developers focused on enhancing biometric security systems. It introduces a novel method for learning and extracting distinctive facial features to effectively differentiate between genuine and spoofed inputs. The approach leverages deep neural networks (DNNs) and sophisticated biometric techniques, which have been shown to significantly improve the accuracy and reliability of liveness detection in various applications.
Key features of the dataset include:
Diverse Presentation Methods: The dataset contains a range of facial presentations, including genuine facial videos and spoof videos created using high-quality printed photographs. This diversity is essential for developing algorithms that can generalize across different types of spoofing attempts.
High-Resolution Videos: The videos in the dataset are captured in high resolution, ensuring that even subtle facial features and movements are visible, aiding in the accurate detection of spoofing.
Comprehensive Annotations: Each video is meticulously annotated with labels indicating whether the facial presentation is genuine or spoofed. Additionally, the dataset includes metadata such as the method of spoofing and environmental conditions, providing a rich context for algorithm development.
Unseen Spoof Detection: One of the unique aspects of this dataset is its emphasis on detecting unseen spoofing cues. The dataset is designed to challenge algorithms to identify and adapt to new types of spoofing methods that may not have been encountered during the training phase.
Versatile Application: The dataset is suitable for a wide range of applications, including access control systems, mobile device authentication, and other security-sensitive environments where facial recognition is deployed.
This dataset is sourced from Kaggle.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset comprises fingerprint data collected from a total of 200 individuals, including 100 identified Type 2 Diabetes Mellitus (T2DM) patients and 100 control subjects. The dataset contains detailed fingerprint information, which can serve as a valuable resource for research and analysis in the context of T2DM and its potential associations with fingerprint characteristics. Researchers and analysts can use this dataset to explore patterns, conduct studies, and derive insights related to T2DM identification and its correlation with fingerprint features.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
KeyRecs is a keystroke dynamics dataset that can be used to train, validate, and test machine learning models for anomaly detection and robust typing-pattern recognition, as well as for the clustering and classification of users exhibiting similar behavior. It contains fixed-text and free-text samples of user typing behavior, obtained in a study with 100 participants of 20 different nationalities performing password retype and transcription exercises.
The samples consist of inter-key latencies computed by measuring the time between each key press and release during an exercise, following a digraph model. Participants were also asked to provide demographic information: age, gender, handedness, and nationality.
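To make the digraph model concrete, the sketch below derives the usual digraph timings (hold, press-press, release-press) from a toy sequence of key events; the (key, press_time, release_time) layout is an illustrative assumption, not the exact KeyRecs column schema.

```python
# Minimal sketch of the digraph timing model described above: for each pair
# of consecutive keys, derive hold and inter-key latencies from press/release
# timestamps (in seconds). The event layout here is illustrative.
events = [
    ("p", 0.000, 0.085),
    ("a", 0.140, 0.210),
    ("s", 0.265, 0.350),
]

for (k1, p1, r1), (k2, p2, r2) in zip(events, events[1:]):
    hold = r1 - p1           # how long the first key was held
    press_press = p2 - p1    # down-down latency
    release_press = p2 - r1  # up-down latency (negative when keys overlap)
    print(f"{k1}{k2}: hold={hold:.3f} PP={press_press:.3f} RP={release_press:.3f}")
```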
KeyRecs can be valuable to enhance the recognition of authorized users and prevent illegal logins in biometric authentication software, and can be combined with additional data recordings to create more extensive datasets and improve the generalization of machine learning models.
If you use this dataset, please cite the primary data article: https://doi.org/10.1016/j.dib.2023.109509
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset contains records of shorebirds captured, ringed, and measured with mist nets on the mudflats of the Dutch Wadden Sea at Schiermonnikoog. The dataset includes the sampling event and the record of the caught shorebird species. The most commonly used biometric measures of wild birds will be added in the future: wing length, mass, bill length, total head length, tarsus length, primary score, body moult index, and plumage cover.
All distance-learning participants (students, professors, instructors, mentors, tutors and others) would like to know how well students have assimilated the study materials being taught. The analysis and assessment of the knowledge students have acquired over a semester are an integral part of the independent-studies process at the most advanced universities worldwide. A formal test or exam during the semester would cause needless stress for students. To resolve this problem, the authors of this article have developed a Biometric and Intelligent Self-Assessment of Student Progress (BISASP) System. The obtained research results are comparable with the results from other similar studies. This article ends with two case studies that demonstrate practical operation of the BISASP System. The first case study analyses the interdependencies between microtremors, stress and student marks. The second case study compares the marks assigned to students during the e-self-assessment, prior to the e-test and during the e-test. The dependence determined in the second case study, between the marks students scored in the real examination and the marks based on their self-evaluation, is statistically significant (significance >0.99%). The original contribution of this article, compared to previously published research results, is as follows: the BISASP System developed by the authors is superior to traditional self-assessment systems due to its use of voice stress analysis and a special algorithm, which permits a more detailed analysis of the knowledge attained by a student.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The Biometric Attack dataset consists of >5k selfie images of people from >50 countries. Each participant provided one real-life selfie image. Live selfies help facial recognition models to identify real faces and detect spoofing attempts, decreasing false-negative results in liveness detection tests.
Selfies provide a diverse range of facial features, lighting conditions, and capturing devices, which are essential for training robust facial recognition models that can accurately distinguish between real and spoofed faces.
Liveness detection: This dataset is ideal for training and evaluating liveness detection models, enabling researchers to distinguish between real and spoofed data with high accuracy.
Keywords: Real life data, Live data, Selfie data, Antispoofing for AI, Liveness Detection dataset for AI, Spoof Detection dataset, Facial Recognition dataset, Biometric Authentication dataset, AI Dataset, Anti-Spoofing Technology, Facial Biometrics, Machine Learning Dataset, Deep Learning
The Custom Silicone Mask Attack Dataset (CSMAD) contains presentation attacks made using six custom-made silicone masks, each costing about USD 4,000. The dataset is designed for face presentation attack detection experiments.
The Custom Silicone Mask Attack Dataset (CSMAD) has been collected at the Idiap Research Institute. It is intended for face presentation attack detection experiments, where the presentation attacks have been mounted using a custom-made silicone mask of the person (or identity) being attacked.
The dataset contains videos of face presentations, as well as a set of files specifying the experimental protocol corresponding to the experiments presented in the publication below.
Reference
If you publish results using this dataset, please cite the following publication.
Sushil Bhattacharjee, Amir Mohammadi and Sebastien Marcel: "Spoofing Deep Face Recognition With Custom Silicone Masks." in Proceedings of International Conference on Biometrics: Theory, Applications, and Systems (BTAS), 2018.
10.1109/BTAS.2018.8698550
http://publications.idiap.ch/index.php/publications/show/3887
Data Collection
Face-biometric data has been collected from 14 subjects to create this dataset. Subjects participating in this data-collection have played three roles: targets, attackers, and bona-fide clients. The subjects represented in the dataset are referred to here with letter-codes: A .. N. The subjects A..F have also been targets. That is, face-data for these six subjects has been used to construct their corresponding flexible masks (made of silicone). These masks have been made by Nimba Creations Ltd., a special effects company.
Bona fide presentations have been recorded for all subjects A..N. Attack presentations (presentations where the subject wears one of 6 masks) have been recorded for all six targets, made by different subjects. That is, each target has been attacked several times, each time by a different attacker wearing the mask in question. This is one way of increasing the variability in the dataset. Another way we have augmented the variability of the dataset is by capturing presentations under different illumination conditions. Presentations have been captured in four different lighting conditions:
All presentations have been captured with a green uniform background. See the paper mentioned above for more details of the data-collection process.
Dataset Structure
The dataset is organized in three subdirectories: ‘attack’, ‘bonafide’, ‘protocols’. The two directories: ‘attack’ and ‘bonafide’ contain presentation-videos and still images for attacks and bona fide presentations, respectively. The folder ‘protocols’ contains text files specifying the experimental protocol for vulnerability analysis of face-recognition (FR) systems.
The number of data-files per category are as follows:
The folder ‘attack/WEAR’ contains videos where the attack has been made by a person (attacker) wearing the mask of the target being attacked. The ‘attack/STAND’ folder contains videos where the attack has been made using the target’s mask mounted on an appropriate stand.
Video File Format
The video files for the face presentations are in ‘hdf5’ format (with file-extension ‘.h5’). The folder structure of the hdf5 file is shown in Figure 1. Each file contains data collected using two cameras:
As shown in Figure 1, frames from the different channels (color, infrared, depth, thermal) from the two cameras are stored in separate directory hierarchies in the hdf5 file. Each file represents a video of approximately 10 seconds, or roughly 300 frames.
In the hdf5 file, the directory for SR300 also contains a subdirectory named ‘aligned_color_to_depth’. This folder contains post-processed data, where the frames of depth channel have been aligned with those of the color channel based on the time-stamps of the frames.
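A minimal sketch for inspecting one of these ‘.h5’ files with h5py is given below; rather than hard-coding the group names from Figure 1, it walks the file and lists every frame dataset it finds. The file path is illustrative.

```python
import h5py

# Minimal sketch for inspecting one CSMAD presentation video, assuming the
# two-camera/channel hierarchy described above. Group names are discovered
# at runtime rather than hard-coded; the path below is hypothetical.
with h5py.File("attack/WEAR/example_presentation.h5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)  # lists every channel's frame datasets
```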
Experimental Protocol
The ‘protocols’ folder contains text files that specify the protocols for vulnerability analysis experiments reported in the paper mentioned above. Please see the README file in the protocols folder for details.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Native American Multi-Year Facial Image Dataset, thoughtfully curated to support the development of advanced facial recognition systems, biometric identification models, KYC verification tools, and other computer vision applications. This dataset is ideal for training AI models to recognize individuals over time, track facial changes, and enhance age progression capabilities.
This dataset includes 5,000+ high-quality facial images, organized into individual participant sets, each containing:
To ensure model generalization and practical usability, images in this dataset reflect real-world diversity:
Each participant’s dataset is accompanied by rich metadata to support advanced model training and analysis, including:
This dataset is highly valuable for a wide range of AI and computer vision applications:
To keep pace with evolving AI needs, this dataset is regularly updated and customizable. Custom data collection options include:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains gait-based biometric data collected from 5,000 users in a simulated environment for gait authentication in the Metaverse. It includes 16 key gait features extracted using OpenPose and MediaPipe and processed with feature engineering techniques for improved usability.
The dataset is valuable for gait-based authentication, user identification, and biometric security applications. It can be used for machine learning models, deep learning, and anomaly detection in gait recognition research.
Features include:
Stride length, step frequency, stance phase duration, swing phase duration
Hip, knee, and ankle joint angles
Ground reaction forces (GRFs), cadence variability, foot clearance
Gait symmetry index and more
Format: CSV
License: CC BY 4.0 (Attribution Required)
Citation: If using this dataset, please cite: Sandeep Ravikanti (2024). "Metaverse Gait Authentication Dataset (MGAD)." Zenodo. DOI: 10.5281/zenodo.14847773
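As a minimal sketch of how such a feature table might be used for user identification, the Python below loads a CSV and fits a simple classifier; the file name and the 'user_id' label column are assumptions to be checked against the Zenodo record.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Minimal sketch, assuming one CSV with the 16 gait features plus a user
# label column; the actual file and column names should be checked against
# the Zenodo record.
df = pd.read_csv("metaverse_gait_authentication.csv")
X = df.drop(columns=["user_id"])      # the 16 engineered gait features
y = df["user_id"]                     # enrolled-user labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("identification accuracy:", clf.score(X_te, y_te))
```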
The NIST Fingerprint Registration and Comparison Tool (NFRaCT) is a cross-platform GUI application which allows a user to load a pair of fingerprint images, find corresponding points in both images, register and crop the images, and finally compute a series of measurements on the registered images as described in NIST Special Publication 500-336.