Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0) https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
WIDER FACE dataset is a face detection benchmark dataset, of which images are selected from the publicly available WIDER dataset. We choose 32,203 images and label 393,703 faces with a high degree of variability in scale, pose and occlusion as depicted in the sample images. WIDER FACE dataset is organized based on 61 event classes. For each event class, we randomly select 40%/10%/50% data as training, validation and testing sets. We adopt the same evaluation metric employed in the PASCAL VOC dataset. Similar to MALF and Caltech datasets, we do not release bounding box ground truth for the test images. Users are required to submit final prediction files, which we shall proceed to evaluate.
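The PASCAL VOC detection metric referred to above counts a predicted box as correct when its intersection-over-union (IoU) with a ground-truth box exceeds 0.5. A minimal sketch of that overlap test follows; the `(x, y, w, h)` box layout and the function names are assumptions made for illustration, not part of the official evaluation toolkit.

```python
def iou_xywh(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h).

    The (x, y, w, h) layout is an assumption for this sketch.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Corners of the overlapping rectangle (if any).
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """PASCAL VOC style overlap criterion: IoU must exceed the threshold."""
    return iou_xywh(pred_box, gt_box) > threshold
```

A full evaluation would additionally rank detections by confidence and compute average precision; this sketch covers only the per-box overlap test.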
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by organizing the WIDER FACE dataset. WIDER FACE is a face detection benchmark dataset whose images are selected from the publicly available WIDER dataset. We chose 32,203 images and labeled 393,703 faces with a high degree of variability in scale, pose, and occlusion, as depicted in the sample images. The WIDER FACE dataset is organized into 61 event classes. For each event class, we randomly selected 40%/10%/50% of the data as training, validation, and testing sets. We adopt the same evaluation metric employed in the PASCAL VOC dataset.
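The per-event 40%/10%/50% split described above can be sketched as follows. This only illustrates the procedure: the official WIDER FACE splits are already fixed by the dataset authors, and the function name, the seed, and the `(event_class, image_path)` input layout are assumptions.

```python
import random
from collections import defaultdict


def split_by_event(items, fractions=(0.4, 0.1, 0.5), seed=0):
    """Split (event_class, image_path) pairs into train/val/test per event.

    Hypothetical helper: each event class is shuffled and split separately,
    so every split contains images from every event.
    """
    rng = random.Random(seed)
    by_event = defaultdict(list)
    for event, path in items:
        by_event[event].append(path)

    splits = {"train": [], "val": [], "test": []}
    for event in sorted(by_event):
        paths = sorted(by_event[event])
        rng.shuffle(paths)
        n_train = round(fractions[0] * len(paths))
        n_val = round(fractions[1] * len(paths))
        splits["train"] += paths[:n_train]
        splits["val"] += paths[n_train:n_train + n_val]
        # The remainder goes to the test set.
        splits["test"] += paths[n_train + n_val:]
    return splits
```

Splitting inside each event class, rather than over the pooled images, keeps the event distribution the same across the three sets.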
Original Dataset http://shuoyang1213.me/WIDERFACE/
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Note. For each celebrity, individual-image targets were tested 50 times (5 targets x 5 test images for both ‘users’ and ‘imposters’), while average-image targets were tested 10 times (average target x 5 test images for both ‘users’ and ‘imposters’). The table shows mean performance by condition across all celebrities (SD in parentheses).
Performance levels for the individual and average-image targets.
https://creativecommons.org/publicdomain/zero/1.0/
The goal is to build a Face Mask Segmentation model which includes building a face detector to locate the position of a face in an image.
WIDER Face Dataset
WIDER FACE is a face detection benchmark dataset whose images are selected from the publicly available WIDER dataset. It contains 32,203 images with 393,703 labeled faces that show a high degree of variability in scale, pose and occlusion, as depicted in the sample images. In this project, we use 409 images and around 1,000 faces for ease of computation.
We will use transfer learning on an already trained model to build our segmenter, starting from a pretrained MobileNet model. We will train the last 6-7 layers and freeze the remaining layers to adapt the model to face mask segmentation. For training, we will use WIDER FACE images containing a single face or multiple faces. The output of the model is a segmentation mask that covers the faces in an image. We build the face mask segmentation model using Keras with a TensorFlow backend.
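The description does not say how the training masks are produced. One common approach, assumed here purely for illustration, is to rasterize each face bounding box into a binary mask that serves as the segmentation target. The function name and the `(x, y, w, h)` box layout are hypothetical.

```python
def boxes_to_mask(height, width, boxes):
    """Build a binary mask (list of rows) with 1 inside any face box.

    Assumption: boxes are (x, y, w, h) in pixel coordinates. Boxes that
    extend past the image border are clipped to the image.
    """
    mask = [[0] * width for _ in range(height)]
    for x, y, w, h in boxes:
        x0, y0 = max(0, x), max(0, y)
        x1, y1 = min(width, x + w), min(height, y + h)
        for row in range(y0, y1):
            for col in range(x0, x1):
                mask[row][col] = 1
    return mask
```

In a Keras pipeline the mask would normally be held in an array and resized to the network's output resolution; plain lists are used here only to keep the sketch dependency-free.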
Acknowledgment for the datasets: http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/
MobileNet paper: https://arxiv.org/pdf/1704.04861.pdf
In this problem, we apply transfer learning to an image segmentation model so that it detects the objects relevant to the problem at hand. Here, we are particularly interested in segmenting faces in a given image.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In recent years, wide deployment of automatic face recognition systems has been accompanied by substantial gains in algorithm performance. However, benchmarking tests designed to evaluate these systems do not account for the errors of human operators, who are often an integral part of face recognition solutions in forensic and security settings. This causes a mismatch between evaluation tests and operational accuracy. We address this by measuring user performance in a face recognition system used to screen passport applications for identity fraud. Experiment 1 measured target detection accuracy in algorithm-generated ‘candidate lists’ selected from a large database of passport images. Accuracy was notably poorer than in previous studies of unfamiliar face matching: participants made over 50% errors for adult target faces, and over 60% when matching images of children. Experiment 2 then compared performance of student participants to trained passport officers–who use the system in their daily work–and found equivalent performance in these groups. Encouragingly, a group of highly trained and experienced “facial examiners” outperformed these groups by 20 percentage points. We conclude that human performance curtails accuracy of face recognition systems–potentially reducing benchmark estimates by 50% in operational settings. Mere practise does not attenuate these limits, but superior performance of trained examiners suggests that recruitment and selection of human operators, in combination with effective training and mentorship, can improve the operational accuracy of face recognition systems.
MIT License https://opensource.org/licenses/MIT
License information was derived automatically
Face Identity Preservation Benchmark
A comprehensive evaluation dataset for face transformation APIs measuring identity preservation across complexity levels and transformation categories. 🔗 Complete Repository: https://github.com/FlyMyAI/bench_M1
Dataset Summary
This benchmark evaluates identity preservation in face image transformations using 8,832 transformation pairs across three major APIs. The dataset provides systematic evaluation of face editing quality using… See the full description on the dataset page: https://huggingface.co/datasets/flymy-ai/flymyai-ffhq-edit-bench.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Native American Human Face with Occlusion Dataset, carefully curated to support the development of robust facial recognition systems, occlusion detection models, biometric identification technologies, and KYC verification tools. This dataset provides real-world variability by including facial images with common occlusions, helping AI models perform reliably under challenging conditions.
The dataset comprises over 3,000 high-quality facial images, organized into participant-wise sets. Each set includes:
To ensure robustness and real-world utility, images were captured under diverse conditions:
Each image is paired with detailed metadata to enable advanced filtering, model tuning, and analysis:
This rich metadata helps train models that can recognize faces even when partially obscured.
This dataset is ideal for a wide range of real-world and research-focused applications, including:
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
LFW (Labeled Faces in the Wild) is a benchmark dataset for facial recognition research. It contains thousands of face images captured in real-world conditions and is primarily used for evaluating face verification and recognition algorithms.
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the African Human Face with Occlusion Dataset, carefully curated to support the development of robust facial recognition systems, occlusion detection models, biometric identification technologies, and KYC verification tools. This dataset provides real-world variability by including facial images with common occlusions, helping AI models perform reliably under challenging conditions.
The dataset comprises over 5,000 high-quality facial images, organized into participant-wise sets. Each set includes:
To ensure robustness and real-world utility, images were captured under diverse conditions:
Each image is paired with detailed metadata to enable advanced filtering, model tuning, and analysis:
This rich metadata helps train models that can recognize faces even when partially obscured.
This dataset is ideal for a wide range of real-world and research-focused applications, including:
https://www.futurebeeai.com/policies/ai-data-license-agreement
Welcome to the Middle Eastern Human Face with Occlusion Dataset, carefully curated to support the development of robust facial recognition systems, occlusion detection models, biometric identification technologies, and KYC verification tools. This dataset provides real-world variability by including facial images with common occlusions, helping AI models perform reliably under challenging conditions.
The dataset comprises over 3,000 high-quality facial images, organized into participant-wise sets. Each set includes:
To ensure robustness and real-world utility, images were captured under diverse conditions:
Each image is paired with detailed metadata to enable advanced filtering, model tuning, and analysis:
This rich metadata helps train models that can recognize faces even when partially obscured.
This dataset is ideal for a wide range of real-world and research-focused applications, including:
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The average recognition rates (%) and the corresponding standard deviations (%) of different algorithms on the test set of the AR face database with sunglasses and scarf occlusions (sub-image size 32×32).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Description: Human Faces and Objects Dataset (HFO-5000)
The Human Faces and Objects Dataset (HFO-5000) is a curated collection of 5,000 images, categorized into three distinct classes: male faces (1,500), female faces (1,500), and objects (2,000). This dataset is designed for machine learning and computer vision applications, including image classification, face detection, and object recognition. The dataset provides high-quality, labeled images with a structured CSV file for seamless integration into deep learning pipelines.
Column Description: The dataset is accompanied by a CSV file that contains essential metadata for each image. The CSV file includes the following columns:
file_name: The name of the image file (e.g., image_001.jpg).
label: The category of the image, with three possible values: "male" (for male face images), "female" (for female face images), "object" (for images of various objects).
file_path: The full or relative path to the image file within the dataset directory.
Uniqueness and Key Features:
1) Balanced Distribution: The dataset maintains an even distribution of human faces (male and female) to minimize bias in classification tasks.
2) Diverse Object Selection: The object category consists of a wide variety of items, ensuring robustness in distinguishing between human and non-human entities.
3) High-Quality Images: The dataset consists of clear and well-defined images, suitable for both training and testing AI models.
4) Structured Annotations: The CSV file simplifies dataset management and integration into machine learning workflows.
5) Potential Use Cases: This dataset can be used for tasks such as gender classification, facial recognition benchmarking, human-object differentiation, and transfer learning applications.
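As a minimal sketch of reading the metadata CSV described above, the snippet below counts images per label with Python's standard `csv` module. The column names follow the description; the sample rows, the `images/` directory, and the function name are made up for illustration, and in practice an open file handle for the real CSV would be passed instead of the in-memory sample.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows using the documented columns.
SAMPLE_CSV = """file_name,label,file_path
image_001.jpg,male,images/image_001.jpg
image_002.jpg,female,images/image_002.jpg
image_003.jpg,object,images/image_003.jpg
image_004.jpg,object,images/image_004.jpg
"""

ALLOWED_LABELS = {"male", "female", "object"}


def count_labels(csv_file):
    """Count rows per label, rejecting labels outside the documented set."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row["label"] not in ALLOWED_LABELS:
            raise ValueError("unexpected label: %r" % row["label"])
        counts[row["label"]] += 1
    return counts


counts = count_labels(io.StringIO(SAMPLE_CSV))
```

On the full dataset, the counts would be expected to match the stated class sizes (1,500 male, 1,500 female, 2,000 object).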
Conclusion: The HFO-5000 dataset provides a well-structured, diverse, and high-quality set of labeled images that can be used for various computer vision tasks. Its balanced distribution of human faces and objects ensures fairness in training AI models, making it a valuable resource for researchers and developers. By offering structured metadata and a wide range of images, this dataset facilitates advancements in deep learning applications related to facial recognition and object classification.
This large-scale face image dataset features 10,109 individuals from various countries and ethnic backgrounds. Each subject has been captured in multiple real-world scenarios, resulting in diverse facial images under varying angles, lighting conditions, and expressions. Detailed annotations include gender, race, and age, making the dataset suitable for tasks such as facial recognition, face clustering, demographic analysis, and machine learning model training. The dataset has been validated by multiple AI companies and proven to deliver strong performance in real-world applications. All data collection, storage, and processing strictly adhere to global data protection regulations, including GDPR, CCPA, and PIPL, ensuring legal compliance and privacy preservation.
ririye/Benchmark-Images-for-Stable-Diffusion-Bias dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by yeheak
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
SoloFace: A Single-Face Dataset for Resource-Constrained Face Detection and Tracking
Description
SoloFace is a custom dataset derived from the COCO-Faces and Visual Wake Word datasets, specifically designed for single-face detection tasks in resource-constrained environments. This dataset is ideal for developing machine learning models for embedded AI applications, such as TinyML, which operate on low-power devices. Each image either contains a single human face or no face, with corresponding labels providing class information and bounding box coordinates for face detection. The dataset includes data augmentation to ensure robustness across diverse conditions, such as variations in lighting, scale, and orientation.
Dataset Structure
The dataset is organized into three subsets: train, test, and val. Each subset contains:
images/: .jpg image files.
labels/: .json label files with filenames matching the images.
Label Format
Each .json label file includes:
image: Name of the corresponding image file.
class: 1 if a face is present, 0 otherwise.
bbox: Normalized bounding box coordinates [top_left_x, top_left_y, bottom_right_x, bottom_right_y]. If no face is present, the bounding box is set to [0.0, 0.0, 0.01, 0.01].
Statistics
Original Dataset:
After Data Augmentation:
Class Distribution:
Data Augmentation Details
To improve model robustness, the following augmentation techniques were applied to the training set:
Each augmentation preserved bounding box consistency with the transformed images.
Usage
This dataset supports the following use cases:
Loading the Dataset
unzip soloface-detection-dataset.zip
soloface-detection-dataset/
├── train/
│ ├── images/
│ ├── labels/
├── test/
│ ├── images/
│ ├── labels/
├── val/
│ ├── images/
│ ├── labels/
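Once the archive is unpacked into the layout above, the labels can be read with the standard library alone. The sketch below assumes each label file is a JSON object with the `image`, `class` and `bbox` keys described under Label Format; the function names are hypothetical, and the image width and height must be supplied by the caller because they are not part of the label.

```python
import json
from pathlib import Path


def parse_label(text):
    """Parse one label file's JSON text into (image_name, has_face, bbox)."""
    label = json.loads(text)
    return label["image"], label["class"] == 1, list(label["bbox"])


def bbox_to_pixels(bbox, width, height):
    """Convert a normalized [x1, y1, x2, y2] box to integer pixel corners."""
    x1, y1, x2, y2 = bbox
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


def iter_split(root, split):
    """Yield (image_path, has_face, bbox) for one split: train, test or val."""
    split_dir = Path(root) / split
    for label_path in sorted((split_dir / "labels").glob("*.json")):
        name, has_face, bbox = parse_label(label_path.read_text())
        yield split_dir / "images" / name, has_face, bbox
```

For example, `iter_split("soloface-detection-dataset", "train")` would walk the training labels. Checking the class flag before using a box avoids treating the placeholder box of a no-face image as a real detection target.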
License
This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
For more details, visit the CC BY 4.0 License.
Contact
For inquiries or collaborations, please contact:
sahabidyut999@gmail.com
study.riya1792@gmail.com
Context
Labeled Faces in the Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition. This database was created and maintained by researchers at the University of Massachusetts, Amherst (specific references are in the Acknowledgements section). It contains 13,233 images of 5,749 people, collected from the web and detected and centered by the Viola-Jones face detector. 1,680 of the people pictured have two or more distinct photos in the dataset. The original database contains four different sets of LFW images and also three different types of "aligned" images. According to the researchers, deep-funneled images produced superior results for most face verification algorithms compared to the other image types. Hence, the dataset uploaded here is the deep-funneled version.
Content
There are 11 files in this dataset. lfw-deepfunneled.zip is the file containing the images. The other 10 files are metadata that may help you in forming the training and testing sets for your model. There are two sections below to help you navigate the files better. The first section provides information specifically pertaining to the images. The second section explains the content of each metadata file.
Image information:
Metadata information:
There are two recommended configurations for developing training and testing sets (pairs vs people). Depending on which route you choose, you will use the following .csv files:
Acknowledgements
All data and metadata were originally found on http://vis-www.cs.umass.edu/lfw/. Please visit the site for other data versions including original, non-aligned data as well as more information on errata and training/testing model resources.
A big thank you and kudos to the creators of this dataset and relevant research:
Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.
Specifically for the deep-funneled version of the image data:
Gary B....
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The unavailability of a unified standard dataset for face mask detection and masked facial recognition motivated us to develop an in-house MDMFR dataset (MDMFR, 2022) to measure the performance of face mask detection and masked facial recognition methods. The two tasks have different dataset requirements: face mask detection requires images of multiple persons with and without masks, whereas masked face recognition requires multiple masked face images of the same person. Our MDMFR dataset consists of two main collections: 1) face mask detection and 2) masked facial recognition. There are 6006 images in our MDMFR dataset. The face mask detection collection contains two categories of face images, i.e., mask and unmask: 3174 images with mask and 2832 images without mask (unmasked). To construct the dataset, we captured multiple images of the same person in two configurations (with and without mask). The masked facial recognition collection contains a total of 2896 masked images of 226 persons. The dataset includes images of both male and female persons of all ages, including children. The images are diverse in terms of gender, race, and age of users, types of masks, illumination conditions, face angles, occlusions, environment, format, dimensions, and size. All images have a bit depth of 24. During preprocessing, images were cropped in Adobe Photoshop to exclude extra information such as the neck and shoulders. Because the input size of our DeepMaskNet model is 256-by-256, all images were then resized to 256-by-256 pixels using the publicly available Plastiliq Image Resizer software (Plastiliq, 2022).
Biometric Data
FileMarket provides a comprehensive Biometric Data set, ideal for enhancing AI applications in security, identity verification, and more. In addition to Biometric Data, we offer specialized datasets across Object Detection Data, Machine Learning (ML) Data, Large Language Model (LLM) Data, and Deep Learning (DL) Data. Each dataset is meticulously crafted to support the development of cutting-edge AI models.
Data Size: 20,000 IDs
Race Distribution: The dataset encompasses individuals from diverse racial backgrounds, including Black, Caucasian, Indian, and Asian groups.
Gender Distribution: The dataset equally represents all genders, ensuring a balanced and inclusive collection.
Age Distribution: The data spans a broad age range, including young, middle-aged, and senior individuals, providing comprehensive age coverage.
Collection Environment: Data has been gathered in both indoor and outdoor environments, ensuring variety and relevance for real-world applications.
Data Diversity: This dataset includes a rich variety of face poses, racial backgrounds, age groups, lighting conditions, and scenes, making it ideal for robust biometric model training.
Device: All data has been collected using mobile phones, reflecting common real-world usage scenarios.
Data Format: The data is provided in .jpg and .png formats, ensuring compatibility with various processing tools and systems.
Accuracy: The labels for face pose, race, gender, and age are highly accurate, exceeding 95%, making this dataset reliable for training high-performance biometric models.
https://spdx.org/licenses/CC0-1.0.html
Forensic facial identification examiners are required to match the identity of faces in images that vary substantially, owing to changes in viewing conditions and in a person's appearance. These identifications affect the course and outcome of criminal investigations and convictions. Despite calls for research on sources of human error in forensic examination, existing scientific knowledge of face matching accuracy is based, almost exclusively, on people without formal training. Here, we administered three challenging face matching tests to a group of forensic examiners with many years' experience of comparing face images for law enforcement and government agencies. Examiners outperformed untrained participants and computer algorithms, thereby providing the first evidence that these examiners are experts at this task. Notably, computationally fusing responses of multiple experts produced near-perfect performance. Results also revealed qualitative differences between expert and non-expert performance. First, examiners' superiority was greatest at longer exposure durations, suggestive of more entailed comparison in forensic examiners. Second, experts were less impaired by image inversion than non-expert students, contrasting with face memory studies that show larger face inversion effects in high performers. We conclude that expertise in matching identity across unfamiliar face images is supported by processes that differ qualitatively from those supporting memory for individual faces.