100+ datasets found

wider_face
huggingface.co
opendatalab.com
Updated Jan 13, 2022
Cite
The Chinese University of Hong Kong (2022). wider_face [Dataset]. https://huggingface.co/datasets/CUHK-CSE/wider_face
Explore at:
Dataset updated
Jan 13, 2022
Dataset authored and provided by
The Chinese University of Hong Kong
License
Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0): https://creativecommons.org/licenses/by-nc-nd/4.0/
License information was derived automatically
Description
WIDER FACE dataset is a face detection benchmark dataset, of which images are selected from the publicly available WIDER dataset. We choose 32,203 images and label 393,703 faces with a high degree of variability in scale, pose and occlusion as depicted in the sample images. WIDER FACE dataset is organized based on 61 event classes. For each event class, we randomly select 40%/10%/50% data as training, validation and testing sets. We adopt the same evaluation metric employed in the PASCAL VOC dataset. Similar to MALF and Caltech datasets, we do not release bounding box ground truth for the test images. Users are required to submit final prediction files, which we shall proceed to evaluate.
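The per-event-class 40%/10%/50% split described above can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: the `(event_class, image_name)` input layout, the function name `split_by_event`, the fixed seed, and the rounding rule are all assumptions.

```python
import random
from collections import defaultdict


def split_by_event(images, seed=0):
    """Split (event_class, image_name) pairs 40/10/50 within each event class."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_event = defaultdict(list)
    for event, name in images:
        by_event[event].append(name)

    splits = {"train": [], "val": [], "test": []}
    for event, names in sorted(by_event.items()):
        rng.shuffle(names)
        n_train = round(0.4 * len(names))
        n_val = round(0.1 * len(names))
        splits["train"] += [(event, n) for n in names[:n_train]]
        splits["val"] += [(event, n) for n in names[n_train:n_train + n_val]]
        # Whatever remains (about 50%) goes to the test set.
        splits["test"] += [(event, n) for n in names[n_train + n_val:]]
    return splits


# Toy data: 3 hypothetical event classes with 20 images each.
toy = [(f"event_{e}", f"img_{e}_{i}.jpg") for e in range(3) for i in range(20)]
splits = split_by_event(toy)
```

Splitting inside each event class, rather than over the pooled images, keeps every event represented in all three sets in roughly the same proportions.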
Face Detection Dataset
kaggle.com
Updated Dec 30, 2024
Cite
Sudhanshu Rastogi (2024). Face Detection Dataset [Dataset]. https://www.kaggle.com/datasets/sudhanshu2198/face-detection-dataset
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 30, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sudhanshu Rastogi
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
This Dataset is created by organizing the WIDER FACE dataset. WIDER FACE dataset is a face detection benchmark dataset, of which images are selected from the publicly available WIDER dataset. We chose 32,203 images and labeled 393,703 faces with a high degree of variability in scale, pose, and occlusion as depicted in the sample images. WIDER FACE dataset is organized based on 61 event classes. For each event class, we randomly select 40%/10%/50% of data as training, validation, and testing sets. We adopt the same evaluation metric employed in the PASCAL VOC dataset.

Original Dataset http://shuoyang1213.me/WIDERFACE/
Performance levels for the individual and average-image targets.
plos.figshare.com
xls
Updated Jun 2, 2023
Cite
David J. Robertson; Robin S. S. Kramer; A. Mike Burton (2023). Performance levels for the individual and average-image targets. [Dataset]. http://doi.org/10.1371/journal.pone.0119460.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0119460.t001
Dataset updated
Jun 2, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
David J. Robertson; Robin S. S. Kramer; A. Mike Burton
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Note. For each celebrity, individual-image targets were tested 50 times (5 targets x 5 test images for both ‘users’ and ‘imposters’), while average-image targets were tested 10 times (average target x 5 test images for both ‘users’ and ‘imposters’). The table shows mean performance by condition across all celebrities (SD in parentheses).
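The trial counts in the note follow from multiplying targets, test images, and the two test groups. A short check of that arithmetic (the variable names are illustrative only):

```python
test_images = 5
groups = 2  # 'users' and 'imposters'

individual_targets = 5
average_targets = 1

# Trials per celebrity = targets x test images x groups.
individual_trials = individual_targets * test_images * groups
average_trials = average_targets * test_images * groups
```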
Face Mask Segmentation - WIDER Face Dataset
kaggle.com
zip
Updated Jul 14, 2021
Cite
Vinayak Shanawad (2021). Face Mask Segmentation - WIDER Face Dataset [Dataset]. https://www.kaggle.com/vinayakshanawad/face-mask-segmentation-wider-face-dataset
Explore at:
Available download formats: zip (833238739 bytes)
Dataset updated
Jul 14, 2021
Authors
Vinayak Shanawad
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Project Description

The goal is to build a Face Mask Segmentation model which includes building a face detector to locate the position of a face in an image.

Data Description

WIDER Face Dataset: WIDER FACE is a face detection benchmark dataset whose images are selected from the publicly available WIDER dataset. It has 32,203 images, in which 393,703 faces are labeled with a high degree of variability in scale, pose, and occlusion, as depicted in the sample images. In this project, we use 409 images and around 1,000 faces for ease of computation.

We will be using transfer learning on an already trained model to build our segmenter. We will perform transfer learning on the MobileNet model which is already trained to perform image segmentation. We will need to train the last 6-7 layers and freeze the remaining layers to train the model for face mask segmentation. To be able to train the MobileNet model for face mask segmentation, we will be using the WIDER FACE dataset for various images with a single face and multiple faces. The output of the model is the face mask segmented data which masks the face in an image. We learn to build a face mask segmentation model using Keras supported by Tensorflow.
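The "freeze most layers, train the last few" step can be sketched without any framework. In Keras, each layer exposes a `trainable` attribute that is toggled in the same way; the `Layer` stand-in class, the 20-layer model, and the helper name below are hypothetical and only illustrate the selection logic, not the project's actual code.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Stand-in for a framework layer; only the trainable flag matters here."""
    name: str
    trainable: bool = True


def freeze_all_but_last(layers, n_trainable):
    """Mark the last n_trainable layers trainable and freeze the rest."""
    cutoff = len(layers) - n_trainable
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return layers


# Hypothetical 20-layer backbone; train only the last 6 layers.
layers = [Layer(f"block_{i}") for i in range(20)]
freeze_all_but_last(layers, n_trainable=6)
trainable_names = [layer.name for layer in layers if layer.trainable]
```

Freezing the early layers keeps the pretrained features intact and reduces the number of weights updated on the small 409-image subset.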

Reference

Acknowledgment for the datasets: http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/
MobileNet paper: https://arxiv.org/pdf/1704.04861.pdf

Objective

In this problem, we use "Transfer Learning" of an Image Segmentation model to detect any object according to the problem in hand. Here, we are particularly interested in segmenting faces in a given image.
Error Rates in Users of Automatic Face Recognition Software
plos.figshare.com
figshare.com
xlsx
Updated May 30, 2023
Cite
David White; James D. Dunn; Alexandra C. Schmid; Richard I. Kemp (2023). Error Rates in Users of Automatic Face Recognition Software [Dataset]. http://doi.org/10.1371/journal.pone.0139827
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.1371/journal.pone.0139827
Dataset updated
May 30, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
David White; James D. Dunn; Alexandra C. Schmid; Richard I. Kemp
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
In recent years, wide deployment of automatic face recognition systems has been accompanied by substantial gains in algorithm performance. However, benchmarking tests designed to evaluate these systems do not account for the errors of human operators, who are often an integral part of face recognition solutions in forensic and security settings. This causes a mismatch between evaluation tests and operational accuracy. We address this by measuring user performance in a face recognition system used to screen passport applications for identity fraud. Experiment 1 measured target detection accuracy in algorithm-generated ‘candidate lists’ selected from a large database of passport images. Accuracy was notably poorer than in previous studies of unfamiliar face matching: participants made over 50% errors for adult target faces, and over 60% when matching images of children. Experiment 2 then compared performance of student participants to trained passport officers–who use the system in their daily work–and found equivalent performance in these groups. Encouragingly, a group of highly trained and experienced “facial examiners” outperformed these groups by 20 percentage points. We conclude that human performance curtails accuracy of face recognition systems–potentially reducing benchmark estimates by 50% in operational settings. Mere practise does not attenuate these limits, but superior performance of trained examiners suggests that recruitment and selection of human operators, in combination with effective training and mentorship, can improve the operational accuracy of face recognition systems.
flymyai-ffhq-edit-bench
huggingface.co
Updated Jun 20, 2025
Cite
FlyMy.AI (2025). flymyai-ffhq-edit-bench [Dataset]. https://huggingface.co/datasets/flymy-ai/flymyai-ffhq-edit-bench
Explore at:
Dataset updated
Jun 20, 2025
Dataset provided by
FlyMy.AI
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Face Identity Preservation Benchmark

A comprehensive evaluation dataset for face transformation APIs measuring identity preservation across complexity levels and transformation categories. 🔗 Complete Repository: https://github.com/FlyMyAI/bench_M1

Dataset Summary

This benchmark evaluates identity preservation in face image transformations using 8,832 transformation pairs across three major APIs. The dataset provides systematic evaluation of face editing quality using… See the full description on the dataset page: https://huggingface.co/datasets/flymy-ai/flymyai-ffhq-edit-bench.
Native American Occluded Facial Image Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Native American Occluded Facial Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-occlusion-native-american
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Native American Human Face with Occlusion Dataset, carefully curated to support the development of robust facial recognition systems, occlusion detection models, biometric identification technologies, and KYC verification tools. This dataset provides real-world variability by including facial images with common occlusions, helping AI models perform reliably under challenging conditions.
Facial Image Data
The dataset comprises over 3,000 high-quality facial images, organized into participant-wise sets. Each set includes:
• Occluded Images: 5 images per individual featuring different types of facial occlusion: masks, caps, sunglasses, or combinations of these accessories
• Normal Image: 1 reference image of the same individual without any occlusion
Diversity & Representation
• Geographic Coverage: Participants from across the USA, Canada, Mexico, and more Native American countries
• Demographics: Individuals aged 18 to 70 years, with a 60:40 male-to-female ratio
• File Formats: Images available in JPEG and HEIC formats
Image Quality & Capture Conditions
To ensure robustness and real-world utility, images were captured under diverse conditions:
• Lighting Variations: Includes both natural and artificial lighting scenarios
• Background Diversity: Indoor and outdoor backgrounds for model generalization
• Device Quality: Captured using the latest smartphones to ensure high resolution and consistency
Metadata
Each image is paired with detailed metadata to enable advanced filtering, model tuning, and analysis:
• Unique Participant ID
• File Name
• Age
• Gender
• Country
• Demographic Profile
• Type of Occlusion
• File Format
This rich metadata helps train models that can recognize faces even when partially obscured.
Use Cases & Applications
This dataset is ideal for a wide range of real-world and research-focused applications, including:
• Facial Recognition under Occlusion: Improve model performance when faces are partially hidden
• Occlusion Detection: Train systems to detect and classify facial accessories like masks or sunglasses
• Biometric Identity Systems: Enhance verification accuracy across varying conditions
• KYC & Compliance: Support face matching even when the selfie includes common occlusions
• Security & Surveillance: Strengthen access control and monitoring systems in environments with mask usage
Secure & Ethical Collection
• Data Security: Collected and processed securely on FutureBeeAI’s proprietary platform
• Ethical Compliance: Follows strict guidelines for participant privacy and informed consent
• Transparent Participation: All contributors provided written consent and were informed of the intended use

Dataset Updates &
LFW – Facial Recognition Dataset
gts.ai
jpeg
Updated Nov 20, 2023
Cite
Globose Technology Solutions Private Limited (2023). LFW – Facial Recognition Dataset [Dataset]. https://gts.ai/dataset-download/lfw-dataset-premier-facial-recognition-for-ai-tools/
Explore at:
Available download formats: jpeg
Dataset updated
Nov 20, 2023
Dataset authored and provided by
Globose Technology Solutions Private Limited
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
LFW (Labeled Faces in the Wild) is a benchmark dataset for facial recognition research. It contains thousands of face images captured in real-world conditions and is primarily used for evaluating face verification and recognition algorithms.
African Occluded Facial Image Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). African Occluded Facial Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-occlusion-african
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the African Human Face with Occlusion Dataset, carefully curated to support the development of robust facial recognition systems, occlusion detection models, biometric identification technologies, and KYC verification tools. This dataset provides real-world variability by including facial images with common occlusions, helping AI models perform reliably under challenging conditions.
Facial Image Data
The dataset comprises over 5,000 high-quality facial images, organized into participant-wise sets. Each set includes:
• Occluded Images: 5 images per individual featuring different types of facial occlusion: masks, caps, sunglasses, or combinations of these accessories
• Normal Image: 1 reference image of the same individual without any occlusion
Diversity & Representation
• Geographic Coverage: Participants from across Kenya, Malawi, Nigeria, Ethiopia, Benin, Somalia, Uganda, and more African countries
• Demographics: Individuals aged 18 to 70 years, with a 60:40 male-to-female ratio
• File Formats: Images available in JPEG and HEIC formats
Image Quality & Capture Conditions
To ensure robustness and real-world utility, images were captured under diverse conditions:
• Lighting Variations: Includes both natural and artificial lighting scenarios
• Background Diversity: Indoor and outdoor backgrounds for model generalization
• Device Quality: Captured using the latest smartphones to ensure high resolution and consistency
Metadata
Each image is paired with detailed metadata to enable advanced filtering, model tuning, and analysis:
• Unique Participant ID
• File Name
• Age
• Gender
• Country
• Demographic Profile
• Type of Occlusion
• File Format
This rich metadata helps train models that can recognize faces even when partially obscured.
Use Cases & Applications
This dataset is ideal for a wide range of real-world and research-focused applications, including:
• Facial Recognition under Occlusion: Improve model performance when faces are partially hidden
• Occlusion Detection: Train systems to detect and classify facial accessories like masks or sunglasses
• Biometric Identity Systems: Enhance verification accuracy across varying conditions
• KYC & Compliance: Support face matching even when the selfie includes common occlusions
• Security & Surveillance: Strengthen access control and monitoring systems in environments with mask usage
Secure & Ethical Collection
• Data Security: Collected and processed securely on FutureBeeAI’s proprietary platform
• Ethical Compliance: Follows strict guidelines for participant privacy and informed consent
• Transparent Participation: All contributors provided written consent and were informed of the intended use

Dataset
Middle Eastern Occluded Facial Image Dataset
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). Middle Eastern Occluded Facial Image Dataset [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-occlusion-middle-east
Explore at:
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the Middle Eastern Human Face with Occlusion Dataset, carefully curated to support the development of robust facial recognition systems, occlusion detection models, biometric identification technologies, and KYC verification tools. This dataset provides real-world variability by including facial images with common occlusions, helping AI models perform reliably under challenging conditions.
Facial Image Data
The dataset comprises over 3,000 high-quality facial images, organized into participant-wise sets. Each set includes:
• Occluded Images: 5 images per individual featuring different types of facial occlusion: masks, caps, sunglasses, or combinations of these accessories
• Normal Image: 1 reference image of the same individual without any occlusion
Diversity & Representation
• Geographic Coverage: Participants from across Egypt, Jordan, Saudi Arabia, UAE, Tunisia, and more Middle Eastern countries
• Demographics: Individuals aged 18 to 70 years, with a 60:40 male-to-female ratio
• File Formats: Images available in JPEG and HEIC formats
Image Quality & Capture Conditions
To ensure robustness and real-world utility, images were captured under diverse conditions:
• Lighting Variations: Includes both natural and artificial lighting scenarios
• Background Diversity: Indoor and outdoor backgrounds for model generalization
• Device Quality: Captured using the latest smartphones to ensure high resolution and consistency
Metadata
Each image is paired with detailed metadata to enable advanced filtering, model tuning, and analysis:
• Unique Participant ID
• File Name
• Age
• Gender
• Country
• Demographic Profile
• Type of Occlusion
• File Format
This rich metadata helps train models that can recognize faces even when partially obscured.
Use Cases & Applications
This dataset is ideal for a wide range of real-world and research-focused applications, including:
• Facial Recognition under Occlusion: Improve model performance when faces are partially hidden
• Occlusion Detection: Train systems to detect and classify facial accessories like masks or sunglasses
• Biometric Identity Systems: Enhance verification accuracy across varying conditions
• KYC & Compliance: Support face matching even when the selfie includes common occlusions
• Security & Surveillance: Strengthen access control and monitoring systems in environments with mask usage
Secure & Ethical Collection
• Data Security: Collected and processed securely on FutureBeeAI’s proprietary platform
• Ethical Compliance: Follows strict guidelines for participant privacy and informed consent
• Transparent Participation: All contributors provided written consent and were informed of the intended use

Dataset
The average recognition rates (%) and the corresponding standard deviations...
plos.figshare.com
xls
Updated May 31, 2023
Cite
Jianzhong Wang; Yugen Yi; Wei Zhou; Yanjiao Shi; Miao Qi; Ming Zhang; Baoxue Zhang; Jun Kong (2023). The average recognition rates (%) and the corresponding standard deviations (%) of different algorithms on the test set of the AR face database with sunglasses and scarf occlusions (sub-image size 32×32). [Dataset]. http://doi.org/10.1371/journal.pone.0113198.t007
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0113198.t007
Dataset updated
May 31, 2023
Dataset provided by
PLOS (http://plos.org/)
Authors
Jianzhong Wang; Yugen Yi; Wei Zhou; Yanjiao Shi; Miao Qi; Ming Zhang; Baoxue Zhang; Jun Kong
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The average recognition rates (%) and the corresponding standard deviations (%) of different algorithms on the test set of the AR face database with sunglasses and scarf occlusions (sub-image size 32×32).
Human Faces and Objects Mix Image Dataset
data.mendeley.com
Updated Mar 13, 2025
Cite
Bindu Garg (2025). Human Faces and Objects Mix Image Dataset [Dataset]. http://doi.org/10.17632/nzwvnrmwp3.1
Explore at:
Unique identifier
https://doi.org/10.17632/nzwvnrmwp3.1
Dataset updated
Mar 13, 2025
Authors
Bindu Garg
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Description: Human Faces and Objects Dataset (HFO-5000)

The Human Faces and Objects Dataset (HFO-5000) is a curated collection of 5,000 images, categorized into three distinct classes: male faces (1,500), female faces (1,500), and objects (2,000). This dataset is designed for machine learning and computer vision applications, including image classification, face detection, and object recognition. The dataset provides high-quality, labeled images with a structured CSV file for seamless integration into deep learning pipelines.

Column Description: The dataset is accompanied by a CSV file that contains essential metadata for each image. The CSV file includes the following columns:
file_name: The name of the image file (e.g., image_001.jpg).
label: The category of the image, with three possible values: "male" (male face images), "female" (female face images), or "object" (images of various objects).
file_path: The full or relative path to the image file within the dataset directory.
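A minimal sketch of reading such a metadata file with Python's standard `csv` module and counting images per label. The column names come from the description above; the sample rows, the `images/` prefix, and the helper name `count_labels` are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical rows in the documented column layout.
sample_csv = """file_name,label,file_path
image_001.jpg,male,images/image_001.jpg
image_002.jpg,female,images/image_002.jpg
image_003.jpg,object,images/image_003.jpg
image_004.jpg,object,images/image_004.jpg
"""


def count_labels(handle):
    """Count rows per label in a CSV with a 'label' column."""
    reader = csv.DictReader(handle)
    return Counter(row["label"] for row in reader)


counts = count_labels(io.StringIO(sample_csv))
```

With the real file, the same function could be called on `open(path, newline="")` and the counts compared with the stated 1,500 / 1,500 / 2,000 class sizes.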

Uniqueness and Key Features:
1) Balanced Distribution: The dataset maintains an even distribution of human faces (male and female) to minimize bias in classification tasks.
2) Diverse Object Selection: The object category consists of a wide variety of items, ensuring robustness in distinguishing between human and non-human entities.
3) High-Quality Images: The dataset consists of clear and well-defined images, suitable for both training and testing AI models.
4) Structured Annotations: The CSV file simplifies dataset management and integration into machine learning workflows.
5) Potential Use Cases: This dataset can be used for tasks such as gender classification, facial recognition benchmarking, human-object differentiation, and transfer learning applications.

Conclusion: The HFO-5000 dataset provides a well-structured, diverse, and high-quality set of labeled images that can be used for various computer vision tasks. Its balanced distribution of human faces and objects ensures fairness in training AI models, making it a valuable resource for researchers and developers. By offering structured metadata and a wide range of images, this dataset facilitates advancements in deep learning applications related to facial recognition and object classification.
Face Recognition Dataset – 10,109 People with Multi-angle Face Images and...
nexdata.ai
Updated Jun 14, 2024
Cite
Nexdata (2024). Face Recognition Dataset – 10,109 People with Multi-angle Face Images and Demographic Labels [Dataset]. https://www.nexdata.ai/datasets/1402?source=Github
Explore at:
Dataset updated
Jun 14, 2024
Dataset authored and provided by
Nexdata
Variables measured
Data size, Data format, Data diversity, Age distribution, Race distribution, Gender distribution, Collecting environment
Description
This large-scale face image dataset features 10,109 individuals from various countries and ethnic backgrounds. Each subject has been captured in multiple real-world scenarios, resulting in diverse facial images under varying angles, lighting conditions, and expressions. Detailed annotations include gender, race, and age, making the dataset suitable for tasks such as facial recognition, face clustering, demographic analysis, and machine learning model training. The dataset has been validated by multiple AI companies and proven to deliver strong performance in real-world applications. All data collection, storage, and processing strictly adhere to global data protection regulations, including GDPR, CCPA, and PIPL, ensuring legal compliance and privacy preservation.
Benchmark-Images-for-Stable-Diffusion-Bias
huggingface.co
Updated Apr 16, 2024
Cite
Reece Iriye (2024). Benchmark-Images-for-Stable-Diffusion-Bias [Dataset]. https://huggingface.co/datasets/ririye/Benchmark-Images-for-Stable-Diffusion-Bias
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 16, 2024
Authors
Reece Iriye
Description
ririye/Benchmark-Images-for-Stable-Diffusion-Bias dataset hosted on Hugging Face and contributed by the HF Datasets community
WIDER FACE
kaggle.com
zip
Updated Jan 5, 2019
Cite
Motaz Saad (2019). WIDER FACE [Dataset]. https://www.kaggle.com/datasets/mksaad/wider-face-a-face-detection-benchmark
Explore at:
Available download formats: zip (3662993623 bytes)
Dataset updated
Jan 5, 2019
Authors
Motaz Saad
Description
Dataset

This dataset was created by yeheak

Contents
SoloFace: A Single-Face Dataset for Resource-Constrained Face Detection and...
zenodo.org
zip
Updated Dec 15, 2024
Cite
Riya Samanta; Bidyut Saha (2024). SoloFace: A Single-Face Dataset for Resource-Constrained Face Detection and Tracking [Dataset]. http://doi.org/10.5281/zenodo.14474899
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.14474899
Dataset updated
Dec 15, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Riya Samanta; Bidyut Saha
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
SoloFace: A Single-Face Dataset for Resource-Constrained Face Detection and Tracking

Description
SoloFace is a custom dataset derived from the COCO-Faces and Visual Wake Word datasets, specifically designed for single-face detection tasks in resource-constrained environments. This dataset is ideal for developing machine learning models for embedded AI applications, such as TinyML, which operate on low-power devices. Each image either contains a single human face or no face, with corresponding labels providing class information and bounding box coordinates for face detection. The dataset includes data augmentation to ensure robustness across diverse conditions, such as variations in lighting, scale, and orientation.

Dataset Structure
The dataset is organized into three subsets: train, test, and val. Each subset contains:

images/: .jpg image files.

labels/: .json label files with matching filenames to the images.

Label Format
Each .json label file includes:

image: Name of the corresponding image file.

class: 1 if a face is present, 0 otherwise.

bbox: Normalized bounding box coordinates [top_left_x, top_left_y, bottom_right_x, bottom_right_y]. If no face is present, the bounding box is set to [0.0, 0.0, 0.01, 0.01].
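A label in this format can be turned into pixel coordinates by scaling the normalized corners by the image size. The sketch below assumes the three keys listed above (`image`, `class`, `bbox`); the helper name, the sample filenames, and the choice to return `None` for the no-face case are illustrative assumptions.

```python
import json


def bbox_to_pixels(label, width, height):
    """Return (x1, y1, x2, y2) in pixels, or None when no face is present."""
    if label["class"] == 0:
        # The [0.0, 0.0, 0.01, 0.01] box is only a placeholder for "no face".
        return None
    x1, y1, x2, y2 = label["bbox"]
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


# Hypothetical label contents in the documented format.
face = json.loads('{"image": "a.jpg", "class": 1, "bbox": [0.25, 0.5, 0.75, 1.0]}')
empty = json.loads('{"image": "b.jpg", "class": 0, "bbox": [0.0, 0.0, 0.01, 0.01]}')

face_box = bbox_to_pixels(face, width=200, height=100)
empty_box = bbox_to_pixels(empty, width=200, height=100)
```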

Statistics

Original Dataset:

Training images: 11,272

Testing images: 3,732

Validation images: 434

After Data Augmentation:

Training images: 56,360

Testing and validation images remain unchanged.

Class Distribution:

50% of images contain a single visible human face.

50% contain no human face.

Data Augmentation Details
To improve model robustness, the following augmentation techniques were applied to the training set:

Geometric Transformations: Random rotation (±15 degrees), scaling (±20%), and horizontal flipping (50%).

Color Transformations: Brightness and contrast adjustments (±30%).

Cropping: Random cropping up to 10% from image edges.

Each augmentation preserved bounding box consistency with the transformed images.
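As one example of keeping boxes consistent, a horizontal flip mirrors the normalized x-coordinates and swaps the left and right edges. The sketch also checks that the augmented training count is exactly five times the original; how those five variants per image were produced is not stated in the source, and the function name is an assumption.

```python
def hflip_bbox(bbox):
    """Mirror a normalized [x1, y1, x2, y2] box about the vertical image axis."""
    x1, y1, x2, y2 = bbox
    # The old right edge becomes the new left edge, and vice versa.
    return [1.0 - x2, y1, 1.0 - x1, y2]


original_train = 11_272
augmented_train = 56_360
copies_per_image = augmented_train // original_train

flipped = hflip_bbox([0.0, 0.25, 0.25, 0.75])
```

Rotation, scaling, and cropping need analogous coordinate transforms; the flip is shown because it is the simplest case.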

Usage
This dataset supports the following use cases:

Training lightweight face detection models optimized for microcontroller deployment.

Benchmarking single-face detection models in resource-constrained environments.

Research on model robustness and efficiency.

Loading the Dataset

Download the dataset.

Extract the dataset using:
unzip soloface-detection-dataset.zip

Dataset structure:
soloface-detection-dataset/
├── train/
│   ├── images/
│   ├── labels/
├── test/
│   ├── images/
│   ├── labels/
├── val/
│   ├── images/
│   ├── labels/
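Given this layout, images and labels can be paired by filename stem. A minimal sketch using `pathlib`; the function name and the demo filenames are hypothetical, and the demo builds a throwaway directory rather than reading the real archive.

```python
import tempfile
from pathlib import Path


def pair_images_with_labels(root, subset):
    """Return (image_path, label_path) pairs for one subset (train, test, or val)."""
    subset_dir = Path(root) / subset
    pairs = []
    for image_path in sorted((subset_dir / "images").glob("*.jpg")):
        # Labels share the image's filename stem, with a .json extension.
        label_path = subset_dir / "labels" / f"{image_path.stem}.json"
        if label_path.exists():
            pairs.append((image_path, label_path))
    return pairs


# Demo on a temporary directory with made-up filenames.
with tempfile.TemporaryDirectory() as tmp:
    for folder in ("images", "labels"):
        (Path(tmp) / "train" / folder).mkdir(parents=True)
    (Path(tmp) / "train" / "images" / "0001.jpg").touch()
    (Path(tmp) / "train" / "labels" / "0001.json").touch()
    (Path(tmp) / "train" / "images" / "0002.jpg").touch()  # no label on purpose
    demo_pairs = [(i.name, l.name) for i, l in pair_images_with_labels(tmp, "train")]
```

Skipping images without a label file, as done here, is one possible policy; raising an error instead would surface incomplete extractions earlier.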

License
This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Permissions: Copy, distribute, and adapt for any purpose, including commercial.

Conditions: Provide proper attribution, a link to the license, and indicate changes.

Restrictions: No additional legal or technological restrictions.

For more details, visit the CC BY 4.0 License.

Contact
For inquiries or collaborations, please contact:

Bidyut Saha: sahabidyut999@gmail.com

Riya Samanta: study.riya1792@gmail.com

Labelled Faces in the Wild (LFW) Dataset
kaggle.com
zip
Updated Feb 7, 2024
Cite
Marvin Luckianto (2024). Labelled Faces in the Wild (LFW) Dataset [Dataset]. https://www.kaggle.com/datasets/marvinluckianto/labelled-faces-in-the-wild-lfw-dataset
Explore at:
Available download formats: zip (117895655 bytes)
Dataset updated
Feb 7, 2024
Authors
Marvin Luckianto
Description
Context

Labeled Faces in the Wild (LFW) is a database of face photographs designed for studying the problem of unconstrained face recognition. This database was created and maintained by researchers at the University of Massachusetts, Amherst (specific references are in Acknowledgments section). 13,233 images of 5,749 people were detected and centered by the Viola Jones face detector and collected from the web. 1,680 of the people pictured have two or more distinct photos in the dataset. The original database contains four different sets of LFW images and also three different types of "aligned" images. According to the researchers, deep-funneled images produced superior results for most face verification algorithms compared to the other image types. Hence, the dataset uploaded here is the deep-funneled version.

Content

There are 11 files in this dataset. lfw-deepfunneled.zip is the file containing the images. All other 10 files are relevant metadata that may help you in forming your training and testing sets for your model. There are two sections below to help you navigate the files better. The first section provides information specifically pertaining to the images. The second section explains the content of each metadata file.

Image information:

Image file format: Each image is available as "lfw/name/name_xxxx.jpg", where "xxxx" is the image number padded to four characters with leading zeroes. For example, the 10th George_W_Bush image can be found as "lfw/George_W_Bush/George_W_Bush_0010.jpg".

Image dimensions: Each image is a 250x250 JPEG, detected and centered using the OpenCV implementation of the Viola-Jones face detector. The cropping region returned by the detector was then automatically enlarged by a factor of 2.2 in each dimension to capture more of the head and then scaled to a uniform size.
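
The naming convention described above can be sketched in a few lines of Python. The helper name `lfw_image_path` and its `root` parameter are illustrative only and are not part of the dataset; the top-level folder inside the uploaded archive may differ from "lfw", so treat `root` as an assumption to adjust.

```python
from pathlib import PurePosixPath


def lfw_image_path(name: str, number: int, root: str = "lfw") -> str:
    """Build the relative path "root/name/name_xxxx.jpg" for one image.

    Hypothetical helper for illustration; `root` is an assumed folder name.
    """
    # Zero-pad the image number to four characters, as the description states.
    filename = f"{name}_{number:04d}.jpg"
    return str(PurePosixPath(root) / name / filename)


# The 10th George_W_Bush image, matching the example in the description.
print(lfw_image_path("George_W_Bush", 10))
# lfw/George_W_Bush/George_W_Bush_0010.jpg
```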

Metadata information:

lfwallnames.csv: Contains the name of every person in the dataset along with the number of images of that person.

lfwreadme.csv: Comprehensive readme file found on the original database. If there is any information you are missing here or are looking for additional resources you will probably find it in this file. It explains how each .csv file comes into play when forming training and testing models, as well as column metadata information for figuring out what the .csv is talking about. The original website also gives recommendations on training/testing splits and comparison benchmarks.

There are two recommended configurations for developing training and testing sets (pairs vs people). Depending on which route you choose, you will use the following .csv files:

pairs.csv: Contains randomly generated splits for 10-fold cross validation specifically for pairs. Use this for the image-restricted configuration when forming training sets (refer to readme). There are 10 total sets; each set contains 300 matched pairs and 300 mismatched pairs.

people.csv: Contains randomly generated splits for 10-fold cross validation specifically for individual faces. Use this for the unrestricted configuration when forming training sets (refer to readme). There are 10 total sets, each with a different amount of people; Set 1: 601. Set 2: 555. Set 3: 552. Set 4: 560. Set 5: 567. Set 6: 527. Set 7: 597. Set 8: 601. Set 9: 580. Set 10: 609.

matchpairsDevTest.csv: Use this testing set if you decide to go with the pairs configuration. Contains 500 matched pairs of faces for testing set.

matchpairsDevTrain.csv: Use this training set if you decide to go with the pairs configuration. Contains 1100 matched pairs of faces for training set.

mismatchpairsDevTest.csv: Use this testing set if you decide to go with the pairs configuration. Contains 500 mismatched pairs of faces for testing set.

mismatchpairsDevTrain.csv: Use this training set if you decide to go with the pairs configuration. Contains 1100 mismatched pairs of faces for training set.

peopleDevTest.csv: Use this testing set if you decide to go with the people configuration. Contains 1711 people and 3708 images.

peopleDevTrain.csv: Use this training set if you decide to go with the people configuration. Contains 4038 people and 9525 images.
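
The counts quoted in the file descriptions above can be cross-checked with simple arithmetic. If the ten people.csv sets do not overlap, and likewise the two peopleDev files (which the totals suggest but the description does not state), each grouping should account for all 5,749 people, and the peopleDev image counts should add up to 13,233. A minimal sketch, with variable names chosen here purely for illustration:

```python
# Figures copied from the file descriptions above.
people_sets = [601, 555, 552, 560, 567, 527, 597, 601, 580, 609]
dev_train = {"people": 4038, "images": 9525}
dev_test = {"people": 1711, "images": 3708}

# Totals across the ten cross-validation sets and across the dev split.
total_people_folds = sum(people_sets)
total_people_dev = dev_train["people"] + dev_test["people"]
total_images_dev = dev_train["images"] + dev_test["images"]

print(total_people_folds, total_people_dev, total_images_dev)
# 5749 5749 13233
```

All three totals agree with the 13,233 images of 5,749 people reported in the dataset description.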

Acknowledgements All data and metadata were originally found on http://vis-www.cs.umass.edu/lfw/. Please visit the site for other data versions including original, non-aligned data as well as more information on errata and training/testing model resources.

A big thank you and kudos to the creators of this dataset and relevant research:

Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.

Specifically for the deep-funneled version of the image data:

Gary B....
Face mask detection and masked facial recognition dataset (MDMFR Dataset)
data.niaid.nih.gov
Updated Apr 8, 2022
Cite
NAEEM ULLAH; Ali Javed (2022). Face mask detection and masked facial recognition dataset (MDMFR Dataset) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6408602
Explore at:
Dataset updated
Apr 8, 2022
Dataset provided by
University of Engineering and Technology, Taxila
Authors
NAEEM ULLAH; Ali Javed
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The unavailability of a unified standard dataset for face mask detection and masked facial recognition motivated us to develop an in-house MDMFR dataset (MDMFR, 2022) to measure the performance of face mask detection and masked facial recognition methods. The two tasks have different dataset requirements: face mask detection requires images of multiple persons with and without masks, whereas masked face recognition requires multiple masked face images of the same person. Our MDMFR dataset consists of two main collections: 1) face mask detection and 2) masked facial recognition. There are 6006 images in our MDMFR dataset. The face mask detection collection contains two categories of face images, i.e., mask and unmask. Our detection database consists of 3174 images with masks and 2832 images without masks (unmasked). To construct the dataset, we captured multiple images of the same person in two configurations (with mask and without mask). The masked facial recognition collection contains a total of 2896 masked images of 226 persons. More specifically, our dataset includes images of both male and female persons of all ages, including children. The images of our dataset are diverse in terms of gender, race, and age of users, types of masks, illumination conditions, face angles, occlusions, environment, format, dimensions, and size. Before being fed to our DeepMaskNet model, all images are scaled to a width and height of 256 pixels. All images have a bit depth of 24. During preprocessing, we prepared the images for the proposed DeepMaskNet model by cropping them in Adobe Photoshop to exclude extra information such as the neck and shoulders. Because the input size of our DeepMaskNet model is 256-by-256, images were resized to 256-by-256 using the publicly available Plastiliq Image Resizer software (Plastiliq, 2022).
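
The figures in the description above lend themselves to a quick arithmetic check. The sketch below confirms that the two detection categories add up to the stated 6006 images, then derives the class balance and the average number of masked images per person in the recognition collection. The variable names are illustrative, and the derived ratios are computed here rather than reported by the authors.

```python
# Counts copied from the description above.
with_mask = 3174
without_mask = 2832
recognition_images = 2896
recognition_persons = 226

# The two detection categories should add up to the stated 6006 images.
detection_total = with_mask + without_mask

# Derived figures (not reported by the dataset authors).
with_mask_share = with_mask / detection_total
images_per_person = recognition_images / recognition_persons

print(detection_total)
print(f"with-mask share: {with_mask_share:.1%}")
print(f"average masked images per person: {images_per_person:.1f}")
```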
FileMarket | Diverse Human Face Data | 20,000 IDs | Face Recognition Data |...
datarade.ai
Updated Jul 5, 2024
Cite
FileMarket (2024). FileMarket | Diverse Human Face Data | 20,000 IDs | Face Recognition Data | Image/Video AI Training Data | Biometric Data [Dataset]. https://datarade.ai/data-products/filemarket-diverse-human-face-data-20-000-ids-face-reco-filemarket
Explore at:
Available download formats: .bin, .json, .xml, .csv, .xls, .sql, .txt
Dataset updated
Jul 5, 2024
Dataset authored and provided by
FileMarket
Area covered
Georgia, Oman, Iceland, Sri Lanka, Hong Kong, Libya, Curaçao, United Kingdom, Martinique, Kyrgyzstan
Description
Biometric Data

FileMarket provides a comprehensive Biometric Data set, ideal for enhancing AI applications in security, identity verification, and more. In addition to Biometric Data, we offer specialized datasets across Object Detection Data, Machine Learning (ML) Data, Large Language Model (LLM) Data, and Deep Learning (DL) Data. Each dataset is meticulously crafted to support the development of cutting-edge AI models.

Data Size: 20,000 IDs

Race Distribution: The dataset encompasses individuals from diverse racial backgrounds, including Black, Caucasian, Indian, and Asian groups.

Gender Distribution: The dataset equally represents all genders, ensuring a balanced and inclusive collection.

Age Distribution: The data spans a broad age range, including young, middle-aged, and senior individuals, providing comprehensive age coverage.

Collection Environment: Data has been gathered in both indoor and outdoor environments, ensuring variety and relevance for real-world applications.

Data Diversity: This dataset includes a rich variety of face poses, racial backgrounds, age groups, lighting conditions, and scenes, making it ideal for robust biometric model training.

Device: All data has been collected using mobile phones, reflecting common real-world usage scenarios.

Data Format: The data is provided in .jpg and .png formats, ensuring compatibility with various processing tools and systems.

Accuracy: The labels for face pose, race, gender, and age are highly accurate, exceeding 95%, making this dataset reliable for training high-performance biometric models.
Perceptual expertise in forensic facial image comparison
data.niaid.nih.gov
dataone.org
+2 more
zip
Updated Sep 24, 2015
Cite
David White; P. Jonathan Phillips; Carina A. Hahn; Matthew Hill; Alice J. O'Toole; P. Jonathon Phillips (2015). Perceptual expertise in forensic facial image comparison [Dataset]. http://doi.org/10.5061/dryad.ng720
Explore at:
Available download formats: zip
Unique identifier
https://doi.org/10.5061/dryad.ng720
Dataset updated
Sep 24, 2015
Dataset provided by
National Institute of Standards and Technology
The University of Texas at Dallas
Authors
David White; P. Jonathan Phillips; Carina A. Hahn; Matthew Hill; Alice J. O'Toole; P. Jonathon Phillips
License
https://spdx.org/licenses/CC0-1.0.html
Description
Forensic facial identification examiners are required to match the identity of faces in images that vary substantially, owing to changes in viewing conditions and in a person's appearance. These identifications affect the course and outcome of criminal investigations and convictions. Despite calls for research on sources of human error in forensic examination, existing scientific knowledge of face matching accuracy is based, almost exclusively, on people without formal training. Here, we administered three challenging face matching tests to a group of forensic examiners with many years' experience of comparing face images for law enforcement and government agencies. Examiners outperformed untrained participants and computer algorithms, thereby providing the first evidence that these examiners are experts at this task. Notably, computationally fusing responses of multiple experts produced near-perfect performance. Results also revealed qualitative differences between expert and non-expert performance. First, examiners' superiority was greatest at longer exposure durations, suggestive of more entailed comparison in forensic examiners. Second, experts were less impaired by image inversion than non-expert students, contrasting with face memory studies that show larger face inversion effects in high performers. We conclude that expertise in matching identity across unfamiliar face images is supported by processes that differ qualitatively from those supporting memory for individual faces.
