CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset consists of 91,361 high-quality, curated 1024x1024 face images. It was created by "bringing to life" various 3D models and by correcting bad text-to-image generations from the Stable Diffusion model, using a process similar to the one described in this short Twitter thread, which involves encoding the images into the StyleGAN2 latent space and performing a small manipulation that turns each image into a photo-realistic one.
The dataset also contains facial landmarks (an extended set) and face-parsing semantic segmentation maps. An example script demonstrates how to access the landmarks and segmentation maps and how to textually search within the dataset (using CLIP image/text feature vectors), and also performs some exploratory analysis of the dataset. A link to the GitHub repo of the dataset is provided.
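The CLIP-based text search mentioned above amounts to ranking precomputed image feature vectors by cosine similarity against a text feature vector. A minimal sketch with NumPy, using random vectors in place of real CLIP features (the function name and the 512-d dimensionality are illustrative assumptions):

```python
import numpy as np

def text_search(text_vec, image_vecs, top_k=5):
    """Rank images by cosine similarity between one text feature vector
    and precomputed image feature vectors (shape: [n_images, dim])."""
    t = text_vec / np.linalg.norm(text_vec)
    imgs = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    sims = imgs @ t                      # cosine similarity per image
    order = np.argsort(-sims)[:top_k]    # indices of the best matches
    return order, sims[order]

# toy demo: random vectors stand in for real CLIP features
rng = np.random.default_rng(0)
image_vecs = rng.normal(size=(100, 512))
text_vec = image_vecs[7] + 0.01 * rng.normal(size=512)  # near-duplicate of image 7
order, scores = text_search(text_vec, image_vecs)
print(order[0])  # image 7 should rank first
```

With real CLIP features the only change is where `text_vec` and `image_vecs` come from; the ranking logic is the same.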
The process that "brings to life" face-like images and creates several candidate photo-realistic ones is illustrated here: https://i.ibb.co/0sw8TkL/bring-to-life-process-SD-FS-2.png
Overall, the SFHQ dataset contains ~425,000 high-quality, curated synthetic face images with no privacy or license issues surrounding them.
This dataset contains a high degree of variability along the axes of identity, ethnicity, age, pose, expression, lighting conditions, hair-style, hair-color, and facial hair. It lacks variability along accessory axes such as hats, earphones, and various jewelry. It also doesn't contain any occlusions except the self-occlusion of hair covering the forehead, the ears, and (rarely) the eyes. This dataset naturally inherits all the biases of its source datasets (FFHQ, AAHQ, Close-Up Humans, Face Synthetics, LAION-5B) and of the StyleGAN2 and Stable Diffusion models.
The purpose of this dataset is to be of sufficiently high quality that new machine learning models can be trained using this data, including even generative face models such as StyleGAN. The dataset may be extended from time to time with additional supervision labels (e.g. text descriptions), but no promises.
Hope this is helpful to some of you, feel free to use as you see fit...
Black and white face images of people taken with varying pose, expression, eyes, and size.
This data consists of 640 black and white face images of people taken with varying pose (straight, left, right, up), expression (neutral, happy, sad, angry), eyes (wearing sunglasses or not), and size.
Original Owner and Donor: Tom Mitchell, School of Computer Science, Carnegie Mellon University, tom.mitchell '@' cmu.edu, http://www.cs.cmu.edu/~tom/
Each image can be characterized by the pose, expression, eyes, and size. There are 32 images for each person, capturing every combination of features. To view the images, you can use the program xv. The image data can be found in /faces. This directory contains 20 subdirectories, one for each person, named by userid. Each of these directories contains several different face images of the same person. You will be interested in the images with the following naming convention: <userid>_<pose>_<expression>_<eyes>_<scale>.pgm. <userid> is the user id of the person in the image, and this field has 20 values: an2i, at33, boland, bpm, ch4f, cheyer, choon, danieln, glickman, karyadi, kawamura, kk49, megak, mitchell, night, phoebe, saavik, steffi, sz24, and tammo. <pose> is the head position of the person, and this field has 4 values: straight, left, right, up. <expression> is the facial expression of the person, and this field has 4 values: neutral, happy, sad, angry. <eyes> is the eye state of the person, and this field has 2 values: open, sunglasses. <scale> is the scale of the image, and this field has 3 values: 1, 2, and 4. 1 indicates a full-resolution image (128 columns by 120 rows); 2 indicates a half-resolution image (64 by 60); 4 indicates a quarter-resolution image (32 by 30). If you've been looking closely in the image directories, you may notice that some images have a .bad suffix rather than the .pgm suffix. As it turns out, 16 of the 640 images taken have glitches due to problems with the camera setup; these are the .bad images. Some people had more glitches than others, but everyone who got ``faced'' should have at least 28 good face images (out of the 32 variations possible, discounting scale). More information and C code for loading the images are available from the source link below.
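The naming convention above can be decoded programmatically. A small sketch, assuming underscore-separated fields and that full-resolution images omit the scale suffix (the helper name is ours):

```python
def parse_cmu_filename(name):
    """Split a CMU face image filename like 'an2i_left_angry_open_2.pgm'
    into its descriptive fields: userid, pose, expression, eyes, scale."""
    stem = name.rsplit(".", 1)[0]          # drop the .pgm extension
    parts = stem.split("_")
    if len(parts) == 4:                    # assumed: full-res images omit the scale field
        parts.append("1")
    fields = ["userid", "pose", "expression", "eyes", "scale"]
    return dict(zip(fields, parts))

info = parse_cmu_filename("an2i_left_angry_open_2.pgm")
print(info["pose"], info["scale"])  # left 2
```

Filtering the directory tree by any combination of fields (e.g. all "sunglasses" images at scale 1) then reduces to a dictionary lookup per filename.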
N/A
T. Mitchell. Machine Learning, McGraw Hill, 1997.
Xiaofeng He and Partha Niyogi. Locality Preserving Projections. NIPS, 2003.
Marina Meila and Michael I. Jordan. Learning with Mixtures of Trees. Journal of Machine Learning Research, 1, 2000.
You may use this material free of charge for any educational purpose, provided attribution is given in any lectures or publications that make use of this material. [1] Papers were automatically harvested and associated with this data set, in collaboration with Rexa.info.
Source: http://archive.ics.uci.edu/ml/datasets/CMU+Face+Images
The image dataset is used by the CMU Face Detection Project and is provided for evaluating algorithms for detecting frontal views of human faces. This particular test set was originally assembled as part of work in Neural Network Based Face Detection. It combines images collected at CMU and MIT.
Please give appropriate acknowledgements when you use these test sets. In the lists of files below, you will find references to Test Sets A, B, C and the Rotated Test Set. Test Set B was provided by Kah-Kay Sung and Tomaso Poggio at the AI/CBCL Lab at MIT, and Test Sets A, C and the rotated test set were collected here at CMU (by Henry A. Rowley, Shumeet Baluja, and Takeo Kanade).
In [Schneiderman and Kanade, 2000] and [Schneiderman and Kanade, 1998] we refer to the combination of test sets A, B, and C as the "combined test sets of Sung and Poggio and Rowley, Baluja, and Kanade." In [Rowley, Baluja, and Kanade, 1998] we refer to the combination of sets A, B, C as "test set one" and in [Rowley, Baluja, and Kanade, 1997] we refer to it as the "upright set" as distinguished from the "rotated set."
Additional Information
We provide ground truth for face locations in the following format, with one line per face (extreme side views are ignored):
filename left-eye right-eye nose left-corner-mouth center-mouth right-corner-mouth
For each feature on a face to be detected, two numbers are given. These numbers are the x and y coordinates (measured from the upper left corner) of the feature in the image.
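A line in this format can be parsed into named (x, y) points; a minimal sketch (the helper name and the sample line are illustrative):

```python
def parse_ground_truth(line):
    """Parse one ground-truth line: a filename followed by six (x, y)
    feature coordinates, measured from the image's upper-left corner."""
    tokens = line.split()
    filename = tokens[0]
    coords = [float(v) for v in tokens[1:]]
    features = ["left-eye", "right-eye", "nose",
                "left-corner-mouth", "center-mouth", "right-corner-mouth"]
    points = {f: (coords[2 * i], coords[2 * i + 1]) for i, f in enumerate(features)}
    return filename, points

fname, pts = parse_ground_truth("img1.gif 10 20 30 20 20 30 12 40 20 42 28 40")
print(fname, pts["nose"])  # img1.gif (20.0, 30.0)
```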
Source: http://vasc.ri.cmu.edu/idb/images/face/frontal_images/images.html
The ORL Database of Faces contains 400 images from 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The size of each image is 92x112 pixels, with 256 grey levels per pixel.
Download dataset from Kaggle: https://www.kaggle.com/datasets/kasikrit/att-database-of-faces
https://academictorrents.com/nolicensespecified
Flickr-Faces-HQ (FFHQ) is a high-quality image dataset of human faces, originally created as a benchmark for generative adversarial networks (GAN). The dataset consists of 70,000 high-quality PNG images at 1024x1024 resolution and contains considerable variation in terms of age, ethnicity and image background. It also has good coverage of accessories such as eyeglasses, sunglasses, hats, etc. The images were crawled from Flickr, thus inheriting all the biases of that website, and automatically aligned and cropped using dlib. Only images under permissive licenses were collected. Various automatic filters were used to prune the set, and finally Amazon Mechanical Turk was used to remove the occasional statues, paintings, or photos of photos.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Biometric management, especially that which uses the face, is indeed a very challenging task and requires a dedicated dataset that captures variations in pose, emotion, and even occlusion. The current work aims at delivering a dataset for training and testing purposes. The SJB Face dataset is one such Indian face image dataset, which can be used to recognize faces. It contains face images collected with a digital camera under certain conditions: different poses, different expressions, partially occluded faces, and a uniform attire. The SJB Face dataset was collected from 48 students, each contributing 13 face images. All the images show the students in white attire. This database can be used for face recognition projects in academia and industry, for example to develop attendance systems, as an attendance system requires systematic images for training.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Computational Intelligence and Photography Lab
Department of Computer Science, Yonsei University
Data samples: https://github.com/minostauros/Real-and-Fake-Face-Detection/raw/master/samples.jpg
When using social networks, have you ever encountered a 'fake identity'? Anyone can create a fake profile image using image editing tools, or even using deep learning based generators. If you are interested in making the world wide web a better place by recognizing such fake faces, you should check this dataset.
Our dataset contains expert-generated, high-quality photoshopped face images. The images are composites of different faces, separated by eyes, nose, mouth, or whole face. You may wonder why we need these expensive images rather than images automatically generated by computers. Say we want to train a classifier for real and fake face images. With generative models like Generative Adversarial Networks (GANs), it is very easy to generate fake face images. A classifier can then be trained using those images, and it does a great job discriminating real from generated face images. We can safely assume that the classifier learns some kind of pattern in images generated by GANs. However, those patterns can be futile in front of human experts, since exquisite counterfeits by experts are created in a completely different process. Thus we had to create our own dataset with expert-level fake face photos.
Inside the parent directory, training_real and training_fake contain real and fake face photos, respectively.
For fake photos, we have three groups: easy, mid, and hard (these groups were separated subjectively, so we do not recommend using them as explicit categories).
You can also use the filenames of fake images to see which parts of the face were replaced (refer to the image below).
Filename description: https://github.com/minostauros/Real-and-Fake-Face-Detection/raw/master/filename_description.jpg
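If the filenames encode the difficulty group plus a binary mask of swapped regions, they can be decoded with a sketch like the one below. The digit order (left eye, right eye, nose, mouth) and the overall layout are assumptions here; verify them against the filename-description figure before relying on them.

```python
def replaced_regions(filename):
    """Decode a hypothetical fake-image filename like 'mid_142_1010.jpg':
    difficulty group, an id, then four binary digits flagging swapped regions.
    The digit-to-region order below is an assumption."""
    stem = filename.rsplit(".", 1)[0]
    difficulty, _id, bits = stem.split("_")
    regions = ["left_eye", "right_eye", "nose", "mouth"]
    swapped = [r for r, b in zip(regions, bits) if b == "1"]
    return difficulty, swapped

diff, swapped = replaced_regions("mid_142_1010.jpg")
print(diff, swapped)
```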
You can cite our dataset as follows; [Date Retrieved] should be replaced with your own retrieval date.
Seonghyeon Nam, Seoung Wug Oh, Jae Yeon Kang, Chang Ha Shin, Younghyun Jo, Young Hwi Kim, Kyungmin Kim, Minho Shim, Sungho Lee, Yunji Kim, Suho Han, Gunhee Nam, Dasol Lee, Subin Jeon, In Cho, Woongoh Cho, Sejong Yang, Dongyoung Kim, Hyolim Kang, Sukjun Hwang, and Seon Joo Kim. (2019, January). Real and Fake Face Detection, Version 1. Retrieved [Date Retrieved] from https://www.kaggle.com/datasets/ciplab/real-and-fake-face-detection.
https://www.futurebeeai.com/data-license-agreement
This AI training dataset contains 500+ human-face-with-occlusion image sets for face recognition models. Each set contains 5 different images of an individual with various accessories, such as a mask, a cap, sunglasses, and a combination of mask and sunglasses, along with a normal image without any accessories. A network comprising more than 500 individuals from Latin American nations, including Argentina, Brazil, Costa Rica, Ecuador, Colombia, Peru, and many more, has been used to gather the occlusion image data. The participants are males and females aged 18 to 70 years. All images were collected under different lighting conditions and against different backgrounds to keep the biometric dataset diverse and unbiased. All photos were taken at high quality using recent mobile devices.
Along with the occlusion image data, the dataset has metadata for each participant, such as name, age, gender, country, and demographic, making it ready to use for computer vision technology. This human picture dataset for machine learning can be useful for teaching and assisting machines to recognize and identify occluded faces of Latin American people. This training dataset can also be used to create models for KYC, biometric identity, facial recognition, and occlusion identification, among other things.
We continuously add more assets of diverse conditions and requirements to this off-the-shelf image dataset. In accordance with your unique AI demands, we can additionally gather more detailed facial data. You can explore our crowd community for custom facial data collection.
The license for this training dataset belongs to FutureBeeAI.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An index of face databases, their features, and how to access them has been unavailable. The “Face Image Meta-Database” (fIMDb) provides researchers with the tools to find the face images best suited to their research. The fIMDb is available from: https://cliffordworkman.com/resources/
23,110 People Multi-race and Multi-pose Face Images Data. The data covers Asian, Caucasian, Black, brown, and Indian subjects. For each subject, 29 images were collected under different scenes and lighting conditions: 28 photos (multiple lighting conditions, poses, and scenes) plus 1 ID photo. This data can be used for face recognition related tasks.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data is used in the second experimental evaluation of face smile detection in the paper "Smile Detection Using Hybrid Face Representation", O. A. Arigbabu et al., 2015.
Download the main images from the LFWcrop website (http://conradsanderson.id.au/lfwcrop/) and select the samples we used for smile and non-smile, as given in the list.
Kindly cite:
Arigbabu, Olasimbo Ayodeji, et al. "Smile detection using hybrid face representation." Journal of Ambient Intelligence and Humanized Computing (2016): 1-12.
C. Sanderson, B.C. Lovell. Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference. ICB 2009, LNCS 5558, pp. 199-208, 2009
Huang GB, Mattar M, Berg T, Learned-Miller E (2007) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. University of Massachusetts, Amherst, Technical Report
Flickr-Faces-HQ (FFHQ) consists of 70,000 high-quality PNG images at 1024×1024 resolution and contains considerable variation in terms of age, ethnicity and image background. It also has good coverage of accessories such as eyeglasses, sunglasses, hats, etc. The images were crawled from Flickr, thus inheriting all the biases of that website, and automatically aligned and cropped using dlib. Only images under permissive licenses were collected. Various automatic filters were used to prune the set, and finally Amazon Mechanical Turk was used to remove the occasional statues, paintings, or photos of photos.
https://www.futurebeeai.com/data-license-agreement
This AI training dataset contains 500+ human historical image sets for face recognition models. Each set contains 22 different historical images and 1 enrollment image of a human face. A network comprising more than 500 individuals from African nations, including Kenya, Malawi, Nigeria, Benin, Ethiopia, and many more, has been used to gather the historical image data. The participants are males and females aged 18 to 70 years. All images were collected under different lighting conditions and against different backgrounds to keep the biometric dataset diverse and unbiased. All photos are of high quality.
Along with the past image data, the dataset has metadata for each participant, such as name, age, gender, country, and demographic, making it ready to use for computer vision technology. This human picture dataset for machine learning can be useful for teaching and assisting machines to recognize and identify African people's faces. This training dataset can also be used to create models for KYC, biometric identity, and facial recognition, among other things.
We continuously add more assets of diverse conditions and requirements to this off-the-shelf image dataset. In accordance with your unique AI demands, we can additionally gather more detailed facial data. You can explore our crowd community for custom facial data collection.
The license for this training dataset belongs to FutureBeeAI.
https://academictorrents.com/nolicensespecified
Georgia Tech face database (128MB) contains images of 50 people taken in two or three sessions between 06/01/99 and 11/15/99 at the Center for Signal and Image Processing at Georgia Institute of Technology. All people in the database are represented by 15 color JPEG images with cluttered background taken at resolution 640x480 pixels. The average size of the faces in these images is 150x150 pixels. The pictures show frontal and/or tilted faces with different facial expressions, lighting conditions and scale. Each image is manually labeled to determine the position of the face in the image. The set of label files is available here. The Readme.txt file gives more details about the database.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This project investigates bias in automatic facial recognition (FR). Specifically, subjects are grouped into predefined subgroups based on gender, ethnicity, and age. We propose a novel image collection called Balanced Faces in the Wild (BFW), which is balanced across eight subgroups (100 subjects per subgroup, each subject with 25 face samples). Along with the name (i.e., identification) labels and task protocols (e.g., lists of pairs for face verification, a pre-packaged data table with additional metadata and labels, etc.), BFW groups subjects by ethnicity (Asian (A), Black (B), Indian (I), and White (W)) and gender (Female (F) and Male (M)). The motivation and intent are that BFW will provide a proxy for characterizing FR systems, making demographic-specific analysis possible. For instance, various confusion metrics and predefined criteria (i.e., a score threshold) are fundamental when characterizing the performance of FR systems. The following visualization summarizes the confusion metrics in a way that relates to the different measurements.
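The threshold-based confusion metrics mentioned above, such as the true accept rate (TAR) and false accept rate (FAR), can be computed per subgroup from verification scores; a minimal sketch with made-up scores and labels:

```python
def verification_rates(scores, labels, threshold):
    """Compute (TAR, FAR) for face-verification scores at a fixed
    decision threshold. labels: 1 = genuine pair, 0 = impostor pair."""
    genuine = [s for s, y in zip(scores, labels) if y == 1]
    impostor = [s for s, y in zip(scores, labels) if y == 0]
    tar = sum(s >= threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return tar, far

# toy similarity scores for six pairs
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1,   1,   1,   0,   0,   0]
tar, far = verification_rates(scores, labels, threshold=0.5)
print(tar, far)
```

Running this separately on each subgroup's pairs (e.g., AF, BM, ...) at a single global threshold is one way to expose the demographic performance gaps the project studies.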
A free face dataset made for students and teachers. It contains 10,000 photos with an equal distribution of race and gender parameters, along with metadata and facial landmarks. Free to use for research with the citation "Photos by Generated.Photos".
Photos
All the photos are 100% synthetic, based on model-released photos, and royalty-free. They can be used for any research purpose except ones violating the law, worldwide, with no time limitations. Quantity: 10,000. Quality: 256x256 px. Diversity: ethnicity, gender.
Metadata
The JSON files contain the metadata for each image in a machine-readable format, including: (1) FaceLandmarks: mouth, right_eyebrow, left_eyebrow, right_eye, left_eye, nose, jaw. (2) FaceAttributes: headPose, gender, makeup, emotion, facialHair, hair (hairColor, hairLength, bald), occlusion, ethnicity, eye_color, smile, age
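Reading these JSON files is straightforward with the standard library; a sketch against a made-up metadata snippet that follows the schema listed above (the exact field names and value shapes in the real files may differ):

```python
import json

# hypothetical metadata for one image, following the listed schema
sample = '''{
  "faceLandmarks": {"nose": [[128, 140]], "jaw": [[60, 200], [196, 200]]},
  "faceAttributes": {"gender": "female", "age": 31,
                     "hair": {"hairColor": "brown", "hairLength": "long", "bald": false}}
}'''

meta = json.loads(sample)
nose = meta["faceLandmarks"]["nose"][0]   # first (x, y) point of the nose
age = meta["faceAttributes"]["age"]
print(nose, age)
```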
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Similar-face recognition has always been one of the most challenging research directions in face recognition. This project shares the similar face images (SFD.zip) that we have collected so far. All images are labeled and collected from publicly available datasets such as LFW and CASIA-WebFace. We will continue to collect larger-scale data and to update this project. Because the dataset is large, we uploaded it as a compressed zip file (SFD.zip); here we also upload a few examples for everyone to view. Email: ileven@shu.edu.cn
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset for this project consists of photos of individual human emotion expressions, taken with both a digital camera and a mobile phone camera from different angles, postures, backgrounds, light exposures, and distances. This task might look and sound very easy, but some challenges were encountered along the way, which are reviewed below.

1) People constraint. One of the major challenges faced during this project was getting people to participate in the image capturing process, as school was on vacation and other individuals around the environment were not willing to let their images be captured, for personal and security reasons, even after the notion behind the project (mainly academic research) was explained. Due to this challenge, we resorted to capturing images of the researcher and just a few other willing individuals.

2) Time constraint. As with all deep learning projects, the more data available, the more accurate and less error-prone the result. At the initial stage of the project, it was agreed to have 10 emotional-expression photos each from at least 50 persons, with the option to increase the number of photos for more accurate results; due to the time constraint, an agreement was later made to capture just the researcher and a few other people who were willing and available. These photos were taken for just two types of human emotion expression, "happy" and "sad" faces, also due to the time constraint. To expand this work further (as future work), photos of other facial expressions such as anger, contempt, disgust, fright, and surprise can be included if time permits.

3) The approved facial emotion captures. It was agreed to capture as many angles and postures of just two facial emotions as possible, with at least 10 emotional-expression images per individual; due to the time and people constraints, a few persons were captured with as many postures as possible, as stated below: happy faces, 65 images; sad faces, 62 images. There are many other types of facial emotions, and again, to expand the project in the future, we can include all the other types if time permits and people are readily available.

4) Expanding further. This project can be improved in many ways; due to the time limit given to this project, these improvements can be implemented later as future work. In simple words, this project is to detect/predict real-time human emotion, which involves creating a model that can output the percentage confidence that any facial image is happy or sad. The higher the percentage confidence, the more accurately the facial image fed into the model is classified.

5) Other questions. Can the model be reproduced? The answer should be YES, if and only if the model is fed with the proper data (images), such as images of other types of emotional expression.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Our LSLF dataset consists of 1,195,976 labeled face images for 11,459 individuals. These images are stored in JPEG format with a total size of 5.36 GB. Individuals have a minimum of 1 face image and a maximum of 1,157 face images; the average number of face images per individual is 104. Each image is automatically named as (PersonName VideoNumber FrameNumber ImageNumber) and stored in the related individual's folder.
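The naming scheme can be split back into its components; a sketch assuming underscores separate the four parts (the description lists the components but not the separator, so adjust to the actual one):

```python
def parse_lslf_name(stem):
    """Split an LSLF image name of the assumed form
    'PersonName_VideoNumber_FrameNumber_ImageNumber' into its parts.
    rsplit from the right keeps underscores inside the person name intact."""
    person, video, frame, image = stem.rsplit("_", 3)
    return {"person": person, "video": int(video),
            "frame": int(frame), "image": int(image)}

info = parse_lslf_name("John_Smith_3_120_7")
print(info["person"], info["frame"])  # John_Smith 120
```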