CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations, including:
- 10,177 identities,
- 202,599 face images, and
- 5 landmark locations and 40 binary attribute annotations per image.
The dataset can be employed as the training and test sets for the following computer vision tasks: face attribute recognition, face detection, and landmark (or facial part) localization.
Note: the CelebA dataset may contain bias. The Fairness Indicators example goes into detail about several considerations to keep in mind while using the CelebA dataset.
To use this dataset:
import tensorflow_datasets as tfds

# Load the training split and print the first four examples.
ds = tfds.load('celeb_a', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
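As a rough sketch of reading individual examples (assuming the TFDS feature names image, attributes, and landmarks, which may differ between catalog versions), each example is a dictionary of tensors:

import tensorflow_datasets as tfds

ds = tfds.load('celeb_a', split='train')
for ex in ds.take(1):
    image = ex['image']                 # uint8 image tensor
    is_male = ex['attributes']['Male']  # one of the 40 binary attributes
    landmarks = ex['landmarks']         # 5 landmark locations
    print(image.shape, bool(is_male))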
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/celeb_a-2.1.0.png
License: other (https://choosealicense.com/licenses/other/)
Dataset Card for CelebA
CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA has large diversities, large quantities, and rich annotations, including:
- 10,177 identities,
- 202,599 face images, and
- 5 landmark locations and 40 binary attribute annotations per image.
The… See the full description on the dataset page: https://huggingface.co/datasets/flwrlabs/celeba.
License: Attribution-NonCommercial 4.0 (CC BY-NC 4.0), https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
The CelebA-HQ dataset is a high-quality version of CelebA that consists of 30,000 images at 1024×1024 resolution.
License: not specified (https://academictorrents.com/nolicensespecified)
CelebAMask-HQ is a large-scale face image dataset with 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ. Each image has a segmentation mask of facial attributes corresponding to CelebA. The masks of CelebAMask-HQ were manually annotated at a size of 512 x 512 with 19 classes covering all facial components and accessories, such as skin, nose, eyes, eyebrows, ears, mouth, lip, hair, hat, eyeglass, earring, necklace, neck, and cloth. CelebAMask-HQ can be used to train and evaluate algorithms for face parsing, face recognition, and GANs for face generation and editing. It is applicable to several research fields, including facial image manipulation, face parsing, face recognition, and face hallucination, with interactive facial image manipulation as a showcased application.
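As a minimal sketch of inspecting one of these masks, assuming the annotations are stored as per-component binary PNGs at 512 x 512 (the file path below is hypothetical and may not match the official release layout):

import numpy as np
from PIL import Image

# Hypothetical path to one per-component mask (here: hair) for image 0.
mask = np.array(Image.open("CelebAMask-HQ-mask-anno/0/00000_hair.png").convert("L"))
print(mask.shape)       # expected (512, 512)
print(np.unique(mask))  # binary values: background vs. the annotated component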
https://github.com/weihaox/Multi-Modal-CelebA-HQ-Dataset
Multi-Modal-CelebA-HQ is a large-scale face image dataset with 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ. Each image has a high-quality segmentation mask, a sketch, a descriptive text, and a version with a transparent background. Multi-Modal-CelebA-HQ can be used to train and evaluate algorithms for text-to-image generation, text-guided image manipulation, sketch-to-image generation, image captioning, and VQA. This dataset is proposed and used in TediGAN.
CelebAHairMask-HQ
CelebAHairMask-HQ is an extended dataset of CelebAMask-HQ for hair segmentation or hair matting. CelebAMask-HQ is a large-scale face image dataset with 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ. Each image has a segmentation mask of facial attributes corresponding to CelebA. The masks of CelebAHairMask-HQ were auto-annotated at a size of 1024 x 1024. CelebAHairMask-HQ can be used to train and evaluate… See the full description on the dataset page: https://huggingface.co/datasets/cpuimage/CelebAHairMask-HQ.
CelebFaces Attributes dataset contains 202,599 face images of size 178×218 from 10,177 celebrities, each annotated with 40 binary labels indicating facial attributes such as hair color, gender, and age.
CelebA Female Dataset
Dataset Description
This dataset is a filtered subset of the CelebA dataset (CelebFaces Attributes), containing only female faces. The original CelebA dataset is a large-scale face attributes dataset with more than 200,000 celebrity images, each with 40 attribute annotations.
Dataset Creation
This dataset was created by:
- Loading the original CelebA dataset
- Filtering to keep only images labeled as female (based on the "Male"… See the full description on the dataset page: https://huggingface.co/datasets/MnLgt/CelebA-females.
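A minimal sketch of this kind of filtering, using the TFDS copy of CelebA (the attribute name 'Male' follows the TFDS feature spec; the Hugging Face dataset may expose the label differently):

import tensorflow as tf
import tensorflow_datasets as tfds

ds = tfds.load('celeb_a', split='train')
# Keep only examples whose 'Male' attribute is False, i.e. female faces.
females = ds.filter(lambda ex: tf.logical_not(ex['attributes']['Male']))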
CelebA-Spoof is a large-scale face anti-spoofing dataset with the following properties:
- Quantity: CelebA-Spoof comprises 625,537 pictures of 10,177 subjects, significantly larger than the existing datasets.
- Diversity: The spoof images are captured from 8 scenes (2 environments × 4 illumination conditions) with more than 10 sensors.
- Annotation richness: CelebA-Spoof contains 10 spoof type annotations, as well as the 40 attribute annotations inherited from the original CelebA dataset.
Description
K-pop Idol Dataset - Female (KID-F) is the first dataset of high-quality K-pop idol face images. It consists of about 6,000 high-quality face images at 512x512 resolution and identity labels for each image.
We collected about 90,000 K-pop female idol images and cropped the face from each image. We then selected the high-quality face images, resulting in about 6,000 high-quality face images in this dataset.
There are 300 test images for benchmarking. There are no duplicate images between the test and train sets, and some identities in the test images do not appear in the train images (meaning some test images show identities that are new to the trained model). Each test image has a degraded pair, which can be used to test face super-resolution performance.
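The exact degradation applied to the released test pairs is not described here; as a generic, hypothetical way to produce a degraded counterpart for super-resolution testing, one could down- and up-sample an image with bicubic interpolation:

from PIL import Image

img = Image.open("example_face.png")               # hypothetical 512x512 test image
small = img.resize((128, 128), Image.BICUBIC)      # downsample to lose detail
degraded = small.resize(img.size, Image.BICUBIC)   # upsample back to the original size
degraded.save("example_face_degraded.png")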
We also provide identity labels for each image. You can download the CSV file from our GitHub repository.
Download
You can download the dataset from Google Drive.
Agreement
The use of this software is RESTRICTED to non-commercial research and educational purposes. All images of the KID-F dataset are obtained from the internet and are not the property of EDA (PCEO-AI-CLUB). EDA is not responsible for the content nor the meaning of these images. You agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit for any commercial purposes any portion of the images or any portion of derived data. You agree not to further copy, publish, or distribute any portion of the KID-F dataset, except that making copies of the dataset is allowed for internal use at a single site within the same organization. EDA reserves the right to terminate your access to the KID-F dataset at any time.
Context
This dataset is used for gender classification with images. It consists of almost 20K images totaling about 132 MB.
Acknowledgments
This dataset is preprocessed from the CelebFace dataset created by Jessica Li (https://www.kaggle.com/jessicali9530). Thank you so much, Jessica, for providing a wonderful dataset to the community.
Inspiration
The inspiration to create this dataset is the CelebFace dataset created by Jessica Li (https://www.kaggle.com/jessicali9530). I have extracted this dataset from the CelebFace dataset so that you can directly use it for gender classification without preprocessing.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Multi-view Facial Image Dataset Based on CelebA: A dataset of facial images from several viewing angles was created by Aristotle University of Thessaloniki based on the CelebA image dataset, using software developed in the OpenDR H2020 research project based on this paper and the respective code provided by the authors. CelebA is a large-scale facial dataset consisting of 202,599 facial images of 10,177 celebrities captured in the wild. The new dataset, namely AUTH-OpenDR Augmented CelebA (AUTH-OpenDR ACelebA), was generated from 140,000 facial images corresponding to 9,161 persons, i.e. a subset of CelebA was used. For each CelebA image used, 13 synthetic images were generated by yaw-axis camera rotation in the interval [0°, +60°] with step +5°. Moreover, 10 synthetic images generated by pitch-axis camera rotation in the interval [0°, +45°] with step +5° were also created for each facial image of the aforementioned dataset. Since the CelebA license does not allow distribution of derivative work, we do not make ACelebA directly available but instead provide instructions and scripts on how to recreate it.
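For reference, the two rotation grids described above can be enumerated as follows (a sketch; the counts include the 0° view):

import numpy as np

yaw_angles = np.arange(0, 61, 5)    # 0°, 5°, ..., 60°  ->  13 yaw views per image
pitch_angles = np.arange(0, 46, 5)  # 0°, 5°, ..., 45°  ->  10 pitch views per image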
License: not specified (https://academictorrents.com/nolicensespecified)
In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and linking them to corresponding entity keys in a knowledge base. More specifically, we propose a benchmark task to recognize one million celebrities from their face images, by using all the possibly collected face images of each individual on the web as training data. The rich information provided by the knowledge base helps to conduct disambiguation and improve the recognition accuracy, and contributes to various real-world applications, such as image captioning and news video analysis. Associated with this task, we design and provide a concrete measurement set, evaluation protocol, as well as training data. We also present in detail our experiment setup and report promising baseline results. Our benchmark task could lead to one of the largest classification problems in computer vision. To the best of our knowledge, our training dataset, which contains 10M images in version 1, is the largest publicly available one in the world.
https://mmlab.ie.cuhk.edu.hk/projects/CelebA/CelebA_Dialog.html
CelebA-Dialog is a large-scale visual-language face dataset with the following features. Facial images are annotated with rich fine-grained labels, which classify one attribute into multiple degrees according to its semantic meaning. Accompanying each image, there are textual captions describing the attributes and a sample user editing request. CelebA-Dialog has:
- 10,177 identities,
- 202,599 face images,
- 5 fine-grained attribute annotations per image (Bangs, Eyeglasses, Beard, Smiling, and Age), and
- textual captions and a user editing request per image.
The Cross-Age Celebrity Dataset (CACD) contains 163,446 images from 2,000 celebrities collected from the Internet. The images were collected from search engines using celebrity name and year (2004-2013) as keywords. Therefore, it is possible to estimate the ages of the celebrities in the images by simply subtracting the birth year from the year in which the photo was taken.
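As a trivial illustration of that estimate (it may be off by one year depending on whether the birthday had passed when the photo was taken):

def estimate_age(photo_year: int, birth_year: int) -> int:
    # Approximate age of the celebrity at the time the photo was taken.
    return photo_year - birth_year

print(estimate_age(2013, 1985))  # 28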
License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0), https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Overview: Welcome to the Celebrity Facial Transformations dataset, a meticulously curated collection of high-resolution (500 x 500 pixels) images showcasing famous personalities and celebrities with and without facial hair, specifically beards. Collected from Google Images, this dataset provides a fascinating glimpse into the ever-changing appearances of renowned figures.
Key Features:
- Celebrity Variety: Explore a wide range of celebrities from various fields, including actors, musicians, politicians, and athletes.
- Facial Transformation: Witness the striking contrast between their bearded and clean-shaven looks, allowing for in-depth analysis of how facial hair can alter one's appearance.
- High Resolution: All images are provided in a standardized high-resolution format (500 x 500 pixels), ensuring exceptional image quality for research and analysis.
Data Collection: The images were collected from Google Images using a systematic and careful approach to ensure image quality and relevance. Metadata, such as celebrity names and source URLs, is included for reference.
Data Format: All images are in a uniform high-resolution (500 x 500 pixels) JPEG format to maintain consistency and provide greater detail.
Disclaimer: This dataset is intended for research and educational purposes. Users are responsible for adhering to copyright and usage rights when using the images for any purpose beyond research.
Unlock new insights into celebrity transformations with this updated dataset, now featuring high-resolution images that provide even greater value for research and analysis.
DeepFake technology, born with the continuous maturation of deep learning techniques, primarily utilizes neural networks to create non-realistic faces. This method has enriched people's lives as computer vision advances and deep learning technologies mature. It has revolutionized the film industry by generating astonishing visuals and reducing production costs. Similarly, in the gaming industry, it has facilitated the creation of smooth and realistic animation effects. However, the malicious use of image manipulation to spread false information poses significant risks to society, casting doubt on the authenticity of digital content in visual media.

Forgery techniques encompass four main categories: face reenactment, face replacement, face editing, and face synthesis. Face editing, a commonly employed image manipulation method, involves falsifying facial features by modifying the information related to the five facial regions. As one of the commonly employed methods in facial editing, image inpainting technology utilizes known content from an image to fill in missing areas, aiming to restore the image in a way that aligns as closely as possible with human perception. In the context of facial forgery, image inpainting is primarily used for identity falsification, wherein facial features are altered to achieve the goal of replacing a face. The use of image inpainting for facial manipulation similarly introduces significant disruption to people's lives.

To support research on detection methods for such manipulations, this paper produced a large-scale dataset for face manipulation detection based on inpainting techniques. This paper specifically focuses on the field of image tampering detection, utilizing two classic datasets: the high-quality CelebA-HQ dataset, comprising 25,000 high-resolution (1,024 × 1,024 pixels) celebrity face images, and the low-quality FF++ dataset, consisting of 15,000 face images extracted from video frames. On the basis of the two datasets, facial feature regions (eyebrows, eyes, nose, mouth, and the entire facial area) are segmented using image segmentation methods. Corresponding mask images are created, and the segmented facial regions are directly obscured on the original image.

Two deep neural network-based inpainting methods (image inpainting via conditional texture and structure dual generation (CTSDG) and recurrent feature reasoning for image inpainting (RFR)) along with a traditional inpainting method (struct completion (SC)) were employed. The deep neural network methods require the provision of mask images to indicate the areas for inpainting, while the traditional method can directly perform inpainting on segmented facial feature images. The facial regions were inpainted using these three methods, resulting in a large-scale dataset comprising 600,000 images. This extensive dataset incorporates diverse pre-processing techniques and various inpainting methods, and includes images with different qualities and inpainted facial regions. It serves as a valuable resource for training and testing in related detection tasks, offering a rich dataset for subsequent research in the field, and also establishes a meaningful benchmark dataset for future studies in the domain of face tampering detection.
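A minimal sketch of the masking step described above, assuming a facial-feature region is given as a rectangular box (the dataset itself uses image segmentation rather than this hypothetical box input):

import numpy as np

def mask_and_obscure(image: np.ndarray, box: tuple):
    # image: H x W x 3 uint8 array; box: (y0, y1, x0, x1) region bounds.
    y0, y1, x0, x1 = box
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    mask[y0:y1, x0:x1] = 255        # white marks the area to be inpainted
    occluded = image.copy()
    occluded[y0:y1, x0:x1] = 0      # directly obscure the region on the original
    return mask, occluded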
CelebAGaze consists of 25,283 high-resolution celebrity images collected from CelebA and the Internet. It comprises 21,832 face images with eyes staring at the camera and 3,451 face images with eyes staring somewhere else. All images are cropped to 256 × 256, and the eye mask region is computed with dlib. Specifically, dlib is used to extract 68 facial landmarks and calculate the mean of the 6 points near each eye region, which becomes the center point of the mask. The size of the mask is fixed to 30×50. For the test set, 300 samples from domain Y and 100 samples from domain X are randomly selected, with the remaining images used as the training set. Note that this dataset is unpaired and is not labeled with the specific eye angle or head pose information.
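A minimal sketch of that mask-center computation with dlib (the predictor model path is an assumption, and the 6-point eye ranges follow the standard 68-landmark convention):

import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

def eye_mask_centers(image: np.ndarray):
    # image: RGB uint8 array, e.g. a 256 x 256 CelebAGaze crop
    face = detector(image, 1)[0]
    shape = predictor(image, face)
    pts = np.array([[p.x, p.y] for p in shape.parts()])
    left_center = pts[36:42].mean(axis=0)   # mean of the 6 left-eye landmarks
    right_center = pts[42:48].mean(axis=0)  # mean of the 6 right-eye landmarks
    # A fixed 30 x 50 box around each center defines the eye mask region.
    return left_center, right_center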
This is the OSN-transmission CelebA sampling dataset from the paper "DF-RAP: A Robust Adversarial Perturbation for Defending against Deepfakes in Real-world Social Network Scenarios", collected by manual upload and download. The dataset includes 30,000 facial images of size 256×256 transmitted through online social networks (OSNs) and their corresponding original images. Facebook, Twitter, WeChat, and Weibo were selected as the transmission OSNs, with 7,500 images each.
Please see https://github.com/yakhyo/face-recognition to train Face Recognition model.
MS1M-ArcFace Dataset Description
The MS1M-ArcFace dataset is a cleaned and refined version of the original MS-Celeb-1M dataset, specifically curated for face recognition tasks. This dataset was processed to remove noisy and misaligned images, improving its quality and usability in training robust face recognition models.
Features
- Image size: 112x112 pixels
- Classes: 85,742
- Aligned: standardized facial landmarks
Info
- Dataset origin: based on the MS-Celeb-1M dataset, originally released by Microsoft Research.
- Purpose: designed to facilitate research and development in face recognition, particularly for high-accuracy models.
- Data: contains millions of images of celebrity faces, preprocessed and aligned for optimal model training.
- Preprocessing: cleaned and refined using advanced methods to reduce noise, mislabels, and inaccuracies.
- Applications: used in training state-of-the-art models like ArcFace for tasks such as identity verification, facial feature extraction, and more.
- License: users should verify compliance with ethical and licensing requirements before using or distributing the dataset.
- This dataset has been extensively used in academic and industrial research for benchmarking and developing cutting-edge face recognition systems.
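A minimal sketch of loading such data for training, assuming the images have been extracted into one directory per identity (the official release ships in a packed record format, so this layout is an assumption):

import tensorflow as tf

# Assumed layout: ms1m_arcface/<identity_id>/<image>.jpg
ds = tf.keras.utils.image_dataset_from_directory(
    "ms1m_arcface/",
    image_size=(112, 112),
    batch_size=128,
    label_mode="int",
)
print(len(ds.class_names))  # expected to match the 85,742 identities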
Please refer to the original source of this dataset for additional information. It's released here for academic purposes only.