VGGFace2 is a large-scale face recognition dataset. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession. VGGFace2 contains images from identities spanning a wide range of different ethnicities, accents, professions and ages. All face images are captured "in the wild", with pose and emotion variations and different lighting and occlusion conditions. The number of images per identity varies from 87 to 843, with an average of 362 images for each subject.
https://academictorrents.com/nolicensespecified
In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimise the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages to ensure a high accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 (with and without Squeeze-and-Excitation blocks) Convolutional Neural Networks on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 lead
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
THIS IS A CLONE OF https://github.com/NNNNAI/VGGFace2-HQ
VGGFace2-HQ
Related paper: TPAMI
The first open source high resolution dataset for face swapping!!!
A high-resolution version of VGGFace2 for academic face editing purposes. This project uses GFPGAN for image restoration and insightface for data preprocessing (cropping and alignment).
We provide a download link for users to download the data, and also provide guidance on how to generate the VGGFace2 dataset from… See the full description on the dataset page: https://huggingface.co/datasets/RichardErkhov/VGGFace2-HQ.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
@article{DBLP:journals/corr/abs-1710-08092, author = {Qiong Cao and Li Shen and Weidi Xie and Omkar M. Parkhi and Andrew Zisserman}, title = {VGGFace2: {A} dataset for recognising faces across pose and age}, journal = {CoRR}, volume = {abs/1710.08092}, year = {2017}, url = {http://arxiv.org/abs/1710.08092}, eprinttype = {arXiv}, eprint = {1710.08092}… See the full description on the dataset page: https://huggingface.co/datasets/ProgramComputer/VGGFace2.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
VGGFace2 Dataset and Face Mesh Preprocessing
Introduction
The VGGFace2 dataset is a large-scale face recognition dataset containing 3.31 million images of 9,131 identities, with an average of 362 images per identity. The dataset is designed to include extensive variations in pose, age, illumination, ethnicity, and profession, making it one of the most diverse and challenging face recognition datasets available. For more details, please refer to the original publication:
VGGFace2: A dataset for recognizing faces across pose and age - DOI: 10.48550/arXiv.1710.08092
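The headline figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes the total is exactly 3,310,000 (the "3.31 million" figure taken at face value), so the result is only approximate; the constant and function names are illustrative and not part of any dataset tooling.

```python
# Figures quoted in the dataset descriptions in this listing.
TOTAL_IMAGES = 3_310_000    # "3.31 million", i.e. rounded to three significant figures
NUM_IDENTITIES = 9_131
MIN_PER_IDENTITY = 87       # smallest number of images for one identity
MAX_PER_IDENTITY = 843      # largest number of images for one identity


def mean_images_per_identity(total_images: int, num_identities: int) -> float:
    """Return the average number of images per identity."""
    return total_images / num_identities


mean = mean_images_per_identity(TOTAL_IMAGES, NUM_IDENTITIES)

# The mean must lie between the per-identity minimum and maximum.
assert MIN_PER_IDENTITY <= mean <= MAX_PER_IDENTITY

print(f"mean images per identity: {mean:.1f}")
```

With the rounded total, the mean comes out at roughly 362.5, which agrees with the quoted averages of 362 and 362.6 to within the rounding of the image count.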
Preprocessing Using MediaPipe 3D Face Mesh
On this dataset, we applied the MediaPipe-based 3D face mesh algorithm to accurately detect faces while removing all background elements, including hair. Our preprocessing strictly retained facial landmarks, ensuring that only the essential facial features were preserved. This approach significantly enhanced the accuracy and generalization of our model, as the model was trained exclusively on landmark-based facial data.
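As a rough illustration of landmark-based background removal, the sketch below keeps only the pixels that fall inside the convex hull of a set of face landmarks and blanks everything else. It is not the authors' pipeline: it assumes the landmarks have already been produced (for example by a face mesh model) as (x, y) pixel coordinates, it assumes a convex hull is an acceptable face region, and all function names are hypothetical. It uses plain Python lists rather than an image library so that it runs without extra dependencies.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def inside_hull(hull: Sequence[Point], p: Point) -> bool:
    """True if p is inside or on the boundary of a counter-clockwise convex hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def mask_outside_face(image: List[List[int]],
                      landmarks: Sequence[Point],
                      fill: int = 0) -> List[List[int]]:
    """Replace every pixel outside the landmark hull with `fill`.

    Assumes at least three non-collinear landmarks.
    """
    hull = convex_hull(landmarks)
    return [
        [px if inside_hull(hull, (x, y)) else fill for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]


# Toy 5x5 "image" and five made-up landmarks forming a square with a centre point.
image = [[9] * 5 for _ in range(5)]
landmarks = [(1, 1), (3, 1), (3, 3), (1, 3), (2, 2)]
masked = mask_outside_face(image, landmarks)
for row in masked:
    print(row)
```

In a real pipeline the landmark list would come from the face mesh model and the mask would be applied to an image array; the hull-based region is only one possible choice, and a mesh-contour polygon would follow the face outline more tightly.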
Training and Performance
The preprocessed data was used to train an Xception model, which produced highly accurate results owing to the strictly landmark-based facial representation. The model demonstrated robust performance, including under explainable-AI analysis, indicating that eliminating unnecessary background elements contributed positively to its efficiency and reliability.
Citation
If you use this dataset or the preprocessed version in your work, please cite both of the following:
VGGFace2 Dataset:
@article{Cao2018VGGFace2,
title={VGGFace2: A dataset for recognizing faces across pose and age},
author={Cao, Qiong and Shen, Li and Xie, Weidi and Parkhi, Omkar M and Zisserman, Andrew},
journal={arXiv preprint arXiv:1710.08092},
year={2018}
}
DOI: [10.48550/arXiv.1710.08092](https://doi.org/10.48550/arXiv.1710.08092)
Preprocessed Dataset using MediaPipe:
@dataset{Shah2025_MediaPipe_FaceMesh,
title={MediaPipe-based 3D Face Mesh Preprocessed VGGFace2 Dataset},
author={Shah, Syed Taimoor Hussain and Shah, Syed Adil Hussain and Zamir, Ammara and Qayyum, Kainat and Shah, Syed Baqir Hussain and Fatima, Syeda Maryam and Deriu, Marco Agostino},
year={2025},
doi={10.5281/zenodo.15078557}
}
DOI: [10.5281/zenodo.15078557](https://doi.org/10.5281/zenodo.15078557)
Contact
For any questions or further details, please feel free to contact us.
Syed Taimoor Hussain Shah
PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy
Email: taimoor.shah@polito.it
ORCID: 0000-0002-6010-6777
RichardErkhov/VGGFace2 dataset hosted on Hugging Face and contributed by the HF Datasets community
This dataset was created by Ansari
This dataset was created by Shiv Sharma
This dataset was created by yu zhang zhang
This dataset was created by GreatGameDota
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This dataset was created by Dalton Omondi
Released under MIT
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
FAU detection results of the VGGFace2 model after retraining with the EmotioNet database.
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by YeppLiu
Released under Apache 2.0
This dataset was created by OrgMoon
Pretrained weights for face recognition.
For package compatibility reasons, the weights are split into two components: the feature weights and the logit weights. The facenet-pytorch package can load them automatically in that format once they are placed in the pytorch cache directory.
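To place the weight files where they can be found, it helps to know which directory is meant by the pytorch cache directory. The sketch below resolves a likely location using only the standard library. It assumes the usual PyTorch convention (`TORCH_HOME` if set, otherwise `XDG_CACHE_HOME/torch`, otherwise `~/.cache/torch`) and assumes the package reads from a `checkpoints` subfolder; both points may differ between package versions, so treat the result as a starting point to verify. The helper names are hypothetical, and the weight file names are deliberately left to the caller because the listing does not state them.

```python
import os
from typing import Iterable, List


def torch_checkpoint_dir() -> str:
    """Best-guess cache directory for pretrained checkpoints (an assumption)."""
    torch_home = os.path.expanduser(
        os.getenv(
            "TORCH_HOME",
            os.path.join(os.getenv("XDG_CACHE_HOME", "~/.cache"), "torch"),
        )
    )
    return os.path.join(torch_home, "checkpoints")


def missing_files(directory: str, filenames: Iterable[str]) -> List[str]:
    """Return the names from `filenames` that are not present in `directory`."""
    return [
        name for name in filenames
        if not os.path.isfile(os.path.join(directory, name))
    ]


print(torch_checkpoint_dir())
```

Once the feature and logit weight files are copied into that directory, constructing the model in facenet-pytorch, for example with `InceptionResnetV1(pretrained='vggface2')`, should pick them up without a download, provided the file names match what the installed package version expects.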
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was created by Vladimir Iglovikov
Released under Attribution 4.0 International (CC BY 4.0)
This dataset was created by Arun George
This dataset was created by Bao Hoa