CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and background clutter. CelebA offers large diversity, large quantity, and rich annotations, including:
- 10,177 identities
- 202,599 face images
- 5 landmark locations and 40 binary attribute annotations per image
The dataset can be employed as the training and test sets for the following computer vision tasks: face attribute recognition, face detection, and landmark (or facial part) localization.
Note: The CelebA dataset may contain bias. The fairness indicators example goes into detail about several considerations to keep in mind while using the CelebA dataset.
To use this dataset:
import tensorflow_datasets as tfds
ds = tfds.load('celeb_a', split='train')
for ex in ds.take(4):
    print(ex)
See the guide for more information on tensorflow_datasets.
Visualization: https://storage.googleapis.com/tfds-data/visualization/fig/celeb_a-2.1.0.png
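Each record carries 40 binary attributes and 5 landmark locations, so it can help to condense one record into a short summary. The sketch below is a minimal illustration, not part of the dataset's tooling. It assumes a decoded record is a dict with an "attributes" mapping (name to boolean) and a "landmarks" mapping with keys such as "nose_x" and "nose_y"; the helper name `summarize_example` and the toy record are hypothetical.

```python
def summarize_example(ex):
    """Return (sorted names of active attributes, landmark name -> (x, y))."""
    # Keep only the attributes whose flag is set.
    active = sorted(name for name, flag in ex["attributes"].items() if bool(flag))

    # Group "<part>_x" / "<part>_y" entries into one (x, y) pair per facial part.
    points = {}
    for key, value in ex["landmarks"].items():
        part, axis = key.rsplit("_", 1)  # e.g. "nose_x" -> ("nose", "x")
        points.setdefault(part, {})[axis] = int(value)
    landmarks = {part: (xy["x"], xy["y"]) for part, xy in points.items()}
    return active, landmarks


# Toy record shaped like the assumed structure (values are made up).
toy = {
    "attributes": {"Smiling": True, "Eyeglasses": False},
    "landmarks": {"nose_x": 88, "nose_y": 112},
}
print(summarize_example(toy))
```

With the loading snippet above, records converted through `tfds.as_numpy(ds.take(4))` could be passed to the same helper, provided the feature keys match the assumption.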
These data were collected to develop a means of identifying those individuals most likely to be dangerous to others because of their pursuit of public figures. Another objective of the study was to gather detailed quantitative information on harassing and threatening communications to public figures and to determine what aspects of written communications are predictive of future behavior. Based on the fact that each attack by a mentally disordered person in which an American public figure was wounded had occurred in connection with a physical approach within 100 yards, the investigators reasoned that accurate predictions of such physical approaches could serve as proxies for the less feasible task of accurate prediction of attacks. The investigators used information from case files of subjects who had pursued two groups of public figures, politicians and celebrities. The data were drawn from the records of the United States Capitol Police and a prominent Los Angeles-based security consulting firm, Gavin de Becker, Inc. Information was gathered from letters and other communications of the subjects, as well as any other sources available, such as police records or descriptions of what occurred during interviews. The data include demographic information such as sex, age, race, marital status, religion, and education, family history information, background information such as school and work records, military history, criminal history, number of communications made, number of threats made, information about subjects' physical appearance, psychological and emotional evaluations, information on travel/mobility patterns, and approaches made.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Results of cross-database experiments on the Celeb-DF dataset (AUC).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the Webis Celebrity Corpus 2019 from the paper "Celebrity Profiling" at ACL 2019.
Code: https://github.com/webis-de/ACL-19
Publication: https://aclanthology.org/P19-1249/.
Citation: https://webis.de/publications.html?q=wiegmann_2019a
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Table of cases from the library of West Coast Trail Lawyers that target celebrities or famous stars in the USA
Paper: https://webis.de/publications.html?q=wiegmann_2019a
Source Dataset: https://files.webis.de/data-in-progress/data-research/social-media-analysis/acl19-celebrity-profiling/
Celebrities are among the most prolific users of social media, promoting their personas and rallying followers. This activity is closely tied to genuine writing samples, rendering them worthy research subjects in many respects, not least author profiling. The Celebrity Profiling task this year is to predict four traits of a celebrity from their social media communication: degree of fame, occupation, age, and gender. The social media communication is given as the teaser messages from past tweets. The goal is to develop a piece of software which predicts celebrity traits from the teaser history.
The training dataset contains two files: feeds.ndjson as input and labels.ndjson as output. Each file lists all celebrities as JSON objects, one per line, identified by the id key.
The input file contains the cid (the celebrity's id, stored under the id key) and a list of all teaser messages for each celebrity:
{"id": 1234, "text": ["a tweet", "another tweet", ...]}
The output file contains the cid and a value for each trait for each celebrity from the input file:
{"id": 1234, "fame": "star", "occupation": "sports", "gender": "female", "birthyear": 2002}
The following values are possible for each of the traits:
fame := {rising, star, superstar}
occupation := {sports, performer, creator, politics, manager, science, professional, religious}
birthyear := {1940, ..., 2012}
gender := {male, female, nonbinary}
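The two-file layout described above (one JSON object per line, matched on the id key) can be read and joined roughly as follows. This is a minimal sketch under the assumption that both files fit in memory; the function names `read_ndjson` and `join_feeds_and_labels` are hypothetical and not part of the task's tooling.

```python
import json


def read_ndjson(path):
    """Read one JSON object per line into a dict keyed by its "id" field."""
    records = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            obj = json.loads(line)
            records[obj["id"]] = obj
    return records


def join_feeds_and_labels(feeds_path, labels_path):
    """Pair each celebrity's list of messages with its trait labels."""
    feeds = read_ndjson(feeds_path)
    labels = read_ndjson(labels_path)
    # Keep only celebrities that appear in both files.
    return [(feeds[cid]["text"], labels[cid]) for cid in feeds if cid in labels]
```

Calling `join_feeds_and_labels("feeds.ndjson", "labels.ndjson")` would return a list of (messages, labels) pairs, where each labels dict holds the fame, occupation, gender, and birthyear values.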
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Literary celebrity and public life in the nineteenth-century United States is a book. It was written by Bonnie Carr O'Neill and published by The University of Georgia Press in 2017.
Subscribers can look up export and import data for 23 countries by HS code or product name. This demo is helpful for market analysis.
This dataset provides information about the number of properties, residents, and average property values for Celebrity Court cross streets in Bristol, VA.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The psychology of celebrity is a book. It was written by Gayle Stever and published by Routledge in 2018.
This dataset provides information about the number of properties, residents, and average property values for Celebrity Circle cross streets in Hanover Park, IL.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Obsession : celebrities and their stalkers is a book. It was written by David Harvey and published by WW Norton in 2002.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes all the processed data used for experimentation in the article "Perfilado Demográficos de Celebridades en Redes Sociales" - "Demographic Profiling of Celebrities in Social Networks", published in the journal Research in Computer Science. The dataset is a processed version of the training part from the CLEF 2020 celebrity profiling task (https://pan.webis.de/clef20/pan20-web/celebrity-profiling.html). The dataset consists of 5,066,608 tweets corresponding to 1,920 Twitter celebrities. All the tweets are in English. The dataset includes several files:
1. The 5,066,608 tweets in English
2. Four files indicating the gender, age, occupation and user associated with each tweet.
3. A list of 1374 common English abbreviations used in social networks
4. The five features extracted from the tweets and used for the experiments: words, emoticons/emojis, hashtags, ats (@ mentions), abbreviations
This dataset provides information about the number of properties, residents, and average property values for Celebrity Lane cross streets in Fishersville, VA.
No description was included in this Dataset collected from the OSF
According to a database tracking online trends, Keung To stood out as the most popular young-generation male celebrity online in Hong Kong. Between November 14, 2024 and February 11, 2025, the 25-year-old singer attracted well over 111 thousand online posts and comments, streets ahead of second-placed Hins Cheung.
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This study develops a comprehensive framework, grounded in relevant theories and empirical evidence, to classify distinct fan types among Chinese emerging adults; the framework is then validated with empirical data. The associations between fan types, mental health outcomes, gratification from celebrity worship, habitual internet use, and problematic internet use were also examined. Qualitative interviews were also conducted to capture the individual differences and institutional forces that influence individuals' celebrity worship experience in China.
Underway ship data provided by OCED/Atlantic Oceanographic & Meteorological Laboratory (AOML). Only location and time are provided.
cdm_data_type=Point
contact_email=Joaquin.Trinanes@noaa.gov
contact_info=Atlantic Oceanographic and Meteorological Laboratory (AOML) 4301 Rickenbacker Causeway Miami, FL 33149
Conventions=COARDS, CF-1.6, ACDD-1.3, NCCSV-1.2
Easternmost_Easting=-89.2253
featureType=Point
geospatial_lat_max=0.2139
geospatial_lat_min=-1.3631
geospatial_lat_units=degrees_north
geospatial_lon_max=-89.2253
geospatial_lon_min=-91.6378
geospatial_lon_units=degrees_east
infoUrl=https://www.aoml.noaa.gov/ocd/ocdweb/index.html
institution=AOML
Northernmost_Northing=0.2139
sourceUrl=(local files)
Southernmost_Northing=-1.3631
standard_name_vocabulary=CF Standard Name Table v29
time_coverage_end=2020-02-06T15:47:04Z
time_coverage_start=2019-07-14T00:15:52Z
Westernmost_Easting=-91.6378
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Objectives: To estimate the transmissibility of the Ice Bucket Challenge among globally influential celebrities and to identify associated risk factors.
Design: Retrospective cohort study.
Setting: Social media (YouTube, Facebook, Twitter, Instagram).
Participants: David Beckham, Cristiano Ronaldo, Benedict Cumberbatch, Stephen Hawking, Mark Zuckerberg, Oprah Winfrey, Homer Simpson, and Kermit the Frog were defined as index cases. We included contacts up to the fifth generation seeded from each index case and enrolled a total of 99 participants into the cohort.
Main outcome measures: Basic reproduction number R0, serial interval of accepting the challenge, and odds ratios of associated risk factors based on fully observed nomination chains; R0 is a measure of transmissibility and is defined as the number of secondary cases generated by a single index in a fully susceptible population. Serial interval is the duration between onset of a primary case and onset of its secondary cases.
Results: Based on the empirical data and assuming a branching process we estimated a mean R0 of 1.43 (95% confidence interval 1.23 to 1.65) and a mean serial interval for accepting the challenge of 2.1 days (median 1 day). Higher log (base 10) net worth of the participants was positively associated with transmission (odds ratio 1.63, 95% confidence interval 1.06 to 2.50), adjusting for age and sex.
Conclusions: The Ice Bucket Challenge was moderately transmissible among a group of globally influential celebrities, in the range of the pandemic A/H1N1 2009 influenza. The challenge was more likely to be spread by richer celebrities, perhaps in part reflecting greater social influence.
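As a rough illustration of the outcome measures defined above: under a simple branching-process view, R0 can be estimated as the mean number of secondary cases per fully observed case, and the serial interval can be summarized by its mean and median. The abstract does not state the exact estimator the authors used, so this sketch is an assumption; the function names and the toy numbers are hypothetical and are not the study's data.

```python
from statistics import mean, median


def estimate_r0(secondary_counts):
    """Mean number of secondary cases (accepted nominations) per case."""
    if not secondary_counts:
        raise ValueError("need at least one fully observed case")
    return mean(secondary_counts)


def summarize_serial_intervals(days):
    """Return (mean, median) of the days between a case and its secondary cases."""
    return mean(days), median(days)


# Hypothetical toy values, chosen only to show the calculation.
toy_secondary_counts = [3, 1, 0, 2, 1]
toy_intervals_days = [1, 1, 2, 5]

print(estimate_r0(toy_secondary_counts))
print(summarize_serial_intervals(toy_intervals_days))
```

A mean interval above the median, as in the toy values and in the reported 2.1 days versus 1 day, indicates a right-skewed distribution: most nominees respond quickly and a few take much longer.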