100+ datasets found
  1. Popular Baby Names

    • kaggle.com
    zip
    Updated Mar 21, 2023
    Cite
    Ulrik Thyge Pedersen (2023). Popular Baby Names [Dataset]. https://www.kaggle.com/datasets/ulrikthygepedersen/baby-names
    Explore at:
    Available download formats: zip (12903 bytes)
    Dataset updated
    Mar 21, 2023
    Authors
    Ulrik Thyge Pedersen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The popularity of baby names is a fascinating reflection of our society's cultural trends and values over time. The dataset on the most popular baby names from 1880 until now provides a comprehensive look at the evolution of naming practices in the United States over the last 140 years. The dataset includes information on the top 1000 baby names for each year, as well as the number of babies given each name, broken down by gender.

    By analyzing this dataset, researchers can identify trends and patterns in baby naming, such as the rise and fall of certain names, the influence of popular culture on naming trends, and the impact of immigration on naming practices. This dataset is a valuable resource for researchers, parents, and anyone interested in exploring the social and cultural history of the United States.

  2. Popular Baby Names

    • catalog.data.gov
    • data.cityofnewyork.us
    • +5 more
    Updated Jul 12, 2025
    + more versions
    Cite
    data.cityofnewyork.us (2025). Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/popular-baby-names
    Explore at:
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    Popular Baby Names by Sex and Ethnic Group Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.
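
The caution above can be turned into a simple filter. The following is a minimal sketch, not part of the dataset: the function name, the `margin` tolerance, and the sample records are assumptions made for illustration; only the threshold of 10 comes from the description.

```python
def rank_is_fragile(count, threshold=10, margin=5):
    """Return True when `count` is within `margin` of `threshold`.

    `threshold=10` comes from the caution in the description; `margin`
    is an assumed tolerance chosen only for this sketch.
    """
    return abs(count - threshold) <= margin


# Hypothetical (name, count, rank) records; all values are invented.
records = [("Name A", 320, 1), ("Name B", 12, 48), ("Name C", 10, 50)]

# Keep the names whose rank should be read with caution.
fragile = [name for name, count, _rank in records if rank_is_fragile(count)]
print(fragile)
```

Ranks flagged this way may be better compared across several years than read from a single year.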

  3. Baby Names by Year

    • kaggle.com
    zip
    Updated Sep 20, 2022
    Cite
    The Devastator (2022). Baby Names by Year [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-baby-names-by-year-of-birth/code
    Explore at:
    Available download formats: zip (9916059 bytes)
    Dataset updated
    Sep 20, 2022
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About this dataset

    This dataset contains US baby names from the Social Security Administration dating back to 1879. With over 140 years of data, this is one of the most comprehensive datasets on baby names in the US. The data includes the name, year of birth, sex, and number of babies with that name for each year. This dataset is a great resource for anyone interested in studying baby naming trends over time.

    How to use the dataset

    How to use the US Baby Names by Year of Birth dataset:

    This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes baby names, year of birth, and sex, plus a column for the number of babies given that name in that year.

    This dataset can be used to track changes in baby naming trends over time, or to study how the popularity of individual names has changed. It can also be used to study how naming trends differ between sexes or between years.

    Research Ideas

    This dataset could be used for a number of things, including:

    1. Determining baby name trends over time
    2. Finding out what the most popular baby names are in the US
    3. Analyzing how baby name popularity has changed over the years

    Columns

    • index: the index of the dataframe
    • YearOfBirth: the year in which the baby was born
    • Name: the name of the baby
    • Sex: the sex of the baby
    • Number: the number of babies with that name and sex
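
As a rough illustration of how the columns listed above might be used, the sketch below finds the most common name per year and sex. The column names come from the listing; the sample rows, their values, and the function name `top_names` are invented for the example, and the real file name is not given here.

```python
import csv
import io

# Hypothetical sample using the documented column names.
# The values are illustrative only and are not taken from the dataset.
SAMPLE_CSV = """index,YearOfBirth,Name,Sex,Number
0,1880,Mary,F,700
1,1880,Anna,F,260
2,1880,John,M,960
3,1880,William,M,950
4,1881,Mary,F,690
5,1881,John,M,870
"""


def top_names(csv_text):
    """Return {(year, sex): (name, count)} for the most common name per group."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["YearOfBirth"]), row["Sex"])
        count = int(row["Number"])
        # Keep the row with the largest count seen so far for this group.
        if key not in best or count > best[key][1]:
            best[key] = (row["Name"], count)
    return best


if __name__ == "__main__":
    for (year, sex), (name, count) in sorted(top_names(SAMPLE_CSV).items()):
        print(year, sex, name, count)
```

To run this against the downloaded file, the `io.StringIO` wrapper would be replaced by an opened file handle.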

    Acknowledgements

    If you use this dataset in your research, please credit @nickgott, @rflprr, and the Social Security Administration via Data.gov.

    Data Source

  4. Most Popular Baby Names in NYC

    • kaggle.com
    zip
    Updated Mar 15, 2020
    Cite
    Prashant Banerjee (2020). Most Popular Baby Names in NYC [Dataset]. https://www.kaggle.com/datasets/prashant111/most-popular-baby-names-in-nyc
    Explore at:
    Available download formats: zip (88421 bytes)
    Dataset updated
    Mar 15, 2020
    Authors
    Prashant Banerjee
    Area covered
    New York
    Description

    DESCRIPTION

    The most popular baby names by sex and mother's ethnicity in New York City from 2011-2014.

    SUMMARY

    Popular Baby Name Data In NYC from 2011-2014

    Rows: 13962; Columns: 6

    The data include items, such as:

    • BRTH_YR: birth year of the baby
    • GNDR: gender
    • ETHCTY: mother's ethnicity
    • NM: baby's name
    • CNT: count of the name
    • RNK: ranking of the name
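
A short sketch of reading these six columns and picking the top-ranked name per year, gender, and ethnicity group follows. The column names are the ones listed above; the sample rows, the category spellings, and the function name `rank_one` are assumptions made for illustration.

```python
import csv
import io

# Hypothetical rows with the documented column names; values are invented.
SAMPLE = """BRTH_YR,GNDR,ETHCTY,NM,CNT,RNK
2011,FEMALE,HISPANIC,NAME_A,330,1
2011,FEMALE,HISPANIC,NAME_B,230,2
2011,MALE,HISPANIC,NAME_C,420,1
2012,FEMALE,HISPANIC,NAME_A,325,1
"""


def rank_one(csv_text):
    """Return {(year, gender, ethnicity): name} for rows whose RNK is 1."""
    out = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["RNK"]) == 1:
            key = (int(row["BRTH_YR"]), row["GNDR"], row["ETHCTY"])
            out[key] = row["NM"]
    return out


if __name__ == "__main__":
    for key, name in sorted(rank_one(SAMPLE).items()):
        print(key, name)
```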

    Source: NYC Open Data

  5. Most Popular Baby Names

    • catalog.data.gov
    • data.chhs.ca.gov
    • +3 more
    Updated Nov 23, 2025
    + more versions
    Cite
    California Department of Public Health (2025). Most Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/most-popular-baby-names-810d5
    Explore at:
    Dataset updated
    Nov 23, 2025
    Dataset provided by
    California Department of Public Health
    Description

    This dataset contains ranks and counts for the top 25 baby names by sex for live births that occurred in California (by occurrence) based on information entered on birth certificates.

  6. Baby Names from Social Security Card Applications - National Data

    • catalog.data.gov
    Updated Jul 4, 2025
    + more versions
    Cite
    Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Social Security Administration: http://ssa.gov/
    Description

    The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.

  7. U.S. First Names: Popularity and Counts

    • kaggle.com
    zip
    Updated Jun 9, 2025
    Cite
    Daniel Fedorov (2025). U.S. First Names: Popularity and Counts [Dataset]. https://www.kaggle.com/datasets/downshift/u-s-first-names-popularity-and-counts
    Explore at:
    Available download formats: zip (2425 bytes)
    Dataset updated
    Jun 9, 2025
    Authors
    Daniel Fedorov
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description


    This dataset contains counts and rankings of the most common first names in the United States, sourced from comprehensive name census data. It is ideal for analyzing naming trends, demographic patterns, and cultural preferences, as well as for building statistical models to explore name popularity over time.

    Dataset structure

    male_first_names.csv: Male first name frequencies and rankings in the U.S.

    female_first_names.csv: Female first name frequencies and rankings in the U.S.

  8. Most Popular Baby Names - 8ia4-svqc - Archive Repository

    • healthdata.gov
    csv, xlsx, xml
    Updated Nov 7, 2025
    Cite
    (2025). Most Popular Baby Names - 8ia4-svqc - Archive Repository [Dataset]. https://healthdata.gov/dataset/Most-Popular-Baby-Names-8ia4-svqc-Archive-Reposito/hwxa-t8ig
    Explore at:
    Available download formats: xml, csv, xlsx
    Dataset updated
    Nov 7, 2025
    Description

    This dataset tracks the updates made on the dataset "Most Popular Baby Names" as a repository for previous versions of the data and metadata.

  9. NYC Most Popular Baby Names

    • kaggle.com
    zip
    Updated Jan 1, 2021
    + more versions
    Cite
    City of New York (2021). NYC Most Popular Baby Names [Dataset]. https://www.kaggle.com/datasets/new-york-city/nyc-most-popular-baby-names/discussion
    Explore at:
    Available download formats: zip (179712 bytes)
    Dataset updated
    Jan 1, 2021
    Dataset authored and provided by
    City of New York
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    New York
    Description

    Content

    Popular Baby Names by Sex and Ethnic Group Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.

    Context

    This is a dataset hosted by the City of New York. The city has an open data platform found here, and they update their information according to the amount of data that is brought in. Explore New York City using Kaggle and all of the data sources available through the City of New York organization page!

    • Update Frequency: This dataset is updated annually.

    Acknowledgements

    This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.

    Cover photo by freestocks.org on Unsplash
    Unsplash Images are distributed under a unique Unsplash License.

  10. Top 100 baby names in England and Wales: historical data

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Jul 31, 2025
    Cite
    Office for National Statistics (2025). Top 100 baby names in England and Wales: historical data [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalestop100babynameshistoricaldata
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Office for National Statistics: http://www.ons.gov.uk/
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Historic lists of top 100 names for baby boys and girls for 1904 to 2024 at 10-yearly intervals.

  11. Namesakes

    • figshare.com
    json
    Updated Nov 20, 2021
    Cite
    Oleg Vasilyev; Aysu Altun; Nidhi Vyas; Vedant Dharnidharka; Erika Lampert; John Bohannon (2021). Namesakes [Dataset]. http://doi.org/10.6084/m9.figshare.17009105.v1
    Explore at:
    Available download formats: json
    Dataset updated
    Nov 20, 2021
    Dataset provided by
    Figshare: http://figshare.com/
    Authors
    Oleg Vasilyev; Aysu Altun; Nidhi Vyas; Vedant Dharnidharka; Erika Lampert; John Bohannon
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Abstract

    Motivation: creating a challenging dataset for testing Named-Entity Linking. The Namesakes dataset consists of three closely related datasets: Entities, News and Backlinks. Entities were collected as Wikipedia text chunks corresponding to highly ambiguous entity names. The News were collected as random news text chunks, containing mentions that either belong to the Entities dataset or can be easily confused with them. Backlinks were obtained from Wikipedia dump data with the intention of having mentions linked to the entities of the Entity dataset. The Entities and News are human-labeled, resolving the mentions of the entities.

    Methods

    Entities were collected as Wikipedia text chunks corresponding to highly ambiguous entity names: the most popular people names, the most popular locations, and organizations with name ambiguity. In each Entities text chunk, the named entities with a name similar to the chunk's Wikipedia page name are labeled. For labeling, these entities were suggested to human annotators (odetta.ai) to tag as "Same" (same as the page entity) or "Other". The labeling was done by 6 experienced annotators who had passed a preliminary trial task. The only accepted tags are those assigned in agreement by at least 5 annotators and then passed through reconciliation with an experienced reconciliator.

    The News were collected as random news text chunks, containing mentions which either belong to the Entities dataset or can be easily confused with them. In each News text chunk one mention was selected for labeling, and 3-10 Wikipedia pages from Entities were suggested as the labels for an annotator to choose from. The labeling was done by 3 experienced annotators (odetta.ai), after the annotators passed a preliminary trial task. The results were reconciled by an experienced reconciliator. All the labeling was done using Lighttag (lighttag.io).

    Backlinks were obtained from Wikipedia dump data (dumps.wikimedia.org/enwiki/20210701) with intention to have mentions linked to the entities of the Entity dataset. The backlinks were filtered to leave only mentions in a good quality text; each text was cut 1000 characters after the last mention.

    Usage Notes

    Entities:

    File: Namesakes_entities.jsonl

    The Entities dataset consists of 4148 Wikipedia text chunks containing human-tagged mentions of entities. Each mention is tagged either as "Same" (meaning that the mention is of this Wikipedia page entity), or "Other" (meaning that the mention is of some other entity, just having the same or similar name). The Entities dataset is a jsonl list; each item is a dictionary with the following keys and values:

    • 'pagename': page name of the Wikipedia page.
    • 'pageid': page id of the Wikipedia page.
    • 'title': title of the Wikipedia page.
    • 'url': URL of the Wikipedia page.
    • 'text': the text chunk from the Wikipedia page.
    • 'entities': list of the mentions in the page text; each entity is represented by a dictionary with the keys:
        • 'text': the mention as a string from the page text.
        • 'start': start character position of the entity in the text.
        • 'end': end (one-past-last) character position of the entity in the text.
        • 'tag': annotation tag given as a string, either 'Same' or 'Other'.
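
The record layout described above can be read with a few lines of Python. This is a sketch under stated assumptions: the sample record, its URL, and the function name `check_and_count` are invented for illustration; only the key names and the one-past-last `end` convention come from the description.

```python
import json
from collections import Counter

# One made-up record that follows the documented keys.
SAMPLE_JSONL = json.dumps({
    "pagename": "John_Smith_(example)",
    "pageid": 1,
    "title": "John Smith (example)",
    "url": "https://example.org/wiki/John_Smith_(example)",
    "text": "John Smith was a painter. Another John Smith was a sailor.",
    "entities": [
        {"text": "John Smith", "start": 0, "end": 10, "tag": "Same"},
        {"text": "John Smith", "start": 34, "end": 44, "tag": "Other"},
    ],
})


def check_and_count(jsonl_text):
    """Verify that each mention matches text[start:end] and count the tags."""
    tags = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for ent in record["entities"]:
            # 'end' is one-past-last, so a plain Python slice recovers the mention.
            span = record["text"][ent["start"]:ent["end"]]
            if span != ent["text"]:
                raise ValueError(f"offset mismatch in page {record['pageid']}")
            tags[ent["tag"]] += 1
    return tags


if __name__ == "__main__":
    print(check_and_count(SAMPLE_JSONL))
```

For the real file, `jsonl_text` would be the contents of Namesakes_entities.jsonl.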

    News:

    File: Namesakes_news.jsonl

    The News dataset consists of 1000 news text chunks, each one with a single annotated entity mention. The annotation either points to the corresponding entity from the Entities dataset (if the mention is of that entity), or indicates that the mentioned entity does not belong to the Entities dataset. The News dataset is a jsonl list; each item is a dictionary with the following keys and values:

    • 'id_text': id of the sample.
    • 'text': the text chunk.
    • 'urls': list of URLs of Wikipedia entities suggested to labelers for identification of the entity mentioned in the text.
    • 'entity': a dictionary describing the annotated entity mention in the text:
        • 'text': the mention as a string found by an NER model in the text.
        • 'start': start character position of the mention in the text.
        • 'end': end (one-past-last) character position of the mention in the text.
        • 'tag': this key exists only if the mentioned entity is annotated as belonging to the Entities dataset; if so, the value is a dictionary identifying the Wikipedia page assigned by annotators to the mentioned entity:
            • 'pageid': Wikipedia page id.
            • 'pagetitle': page title.
            • 'url': page URL.
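
Because the 'tag' key is present only for mentions linked to the Entities dataset, code that reads News records needs to treat it as optional. A minimal sketch follows; the two sample records, their URLs, and the function name `linked_pageid` are invented for illustration.

```python
# Two made-up News records following the documented keys.
records = [
    {
        "id_text": "n1",
        "text": "John Smith spoke on Monday.",
        "urls": ["https://example.org/wiki/John_Smith"],
        "entity": {
            "text": "John Smith", "start": 0, "end": 10,
            "tag": {"pageid": 1, "pagetitle": "John Smith",
                    "url": "https://example.org/wiki/John_Smith"},
        },
    },
    {
        "id_text": "n2",
        "text": "Jane Doe arrived late.",
        "urls": ["https://example.org/wiki/Jane_Doe"],
        # No 'tag' key: the mention was not linked to the Entities dataset.
        "entity": {"text": "Jane Doe", "start": 0, "end": 8},
    },
]


def linked_pageid(record):
    """Return the annotated Wikipedia page id, or None when 'tag' is absent."""
    tag = record["entity"].get("tag")
    return None if tag is None else tag["pageid"]


if __name__ == "__main__":
    for rec in records:
        print(rec["id_text"], linked_pageid(rec))
```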

    Backlinks dataset: The Backlinks dataset consists of two parts: dictionary Entity-to-Backlinks and Backlinks documents. The dictionary points to backlinks for each entity of the Entity dataset (if any backlinks exist for the entity). The Backlinks documents are the backlinks Wikipedia text chunks with identified mentions of the entities from the Entities dataset.

    Each mention is identified by surrounding double square brackets, e.g. "Muir built a small cabin along [[Yosemite Creek]].". However, if the mention differs from the exact entity name, the double square brackets wrap both the exact name and, separated by '|', the mention string to the right, for example: "Muir also spent time with photographer [[Carleton E. Watkins | Carleton Watkins]] and studied his photographs of Yosemite.".
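
The bracket convention above can be parsed with a regular expression. The sketch below is one possible approach, not the authors' tooling: the pattern and the function name `parse_mentions` are assumptions, and whitespace around '|' is stripped on the assumption that it is not significant.

```python
import re

# [[Entity name]] or [[Entity name | mention string]]
MENTION_RE = re.compile(r"\[\[([^\[\]|]+?)(?:\|([^\[\]]+?))?\]\]")


def parse_mentions(text):
    """Return a list of (entity_name, mention_string) pairs found in text."""
    results = []
    for match in MENTION_RE.finditer(text):
        entity = match.group(1).strip()
        # When there is no '|', the mention string equals the entity name.
        surface = (match.group(2) or match.group(1)).strip()
        results.append((entity, surface))
    return results


if __name__ == "__main__":
    print(parse_mentions("Muir built a small cabin along [[Yosemite Creek]]."))
    print(parse_mentions(
        "Muir also spent time with photographer "
        "[[Carleton E. Watkins | Carleton Watkins]] and studied his photographs."
    ))
```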

    The Entity-to-Backlinks is a jsonl with 1527 items.

    File: Namesakes_backlinks_entities.jsonl

    Each item is a tuple:

    • Entity name.
    • Entity Wikipedia page id.
    • Backlinks ids: a list of pageids of backlink documents.

    The Backlinks documents is a jsonl with 26903 items.

    File: Namesakes_backlinks_texts.jsonl

    Each item is a dictionary:

    • 'pageid': id of the Wikipedia page.
    • 'title': title of the Wikipedia page.
    • 'content': text chunk from the Wikipedia page, with all mentions in the double brackets; the text is cut 1000 characters after the last mention, and the cut is denoted as '...[CUT]'.
    • 'mentions': list of the mentions from the text, for convenience. Each mention is a tuple:
        • Entity name.
        • Entity Wikipedia page id.
        • Sorted list of all character indexes at which the mention occurrences start in the text.
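
The two Backlinks files can be joined on page id. The following sketch assumes that each tuple is serialized as a JSON array and that the '...[CUT]' marker sits at the end of the content; neither detail is confirmed by the description. The sample items, page ids, and function names are invented, and the 'mentions' key is omitted from the samples for brevity.

```python
# Made-up items mimicking the two documented files.
entity_items = [["Yosemite Creek", 101, [201, 202]]]
documents = [
    {"pageid": 201, "title": "Doc One",
     "content": "Muir built a small cabin along [[Yosemite Creek]]. More text...[CUT]"},
    {"pageid": 202, "title": "Doc Two",
     "content": "A trail follows [[Yosemite Creek]]."},
]

CUT_MARKER = "...[CUT]"


def backlinks_for(items, docs):
    """Map each entity name to the titles of its backlink documents."""
    by_id = {doc["pageid"]: doc for doc in docs}
    result = {}
    for name, _entity_pageid, backlink_ids in items:
        # Skip ids that have no matching document rather than failing.
        result[name] = [by_id[i]["title"] for i in backlink_ids if i in by_id]
    return result


def was_cut(doc):
    """True when the content ends with the cut marker (assumed position)."""
    return doc["content"].endswith(CUT_MARKER)


if __name__ == "__main__":
    print(backlinks_for(entity_items, documents))
    print([was_cut(d) for d in documents])
```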

  12. Popular Baby Names - Dataset - data.sa.gov.au

    • data.sa.gov.au
    Updated Mar 1, 2025
    + more versions
    Cite
    (2025). Popular Baby Names - Dataset - data.sa.gov.au [Dataset]. https://data.sa.gov.au/data/dataset/popular-baby-names
    Explore at:
    Dataset updated
    Mar 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Australia
    Description

    List of male and female baby names in South Australia from 1944 to 2024. The annual data for baby names is published January/February each year.

  13. Forest Common Names (Feature Layer)

    • catalog.data.gov
    • datasets.ai
    • +5 more
    Updated Apr 21, 2025
    + more versions
    Cite
    U.S. Forest Service (2025). Forest Common Names (Feature Layer) [Dataset]. https://catalog.data.gov/dataset/forest-common-names-feature-layer-7d8b7
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    U.S. Department of Agriculture Forest Service: http://fs.fed.us/
    Description

    This dataset contains the common names of the national forests and grasslands and their respective FS WWW URL information, which is used both for display of the national forest and national grassland boundaries on any map product and for dynamic interactivity of the map.

    This dataset exhibits the following characteristics:

    1. Granularity of the polygon features: the spatial extent of the national forests and the grasslands matches the way the agency would like to communicate with the public.
    2. Preferred/Common Name of the National Forest Units: the common names of the national forests and grasslands match the preferred name column that is present in the common names decision table maintained by the FS Office of Communication.
    3. Hyperlinks to FS WWW Home page: this column contains the national forests and their respective FS WWW URL information. This URL could be used in any interactive map application to link users directly to a forest's home page.

    Data Source: this dataset is derived from the following FS ALP (Automated Lands Program) Land Status Records System authoritative data sources:

    1. Administrative Forest Boundaries
    2. Proclaimed Forest Boundaries
    3. Ranger District Boundaries
    4. National Grassland Areas

    The common names decision table maintained by the FS Office of Communication contains the common name and its respective Land Status Records System authoritative data source to be used for building the spatial polygon. The spatial polygons for every feature in this dataset come from one or more of the authoritative data sources listed above. The process to create the common names dataset reuses the existing ALP names from those data sources.

  14. Top 100 Baby Names

    • data.qld.gov.au
    • researchdata.edu.au
    • +1 more
    csv
    Updated Feb 13, 2025
    + more versions
    Cite
    Justice (2025). Top 100 Baby Names [Dataset]. https://www.data.qld.gov.au/dataset/top-100-baby-names
    Explore at:
    Available download formats: csv (2 KiB), csv, csv (200 KiB)
    Dataset updated
    Feb 13, 2025
    Dataset authored and provided by
    Justice
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    Queensland Top 100 Baby Names

  15. Most Popular Names in the Philippines Dataset

    • kaggle.com
    zip
    Updated Jun 25, 2023
    Cite
    Joriz ivann Villanueva (2023). Most Popular Names in the Philippines Dataset [Dataset]. https://www.kaggle.com/datasets/jorizivannvillanueva/most-popular-names-in-philippines-dataset
    Explore at:
    Available download formats: zip (13981 bytes)
    Dataset updated
    Jun 25, 2023
    Authors
    Joriz ivann Villanueva
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Area covered
    Philippines
    Description

    Overview

    The Most Popular Names in the Philippines dataset provides insights into the popularity of different names in the Philippines.

    Content

    The dataset includes the following fields:

    • rank: The position of the name when graded by incidence with all other names in the place.
    • forename: The personal name given to an individual at or shortly after birth, also known as a first name.
    • incidence: Number of people who bear the name.
    • frequency: Ratio and percentage of people who bear the name.
    • gender: The gender of the specific name based on the percentage.
    • gender_percentage: The percentage of bearers who are male or female.

    Potential Use Cases

    This dataset can be used for various purposes, such as:

    • Analyzing naming trends in the Philippines.
    • Exploring the gender distribution of popular names.
    • Conducting research on cultural naming practices.
    • Studying the popularity and prevalence of specific names.

  16. A corpus of names drawn from the local birth registers of England and Wales,...

    • dtechtive.com
    • find.data.gov.scot
    txt, xlsx, zip
    Updated Jan 25, 2018
    Cite
    University of Edinburgh (2018). A corpus of names drawn from the local birth registers of England and Wales, 1838-2014 [Dataset]. http://doi.org/10.7488/ds/2294
    Explore at:
    Available download formats: xlsx (30.21 MB), zip (5.395 MB), txt (0.0166 MB)
    Dataset updated
    Jan 25, 2018
    Dataset provided by
    University of Edinburgh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    UNITED KINGDOM
    Description

    This dataset comprises a corpus of names, in both the first and middle position, for approximately 22 million individuals born in England and Wales between 1838 and 2014. This data is obtained from birth records made available by a set of volunteer-run genealogical resources - collectively, the 'UK local BMD project' (http://www.ukbmd.org.uk/local) - and has been re-purposed here to demonstrate the applicability of network analysis methods to an onomastic dataset. The ownership and licensing of the intellectual property constituting the original birth records is detailed at https://www.ukbmd.org.uk/TermsAndConditions. Under section 29A of the UK Copyright, Designs and Patents Act 1988, a copyright exception permits copies to be made of lawfully accessible material in order to conduct text and data mining for non-commercial research. The data included in this dataset represents the outcome of such a text-mining analysis. No birth records are included in this dataset, and nor is it possible for records to be reconstructed from the data presented herein. The data comprises an archive of tables, presenting this corpus in various forms: as a rank order of names (in both the first and middle position) by number of registered births per year, and by the total number of births across all years sampled. An overview of the data is also provided, with summary statistics such as the number of usable records registered per year, most popular names per year, and measures of forename diversity and the surname-to-forename usage ratio (an indicator of which forenames are more likely to be transferred uses of surnames). These tables are extensive but not exhaustive, and do not exclude the possibility that errors are present in the corpus. 

    Data are also presented both as '.expression' files (an input format readable by the network analysis tool Graphia Professional) and as '.layout' files, a text file format output by Graphia Professional that describes the characteristics of the network so that it may be replicated. Characteristics of the original birth records that allow the identification of individuals - for instance, full name or location of birth - have been removed.

  17. Reddit r/AskScience Flair Dataset

    • data.mendeley.com
    Updated May 23, 2022
    + more versions
    Cite
    Sumit Mishra (2022). Reddit r/AskScience Flair Dataset [Dataset]. http://doi.org/10.17632/k9r2d9z999.3
    Explore at:
    Dataset updated
    May 23, 2022
    Authors
    Sumit Mishra
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Reddit is a social news, content rating and discussion website. It's one of the most popular sites on the internet. Reddit has 52 million daily active users and approximately 430 million users who use it once a month. Reddit is divided into subreddits; this dataset uses the r/AskScience subreddit.

    The dataset is extracted from the subreddit /r/AskScience on Reddit. The data was collected between 01-01-2016 and 20-05-2022. It contains 612,668 datapoints and 25 columns. The dataset contains information about the questions asked on the subreddit, the description of the submission, the flair of the question, NSFW or SFW status, the year of the submission, and more (see the descriptions of individual columns below). The data was extracted using Python and Pushshift's API, with a little cleaning done using NumPy and pandas.

    The dataset contains the following columns and descriptions:

    • author - Redditor name.
    • author_fullname - Redditor full name.
    • contest_mode - Contest mode [implement obscured scores and randomized sorting].
    • created_utc - Time the submission was created, represented in Unix time.
    • domain - Domain of the submission.
    • edited - Whether the post is edited or not.
    • full_link - Link of the post on the subreddit.
    • id - ID of the submission.
    • is_self - Whether or not the submission is a self post (text-only).
    • link_flair_css_class - CSS class used to identify the flair.
    • link_flair_text - Flair on the post, i.e. the link flair's text content.
    • locked - Whether or not the submission has been locked.
    • num_comments - The number of comments on the submission.
    • over_18 - Whether or not the submission has been marked as NSFW.
    • permalink - A permalink for the submission.
    • retrieved_on - Time ingested.
    • score - The number of upvotes for the submission.
    • description - Description of the submission.
    • spoiler - Whether or not the submission has been marked as a spoiler.
    • stickied - Whether or not the submission is stickied.
    • thumbnail - Thumbnail of the submission.
    • question - Question asked in the submission.
    • url - The URL the submission links to, or the permalink if a self post.
    • year - Year of the submission.
    • banned - Whether banned by the moderator or not.
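
As a small illustration of working with these columns, the sketch below converts created_utc from Unix time to a calendar year and counts submissions per year and flair. The column names come from the list above; the sample rows, flair values, and the function name `flair_counts_by_year` are invented for the example.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical rows with two of the documented columns; values are invented.
rows = [
    {"created_utc": "1451606400", "link_flair_text": "Physics"},  # 2016-01-01 UTC
    {"created_utc": "1451692800", "link_flair_text": "Physics"},  # 2016-01-02 UTC
    {"created_utc": "1640995200", "link_flair_text": "Biology"},  # 2022-01-01 UTC
]


def flair_counts_by_year(records):
    """Count submissions per (year, flair), deriving the year from created_utc."""
    counts = Counter()
    for rec in records:
        created = datetime.fromtimestamp(int(rec["created_utc"]), tz=timezone.utc)
        counts[(created.year, rec["link_flair_text"])] += 1
    return counts


if __name__ == "__main__":
    for key, n in sorted(flair_counts_by_year(rows).items()):
        print(key, n)
```

The dataset also ships a year column, so deriving the year this way is mainly useful as a consistency check or when finer time resolution is needed.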

    This dataset can be used for flair prediction, NSFW classification, and other text mining/NLP tasks. Exploratory data analysis can also be done to find trends and patterns over the years.

  18. Baby names for boys in England and Wales

    • ons.gov.uk
    • cy.ons.gov.uk
    xlsx
    Updated Jul 31, 2025
    Cite
    Office for National Statistics (2025). Baby names for boys in England and Wales [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/livebirths/datasets/babynamesenglandandwalesbabynamesstatisticsboys
    Explore at:
    Available download formats: xlsx
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Office for National Statistics: http://www.ons.gov.uk/
    License

    Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
    License information was derived automatically

    Description

    Rank and count of the top names for baby boys, changes in rank since the previous year and breakdown by country, region, mother's age and month of birth.

  19. NYC Baby Names

    • kaggle.com
    zip
    Updated Sep 8, 2017
    Cite
    City of New York (2017). NYC Baby Names [Dataset]. https://www.kaggle.com/datasets/new-york-city/nyc-baby-names/suggestions?status=pending&yourSuggestions=true
    Explore at:
    Available download formats: zip (139141 bytes)
    Dataset updated
    Sep 8, 2017
    Dataset authored and provided by
    City of New York
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    New York
    Description

    This dataset is now updated annually here.

    Context

    Baby names for children recently born in New York City. This dataset is notable because it includes a breakdown by the ethnicity of the mother of the baby: a source of ethnic information that is missing from many other similar datasets published on state and national levels.

    Content

    This dataset includes columns for the baby's name, year of birth, and sex, and for the mother's ethnicity. It also includes a rank column (the name's popularity relative to the other names on the list).

    Acknowledgements

    This data is published as-is by the City of New York.

    Inspiration

    • How do baby names in New York City differ from national trends?
    • Which names are more or less popular among different ethnicities?
  20. First names of the newborns of the city of Pré Saint-Gervais for the period...

    • gimi9.com
    Updated Dec 16, 2024
    (2024). First names of the newborns of the city of Pré Saint-Gervais for the period 2018-2020 | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_602e644ee2fdd1c8ad070b54
    Explore at:
    Dataset updated
    Dec 16, 2024
    Area covered
    Le Pré-Saint-Gervais
    Description

    The dataset contains the first names of the newborns of the city of Pré Saint-Gervais for the period 2018-2020. The file contains 688 first names. The data are structured by the name of the municipality where the children were born (in most cases the children were born outside Pré Saint-Gervais, because the commune has no maternity ward, but at least one of the parents comes from the commune), the INSEE number, the sex, the child’s first name, the number of occurrences, and the year of birth. These data are useful for analysing trends in the choice of first names and thus for understanding the history of the city. The data are collected by the General Affairs Department of the commune of Pré Saint-Gervais from birth declarations. The file is available in CSV format. To get in touch with the manager of this dataset, you can write to Benjamin Mittet-Brême, Director of General Administration, Civil State and Cemetery.

    Data-visualisation proposals:

    • Gender distribution of first names by year: https://prenomspsg.trial.opendatasoft.com/chart/embed/repartition_des_sexes_des_prenoms_par_annee1/
    • Gender distribution of first names over the period 2018-2020: https://prenomspsg.trial.opendatasoft.com/chart/embed/repartition_des_sexes_des_prenoms_sur_la_periode_2018-20201/
    • Most used male given names per year (2018-2020): https://prenomspsg.trial.opendatasoft.com/chart/embed/prenoms_de_sexe_masculin_les_plus_utilises_par_annee_2018-2020/
    • Most used female given names per year (2018-2020): https://prenomspsg.trial.opendatasoft.com/chart/embed/prenoms_de_sexe_feminin_les_plus_utilises_par_annee_2018-2020/
    • The 10 most given names over the period 2018-2020: https://app.workbenchdata.com/workflows/132629/report

    Dataset published during the Challenge Data week organised by Sciences Po Saint-Germain-en-Laye from February 15 to 19, 2021.

Ulrik Thyge Pedersen (2023). Popular Baby Names [Dataset]. https://www.kaggle.com/datasets/ulrikthygepedersen/baby-names

Popular Baby Names

Can you find patterns in popular Baby Names and predict the next top names?

Explore at:
Available download formats: zip (12903 bytes)
Dataset updated
Mar 21, 2023
Authors
Ulrik Thyge Pedersen
License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

