100+ datasets found
  1. USA Name Data

    • kaggle.com
    zip
    Updated Feb 12, 2019
    Cite
    Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Feb 12, 2019
    Dataset provided by
    Data.gov (https://data.gov/)
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    United States
    Description

    Context

    Cultural diversity in the U.S. has led to great variations in names and naming traditions, and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

    Content

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

    All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

    Fork this kernel to get started with this dataset.

    Acknowledgements

    https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

    https://cloud.google.com/bigquery/public-data/usa-names

    Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

    Banner Photo by @dcp from Unsplash.

    Inspiration

    What are the most common names?

    What are the most common female names?

    Are there more female or male names?

    Female names by a wide margin?

  2. Popular Baby Names

    • catalog.data.gov
    • data.cityofnewyork.us
    • +4more
    Updated Jul 12, 2025
    + more versions
    Cite
    data.cityofnewyork.us (2025). Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/popular-baby-names
    Explore at:
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    data.cityofnewyork.us
    Description

    Popular Baby Names by Sex and Ethnic Group. Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.

  3. Facebook Names Dataset

    • academictorrents.com
    bittorrent
    Updated Nov 11, 2015
    Cite
    Ron Bowes (Skull Security) (2015). Facebook Names Dataset [Dataset]. https://academictorrents.com/details/e54c73099d291605e7579b90838c2cd86a8e9575
    Explore at:
    Available download formats: bittorrent (2991052604)
    Dataset updated
    Nov 11, 2015
    Dataset authored and provided by
    Ron Bowes (Skull Security)
    License

    https://academictorrents.com/nolicensespecified

    Description

    171 million names (100 million unique)

    This torrent contains:

    • The URL of every searchable Facebook user's profile
    • The name of every searchable Facebook user, both unique and by count (perfect for post-processing, datamining, etc.)
    • Processed lists, including first names with count, last names with count, potential usernames with count, etc.
    • The programs I used to generate everything

    So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? >:-)

    Limitations

    So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an ssh account and Nmap installed. An additional limitation is that these are on

  4. Plant Names Database Quarterly Changes May 2025 - Dataset - DataStore

    • datastore.landcareresearch.co.nz
    Updated May 15, 2025
    + more versions
    Cite
    (2025). Plant Names Database Quarterly Changes May 2025 - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/plant-names-database-quarterly-changes-may-2025
    Explore at:
    Dataset updated
    May 15, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Summary data on changes to data in the Plant Names Database in the following classes:

    • the addition of new names
    • formal deprecation of duplicate names
    • changes to the status of the name as preferred name or synonym for a taxon
    • updating the origin or occurrence of a taxon within New Zealand
    • applying changes to the classification of a taxon
    • updating the scientific article that is being applied to the taxa to determine whether the name is a synonym or preferred name

  5. Baby Names from Social Security Card Applications - National Data

    • catalog.data.gov
    • res1catalogd-o-tdatad-o-tgov.vcapture.xyz
    Updated Jul 4, 2025
    + more versions
    Cite
    Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
    Explore at:
    Dataset updated
    Jul 4, 2025
    Dataset provided by
    Social Security Administration (http://ssa.gov/)
    Description

    The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.

  6. us-names-by-state

    • huggingface.co
    Updated Jul 1, 2025
    Cite
    SNAD (2025). us-names-by-state [Dataset]. https://huggingface.co/datasets/snad-space/us-names-by-state
    Explore at:
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    SNAD
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    US Baby names

    The SSA dataset with baby names: https://www.ssa.gov/OACT/babynames/

    Coniferest

    We use this dataset in the active anomaly discovery Python package coniferest: https://coniferest.snad.space/en/latest/notebooks/us-names.html

    Update the data

    Install Python packages: pip install requests aiohttp universal_pathlib pandas

    Optionally: download https://www.ssa.gov/OACT/babynames/state/namesbystate.zip

    ./run.py PATH_OR_URL_TO_namesbystate.zip, path may be…

    See the full description on the dataset page: https://huggingface.co/datasets/snad-space/us-names-by-state.

  7. Baby Names by Year

    • kaggle.com
    Updated Sep 20, 2022
    Cite
    The Devastator (2022). Baby Names by Year [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-baby-names-by-year-of-birth/discussion
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Sep 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About this dataset

    This dataset contains US baby names from the Social Security Administration dating back to 1879. With over 140 years of data, this is one of the most comprehensive datasets on baby names in the US. The data includes the name, year of birth, sex, and number of babies with that name for each year. This dataset is a great resource for anyone interested in studying baby naming trends over time.

    How to use the dataset

    How to use the US Baby Names by Year of Birth dataset:

    This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes data on baby names, year of birth, and sex. There are also columns for the number of babies with that name born in that year.

    This dataset can be used to track changes in baby naming trends over time, or to study how the popularity of individual names has changed. It can also be used to study how naming trends differ between sexes or between years.

    Research Ideas

    This dataset could be used for a number of things, including:

    1. Determining baby name trends over time
    2. Finding out what the most popular baby names are in the US
    3. Analyzing how baby name popularity has changed over the years

    Columns

    • index: the index of the dataframe
    • YearOfBirth: the year in which the baby was born
    • Name: the name of the baby
    • Sex: the sex of the baby
    • Number: the number of babies with that name and sex
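
As a rough illustration of how the columns above might be used, the sketch below sums the Number column per name to find the most common names. The sample rows, their counts, and the "F"/"M" coding of the Sex column are assumptions made for this example and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the column names listed above.
# The counts and the "F"/"M" sex codes are made up for illustration.
SAMPLE_CSV = """index,YearOfBirth,Name,Sex,Number
0,1990,Jessica,F,300
1,1990,Michael,M,500
2,1991,Jessica,F,250
3,1991,Ashley,F,400
4,1991,Michael,M,450
"""


def total_by_name(csv_text, sex=None):
    """Sum the Number column per name, optionally restricted to one sex."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if sex is None or row["Sex"] == sex:
            totals[row["Name"]] += int(row["Number"])
    return dict(totals)


def most_common(csv_text, n=3, sex=None):
    """Return the n names with the highest total count, largest first."""
    totals = total_by_name(csv_text, sex)
    # Sort by descending count, then alphabetically to break ties.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]


print(most_common(SAMPLE_CSV, n=2))
print(most_common(SAMPLE_CSV, n=1, sex="F"))
```

With the real file, the same functions could be fed the file's text instead of SAMPLE_CSV, provided the header row matches the column names listed above.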

    Acknowledgements

    If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov

    Data Source

  8. Gender by Name (Time-series)

    • kaggle.com
    Updated Dec 5, 2022
    Cite
    The Devastator (2022). Gender by Name (Time-series) [Dataset]. https://www.kaggle.com/datasets/thedevastator/automated-gender-identification-using-name-proba/data
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Dec 5, 2022
    Dataset provided by
    Kaggle
    Authors
    The Devastator
    Description

    Automated Gender Identification Using Name Probabilities

    2019 US Social Security Administration Data

    By Derek Howard [source]

    About this dataset

    This dataset provides an essential tool for generating gender-specific datasets from names alone. It contains information on the probability of a person's name belonging to a certain gender, based on US Social Security records from the last century. This makes it easy to assign genders to datasets that do not natively include this data. All probability values were drawn from records with 5 or more people associated with each name, so individuals with less common names can still have their genders predicted. With this resource, users can generate gender-aware data quickly, making gender identification in datasets more accurate and easier.

    How to use the dataset

    This dataset provides a helpful resource when you need to accurately identify gender from names. With this dataset, you’ll be able to quickly and accurately assign genders to datasets that contain names but no other information about the person.

    To get started, you will need a csv file with two columns: name and probability. The name column should contain the first names of the people in your dataset. The probability column should contain numbers between 0 and 1 indicating the likelihood that each name is associated with one specific gender (0 for male, 1 for female).

    In addition to simply assigning genders from these probabilities alone, users of this dataset also have more control over their classifications - they can use it as either a baseline or as an absolute measure of accuracy depending on their exact needs/preferences. Experimentation is highly encouraged here!
    Good luck!
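
A minimal sketch of the thresholding described above, assuming the probability convention stated in the text (0 for male, 1 for female). The function name, the default threshold of 0.5, and the optional margin that leaves ambiguous names unassigned are assumptions introduced for this example; they are not part of the dataset.

```python
def assign_gender(probability, threshold=0.5, margin=0.0):
    """Map a probability in [0, 1] to a label (0 = male, 1 = female).

    Values within `margin` of the threshold are returned as "unknown"
    so that ambiguous names are not forced into either class.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= threshold + margin:
        return "female"
    if probability <= threshold - margin:
        return "male"
    return "unknown"


print(assign_gender(0.9))
print(assign_gender(0.1))
print(assign_gender(0.55, margin=0.1))
```

Using margin=0.0 treats the probabilities as an absolute classifier, while a wider margin treats them as a baseline and leaves borderline names for manual review.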

    Research Ideas

    • Create gender-specific applications - tailor different apps to different genders based on the probability of a particular name belonging to a certain gender.

    • Generate gender neutral names - use this data to generate random names with no gender bias.

    • Automate record lookup - quickly and accurately assign genders based on the probability associated with their name

    Acknowledgements

    If you use this dataset in your research, please credit the original authors.

    Data Source

    License

    Unknown License - Please check the dataset description for more information.

    Columns

    File: name_gender.csv

    | Column name | Description                                                         |
    |:------------|:--------------------------------------------------------------------|
    | name        | The name of the person. (String)                                    |
    | gender      | The gender of the person. (String)                                  |
    | probability | The probability of the gender being assigned to the person. (Float) |

    Acknowledgements

    If you use this dataset in your research, please credit Derek Howard.

  9. Mountain NER dataset

    • kaggle.com
    Updated Nov 18, 2023
    Cite
    Geray Gench (2023). Mountain NER dataset [Dataset]. https://www.kaggle.com/datasets/geraygench/mountain-ner-dataset
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Nov 18, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Geray Gench
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset was made for a NER task. In this task, we need to train a named entity recognition (NER) model for the identification of mountain names inside the texts.

    Each entry in the dataset corresponds to a tweet or a sentence that was generated by OpenAI's ChatGPT. It's a mixed dataset that includes a variety of tweets/texts, some of which are focused on mountain-related experiences, while others may discuss different topics.

    The features of the dataset include:

    Text Content: This feature contains the actual text content of each sentence/tweet. It captures the expressions, experiences, or sentiments related to mountainous regions and activities.

    Markers: In the context of the provided code, the "marker" feature represents the start and end indices of the occurrences of specific mountain names within the tweet text.
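
To make the marker feature concrete, here is a small sketch of how start and end indices could be used to pull mountain names out of a text. It assumes markers are (start, end) pairs with an exclusive end index, as in Python slicing; the dataset description does not state the exact convention, and the helper names and sample sentence are hypothetical.

```python
def find_marker(text, name):
    """Return a (start, end) pair for the first occurrence of name in text."""
    start = text.find(name)
    if start == -1:
        raise ValueError(f"{name!r} not found in text")
    return (start, start + len(name))  # end index is exclusive (assumption)


def extract_mentions(text, markers):
    """Slice the text at each (start, end) marker and return the substrings."""
    mentions = []
    for start, end in markers:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"marker ({start}, {end}) is out of range")
        mentions.append(text[start:end])
    return mentions


# Hypothetical example sentence, not taken from the dataset.
text = "We climbed Mount Everest and later saw Kilimanjaro."
markers = [find_marker(text, "Mount Everest"), find_marker(text, "Kilimanjaro")]
print(markers)
print(extract_mentions(text, markers))
```

If the dataset's markers turn out to use an inclusive end index, the slice would need to be text[start:end + 1] instead.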

  10. Nyc popular baby names

    • kaggle.com
    Updated Jun 20, 2022
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Rahul Sarkar (2022). Nyc popular baby names [Dataset]. https://www.kaggle.com/datasets/rahulsarkar221/nyc-popular-baby-names
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jun 20, 2022
    Dataset provided by
    Kaggle
    Authors
    Rahul Sarkar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    New York
    Description

    This data contains popular baby names in New York.

    Dataset: 1 file (popular-baby-names.csv)

    Columns

    • Year of Birth: year of the baby's birth
    • Gender: gender of the baby
    • Ethnicity: the ethnic group the baby belongs to
    • Child's First Name: the first name of the child
    • Count: how many babies were given that name
    • Ranking: ranking of that name
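
The relationship between Count and Ranking can be sketched as follows: within each year, gender, and ethnicity group, names are ordered by count and numbered from 1. The sample rows are invented, and the choice to give tied counts the same rank (dense ranking) is an assumption; the dataset description does not say how ties are handled.

```python
from collections import defaultdict

# Hypothetical rows using the column names listed above; values are made up.
ROWS = [
    {"Year of Birth": 2016, "Gender": "FEMALE", "Ethnicity": "HISPANIC",
     "Child's First Name": "Isabella", "Count": 30},
    {"Year of Birth": 2016, "Gender": "FEMALE", "Ethnicity": "HISPANIC",
     "Child's First Name": "Mia", "Count": 20},
    {"Year of Birth": 2016, "Gender": "FEMALE", "Ethnicity": "HISPANIC",
     "Child's First Name": "Sofia", "Count": 20},
    {"Year of Birth": 2016, "Gender": "FEMALE", "Ethnicity": "HISPANIC",
     "Child's First Name": "Emma", "Count": 10},
    {"Year of Birth": 2016, "Gender": "MALE", "Ethnicity": "HISPANIC",
     "Child's First Name": "Liam", "Count": 25},
]


def rank_names(rows):
    """Return copies of rows with a Ranking computed per group from Count."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["Year of Birth"], row["Gender"], row["Ethnicity"])
        groups[key].append(row)

    ranked = []
    for members in groups.values():
        members.sort(key=lambda r: -r["Count"])  # highest count first
        rank = 0
        previous = None
        for row in members:
            if row["Count"] != previous:  # tied counts share a rank
                rank += 1
                previous = row["Count"]
            ranked.append({**row, "Ranking": rank})
    return ranked


for row in rank_names(ROWS):
    print(row["Gender"], row["Child's First Name"], row["Count"], row["Ranking"])
```

Comparing a recomputed ranking like this against the published Ranking column would be one way to check which tie-handling rule the source actually uses.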

  11. Geonames - All Cities with a population > 1000

    • public.opendatasoft.com
    • data.smartidf.services
    • +2more
    csv, excel, geojson +1
    Updated Mar 10, 2024
    + more versions
    Cite
    (2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
    Explore at:
    Available download formats: csv, json, geojson, excel
    Dataset updated
    Mar 10, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    All cities with a population > 1000 or seats of adm div (ca 80.000)

    Sources and Contributions

    • Sources: GeoNames aggregates over a hundred different data sources.
    • Ambassadors: GeoNames Ambassadors help in many countries.
    • Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
    • Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

    Enrichment: add country name

  12. Popular Baby Names - Dataset - data.sa.gov.au

    • data.sa.gov.au
    Updated Mar 1, 2025
    + more versions
    Cite
    (2025). Popular Baby Names - Dataset - data.sa.gov.au [Dataset]. https://data.sa.gov.au/data/dataset/popular-baby-names
    Explore at:
    Dataset updated
    Mar 1, 2025
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    South Australia
    Description

    List of male and female baby names in South Australia from 1944 to 2024. The annual data for baby names is published January/February each year.

  13. Canadian Geographical Names - CGN

    • open.canada.ca
    • catalogue.arctic-sdi.org
    csv, esri rest, kml +3
    Updated Jul 28, 2025
    + more versions
    Cite
    Natural Resources Canada (2025). Canadian Geographical Names - CGN [Dataset]. https://open.canada.ca/data/en/dataset/e27c6eba-3c5d-4051-9db2-082dc6411c2c
    Explore at:
    Available download formats: shp, csv, kml, pdf, esri rest, wms
    Dataset updated
    Jul 28, 2025
    Dataset provided by
    Ministry of Natural Resources of Canada (https://www.nrcan.gc.ca/)
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Area covered
    Canada
    Description

    The Canadian Geographical Names Data Base (CGNDB) is the authoritative national database of Canada's geographical names. The purpose of the CGNDB is to store place names and their attributes that have been approved by the Geographical Names Board of Canada (GNBC), the national coordinating body responsible for standards and policies on place names. The CGNDB is maintained by Natural Resources Canada, through the Canada Centre for Mapping and Earth Observation. The geographic extent of the CGNDB is the Canadian landmass and water bodies; the temporal extent is from 1897 to present. This dataset is extracted from the CGNDB on a weekly basis, and consists of current officially approved names, feature type, coordinates of the feature, decision date, source, and other attributes. The output file formats for this product are: text (CSV), Shape (SHP), and Keyhole Markup Language (KML). Content advisory: The Canadian Geographical Names Database contains historical terminology that is considered racist, offensive and derogatory. Geographical naming authorities are in the process of addressing many offensive place names, but the work is still ongoing. For more information, please contact the GNBC Secretariat.

  14. PII-REAL-names-dataset

    • kaggle.com
    Updated Mar 17, 2024
    Cite
    Kris Smith (2024). PII-REAL-names-dataset [Dataset]. https://www.kaggle.com/datasets/krist0phersmith/pii-real-names-dataset/discussion?sort=undefined
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Mar 17, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Kris Smith
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Datasets of REAL (not generated) given names and surnames.

    This dataset was originally created for the PII detection competition but may be of use for many other purposes.

    To see how I created it you can view this notebook: https://www.kaggle.com/code/krist0phersmith/pii-real-names-data-wrangle

    I scraped names from the Facebook user data dump.

    I then combined with another published data set of names from the wiki names search data.

    Enjoy and feel free to add or share ideas to make this better.

    Happy Kaggling!

  15. World Gender Name Dictionary 2.0 Dataset.

    • tind.wipo.int
    csv, zip
    Updated 2021
    + more versions
    Cite
    World Intellectual Property Organization. (2021). World Gender Name Dictionary 2.0 Dataset. [Dataset]. https://tind.wipo.int/record/49408
    Explore at:
    Available download formats: zip(192350735), csv(137854765), zip(19497798), csv(372274310), csv(46488874), csv(391678471), csv(1842), csv(91229769)
    Dataset updated
    2021
    Dataset provided by
    World Intellectual Property Organization (http://wipo.int/)
    Authors
    World Intellectual Property Organization.
    Area covered
    World
    Description

    This dataset revisits the first World Gender Name Dictionary (WGND 1.0), which allows the gender to be disambiguated in data naming physical persons (Lax Martínez et al., 2016). We discuss its advantages and limitations and propose an expansion based on updated data and additional sources. By including more than 26 million records linking given names and 195 different countries and territories, the resulting WGND 2.0 substantially increases the international coverage of its predecessor. As a result, it is particularly designed to be applied to intellectual property unit-record data naming inventors, designers, individual applicants and other creators disclosed in these data.

  16. Network analysis of the social and demographic influences on name choice...

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Stephen J. Bush; Anna Powell-Smith; Tom C. Freeman (2023). Network analysis of the social and demographic influences on name choice within the UK (1838-2016) [Dataset]. http://doi.org/10.1371/journal.pone.0205759
    Explore at:
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Stephen J. Bush; Anna Powell-Smith; Tom C. Freeman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United Kingdom
    Description

    Chosen names reflect changes in societal values, personal tastes and cultural diversity. Vogues in name usage can be easily shown on a case by case basis, by plotting the rise and fall in their popularity over time. However, individual name choices are not made in isolation and trends in naming are better understood as group-level phenomena. Here we use network analysis to examine onomastic (name) datasets in order to explore the influences on name choices within the UK over the last 170 years. Using a large representative sample of approximately 22 million forenames from England and Wales given between 1838 and 2014, along with a complete population sample of births registered between 1996 and 2016, we demonstrate how trends in name usage can be visualised as network graphs. By exploring the structure of these graphs various patterns of name use become apparent, a consequence of external social forces, such as migration, operating in concert with internal mechanisms of change. In general, we show that the topology of network graphs can reveal naming vogues, and that naming vogues in part reflect social and demographic changes. Many name choices are consistent with a self-correcting feedback loop, whereby rarer names become common because there are virtues perceived in their rarity, yet with these perceived virtues lost upon increasing commonality. Towards the present day, we can speculate that the comparatively greater range of media, freedom of movement, and ability to maintain globally-distributed social networks increases the number of possible names, but also ensures they may more quickly be perceived as commonplace. Consequently, contemporary naming vogues are relatively short-lived with many name choices appearing a balance struck between recognisability and rarity. The data are available in multiple forms including via an easy-to-use web interface at http://demos.flourish.studio/namehistory.

  17. Name and Country of Origin dataset

    • kaggle.com
    Updated Feb 20, 2022
    Cite
    Amalesh Vemula (2022). Name and Country of Origin dataset [Dataset]. https://www.kaggle.com/datasets/amaleshvemula7/name-and-country-of-origin-dataset/code
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Feb 20, 2022
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Amalesh Vemula
    Description

    Context

    In my brief search, I found no datasets relating names to country of origin. The next step was to scrape data from the individual common-names lists on Wikipedia. The Faker library creates fake data from publicly available sources; its names are scraped from Wikipedia's common-names lists and other name sources.

    Content

    Dataset consists of 404062 full names from 63 different countries, namely: Bulgaria, Egypt, Canada, Laos, Thailand, Slovakia, Indonesia, Bosnia and Herzegovina, Ukraine, Japan, Israel, United Arab Emirates, Austria, Armenia, Lithuania, Turkey, Croatia, Luxembourg, Sweden, Latvia, Switzerland, Jordan, United Kingdom, Colombia, Portugal, Bangladesh, Palestine, France, Azerbaijan, Estonia, New Zealand, Saudi Arabia, India, Russia, Finland, United States, Slovenia, Mexico, Australia, Malta, Belgium, Taiwan, Philippines, Romania, Nepal, Poland, Greece, Norway, China, Cyprus, Brazil, Spain, Ireland, Czech Republic, Georgia, Italy, Hungary, Ghana, South Korea, Iran, Germany, Netherlands, Denmark.

    Acknowledgements

    This dataset could not have been made without the libraries faker (https://pypi.org/project/Faker/) and googletrans (https://pypi.org/project/googletrans/).

    Inspiration

    This dataset can be widely used in solving NLP problems and many text-related problems in determining Ontologies, Knowledge graphs etc.

  18. USA Names

    • console.cloud.google.com
    Updated Jul 15, 2023
    Cite
    https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Social%20Security%20Administration&hl=de (2023). USA Names [Dataset]. https://console.cloud.google.com/marketplace/product/social-security-administration/us-names?hl=de
    Explore at:
    Dataset updated
    Jul 15, 2023
    Dataset provided by
    Google (http://google.com/)
    Area covered
    United States
    Description

    This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

    All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

    This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.

  19. Most Popular Baby Names

    • data.chhs.ca.gov
    • data.ca.gov
    • +3more
    csv, zip
    Updated Dec 30, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    California Department of Public Health (2024). Most Popular Baby Names [Dataset]. https://data.chhs.ca.gov/dataset/most-popular-baby-names-2005-current
    Explore at:
    Available download formats: csv(1219), zip, csv(121160)
    Dataset updated
    Dec 30, 2024
    Dataset authored and provided by
    California Department of Public Health (https://www.cdph.ca.gov/)
    Description

    This dataset contains ranks and counts for the top 25 baby names by sex for live births that occurred in California (by occurrence) based on information entered on birth certificates.

  20. Data from: Inventory of online public databases and repositories holding...

    • s.cnmilf.com
    • agdatacommons.nal.usda.gov
    • +3more
    Updated Apr 21, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
    Explore at:
    Dataset updated
    Apr 21, 2025
    Dataset provided by
    Agricultural Research Service (https://www.ars.usda.gov/)
    Description

    United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data. Purpose As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication. The goals of this study were to establish where agricultural researchers in the United States-- land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies -- currently publish their data, including general research data repositories, _domain-specific databases, and the top journals compare how much data is in institutional vs. _domain-specific vs. federal platforms determine which repositories are recommended by top journals that require or recommend the publication of supporting data ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data Approach The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data. 
    To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

    Search methods

    We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain-specific, though some contained a mix of data subjects.

    We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
    General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

    Next we searched the internet for open general data repositories using a variety of search engines, and repositories containing a mix of data, journals, books, and other types of records were tested to determine whether that repository could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGAEA, Protein Data Bank, and Zenodo.

    Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories.

    Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.

    Evaluation

    We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, type of resource searched (datasets, data, images, components, etc.), percentage of the total database that each term comprised, any dataset with a search term that comprised at least 1% and 5% of the total collection, and any search term that returned greater than 100 and greater than 500 results. We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

    Results

    A summary of the major findings from our data review: Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors. There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection. Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

    See included README file for descriptions of each individual data file in this dataset.

    Resources in this dataset:

    Resource Title: Journals. File Name: Journals.csv
    Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv
    Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx
    Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv
    Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv
    Resource Title: General repositories containing ag data. File Name: general_repos_1.csv
    Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
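
    The evaluation thresholds in the description above (a search term's share of the collection reaching 1% or 5%, and result counts exceeding 100 or 500) can be sketched as a small check. This is only an illustration of the stated criteria, not the authors' actual procedure; the names `TermResult` and `evaluate_term` and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TermResult:
    term: str
    hits: int             # datasets returned for this search term
    share_pct: float      # hits as a percentage of the whole repository
    at_least_1_pct: bool
    at_least_5_pct: bool
    over_100_hits: bool
    over_500_hits: bool


def evaluate_term(term: str, hits: int, total_datasets: int) -> TermResult:
    """Apply the 1% / 5% share and >100 / >500 hit thresholds to one term."""
    if total_datasets <= 0:
        raise ValueError("total_datasets must be positive")
    share = 100.0 * hits / total_datasets
    return TermResult(
        term=term,
        hits=hits,
        share_pct=share,
        at_least_1_pct=share >= 1.0,
        at_least_5_pct=share >= 5.0,
        over_100_hits=hits > 100,
        over_500_hits=hits > 500,
    )


# Hypothetical counts for illustration only; not taken from the dataset.
example = evaluate_term("agriculture", hits=600, total_datasets=10_000)
print(example)
```

    A repository would meet the "large and ag-heavy" description in the results only when both the 5% share flag and the over-500 flag are set for at least one term.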

Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names

USA Name Data

USA Name Data (BigQuery Dataset)

Explore at:
2 scholarly articles cite this dataset (View in Google Scholar)
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Data.gov (https://data.gov/)
License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered
United States
Description

Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Do female names outnumber male names by a wide margin?
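
As a starting point for the questions above, here is a minimal sketch that aggregates name counts in plain Python. It assumes each record carries a name, a gender code ("F" or "M"), and an occurrence count; those field names and the toy rows are assumptions made for illustration and are not real figures from the dataset.

```python
from collections import Counter, defaultdict

# Toy rows (name, gender, number) invented for illustration; not real counts.
rows = [
    ("Mary", "F", 30),
    ("Mary", "F", 20),
    ("James", "M", 40),
    ("Linda", "F", 10),
    ("Robert", "M", 25),
    ("Patricia", "F", 5),
]

totals = Counter()                       # total occurrences per name
totals_by_gender = defaultdict(Counter)  # gender -> occurrences per name
for name, gender, number in rows:
    totals[name] += number
    totals_by_gender[gender][name] += number

# Most common names overall and among female records.
most_common_names = totals.most_common(3)
most_common_female = totals_by_gender["F"].most_common(3)

# Number of distinct names recorded for each gender.
distinct_names = {g: len(c) for g, c in totals_by_gender.items()}

print(most_common_names)
print(most_common_female)
print(distinct_names)
```

The same grouping could be expressed as a SQL aggregation against the BigQuery copy of the data; the in-memory version is used here only so the sketch runs without credentials.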
