100+ datasets found

USA Name Data
kaggle.com
zip
Updated Feb 12, 2019
Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Data.gov (https://data.gov/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
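
The questions above come down to simple aggregations over the name records. A minimal sketch, assuming each record is a (name, gender, count) tuple; the sample rows are made-up illustrations rather than values from the dataset, and the function names are hypothetical.

```python
from collections import Counter

# Made-up sample rows: (name, gender, count). Not real dataset values.
SAMPLE_ROWS = [
    ("Mary", "F", 120),
    ("Anna", "F", 80),
    ("John", "M", 160),
    ("Mary", "F", 30),
    ("Jordan", "M", 10),
    ("Jordan", "F", 7),
]


def most_common_names(rows, gender=None, top=3):
    """Sum counts per name, optionally restricted to one gender."""
    totals = Counter()
    for name, row_gender, count in rows:
        if gender is None or row_gender == gender:
            totals[name] += count
    return totals.most_common(top)


def distinct_names_by_gender(rows):
    """Count how many distinct names appear for each gender."""
    names = {}
    for name, row_gender, _ in rows:
        names.setdefault(row_gender, set()).add(name)
    return {gender: len(found) for gender, found in names.items()}
```

Calling `most_common_names(SAMPLE_ROWS, gender="F")` restricts the ranking to female names, and `distinct_names_by_gender(SAMPLE_ROWS)` compares how many different names each gender has.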
Popular Baby Names
catalog.data.gov
data.cityofnewyork.us
+4 more
Updated Jul 12, 2025
+ more versions
Cite
data.cityofnewyork.us (2025). Popular Baby Names [Dataset]. https://catalog.data.gov/dataset/popular-baby-names
Explore at:
Dataset updated
Jul 12, 2025
Dataset provided by
data.cityofnewyork.us
Description
Popular Baby Names by Sex and Ethnic Group Data were collected through civil birth registration. Each record represents the ranking of a baby name in the order of frequency. Data can be used to represent the popularity of a name. Caution should be used when assessing the rank of a baby name if the frequency count is close to 10; the ranking may vary year to year.
Facebook Names Dataset
academictorrents.com
bittorrent
Updated Nov 11, 2015
Cite
Ron Bowes (Skull Security) (2015). Facebook Names Dataset [Dataset]. https://academictorrents.com/details/e54c73099d291605e7579b90838c2cd86a8e9575
Explore at:
Available download formats: bittorrent (2991052604)
Dataset updated
Nov 11, 2015
Dataset authored and provided by
Ron Bowes (Skull Security)
License
https://academictorrents.com/nolicensespecified
Description
171 million names (100 million unique)

This torrent contains:
- The URL of every searchable Facebook user's profile
- The name of every searchable Facebook user, both unique and by count (perfect for post-processing, datamining, etc.)
- Processed lists, including first names with count, last names with count, potential usernames with count, etc.
- The programs I used to generate everything

So, there you have it: lots of awesome data from Facebook. Now, I just have to find one more problem with Facebook so I can write "Revenge of the Facebook Snatchers" and complete the trilogy. Any suggestions? >:-)

Limitations

So far, I have only indexed the searchable users, not their friends. Getting their friends will be significantly more data to process, and I don't have those capabilities right now. I'd like to tackle that in the future, though, so if anybody has any bandwidth they'd like to donate, all I need is an ssh account and Nmap installed. An additional limitation is that these are on
Plant Names Database Quarterly Changes May 2025 - Dataset - DataStore
datastore.landcareresearch.co.nz
Updated May 15, 2025
+ more versions
Cite
(2025). Plant Names Database Quarterly Changes May 2025 - Dataset - DataStore [Dataset]. https://datastore.landcareresearch.co.nz/dataset/plant-names-database-quarterly-changes-may-2025
Explore at:
Dataset updated
May 15, 2025
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Summary data on changes to data in the Plant Names Database in the following classes:
- the addition of new names for formal deprecation of duplicate names
- changes to the status of the name as preferred name or synonym for a taxon
- updating the origin or occurrence of a taxon within New Zealand
- applying changes to the classification of a taxon
- updating the scientific article that is being applied to the taxa to determine whether the name is a synonym or preferred name
Baby Names from Social Security Card Applications - National Data
catalog.data.gov
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
Updated Jul 4, 2025
+ more versions
Cite
Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
Explore at:
Dataset updated
Jul 4, 2025
Dataset provided by
Social Security Administration (http://ssa.gov/)
Description
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.
us-names-by-state
huggingface.co
Updated Jul 1, 2025
Cite
SNAD (2025). us-names-by-state [Dataset]. https://huggingface.co/datasets/snad-space/us-names-by-state
Explore at:
Dataset updated
Jul 1, 2025
Dataset authored and provided by
SNAD
License
https://choosealicense.com/licenses/cc0-1.0/
Description
US Baby names

The SSA dataset with baby names: https://www.ssa.gov/OACT/babynames/

Coniferest

We use this dataset in the active anomaly discovery Python package coniferest: https://coniferest.snad.space/en/latest/notebooks/us-names.html

Update the data

Install Python packages: pip install requests aiohttp universal_pathlib pandas
Optionally: download https://www.ssa.gov/OACT/babynames/state/namesbystate.zip
./run.py PATH_OR_URL_TO_namesbystate.zip, path may be…

See the full description on the dataset page: https://huggingface.co/datasets/snad-space/us-names-by-state.
Baby Names by Year
kaggle.com
Updated Sep 20, 2022
Cite
The Devastator (2022). Baby Names by Year [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-baby-names-by-year-of-birth/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Sep 20, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
About this dataset

This dataset contains US baby names from the Social Security Administration dating back to 1880. With over 140 years of data, this is one of the most comprehensive datasets on baby names in the US. The data includes the name, year of birth, sex, and number of babies with that name for each year. This dataset is a great resource for anyone interested in studying baby naming trends over time.

How to use the dataset

How to use the US Baby Names by Year of Birth dataset:

This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes data on baby names, year of birth, and sex. There are also columns for the number of babies with that name born in that year.

This dataset can be used to track changes in baby naming trends over time, or to study how the popularity of individual names has changed. It can also be used to study how naming trends differ between sexes, or between different years.

Research Ideas

This dataset could be used for a number of things, including:
1. Determining baby name trends over time
2. Finding out what the most popular baby names are in the US
3. Analyzing how baby name popularity has changed over the years

Columns

index: the index of the dataframe

YearOfBirth: the year in which the baby was born

Name: the name of the baby

Sex: the sex of the baby

Number: the number of babies with that name and sex
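
The columns listed above are enough to trace one name's popularity over time. A minimal sketch using only the standard library; the CSV text is a made-up sample that reuses the listed column names (not real values), and `yearly_counts` is a hypothetical helper name.

```python
import csv
import io

# Made-up rows using the column names listed above; not real values.
SAMPLE_CSV = """index,YearOfBirth,Name,Sex,Number
0,1990,Emma,F,10
1,1990,Liam,M,12
2,2000,Emma,F,25
3,2000,Emma,M,5
4,2010,Emma,F,40
"""


def yearly_counts(csv_file, name, sex):
    """Return {year: number} for one name and sex, ordered by year."""
    counts = {}
    for row in csv.DictReader(csv_file):
        if row["Name"] == name and row["Sex"] == sex:
            year = int(row["YearOfBirth"])
            counts[year] = counts.get(year, 0) + int(row["Number"])
    return dict(sorted(counts.items()))


# With a real download, pass open("path/to/file.csv", newline="") instead.
trend = yearly_counts(io.StringIO(SAMPLE_CSV), "Emma", "F")
```

The resulting dictionary maps each year to the count for that name and sex, which can be passed directly to a plotting library.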

Acknowledgements

If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov

Data Source
Gender by Name (Time-series)
kaggle.com
Updated Dec 5, 2022
Cite
The Devastator (2022). Gender by Name (Time-series) [Dataset]. https://www.kaggle.com/datasets/thedevastator/automated-gender-identification-using-name-proba/data
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 5, 2022
Dataset provided by
Kaggle
Authors
The Devastator
Description
Automated Gender Identification Using Name Probabilities

2019 US Social Security Administration Data

By Derek Howard [source]

About this dataset

This dataset provides a tool for generating gender-specific datasets from names alone. It contains the probability of a person's name belonging to a certain gender, based on US Social Security records from the last century. This makes it easy to assign genders to datasets that do not natively include this data. All probability values were derived from names with 5 or more associated people in the records, so many less common names are still covered. With this resource, users can generate gender-aware data quickly, making gender identification in datasets easier and more accurate.

How to use the dataset

This dataset provides a helpful resource when you need to accurately identify gender from names. With this dataset, you’ll be able to quickly and accurately assign genders to datasets that contain names but no other information about the person.

To get started, you will need a CSV file with two columns: name and probability. The name column should contain the first names of the people in your dataset. The probability column should contain numbers between 0 and 1 indicating the likelihood that each name is associated with one specific gender (0 for male, 1 for female).

In addition to simply assigning genders from these probabilities alone, users of this dataset also have more control over their classifications - they can use it as either a baseline or as an absolute measure of accuracy depending on their exact needs/preferences. Experimentation is highly encouraged here!
Good luck!

Research Ideas

Create gender-specific applications - tailor different apps to different genders based on the probability of a particular name belonging to a certain gender.

Generate gender neutral names - use this data to generate random names with no gender bias.

Automate record lookup - quickly and accurately assign genders based on the probability associated with their name

Acknowledgements

If you use this dataset in your research, please credit the original authors.

Data Source

License

Unknown License - Please check the dataset description for more information.

Columns

File: name_gender.csv

| Column name | Description |
|:------------|:------------|
| name | The name of the person. (String) |
| gender | The gender of the person. (String) |
| probability | The probability of the gender being assigned to the person. (Float) |
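
One way to use these three columns is a lookup that assigns a gender only when the listed probability is high enough. A minimal sketch, assuming `probability` refers to the gender given in the same row (the description leaves this ambiguous); the sample rows, the gender codes, the 0.9 default threshold, and the function names are all illustrative assumptions.

```python
import csv
import io

# Made-up rows with the three columns from the table above; not real values.
SAMPLE_CSV = """name,gender,probability
Alex,M,0.62
Maria,F,0.99
James,M,0.98
"""


def load_lookup(csv_file):
    """Map lower-cased name -> (gender, probability)."""
    return {
        row["name"].lower(): (row["gender"], float(row["probability"]))
        for row in csv.DictReader(csv_file)
    }


def assign_gender(name, lookup, min_probability=0.9):
    """Return the listed gender when its probability meets the threshold,
    otherwise None (name unknown or too ambiguous)."""
    entry = lookup.get(name.lower())
    if entry is None:
        return None
    gender, probability = entry
    return gender if probability >= min_probability else None


lookup = load_lookup(io.StringIO(SAMPLE_CSV))
```

Returning `None` for ambiguous names keeps uncertain cases visible instead of forcing a guess; lowering `min_probability` trades accuracy for coverage.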

Acknowledgements

If you use this dataset in your research, please credit Derek Howard.
Mountain NER dataset
kaggle.com
Updated Nov 18, 2023
Cite
Geray Gench (2023). Mountain NER dataset [Dataset]. https://www.kaggle.com/datasets/geraygench/mountain-ner-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 18, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Geray Gench
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
This dataset was made for a NER task. In this task, we need to train a named entity recognition (NER) model for the identification of mountain names inside the texts.

Each entry in the dataset corresponds to a tweet or a sentence that was generated by OpenAI's ChatGPT. It's a mixed dataset that includes a variety of tweets/texts, some of which are focused on mountain-related experiences, while others may discuss different topics.

The features of the dataset include:

Text Content: This feature contains the actual text content of each sentence/tweet. It captures the expressions, experiences, or sentiments related to mountainous regions and activities.

Markers: In the context of the provided code, the "marker" feature represents the start and end indices of the occurrences of specific mountain names within the tweet text.
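
Start and end indices like these are usually converted to per-token labels before training a NER model. A minimal sketch, assuming markers are (start, end) character offsets with an exclusive end, whitespace tokenization, and a hypothetical `MOUNTAIN` label; the dataset's actual marker encoding may differ.

```python
def extract_mentions(text, markers):
    """Return the substrings covered by (start, end) pairs (end exclusive)."""
    return [text[start:end] for start, end in markers]


def bio_tags(text, markers):
    """Whitespace-tokenize text and tag every token that overlaps a marker."""
    tags = []
    position = 0
    for token in text.split():
        start = text.index(token, position)  # character offset of this token
        end = start + len(token)
        position = end
        tag = "O"
        for m_start, m_end in markers:
            if start < m_end and end > m_start:  # token overlaps the span
                tag = "B-MOUNTAIN" if start <= m_start else "I-MOUNTAIN"
                break
        tags.append((token, tag))
    return tags


# Made-up example sentence; the span covers "Mount Everest".
example_text = "We climbed Mount Everest last year."
example_markers = [(11, 24)]
example_tags = bio_tags(example_text, example_markers)
```

The first token of each span receives the `B-` prefix and later tokens the `I-` prefix, which is the common BIO convention for sequence labeling.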
Nyc popular baby names
kaggle.com
Updated Jun 20, 2022
Cite
Rahul Sarkar (2022). Nyc popular baby names [Dataset]. https://www.kaggle.com/datasets/rahulsarkar221/nyc-popular-baby-names
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 20, 2022
Dataset provided by
Kaggle
Authors
Rahul Sarkar
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
New York
Description
This data contains popular baby names in New York.

Dataset: 1 file (popular-baby-names.csv)

Columns:
- Year of Birth: Year of the baby's birth.
- Gender: Gender of the baby.
- Ethnicity: Ethnic group the baby belongs to.
- Child's First Name: The first name of the child.
- Count: How many babies were given the name.
- Ranking: Ranking of that name.
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+2 more
csv, excel, geojson +1
Updated Mar 10, 2024
+ more versions
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Explore at:
Available download formats: csv, json, geojson, excel
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of adm div (ca 80.000)

Sources and Contributions
Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

Enrichment: add country name
Popular Baby Names - Dataset - data.sa.gov.au
data.sa.gov.au
Updated Mar 1, 2025
+ more versions
Cite
(2025). Popular Baby Names - Dataset - data.sa.gov.au [Dataset]. https://data.sa.gov.au/data/dataset/popular-baby-names
Explore at:
Dataset updated
Mar 1, 2025
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
South Australia
Description
List of male and female baby names in South Australia from 1944 to 2024. The annual data for baby names is published January/February each year.
Canadian Geographical Names - CGN
open.canada.ca
catalogue.arctic-sdi.org
csv, esri rest, kml +3
Updated Jul 28, 2025
+ more versions
Cite
Natural Resources Canada (2025). Canadian Geographical Names - CGN [Dataset]. https://open.canada.ca/data/en/dataset/e27c6eba-3c5d-4051-9db2-082dc6411c2c
Explore at:
Available download formats: shp, csv, kml, pdf, esri rest, wms
Dataset updated
Jul 28, 2025
Dataset provided by
Ministry of Natural Resources of Canada (https://www.nrcan.gc.ca/)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Area covered
Canada
Description
The Canadian Geographical Names Data Base (CGNDB) is the authoritative national database of Canada's geographical names. The purpose of the CGNDB is to store place names and their attributes that have been approved by the Geographical Names Board of Canada (GNBC), the national coordinating body responsible for standards and policies on place names. The CGNDB is maintained by Natural Resources Canada, through the Canada Centre for Mapping and Earth Observation. The geographic extent of the CGNDB is the Canadian landmass and water bodies; the temporal extent is from 1897 to present. This dataset is extracted from the CGNDB on a weekly basis, and consists of current officially approved names, feature type, coordinates of the feature, decision date, source, and other attributes. The output file formats for this product are: text (CSV), Shape (SHP), and Keyhole Markup Language (KML). Content advisory: The Canadian Geographical Names Database contains historical terminology that is considered racist, offensive and derogatory. Geographical naming authorities are in the process of addressing many offensive place names, but the work is still ongoing. For more information, please contact the GNBC Secretariat.
PII-REAL-names-dataset
kaggle.com
Updated Mar 17, 2024
Cite
Kris Smith (2024). PII-REAL-names-dataset [Dataset]. https://www.kaggle.com/datasets/krist0phersmith/pii-real-names-dataset/discussion?sort=undefined
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 17, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Kris Smith
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Datasets of REAL (not generated) given names and surnames.

This dataset was originally created for the PII detection competition but may be of use for many other purposes.

To see how I created it you can view this notebook: https://www.kaggle.com/code/krist0phersmith/pii-real-names-data-wrangle

I scraped names from the Facebook user data dump.

I then combined with another published data set of names from the wiki names search data.

Enjoy and feel free to add or share ideas to make this better.

Happy Kaggling!
World Gender Name Dictionary 2.0 Dataset.
tind.wipo.int
csv, zip
Updated 2021
+ more versions
Cite
World Intellectual Property Organization. (2021). World Gender Name Dictionary 2.0 Dataset. [Dataset]. https://tind.wipo.int/record/49408
Explore at:
Available download formats: zip (192350735), csv (137854765), zip (19497798), csv (372274310), csv (46488874), csv (391678471), csv (1842), csv (91229769)
Dataset updated
2021
Dataset provided by
World Intellectual Property Organization (http://wipo.int/)
Authors
World Intellectual Property Organization.
Area covered
World
Description
This dataset revisits the first World Gender Name Dictionary (WGND 1.0), which makes it possible to disambiguate gender in data naming physical persons (Lax Martínez et al., 2016). We discuss its advantages and limitations and propose an expansion based on updated data and additional sources. By including more than 26 million records linking given names and 195 different countries and territories, the resulting WGND 2.0 substantially increases the international coverage of its predecessor. As a result, it is particularly designed to be applied to intellectual property unit-record data naming inventors, designers, individual applicants and other creators disclosed in these data.
Network analysis of the social and demographic influences on name choice...
plos.figshare.com
docx
Updated May 31, 2023
Cite
Stephen J. Bush; Anna Powell-Smith; Tom C. Freeman (2023). Network analysis of the social and demographic influences on name choice within the UK (1838-2016) [Dataset]. http://doi.org/10.1371/journal.pone.0205759
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.1371/journal.pone.0205759
Dataset updated
May 31, 2023
Dataset provided by
PLOS ONE
Authors
Stephen J. Bush; Anna Powell-Smith; Tom C. Freeman
License
Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United Kingdom
Description
Chosen names reflect changes in societal values, personal tastes and cultural diversity. Vogues in name usage can be easily shown on a case by case basis, by plotting the rise and fall in their popularity over time. However, individual name choices are not made in isolation and trends in naming are better understood as group-level phenomena. Here we use network analysis to examine onomastic (name) datasets in order to explore the influences on name choices within the UK over the last 170 years. Using a large representative sample of approximately 22 million forenames from England and Wales given between 1838 and 2014, along with a complete population sample of births registered between 1996 and 2016, we demonstrate how trends in name usage can be visualised as network graphs. By exploring the structure of these graphs various patterns of name use become apparent, a consequence of external social forces, such as migration, operating in concert with internal mechanisms of change. In general, we show that the topology of network graphs can reveal naming vogues, and that naming vogues in part reflect social and demographic changes. Many name choices are consistent with a self-correcting feedback loop, whereby rarer names become common because there are virtues perceived in their rarity, yet with these perceived virtues lost upon increasing commonality. Towards the present day, we can speculate that the comparatively greater range of media, freedom of movement, and ability to maintain globally-distributed social networks increases the number of possible names, but also ensures they may more quickly be perceived as commonplace. Consequently, contemporary naming vogues are relatively short-lived with many name choices appearing a balance struck between recognisability and rarity. The data are available in multiple forms including via an easy-to-use web interface at http://demos.flourish.studio/namehistory.
Name and Country of Origin dataset
kaggle.com
Updated Feb 20, 2022
Cite
Amalesh Vemula (2022). Name and Country of Origin dataset [Dataset]. https://www.kaggle.com/datasets/amaleshvemula7/name-and-country-of-origin-dataset/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 20, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Amalesh Vemula
Description
Context

In my short research, I found no datasets relating names to country of origin. The next step was to scrape data from the individual common-names lists on Wikipedia. The Faker library is used to create fake data; its data is scraped from publicly available sources, and its names come from the Wikipedia common-names lists and other name sources.

Content

Dataset consists of 404062 full names from 63 different countries, namely: Bulgaria, Egypt, Canada, Laos, Thailand, Slovakia, Indonesia, Bosnia and Herzegovina, Ukraine, Japan, Israel, United Arab Emirates, Austria, Armenia, Lithuania, Turkey, Croatia, Luxembourg, Sweden, Latvia, Switzerland, Jordan, United Kingdom, Colombia, Portugal, Bangladesh, Palestine, France, Azerbaijan, Estonia, New Zealand, Saudi Arabia, India, Russia, Finland, United States, Slovenia, Mexico, Australia, Malta, Belgium, Taiwan, Philippines, Romania, Nepal, Poland, Greece, Norway, China, Cyprus, Brazil, Spain, Ireland, Czech Republic, Georgia, Italy, Hungary, Ghana, South Korea, Iran, Germany, Netherlands, Denmark.

Acknowledgements

This dataset wouldn't be made without the libraries faker (https://pypi.org/project/Faker/) and googletrans (https://pypi.org/project/googletrans/).

Inspiration

This dataset can be widely used in solving NLP problems and many text-related problems in determining Ontologies, Knowledge graphs etc.
USA Names
console.cloud.google.com
Updated Jul 15, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Social%20Security%20Administration&hl=de (2023). USA Names [Dataset]. https://console.cloud.google.com/marketplace/product/social-security-administration/us-names?hl=de
Explore at:
Dataset updated
Jul 15, 2023
Dataset provided by
Google (http://google.com/)
Area covered
United States
Description
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data. All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Most Popular Baby Names
data.chhs.ca.gov
data.ca.gov
+3 more
csv, zip
Updated Dec 30, 2024
Cite
California Department of Public Health (2024). Most Popular Baby Names [Dataset]. https://data.chhs.ca.gov/dataset/most-popular-baby-names-2005-current
Explore at:
Available download formats: csv (1219), zip, csv (121160)
Dataset updated
Dec 30, 2024
Dataset authored and provided by
California Department of Public Health (https://www.cdph.ca.gov/)
Description
This dataset contains ranks and counts for the top 25 baby names by sex for live births that occurred in California (by occurrence) based on information entered on birth certificates.
Data from: Inventory of online public databases and repositories holding...
s.cnmilf.com
agdatacommons.nal.usda.gov
+3 more
Updated Apr 21, 2025
+ more versions
Cite
Agricultural Research Service (2025). Inventory of online public databases and repositories holding agricultural data in 2017 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/inventory-of-online-public-databases-and-repositories-holding-agricultural-data-in-2017-d4c81
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service (https://www.ars.usda.gov/)
Description
United States agricultural researchers have many options for making their data available online. This dataset aggregates the primary sources of ag-related data and determines where researchers are likely to deposit their agricultural data. These data serve as both a current landscape analysis and also as a baseline for future studies of ag research data.

Purpose

As sources of agricultural data become more numerous and disparate, and collaboration and open data become more expected if not required, this research provides a landscape inventory of online sources of open agricultural data. An inventory of current agricultural data sharing options will help assess how the Ag Data Commons, a platform for USDA-funded data cataloging and publication, can best support data-intensive and multi-disciplinary research. It will also help agricultural librarians assist their researchers in data management and publication.

The goals of this study were to:
- establish where agricultural researchers in the United States (land grant and USDA researchers, primarily ARS, NRCS, USFS and other agencies) currently publish their data, including general research data repositories, domain-specific databases, and the top journals
- compare how much data is in institutional vs. domain-specific vs. federal platforms
- determine which repositories are recommended by top journals that require or recommend the publication of supporting data
- ascertain where researchers not affiliated with funding or initiatives possessing a designated open data repository can publish data

Approach

The National Agricultural Library team focused on Agricultural Research Service (ARS), Natural Resources Conservation Service (NRCS), and United States Forest Service (USFS) style research data, rather than ag economics, statistics, and social sciences data.
To find domain-specific, general, institutional, and federal agency repositories and databases that are open to US research submissions and have some amount of ag data, resources including re3data, libguides, and ARS lists were analysed. Primarily environmental or public health databases were not included, but places where ag grantees would publish data were considered.

Search methods

We first compiled a list of known domain-specific USDA / ARS datasets / databases that are represented in the Ag Data Commons, including ARS Image Gallery, ARS Nutrition Databases (sub-components), SoyBase, PeanutBase, National Fungus Collection, i5K Workspace @ NAL, and GRIN. We then searched using search engines such as Bing and Google for non-USDA / federal ag databases, using Boolean variations of “agricultural data” / “ag data” / “scientific data” + NOT + USDA (to filter out the federal / USDA results). Most of these results were domain-specific, though some contained a mix of data subjects.

We then used search engines such as Bing and Google to find top agricultural university repositories using variations of “agriculture”, “ag data” and “university” to find schools with agriculture programs. Using that list of universities, we searched each university web site to see if their institution had a repository for their unique, independent research data if not apparent in the initial web browser search. We found both ag-specific university repositories and general university repositories that housed a portion of agricultural data. Ag-specific university repositories are included in the list of domain-specific repositories. Results included Columbia University – International Research Institute for Climate and Society, UC Davis – Cover Crops Database, etc. If a general university repository existed, we determined whether that repository could filter to include only data results after our chosen ag search terms were applied.
General university databases that contain ag data included Colorado State University Digital Collections, University of Michigan ICPSR (Inter-university Consortium for Political and Social Research), and University of Minnesota DRUM (Digital Repository of the University of Minnesota). We then split out NCBI (National Center for Biotechnology Information) repositories.

Next we searched the internet for open general data repositories using a variety of search engines. Repositories containing a mix of data, journals, books, and other types of records were tested to determine whether they could filter for data results after search terms were applied. General subject data repositories include Figshare, Open Science Framework, PANGEA, Protein Data Bank, and Zenodo.

Finally, we compared scholarly journal suggestions for data repositories against our list to fill in any missing repositories that might contain agricultural data. Extensive lists of journals in which USDA employees published in 2012 and 2016 were compiled, combining search results in ARIS, Scopus, and the Forest Service's TreeSearch, plus the USDA web sites Economic Research Service (ERS), National Agricultural Statistics Service (NASS), Natural Resources Conservation Service (NRCS), Food and Nutrition Service (FNS), Rural Development (RD), and Agricultural Marketing Service (AMS). The top 50 journals' author instructions were consulted to see if they (a) ask or require submitters to provide supplemental data, or (b) require submitters to submit data to open repositories.

Data are provided for Journals based on a 2012 and 2016 study of where USDA employees publish their research studies, ranked by number of articles, including 2015/2016 Impact Factor, Author guidelines, Supplemental Data?, Supplemental Data reviewed?, Open Data (Supplemental or in Repository) Required? and Recommended data repositories, as provided in the online author guidelines for each of the top 50 journals.
Evaluation

We ran a series of searches on all resulting general subject databases with the designated search terms. From the results, we noted the total number of datasets in the repository, the type of resource searched (datasets, data, images, components, etc.), the percentage of the total database that each term comprised, whether any search term comprised at least 1% or at least 5% of the total collection, and whether any search term returned more than 100 or more than 500 results.

We compared domain-specific databases and repositories based on parent organization, type of institution, and whether data submissions were dependent on conditions such as funding or affiliation of some kind.

Results

A summary of the major findings from our data review:

Over half of the top 50 ag-related journals from our profile require or encourage open data for their published authors.

There are few general repositories that are both large AND contain a significant portion of ag data in their collection. GBIF (Global Biodiversity Information Facility), ICPSR, and ORNL DAAC were among those that had over 500 datasets returned with at least one ag search term and had that result comprise at least 5% of the total collection.

Not even one quarter of the domain-specific repositories and datasets reviewed allow open submission by any researcher regardless of funding or affiliation.

See included README file for descriptions of each individual data file in this dataset.

Resources in this dataset:

Resource Title: Journals. File Name: Journals.csv

Resource Title: Journals - Recommended repositories. File Name: Repos_from_journals.csv

Resource Title: TDWG presentation. File Name: TDWG_Presentation.pptx

Resource Title: Domain Specific ag data sources. File Name: domain_specific_ag_databases.csv

Resource Title: Data Dictionary for Ag Data Repository Inventory. File Name: Ag_Data_Repo_DD.csv

Resource Title: General repositories containing ag data. File Name: general_repos_1.csv

Resource Title: README and file inventory. File Name: README_InventoryPublicDBandREepAgData.txt
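
The evaluation criteria described above (a search term's share of the total collection, checked against 1% and 5%, and its result count, checked against 100 and 500) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `flag_term` and the dictionary keys are hypothetical and do not come from the dataset or its README, and the numbers in the usage note are invented.

```python
def flag_term(term_hits: int, total_datasets: int) -> dict:
    """Summarize one search term against one repository.

    term_hits: number of results returned for the search term.
    total_datasets: total number of records in the repository.
    All names here are hypothetical; only the thresholds come from the text.
    """
    if total_datasets <= 0:
        raise ValueError("total_datasets must be positive")
    if term_hits < 0:
        raise ValueError("term_hits must not be negative")

    return {
        # Share of the whole collection matched by this term, in percent.
        "share_pct": round(100 * term_hits / total_datasets, 2),
        # Integer comparisons avoid floating-point edge cases at the thresholds.
        "at_least_1pct": term_hits * 100 >= total_datasets,
        "at_least_5pct": term_hits * 100 >= 5 * total_datasets,
        "over_100_results": term_hits > 100,
        "over_500_results": term_hits > 500,
    }
```

For example, calling `flag_term(600, 10000)` on an invented repository of 10,000 records with 600 hits would set every flag, which corresponds to the "over 500 datasets and at least 5% of the collection" condition mentioned in the results.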

Cite

Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names

USA Name Data

USA Name Data (BigQuery Dataset)

Explore at:

2 scholarly articles cite this dataset

Available download formats: zip (0 bytes)

Dataset updated

Feb 12, 2019

Dataset provided by

Data.gov (https://data.gov/)

License

https://creativecommons.org/publicdomain/zero/1.0/

Area covered

United States

Description

Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
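
One way to start on the questions above is sketched below in plain Python. It assumes each record carries a name, a gender code, and a count, which is how this data is commonly laid out; check the real column names before adapting it. The rows are made-up placeholders for illustration, not values from the dataset, and the helper names `most_common_names` and `distinct_names_by_gender` are hypothetical.

```python
from collections import Counter

# Made-up placeholder rows in the assumed shape (name, gender, number).
# These are NOT real counts from the dataset.
rows = [
    ("Mary", "F", 120),
    ("James", "M", 100),
    ("Mary", "F", 30),
    ("Linda", "F", 90),
    ("John", "M", 95),
    ("Emma", "F", 80),
]


def most_common_names(rows, gender=None, top=3):
    """Sum counts per name (optionally for one gender) and return the top few."""
    totals = Counter()
    for name, row_gender, number in rows:
        if gender is None or row_gender == gender:
            totals[name] += number
    return totals.most_common(top)


def distinct_names_by_gender(rows):
    """Count how many different names appear for each gender code."""
    names = {}
    for name, row_gender, _ in rows:
        names.setdefault(row_gender, set()).add(name)
    return {g: len(found) for g, found in names.items()}


print(most_common_names(rows, top=2))
print(most_common_names(rows, gender="F", top=2))
print(distinct_names_by_gender(rows))
```

Summing counts per name answers the "most common" questions, while counting distinct names per gender addresses whether there are more female or male names. On the full data the same aggregation would more likely be done in SQL or a dataframe library.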

