https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.
All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.
Fork this kernel to get started with this dataset.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names
https://cloud.google.com/bigquery/public-data/usa-names
Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @dcp from Unplash.
What are the most common names?
What are the most common female names?
Are there more female or male names?
Female names by a wide margin?
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
This dataset contains counts and rankings of the most common first names in the United States, sourced from comprehensive name census data. It is ideal for analyzing naming trends, demographic patterns, and cultural preferences, as well as for building statistical models to explore name popularity over time.
male_first_names.csv
: Male first name frequencies and rankings in the U.S.
female_first_names.csv
: Female first name frequencies and rankings in the U.S.
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data. All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery .
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains US baby names from the Social Security Administration dating back to 1879. With over 150 years of data, this is one of the most comprehensive datasets on baby names in the US. The data includes the name, year of birth, sex, and number of babies with that name for each year. This dataset is a great resource for anyone interested in studying baby naming trends over time
This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes data on baby names, year of birth, and sex. There are also columns for the number of babies with that name born in that year.
This dataset can be used to track changes in baby naming trends over time, or to study how popular names have changed in popularity. It can also be used to study how naming trends differ between sexes, or between different years
This dataset could be used for a number of things, including: 1. Determining baby name trends over time 2. Finding out what the most popular baby names are in the US 3. Analyzing how baby name popularity has changed over the years
If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov
MuzzammilShah/people-names dataset hosted on Hugging Face and contributed by the HF Datasets community
Contains annual baby name counts for all states (& D.C.) from 1910-2017
*Alaska & Hawaii rows are missing population data before 1950
Fun Club Name Generator Dataset
This is a small, handcrafted dataset of random and fun club name ideas.The goal is to help people who are stuck naming something — whether it's a book club, a gaming group, a project, or just a Discord server between friends.
Why this?
A few friends and I spent hours trying to name a casual group — everything felt cringey, too serious, or already taken. We started writing down names that made us laugh, and eventually collected enough to… See the full description on the dataset page: https://huggingface.co/datasets/Laurenfromhere/fun-club-name-generator-dataset.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All cities with a population > 1000 or seats of adm div (ca 80.000)Sources and ContributionsSources : GeoNames is aggregating over hundred different data sources. Ambassadors : GeoNames Ambassadors help in many countries. Wiki : A wiki allows to view the data and quickly fix error and add missing places. Donations and Sponsoring : Costs for running GeoNames are covered by donations and sponsoring.Enrichment:add country name
https://en.wikipedia.org/wiki/Public_domainhttps://en.wikipedia.org/wiki/Public_domain
This dataset is part of the Geographical repository maintained by Opendatasoft. This dataset contains data for places and equivalent entities in United States of America.This layer both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. Processors and tools are using this data. Enhancements Add ISO 3166-3 codes. Simplify geometries to provide better performance across the services. Add administrative hierarchy.
The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations—Ancestry.com and FamilySearch—and provides the largest and richest source of individual level and household data.
Historic data are scarce and often only exists in aggregate tables. The key advantage of the IPUMS data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefits their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.
In sum: the IPUMS data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
The IPUMS 1900 census data was collected in June 1900. Enumerators collected data traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data was collected. Household members absent on the day of data collected were either listed to the household with the help of other household members or were scheduled for the last census subdivision.
This dataset was created on 2020-01-10 22:51:40.810
by merging multiple datasets together. The source datasets for this version were:
IPUMS 1900 households: This dataset includes all households from the 1900 US census.
IPUMS 1900 persons: This dataset includes all individuals from the 1910 US census.
IPUMS 1900 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1900 datasets.
The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations—Ancestry.com and FamilySearch—and provides the largest and richest source of individual level and household data.
Historic data are scarce and often only exists in aggregate tables. The key advantage of the IPUMS data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefits their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.
In sum: the IPUMS data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
The IPUMS 1900 census data was collected in June 1900. Enumerators collected data traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data was collected. Household members absent on the day of data collected were either listed to the household with the help of other household members or were scheduled for the last census subdivision.
By Derek Howard [source]
This dataset provides an essential tool for generating gender-specific datasets from names alone. It contains information on the probability of a person's name belonging to a certain gender, based off of US Social Security records from the last century. This makes it easy to assign genders to datasets that do not natively include this data. All probability values were culled from records with 5 or more people associated with each name - so those individuals with less common monikers can still have their genders correctly predicted! With this resource, users can generate gender-aware data in no time, making gender identification in data sets more accurate and easier than ever
For more datasets, click here.
- 🚨 Your notebook can be here! 🚨!
This dataset provides a helpful resource when you need to accurately identify gender from names. With this dataset, you’ll be able to quickly and accurately assign genders to datasets that contain names but no other information about the person.
To get started, you will need a csv file with two columns: name and probability. The name column should contain the first names of the people in your dataset. The probability column should contain numbers between 0 and 1 indicating the likelihood that each name is associated with one specific gender (0 for male, 1 for female).
In addition to simply assigning genders from these probabilities alone, users of this dataset also have more control over their classifications - they can use it as either a baseline or as an absolute measure of accuracy depending on their exact needs/preferences. Experimentation is highly encouraged here!
Good luck!
Create gender-specific applications - tailor different apps to different genders based on the probability of a particular name belonging to a certain gender.
Generate gender neutral names - use this data to generate random names with no gender bias.
Automate record lookup - quickly and accurately assign genders based on the probability associated with their name
If you use this dataset in your research, please credit the original authors.
License
Unknown License - Please check the dataset description for more information.
File: name_gender.csv | Column name | Description | |:----------------|:--------------------------------------------------------------------| | name | The name of the person. (String) | | gender | The gender of the person. (String) | | probability | The probability of the gender being assigned to the person. (Float) |
If you use this dataset in your research, please credit the original authors. If you use this dataset in your research, please credit Derek Howard.
This dataset represents the popular last names in the United States for White.
The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3. Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed data base file named TABLES.ZIP. The uncompressed is in FoxPro data base file (dbf) format and may be imported to ACCESS, EXCEL, and other software formats. While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those items used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed the tables are ready for use with FoxPro and they can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level. The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The last part of the file name describes the contents . The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table is total population and total housing units. POP1 and POP2 contain selected population variables and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES, and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
The Alesco Phone ID Database data ties together a consumer's true identity, and with linkage to the Alesco Power Identity Graph, we are perfectly positioned to help customers solve today's most challenging marketing, analytics, and identity resolution problems.
Our proprietary Phone ID database combines public and private sources and validates phone numbers against current and historical data 24 hours a day, 365 days a year.
With over 650 million unique phone numbers, device and service information, our one-of-a-kind solutions are now available for your marketing and identity resolution challenges in both B2C and B2B applications!
• Alesco Phone ID provides more than 860 million phone numbers monthly linked to a consumer or business name and includes landline, mobile phone number, VoIP, private and business phone numbers — all permissibly obtained and privacy-compliant and linked to other Alesco data sets
• How we do it: Alesco Phone ID is multi-sourced with daily information and delivered monthly or quarterly to clients. Our proprietary machine learning and advanced analytics processes ensure quality levels far above industry standards. Alesco processes over 100 million phone signals per day, compiling, normalizing, and standardizing phone information from 37 input sources.
• Accuracy: Each of Alesco’s phone data sources are vetted to ensure they are authoritative, giving you confidence in the accuracy of the information. Every record is validated, verified and processed to ensure the widest, most reliable coverage combined with stunning precision.
Ease of use: Alesco’s Phone ID Database is available as an on-premise phone database license, giving you full control to host and access this powerful resource on-site. Ongoing updates are provided on a monthly basis ensure your data is up to date.
The Places dataset was published on August 31, 2022 from the United States Census Bureau (USCB) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). This resource is a member of a series. The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The TIGER/Line shapefiles include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. The boundaries of most incorporated places in this shapefile are as of January 1, 2022, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2020 Census, but some CDPs were added or updated through the 2022 BAS as well. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529072
https://www.usa.gov/government-workshttps://www.usa.gov/government-works
Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.
This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.
The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.
Using these data, the COVID-19 community level was classified as low, medium, or high.
COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.
For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.
Archived Data Notes:
This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.
March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previous released.
March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on 2019 Census estimate.
March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.
March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.
March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).
March 31, 2022: Data values for columns, “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were added since the week of 3/3/2022, thus the values were previously missing for records released the week prior.
April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.
April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials to verify the data submitted, as other data systems are not providing alerts for substantial increases in disease transmission or severity in the state.
May 26, 2022: COVID-19 Community Level (CCL) data released for McCracken County, KY for the week of May 5, 2022 have been updated to correct a data processing error. McCracken County, KY should have appeared in the low community level category during the week of May 5, 2022. This correction is reflected in this update.
May 26, 2022: COVID-19 Community Level (CCL) data released for several Florida counties for the week of May 19th, 2022, have been corrected for a data processing error. Of note, Broward, Miami-Dade, Palm Beach Counties should have appeared in the high CCL category, and Osceola County should have appeared in the medium CCL category. These corrections are reflected in this update.
May 26, 2022: COVID-19 Community Level (CCL) data released for Orange County, New York for the week of May 26, 2022 displayed an erroneous case rate of zero and a CCL category of low due to a data source error. This county should have appeared in the medium CCL category.
June 2, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a data processing error. Tolland County, CT should have appeared in the medium community level category during the week of May 26, 2022. This correction is reflected in this update.
June 9, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a misspelling. The medium community level category for Tolland County, CT on the week of May 26, 2022 was misspelled as “meduim” in the data set. This correction is reflected in this update.
June 9, 2022: COVID-19 Community Level (CCL) data released for Mississippi counties for the week of June 9, 2022 should be interpreted with caution due to a reporting cadence change over the Memorial Day holiday that resulted in artificially inflated case rates in the state.
July 7, 2022: COVID-19 Community Level (CCL) data released for Rock County, Minnesota for the week of July 7, 2022 displayed an artificially low case rate and CCL category due to a data source error. This county should have appeared in the high CCL category.
July 14, 2022: COVID-19 Community Level (CCL) data released for Massachusetts counties for the week of July 14, 2022 should be interpreted with caution due to a reporting cadence change that resulted in lower than expected case rates and CCL categories in the state.
July 28, 2022: COVID-19 Community Level (CCL) data released for all Montana counties for the week of July 21, 2022 had case rates of 0 due to a reporting issue. The case rates have been corrected in this update.
July 28, 2022: COVID-19 Community Level (CCL) data released for Alaska for all weeks prior to July 21, 2022 included non-resident cases. The case rates for the time series have been corrected in this update.
July 28, 2022: A laboratory in Nevada reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate will be inflated in Clark County, NV for the week of July 28, 2022.
August 4, 2022: COVID-19 Community Level (CCL) data was updated on August 2, 2022 in error during performance testing. Data for the week of July 28, 2022 was changed during this update due to additional case and hospital data as a result of late reporting between July 28, 2022 and August 2, 2022. Since the purpose of this data set is to provide point-in-time views of COVID-19 Community Levels on Thursdays, any changes made to the data set during the August 2, 2022 update have been reverted in this update.
August 4, 2022: COVID-19 Community Level (CCL) data for the week of July 28, 2022 for 8 counties in Utah (Beaver County, Daggett County, Duchesne County, Garfield County, Iron County, Kane County, Uintah County, and Washington County) case data was missing due to data collection issues. CDC and its partners have resolved the issue and the correction is reflected in this update.
August 4, 2022: Due to a reporting cadence change, case rates for all Alabama counties will be lower than expected. As a result, the CCL levels published on August 4, 2022 should be interpreted with caution.
August 11, 2022: COVID-19 Community Level (CCL) data for the week of August 4, 2022 for South Carolina have been updated to correct a data collection error that resulted in incorrect case data. CDC and its partners have resolved the issue and the correction is reflected in this update.
August 18, 2022: COVID-19 Community Level (CCL) data for the week of August 11, 2022 for Connecticut have been updated to correct a data ingestion error that inflated the CT case rates. CDC, in collaboration with CT, has resolved the issue and the correction is reflected in this update.
August 25, 2022: A laboratory in Tennessee reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate may be inflated in many counties and the CCLs published on August 25, 2022 should be interpreted with caution.
August 25, 2022: Due to a data source error, the 7-day case rate for St. Louis County, Missouri, is reported as zero in the COVID-19 Community Level data released on August 25, 2022. Therefore, the COVID-19 Community Level for this county should be interpreted with caution.
September 1, 2022: Due to a reporting issue, case rates for all Nebraska counties will include 6 days of data instead of 7 days in the COVID-19 Community Level (CCL) data released on September 1, 2022. Therefore, the CCLs for all Nebraska counties should be interpreted with caution.
September 8, 2022: Due to a data processing error, the case rate for Philadelphia County, Pennsylvania,
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Dataset Card for Dataset Name
A data set of images of faces of people affected with Bell's palsy (Facial palsy).
Dataset Details
Dataset Description
A data set of images of faces of people affected with Bell's palsy (Facial palsy). Created using curating and editing publically available youtube videos. Also included are images from people not affected by it, using the same method.
License: CC-BY-4.0
Uses
Can be used to train image models to detect… See the full description on the dataset page: https://huggingface.co/datasets/jasir/palsynet-data.
This dataset represents the popular last names in the United States for people of two or more races.
Full profile of 10,000 people in the US - download here, data schema here, with more than 40 data points including - Full Name - Education - Location - Work Experience History and many more!
There are additionally 258+ Million US people profiles available, visit the LinkDB product page here.
Our LinkDB database is an exhaustive database of publicly accessible LinkedIn people and companies profiles. It contains close to 500 Million people and companies profiles globally.
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.
All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.
Fork this kernel to get started with this dataset.
https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names
https://cloud.google.com/bigquery/public-data/usa-names
Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Banner Photo by @dcp from Unplash.
What are the most common names?
What are the most common female names?
Are there more female or male names?
Female names by a wide margin?