90 datasets found

USA Name Data
kaggle.com
zip
Updated Feb 12, 2019
Cite
Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names
Explore at:
Available download formats
zip (0 bytes)
Dataset updated
Feb 12, 2019
Dataset provided by
Data.gov (https://data.gov/)
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Context

Cultural diversity in the U.S. has led to great variations in names and naming traditions and names have been used to express creativity, personality, cultural identity, and values. Source: https://en.wikipedia.org/wiki/Naming_in_the_United_States

Content

This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

Fork this kernel to get started with this dataset.

Acknowledgements

https://bigquery.cloud.google.com/dataset/bigquery-public-data:usa_names

https://cloud.google.com/bigquery/public-data/usa-names

Dataset Source: Data.gov. This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — http://www.data.gov/privacy-policy#data_policy — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
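
The questions above can be approached with a few lines of Python. The following is a minimal sketch, not the dataset's official starter kernel: it assumes the data has been exported to CSV with columns called `name`, `gender`, and `number` (these column names are an assumption, not stated in this listing), and the sample rows are made up purely for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration only; the column names are assumed.
SAMPLE = """name,gender,number
Mary,F,120
Anna,F,80
Emma,F,60
John,M,150
Mary,M,5
"""

def load_rows(csv_text):
    """Parse CSV text into (name, gender, number) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(r["name"], r["gender"], int(r["number"])) for r in reader]

def most_common_name(rows, sex=None):
    """Return (name, total) for the most frequent name, optionally for one sex."""
    totals = Counter()
    for name, gender, number in rows:
        if sex is None or gender == sex:
            totals[name] += number
    return totals.most_common(1)[0]

def distinct_names_by_sex(rows):
    """Count how many different names appear for each sex."""
    names = {}
    for name, gender, _ in rows:
        names.setdefault(gender, set()).add(name)
    return {gender: len(found) for gender, found in names.items()}

rows = load_rows(SAMPLE)
print(most_common_name(rows))
print(most_common_name(rows, sex="F"))
print(distinct_names_by_sex(rows))
```

Because names with fewer than 5 occurrences are withheld, any count of distinct names is a lower bound on the true number.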
Baby Names from Social Security Card Applications - National Data
catalog.data.gov
res1catalogd-o-tdatad-o-tgov.vcapture.xyz
Updated Jul 4, 2025
+ more versions
Cite
Social Security Administration (2025). Baby Names from Social Security Card Applications - National Data [Dataset]. https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-data
Explore at:
Dataset updated
Jul 4, 2025
Dataset provided by
Social Security Administration (http://ssa.gov/)
Description
The data (name, year of birth, sex, and number) are from a 100 percent sample of Social Security card applications for 1880 on.
U.S. First Names: Popularity and Counts
kaggle.com
Updated Jun 9, 2025
Cite
Daniel Fedorov (2025). U.S. First Names: Popularity and Counts [Dataset]. https://www.kaggle.com/datasets/downshift/u-s-first-names-popularity-and-counts/suggestions
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Jun 9, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Daniel Fedorov
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description

This dataset contains counts and rankings of the most common first names in the United States, sourced from comprehensive name census data. It is ideal for analyzing naming trends, demographic patterns, and cultural preferences, as well as for building statistical models to explore name popularity over time.

Dataset structure

male_first_names.csv: Male first name frequencies and rankings in the U.S.

female_first_names.csv: Female first name frequencies and rankings in the U.S.
USA Names
console.cloud.google.com
Updated Jul 15, 2023
Cite
https://console.cloud.google.com/marketplace/browse?filter=partner:U.S.%20Social%20Security%20Administration&hl=de (2023). USA Names [Dataset]. https://console.cloud.google.com/marketplace/product/social-security-administration/us-names?hl=de
Explore at:
Dataset updated
Jul 15, 2023
Dataset provided by
Google (http://google.com/)
Area covered
United States
Description
This public dataset was created by the Social Security Administration and contains all names from Social Security card applications for births that occurred in the United States after 1879. Note that many people born before 1937 never applied for a Social Security card, so their names are not included in this data. For others who did apply, records may not show the place of birth, and again their names are not included in the data.

All data are from a 100% sample of records on Social Security card applications as of the end of February 2015. To safeguard privacy, the Social Security Administration restricts names to those with at least 5 occurrences.

This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets.
Baby Names by Year
kaggle.com
Updated Sep 20, 2022
Cite
The Devastator (2022). Baby Names by Year [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-baby-names-by-year-of-birth/discussion
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 20, 2022
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
The Devastator
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
About this dataset

This dataset contains US baby names from the Social Security Administration dating back to 1880. With over 140 years of data, this is one of the most comprehensive datasets on baby names in the US. The data includes the name, year of birth, sex, and number of babies with that name for each year. This dataset is a great resource for anyone interested in studying baby naming trends over time.

How to use the dataset

How to use the US Baby Names by Year of Birth dataset:

This dataset is a compilation of over 140 years of data from the Social Security Administration. It includes data on baby names, year of birth, and sex. There are also columns for the number of babies with that name born in that year.

This dataset can be used to track changes in baby naming trends over time, or to study how the popularity of individual names has changed. It can also be used to study how naming trends differ between sexes, or between different years.

Research Ideas

This dataset could be used for a number of things, including:
1. Determining baby name trends over time
2. Finding out what the most popular baby names are in the US
3. Analyzing how baby name popularity has changed over the years

Columns

index: the index of the dataframe

YearOfBirth: the year in which the baby was born

Name: the name of the baby

Sex: the sex of the baby

Number: the number of babies with that name and sex
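
As an illustration of how the columns listed above might be used to follow a name over time, here is a minimal sketch that computes a name's share of recorded births per year. It relies only on the column names given in the list; the sample rows are invented for the example, and the function name `yearly_share` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented rows that only mimic the documented column layout.
SAMPLE = """index,YearOfBirth,Name,Sex,Number
0,1880,Mary,F,70
1,1880,Anna,F,30
2,1881,Mary,F,60
3,1881,Anna,F,60
"""

def yearly_share(csv_text, name, sex):
    """Return {year: share} of recorded births of one sex that carry `name`."""
    year_total = defaultdict(int)   # all recorded births of this sex per year
    name_total = defaultdict(int)   # births with the requested name per year
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Sex"] != sex:
            continue
        year = int(row["YearOfBirth"])
        count = int(row["Number"])
        year_total[year] += count
        if row["Name"] == name:
            name_total[year] += count
    return {year: name_total[year] / year_total[year] for year in sorted(year_total)}

print(yearly_share(SAMPLE, "Mary", "F"))
```

The denominator covers only the names present in the file, so the result is a share of recorded names rather than of all births.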

Acknowledgements

If you use this dataset in your research, please credit @nickgott, @rflprr and the Social Security Administration via Data.gov

Data Source
Data from: people-names
huggingface.co
Updated Jun 26, 2025
Cite
Mo Shah (2025). people-names [Dataset]. https://huggingface.co/datasets/MuzzammilShah/people-names
Explore at:
Dataset updated
Jun 26, 2025
Authors
Mo Shah
Description
MuzzammilShah/people-names dataset hosted on Hugging Face and contributed by the HF Datasets community
US State Baby Names
kaggle.com
Updated Sep 14, 2018
Cite
Hassen Morad (2018). US State Baby Names [Dataset]. https://www.kaggle.com/datasets/hassenmorad/us-state-baby-names/discussion
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 14, 2018
Dataset provided by
Kaggle
Authors
Hassen Morad
Area covered
United States
Description
Contains annual baby name counts for all states (& D.C.) from 1910-2017

*Alaska & Hawaii rows are missing population data before 1950
fun-club-name-generator-dataset
huggingface.co
Updated Apr 5, 2025
Cite
Mitchell (2025). fun-club-name-generator-dataset [Dataset]. https://huggingface.co/datasets/Laurenfromhere/fun-club-name-generator-dataset
Explore at:
Dataset updated
Apr 5, 2025
Authors
Mitchell
Description
Fun Club Name Generator Dataset

This is a small, handcrafted dataset of random and fun club name ideas. The goal is to help people who are stuck naming something — whether it's a book club, a gaming group, a project, or just a Discord server between friends.

Why this?

A few friends and I spent hours trying to name a casual group — everything felt cringey, too serious, or already taken. We started writing down names that made us laugh, and eventually collected enough to… See the full description on the dataset page: https://huggingface.co/datasets/Laurenfromhere/fun-club-name-generator-dataset.
Geonames - All Cities with a population > 1000
public.opendatasoft.com
data.smartidf.services
+1 more
csv, excel, geojson +1
Updated Mar 10, 2024
+ more versions
Cite
(2024). Geonames - All Cities with a population > 1000 [Dataset]. https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/
Explore at:
Available download formats
csv, json, geojson, excel
Dataset updated
Mar 10, 2024
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
All cities with a population > 1000 or seats of adm div (ca 80.000)

Sources and Contributions

Sources: GeoNames aggregates over a hundred different data sources.
Ambassadors: GeoNames Ambassadors help in many countries.
Wiki: A wiki allows users to view the data, quickly fix errors, and add missing places.
Donations and Sponsoring: Costs for running GeoNames are covered by donations and sponsoring.

Enrichment: add country name
Places - United States of America
public.opendatasoft.com
data.smartidf.services
+1 more
csv, excel, geojson +1
Updated Jun 6, 2024
Cite
(2024). Places - United States of America [Dataset]. https://public.opendatasoft.com/explore/dataset/georef-united-states-of-america-place/
Explore at:
Available download formats
geojson, csv, json, excel
Dataset updated
Jun 6, 2024
License
https://en.wikipedia.org/wiki/Public_domain
Area covered
United States
Description
This dataset is part of the Geographical repository maintained by Opendatasoft. It contains data for places and equivalent entities in the United States of America.

This layer includes both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population.

Processors and tools are using this data.

Enhancements: Add ISO 3166-3 codes. Simplify geometries to provide better performance across the services. Add administrative hierarchy.
Historic US Census - 1900
redivis.com
stanford.redivis.com
application/jsonl +7
Updated Jan 10, 2020
+ more versions
Cite
Stanford Center for Population Health Sciences (2020). Historic US Census - 1900 [Dataset]. http://doi.org/10.57761/mez6-j880
Explore at:
Available download formats
arrow, spss, avro, sas, application/jsonl, csv, parquet, stata
Unique identifier
https://doi.org/10.57761/mez6-j880
Dataset updated
Jan 10, 2020
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Time period covered
Feb 1, 1900 - Dec 31, 1900
Area covered
United States
Description
Documentation

The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household data.

Historic data are scarce and often exist only in aggregate tables. The key advantage of the IPUMS data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data, as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.

In sum: the IPUMS data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.

The IPUMS 1900 census data were collected in June 1900. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed with the household with the help of other household members or were scheduled for the last census subdivision.
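
The mother's-age-at-birth example mentioned in the documentation can be sketched as follows. This is only an illustration of the idea of linking household members: the field names (`household_id`, `person_id`, `mother_id`, `age`) are hypothetical and are not the variable names used in the IPUMS files, and the records are invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    household_id: int                 # hypothetical household key
    person_id: int                    # hypothetical person number within the household
    age: int                          # age in completed years at enumeration
    mother_id: Optional[int] = None   # person_id of the mother, if she lives in the household

def mothers_age_at_birth(people):
    """Map (household_id, child person_id) to the mother's approximate age at that birth."""
    by_key = {(p.household_id, p.person_id): p for p in people}
    result = {}
    for child in people:
        if child.mother_id is None:
            continue
        mother = by_key.get((child.household_id, child.mother_id))
        if mother is not None:
            # Difference of two ages in whole years, so accurate to about one year.
            result[(child.household_id, child.person_id)] = mother.age - child.age
    return result

household = [
    Person(1, 1, 34),
    Person(1, 2, 10, mother_id=1),
    Person(1, 3, 4, mother_id=1),
]
print(mothers_age_at_birth(household))
```

Children whose mother is not enumerated in the same household are skipped, which is one reason such estimates cover only part of the population.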

Section 2

This dataset was created on 2020-01-10 22:51:40.810 by merging multiple datasets together. The source datasets for this version were:

IPUMS 1900 households: This dataset includes all households from the 1900 US census.

IPUMS 1900 persons: This dataset includes all individuals from the 1900 US census.

IPUMS 1900 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1900 datasets.

Gender by Name (Time-series)
kaggle.com
Updated Dec 5, 2022
Cite
The Devastator (2022). Gender by Name (Time-series) [Dataset]. https://www.kaggle.com/datasets/thedevastator/automated-gender-identification-using-name-proba/data
Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 5, 2022
Dataset provided by
Kaggle
Authors
The Devastator
Description
Automated Gender Identification Using Name Probabilities

2019 US Social Security Administration Data

By Derek Howard [source]

About this dataset

This dataset provides an essential tool for generating gender-specific datasets from names alone. It contains information on the probability of a person's name belonging to a certain gender, based on US Social Security records from the last century. This makes it easy to assign genders to datasets that do not natively include this data. All probability values were culled from records with 5 or more people associated with each name - so those individuals with less common monikers can still have their genders correctly predicted! With this resource, users can generate gender-aware data in no time, making gender identification in data sets more accurate and easier than ever.

How to use the dataset

This dataset provides a helpful resource when you need to accurately identify gender from names. With this dataset, you’ll be able to quickly and accurately assign genders to datasets that contain names but no other information about the person.

To get started, you will need a csv file with two columns: name and probability. The name column should contain the first names of the people in your dataset. The probability column should contain numbers between 0 and 1 indicating the likelihood that each name is associated with one specific gender (0 for male, 1 for female).

In addition to simply assigning genders from these probabilities alone, users of this dataset also have more control over their classifications - they can use it as either a baseline or as an absolute measure of accuracy depending on their exact needs/preferences. Experimentation is highly encouraged here!
Good luck!

Research Ideas

Create gender-specific applications - tailor different apps to different genders based on the probability of a particular name belonging to a certain gender.

Generate gender neutral names - use this data to generate random names with no gender bias.

Automate record lookup - quickly and accurately assign genders based on the probability associated with their name

Acknowledgements

If you use this dataset in your research, please credit the original authors.

Data Source

License

Unknown License - Please check the dataset description for more information.

Columns

File: name_gender.csv

| Column name | Description |
|:------------|:------------|
| name | The name of the person. (String) |
| gender | The gender of the person. (String) |
| probability | The probability of the gender being assigned to the person. (Float) |
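
One way to use the three columns of name_gender.csv is a lookup with a confidence threshold. The sketch below is an assumption-laden illustration rather than the dataset author's method: it assumes `gender` holds codes such as `F` and `M`, that `probability` is the confidence for the gender on the same row, and that a name may appear on more than one row. The sample rows and the 0.9 default threshold are invented.

```python
import csv
import io

# Invented rows following the documented columns; the F/M codes are assumed.
SAMPLE = """name,gender,probability
Mary,F,0.99
John,M,0.99
Casey,M,0.58
Casey,F,0.42
"""

def load_lookup(csv_text):
    """Build {lowercased name: (gender, probability)}, keeping the likeliest row per name."""
    lookup = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["name"].strip().lower()
        probability = float(row["probability"])
        if key not in lookup or probability > lookup[key][1]:
            lookup[key] = (row["gender"], probability)
    return lookup

def assign_gender(name, lookup, min_probability=0.9):
    """Return the gender code, or None when the name is unknown or too ambiguous."""
    entry = lookup.get(name.strip().lower())
    if entry is None:
        return None
    gender, probability = entry
    return gender if probability >= min_probability else None

lookup = load_lookup(SAMPLE)
print(assign_gender("mary", lookup))
print(assign_gender("Casey", lookup))
```

Returning `None` for ambiguous or unlisted names keeps low-confidence guesses out of downstream data; lowering `min_probability` trades accuracy for coverage.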

Acknowledgements

If you use this dataset in your research, please credit Derek Howard.
Popular White Last Names in the US
johnsnowlabs.com
csv
Updated Jan 20, 2021
Cite
John Snow Labs (2021). Popular White Last Names in the US [Dataset]. https://www.johnsnowlabs.com/marketplace/popular-white-last-names-in-the-us/
Explore at:
Available download formats
csv
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Area covered
United States
Description
This dataset lists popular last names among the White population in the United States.
Census Data
catalog.data.gov
data.globalchange.gov
+2 more
Updated Mar 1, 2024
Cite
U.S. Bureau of the Census (2024). Census Data [Dataset]. https://catalog.data.gov/dataset/census-data
Explore at:
Dataset updated
Mar 1, 2024
Dataset provided by
U.S. Bureau of the Census
Description
The Bureau of the Census has released Census 2000 Summary File 1 (SF1) 100-Percent data. The file includes the following population items: sex, age, race, Hispanic or Latino origin, household relationship, and household and family characteristics. Housing items include occupancy status and tenure (whether the unit is owner or renter occupied). SF1 does not include information on incomes, poverty status, overcrowded housing or age of housing. These topics will be covered in Summary File 3.

Data are available for states, counties, county subdivisions, places, census tracts, block groups, and, where applicable, American Indian and Alaskan Native Areas and Hawaiian Home Lands. The SF1 data are available on the Bureau's web site and may be retrieved from American FactFinder as tables, lists, or maps. Users may also download a set of compressed ASCII files for each state via the Bureau's FTP server. There are over 8000 data items available for each geographic area. The full listing of these data items is available here as a downloadable compressed data base file named TABLES.ZIP. The uncompressed file is in FoxPro data base file (dbf) format and may be imported to ACCESS, EXCEL, and other software formats.

While all of this information is useful, the Office of Community Planning and Development has downloaded selected information for all states and areas and is making this information available on the CPD web pages. The tables and data items selected are those items used in the CDBG and HOME allocation formulas plus topics most pertinent to the Comprehensive Housing Affordability Strategy (CHAS), the Consolidated Plan, and similar overall economic and community development plans. The information is contained in five compressed (zipped) dbf tables for each state. When uncompressed the tables are ready for use with FoxPro and they can be imported into ACCESS, EXCEL, and other spreadsheet, GIS and database software. The data are at the block group summary level.

The first two characters of the file name are the state abbreviation. The next two letters are BG for block group. Each record is labeled with the code and name of the city and county in which it is located so that the data can be summarized to higher-level geography. The last part of the file name describes the contents. The GEO file contains standard Census Bureau geographic identifiers for each block group, such as the metropolitan area code and congressional district code. The only data included in this table are total population and total housing units. POP1 and POP2 contain selected population variables, and selected housing items are in the HU file. The MA05 table data is only for use by State CDBG grantees for the reporting of the racial composition of beneficiaries of Area Benefit activities. The complete package for a state consists of the dictionary file named TABLES, and the five data files for the state. The logical record number (LOGRECNO) links the records across tables.
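
The file-naming convention and the LOGRECNO link described above can be sketched in a few lines. This is a hedged illustration: the exact file name layout (for example `ALBGPOP1` with no separators) is an assumption, and every field name other than `LOGRECNO` is hypothetical. Reading the dbf files themselves is left out; records are shown as plain dictionaries.

```python
def parse_file_name(stem):
    """Split a name such as 'ALBGPOP1' into state, summary level, and contents."""
    stem = stem.upper()
    if len(stem) < 5 or stem[2:4] != "BG":
        raise ValueError(f"unexpected file name: {stem!r}")
    return {"state": stem[:2], "level": stem[2:4], "contents": stem[4:]}

def join_on_logrecno(*tables):
    """Merge records from several tables that share the same LOGRECNO value."""
    merged = {}
    for table in tables:
        for record in table:
            merged.setdefault(record["LOGRECNO"], {}).update(record)
    return merged

# Invented records; TOTPOP and UNITS are hypothetical field names.
geo = [{"LOGRECNO": "0000001", "TOTPOP": 100}]
hu = [{"LOGRECNO": "0000001", "UNITS": 40}]

print(parse_file_name("ALBGPOP1"))
print(join_on_logrecno(geo, hu))
```

Keying the merge on LOGRECNO mirrors the statement that the logical record number links records across the five tables for a state.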
Alesco Phone ID Database - Phone Data with over 860 Million Phone Number...
datarade.ai
.csv, .xls, .txt
Updated Jul 5, 2018
+ more versions
Cite
Alesco Data (2018). Alesco Phone ID Database - Phone Data with over 860 Million Phone Number with Carrier Name, covers 94% of the US population - available for licensing! [Dataset]. https://datarade.ai/data-products/alesco-phone-id-database-the-industry-s-largest-and-most-ac-alesco-data
Explore at:
Available download formats
.csv, .xls, .txt
Dataset updated
Jul 5, 2018
Dataset authored and provided by
Alesco Data
Area covered
United States
Description
The Alesco Phone ID Database data ties together a consumer's true identity, and with linkage to the Alesco Power Identity Graph, we are perfectly positioned to help customers solve today's most challenging marketing, analytics, and identity resolution problems.

Our proprietary Phone ID database combines public and private sources and validates phone numbers against current and historical data 24 hours a day, 365 days a year.

With over 650 million unique phone numbers, device and service information, our one-of-a-kind solutions are now available for your marketing and identity resolution challenges in both B2C and B2B applications!

• Alesco Phone ID provides more than 860 million phone numbers monthly linked to a consumer or business name and includes landline, mobile phone number, VoIP, private and business phone numbers — all permissibly obtained and privacy-compliant and linked to other Alesco data sets

• How we do it: Alesco Phone ID is multi-sourced with daily information and delivered monthly or quarterly to clients. Our proprietary machine learning and advanced analytics processes ensure quality levels far above industry standards. Alesco processes over 100 million phone signals per day, compiling, normalizing, and standardizing phone information from 37 input sources.

• Accuracy: Each of Alesco’s phone data sources is vetted to ensure it is authoritative, giving you confidence in the accuracy of the information. Every record is validated, verified and processed to ensure the widest, most reliable coverage combined with stunning precision.

• Ease of use: Alesco’s Phone ID Database is available as an on-premise phone database license, giving you full control to host and access this powerful resource on-site. Ongoing monthly updates ensure your data is up to date.
Places
catalog.data.gov
Updated Jul 17, 2025
+ more versions
Cite
United States Census Bureau (USCB) (Point of Contact) (2025). Places [Dataset]. https://catalog.data.gov/dataset/places2
Explore at:
Dataset updated
Jul 17, 2025
Dataset provided by
United States Census Bureau (http://census.gov/)
Description
The Places dataset was published on August 31, 2022 from the United States Census Bureau (USCB) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). This resource is a member of a series. The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The TIGER/Line shapefiles include both incorporated places (legal entities) and census designated places or CDPs (statistical entities). An incorporated place is established to provide governmental functions for a concentration of people as opposed to a minor civil division (MCD), which generally is created to provide services or administer an area without regard, necessarily, to population. Places always nest within a state, but may extend across county and county subdivision boundaries. An incorporated place usually is a city, town, village, or borough, but can have other legal descriptions. CDPs are delineated for the decennial census as the statistical counterparts of incorporated places. CDPs are delineated to provide data for settled concentrations of population that are identifiable by name, but are not legally incorporated under the laws of the state in which they are located. The boundaries for CDPs often are defined in partnership with state, local, and/or tribal officials and usually coincide with visible features or the boundary of an adjacent incorporated place or another legal entity. 
CDP boundaries often change from one decennial census to the next with changes in the settlement pattern and development; a CDP with the same name as in an earlier census does not necessarily have the same boundary. The only population/housing size requirement for CDPs is that they must contain some housing and population. The boundaries of most incorporated places in this shapefile are as of January 1, 2022, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). The boundaries of all CDPs were delineated as part of the Census Bureau's Participant Statistical Areas Program (PSAP) for the 2020 Census, but some CDPs were added or updated through the 2022 BAS as well. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529072
United States COVID-19 Community Levels by County
data.cdc.gov
healthdata.gov
+1 more
csv, xlsx, xml
Updated Nov 2, 2023
+ more versions
Cite
CDC COVID-19 Response (2023). United States COVID-19 Community Levels by County [Dataset]. https://data.cdc.gov/Public-Health-Surveillance/United-States-COVID-19-Community-Levels-by-County/3nnm-4jni
Explore at:
Available download formats
csv, xlsx, xml
Dataset updated
Nov 2, 2023
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Authors
CDC COVID-19 Response
License
https://www.usa.gov/government-works
Area covered
United States
Description
Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.

This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.

The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.

Using these data, the COVID-19 community level was classified as low, medium, or high.
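The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration, not CDC's implementation: the numeric cutoffs (200 cases per 100,000; 10 and 20 admissions per 100,000; 10% and 15% bed occupancy) are not stated in this description and are assumptions based on CDC's published framework, so verify them before relying on the output. The function and parameter names are hypothetical.

```python
LEVELS = ("low", "medium", "high")


def _rank(value, medium_at, high_at):
    """Return 0, 1, or 2 for low, medium, or high given two cutoffs."""
    if value >= high_at:
        return 2
    if value >= medium_at:
        return 1
    return 0


def community_level(cases_per_100k, admissions_per_100k, bed_pct):
    """Classify a county as low, medium, or high.

    All cutoffs here are ASSUMED; they do not appear in the dataset description.
    """
    if cases_per_100k < 200:
        adm_rank = _rank(admissions_per_100k, 10.0, 20.0)
        bed_rank = _rank(bed_pct, 10.0, 15.0)
    else:
        # With a high case rate there is no "low" outcome (assumed rule):
        # values under 10 map to medium, everything else to high.
        adm_rank = 2 if admissions_per_100k >= 10.0 else 1
        bed_rank = 2 if bed_pct >= 10.0 else 1
    # The level is the higher of the admissions and inpatient-bed metrics.
    return LEVELS[max(adm_rank, bed_rank)]
```

For example, under these assumed cutoffs a county with 150 cases, 12 admissions, and 3% bed occupancy would be classified by the admissions metric, because it is the higher of the two.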

COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.

For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.

Archived Data Notes:

This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.

March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the previously released data points.

March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on the 2019 Census estimate.

March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.

March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.

March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).

March 31, 2022: Data values for columns “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were added starting the week of 3/3/2022, so the values were missing for records released the week before.

April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.

April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials to verify the data submitted, as other data systems are not providing alerts for substantial increases in disease transmission or severity in the state.

May 26, 2022: COVID-19 Community Level (CCL) data released for McCracken County, KY for the week of May 5, 2022 have been updated to correct a data processing error. McCracken County, KY should have appeared in the low community level category during the week of May 5, 2022. This correction is reflected in this update.

May 26, 2022: COVID-19 Community Level (CCL) data released for several Florida counties for the week of May 19, 2022 have been corrected for a data processing error. Of note, Broward, Miami-Dade, and Palm Beach Counties should have appeared in the high CCL category, and Osceola County should have appeared in the medium CCL category. These corrections are reflected in this update.

May 26, 2022: COVID-19 Community Level (CCL) data released for Orange County, New York for the week of May 26, 2022 displayed an erroneous case rate of zero and a CCL category of low due to a data source error. This county should have appeared in the medium CCL category.

June 2, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a data processing error. Tolland County, CT should have appeared in the medium community level category during the week of May 26, 2022. This correction is reflected in this update.

June 9, 2022: COVID-19 Community Level (CCL) data released for Tolland County, CT for the week of May 26, 2022 have been updated to correct a misspelling. The medium community level category for Tolland County, CT on the week of May 26, 2022 was misspelled as “meduim” in the data set. This correction is reflected in this update.

June 9, 2022: COVID-19 Community Level (CCL) data released for Mississippi counties for the week of June 9, 2022 should be interpreted with caution due to a reporting cadence change over the Memorial Day holiday that resulted in artificially inflated case rates in the state.

July 7, 2022: COVID-19 Community Level (CCL) data released for Rock County, Minnesota for the week of July 7, 2022 displayed an artificially low case rate and CCL category due to a data source error. This county should have appeared in the high CCL category.

July 14, 2022: COVID-19 Community Level (CCL) data released for Massachusetts counties for the week of July 14, 2022 should be interpreted with caution due to a reporting cadence change that resulted in lower than expected case rates and CCL categories in the state.

July 28, 2022: COVID-19 Community Level (CCL) data released for all Montana counties for the week of July 21, 2022 had case rates of 0 due to a reporting issue. The case rates have been corrected in this update.

July 28, 2022: COVID-19 Community Level (CCL) data released for Alaska for all weeks prior to July 21, 2022 included non-resident cases. The case rates for the time series have been corrected in this update.

July 28, 2022: A laboratory in Nevada reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate will be inflated in Clark County, NV for the week of July 28, 2022.

August 4, 2022: COVID-19 Community Level (CCL) data was updated on August 2, 2022 in error during performance testing. Data for the week of July 28, 2022 was changed during this update due to additional case and hospital data as a result of late reporting between July 28, 2022 and August 2, 2022. Since the purpose of this data set is to provide point-in-time views of COVID-19 Community Levels on Thursdays, any changes made to the data set during the August 2, 2022 update have been reverted in this update.

August 4, 2022: COVID-19 Community Level (CCL) data for the week of July 28, 2022 were missing case data for 8 counties in Utah (Beaver County, Daggett County, Duchesne County, Garfield County, Iron County, Kane County, Uintah County, and Washington County) due to data collection issues. CDC and its partners have resolved the issue and the correction is reflected in this update.

August 4, 2022: Due to a reporting cadence change, case rates for all Alabama counties will be lower than expected. As a result, the CCLs published on August 4, 2022 should be interpreted with caution.

August 11, 2022: COVID-19 Community Level (CCL) data for the week of August 4, 2022 for South Carolina have been updated to correct a data collection error that resulted in incorrect case data. CDC and its partners have resolved the issue and the correction is reflected in this update.

August 18, 2022: COVID-19 Community Level (CCL) data for the week of August 11, 2022 for Connecticut have been updated to correct a data ingestion error that inflated the CT case rates. CDC, in collaboration with CT, has resolved the issue and the correction is reflected in this update.

August 25, 2022: A laboratory in Tennessee reported a backlog of historic COVID-19 cases. As a result, the 7-day case count and rate may be inflated in many counties and the CCLs published on August 25, 2022 should be interpreted with caution.

August 25, 2022: Due to a data source error, the 7-day case rate for St. Louis County, Missouri, is reported as zero in the COVID-19 Community Level data released on August 25, 2022. Therefore, the COVID-19 Community Level for this county should be interpreted with caution.

September 1, 2022: Due to a reporting issue, case rates for all Nebraska counties will include 6 days of data instead of 7 days in the COVID-19 Community Level (CCL) data released on September 1, 2022. Therefore, the CCLs for all Nebraska counties should be interpreted with caution.

September 8, 2022: Due to a data processing error, the case rate for Philadelphia County, Pennsylvania,
palsynet-data
huggingface.co
Updated Jul 28, 2024
Cite
Jasir (2024). palsynet-data [Dataset]. https://huggingface.co/datasets/jasir/palsynet-data
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 28, 2024
Authors
Jasir
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset Card for Dataset Name

A dataset of face images of people affected by Bell's palsy (facial palsy).

Dataset Details

Dataset Description

A dataset of face images of people affected by Bell's palsy (facial palsy), created by curating and editing publicly available YouTube videos. Images of people not affected by the condition, gathered with the same method, are also included.

License: CC-BY-4.0

Uses

Can be used to train image models to detect… See the full description on the dataset page: https://huggingface.co/datasets/jasir/palsynet-data.
Popular Last Names for People of Two Or More Races in the US
johnsnowlabs.com
csv
Updated Jan 20, 2021
Cite
John Snow Labs (2021). Popular Last Names for People of Two Or More Races in the US [Dataset]. https://www.johnsnowlabs.com/marketplace/popular-last-names-for-people-of-two-or-more-races-in-the-us/
Explore at:
Available download formats: csv
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Area covered
United States
Description
This dataset lists popular last names in the United States among people of two or more races.
LinkedIn Dataset - US People Profiles
kaggle.com
Updated May 16, 2023
Cite
Joseph from Proxycurl (2023). LinkedIn Dataset - US People Profiles [Dataset]. https://www.kaggle.com/datasets/proxycurl/10000-us-people-profiles/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 16, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Joseph from Proxycurl
Description
Full profiles of 10,000 people in the US (download here, data schema here), with more than 40 data points, including:
- Full Name
- Education
- Location
- Work Experience History
and many more.

An additional 258+ million US people profiles are available; visit the LinkDB product page here.

Our LinkDB database is an exhaustive database of publicly accessible LinkedIn people and company profiles. It contains close to 500 million people and company profiles globally.

Cite

Data.gov (2019). USA Name Data [Dataset]. https://www.kaggle.com/datasets/datagov/usa-names

USA Name Data

USA Name Data (BigQuery Dataset)

Banner Photo by @dcp from Unsplash.

Inspiration

What are the most common names?

What are the most common female names?

Are there more female or male names?

Female names by a wide margin?
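As a starting point for these questions, the sketch below totals name counts and compares the number of distinct names per gender. It assumes rows shaped as (name, gender, number), which mirrors how the BigQuery table is commonly laid out but is not confirmed by this description. The sample rows are made up for illustration and are not real Social Security Administration counts, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows (name, gender, number); NOT real SSA counts.
SAMPLE_ROWS = [
    ("Mary", "F", 120),
    ("James", "M", 110),
    ("Mary", "F", 80),
    ("Linda", "F", 90),
    ("John", "M", 100),
    ("James", "M", 30),
]


def most_common_names(rows, gender=None, top=3):
    """Sum counts per name, optionally restricted to one gender."""
    totals = Counter()
    for name, row_gender, number in rows:
        if gender is None or row_gender == gender:
            totals[name] += number
    return totals.most_common(top)


def distinct_names_by_gender(rows):
    """Count how many different names appear for each gender."""
    names = {}
    for name, row_gender, _ in rows:
        names.setdefault(row_gender, set()).add(name)
    return {g: len(found) for g, found in names.items()}


print(most_common_names(SAMPLE_ROWS, top=2))
print(most_common_names(SAMPLE_ROWS, gender="F", top=1))
print(distinct_names_by_gender(SAMPLE_ROWS))
```

On the real data, the same aggregation would typically be run as a grouped query rather than in memory, since the full table is far larger than this sample.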
