76 datasets found

US Race and Ethnicity Codes
johnsnowlabs.com
csv
Updated Jan 20, 2021
Cite
John Snow Labs (2021). US Race and Ethnicity Codes [Dataset]. https://www.johnsnowlabs.com/marketplace/us-race-and-ethnicity-codes/
Explore at:
Available download formats: csv
Dataset updated
Jan 20, 2021
Dataset authored and provided by
John Snow Labs
Area covered
N/A, United States
Description
This dataset contains race/ethnicity codes. It is used when entering patient demographic information.
Ethnicity coding
zenodo.org
Updated Mar 18, 2025
Cite
Paola Galdi; Luna De Ferrari (2025). Ethnicity coding [Dataset]. http://doi.org/10.5281/zenodo.15044385
Explore at:
Unique identifier
https://doi.org/10.5281/zenodo.15044385
Dataset updated
Mar 18, 2025
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Paola Galdi; Luna De Ferrari
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This Zenodo entry details the methodology for extracting and reconciling ethnicity data from the Clinical Practice Research Datalink (CPRD), incorporating both General Practitioner (GP) and Hospital Episode Statistics (HES) sources. The approach aims to resolve discrepancies between these sources and provide a standardized single ethnicity value per patient, categorized into 6 and 12 levels according to NHS coding guidelines.

Materials and Methods:

Ethnicity data from the CPRD are recorded in multiple formats. This study harmonizes these data to achieve consistent ethnicity classification across patient records, following a hierarchical reconciliation protocol that prioritizes hospital data over GP records.

Ethnicity Levels: Ethnicity data are processed to conform to two levels of granularity:

Six high-level categories: White, Black, Asian, Mixed, Other, Unknown

Twelve detailed categories: Bangladeshi, Black African, Black Caribbean, Black Other, Chinese, Indian, Mixed, Other Asian, Other, Pakistani, Unknown, White

Source Data Mapping:

CPRD Medcodes: Directly mapped to 490 SNOMED codes

SNOMED to NHS Codes: SNOMED codes are linked to 26 NHS ethnicity codes

NHS to HES Codes: These NHS codes further map into 12 HES hospital ethnicities, which then consolidate into the 6 broad categories mentioned above

Algorithm (AIM-CISC):

Hospital Data Priority: Ethnicity records from hospital sources override those from GP records unless the hospital data is classified as "Unknown", null, or empty.

Conflict Resolution Within GP Data:

The frequency of recorded ethnicities determines the selection. The most frequently recorded ethnicity prevails.

If frequencies are tied, the most recent record is used.

In cases where records are equally recent, the first alphabetically listed ethnicity is selected.

Unique Patient Identifiers: Each patient is represented once in hospital data, ensuring a single source of truth for hospital-based ethnicities. This simplifies reconciliation with GP data when discrepancies arise.
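
The reconciliation rules above can be sketched in Python. This is a minimal illustration rather than the authors' implementation: the `GpRecord` layout, the function name `reconcile_ethnicity`, and the fallback to "Unknown" when a patient has no GP records are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class GpRecord:
    """Hypothetical GP record: an ethnicity label and the date it was recorded."""
    ethnicity: str
    recorded_on: date


def reconcile_ethnicity(hes_value: Optional[str], gp_records: list[GpRecord]) -> str:
    """Return one ethnicity per patient using the priority rules described above."""
    # Hospital (HES) data overrides GP data unless it is "Unknown", null, or empty.
    if hes_value is not None and hes_value.strip() not in ("", "Unknown"):
        return hes_value.strip()

    # Assumption: with no GP records either, the patient is reported as "Unknown".
    if not gp_records:
        return "Unknown"

    # Count how often each ethnicity was recorded and find its latest record date.
    counts = Counter(r.ethnicity for r in gp_records)
    latest: dict[str, date] = {}
    for r in gp_records:
        if r.ethnicity not in latest or r.recorded_on > latest[r.ethnicity]:
            latest[r.ethnicity] = r.recorded_on

    # Most frequent first; ties broken by most recent record, then alphabetically.
    return min(counts, key=lambda e: (-counts[e], -latest[e].toordinal(), e))
```

Encoding the three GP tie-breakers as one sort key keeps the order of precedence visible in a single line; a step-by-step filter would work equally well.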

Source Documentation and References:

Reference for Code Lists: Digital ethnicity data in population-wide electronic health records in England: a description of completeness, coverage, and granularity of diversity (Pineda-Moncusí et al., 2022): https://doi.org/10.1101/2022.11.11.22282217

GitHub Repository for Code Lists: https://github.com/BHFDSC/CCU037_01/blob/main/england/phenotypes/snomed_meaning_and_map_to_primary_code.csv

NHS Ethnicity Codes Documentation: https://www.datadictionary.nhs.uk/attributes/ethnic_category_code_2001.html

Notes on mapping:

Instances were noted where multiple Medcodes map back to a single SNOMED code, highlighting the importance of careful data cross-referencing. For example, two different Medcodes represent the New Zealand European ethnicity, which both map back to the identical SNOMED code.
Race and Ethnicity - ACS 2018-2022 - Tempe Zip Code
catalog.data.gov
data-academy.tempe.gov
Updated May 10, 2025
Cite
City of Tempe (2025). Race and Ethnicity - ACS 2018-2022 - Tempe Zip Code [Dataset]. https://catalog.data.gov/dataset/race-and-ethnicity-acs-2018-2022-tempe-zip-code
Explore at:
Dataset updated
May 10, 2025
Dataset provided by
City of Tempe
Area covered
Tempe
Description
This layer shows the population broken down by race and Hispanic origin. Data is from US Census American Community Survey (ACS) 5-year estimates. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online). A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).

Vintage: 2018-2022
ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
Data downloaded from: Census Bureau's API for American Community Survey
Data Preparation: Data table was downloaded and joined with Zip Code boundaries in the City of Tempe.
Date of Census update: December 15, 2023
National Figures: data.census.gov
Mapping detailed SNOMED ethnicity codes to harmonised Census 2021 ethnic...
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Nov 6, 2023
Cite
Office for National Statistics (2023). Mapping detailed SNOMED ethnicity codes to harmonised Census 2021 ethnic categories, England [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthinequalities/datasets/mappingdetailedsnomedethnicitycodestoharmonisedcensus2021ethniccategoriesengland
Explore at:
Available download formats: xlsx
Dataset updated
Nov 6, 2023
Dataset provided by
Office for National Statistics: http://www.ons.gov.uk/
License
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Area covered
England
Description
Comparing NHS England SNOMED code mapping with how individuals self-identified their ethnicity in Census 2021.
Code lists for ethnicity and the conditions considered in our study.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Nov 3, 2023
Cite
Barrett, Jessica K.; Yau, Christopher; Griffin, Simon; Marshall, Tom; Nirantharakumar, Krish; Crowe, Francesca; Saunders, Catherine L.; Chen, Sida; Cooper, Jennifer; Kirk, Paul; Edwards, Duncan; Richardson, Sylvia; Jackson, Christopher (2023). Code lists for ethnicity and the conditions considered in our study. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000940558
Explore at:
Dataset updated
Nov 3, 2023
Authors
Barrett, Jessica K.; Yau, Christopher; Griffin, Simon; Marshall, Tom; Nirantharakumar, Krish; Crowe, Francesca; Saunders, Catherine L.; Chen, Sida; Cooper, Jennifer; Kirk, Paul; Edwards, Duncan; Richardson, Sylvia; Jackson, Christopher
Description
Code lists for ethnicity and the conditions considered in our study.
Race/Ethnicity of Newly Medi-Cal Eligible Individuals
data.chhs.ca.gov
healthdata.gov
csv, zip
Updated Nov 7, 2025
Cite
Department of Health Care Services (2025). Race/Ethnicity of Newly Medi-Cal Eligible Individuals [Dataset]. https://data.chhs.ca.gov/dataset/race-ethnicity-of-newly-medi-cal-eligible-individuals
Explore at:
Available download formats: csv (27548), zip
Dataset updated
Nov 7, 2025
Dataset provided by
California Department of Health Care Services: http://www.dhcs.ca.gov/
Authors
Department of Health Care Services
Description
This dataset includes race/ethnicity of newly Medi-Cal eligible individuals who identified their race/ethnicity as Hispanic, White, Other Asian or Pacific Islander, Black, Chinese, Filipino, Vietnamese, Asian Indian, Korean, Alaskan Native or American Indian, Japanese, Cambodian, Samoan, Laotian, Hawaiian, Guamanian, Amerasian, or Other, by reporting period. The race/ethnicity data is from the Medi-Cal Eligibility Data System (MEDS) and includes eligible individuals without prior Medi-Cal Eligibility. This dataset is part of the public reporting requirements set forth in California Welfare and Institutions Code 14102.5.
Race and Ethnicity - ACS 2016-2020 - Tempe Zip Codes
data.tempe.gov
data-academy.tempe.gov
Updated May 2, 2022
Cite
City of Tempe (2022). Race and Ethnicity - ACS 2016-2020 - Tempe Zip Codes [Dataset]. https://data.tempe.gov/datasets/tempegov::race-and-ethnicity-acs-2016-2020-tempe-zip-codes
Explore at:
Dataset updated
May 2, 2022
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
This layer shows the population broken down by race and Hispanic origin. Data is from US Census American Community Survey (ACS) 5-year estimates. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online). A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).

Vintage: 2016-2020
ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
Data downloaded from: Census Bureau's API for American Community Survey
Data Preparation: Data table was downloaded and joined with Zip Code boundaries in the City of Tempe.
Date of Census update: March 17, 2022
National Figures: data.census.gov
RACE ETHNICITY Percent Persons by Race COS 2000
catalog.data.gov
datasets.ai
Updated Dec 2, 2020
Cite
U.S. Department of Commerce, Bureau of the Census, Geography Division (Point of Contact) (2020). RACE ETHNICITY Percent Persons by Race COS 2000 [Dataset]. https://catalog.data.gov/dataset/race-ethnicity-percent-persons-by-race-cos-2000
Explore at:
Dataset updated
Dec 2, 2020
Dataset provided by
U.S. Department of Commerce, Bureau of the Census, Geography Division (Point of Contact)
Description
TIGER, TIGER/Line, and Census TIGER are registered trademarks of the Bureau of the Census. The Redistricting Census 2000 TIGER/Line files are an extract of selected geographic and cartographic information from the Census TIGER data base. The geographic coverage for a single TIGER/Line file is a county or statistical equivalent entity, with the coverage area based on January 1, 2000 legal boundaries. A complete set of Redistricting Census 2000 TIGER/Line files includes all counties and statistically equivalent entities in the United States and Puerto Rico. The Redistricting Census 2000 TIGER/Line files will not include files for the Island Areas. The Census TIGER data base represents a seamless national file with no overlaps or gaps between parts. However, each county-based TIGER/Line file is designed to stand alone as an independent data set or the files can be combined to cover the whole Nation. The Redistricting Census 2000 TIGER/Line files consist of line segments representing physical features and governmental and statistical boundaries. The Redistricting Census 2000 TIGER/Line files do NOT contain the ZIP Code Tabulation Areas (ZCTAs) and the address ranges are of approximately the same vintage as those appearing in the 1999 TIGER/Line files. That is, the Census Bureau is producing the Redistricting Census 2000 TIGER/Line files in advance of the computer processing that will ensure that the address ranges in the TIGER/Line files agree with the final Master Address File (MAF) used for tabulating Census 2000. The files contain information distributed over a series of record types for the spatial objects of a county. There are 17 record types, including the basic data record, the shape coordinate points, and geographic codes that can be used with appropriate software to prepare maps. 
Other geographic information contained in the files includes attributes such as feature identifiers/census feature class codes (CFCC) used to differentiate feature types, address ranges and ZIP Codes, codes for legal and statistical entities, latitude/longitude coordinates of linear and point features, landmark point features, area landmarks, key geographic features, and area boundaries. The Redistricting Census 2000 TIGER/Line data dictionary contains a complete list of all the fields in the 17 record types.
Ethnic codes as defined by the Health Management Information System.
plos.figshare.com
xls
Updated May 31, 2023
Cite
Amrit Banstola; Ashik Banstola (2023). Ethnic codes as defined by the Health Management Information System. [Dataset]. http://doi.org/10.1371/journal.pone.0071311.t001
Explore at:
Available download formats: xls
Unique identifier
https://doi.org/10.1371/journal.pone.0071311.t001
Dataset updated
May 31, 2023
Dataset provided by
PLOS: http://plos.org/
Authors
Amrit Banstola; Ashik Banstola
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Ethnic codes as defined by the Health Management Information System.
Demographics: Population, Race, Gender Data County
kaggle.com
zip
Updated Jan 14, 2025
Cite
Ahmed Mohamed (2025). Demographics: Population, Race, Gender Data County [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/county-level-demographic-population-race-gender
Explore at:
Available download formats: zip (93210 bytes)
Dataset updated
Jan 14, 2025
Authors
Ahmed Mohamed
Description

County-Level Demographic: Population, Race, Gender

Overview

This dataset provides a detailed breakdown of demographic information for counties across the United States, derived from the U.S. Census Bureau's 2023 American Community Survey (ACS). The data includes population counts by gender, race, and ethnicity, alongside unique identifiers for each county using State and County FIPS codes.

Dataset Features

The dataset includes the following columns:
- County: Name of the county.
- State: Name of the state the county belongs to.
- State FIPS Code: Federal Information Processing Standard (FIPS) code for the state.
- County FIPS Code: FIPS code for the county.
- FIPS: Combined State and County FIPS codes, a unique identifier for each county.
- Total Population: Total population in the county.
- Male Population: Number of males in the county.
- Female Population: Number of females in the county.
- Total Race Responses: Total race-related responses recorded in the survey.
- White Alone: Number of individuals identifying as White alone.
- Black or African American Alone: Number of individuals identifying as Black or African American alone.
- Hispanic or Latino: Number of individuals identifying as Hispanic or Latino.

Processing Methodology

Source:

Data was retrieved using the U.S. Census Bureau ACS API.

County-Level Aggregation:

Each county is uniquely identified using State FIPS Code and County FIPS Code.

These codes were concatenated to form the unified FIPS column.

Data Cleaning:

All numeric columns were converted to appropriate data types.

County and state names were extracted from the raw NAME field for clarity.
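
The aggregation and cleaning steps above can be sketched as follows. This is an illustrative sketch, not the dataset author's code: the helper names `combined_fips` and `split_name` are hypothetical, and the example assumes the raw NAME field has the form "County, State".

```python
def combined_fips(state_fips, county_fips) -> str:
    """Build the 5-character county identifier from state and county FIPS codes."""
    # State codes are 2 digits and county codes are 3 digits. Zero-pad both so
    # that codes read as integers (for example 1 and 1) still yield "01001".
    return f"{int(state_fips):02d}{int(county_fips):03d}"


def split_name(name: str) -> tuple[str, str]:
    """Split a raw NAME value (assumed 'County, State') into (county, state)."""
    county, state = (part.strip() for part in name.rsplit(",", 1))
    return county, state


# Example with Autauga County, Alabama (state FIPS 01, county FIPS 001).
print(combined_fips("01", "001"))
print(split_name("Autauga County, Alabama"))
```

Keeping the combined FIPS value as a string matters: stored as an integer it loses the leading zero, which breaks joins against boundary files keyed on the 5-character code.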

Why Use This Dataset?

This dataset is highly versatile and suitable for:
- Demographic Analysis: Analyze population distribution by gender, race, and ethnicity.
- Geographic Studies: Use FIPS codes to map counties geographically.
- Data Visualizations: Create visual insights into demographic trends across counties.

File Format

The dataset is available as a CSV file with 3,000+ rows (one for each county).

Licensing

Source: Data is sourced from the U.S. Census Bureau's 2023 American Community Survey (ACS).

License: This dataset is in the public domain and provided under the U.S. Census Bureau’s terms of use. Attribution to the Census Bureau is appreciated.

Acknowledgments

Special thanks to the U.S. Census Bureau for making this data publicly available and to the Kaggle community for fostering a collaborative space for data analysis and exploration.

👨‍👩‍👧 US Country Demographics

kaggle.com

zip

Updated Aug 14, 2023

Cite

mexwell (2023). 👨‍👩‍👧 US Country Demographics [Dataset]. https://www.kaggle.com/datasets/mexwell/us-country-demographics

Explore at:

Available download formats: zip (343499 bytes)

Dataset updated

Aug 14, 2023

Authors

mexwell

License

http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

Area covered

United States

Description

This data set contains information about counties in the United States from 2010 through 2019, obtained from the United States Census Bureau. It covers age distributions, education levels, employment statistics, ethnicity percentages, household information, income, and other miscellaneous statistics. (Values are denoted as -1 if the data is not available.)
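
Because unavailable values are stored as -1, they need to be treated as missing before any averaging. The sketch below shows one way to do that with the standard library. The inline sample rows and the helper names `parse_value` and `column_mean` are made up for illustration; only the -1 convention and the column name come from the dataset description.

```python
import csv
import io
from typing import Optional

MISSING = -1  # sentinel the description uses for unavailable data


def parse_value(raw: str) -> Optional[float]:
    """Convert a raw cell to float, mapping the -1 sentinel to None."""
    value = float(raw)
    return None if value == MISSING else value


def column_mean(rows: list[dict], column: str) -> Optional[float]:
    """Average a column while skipping missing values."""
    values = [v for v in (parse_value(r[column]) for r in rows) if v is not None]
    return sum(values) / len(values) if values else None


# Illustrative rows only; these are not real values from the dataset.
sample = (
    "County,State,Age.Percent 65 and Older\n"
    "County A,XX,20.0\n"
    "County B,XX,-1\n"
    "County C,XX,10.0\n"
)
rows = list(csv.DictReader(io.StringIO(sample)))
print(column_mean(rows, "Age.Percent 65 and Older"))
```

Averaging the raw column instead would fold the -1 into the result and understate the mean.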

Data Dictionary

<...

Key	List of...	Comment	Example Value
County	String	County name	`"Abbeville County"`
State	String	State name	`"SC"`
Age.Percent 65 and Older	Float	Estimated percentage of population whose ages are equal or greater than 65 years old are produced for the United States states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico).	`22.4`
Age.Percent Under 18 Years	Float	Estimated percentage of population whose ages are under 18 years old are produced for the United States states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico).	`19.8`
Age.Percent Under 5 Years	Float	Estimated percentage of population whose ages are under 5 years old are produced for the United States states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico).	`4.7`
Education.Bachelor's Degree or Higher	Float	Percentage for the people who attended college but did not receive a degree and people who received an associate's bachelor's master's or professional or doctorate degree. These data include only persons 25 years old and over. The percentages are obtained by dividing the counts of graduates by the total number of persons 25 years old and over. The data were collected from 2015 to 2019.	`15.6`
Education.High School or Higher	Float	Percentage of people whose highest degree was a high school diploma or its equivalent people who attended college but did not receive a degree and people who received an associate's bachelor's master's or professional or doctorate degree. These data include only persons 25 years old and over. The percentages are obtained by dividing the counts of graduates by the total number of persons 25 years old and over. The data were collected from 2015 to 2019.	`81.7`
Employment.Nonemployer Establishments	Integer	An establishment is a single physical location at which business is conducted or where services or industrial operations are performed. It is not necessarily identical with a company or enterprise which may consist of one establishment or more. The data was collected from 2018.	`1416`
Ethnicities.American Indian and Alaska Native Alone	Float	Estimated percentage of population having origins in any of the original peoples of North and South America (including Central America) and who maintains tribal affiliation or community attachment. This category includes people who indicate their race as "American Indian or Alaska Native" or report entries such as Navajo Blackfeet Inupiat Yup'ik or Central American Indian groups or South American Indian groups.	`0.3`
Ethnicities.Asian Alone	Float	Estimated percentage of population having origins in any of the original peoples of the Far East Southeast Asia or the Indian subcontinent including for example Cambodia China India Japan Korea Malaysia Pakistan the Philippine Islands Thailand and Vietnam. This includes people who reported detailed Asian responses such as: "Asian Indian " "Chinese " "Filipino " "Korean " "Japanese " "Vietnamese " and "Other Asian" or provide other detailed Asian responses.	`0.4`
Ethnicities.Black Alone	Float	Estimated percentage of population having origins in any of the Black racial groups of Africa. It includes people who indicate their race as "Black or African American " or report entries such as African American Kenyan Nigerian or Haitian.	`27.6`
Ethnicities.Hispanic or Latino	Float

FiveThirtyEight Most Common Name Dataset
kaggle.com
zip
Updated Apr 26, 2019
Cite
FiveThirtyEight (2019). FiveThirtyEight Most Common Name Dataset [Dataset]. https://www.kaggle.com/datasets/fivethirtyeight/fivethirtyeight-most-common-name-dataset/discussion
Explore at:
Available download formats: zip (2485070 bytes)
Dataset updated
Apr 26, 2019
Dataset authored and provided by
FiveThirtyEight: https://abcnews.go.com/538
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Content

Most Common Name

This directory contains the code and data behind the story Dear Mona, What’s The Most Common Name In America?

The main script file is most-common-name.R

There are four input files:

state-pop.csv - Total population and Hispanic population by state.

surnames.csv - Data on surnames from the U.S. Census Bureau, including a breakdown by race/ethnicity.

aging-curve.csv - Data from the Social Security Administration on the chances that someone born in the decade shown was still alive in 2013: http://www.ssa.gov/oact/NOTES/as120/LifeTables_Tbl_7.html

adjustments.csv - Taken directly from Lee Hartman's article: http://mypage.siu.edu/lhartman/johnsmith.html.

And five output files:

adjusted-name-combinations-list.csv - Adjusted estimates for the most common full names.

adjusted-name-combinations-matrix.csv - The same data from the file adjusted-name-combinations-list.csv but in matrix form. These are the estimates presented in the second (and final) table of the article.

independent-name-combinations-by-pop.csv - Matrix of estimates for the top 100 most common first names by top 100 most common surnames. These were calculated using independent odds, and displayed in the first table presented in the article.

new-top-firstNames.csv - Final estimated ranking of top first names.

new-top-surnames.csv - Final estimated ranking of top surnames.

Context

This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!

Update Frequency: This dataset is updated daily.

Acknowledgements

This dataset is maintained using GitHub's API and Kaggle's API.

This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.
Race and Hispanic Origin - ACS 2019-2023 - Tempe Zip Codes
catalog.data.gov
performance.tempe.gov
Updated Aug 23, 2025
Cite
City of Tempe (2025). Race and Hispanic Origin - ACS 2019-2023 - Tempe Zip Codes [Dataset]. https://catalog.data.gov/dataset/race-and-hispanic-origin-acs-2019-2023-tempe-zip-codes
Explore at:
Dataset updated
Aug 23, 2025
Dataset provided by
City of Tempe
Area covered
Tempe
Description
This layer shows the population broken down by race and Hispanic origin. Data is from US Census American Community Survey (ACS) 5-year estimates. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online). A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).

Vintage: 2019-2023
ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
Data downloaded from: Census Bureau's API for American Community Survey
Data Preparation: Data table was downloaded and joined with Zip Code boundaries in the City of Tempe.
Date of Census update: December 12, 2024
National Figures: data.census.gov
Most Common Name in America
kaggle.com
zip
Updated Apr 21, 2021
Cite
Bojan Tunguz (2021). Most Common Name in America [Dataset]. https://www.kaggle.com/tunguz/most-common-name-in-america
Explore at:
Available download formats: zip (107213 bytes)
Dataset updated
Apr 21, 2021
Authors
Bojan Tunguz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Most Common Name

This directory contains the code and data behind the story Dear Mona, What’s The Most Common Name In America?

The main script file is most-common-name.R

There are four input files:

state-pop.csv - Total population and Hispanic population by state.

surnames.csv - Data on surnames from the U.S. Census Bureau, including a breakdown by race/ethnicity.

aging-curve.csv - Data from the Social Security Administration on the chances that someone born in the decade shown was still alive in 2013: http://www.ssa.gov/oact/NOTES/as120/LifeTables_Tbl_7.html

adjustments.csv - Taken directly from Lee Hartman's article: http://mypage.siu.edu/lhartman/johnsmith.html.

And five output files:

adjusted-name-combinations-list.csv - Adjusted estimates for the most common full names.

adjusted-name-combinations-matrix.csv - The same data from the file adjusted-name-combinations-list.csv but in matrix form. These are the estimates presented in the second (and final) table of the article.

independent-name-combinations-by-pop.csv - Matrix of estimates for the top 100 most common first names by top 100 most common surnames. These were calculated using independent odds, and displayed in the first table presented in the article.

new-top-firstNames.csv - Final estimated ranking of top first names.

new-top-surnames.csv - Final estimated ranking of top surnames.
American Names by Multi-Ethnic/National Origin
kaggle.com
zip
Updated Aug 22, 2023
Cite
Louis Teitelbaum (2023). American Names by Multi-Ethnic/National Origin [Dataset]. https://www.kaggle.com/datasets/louisteitelbaum/american-names-by-multi-ethnic-national-origin
Explore at:
Available download formats: zip (778154 bytes)
Dataset updated
Aug 22, 2023
Authors
Louis Teitelbaum
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Area covered
United States
Description
This dataset includes all personal names listed in the Wikipedia category “American people by ethnic or national origin” and all subcategories fitting the pattern “American People of [ ] descent”, in total more than 25,000 individuals. Each individual is represented by a row, with columns indicating binary membership (0/1) in each ethnic/national category.

Ethnicity inference is an essential tool for identifying disparities in public health and social sciences. Existing datasets linking personal names to ethnic or national origin often neglect to recognize multi-ethnic or multi-national identities. Furthermore, existing datasets use coarse classification schemes (e.g. classifying both Indian and Japanese people as “Asian”) that may not be suitable for many research questions. This dataset remedies these problems by including both very fine-grained ethnic/national categories (e.g. Afghan-Jewish) and broader ones (e.g. European). Users can choose the categories that are relevant to their research. Since many Americans on Wikipedia are associated with multiple overlapping or distinct ethnicities/nationalities, these multi-ethnic associations are also reflected in the data.

Data were obtained from the Wikipedia API and reviewed manually to remove stage names, pen names, mononyms, first initials (when full names are available on Wikipedia), nicknames, honorific titles, and pages that correspond to a group or event rather than an individual.

This dataset was designed for use in training classification algorithms, but may also be independently interesting inasmuch as it is a representative sample of Americans who are famous enough to have their own Wikipedia page, along with detailed information on their ethnic/national origins.

DISCLAIMER: Due to the incomplete nature of Wikipedia, data may not properly reflect all ethnic national associations for any given individual. For example, there is no guarantee that a given Cuban Jewish person will be listed in both the “American People of Cuban descent” and the “American People of Jewish descent” categories.
2023 American Community Survey: B08105G | Means of Transportation to Work...
data.census.gov
Cite
ACS, 2023 American Community Survey: B08105G | Means of Transportation to Work (Two or More Races) (ACS 5-Year Estimates Detailed Tables) [Dataset]. https://data.census.gov/table/ACSDT5Y2023.B08105G?q=Otsego+County,+New+York+Employment&t=Race+and+Ethnicity
Explore at:
Dataset provided by
United States Census Bureau: http://census.gov/
Authors
ACS
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2023
Description
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units and the group quarters population for states and counties.

Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation including code lists, subject definitions, data accuracy, and statistical testing, and a full list of ACS tables and table shells (without estimates) can be found on the Technical Documentation section of the ACS website. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2019-2023 American Community Survey 5-Year Estimates.

ACS data generally reflect the geographic boundaries of legal and statistical areas as of January 1 of the estimate year. For more information, see Geography Boundaries by Year.

Users must consider potential differences in geographic boundaries, questionnaire content or coding, or other methodological issues when comparing ACS data from different years. Statistically significant differences shown in ACS Comparison Profiles, or in data users' own analysis, may be the result of these differences and thus might not necessarily reflect changes to the social, economic, housing, or demographic characteristics being compared. For more information, see Comparing ACS Data.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

Workers include members of the Armed Forces and civilians who were at work last week.

The Hispanic origin and race codes were updated in 2020. For more information on the Hispanic origin and race code changes, please visit the American Community Survey Technical Documentation website.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:

-: The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
N: The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
(X): The estimate or margin of error is not applicable or not available.
median-: The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
median+: The median falls in the highest interval of an open-ended distribution (for example "250,000+").
**: The margin of error could not be computed because there were an insufficient number of sample observations.
***: The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
*****: A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
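The margin-of-error interpretation in this description can be illustrated with a minimal Python sketch. The function names and the sample numbers are hypothetical and do not come from any ACS table. The factor 1.645 is the z-score that corresponds to a 90 percent confidence level, and 1.960 is the one for 95 percent.

```python
# z-scores for the confidence levels used below
Z_90 = 1.645  # published ACS margins of error are at the 90 percent level
Z_95 = 1.960


def confidence_bounds(estimate, moe):
    """Return (lower, upper) bounds: estimate minus and plus the margin of error."""
    return estimate - moe, estimate + moe


def standard_error(moe, z=Z_90):
    """Recover the standard error from a margin of error at the given z-score."""
    return moe / z


def rescale_moe(moe_90, new_z=Z_95):
    """Convert a 90 percent margin of error to another confidence level."""
    return moe_90 * new_z / Z_90


# Hypothetical cell: an estimate of 1,200 with a margin of error of 150.
lower, upper = confidence_bounds(1200, 150)
print(lower, upper)
```

The bounds cover sampling variability only. As the description notes, nonsampling error is not reflected in the published margin of error.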
2022 American Community Survey: B01001B | Sex by Age (Black or African...
data.census.gov
Cite
ACS, 2022 American Community Survey: B01001B | Sex by Age (Black or African American Alone) (ACS 1-Year Estimates Detailed Tables) [Dataset]. https://data.census.gov/table/ACSDT1Y2022.B01001B?q=Race%20and%20Ethnicity
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
ACS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
2022
Area covered
United States
Description
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, the decennial census is the official source of population totals for April 1st of each decennial year. In between censuses, the Census Bureau's Population Estimates Program produces and disseminates the official estimates of the population for the nation, states, counties, cities, and towns and estimates of housing units for states and counties.

Information about the American Community Survey (ACS) can be found on the ACS website. Supporting documentation including code lists, subject definitions, data accuracy, and statistical testing, and a full list of ACS tables and table shells (without estimates) can be found on the Technical Documentation section of the ACS website. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2022 American Community Survey 1-Year Estimates.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

The Hispanic origin and race codes were updated in 2020. For more information on the Hispanic origin and race code changes, please visit the American Community Survey Technical Documentation website.

The 2022 American Community Survey (ACS) data generally reflect the March 2020 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on 2020 Census data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:

-: The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution. For a 5-year median estimate, the margin of error associated with a median was larger than the median itself.
N: The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
(X): The estimate or margin of error is not applicable or not available.
median-: The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
median+: The median falls in the highest interval of an open-ended distribution (for example "250,000+").
**: The margin of error could not be computed because there were an insufficient number of sample observations.
***: The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
*****: A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
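When a table like this one is loaded as text, the symbols explained above have to be separated from numeric values before any arithmetic. The sketch below shows one possible way to do that in Python. It assumes cells arrive as strings exactly as displayed (for example "1,234", "(X)" or "2,500-"); the function name `parse_cell` and the returned flag labels are hypothetical and are not part of any Census Bureau tool.

```python
# Symbols listed under "Explanation of Symbols" that replace a numeric value.
SYMBOLS = {
    "-": "estimate could not be computed",
    "N": "too few sample cases to display",
    "(X)": "not applicable or not available",
    "**": "margin of error could not be computed (too few observations)",
    "***": "margin of error could not be computed (open-ended interval)",
    "*****": "estimate is controlled; margin of error may be treated as zero",
}


def parse_cell(raw):
    """Split a displayed cell into (numeric value or None, flag or None)."""
    text = raw.strip()
    if text in SYMBOLS:
        return None, text
    flag = None
    # A trailing "-" or "+" marks a median in an open-ended interval.
    if text.endswith("-") or text.endswith("+"):
        flag = "median" + text[-1]
        text = text[:-1]
    return float(text.replace(",", "")), flag


for cell in ["1,234", "2,500-", "250,000+", "(X)", "*****"]:
    print(cell, parse_cell(cell))
```

Keeping the flag next to the value lets later steps decide how to treat each case, for instance using a zero margin of error for "*****" cells as the description suggests.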
2018 American Community Survey: EEOALL1RC | EEO 1RC. DETAILED CENSUS...
data.census.gov
Cite
ACS, 2018 American Community Survey: EEOALL1RC | EEO 1RC. DETAILED CENSUS OCCUPATION BY SEX AND RACE/ETHNICITY FOR RESIDENCE GEOGRAPHY - COLLAPSED, PERCENTAGES ONLY (ACS 5-Year Estimates Equal Employment Opportunity) [Dataset]. https://data.census.gov/table/ACSEEO5Y2018.EEOALL1RC?q=A%20A%20A%20DISTRIBUTORS
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
ACS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
2018
Description
The EEO Tabulation is sponsored by four Federal agencies consisting of the Equal Employment Opportunity Commission (EEOC), the Employment Litigation Section of the Civil Rights Division at the Department of Justice (DOJ), the Office of Federal Contract Compliance Programs (OFCCP), and the Office of Personnel Management (OPM), and developed in conjunction with the U.S. Census Bureau.

Supporting documentation on code lists and subject definitions can be found on the Equal Employment Opportunity Tabulation website: https://www.census.gov/topics/employment/equal-employment-opportunity-tabulation.html

Source: U.S. Census Bureau, 2014-2018 American Community Survey.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see https://www.census.gov/programs-surveys/acs/technical-documentation.html). The effect of nonsampling error is not represented in these tables.

The U.S. Census Bureau collects race data in accordance with guidelines provided by the U.S. Office of Management and Budget (OMB). Except for the total, all race and ethnicity categories are mutually exclusive. "Black" refers to Black or African American; "AIAN" refers to American Indian and Alaska Native; and "NHPI" refers to Native Hawaiian and Other Pacific Islander. "Balance of Not Hispanic or Latino" includes the balance of non-Hispanic individuals who reported multiple races or reported Some Other Race alone. For more information on race and Hispanic origin, see the Subject Definitions at https://www.census.gov/programs-surveys/acs/technical-documentation.html.

Race and Hispanic origin are separate concepts on the American Community Survey. "White alone Hispanic or Latino" includes respondents who reported Hispanic or Latino origin and reported race as "White" and no other race. "All other Hispanic or Latino" includes respondents who reported Hispanic or Latino origin and reported a race other than "White," either alone or in combination.

Occupation titles and their 4-digit codes are based on the 2018 Standard Occupational Classification.

The 2014-2018 American Community Survey (ACS) data generally reflect the September 2018 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineations due to differences in the effective dates of the geographic entities.

Explanation of Symbols:

An "-" entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution, or the margin of error associated with a median was larger than the median itself.
An "(X)" means that the estimate is not applicable or not available.
An "**" entry in the margin of error column indicates that either no sample observations or too few sample observations were available to compute a standard error and thus the margin of error. A statistical test is not appropriate.
An "***" entry in the margin of error column indicates that the median falls in the lowest interval or upper interval of an open-ended distribution. A statistical test is not appropriate.
An "*****" entry in the margin of error column indicates that the estimate is controlled. A statistical test for sampling variability is not appropriate.
An "N" entry in the estimate and margin of error columns indicates that data for this geographic area cannot be displayed because the number of sample cases is too small.
An "-" following a median estimate means the median falls in the lowest interval of an open-ended distribution.
An "+" following a median estimate means the median falls in the upper interval of an open-ended distribution.
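For cells that do carry a numeric margin of error, a common way to compare two estimates is a z-test on their difference. The Python sketch below is an illustration under stated assumptions: both margins of error are at the 90 percent level, the two estimates are treated as independent, and the function name and sample percentages are hypothetical rather than taken from the EEO tabulation.

```python
import math

Z_90 = 1.645  # z-score for the 90 percent confidence level


def is_significant_difference(est_a, moe_a, est_b, moe_b, z_crit=Z_90):
    """Return True if two estimates differ significantly at the chosen level.

    Each margin of error is converted to a standard error, and the difference
    is divided by the combined standard error of the two estimates.
    """
    se_a = moe_a / Z_90
    se_b = moe_b / Z_90
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(z) > z_crit


# Hypothetical percentages with their 90 percent margins of error.
print(is_significant_difference(12.0, 1.0, 10.0, 1.0))
print(is_significant_difference(10.5, 1.0, 10.0, 1.0))
```

Rows flagged with "**", "***", "*****" or "N" have no usable margin of error, so they would be excluded before calling a function like this.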
2020 American Community Survey: DP05 | ACS DEMOGRAPHIC AND HOUSING ESTIMATES...
data.census.gov
Cite
ACS, 2020 American Community Survey: DP05 | ACS DEMOGRAPHIC AND HOUSING ESTIMATES (ACS 5-Year Estimates Data Profiles) [Dataset]. https://data.census.gov/table/ACSDP5Y2020.DP05
Dataset provided by
United States Census Bureau (http://census.gov/)
Authors
ACS
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Time period covered
2020
Description
Although the American Community Survey (ACS) produces population, demographic and housing unit estimates, for 2020, the 2020 Census provides the official counts of the population and housing units for the nation, states, counties, cities, and towns. For 2016 to 2019, the Population Estimates Program provides estimates of the population for the nation, states, counties, cities, and towns and intercensal housing unit estimates for the nation, states, and counties.

Supporting documentation on code lists, subject definitions, data accuracy, and statistical testing can be found on the American Community Survey website in the Technical Documentation section. Sample size and data quality measures (including coverage rates, allocation rates, and response rates) can be found on the American Community Survey website in the Methodology section.

Source: U.S. Census Bureau, 2016-2020 American Community Survey 5-Year Estimates.

Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. The value shown here is the 90 percent margin of error. The margin of error can be interpreted roughly as providing a 90 percent probability that the interval defined by the estimate minus the margin of error and the estimate plus the margin of error (the lower and upper confidence bounds) contains the true value. In addition to sampling variability, the ACS estimates are subject to nonsampling error (for a discussion of nonsampling variability, see ACS Technical Documentation). The effect of nonsampling error is not represented in these tables.

For more information on understanding race and Hispanic origin data, please see the Census 2010 Brief entitled, Overview of Race and Hispanic Origin: 2010, issued March 2011. (pdf format)

The Hispanic origin and race codes were updated in 2020. For more information on the Hispanic origin and race code changes, please visit the American Community Survey Technical Documentation website.

The 2016-2020 American Community Survey (ACS) data generally reflect the September 2018 Office of Management and Budget (OMB) delineations of metropolitan and micropolitan statistical areas. In certain instances, the names, codes, and boundaries of the principal cities shown in ACS tables may differ from the OMB delineation lists due to differences in the effective dates of the geographic entities.

Estimates of urban and rural populations, housing units, and characteristics reflect boundaries of urban areas defined based on Census 2010 data. As a result, data for urban and rural areas from the ACS do not necessarily reflect the results of ongoing urbanization.

Explanation of Symbols:

-: The estimate could not be computed because there were an insufficient number of sample observations. For a ratio of medians estimate, one or both of the median estimates falls in the lowest interval or highest interval of an open-ended distribution.
N: The estimate or margin of error cannot be displayed because there were an insufficient number of sample cases in the selected geographic area.
(X): The estimate or margin of error is not applicable or not available.
median-: The median falls in the lowest interval of an open-ended distribution (for example "2,500-").
median+: The median falls in the highest interval of an open-ended distribution (for example "250,000+").
**: The margin of error could not be computed because there were an insufficient number of sample observations.
***: The margin of error could not be computed because the median falls in the lowest interval or highest interval of an open-ended distribution.
*****: A margin of error is not appropriate because the corresponding estimate is controlled to an independent population or housing estimate. Effectively, the corresponding estimate has no sampling error and the margin of error may be treated as zero.
Data from: Quantifying ethnic segregation in cities through random walks
zenodo.org
zip
Updated Feb 29, 2024
Cite
Sandro Sousa; Vincenzo Nicosia (2024). Quantifying ethnic segregation in cities through random walks [Dataset]. http://doi.org/10.5281/zenodo.5521053
Available download formats: zip
Unique identifier
https://doi.org/10.5281/zenodo.5521053
Dataset updated
Feb 29, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Sandro Sousa; Vincenzo Nicosia
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Overview

This repository contains the coverage time distributions used to produce the figures and statistics for the paper:

S. Sousa, V. Nicosia, "Quantifying ethnic segregation in cities through random walks". arXiv: https://arxiv.org/abs/2010.10462

Data

The ccp.zip file contains two subfolders with the coverage time distributions for the US and UK systems. Each file contains a line per node of the network with the format:

"Node ID" "[list with the CCT for each fraction c]"

Note that each line always contains 101 columns: the first column identifies the node, and the remaining 100 give the average coverage time to reach a fraction c of classes.
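A reader loading these files might start from a small parser like the Python sketch below. The exact delimiters are an assumption: the description only states that each line has a node ID followed by the coverage times, 101 columns in total, so the code tolerates brackets, commas and quotes and then splits on whitespace. The function name `parse_cct_line` and the sample line are hypothetical.

```python
EXPECTED_COLUMNS = 101  # one node ID plus 100 coverage-time values


def parse_cct_line(line):
    """Parse one line into (node_id, list of coverage times as floats).

    Assumption: columns are separated by whitespace and/or commas, and the
    list may be wrapped in brackets or quotes, which are discarded.
    """
    cleaned = line
    for ch in '[],"':
        cleaned = cleaned.replace(ch, " ")
    fields = cleaned.split()
    if len(fields) != EXPECTED_COLUMNS:
        raise ValueError(f"expected {EXPECTED_COLUMNS} columns, got {len(fields)}")
    node_id = fields[0]
    times = [float(value) for value in fields[1:]]
    return node_id, times


# Hypothetical line: a node ID followed by 100 made-up coverage times.
sample = "node42 [" + ", ".join(str(i * 1.5) for i in range(100)) + "]"
node_id, times = parse_cct_line(sample)
print(node_id, len(times), times[:3])
```

The column-count check is there so that a wrong guess about the file layout fails immediately instead of producing misaligned values.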

The dfa.zip file contains the following folders:

distances: Each line corresponds to one repetition of the walk and gives the area travelled by the walker, the length of the trajectory, and the perimeter.

exponents: The output file contains two columns, respectively for \epsilon and F(\epsilon).

results_ids: Time series of the visited nodes.

The synthetic.zip file contains the coverage time distributions for the experiment with different lattice sizes (scale-test) and the experiment with distinct spatial patterns for the population distribution (topology-test). The file format is the same as in ccp.zip.

Code

Readers interested in replicating the methods used to create the data can obtain the Python scripts from the following repository:

https://github.com/segregation-rw/ethnic-segregation-rw

Note that the repository also includes the code to simulate the CCT random walks on the adjacency graphs so that the whole simulation can be replicated.
