100+ datasets found

American Names by Multi-Ethnic/National Origin
kaggle.com
zip
Updated Aug 22, 2023
Cite
Louis Teitelbaum (2023). American Names by Multi-Ethnic/National Origin [Dataset]. https://www.kaggle.com/datasets/louisteitelbaum/american-names-by-multi-ethnic-national-origin
Explore at:
Available download formats: zip (778154 bytes)
Dataset updated
Aug 22, 2023
Authors
Louis Teitelbaum
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Area covered
United States
Description
This dataset includes all personal names listed in the Wikipedia category “American people by ethnic or national origin” and all subcategories fitting the pattern “American People of [ ] descent”, in total more than 25,000 individuals. Each individual is represented by a row, with columns indicating binary membership (0/1) in each ethnic/national category.

Ethnicity inference is an essential tool for identifying disparities in public health and social sciences. Existing datasets linking personal names to ethnic or national origin often neglect to recognize multi-ethnic or multi-national identities. Furthermore, existing datasets use coarse classification schemes (e.g. classifying both Indian and Japanese people as “Asian”) that may not be suitable for many research questions. This dataset remedies these problems by including both very fine-grained ethnic/national categories (e.g. Afghan-Jewish) and broader ones (e.g. European). Users can choose the categories that are relevant to their research. Since many Americans on Wikipedia are associated with multiple overlapping or distinct ethnicities/nationalities, these multi-ethnic associations are also reflected in the data.

Data were obtained from the Wikipedia API and reviewed manually to remove stage names, pen names, mononyms, first initials (when full names are available on Wikipedia), nicknames, honorific titles, and pages that correspond to a group or event rather than an individual.

This dataset was designed for use in training classification algorithms, but may also be independently interesting inasmuch as it is a representative sample of Americans who are famous enough to have their own Wikipedia page, along with detailed information on their ethnic/national origins.

DISCLAIMER: Due to the incomplete nature of Wikipedia, data may not properly reflect all ethnic/national associations for any given individual. For example, there is no guarantee that a given Cuban Jewish person will be listed in both the “American People of Cuban descent” and the “American People of Jewish descent” categories.
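As a rough illustration of the row-per-individual, 0/1-column layout described above, the sketch below picks a subset of categories and finds people who belong to more than one of them. The column names (`name`, `Cuban`, `Jewish`, `European`) and the sample rows are invented for this example; the actual file's headers may differ.

```python
import csv
import io

# Hypothetical excerpt: names and column headers are made up for illustration.
SAMPLE = """name,Cuban,Jewish,European
Ana Example,1,1,0
Ben Example,0,1,1
Cy Example,0,0,1
"""


def load_rows(text):
    """Parse CSV text into dicts, converting the 0/1 flags to ints."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        name = record.pop("name")
        flags = {category: int(value) for category, value in record.items()}
        rows.append({"name": name, "flags": flags})
    return rows


def select(rows, categories):
    """Keep only the chosen categories; drop people who belong to none of them."""
    selected = []
    for row in rows:
        labels = [c for c in categories if row["flags"].get(c, 0) == 1]
        if labels:
            selected.append((row["name"], labels))
    return selected


rows = load_rows(SAMPLE)
chosen = select(rows, ["Cuban", "Jewish"])
# People associated with more than one of the chosen categories.
multi = [name for name, labels in chosen if len(labels) > 1]
```

Because of the disclaimer above, a 0 in a column is better read as "not listed in that Wikipedia category" than as a confirmed absence of that origin.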
Race and ethnicity data for first, middle, and last names
dataverse.harvard.edu
search.dataone.org
Updated Apr 11, 2023
Cite
Evan Rosenman; Santiago Olivella; Kosuke Imai (2023). Race and ethnicity data for first, middle, and last names [Dataset]. http://doi.org/10.7910/DVN/SGKW0K
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Unique identifier
https://doi.org/10.7910/DVN/SGKW0K
Dataset updated
Apr 11, 2023
Dataset provided by
Harvard Dataverse
Authors
Evan Rosenman; Santiago Olivella; Kosuke Imai
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
We provide datasets that estimate the racial distributions associated with first, middle, and last names in the United States. The datasets cover five racial categories: White, Black, Hispanic, Asian, and Other. The provided data are computed from the voter files of six Southern states (Alabama, Florida, Georgia, Louisiana, North Carolina, and South Carolina) that collect race and ethnicity data upon registration. We include seven voter files per state, sourced between 2018 and 2021 from L2, Inc. Together, these states have approximately 36 million individuals who provide self-reported race and ethnicity. The last name dataset includes 338K surnames, the middle name dataset contains 126K middle names, and the first name dataset includes 136K first names. For each type of name, we provide a dataset of P(race | name) probabilities and a dataset of P(name | race) probabilities. We include only names that appear at least 25 times across the 42 (= 7 voter files * 6 states) voter files in our dataset. These data are closely related to the dataset "Name Dictionaries for "wru" R Package", https://doi.org/10.7910/DVN/7TRYAC. These are the probabilities used in the latest iteration of the "WRU" package (Khanna et al., 2022) to make probabilistic predictions about the race of individuals, given their names and geolocations.
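To make the two conditional probabilities concrete, here is a minimal sketch of how P(race | name) and P(name | race) could be derived from raw (name, race) counts, with the 25-occurrence threshold applied to each name's total. The counts are invented, and computing the race totals over the retained names only is an assumption of this sketch, not something the description states.

```python
from collections import defaultdict

# Made-up counts for illustration only; not taken from the dataset.
COUNTS = {
    ("smith", "White"): 60,
    ("smith", "Black"): 40,
    ("garcia", "Hispanic"): 45,
    ("garcia", "White"): 5,
    ("rarename", "Asian"): 10,
}
MIN_COUNT = 25  # threshold stated in the description


def build_tables(counts, min_count=MIN_COUNT):
    """Return (P(race | name), P(name | race)) keyed by (name, race)."""
    name_totals = defaultdict(int)
    for (name, _race), n in counts.items():
        name_totals[name] += n

    # Drop names whose total count falls below the threshold.
    kept = {key: n for key, n in counts.items() if name_totals[key[0]] >= min_count}

    # Assumption: race totals are taken over the retained names only.
    race_totals = defaultdict(int)
    for (_name, race), n in kept.items():
        race_totals[race] += n

    p_race_given_name = {key: n / name_totals[key[0]] for key, n in kept.items()}
    p_name_given_race = {key: n / race_totals[key[1]] for key, n in kept.items()}
    return p_race_given_name, p_name_given_race


p_race_given_name, p_name_given_race = build_tables(COUNTS)
```

The two tables answer different questions: the first normalizes within a name (each name's row sums to 1), the second within a race (each race's column sums to 1).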

👨‍👩‍👧 US Country Demographics

kaggle.com

zip

Updated Aug 14, 2023

Cite

mexwell (2023). 👨‍👩‍👧 US Country Demographics [Dataset]. https://www.kaggle.com/datasets/mexwell/us-country-demographics

Explore at:

Available download formats: zip (343499 bytes)

Dataset updated

Aug 14, 2023

Authors

mexwell

License

http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

Area covered

United States

Description

This dataset contains information about counties in the United States from 2010 through 2019, obtained from the United States Census Bureau. It covers age distributions, education levels, employment statistics, ethnicity percentages, household information, income, and other miscellaneous statistics. (Values are recorded as -1 when the data are not available.)
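Because missing values are encoded as -1 rather than left blank, they need to be filtered out before any arithmetic. The sketch below shows one way to do that with plain Python; the helper names and the sample values are hypothetical.

```python
MISSING = -1  # sentinel for "data not available", as described above


def clean(value):
    """Map the -1 sentinel to None so it cannot leak into arithmetic."""
    return None if value == MISSING else value


def mean_ignoring_missing(values):
    """Average only the values that are actually present."""
    present = [v for v in (clean(x) for x in values) if v is not None]
    return sum(present) / len(present) if present else None


# Hypothetical values of one percentage column across four counties.
percent_65_and_older = [22.4, -1, 18.0, 19.6]

# Treating -1 as a real number silently drags the mean down.
naive_mean = sum(percent_65_and_older) / len(percent_65_and_older)
clean_mean = mean_ignoring_missing(percent_65_and_older)
```

With these sample values the naive mean is 14.75, while the mean over the three available counties is about 20.0.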

Data Dictionary

<...

Key	List of...	Comment	Example Value
County	String	County name	`"Abbeville County"`
State	String	State name	`"SC"`
Age.Percent 65 and Older	Float	Estimated percentage of population whose ages are equal or greater than 65 years old are produced for the United States states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico).	`22.4`
Age.Percent Under 18 Years	Float	Estimated percentage of population whose ages are under 18 years old are produced for the United States states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico).	`19.8`
Age.Percent Under 5 Years	Float	Estimated percentage of population whose ages are under 5 years old are produced for the United States states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico).	`4.7`
Education.Bachelor's Degree or Higher	Float	Percentage for the people who attended college but did not receive a degree and people who received an associate's, bachelor's, master's, or professional or doctorate degree. These data include only persons 25 years old and over. The percentages are obtained by dividing the counts of graduates by the total number of persons 25 years old and over. The data were collected from 2015 to 2019.	`15.6`
Education.High School or Higher	Float	Percentage of people whose highest degree was a high school diploma or its equivalent, people who attended college but did not receive a degree, and people who received an associate's, bachelor's, master's, or professional or doctorate degree. These data include only persons 25 years old and over. The percentages are obtained by dividing the counts of graduates by the total number of persons 25 years old and over. The data were collected from 2015 to 2019.	`81.7`
Employment.Nonemployer Establishments	Integer	An establishment is a single physical location at which business is conducted or where services or industrial operations are performed. It is not necessarily identical with a company or enterprise which may consist of one establishment or more. The data was collected from 2018.	`1416`
Ethnicities.American Indian and Alaska Native Alone	Float	Estimated percentage of population having origins in any of the original peoples of North and South America (including Central America) and who maintains tribal affiliation or community attachment. This category includes people who indicate their race as "American Indian or Alaska Native" or report entries such as Navajo, Blackfeet, Inupiat, Yup'ik, or Central American Indian groups or South American Indian groups.	`0.3`
Ethnicities.Asian Alone	Float	Estimated percentage of population having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent, including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam. This includes people who reported detailed Asian responses such as "Asian Indian," "Chinese," "Filipino," "Korean," "Japanese," "Vietnamese," and "Other Asian" or provide other detailed Asian responses.	`0.4`
Ethnicities.Black Alone	Float	Estimated percentage of population having origins in any of the Black racial groups of Africa. It includes people who indicate their race as "Black or African American," or report entries such as African American, Kenyan, Nigerian, or Haitian.	`27.6`
Ethnicities.Hispanic or Latino	Float

Data from: Racial and Ethnic Differences in Youth's Mental Health and...
catalog.data.gov
icpsr.umich.edu
Updated Nov 14, 2025
Cite
National Institute of Justice (2025). Racial and Ethnic Differences in Youth's Mental Health and Substance Needs and Services: Findings from the Survey of Youth in Residential Placement (SYRP), United States, 2003 [Dataset]. https://catalog.data.gov/dataset/racial-and-ethnic-differences-in-youths-mental-health-and-substance-needs-and-services-fin-7386e
Explore at:
Dataset updated
Nov 14, 2025
Dataset provided by
National Institute of Justice
Area covered
United States
Description
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study examined differences in youth's mental health and substance abuse needs in seven different racial/ethnic groups of justice-involved youth. Using de-identified data from the Survey of Youth in Residential Placement (SYRP), it was assessed whether differences in mental health and substance abuse needs and services existed in a racially/ethnically diverse sample of youth in custody. Data came from a nationally representative sample of 7,073 youth in residential placements across 36 states, representing five program types. An examination of the extent to which there were racial/ethnic disparities in the delivery of services in relation to need was also conducted. This examination included assessing the differences in substance-related problems, availability of substance services, and receipt of substance-specific counseling. One SAS data file (syrp2017.sas7bdat) is included as part of this collection and has 138 variables for 7073 cases, with demographic variables on youth age, sex, race and ethnicity. Also included as part of the data collection are two SAS Program (syntax) files for use in secondary analysis of youth mental health and substance use.
Race/Ethnicity (by State of Georgia) 2019
opendata.atlantaregional.com
gisdata.fultoncountyga.gov
Updated Feb 25, 2021
Cite
Georgia Association of Regional Commissions (2021). Race/Ethnicity (by State of Georgia) 2019 [Dataset]. https://opendata.atlantaregional.com/datasets/race-ethnicity-by-state-of-georgia-2019
Explore at:
Dataset updated
Feb 25, 2021
Dataset provided by
The Georgia Association of Regional Commissions
Authors
Georgia Association of Regional Commissions
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
This dataset was developed by the Research & Analytics Group at the Atlanta Regional Commission using data from the U.S. Census Bureau. For a deep dive into the data model including every specific metric, see the Infrastructure Manifest. The manifest details ARC-defined naming conventions, field names/descriptions and topics, summary levels, source tables, notes and so forth for all metrics.
Naming conventions:
Prefixes:
None: Count
p: Percent
r: Rate
m: Median
a: Mean (average)
t: Aggregate (total)
ch: Change in absolute terms (value in t2 - value in t1)
pch: Percent change ((value in t2 - value in t1) / value in t1)
chp: Change in percent (percent in t2 - percent in t1)
s: Significance flag for change: 1 = statistically significant with a 90% CI, 0 = not statistically significant, blank = cannot be computed
Suffixes:
_e19: Estimate from 2014-19 ACS
_m19: Margin of Error from 2014-19 ACS
_00_v19: Decennial 2000, re-estimated to 2019 geography
_00_19: Change, 2000-19
_e10_v19: 2006-10 ACS, re-estimated to 2019 geography
_m10_v19: Margin of Error from 2006-10 ACS, re-estimated to 2019 geography
_e10_19: Change, 2010-19
The user should note that American Community Survey data represent estimates derived from a surveyed sample of the population, which creates some level of uncertainty, as opposed to an exact measure of the entire population (the full census count is only conducted once every 10 years and does not cover as many detailed characteristics of the population). Therefore, any measure reported by ACS should not be taken as an exact number – this is why a corresponding margin of error (MOE) is also given for ACS measures. The size of the MOE relative to its corresponding estimate value provides an indication of confidence in the accuracy of each estimate. 
Each MOE is expressed in the same units as its corresponding measure; for example, if the estimate value is expressed as a number, then its MOE will also be a number; if the estimate value is expressed as a percent, then its MOE will also be a percent. The user should also note that for relatively small geographic areas, such as census tracts shown here, ACS only releases combined 5-year estimates, meaning these estimates represent rolling averages of survey results that were collected over a 5-year span (in this case 2015-2019). Therefore, these data do not represent any one specific point in time or even one specific year. For geographic areas with larger populations, 3-year and 1-year estimates are also available. For further explanation of ACS estimates and margin of error, visit the Census ACS website.
Source: U.S. Census Bureau, Atlanta Regional Commission
Date: 2015-2019
Data License: Creative Commons Attribution 4.0 International (CC by 4.0)
Link to the manifest: https://www.arcgis.com/sharing/rest/content/items/3d489c725bb24f52a987b302147c46ee/data
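The three change measures in the naming conventions (ch, pch, chp) and the idea of comparing an MOE to its estimate can be sketched as follows. The function names are hypothetical and are not ARC field names; `percent_change` follows the stated formula literally and therefore returns a fraction (multiply by 100 for a percentage).

```python
def change(value_t1, value_t2):
    """ch: change in absolute terms (value in t2 - value in t1)."""
    return value_t2 - value_t1


def percent_change(value_t1, value_t2):
    """pch: (value in t2 - value in t1) / value in t1; None if t1 is zero."""
    if value_t1 == 0:
        return None
    return (value_t2 - value_t1) / value_t1


def change_in_percent(percent_t1, percent_t2):
    """chp: difference between two percentages, in percentage points."""
    return percent_t2 - percent_t1


def relative_moe(estimate, moe):
    """MOE as a share of its estimate; a larger share means a less reliable estimate."""
    if estimate == 0:
        return None
    return moe / abs(estimate)


# Hypothetical example: a count grows from 1000 to 1200, and a share
# rises from 10.0% to 12.5% of the population.
ch = change(1000, 1200)                # 200
pch = percent_change(1000, 1200)       # 0.2, i.e. a 20% increase
chp = change_in_percent(10.0, 12.5)    # 2.5 percentage points
share = relative_moe(1200, 150)        # 0.125
```

Note the difference between pch and chp: the first is relative growth of a value, the second is a subtraction of two percentages.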
Race, Ethnicity and Citizenship by County 2018
gstore.unm.edu
Updated Mar 6, 2020
Cite
(2020). Race, Ethnicity and Citizenship by County 2018 [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/474fef30-414f-4269-b37a-5103c84b141f/metadata/ISO-19115:2003.html
Explore at:
Dataset updated
Mar 6, 2020
Time period covered
2018
Area covered
West Bound -109.05017 East Bound -103.00196 North Bound 37.000293 South Bound 31.33217
Description
A broad and generalized selection of 2014-2018 US Census Bureau 2018 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. 
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
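The labeling rule for multiyear estimates described above (name the full period, never just one year) can be sketched with a small helper. The function names are hypothetical; the only facts used are that a 5-year estimate spans 60 months and ends in the vintage year.

```python
def acs_period_label(end_year, span_years=5):
    """Return a label covering the full collection period of an ACS estimate."""
    if span_years == 1:
        return str(end_year)
    start_year = end_year - span_years + 1
    return f"{start_year}-{end_year}"


def period_months(span_years):
    """Length of the collection period in months (60 for a 5-year estimate)."""
    return span_years * 12


label_2018 = acs_period_label(2018)       # 5-year estimate ending in 2018
label_3yr = acs_period_label(2017, 3)     # 3-year estimate ending in 2017
months = period_months(5)
```

For the dataset described here, the helper yields the same 2014-2018 period that the description names for the 2018 5-year estimates.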
Example of assigning ethnic class using Ethnicity Estimator.
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Aug 9, 2018
Cite
Longley, Paul A.; Kandt, Jens (2018). Example of assigning ethnic class using Ethnicity Estimator. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000672037
Explore at:
Dataset updated
Aug 9, 2018
Authors
Longley, Paul A.; Kandt, Jens
Description
Example of assigning ethnic class using Ethnicity Estimator.
COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE
catalog.data.gov
data.ct.gov
Updated Aug 12, 2023
Cite
data.ct.gov (2023). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-by-race-ethnicity
Explore at:
Dataset updated
Aug 12, 2023
Dataset provided by
data.ct.gov
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 . The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 . The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed. COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update. 
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates. The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. 
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
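The difference between a crude rate and a directly age-standardized rate, as described above, can be illustrated with a small sketch. It uses three invented age groups instead of the 19 groups named in the description, and the standard-population weights are made up rather than taken from the 2000 US standard population.

```python
def crude_rate(events, population, per=100_000):
    """Total events divided by total population, scaled per 100,000."""
    return sum(events) / sum(population) * per


def age_adjusted_rate(events, population, standard_weights, per=100_000):
    """Direct standardization: weight each age-specific rate by the
    standard population's share of that age group."""
    total_weight = sum(standard_weights)
    rate = 0.0
    for d, n, w in zip(events, population, standard_weights):
        rate += (d / n) * (w / total_weight)
    return rate * per


# Invented group with an older age structure than the standard population.
deaths = [2, 6, 100]                  # young, middle, old
people = [20_000, 30_000, 50_000]
weights = [0.4, 0.4, 0.2]             # hypothetical standard population shares

crude = crude_rate(deaths, people)
adjusted = age_adjusted_rate(deaths, people, weights)
```

Here the crude rate is about 108 per 100,000 and the adjusted rate about 52 per 100,000: because this invented group skews older than the standard population, adjustment lowers its rate, which mirrors the direction described for the non-Hispanic white population.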
American Community Survey
gstore.unm.edu
csv, geojson, gml +5
Updated Mar 6, 2020
Cite
Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/5991c4f8-db89-49d1-a501-1f18e7371e21/metadata/FGDC-STD-001-1998.html
Explore at:
Available download formats: zip(1), csv(5), xls(5), geojson(5), gml(5), json(5), kml(5), shp(5)
Dataset updated
Mar 6, 2020
Dataset provided by
Earth Data Analysis Center
Time period covered
2017
Area covered
New Mexico, West Bounding Coordinate -109.05017 East Bounding Coordinate -103.00196 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.33217
Description
A broad and generalized selection of 2013-2017 US Census Bureau 2017 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. 
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
American Community Survey
gstore.unm.edu
csv, geojson, gml +5
Updated Mar 6, 2020
Cite
Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/e9452dfe-44e1-4435-b2ed-77923feb84a2/metadata/FGDC-STD-001-1998.html
Explore at:
Available download formats: kml(5), gml(5), zip(1), json(5), csv(5), xls(5), geojson(5), shp(5)
Dataset updated
Mar 6, 2020
Dataset provided by
Earth Data Analysis Center
Time period covered
2016
Area covered
West Bounding Coordinate -109.05017 East Bounding Coordinate -103.00196 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.33217, New Mexico
Description
A broad and generalized selection of 2012-2016 US Census Bureau 2016 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. 
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
autotrain-data-ethnicity-test_v003
huggingface.co
Updated Apr 9, 2023
Cite
Chris LeDoux (2023). autotrain-data-ethnicity-test_v003 [Dataset]. https://huggingface.co/datasets/cledoux42/autotrain-data-ethnicity-test_v003
Explore at:
Dataset updated
Apr 9, 2023
Authors
Chris LeDoux
Description
AutoTrain Dataset for project: ethnicity-test_v003

Dataset Description

This dataset has been automatically processed by AutoTrain for project ethnicity-test_v003.

Languages

The BCP-47 code for the dataset's language is unk.

Dataset Structure Data Instances

A sample from this dataset looks as follows: [ { "image": "<512x512 RGB PIL image>", "target": 1 }, { "image": "<512x512 RGB PIL image>", "target": 3 }]… See the full description on the dataset page: https://huggingface.co/datasets/cledoux42/autotrain-data-ethnicity-test_v003.
American Community Survey
gstore.unm.edu
csv, geojson, gml +5
Updated Mar 6, 2020
Cite
Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/e0430ebf-d4b7-48c2-8fb8-dbdd0858e807/metadata/FGDC-STD-001-1998.html
Explore at:
Available download formats: kml(5), shp(5), gml(5), geojson(5), zip(1), xls(5), json(5), csv(5)
Dataset updated
Mar 6, 2020
Dataset provided by
Earth Data Analysis Center
Time period covered
2015
Area covered
New Mexico, West Bounding Coordinate -109.05017 East Bounding Coordinate -103.00196 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.33217
Description
A broad and generalized selection of 2011-2015 US Census Bureau 2015 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups. 
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
Loudoun County 2020 Census Population Patterns by Race and Hispanic or...
catalog.data.gov
data.virginia.gov
+2more
Updated Nov 15, 2025
Cite
Loudoun County GIS (2025). Loudoun County 2020 Census Population Patterns by Race and Hispanic or Latino Ethnicity [Dataset]. https://catalog.data.gov/dataset/loudoun-county-2020-census-population-patterns-by-race-and-hispanic-or-latino-ethnicity
Explore at:
Dataset updated
Nov 15, 2025
Dataset provided by
Loudoun County GIS
Area covered
Loudoun County
Description
Use this application to view the pattern of concentrations of people by race and Hispanic or Latino ethnicity. Data are provided at the U.S. Census block group level, one of the smallest Census geographies, to provide a detailed picture of these patterns. The data is sourced from the U.S. Census Bureau, 2020 Census Redistricting Data (Public Law 94-171) Summary File.

Definitions: Definitions of the Census Bureau’s categories are provided below. This interactive map shows patterns for all categories except American Indian or Alaska Native and Native Hawaiian or Other Pacific Islander. The total population countywide for these two categories is small (1,582 and 263 respectively). The Census Bureau uses the following race categories:

Population by Race
White – A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.
Black or African American – A person having origins in any of the Black racial groups of Africa.
American Indian or Alaska Native – A person having origins in any of the original peoples of North and South America (including Central America) and who maintains tribal affiliation or community attachment.
Asian – A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.
Native Hawaiian or Other Pacific Islander – A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
Some Other Race – This category is chosen by people who do not identify with any of the categories listed above.
People can identify with more than one race. These people are included in the Two or More Races category.

Hispanic or Latino Population
The Hispanic/Latino population is an ethnic group. Hispanic/Latino people may be of any race.

Other layers provided in this tool include the Loudoun County Census block groups, towns and Dulles airport, and the Loudoun County 2021 aerial imagery.
Data from: Exploring intergenerational, intra-generational and transnational...
brunel.figshare.com
bin
Updated May 31, 2023
Cite
Christina Victor; Vanessa Burholt (2023). Exploring intergenerational, intra-generational and transnational patterns of family caring in minority ethnic communities: the example of England and Wales dataset [Dataset]. http://doi.org/10.17633/rd.brunel.7560392.v1
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.17633/rd.brunel.7560392.v1
Dataset updated
May 31, 2023
Dataset provided by
Brunel University London
Authors
Christina Victor; Vanessa Burholt
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Wales, England
Description
This data set consists of 17 variables that underpin the analysis in the paper "Exploring intergenerational, intra-generational and transnational patterns of family caring in minority ethnic communities: the example of England and Wales", published in the International Journal of Care and Caring.

The methodology for the survey is described in the paper.
CalEnviroScreen 4.0 and Race/Ethnicity Analysis
data.ca.gov
catalog.data.gov
Updated Oct 10, 2024
Cite
California Office of Environmental Health Hazard Assessment (2024). CalEnviroScreen 4.0 and Race/Ethnicity Analysis [Dataset]. https://data.ca.gov/dataset/calenviroscreen-4-0-and-race-ethnicity-analysis
Explore at:
Available download formats: arcgis geoservices rest api, html
Dataset updated
Oct 10, 2024
Dataset authored and provided by
California Office of Environmental Health Hazard Assessment http://www.oehha.ca.gov/
Description
CalEnviroScreen scores represent a combined measure of pollution and the potential vulnerability of a population to the effects of pollution. Like the previous versions, CalEnviroScreen 4.0 does not include indicators of race/ethnicity or age. However, the distribution of the CalEnviroScreen 4.0 cumulative impact scores by race or ethnicity is important. This information can be used to better understand issues related to environmental justice and racial equity in California. CalEPA's racial equity team has released a StoryMap using CalEnviroScreen 3.0 data that examines the connection between racist land use practices of the 1930s and the persistence of environmental injustice. The CalEPA StoryMap, along with this analysis, are examples of information that can be used to better understand issues related to environmental justice and racial equity in California.
Race and Ethnicity - ACS 2016-2020 - Tempe Zip Codes
data.tempe.gov
data-academy.tempe.gov
+8more
Updated May 2, 2022
Cite
City of Tempe (2022). Race and Ethnicity - ACS 2016-2020 - Tempe Zip Codes [Dataset]. https://data.tempe.gov/datasets/tempegov::race-and-ethnicity-acs-2016-2020-tempe-zip-codes
Explore at:
Dataset updated
May 2, 2022
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
This layer shows population broken down by race and Hispanic origin. Data is from US Census American Community Survey (ACS) 5-year estimates.

To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online).

A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).

Vintage: 2016-2020
ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
Data downloaded from: Census Bureau's API for American Community Survey
Data Preparation: Data table downloaded and joined with Zip Code boundaries in the City of Tempe.
Date of Census update: March 17, 2022
National Figures: data.census.gov
Mean income per capita by race/ethnicity by locality ( Per capita income in...
data.virginia.gov
csv
Updated Feb 5, 2024
Cite
Other (2024). Mean income per capita by race/ethnicity by locality ( Per capita income in the past 12 months in 2019 inflation-adjusted dollars) [Dataset]. https://data.virginia.gov/dataset/mean-income-per-capita-by-race-ethnicity-by-locality-per-capita-income-in-the-past-12-months-in-2019
Explore at:
Available download formats: csv (11871)
Dataset updated
Feb 5, 2024
Dataset authored and provided by
Other
License
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Description
This table shows the mean income per capita in each locality by the following race/ethnicity identifiers: White alone; Black or African American alone; White alone, not Hispanic or Latino; and Hispanic or Latino.

The Census API returns no data for the following race/ethnicity identifiers: American Indian and Alaska Native Alone; Asian Alone; Native Hawaiian and Other Pacific Islander Alone; Some Other Race Alone; and Two or More Races.

Information on this dataset, from https://censusreporter.org/topics/income/: Table B19301, "Per Capita income", is simply the value for B19313 "Aggregate Income" divided by the total population estimate for the summary geography. This statistic is more or less the 'average' income. Note the potential for misunderstanding: A) the aggregate income is divided among all people, not only those who actually had income, and B) as with any average, outliers (very big earners) can have a disproportionate effect on the resulting figure.
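
As a minimal sketch of the calculation described above, per capita income is the aggregate income divided by the whole population, including people with no income. The function name and all figures below are invented for illustration and are not taken from the dataset.

```python
def per_capita_income(aggregate_income, total_population):
    """Aggregate income divided by everyone, earners or not."""
    if total_population <= 0:
        raise ValueError("total_population must be positive")
    return aggregate_income / total_population

# Hypothetical locality: 9 residents earn 30,000 each, 1 resident has no income.
incomes = [30_000] * 9 + [0]
baseline = per_capita_income(sum(incomes), len(incomes))

# Same locality after one very large earner is added (caveat B above).
with_outlier = per_capita_income(sum(incomes) + 5_000_000, len(incomes) + 1)

print(baseline)             # 27000.0
print(round(with_outlier))  # 479091
```

The resident with no income lowers the figure below what any earner actually makes, and a single large earner raises it far above what the other residents make.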

Explanation of value = -666666666 : A '-' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
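
One way to handle the -666666666 placeholder before computing any statistics is sketched below. The helper names and the sample estimates are hypothetical; only the sentinel value itself comes from the description above.

```python
SENTINEL = -666666666  # documented above as "no estimate could be computed"

def clean_estimate(value):
    """Return None for the sentinel, otherwise the value unchanged."""
    return None if value == SENTINEL else value

def mean_of_available(values):
    """Average only the estimates that are actually present."""
    present = [v for v in map(clean_estimate, values) if v is not None]
    return sum(present) / len(present) if present else None

# Hypothetical per capita income estimates for four localities.
estimates = [31_000, SENTINEL, 45_000, 38_000]
print(mean_of_available(estimates))  # 38000.0
```

Leaving the sentinel in place would drag any sum or average to a large negative number, so it is treated as missing rather than as a value.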
Race and Ethnicity - ACS 2018-2022 - Tempe Tracts
data.tempe.gov
performance.tempe.gov
+6more
Updated Jan 12, 2024
Cite
City of Tempe (2024). Race and Ethnicity - ACS 2018-2022 - Tempe Tracts [Dataset]. https://data.tempe.gov/datasets/tempegov::race-and-ethnicity-acs-2018-2022-tempe-tracts
Explore at:
Dataset updated
Jan 12, 2024
Dataset authored and provided by
City of Tempe
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered

Description
This layer shows population broken down by race and Hispanic origin. Data is from US Census American Community Survey (ACS) 5-year estimates.

This layer is symbolized to show the percent of population that is Hispanic or Latino. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online). To view only the census tracts that are predominantly in Tempe, add the expression City is Tempe in the map filter settings.

A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).

Vintage: 2018-2022
ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
Data downloaded from: Census Bureau's API for American Community Survey
Data Preparation: Data curated from Esri Living Atlas clipped to Census Tract boundaries that are within or adjacent to the City of Tempe boundary
Date of Census update: December 15, 2023
National Figures: data.census.gov
Race and Ethnicity - ACS 2017-2021 - Tempe Tracts
datasets.ai
performance.tempe.gov
+8more
15, 21, 25, 3, 57, 8
Updated Jan 13, 2023
Cite
City of Tempe (2023). Race and Ethnicity - ACS 2017-2021 - Tempe Tracts [Dataset]. https://datasets.ai/datasets/race-and-ethnicity-acs-2017-2021-tempe-tracts-a63d5
Explore at:
Available download formats: 57, 25, 21, 3, 15, 8
Dataset updated
Jan 13, 2023
Dataset authored and provided by
City of Tempe
Area covered
Tempe
Description
This layer shows population broken down by race and Hispanic origin.
Data is from US Census American Community Survey (ACS) 5-year estimates.

This layer is symbolized to show the percent of population that is Hispanic or Latino. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online). To view only the census tracts that are predominantly in Tempe, add the expression City is Tempe in the map filter settings.

A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).

Vintage: 2017-2021
ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
Data downloaded from: Census Bureau's API for American Community Survey
Data Preparation: Data curated from Esri Living Atlas clipped to Census Tract boundaries that are within or adjacent to the City of Tempe boundary
Date of Census update: December 8, 2022
National Figures: data.census.gov

Additional Census data notes and data processing notes are available at the Esri Living Atlas Layer:
https://tempegov.maps.arcgis.com/home/item.html?id=23ab8028f1784de4b0810104cd5d1c8f&view=list&sortOrder=desc&sortField=defaultFSOrder#overview
(Esri's Living Atlas always shows latest data)
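
The symbolized quantity (percent of population that is Hispanic or Latino) and the 'Null' convention can be sketched as follows. The field names `total_pop` and `hispanic_pop` and the tract records are hypothetical stand-ins, not the layer's actual attribute names; `None` plays the role of a 'Null' estimate.

```python
def percent_hispanic(total, hispanic):
    """Percent Hispanic or Latino, or None when the share cannot be computed."""
    if total is None or hispanic is None or total == 0:
        return None
    return 100.0 * hispanic / total

# Hypothetical tract records; None stands in for a 'Null' estimate.
tracts = [
    {"tract": "A", "total_pop": 4000, "hispanic_pop": 1000},
    {"tract": "B", "total_pop": 2500, "hispanic_pop": None},
    {"tract": "C", "total_pop": 0, "hispanic_pop": 0},
]

shares = {
    t["tract"]: percent_hispanic(t["total_pop"], t["hispanic_pop"])
    for t in tracts
}
print(shares)  # {'A': 25.0, 'B': None, 'C': None}
```

Propagating `None` keeps suppressed estimates from showing up as 0%, which would misrepresent tracts where the sample was simply too small.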

Datasheet3_Assessing disparities through missing race and ethnicity data:...
frontiersin.figshare.com
datasetcatalog.nlm.nih.gov
pdf
Updated Jul 24, 2024
Cite
Katelyn M. Banschbach; Jade Singleton; Xing Wang; Sheetal S. Vora; Julia G. Harris; Ashley Lytch; Nancy Pan; Julia Klauss; Danielle Fair; Erin Hammelev; Mileka Gilbert; Connor Kreese; Ashley Machado; Peter Tarczy-Hornoch; Esi M. Morgan (2024). Datasheet3_Assessing disparities through missing race and ethnicity data: results from a juvenile arthritis registry.pdf [Dataset]. http://doi.org/10.3389/fped.2024.1430981.s003
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fped.2024.1430981.s003
Dataset updated
Jul 24, 2024
Dataset provided by
Frontiers Media http://www.frontiersin.org/
Authors
Katelyn M. Banschbach; Jade Singleton; Xing Wang; Sheetal S. Vora; Julia G. Harris; Ashley Lytch; Nancy Pan; Julia Klauss; Danielle Fair; Erin Hammelev; Mileka Gilbert; Connor Kreese; Ashley Machado; Peter Tarczy-Hornoch; Esi M. Morgan
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Introduction
Ensuring high-quality race and ethnicity data within the electronic health record (EHR) and across linked systems, such as patient registries, is necessary to achieving the goal of inclusion of racial and ethnic minorities in scientific research and detecting disparities associated with race and ethnicity. The project goal was to improve race and ethnicity data completion within the Pediatric Rheumatology Care Outcomes Improvement Network and assess impact of improved data completion on conclusions drawn from the registry.

Methods
This is a mixed-methods quality improvement study that consisted of five parts, as follows: (1) Identifying baseline missing race and ethnicity data, (2) Surveying current collection and entry, (3) Completing data through audit and feedback cycles, (4) Assessing the impact on outcome measures, and (5) Conducting participant interviews and thematic analysis.

Results
Across six participating centers, 29% of the patients were missing data on race and 31% were missing data on ethnicity. Of patients missing data, most patients were missing both race and ethnicity. Rates of missingness varied by data entry method (electronic vs. manual). Recovered data had a higher percentage of patients with Other race or Hispanic/Latino ethnicity compared with patients with non-missing race and ethnicity data at baseline. Black patients had a significantly higher odds ratio of having a clinical juvenile arthritis disease activity score (cJADAS10) of ≥5 at first follow-up compared with White patients. There was no significant change in odds ratio of cJADAS10 ≥5 for race and ethnicity after data completion. Patients missing race and ethnicity were more likely to be missing cJADAS values, which may affect the ability to detect changes in odds ratio of cJADAS ≥5 after completion.

Conclusions
About one-third of the patients in a pediatric rheumatology registry were missing race and ethnicity data. After three audit and feedback cycles, centers decreased missing data by 94%, primarily via data recovery from the EHR. In this sample, completion of missing data did not change the findings related to differential outcomes by race. Recovered data were not uniformly distributed compared with those with non-missing race and ethnicity data at baseline, suggesting that differences in outcomes after completing race and ethnicity data may be seen with larger sample sizes.

Cite

Louis Teitelbaum (2023). American Names by Multi-Ethnic/National Origin [Dataset]. https://www.kaggle.com/datasets/louisteitelbaum/american-names-by-multi-ethnic-national-origin

American Names by Multi-Ethnic/National Origin

25,540 Americans, 491 Overlapping Ethnic/National Categories

Explore at:

Available download formats: zip (778154 bytes)

Dataset updated

Aug 22, 2023

Authors

Louis Teitelbaum

License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Area covered

United States

Description

This dataset includes all personal names listed in the Wikipedia category “American people by ethnic or national origin” and all subcategories fitting the pattern “American People of [ ] descent”, in total more than 25,000 individuals. Each individual is represented by a row, with columns indicating binary membership (0/1) in each ethnic/national category.

Ethnicity inference is an essential tool for identifying disparities in public health and social sciences. Existing datasets linking personal names to ethnic or national origin often neglect to recognize multi-ethnic or multi-national identities. Furthermore, existing datasets use coarse classification schemes (e.g. classifying both Indian and Japanese people as “Asian”) that may not be suitable for many research questions. This dataset remedies these problems by including both very fine-grained ethnic/national categories (e.g. Afghan-Jewish) and broader ones (e.g. European). Users can choose the categories that are relevant to their research. Since many Americans on Wikipedia are associated with multiple overlapping or distinct ethnicities/nationalities, these multi-ethnic associations are also reflected in the data.

Data were obtained from the Wikipedia API and reviewed manually to remove stage names, pen names, mononyms, first initials (when full names are available on Wikipedia), nicknames, honorific titles, and pages that correspond to a group or event rather than an individual.

This dataset was designed for use in training classification algorithms, but may also be independently interesting inasmuch as it is a representative sample of Americans who are famous enough to have their own Wikipedia page, along with detailed information on their ethnic/national origins.

DISCLAIMER: Due to the incomplete nature of Wikipedia, data may not properly reflect all ethnic/national associations for any given individual. For example, there is no guarantee that a given Cuban Jewish person will be listed in both the “American People of Cuban descent” and the “American People of Jewish descent” categories.
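
The row-per-person, 0/1-column-per-category layout described above can be worked with roughly as follows. This is a sketch only: the names, the column labels, and the helper functions are invented for illustration, and the real file's column names may differ.

```python
# Hypothetical rows: one per person, one 0/1 column per category.
rows = [
    {"name": "Person A", "Cuban": 1, "Jewish": 1, "European": 0},
    {"name": "Person B", "Cuban": 0, "Jewish": 0, "European": 1},
    {"name": "Person C", "Cuban": 1, "Jewish": 0, "European": 0},
]

def categories_of(row):
    """Return the sorted categories a person is listed in (columns equal to 1)."""
    return sorted(k for k, v in row.items() if k != "name" and v == 1)

def select(rows, wanted):
    """Keep only the chosen category columns, dropping people listed in none."""
    out = []
    for row in rows:
        labels = {c: row.get(c, 0) for c in wanted}
        if any(labels.values()):
            out.append({"name": row["name"], **labels})
    return out

# People listed under more than one category.
multi = [r["name"] for r in rows if len(categories_of(r)) > 1]

# Restrict to the categories relevant to a given research question.
subset = select(rows, ["Cuban", "Jewish"])
print(multi)
print(subset)
```

Given the disclaimer, a 0 is best read as "not listed in that Wikipedia category" rather than as a confirmed absence of that origin.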
