100+ datasets found
  1. American Names by Multi-Ethnic/National Origin

    • kaggle.com
    zip
    Updated Aug 22, 2023
    Cite
    Louis Teitelbaum (2023). American Names by Multi-Ethnic/National Origin [Dataset]. https://www.kaggle.com/datasets/louisteitelbaum/american-names-by-multi-ethnic-national-origin
    Available download formats: zip (778154 bytes)
    Dataset updated
    Aug 22, 2023
    Authors
    Louis Teitelbaum
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    This dataset includes all personal names listed in the Wikipedia category “American people by ethnic or national origin” and all subcategories fitting the pattern “American People of [ ] descent”, in total more than 25,000 individuals. Each individual is represented by a row, with columns indicating binary membership (0/1) in each ethnic/national category.

    Ethnicity inference is an essential tool for identifying disparities in public health and social sciences. Existing datasets linking personal names to ethnic or national origin often neglect to recognize multi-ethnic or multi-national identities. Furthermore, existing datasets use coarse classification schemes (e.g. classifying both Indian and Japanese people as “Asian”) that may not be suitable for many research questions. This dataset remedies these problems by including both very fine-grained ethnic/national categories (e.g. Afghan-Jewish) and broader ones (e.g. European). Users can choose the categories that are relevant to their research. Since many Americans on Wikipedia are associated with multiple overlapping or distinct ethnicities/nationalities, these multi-ethnic associations are also reflected in the data.
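
As a rough illustration of the row-per-person, 0/1-column layout described above, the following sketch selects people by category and finds people who belong to several categories at once. The column names and the three sample rows are invented for the example and are not taken from the dataset file.

```python
import csv
import io

# Hypothetical excerpt: the column names and rows below are invented for
# illustration; the real file's headers may differ.
SAMPLE = """name,European,Italian,Irish,Asian,Japanese
Alice Example,1,1,1,0,0
Bob Sample,0,0,0,1,1
Carol Placeholder,1,0,1,1,1
"""


def load_rows(text):
    """Parse CSV text into records with integer 0/1 membership flags."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row.pop("name")
        rows.append({"name": name, "flags": {k: int(v) for k, v in row.items()}})
    return rows


def members(rows, category):
    """Names of people flagged as belonging to one category."""
    return [r["name"] for r in rows if r["flags"].get(category) == 1]


def multi_category(rows, categories):
    """Names of people flagged in at least two of the given categories."""
    return [
        r["name"]
        for r in rows
        if sum(r["flags"].get(c, 0) for c in categories) >= 2
    ]


rows = load_rows(SAMPLE)
print(members(rows, "Irish"))
print(multi_category(rows, ["Italian", "Irish", "Japanese"]))
```

Choosing only the columns relevant to a research question amounts to passing a different list of category names; nothing in the sketch depends on how fine or coarse the categories are.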

    Data were obtained from the Wikipedia API and reviewed manually to remove stage names, pen names, mononyms, first initials (when full names are available on Wikipedia), nicknames, honorific titles, and pages that correspond to a group or event rather than an individual.

    This dataset was designed for use in training classification algorithms, but may also be independently interesting inasmuch as it is a representative sample of Americans who are famous enough to have their own Wikipedia page, along with detailed information on their ethnic/national origins.

    DISCLAIMER: Due to the incomplete nature of Wikipedia, data may not properly reflect all ethnic/national associations for any given individual. For example, there is no guarantee that a given Cuban Jewish person will be listed in both the “American People of Cuban descent” and the “American People of Jewish descent” categories.

  2. Race and ethnicity data for first, middle, and last names

    • dataverse.harvard.edu
    • search.dataone.org
    Updated Apr 11, 2023
    Cite
    Evan Rosenman; Santiago Olivella; Kosuke Imai (2023). Race and ethnicity data for first, middle, and last names [Dataset]. http://doi.org/10.7910/DVN/SGKW0K
    Explore at:
    Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Apr 11, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Evan Rosenman; Santiago Olivella; Kosuke Imai
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    We provide datasets that estimate the racial distributions associated with first, middle, and last names in the United States. The datasets cover five racial categories: White, Black, Hispanic, Asian, and Other. The provided data are computed from the voter files of six Southern states -- Alabama, Florida, Georgia, Louisiana, North Carolina, and South Carolina -- that collect race and ethnicity data upon registration. We include seven voter files per state, sourced between 2018 and 2021 from L2, Inc. Together, these states have approximately 36 million individuals who provide self-reported race and ethnicity. The last name dataset includes 338K surnames, the middle name dataset contains 126K middle names, and the first name dataset includes 136K first names. For each type of name, we provide a dataset of P(race | name) probabilities and P(name | race) probabilities. We include only names that appear at least 25 times across the 42 (= 7 voter files * 6 states) voter files in our dataset. These data are closely related to the dataset "Name Dictionaries for "wru" R Package", https://doi.org/10.7910/DVN/7TRYAC. These are the probabilities used in the latest iteration of the "WRU" package (Khanna et al., 2022) to make probabilistic predictions about the race of individuals, given their names and geolocations.
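
To make the two kinds of probability table concrete, here is a minimal sketch of how P(race | name) and P(name | race) can be derived from raw name-by-race counts, with the 25-occurrence cutoff mentioned above. The counts are invented, and the choice to compute race totals over the retained names only is an assumption of this example; it is not a description of the authors' actual estimation procedure.

```python
from collections import defaultdict

MIN_COUNT = 25  # cutoff stated in the description: at least 25 occurrences

# Hypothetical name-by-race counts, invented for illustration only.
counts = {
    "SMITH": {"White": 70, "Black": 25, "Hispanic": 3, "Asian": 1, "Other": 1},
    "GARCIA": {"White": 5, "Black": 1, "Hispanic": 92, "Asian": 1, "Other": 1},
    "RARE": {"White": 3, "Black": 0, "Hispanic": 0, "Asian": 0, "Other": 0},
}


def conditional_tables(counts, min_count=MIN_COUNT):
    """Return (P(race | name), P(name | race)) as nested dicts."""
    # Drop names that occur fewer than min_count times in total.
    kept = {n: c for n, c in counts.items() if sum(c.values()) >= min_count}

    # Assumption of this sketch: race totals use the retained names only.
    race_totals = defaultdict(int)
    for by_race in kept.values():
        for race, k in by_race.items():
            race_totals[race] += k

    p_race_given_name = {
        name: {race: k / sum(by_race.values()) for race, k in by_race.items()}
        for name, by_race in kept.items()
    }
    p_name_given_race = {
        name: {
            race: (k / race_totals[race] if race_totals[race] else 0.0)
            for race, k in by_race.items()
        }
        for name, by_race in kept.items()
    }
    return p_race_given_name, p_name_given_race


p_rn, p_nr = conditional_tables(counts)
print(p_rn["SMITH"]["White"])  # share of SMITH records that are White
print(p_nr["SMITH"]["White"])  # share of White records named SMITH
```

The first table sums to 1 across races for each name; the second sums to 1 across names for each race, which is why the two are published separately.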

  3. 👨‍👩‍👧 US Country Demographics

    • kaggle.com
    zip
    Updated Aug 14, 2023
    Cite
    mexwell (2023). 👨‍👩‍👧 US Country Demographics [Dataset]. https://www.kaggle.com/datasets/mexwell/us-country-demographics
    Available download formats: zip (343499 bytes)
    Dataset updated
    Aug 14, 2023
    Authors
    mexwell
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Area covered
    United States
    Description

    The following data set contains information about counties in the United States from 2010 through 2019, obtained from the United States Census Bureau. It includes age distributions, education levels, employment statistics, ethnicity percentages, household information, income, and other miscellaneous statistics. (Values are denoted as -1 if the data is not available.)
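
Because unavailable values are stored as -1, they need to be masked before computing any statistic. The sketch below shows one way to do that with the standard library. The first sample row reuses the example values from this listing's data dictionary; the second row ("Example County", "ZZ") is a made-up placeholder.

```python
import csv
import io

MISSING = -1  # sentinel the description gives for unavailable data

# First row uses the example values from the data dictionary;
# the second row is a hypothetical county with no data.
SAMPLE = """County,State,Age.Percent 65 and Older,Employment.Nonemployer Establishments
Abbeville County,SC,22.4,1416
Example County,ZZ,-1,-1
"""


def parse_value(text):
    """Convert numeric text to float, mapping the -1 sentinel to None."""
    try:
        value = float(text)
    except ValueError:
        return text  # non-numeric fields such as county names stay strings
    return None if value == MISSING else value


def load(text):
    """Read CSV text into a list of dicts with missing values masked."""
    return [
        {key: parse_value(val) for key, val in row.items()}
        for row in csv.DictReader(io.StringIO(text))
    ]


def mean_available(rows, key):
    """Mean of a column over the rows where the value is available."""
    values = [r[key] for r in rows if r[key] is not None]
    return sum(values) / len(values) if values else None


rows = load(SAMPLE)
print(mean_available(rows, "Age.Percent 65 and Older"))
```

Leaving the sentinel in place would pull the mean of this column down to 10.7, which is why masking comes first.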

    Data Dictionary

    <...

    Key | List of... | Comment | Example Value
    County | String | County name | "Abbeville County"
    State | String | State name | "SC"
    Age.Percent 65 and Older | Float | Estimated percentage of the population aged 65 years or older. Estimates are produced for the United States, states, and counties, as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico). | 22.4
    Age.Percent Under 18 Years | Float | Estimated percentage of the population under 18 years old. Estimates are produced for the United States, states, and counties, as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico). | 19.8
    Age.Percent Under 5 Years | Float | Estimated percentage of the population under 5 years old. Estimates are produced for the United States, states, and counties, as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico). | 4.7
    Education.Bachelor's Degree or Higher | Float | Percentage of people who attended college but did not receive a degree and people who received an associate's, bachelor's, master's, or professional or doctorate degree. These data include only persons 25 years old and over. The percentages are obtained by dividing the counts of graduates by the total number of persons 25 years old and over. The data is collected from 2015 to 2019. | 15.6
    Education.High School or Higher | Float | Percentage of people whose highest degree was a high school diploma or its equivalent, people who attended college but did not receive a degree, and people who received an associate's, bachelor's, master's, or professional or doctorate degree. These data include only persons 25 years old and over. The percentages are obtained by dividing the counts of graduates by the total number of persons 25 years old and over. The data is collected from 2015 to 2019. | 81.7
    Employment.Nonemployer Establishments | Integer | An establishment is a single physical location at which business is conducted or where services or industrial operations are performed. It is not necessarily identical with a company or enterprise, which may consist of one establishment or more. The data was collected from 2018. | 1416
    Ethnicities.American Indian and Alaska Native Alone | Float | Estimated percentage of the population having origins in any of the original peoples of North and South America (including Central America) and who maintain tribal affiliation or community attachment. This category includes people who indicate their race as "American Indian or Alaska Native" or report entries such as Navajo, Blackfeet, Inupiat, Yup'ik, or Central American Indian groups or South American Indian groups. | 0.3
    Ethnicities.Asian Alone | Float | Estimated percentage of the population having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent, including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam. This includes people who reported detailed Asian responses such as "Asian Indian", "Chinese", "Filipino", "Korean", "Japanese", "Vietnamese", and "Other Asian", or who provide other detailed Asian responses. | 0.4
    Ethnicities.Black Alone | Float | Estimated percentage of the population having origins in any of the Black racial groups of Africa. It includes people who indicate their race as "Black or African American" or report entries such as African American, Kenyan, Nigerian, or Haitian. | 27.6
    Ethnicities.Hispanic or Latino | Float
  4. Data from: Racial and Ethnic Differences in Youth's Mental Health and...

    • catalog.data.gov
    • icpsr.umich.edu
    Updated Nov 14, 2025
    Cite
    National Institute of Justice (2025). Racial and Ethnic Differences in Youth's Mental Health and Substance Needs and Services: Findings from the Survey of Youth in Residential Placement (SYRP), United States, 2003 [Dataset]. https://catalog.data.gov/dataset/racial-and-ethnic-differences-in-youths-mental-health-and-substance-needs-and-services-fin-7386e
    Dataset updated
    Nov 14, 2025
    Dataset provided by
    National Institute of Justice
    Area covered
    United States
    Description

    These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study examined differences in youth's mental health and substance abuse needs in seven different racial/ethnic groups of justice-involved youth. Using de-identified data from the Survey of Youth in Residential Placement (SYRP), it was assessed whether differences in mental health and substance abuse needs and services existed in a racially/ethnically diverse sample of youth in custody. Data came from a nationally representative sample of 7,073 youth in residential placements across 36 states, representing five program types. An examination of the extent to which there were racial/ethnic disparities in the delivery of services in relation to need was also conducted. This examination included assessing the differences in substance-related problems, availability of substance services, and receipt of substance-specific counseling. One SAS data file (syrp2017.sas7bdat) is included as part of this collection and has 138 variables for 7073 cases, with demographic variables on youth age, sex, race and ethnicity. Also included as part of the data collection are two SAS Program (syntax) files for use in secondary analysis of youth mental health and substance use.

  5. Race/Ethnicity (by State of Georgia) 2019

    • opendata.atlantaregional.com
    • gisdata.fultoncountyga.gov
    • +1 more
    Updated Feb 25, 2021
    + more versions
    Cite
    Georgia Association of Regional Commissions (2021). Race/Ethnicity (by State of Georgia) 2019 [Dataset]. https://opendata.atlantaregional.com/datasets/race-ethnicity-by-state-of-georgia-2019
    Dataset updated
    Feb 25, 2021
    Dataset provided by
    The Georgia Association of Regional Commissions
    Authors
    Georgia Association of Regional Commissions
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Description

    This dataset was developed by the Research & Analytics Group at the Atlanta Regional Commission using data from the U.S. Census Bureau. For a deep dive into the data model including every specific metric, see the Infrastructure Manifest. The manifest details ARC-defined naming conventions, field names/descriptions and topics, summary levels, source tables, notes and so forth for all metrics.

    Naming conventions:

    Prefixes:
    • None: Count
    • p: Percent
    • r: Rate
    • m: Median
    • a: Mean (average)
    • t: Aggregate (total)
    • ch: Change in absolute terms (value in t2 - value in t1)
    • pch: Percent change ((value in t2 - value in t1) / value in t1)
    • chp: Change in percent (percent in t2 - percent in t1)
    • s: Significance flag for change: 1 = statistically significant with a 90% CI, 0 = not statistically significant, blank = cannot be computed

    Suffixes:
    • _e19: Estimate from 2014-19 ACS
    • _m19: Margin of Error from 2014-19 ACS
    • _00_v19: Decennial 2000, re-estimated to 2019 geography
    • _00_19: Change, 2000-19
    • _e10_v19: 2006-10 ACS, re-estimated to 2019 geography
    • _m10_v19: Margin of Error from 2006-10 ACS, re-estimated to 2019 geography
    • _e10_19: Change, 2010-19

    The user should note that American Community Survey data represent estimates derived from a surveyed sample of the population, which creates some level of uncertainty, as opposed to an exact measure of the entire population (the full census count is only conducted once every 10 years and does not cover as many detailed characteristics of the population). Therefore, any measure reported by ACS should not be taken as an exact number – this is why a corresponding margin of error (MOE) is also given for ACS measures. The size of the MOE relative to its corresponding estimate value provides an indication of confidence in the accuracy of each estimate. Each MOE is expressed in the same units as its corresponding measure; for example, if the estimate value is expressed as a number, then its MOE will also be a number; if the estimate value is expressed as a percent, then its MOE will also be a percent.

    The user should also note that for relatively small geographic areas, such as census tracts shown here, ACS only releases combined 5-year estimates, meaning these estimates represent rolling averages of survey results that were collected over a 5-year span (in this case 2015-2019). Therefore, these data do not represent any one specific point in time or even one specific year. For geographic areas with larger populations, 3-year and 1-year estimates are also available. For further explanation of ACS estimates and margin of error, visit Census ACS website.

    Source: U.S. Census Bureau, Atlanta Regional Commission
    Date: 2015-2019
    Data License: Creative Commons Attribution 4.0 International (CC by 4.0)
    Link to the manifest: https://www.arcgis.com/sharing/rest/content/items/3d489c725bb24f52a987b302147c46ee/data
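
The change formulas in the naming conventions and the idea of judging an MOE against its estimate can be sketched in a few lines. The function names and sample numbers here are hypothetical. The 1.645 divisor reflects the 90 percent confidence level at which ACS margins of error are published, and whether ARC stores `pch` as a fraction or multiplies it by 100 is not stated in the description, so the sketch returns the plain ratio.

```python
Z_90 = 1.645  # z-score for the 90% confidence level of published ACS MOEs


def change(v1, v2):
    """'ch' convention: change in absolute terms (value in t2 - value in t1)."""
    return v2 - v1


def percent_change(v1, v2):
    """'pch' convention: (value in t2 - value in t1) / value in t1.

    Returned as a plain ratio; scaling by 100 is left to the caller.
    """
    return (v2 - v1) / v1


def change_in_percent(p1, p2):
    """'chp' convention: percent in t2 - percent in t1."""
    return p2 - p1


def relative_moe(estimate, moe):
    """MOE as a share of its estimate; larger means less reliable."""
    return moe / estimate


def coefficient_of_variation(estimate, moe):
    """Standard error (MOE / 1.645) divided by the estimate."""
    return (moe / Z_90) / estimate


# Hypothetical values for one field at two points in time.
print(change(800, 1000))                        # 200
print(percent_change(800, 1000))                # 0.25
print(relative_moe(1000, 164.5))
print(coefficient_of_variation(1000, 164.5))
```

A small relative MOE or coefficient of variation indicates a tighter estimate; what counts as acceptable is a judgment for the analyst and is not prescribed by the description.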

  6. Race, Ethnicity and Citizenship by County 2018

    • gstore.unm.edu
    Updated Mar 6, 2020
    + more versions
    Cite
    (2020). Race, Ethnicity and Citizenship by County 2018 [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/474fef30-414f-4269-b37a-5103c84b141f/metadata/ISO-19115:2003.html
    Dataset updated
    Mar 6, 2020
    Time period covered
    2018
    Area covered
    West Bound -109.05017 East Bound -103.00196 North Bound 37.000293 South Bound 31.33217
    Description

    A broad and generalized selection of 2014-2018 US Census Bureau 2018 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
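
The labeling rule for multiyear estimates (name the full collection period, not just one year) can be expressed as a small helper. The function name is hypothetical, and the sketch assumes a period is identified by its final year and its length in years.

```python
def acs_period_label(end_year, span_years=5):
    """Label an ACS estimate with its full collection period.

    A 5-year estimate ending in 2018 covers 60 months of data and is
    labeled "2014-2018" rather than "2018".
    """
    if span_years not in (1, 3, 5):
        raise ValueError("ACS estimates are described as 1-, 3-, or 5-year")
    if span_years == 1:
        return str(end_year)
    start_year = end_year - span_years + 1
    return f"{start_year}-{end_year}"


print(acs_period_label(2018))      # 2014-2018
print(acs_period_label(2018, 1))   # 2018
```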

  7. Example of assigning ethnic class using Ethnicity Estimator.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Aug 9, 2018
    Cite
    Longley, Paul A.; Kandt, Jens (2018). Example of assigning ethnic class using Ethnicity Estimator. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000672037
    Dataset updated
    Aug 9, 2018
    Authors
    Longley, Paul A.; Kandt, Jens
    Description

    Example of assigning ethnic class using Ethnicity Estimator.

  8. COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE

    • catalog.data.gov
    • data.ct.gov
    • +2 more
    Updated Aug 12, 2023
    Cite
    data.ct.gov (2023). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-by-race-ethnicity
    Dataset updated
    Aug 12, 2023
    Dataset provided by
    data.ct.gov
    Description

    Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 . The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 . The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed. COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update.
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates. The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf. Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records.
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
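
The difference between the crude and age-adjusted rates described above can be shown with a minimal sketch of direct standardization. It uses three invented age groups and invented counts, whereas the dataset uses 19 age groups and the 2000 US standard population, so the numbers only illustrate the mechanics.

```python
PER = 100_000
MIN_DEATHS = 20  # description: adjusted rates only for groups with more than 20 deaths


def crude_rate(events, population, per=PER):
    """Total events divided by total population, per 100,000."""
    return sum(events) / sum(population) * per


def age_adjusted_rate(events, population, standard_population, per=PER):
    """Direct standardization: weight each age-specific rate by the
    standard population's share of that age group."""
    if sum(events) <= MIN_DEATHS:
        return None  # too few events for a stable adjusted rate
    total_standard = sum(standard_population)
    weighted = sum(
        (e / p) * (s / total_standard)
        for e, p, s in zip(events, population, standard_population)
    )
    return weighted * per


# Hypothetical group with an older age structure than the standard.
deaths = [1, 10, 100]                  # young, middle, old
population = [10_000, 20_000, 30_000]
standard = [50_000, 30_000, 20_000]    # invented standard population

print(crude_rate(deaths, population))
print(age_adjusted_rate(deaths, population, standard))
```

In this made-up group the adjusted rate comes out below the crude rate because the standard population gives less weight to the oldest age band, the same direction of effect the description reports for the non-Hispanic white population.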

  9. American Community Survey

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Mar 6, 2020
    + more versions
    Cite
    Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/5991c4f8-db89-49d1-a501-1f18e7371e21/metadata/FGDC-STD-001-1998.html
    Available download formats: zip(1), csv(5), xls(5), geojson(5), gml(5), json(5), kml(5), shp(5)
    Dataset updated
    Mar 6, 2020
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    2017
    Area covered
    New Mexico, West Bounding Coordinate -109.05017 East Bounding Coordinate -103.00196 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.33217
    Description

    A broad and generalized selection of 2013-2017 US Census Bureau 2017 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.

  10. American Community Survey

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Mar 6, 2020
    Cite
    Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/e9452dfe-44e1-4435-b2ed-77923feb84a2/metadata/FGDC-STD-001-1998.html
    Available download formats: kml(5), gml(5), zip(1), json(5), csv(5), xls(5), geojson(5), shp(5)
    Dataset updated
    Mar 6, 2020
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    2016
    Area covered
    West Bounding Coordinate -109.05017 East Bounding Coordinate -103.00196 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.33217, New Mexico
    Description

    A broad and generalized selection of 2012-2016 US Census Bureau 2016 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.

  11. autotrain-data-ethnicity-test_v003

    • huggingface.co
    Updated Apr 9, 2023
    Cite
    Chris LeDoux (2023). autotrain-data-ethnicity-test_v003 [Dataset]. https://huggingface.co/datasets/cledoux42/autotrain-data-ethnicity-test_v003
    Dataset updated
    Apr 9, 2023
    Authors
    Chris LeDoux
    Description

    AutoTrain Dataset for project: ethnicity-test_v003

    Dataset Description

    This dataset has been automatically processed by AutoTrain for project ethnicity-test_v003.

    Languages

    The BCP-47 code for the dataset's language is unk.

    Dataset Structure

    Data Instances

    A sample from this dataset looks as follows: [ { "image": "<512x512 RGB PIL image>", "target": 1 }, { "image": "<512x512 RGB PIL image>", "target": 3 }]… See the full description on the dataset page: https://huggingface.co/datasets/cledoux42/autotrain-data-ethnicity-test_v003.

  12. American Community Survey

    • gstore.unm.edu
    csv, geojson, gml +5
    Updated Mar 6, 2020
    Cite
    Earth Data Analysis Center (2020). American Community Survey [Dataset]. https://gstore.unm.edu/apps/rgis/datasets/e0430ebf-d4b7-48c2-8fb8-dbdd0858e807/metadata/FGDC-STD-001-1998.html
    Available download formats: kml(5), shp(5), gml(5), geojson(5), zip(1), xls(5), json(5), csv(5)
    Dataset updated
    Mar 6, 2020
    Dataset provided by
    Earth Data Analysis Center
    Time period covered
    2015
    Area covered
    New Mexico, West Bounding Coordinate -109.05017 East Bounding Coordinate -103.00196 North Bounding Coordinate 37.000293 South Bounding Coordinate 31.33217
    Description

    A broad and generalized selection of race, ethnicity and citizenship estimates from the US Census Bureau's 2011-2015 5-year American Community Survey, obtained via the Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users.

    The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households.

    The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.

    Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website.

    This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.

  13. Loudoun County 2020 Census Population Patterns by Race and Hispanic or...

    • catalog.data.gov
    • data.virginia.gov
    • +2more
    Updated Nov 15, 2025
    Cite
    Loudoun County GIS (2025). Loudoun County 2020 Census Population Patterns by Race and Hispanic or Latino Ethnicity [Dataset]. https://catalog.data.gov/dataset/loudoun-county-2020-census-population-patterns-by-race-and-hispanic-or-latino-ethnicity
    Dataset updated
    Nov 15, 2025
    Dataset provided by
    Loudoun County GIS
    Area covered
    Loudoun County
    Description

    Use this application to view the pattern of concentrations of people by race and Hispanic or Latino ethnicity. Data are provided at the U.S. Census block group level, one of the smallest Census geographies, to provide a detailed picture of these patterns. The data are sourced from the U.S. Census Bureau, 2020 Census Redistricting Data (Public Law 94-171) Summary File.

    Definitions: Definitions of the Census Bureau’s categories are provided below. This interactive map shows patterns for all categories except American Indian or Alaska Native and Native Hawaiian or Other Pacific Islander. The total population countywide for these two categories is small (1,582 and 263 respectively).

    The Census Bureau uses the following race categories:

    Population by Race

    • White: A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.
    • Black or African American: A person having origins in any of the Black racial groups of Africa.
    • American Indian or Alaska Native: A person having origins in any of the original peoples of North and South America (including Central America) and who maintains tribal affiliation or community attachment.
    • Asian: A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.
    • Native Hawaiian or Other Pacific Islander: A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
    • Some Other Race: This category is chosen by people who do not identify with any of the categories listed above.

    People can identify with more than one race. These people are included in the Two or More Races category.

    Hispanic or Latino Population

    The Hispanic/Latino population is an ethnic group. Hispanic/Latino people may be of any race.

    Other layers provided in this tool include the Loudoun County Census block groups, towns and Dulles airport, and the Loudoun County 2021 aerial imagery.

  14. Data from: Exploring intergenerational, intra-generational and transnational...

    • brunel.figshare.com
    bin
    Updated May 31, 2023
    Cite
    Christina Victor; Vanessa Burholt (2023). Exploring intergenerational, intra-generational and transnational patterns of family caring in minority ethnic communities: the example of England and Wales dataset [Dataset]. http://doi.org/10.17633/rd.brunel.7560392.v1
    Available download formats: bin
    Dataset updated
    May 31, 2023
    Dataset provided by
    Brunel University London
    Authors
    Christina Victor; Vanessa Burholt
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Wales, England
    Description

    This data set consists of 17 variables that underpin the analysis in the paper "Exploring intergenerational, intra-generational and transnational patterns of family caring in minority ethnic communities: the example of England and Wales", published in the International Journal of Care and Caring.

    The methodology for the survey is described in the paper.

  15. CalEnviroScreen 4.0 and Race/Ethnicity Analysis

    • data.ca.gov
    • catalog.data.gov
    Updated Oct 10, 2024
    Cite
    California Office of Environmental Health Hazard Assessment (2024). CalEnviroScreen 4.0 and Race/Ethnicity Analysis [Dataset]. https://data.ca.gov/dataset/calenviroscreen-4-0-and-race-ethnicity-analysis
    Available download formats: arcgis geoservices rest api, html
    Dataset updated
    Oct 10, 2024
    Dataset authored and provided by
    California Office of Environmental Health Hazard Assessment (http://www.oehha.ca.gov/)
    Description

    CalEnviroScreen scores represent a combined measure of pollution and the potential vulnerability of a population to the effects of pollution. Like the previous versions, CalEnviroScreen 4.0 does not include indicators of race/ethnicity or age. However, the distribution of the CalEnviroScreen 4.0 cumulative impact scores by race or ethnicity is important. This information can be used to better understand issues related to environmental justice and racial equity in California. CalEPA's racial equity team has released a StoryMap using CalEnviroScreen 3.0 data that examines the connection between racist land use practices of the 1930s and the persistence of environmental injustice. The CalEPA StoryMap, along with this analysis, are examples of information that can be used to better understand issues related to environmental justice and racial equity in California.

  16. Race and Ethnicity - ACS 2016-2020 - Tempe Zip Codes

    • data.tempe.gov
    • data-academy.tempe.gov
    • +8more
    Updated May 2, 2022
    Cite
    City of Tempe (2022). Race and Ethnicity - ACS 2016-2020 - Tempe Zip Codes [Dataset]. https://data.tempe.gov/datasets/tempegov::race-and-ethnicity-acs-2016-2020-tempe-zip-codes
    Dataset updated
    May 2, 2022
    Dataset authored and provided by
    City of Tempe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This layer shows population broken down by race and Hispanic origin. Data is from US Census American Community Survey (ACS) 5-year estimates.

    To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online).

    A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).

    Vintage: 2016-2020
    ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
    Data downloaded from: Census Bureau's API for American Community Survey
    Data Preparation: Data table downloaded and joined with Zip Code boundaries in the City of Tempe.
    Date of Census update: March 17, 2022
    National Figures: data.census.gov

  17. Mean income per capita by race/ethnicity by locality ( Per capita income in...

    • data.virginia.gov
    csv
    Updated Feb 5, 2024
    Cite
    Other (2024). Mean income per capita by race/ethnicity by locality ( Per capita income in the past 12 months in 2019 inflation-adjusted dollars) [Dataset]. https://data.virginia.gov/dataset/mean-income-per-capita-by-race-ethnicity-by-locality-per-capita-income-in-the-past-12-months-in-2019
    Available download formats: csv(11871)
    Dataset updated
    Feb 5, 2024
    Dataset authored and provided by
    Other
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Description

    This table shows the mean income per capita in each locality by the following race/ethnicity identifiers: White alone; Black or African American alone; White alone, not Hispanic or Latino; and Hispanic or Latino.

    The Census API returns no data for the following race/ethnicity identifiers: American Indian and Alaska Native Alone; Asian Alone; Native Hawaiian and Other Pacific Islander Alone; Some Other Race Alone; and Two or More Races.

    Information on this dataset from https://censusreporter.org/topics/income/: Table B19301, "Per Capita Income", is simply the value for B19313, "Aggregate Income", divided by the total population estimate for the summary geography. This statistic is more or less the 'average' income. Note the potential for misunderstanding: A) the aggregate income is divided among all people, not only those who actually had income, and B) as with any average, outliers (very big earners) can have a disproportionate effect on the resulting figure.

    Explanation of value = -666666666: A '-' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
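
The per capita calculation described above (aggregate income divided by the total population estimate) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Census Bureau's or the publisher's code: the function name `per_capita_income` and the sample figures are hypothetical, and the choice to treat -666666666 as "no estimate" follows the explanation given above.

```python
from typing import Optional

# Sentinel that the description above reports for estimates that could not be computed.
MISSING_ESTIMATE = -666666666


def per_capita_income(aggregate_income: Optional[float],
                      total_population: Optional[float]) -> Optional[float]:
    """Return aggregate income / total population, or None when no estimate exists.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    if aggregate_income is None or total_population is None:
        return None
    if MISSING_ESTIMATE in (aggregate_income, total_population):
        return None
    if total_population <= 0:
        # Avoid dividing by zero for geographies with no population estimate.
        return None
    # The aggregate is divided among all people, not only those with income.
    return aggregate_income / total_population


# Hypothetical figures, for illustration only (not taken from the dataset).
print(per_capita_income(50_000_000, 2_000))        # 25000.0
print(per_capita_income(MISSING_ESTIMATE, 2_000))  # None
```

Returning `None` rather than the raw sentinel keeps a large negative number from leaking into any averages computed downstream.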

  18. Race and Ethnicity - ACS 2018-2022 - Tempe Tracts

    • data.tempe.gov
    • performance.tempe.gov
    • +6more
    Updated Jan 12, 2024
    + more versions
    Cite
    City of Tempe (2024). Race and Ethnicity - ACS 2018-2022 - Tempe Tracts [Dataset]. https://data.tempe.gov/datasets/tempegov::race-and-ethnicity-acs-2018-2022-tempe-tracts
    Dataset updated
    Jan 12, 2024
    Dataset authored and provided by
    City of Tempe
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This layer shows population broken down by race and Hispanic origin. Data is from US Census American Community Survey (ACS) 5-year estimates.

    This layer is symbolized to show the percent of population that is Hispanic or Latino. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online). To view only the census tracts that are predominantly in Tempe, add the expression City is Tempe in the map filter settings.

    A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).

    Vintage: 2018-2022
    ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
    Data downloaded from: Census Bureau's API for American Community Survey
    Data Preparation: Data curated from Esri Living Atlas clipped to Census Tract boundaries that are within or adjacent to the City of Tempe boundary
    Date of Census update: December 15, 2023
    National Figures: data.census.gov

  19. Race and Ethnicity - ACS 2017-2021 - Tempe Tracts

    • datasets.ai
    • performance.tempe.gov
    • +8more
    15, 21, 25, 3, 57, 8
    Updated Jan 13, 2023
    + more versions
    Cite
    City of Tempe (2023). Race and Ethnicity - ACS 2017-2021 - Tempe Tracts [Dataset]. https://datasets.ai/datasets/race-and-ethnicity-acs-2017-2021-tempe-tracts-a63d5
    Available download formats: 57, 25, 21, 3, 15, 8
    Dataset updated
    Jan 13, 2023
    Dataset authored and provided by
    City of Tempe
    Area covered
    Tempe
    Description

    This layer shows population broken down by race and Hispanic origin.

    Data is from US Census American Community Survey (ACS) 5-year estimates.

    This layer is symbolized to show the percent of population that is Hispanic or Latino. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online). To view only the census tracts that are predominantly in Tempe, add the expression City is Tempe in the map filter settings.

    A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).

    Vintage: 2017-2021
    ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
    Data Preparation: Data curated from Esri Living Atlas clipped to Census Tract boundaries that are within or adjacent to the City of Tempe boundary
    Date of Census update: December 8, 2022
    National Figures: data.census.gov

    Additional Census data notes and data processing notes are available at the Esri Living Atlas Layer:
    (Esri's Living Atlas always shows latest data)

  20. Datasheet3_Assessing disparities through missing race and ethnicity data:...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    pdf
    Updated Jul 24, 2024
    + more versions
    Cite
    Katelyn M. Banschbach; Jade Singleton; Xing Wang; Sheetal S. Vora; Julia G. Harris; Ashley Lytch; Nancy Pan; Julia Klauss; Danielle Fair; Erin Hammelev; Mileka Gilbert; Connor Kreese; Ashley Machado; Peter Tarczy-Hornoch; Esi M. Morgan (2024). Datasheet3_Assessing disparities through missing race and ethnicity data: results from a juvenile arthritis registry.pdf [Dataset]. http://doi.org/10.3389/fped.2024.1430981.s003
    Available download formats: pdf
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Katelyn M. Banschbach; Jade Singleton; Xing Wang; Sheetal S. Vora; Julia G. Harris; Ashley Lytch; Nancy Pan; Julia Klauss; Danielle Fair; Erin Hammelev; Mileka Gilbert; Connor Kreese; Ashley Machado; Peter Tarczy-Hornoch; Esi M. Morgan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Introduction

    Ensuring high-quality race and ethnicity data within the electronic health record (EHR) and across linked systems, such as patient registries, is necessary to achieving the goal of inclusion of racial and ethnic minorities in scientific research and detecting disparities associated with race and ethnicity. The project goal was to improve race and ethnicity data completion within the Pediatric Rheumatology Care Outcomes Improvement Network and assess impact of improved data completion on conclusions drawn from the registry.

    Methods

    This is a mixed-methods quality improvement study that consisted of five parts, as follows: (1) Identifying baseline missing race and ethnicity data, (2) Surveying current collection and entry, (3) Completing data through audit and feedback cycles, (4) Assessing the impact on outcome measures, and (5) Conducting participant interviews and thematic analysis.

    Results

    Across six participating centers, 29% of the patients were missing data on race and 31% were missing data on ethnicity. Of patients missing data, most patients were missing both race and ethnicity. Rates of missingness varied by data entry method (electronic vs. manual). Recovered data had a higher percentage of patients with Other race or Hispanic/Latino ethnicity compared with patients with non-missing race and ethnicity data at baseline. Black patients had a significantly higher odds ratio of having a clinical juvenile arthritis disease activity score (cJADAS10) of ≥5 at first follow-up compared with White patients. There was no significant change in odds ratio of cJADAS10 ≥5 for race and ethnicity after data completion. Patients missing race and ethnicity were more likely to be missing cJADAS values, which may affect the ability to detect changes in odds ratio of cJADAS ≥5 after completion.

    Conclusions

    About one-third of the patients in a pediatric rheumatology registry were missing race and ethnicity data. After three audit and feedback cycles, centers decreased missing data by 94%, primarily via data recovery from the EHR. In this sample, completion of missing data did not change the findings related to differential outcomes by race. Recovered data were not uniformly distributed compared with those with non-missing race and ethnicity data at baseline, suggesting that differences in outcomes after completing race and ethnicity data may be seen with larger sample sizes.

Cite
Louis Teitelbaum (2023). American Names by Multi-Ethnic/National Origin [Dataset]. https://www.kaggle.com/datasets/louisteitelbaum/american-names-by-multi-ethnic-national-origin

American Names by Multi-Ethnic/National Origin

25,540 Americans, 491 Overlapping Ethnic/National Categories

Available download formats: zip(778154 bytes)
Dataset updated
Aug 22, 2023
Authors
Louis Teitelbaum
License

Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically

Area covered
United States
Description

This dataset includes all personal names listed in the Wikipedia category “American people by ethnic or national origin” and all subcategories fitting the pattern “American People of [ ] descent”, in total more than 25,000 individuals. Each individual is represented by a row, with columns indicating binary membership (0/1) in each ethnic/national category.

Ethnicity inference is an essential tool for identifying disparities in public health and social sciences. Existing datasets linking personal names to ethnic or national origin often neglect to recognize multi-ethnic or multi-national identities. Furthermore, existing datasets use coarse classification schemes (e.g. classifying both Indian and Japanese people as “Asian”) that may not be suitable for many research questions. This dataset remedies these problems by including both very fine-grained ethnic/national categories (e.g. Afghan-Jewish) and broader ones (e.g. European). Users can choose the categories that are relevant to their research. Since many Americans on Wikipedia are associated with multiple overlapping or distinct ethnicities/nationalities, these multi-ethnic associations are also reflected in the data.

Data were obtained from the Wikipedia API and reviewed manually to remove stage names, pen names, mononyms, first initials (when full names are available on Wikipedia), nicknames, honorific titles, and pages that correspond to a group or event rather than an individual.

This dataset was designed for use in training classification algorithms, but may also be independently interesting inasmuch as it is a representative sample of Americans who are famous enough to have their own Wikipedia page, along with detailed information on their ethnic/national origins.

DISCLAIMER: Due to the incomplete nature of Wikipedia, data may not properly reflect all ethnic national associations for any given individual. For example, there is no guarantee that a given Cuban Jewish person will be listed in both the “American People of Cuban descent” and the “American People of Jewish descent” categories.
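
The row-per-person layout with 0/1 category columns described above lends itself to simple multi-label filtering. The sketch below is a minimal illustration in Python using only the standard library. The column names (`name`, `European`, `Cuban`, `Jewish`), the placeholder rows, and the helper `names_in_all` are all assumptions made for this example, so the real file's headers should be checked before adapting it.

```python
import csv
import io

# Hypothetical excerpt: the real column names and rows may differ.
SAMPLE_CSV = """name,European,Cuban,Jewish
Person A,1,0,1
Person B,0,1,1
Person C,1,0,0
"""


def names_in_all(rows, categories):
    """Return the names of rows flagged 1 in every requested category column."""
    return [row["name"] for row in rows
            if all(row[c] == "1" for c in categories)]


# csv.DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

print(names_in_all(rows, ["Jewish"]))           # ['Person A', 'Person B']
print(names_in_all(rows, ["Cuban", "Jewish"]))  # ['Person B']
```

As the disclaimer notes, a 0 in a column may reflect an incomplete Wikipedia category rather than a true absence, so a filter like this can miss people who belong to a category.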
