Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This dataset includes all personal names listed in the Wikipedia category “American people by ethnic or national origin” and all subcategories fitting the pattern “American People of [ ] descent”, in total more than 25,000 individuals. Each individual is represented by a row, with columns indicating binary membership (0/1) in each ethnic/national category.
Ethnicity inference is an essential tool for identifying disparities in public health and social sciences. Existing datasets linking personal names to ethnic or national origin often fail to recognize multi-ethnic or multi-national identities. Furthermore, existing datasets use coarse classification schemes (e.g. classifying both Indian and Japanese people as “Asian”) that may not be suitable for many research questions. This dataset remedies these problems by including both very fine-grained ethnic/national categories (e.g. Afghan-Jewish) and broader ones (e.g. European). Users can choose the categories that are relevant to their research. Since many Americans on Wikipedia are associated with multiple overlapping or distinct ethnicities/nationalities, these multi-ethnic associations are also reflected in the data.
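As an illustration of choosing categories while keeping multi-ethnic associations, the sketch below filters a few hypothetical rows down to selected 0/1 columns. The rows, the column names, and the helper `select_labels` are invented for the example and are not taken from the dataset.

```python
# Hypothetical rows: one per person, with 0/1 membership flags per category.
# Column names are illustrative, not the dataset's actual headers.
rows = [
    {"name": "Person A", "Cuban": 1, "Jewish": 1, "European": 0},
    {"name": "Person B", "Cuban": 0, "Jewish": 1, "European": 1},
    {"name": "Person C", "Cuban": 1, "Jewish": 0, "European": 0},
]


def select_labels(rows, categories):
    """Keep only the chosen category columns, preserving multi-label rows."""
    return [
        {"name": r["name"], "labels": [c for c in categories if r.get(c, 0) == 1]}
        for r in rows
    ]


selected = select_labels(rows, ["Cuban", "Jewish"])
# People associated with more than one of the chosen categories.
multi = [r["name"] for r in selected if len(r["labels"]) > 1]
print(selected)
print(multi)
```

Representing each person as a list of labels rather than a single class keeps the multi-label structure available to a downstream classifier.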
Data were obtained from the Wikipedia API and reviewed manually to remove stage names, pen names, mononyms, first initials (when full names are available on Wikipedia), nicknames, honorific titles, and pages that correspond to a group or event rather than an individual.
This dataset was designed for use in training classification algorithms, but may also be independently interesting inasmuch as it is a representative sample of Americans who are famous enough to have their own Wikipedia page, along with detailed information on their ethnic/national origins.
DISCLAIMER: Due to the incomplete nature of Wikipedia, data may not properly reflect all ethnic/national associations for any given individual. For example, there is no guarantee that a given Cuban Jewish person will be listed in both the “American People of Cuban descent” and the “American People of Jewish descent” categories.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
We provide datasets that estimate the racial distributions associated with first, middle, and last names in the United States. The datasets cover five racial categories: White, Black, Hispanic, Asian, and Other. The provided data are computed from the voter files of six Southern states (Alabama, Florida, Georgia, Louisiana, North Carolina, and South Carolina) that collect race and ethnicity data upon registration. We include seven voter files per state, sourced between 2018 and 2021 from L2, Inc. Together, these states have approximately 36 million individuals who provide self-reported race and ethnicity. The last name dataset includes 338K surnames, the middle name dataset contains 126K middle names, and the first name dataset includes 136K first names. For each type of name, we provide a dataset of P(race | name) probabilities and a dataset of P(name | race) probabilities. We include only names that appear at least 25 times across the 42 (= 7 voter files * 6 states) voter files in our dataset. These data are closely related to the dataset "Name Dictionaries for "wru" R Package", https://doi.org/10.7910/DVN/7TRYAC. These are the probabilities used in the latest iteration of the "WRU" package (Khanna et al., 2022) to make probabilistic predictions about the race of individuals, given their names and geolocations.
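The two probability tables can both be derived from the same counts. The sketch below computes P(race | name) and P(name | race) from a hypothetical table of (name, race) counts and applies a minimum-count filter analogous to the 25-occurrence threshold. The counts, the function name `conditional_tables`, and the choice to filter before computing race totals are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical (name, race) counts; NOT values from the published dictionaries.
counts = {
    ("GARCIA", "Hispanic"): 90,
    ("GARCIA", "White"): 10,
    ("SMITH", "White"): 70,
    ("SMITH", "Black"): 30,
    ("ZYX", "Other"): 3,  # rare name, dropped by the threshold below
}


def conditional_tables(counts, min_count=25):
    """Return (P(race | name), P(name | race)) as nested dicts."""
    name_totals = Counter()
    for (name, _race), n in counts.items():
        name_totals[name] += n

    # Assumption: names below the threshold are removed before normalizing.
    kept = {k: n for k, n in counts.items() if name_totals[k[0]] >= min_count}

    race_totals = Counter()
    for (_name, race), n in kept.items():
        race_totals[race] += n

    p_race_given_name = defaultdict(dict)
    p_name_given_race = defaultdict(dict)
    for (name, race), n in kept.items():
        p_race_given_name[name][race] = n / name_totals[name]
        p_name_given_race[race][name] = n / race_totals[race]
    return p_race_given_name, p_name_given_race


p_rn, p_nr = conditional_tables(counts)
print(dict(p_rn["GARCIA"]))
print(dict(p_nr["White"]))
```

Each row of P(race | name) sums to 1 over races for a fixed name, while each row of P(name | race) sums to 1 over names for a fixed race.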
http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html
The following data set contains information about counties in the United States from 2010 through 2019, obtained from the United States Census Bureau. It covers age distributions, education levels, employment statistics, ethnicity percentages, household information, income, and other miscellaneous statistics. (Values are denoted as -1 if the data are not available.)
| Key | Type | Comment | Example Value |
|---|---|---|---|
| County | String | County name | "Abbeville County" |
| State | String | State name | "SC" |
| Age.Percent 65 and Older | Float | Estimated percentage of the population aged 65 years or older. Estimates are produced for U.S. states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico). | 22.4 |
| Age.Percent Under 18 Years | Float | Estimated percentage of the population under 18 years old. Estimates are produced for U.S. states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico). | 19.8 |
| Age.Percent Under 5 Years | Float | Estimated percentage of the population under 5 years old. Estimates are produced for U.S. states and counties as well as for the Commonwealth of Puerto Rico and its municipios (county-equivalents for Puerto Rico). | 4.7 |
| Education.Bachelor's Degree or Higher | Float | Percentage for the people who attended college but did not receive a degree and people who received an associate's, bachelor's, master's, or professional or doctorate degree. These data include only persons 25 years old and over. The percentages are obtained by dividing the counts of graduates by the total number of persons 25 years old and over. The data were collected from 2015 to 2019. | 15.6 |
| Education.High School or Higher | Float | Percentage of people whose highest degree was a high school diploma or its equivalent, people who attended college but did not receive a degree, and people who received an associate's, bachelor's, master's, or professional or doctorate degree. These data include only persons 25 years old and over. The percentages are obtained by dividing the counts of graduates by the total number of persons 25 years old and over. The data were collected from 2015 to 2019. | 81.7 |
| Employment.Nonemployer Establishments | Integer | An establishment is a single physical location at which business is conducted or where services or industrial operations are performed. It is not necessarily identical with a company or enterprise, which may consist of one establishment or more. The data were collected in 2018. | 1416 |
| Ethnicities.American Indian and Alaska Native Alone | Float | Estimated percentage of population having origins in any of the original peoples of North and South America (including Central America) and who maintain tribal affiliation or community attachment. This category includes people who indicate their race as "American Indian or Alaska Native" or report entries such as Navajo, Blackfeet, Inupiat, Yup'ik, or Central American Indian groups or South American Indian groups. | 0.3 |
| Ethnicities.Asian Alone | Float | Estimated percentage of population having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam. This includes people who reported detailed Asian responses such as "Asian Indian," "Chinese," "Filipino," "Korean," "Japanese," "Vietnamese," and "Other Asian" or provide other detailed Asian responses. | 0.4 |
| Ethnicities.Black Alone | Float | Estimated percentage of population having origins in any of the Black racial groups of Africa. It includes people who indicate their race as "Black or African American," or report entries such as African American, Kenyan, Nigerian, or Haitian. | 27.6 |
| Ethnicities.Hispanic or Latino | Float |
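
Because unavailable values are coded as -1, summary statistics computed directly on the raw columns would be biased downward. The sketch below treats -1 as missing before averaging; the sample values and helper names are hypothetical.

```python
def clean_value(value):
    """Map the -1 'not available' sentinel to None; pass other values through."""
    return None if value == -1 else value


def mean_ignoring_missing(values):
    """Average only the values that are actually present."""
    present = [v for v in map(clean_value, values) if v is not None]
    return sum(present) / len(present) if present else None


# Hypothetical "Age.Percent 65 and Older" values for three counties.
pct_65_plus = [22.4, -1, 18.0]
avg = mean_ignoring_missing(pct_65_plus)
print(avg)
```

This assumes -1 never occurs as a legitimate value, which holds for percentages and counts but should be checked for any column that can be negative.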
These data are part of NACJD's Fast Track Release and are distributed as they were received from the data depositor. The files have been zipped by NACJD for release, but not checked or processed except for the removal of direct identifiers. Users should refer to the accompanying readme file for a brief description of the files available with this collection and consult the investigator(s) if further information is needed. This study examined differences in youth's mental health and substance abuse needs in seven different racial/ethnic groups of justice-involved youth. Using de-identified data from the Survey of Youth in Residential Placement (SYRP), it was assessed whether differences in mental health and substance abuse needs and services existed in a racially/ethnically diverse sample of youth in custody. Data came from a nationally representative sample of 7,073 youth in residential placements across 36 states, representing five program types. An examination of the extent to which there were racial/ethnic disparities in the delivery of services in relation to need was also conducted. This examination included assessing the differences in substance-related problems, availability of substance services, and receipt of substance-specific counseling. One SAS data file (syrp2017.sas7bdat) is included as part of this collection and has 138 variables for 7,073 cases, with demographic variables on youth age, sex, race and ethnicity. Also included as part of the data collection are two SAS Program (syntax) files for use in secondary analysis of youth mental health and substance use.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset was developed by the Research & Analytics Group at the Atlanta Regional Commission using data from the U.S. Census Bureau.

For a deep dive into the data model including every specific metric, see the Infrastructure Manifest. The manifest details ARC-defined naming conventions, field names/descriptions and topics, summary levels, source tables, notes and so forth for all metrics.

Naming conventions:

| Prefix | Meaning |
|---|---|
| None | Count |
| p | Percent |
| r | Rate |
| m | Median |
| a | Mean (average) |
| t | Aggregate (total) |
| ch | Change in absolute terms (value in t2 - value in t1) |
| pch | Percent change ((value in t2 - value in t1) / value in t1) |
| chp | Change in percent (percent in t2 - percent in t1) |
| s | Significance flag for change: 1 = statistically significant with a 90% CI, 0 = not statistically significant, blank = cannot be computed |

| Suffix | Meaning |
|---|---|
| _e19 | Estimate from 2014-19 ACS |
| _m19 | Margin of Error from 2014-19 ACS |
| _00_v19 | Decennial 2000, re-estimated to 2019 geography |
| _00_19 | Change, 2000-19 |
| _e10_v19 | 2006-10 ACS, re-estimated to 2019 geography |
| _m10_v19 | Margin of Error from 2006-10 ACS, re-estimated to 2019 geography |
| _e10_19 | Change, 2010-19 |

The user should note that American Community Survey data represent estimates derived from a surveyed sample of the population, which creates some level of uncertainty, as opposed to an exact measure of the entire population (the full census count is only conducted once every 10 years and does not cover as many detailed characteristics of the population). Therefore, any measure reported by ACS should not be taken as an exact number; this is why a corresponding margin of error (MOE) is also given for ACS measures. The size of the MOE relative to its corresponding estimate value provides an indication of confidence in the accuracy of each estimate.
Each MOE is expressed in the same units as its corresponding measure; for example, if the estimate value is expressed as a number, then its MOE will also be a number; if the estimate value is expressed as a percent, then its MOE will also be a percent. The user should also note that for relatively small geographic areas, such as census tracts shown here, ACS only releases combined 5-year estimates, meaning these estimates represent rolling averages of survey results that were collected over a 5-year span (in this case 2015-2019). Therefore, these data do not represent any one specific point in time or even one specific year. For geographic areas with larger populations, 3-year and 1-year estimates are also available. For further explanation of ACS estimates and margin of error, visit the Census ACS website.

Source: U.S. Census Bureau, Atlanta Regional Commission
Date: 2015-2019
Data License: Creative Commons Attribution 4.0 International (CC by 4.0)
Link to the manifest: https://www.arcgis.com/sharing/rest/content/items/3d489c725bb24f52a987b302147c46ee/data
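
The change prefixes (ch, pch) and the significance flag (s) described in the naming conventions can be sketched as follows. This is a minimal illustration that assumes the two estimates are independent and that each MOE is published at the 90% confidence level (critical value 1.645); the function name `change_metrics` is hypothetical, and the manifest, not this sketch, is the authority on how ARC actually computes the flag.

```python
import math

Z90 = 1.645  # critical value for a 90% confidence level


def change_metrics(e1, m1, e2, m2):
    """Return (ch, pch, s) for estimates e1, e2 with margins of error m1, m2."""
    ch = e2 - e1                      # change in absolute terms
    pch = ch / e1 if e1 else None     # percent change; undefined when e1 is 0
    # Convert each MOE to a standard error, then combine in quadrature
    # (assumes the two estimates are independent).
    se_diff = math.sqrt((m1 / Z90) ** 2 + (m2 / Z90) ** 2)
    # s is None ("blank") when the test cannot be computed.
    s = None if se_diff == 0 else int(abs(ch) / se_diff > Z90)
    return ch, pch, s


print(change_metrics(1000, 100, 1300, 120))
print(change_metrics(1000, 200, 1100, 200))
```

With wide margins of error, as in the second call, even a 10% increase is not distinguishable from sampling noise, which is why the MOE should be read alongside every estimate.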
A broad and generalized selection of 2014-2018 US Census Bureau 2018 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
Example of assigning ethnic class using Ethnicity Estimator.
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information from June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6. The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information from June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22. The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information from June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

This dataset contains COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update.
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates. The population data used to calculate rates are based on the CT DPH population statistics for 2019, which are available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf. Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records.
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
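
The difference between crude and age-adjusted rates described above can be illustrated with direct standardization. The sketch below uses three hypothetical age bands and made-up weights rather than the 19 age groups and the 2000 US standard population used by DPH; all numbers and function names are assumptions for illustration.

```python
def crude_rate(cases, population, per=100_000):
    """Total cases divided by total population, scaled per 100,000."""
    return sum(cases) / sum(population) * per


def age_adjusted_rate(cases, population, std_weights, per=100_000):
    """Direct standardization: weight each age-specific rate by the
    share of that age band in a standard population."""
    total_w = sum(std_weights)
    return sum(
        (c / p) * (w / total_w)
        for c, p, w in zip(cases, population, std_weights)
    ) * per


# Hypothetical group that is older than the standard population.
cases = [10, 20, 90]                    # young, middle, old age bands
population = [50_000, 30_000, 20_000]
std_weights = [0.6, 0.3, 0.1]           # hypothetical standard population shares

crude = crude_rate(cases, population)
adjusted = age_adjusted_rate(cases, population, std_weights)
print(crude, adjusted)
```

Because this hypothetical group has a larger share of older residents than the standard population, its age-adjusted rate comes out below its crude rate, mirroring the pattern described for the non-Hispanic white population.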
A broad and generalized selection of 2013-2017 US Census Bureau 2017 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
A broad and generalized selection of 2012-2016 US Census Bureau 2016 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
AutoTrain Dataset for project: ethnicity-test_v003
Dataset Description
This dataset has been automatically processed by AutoTrain for project ethnicity-test_v003.
Languages
The BCP-47 code for the dataset's language is unk.
Dataset Structure
Data Instances
A sample from this dataset looks as follows: [ { "image": "<512x512 RGB PIL image>", "target": 1 }, { "image": "<512x512 RGB PIL image>", "target": 3 }]… See the full description on the dataset page: https://huggingface.co/datasets/cledoux42/autotrain-data-ethnicity-test_v003.
A broad and generalized selection of 2011-2015 US Census Bureau 2015 5-year American Community Survey race, ethnicity and citizenship data estimates, obtained via Census API and joined to the appropriate geometry (in this case, New Mexico counties). The selection, while not comprehensive, provides a first-level characterization of the race and/or ethnicity of populations in New Mexico, along with citizenship status and nativity. The determination of which estimates to include was based upon level of interest and providing a manageable dataset for users. The U.S. Census Bureau's American Community Survey (ACS) is a nationwide, continuous survey designed to provide communities with reliable and timely demographic, housing, social, and economic data every year. The ACS collects long-form-type information throughout the decade rather than only once every 10 years. As in the decennial census, strict confidentiality laws protect all information that could be used to identify individuals or households. The ACS combines population or other data from multiple years to produce reliable numbers for small counties, neighborhoods, and other local areas. To provide information for communities each year, the ACS provides 1-, 3-, and 5-year estimates. ACS 5-year estimates (multiyear estimates) are “period” estimates that represent data collected over a 60-month period of time (as opposed to “point-in-time” estimates, such as the decennial census, that approximate the characteristics of an area on a specific date). ACS data are released in the year immediately following the year in which they are collected. ACS estimates based on data collected from 2009–2014 should not be called “2009” or “2014” estimates. Multiyear estimates should be labeled to indicate clearly the full period of time. The primary advantage of using multiyear estimates is the increased statistical reliability of the data for less populated areas and small population subgroups.
Data are based on a sample and are subject to sampling variability. The degree of uncertainty for an estimate arising from sampling variability is represented through the use of a margin of error. While each full Data Profile contains margin of error (MOE) information, this dataset does not. Those individuals requiring more complete data are directed to download the more detailed datasets from the ACS American FactFinder website. This dataset is organized by New Mexico county boundaries, based on TIGER/Line Files: shapefiles and related database files (.dbf) that are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database.
Use this application to view the pattern of concentrations of people by race and Hispanic or Latino ethnicity. Data are provided at the U.S. Census block group level, one of the smallest Census geographies, to provide a detailed picture of these patterns. The data are sourced from the U.S. Census Bureau, 2020 Census Redistricting Data (Public Law 94-171) Summary File.

Definitions: Definitions of the Census Bureau’s categories are provided below. This interactive map shows patterns for all categories except American Indian or Alaska Native and Native Hawaiian or Other Pacific Islander. The total population countywide for these two categories is small (1,582 and 263 respectively). The Census Bureau uses the following race categories:

Population by Race
White – A person having origins in any of the original peoples of Europe, the Middle East, or North Africa.
Black or African American – A person having origins in any of the Black racial groups of Africa.
American Indian or Alaska Native – A person having origins in any of the original peoples of North and South America (including Central America) and who maintains tribal affiliation or community attachment.
Asian – A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.
Native Hawaiian or Other Pacific Islander – A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.
Some Other Race – This category is chosen by people who do not identify with any of the categories listed above.
People can identify with more than one race. These people are included in the Two or More Races category.

Hispanic or Latino Population
The Hispanic/Latino population is an ethnic group. Hispanic/Latino people may be of any race.

Other layers provided in this tool include the Loudoun County Census block groups, towns and Dulles airport, and the Loudoun County 2021 aerial imagery.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set consists of 17 variables that underpin the analysis of the paper entitled "Exploring intergenerational, intra-generational and transnational patterns of family caring in minority ethnic communities: the example of England and Wales", published in the International Journal of Care and Caring.
The methodology for the survey is described in the paper.
CalEnviroScreen scores represent a combined measure of pollution and the potential vulnerability of a population to the effects of pollution. Like the previous versions, CalEnviroScreen 4.0 does not include indicators of race/ethnicity or age. However, the distribution of the CalEnviroScreen 4.0 cumulative impact scores by race or ethnicity is important. This information can be used to better understand issues related to environmental justice and racial equity in California. CalEPA's racial equity team has released a StoryMap using CalEnviroScreen 3.0 data that examines the connection between racist land use practices of the 1930s and the persistence of environmental injustice. The CalEPA StoryMap and this analysis are both examples of such information.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This layer shows population broken down by race and Hispanic origin. Data is from US Census American Community Survey (ACS) 5-year estimates. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online). A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).
Vintage: 2016-2020
ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
Data downloaded from: Census Bureau's API for American Community Survey
Data Preparation: Data table downloaded and joined with Zip Code boundaries in the City of Tempe.
Date of Census update: March 17, 2022
National Figures: data.census.gov
Open Data Commons Attribution License (ODC-By) v1.0 https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
This table shows the mean income per capita in each locality by the following race/ethnicity identifiers: White alone; Black or African American alone; White alone, not Hispanic or Latino; and Hispanic or Latino.
The Census API returns no data for the following race/ethnicity identifiers: American Indian and Alaska Native Alone; Asian Alone; Native Hawaiian and Other Pacific Islander Alone; Some Other Race Alone; and Two or More Races.
Information on this dataset from https://censusreporter.org/topics/income/: Table B19301, "Per Capita Income", is simply the value for B19313, "Aggregate Income", divided by the total population estimate for the summary geography. This statistic is more or less the 'average' income. Note the potential for misunderstanding: A) the aggregate income is divided among all people, not only those who actually had income, and B) as with any average, outliers (very big earners) can have a disproportionate effect on the resulting figure.
Explanation of value = -666666666: A '-' entry in the estimate column indicates that either no sample observations or too few sample observations were available to compute an estimate, or a ratio of medians cannot be calculated because one or both of the median estimates falls in the lowest interval or upper interval of an open-ended distribution.
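The per capita calculation and the -666666666 placeholder described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the row layout, and the sample figures are hypothetical and are not taken from the dataset or from the Census API.

```python
# Placeholder value described above: no estimate could be computed.
NO_ESTIMATE = -666666666


def per_capita_income(aggregate_income, total_population):
    """Return aggregate income divided by total population.

    Returns None when either input is missing, equals the placeholder
    value, or when the population is not positive.
    """
    if aggregate_income is None or total_population is None:
        return None
    if NO_ESTIMATE in (aggregate_income, total_population):
        return None
    if total_population <= 0:
        return None
    return aggregate_income / total_population


# Hypothetical rows; the locality names and figures are made up.
rows = [
    {"locality": "A", "aggregate_income": 50_000_000, "population": 2_000},
    {"locality": "B", "aggregate_income": NO_ESTIMATE, "population": 150},
]

results = {
    row["locality"]: per_capita_income(row["aggregate_income"], row["population"])
    for row in rows
}
```

Treating the placeholder as missing before dividing keeps it from being read as a large negative income. Note that the denominator is the whole population, not only people with income, as the description above points out.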
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This layer shows population broken down by race and Hispanic origin. Data is from US Census American Community Survey (ACS) 5-year estimates. This layer is symbolized to show the percent of population that is Hispanic or Latino. To see the full list of attributes available in this service, go to the "Data" tab, and choose "Fields" at the top right (in ArcGIS Online). To view only the census tracts that are predominantly in Tempe, add the expression City is Tempe in the map filter settings. A ‘Null’ entry in the estimate indicates that data for this geographic area cannot be displayed because the number of sample cases is too small (per the U.S. Census).
Vintage: 2018-2022
ACS Table(s): B03002 (Not all lines of this ACS table are available in this feature layer.)
Data downloaded from: Census Bureau's API for American Community Survey
Data Preparation: Data curated from Esri Living Atlas clipped to Census Tract boundaries that are within or adjacent to the City of Tempe boundary
Date of Census update: December 15, 2023
National Figures: data.census.gov
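As a rough illustration of the symbolized value, the percent of population that is Hispanic or Latino could be derived from two B03002 estimates while passing 'Null' entries through as missing. The field names B03002_001E (total population) and B03002_012E (Hispanic or Latino) follow the Census API's variable naming for this table, but the layer's actual field names are an assumption here, and the tract records are made up.

```python
def percent_hispanic(total, hispanic):
    """Return the Hispanic or Latino share of the population as a percent.

    Returns None when either estimate is null or the total is not positive.
    """
    if total is None or hispanic is None or total <= 0:
        return None
    return 100.0 * hispanic / total


# Hypothetical tract records; None stands in for a 'Null' estimate.
tracts = [
    {"tract": "T1", "B03002_001E": 4000, "B03002_012E": 1000},
    {"tract": "T2", "B03002_001E": None, "B03002_012E": None},
]

shares = {
    t["tract"]: percent_hispanic(t["B03002_001E"], t["B03002_012E"])
    for t in tracts
}
```

Keeping null estimates as None, rather than coercing them to zero, avoids reporting a 0% share for areas where the sample was too small to publish.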
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction: Ensuring high-quality race and ethnicity data within the electronic health record (EHR) and across linked systems, such as patient registries, is necessary to achieving the goal of inclusion of racial and ethnic minorities in scientific research and detecting disparities associated with race and ethnicity. The project goal was to improve race and ethnicity data completion within the Pediatric Rheumatology Care Outcomes Improvement Network and assess impact of improved data completion on conclusions drawn from the registry.
Methods: This is a mixed-methods quality improvement study that consisted of five parts, as follows: (1) Identifying baseline missing race and ethnicity data, (2) Surveying current collection and entry, (3) Completing data through audit and feedback cycles, (4) Assessing the impact on outcome measures, and (5) Conducting participant interviews and thematic analysis.
Results: Across six participating centers, 29% of the patients were missing data on race and 31% were missing data on ethnicity. Of patients missing data, most patients were missing both race and ethnicity. Rates of missingness varied by data entry method (electronic vs. manual). Recovered data had a higher percentage of patients with Other race or Hispanic/Latino ethnicity compared with patients with non-missing race and ethnicity data at baseline. Black patients had a significantly higher odds ratio of having a clinical juvenile arthritis disease activity score (cJADAS10) of ≥5 at first follow-up compared with White patients. There was no significant change in odds ratio of cJADAS10 ≥5 for race and ethnicity after data completion. Patients missing race and ethnicity were more likely to be missing cJADAS values, which may affect the ability to detect changes in odds ratio of cJADAS ≥5 after completion.
Conclusions: About one-third of the patients in a pediatric rheumatology registry were missing race and ethnicity data. After three audit and feedback cycles, centers decreased missing data by 94%, primarily via data recovery from the EHR. In this sample, completion of missing data did not change the findings related to differential outcomes by race. Recovered data were not uniformly distributed compared with those with non-missing race and ethnicity data at baseline, suggesting that differences in outcomes after completing race and ethnicity data may be seen with larger sample sizes.