This ethnicity dataset (GREG) is a digital version of the Soviet paper atlas Narodov Mira, created in 1964. In 2010 the GREG (Geo-referencing of Ethnic Groups) project used maps and data drawn from the Narodov Mira atlas to create a GIS (Geographic Information Systems) version of the atlas (2010). ETH Zurich
An ethnographic classification system on human behavior, social life and customs, material culture, and human-ecological environments, first developed by G.P. Murdock in the 1940s (2003). University of California
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Globe by race. It includes the population of Globe across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Globe across relevant racial categories.
Key observations
The percent distribution of Globe population by race (across all racial categories recognized by the U.S. Census Bureau): 58.09% are white, 2.70% are Black or African American, 5.26% are American Indian and Alaska Native, 2.92% are Asian, 0.12% are Native Hawaiian and other Pacific Islander, 11.37% are some other race and 19.54% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
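As a rough illustration of what a margin of error implies, the sketch below turns an estimate and its margin of error into an interval and an approximate standard error. It assumes the ACS convention of publishing margins of error at the 90 percent confidence level (z = 1.645); the function name and the sample figures are hypothetical and are not taken from this dataset.

```python
# Assumption: ACS margins of error are published at the 90% confidence level.
ACS_Z_90 = 1.645


def moe_summary(estimate: float, moe: float) -> dict:
    """Summarize an estimate and its margin of error (illustrative helper)."""
    standard_error = moe / ACS_Z_90
    return {
        # A population count cannot be negative, so the lower bound is clamped.
        "lower": max(0.0, estimate - moe),
        "upper": estimate + moe,
        "standard_error": standard_error,
        # Coefficient of variation in percent; undefined for a zero estimate.
        "cv_percent": 100.0 * standard_error / estimate if estimate else None,
    }


# Hypothetical figures: an estimate of 1,000 people with a margin of error of 164.5.
summary = moe_summary(1000, 164.5)
print(summary)
```

A large coefficient of variation is one signal that an estimate for a small place or a small racial category should be presented with extra caution.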
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Globe Population by Race & Ethnicity.
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY This dataset includes San Francisco COVID-19 tests by race/ethnicity and by date. This dataset represents the daily count of tests collected, and the breakdown of test results (positive, negative, or indeterminate). Tests in this dataset include all those collected from persons who listed San Francisco as their home address at the time of testing. It also includes tests that were collected by San Francisco providers for persons who were missing a locating address. This dataset does not include tests for residents listing a locating address outside of San Francisco, even if they were tested in San Francisco.
The data were de-duplicated by individual and date, so if a person gets tested multiple times on different dates, all tests will be included in this dataset (on the day each test was collected). If a person tested multiple times on the same date, only one test is included from that date. When there are multiple tests on the same date, a positive result, if one exists, will always be selected as the record for the person. If a PCR and antigen test are taken on the same day, the PCR test will supersede. If a person tests multiple times on the same day and the results are all the same (e.g. all negative or all positive) then the first test done is selected as the record for the person.
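The selection rules above can be sketched as a ranking over same-day tests. This is a minimal illustration, not the health department's pipeline: the field names (`person_id`, `date`, `time`, `test_type`, `result`) are hypothetical, and the order in which the rules are applied (positive result first, then PCR over antigen, then earliest test) is an assumption, since the description does not say which rule wins when they conflict.

```python
def deduplicate_tests(tests):
    """Keep one test per (person, collection date).

    Lower rank wins: a positive result beats a non-positive one, a PCR test
    beats an antigen test, and an earlier test beats a later one.
    """
    best = {}
    for test in tests:
        key = (test["person_id"], test["date"])
        rank = (
            test["result"] != "positive",  # False sorts before True
            test["test_type"] != "PCR",
            test["time"],                  # "HH:MM" strings sort chronologically
        )
        if key not in best or rank < best[key][0]:
            best[key] = (rank, test)
    return [test for _, test in best.values()]


# Hypothetical records for two people.
sample = [
    {"person_id": "A", "date": "2020-07-01", "time": "09:00", "test_type": "antigen", "result": "negative"},
    {"person_id": "A", "date": "2020-07-01", "time": "11:00", "test_type": "PCR", "result": "positive"},
    {"person_id": "A", "date": "2020-07-02", "time": "10:00", "test_type": "PCR", "result": "negative"},
    {"person_id": "B", "date": "2020-07-01", "time": "08:00", "test_type": "PCR", "result": "negative"},
    {"person_id": "B", "date": "2020-07-01", "time": "10:30", "test_type": "PCR", "result": "negative"},
]
kept = deduplicate_tests(sample)
for test in kept:
    print(test["person_id"], test["date"], test["test_type"], test["result"])
```

Person A keeps one record per date, and person B's two identical same-day results collapse to the first test taken.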
The total number of positive test results is not equal to the total number of COVID-19 cases in San Francisco.
When a person gets tested for COVID-19, they may be asked to report information about themselves. One piece of information that might be requested is a person's race and ethnicity. These data are often incomplete in the laboratory and provider reports of the test results sent to the health department. The data can be missing or incomplete for several possible reasons:
• The person was not asked about their race and ethnicity.
• The person was asked, but refused to answer.
• The person answered, but the testing provider did not include the person's answers in the reports.
• The testing provider reported the person's answers in a format that could not be used by the health department.
For any of these reasons, a person's race/ethnicity will be recorded in the dataset as “Unknown.”
B. NOTE ON RACE/ETHNICITY The different values for Race/Ethnicity in this dataset are "Asian," "Black or African American," "Hispanic or Latino/a, all races," "American Indian or Alaska Native," "Native Hawaiian or Other Pacific Islander," "White," "Multi-racial," "Other," and "Unknown."
The Race/Ethnicity categorization increases data clarity by emulating the methodology used by the U.S. Census in the American Community Survey. Specifically, persons who identify as "Asian," "Black or African American," "American Indian or Alaska Native," "Native Hawaiian or Other Pacific Islander," "White," "Multi-racial," or "Other" do NOT include any person who identified as Hispanic/Latino at any time in their testing reports that identified them either (1) as SF residents or (2) as someone who tested without a locating address by an SF provider. All persons across all races who identify as Hispanic/Latino are recorded as "Hispanic or Latino/a, all races." This categorization increases data accuracy by correcting the way “Other” persons were counted. Previously, when a person reported “Other” for Race/Ethnicity, they would be recorded as “Unknown.” Under the new categorization, they are counted as “Other” and are distinct from “Unknown.”
If a person records their race/ethnicity as “Asian,” “Black or African American,” “American Indian or Alaska Native,” “Native Hawaiian or Other Pacific Islander,” “White,” or “Other” for their first COVID-19 test, then this data will not change, even if a different race/ethnicity is reported for this person for any future COVID-19 test. There are two exceptions to this rule. The first exception is if a person’s race/ethnicity value is reported as “Unknown” on their first test and then on a subsequent test they report “Asian,” “Black or African American,” “Hispanic or Latino/a, all races,” “American Indian or Alaska Native,” “Native Hawaiian or Other Pacific Islander,” or “White”; in that case the subsequent reported race/ethnicity will overwrite the previous recording of “Unknown.” If a person has only ever selected “Unknown” as their race/ethnicity, then it will be recorded as “Unknown.” This change provides more specific and actionable data on who is tested in San Francisco.
The second exception is if a person ever marks “Hispanic or Latino/a, all races” for race/ethnicity then this choice will always overwrite any previous or future response. This is because it is an overarching category that can include any and all other races and is mutually exclusive with the other responses.
A person's race/ethnicity will be recorded as “Multi-racial” if they select two or more values among the following choices: “Asian,” “Black or African American,” “American Indian or Alaska Native,” “Native Hawaiian or Other Pacific Islander,” “White,” or “Other.” If a person selects a combination of two or more race/ethnicity answers that includes “Hispanic or Latino/a, all races” then they will still be recorded as “Hispanic or Latino/a, all races”—not as “Multi-racial.”
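The recording rules in this note can be sketched as two small functions: one that collapses the selections on a single test report, and one that resolves a person's chronological series of reports. This is a simplified illustration under stated assumptions, not the department's actual logic. The function names and the input shape (a list of reports, each a list of selected values) are hypothetical, and the sketch lets any non-"Unknown" later value replace an initial "Unknown", whereas the text lists only specific categories for that overwrite.

```python
from typing import Iterable, List

HISPANIC = "Hispanic or Latino/a, all races"
MULTI_RACIAL = "Multi-racial"
UNKNOWN = "Unknown"


def categorize_report(selections: Iterable[str]) -> str:
    """Collapse the values selected on one test report into one category."""
    chosen = {value for value in selections if value != UNKNOWN}
    if HISPANIC in chosen:
        return HISPANIC          # overrides any combination it appears in
    if len(chosen) >= 2:
        return MULTI_RACIAL      # two or more non-Hispanic selections
    if len(chosen) == 1:
        return next(iter(chosen))
    return UNKNOWN


def resolve_person(reports: List[Iterable[str]]) -> str:
    """Resolve one recorded category from reports in chronological order."""
    categories = [categorize_report(report) for report in reports]
    if HISPANIC in categories:
        return HISPANIC          # overwrites previous and future responses
    for category in categories:
        if category != UNKNOWN:
            return category      # first informative value sticks
    return UNKNOWN


print(resolve_person([["Unknown"], ["Asian"], ["White"]]))
print(resolve_person([["White"], [HISPANIC]]))
print(resolve_person([["Asian", "White"]]))
```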
C. HOW THE DATASET IS CREATED COVID-19 laboratory test data is based on electronic laboratory test reports. Deduplication, quality assurance measures and other data verification processes maximize accuracy of laboratory test information.
D. UPDATE PROCESS Updates automatically at 5:00AM Pacific Time each day. Redundant runs are scheduled at 7:00AM and 9:00AM in case of pipeline failure.
E. HOW TO USE THIS DATASET San Francisco population estimates for race/ethnicity can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
Due to the high degree of variation in the time needed to complete tests by different labs there is a delay in this reporting. On March 24, 2020 the Health Officer ordered all labs in the City to report complete COVID-19 testing information to the local and state health departments.
In order to track trends over time, a user can analyze this data by sorting or filtering by the "specimen_collection_date" field.
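A minimal sketch of that kind of filtering and sorting is shown below. Only the `specimen_collection_date` field name comes from the description; the other field names, the sample rows, and the assumption that dates arrive as ISO-formatted strings (optionally followed by a time component) are hypothetical.

```python
from datetime import date


def filter_by_collection_date(rows, start, end):
    """Return rows collected between start and end (inclusive), oldest first."""
    selected = [
        row for row in rows
        # Keep only the YYYY-MM-DD part in case a timestamp is attached.
        if start <= date.fromisoformat(row["specimen_collection_date"][:10]) <= end
    ]
    return sorted(selected, key=lambda row: row["specimen_collection_date"])


# Hypothetical rows; "tests" is an illustrative column name.
rows = [
    {"specimen_collection_date": "2020-07-03T00:00:00", "tests": 120},
    {"specimen_collection_date": "2020-06-30T00:00:00", "tests": 95},
    {"specimen_collection_date": "2020-07-01T00:00:00", "tests": 110},
]
july = filter_by_collection_date(rows, date(2020, 7, 1), date(2020, 7, 31))
print([row["specimen_collection_date"][:10] for row in july])
```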
Calculating Percent Positivity: The positivity rate is the percentage of tests that return a positive result for COVID-19 (positive tests divided by the sum of positive and negative tests). Indeterminate results, which could not conclusively determine whether COVID-19 virus was present, are not included in the calculation of percent positive. When there are fewer than 20 positive tests for a given race/ethnicity and time period, the positivity rate is not calculated for the public tracker because rates of small test counts are less reliable.
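The positivity calculation can be sketched as follows. The function and parameter names are hypothetical, and returning `None` to represent a suppressed rate is an illustrative choice; the threshold of 20 positive tests comes from the description above.

```python
from typing import Optional


def percent_positivity(positive: int, negative: int, min_positives: int = 20) -> Optional[float]:
    """Positive tests as a percentage of positive plus negative tests.

    Indeterminate results are excluded simply by not being passed in.
    Returns None when the rate would be suppressed for small counts.
    """
    if positive < min_positives:
        return None
    total = positive + negative
    if total == 0:
        return None  # only reachable if min_positives is set to 0
    return 100.0 * positive / total


print(percent_positivity(50, 950))   # 50 positives out of 1,000 conclusive tests
print(percent_positivity(10, 90))    # fewer than 20 positives: suppressed
```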
Calculating Testing Rates: To calculate the testing rate per 10,000 residents, divide the total number of tests collected (positive, negative, and indeterminate results) for the specified race/ethnicity by the total number of residents who identify as that race/ethnicity (according to the 2016-2020 American Community Survey (ACS) population estimate), then multiply by 10,000. When there are fewer than 20 total tests for a given race/ethnicity and time period, the testing rate is not calculated for the public tracker because rates of small test counts are less reliable.
Read more about how this data is updated and validated daily: https://sf.gov/information/covid-19-data-questions
F. CHANGE LOG
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Earth by race. It includes the population of Earth across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Earth across relevant racial categories.
Key observations
The percent distribution of Earth population by race (across all racial categories recognized by the U.S. Census Bureau): 60.83% are white, 3.52% are Black or African American, 4.59% are American Indian and Alaska Native, 2.77% are some other race and 28.28% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Earth Population by Race & Ethnicity.
A computerized data set of demographic, economic and social data for 227 countries of the world. Information presented includes population, health, nutrition, mortality, fertility, family planning and contraceptive use, literacy, housing, and economic activity data. Tabular data are broken down by such variables as age, sex, and urban/rural residence. Data are organized as a series of statistical tables identified by country and table number. Each record consists of the data values associated with a single row of a given table. There are 105 tables with data for 208 countries. The second file is a note file, containing text of notes associated with various tables. These notes provide information such as definitions of categories (i.e. urban/rural) and how various values were calculated.
The IDB was created in the U.S. Census Bureau's International Programs Center (IPC) to help IPC staff meet the needs of organizations that sponsor IPC research. The IDB provides quick access to specialized information, with emphasis on demographic measures, for individual countries or groups of countries. The IDB combines data from country sources (typically censuses and surveys) with IPC estimates and projections to provide information dating back as far as 1950 and as far ahead as 2050. Because the IDB is maintained as a research tool for IPC sponsor requirements, the amount of information available may vary by country. As funding and research activity permit, the IPC updates and expands the data base content.
Types of data include:
* Population by age and sex
* Vital rates, infant mortality, and life tables
* Fertility and child survivorship
* Migration
* Marital status
* Family planning
Data characteristics:
* Temporal: Selected years, 1950-present, projected demographic data to 2050.
* Spatial: 227 countries and areas.
* Resolution: National population, selected data by urban/rural residence, selected data by age and sex.
Sources of data include:
* U.S. Census Bureau
* International projects (e.g., the Demographic and Health Survey)
* United Nations agencies
Links:
* ICPSR: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/08490
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Black Earth population by race and ethnicity. The dataset can be utilized to understand the racial distribution of Black Earth.
The dataset will have the following datasets when applicable
Please note that when either the Hispanic or the Non-Hispanic population does not exist, the respective dataset will not be available (as there is no applicable population subset).
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
https://creativecommons.org/publicdomain/zero/1.0/
By Jonathan Ortiz [source]
This College Completion dataset provides insight into the success and progress of college students in the United States. It contains graduation rates, race and other data to offer a comprehensive view of college completion in America. The data is drawn from two primary sources: the National Center for Education Statistics (NCES) Integrated Postsecondary Education Data System (IPEDS) and the Voluntary System of Accountability's Student Success and Progress rate.
The graduation figures come from IPEDS for first-time, full-time degree-seeking students at the undergraduate level who entered college six years earlier at four-year institutions or three years earlier at two-year institutions. Furthermore, colleges report how many students completed their program within 100 percent and 150 percent of normal time, which at four-year institutions corresponds to graduation within four years or six years, respectively. Students reported as being of two or more races are included in totals but not shown separately.
For race and ethnicity data, NCES has classified student demographics since 2009 into seven categories: White non-Hispanic; Black non-Hispanic; American Indian/Alaskan Native; Asian/Pacific Islander; unknown race or ethnicity; and nonresident, with two new categories: Native Hawaiian or Other Pacific Islander (combined with Asian) and students belonging to several races. It is also worth noting that different classifications for graduation data stemming from 2008 could be due to variations in the time frame examined and the groupings used by particular colleges; those who cannot be identified from National Student Clearinghouse records are not subject to penalty by these institutions.
For efficiency measures, parameters such as “Awards per 100 Full-Time Undergraduate Students,” which includes all undergraduate completions reported by a particular institution (including associate degrees and certificates from programs of less than four years), are useful. The data also covers measures such as expenditure categories, Pell grant percentage, endowment values, average student aid amounts, and full-time faculty members contributing towards instruction, research, and public service.
To help quantify outcomes, the Median Estimated SAT score metric is derived on either a 25th-percentile or a 75th-percentile basis, further qualified by a 90% threshold applied when incoming students are considered for relevance. Last but not least, Average Student Aid takes the amount granted by the institution and divides it over the total sum received against what was allotted that particular year.
All this analysis gives an opportunity to get a holistic overview of performance, potential deficits &
This dataset contains data on student success, graduation rates, race and gender demographics, an efficiency measure to compare colleges across states and more. It is a great source of information to help you better understand college completion and student success in the United States.
In this guide we’ll explain how to use the data so that you can find out the best colleges for students with certain characteristics or focus on your target completion rate. We’ll also provide some useful tips for getting the most out of this dataset when seeking guidance on which institutions offer the highest graduation rates or have a good reputation for success in terms of completing programs within normal timeframes.
Before getting into specifics about interpreting this dataset, it is important to understand that each row represents information about a particular institution, such as its state affiliation, level (two-year vs four-year), control (public vs private), name and website. The columns contain information such as the rate of awarding degrees compared to other institutions in its sector; race/ethnicity makeup; full-time faculty percentage; median SAT score among first-time students; awards/grants comparison versus the national or state average (as applicable depending on institution location); and more.
When using this dataset, our suggestion is that you begin by forming a hypothesis or research question concerning student completion at a given school based upon observable characteristics like financ...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Black Earth town by race. It includes the population of Black Earth town across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Black Earth town across relevant racial categories.
Key observations
The percent distribution of Black Earth town population by race (across all racial categories recognized by the U.S. Census Bureau): 95.17% are white, 2.42% are Asian and 2.42% are multiracial.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Black Earth town Population by Race & Ethnicity.
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.2/customlicense?persistentId=doi:10.7910/DVN/X88LYH
Although many countries have ethnic kin on the “wrong side” of their borders, few seek to annex foreign territories on the basis of ethnicity. This article examines why some states pursue irredentism, whereas others exhibit restraint. It focuses on the triadic structure of the kin group in the irredentist state, its co-ethnic enclave and the host state, and provides new data on all actual and potential irredentist cases from 1946 - 2014. The results indicate that irredentism is more likely when the kin group is near economic parity with other groups in its own state, which results in status inconsistency and engenders grievances. It is also more likely in more ethnically homogeneous countries with winner-take-all majoritarian systems where the kin group does not need to moderate its policy to win elections by attracting other groups. These conditions generate both the grievance and opportunity for kin groups to pursue irredentism.
Why do countries welcome some refugees and treat others poorly? Existing explanations suggest that the assistance refugees receive is a reflection of countries’ wealth or compassion. However, statistical analysis of a global dataset on asylum admissions shows that states’ approaches to refugees are shaped by foreign policy and ethnic politics. States admit refugees from adversaries in order to weaken those regimes, but they are reluctant to accept refugees from friendly states. At the same time, policymakers favour refugee groups who share their ethnic identity. Aside from addressing a puzzling real-world phenomenon, this article adds insights to the literature on the politics of migration and asylum.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about artists. It has 1 row and is filtered to the artwork "All Races are Here, All the Lands of the Earth Make Contributions Here" - Walt Whitman. It features 9 columns, including birth date, death date, country, and gender.
The 10,000 Worlds Employee Dataset is a comprehensive dataset designed for analyzing workforce trends, employee performance, and organizational dynamics within a large-scale company setting. This dataset contains information on 10,000 employees, spanning various departments, roles, and experience levels. It is ideal for research in human resource analytics, machine learning applications in employee retention, performance prediction, and diversity analysis.
Key Features of the Dataset:
Employee Demographics:
- Age, gender, ethnicity
- Education level, degree specialization
- Years of experience
Employment Details:
- Department (e.g., HR, Engineering, Marketing)
- Job title and seniority level
- Employment type (full-time, part-time, contract)
Performance & Productivity Metrics:
- Annual performance ratings
- Work hours, overtime details
- Training programs attended
Compensation & Benefits:
- Salary, bonuses, stock options
- Benefits (healthcare, pension plans, remote work options)
Employee Engagement & Retention:
- Job satisfaction scores
- Attrition and turnover rates
- Promotion history and career growth
Workplace Environment Factors:
- Team collaboration metrics
- Employee feedback and survey results
- Work-life balance indicators
Use Cases:
- HR Analytics: Identifying patterns in employee satisfaction, retention, and performance.
- Predictive Modeling: Forecasting attrition risks and promotion likelihoods.
- Diversity & Inclusion Analysis: Understanding representation across departments.
- Compensation Benchmarking: Comparing salaries and benefits within and across industries.
This dataset is highly valuable for data scientists, HR professionals, and business analysts looking to gain insights into workforce dynamics and improve organizational strategies.
https://creativecommons.org/publicdomain/zero/1.0/
According to Wikipedia, an ultramarathon, also called ultra distance or ultra running, is any footrace longer than the traditional marathon length of 42.195 kilometres (26 mi 385 yd). Various distances are raced competitively, from the shortest common ultramarathon of 31 miles (50 km) to over 200 miles (320 km). 50k and 100k are both World Athletics record distances, but some 100 miles (160 km) races are among the oldest and most prestigious events, especially in North America.
The data in this file is a large collection of ultra-marathon race records registered between 1798 and 2022 (a period of well over two centuries), making it a formidable long-term sample. All data was obtained from public websites.
Although the original data is in the public domain, the race records, which originally contained the athletes' names, have been anonymized to comply with data protection laws and to preserve the athletes' privacy. However, a column Athlete ID has been created with a numerical ID representing each unique runner (so if Antonio Fernández participated in 5 races over different years, then the corresponding race records now hold his unique Athlete ID instead of his name). This way I have preserved valuable information.
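The kind of anonymization described here can be sketched as a mapping from names to sequential numeric IDs. This is an illustration only, not the author's actual procedure: the `Athlete name` field, the sample records, and the assumption that a name alone identifies a unique runner are all hypothetical.

```python
def assign_athlete_ids(records, name_field="Athlete name"):
    """Replace the name in each record with a stable numeric Athlete ID."""
    ids = {}
    anonymized = []
    for record in records:
        # The same name always maps to the same ID; new names get the next number.
        athlete_id = ids.setdefault(record[name_field], len(ids))
        clean = {key: value for key, value in record.items() if key != name_field}
        clean["Athlete ID"] = athlete_id
        anonymized.append(clean)
    return anonymized


# Hypothetical race records.
races = [
    {"Athlete name": "Antonio Fernández", "Year of event": 2015},
    {"Athlete name": "Another Runner", "Year of event": 2015},
    {"Athlete name": "Antonio Fernández", "Year of event": 2018},
]
for row in assign_athlete_ids(races):
    print(row)
```

Because the ID is stable across records, a runner's results can still be followed over the years without exposing the name.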
The dataset contains 7,461,226 ultra-marathon race records from 1,641,168 unique athletes.
The following columns (with data types) are included:
The Event name column includes country location information that can be extracted into a new column; similarly, seasonal information can be found in the Event dates column beyond the Year of event (these can be extracted with a bit of processing).
The Event distance/length column describes the type of race, covering the most popular UM race distances and lengths, and some other specific modalities (multi-day, etc.):
Additionally, there is information on age, gender and speed (in km/h) in other columns.
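The country and season extraction mentioned above might look like the sketch below. The description does not specify the column formats, so this assumes that event names end with a three-letter country code in parentheses and that event dates end in `MM.YYYY` (as in `DD.MM.YYYY`); both formats, the sample strings, and the Northern Hemisphere season mapping are assumptions to be checked against the actual file.

```python
import re
from typing import Optional

# Assumed formats: "Some Race Name (XYZ)" and dates ending in "MM.YYYY".
COUNTRY_RE = re.compile(r"\(([A-Z]{3})\)\s*$")
MONTH_YEAR_RE = re.compile(r"(\d{2})\.(\d{4})\s*$")

# Meteorological seasons for the Northern Hemisphere (an assumption).
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def extract_country(event_name: str) -> Optional[str]:
    """Return the trailing country code, or None if the pattern is absent."""
    match = COUNTRY_RE.search(event_name)
    return match.group(1) if match else None


def extract_season(event_dates: str) -> Optional[str]:
    """Return the season of the month in which the event ended, if parseable."""
    match = MONTH_YEAR_RE.search(event_dates)
    if not match:
        return None
    return SEASONS.get(int(match.group(1)))


print(extract_country("Example Trail Ultra (CHI)"))
print(extract_season("06.01.2018"))
print(extract_season("30.06.-01.07.2018"))
```

For multi-day events the sketch uses the month of the final day; events in the Southern Hemisphere would need the mapping flipped once the country is known.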
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘🗳 VEP Turnout’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/vep-turnoute on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Files:
National level
- U.S. VEP Turnout 1789-Present-Statistics - The complete time series of national presidential and midterm general election turnout rates from 1787-present.
National and state level
- 1980-2014 November General Election - Turnout Rates
- 2016 November General Election - Turnout Rates
- 2018 November General Election - Turnout Rates
- 2020 November General Election - Turnout Rates
Turnout rates by demographic breakdown, 1986-2018, from the Census Bureau's Current Population Survey, November Voting and Registration Supplement (or CPS for short). These tables are corrected for vote overreporting bias. For uncorrected weights see the source link.
- Turnout Rate 1986-2018 by Age
- Turnout Rate 1986-2018 by Education
- Turnout Rate 1986-2018 by Race and Ethnicity
For more information on these files see the source link below.
Source: Data prepared and maintained by Dr. Michael P. McDonald at the University of Florida, at electproject.org
Updated: synced from source weekly
License: CC-BY
This dataset was created by Government and contains around 100 samples along with Unnamed: 7, Denominators, technical information and other features such as: - Unnamed: 4 - Unnamed: 5 - and more.
- Analyze Unnamed: 16 in relation to Unnamed: 14
- Study the influence of Unnamed: 12 on Unnamed: 9
If you use this dataset in your research, please credit Government
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘😷 NYC Leading Causes of Death’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/nyc-leading-causes-of-deathe on 13 February 2022.
--- Dataset description provided by original source is as follows ---
NYC Leading Causes of Death Data
Rows: 3840; Columns: 6
The data includes items, such as:
- Year
- Ethnicity
- Sex
- Cause of Death
- Count
- Percent
Source: NYC Open Data
https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dtam
This dataset was created by Data Society and contains around 4000 samples along with Ethnicity, Sex, technical information and other features such as: - Percent - Count - and more.
- Analyze Cause Of Death in relation to Year
- Study the influence of Ethnicity on Sex
If you use this dataset in your research, please credit Data Society
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of White Earth by race. It includes the population of White Earth across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of White Earth across relevant racial categories.
Key observations
The percent distribution of White Earth population by race (across all racial categories recognized by the U.S. Census Bureau): 100% are white.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2018-2022 5-Year Estimates.
Racial categories include:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for White Earth Population by Race & Ethnicity.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The MELD dataset (Multilingual Ethnic Language Dataset) is designed to address the severe under-representation of ethnic languages in computational linguistics and natural language processing (NLP). It includes transliterated text samples from Chakma, Garo, and Marma, alongside Standard Bengali and English, collected to reflect real-world use. This dataset provides valuable linguistic insights into low-resource and endangered languages written in the Bengali script.
The data was gathered through a rigorous process of interviews with native speakers, written contributions, and manual transliteration into Bengali alphabets. With 3046 annotated sentences, it highlights the unique linguistic patterns of ethnic communities who use Bengali script to write their native languages, especially on social media. The dataset is suitable for tasks like language identification, machine translation, and sentiment analysis.
By enabling NLP researchers and linguists to develop tools for language processing, the dataset aims to foster inclusive technology development while promoting cultural preservation. Its applications include building language identification models, creating translation systems, and supporting the study of linguistic diversity. Researchers are encouraged to use MELD for advancing computational research in low-resource and ethnic languages.
Words: 4529 Sentence: 808
Words: 1680 Sentence: 314
Words: 1244 Sentence: 292
Words: 4380 Sentence: 816
Words: 4883 Sentence: 816
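As a quick consistency check, the five sentence counts listed above can be summed and compared with the 3046 annotated sentences stated in the description. The sketch below keeps the five subsets unlabeled, because the listing does not say which counts belong to which language; the variable names are illustrative only.

```python
# (words, sentences) pairs exactly as listed above. The language labels are
# not given in the listing, so the subsets are left unlabeled here.
subsets = [(4529, 808), (1680, 314), (1244, 292), (4380, 816), (4883, 816)]

total_words = sum(words for words, _ in subsets)
total_sentences = sum(sentences for _, sentences in subsets)

# Average sentence length across the whole collection, in words.
avg_words_per_sentence = total_words / total_sentences

print(total_words, total_sentences, round(avg_words_per_sentence, 2))
```

The sentence total agrees with the 3046 figure quoted in the description.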
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘2001- 2013 Graduation Outcomes District- ALL STUDENTS,SWD,GENDER,ELL,ETHNICITY,EVER ELL’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/221cfcfe-831f-419a-8a60-2f8514460a0a on 26 January 2022.
--- Dataset description provided by original source is as follows ---
The New York State calculation method was first adopted for the Cohort of 2001 (Class of 2005). The cohort consists of all students who first entered 9th grade in a given school year (e.g., the Cohort of 2006 entered 9th grade in the 2006-2007 school year). Graduates are defined as those students earning either a Local or Regents diploma.
--- Original source retains full ownership of the source dataset ---
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
By Health [source]
This dataset contains data on the modes of transportation used by California residents aged 16 and older to commute to work. It includes data from the U.S. Census Bureau, Decennial Census and American Community Survey, covering all regions, counties, cities/towns, and census tracts in California. With each region showing detailed information regarding how its population travels to work (modes of transportation used), this dataset provides vital insight into the development of transport infrastructure in California over the past decade.
Unlike other states, where private cars account for an overwhelming majority of daily commutes (over 79% nationwide according to a 2015 survey), Californians have developed varied commuting habits: bicycling is reported at 5%, public transit at 15%, walking alone at 4%, and carpooling at 11%. Commuting plays a significant role in overall health. Active modes such as biking or walking support healthier lifestyles that lower heart disease risk, obesity rates, and diabetes prevalence, and passengers on public transport have a lower chance of injury in collisions than pedestrians or cyclists.
The consequences of inadequate planning for human mobility extend beyond physical health. It can also cause large disparities between racial groups: Native Americans experience a death rate from pedestrian-car collisions four times higher than that of Whites or Asians, and African-Americans and Latinos suffer twice as much as White people when driving their own cars, owing to air pollution hazards or a lack of access to a reliable public transportation system that could provide healthier alternatives. We hope that policymakers will use this dataset, provided by the Healthy Communities Data & Indicators Project (part of the Office of Health Equity), to help ensure every resident's right to safe mobility, whatever their background.
This dataset contains information on the percent of Californians aged 16 and older who use different modes of transportation to get to work. The data is collected from the U.S. Census Bureau and American Community Survey, and covers all counties, cities/towns and census tracts in California.
In this dataset, there are several columns of data such as mode (mode of transport), race_eth_name (name of the race/ethnicity), region_code (code for the region) and pop_total (total population). This makes it possible to look at relations between transportation choice and demographic factors like gender or ethnicity, or comparison between regions within California regarding commuting habits.
The purpose of this dataset is to provide information on how Californians travel to their jobs with respect to both geographical area as well as demographic characteristics. It allows studies into why certain areas might have higher usage rates for specific types of transport compared with others, how gender affects travel decisions, or which regions have access issues with public transit compared with driving for example.
To use this dataset, start by familiarizing yourself with descriptive statistics such as percentages and hazard ratios, so that you can understand each variable's contribution to commuting trends. It may also help to filter the data by geographic area or personal characteristics before performing more detailed analysis. The results can then inform policy-making and the planning of infrastructure investments in transportation options, over time or among different populations within California, using the year-by-year figures provided here across a decade.
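The filter-then-summarize workflow described above can be sketched as follows. The column names mode, race_eth_name, region_code, and pop_total come from the description; the pop_mode column (the number of workers using a given mode), the mode labels, and every value in the rows are assumptions made up for illustration, not data from the actual file.

```python
# Placeholder rows, NOT real data. mode, race_eth_name, region_code and
# pop_total are named in the description; pop_mode is a hypothetical column
# holding the number of workers in the group who use that mode.
rows = [
    {"region_code": "R1", "race_eth_name": "Group A", "mode": "bicycle", "pop_mode": 50, "pop_total": 1000},
    {"region_code": "R1", "race_eth_name": "Group A", "mode": "walk", "pop_mode": 40, "pop_total": 1000},
    {"region_code": "R1", "race_eth_name": "Group B", "mode": "bicycle", "pop_mode": 30, "pop_total": 500},
    {"region_code": "R2", "race_eth_name": "Group A", "mode": "bicycle", "pop_mode": 10, "pop_total": 400},
]


def mode_share(records, region_code):
    """Filter to one region, then compute the percent of each group using each mode."""
    shares = {}
    for r in records:
        if r["region_code"] != region_code:
            continue  # geographic filter applied before any analysis
        if not r["pop_total"]:
            continue  # skip rows with a missing or zero denominator
        key = (r["race_eth_name"], r["mode"])
        shares[key] = 100 * r["pop_mode"] / r["pop_total"]
    return shares


shares = mode_share(rows, "R1")
for (group, mode), pct in sorted(shares.items()):
    print(f"{group:8s} {mode:8s} {pct:.1f}%")
```

Filtering first keeps the denominators comparable: each percentage is computed against the population of the same region and group, not against a statewide total.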
- Creating interactive maps to visualize and compare the transportation methods of different race/ethnicities in California.
- Analyzing the transportation trends across regions, counties, cities/towns, and census tracts to forecast and plan for infrastructure investments.
- Comparing the risk ratio of pedestrian-car fatalities across different ethnic groups in order to address safety issues within underserved populations.
If you use this dataset in your research, please credit the original authors.
**License: [Open Database License (ODbL) v1.0](https://opendatacommons.org/lice...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset tracks the annual percentage of students of two or more races from 2009 to 2023 for Rim Of The World Senior High School, compared with California and Rim Of The World Unified School District.