Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a source dataset for a Let's Get Healthy California indicator at https://letsgethealthy.ca.gov/. Infant mortality is defined as the number of deaths in infants under one year of age per 1,000 live births. Infant mortality is often used as an indicator of the health and well-being of a community, because factors affecting the health of entire populations can also affect the mortality rate of infants. Although California’s infant mortality rate is better than the national average, there are significant disparities, with African American babies dying at more than twice the rate of other groups.
Data are from the Birth Cohort Files. The infant mortality indicator computed from the birth cohort file combines birth certificate information on all births that occur in a calendar year (denominator) with death certificate information, linked to the birth certificate, for infants who were born in that year but died within 12 months of birth (numerator). Studies of infant mortality based on death certificates alone have been found to underestimate infant death rates for infants of all race/ethnic groups, and especially for certain race/ethnic groups, because of problems such as confusion about event registration requirements, incomplete data, and transfers of newborns from one facility to another for medical care. Note that a separate data table, "Infant Mortality by Race/Ethnicity", is based on death records only; it is more timely but less accurate than the Birth Cohort File.
Numerator: Infant deaths (under age 1 year). Denominator: Live births occurring to California state residents. A single year is shown to provide state-level data and county totals for the most recent year; multiple years are aggregated to allow stratification at the county level. For this indicator, race/ethnicity is based on the birth certificate information, which records the race/ethnicity of the mother.
The mother can “decline to state”; this is considered to be a valid response. These responses are not displayed on the indicator visualization.
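The indicator is a simple ratio: infant deaths in the birth cohort divided by live births in that cohort, scaled to 1,000. The sketch below illustrates only the arithmetic. It assumes hypothetical per-group records of `(group, live_births, infant_deaths)` that have already been linked at the birth-cohort level; the record layout, and the choice to keep "Decline to state" births in the overall total while hiding that group from the per-group output, are assumptions rather than the published processing.

```python
from collections import defaultdict


def infant_mortality_rate(infant_deaths, live_births):
    """Infant deaths (under age 1) per 1,000 live births."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return infant_deaths / live_births * 1000


def rates_by_group(records, hidden=("Decline to state",)):
    """Return (overall rate, {group: rate}) from (group, births, deaths) tuples.

    Assumption: groups in `hidden` still count toward the overall rate
    but are omitted from the per-group results, mirroring the note that
    such responses are valid but not displayed.
    """
    births = defaultdict(int)
    deaths = defaultdict(int)
    for group, n_births, n_deaths in records:
        births[group] += n_births
        deaths[group] += n_deaths
    overall = infant_mortality_rate(sum(deaths.values()), sum(births.values()))
    per_group = {
        group: infant_mortality_rate(deaths[group], births[group])
        for group in births
        if group not in hidden
    }
    return overall, per_group


# Hypothetical birth-cohort counts (not real California data).
records = [
    ("Group A", 10_000, 40),
    ("Group B", 5_000, 50),
    ("Decline to state", 1_000, 5),
]
overall, per_group = rates_by_group(records)
print(round(overall, 2), per_group)
```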
THIS DATASET WAS LAST UPDATED AT 7:11 AM EASTERN ON DEC. 1
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as incidents in which four or more people are killed, excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although far less common, the nine public mass shootings during the year were the deadliest type of mass murder, killing 73 people, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half of them by suicide.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety, may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
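The original queries are not reproduced here. As an illustrative sketch only, the same yearly counts can be computed from incident records, assuming each incident has hypothetical `year`, `state`, and `is_shooting` fields (the database's actual column names may differ):

```python
from collections import Counter


def counts_by_year(incidents, state=None):
    """Return {year: (mass_killings, mass_shootings)}, optionally for one state.

    Each incident is a dict with hypothetical keys "year", "state", and
    "is_shooting"; adjust them to match the real column names.
    """
    killings = Counter()
    shootings = Counter()
    for incident in incidents:
        if state is not None and incident["state"] != state:
            continue
        killings[incident["year"]] += 1
        if incident["is_shooting"]:
            shootings[incident["year"]] += 1
    # Counter returns 0 for years with killings but no shootings.
    return {year: (killings[year], shootings[year]) for year in sorted(killings)}


# Made-up incidents for illustration only.
incidents = [
    {"year": 2018, "state": "CA", "is_shooting": True},
    {"year": 2019, "state": "TX", "is_shooting": True},
    {"year": 2019, "state": "TX", "is_shooting": False},
    {"year": 2019, "state": "CA", "is_shooting": True},
]
print(counts_by_year(incidents))              # {2018: (1, 1), 2019: (3, 2)}
print(counts_by_year(incidents, state="TX"))  # {2019: (2, 1)}
```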
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
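The core inclusion rule can be expressed as a check over the times of individual deaths. This is a minimal sketch under stated assumptions: each death is a hypothetical `Death` record; offender deaths and negligent deaths (for example, DUI crashes or accidental fires) are flagged on the record; unborn children are assumed not to be recorded as deaths; and "within a 24-hour period" is read as four or more victim deaths falling inside some window of at most 24 hours. It does not model the seven-day spree extension to the victim count.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The 50 states plus Washington, D.C.
JURISDICTIONS = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY DC".split()
)


@dataclass
class Death:
    time: datetime
    intentional: bool = True   # False for negligent deaths (e.g., DUI, accidental fire)
    is_offender: bool = False  # offender deaths (e.g., suicide) are not counted


def is_mass_murder(deaths, state):
    """True if 4+ intentional victim deaths fall within some 24-hour window."""
    if state not in JURISDICTIONS:
        return False
    times = sorted(d.time for d in deaths if d.intentional and not d.is_offender)
    # With sorted times, 4 deaths fit in one window iff the 1st and 4th
    # of some consecutive run are at most 24 hours apart.
    return any(
        times[i + 3] - times[i] <= timedelta(hours=24)
        for i in range(len(times) - 3)
    )


start = datetime(2019, 8, 4, 1, 0)
victims = [Death(start + timedelta(minutes=m)) for m in range(4)]
print(is_mass_murder(victims, "OH"))  # True
print(is_mass_murder(victims[:3] + [Death(start, is_offender=True)], "OH"))  # False
```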
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
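The SHR screening step amounts to a filter over homicide records. A minimal sketch, assuming hypothetical record fields `victim_count` and `homicide_type` (the real SHR files use their own coding schemes); flagged records are only candidates that still require independent verification:

```python
QUALIFYING_TYPES = {"Murder and non-negligent manslaughter"}


def is_candidate(record):
    """Flag an SHR record with 4+ victims and a qualifying homicide type."""
    return record["victim_count"] >= 4 and record["homicide_type"] in QUALIFYING_TYPES


# Made-up records for illustration only.
records = [
    {"id": 1, "victim_count": 4, "homicide_type": "Murder and non-negligent manslaughter"},
    {"id": 2, "victim_count": 5, "homicide_type": "Manslaughter by negligence"},
    {"id": 3, "victim_count": 2, "homicide_type": "Murder and non-negligent manslaughter"},
]
candidates = [r["id"] for r in records if is_candidate(r)]
print(candidates)  # [1]
```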
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
Where sources contradicted one another, official law enforcement or court records were used when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
Open Database License: Database Contents License (DbCL) v1.0 http://opendatacommons.org/licenses/dbcl/1.0/
Mass Shootings in the United States of America (1966-2017)
The US has witnessed 398 mass shootings in the last 50 years, resulting in 1,996 deaths and 2,488 injuries. The latest and worst mass shooting, on October 2, 2017, killed 58 and injured 515 people (as of this writing). The number of people injured in this attack is greater than the number injured in all mass shootings of 2015 and 2016 combined. Over the last 50 years, there have been on average 7 mass shootings per year, claiming 39 lives and injuring 48 people per year.
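The per-year figures are the totals divided by the length of the period. The text does not say how the span was counted; the stated values (7, 39, and 48) happen to match dividing by 51 years (2017 minus 1966) and rounding down, which the sketch below checks. This is an inference from the numbers, not a documented method.

```python
TOTALS = {"mass shootings": 398, "deaths": 1996, "injured": 2488}


def per_year(totals, years):
    """Average per year as (exact value, value rounded down)."""
    return {name: (total / years, total // years) for name, total in totals.items()}


for span in (51, 52):  # 2017 - 1966 = 51; counting both end years gives 52
    for name, (exact, floored) in per_year(TOTALS, span).items():
        print(f"{span} years: {name}: {exact:.2f} per year (floor {floored})")
```

With the inclusive 52-year count, the rounded-down averages would instead be 7, 38, and 47.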
Geography: United States of America
Time period: 1966-2017
Unit of analysis: Mass Shooting Attack
Dataset: The dataset contains detailed information on 398 mass shootings in the United States of America that killed 1,996 and injured 2,488 people.
Variables: The dataset contains Serial No, Title, Location, Date, Summary, Fatalities, Injured, Total Victims, Mental Health Issue, Race, Gender, and Lat-Long information.
I’ve consulted several public datasets and web pages to compile this data. Some of the major data sources include Wikipedia, Mother Jones, Stanford, USA Today and other web sources.
With a broken heart, I would like to call on my fellow Kagglers to use machine learning and data science to help explore these questions:
• How many people were killed and injured per year?
• Visualize mass shootings on the U.S. map.
• Is there any correlation with the shooter's race or gender?
• Any correlation with calendar dates? Are there, on average, deadlier days, weeks, or months?
• Which cities and states are more prone to such attacks?
• Can you find and combine other external datasets to enrich the analysis, for example, gun ownership by state?
• Any other pattern that can help in prediction, crowd safety, or in-depth analysis of these events?
• How many shooters had some kind of mental health problem? Can we compare these shooters with the general population with the same condition?
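As a starting point for the first question, killed and injured totals per year can be aggregated with the standard library alone. This sketch assumes the `Date`, `Fatalities`, and `Injured` columns listed above and a month/day/year date format; the date format and the sample rows are assumptions, not taken from the dataset, and empty numeric cells would need handling before `int()`.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def totals_per_year(csv_text, date_format="%m/%d/%Y"):
    """Return {year: (killed, injured)} summed over all attacks in that year."""
    killed = defaultdict(int)
    injured = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = datetime.strptime(row["Date"], date_format).year
        killed[year] += int(row["Fatalities"])
        injured[year] += int(row["Injured"])
    return {year: (killed[year], injured[year]) for year in sorted(killed)}


# Made-up rows in the assumed format, not real records.
SAMPLE = """Date,Fatalities,Injured
1/15/2016,3,2
6/1/2016,5,10
3/3/2017,4,0
"""
print(totals_per_year(SAMPLE))  # {2016: (8, 12), 2017: (4, 0)}
```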
This is the new version of the Mass Shootings Dataset. I've added eight new variables:
Age, Employed and Employed at (3 variables) contain shooter details
Quite a few missing values have been filled in
Three more recent mass shootings have been added, including the Texas church shooting of November 5, 2017
I hope it will help create more visualizations and extract patterns.
Keep Coding!
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
This dataset contains data and analysis from the article "Do State Department Travel Warnings Reflect Real Danger?"
| File | Source |
|---|---|
| BTSOriginUS_10_09_to_06_16.csv | Air Carrier Statistics Database export, Bureau of Transportation Statistics |
| SDamerican_deaths_abroad_10_09_to_06_16.csv | U.S. State Department |
| SDwarnings_10_09to06_16.csv | U.S. State Department via Internet Archive |
By the Substance Abuse and Mental Health Services Administration (SAMHSA) [source]
This dataset contains estimates of serious mental illness in the US by state and substate region for 2012-2014. These data help us better understand the mental health disparities that exist between states and between regions within states. Researchers can use them to identify parts of the country with particularly high or low rates of serious mental illness, which can help prioritize resources for affected areas.
The dataset includes estimates with 95% confidence intervals, produced by a survey-weighted hierarchical Bayes estimation approach using Markov chain Monte Carlo techniques. The Map Group column distinguishes the substate regions included in the corresponding maps, and the Order column can be used to restore the original sort order. For definitions of the substate regions, refer to the National Survey on Drug Use and Health's Substate Region Definitions: https://www.samhsa.gov/data/sites/default/files/NSDUHsubstateRegionDefs2014/NSDUHsubstateRegionDefs2014.pdf
The data are provided by SAMHSA's Center for Behavioral Health Statistics and Quality through the National Survey on Drug Use and Health (NSDUH), 2012-2014, and offer insight into where mental health help is needed most urgently.
This dataset is intended for researchers, analysts, and data scientists looking for information about the prevalence of serious mental illness across the US. Possible uses include:
- Performing a trend analysis to identify changes in the estimates of serious mental illnesses over time and across different geographic regions.
- Exploring disparities in serious mental illnesses among certain minority groups or deprived socio-economic subgroups by comparing estimates at the substate level.
- Developing targeted public health strategies and interventions for states with higher-than-average prevalence of serious mental illness.
If you use this dataset in your research, please credit the original authors.
License: Dataset copyright by authors
- You are free to:
  - Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt: remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit: provide a link to the license, and indicate if changes were made.
  - ShareAlike: distribute your contributions under the same license as the original.
  - Keep intact all notices that refer to this license, including copyright notices.
File: 2012-2014_Substate_SAE_Table_24.csv
| Column name | Description |
|:---|:---|
| Order | A numerical order that can be used to sort the data back to its original order. (Numeric) |
| State | The US state associated with the data. (String) |
| Substate Region | The substate region associated with the data. (String) |
| 95% CI (Lower) | The lower bound of the 95 percent confidence interval for the estimated number of people with serious mental illness in the region. (Numeric) |
| 95% CI (Upper) | The upper bound of the 95 percent confidence interval for the estimated number of people with serious mental illness in the region. (Numeric) |
| Map Group | A numerical value that distinguishes the substate regions included in the maps. (Numeric) |
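A short sketch of working with this file: the `Order` column restores the original row order, and the width of the 95% interval gives a rough sense of how precise each regional estimate is. It assumes the column names above appear verbatim in the CSV header and that the bounds are plain numbers without thousands separators; the sample rows are made up.

```python
import csv
import io


def load_regions(csv_text):
    """Parse rows, restore the original order, and add a CI width column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row["Order"]))
    for row in rows:
        lower = float(row["95% CI (Lower)"])
        upper = float(row["95% CI (Upper)"])
        row["CI width"] = upper - lower  # wider interval = less precise estimate
    return rows


# Made-up rows, deliberately out of order.
SAMPLE = """Order,State,Substate Region,95% CI (Lower),95% CI (Upper),Map Group
2,State X,Region 2,3.5,5.5,1
1,State X,Region 1,2.5,4.0,1
"""
for row in load_regions(SAMPLE):
    print(row["Order"], row["Substate Region"], row["CI width"])
```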
A dataset to advance the study of life-cycle interactions of biomedical and socioeconomic factors in the aging process. The EI project has assembled a variety of large datasets covering the life histories of approximately 39,616 white male volunteers (drawn from a random sample of 331 companies) who served in the Union Army (UA), and of about 6,000 African-American veterans from 51 randomly selected United States Colored Troops companies (USCT). Their military records were linked to pension and medical records that detailed the soldiers' health status and socioeconomic and family characteristics. Each soldier was searched for in the US decennial census for the years in which they were most likely to be found alive (1850, 1860, 1880, 1900, 1910). In addition, a sample consisting of 70,000 men examined for service in the Union Army between September 1864 and April 1865 has been assembled and linked only to census records. These records will be useful for life-cycle comparisons of those accepted and rejected for service.
Military Data: The military service and wartime medical histories of the UA and USCT men were collected from the Union Army and United States Colored Troops military service records, carded medical records, and other wartime documents.
Pension Data: Wherever possible, the UA and USCT samples have been linked to pension records, including surgeon's certificates. About 70% of men in the Union Army sample have a pension. These records provide the bulk of the socioeconomic and demographic information on these men from the late 1800s through the early 1900s, including family structure and employment information. In addition, the surgeon's certificates provide rich medical histories, with an average of 5 examinations per linked recruit for the UA, and about 2.5 exams per USCT recruit.
Census Data: Both early and late-age familial and socioeconomic information is collected from the manuscript schedules of the federal censuses of 1850, 1860, 1870 (incomplete), 1880, 1900, and 1910.
Data Availability: All of the datasets (Military Union Army; linked Census; Surgeon's Certificates; Examination Records; and supporting ecological and environmental variables) are publicly available from ICPSR. In addition, copies on CD-ROM may be obtained from the CPE, which also maintains an interactive Internet Data Archive and Documentation Library, accessible on the Project Website.
* Dates of Study: 1850-1910
* Study Features: Longitudinal, Minority Oversamples
* Sample Size:
  * Union Army: 35,747
  * Colored Troops: 6,187
  * Examination Sample: 70,800
* ICPSR Link: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/06836
CC0 1.0 Universal (Public Domain Dedication) https://creativecommons.org/publicdomain/zero/1.0/
This directory contains the data behind the story Where Police Have Killed Americans In 2015.
We linked entries from the Guardian's database on police killings to census data from the American Community Survey. The Guardian data was downloaded on June 2, 2015. More information about its database is available here.
Census data was calculated at the tract level from the 2015 5-year American Community Survey using the tables S0601 (demographics), S1901 (tract-level income and poverty), S1701 (employment and education) and DP03 (county-level income). Census tracts were determined by geocoding addresses to latitude/longitude using the Bing Maps and Google Maps APIs and then overlaying points onto 2014 census tracts. GEOIDs are census-standard and should be easily joinable to other ACS tables -- let us know if you find anything interesting.
Field descriptions:
| Header | Description | Source |
|---|---|---|
name | Name of deceased | Guardian |
age | Age of deceased | Guardian |
gender | Gender of deceased | Guardian |
raceethnicity | Race/ethnicity of deceased | Guardian |
month | Month of killing | Guardian |
day | Day of incident | Guardian |
year | Year of incident | Guardian |
streetaddress | Address/intersection where incident occurred | Guardian |
city | City where incident occurred | Guardian |
state | State where incident occurred | Guardian |
latitude | Latitude, geocoded from address | |
longitude | Longitude, geocoded from address | |
state_fp | State FIPS code | Census |
county_fp | County FIPS code | Census |
tract_ce | Tract ID code | Census |
geo_id | Combined tract ID code | |
county_id | Combined county ID code | |
namelsad | Tract description | Census |
lawenforcementagency | Agency involved in incident | Guardian |
cause | Cause of death | Guardian |
armed | How/whether deceased was armed | Guardian |
pop | Tract population | Census |
share_white | Share of pop that is non-Hispanic white | Census |
share_black | Share of pop that is black (alone, not in combination) | Census |
share_hispanic | Share of pop that is Hispanic/Latino (any race) | Census |
p_income | Tract-level median personal income | Census |
h_income | Tract-level median household income | Census |
county_income | County-level median household income | Census |
comp_income | h_income / county_income | Calculated from Census |
county_bucket | Household income, quintile within county | Calculated from Census |
nat_bucket | Household income, quintile nationally | Calculated from Census |
pov | Tract-level poverty rate (official) | Census |
urate | Tract-level unemployment rate | Calculated from Census |
college | Share of 25+ pop with BA or higher | Calculated from Census |
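Census tract GEOIDs are the 2-digit state FIPS code, the 3-digit county code, and the 6-digit tract code concatenated, so `geo_id` and `county_id` can be rebuilt from `state_fp`, `county_fp`, and `tract_ce`. This sketch assumes the three fields may have been read as integers (losing their leading zeros) and re-pads them; check the output against the file's own `geo_id` values before relying on it.

```python
def county_geoid(state_fp, county_fp):
    """5-character county GEOID: 2-digit state + 3-digit county."""
    return f"{int(state_fp):02d}{int(county_fp):03d}"


def tract_geoid(state_fp, county_fp, tract_ce):
    """11-character tract GEOID: county GEOID + 6-digit tract code."""
    return county_geoid(state_fp, county_fp) + f"{int(tract_ce):06d}"


# Los Angeles County, CA (state 06, county 037) with a made-up tract code.
print(tract_geoid(6, 37, 206300))  # 06037206300
print(county_geoid("6", "37"))     # 06037
```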
Note regarding income calculations:
All income fields are in inflation-adjusted 2013 dollars.
comp_income is simply tract-level median household income as a share of county-level median household income.
county_bucket provides where the tract's median household income falls in the distribution (by quintile) of all tracts in the county. (1 indicates a tract falls in the poorest 20% of tracts within the county.) Distribution is not weighted by population.
nat_bucket is the same, but ranks each tract against tracts in all U.S. counties (that is, nationally).
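The two derived income measures can be sketched as follows. `comp_income` is a plain ratio. For the buckets, the notes do not say how quintile boundaries or ties were handled, so this sketch assumes a simple unweighted rank-based split (1 = poorest 20%). Applying it within each county approximates `county_bucket`; applying it across all tracts approximates `nat_bucket`.

```python
from collections import defaultdict


def comp_income(h_income, county_income):
    """Tract median household income as a share of the county median."""
    return h_income / county_income


def quintiles(incomes):
    """Map each key to a quintile 1-5 by rank (1 = poorest), unweighted."""
    ranked = sorted(incomes, key=incomes.get)
    n = len(ranked)
    return {key: rank * 5 // n + 1 for rank, key in enumerate(ranked)}


def county_buckets(tracts):
    """tracts: {tract_id: (county_id, h_income)} -> {tract_id: quintile within county}."""
    by_county = defaultdict(dict)
    for tract_id, (county_id, h_income) in tracts.items():
        by_county[county_id][tract_id] = h_income
    buckets = {}
    for county_incomes in by_county.values():
        buckets.update(quintiles(county_incomes))
    return buckets


tracts = {  # made-up values
    "t1": ("c1", 20_000), "t2": ("c1", 60_000), "t3": ("c1", 40_000),
    "t4": ("c1", 30_000), "t5": ("c1", 50_000),
}
print(county_buckets(tracts))       # {'t1': 1, 't4': 2, 't3': 3, 't5': 4, 't2': 5}
print(comp_income(45_000, 60_000))  # 0.75
```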
This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!
This dataset is maintained using GitHub's API and Kaggle's API.
This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.
The American Community Survey (ACS) is an ongoing survey that provides data every year, giving communities the current information they need to plan investments and services. The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population. Much of the ACS data provided on the Census Bureau's Web site are available separately by age group, race, Hispanic origin, and sex. Summary files, Subject tables, Data profiles, and Comparison profiles are available for the nation, all 50 states, the District of Columbia, Puerto Rico, every congressional district, every metropolitan area, and all counties and places with populations of 65,000 or more. Detailed Tables contain the most detailed cross-tabulations published for areas with populations of 65,000 or more. The data are population counts. There are over 31,000 variables in this dataset.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Dead Lake township population distribution across 18 age groups. It lists the population in each age group along with its percentage of the total population of Dead Lake township. The dataset can be used to understand the population distribution of Dead Lake township by age; for example, to identify the largest age group in Dead Lake township.
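Identifying the largest age group and each group's share is a small computation over the per-group counts. A sketch with made-up counts (not Dead Lake township's actual figures); the group labels are illustrative:

```python
def age_summary(counts):
    """Return ({group: percent of total}, largest group, smallest group)."""
    total = sum(counts.values())
    shares = {group: 100 * n / total for group, n in counts.items()}
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    return shares, largest, smallest


counts = {"Under 5 years": 20, "25 to 29 years": 5, "65 to 69 years": 75}  # made up
shares, largest, smallest = age_summary(counts)
print(largest, f"{shares[largest]:.2f}%")    # 65 to 69 years 75.00%
print(smallest, f"{shares[smallest]:.2f}%")  # 25 to 29 years 5.00%
```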
Key observations
The largest age group in Dead Lake Township, Minnesota was 65-69 years, with a population of 96 (15.02%), according to the 2021 American Community Survey. The smallest age group in Dead Lake Township, Minnesota was 25-29 years, with a population of 7 (1.10%). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.
Age groups:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. Most Neilsberg Research aggregated datasets and insights are available for free download at https://www.neilsberg.com/research/.
This dataset is part of the main dataset for Dead Lake township Population by Age. You can find it here.
NOTE: This dataset has been retired and marked as historical-only.
This dataset is a companion to the COVID-19 Daily Cases and Deaths dataset (https://data.cityofchicago.org/d/naz8-j4nc). The main difference is that the case, death, and hospitalization rates per 100,000 population are not those for the single date indicated; they are rolling averages for the seven-day period ending on that date. The rolling average smooths fluctuations in the data, such as fewer cases being reported on weekends, and the instability of small numbers. The intent is to give a more representative view of the ongoing COVID-19 experience, less affected by what is essentially noise in the data.
All rates are per 100,000 population in the indicated group or, for “Total” columns, per 100,000 population of Chicago as a whole.
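The seven-day rolling rate can be sketched as the mean daily count over the window ending on each date, divided by the population and scaled to 100,000. Because the denominator is a fixed 2018 estimate, this equals the average of the seven daily rates. How the published dataset handles the first six days, and whether it averages counts or rates, are assumptions; here dates without a full week of history are left blank (`None`).

```python
from collections import deque
from typing import List, Optional


def rolling_rate_per_100k(daily_counts: List[int], population: int,
                          window: int = 7) -> List[Optional[float]]:
    """Rolling `window`-day average of daily counts, per 100,000 population."""
    recent = deque(maxlen=window)  # keeps only the last `window` counts
    rates: List[Optional[float]] = []
    for count in daily_counts:
        recent.append(count)
        if len(recent) < window:
            rates.append(None)  # not enough history yet
        else:
            rates.append(sum(recent) / window / population * 100_000)
    return rates


# Made-up daily case counts with a weekend dip, for a population of 250,000.
counts = [40, 42, 38, 45, 41, 10, 12, 44, 43]
print(rolling_rate_per_100k(counts, 250_000))
```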
Only Chicago residents are included based on the home address as provided by the medical provider.
Cases with a positive molecular (PCR) or antigen test are included in this dataset. Cases are counted based on the date the test specimen was collected. Deaths among cases are aggregated by day of death. Hospitalizations are reported by date of first hospital admission. Demographic data are based on what is reported by medical providers or collected by CDPH during follow-up investigation.
Denominators are from the U.S. Census Bureau American Community Survey 1-year estimate for 2018 and can be seen in the Citywide, 2018 row of the Chicago Population Counts dataset (https://data.cityofchicago.org/d/85cm-7uqa).
All data are provisional and subject to change. Information is updated as additional details are received and it is, in fact, very common for recent dates to be incomplete and to be updated as time goes on. At any given time, this dataset reflects cases and deaths currently known to CDPH.
Numbers in this dataset may differ from other public sources due to definitions of COVID-19-related cases and deaths, sources used, how cases and deaths are associated to a specific date, and similar factors.
Data Source: Illinois National Electronic Disease Surveillance System, Cook County Medical Examiner’s Office, U.S. Census Bureau American Community Survey
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Gratis by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Gratis across both sexes and to determine which sex constitutes the majority.
Key observations
Females make up a slight majority of the population, at 50.0% of the total after rounding. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are asked to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is part of the main dataset for Gratis Population by Race & Ethnicity. You can find it here.
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control and Prevention or the World Health Organization. These original data sources include both open- and restricted-access sources; for restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data identical to the counts published in the original source; no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability, and has formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents), and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
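Both recommended steps can be sketched in a few lines. This assumes rows are dicts with `PeriodStartDate` (a `datetime.date`), `CountValue`, and `PartOfCumulativeCountSeries` (1 for cumulative, 0 otherwise), and that the fixed-interval series is weekly; apart from `PartOfCumulativeCountSeries`, which is named above, the field names and encodings are assumptions to check against the dataset's documentation. Missing weeks come back as `None`, so they stay distinct from reported zeros.

```python
from datetime import date, timedelta


def split_series(rows):
    """Separate cumulative rows from fixed-interval rows."""
    cumulative = [r for r in rows if r["PartOfCumulativeCountSeries"] == 1]
    fixed = [r for r in rows if r["PartOfCumulativeCountSeries"] == 0]
    return cumulative, fixed


def fill_missing_weeks(fixed_rows, start, end):
    """One (week_start, count) pair per week; None where nothing was reported."""
    reported = {r["PeriodStartDate"]: r["CountValue"] for r in fixed_rows}
    filled = []
    week = start
    while week <= end:
        filled.append((week, reported.get(week)))
        week += timedelta(weeks=1)
    return filled


rows = [  # made-up rows
    {"PeriodStartDate": date(1930, 1, 5), "CountValue": 12, "PartOfCumulativeCountSeries": 0},
    {"PeriodStartDate": date(1930, 1, 19), "CountValue": 0, "PartOfCumulativeCountSeries": 0},
    {"PeriodStartDate": date(1930, 1, 1), "CountValue": 30, "PartOfCumulativeCountSeries": 1},
]
cumulative, fixed = split_series(rows)
for week, value in fill_missing_weeks(fixed, date(1930, 1, 5), date(1930, 1, 19)):
    print(week, value)
```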
ODC Public Domain Dedication and Licence (PDDL) v1.0 http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY This archived dataset includes data for population characteristics that are no longer being reported publicly. The date on which each population characteristic type was archived can be found in the field “data_loaded_at”.
B. HOW THE DATASET IS CREATED Data on the population characteristics of COVID-19 cases are from: * Case interviews * Laboratories * Medical providers These multiple streams of data are merged, deduplicated, and undergo data verification processes.
Race/ethnicity
* We include all race/ethnicity categories that are collected for COVID-19 cases.
* The population estimates for the “Other” or “Multi-racial” groups should be considered with caution. The Census definition is likely not exactly aligned with how the City collects this data. For that reason, we do not recommend calculating population rates for these groups.
Gender
* The City collects information on gender identity using these guidelines.
Skilled Nursing Facility (SNF) occupancy
* A Skilled Nursing Facility (SNF) is a type of long-term care facility that provides care to individuals, generally in their 60s and older, who need functional assistance in their daily lives.
* This dataset includes data for COVID-19 cases reported in Skilled Nursing Facilities (SNFs) through 12/31/2022, archived on 1/5/2023. These data were identified where “Characteristic_Type” = ‘Skilled Nursing Facility Occupancy’.
Sexual orientation
* The City began asking adults 18 years old or older for their sexual orientation identification during case interviews as of April 28, 2020. Sexual orientation data prior to this date are unavailable.
* The City doesn’t collect or report information about sexual orientation for persons under 12 years of age.
* Beginning December 2021, case investigation interviews transitioned to the California Department of Public Health’s Virtual Assistant for information gathering. The Virtual Assistant is only sent to adults who are 18+ years old.
Learn more about our data collection guidelines pertaining to sexual orientation: https://www.sfdph.org/dph/files/PoliciesProcedures/COM9_SexualOrientationGuidelines.pdf
Comorbidities
* Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.
Homelessness
Persons are identified as homeless based on several data sources:
* self-reported living situation
* the location at the time of testing
* Department of Public Health homelessness and health databases
Residents in Single-Room Occupancy hotels are not included in these figures. These methods serve as an estimate of persons experiencing homelessness and may not meet other homelessness definitions.
Single Room Occupancy (SRO) tenancy
* SRO buildings are defined by the San Francisco Housing Code as having six or more "residential guest rooms" which may be attached to shared bathrooms, kitchens, and living spaces.
* The details of a person's living arrangements are verified during case interviews.
Transmission Type
* Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, the transmission category is recorded as contact with a known case. If they report no contact with a known case, the transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, it is counted as unknown.
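The recording rule for transmission type can be expressed as a small function. This is a hypothetical sketch: the function name, the argument encoding (None meaning the close-contact question was not asked), and the label strings are assumptions, not the City's actual data schema.

```python
from typing import Optional


def transmission_category(interviewed: bool, known_contact: Optional[bool]) -> str:
    """Map a case-interview outcome to a transmission category.

    known_contact is True/False for the answer to the close-contact question,
    or None if the question was not asked.
    """
    if not interviewed or known_contact is None:
        return "Unknown"
    if known_contact:
        return "Contact with a known case"
    return "Community transmission"


print(transmission_category(True, True))    # Contact with a known case
print(transmission_category(True, False))   # Community transmission
print(transmission_category(False, None))   # Unknown
```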
C. UPDATE PROCESS
This dataset has been archived and will no longer update as of 9/11/2023.
D. HOW TO USE THIS DATASET
Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
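Given case counts from this dataset and the ACS-based population estimates, a population rate is a simple ratio. The sketch below assumes rates per 100,000 residents (the scaling is a choice, not something the dataset prescribes) and, following the caution in the Race/ethnicity notes, refuses to compute rates for the "Other" and "Multi-racial" groups. The function name and group labels are hypothetical.

```python
EXCLUDED_GROUPS = {"Other", "Multi-racial"}


def rate_per_100k(group: str, cases: int, population: int) -> float:
    """Cases per 100,000 residents for one age or race/ethnicity group."""
    if group in EXCLUDED_GROUPS:
        raise ValueError(f"Population rates are not recommended for group {group!r}")
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so round inputs give exact results.
    return cases * 100_000 / population


print(rate_per_100k("18-30", 50, 200_000))  # 25.0
```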
This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of cases on each date.
New cases are the count of cases within that characteristic group where the positive tests were collected on that specific specimen collection date. Cumulative cases are the running total of all San Francisco cases in that characteristic group up to the specimen collection date listed.
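The filter-then-accumulate workflow described above can be sketched as follows: keep one characteristic type, then turn each group's new cases into a running total ordered by specimen collection date. The snake_case field names and the group labels are hypothetical stand-ins for the dataset's columns, and dates are assumed to be ISO-formatted strings so that sorting them sorts chronologically.

```python
from collections import defaultdict
from itertools import accumulate


def cumulative_cases(rows, characteristic_type):
    """Return {characteristic_group: running totals of new cases ordered by collection date}."""
    by_group = defaultdict(list)
    for row in rows:
        if row["characteristic_type"] == characteristic_type:
            by_group[row["characteristic_group"]].append(
                (row["specimen_collection_date"], row["new_cases"])
            )
    totals = {}
    for group, pairs in by_group.items():
        pairs.sort(key=lambda pair: pair[0])  # chronological order
        totals[group] = list(accumulate(new for _, new in pairs))
    return totals


rows = [
    {"characteristic_type": "Age Group", "characteristic_group": "18-30",
     "specimen_collection_date": "2020-03-02", "new_cases": 3},
    {"characteristic_type": "Age Group", "characteristic_group": "18-30",
     "specimen_collection_date": "2020-03-01", "new_cases": 2},
    {"characteristic_type": "Age Group", "characteristic_group": "31-40",
     "specimen_collection_date": "2020-03-01", "new_cases": 1},
    {"characteristic_type": "Gender", "characteristic_group": "Female",
     "specimen_collection_date": "2020-03-01", "new_cases": 4},
]
print(cumulative_cases(rows, "Age Group"))  # {'18-30': [2, 5], '31-40': [1]}
```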
This data may not be immediately available for recently reported cases. Data updates as more information becomes available.
To explore data on the total number of cases, use the ARCHIVED: COVID-19 Cases Over Time dataset.
E. CHANGE LOG
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the American Fork population over the last 20 plus years. It lists the population for each year, along with the year-over-year change in population, both in absolute and percentage terms. The dataset can be utilized to understand the population change of American Fork across the last two decades. For example, it can show whether the population is declining or increasing and, if it has changed, when it peaked or whether it is still growing. We can also compare the trend with the overall trend of the United States population over the same period of time.
Key observations
In 2022, the population of American Fork was 37,268, an 8.25% increase year over year from 2021. Previously, in 2021, American Fork's population was 34,427, an increase of 2.63% compared to a population of 33,544 in 2020. Over the last 20 plus years, between 2000 and 2022, the population of American Fork increased by 14,726. In this period, the peak population was 37,268 in the year 2022. The numbers suggest that the population has not yet reached its peak and is showing a trend of further growth. Source: U.S. Census Bureau Population Estimates Program (PEP).
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
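The year-over-year figures in the key observations can be reproduced from the yearly totals. A minimal sketch using the 2020 to 2022 populations stated above; the function name is hypothetical, and percentages are rounded to two decimals to match the text.

```python
def year_over_year(populations):
    """populations: {year: population}. Returns [(year, absolute change, percent change)]."""
    years = sorted(populations)
    changes = []
    for prev, cur in zip(years, years[1:]):
        change = populations[cur] - populations[prev]
        pct = round(change / populations[prev] * 100, 2)
        changes.append((cur, change, pct))
    return changes


pop = {2020: 33_544, 2021: 34_427, 2022: 37_268}
changes = year_over_year(pop)
print(changes)  # [(2021, 883, 2.63), (2022, 2841, 8.25)]
```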
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to discuss the feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for American Fork Population by Year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Traverse City by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Traverse City. The dataset can be utilized to understand the population distribution of Traverse City by gender and age. For example, it can be used to identify the largest age group for both men and women in Traverse City. Additionally, it can be used to see how the male-to-female ratio changes from the youngest to the oldest age group in Traverse City.
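A sketch of the two calculations mentioned above, assuming the gender ratio is expressed as males per 100 females (a common demographic convention; the dataset's exact definition may differ). The age-group labels and counts below are toy values for illustration, not figures from the dataset.

```python
def gender_ratio(males: int, females: int) -> float:
    """Males per 100 females in one age group."""
    return males * 100 / females


def largest_group(counts_by_group: dict) -> str:
    """Age group with the largest population."""
    return max(counts_by_group, key=counts_by_group.get)


# Toy counts: {age group: (males, females)}
population = {"0-4": (50, 40), "5-9": (45, 45), "10-14": (30, 60)}
ratios = {group: gender_ratio(m, f) for group, (m, f) in population.items()}
largest_male = largest_group({g: m for g, (m, _) in population.items()})
largest_female = largest_group({g: f for g, (_, f) in population.items()})
print(ratios)                        # {'0-4': 125.0, '5-9': 100.0, '10-14': 50.0}
print(largest_male, largest_female)  # 0-4 10-14
```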
Key observations
Largest age group by population: males, 30-34 years (757); females, 70-74 years (831). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is a part of the main dataset for Traverse City Population by Gender.
https://creativecommons.org/publicdomain/zero/1.0/
Context
Diabetes is one of the most prevalent chronic diseases in the United States, affecting millions of Americans each year and placing a substantial financial burden on the economy. It is a serious chronic condition in which the body loses the ability to effectively regulate blood glucose levels, leading to a reduced quality of life and decreased life expectancy. During digestion, food is broken down into sugars, which enter the bloodstream. This triggers the pancreas to release insulin, a hormone that helps cells in the body use these sugars for energy. Diabetes is typically characterized by either insufficient insulin production or the body's inability to use insulin effectively.
Chronic high blood sugar levels in individuals with diabetes can lead to severe complications, including heart disease, vision loss, kidney disease, and lower-limb amputation. Although there is no cure for diabetes, strategies such as maintaining a healthy weight, eating a balanced diet, staying physically active, and receiving medical treatments can help mitigate its effects. Early diagnosis is crucial, as it allows for lifestyle modifications and more effective treatment, making predictive models for assessing diabetes risk valuable tools for public health officials.
The scale of the diabetes epidemic is significant. According to the Centers for Disease Control and Prevention (CDC), as of 2018, approximately 34.2 million Americans had diabetes, while 88 million had prediabetes. Alarmingly, the CDC estimates that 1 in 5 individuals with diabetes and about 8 in 10 individuals with prediabetes are unaware of their condition. Type II diabetes is the most common form, and its prevalence varies based on factors such as age, education, income, geographic location, race, and other social determinants of health. The burden of diabetes disproportionately affects those with lower socioeconomic status. The economic impact is also substantial, with the cost of diagnosed diabetes reaching approximately $327 billion annually, and total costs, including undiagnosed diabetes and prediabetes, nearing $400 billion each year.
Content
The Behavioral Risk Factor Surveillance System (BRFSS) is a health-related telephone survey that is collected annually by the CDC. Each year, the survey collects responses from over 400,000 Americans on health-related risk behaviors, chronic health conditions, and the use of preventive services. It has been conducted every year since 1984. For this project, the XPT (SAS transport) file of the 2023 dataset available on the CDC website was used. This original dataset contains responses from 433,323 individuals and has 345 features. These features are either questions directly asked of participants or calculated variables based on individual participant responses.
I have selected 20 features from this dataset that are suitable for working on the topic of diabetes and saved them in a CSV file without making any changes to the data, with the goal of making the data easier to work with. For more information or to access updated data, refer to the CDC website. I initially examined the original dataset from the CDC and found no duplicate entries. That dataset contains 330 columns (features); because only 20 of them are kept here, different respondents can share identical values across all 20 selected features. Therefore, the duplicate rows in this dataset are not due to errors but rather represent distinct individuals with similar conditions. In my opinion, removing these entries would both introduce errors and reduce accuracy.
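The reasoning about duplicates can be checked directly: rows that are unique across all columns can become identical once only a subset of columns is kept. A minimal standard-library sketch with toy records; the column names are hypothetical, not BRFSS variable names.

```python
def count_duplicate_rows(rows, columns):
    """Count rows whose values on `columns` repeat an earlier row."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


# Toy respondents: every row differs in at least one column.
rows = [
    {"respondent_id": 1, "high_bp": 1, "smoker": 0, "sleep_hours": 7},
    {"respondent_id": 2, "high_bp": 1, "smoker": 0, "sleep_hours": 6},
    {"respondent_id": 3, "high_bp": 0, "smoker": 1, "sleep_hours": 8},
]
all_columns = ["respondent_id", "high_bp", "smoker", "sleep_hours"]
selected = ["high_bp", "smoker"]  # a subset, as when keeping 20 of the original columns
print(count_duplicate_rows(rows, all_columns))  # 0
print(count_duplicate_rows(rows, selected))     # 1
```

With pandas, the same check is `DataFrame.duplicated(subset=...)`, and the original XPT file can be read with `pandas.read_sas(path, format="xport")`.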
Explore some of the following research questions:
- Can survey questions from the BRFSS provide accurate predictions of whether an individual has diabetes?
- What risk factors are most predictive of diabetes risk?
- Can we use a subset of the risk factors to accurately predict whether an individual has diabetes?
- Can we create a short form of questions from the BRFSS using feature selection to accurately predict if someone might have diabetes or is at high risk of diabetes?
Acknowledgements
It is important to reiterate that I did not create this dataset; it is simply a summarized and reformatted dataset derived from the BRFSS 2023 dataset available on the CDC website. It is also worth noting that none of the data in this dataset discloses individuals' identities.
Inspiration
Zidian Xie et al., for "Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques" using the 2014 BRFSS, and Alex Teboul, for building the Diabetes Health Indicators dataset based on the 2015 BRFSS, were the inspiration for creating this dataset and for exploring the BRFSS in general.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset includes an aggregated and event-correlated analysis of power outages in the United States, synthesized by integrating three data sources: the Environment for the Analysis of Geo-Located Energy Information (EAGLE-I), the Electric Emergency Incident Disturbance Report (DOE-417), and Annual Estimates of the Resident Population for Counties 2024 (CO-EST2024-POP). The EAGLE-I dataset, spanning from 2014 to 2023, encompasses over 146 million customers and offers county-level outage information at 15-minute intervals. The data has been processed, filtered, and aggregated to deliver an enhanced perspective on power outages, which are then correlated with DOE-417 data based on geographic location as well as the start and end times of events. For each major disturbance documented in DOE-417, essential metrics are defined to quantify the outages associated with the event. This dataset supports researchers in examining outages triggered by major disturbances like extreme weather and physical disruptions, thereby aiding studies on power system resilience.
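The correlation step can be illustrated with a simplified sketch: select the 15-minute outage snapshots that fall in the same area as a DOE-417 event and within its start and end times, then sum them into customer-hours. The field names, the state-level matching, the toy values, and the customer-hours metric are assumptions for illustration; the published dataset defines its own matching rules and metrics.

```python
from datetime import datetime, timedelta

SNAPSHOT_INTERVAL = timedelta(minutes=15)  # EAGLE-I reporting interval


def outages_for_event(event, snapshots):
    """Snapshots in the event's state whose timestamp lies within [start, end]."""
    return [
        s for s in snapshots
        if s["state"] == event["state"] and event["start"] <= s["timestamp"] <= event["end"]
    ]


def customer_hours(snapshots):
    """Treat each snapshot as customers out for one 15-minute interval."""
    hours_per_snapshot = SNAPSHOT_INTERVAL.total_seconds() / 3600
    return sum(s["customers_out"] for s in snapshots) * hours_per_snapshot


event = {"state": "CA", "start": datetime(2020, 1, 1, 12, 0), "end": datetime(2020, 1, 1, 13, 0)}
snapshots = [
    {"state": "CA", "timestamp": datetime(2020, 1, 1, 12, 0), "customers_out": 100},
    {"state": "CA", "timestamp": datetime(2020, 1, 1, 12, 15), "customers_out": 200},
    {"state": "CA", "timestamp": datetime(2020, 1, 1, 14, 0), "customers_out": 50},
    {"state": "NV", "timestamp": datetime(2020, 1, 1, 12, 30), "customers_out": 999},
]
matched = outages_for_event(event, snapshots)
print(len(matched), customer_hours(matched))  # 2 75.0
```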
Links to the raw data for generating the correlated dataset are included below as "DOE-417", "EAGLE-I", and "CO-EST2024-POP" resources.
Acknowledgement: This work is funded by the Laboratory Directed Research and Development (LDRD) at the Pacific Northwest National Laboratory (PNNL) as part of the Resilience Through Data-Driven, Intelligently Designed Control (RD2C) Initiative.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset presents the distribution of median household income among distinct age brackets of householders in Dead Lake township. Based on the latest 2019-2023 5-Year Estimates from the American Community Survey, it displays how income varies among householders of different ages in Dead Lake township. It can be used to examine whether household incomes rise as the head of the household gets older. The dataset can be utilized to gain insights into age-based household income trends and explore the variations in incomes across households.
Key observations: Insights from 2023
In terms of income distribution across age cohorts, in Dead Lake township, the median household income stands at $149,375 for householders within the 25 to 44 years age group, followed by $86,389 for the 65 years and over age group. Notably, householders within the 45 to 64 years age group had the lowest median household income at $72,250.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusted for inflation and are presented in 2023 inflation-adjusted dollars.
Age group classifications include:
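Inflation adjustment rescales a dollar amount by the ratio of a price index in the target year to the index in the year the income was reported. The sketch below shows only that arithmetic; the function name is hypothetical, the index values are placeholders rather than published price-index figures, and the Census Bureau's exact procedure for ACS multiyear estimates is not reproduced here.

```python
def adjust_for_inflation(amount: float, index_in_source_year: float, index_in_target_year: float) -> float:
    """Express `amount` in target-year dollars using a price index ratio."""
    return amount * index_in_target_year / index_in_source_year


# Placeholder index values: prices 25% higher in the target year.
print(adjust_for_inflation(100.0, 200.0, 250.0))  # 125.0
```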
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is a part of the main dataset for Dead Lake township median household income by age.
By Elias Dabbas [source]
This dataset contains details about Hollywood's all-time domestic box office records. It includes data scraped from Box Office Mojo, which breaks down every movie's lifetime gross, ranking, and production year. Domestic gross (adjusted for inflation) has been used as the benchmark to determine which movies were the most successful at the box office in America. The dataset provides an extensive list of Hollywood's all-time biggest hits, which can be used to examine record-breaking blockbusters and to observe market trends across domestic box office history.
This dataset contains comprehensive information about Hollywood movies and their domestic performance at the box office. It includes data on films' production year, lifetime gross, ranking and the studio that produced them. By using this dataset, you can analyze the financial successes and failures of films produced by different studios to gain insights into the Hollywood movie market over time.
The 'rank' column shows each film's position relative to the other Hollywood movies in the dataset, based on its box office revenue from theaters (not including other sources such as DVD sales or streaming services), with ticket prices taken into account through inflation adjustment. A rank of 1 denotes the highest gross, so lower numbers indicate greater financial success at the box office.
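Assuming rank 1 denotes the highest inflation-adjusted domestic gross, ranks can be recomputed by sorting on that column. A minimal sketch with toy films; the field names are hypothetical, and ties are broken arbitrarily by sort order.

```python
def rank_by_gross(films):
    """Assign rank 1 to the highest adjusted domestic gross, 2 to the next, and so on."""
    ordered = sorted(films, key=lambda f: f["adjusted_gross"], reverse=True)
    return {f["title"]: rank for rank, f in enumerate(ordered, start=1)}


films = [
    {"title": "Film A", "adjusted_gross": 300},
    {"title": "Film B", "adjusted_gross": 900},
    {"title": "Film C", "adjusted_gross": 600},
]
print(rank_by_gross(films))  # {'Film B': 1, 'Film C': 2, 'Film A': 3}
```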
The ‘title’ column lists every movie analyzed here. On the source site, titles link to pages with background information about each project, such as directorial credits and production history, as well as critics' reviews and ratings from the film's theatrical run across North America (U.S. and Canada).
The ‘studio’ column identifies the company credited with distribution and marketing rights for each film during its original domestic theatrical run. These names often represent divisions of larger media conglomerates, such as Warner Bros., Disney (ABC), and NBCUniversal (Comcast), whose businesses extend beyond motion pictures into music and television; this dataset, however, covers only theatrical feature films.
- Creating a recommendation engine to suggest similar movies based on lifetime gross and year of release.
- Data analysis and visualization of box office trends over time for major Hollywood studios.
- Utilizing the data to recommend alternative ways for movie marketers to invest their advertising budgets in order to maximize their return on investment.
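For the studio-trends use case, a first step is to total lifetime gross per studio and year. A standard-library sketch with hypothetical field names and toy values:

```python
from collections import defaultdict


def gross_by_studio_and_year(films):
    """Sum lifetime gross for each (studio, year) pair."""
    totals = defaultdict(int)
    for film in films:
        totals[(film["studio"], film["year"])] += film["lifetime_gross"]
    return dict(totals)


films = [
    {"studio": "Studio X", "year": 1997, "lifetime_gross": 100},
    {"studio": "Studio X", "year": 1997, "lifetime_gross": 50},
    {"studio": "Studio Y", "year": 1997, "lifetime_gross": 70},
    {"studio": "Studio X", "year": 1998, "lifetime_gross": 20},
]
print(gross_by_studio_and_year(films))
# {('Studio X', 1997): 150, ('Studio Y', 1997): 70, ('Studio X', 1998): 20}
```

Plotting each studio's totals against year then gives the trend view.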
If you use this dataset in your research, please credit the original authors.
License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - **Keep i...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Greenville by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Greenville across both sexes and to determine which sex constitutes the majority.
Key observations
Females make up the majority of the population, at 54.49% of the total. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
This dataset is a part of the main dataset for Greenville Population by Race & Ethnicity.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is a source dataset for a Let's Get Healthy California indicator at https://letsgethealthy.ca.gov/. Infant mortality is defined as the number of deaths in infants under one year of age per 1,000 live births. Infant mortality is often used as an indicator to measure the health and well-being of a community, because factors affecting the health of entire populations can also impact the mortality rate of infants. Although California’s infant mortality rate is better than the national average, there are significant disparities, with African American babies dying at more than twice the rate of other groups. Data are from the Birth Cohort Files. The infant mortality indicator computed from the birth cohort file comprises birth certificate information on all births that occur in a calendar year (denominator) plus death certificate information linked to the birth certificate for those infants who were born in that year but subsequently died within 12 months of birth (numerator). Studies of infant mortality that are based on information from death certificates alone have been found to underestimate infant death rates for infants of all race/ethnic groups, and especially for certain race/ethnic groups, due to problems such as confusion about event registration requirements, incomplete data, and transfers of newborns from one facility to another for medical care. Note that there is a separate data table, "Infant Mortality by Race/Ethnicity", which is based on death records only; it is more timely but less accurate than the Birth Cohort File. A single year is shown to provide state-level data and county totals for the most recent year. Numerator: infant deaths (under age 1 year). Denominator: live births occurring to California state residents. Multiple years are aggregated to allow for stratification at the county level. For this indicator, race/ethnicity is based on the birth certificate information, which records the race/ethnicity of the mother.
The mother can “decline to state”; this is considered to be a valid response. These responses are not displayed on the indicator visualization.
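The indicator is a ratio of linked infant deaths to live births, scaled to 1,000 births. When several years are aggregated for county-level stratification, one natural approach (assumed here, not confirmed by the source) is to pool deaths and births across years before dividing, rather than averaging the yearly rates. A minimal sketch with toy counts; the function names are hypothetical.

```python
def infant_mortality_rate(infant_deaths: int, live_births: int) -> float:
    """Infant deaths (under age 1) per 1,000 live births."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return infant_deaths * 1000 / live_births


def pooled_rate(yearly_counts):
    """Aggregate several years of (infant_deaths, live_births) before computing the rate."""
    deaths = sum(d for d, _ in yearly_counts)
    births = sum(b for _, b in yearly_counts)
    return infant_mortality_rate(deaths, births)


print(infant_mortality_rate(12, 3000))    # 4.0
print(pooled_rate([(1, 400), (3, 600)]))  # 4.0
```

In the pooled example, averaging the two yearly rates (2.5 and 5.0) would instead give 3.75, because it ignores that the second year has more births.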