THIS DATASET WAS LAST UPDATED AT 7:11 AM EASTERN ON DEC. 1
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety, may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
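As a rough illustration of the kind of counts described above, here is a minimal Python sketch that tallies mass killings and mass shootings by year, optionally for a single state. The field names (`year`, `state`, `is_shooting`) and the sample rows are assumptions made up for this example; they are not the database's actual schema or data.

```python
from collections import Counter

# Hypothetical incident records. The field names and values are placeholders
# for illustration only, not rows from the actual database.
SAMPLE_INCIDENTS = [
    {"year": 2018, "state": "TX", "is_shooting": True},
    {"year": 2019, "state": "TX", "is_shooting": True},
    {"year": 2019, "state": "OH", "is_shooting": True},
    {"year": 2019, "state": "OH", "is_shooting": False},
]


def counts_by_year(incidents, state=None):
    """Return {year: (mass_killings, mass_shootings)}, optionally for one state."""
    killings = Counter()
    shootings = Counter()
    for row in incidents:
        # Skip rows from other states when a state filter is given.
        if state is not None and row["state"] != state:
            continue
        killings[row["year"]] += 1
        if row["is_shooting"]:
            shootings[row["year"]] += 1
    return {year: (killings[year], shootings[year]) for year in sorted(killings)}
```

Calling `counts_by_year(SAMPLE_INCIDENTS)` gives nationwide counts for the placeholder rows, and `counts_by_year(SAMPLE_INCIDENTS, state="OH")` restricts them to one state.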
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
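The core of the definition (four or more victims killed within a 24-hour period, intentional killings only) can be sketched as a simple check. This is only an illustration under stated assumptions: the function name and inputs are hypothetical, the caller is assumed to have already excluded offenders and unborn children from the list, and the seven-day spree rule for victim counts is not modeled.

```python
from datetime import datetime, timedelta

MIN_VICTIMS = 4                 # threshold stated in the definition
WINDOW = timedelta(hours=24)    # time frame stated in the definition


def meets_definition(victim_times, intentional=True):
    """Check whether at least MIN_VICTIMS killings fall within one 24-hour window.

    victim_times: datetimes of the killings, excluding offender(s) and
    unborn children (assumed to be filtered out by the caller).
    """
    if not intentional:
        # Negligent homicides (e.g. DUI, accidental fires) are excluded.
        return False
    times = sorted(victim_times)
    # In sorted order, any qualifying group shows up as MIN_VICTIMS
    # consecutive entries whose first and last are at most 24 hours apart.
    for i in range(len(times) - MIN_VICTIMS + 1):
        if times[i + MIN_VICTIMS - 1] - times[i] <= WINDOW:
            return True
    return False
```

Treating exactly 24 hours as inside the window is a choice made for this sketch; the source does not say how boundary cases are handled.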
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
http://opendatacommons.org/licenses/dbcl/1.0/
Mass Shootings in the United States of America (1966-2017)
The US has witnessed 398 mass shootings in the last 50 years, resulting in 1,996 deaths and 2,488 injuries. The latest and worst mass shooting, on October 2, 2017, killed 58 and injured 515 so far. The number of people injured in this attack is more than the number injured in all mass shootings of 2015 and 2016 combined. Over the last 50 years, the average is 7 mass shootings per year, claiming 39 lives and injuring 48 people per year.
Geography: United States of America
Time period: 1966-2017
Unit of analysis: Mass Shooting Attack
Dataset: The dataset contains detailed information of 398 mass shootings in the United States of America that killed 1996 and injured 2488 people.
Variables: The dataset contains Serial No, Title, Location, Date, Summary, Fatalities, Injured, Total Victims, Mental Health Issue, Race, Gender, and Lat-Long information.
I’ve consulted several public datasets and web pages to compile this data. Some of the major data sources include Wikipedia, Mother Jones, Stanford, USA Today and other web sources.
With a broken heart, I would like to call on my fellow Kagglers to use Machine Learning and Data Science to help me explore these ideas:
• How many people got killed and injured per year?
• Visualize mass shootings on the U.S map
• Is there any correlation with the shooter's race or gender?
• Any correlation with calendar dates? Do we have more deadly days, weeks or months on average?
• What cities and states are more prone to such attacks?
• Can you find and combine any other external datasets to enrich the analysis, for example, gun ownership by state
• Any other pattern you see that can help in prediction, crowd safety or in-depth analysis of the event
• How many shooters have some kind of mental health problem? Can we compare that shooter with general population with same condition
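The first question in the list (killed and injured per year) can be approached with a small aggregation over the Date, Fatalities, and Injured columns named above. The sketch below is only a starting point: the M/D/YYYY date format is an assumption, and the sample rows are made-up placeholders rather than rows from the dataset.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Placeholder rows for illustration only; not taken from the dataset.
SAMPLE_CSV = """Date,Fatalities,Injured
1/15/2016,4,2
6/30/2016,5,0
3/9/2017,6,1
"""


def totals_per_year(csv_text, date_format="%m/%d/%Y"):
    """Sum Fatalities and Injured per calendar year.

    date_format is an assumption; adjust it to match the actual file.
    """
    totals = defaultdict(lambda: {"Fatalities": 0, "Injured": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = datetime.strptime(row["Date"], date_format).year
        totals[year]["Fatalities"] += int(row["Fatalities"])
        totals[year]["Injured"] += int(row["Injured"])
    return dict(totals)
```

With the real file, the text passed in would come from reading the CSV from disk, and rows with missing values would need to be handled before the `int()` conversions.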
This is the new Version of Mass Shootings Dataset. I've added eight new variables:
• Age, Employed and Employed at (3 variables) contain shooter details
• Quite a few missing values have been added
• Three more recent mass shootings have been added including the Texas Church shooting of November 5, 2017
I hope it will help create more visualization and extract patterns.
Keep Coding!
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.
February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
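The per-100,000 rates mentioned above are a simple scaling of a count by population. A minimal sketch follows; the function name is hypothetical and the numbers in the usage note are illustrative, not taken from the dataset.

```python
def rate_per_100k(count, population):
    """Scale a raw count (cases or deaths) to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so that whole-number inputs stay exact where possible.
    return count * 100_000 / population
```

For example, 50 cases in a county of 200,000 people works out to `rate_per_100k(50, 200_000)`, or 25 cases per 100,000.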
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
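The hotspot queries above rely on a 7-day rolling average of new cases per capita. The AP queries themselves are not reproduced here; the following is a minimal Python sketch of the same idea, with hypothetical function names, assuming a list of daily new-case counts in date order and a known population.

```python
def rolling_mean(values, window=7):
    """Trailing mean over `window` values; None until a full window exists."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just left the window.
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def rolling_new_cases_per_100k(daily_new_cases, population, window=7):
    """Rolling average of new cases, scaled to a rate per 100,000 residents."""
    return [
        None if mean is None else mean * 100_000 / population
        for mean in rolling_mean(daily_new_cases, window)
    ]
```

As noted above, per-capita rankings can push very small counties to the top, so it is worth reporting the raw counts alongside any rate computed this way.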
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data
- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
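Because the timeseries stores cumulative snapshots, daily new counts have to be derived by differencing, and a downward revision by a source shows up as a negative daily value. The sketch below illustrates this with hypothetical function names and made-up numbers in the usage note; it is not how AP or Johns Hopkins process the feed.

```python
def daily_changes(cumulative):
    """Return day-over-day differences of a cumulative series."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def downward_revisions(cumulative):
    """Return the indices (into `cumulative`) where the running total dropped."""
    return [
        i + 1
        for i, change in enumerate(daily_changes(cumulative))
        if change < 0
    ]
```

For a made-up series such as `[100, 120, 115, 130]`, the third snapshot is lower than the second, so `downward_revisions` flags index 2. How to treat such days (drop, zero out, or redistribute) is an analysis decision the source leaves open.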
This data should be credited to Johns Hopkins University COVID-19 tracking project
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Washington Post compiled a dataset of every fatal shooting in the United States by a police officer in the line of duty since Jan. 1, 2015.
In 2015, The Post began tracking more than a dozen details about each killing by culling local news reports, law enforcement websites and social media and by monitoring independent databases such as Killed by Police and Fatal Encounters. The available features are:
- Race of the deceased;
- Circumstances of the shooting;
- Whether the person was armed;
- Whether the victim was experiencing a mental-health crisis;
- Among others.
In 2016, The Post is gathering additional information about each fatal shooting that occurs this year and is filing open-records requests with departments. More than a dozen additional details are being collected about officers in each shooting.
The Post is documenting only those shootings in which a police officer, in the line of duty, shot and killed a civilian — the circumstances that most closely parallel the 2014 killing of Michael Brown in Ferguson, Mo., which began the protest movement culminating in Black Lives Matter and an increased focus on police accountability nationwide. The Post is not tracking deaths of people in police custody, fatal shootings by off-duty officers or non-shooting deaths.
The FBI and the Centers for Disease Control and Prevention log fatal shootings by police, but officials acknowledge that their data is incomplete. In 2015, The Post documented more than two times more fatal shootings by police than had been recorded by the FBI. Last year, the FBI announced plans to overhaul how it tracks fatal police encounters.
If you use this dataset in your research, please credit the authors.
BibTeX
@misc{wapo-police-shootings-bot,
  author = {The Washington Post},
  title = {data-police-shootings},
  month = jan,
  year = 2015,
  publisher = {Github},
  url = {https://github.com/washingtonpost/data-police-shootings}
}
License
CC BY NC SA 4.0
Splash banner
Image by pixabay available on pexels.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Provisional counts of the number of deaths registered in England and Wales, by age, sex, region and Index of Multiple Deprivation (IMD), in the latest weeks for which data are available.
The American Civil War is the conflict with the largest number of American military fatalities in history. In fact, the Civil War's death toll is comparable to all other major wars combined, the deadliest of which were the World Wars, which have a combined death toll of more than 520,000 American fatalities. The ongoing series of conflicts and interventions in the Middle East and North Africa, collectively referred to as the War on Terror in the west, has a combined death toll of more than 7,000 for the U.S. military since 2001.
Other records
In terms of the number of deaths per day, the American Civil War is still at the top, with an average of 425 deaths per day, while the First and Second World Wars have averages of roughly 100 and 200 fatalities per day respectively. Technically, the costliest battle in U.S. military history was the Battle of Elsenborn Ridge, which was a part of the Battle of the Bulge in the Second World War, and saw upwards of 5,000 deaths over 10 days. However, the Battle of Gettysburg had more military fatalities of American soldiers, with almost 3,200 Union deaths and over 3,900 Confederate deaths, giving a combined total of more than 7,000. The Battle of Antietam is viewed as the bloodiest day in American military history, with over 3,600 combined fatalities and almost 23,000 total casualties on September 17, 1862.
Revised Civil War figures
For more than a century, the total death toll of the American Civil War was generally accepted to be around 620,000, a number which was first proposed by Union historians William F. Fox and Thomas L. Livermore in 1888. This number was calculated by using enlistment figures, battle reports, and census data; however, many prominent historians since then have thought the number should be higher. In 2011, historian J. David Hacker conducted further investigations and claimed that the number was closer to 750,000 (and possibly as high as 850,000). While many Civil War historians agree that this is possible, and even likely, obtaining consistently accurate figures has proven to be impossible until now; both sides were poor at keeping detailed records throughout the war, and much of the Confederacy's records were lost by the war's end. Many Confederate widows also did not register their husbands' deaths with the authorities, as they would have then been ineligible for benefits.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the Kill Devil Hills population over the last 20 plus years. It lists the population for each year, along with the year on year change in population, as well as the change in percentage terms for each year. The dataset can be utilized to understand the population change of Kill Devil Hills across the last two decades. For example, using this dataset, we can identify whether the population is declining or increasing and, if it is changing, when the population peaked or whether it is still growing and has not yet reached its peak. We can also compare the trend with the overall trend of United States population over the same period of time.
Key observations
In 2023, the population of Kill Devil Hills was 7,778, a 0.32% year-over-year decrease from 2022. Previously, in 2022, the population of Kill Devil Hills was 7,803, a decline of 0.12% compared to a population of 7,812 in 2021. Over the last 20 plus years, between 2000 and 2023, the population of Kill Devil Hills increased by 1,865. In this period, the peak population was 7,812 in the year 2021. The numbers suggest that the population has already reached its peak and is showing a trend of decline. Source: U.S. Census Bureau Population Estimates Program (PEP).
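The year-over-year figures quoted above can be reproduced with a short calculation. The sketch below uses the three population values stated in the text (2021 to 2023); the function name is hypothetical, and it assumes the years supplied are consecutive.

```python
def year_over_year(populations):
    """Map each year to (absolute change, percent change) versus the prior year.

    populations: {year: population}; years are assumed to be consecutive.
    """
    years = sorted(populations)
    changes = {}
    for previous, current in zip(years, years[1:]):
        change = populations[current] - populations[previous]
        changes[current] = (change, change / populations[previous] * 100)
    return changes


# Values quoted in the key observations above.
KILL_DEVIL_HILLS = {2021: 7812, 2022: 7803, 2023: 7778}
```

Rounding the percent change to two decimals gives -0.12 for 2022 and -0.32 for 2023, matching the figures in the text.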
When available, the data consists of estimates from the U.S. Census Bureau Population Estimates Program (PEP).
Data Coverage:
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.
Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Kill Devil Hills Population by Year. You can refer to it here
The American Community Survey (ACS) is an ongoing survey that provides data every year -- giving communities the current information they need to plan investments and services. The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population. Much of the ACS data provided on the Census Bureau's Web site are available separately by age group, race, Hispanic origin, and sex. Summary files, Subject tables, Data profiles, and Comparison profiles are available for the nation, all 50 states, the District of Columbia, Puerto Rico, every congressional district, every metropolitan area, and all counties and places with populations of 65,000 or more. Detailed Tables contain the most detailed cross-tabulations published for areas 65k and more. The data are population counts. There are over 31,000 variables in this dataset.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
The graph depicts the number of AIDS-related deaths in the United States annually from 1981 to 2021. The x-axis represents the years, labeled with two-digit abbreviations from '81 to '21, while the y-axis shows the number of deaths in thousands. Over this 41-year span, AIDS deaths increased dramatically from 1,675.77 in 1981, reaching a peak of 43,276.94 in 1994, and then declined significantly to 6,306.24 by 2021. The data highlights a sharp upward trend in the early years of the epidemic, followed by a substantial downward trend starting in the mid-1990s, reflecting improvements in treatment and prevention. The information is presented in a line graph format, effectively illustrating the rise and subsequent decline in AIDS-related fatalities over the four decades.
Selected variables from the most recent ACS Community Survey (Released 2023) aggregated by Community Area. Additional years will be added as they become available.
The underlying algorithm to create the dataset calculates the % of a census tract that falls within the boundaries of a given community area. Given that census tracts and community area boundaries are not aligned, these figures should be considered an estimate.
Total population in this dataset: 2,647,621
Total Chicago Population Per ACS 2023: 2,664,452
% Difference: -0.632%
There are different approaches in common use for displaying Hispanic or Latino population counts. In this dataset, following the approach taken by the Census Bureau, a person who identifies as Hispanic or Latino will also be counted in the race category with which they identify. However, again following the Census Bureau data, there is also a column for White Not Hispanic or Latino.
Code can be found here: https://github.com/Chicago/5-Year-ACS-Survey-Data
Community Area Shapefile: https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas-current-/cauq-8yn6
Census Area Python Package Documentation: https://census-area.readthedocs.io/en/latest/index.html
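The tract-to-community-area apportionment and the percent difference quoted above can be illustrated with a small sketch. This is not the code from the linked repository; the function names, the overlap-share input format, and the tract and area identifiers in the usage note are all assumptions made for illustration.

```python
def apportion(tract_values, overlap_share):
    """Allocate tract counts to areas in proportion to geographic overlap.

    tract_values: {tract_id: count}
    overlap_share: {(tract_id, area_id): fraction of the tract inside the area}
    """
    totals = {}
    for (tract, area), share in overlap_share.items():
        totals[area] = totals.get(area, 0.0) + tract_values[tract] * share
    return totals


def percent_difference(estimate, reference):
    """Signed difference of an estimate from a reference total, in percent."""
    return (estimate - reference) / reference * 100
```

With a made-up tract `"A"` of 1,000 people split 75/25 between areas `"X"` and `"Y"`, `apportion` assigns 750 to `"X"` and 250 to `"Y"`. Applying `percent_difference(2_647_621, 2_664_452)` to the two totals stated above and rounding to three decimals gives -0.632, the figure quoted in the description.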
By Health [source]
The Behavioral Risk Factor Surveillance System (BRFSS) offers an expansive collection of data on health-related quality of life (HRQOL) from 1993 to 2010. Over this time period, the Health-Related Quality of Life dataset consists of a comprehensive survey reflecting the health and well-being of non-institutionalized US adults aged 18 years or older. The data collected can help track and identify unmet population health needs, recognize trends, identify disparities in healthcare, identify determinants of public health, inform decision making and policy development, and evaluate programs within public healthcare services.
The HRQOL surveillance system has developed a compact set of HRQOL measures, such as a summary measure of unhealthy days, which have been validated for population health surveillance and widely used in practice since 1993. The dataset includes the year recorded, location abbreviations and descriptions, category and topic overviews, the survey questions asked, the type and unit of each data value, sample sizes, and geographic locations.
This dataset tracks the Health-Related Quality of Life (HRQOL) from 1993 to 2010 using data from the Behavioral Risk Factor Surveillance System (BRFSS). This dataset includes information on the year, location abbreviation, location description, type and unit of data value, sample size, category and topic of survey questions.
Using this dataset on BRFSS: HRQOL data between 1993-2010 will allow for a variety of analyses related to population health needs. The compact set of HRQOL measures can be used to identify trends in population health needs as well as determine disparities among various locations. Additionally, responses to survey questions can be used to inform decision making and program and policy development in public health initiatives.
- Analyzing trends in HRQOL over the years by location to identify disparities in health outcomes between different populations and develop targeted policy interventions.
- Developing new models for predicting HRQOL indicators at a regional level, and using this information to inform medical practice and public health implementation efforts.
- Using the data to understand differences between states in terms of their HRQOL scores and establish best practices for healthcare provision based on that understanding, including areas such as access to care, preventative care services availability, etc
If you use this dataset in your research, please credit the original authors. Data Source
See the dataset description for more information.
File: rows.csv
| Column name | Description |
|:-------------------------------|:----------------------------------------------------------|
| Year | Year of survey. (Integer) |
| LocationAbbr | Abbreviation of location. (String) |
| LocationDesc | Description of location. (String) |
| Category | Category of survey. (String) |
| Topic | Topic of survey. (String) |
| Question | Question asked in survey. (String) |
| DataSource | Source of data. (String) |
| Data_Value_Unit | Unit of data value. (String) |
| Data_Value_Type | Type of data value. (String) |
| Data_Value_Footnote_Symbol | Footnote symbol for data value. (String) |
| Data_Value_Std_Err | Standard error of the data value. (Float) |
| Sample_Size | Sample size used in sample. (Integer) |
| Break_Out | Break out categories used. (String) |
| Break_Out_Category | Type break out assessed. (String) |
| **GeoLocation*...
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Age-standardised mortality rates for deaths involving coronavirus (COVID-19), non-COVID-19 deaths and all deaths by vaccination status, broken down by age group.
By Danny [source]
This dataset contains US county-level demographic data from 2016, giving insight into the health and economic conditions of counties in the United States. Aggregated and filtered from various sources such as the US Census Small Area Income and Poverty Estimates (SAIPE) Program, American Community Survey, CDC National Center for Health Statistics, and more, this comprehensive dataset provides information on population as well as desert population for each county. Additionally, data is split between metropolitan and nonmetropolitan areas according to the Office of Management and Budget's 2013 classification scheme. Valuable information pertaining to infant mortality rates and total population are also included in this detailed set of data. Use this dataset to gain a better understanding of one of our nation's most essential regions
- Look at the information within the 'About this Dataset' section to have an understanding of what data sources were used to create this dataset as well as any transformations that may have been done while creating it.
- Familiarize yourself with the columns provided in the data set to understand what information is available for each county, such as total population (totpop), parental education level (educationLvl), median household income (medianIncome), etc.
- Use a combination of filtering and sorting techniques to narrow down results and focus on the specific county demographics you are looking for, such as total households living below the poverty line by state or median household income per capita between two counties.
- Keep in mind any additional transformations/simplifications/aggregations done during step 2 when using your data for analysis. For example, if certain variables were pivoted during step two from being rows into columns because it was easier to work with multiple years of income levels by having them all consolidated into one column then be aware that some states may not appear in all records due to those transformations being applied differently between regions which could result in missing values or other inconsistencies when doing downstream analysis on your selected variables.
- Utilize resources such as Wikipedia and government census estimates if you need more detailed information surrounding these demographic characteristics beyond what's available within our current dataset – these can be helpful when conducting further research outside of solely relying on our provided spreadsheet values alone!
- Creating a US county-level heat map of infant mortality rates, offering insight into which areas are most at risk for poor health outcomes.
- Generating predictive models from the population data to anticipate and prepare for future population trends in different states or regions.
- Developing an interactive web-based tool for school districts to explore potential impacts of student mobility on their area's population stability and diversity
If you use this dataset in your research, please credit the original authors. Data Source
License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - Keep intact - all notices that refer to this license, including copyright notices.
File: Food Desert.csv
| Column name | Description |
|:--------------------|:----------------------------------------------------------------------------------|
| year | The year the data was collected. (Integer) |
| fips | The Federal Information Processing Standard (FIPS) code for the county. (Integer) |
| state_fips | The FIPS code for the state. (Integer) |
| county_fips | The FIPS code for the county. (Integer)...
Most estimates place the total number of deaths during the Second World War at around 70-85 million people. Approximately 17 million of these deaths (20-25 percent of the total) were due to crimes against humanity carried out by the Nazi regime in Europe. In comparison to the millions of deaths that took place through conflict, famine, or disease, these 17 million stand out due to the reasoning behind them, along with the systematic nature and scale in which they were carried out. Nazi ideology claimed that the Aryan race (a non-existent ethnic group referring to northern Europeans) was superior to all other ethnicities; this became the justification for German expansion and the extermination of others. During the war, millions of people deemed to be of lesser races were captured and used as slave laborers, with a large share dying of exhaustion, starvation, or individual execution. Murder campaigns were also used for systematic extermination; the most famous of these were the extermination camps, such as at Auschwitz, where roughly 80 percent of the 1.1 million victims were murdered in gas chambers upon arrival at the camp. German death squads in Eastern Europe carried out widespread mass shootings, and up to two million people were killed in this way. In Germany itself, many disabled, homosexual, and "undesirables" were also killed or euthanized as part of a wider eugenics program, which aimed to "purify" German society.
The Holocaust
Of all races, the Nazis viewed Jews as being the most inferior. Conspiracy theories involving Jews go back for centuries in Europe, and Jews have been repeatedly marginalized throughout history. German fascists used the Jews as scapegoats for the economic struggles during the interwar period. Following Hitler's ascendency to the Chancellorship in 1933, the German authorities began constructing concentration camps for political opponents and so-called undesirables, but the share of Jews being transported to these camps gradually increased in the following years, particularly after Kristallnacht (the Night of Broken Glass) in 1938. In 1939, Germany invaded Poland, home to Europe's largest Jewish population. German authorities segregated the Jewish population into ghettos, and constructed thousands more concentration and detention camps across Eastern Europe, to which millions of Jews were transported from other territories. By the end of the war, over two thirds of Europe's Jewish population had been killed, and this share is higher still when one excludes the neutral or non-annexed territories.
Lebensraum
Another key aspect of Nazi ideology was that of Lebensraum (living space). The populations of both the Soviet Union and the United States were heavily concentrated on one side of the country, with vast territories extending to the east and west, respectively. Germany was much smaller and more densely populated; Hitler therefore aspired to extend Germany's territory to the east and create new "living space" for Germany's population and industry to grow. While Hitler may have envied the U.S. in this regard, the USSR was seen as undeserving; Slavs were the largest major group in the east and the Nazis viewed them as inferior, which was again used to justify the annexation of their land and subjugation of their people. As the Germans took Slavic lands in Poland, the USSR, and Yugoslavia, ethnic cleansings (often with the help of local conspirators) became commonplace in the annexed territories. It is also believed that the majority of Soviet prisoners of war (PoWs) died through starvation and disease, and they were not given the same treatment as PoWs on the western front. The Soviet Union lost as many as 27 million people during the war, and 10 million of these were due to Nazi genocide. It is estimated that Poland lost up to six million people, and almost all of these were through genocide.
https://creativecommons.org/publicdomain/zero/1.0/
This directory contains the data behind the story Where Police Have Killed Americans In 2015.
We linked entries from the Guardian's database on police killings to census data from the American Community Survey. The Guardian data was downloaded on June 2, 2015. More information about its database is available here.
Census data was calculated at the tract level from the 2015 5-year American Community Survey using the tables S0601 (demographics), S1901 (tract-level income and poverty), S1701 (employment and education) and DP03 (county-level income). Census tracts were determined by geocoding addresses to latitude/longitude using the Bing Maps and Google Maps APIs and then overlaying points onto 2014 census tracts. GEOIDs are census-standard and should be easily joinable to other ACS tables -- let us know if you find anything interesting.
Field descriptions:
| Header | Description | Source |
|---|---|---|
| name | Name of deceased | Guardian |
| age | Age of deceased | Guardian |
| gender | Gender of deceased | Guardian |
| raceethnicity | Race/ethnicity of deceased | Guardian |
| month | Month of killing | Guardian |
| day | Day of incident | Guardian |
| year | Year of incident | Guardian |
| streetaddress | Address/intersection where incident occurred | Guardian |
| city | City where incident occurred | Guardian |
| state | State where incident occurred | Guardian |
| latitude | Latitude, geocoded from address | |
| longitude | Longitude, geocoded from address | |
| state_fp | State FIPS code | Census |
| county_fp | County FIPS code | Census |
| tract_ce | Tract ID code | Census |
| geo_id | Combined tract ID code | |
| county_id | Combined county ID code | |
| namelsad | Tract description | Census |
| lawenforcementagency | Agency involved in incident | Guardian |
| cause | Cause of death | Guardian |
| armed | How/whether deceased was armed | Guardian |
| pop | Tract population | Census |
| share_white | Share of pop that is non-Hispanic white | Census |
| share_black | Share of pop that is black (alone, not in combination) | Census |
| share_hispanic | Share of pop that is Hispanic/Latino (any race) | Census |
| p_income | Tract-level median personal income | Census |
| h_income | Tract-level median household income | Census |
| county_income | County-level median household income | Census |
| comp_income | h_income / county_income | Calculated from Census |
| county_bucket | Household income, quintile within county | Calculated from Census |
| nat_bucket | Household income, quintile nationally | Calculated from Census |
| pov | Tract-level poverty rate (official) | Census |
| urate | Tract-level unemployment rate | Calculated from Census |
| college | Share of 25+ pop with BA or higher | Calculated from Census |
Note regarding income calculations:
All income fields are in inflation-adjusted 2013 dollars.
comp_income is simply tract-level median household income as a share of county-level median household income.
county_bucket provides where the tract's median household income falls in the distribution (by quintile) of all tracts in the county. (1 indicates a tract falls in the poorest 20% of tracts within the county.) Distribution is not weighted by population.
nat_bucket is the same but for all U.S. counties.
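As a rough illustration of the derived income fields, the sketch below computes comp_income and an unweighted quintile bucket for a handful of made-up tracts. The tract values, the function names, and the rank-based cut-off rule are assumptions for illustration only; the notes above do not say exactly how tracts on a quintile boundary were assigned.

```python
import math

def comp_income(h_income, county_income):
    """Tract median household income as a share of the county median."""
    return h_income / county_income

def quintile_bucket(value, all_values):
    """Return 1-5, where 1 means the value sits in the poorest 20% of all_values.

    Unweighted: every tract counts once, regardless of its population.
    """
    # Number of tracts with income at or below this one.
    rank = sum(1 for v in all_values if v <= value)
    # Map the rank onto five equal-sized groups (assumed boundary rule).
    return max(1, math.ceil(5 * rank / len(all_values)))

# Hypothetical tract-level median household incomes within one county.
county_tracts = [22000, 31000, 38000, 45000, 52000,
                 60000, 71000, 83000, 97000, 120000]
county_median = 56000  # hypothetical county-level median household income

print(round(comp_income(45000, county_median), 3))
print(quintile_bucket(22000, county_tracts))
print(quintile_bucket(120000, county_tracts))
```

Passing the tracts of a single county gives a county_bucket-style value; passing a national list of incomes instead gives the nat_bucket analogue.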
This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!
This dataset is maintained using GitHub's API and Kaggle's API.
This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.
https://creativecommons.org/publicdomain/zero/1.0/
This is a dataset of the most highly populated city (if applicable) in a form easy to join with the COVID19 Global Forecasting (Week 1) dataset. You can see how to use it in this kernel.
There are four columns. The first two correspond to columns from the original COVID19 Global Forecasting (Week 1) dataset. The other two describe the highest population density, at city level, for the given country/state. Note that some countries are very small, and in those cases the population density reflects the entire country. Since the original dataset includes a few cruise ships as well, I've added them too.
Thanks a lot to Kaggle for this competition that gave me the opportunity to look closely at some data and understand this problem better.
Summary: I believe that the square root of the population density should relate to the logistic growth factor of the SIR model. I think the SEIR model isn't applicable due to any intervention being too late for a fast-spreading virus like this, especially in places with dense populations.
After playing with the data provided in COVID19 Global Forecasting (Week 1) (and everything else online or media) a bit, one thing becomes clear. They have nothing to do with epidemiology. They reflect sociopolitical characteristics of a country/state and, more specifically, the reactivity and attitude towards testing.
The testing method used (PCR tests) means that what we measure could potentially be a proxy for the number of people infected during the last 3 weeks, i.e., the growth (with lag). It's not how many people have been infected and recovered. Antibody or serology tests would measure that, and by using them, we could go back to normality faster... but those will arrive too late. Well before then, China will have shown experimentally that it's safe to go back to normal as soon as the number of newly infected per day is close to zero.
My view, as a person living in NYC, is that by the time governments react to media pressure to lock down or even test, it's too late. In dense areas, everyone susceptible has already had ample opportunity to be infected. Especially for a virus with a 5-14 day lag between infection and symptoms, a period during which hosts spread it all over the subway, the conditions are hopeless. Active populations have already been exposed, mostly without symptoms, and have recovered. Sensitive/older populations are more self-isolated/careful in affluent societies (maybe this isn't the case in Northern Italy). As the virus finishes exploring the active population, it starts penetrating the more isolated ones. At this point in time, the first fatalities happen. Then testing starts. Then the media and the lockdown. Lockdown seems overly effective because it coincides with the tail of the disease spread. It helps slow down the virus exploring the long tail of the sensitive population, and we should all contribute by doing it, but it doesn't cause the end of the disease. If it did, then as soon as people were back in the streets (see China), there would be repeated outbreaks.
Smart politicians will test a lot because it will make their situation look worse, which helps them demand more resources. At the same time, they will have a low fatality rate due to the large denominator. They can take credit for managing a disproportionately major crisis well, in contrast to those who didn't test.
We were lucky this time. We, Westerners, have woken up to the potential of a pandemic. I'm sure we will give further resources for prevention. Additionally, we will be more open-minded, helping politicians to have more direct responses. We will also require them to be more responsible in their messages and reactions.
By CrowdFlower [source]
This dataset contains comprehensive information on Indian terrorism deaths, including death tolls due to violence, civilian deaths and militant/terrorist/insurgent fatalities. Accurate estimates from 27,233 sentences sourced and verified from the South Asia Terrorism Portal are provided for every incident. Each row of the dataset includes variables corresponding to the state, district, and date reported on, as well as features indicating the accuracy of judgments for that row. Golden rows indicate maximum accuracy levels for these details and include totals for civilians killed or injured according to the gold standard. Additionally, features such as the trusted judgments count and the extracted subjects and objects of sentences can be derived from this dataset, giving researchers access to key aspects of India's current situation related to lethal force events.
This dataset provides information on deaths that have occurred in India due to terrorism, as well as the incident details. This can be a valuable source of information for researchers looking to better understand the impacts of terrorism on Indian society and the associated prevention measures.
Here’s how you can use this dataset:
- Analyze death tolls by type (civilian, militant/terrorist/insurgent, security forces): Use descriptive statistics functions to compare and contrast the number of deaths among civilians, militants/terrorists/insurgents, and security forces over time. You could also look for correlations between these types of incidents and other factors such as region or date.
- Explore different regions impacted by terrorism: Explore which states or districts in India are affected most adversely by terrorist activities using location data from this dataset. You could also examine trends related to where incidents take place over time as well as total cumulative death counts per region; these findings may help inform where intense anti-terrorism efforts are required most.
- Generate insight on key dates of events: Utilize date fields such as report date or last judgment at in order to pinpoint when certain major events have taken place related to terrorism in India; you could then dive deeper into any relevant context surrounding those dates that may spark further curiosity into the topic itself (e.g., who was involved? what was going on politically?)
- Identifying trends in the number of deaths for different types of people over time in each district, state and country. This can be used to identify areas where violence is increasing or decreasing, and help develop interventions to reduce casualties from terrorism.
- Investigating correlations between the type of people killed (civilians, militants/terrorists/insurgents etc.) and other factors such as political instability or development levels in the region.
- Performing sentiment analysis on the sentences found in this dataset to measure how public opinion about terrorism is changing over time. This could be combined with other datasets such as media coverage to provide an even more comprehensive understanding of public attitudes towards terrorism
If you use this dataset in your research, please credit the original authors. Data Source
Unknown License - Please check the dataset description for more information.
File: deaths-in-india-satp-dfe.csv

| Column name | Description |
|:---|:---|
| _golden | A boolean value indicating whether the annotation is a golden annotation or not. (Boolean) |
| _unit_state | A value indicating the state of the annotation unit. (String) |
| _trusted_judgments | The number of trusted judgments for the annotation unit. (Integer) |
| _last_judgment_at | The date and time of the last judgment for the annotation unit. (DateTime) |

...
http://opendatacommons.org/licenses/dbcl/1.0/
The world is becoming more modernized by the year, and with this it is becoming all the more polluted.
This data was pulled from the U.S. Energy Information Administration and joined together for easier analysis. It's a collection of some big factors that play into CO2 emissions, covering the production and consumption of each major type of energy source for each country, along with its pollution rating each year. It also includes each country's GDP, population, energy intensity per capita (person), and energy intensity per GDP (per person GDP). The data spans from the 1980s to 2020.
By Andy Kriebel [source]
This dataset contains information on the world's harvested crops. The data includes the value of the crop, the country of origin, the year of harvest, and more. This data can be used to understand which crops are the most valuable, and how this value has changed over time.
This dataset provides information on the world's most profitable cash crops. The data includes the value of the crop, the country of origin, and the year of harvest. This dataset can be used to understand which crops are most valuable and how this value has changed over time.
- To find out which crops are grown in which countries
- To find out the value of harvested crops by country and year
- To find out the world's biggest cash crop
License
License: Dataset copyright by authors
- You are free to:
  - Share - copy and redistribute the material in any medium or format for any purpose, even commercially.
  - Adapt - remix, transform, and build upon the material for any purpose, even commercially.
- You must:
  - Give appropriate credit - Provide a link to the license, and indicate if changes were made.
  - ShareAlike - You must distribute your contributions under the same license as the original.
  - Keep intact - all notices that refer to this license, including copyright notices.
File: Harvested Crops.csv

| Column name | Description |
|:---|:---|
| Area | The country where the crop was grown. (String) |
| Element | The type of crop. (String) |
| Item | The name of the crop. (String) |
| Year | The year the crop was harvested. (Integer) |
| Unit | The unit of measurement for the value of the crop. (String) |
| Value | The value of the crop. (Float) |
| Flag | A code that indicates the quality of the data. (String) |
| Flag Description | A description of the flag code. (String) |
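One way to approach the "biggest crop" question with these columns is sketched below. It is only a minimal illustration: the rows are made up, the `Element` and `Unit` values shown are guesses at what the file contains, and the helper name `total_value_by_item` is hypothetical. Only the column names come from the table above.

```python
import csv
import io
from collections import defaultdict

# Made-up rows that follow the column layout of Harvested Crops.csv.
SAMPLE = """Area,Element,Item,Year,Unit,Value
Brazil,Area harvested,Soybeans,2018,ha,100.0
Brazil,Area harvested,Maize,2018,ha,60.0
India,Area harvested,Rice,2018,ha,90.0
India,Area harvested,Soybeans,2018,ha,30.0
"""

def total_value_by_item(rows, year):
    """Sum the Value column per Item across all countries for one year."""
    totals = defaultdict(float)
    for row in rows:
        if int(row["Year"]) == year:
            totals[row["Item"]] += float(row["Value"])
    return dict(totals)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals = total_value_by_item(rows, 2018)
top_item = max(totals, key=totals.get)
print(top_item, totals[top_item])
```

With the real file, it would be worth checking that all rows being summed share the same Unit before comparing totals.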
File: Harvested Crops Summary.csv
If you use this dataset in your research, please credit Andy Kriebel.
USAspending.gov is the government's official tool for tracking spending; it shows where money goes and who benefits from federal funds.
The Federal Funding Accountability and Transparency Act of 2006 required that federal contract, grant, and loan awards over $25k be searchable online to give the American public access to information on government spending. The data collected in USAspending.gov is derived from data gathered at more than a hundred agencies, as well as other government systems. Federal agencies submit contracts, grants, loans and other awards information to be uploaded on USAspending.gov at least twice a month.
The United States spends a lot of money on contracts every year, but where does it all go? This data set has information about how much different agencies have spent on awards for fiscal year 2021. More data for other years can be downloaded from USAspending.gov.
Contracts are published to the GSA's Federal Procurement Data System within five days of being awarded, with contract reporting automatically posted on USAspending.gov by 9 a.m. the next day and going live at 8 a.m. EST two mornings later.
Learn more about the contents here: https://www.usaspending.gov/data-dictionary
The Bureau of the Fiscal Service, United States Department of the Treasury, is dedicated to making government spending data available to everyone.
This data starts off separated into smaller files that need to be joined.
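If "joined" here means stacking several files that share the same header, a minimal sketch looks like the following. The file names, column names, and the helper `combine_csv_files` are hypothetical; the demonstration writes two tiny made-up files to a temporary folder so that it runs on its own.

```python
import csv
import glob
import os
import tempfile

def combine_csv_files(pattern):
    """Read every CSV matching the pattern and return one list of row dicts."""
    combined = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8") as handle:
            combined.extend(csv.DictReader(handle))
    return combined

# Demonstration with two made-up files (hypothetical names and columns).
with tempfile.TemporaryDirectory() as folder:
    for index, amount in enumerate(["100.0", "250.5"], start=1):
        path = os.path.join(folder, f"contracts_part_{index}.csv")
        with open(path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["awarding_agency", "obligated_amount"])
            writer.writerow([f"Agency {index}", amount])
    rows = combine_csv_files(os.path.join(folder, "contracts_part_*.csv"))

print(len(rows))
```

If the files instead need to be linked on a shared key, a keyed merge would be required rather than simple stacking.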
The federal government buys a lot of things, like office furniture and aircraft. It also buys services, like telephone and Internet access. The Federal Government and its sub-agencies use contracts to buy these things. They use Product and Service Codes (PSC) to classify the items and services they purchase.
An obligation is a promise to spend money. An outlay is when the government spends money. When the government enters into a contract or grant, it promises to spend all of the money. This is so it can pay people who do what they agreed to do. When the government actually pays someone, then it counts as an outlay.
There are many different variables in this database, which are spread across multiple files. The most important ones to start learning are:
To learn more about the data, you can reference the data dictionary. The data dictionary includes information on outlays, which are not included in the data provided here. https://www.usaspending.gov/data-dictionary
Please see the analyst's guide for more information: https://datalab.usaspending.gov/analyst-guide/
The U.S. Department of the Treasury, Bureau of the Fiscal Service is committed to providing open data to enable effective tracking of federal spending. The data is available to copy, adapt, redistribute, or otherwise use for non-commercial or for commercial purposes, subject to the Limitation on Permissible Use of Dun & Bradstreet, Inc. Data noted on the homepage. https://www.usaspending.gov/db_info
USAspending.gov collects data from all over the government to provide information to the public. Special thanks to the Data Transparency Team within the Office of the Chief Data Officer at the Bureau of the Fiscal Service.
Can we find any patterns to help the public? How about predicting future spending needs or opportunities? Test out your ideas here!
THIS DATASET WAS LAST UPDATED AT 7:11 AM EASTERN ON DEC. 1
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety, may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
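The queries themselves do not appear in this text. As a stand-in, the sketch below shows one way such yearly counts could be produced in Python, nationwide or for a single state. The records are made up, and the field names (`year`, `state`, `type`) and the `"Shooting"` label are assumptions rather than the database's actual schema.

```python
from collections import Counter

# Made-up incident records; field names and values are hypothetical.
incidents = [
    {"year": 2018, "state": "TX", "type": "Shooting"},
    {"year": 2019, "state": "TX", "type": "Shooting"},
    {"year": 2019, "state": "OH", "type": "Shooting"},
    {"year": 2019, "state": "CA", "type": "Stabbing"},
]

def counts_by_year(records, state=None):
    """Return {year: (mass killings, mass shootings)}, optionally for one state."""
    killings = Counter()
    shootings = Counter()
    for rec in records:
        if state is not None and rec["state"] != state:
            continue
        killings[rec["year"]] += 1
        if rec["type"] == "Shooting":
            shootings[rec["year"]] += 1
    return {year: (killings[year], shootings[year]) for year in sorted(killings)}

print(counts_by_year(incidents))              # nationwide
print(counts_by_year(incidents, state="TX"))  # one state only
```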
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
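The flagging rule described above can be expressed as a simple filter. This is a sketch under stated assumptions: the record layout, the field names `victim_count` and `death_type`, and the exact category labels are hypothetical and do not reflect the real SHR file format.

```python
# Assumed labels for the qualifying type of death (hypothetical spelling).
QUALIFYING_TYPES = {"murder", "non-negligent manslaughter"}

def is_potential_mass_murder(record):
    """Flag a record with four or more victims and a qualifying death type."""
    return (
        record["victim_count"] >= 4
        and record["death_type"].lower() in QUALIFYING_TYPES
    )

# Made-up SHR-style records for illustration.
shr_records = [
    {"id": "A", "victim_count": 4, "death_type": "Murder"},
    {"id": "B", "victim_count": 5, "death_type": "Negligent manslaughter"},
    {"id": "C", "victim_count": 2, "death_type": "Murder"},
]

flagged = [rec["id"] for rec in shr_records if is_potential_mass_murder(rec)]
print(flagged)
```

Flagged records are only candidates; as the following paragraphs explain, each case was then verified against other sources.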
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.