100+ datasets found

Effect of suicide rates on life expectancy dataset
data.niaid.nih.gov
zenodo.org
Updated Apr 16, 2021
Cite
Filip Zoubek (2021). Effect of suicide rates on life expectancy dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4694269
Dataset updated
Apr 16, 2021
Dataset authored and provided by
Filip Zoubek
License
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
Description
Effect of suicide rates on life expectancy dataset

Abstract

In 2015, approximately 55 million people died worldwide, of whom 8 million committed suicide. Suicide is one of the main causes of death in the USA, so this experiment asks how much suicide rates affect average life expectancy statistics. The experiment takes two datasets, one with the number of suicides and one with life expectancy, and combines them into a single dataset. I then look for patterns and correlations among the variables and perform a statistical test using simple regression to confirm my assumptions.

Data

The experiment uses two datasets, WHO Suicide Statistics [1] and WHO Life Expectancy [2], which were first preprocessed. The final merged dataset has 13 variables, with country and year used as the index: Country; Year; Suicides number; Life expectancy; Adult Mortality (probability of dying between 15 and 60 years of age, per 1,000 population); Infant deaths (number of infant deaths per 1,000 population); Alcohol (recorded per capita consumption of alcohol, ages 15+); Under-five deaths (number of under-five deaths per 1,000 population); HIV/AIDS (HIV/AIDS deaths per 1,000 live births); GDP (Gross Domestic Product per capita); Population; Income composition of resources (Human Development Index in terms of income composition of resources); and Schooling (number of years of schooling).
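
The merge-then-regress workflow described above can be sketched as follows. This is a minimal illustration, not the author's actual code: the rows are synthetic, and the field names `country`, `year`, `suicides_no` and `life_expectancy` are assumptions that may not match the real files.

```python
# Synthetic rows for illustration only; real column names may differ.
suicide_rows = [
    {"country": "A", "year": 2010, "suicides_no": 100},
    {"country": "A", "year": 2011, "suicides_no": 200},
    {"country": "B", "year": 2010, "suicides_no": 300},
    {"country": "C", "year": 2010, "suicides_no": 50},  # no match in life_rows
]
life_rows = [
    {"country": "A", "year": 2010, "life_expectancy": 80.0},
    {"country": "A", "year": 2011, "life_expectancy": 78.0},
    {"country": "B", "year": 2010, "life_expectancy": 76.0},
]


def merge_on_country_year(left, right):
    """Inner-join two lists of row dicts on the (country, year) key."""
    index = {(row["country"], row["year"]): row for row in right}
    merged = []
    for row in left:
        key = (row["country"], row["year"])
        if key in index:
            merged.append({**row, **index[key]})
    return merged


def fit_simple_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


merged = merge_on_country_year(suicide_rows, life_rows)
slope, intercept = fit_simple_regression(
    [row["suicides_no"] for row in merged],
    [row["life_expectancy"] for row in merged],
)
print(f"life_expectancy ~ {intercept:.2f} + {slope:.4f} * suicides_no")
```

An inner join is assumed here, so country-year pairs present in only one source are dropped before fitting.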

LICENSE

The experiment uses two datasets, WHO Suicide Statistics and WHO Life Expectancy, which were collected from the WHO and United Nations websites. Therefore, all datasets are under the Attribution-NonCommercial-ShareAlike 3.0 IGO license (https://creativecommons.org/licenses/by-nc-sa/3.0/igo/).

[1] https://www.kaggle.com/szamil/who-suicide-statistics

[2] https://www.kaggle.com/kumarajarshi/life-expectancy-who
Mass Killings in America, 2006 - present
data.world
csv, zip
Updated Aug 11, 2025
Cite
The Associated Press (2025). Mass Killings in America, 2006 - present [Dataset]. https://data.world/associatedpress/mass-killings-public
Explore at:
Available download formats: zip, csv
Dataset updated
Aug 11, 2025
Authors
The Associated Press
Time period covered
Jan 1, 2006 - Aug 1, 2025
Area covered

Description
THIS DATASET WAS LAST UPDATED AT 2:11 AM EASTERN ON AUG. 11

OVERVIEW

2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.

In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.

A total of 229 people died in mass killings in 2019.

The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.

One-third of the offenders died at the scene of the killing or soon after, half from suicides.

About this Dataset

The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.

The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.

This data will be updated periodically and can be used as an ongoing resource to help cover these events.

Using this Dataset

To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:

Mass killings by year

Mass shootings by year

To get these counts just for your state:

Filter killings by state
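
The original query links are not reproduced here, but the same counts can be derived from the downloaded CSV. The sketch below is only an approximation of those queries: the column names `date`, `state` and `type` are hypothetical, and the sample rows are synthetic, not real incidents.

```python
import csv
import io
from collections import Counter

# Synthetic sample with hypothetical column names; not real incidents.
SAMPLE_CSV = """incident_id,date,state,type
1,2019-01-05,TX,Shooting
2,2019-03-14,TX,Other
3,2019-07-21,OH,Shooting
4,2020-02-10,TX,Shooting
"""


def load_rows(text):
    """Parse CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def count_by_year(rows, state=None, only_shootings=False):
    """Count incidents per year, optionally filtered by state or weapon type."""
    counts = Counter()
    for row in rows:
        if state is not None and row["state"] != state:
            continue
        if only_shootings and row["type"] != "Shooting":
            continue
        counts[row["date"][:4]] += 1  # year is the first four characters
    return dict(counts)


rows = load_rows(SAMPLE_CSV)
print(count_by_year(rows))                       # mass killings by year
print(count_by_year(rows, only_shootings=True))  # mass shootings by year
print(count_by_year(rows, state="TX"))           # killings in one state
```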

Definition of "mass murder"

Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.

This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”

Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
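
The core of the definition above (four or more intentional victim deaths within 24 hours, excluding offenders and unborn children) can be expressed as a small check. This is an illustrative sketch, not the database's actual inclusion logic: the record fields are hypothetical, and the seven-day spree rule and geographic restriction are not modeled.

```python
from datetime import datetime, timedelta


def is_mass_murder(deaths, min_victims=4, window=timedelta(hours=24)):
    """Return True if any 24-hour window holds at least min_victims victims.

    Each death is a dict with hypothetical keys: 'time' (datetime),
    'is_offender', 'is_unborn' and 'intentional' (booleans).
    """
    times = sorted(
        d["time"]
        for d in deaths
        if d["intentional"] and not d["is_offender"] and not d["is_unborn"]
    )
    # With sorted times, check each run of min_victims consecutive deaths.
    for start in range(len(times) - min_victims + 1):
        if times[start + min_victims - 1] - times[start] <= window:
            return True
    return False


def death(hours, is_offender=False, is_unborn=False, intentional=True):
    """Build one synthetic death record, 'hours' after an arbitrary start."""
    return {
        "time": datetime(2020, 1, 1) + timedelta(hours=hours),
        "is_offender": is_offender,
        "is_unborn": is_unborn,
        "intentional": intentional,
    }
```

For example, four victims killed within three hours qualify, while three victims plus the offender do not.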

Methodology

Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.

Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.

In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.

Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.

Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.

This project started at USA TODAY in 2012.

Contacts

Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
Statewide Death Profiles
data.chhs.ca.gov
data.ca.gov
+3 more
csv, zip
Updated Jul 28, 2025
Cite
California Department of Public Health (2025). Statewide Death Profiles [Dataset]. https://data.chhs.ca.gov/dataset/statewide-death-profiles
Explore at:
Available download formats: csv(5401561), csv(200270), csv(16301), csv(164006), csv(5034), csv(463460), csv(2026589), csv(419332), csv(4689434), zip, csv(385695)
Dataset updated
Jul 28, 2025
Dataset authored and provided by
California Department of Public Health https://www.cdph.ca.gov/
Description
This dataset contains counts of deaths for California as a whole based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.

The final data tables include both deaths that occurred in California regardless of the place of residence (by occurrence) and deaths to California residents (by residence), whereas the provisional data table only includes deaths that occurred in California regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.

The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
PDI (Police Data Initiative) Crime Incidents
data.cincinnati-oh.gov
csv, xlsx, xml
Updated Aug 10, 2025
Cite
City of Cincinnati (2025). PDI (Police Data Initiative) Crime Incidents [Dataset]. https://data.cincinnati-oh.gov/Safety/PDI-Police-Data-Initiative-Crime-Incidents/k59e-2pvf
Explore at:
Available download formats: xlsx, xml, csv
Dataset updated
Aug 10, 2025
Dataset authored and provided by
City of Cincinnati
License
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
Description
Note: Due to the RMS change for CPS, this data set stops on 6/2/2024. For records beginning on 6/3/2024, please see the dataset at this link: https://data.cincinnati-oh.gov/safety/Reported-Crime-STARS-Category-Offenses-/7aqy-xrv9/about_data

Data Description: This data represents reported Crime Incidents in the City of Cincinnati. Incidents are the records of reported crimes collated by an agency for management. Incidents are typically housed in a Records Management System (RMS) that stores agency-wide data about law enforcement operations. This does not include police calls for service, arrest information, final case determination, or any other incident outcome data.

Data Creation: The Cincinnati Police Department (CPD) records crime incidents in the City through its Records Management System (RMS), which stores agency-wide data about law enforcement operations.

Data Created By: The source of this data is the Cincinnati Police Department.

Refresh Frequency: This data is updated daily.

CincyInsights: The City of Cincinnati maintains an interactive dashboard portal, CincyInsights, in addition to our Open Data, in an effort to increase access to and usage of city data. This data set has an associated dashboard available here: https://insights.cincinnati-oh.gov/stories/s/8eaa-xrvz

Data Dictionary: A data dictionary providing definitions of columns and attributes is available as an attachment to this dataset.

Processing: The City of Cincinnati is committed to providing the most granular and accurate data possible. In that pursuit, the Office of Performance and Data Analytics facilitates standard processing of most raw data prior to publication. Processing includes but is not limited to: address verification, geocoding, decoding attributes, and addition of administrative areas (i.e. Census, neighborhoods, police districts, etc.).

Data Usage: For directions on downloading and using open data please visit our How-to Guide: https://data.cincinnati-oh.gov/dataset/Open-Data-How-To-Guide/gdr9-g3ad

Disclaimer: In compliance with privacy laws, all Public Safety datasets are anonymized and appropriately redacted prior to publication on the City of Cincinnati’s Open Data Portal. This means that for all public safety datasets: (1) the last two digits of all addresses have been replaced with “XX,” and in cases where there is a single digit street address, the entire address number is replaced with "X"; and (2) Latitude and Longitude have been randomly skewed to represent values within the same block area (but not the exact location) of the incident.
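
The address redaction rule in the disclaimer can be illustrated with a short sketch. This is one possible reading of the rule, not the City's actual processing code: `mask_address` is a hypothetical helper, and it assumes the street number is the leading run of digits in the address string. The latitude/longitude skewing is not modeled because its method is not described.

```python
import re


def mask_address(address):
    """Redact the street number at the start of an address.

    The last two digits become "XX"; a single-digit number becomes "X".
    Addresses without a leading number are returned unchanged.
    """
    match = re.match(r"(\d+)(.*)", address.strip(), flags=re.DOTALL)
    if not match:
        return address
    number, rest = match.groups()
    if len(number) == 1:
        masked = "X"
    else:
        masked = number[:-2] + "XX"
    return masked + rest


print(mask_address("1234 MAIN ST"))
print(mask_address("7 ELM AVE"))
```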
Deaths; suicide (residents), various themes
cbs.nl
dexes.eu
+3 more
xml
Updated Jan 23, 2025
+ more versions
Cite
Centraal Bureau voor de Statistiek (2025). Deaths; suicide (residents), various themes [Dataset]. https://www.cbs.nl/en-gb/figures/detail/7022eng
Explore at:
Available download formats: xml
Dataset updated
Jan 23, 2025
Dataset provided by
Statistics Netherlands
Authors
Centraal Bureau voor de Statistiek
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1950 - 2023
Area covered
The Netherlands
Description
This table contains the number of victims of suicide arranged by marital status, method, motives, age and sex. They represent the number of deaths by suicide in the resident population of the Netherlands.

The figures in this table are equal to the suicide figures in the causes of death statistics, because they are based on the same files. The causes of death statistics do not contain information on the motive of suicide. For the years 1950-1995, this information is obtained from a historical data file on suicides. For the years 1996-now the motive is taken from the external causes of death (Niet-Natuurlijke dood) file. Before the 9th revision of the International Statistical Classification of Diseases and Related Health Problems (ICD), i.e. for the years 1950-1978, it was not possible to code "jumping in front of train/metro". For these years 1950-1978 "jumping in front of train/metro" has been left empty, and it has been counted in the group "other method".

Relative figures have been calculated per 100 000 of the corresponding population group. The figures are calculated based on the average population of the corresponding year.
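
As a worked illustration of the relative figures, the rate per 100,000 is the count divided by the average population of the group, scaled by 100,000. The function name and the numbers below are made up for illustration; how the average population itself is computed is not stated in the description and is left to the caller.

```python
def rate_per_100k(deaths, average_population):
    """Deaths per 100,000 of the corresponding population group."""
    if average_population <= 0:
        raise ValueError("average_population must be positive")
    # Multiply first to keep round inputs exact in floating point.
    return deaths * 100_000 / average_population


# Made-up numbers: 50 deaths in a group averaging 1,000,000 people.
example_rate = rate_per_100k(50, 1_000_000)
print(example_rate)
```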

Data available from: 1950

Status of the figures: The figures up to and including 2023 are final.

Changes as of January 23rd 2025: The figures for 2023 are made final.

When will new figures be published: In the third quarter of 2025 the provisional figures for 2024 will be published.
Prevalence of Suicidal Ideation in Chinese College Students: A Meta-Analysis...
datasetcatalog.nlm.nih.gov
plos.figshare.com
Updated Oct 6, 2014
Cite
Li, Ya-Ming; Tang, Si-Yuan; Li, Zhan-Zhan; Lei, Xian-Yang; Liu, Li; Chen, Lizhang; Zhang, Dan (2014). Prevalence of Suicidal Ideation in Chinese College Students: A Meta-Analysis [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001185115
Dataset updated
Oct 6, 2014
Authors
Li, Ya-Ming; Tang, Si-Yuan; Li, Zhan-Zhan; Lei, Xian-Yang; Liu, Li; Chen, Lizhang; Zhang, Dan
Description
Background

About 1 million people worldwide commit suicide each year, and college students with suicidal ideation are at high risk of suicide. The prevalence of suicidal ideation in college students has been estimated extensively, but quantitative syntheses of overall prevalence are scarce, especially in China. Accurate estimates of prevalence are important for making public policy. In this paper, we aimed to determine the prevalence of suicidal ideation in Chinese college students.

Objective and Methods

Databases including PubMed, Web of Knowledge, Chinese Web of Knowledge, Wangfang (Chinese database) and Weipu (Chinese database) were systematically reviewed to identify articles published between 2004 and July 2013, in either English or Chinese, reporting prevalence estimates of suicidal ideation among Chinese college students. The strategy also included a secondary search of reference lists of records retrieved from databases. Then the prevalence estimates were summarized using a random effects model. The effects of moderator variables on the prevalence estimates were assessed using a meta-regression model.

Results

A total of 41 studies involving 160339 college students were identified, and the prevalence ranged from 1.24% to 26.00%. The overall pooled prevalence of suicidal ideation among Chinese college students was 10.72% (95% CI: 8.41% to 13.28%). We noted substantial heterogeneity in prevalence estimates. Subgroup analyses showed that prevalence of suicidal ideation in females is higher than in males.

Conclusions

The prevalence of suicidal ideation in Chinese college students is relatively high, although the suicide rate is lower compared with the entire society, suggesting the need for local surveys to inform the development of health services for college students.
Suicides in England and Wales
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Aug 29, 2024
+ more versions
Cite
Office for National Statistics (2024). Suicides in England and Wales [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/suicidesintheunitedkingdomreferencetables
Explore at:
Available download formats: xlsx
Dataset updated
Aug 29, 2024
Dataset provided by
Office for National Statistics http://www.ons.gov.uk/
License
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Description
Number of suicides and suicide rates, by sex and age, in England and Wales. Information on conclusion type is provided, along with the proportion of suicides by method and the median registration delay.
SuicideBD: A Suicidal Dataset for Bangladesh Public
figshare.com
Updated May 31, 2023
+ more versions
Cite
Md. Abir Hassan; Subangkar Karmaker Shanto; Md. Saddam Hossain Mukta; salekul Islam; Md.Arafat Hossain (2023). SuicideBD: A Suicidal Dataset for Bangladesh Public [Dataset]. http://doi.org/10.6084/m9.figshare.19550761.v4
Explore at:
Unique identifier
https://doi.org/10.6084/m9.figshare.19550761.v4
Dataset updated
May 31, 2023
Dataset provided by
figshare
Authors
Md. Abir Hassan; Subangkar Karmaker Shanto; Md. Saddam Hossain Mukta; salekul Islam; Md.Arafat Hossain
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh
Description
This dataset contains details of individuals who committed suicide in Bangladesh during the pandemic, between February 2020 and November 2020. It includes details of every individual, such as personal details, family and social life, profession, financial condition, method of suicide, location, and weather information. The dataset is freely available. The major fields included in this dataset are: age group, age, gender, profession group, reason, method, suicide date and time, addiction status, mental status, economic condition, marital status, family details, academic qualification, and weather. Apart from the above data, this dataset also contains a CSV file of a Bengali word cloud built from social media posts of the suicide victims.

Access to the dataset files is restricted. Fill out the form (link in the References section) to request the data.
Weapons Used in Crimes in LA
kaggle.com
Updated Apr 11, 2024
Cite
Benjamin Mann (2024). Weapons Used in Crimes in LA [Dataset]. https://www.kaggle.com/datasets/benmann2448/weapons-used-in-crimes-in-la/suggestions
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Apr 11, 2024
Dataset provided by
Kaggle
Authors
Benjamin Mann
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Los Angeles
Description
This dataset lists the different kinds of weapons used to commit crimes in Los Angeles from 2020 to early 2024, and how many times each was used. It was created from data published by the LAPD, and you can find the original dataset here.
Number and percentage of homicide victims, by type of firearm used to commit...
www150.statcan.gc.ca
open.canada.ca
+1 more
Updated Jul 22, 2025
+ more versions
Cite
Government of Canada, Statistics Canada (2025). Number and percentage of homicide victims, by type of firearm used to commit the homicide [Dataset]. http://doi.org/10.25318/3510017001-eng
Explore at:
Unique identifier
https://doi.org/10.25318/3510017001-eng
Dataset updated
Jul 22, 2025
Dataset provided by
Statistics Canada https://statcan.gc.ca/en
Area covered
Canada
Description
Number and percentage of homicide victims, by type of firearm used to commit the homicide (total firearms; handgun; rifle or shotgun; other firearm-like weapons; firearm, type of firearm is unknown), Canada, 1974 to 2024.
predict-criminal
kaggle.com
Updated Jan 12, 2021
Cite
RANOELISON Dimbisoa Patrick (2021). predict-criminal [Dataset]. https://www.kaggle.com/dimbisoa/predictcriminal
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 12, 2021
Dataset provided by
Kaggle http://kaggle.com/
Authors
RANOELISON Dimbisoa Patrick
Description
There has been a surge in crimes committed in recent years, making crime a top cause of concern for law enforcement. If we are able to estimate whether someone is going to commit a crime in the future, we can take precautions and be prepared. You are given a dataset containing answers to various questions concerning the professional and private lives of several people. A few of them have been arrested for various small and large crimes in the past.

The train data consists of 39999 rows, while the test data consists of 5710 rows.

Use the given data to predict if the people in the test data will commit a crime. You are given three files to download: train, test and sample submission. The evaluation metric is precision score.
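
Since the evaluation metric is precision, a minimal sketch of how that score is computed may help. The helper name `precision` and the label encoding (1 for "will commit a crime", 0 otherwise) are assumptions for illustration; the competition's actual label format is not stated here.

```python
def precision(y_true, y_pred, positive=1):
    """Precision = true positives / (true positives + false positives)."""
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred) if p == positive and t == positive
    )
    false_pos = sum(
        1 for t, p in zip(y_true, y_pred) if p == positive and t != positive
    )
    if true_pos + false_pos == 0:
        return 0.0  # no positive predictions were made
    return true_pos / (true_pos + false_pos)


# Made-up labels: three positive predictions, two of them correct.
actual = [1, 0, 1, 1, 0]
predicted = [1, 1, 1, 0, 0]
print(precision(actual, predicted))
```

Precision ignores missed positives, so a model that flags very few people can still score well on this metric.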
Suicides in India during 2015
kaggle.com
Updated Aug 22, 2020
Cite
Vidya Pb (2020). Suicides in India during 2015 [Dataset]. https://www.kaggle.com/vidyapb/suicides-in-india-during-2015/discussion
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 22, 2020
Dataset provided by
Kaggle http://kaggle.com/
Authors
Vidya Pb
Area covered
India
Description
Context

This dataset contains information on suicides which happened in India during 2015.

The singular age-old social precept of 'Lok Kya Kahenge?' (loosely translated: "What will people say?") suppresses the much-needed psychological care in India. It's high time that we understand why suicides happen and what are the reasons behind it. This dataset aims to spread awareness about suicides in India.

Content

I acquired this dataset from here. Have a look at the website.

This dataset contains 9 files in .csv format. You can find a description for each column. Let me summarize it here as well.

Cause-wise distribution of suicides in Central Armed Police Force (CAPF) during 2015.

Economic Status-wise distribution of suicides during 2015.

Educational Status-wise distribution of suicides during 2015.

Farmer or Cultivators distribution of suicides during 2015.

Profession-wise distribution of suicides during 2015.

Social status-wise distribution of suicides during 2015.

Cause-wise distribution of suicides during 2015.

Suicides by Agricultural labourers during 2015.

Suicides by means adopted during 2015.

Inspiration

We now have plenty of data to explore to draw some conclusions about suicides which happened in India during 2015. Let's start by answering these questions:

- What are the top 5 states where Farmers' suicides occurred the most?
- What's the top reason that agricultural labourers committed suicide?
- Which Profession has the most suicides? What could be the reason?
- How many Transgender suicides have occurred in different categories?

I hope these questions interest you in starting to explore this dataset.

Acknowledgements

I thank the Indian Government for making it public under their Open Government Data (OGD) Platform India. Please use this dataset strictly for educational purposes. Thank you.
Violence Reduction - Victim Demographics - Aggregated
data.cityofchicago.org
s.cnmilf.com
+1 more
csv, xlsx, xml
Updated Aug 10, 2025
+ more versions
Cite
City of Chicago (2025). Violence Reduction - Victim Demographics - Aggregated [Dataset]. https://data.cityofchicago.org/Public-Safety/Violence-Reduction-Victim-Demographics-Aggregated/gj7a-742p
Explore at:
Available download formats: xml, xlsx, csv
Dataset updated
Aug 10, 2025
Dataset authored and provided by
City of Chicago
Description
This dataset contains aggregate data on violent index victimizations at the quarter level of each year (i.e., January – March, April – June, July – September, October – December), from 2001 to the present (1991 to present for Homicides), with a focus on those related to gun violence. Index crimes are 10 crime types selected by the FBI (codes 1-4) for special focus due to their seriousness and frequency. This dataset includes only those index crimes that involve bodily harm or the threat of bodily harm and are reported to the Chicago Police Department (CPD). Each row is aggregated up to victimization type, age group, sex, race, and whether the victimization was domestic-related. Aggregating at the quarter level provides large enough blocks of incidents to protect anonymity while allowing the end user to observe inter-year and intra-year variation. Any row where there were fewer than three incidents during a given quarter has been deleted to help prevent re-identification of victims. For example, if there were three domestic criminal sexual assaults during January to March 2020, all victims associated with those incidents have been removed from this dataset. Human trafficking victimizations have been aggregated separately due to the extremely small number of victimizations.

This dataset includes a "GUNSHOT_INJURY_I" column to indicate whether the victimization involved a shooting, showing either Yes ("Y"), No ("N"), or Unknown ("UNKNOWN"). For homicides, injury descriptions are available dating back to 1991, so the "shooting" column will read either "Y" or "N" to indicate whether the homicide was a fatal shooting or not. For non-fatal shootings, data is only available as of 2010. As a result, for any non-fatal shootings that occurred from 2010 to the present, the shooting column will read as "Y." Non-fatal shooting victims will not be included in this dataset prior to 2010; they will be included in the authorized dataset, but with "UNKNOWN" in the shooting column.

The dataset is refreshed daily, but excludes the most recent complete day to allow CPD time to gather the best available information. Each time the dataset is refreshed, records can change as CPD learns more about each victimization, especially those victimizations that are most recent. The data on the Mayor's Office Violence Reduction Dashboard is updated daily with an approximately 48-hour lag. As cases are passed from the initial reporting officer to the investigating detectives, some recorded data about incidents and victimizations may change once additional information arises. Regularly updated datasets on the City's public portal may change to reflect new or corrected information.

How does this dataset classify victims?

The methodology by which this dataset classifies victims of violent crime differs by victimization type:

Homicide and non-fatal shooting victims: A victimization is considered a homicide victimization or non-fatal shooting victimization depending on its presence in CPD's homicide victims data table or its shooting victims data table. A victimization is considered a homicide only if it is present in CPD's homicide data table, while a victimization is considered a non-fatal shooting only if it is present in CPD's shooting data tables and absent from CPD's homicide data table.

To determine the IUCR code of homicide and non-fatal shooting victimizations, we defer to the incident IUCR code available in CPD's Crimes, 2001-present dataset (available on the City's open data portal). If the IUCR code in CPD's Crimes dataset is inconsistent with the homicide/non-fatal shooting categorization, we defer to CPD's Victims dataset.

For a criminal homicide, the only sensible IUCR codes are 0110 (first-degree murder) or 0130 (second-degree murder). For a non-fatal shooting, a sensible IUCR code must signify a criminal sexual assault, a robbery, or, most commonly, an aggravated battery. In rare instances, the IUCR codes in CPD's Crimes and Victims datasets do not align with the homicide/non-fatal shooting categorization:

In instances where a homicide victimization does not correspond to an IUCR code 0110 or 0130, we set the IUCR code to "01XX" to indicate that the victimization was a homicide but we do not know whether it was a first-degree murder (IUCR code = 0110) or a second-degree murder (IUCR code = 0130).

When a non-fatal shooting victimization does not correspond to an IUCR code that signifies a criminal sexual assault, robbery, or aggravated battery, we enter “UNK” in the IUCR column, “YES” in the GUNSHOT_I column, and “NON-FATAL” in the PRIMARY column to indicate that the victim was non-fatally shot, but the precise IUCR code is unknown.
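
The two fallback rules above can be sketched as small functions. This is an illustration of the described logic only, not the City's actual pipeline: the function names are hypothetical, the output keys follow the column names quoted in the rule, and the set of plausible non-fatal shooting codes is supplied by the caller because the description does not list them.

```python
HOMICIDE_CODES = {"0110", "0130"}  # first- and second-degree murder


def resolve_homicide_iucr(incident_iucr):
    """Keep a sensible homicide code, otherwise fall back to "01XX"."""
    if incident_iucr in HOMICIDE_CODES:
        return incident_iucr
    return "01XX"


def resolve_nonfatal_shooting(incident_iucr, plausible_codes):
    """Return the column values to record for a non-fatal shooting.

    plausible_codes is the caller-supplied set of IUCR codes signifying a
    criminal sexual assault, robbery, or aggravated battery.
    """
    if incident_iucr in plausible_codes:
        return {"IUCR": incident_iucr}
    return {"IUCR": "UNK", "GUNSHOT_I": "YES", "PRIMARY": "NON-FATAL"}


# Placeholder codes for illustration only; not real IUCR values.
EXAMPLE_PLAUSIBLE = {"AAAA", "BBBB"}
print(resolve_homicide_iucr("0999"))
print(resolve_nonfatal_shooting("ZZZZ", EXAMPLE_PLAUSIBLE))
```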

Other violent crime victims: For other violent crime types, we refer to the IUCR classification that exists in CPD's victim table, with only one exception:

When there is an incident that is associated with no victim with a matching IUCR code, we assume that this is an error. Every crime should have at least 1 victim with a matching IUCR code. In these cases, we change the IUCR code to reflect the incident IUCR code because CPD's incident table is considered to be more reliable than the victim table.

Note: All businesses identified as victims in CPD data have been removed from this dataset.

Note: The definition of “homicide” (shooting or otherwise) does not include justifiable homicide or involuntary manslaughter. This dataset also excludes any cases that CPD considers to be “unfounded” or “noncriminal.”

Note: In some instances, the police department's raw incident-level data and victim-level data that were inputs into this dataset do not align on the type of crime that occurred. In those instances, this dataset attempts to correct mismatches between incident-specific and victim-specific crime types. When it is not possible to determine which victims are associated with the most recent crime determination, the dataset will show empty cells in the respective demographic fields (age, sex, race, etc.).

Note: The initial reporting officer usually asks victims to report demographic data. If victims are unable to recall, the reporting officer will use their best judgment. “Unknown” can be reported if it is truly unknown.
Number of suicides India 1971-2022
statista.com
Updated May 27, 2025
Cite
Statista (2025). Number of suicides India 1971-2022 [Dataset]. https://www.statista.com/statistics/665354/number-of-suicides-india/
Explore at:
Dataset updated
May 27, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
India
Description
Over *** thousand deaths due to suicide were recorded in India in 2022. The majority of suicides were reported in the state of Tamil Nadu, followed by Rajasthan. The number of suicides that year increased from the previous year. Causes of suicide in the country included professional problems, abuse, violence, family problems, financial loss, a sense of isolation and mental disorders.

Depressive disorders and suicide

As of 2015, over ****** million people worldwide suffered from some kind of depressive disorder. Furthermore, over ** percent of the total population in India suffered from different forms of mental disorders as of 2017. There is a positive correlation between suicide mortality rates and the presence of select mental disorders.

Risk factors for mental disorders

Every ******* person in India suffers from some form of mental disorder. Today, depressive disorders are regarded as the leading contributor not only to disease burden and morbidity worldwide, but also to suicide if not addressed. In 2022, the leading cause of suicide deaths in India was family problems. The second leading cause was illness. Risk factors for developing mental disorders, including depressive and anxiety disorders, include bullying victimization, poverty, unemployment, childhood sexual abuse and intimate partner violence.
Number of homicide victims, by method used to commit the homicide
www150.statcan.gc.ca
open.canada.ca
Updated Jul 22, 2025
Cite
Government of Canada, Statistics Canada (2025). Number of homicide victims, by method used to commit the homicide [Dataset]. http://doi.org/10.25318/3510006901-eng
Explore at:
Unique identifier
https://doi.org/10.25318/3510006901-eng
Dataset updated
Jul 22, 2025
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Number of homicide victims, by method used to commit the homicide (total methods used; shooting; stabbing; beating; strangulation; fire (burns or suffocation); other methods used; methods used unknown), Canada, 1974 to 2024.
CommitBench
zenodo.org
csv, json
Updated Feb 14, 2024
Cite
Maximilian Schall; Tamara Czinczoll; Gerard de Melo (2024). CommitBench [Dataset]. http://doi.org/10.5281/zenodo.10497442
Explore at:
Available download formats: json, csv
Unique identifier
https://doi.org/10.5281/zenodo.10497442
Dataset updated
Feb 14, 2024
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Maximilian Schall; Tamara Czinczoll; Gerard de Melo
License
Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
Time period covered
Dec 15, 2023
Description
Data Statement for CommitBench

- Dataset Title: CommitBench

- Dataset Curator: Maximilian Schall, Tamara Czinczoll, Gerard de Melo

- Dataset Version: 1.0, 15.12.2023

- Data Statement Author: Maximilian Schall, Tamara Czinczoll

- Data Statement Version: 1.0, 16.01.2023

- Code URL: https://github.com/maxscha/commitbench

EXECUTIVE SUMMARY

We provide CommitBench as an open-source, reproducible and privacy- and license-aware benchmark for commit message generation. The dataset is gathered from GitHub repositories with licenses that permit redistribution. We provide six programming languages: Java, Python, Go, JavaScript, PHP and Ruby. The commit messages in natural language are restricted to English, as it is the working language in many software development projects. The dataset has 1,664,590 examples that were generated by using extensive quality-focused filtering techniques (e.g. excluding bot commits). Additionally, we provide a version with longer sequences for benchmarking models with more extended sequence input, as well as a version with

CURATION RATIONALE

We created this dataset due to quality and legal issues with previous commit message generation datasets. Given a git diff displaying code changes between two file versions, the task is to predict the accompanying commit message describing these changes in natural language. We base our GitHub repository selection on that of a previous dataset, CodeSearchNet, but apply a large number of filtering techniques to improve the data quality and eliminate noise. Due to the original repository selection, we are also restricted to the aforementioned programming languages. It was important to us, however, to provide some number of programming languages to accommodate any changes in the task due to the degree of hardware-relatedness of a language. The dataset is provided as a large CSV file containing all samples. We provide the following fields: Diff, Commit Message, Hash, Project, Split.
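
As a rough illustration of working with a CSV of this shape, the following Python sketch reads samples and groups them by split. It assumes the column headers are spelled exactly as the field names listed above (`Diff`, `Commit Message`, `Hash`, `Project`, `Split`); the actual headers in the released file may differ. The helper names and the two demo rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Field names as listed in the data statement; exact CSV headers are an assumption.
EXPECTED_FIELDS = ("Diff", "Commit Message", "Hash", "Project", "Split")


def read_samples(handle):
    """Read all rows from an open CSV handle, checking the expected columns exist."""
    reader = csv.DictReader(handle)
    missing = [f for f in EXPECTED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def group_by_split(samples):
    """Group samples by the value of their Split column."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample["Split"]].append(sample)
    return dict(groups)


def demo_csv():
    """Build a tiny in-memory CSV with made-up rows (not real CommitBench data)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(EXPECTED_FIELDS)
    writer.writerow(["-a\n+b", "Fix typo", "abc123", "org/repo", "train"])
    writer.writerow(["-x\n+y", "Rename variable", "def456", "org/repo", "test"])
    buffer.seek(0)
    return buffer


groups = group_by_split(read_samples(demo_csv()))
print({name: len(rows) for name, rows in groups.items()})
```

For the full file, one would open it with `open(path, newline="", encoding="utf-8")` and probably iterate over the reader rather than build a list, since all samples are stored in one large file. Very long diffs may also require raising the limit with `csv.field_size_limit`.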

DOCUMENTATION FOR SOURCE DATASETS

Repository selection based on CodeSearchNet, which can be found under https://github.com/github/CodeSearchNet

LANGUAGE VARIETIES

Since GitHub hosts software projects from all over the world, there is no single uniform variety of English used across all commit messages. This means that phrasing can be regional or subject to influences from the programmer's native language. It also means that different spelling conventions may co-exist and that different terms may be used for the same concept. Any model trained on this data should take these factors into account. For the number of samples for different programming languages, see the table below:

Language	Number of Samples
Java	153,119
Ruby	233,710
Go	137,998
JavaScript	373,598
Python	472,469
PHP	294,394

SPEAKER DEMOGRAPHIC

Due to the extremely diverse (geographically, but also socio-economically) backgrounds of the software development community, there is no single demographic the data comes from. Of course, this does not entail that there are no biases when it comes to the data origin. Globally, the average software developer tends to be male and has obtained higher education. Due to the anonymous nature of GitHub profiles, gender distribution information cannot be extracted.

ANNOTATOR DEMOGRAPHIC

Due to the automated generation of the dataset, no annotators were used.

SPEECH SITUATION AND CHARACTERISTICS

The public nature and often business-related creation of the data by the original GitHub users fosters a more neutral, information-focused and formal language. As it is not uncommon for developers to find the writing of commit messages tedious, there can also be commit messages representing the frustration or boredom of the commit author. While our filtering is supposed to catch these types of messages, there can be some instances still in the dataset.

PREPROCESSING AND DATA FORMATTING

See paper for all preprocessing steps. We do not provide the un-processed raw data due to privacy concerns, but it can be obtained via CodeSearchNet or requested from the authors.

CAPTURE QUALITY

While our dataset is completely reproducible at the time of writing, there are external dependencies that could restrict this. If GitHub shuts down and someone with a software project in the dataset deletes their repository, there can be instances that are non-reproducible.

LIMITATIONS

While our filters are meant to ensure a high quality for each data sample in the dataset, we cannot ensure that only low-quality examples were removed. Similarly, we cannot guarantee that our extensive filtering methods catch all low-quality examples. Some might remain in the dataset. Another limitation of our dataset is the low number of programming languages (there are many more) as well as our focus on English commit messages. There might be some people that only write commit messages in their respective languages, e.g., because the organization they work at has established this or because they do not speak English (confidently enough). Perhaps some languages' syntax better aligns with that of programming languages. These effects cannot be investigated with CommitBench.

Although we anonymize the data as far as possible, the required information for reproducibility, including the organization, project name, and project hash, makes it possible to refer back to the original authoring user account, since this information is freely available in the original repository on GitHub.

METADATA

License: Dataset under the CC BY-NC 4.0 license

DISCLOSURES AND ETHICAL REVIEW

While we put substantial effort into removing privacy-sensitive information, our solutions cannot find 100% of such cases. This means that researchers and anyone using the data need to incorporate their own safeguards to effectively reduce the amount of personal information that can be exposed.

ABOUT THIS DOCUMENT

A data statement is a characterization of a dataset that provides context to allow developers and users to better understand how experimental results might generalize, how software might be appropriately deployed, and what biases might be reflected in systems built on the software.

This data statement was written based on the template for the Data Statements Version 2 schema. The template was prepared by Angelina McMillan-Major, Emily M. Bender, and Batya Friedman, and can be found at https://techpolicylab.uw.edu/data-statements/. It was updated from the community Version 1 Markdown template by Leon Derczynski.
Number, percentage and rate of homicide victims, by racialized identity...
www150.statcan.gc.ca
data.urbandatacentre.ca
Updated Jul 22, 2025
Cite
Government of Canada, Statistics Canada (2025). Number, percentage and rate of homicide victims, by racialized identity group, gender and region [Dataset]. http://doi.org/10.25318/3510020601-eng
Explore at:
Unique identifier
https://doi.org/10.25318/3510020601-eng
Dataset updated
Jul 22, 2025
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Number, percentage and rate (per 100,000 population) of homicide victims, by racialized identity group (total, by racialized identity group; racialized identity group; South Asian; Chinese; Black; Filipino; Arab; Latin American; Southeast Asian; West Asian; Korean; Japanese; other racialized identity group; multiple racialized identity; racialized identity, but racialized identity group is unknown; rest of the population; unknown racialized identity group), gender (all genders; male; female; gender unknown) and region (Canada; Atlantic region; Quebec; Ontario; Prairies region; British Columbia; territories), 2019 to 2024.
Immigration system statistics data tables
gov.uk
Updated May 22, 2025
Cite
Home Office (2025). Immigration system statistics data tables [Dataset]. https://www.gov.uk/government/statistical-data-sets/immigration-system-statistics-data-tables
Explore at:
Dataset updated
May 22, 2025
Dataset provided by
GOV.UK
Authors
Home Office
Description
List of the data tables as part of the Immigration System Statistics Home Office release. Summary and detailed data tables covering the immigration system, including out-of-country and in-country visas, asylum, detention, and returns.

If you have any feedback, please email MigrationStatsEnquiries@homeoffice.gov.uk.

Accessible file formats

The Microsoft Excel .xlsx files may not be suitable for users of assistive technology.
If you use assistive technology (such as a screen reader) and need a version of these documents in a more accessible format, please email MigrationStatsEnquiries@homeoffice.gov.uk
Please tell us what format you need. It will help us if you say what assistive technology you use.

Related content

Immigration system statistics, year ending March 2025
Immigration system statistics quarterly release
Immigration system statistics user guide
Publishing detailed data tables in migration statistics
Policy and legislative changes affecting migration to the UK: timeline
Immigration statistics data archives

Passenger arrivals

Passenger arrivals summary tables, year ending March 2025 (MS Excel Spreadsheet, 66.5 KB): https://assets.publishing.service.gov.uk/media/68258d71aa3556876875ec80/passenger-arrivals-summary-mar-2025-tables.xlsx

‘Passengers refused entry at the border summary tables’ and ‘Passengers refused entry at the border detailed datasets’ have been discontinued. The latest published versions of these tables are from February 2025 and are available in the ‘Passenger refusals – release discontinued’ section. A similar data series, ‘Refused entry at port and subsequently departed’, is available within the Returns detailed and summary tables.

Electronic travel authorisation

Electronic travel authorisation detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 56.7 KB): https://assets.publishing.service.gov.uk/media/681e406753add7d476d8187f/electronic-travel-authorisation-datasets-mar-2025.xlsx
ETA_D01: Applications for electronic travel authorisations, by nationality
ETA_D02: Outcomes of applications for electronic travel authorisations, by nationality

Entry clearance visas granted outside the UK

Entry clearance visas summary tables, year ending March 2025 (MS Excel Spreadsheet, 113 KB): https://assets.publishing.service.gov.uk/media/68247953b296b83ad5262ed7/visas-summary-mar-2025-tables.xlsx

Entry clearance visa applications and outcomes detailed datasets, year ending March 2025 (MS Excel Spreadsheet, 29.1 MB): https://assets.publishing.service.gov.uk/media/682c4241010c5c28d1c7e820/entry-clearance-visa-outcomes-datasets-mar-2025.xlsx
Vis_D01: Entry clearance visa applications, by nationality and visa type
Vis_D02: Outcomes of entry clearance visa applications, by nationality, visa type, and outcome

Additional d
Number, rate and percentage changes in rates of homicide victims
www150.statcan.gc.ca
datasets.ai
Updated Jul 22, 2025
Cite
Government of Canada, Statistics Canada (2025). Number, rate and percentage changes in rates of homicide victims [Dataset]. http://doi.org/10.25318/3510006801-eng
Explore at:
Unique identifier
https://doi.org/10.25318/3510006801-eng
Dataset updated
Jul 22, 2025
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Number, rate and percentage changes in rates of homicide victims, Canada, provinces and territories, 1961 to 2024.

FiveThirtyEight Hate Crimes Dataset

kaggle.com

Updated Apr 26, 2019

Cite

FiveThirtyEight (2019). FiveThirtyEight Hate Crimes Dataset [Dataset]. https://www.kaggle.com/datasets/fivethirtyeight/fivethirtyeight-hate-crimes-dataset/discussion

Explore at:

Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.

Dataset updated

Apr 26, 2019

Dataset provided by

Kaggle (http://kaggle.com/)

Authors

FiveThirtyEight

License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Content

Hate Crimes

This folder contains data behind the story Higher Rates Of Hate Crimes Are Tied To Income Inequality.

Header	Definition	Source
`state`	State name	
`median_household_income`	Median household income, 2016	Kaiser Family Foundation
`share_unemployed_seasonal`	Share of the population that is unemployed (seasonally adjusted), Sept. 2016	Kaiser Family Foundation
`share_population_in_metro_areas`	Share of the population that lives in metropolitan areas, 2015	Kaiser Family Foundation
`share_population_with_high_school_degree`	Share of adults 25 and older with a high-school degree, 2009	Census Bureau
`share_non_citizen`	Share of the population that are not U.S. citizens, 2015	Kaiser Family Foundation
`share_white_poverty`	Share of white residents who are living in poverty, 2015	Kaiser Family Foundation
`gini_index`	Gini Index, 2015	Census Bureau
`share_non_white`	Share of the population that is not white, 2015	Kaiser Family Foundation
`share_voters_voted_trump`	Share of 2016 U.S. presidential voters who voted for Donald Trump	United States Elections Project
`hate_crimes_per_100k_splc`	Hate crimes per 100,000 population, Southern Poverty Law Center, Nov. 9-18, 2016	Southern Poverty Law Center
`avg_hatecrimes_per_100k_fbi`	Average annual hate crimes per 100,000 population, FBI, 2010-2015	FBI
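
One simple way to explore the relationship the story title refers to is a Pearson correlation between `gini_index` and one of the hate-crime columns. The Python sketch below is only an illustration: the helper names are hypothetical, the toy rows are invented placeholders rather than values from the dataset, and it is not a reproduction of FiveThirtyEight's analysis.

```python
import csv
import io
import math


def column_pairs(rows, x_name, y_name):
    """Collect (x, y) float pairs, skipping rows where either value is missing."""
    pairs = []
    for row in rows:
        try:
            pairs.append((float(row[x_name]), float(row[y_name])))
        except (TypeError, ValueError):
            continue  # blank or non-numeric cell
    return pairs


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete rows")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("a column is constant; correlation is undefined")
    return cov / math.sqrt(var_x * var_y)


# Invented placeholder rows using the documented header names.
TOY_CSV = (
    "state,gini_index,avg_hatecrimes_per_100k_fbi\n"
    "A,0.40,1.0\n"
    "B,0.45,2.0\n"
    "C,0.50,3.0\n"
    "D,,4.0\n"
)

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
pairs = column_pairs(rows, "gini_index", "avg_hatecrimes_per_100k_fbi")
print(len(pairs), pearson(pairs))
```

Row D is dropped because its `gini_index` cell is blank; skipping incomplete rows is a design choice made here for simplicity, not something the dataset description prescribes.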

Correction

Please see the following commit: https://github.com/fivethirtyeight/data/commit/fbc884a5c8d45a0636e1d6b000021632a0861986

Context

This is a dataset from FiveThirtyEight hosted on their GitHub. Explore FiveThirtyEight data using Kaggle and all of the data sources available through the FiveThirtyEight organization page!

Update Frequency: This dataset is updated daily.

Acknowledgements

This dataset is maintained using GitHub's API and Kaggle's API.

This dataset is distributed under the Attribution 4.0 International (CC BY 4.0) license.

Cite

Filip Zoubek (2021). Effect of suicide rates on life expectancy dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4694269

Effect of suicide rates on life expectancy dataset

Explore at:

Dataset updated

Apr 16, 2021

Dataset authored and provided by

Filip Zoubek

License

Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0): https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically

Description

Effect of suicide rates on life expectancy dataset

Abstract: In 2015, approximately 55 million people died worldwide, of whom 8 million committed suicide. In the USA, suicide is one of the main causes of death; this experiment therefore asks how much suicide rates affect average life expectancy statistics. The experiment takes two datasets, one with the number of suicides and one with life expectancy, and combines them into a single dataset. Subsequently, I look for patterns and correlations among the variables and perform a statistical test using simple regression to confirm my assumptions.

Data

The experiment uses two datasets, WHO Suicide Statistics[1] and WHO Life Expectancy[2], which were first preprocessed. The final merged dataset for the experiment has 13 variables, with country and year used as the index: Country; Year; Suicides number; Life expectancy; Adult Mortality (probability of dying between 15 and 60 years per 1000 population); Infant deaths (number of infant deaths per 1000 population); Alcohol (recorded per capita (15+) alcohol consumption); Under-five deaths (number of under-five deaths per 1000 population); HIV/AIDS (deaths per 1000 live births from HIV/AIDS); GDP (Gross Domestic Product per capita); Population; Income composition of resources (Human Development Index in terms of income composition of resources); and Schooling (number of years of schooling).
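
The merge-then-regress workflow described here can be sketched in plain Python. This is a minimal illustration under assumptions, not the author's code: the function names are hypothetical, an inner join on (Country, Year) is assumed because the description does not say how unmatched rows were handled, and the records are invented placeholders.

```python
def merge_on_country_year(suicide_rows, life_rows):
    """Inner-join two lists of dicts on the (Country, Year) key."""
    life_index = {(r["Country"], r["Year"]): r for r in life_rows}
    merged = []
    for row in suicide_rows:
        key = (row["Country"], row["Year"])
        if key in life_index:
            merged.append({**life_index[key], **row})
    return merged


def simple_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented placeholder records, not values from the WHO datasets.
suicides = [
    {"Country": "A", "Year": 2010, "Suicides number": 100},
    {"Country": "A", "Year": 2011, "Suicides number": 200},
    {"Country": "B", "Year": 2010, "Suicides number": 300},
    {"Country": "C", "Year": 2010, "Suicides number": 50},  # no match, dropped
]
life = [
    {"Country": "A", "Year": 2010, "Life expectancy": 79.0},
    {"Country": "A", "Year": 2011, "Life expectancy": 78.0},
    {"Country": "B", "Year": 2010, "Life expectancy": 77.0},
]

merged = merge_on_country_year(suicides, life)
slope, intercept = simple_regression(
    [r["Suicides number"] for r in merged],
    [r["Life expectancy"] for r in merged],
)
print(len(merged), slope, intercept)
```

With real data, the slope would estimate the change in life expectancy per additional recorded suicide, which is one reading of the "simple regression" the abstract mentions.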

LICENSE

The experiment uses two datasets, WHO Suicide Statistics and WHO Life Expectancy, which were collected from the WHO and United Nations websites. Therefore, all datasets are under the Attribution-NonCommercial-ShareAlike 3.0 IGO license (https://creativecommons.org/licenses/by-nc-sa/3.0/igo/).

[1] https://www.kaggle.com/szamil/who-suicide-statistics

[2] https://www.kaggle.com/kumarajarshi/life-expectancy-who
