Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Historical chart and dataset showing World death rate by year from 1950 to 2025.
This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.
February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
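The per-100,000 rates described above can be sketched as follows. This is a minimal illustration, not AP's actual calculation; the function name `rate_per_100k` and its parameters are hypothetical.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Return the number of events (cases or deaths) per 100,000 residents.

    `count` and `population` are illustrative parameter names, not
    column names from the AP or Johns Hopkins files.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so that round inputs give exact results.
    return count * 100_000 / population


# Example: 50 deaths in a county of 200,000 residents.
county_death_rate = rate_per_100k(50, 200_000)
```

Because the denominator is population, a small county with a handful of cases can show a very high rate, so rates are best read alongside the raw counts.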
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
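The hotspot queries above rely on a 7-day rolling average of new cases per capita. A minimal sketch of that calculation is shown here, assuming a plain list of daily new-case counts and a known population. The function names are hypothetical, and the AP queries themselves may be written differently.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing mean over `window` values; None until the window is full."""
    buf = deque(maxlen=window)
    total = 0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this value is evicted by the append below
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


def rolling_new_cases_per_100k(new_cases, population, window=7):
    """Rolling average of daily new cases, scaled to a rate per 100,000."""
    return [
        None if m is None else m * 100_000 / population
        for m in rolling_mean(new_cases, window)
    ]


# Example: seven days at 7 new cases, then one day at 14.
rates = rolling_new_cases_per_100k([7] * 7 + [14], population=100_000)
```

The first six entries are `None` because a full week of data is not yet available for those days.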
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
https://datawrapper.dwcdn.net/nRyaf/15/
Johns Hopkins timeseries data
- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
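Because the timeseries holds cumulative snapshots, daily new counts have to be derived by differencing, and a downward revision by a source shows up as a negative value. The sketch below illustrates this under the assumption that the cumulative counts are available as a plain list in date order. The function names are hypothetical.

```python
def daily_new_counts(cumulative):
    """Difference consecutive cumulative totals.

    The first day has no prior snapshot, so its value is None.
    """
    if not cumulative:
        return []
    return [None] + [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def revised_days(cumulative):
    """Indexes where the cumulative total dropped, signalling a source revision."""
    return [
        i
        for i, delta in enumerate(daily_new_counts(cumulative))
        if delta is not None and delta < 0
    ]


# Example: the source lowered its cumulative total on the third day.
snapshots = [10, 15, 14, 20]
new_counts = daily_new_counts(snapshots)
flagged = revised_days(snapshots)
```

Flagging these days rather than silently clipping them to zero keeps the revision visible to whoever analyzes the data next.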
This data should be credited to Johns Hopkins University COVID-19 tracking project
This dataset contains global COVID-19 case and death data by country, collected directly from the official World Health Organization (WHO) COVID-19 Dashboard. It provides a comprehensive view of the pandemic’s impact worldwide, covering the period up to 2025. The dataset is intended for researchers, analysts, and anyone interested in understanding the progression and global effects of COVID-19 through reliable, up-to-date information.
The World Health Organization is the United Nations agency responsible for international public health. The WHO COVID-19 Dashboard is a trusted source that aggregates official reports from countries and territories around the world, providing daily updates on cases, deaths, and other key metrics related to COVID-19.
This dataset can be used for:
- Tracking the spread and trends of COVID-19 globally and by country
- Modeling and forecasting pandemic progression
- Comparative analysis of the pandemic’s impact across countries and regions
- Visualization and reporting
The data is sourced from the WHO, widely regarded as the most authoritative source for global health statistics. However, reporting practices and data completeness may vary by country and may be subject to revision as new information becomes available.
Special thanks to the WHO for making this data publicly available and to all those working to collect, verify, and report COVID-19 statistics.
THIS DATASET WAS LAST UPDATED AT 2:11 AM EASTERN ON JULY 12
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as incidents in which four or more people are killed, excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
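The core inclusion threshold described above can be sketched as a simple predicate. This is an illustration only, not the project's actual code: the `Incident` record and its field names are hypothetical, and the sketch does not model the seven-day victim count for spree offenders.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # All field names are hypothetical, chosen for this sketch.
    victims_killed: int        # deaths excluding the offender(s) and unborn children
    hours_elapsed: float       # time between the first and last killing
    intentional: bool          # False for negligent homicides (e.g. DUI, accidental fires)
    in_50_states_or_dc: bool   # only these locations are considered


def is_mass_killing(incident: Incident) -> bool:
    """Apply the four-or-more victims within 24 hours threshold."""
    return (
        incident.intentional
        and incident.in_50_states_or_dc
        and incident.victims_killed >= 4
        and incident.hours_elapsed <= 24
    )


# Example: four victims killed within three hours.
qualifies = is_mass_killing(Incident(4, 3.0, True, True))
```

Note that the predicate takes no weapon, motive, or victim-offender relationship as input, which mirrors the definition's refusal to exclude cases on those grounds.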
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
What are people dying from?
This question is essential to guide decisions in public health, and find ways to save lives.
Many leading causes of death receive little mainstream attention. If news reports reflected what children died from, they would say that around 1,400 young children die from diarrheal diseases, 1,000 die from malaria, and 1,900 from respiratory infections – every day.
This can change. Over time, death rates from these causes have declined across the world.
A better understanding of the causes of death has led to the development of technologies, preventative measures, and better healthcare, reducing the chances of dying from a wide range of different causes, across all age groups.
In the past, infectious diseases dominated. But death rates from infectious diseases have fallen quickly – faster than other causes. This has led to a shift in the leading causes of death. Now, non-communicable diseases – such as heart diseases and cancers – are the most common causes of death globally.
More progress is possible, and the impact of causes of death can fall further.
On this page, you will find global data and research on leading causes of death and how they can be prevented.
This data can also help understand the burden of disease more broadly, and offer a lens to see the impacts of healthcare and medicine, habits and behaviours, environmental factors, health infrastructure, and more.
By Saloni Dattani, Fiona Spooner, Hannah Ritchie and Max Roser
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Open Government Licence - Canada 2.0 https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
This dataset reports daily 7-day moving average rates of deaths involving COVID-19 by vaccination status and by age group.
Learn how the Government of Ontario is helping to keep Ontarians safe during the 2019 Novel Coronavirus outbreak.
Effective November 14, 2024 this page will no longer be updated. Information about COVID-19 and other respiratory viruses is available on Public Health Ontario’s interactive respiratory virus tool: https://www.publichealthontario.ca/en/Data-and-Analysis/Infectious-Disease/Respiratory-Virus-Tool
Data includes:
* Date on which the death occurred
* Age group
* 7-day moving average of the death rate per 100,000 for those not fully vaccinated
* 7-day moving average of the death rate per 100,000 for those fully vaccinated
* 7-day moving average of the death rate per 100,000 for those vaccinated with at least one booster
## Additional notes
As of June 16, all COVID-19 datasets will be updated weekly on Thursdays by 2pm.
As of January 12, 2024, data from the date of January 1, 2024 onwards reflect updated population estimates. This update specifically impacts data for the 'not fully vaccinated' category.
On November 30, 2023 the count of COVID-19 deaths was updated to include missing historical deaths from January 15, 2020 to March 31, 2023.
CCM is a dynamic disease reporting system which allows ongoing updates to data previously entered. As a result, data extracted from CCM represents a snapshot at the time of extraction and may differ from previous or subsequent results.
Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes and current totals being different from previously reported cases and deaths.
Observed trends over time should be interpreted with caution for the most recent period due to reporting and/or data entry lags.
The data does not include vaccination data for people who did not provide consent for vaccination records to be entered into the provincial COVaxON system. This includes individual records as well as records from some Indigenous communities where those communities have not consented to including vaccination information in COVaxON.
The “Not fully vaccinated” category includes people with no vaccine and people with one dose of a double-dose vaccine. The “People with one dose of double-dose vaccine” category has a small and constantly changing number, so combining the two stabilizes the results.
Spikes, negative numbers and other data anomalies: Due to ongoing data entry and data quality assurance activities in the Case and Contact Management system (CCM) file, Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes, negative numbers and current totals being different from previously reported case and death counts.
Public Health Units report cause of death in the CCM based on information available to them at the time of reporting and in accordance with definitions provided by Public Health Ontario. The medical certificate of death is the official record and the cause of death could be different.
Deaths are defined per the outcome field in CCM marked as “Fatal”. Deaths in COVID-19 cases identified as unrelated to COVID-19 are not included in the Deaths involving COVID-19 reported.
Rates for the most recent days are subject to reporting lags. All data reflects totals from 8 p.m. the previous day. This dataset is subject to change.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘COVID vaccination vs. mortality ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/sinakaraji/covid-vaccination-vs-death on 12 November 2021.
--- Dataset description provided by original source is as follows ---
The COVID-19 outbreak has brought the whole planet to its knees. More than 4.5 million people had died as of the writing of this notebook, and the only acceptable way out of the disaster is to vaccinate all parts of society. Despite the fact that the benefits of vaccination have been proved to the world many times, anti-vaccine groups are springing up all over the world. This data set was generated to investigate the impact of coronavirus vaccinations on coronavirus mortality.
country | iso_code | date | total_vaccinations | people_vaccinated | people_fully_vaccinated | New_deaths | population | ratio |
---|---|---|---|---|---|---|---|---|
country name | ISO code for each country | date the record refers to | total number of COVID vaccine doses administered in that country | number of people who received at least one dose of a COVID vaccine | number of people who are fully vaccinated | number of daily new deaths | 2021 country population | % of vaccinations in that country at that date = people_vaccinated/population * 100 |
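The `ratio` column follows directly from the formula in the table, people_vaccinated/population * 100. A minimal sketch is given here; the function name is hypothetical and the example figures are made up for illustration, not taken from the dataset.

```python
def vaccination_ratio(people_vaccinated: int, population: int) -> float:
    """Percentage of the population with at least one dose.

    Mirrors the table's formula: people_vaccinated / population * 100.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return people_vaccinated / population * 100


# Example with made-up numbers: 250 of 1,000 people vaccinated.
example_ratio = vaccination_ratio(250, 1000)
```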
This dataset is a combination of the following three datasets:
1. https://www.kaggle.com/gpreda/covid-world-vaccination-progress
2. https://covid19.who.int/WHO-COVID-19-global-data.csv
3. https://www.kaggle.com/rsrishav/world-population
You can find more detail about this dataset by reading this notebook:
https://www.kaggle.com/sinakaraji/simple-linear-regression-covid-vaccination
Afghanistan | Albania | Algeria | Andorra | Angola |
Anguilla | Antigua and Barbuda | Argentina | Armenia | Aruba |
Australia | Austria | Azerbaijan | Bahamas | Bahrain |
Bangladesh | Barbados | Belarus | Belgium | Belize |
Benin | Bermuda | Bhutan | Bolivia (Plurinational State of) | Brazil |
Bosnia and Herzegovina | Botswana | Brunei Darussalam | Bulgaria | Burkina Faso |
Cambodia | Cameroon | Canada | Cabo Verde | Cayman Islands |
Central African Republic | Chad | Chile | China | Colombia |
Comoros | Cook Islands | Costa Rica | Croatia | Cuba |
Curaçao | Cyprus | Denmark | Djibouti | Dominica |
Dominican Republic | Ecuador | Egypt | El Salvador | Equatorial Guinea |
Estonia | Ethiopia | Falkland Islands (Malvinas) | Fiji | Finland |
France | French Polynesia | Gabon | Gambia | Georgia |
Germany | Ghana | Gibraltar | Greece | Greenland |
Grenada | Guatemala | Guinea | Guinea-Bissau | Guyana |
Haiti | Honduras | Hungary | Iceland | India |
Indonesia | Iran (Islamic Republic of) | Iraq | Ireland | Isle of Man |
Israel | Italy | Jamaica | Japan | Jordan |
Kazakhstan | Kenya | Kiribati | Kuwait | Kyrgyzstan |
Lao People's Democratic Republic | Latvia | Lebanon | Lesotho | Liberia |
Libya | Liechtenstein | Lithuania | Luxembourg | Madagascar |
Malawi | Malaysia | Maldives | Mali | Malta |
Mauritania | Mauritius | Mexico | Republic of Moldova | Monaco |
Mongolia | Montenegro | Montserrat | Morocco | Mozambique |
Myanmar | Namibia | Nauru | Nepal | Netherlands |
New Caledonia | New Zealand | Nicaragua | Niger | Nigeria |
Niue | North Macedonia | Norway | Oman | Pakistan |
occupied Palestinian territory, including east Jerusalem |
Panama | Papua New Guinea | Paraguay | Peru | Philippines |
Poland | Portugal | Qatar | Romania | Russian Federation |
Rwanda | Saint Kitts and Nevis | Saint Lucia |
Saint Vincent and the Grenadines | Samoa | San Marino | Sao Tome and Principe | Saudi Arabia |
Senegal | Serbia | Seychelles | Sierra Leone | Singapore |
Slovakia | Slovenia | Solomon Islands | Somalia | South Africa |
Republic of Korea | South Sudan | Spain | Sri Lanka | Sudan |
Suriname | Sweden | Switzerland | Syrian Arab Republic | Tajikistan |
United Republic of Tanzania | Thailand | Togo | Tonga | Trinidad and Tobago |
Tunisia | Turkey | Turkmenistan | Turks and Caicos Islands | Tuvalu |
Uganda | Ukraine | United Arab Emirates | The United Kingdom | United States of America |
Uruguay | Uzbekistan | Vanuatu | Venezuela (Bolivarian Republic of) | Viet Nam |
Wallis and Futuna | Yemen | Zambia | Zimbabwe |
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This graph was created in R:
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2F99ddcc7060665597ad9b1c263aa8174d%2Fgraph1.gif?generation=1717872782993200&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2Ff7af5fc372d601a18645c41c37411157%2Fgraph2.gif?generation=1717872788516258&alt=media
https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F16731800%2Fc85d9de1d5b88949298afa0bab1d9406%2Fgraph3.gif?generation=1717872793749722&alt=media
Having enough to eat is one of the fundamental basic human needs. Hunger – or, more formally, undernourishment – is defined as eating less than the energy required to maintain an active and healthy life.
The share of undernourished people is the leading indicator for food security and nutrition used by the Food and Agriculture Organization of the United Nations.
The fight against hunger focuses on a sufficient energy intake – enough calories per person per day. But it is not the only factor that matters for a healthy diet. Sufficient protein, fats, and micronutrients are also essential, and we cover this in our topic page on micronutrient deficiencies.
Undernourishment in mothers and children is a leading risk factor for death and other poor health outcomes.
The UN has set a global target as part of the Sustainable Development Goals to “end hunger by 2030”. While the world has progressed in past decades, we are far from reaching this target.
On this page, you can find our data, visualizations, and writing on hunger and undernourishment. It looks at how many people are undernourished, where they are, and other metrics used to track food security.
Hunger – also known as undernourishment – is defined as not consuming enough calories to maintain a normal, active, healthy life.
The world has made much progress in reducing global hunger in recent decades — we will see this in the following key insight. But we are still far away from an end to hunger. Tragically, nearly one-in-ten people still do not get enough food to eat.
The share of the undernourished population is shown globally and by region in the chart.
You can see that rates of hunger are highest in Sub-Saharan Africa. South Asia has much higher rates than the Americas and East Asia. Rates in North America and Europe are below 2.5%. However, the FAO shows this as “2.5%” rather than the specific point estimate.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘The Lost Journalists: Dataset of journalist deaths’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/journalist-deathse on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Credit for the original dataset goes to CPJ
In-the-News:
- NRT: AT LEAST 122 MEDIA PROFESSIONALS KILLED GLOBALLY IN 2016
- All Africa: Africa: Journalist Killings Ease From Record Highs As Murders Down, Combat Deaths Up
- BBC: The lost journalists of 2016
https://data.world/api/journalism/dataset/journalist-deaths/file/raw/journalist_deaths_by_year.png
Methodology
CPJ began compiling detailed records on journalist deaths in 1992. We apply strict journalistic standards when investigating a death. One important aspect of our research is determining whether a death was work-related. As a result, we classify deaths as "motive confirmed" or "motive unconfirmed."
We consider a case "confirmed" only if we are reasonably certain that a journalist was murdered in direct reprisal for his or her work; was killed in crossfire during combat situations; or was killed while carrying out a dangerous assignment such as coverage of a street protest. We do not include journalists who are killed in accidents such as car or plane crashes.
We include only confirmed cases in the statistical analyses in this database.
When the motive is unclear, but it is possible that a journalist was killed because of his or her work, CPJ classifies the case as "unconfirmed" and continues to investigate. We regularly reclassify cases based on our ongoing research.
Our archives include narrative capsules of all journalists killed, including the cases in which the motive is unconfirmed. In cases where the place of death is incidental to the journalist's killing, we have listed the country where the fatal attack occurred to be the place of the journalist's death (for example, in a case where a journalist is hit by shrapnel in one country and evacuated to another, where he or she dies, CPJ lists the country in which he or she was hit as the place of death).
CPJ defines journalists as people who cover news or comment on public affairs through any media -- including in print, in photographs, on radio, on television, and online. We take up cases involving staff journalists, freelancers, stringers, bloggers, and citizen journalists. The combination of daily reporting and statistical data forms the basis of our case-driven and long-term advocacy.
In 2003, CPJ began documenting the deaths of media support workers. We did so in recognition of the vital role these individuals play in newsgathering. These workers include translators, drivers, fixers, and administrative workers.
Our archives include narrative capsules for media workers killed on duty. These cases are not included in our statistical analyses.
About CPJ
The Committee to Protect Journalists is an independent, nonprofit organization that promotes press freedom worldwide. We defend the right of journalists to report the news without fear of reprisal.
Additional Reading
Investigative journalism in Africa – “Walking through a minefield at midnight”
Iraq: The deadliest war for journalists
Being a journalist in Mexico is getting even more dangerous
Source: Committee to Protect Journalists
This dataset was created by Journalism, News, and Media and contains around 2000 samples along with Date, Unnamed: 18, technical information and other features such as:
- Local/ Foreign
- Unnamed: 20
- and more.
- Analyze Coverage in relation to Taken Captive
- Study the influence of Organization on Unnamed: 21
If you use this dataset in your research, please credit Journalism, News, and Media
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The United States recorded 1,127,152 coronavirus deaths since the epidemic began, according to the World Health Organization (WHO). In addition, the United States reported 103,436,829 coronavirus cases. This dataset includes a chart with historical data for United States coronavirus deaths.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Vehicle Miles Traveled During Covid-19 Lock-Downs ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/vehicle-miles-travelede on 13 February 2022.
--- Dataset description provided by original source is as follows ---
**This data set was last updated 3:30 PM ET Monday, January 4, 2021. The last date of data in this dataset is December 31, 2020.**
Overview
Data shows that mobility declined nationally since states and localities began shelter-in-place strategies to stem the spread of COVID-19. The numbers began climbing as more people ventured out and traveled further from their homes, but in parallel with the rise of COVID-19 cases in July, travel declined again.
This distribution contains county level data for vehicle miles traveled (VMT) from StreetLight Data, Inc, updated three times a week. This data offers a detailed look at estimates of how much people are moving around in each county.
Data available has a two day lag - the most recent data is from two days prior to the update date. Going forward, this dataset will be updated by AP at 3:30pm ET on Monday, Wednesday and Friday each week.
This data has been made available to members of AP’s Data Distribution Program. To inquire about access for your organization - publishers, researchers, corporations, etc. - please click Request Access in the upper right corner of the page or email kromano@ap.org. Be sure to include your contact information and use case.
Findings
- Nationally, data shows that vehicle travel in the US has doubled compared to the seven-day period ending April 13, which was the lowest VMT since the COVID-19 crisis began. In early December, travel reached a low not seen since May, with a small rise leading up to the Christmas holiday.
- Average vehicle miles traveled continues to be below what would be expected without a pandemic - down 38% compared to January 2020. September 4 reported the largest single day estimate of vehicle miles traveled since March 14.
- New Jersey, Michigan and New York are among the states with the largest relative uptick in travel at this point of the pandemic - they report almost two times the miles traveled compared to their lowest seven-day period. However, travel in New Jersey and New York is still much lower than expected without a pandemic. Other states such as New Mexico, Vermont and West Virginia have rebounded the least.
About This Data
The county level data is provided by StreetLight Data, Inc, a transportation analysis firm that measures travel patterns across the U.S. The data is from their Vehicle Miles Traveled (VMT) Monitor, which uses anonymized and aggregated data from smartphones and other GPS-enabled devices to provide county-by-county VMT metrics for more than 3,100 counties. The VMT Monitor provides an estimate of total vehicle miles traveled by residents of each county, each day since the COVID-19 crisis began (March 1, 2020), as well as a change from the baseline average daily VMT calculated for January 2020. Additional columns are calculations by AP.
Included Data
01_vmt_nation.csv - Data summarized to provide a nationwide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
02_vmt_state.csv - Data summarized to provide a statewide look at vehicle miles traveled. Includes single day VMT across counties, daily percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
03_vmt_county.csv - Data providing a county level look at vehicle miles traveled. Includes VMT estimate, percent change compared to January and seven day rolling averages to smooth out the trend lines over time.
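The two derived measures named in the file descriptions, percent change compared to the January baseline and a seven day rolling average, can be illustrated with a minimal sketch. The function names and the sample figures below are hypothetical, and the trailing-window convention is an assumption; the source does not document how AP or StreetLight compute these columns.

```python
from collections import deque


def percent_change(value, baseline):
    """Percent change of a daily VMT value relative to a baseline average."""
    return (value - baseline) * 100.0 / baseline


def rolling_mean(values, window=7):
    """Trailing rolling mean; None until a full window of days is available."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


# Hypothetical figures for one county (illustrative only, not real data).
baseline_jan_vmt = 1000.0
daily_vmt = [600.0, 620.0, 580.0, 640.0, 660.0, 700.0, 680.0, 720.0]

changes = [percent_change(v, baseline_jan_vmt) for v in daily_vmt]
smoothed = rolling_mean(daily_vmt)

print(changes[0])   # change on the first day versus the January baseline
print(smoothed[6])  # first day on which a full seven-day window exists
```

A trailing window is used here so that each smoothed value depends only on days already observed; a centered window would be an equally plausible reading of "rolling average".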
Additional Data Queries
* Filter for specific state - filters 02_vmt_state.csv daily data for specific state.
* Filter counties by state - filters 03_vmt_county.csv daily data for counties in specific state.
* Filter for specific county - filters 03_vmt_county.csv daily data for specific county.
Interactive
The AP has designed an interactive map to show percent change in vehicle miles traveled by county since each county's lowest point during the pandemic:
This dataset was created by Angeliki Kastanis and contains around 0 samples along with Date At Low, Mean7 County Vmt At Low, technical information and other features such as: - County Name - County Fips - and more.
- Analyze State Name in relation to Baseline Jan Vmt
- Study the influence of Date At Low on Mean7 County Vmt At Low
- More datasets
If you use this dataset in your research, please credit Angeliki Kastanis
--- Original source retains full ownership of the source dataset ---
Attribution-NoDerivs 4.0 (CC BY-ND 4.0)https://creativecommons.org/licenses/by-nd/4.0/
License information was derived automatically
PerCapita_CO2_Footprint_InDioceses_FULL
Burhans, Molly A., Cheney, David M., Gerlt, R. “PerCapita_CO2_Footprint_InDioceses_FULL”. Scale not given. Version 1.0. MO and CT, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2019.
Methodology
This is the first global carbon footprint of the Catholic population. We will continue to improve and develop these data with our research partners over the coming years. While it is helpful, it should also be viewed and used as a "beta" prototype that we and our research partners will build from and improve. The years of carbon data are (2010) and (2015 - SHOWN). The year of Catholic data is 2018. The year of population data is 2016. Care should be taken during future developments to harmonize the years used for Catholic, population, and CO2 data.
1. Zonal Statistics: Esri Population Data and Dioceses --> Population per diocese, non-Vatican-based numbers
2. Zonal Statistics: FFDAS and Dioceses and Population dataset --> Mean CO2 per Diocese
3. Field Calculation: Population per Diocese and Mean CO2 per Diocese --> CO2 per Capita
4. Field Calculation: CO2 per Capita * Catholic Population --> Catholic Carbon Footprint
Assumption: PerCapita CO2
Deriving per-capita CO2 from mean CO2 in a geography assumes that people's footprint accounts for their personal lifestyle and involvement in local business and industries that contribute CO2.
Catholic CO2
Assumes that Catholics and non-Catholics have similar CO2 footprints from their lifestyles.
Derived from:
A multiyear, global gridded fossil fuel CO2 emission data product: Evaluation and analysis of results
http://ffdas.rc.nau.edu/About.html
Rayner et al., JGR, 2010 - This is the first FFDAS paper describing the version 1.0 methods and results published in the Journal of Geophysical Research.
Asefi et al., 2014 - This is the paper describing the methods and results of the FFDAS version 2.0 published in the Journal of Geophysical Research.
Readme version 2.2 - A simple readme file to assist in using the 10 km x 10 km, hourly gridded Vulcan version 2.2 results.
Liu et al., 2017 - A paper exploring the carbon cycle response to the 2015-2016 El Nino through the use of carbon cycle data assimilation with FFDAS as the boundary condition for FFCO2.
"S. Asefi‐Najafabady, P. J. Rayner, K. R. Gurney, A. McRobert, Y. Song, K. Coltin, J. Huang, C. Elvidge, K. Baugh
First published: 10 September 2014 https://doi.org/10.1002/2013JD021296 Cited by: 30
Link to FFDAS data retrieval and visualization: http://hpcg.purdue.edu/FFDAS/index.php
Abstract
High‐resolution, global quantification of fossil fuel CO2 emissions is emerging as a critical need in carbon cycle science and climate policy. We build upon a previously developed fossil fuel data assimilation system (FFDAS) for estimating global high‐resolution fossil fuel CO2 emissions. We have improved the underlying observationally based data sources, expanded the approach through treatment of separate emitting sectors including a new pointwise database of global power plants, and extended the results to cover a 1997 to 2010 time series at a spatial resolution of 0.1°. Long‐term trend analysis of the resulting global emissions shows subnational spatial structure in large active economies such as the United States, China, and India.
These three countries, in particular, show different long‐term trends and exploration of the trends in nighttime lights, and population reveal a decoupling of population and emissions at the subnational level. Analysis of shorter‐term variations reveals the impact of the 2008–2009 global financial crisis with widespread negative emission anomalies across the U.S. and Europe. We have used a center of mass (CM) calculation as a compact metric to express the time evolution of spatial patterns in fossil fuel CO2 emissions. The global emission CM has moved toward the east and somewhat south between 1997 and 2010, driven by the increase in emissions in China and South Asia over this time period. Analysis at the level of individual countries reveals per capita CO2 emission migration in both Russia and India. The per capita emission CM holds potential as a way to succinctly analyze subnational shifts in carbon intensity over time. Uncertainties are generally lower than the previous version of FFDAS due mainly to an improved nightlight data set."
Global Diocesan Boundaries:
Burhans, M., Bell, J., Burhans, D., Carmichael, R., Cheney, D., Deaton, M., Emge, T., Gerlt, B., Grayson, J., Herries, J., Keegan, H., Skinner, A., Smith, M., Sousa, C., Trubetskoy, S. “Diocesean Boundaries of the Catholic Church” [Feature Layer]. Scale not given. Version 1.2. Redlands, CA, USA: GoodLands Inc., Environmental Systems Research Institute, Inc., 2016.
Using: ArcGIS. 10.4. Version 10.0. Redlands, CA: Environmental Systems Research Institute, Inc., 2016.
Boundary Provenance
Statistics and Leadership Data
Cheney, D.M. “Catholic Hierarchy of the World” [Database]. Date Updated: August 2019. Catholic Hierarchy. Using: Paradox. Retrieved from Original Source.
Catholic Hierarchy
Annuario Pontificio per l’Anno .. Città del Vaticano: Tipografia Poliglotta Vaticana, Multiple Years.
The data for these maps was extracted from the gold standard of Church data, the Annuario Pontificio, published yearly by the Vatican. The collection and data development of the Vatican Statistics Office are unknown. GoodLands is not responsible for errors within this data. We encourage people to document and report errant information to us at data@good-lands.org or directly to the Vatican.
Additional information about regular changes in bishops and sees comes from a variety of public diocesan and news announcements.
GoodLands’ polygon data layers, version 2.0 for global ecclesiastical boundaries of the Roman Catholic Church:
Although care has been taken to ensure the accuracy, completeness and reliability of the information provided, due to this being the first developed dataset of global ecclesiastical boundaries curated from many sources it may have a higher margin of error than established geopolitical administrative boundary maps. Boundaries need to be verified with appropriate Ecclesiastical Leadership. The current information is subject to change without notice. No parties involved with the creation of this data are liable for indirect, special or incidental damage resulting from, arising out of or in connection with the use of the information. We referenced 1960 sources to build our global datasets of ecclesiastical jurisdictions. Often, they were isolated images of dioceses, historical documents and information about parishes that were cross checked. These sources can be viewed here:
https://docs.google.com/spreadsheets/d/11ANlH1S_aYJOyz4TtG0HHgz0OLxnOvXLHMt4FVOS85Q/edit#gid=0
To learn more or contact us please visit: https://good-lands.org/
Esri Gridded Population Data 2016
Description
This layer is a global estimate of human population for 2016.
Esri created this estimate by modeling a footprint of where people live as a dasymetric settlement likelihood surface, and then assigned 2016 population estimates stored on polygons of the finest level of geography available onto the settlement surface. Where people live means where their homes are, as in where people sleep most of the time, and this is opposed to where they work. Another way to think of this estimate is a night-time estimate, as opposed to a day-time estimate.
Knowledge of population distribution helps us understand how humans affect the natural world and how natural events such as storms and earthquakes, and other phenomena affect humans. This layer represents the footprint of where people live, and how many people live there.
Dataset Summary
Each cell in this layer has an integer value with the estimated number of people likely to live in the geographic region represented by that cell. Esri additionally produced several additional layers:
World Population Estimate Confidence 2016: the confidence level (1-5) per cell for the probability of people being located and estimated correctly.
World Population Density Estimate 2016: this layer is represented as population density in units of persons per square kilometer.
World Settlement Score 2016: the dasymetric likelihood surface used to create this layer by apportioning population from census polygons to the settlement score raster.
To use this layer in analysis, there are several properties or geoprocessing environment settings that should be used:
Coordinate system: WGS_1984. This service and its underlying data are WGS_1984. We do this because projecting population count data actually will change the populations due to resampling and either collapsing or splitting cells to fit into another coordinate system.
Cell Size: 0.0013474728 degrees (approximately 150-meters) at the equator.
No Data: -1
Bit Depth: 32-bit signed
This layer has query, identify, pixel, and export image functions enabled, and is restricted to a maximum analysis size of 30,000 x 30,000 pixels - an area about the size of Africa.
Frye, C. et al., (2018). Using Classified and Unclassified Land Cover Data to Estimate the Footprint of Human Settlement. Data Science Journal. 17, p.20. DOI: http://doi.org/10.5334/dsj-2018-020.
What can you do with this layer?
This layer is unsuitable for mapping or cartographic use, and thus it does not include a convenient legend. Instead, this layer is useful for analysis, particularly for estimating counts of people living within watersheds, coastal areas, and other areas that do not have standard boundaries. Esri recommends using the Zonal Statistics tool or the Zonal Statistics to Table tool where you provide input zones as either polygons, or raster data, and the tool will summarize the count of population within those zones.
https://www.esri.com/arcgis-blog/products/arcgis-living-atlas/data-management/2016-world-population-estimate-services-are-now-available/
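The two field calculations in the diocesan carbon footprint methodology (CO2 per capita, then CO2 per capita multiplied by Catholic population) can be sketched in a few lines. This is only an illustration under stated assumptions: the source does not give the exact field expression, so the sketch assumes per-capita CO2 is a diocese-level CO2 figure divided by the diocese population, and all names and numbers are hypothetical. The zonal statistics steps are assumed to have been run already in a GIS.

```python
def per_capita_co2(diocese_co2, population):
    """CO2 per capita for one diocese; None where population is not positive.

    Assumption: a simple ratio of the diocese-level CO2 figure to population.
    """
    if population <= 0:
        return None
    return diocese_co2 / population


def catholic_footprint(per_capita, catholic_population):
    """Catholic carbon footprint = CO2 per capita * Catholic population."""
    return per_capita * catholic_population


# Hypothetical zonal-statistics outputs (illustrative values, not real data).
dioceses = [
    {"name": "Diocese A", "co2": 5_000_000.0, "population": 1_000_000, "catholics": 250_000},
    {"name": "Diocese B", "co2": 1_200_000.0, "population": 400_000, "catholics": 100_000},
]

for d in dioceses:
    d["per_capita"] = per_capita_co2(d["co2"], d["population"])
    d["catholic_co2"] = catholic_footprint(d["per_capita"], d["catholics"])
    print(d["name"], d["per_capita"], d["catholic_co2"])
```

Because the per-capita figure is applied unchanged to the Catholic population, the sketch carries the same assumption the methodology states: Catholics and non-Catholics are treated as having similar footprints.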
[Edit 12/09/2020] The files below now contain only the last 30 days, because too many people ignore the request not to retrieve the dataset too often (there is no point in retrieving it every minute when the file changes only 4 or 5 times a day). If you want access to the entire history, contact me.
[Edit 31/03/2020] Since yesterday, I obtain the current day's data from the ESSC, so same-day data is now available and updated several times a day (about every hour) as new figures come in from around the world. The previous day's data is always consolidated around 2am (no longer 1am, since the time change). If you only want complete data, simply ignore the last day (today’s date).
Here I share the data that I compile from the well-known coronavirus infection world map created and maintained by The Johns Hopkins University, which I use to display **CoronaVirus statistics worldwide and by country**. They share each day’s data every night in a GitHub repository. My tools compile this new data as soon as it is available and I share the result here.
This data is used to display tables and graphs on the CoronaVirus website (Covid19) of Politologue.com https://coronavirus.politologue.com/
This data will allow you to make your own graphs and analyses if you are interested in the subject. You are not obliged to, but if my compilation lets you build something and saved you time, a link to https://coronavirus.politologue.com/ would be appreciated.
Information in files (csv and json)
- Number of cases
- Number of deaths
- Number of recoveries
- Death rate (percentage)
- Recovery rate (percentage)
- Infection rate (persons still infected, not deceased or cured) (percentage)
- And for data by country, you will find a field “country”
If you integrate the JSON or CSV client-side on a site or application, please keep a cache on your own servers so that you do not put an unexpected load on my servers.
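The three percentage fields listed for these files (death rate, recovery or "healing" rate, and infection rate) can be reproduced from the three counts. The sketch below is a guess at the arithmetic, assuming each rate is expressed as a share of total cases; the source does not state the denominators, and the function and key names are hypothetical.

```python
def covid_rates(cases, deaths, recovered):
    """Return death, recovery and infection rates as percentages of cases.

    Assumption: every rate uses total confirmed cases as the denominator, and
    "still infected" means cases minus deaths minus recoveries.
    """
    if cases <= 0:
        return {"death_rate": 0.0, "recovery_rate": 0.0, "infection_rate": 0.0}
    still_infected = cases - deaths - recovered
    return {
        "death_rate": 100.0 * deaths / cases,
        "recovery_rate": 100.0 * recovered / cases,
        "infection_rate": 100.0 * still_infected / cases,
    }


# Hypothetical counts for one country on one day (not real data).
example_rates = covid_rates(cases=2000, deaths=100, recovered=500)
print(example_rates)
```

Under this assumption the three rates always sum to 100 percent, which gives a quick consistency check against the published fields.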
https://creativecommons.org/publicdomain/zero/1.0/
From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.
So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.
The European CDC publishes daily statistics on the COVID-19 pandemic. Not just for Europe, but for the entire world. We rely on the ECDC as they collect and harmonize data from around the world which allows us to compare what is happening in different countries.
This dataset has daily level information on the number of affected cases, deaths, recoveries, etc. from coronavirus. It also contains various other parameters such as average life expectancy, population density and smoking population, which users may find useful for further predictions.
The data is available from 31 Dec, 2019.
Give people weekly data so that they can use it to make accurate predictions.
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching *** zettabytes in 2024. Over the next five years up to 2028, global data creation is projected to grow to more than *** zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, caused by the increased demand due to the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity also growing
Only a small percentage of this newly created data is kept though, as just * percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase, growing at a compound annual growth rate of **** percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached *** zettabytes.
https://object-store.os-api.cci2.ecmwf.int:443/cci2-prod-catalogue/licences/cc-by/cc-by_f24dc630aa52ab8c52a0ac85c03bc35e0abc850b4d7453bdc083535b41d5a5c3.pdf
ERA5 is the fifth generation ECMWF reanalysis for the global climate and weather for the past 8 decades. Data is available from 1940 onwards. ERA5 replaces the ERA-Interim reanalysis. Reanalysis combines model data with observations from across the world into a globally complete and consistent dataset using the laws of physics. This principle, called data assimilation, is based on the method used by numerical weather prediction centres, where every so many hours (12 hours at ECMWF) a previous forecast is combined with newly available observations in an optimal way to produce a new best estimate of the state of the atmosphere, called analysis, from which an updated, improved forecast is issued. Reanalysis works in the same way, but at reduced resolution to allow for the provision of a dataset spanning back several decades. Reanalysis does not have the constraint of issuing timely forecasts, so there is more time to collect observations, and when going further back in time, to allow for the ingestion of improved versions of the original observations, which all benefit the quality of the reanalysis product. ERA5 provides hourly estimates for a large number of atmospheric, ocean-wave and land-surface quantities. An uncertainty estimate is sampled by an underlying 10-member ensemble at three-hourly intervals. Ensemble mean and spread have been pre-computed for convenience. Such uncertainty estimates are closely related to the information content of the available observing system which has evolved considerably over time. They also indicate flow-dependent sensitive areas. To facilitate many climate applications, monthly-mean averages have been pre-calculated too, though monthly means are not available for the ensemble mean and spread. ERA5 is updated daily with a latency of about 5 days (monthly means are available around the 6th of each month). 
In case that serious flaws are detected in this early release (called ERA5T), this data could be different from the final release 2 to 3 months later. In case that this occurs users are notified. The data set presented here is a regridded subset of the full ERA5 data set on native resolution. It is online on spinning disk, which should ensure fast and easy access. It should satisfy the requirements for most common applications. An overview of all ERA5 datasets can be found in this article. Information on access to ERA5 data on native resolution is provided in these guidelines. Data has been regridded to a regular lat-lon grid of 0.25 degrees for the reanalysis and 0.5 degrees for the uncertainty estimate (0.5 and 1 degree respectively for ocean waves). There are four main sub sets: hourly and monthly products, both on pressure levels (upper air fields) and single levels (atmospheric, ocean-wave and land surface quantities). The present entry is "ERA5 monthly mean data on single levels from 1940 to present".
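The regridded resolutions quoted above imply fixed grid dimensions, which is useful when estimating array sizes before downloading. The sketch below assumes a global regular lat-lon grid whose latitudes include both poles and whose longitudes do not repeat the 360 degree meridian; the function name is hypothetical and the layout is an assumption rather than something stated in this description.

```python
def grid_shape(resolution_deg):
    """Number of (latitude, longitude) points on a global regular grid.

    Assumption: latitudes run from 90 to -90 inclusive, longitudes from 0 up
    to but not including 360.
    """
    n_lat = int(round(180 / resolution_deg)) + 1
    n_lon = int(round(360 / resolution_deg))
    return n_lat, n_lon


# Resolutions quoted in the description: 0.25 degrees for the reanalysis,
# 0.5 degrees for the uncertainty estimate, 1 degree for ocean-wave uncertainty.
for res in (0.25, 0.5, 1.0):
    n_lat, n_lon = grid_shape(res)
    print(res, n_lat, n_lon, n_lat * n_lon)
```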
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Historical chart and dataset showing World population growth rate by year from 1961 to 2023.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This is the dataset for the article "A Predictive Method to Improve the Effectiveness of Twitter Communication in a Cultural Heritage Scenario".
Abstract:
Museums are embracing social technologies in the attempt to broaden their audience and to engage people. Although social communication seems an easy task, media managers know how hard it is to reach millions of people with a simple message. Indeed, millions of posts are competing every day to get visibility in terms of likes and shares and very little research focused on museums communication to identify best practices. In this paper, we focus on Twitter and we propose a novel method that exploits interpretable machine learning techniques to: (a) predict whether a tweet will likely be appreciated by Twitter users or not; (b) present simple suggestions that will help enhancing the message and increasing the probability of its success. Using a real-world dataset of around 40,000 tweets written by 23 world famous museums, we show that our proposed method allows identifying tweet features that are more likely to influence the tweet success.
Code to run a selection of experiments is available at https://github.com/rmartoglia/predict-twitter-ch
Dataset structure
This archive contains the data used in the experiments of the above research paper. Only the extracted features for the museum tweet threads (and not the full message text) are provided, as these are all that the analyses need.
We selected 23 well-known art museums from around the world and grouped them into five groups: G1 (museums with at least three million followers); G2 (museums with more than one million followers); G3 (museums with more than 400,000 followers); G4 (museums with more than 200,000 followers); G5 (Italian museums). From these museums, we analyzed ca. 40,000 tweets, with a number varying from ca. 5k to ca. 11k for each museum group, depending on the number of museums in each group.
Content features: these are the features that can be drawn from the content of the tweet itself. We further divide such features into the following two categories:
– Countable: these features have a value ranging into different intervals. We take into consideration: the number of hashtags (i.e., words preceded by #) in the tweet, the number of URLs (i.e., links to external resources), the number of images (e.g., photos and graphical emoticons), the number of mentions (i.e., twitter accounts preceded by @), the length of the tweet;
– On-Off : these features have binary values in {0, 1}. We observe whether the tweet has exclamation marks, question marks, person names, place names, organization names, other names. Moreover, we also take into consideration the tweet topic density: assuming that the involved topics correspond to the hashtags mentioned in the text, we define a tweet as dense of topics if the number of hashtags it contains is greater than a given threshold, set to 5. Finally, we observe the tweet sentiment that might be present (positive or negative) or not (neutral).
Context features: these features are not drawn from the content of the tweet itself and might give a larger picture of the context in which the tweet was sent. Namely, we take into consideration the part of the day in which the tweet was sent (morning, afternoon, evening and night, respectively from 5:00am to 11:59am, from 12:00pm to 5:59pm, from 6:00pm to 10:59pm and from 11:00pm to 4:59am), and a boolean feature indicating whether the tweet is a retweet or not.
User features: these features are specific to the user that sent the tweet, and are the same for all the tweets of this user. Namely, we consider the name of the museum and the number of followers of the user.
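Several of the content and context features described above are simple enough to sketch directly. The code below is not the authors' extraction code (theirs is linked in the repository above); it is a minimal illustration using regular expressions, with hypothetical function and key names. Only the hashtag threshold of 5 and the four time-of-day boundaries come from the description; features that need named-entity recognition, image counts or sentiment analysis are left out.

```python
import re
from datetime import datetime

TOPIC_DENSITY_THRESHOLD = 5  # a tweet is topic-dense above this many hashtags


def part_of_day(sent_at):
    """Map a timestamp to morning, afternoon, evening or night."""
    hour = sent_at.hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 23:
        return "evening"
    return "night"  # 11:00pm to 4:59am


def content_features(text):
    """Countable and on-off features that need only the tweet text.

    Simplification: the patterns are naive, so a '#' or '@' inside a URL
    would also be counted as a hashtag or mention.
    """
    n_hashtags = len(re.findall(r"#\w+", text))
    return {
        "n_hashtags": n_hashtags,
        "n_mentions": len(re.findall(r"@\w+", text)),
        "n_urls": len(re.findall(r"https?://\S+", text)),
        "length": len(text),
        "has_exclamation": int("!" in text),
        "has_question": int("?" in text),
        "topic_dense": int(n_hashtags > TOPIC_DENSITY_THRESHOLD),
    }


# Made-up tweet and timestamp, for illustration only.
example_tweet = "New exhibition opens today! #art #museum @example_museum https://example.org/show"
features = content_features(example_tweet)
features["part_of_day"] = part_of_day(datetime(2020, 1, 15, 23, 30))
print(features)
```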