This dataset contains counts of deaths for California counties based on information entered on death certificates. Final counts are derived from static data and include out-of-state deaths to California residents, whereas provisional counts are derived from incomplete and dynamic data. Provisional counts are based on the records available when the data was retrieved and may not represent all deaths that occurred during the time period. Deaths involving injuries from external or environmental forces, such as accidents, homicide and suicide, often require additional investigation that tends to delay certification of the cause and manner of death. This can result in significant under-reporting of these deaths in provisional data.
The final data tables include both deaths that occurred in each California county regardless of the place of residence (by occurrence) and deaths to residents of each California county (by residence), whereas the provisional data table only includes deaths that occurred in each county regardless of the place of residence (by occurrence). The data are reported as totals, as well as stratified by age, gender, race-ethnicity, and death place type. Deaths due to all causes (ALL) and selected underlying cause of death categories are provided. See temporal coverage for more information on which combinations are available for which years.
The cause of death categories are based solely on the underlying cause of death as coded by the International Classification of Diseases. The underlying cause of death is defined by the World Health Organization (WHO) as "the disease or injury which initiated the train of events leading directly to death, or the circumstances of the accident or violence which produced the fatal injury." It is a single value assigned to each death based on the details as entered on the death certificate. When more than one cause is listed, the order in which they are listed can affect which cause is coded as the underlying cause. This means that similar events could be coded with different underlying causes of death depending on variations in how they were entered. Consequently, while underlying cause of death provides a convenient comparison between cause of death categories, it may not capture the full impact of each cause of death as it does not always take into account all conditions contributing to the death.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
https://creativecommons.org/publicdomain/zero/1.0/
Every year the CDC releases the country’s most detailed report on death in the United States under the National Vital Statistics System. This mortality dataset is a record of every death in the country for 2005 through 2015, including detailed information about causes of death and the demographic background of the deceased.
It's been said that "statistics are human beings with the tears wiped off." This is especially true with this dataset. Each death record represents somebody's loved one, often connected with a lifetime of memories and sometimes tragically too short.
Putting the sensitive nature of the topic aside, analyzing mortality data is essential to understanding the complex circumstances of death across the country. The US Government uses this data to determine life expectancy and understand how death in the U.S. differs from the rest of the world. Whether you’re looking for macro trends or analyzing unique circumstances, we challenge you to use this dataset to find your own answers to one of life’s great mysteries.
This dataset is a collection of CSV files, each containing one year's worth of data, and paired JSON files containing the code mappings, plus an ICD-10 code set. The CSVs were reformatted from their original fixed-width file formats using information extracted from the CDC's PDF manuals using this script. Please note that this process may have introduced errors, as the text extracted from the PDF is not a perfect match. If you have any questions or find errors in the preparation process, please leave a note in the forums. We hope to publish additional years of data using this method soon.
A more detailed overview of the data can be found here. You'll find that the fields are consistent within this time window, but some of the data codes change every few years. For example, the 113_cause_recode entry 069 only covers ICD codes (I10,I12) in 2005, but by 2015 it covers (I10,I12,I15). When I post data from years prior to 2005, expect some of the fields themselves to change as well.
All data comes from the CDC’s National Vital Statistics System, except for the Icd10Code set, which is sourced from the World Health Organization.
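Because the meaning of a recode can shift between years, a lookup keyed only on the recode value can mislabel records. The following is a minimal sketch of a year-aware lookup. The table name `RECODE_069_BY_YEAR` and both helper functions are hypothetical, and only the two years quoted above are filled in; real mappings should be read from each year's paired JSON file.

```python
# Hypothetical year-aware lookup. Only the 2005 and 2015 entries quoted in the
# description are filled in; load other years from the paired JSON files.
RECODE_069_BY_YEAR = {
    2005: ("I10", "I12"),
    2015: ("I10", "I12", "I15"),
}


def icd_codes_for_recode_069(year):
    """Return the ICD-10 codes covered by 113_cause_recode 069 in a given year."""
    try:
        return RECODE_069_BY_YEAR[year]
    except KeyError:
        raise KeyError(f"no mapping loaded for {year}; check that year's JSON file") from None


def added_codes(earlier_year, later_year):
    """Codes covered in the later year but not in the earlier one."""
    earlier = set(icd_codes_for_recode_069(earlier_year))
    later = set(icd_codes_for_recode_069(later_year))
    return sorted(later - earlier)


print(added_codes(2005, 2015))  # ['I15']
```

Keying on the year as well as the recode keeps multi-year comparisons from silently mixing definitions.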
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.
February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
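As a rough illustration of the per-100,000 rates mentioned above, here is a minimal sketch. The function name and the sample numbers are made up for illustration, and it is an assumption that the rate is a simple count divided by population; the AP's own calculation may handle missing population figures differently.

```python
def rate_per_100k(count, population):
    """Return count per 100,000 residents, or None when population is unusable."""
    if population is None or population <= 0:
        return None
    return count * 100_000 / population


# Made-up example: 250 cases in a county of 50,000 residents.
print(rate_per_100k(250, 50_000))  # 500.0
```

Returning `None` for a missing or zero population avoids a division error and keeps such counties out of any ranking.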
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic:
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
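The hotspot queries above rely on a 7-day rolling average of new cases per capita. The sketch below shows one way to compute that outside the query workspace. It does not reproduce the AP's queries: the function names, the per-100,000 scaling, and the sample counts are assumptions for illustration.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing mean over `window` values; None until a full window is available."""
    buf = deque(maxlen=window)
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


def rolling_new_cases_per_100k(daily_new_cases, population, window=7):
    """Rolling average of daily new cases, scaled per 100,000 residents."""
    return [
        None if avg is None else avg * 100_000 / population
        for avg in rolling_mean(daily_new_cases, window)
    ]


daily = [7, 7, 7, 7, 7, 7, 7, 14]  # made-up daily new-case counts
print(rolling_new_cases_per_100k(daily, population=100_000)[-2:])  # [7.0, 8.0]
```

As noted above, very small populations can inflate per-capita values, so it helps to read the raw counts alongside the rate.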
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
Johns Hopkins timeseries data
- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.
This data should be credited to the Johns Hopkins University COVID-19 tracking project.
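Because the timeseries stores cumulative snapshots, daily figures have to be derived by differencing, and a downward revision by a source shows up as a negative day. The following is a minimal sketch of that step; the function names and sample values are illustrative assumptions, not part of the Hopkins or AP tooling.

```python
def daily_from_cumulative(cumulative):
    """Day-over-day differences of a cumulative series (first day has no prior value)."""
    return [None] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def flag_downward_revisions(cumulative):
    """Indices where the cumulative count dropped, suggesting a source correction."""
    return [
        i
        for i, d in enumerate(daily_from_cumulative(cumulative))
        if d is not None and d < 0
    ]


cumulative_deaths = [10, 12, 15, 14, 18]  # made-up snapshot values
print(daily_from_cumulative(cumulative_deaths))    # [None, 2, 3, -1, 4]
print(flag_downward_revisions(cumulative_deaths))  # [3]
```

Flagging the negative days, rather than clipping them to zero, keeps the correction visible for anyone checking against the published errata file.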
THIS DATASET WAS LAST UPDATED AT 8:11 PM EASTERN ON JUNE 29
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as incidents in which four or more people are killed, excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
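The screening rule described above can be sketched as a simple filter. The record layout, field names, and type labels below are hypothetical (the real SHR files use their own fields and codes), and a flagged record is only a candidate that still needs verification.

```python
from dataclasses import dataclass


# Hypothetical record layout; the real SHR files use their own field names and codes.
@dataclass
class ShrRecord:
    incident_id: str
    victim_count: int
    homicide_type: str  # assumed free-text label for the type of death


QUALIFYING_TYPE = "murder or non-negligent manslaughter"


def is_potential_mass_murder(record, min_victims=4):
    """Flag a record for manual verification; a screen, not a final classification."""
    return record.victim_count >= min_victims and record.homicide_type == QUALIFYING_TYPE


records = [
    ShrRecord("A", 4, QUALIFYING_TYPE),
    ShrRecord("B", 3, QUALIFYING_TYPE),
    ShrRecord("C", 5, "manslaughter by negligence"),
]
print([r.incident_id for r in records if is_potential_mass_murder(r)])  # ['A']
```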
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
Where sources contradicted one another, official law enforcement or court records were used when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
https://creativecommons.org/publicdomain/zero/1.0/
People across India scrambled for life-saving oxygen supplies on Friday and patients lay dying outside hospitals as the capital recorded the equivalent of one death from COVID-19 every five minutes.
For the second day running, the country’s overnight infection total was higher than ever recorded anywhere in the world since the pandemic began last year, at 332,730.
India’s second wave has hit with such ferocity that hospitals are running out of oxygen, beds, and anti-viral drugs. Many patients have been turned away because there was no space for them, doctors in Delhi said.
Mass cremations have been taking place as the crematoriums have run out of space. Ambulance sirens sounded throughout the day in the deserted streets of the capital, one of India’s worst-hit cities, where a lockdown is in place to try and stem the transmission of the virus.
The dataset consists of tweets posted with the #IndiaWantsOxygen hashtag over the past week. It contains 25,440 tweets in total and will be updated daily.
The description of the features is given below:

| No | Columns | Descriptions |
| -- | -- | -- |
| 1 | user_name | The name of the user, as they’ve defined it. |
| 2 | user_location | The user-defined location for this account’s profile. |
| 3 | user_description | The user-defined UTF-8 string describing their account. |
| 4 | user_created | Time and date when the account was created. |
| 5 | user_followers | The number of followers an account currently has. |
| 6 | user_friends | The number of friends an account currently has. |
| 7 | user_favourites | The number of favorites an account currently has. |
| 8 | user_verified | When true, indicates that the user has a verified account. |
| 9 | date | UTC time and date when the Tweet was created. |
| 10 | text | The actual UTF-8 text of the Tweet. |
| 11 | hashtags | All the other hashtags posted in the tweet along with #IndiaWantsOxygen. |
| 12 | source | Utility used to post the Tweet. Tweets from the Twitter website have a source value of web. |
| 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |
https://globalnews.ca/news/7785122/india-covid-19-hospitals-record/ Image courtesy: BBC and Reuters
The past few days have been really depressing after seeing these incidents. These tweets are the voice of Indians requesting help and of people all over the globe asking their own countries to support India by providing oxygen tanks.
And I strongly believe that this is not just some data, but the pure emotions of people and their call for help. And I hope we as data scientists could contribute on this front by providing valuable information and insights.
Rank, number of deaths, percentage of deaths, and age-specific mortality rates for the leading causes of death, by age group and sex, 2000 to most recent year.
[Edit 12/09/2020] The files below now contain only the last 30 days, because too many people ignore the request not to download the dataset too often (there is no point in fetching it every minute when the file changes only 4 or 5 times a day). If you want access to the entire history, contact me.

[Edit 31/03/2020] Since yesterday, I retrieve the current day's data from the ESSC, so same-day data is now available and updated several times a day (about every hour) as new figures come in from around the world. The previous day's data is always consolidated around 2am (no longer 1am, since the time change). If you only want complete data, simply ignore the last day (today’s date).

Here I share the data that I compile from the well-known coronavirus infection world map created and maintained by Johns Hopkins University, which I use to display **coronavirus statistics worldwide and by country**.

They share each day’s data every night in a GitHub repository. My tools compile this new data as soon as it is available, and I share the result here. This data is used to display tables and graphs on the coronavirus (Covid19) website of Politologue.com: https://coronavirus.politologue.com/

This data will let you make your own graphs and analyses if you are interested in the subject. It is not required, but if my compilation lets you build something and saved you time, a link to https://coronavirus.politologue.com/ would be appreciated.

Information in the files (CSV and JSON):
- Number of cases
- Number of deaths
- Number of recoveries
- Death rate (percentage)
- Recovery rate (percentage)
- Infection rate (persons still infected, not deceased or recovered) (percentage)
- For data by country, a “country” field

If you integrate the JSON or CSV client-side in a site or application, please keep a cache on your own servers to avoid putting an unexpected load on mine.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
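The three percentage fields listed above can presumably be derived from the raw counts. The sketch below assumes each rate is a share of total cases and that "still infected" means cases minus deaths minus recoveries; the description does not state the formulas, so treat the function name, the key names, and the arithmetic as assumptions.

```python
def covid_rates(cases, deaths, recovered):
    """Percentages of total cases; returns None when there are no cases.

    Assumed formulas (not confirmed by the data publisher):
      death rate     = deaths / cases
      recovery rate  = recovered / cases
      infection rate = (cases - deaths - recovered) / cases
    """
    if cases <= 0:
        return None
    still_infected = cases - deaths - recovered
    return {
        "death_rate": round(100 * deaths / cases, 2),
        "recovery_rate": round(100 * recovered / cases, 2),
        "infection_rate": round(100 * still_infected / cases, 2),
    }


print(covid_rates(cases=1000, deaths=50, recovered=600))
# {'death_rate': 5.0, 'recovery_rate': 60.0, 'infection_rate': 35.0}
```

Under these assumptions the three rates sum to 100 percent, which gives a quick consistency check against the published fields.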
License information was derived automatically
This dataset reports daily 7-day moving average rates of deaths involving COVID-19, by vaccination status and by age group.

Learn how the Government of Ontario is helping to keep Ontarians safe during the 2019 Novel Coronavirus outbreak.

Effective November 14, 2024 this page will no longer be updated. Information about COVID-19 and other respiratory viruses is available on Public Health Ontario’s interactive respiratory virus tool: https://www.publichealthontario.ca/en/Data-and-Analysis/Infectious-Disease/Respiratory-Virus-Tool

Data includes:
* Date on which the death occurred
* Age group
* 7-day moving average of the death rate per 100,000 for those not fully vaccinated
* 7-day moving average of the death rate per 100,000 for those fully vaccinated
* 7-day moving average of the death rate per 100,000 for those vaccinated with at least one booster

## Additional notes

* As of June 16, all COVID-19 datasets will be updated weekly on Thursdays by 2pm.
* As of January 12, 2024, data from January 1, 2024 onwards reflect updated population estimates. This update specifically impacts data for the 'not fully vaccinated' category.
* On November 30, 2023 the count of COVID-19 deaths was updated to include missing historical deaths from January 15, 2020 to March 31, 2023.
* The Case and Contact Management system (CCM) is a dynamic disease reporting system which allows ongoing updates to data previously entered. As a result, data extracted from CCM represents a snapshot at the time of extraction and may differ from previous or subsequent results.
* Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes and current totals being different from previously reported cases and deaths.
* Observed trends over time should be interpreted with caution for the most recent period due to reporting and/or data entry lags.
* The data does not include vaccination data for people who did not provide consent for vaccination records to be entered into the provincial COVaxON system. This includes individual records as well as records from some Indigenous communities where those communities have not consented to including vaccination information in COVaxON.
* The “not fully vaccinated” category includes people with no vaccine and people with one dose of a double-dose vaccine. The latter group is small and constantly changing, so combining the two stabilizes the results.
* Spikes, negative numbers and other data anomalies: Due to ongoing data entry and data quality assurance activities in the CCM file, Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes, negative numbers and current totals being different from previously reported case and death counts.
* Public Health Units report cause of death in the CCM based on information available to them at the time of reporting and in accordance with definitions provided by Public Health Ontario. The medical certificate of death is the official record and the cause of death could be different.
* Deaths are defined per the outcome field in CCM marked as “Fatal”. Deaths in COVID-19 cases identified as unrelated to COVID-19 are not included in the deaths involving COVID-19 reported.
* Rates for the most recent days are subject to reporting lags.
* All data reflects totals from 8 p.m. the previous day.
* This dataset is subject to change.
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
How much time do people spend on social media?

As of 2025, the average daily social media usage of internet users worldwide amounted to 141 minutes per day, down from 143 minutes in the previous year. Currently, the country with the most time spent on social media per day is Brazil, with online users spending an average of 3 hours and 49 minutes on social media each day. In comparison, the daily time spent with social media in the U.S. was just 2 hours and 16 minutes.

Global social media usage

Currently, the global social network penetration rate is 62.3 percent. Northern Europe had an 81.7 percent social media penetration rate, topping the ranking of global social media usage by region. Eastern and Middle Africa closed the ranking with 10.1 and 9.6 percent usage reach, respectively. People access social media for a variety of reasons. Users like to find funny or entertaining content and enjoy sharing photos and videos with friends, but mainly use social media to stay in touch with current events and friends.

Global impact of social media

Social media has a wide-reaching and significant impact on not only online activities but also offline behavior and life in general. During a global online user survey in February 2019, a significant share of respondents stated that social media had increased their access to information, ease of communication, and freedom of expression. On the flip side, respondents also felt that social media had worsened their personal privacy, increased polarization in politics and heightened everyday distractions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Describe real-world epidemiology, treatment patterns, health care resource utilization, and costs of locally advanced or metastatic urothelial carcinoma (la/mUC) in France. Retrospective study including all adults with la/mUC diagnosis during January 2017 to December 2020 in the PMSI database. Annual prevalence and incidence ranged from 36.4 to 38.9 and 16.4 to 18.5 cases per 100,000 people, respectively. Of the 25,314 patients with incident la/mUC, 37.6% did not receive first-line systemic treatment. Of the 14,656 patients who started first-line systemic treatment, 66.6%, 22.5%, and 10.9% received 1, 2, and 3 lines of therapy, respectively. Annual per-patient costs in second-/third-line setting ranged from €8803 to €16,012. The substantial disease burden of la/mUC in France highlights the unmet need for new therapies.

What is this article about?

Urothelial carcinoma (UC) is a type of cancer affecting the urinary system. It can spread to other parts of the body, described as locally advanced or metastatic (la/m). We used information from a French database recording hospitalizations in France to find out how many people have la/mUC, how many new cases develop each year, what treatments they receive, how many die in the hospital, and how much their care costs.

What were the results?

Based on database information, 37 to 39 of every 100,000 people have la/mUC and 17 to 19 of every 100,000 people are identified with a new case yearly. Slightly more than one-third of patients with la/mUC did not receive recommended treatment (chemotherapy) when first diagnosed. Chemotherapy was the most common treatment type for the first, second, or third treatment; checkpoint inhibitors (a unique treatment) became more commonly used as a second treatment over time. Yearly in-hospital death rates were high, ranging from 47.8% of patients who died within 1 year from diagnosis to 62.9% dying within 3 years. Yearly cost of care was high (costing €8803 to €16,012) in patients starting a second or third treatment.

What do the results of the study mean?

The study shows many patients may not be fit enough or choose not to receive treatment. Even those receiving treatment are at high risk for poor outcomes. The burden of la/mUC in France is high, underscoring the need for more therapies and better supportive care early in disease management.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Vital Statistics: Death Rate: per 1000 Population: Andhra Pradesh: Urban data was reported at 4.900 NA in 2020. This records an increase from the previous number of 4.800 NA for 2019. Vital Statistics: Death Rate: per 1000 Population: Andhra Pradesh: Urban data is updated yearly, averaging 5.400 NA from Dec 1997 (Median) to 2020, with 23 observations. The data reached an all-time high of 6.100 NA in 1998 and a record low of 4.800 NA in 2019. Vital Statistics: Death Rate: per 1000 Population: Andhra Pradesh: Urban data remains active status in CEIC and is reported by Office of the Registrar General & Census Commissioner, India. The data is categorized under India Premium Database’s Demographic – Table IN.GAH003: Vital Statistics: Death Rate: by States.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This table contains figures on the underlying causes of death of deceased residents of the Netherlands. The figures are broken down by quarter and year of death according to a limited number of causes of death.

Mortality from COVID-19, the disease caused by the new coronavirus, is coded with the ICD-10 codes U07.1 or U07.2. The World Health Organization (WHO) designated these as temporary 'emergency codes' that will later be included in a chapter of the ICD-10. These codes are therefore still counted in the quarterly table under the category 'Other causes of death'.

In the 2013 statistical year, Statistics Netherlands switched to using international software for the automatic coding of causes of death. This makes the figures more reproducible and internationally comparable. There are, however, some significant shifts in the causes of death (see section 4).

Data available from: 1st quarter 1996

Status of the figures: The figures for 2022 and 2023 are provisional; the other figures are final.

Changes as of July 19, 2023: Figures for 2023 quarter 1 have been added, and the figures for 2022 have been adjusted.

When will new numbers come out? Provisional figures for the second quarter of 2023 will be published in the fourth quarter of 2023.
The World Religion Project (WRP) aims to provide detailed information about religious adherence worldwide since 1945. It contains data about the number of adherents by religion in each of the states in the international system. These numbers are given for every half-decade period (1945, 1950, etc., through 2010). Percentages of the states' populations that practice a given religion are also provided. (Note: These percentages are expressed as decimals, ranging from 0 to 1, where 0 indicates that 0 percent of the population practices a given religion and 1 indicates that 100 percent of the population practices that religion.) Some of the religions (as detailed below) are divided into religious families. To the extent data are available, the breakdown of adherents within a given religion into religious families is also provided.
The project was developed in three stages. The first stage consisted of the formation of a religion tree. A religion tree is a systematic classification of major religions and of religious families within those major religions. To develop the religion tree we prepared a comprehensive literature review, the aim of which was (i) to define a religion, (ii) to find tangible indicators of a given religion or of religious families within a major religion, and (iii) to identify existing efforts at classifying world religions. (Please see the original survey instrument to view the structure of the religion tree.) The second stage consisted of the identification of major data sources of religious adherence and the collection of data from these sources according to the religion tree classification. This created a dataset that included multiple records for some states for a given point in time. It also contained missing data for specific states, specific time periods and specific religions. The third stage consisted of cleaning the data, reconciling discrepancies of information from different sources and imputing data for the missing cases.
The Global Religion Dataset: This dataset uses a religion-by-five-year unit. It aggregates the number of adherents of a given religion and religious group globally by five-year periods.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
JP: People Using Basic Drinking Water Services: % of Population was reported at 98.946 % in 2015, an increase from 98.903 % in 2014. The series is updated yearly, averaging 98.625 % (median) from Dec 2000 to 2015, with 16 observations. It reached an all-time high of 98.946 % in 2015 and a record low of 98.476 % in 2004. The series remains active in CEIC and is reported by the World Bank. It is categorized under Global Database’s Japan – Table JP.World Bank: Health Statistics. The indicator is the percentage of people using at least basic water services. It encompasses both people using basic water services and those using safely managed water services. Basic drinking water services are defined as drinking water from an improved source, provided collection time is not more than 30 minutes for a round trip. Improved water sources include piped water, boreholes or tubewells, protected dug wells, protected springs, and packaged or delivered water. Source: WHO/UNICEF Joint Monitoring Programme (JMP) for Water Supply, Sanitation and Hygiene (washdata.org). Aggregation: weighted average.
By Throwback Thursday
This dataset provides comprehensive information about Super Bowl games that took place in 2019, including game details such as the winning team, losing team, venue, city, attendance, the network that broadcast the game, the average number of viewers in the United States who watched the game, rating (the percentage of households with televisions that were tuned into the game), share (the percentage of households with televisions in use that were tuned into the game), and cost per 30-second advertisement. The dataset also includes the final score (winning team points and losing team points), the conference affiliations of both the winning and losing teams, and any additional notes about each Super Bowl. Together, these data points provide an overview of each recorded Super Bowl game from 2019.
1. Game details: The 'Game' column represents the number or identifier of the Super Bowl game. For example, '1' indicates it is the first Super Bowl game.
2. Winning team: The 'Winning team' column lists the name of the team that won the Super Bowl game. For example, 'New England Patriots'.
3. Winning Team Points: The 'Winning Team Points' column shows the number of points scored by the winning team in that particular game.
4. Winning Team Conference: The 'Winning Team Conference' column indicates which conference (e.g., AFC or NFC) the winning team belongs to.
5. Score: The 'Score' column displays a summary of the final score in each game, showing the points scored by both teams in the format Winning Team Points - Losing Team Points.
6. Losing team: The 'Losing team' column lists the name of the team that lost the Super Bowl game.
7. Losing Team Conference: This column indicates which conference (e.g., AFC or NFC) the losing team belongs to.
8. Venue and city: The 'Venue' and 'City' columns show where each Super Bowl game was played.
9. Attendance: This column shows how many people attended each Super Bowl game.
10. Network: This column indicates the television network that broadcast the Super Bowl game.
11. Average U.S. viewers: This column gives the average number of viewers in the United States who watched each Super Bowl game.
12. Rating & Share: Rating is the percentage of households with televisions that were tuned into the game; Share is the percentage of households with televisions in use that were tuned into the game.
13. Cost Per 30s Ad: The 'Cost Per 30s Ad' column specifies the cost of a 30-second advertisement during the Super Bowl game in dollars.
14. Notes: The 'Notes' column includes additional notes or information about each Super Bowl game.
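As a minimal sketch of working with the 'Score' column described above, the following splits a score string into its two parts and computes the winning margin. It assumes the column holds text such as "35-10" (winning points, a hyphen or dash, losing points); the function names and the example values are hypothetical and are not taken from the dataset.

```python
import re
from typing import Tuple


def parse_score(score: str) -> Tuple[int, int]:
    """Split a 'winning-losing' score string into two integers."""
    # Accept a plain hyphen or an en dash, with optional surrounding spaces.
    match = re.fullmatch(r"\s*(\d+)\s*[-–]\s*(\d+)\s*", score)
    if match is None:
        raise ValueError(f"unrecognised score format: {score!r}")
    return int(match.group(1)), int(match.group(2))


def point_margin(score: str) -> int:
    """Return winning team points minus losing team points."""
    winning, losing = parse_score(score)
    return winning - losing


# Hypothetical example values, not taken from the dataset.
examples = ["35-10", "13 – 3"]
margins = [point_margin(s) for s in examples]
print(margins)
```

If the actual file stores the score in a different layout, the regular expression would need adjusting; the 'Winning Team Points' column can serve as a cross-check on the parsed value.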
This dataset provides a comprehensive record of every Super Bowl game that took place in 2019. By analyzing these attributes, you can gain insights into team performance, viewer interest, and commercial aspects of the games. Use this guide to explore and analyze the dataset for your analysis or research purposes.
- Analyzing the popularity and reach of the Super Bowl: With data on average U.S. viewers, rating, share, and cost per 30-second ad, this dataset can be used to analyze the Super Bowl's popularity and reach. By comparing these metrics across different games, one can assess how the viewership and interest in the Super Bowl has changed over time.
- Evaluating advertising effectiveness during the Super Bowl: The dataset includes information on the cost per 30-second ad during each Super Bowl game. This data can be used to analyze whether there is a correlation between ad costs and viewer ratings or share. It can also help marketers and advertisers understand how effective their advertisements were in reaching a wide audience during past Super Bowls.
- Studying game attendance trends: The dataset provides information on attendance at each Super Bowl game. By analyzing this data, one can identify trends in game attendance over the years and evaluate factors that may affect ticket sales, such as venue location or the teams competing in the game. This analysis could be useful for event organizers and stadium operators looking to optimize future hosting decisions for large-scale events like sports championships or music festivals.
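One of the analyses suggested above, checking whether ad cost and viewer rating move together, could be sketched with a plain Pearson correlation. The helper name and the numbers below are invented for illustration only; real values would come from the 'Cost Per 30s Ad' and rating columns after parsing them as numbers.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented illustrative numbers, not values from the dataset.
ad_cost = [1.0, 2.0, 3.0, 4.0]
rating = [40.0, 42.0, 44.0, 46.0]
print(round(pearson(ad_cost, rating), 3))
```

A correlation computed this way only describes a linear association; ad prices also rise with inflation over time, so any real comparison would likely need an adjustment for that.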
If you use this dataset in your research, please credit the original authors.
See the dataset descrip...
Attribution-NonCommercial-ShareAlike 3.0 (CC BY-NC-SA 3.0) https://creativecommons.org/licenses/by-nc-sa/3.0/
License information was derived automatically
The “richness index” represents the level of economic wellbeing of a certain area within a country in 2010. Regions with higher income per capita, lower poverty rates and better access to markets are wealthier and are therefore better able to prepare for and respond to adversity. The index results from the second cluster of the Principal Component Analysis performed among 9 potential variables. The analysis identifies four dominant variables, namely “GDPppp per capita”, “agriculture share GDP per agriculture sector worker”, “poverty rate” and “market accessibility”, assigning weights of 0.33, 0.26, 0.25 and 0.16, respectively. Before performing the analysis, all variables were log-transformed (except the “agriculture share GDP per agriculture sector worker”) to reduce the extreme variation and then score-standardized (converted to a distribution with a mean of 0 and a standard deviation of 1; the inverse method was applied for the “poverty rate” and “market accessibility”) in order to be comparable. The 0.5 arc-minute grid of total GDPppp is based on the night-time light satellite imagery of NOAA (see Ghosh, T., Powell, R., Elvidge, C. D., Baugh, K. E., Sutton, P. C., & Anderson, S. (2010). Shedding light on the global distribution of economic activity. The Open Geography Journal (3), 148-161) and adjusted to the national total as recorded by the International Monetary Fund for 2010. The “GDPppp per capita” was calculated by dividing the total GDPppp by the population in each pixel. Further, a focal statistic was run to determine mean values within 10 km. This had a smoothing effect and represents some of the extended influence of intense economic activity for the local people. Country-based data for “agriculture share GDP per agriculture sector worker” were calculated from the fraction of GDPppp (data from the International Monetary Fund) due to agricultural activity (measured by the World Bank), divided by the number of workers in the agriculture sector (data from the World Bank).
The tabular data represent the average of the period 2008-2012 and were linked by country unit to the national boundaries shapefile (FAO/GAUL) and then converted into raster format (resolution 0.5 arc-minute). The first administrative level data for the “poverty rate” were estimated by NOAA for 2003 using nighttime lights satellite imagery. Tabular data were linked by first administrative unit to the first administrative boundaries shapefile (FAO/GAUL) and then converted into raster format (resolution 0.5 arc-minute). The 0.5 arc-minute grid “market accessibility” measures the travel distance in minutes to large cities (with population greater than 50,000 people). This dataset was developed by the European Commission and the World Bank to represent access to markets, schools, hospitals, etc. The dataset captures the connectivity and the concentration of economic activity (in 2000). Markets may be important for a variety of reasons, including their abilities to spread risk and increase incomes. Markets are a means of linking people both spatially and over time. That is, they allow shocks (and risks) to be spread over wider areas. In particular, markets should make households less vulnerable to (localized) covariate shocks. This dataset has been produced in the framework of the “Climate change predictions in Sub-Saharan Africa: impacts and adaptations (ClimAfrica)” project, Work Package 4 (WP4). More information on the ClimAfrica project is provided in the Supplemental Information section of this metadata.
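The index construction described above (log transform, standardization, inversion of two variables, then weighting) can be illustrated with a small sketch. Several points are assumptions rather than the project's documented method: the four weighted variables are combined as a simple linear sum, the "inverse method" is read as flipping the sign of the standardized score, the population standard deviation is used, and the column names and toy numbers are invented for illustration.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List

# Weights as stated in the description; the key names are hypothetical.
WEIGHTS = {
    "gdp_ppp_per_capita": 0.33,
    "agri_gdp_per_worker": 0.26,
    "poverty_rate": 0.25,
    "market_access_minutes": 0.16,
}
NO_LOG = {"agri_gdp_per_worker"}  # the one variable not log-transformed
INVERTED = {"poverty_rate", "market_access_minutes"}  # higher means poorer


def zscores(values: List[float]) -> List[float]:
    """Standardize to mean 0 and standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - mu) / sigma for v in values]


def richness_index(columns: Dict[str, List[float]]) -> List[float]:
    """Weighted sum of standardized variables (assumed linear combination)."""
    n = len(next(iter(columns.values())))
    index = [0.0] * n
    for name, weight in WEIGHTS.items():
        values = columns[name]
        if name not in NO_LOG:
            values = [math.log(v) for v in values]
        scores = zscores(values)
        if name in INVERTED:
            # Assumption: "inverse method" means flipping the sign.
            scores = [-s for s in scores]
        index = [acc + weight * s for acc, s in zip(index, scores)]
    return index


# Invented toy values for three areas, ordered from poorest to richest.
toy = {
    "gdp_ppp_per_capita": [500.0, 2000.0, 8000.0],
    "agri_gdp_per_worker": [1.0, 2.0, 3.0],
    "poverty_rate": [0.6, 0.3, 0.15],
    "market_access_minutes": [400.0, 100.0, 25.0],
}
print([round(v, 3) for v in richness_index(toy)])
```

In the actual product these steps would be applied per raster cell at 0.5 arc-minute resolution rather than to short lists, but the arithmetic per variable is the same under the assumptions stated.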
Data publication: 2014-05-15
Supplemental Information:
ClimAfrica was an international project funded by European Commission under the 7th Framework Programme (FP7) for the period 2010-2014. The ClimAfrica consortium was formed by 18 institutions, 9 from Europe, 8 from Africa, and the Food and Agriculture Organization of United Nations (FAO).
ClimAfrica was conceived to respond to the urgent international need for the most appropriate and up-to-date tools and methodologies to better understand and predict climate change, assess its impact on African ecosystems and population, and develop the correct adaptation strategies. Africa is probably the continent most vulnerable to climate change and climate variability, and it shows a diverse range of agro-ecological and geographical features. Thus the impacts of climate change can be very high and can differ greatly across the continent, and even within countries.
The project focused on the following specific objectives:
Develop improved climate predictions on seasonal to decadal climatic scales, especially relevant to SSA;
Assess climate impacts in key sectors of SSA livelihood and economy, especially water resources and agriculture;
Evaluate the vulnerability of ecosystems and civil population to inter-annual variations and longer trends (10 years) in climate;
Suggest and analyse new, suitable adaptation strategies, focused on local needs;
Develop a new concept of a 10-year monitoring and forecasting warning system, useful for food security, risk management and civil protection in SSA;
Analyse the economic impacts of climate change on agriculture and water resources in SSA and the cost-effectiveness of potential adaptation measures.
The work of the ClimAfrica project was broken down into the following closely connected work packages (WPs). All the activities described in WP1, WP2, WP3, WP4 and WP5 consider the domain of the entire Sub-Saharan Africa region. Only WP6 has a country-specific (watershed) spatial scale, where model validation and detailed process analysis are carried out.
Contact points:
Metadata Contact: FAO-Data
Resource Contact: Selvaraju Ramasamy
Resource constraints:
copyright
Online resources:
Project deliverable D4.1 - Scenarios of major production systems in Africa
Climafrica Website - Climate Change Predictions In Sub-Saharan Africa: Impacts And Adaptations
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This file contains the list of codes and general categories associated with the ICD10h (Historic cause of death coding and classification scheme for individual-level causes of death). ICD10h has been designed by the authors to aid the coding and classification of causes of death recorded on historic individual death records and associated files include a manual, a list of exemplar strings in the English language, and a categorisation for infant mortality. The ICD10h system is based on the 10th revision of the International Classification of Diseases - 2016 version (ICD10 - 2016), and combines ICD10 codes (without modification) with new codes for archaic/historic terms. The data was derived from the following projects/deposited data: Determining the Demography of Victorian Scotland Through Record Linkage, ESRC RES-000-23-0128 held at the Cambridge Group for the History of Population and Social Structure, University of Cambridge; P. Gunn and R. Kippen, ‘Household and Family Formation in Nineteenth-Century Tasmania, Dataset of 195 Thousand Births, 93 Thousand Deaths and 51 Thousand Marriages Registered in Tasmania, 1838-1899’, 2008.
The resource creation was supported by the following projects: Digitising Scotland/Scottish Health Informatics Project (funded by the ESRC); Studying Health in Port Cities (funded by The Netherlands Organisation for Scientific Research); The Great Leap (funded by COST-Action CA22116).
SHARING/ACCESS INFORMATION
This resource is available under a CC BY licence.
Recommended citation for this dataset: Historic cause of death coding and classification scheme for individual-level causes of death – Codes [https://doi.org/10.17863/CAM.109961]
Please see the associated resources: Historic cause of death coding and classification scheme for individual-level causes of death – manual [https://doi.org/10.17863/CAM.109960] Historic cause of death coding and classification scheme for individual-level causes of death – English language historic strings [https://doi.org/10.17863/CAM.109962] Historic cause of death coding and classification scheme for individual-level causes of death – Infant Categorisations [https://doi.org/10.17863/CAM.109963]
ICD10h is a research tool created to facilitate the study of historical cause of death records and should not be used for any official purpose. It is based on the International Classification of Diseases, 10th Revision (ICD-10) version 2016 (Geneva: World Health Organization 2016) but is not a recognised version or extension of ICD-10 and is not authorised by WHO. However, we have consulted with WHO: they recognise that ICD10h is a useful academic methodology and have not raised any objections to its creation. Data coded using ICD10h are not directly comparable with data coded in ICD-10, and the underlying or primary cause of death derived using the ICD10h methodology may be different from the underlying cause derived in ICD-10 according to the WHO rules. Please note that ICD-10 version 2016 is not the most recent version of ICD-10, and that WHO now recommend the use of ICD-11, a more advanced and detailed classification.
DATA & FILE OVERVIEW
ICD10h_Masterlist.xlsx Excel file consisting of 3 worksheets:
1) ReadMe sheet
2) Masterlist
3) 2020to2024transfer
Separate csv files for 2) and 3) containing the same information.
This file builds on a previous, unpublished version of ICD10h (dating from 2020). The 2020to2024transfer file enables data coded to the earlier version to be updated to the current version.
METHODOLOGICAL INFORMATION
The data were hand-coded and subject to stringent algorithm-assisted tests.
DATA-SPECIFIC INFORMATION FOR: Masterlist
Number of variables: 10
Number of cases/rows: 14088
Variable List:
IDMasterlist (a unique ID number for Masterlist table)
ICD10h (ICD10h code)
ICD10 (ICD10 code)
ICD10_2levelCATEGORY (ICD10 first part of 2 level categorisation)
ICD10_2levelCAUSE (ICD10 second part of 2 level categorisation)
ICD10h_DESCRIPTION (ICD10h description - this differs from ICD10_2levelCAUSE only where there is a specific historical code)
Histcat (category of general historical categorisation)
DoNotUse (1=do not use for mortality coding – ICD10 asterisk codes)
NotForUnderlying (1=do not use for underlying mortality codes)
GenderSpecific (0=can be used for men or women; 1=use for men only; 2=use for women only)
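The three flag variables above lend themselves to a simple filter. The sketch below selects codes that may be used as an underlying cause for a given sex. It assumes the Masterlist csv uses the variable names as column headers and stores the flags as 0/1/2 text values, with a blank treated as 0; the sample rows and their codes are placeholders invented for illustration, not real ICD10h codes.

```python
import csv
import io
from typing import Dict, List

# Placeholder rows; the codes here are NOT real ICD10h codes.
SAMPLE = """IDMasterlist,ICD10h,DoNotUse,NotForUnderlying,GenderSpecific
1,AAA.0,0,0,0
2,BBB.0,1,0,0
3,CCC.0,0,1,0
4,DDD.0,0,0,1
5,EEE.0,0,0,2
"""


def flag_set(row: Dict[str, str], name: str) -> bool:
    """True when the named flag column holds 1 (blank counts as not set)."""
    return (row.get(name) or "").strip() == "1"


def usable_as_underlying(row: Dict[str, str], sex: str) -> bool:
    """Apply DoNotUse, NotForUnderlying and GenderSpecific to one row."""
    if flag_set(row, "DoNotUse") or flag_set(row, "NotForUnderlying"):
        return False
    gender = (row.get("GenderSpecific") or "").strip() or "0"
    if gender == "1":
        return sex == "male"
    if gender == "2":
        return sex == "female"
    return True


def usable_codes(handle, sex: str) -> List[str]:
    """Return the ICD10h codes usable as underlying cause for the given sex."""
    return [row["ICD10h"] for row in csv.DictReader(handle)
            if usable_as_underlying(row, sex)]


print(usable_codes(io.StringIO(SAMPLE), "female"))
```

To run this against the published csv, the in-memory sample would be replaced by an opened file handle; the exact file name is not given in this description.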
DATA-SPECIFIC INFORMATION FOR: 2020to2024transfer
Number of variables: 4
Number of cases/rows: 13763
Variable List:
ID2024_transfer (unique ID for 2020to2024transfer table)
IDoct2020Masterlist (ID variable from the 2020 Masterlist)
ICD10h_oct2020 (ICD10h from the October 2020 Masterlist)
ICD10h2024 (ICD10h value from the current version of the Masterlist)
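As a rough sketch of how the transfer table might be used to update data coded with the 2020 version, the following builds a lookup from ICD10h_oct2020 to ICD10h2024 and applies it to a list of codes. It assumes each 2020 code maps to a single current code and that the csv headers match the variable names above; the sample rows are invented placeholders, not real ICD10h codes.

```python
import csv
import io
from typing import Dict, List, Tuple

# Placeholder rows; the codes here are NOT real ICD10h codes.
TRANSFER_SAMPLE = """ID2024_transfer,IDoct2020Masterlist,ICD10h_oct2020,ICD10h2024
1,10,OLD.1,NEW.1
2,11,OLD.2,OLD.2
"""


def load_transfer(handle) -> Dict[str, str]:
    """Build a mapping from 2020 ICD10h codes to current ICD10h codes."""
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(handle):
        mapping[row["ICD10h_oct2020"].strip()] = row["ICD10h2024"].strip()
    return mapping


def update_codes(codes: List[str],
                 mapping: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (updated codes, codes that had no entry in the mapping)."""
    updated: List[str] = []
    unmapped: List[str] = []
    for code in codes:
        if code in mapping:
            updated.append(mapping[code])
        else:
            # Leave unknown codes unchanged and report them for manual review.
            updated.append(code)
            unmapped.append(code)
    return updated, unmapped


mapping = load_transfer(io.StringIO(TRANSFER_SAMPLE))
print(update_codes(["OLD.1", "OLD.2", "OLD.9"], mapping))
```

Reporting unmapped codes separately, rather than dropping them, keeps the original record intact when a code is missing from the transfer table.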
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
'In-the-Wild' Dataset
We present a dataset of audio deepfakes (and corresponding benign audio) for a set of politicians and other public figures, collected from publicly available sources such as social networks and video streaming platforms. For n = 58 celebrities and politicians, we collect both bona-fide and spoofed audio. In total, we collect 20.8 hours of bona-fide and 17.2 hours of spoofed audio. On average, there are 23 minutes of bona-fide and 18 minutes of spoofed audio per speaker.
The dataset is intended to be used for evaluating deepfake detection and voice anti-spoofing machine-learning models. It is especially useful to judge a model's capability to generalize to realistic, in-the-wild audio samples. Find more information in our paper, and download the dataset here.
The most interesting deepfake detection models we used in our experiments are open-source on GitHub:
RawNet 2
RawGAT-ST
PC-Darts
This dataset and the associated documentation are licensed under the Apache License, Version 2.0.