The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
DESCRIPTION
Johns Hopkins' county-level COVID-19 case and death data, paired with population and rates per 100,000
SUMMARY
Updates
April 9, 2020: The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
April 20, 2020: Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
April 29, 2020: The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
Overview
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
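As a rough illustration of the per-100,000 rates mentioned above, the calculation can be sketched as follows. The function name and the example figures are hypothetical and are not taken from the AP dataset; this only shows the arithmetic, not the AP's actual pipeline.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Return `count` expressed per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so whole-number results stay exact.
    return count * 100_000 / population


# Hypothetical county: 250 cumulative cases among 50,000 residents.
example_rate = rate_per_100k(250, 50_000)
print(example_rate)
```

Because the rate divides by population, very small counties can show large rates from a handful of cases, so it is usually worth reading the rate alongside the raw count.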
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Queries
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic.
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
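The hotspot queries above rely on a 7-day rolling average of new cases per capita. The AP queries themselves live on data.world and are not reproduced here; the following is only a minimal pandas sketch of the same idea, assuming hypothetical column names (`state`, `date`, `cumulative_cases`, `population`) and toy data.

```python
import pandas as pd


def rolling_new_cases_per_100k(df: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Add daily new cases and their rolling mean per 100,000 residents.

    Expects the (assumed) columns: state, date, cumulative_cases, population.
    """
    out = df.sort_values(["state", "date"]).copy()
    # Turn cumulative counts into daily new cases within each state.
    out["new_cases"] = out.groupby("state")["cumulative_cases"].diff()
    # Rolling mean over `window` days; NaN until a full window is available.
    out["new_cases_avg"] = out.groupby("state")["new_cases"].transform(
        lambda s: s.rolling(window).mean()
    )
    out["new_cases_avg_per_100k"] = (
        out["new_cases_avg"] * 100_000 / out["population"]
    )
    return out


# Toy data: one hypothetical state gaining 10 cases a day for 9 days.
toy = pd.DataFrame(
    {
        "state": ["X"] * 9,
        "date": pd.date_range("2020-04-01", periods=9),
        "cumulative_cases": [10 * i for i in range(9)],
        "population": [1_000_000] * 9,
    }
)
result = rolling_new_cases_per_100k(toy)
# Most recent row per state, which is what a hotspot ranking would sort on.
latest = result.sort_values("date").groupby("state").tail(1)
print(latest[["state", "new_cases_avg_per_100k"]])
```

The same function works at county level if the grouping column is swapped for a county identifier.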
Caveats
This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county".
This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories --...
https://creativecommons.org/publicdomain/zero/1.0/
The COVID-19 pandemic has left an indelible mark on societies worldwide, not only through its direct impact on health but also through its ripple effects on various aspects of life. As we strive to comprehend the full extent of its toll, one crucial metric that emerges is excess deaths – a measure encompassing not only confirmed COVID-19 fatalities but also those indirectly caused by the pandemic. In this discourse, we delve into the comprehensive dataset provided by The Economist and processed by Our World in Data, shedding light on the central estimates and uncertainty intervals of global excess deaths.
The dataset, meticulously compiled and analyzed by The Economist, serves as a cornerstone for understanding the broader implications of the pandemic beyond official death counts. This invaluable resource, available for public scrutiny and further research, offers insights into the nuanced dynamics of excess mortality across different regions and timeframes.
Central to our exploration are the central estimates provided by The Economist, representing the best approximation of excess deaths attributable to the pandemic. These figures, derived through rigorous statistical methodologies, provide a foundational understanding of the pandemic's impact on mortality rates globally. By accounting for excess deaths beyond what would typically be expected, these estimates paint a clearer picture of the true toll of COVID-19.
Accompanying these central estimates are uncertainty intervals, reflecting the range within which the true value of excess deaths is likely to fall. As with any statistical analysis, uncertainties abound, stemming from various factors such as data collection methods, reporting inconsistencies, and the inherent complexity of modeling excess mortality. Acknowledging these uncertainties is paramount in interpreting the data accurately and avoiding overgeneralizations or misinterpretations.
Delving deeper into the dataset, it becomes evident that the magnitude of excess deaths varies significantly across different regions and time periods. Factors such as healthcare infrastructure, socio-economic disparities, and the stringency of public health measures exert profound influences on mortality outcomes. By dissecting these variations, policymakers and public health experts can glean invaluable insights to inform targeted interventions and mitigate future crises.
Moreover, the dataset underscores the interconnectedness of global health, highlighting how the impact of the pandemic transcends geographical boundaries. As nations grapple with containing the spread of the virus within their borders, the ripple effects of excess mortality reverberate across the international community. This interconnectedness underscores the importance of collective action and solidarity in addressing not only the immediate challenges posed by the pandemic but also the long-term ramifications on global health security.
It is essential to note that behind every data point lies a human story – a life lost, a family shattered, a community grieving. Amidst the statistical analyses and epidemiological models, it is imperative not to lose sight of the human dimension of the pandemic. Each excess death represents more than just a number; it embodies a profound loss and underscores the urgency of concerted efforts to prevent further tragedies.
In conclusion, the dataset provided by The Economist and processed by Our World in Data offers a comprehensive lens through which to understand the complexities of excess mortality during the COVID-19 pandemic. By interrogating the central estimates and uncertainty intervals, we gain critical insights into the multifaceted dimensions of the pandemic's impact on global mortality rates. Moving forward, leveraging these insights to inform evidence-based policies and interventions is paramount in mitigating the ongoing crisis and building resilient health systems for the future.
NOTE: This dataset has been retired and marked as historical-only.
Weekly rates of COVID-19 cases, hospitalizations, and deaths among people living in Chicago by vaccination status and age.
Rates for fully vaccinated and unvaccinated begin the week ending April 3, 2021 when COVID-19 vaccines became widely available in Chicago. Rates for boosted begin the week ending October 23, 2021 after booster shots were recommended by the Centers for Disease Control and Prevention (CDC) for adults 65+ years old and adults in certain populations and high risk occupational and institutional settings who received Pfizer or Moderna for their primary series or anyone who received the Johnson & Johnson vaccine.
Chicago residency is based on home address, as reported in the Illinois Comprehensive Automated Immunization Registry Exchange (I-CARE) and Illinois National Electronic Disease Surveillance System (I-NEDSS).
Outcomes:
• Cases: People with a positive molecular (PCR) or antigen COVID-19 test result from an FDA-authorized COVID-19 test that was reported into I-NEDSS. A person can become re-infected with SARS-CoV-2 over time and so may be counted more than once in this dataset. Cases are counted by week the test specimen was collected.
• Hospitalizations: COVID-19 cases who are hospitalized due to a documented COVID-19 related illness or who are admitted for any reason within 14 days of a positive SARS-CoV-2 test. Hospitalizations are counted by week of hospital admission.
• Deaths: COVID-19 cases who died from COVID-19-related health complications as determined by vital records or a public health investigation. Deaths are counted by week of death.
Vaccination status:
• Fully vaccinated: Completion of primary series of a U.S. Food and Drug Administration (FDA)-authorized or approved COVID-19 vaccine at least 14 days prior to a positive test (with no other positive tests in the previous 45 days).
• Boosted: Fully vaccinated with an additional or booster dose of any FDA-authorized or approved COVID-19 vaccine received at least 14 days prior to a positive test (with no other positive tests in the previous 45 days).
• Unvaccinated: No evidence of having received a dose of an FDA-authorized or approved vaccine prior to a positive test.
CLARIFYING NOTE: Those who started but did not complete all recommended doses of an FDA-authorized or approved vaccine prior to a positive test (i.e., partially vaccinated) are excluded from this dataset.
Incidence rates for fully vaccinated but not boosted people (Vaccinated columns) are calculated as total fully vaccinated but not boosted with outcome divided by cumulative fully vaccinated but not boosted at the end of each week. Incidence rates for boosted (Boosted columns) are calculated as total boosted with outcome divided by cumulative boosted at the end of each week. Incidence rates for unvaccinated (Unvaccinated columns) are calculated as total unvaccinated with outcome divided by total population minus cumulative boosted, fully, and partially vaccinated at the end of each week. All rates are multiplied by 100,000.
Incidence rate ratios (IRRs) are calculated by dividing the weekly incidence rates among unvaccinated people by those among fully vaccinated but not boosted and boosted people. Overall age-adjusted incidence rates and IRRs are standardized using the 2000 U.S. Census standard population. Population totals are from U.S. Census Bureau American Community Survey 1-year estimates for 2019.
All data are provisional and subject to change. Information is updated as additional details are received and it is, in fact, very common for recent dates to be incomplete and to be updated as time goes on. This dataset reflects data known to CDPH at the time when the dataset is updated each week.
Numbers in this dataset may differ from other public sources due to when data are reported and how City of Chicago boundaries are defined. For all datasets related to COVID-19, see https://data.cityofchic
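
The incidence-rate and IRR definitions in this description can be sketched in a few lines. This is an illustrative reading of the prose, not CDPH's code: the function names and all numbers are hypothetical, "fully" is assumed to mean fully vaccinated but not boosted in the unvaccinated denominator, and age adjustment is left out.

```python
def incidence_rate(outcomes: int, population: int) -> float:
    """Weekly outcomes per 100,000 people in the given group."""
    return outcomes * 100_000 / population


def unvaccinated_population(total: int, boosted: int, fully: int, partial: int) -> int:
    """Total population minus cumulative boosted, fully and partially vaccinated.

    Assumption: `fully` counts people who are fully vaccinated but not boosted.
    """
    return total - boosted - fully - partial


def incidence_rate_ratio(unvaccinated_rate: float, comparison_rate: float) -> float:
    """Rate among unvaccinated people divided by the rate in a vaccinated group."""
    return unvaccinated_rate / comparison_rate


# Hypothetical week: 1,000,000 residents, of whom 300,000 are boosted,
# 400,000 fully vaccinated (not boosted) and 100,000 partially vaccinated.
unvax_pop = unvaccinated_population(1_000_000, 300_000, 400_000, 100_000)
unvax_rate = incidence_rate(400, unvax_pop)    # 400 cases among unvaccinated
vax_rate = incidence_rate(200, 400_000)        # 200 cases among fully vaccinated
irr = incidence_rate_ratio(unvax_rate, vax_rate)
print(unvax_pop, unvax_rate, vax_rate, irr)
```

An IRR above 1 means the weekly rate was higher among unvaccinated people than in the comparison group for that week.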
In collaboration with the Public Health Agency of Canada (PHAC), this table provides Canadians and researchers with data to monitor only the confirmed cases of coronavirus (COVID-19) in Canada. This table will provide an aggregate summary of the data available in the publication 13-26-0003.
Open Government Licence 3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Age-standardised mortality rates for deaths involving coronavirus (COVID-19), non-COVID-19 deaths and all deaths by vaccination status, broken down by age group.
Rank, number of deaths, percentage of deaths, and age-specific mortality rates for the leading causes of death, by age group and sex, 2000 to most recent year.
https://creativecommons.org/publicdomain/zero/1.0/
People across India scrambled for life-saving oxygen supplies on Friday and patients lay dying outside hospitals as the capital recorded the equivalent of one death from COVID-19 every five minutes.
For the second day running, the country’s overnight infection total was higher than ever recorded anywhere in the world since the pandemic began last year, at 332,730.
India’s second wave has hit with such ferocity that hospitals are running out of oxygen, beds, and anti-viral drugs. Many patients have been turned away because there was no space for them, doctors in Delhi said.
Mass cremations have been taking place as the crematoriums have run out of space. Ambulance sirens sounded throughout the day in the deserted streets of the capital, one of India’s worst-hit cities, where a lockdown is in place to try and stem the transmission of the virus. source
The dataset consists of tweets made with the #IndiaWantsOxygen hashtag over the past week. It contains 25,440 tweets in total and will be updated on a daily basis.
The description of the features is given below.

| No | Columns | Descriptions |
| -- | -- | -- |
| 1 | user_name | The name of the user, as they’ve defined it. |
| 2 | user_location | The user-defined location for this account’s profile. |
| 3 | user_description | The user-defined UTF-8 string describing their account. |
| 4 | user_created | Time and date, when the account was created. |
| 5 | user_followers | The number of followers an account currently has. |
| 6 | user_friends | The number of friends an account currently has. |
| 7 | user_favourites | The number of favorites an account currently has. |
| 8 | user_verified | When true, indicates that the user has a verified account. |
| 9 | date | UTC time and date when the Tweet was created. |
| 10 | text | The actual UTF-8 text of the Tweet. |
| 11 | hashtags | All the other hashtags posted in the tweet along with #IndiaWantsOxygen. |
| 12 | source | Utility used to post the Tweet; Tweets from the Twitter website have a source value of web. |
| 13 | is_retweet | Indicates whether this Tweet has been Retweeted by the authenticating user. |

https://globalnews.ca/news/7785122/india-covid-19-hospitals-record/ Image courtesy: BBC and Reuters
The past few days have been really depressing after seeing these incidents. These tweets are the voice of Indians requesting help and of people all over the globe asking their own countries to support India by providing oxygen tanks.
I strongly believe that this is not just some data, but the pure emotions of people and their call for help. I hope we as data scientists can contribute on this front by providing valuable information and insights.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset has been collected from multiple sources provided by MVCR on their websites and contains daily summarized statistics as well as detailed statistics down to age & sex level.
* Date - Calendar date when data were collected
* Daily tested - Sum of tests performed
* Daily infected - Sum of confirmed cases that were positive
* Daily cured - Sum of cured people who no longer have Covid-19
* Daily deaths - Sum of people who died of Covid-19
* Daily cum tested - Cumulative sum of tests performed
* Daily cum infected - Cumulative sum of confirmed cases that were positive
* Daily cum cured - Cumulative sum of cured people who no longer have Covid-19
* Daily cum deaths - Cumulative sum of people who died of Covid-19
* Region - Region of the Czech Republic
* Sub-Region - Sub-region of the Czech Republic
* Region accessories qty - Quantity of health care accessories delivered to the region over the whole period
* Age - Age of person
* Sex - Sex of person
* Infected - Sum of infected people for specific date, region, sub-region, age and sex
* Cured - Sum of cured people for specific date, region, sub-region, age and sex
* Death - Sum of people who died of Covid-19 for specific date, region, sub-region, age and sex
* Infected abroad - Identifies whether the person was infected with Covid-19 in the Czech Republic or abroad
* Infected in country - Code of the country the person came from (origin country of Covid-19)
The dataset contains data at different levels of granularity. Make sure you do not mix them. Suppose you have loaded the data into a pandas DataFrame called df.
import pandas as pd
# Daily level: one row per date, taking the maximum of each daily column.
daily_cols = ['daily_tested', 'daily_infected', 'daily_cured', 'daily_deaths',
              'daily_cum_tested', 'daily_cum_infected', 'daily_cum_cured', 'daily_cum_deaths']
df_daily = df.groupby('date')[daily_cols].max().reset_index()
# Regional level: one row per region, skipping rows with an empty region.
df_region = df[df['region'] != ''].groupby('region').agg(
    region_accessories_qty=pd.NamedAgg(column='region_accessories_qty', aggfunc='max'),
    infected=pd.NamedAgg(column='infected', aggfunc='sum'),
    cured=pd.NamedAgg(column='cured', aggfunc='sum'),
    death=pd.NamedAgg(column='death', aggfunc='sum')
).reset_index()
# Detail level: per-row records by date, region, sub-region, age and sex.
detail_cols = ['date', 'region', 'sub_region', 'age', 'sex', 'infected', 'cured', 'death',
               'infected_abroad', 'infected_in_country']
df_detail = df[detail_cols].reset_index(drop=True)
Thanks to websites of MVCR for sharing such great information.
Can you see a relation between the health care accessories delivered to a region and the number of cured/infected in that region? Why does the Czech Republic count among the relatively safe countries in the Covid-19 pandemic? Can you find out how the evolution of the pandemic in the Czech Republic differs from that in surrounding countries, like Germany or Slovakia?
http://www.gnu.org/licenses/lgpl-3.0.html
There are many datasets and Kaggle notebooks on the coronavirus that cover its geographical and demographic spread across regions and age groups, but how does that help in stopping its spread or finding a cure for it? This dataset aims to bring new dimensions into data-science-driven analysis of the coronavirus, such as:
This dataset would not have been possible without the following organizations sharing their data on a daily basis:
There are many important questions that need to be answered to find the best ways to fight and win against the coronavirus and save humanity. Currently, most datasets only count infections and deaths, which is of limited help because we need to find the best strategies for reducing the spread of the coronavirus and its impact. If there is no such dataset, what can we do to collect, compile or develop one?
Q). Do we even know what the right questions to ask are? Do we have the right features to help answer the right questions?
Q). Do we have the right data to do some useful research on this topic?
Q). What are the most important steps to stop the spread?
Q). Why did it spread so fast across the world? For example, did it spread because of air, water, coughing, touch or deliberate human action (germ warfare)?
Q). Over 10,000 people across Italy are infected, but what is the most likely reason (touch, cough, air, water, or some other element)?
Q). How many Italians are primary victims of the coronavirus, and how many are secondary (infected by primary victims)?
Q). What are the most effective tests for early diagnosis, and hence for stopping the spread among the population at risk?
Q). What are the most likely causes of such a widespread, global epidemic?
Q). What are the best treatment strategies (including the most effective medicines)?
Q). What is the best course of action for an individual, family or society to protect themselves from coronavirus infection (use sanitizers, wash hands frequently, isolate themselves from the rest of the world, or do something else)?
Q). What role can the WHO play in collecting meaningful data on the coronavirus and sharing it responsibly, so that next time we can do better to counter such an epidemic?
What are the other most important questions that we should be asking ?
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite as "2020 COVID19 Global Daily Impact Dataset by criticalperegrine.tumblr" Please read the .ods file for sources
A dataset containing statistics pertaining to:
* Policy - what special measures were applied during the year
* Epidemic - how fast COVID is spreading, how deadly it is, how far it has spread and how many people it has killed
* Population - how many people live in each country, how old they are, how urban and concentrated they are
* Medical System - how many physicians and beds exist
* Weather - temperature, humidity and wind
* Electrical Grid - how the consumption of electricity has changed
* Aviation - how the number of flights has varied
The reader can view the detailed sources for each statistic in "fullCOVIDsources.ods" with precise links / citations. The .csv dataset itself can be opened with Excel or any spreadsheet program. Wunderground.com was used for (almost) all Weather data. The Oxford Government response tracker was used for all Policy data.
The "Epidemic" statistics contain the Reff, a measure of the propagation of the epidemic. This was computed with the "EpiEstim" package by Cori et al (https://pbil.univ-lyon1.fr/CRAN/web/packages/EpiEstim/index.html), using the serial interval by Challen et al ( https://www.medrxiv.org/content/10.1101/2020.11.17.20231548v2 ). This serial interval was chosen because it reportedly accounts for pre-symptomatic transmission, an important feature according to the literature, whilst showing a similar Reff for most regions as a more cited distribution by Qun Li et al ( https://www.nejm.org/doi/full/10.1056/NEJMOa2001316 ). The reader can inspect the code that generates the Reff values by reading the file "Reff Computation.r". The Reff itself was chosen because it is a simple-to-interpret indicator: >1, we have an epidemic; <1, we do not.
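
To give a feel for what an Reff value expresses, here is a deliberately simplified Python sketch of the ratio that renewal-equation methods build on: new cases today divided by past cases weighted by the serial interval distribution. It is not the EpiEstim procedure and not the code in "Reff Computation.r"; the function name, the weights and the case counts below are all hypothetical.

```python
from typing import List, Optional


def naive_reff(incidence: List[float], serial_weights: List[float]) -> List[Optional[float]]:
    """Point estimate R_t = I_t / sum_s w_s * I_(t-s), for s = 1..len(serial_weights).

    `serial_weights[s - 1]` is the assumed probability that the serial
    interval is s days. Returns None where there is too little history
    or no infection pressure from earlier days.
    """
    if abs(sum(serial_weights) - 1.0) > 1e-9:
        raise ValueError("serial_weights must sum to 1")
    estimates: List[Optional[float]] = []
    for t in range(len(incidence)):
        if t < len(serial_weights):
            estimates.append(None)  # not enough earlier days yet
            continue
        pressure = sum(
            w * incidence[t - s] for s, w in enumerate(serial_weights, start=1)
        )
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


# Hypothetical series: cases double daily, serial interval fixed at one day.
growing = naive_reff([1, 2, 4, 8], [1.0])
# Hypothetical series: flat case counts with a two-day serial interval spread.
flat = naive_reff([10, 10, 10, 10], [0.5, 0.5])
print(growing, flat)
```

In this toy setup the doubling series gives values above 1 and the flat series gives values of 1, matching the interpretation in the text. A real estimate would add smoothing windows and uncertainty, which this sketch omits.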
Electricity is used in most of the world, save for very rural countries, for personal and industrial use: from cooking food, to transforming goods with heavy machinery, to services (digital, or simply powering the lights in venues providing services). It is essential for production, and a major decrease in electricity consumption would imply a decrease in "daily" GDP (Gross Domestic Product), since:
* Electricity is difficult to store, so most electrical demand is related to needs for that day
* There is no reported "major innovation" that decreases electricity consumption by more than 10% whilst maintaining a country's production
* Electricity is used to transform most goods and produce most services in a country, as mentioned previously
So electricity is used to compare the shock to GDP caused by different policies or infection rates.
Arrivals were used to make the effect of "Closed Borders" pop out. Aviation is used here as a non essential good, and also as a measure of international mobility throughout the year