This dataset is maintained by the European Centre for Disease Prevention and Control (ECDC) and reports on the geographic distribution of COVID-19 cases worldwide. It includes reported COVID-19 cases and deaths broken out by country. The data can be visualized via ECDC’s Situation Dashboard. More information on ECDC’s response to COVID-19 is available here.
This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets. What is BigQuery.
This dataset is hosted in both the EU and US regions of BigQuery. See the links below for the appropriate dataset copy: US region, EU region.
This dataset has significant public interest in light of the COVID-19 crisis. All bytes processed in queries against this dataset will be zeroed out, making this part of the query free. Data joined with the dataset will be billed at the normal rate to prevent abuse. After September 15, queries over these datasets will revert to the normal billing rate.
Users of ECDC public-use data files must comply with data use restrictions to ensure that the information will be used solely for statistical analysis or reporting purposes.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for CORONAVIRUS DEATH reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Our complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, and testing, as well as other variables of potential interest.
We will continue to publish up-to-date data on confirmed cases, deaths, and testing, throughout the duration of the COVID-19 pandemic.
Our complete COVID-19 dataset is available in CSV, XLSX, and JSON formats, and includes all of our historical data on the pandemic up to the date of publication.
The CSV and XLSX files follow a format of 1 row per location and date. The JSON version is split by country ISO code, with static variables and an array of daily records.
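The relationship between the two layouts can be sketched in Python. This is a minimal illustration, not the maintainers' own tooling: the `data` key holding the daily records and the sample values are assumptions made for the example, and `flatten` is a hypothetical helper.

```python
import json

# Hypothetical sample mirroring the described JSON layout: one entry per
# country ISO code, static variables, and an array of daily records.
# The "data" key name and all values are assumptions for illustration.
sample = json.loads("""
{
  "AFG": {
    "continent": "Asia",
    "location": "Afghanistan",
    "data": [
      {"date": "2020-09-09", "total_cases": 10.0},
      {"date": "2020-09-10", "total_cases": 12.0}
    ]
  }
}
""")


def flatten(by_iso, records_key="data"):
    """Convert the per-country layout into one row per location and date."""
    rows = []
    for iso_code, country in by_iso.items():
        # Everything except the daily array is a static, per-country variable.
        static = {k: v for k, v in country.items() if k != records_key}
        for record in country.get(records_key, []):
            rows.append({"iso_code": iso_code, **static, **record})
    return rows


rows = flatten(sample)
```

Each resulting row repeats the static variables next to one daily record, which is the shape the CSV and XLSX files use.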
The variables represent all of our main data related to confirmed cases, deaths, and testing, as well as other variables of potential interest.
As of 10 September 2020, the columns are: iso_code, continent, location, date, total_cases, new_cases, new_cases_smoothed, total_deaths, new_deaths, new_deaths_smoothed, total_cases_per_million, new_cases_per_million, new_cases_smoothed_per_million, total_deaths_per_million, new_deaths_per_million, new_deaths_smoothed_per_million, total_tests, new_tests, new_tests_smoothed, total_tests_per_thousand, new_tests_per_thousand, new_tests_smoothed_per_thousand, tests_per_case, positive_rate, tests_units, stringency_index, population, population_density, median_age, aged_65_older, aged_70_older, gdp_per_capita, extreme_poverty, cardiovasc_death_rate, diabetes_prevalence, female_smokers, male_smokers, handwashing_facilities, hospital_beds_per_thousand, life_expectancy, human_development_index
A full codebook is made available, with a description and source for each variable in the dataset.
If you are interested in the individual files that make up the complete dataset, or more detailed information, other files can be found in the subfolders:
ecdc: data fro...
2019 Novel Coronavirus COVID-19 (2019-nCoV) Visual Dashboard and Map:
https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6
Downloadable data:
https://github.com/CSSEGISandData/COVID-19
Additional Information about the Visual Dashboard:
https://systems.jhu.edu/research/public-health/ncov
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides values for CORONAVIRUS CASES reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). This database was created in response to the Coronavirus public health emergency to track reported cases in real-time. The data include the location and number of confirmed COVID-19 cases, deaths and recoveries for all affected countries, aggregated at the appropriate province or state. It was developed to enable researchers, public health authorities and the general public to track the outbreak as it unfolds. Additional information is available in the blog post, Mapping 2019-nCoV (https://systems.jhu.edu/research/public-health/ncov/), and included data sources are listed here: https://github.com/CSSEGISandData/COVID-19
How many confirmed COVID-19 cases were there in the US, by state?
This query determines the total number of cases by province in February. A "province_state" can refer to any subset of the US in this particular dataset, including a county or state.
-- Confirmed cases per province_state in the US on the last day of February 2020
SELECT
  province_state,
  confirmed AS feb_confirmed_cases
FROM
  `bigquery-public-data.covid19_jhu_csse.summary`
WHERE
  country_region = "US"
  AND date = '2020-02-29'
ORDER BY
  feb_confirmed_cases DESC
Which countries with the highest number of confirmed cases have the most per capita?
This query joins the Johns Hopkins dataset with the World Bank's global population data to determine which countries among those with the highest total number of confirmed cases have the most confirmed cases per capita.
-- Map World Bank country names onto the names used in the JHU data
WITH country_pop AS (
  SELECT
    IF(country = "United States", "US", IF(country = "Iran, Islamic Rep.", "Iran", country)) AS country,
    year_2018
  FROM
    `bigquery-public-data.world_bank_global_population.population_by_country`
)
-- US: sum all rows for the country
SELECT
  cases.date AS date,
  cases.country_region AS country_region,
  SUM(cases.confirmed) AS total_confirmed_cases,
  SUM(cases.confirmed) / AVG(country_pop.year_2018) * 100000 AS confirmed_cases_per_100000
FROM
  `bigquery-public-data.covid19_jhu_csse.summary` cases
JOIN
  country_pop ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
WHERE
  cases.country_region = "US"
  AND country_pop.country = "US"
  AND cases.date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
  country_region, date
UNION ALL
-- France: sum all rows for the country
SELECT
  cases.date AS date,
  cases.country_region AS country_region,
  SUM(cases.confirmed) AS total_confirmed_cases,
  SUM(cases.confirmed) / AVG(country_pop.year_2018) * 100000 AS confirmed_cases_per_100000
FROM
  `bigquery-public-data.covid19_jhu_csse.summary` cases
JOIN
  country_pop ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
WHERE
  cases.country_region = "France"
  AND country_pop.country = "France"
  AND cases.date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
  country_region, date
UNION ALL
-- China: sum all rows for the country
SELECT
  cases.date AS date,
  cases.country_region AS country_region,
  SUM(cases.confirmed) AS total_confirmed_cases,
  SUM(cases.confirmed) / AVG(country_pop.year_2018) * 100000 AS confirmed_cases_per_100000
FROM
  `bigquery-public-data.covid19_jhu_csse.summary` cases
JOIN
  country_pop ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
WHERE
  cases.country_region = "China"
  AND country_pop.country = "China"
  AND cases.date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
  country_region, date
UNION ALL
-- Italy, Spain, Germany, Iran: rows are used as-is, without aggregation
SELECT
  cases.date AS date,
  cases.country_region AS country_region,
  cases.confirmed AS total_confirmed_cases,
  cases.confirmed / country_pop.year_2018 * 100000 AS confirmed_cases_per_100000
FROM
  `bigquery-public-data.covid19_jhu_csse.summary` cases
JOIN
  country_pop ON cases.country_region LIKE CONCAT('%', country_pop.country, '%')
WHERE
  cases.country_region IN ("Italy", "Spain", "Germany", "Iran")
  AND cases.date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
ORDER BY
  confirmed_cases_per_100000 DESC
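The per-capita arithmetic in the query above can be checked by hand with a small Python sketch. The function names and the sample numbers are hypothetical and chosen only to make the arithmetic easy to follow. The second helper mirrors the SUM(confirmed) / AVG(population) pattern: after the join, every row for a country carries the same population value, so averaging it recovers the population while the case counts are summed.

```python
def cases_per_100000(confirmed, population):
    """Confirmed cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiplying first is mathematically equivalent to
    # confirmed / population * 100000.
    return confirmed * 100000 / population


def country_rate(joined_rows):
    """Mirror SUM(confirmed) / AVG(population) over the joined rows.

    joined_rows is a list of (confirmed, country_population) pairs,
    one pair per row that matched the country.
    """
    total_confirmed = sum(confirmed for confirmed, _ in joined_rows)
    populations = [population for _, population in joined_rows]
    population = sum(populations) / len(populations)
    return cases_per_100000(total_confirmed, population)


# Illustrative numbers only, not real data.
single_row_rate = cases_per_100000(500, 1_000_000)
multi_row_rate = country_rate([(300, 1_000_000), (200, 1_000_000)])
```

Both calls describe the same hypothetical country, once as a single row and once split across two rows, and yield the same rate.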
JHU CSSE
Daily
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19 - WHO
People can catch COVID-19 from others who have the virus. The disease has been spreading rapidly around the world, and Italy is one of the most affected countries.
On March 8, 2020 - Italy’s prime minister announced a sweeping coronavirus quarantine early Sunday, restricting the movements of about a quarter of the country’s population in a bid to limit contagions at the epicenter of Europe’s outbreak. - TIME
This dataset is from https://github.com/pcm-dpc/COVID-19
collected by Sito del Dipartimento della Protezione Civile - Emergenza Coronavirus: la risposta nazionale
This dataset has two files:
covid19_italy_province.csv - Province level data of COVID-19 cases
covid_italy_region.csv - Region level data of COVID-19 cases
Data is collected by Sito del Dipartimento della Protezione Civile - Emergenza Coronavirus: la risposta nazionale and is uploaded into this github repo.
A dashboard on the data can be seen here. Picture courtesy of the dashboard.
Insights on
* Spread to various regions over time
* Try to predict the spread of COVID-19 ahead of time to take preventive measures
The United States has recently become the country with the most reported cases of 2019 Novel Coronavirus (COVID-19). This dataset contains the daily updated number of reported cases & deaths in the US on the state and county level, as provided by the Johns Hopkins University. In addition, I provide matching demographic information for US counties.
The dataset consists of two main csv files: covid_us_county.csv and us_county.csv. See the column descriptions below for more detailed information. In addition, I've added US county shape files for geospatial plots: us_county.shp/dbf/prj/shx.
covid_us_county.csv: COVID-19 cases and deaths which will be updated daily. The data is provided by the Johns Hopkins University through their excellent github repo. I combined the separate "confirmed cases" and "deaths" files into a single table, removed a few geo identifier columns that I consider redundant, and reshaped the data into long format with a single date column. The earliest recorded cases are from 2020-01-22.
us_counties.csv: Demographic information on the US county level based on the (most recent) 2014-18 release of the American Community Survey. Derived via the great tidycensus package.
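The reshaping described for covid_us_county.csv (two wide files combined into one long table with a single date column) can be sketched in plain Python. This is not the author's actual pipeline: the identifier columns, the `m/d/yy` date headers, the helper names, and the sample rows are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical identifier columns kept from the wide files.
ID_COLUMNS = ("fips", "county", "state")


def wide_to_long(wide_rows, id_columns=ID_COLUMNS):
    """Map (ids..., ISO date) -> value for every date column of a wide table."""
    long_values = {}
    for row in wide_rows:
        ids = tuple(row[name] for name in id_columns)
        for column, value in row.items():
            if column in id_columns:
                continue
            # Assumes date headers such as "1/22/20" (month/day/two-digit year).
            date = datetime.strptime(column, "%m/%d/%y").date().isoformat()
            long_values[ids + (date,)] = int(value)
    return long_values


def combine(confirmed_wide, deaths_wide):
    """Merge the two wide tables into one long table with a single date column."""
    cases = wide_to_long(confirmed_wide)
    deaths = wide_to_long(deaths_wide)
    table = []
    for key in sorted(cases):
        *ids, date = key
        record = dict(zip(ID_COLUMNS, ids))
        record.update(date=date, cases=cases[key], deaths=deaths.get(key))
        table.append(record)
    return table


# Illustrative rows only, not real counts.
confirmed = [{"fips": 1001, "county": "Autauga", "state": "Alabama",
              "1/22/20": "0", "1/23/20": "1"}]
deaths = [{"fips": 1001, "county": "Autauga", "state": "Alabama",
           "1/22/20": "0", "1/23/20": "0"}]
table = combine(confirmed, deaths)
```

Each wide row with N date columns becomes N long rows, and the cases and deaths values for the same county and date end up side by side.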
COVID-19 dataset covid_us_county.csv:
fips: County code in numeric format (i.e. no leading zeros). A small number of cases have NA values here, but can still be used for state-wise aggregation. Currently, this only affects the states of Massachusetts and Missouri.
county: Name of the US county. This is NA for the (aggregated counts of the) territories of American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and Virgin Islands.
state: Name of US state or territory.
state_code: Two letter abbreviation of US state (e.g. "CA" for "California"). This feature has NA values for the territories listed above.
lat and long: Coordinates of the county or territory.
date: Reporting date.
cases & deaths: Cumulative numbers for cases & deaths.
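Because cases and deaths are cumulative, daily new counts have to be derived by differencing. A minimal Python sketch, assuming the values belong to a single county and are already sorted by date (the function name and sample values are hypothetical):

```python
def daily_from_cumulative(cumulative):
    """Return day-over-day increases for a date-sorted cumulative series."""
    daily = []
    previous = 0
    for value in cumulative:
        # A negative difference can appear when a source revises its totals
        # downwards; this sketch keeps such values rather than hiding them.
        daily.append(value - previous)
        previous = value
    return daily


# Illustrative values only.
new_cases = daily_from_cumulative([0, 1, 1, 4])
```

The first element equals the first cumulative value, since the series is assumed to start from zero before the first reporting date.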
Demographic dataset us_counties.csv:
fips, county, state, state_code: Same as above. The county names are slightly different, but mostly the difference is that this dataset has the word "County" added. I recommend joining on fips.
male & female: Population numbers for male and female.
population: Total population for the county. Provided as a convenience feature; it is always the sum of male + female.
female_percentage: Another convenience feature: female / population in percent.
median_age: Overall median age for the county.
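The recommended join on fips, together with the two convenience columns, can be sketched as follows. The rows are made up for illustration and the helper names are hypothetical. The zero-padding helper reflects the general convention that county FIPS codes are written as five digits, which matters when matching the numeric codes against sources that store them as text.

```python
def fips_to_text(fips):
    """Render a numeric county FIPS code as the conventional 5-digit string."""
    return str(int(fips)).zfill(5)


def female_percentage(female, population):
    """female / population expressed in percent."""
    return female * 100 / population


def join_on_fips(covid_rows, county_rows):
    """Attach demographic columns to COVID rows by matching on fips."""
    demographics = {row["fips"]: row for row in county_rows}
    joined = []
    for row in covid_rows:
        match = demographics.get(row["fips"])
        if row["fips"] is None or match is None:
            # Rows without a usable county code cannot be matched here.
            continue
        joined.append({**row,
                       "population": match["population"],
                       "median_age": match["median_age"]})
    return joined


# Illustrative rows only, not real data.
covid_rows = [
    {"fips": 1001, "date": "2020-04-01", "cases": 10, "deaths": 0},
    {"fips": None, "date": "2020-04-01", "cases": 3, "deaths": 0},
]
county_rows = [
    {"fips": 1001, "male": 40, "female": 60, "population": 100, "median_age": 38.0},
]
joined = join_on_fips(covid_rows, county_rows)
```

Rows with NA in fips are dropped by this join, so they need to be handled separately when aggregating by state.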
Data provided for educational and academic research purposes by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE).
The github repo states that:
This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Coronavirus disease 2019 (COVID-19) time series that lists confirmed cases, reported deaths, and reported recoveries. Data is broken down by country (and sometimes by sub-region).
Coronavirus disease (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and has had an effect worldwide. On March 11, 2020, the World Health Organization (WHO) declared it a pandemic, citing more than 118,000 cases of coronavirus disease in more than 110 countries and territories around the world.
This dataset contains the latest news related to Covid-19 and it was fetched with the help of Newsdata.io news API.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The World Health Organization reported 6932591 Coronavirus Deaths since the epidemic began. In addition, countries reported 766440796 Coronavirus Cases. This dataset provides - World Coronavirus Deaths- actual values, historical data, forecast, chart, statistics, economic calendar and news.
https://creativecommons.org/publicdomain/zero/1.0/
This is a dataset of the most highly populated city (if applicable) in a form easy to join with the COVID19 Global Forecasting (Week 1) dataset. You can see how to use it in this kernel
There are four columns. The first two correspond to the columns from the original COVID19 Global Forecasting (Week 1) dataset. The other two give the highest population density, at city level, for the given country/state. Note that some countries are very small, and in those cases the population density reflects the entire country. Since the original dataset has a few cruise ships as well, I've added them there.
Thanks a lot to Kaggle for this competition that gave me the opportunity to look closely at some data and understand this problem better.
Summary: I believe that the square root of the population density should relate to the logistic growth factor of the SIR model. I think the SEIR model isn't applicable due to any intervention being too late for a fast-spreading virus like this, especially in places with dense populations.
After playing with the data provided in COVID19 Global Forecasting (Week 1) (and everything else online or media) a bit, one thing becomes clear. They have nothing to do with epidemiology. They reflect sociopolitical characteristics of a country/state and, more specifically, the reactivity and attitude towards testing.
The testing method used (PCR tests) means that what we measure could potentially be a proxy for the number of people infected during the last 3 weeks, i.e. the growth (with lag). It's not how many people have been infected and recovered. Antibody or serology tests would measure that, and by using them, we could go back to normality faster... but those will arrive too late. Way earlier, China will have experimentally shown that it's safe to go back to normal as soon as your number of newly infected per day is close to zero.
My view, as a person living in NYC, about this virus, is that by the time governments react to media pressure, to lockdown or even test, it's too late. In dense areas, everyone susceptible has already had ample opportunities to be infected. Especially for a virus with 5-14 days lag between infections and symptoms, a period during which hosts spread it all over the subway, the conditions are hopeless. Active populations have already been exposed, mostly asymptomatic and recovered. Sensitive/older populations are more self-isolated/careful in affluent societies (maybe this isn't the case in North Italy). As the virus finishes exploring the active population, it starts penetrating the more isolated ones. At this point in time, the first fatalities happen. Then testing starts. Then the media and the lockdown. Lockdown seems overly effective because it coincides with the tail of the disease spread. It helps slow down the virus exploring the long-tail of sensitive population, and we should all contribute by doing it, but it doesn't cause the end of the disease. If it did, then as soon as people were back in the streets (see China), there would be repeated outbreaks.
Smart politicians will test a lot because it will make their condition look worse. It helps them demand more resources. At the same time, they will have a low rate of fatalities due to large denominator. They can take credit for managing well a disproportionally major crisis - in contrast to people who didn't test.
We were lucky this time. We, Westerners, have woken up to the potential of a pandemic. I'm sure we will give further resources for prevention. Additionally, we will be more open-minded, helping politicians to have more direct responses. We will also require them to be more responsible in their messages and reactions.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is part of the Monash, UEA & UCR time series regression repository. http://tseregression.org/
The goal of this dataset is to predict COVID-19's death rate on 1st April 2020 for each country using daily confirmed cases for the last three months. This dataset contains 201 time series, where each time series is the daily confirmed cases for a country. The data was obtained from WHO's COVID-19 database.
Please refer to https://covid19.who.int/ for more details
This feature layer contains the most up-to-date COVID-19 cases and latest trend plot. It covers China, the US, Canada, Australia (at province/state level), and the rest of the world (at country level, represented by either the country centroids or their capitals). Data sources are WHO, US CDC, China NHC, ECDC, and DXY. The China data is updated automatically at least once per hour, and non-China data is updated manually. This layer is created and maintained by the Center for Systems Science and Engineering (CSSE) at the Johns Hopkins University. This feature layer is supported by Esri Living Atlas team and JHU Data Services. This layer is open to the public and free to share. Contact us. The data is processed from JHU Services and filtered for the Middle East and Africa Region.
https://spdx.org/licenses/CC0-1.0.html
Objective: Daily COVID-19 data reported by the World Health Organization (WHO) may provide the basis for political ad hoc decisions including travel restrictions. Data reported by countries, however, is heterogeneous and metrics to evaluate its quality are scarce. In this work, we analyzed COVID-19 case counts provided by WHO and developed tools to evaluate country-specific reporting behaviors.
Methods: In this retrospective cross-sectional study, COVID-19 data reported daily to WHO from 3rd January 2020 until 14th June 2021 were analyzed. We proposed the concepts of binary reporting rate and relative reporting behavior and performed descriptive analyses for all countries with these metrics. We developed a score to evaluate the consistency of incidence and binary reporting rates. Further, we performed spectral clustering of the binary reporting rate and relative reporting behavior to identify salient patterns in these metrics.
Results: Our final analysis included 222 countries and regions. Reporting scores varied between -0.17, indicating discrepancies between incidence and binary reporting rate, and 1.0, suggesting high consistency of these two metrics. Median reporting score for all countries was 0.71 (IQR 0.55 to 0.87). Descriptive analyses of the binary reporting rate and relative reporting behavior showed constant reporting with a slight “weekend effect” for most countries, while spectral clustering demonstrated that some countries had even more complex reporting patterns.
Conclusion: The majority of countries reported COVID-19 cases when they did have cases to report. The identification of a slight “weekend effect” suggests that COVID-19 case counts reported in the middle of the week may represent the best data basis for political ad hoc decisions. A few countries, however, showed unusual or highly irregular reporting that might require more careful interpretation. Our score system and cluster analyses might be applied by epidemiologists advising policymakers to consider country-specific reporting behaviors in political ad hoc decisions.
Methods
Data collection: COVID-19 data was downloaded from WHO. Using a public repository, we have added the countries' full names to the WHO data set using the two-letter abbreviations for each country to merge both data sets. The provided COVID-19 data covers January 2020 until June 2021. We uploaded the final data set used for the analyses of this paper.
Data processing: We processed data using a Jupyter Notebook with a Python kernel and publicly available external libraries. This upload contains the required Jupyter Notebook (reporting_behavior.ipynb) with all analyses and some additional work, a README, and the conda environment yml (env.yml).
Disclaimer: As of January 2025, SPC will no longer provide updated information on COVID-19 cases and deaths. The information presented on this page is for reference only. For current epidemic and emerging disease alerts in the Pacific region, please visit: https://www.spc.int/epidemics/
Statistics from SPC's Public Health Division (PHD) on the number of cases of COVID-19 and the number of deaths attributed to COVID-19 in Pacific Island Countries and Territories.
Find more Pacific data on PDH.stat.
This repository contains spatiotemporal data from many official sources for 2019-Novel Coronavirus beginning 2019 in Hubei, China ("nCoV_2019") You may not use this data for commercial purposes. If there is a need for commercial use of the data, please contact Ginkgo Biosecurity, the biosecurity and public health unit of Ginkgo Bioworks at help-epi-modeling@ginkgobioworks.com to obtain a commercial use license.
The incidence data are in a CSV file format. One row in an incidence file contains a piece of epidemiological data extracted from the specified source.
The file contains data from multiple sources at multiple spatial resolutions in cumulative and non-cumulative formats by confirmation status. To select a single time series of case or death data, filter the incidence dataset by source, spatial resolution, location, confirmation status, and cumulative flag.
Data are collected, structured, and validated by Ginkgo's digital surveillance experts. The data structuring process is designed to produce the most reliable estimates of reported cases and deaths over space and time. The data are cleaned and provided in a uniform format such that information can be compared across multiple sources. Data are collected at the time of publication in the highest geographic and temporal resolutions available in the original report.
This repository is intended to provide a single access point for data from a wide range of data sources. Data will be updated periodically with the latest epidemiological data. Ginkgo Biosecurity maintains a database of epidemiological information for over three thousand high-priority infectious disease events (please note: this database was previously maintained by Metabiota; the team responsible joined Ginkgo Biosecurity in August 2022. When using the database, please cite Ginkgo Biosecurity and refer to this repository). Please contact us (help-epi-modeling@ginkgobioworks.com) if you are interested in licensing the complete dataset.
Reporting sources provide either cumulative incidence, non-cumulative incidence, or both. If the source only provides a non-cumulative incidence value, the cumulative values are inferred using prior reports from the same source. Use the CUMULATIVE FLAG variable to subset the data to cumulative (TRUE) or non-cumulative (FALSE) values.
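Inferring cumulative values from non-cumulative reports amounts to a running sum over the prior reports from the same source. The following Python sketch illustrates that idea only; the text does not spell out the exact procedure used for this dataset, so the function name and sample values are hypothetical.

```python
from itertools import accumulate


def infer_cumulative(non_cumulative):
    """Running total of date-ordered non-cumulative reports from one source."""
    return list(accumulate(non_cumulative))


# Illustrative values only.
cumulative = infer_cumulative([2, 0, 5])
```

The sketch assumes the reports are ordered by date and that none are missing; a gap in the prior reports would make the inferred totals too low.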
The incidence datasets include the confirmation status of cases and deaths when this information is provided by the reporting source. Subset the data by the CONFIRMATION_STATUS variable to either TOTAL, CONFIRMED, SUSPECTED, or PROBABLE to obtain the data of your choice.
Total incidence values include confirmed, suspected, and probable incidence values. If a source only provides suspected, probable, or confirmed incidence, the total incidence is inferred to be the sum of the provided values. If the report does not specify confirmation status, the value is included in the "total" confirmation status value.
The data provided under the "Multisource Fusion" often does not include suspected incidence due to inconsistencies in reporting cases and deaths with this confirmation status.
The incidence datasets include cases and deaths. Subset the data to either CASE or DEATH using the OUTCOME variable. It should be noted that deaths are included in case counts.
Data are provided at multiple spatial resolutions. Data should be subset to a single spatial resolution of interest using the SPATIAL_RESOLUTION variable.
Information is included at the finest spatial resolution provided in the original epidemic report. We also aggregate incidence to coarser geographic resolutions. For example, if a source only provides data at the province level, then province-level data are included in the dataset as well as country-level totals. Users should avoid summing all cases or deaths in a given country for a given date without specifying the SPATIAL_RESOLUTION value. For example, subset the data to SPATIAL_RESOLUTION equal to "AL0" in order to view only the aggregated country-level data.
There are differences in administrative division naming practices by country. Administrative levels in this dataset are defined using the Google Geolocation API (https://developers.google.com/maps/documentation/geolocation/). For example, the data for the 2019-nCoV from one source provides information for the city of Beijing, which Google Geolocations indicates is a "locality." Beijing is also the name of the municipality where the city Beijing is located. Thus, the 2019-nCoV dataset includes rows of data for both the city Beijing, as well as the municipality of the same name. If additional cities in the Beijing municipality reported data, those data would be aggregated with the city Beijing data to form the municipality Beijing data.
Data sources in this repository were selected to provide comprehensive spatiotemporal data for each outbreak. Data from a specific source can be selected using the SOURCE variable.
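Selecting a single time series therefore means fixing every one of these dimensions at once. A minimal Python sketch with an inline sample follows. SOURCE, SPATIAL_RESOLUTION, CONFIRMATION_STATUS, OUTCOME, and the values "AL0", "TOTAL", "CASE", and "Multisource Fusion" come from the description above; the column names LOCATION, CUMULATIVE_FLAG, DATE, and VALUE, the "AL1" level, and all numbers are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample; real files may use different column names and layout.
SAMPLE = """\
SOURCE,SPATIAL_RESOLUTION,LOCATION,CONFIRMATION_STATUS,OUTCOME,CUMULATIVE_FLAG,DATE,VALUE
Multisource Fusion,AL0,China,TOTAL,CASE,TRUE,2020-01-22,10
Multisource Fusion,AL0,China,TOTAL,CASE,FALSE,2020-01-22,10
Multisource Fusion,AL1,Hubei,TOTAL,CASE,TRUE,2020-01-22,8
Multisource Fusion,AL0,China,TOTAL,DEATH,TRUE,2020-01-22,1
Multisource Fusion,AL0,China,TOTAL,CASE,TRUE,2020-01-23,15
"""


def select_series(rows, **criteria):
    """Keep only the rows whose columns equal every given criterion."""
    return [row for row in rows
            if all(row[column] == wanted for column, wanted in criteria.items())]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
series = select_series(
    rows,
    SOURCE="Multisource Fusion",
    SPATIAL_RESOLUTION="AL0",
    LOCATION="China",
    CONFIRMATION_STATUS="TOTAL",
    OUTCOME="CASE",
    CUMULATIVE_FLAG="TRUE",
)
values = [int(row["VALUE"]) for row in series]
```

Leaving out any one criterion mixes series together; for instance, dropping SPATIAL_RESOLUTION would combine country-level totals with the finer-grained rows they already include.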
In addition to the original reporting sources, Ginkgo Biosecurity compiles multiple sources to generate the most comprehensive view of an outbreak. This compilation is stored in the database under the source name "Multisource Fusion". The purpose of generating this new view of the outbreak is to provide the most accurate and precise spatiotemporal data for the outbreak. At this time, Ginkgo Biosecurity does not incorporate unofficial - including media - sources into the "Multisource Fusion" dataset.
Data are collected by a team of digital surveillance experts and undergo many quality assurance tests. After data are collected, they are independently verified by at least one additional analyst. The data also pass an automated validation program to ensure data consistency and integrity.
Creative Commons License Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)
This is a human-readable summary of the Legal Code.
You are free:
to Share — to copy, distribute and transmit the work to Remix — to adapt the work
Under the following conditions:
Attribution — You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).
Noncommercial — You may not use this work for commercial purposes.
Share Alike — If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.
With the understanding that:
Waiver — Any of the above conditions can be waived if you get permission from the copyright holder.
Public Domain — Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.
Other Rights — In no way are any of the following rights affected by the license: Your fair dealing or fair use rights, or other applicable copyright exceptions and limitations; The author's moral rights; Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights. Notice — For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.
For details and the full license text, see http://creativecommons.org/licenses/by-nc-sa/3.0/
The information is provided “AS IS” and Concentric makes no representations or warranties, express or implied, of any type whatsoever including, without limitation, title, noninfringement, accuracy, completeness, merchantability, or fitness for any particular purpose. Use of proprietary information shall be at the user’s own risk, and Concentric assumes no liability or obligation to the user as a result of use.
Ginkgo Biosecurity shall in no event be liable for any decision taken by the user based on the data made available. Under no circumstances, shall Ginkgo Biosecurity be liable for any damages (whatsoever) arising out of the use or inability to use the database. The entire risk arising out of the use of the database remains with the user.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A coronavirus dataset covering 104 countries, constructed from different reliable sources. Each row represents a country, and the columns represent geographic, climate, healthcare, economic, and demographic factors that may accelerate or slow the spread of COVID-19. The assumptions for the different factors are as follows:
The last column represents the number of daily tests performed and the total number of cases and deaths reported each day.
https://raw.githubusercontent.com/SamBelkacem/COVID19-Algeria-and-World-Dataset/master/Images/Data%20description.png
https://raw.githubusercontent.com/SamBelkacem/COVID19-Algeria-and-World-Dataset/master/Images/Countries%20by%20geographic%20coordinates.png
https://raw.githubusercontent.com/SamBelkacem/COVID19-Algeria-and-World-Dataset/master/Images/Statistical%20description%20of%20the%20data.png
https://raw.githubusercontent.com/SamBelkacem/COVID19-Algeria-and-World-Dataset/master/Images/Data%20distribution.png
The dataset is available in an encoded CSV form on GitHub.
The Python Jupyter Notebook to read and visualize the data is available on nbviewer.
The dataset is updated every month with the latest numbers of COVID-19 cases, deaths, and tests. The last update was on March 01, 2021.
The dataset is constructed from different reliable sources, where each row represents a country, and the columns represent geographic, climate, healthcare, economic, and demographic factors that may accelerate or slow the spread of the coronavirus. Note that we selected only the main factors for which we found data, and that other factors can be used. All data were retrieved from the reliable Our World in Data website, except for data on:
If you use the dataset, please cite the following arXiv paper, which provides more details about the data construction.
@article{belkacem_covid-19_2020,
title = {COVID-19 data analysis and forecasting: Algeria and the world},
shorttitle = {COVID-19 data analysis and forecasting},
journal = {arXiv preprint arXiv:2007.09755},
author = {Belkacem, Sami},
year = {2020}
}
If you have any questions or suggestions, please contact me at this email address: s.belkacem@usthb.dz
https://creativecommons.org/publicdomain/zero/1.0/
Countries around the world are working to “flatten the curve” of the coronavirus pandemic. Flattening the curve involves reducing the number of new COVID-19 cases from one day to the next. This helps prevent healthcare systems from becoming overwhelmed. When a country has fewer new COVID-19 cases emerging today than it did on a previous day, that’s a sign that the country is flattening the curve.
On a trend line of total cases, a flattened curve looks how it sounds: flat. On the charts on this page, which show new cases per day, a flattened curve will show a downward trend in the number of daily new cases.
This analysis uses a 5-day moving average to visualize the number of new COVID-19 cases and calculate the rate of change. For each day, the average is taken over the values of that day, the two days before, and the two days after. This approach helps prevent major events (such as a change in reporting methods) from skewing the data. The interactive charts below show the daily number of new cases for the 10 most affected countries, based on the reported number of deaths from COVID-19.
These data were last updated on Saturday, April 25, 2020 at 11:51 PM EDT.
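As a rough illustration of the centered 5-day moving average described above, the following Python sketch averages each day with the two days before and the two days after. The function names, the sample numbers, and the choice to leave the first and last two days empty (None) are assumptions made for illustration; the source does not say how the edges of the series or the rate of change were actually computed.

```python
# Made-up daily new-case counts, for illustration only.
SAMPLE_NEW_CASES = [10, 20, 30, 40, 50, 40, 30]


def centered_moving_average(values, window=5):
    """Average each value with its neighbours in a centered window.

    Days without a full window (the first and last window // 2 days)
    are returned as None. This edge handling is an assumption.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    averages = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            averages.append(None)
        else:
            averages.append(sum(values[i - half:i + half + 1]) / window)
    return averages


def daily_change(averages):
    """Day-over-day difference of the smoothed series (one possible
    'rate of change'); None where either day has no average."""
    if not averages:
        return []
    changes = [None]
    for prev, curr in zip(averages, averages[1:]):
        changes.append(None if prev is None or curr is None else curr - prev)
    return changes


smoothed = centered_moving_average(SAMPLE_NEW_CASES)
print(smoothed)                # [None, None, 30.0, 36.0, 38.0, None, None]
print(daily_change(smoothed))  # [None, None, None, 6.0, 2.0, None, None]
```

In this sketch, a run of negative values from daily_change would correspond to the downward trend in daily new cases that the text describes as a flattening curve.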
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
It has never been easier to solve database-related problems using SQL. The following shows how I worked out some of the relationships within the data using the Panoply.io tool.
I loaded the coronavirus dataset and produced a submittable, reusable result. I hope it helps you work in a data warehouse environment.
The following is a list of the SQL commands run on the dataset attached below, with the final output stored in the Exports folder.
Query 1
SELECT "Province/State" AS "Region", Deaths, Recovered, Confirmed
FROM "public"."coronavirus_updated"
WHERE Recovered > (Deaths/2) AND Deaths > 0
Description: How can we estimate where coronavirus has spread but patients are recovering effectively? This query lists the places with at least one death where the number of recoveries exceeds half the death toll.
Query 2
SELECT country, sum(confirmed) AS "Confirmed Count", sum(Recovered) AS "Recovered Count", sum(Deaths) AS "Death Toll"
FROM "public"."coronavirus_updated"
WHERE Recovered > (Deaths/2) AND Confirmed > 0
GROUP BY country
Description: For each country, this query totals the confirmed, recovered, and death counts, keeping only rows that have confirmed cases and where recoveries exceed half the deaths.
Query 3
SELECT country AS "Countries where Coronavirus has reached"
FROM "public"."coronavirus_updated"
WHERE confirmed > 0
GROUP BY country
Description: The coronavirus epidemic has spread to multiple countries, and the only way to be safe is to know which countries have confirmed coronavirus cases. Here is a list of those countries.
Query 4
SELECT country, sum(suspected) AS "Suspected Cases under potential CoronaVirus outbreak"
FROM "public"."coronavirus_updated"
WHERE suspected > 0 AND deaths = 0 AND confirmed = 0
GROUP BY country
ORDER BY sum(suspected) DESC
Description: Coronavirus is spreading at an alarming rate. It is important to know which countries are newly affected, because timely measures in these countries could prevent casualties. Here is a list of countries with suspected cases but no confirmed cases and no virus-related deaths.
Query 5
SELECT country, sum(suspected) AS "Coronavirus uncontrolled spread count and human life loss", 100*sum(suspected)/(SELECT sum((suspected)) FROM "public"."coronavirus_updated") AS "Global suspected Exposure of Coronavirus in percentage"
FROM "public"."coronavirus_updated"
WHERE suspected > 0 AND deaths = 0
GROUP BY country
ORDER BY sum(suspected) DESC
Description: Coronavirus is getting stronger in particular countries, but how do we measure that? One measure is each country's percentage of all suspected cases, restricted to countries that do not yet have any coronavirus-related deaths. The following is a list.
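To make the percentage calculation in Query 5 concrete, here is a minimal Python sketch that runs a similar query against an in-memory SQLite table. This is not the Panoply.io setup used above: the table is rebuilt locally, the rows are made up, the column aliases are shortened, and the function name is hypothetical. One deliberate difference is the use of 100.0 instead of 100, since with integer columns many SQL engines (SQLite included) would otherwise truncate the percentage to a whole number.

```python
import sqlite3

# Made-up rows for illustration only; these are not real case counts.
# Columns: country, suspected, confirmed, deaths
SAMPLE_ROWS = [
    ("A", 30, 0, 0),
    ("A", 20, 0, 0),
    ("B", 30, 5, 0),
    ("C", 20, 8, 2),  # dropped by the deaths = 0 filter
]


def suspected_share(rows):
    """Return (country, suspected total, percent of all suspected cases)
    for rows with suspected cases and no deaths, largest total first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE coronavirus_updated ("
        "country TEXT, suspected INTEGER, confirmed INTEGER, deaths INTEGER)"
    )
    conn.executemany(
        "INSERT INTO coronavirus_updated VALUES (?, ?, ?, ?)", rows
    )
    query = """
        SELECT country,
               SUM(suspected) AS suspected_total,
               100.0 * SUM(suspected)
                   / (SELECT SUM(suspected) FROM coronavirus_updated)
                   AS pct_of_global
        FROM coronavirus_updated
        WHERE suspected > 0 AND deaths = 0
        GROUP BY country
        ORDER BY SUM(suspected) DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(suspected_share(SAMPLE_ROWS))  # [('A', 50, 50.0), ('B', 30, 30.0)]
```

As in Query 5, the denominator is the sum over every row in the table, including countries that have deaths, so the percentages of the listed countries do not need to add up to 100.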
Data Provided by: SRK, Data Scientist at H2O.ai, Chennai, India