74 datasets found
  1. COVID-19 Trends in Each Country

    • data.amerigeoss.org
    esri rest, html
    Updated Jul 29, 2020
    Cite
    ESRI (2020). COVID-19 Trends in Each Country [Dataset]. https://data.amerigeoss.org/dataset/covid-19-trends-in-each-country
    Explore at:
    esri rest, html (available download formats)
    Dataset updated
    Jul 29, 2020
    Dataset provided by
    Esri (http://esri.com/)
    Description

    COVID-19 Trends Methodology
    Our goal is to analyze and present daily updates in the form of recent trends within countries, states, or counties during the COVID-19 global pandemic. The data we are analyzing is taken directly from the Johns Hopkins University Coronavirus COVID-19 Global Cases Dashboard, though we expect to be one day behind the dashboard’s live feeds to allow for quality assurance of the data.


    6/24/2020 - Expanded Case Rates discussion to include fix on 6/23 for calculating active cases.
    6/22/2020 - Added Executive Summary and Subsequent Outbreaks sections
    Revisions on 6/10/2020 based on updated CDC reporting. This affects the estimate of active cases by revising the average duration of cases with hospital stays downward from 30 days to 25 days. The result shifted 76 U.S. counties out of Epidemic to Spreading trend and no change for national level trends.
    Methodology update on 6/2/2020: This sets the length of the tail of new cases to between 6 and a maximum of 14 days, rather than 21 days as determined by the last 1/3 of cases. This was done to align the trends and their criteria with U.S. CDC guidance. The impact is that areas transition into the Controlled trend sooner because they no longer bear the burden of new cases from 15-21 days earlier.
    Correction on 6/1/2020
    Discussion of our assertion of an abundance of caution in assigning trends in rural counties added 5/7/2020.
    Revisions added on 4/30/2020 are highlighted.
    Revisions added on 4/23/2020 are highlighted.

    Executive Summary
    COVID-19 Trends is a methodology for characterizing the current trend for places during the COVID-19 global pandemic. Each day we assign one of five trends (Emergent, Spreading, Epidemic, Controlled, or End Stage) to geographic areas based on the number of new cases, the number of active cases, the total population, and an algorithm (described below) that contextualizes the most recent fourteen days with the overall COVID-19 case history. Currently we analyze the countries of the world and the U.S. counties.
    The purpose is to give policymakers, citizens, and analysts a fact-based, data-driven sense of the direction each place is currently going. When a place has its initial cases, it is assigned Emergent, and if that place controls the rate of new cases, it can move directly to Controlled, and even to End Stage in a short time. However, if the reporting or measures to curtail spread are not adequate and significant numbers of new cases continue, it is assigned to Spreading, and in cases where the spread is clearly uncontrolled, to the Epidemic trend.

    We analyze the data reported by Johns Hopkins University to produce the trends, and we report the rates of cases, spikes of new cases, the number of days since the last reported case, and number of deaths. We also make adjustments to the assignments based on population so rural areas are not assigned trends based solely on case rates, which can be quite high relative to local populations.

    Two key factors are not consistently known or available and should be taken into consideration with the assigned trend. First is the amount of resources (e.g., hospital beds, physicians, etc.) that are currently available in each area. Second is the number of recoveries, which are often not tested or reported. On the latter, we provide a probable number of active cases based on CDC guidance for the typical duration of mild to severe cases.

    Reasons for undertaking this work in March of 2020:
    1. The popular online maps and dashboards show counts of confirmed cases, deaths, and recoveries by country or administrative sub-region. Comparing the counts of one country to another can only provide a basis for comparison during the initial stages of the outbreak when counts were low and the number of local outbreaks in each country was low. By late March 2020, countries with small populations were being left out of the mainstream news because it was not easy to recognize they had high per capita rates of cases (Switzerland, Luxembourg, Iceland, etc.). Additionally, comparing countries that have had confirmed COVID-19 cases for high numbers of days to countries where the outbreak occurred recently is also a poor basis for comparison.
    2. The graphs of confirmed cases and daily increases in cases were fit into a standard size rectangle, though the Y-axis for one country had a maximum value of 50, and for another country 100,000, which potentially misled people interpreting the slope of the curve. Such misleading circumstances affected comparing large population countries to small population countries, or countries with low numbers of cases to China, which had a large count of cases in the early part of the outbreak. These challenges for interpreting and comparing these graphs represent work each reader must do based on their experience and ability. Thus, we felt it would be a service to attempt to automate the thought process experts would use when visually analyzing these graphs, particularly the most recent tail of the graph, and provide readers with a resulting synthesis to characterize the state of the pandemic in that country, state, or county.
    3. The lack of reliable data for confirmed recoveries and therefore active cases. Merely subtracting deaths from total cases to arrive at this figure progressively loses accuracy after two weeks. The reason is that 81% of cases recover after experiencing mild symptoms in 10 to 14 days. Severe cases are 14% and last 15-30 days (based on average days with symptoms of 11 when admitted to hospital plus 12 days median stay, plus one week to include a full range of severely affected people who recover). Critical cases are 5% and last 31-56 days. Sources:
    • U.S. CDC. April 3, 2020 Interim Clinical Guidance for Management of Patients with Confirmed Coronavirus Disease (COVID-19). Accessed online.
    • Initial older guidance was also obtained online.
    Additionally, many people who recover may not be tested, and many who are, may not be tracked due to privacy laws.
    Thus, the formula used to compute an estimate of active cases is:

    Active Cases = 100% of new cases in past 14 days + 19% from past 15-25 days + 5% from past 26-49 days - total deaths.
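
    The formula above can be sketched in Python as follows. This is a minimal illustration, not Esri's implementation: the function name, the oldest-first input layout, and the clamp at zero are assumptions added here.

```python
from typing import Sequence


def estimate_active_cases(daily_new_cases: Sequence[int], total_deaths: int) -> float:
    """Estimate active cases from daily new-case counts (oldest first, most recent last)."""
    # Reverse so index 0 is the most recent day (day 1 in the formula).
    recent_first = list(reversed(daily_new_cases))

    def window_sum(first_day: int, last_day: int) -> int:
        # Days are 1-based and inclusive; slicing clips safely when history is short.
        return sum(recent_first[first_day - 1:last_day])

    estimate = (
        1.00 * window_sum(1, 14)     # 100% of new cases in the past 14 days
        + 0.19 * window_sum(15, 25)  # 19% of new cases from 15-25 days ago
        + 0.05 * window_sum(26, 49)  # 5% of new cases from 26-49 days ago
        - total_deaths
    )
    # Assumption: clamp at zero; the source formula does not say how negatives are handled.
    return max(estimate, 0.0)
```

    Cases older than 49 days contribute nothing, which matches the formula's assumption that even critical cases have resolved by then.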

  2. Coronavirus (Covid-19) Data in the United States

    • github.com
    • openicpsr.org
    • +3 more
    csv
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://github.com/nytimes/covid-19-data
    Explore at:
    csv (available download formats)
    Dataset provided by
    New York Times
    License

    https://github.com/nytimes/covid-19-data/blob/master/LICENSE

    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
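
    Because the files hold cumulative counts, daily new cases have to be derived by differencing. The sketch below shows one way to do that with the standard library. The column names `date`, `state`, and `cases` are assumptions (the description above does not list them), and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple


def daily_new_cases(csv_text: str) -> Dict[str, List[Tuple[str, int]]]:
    """Convert cumulative per-state counts into (date, new cases) pairs per state."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    rows.sort(key=lambda r: (r["state"], r["date"]))

    result: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    previous: Dict[str, int] = {}
    for row in rows:
        state = row["state"]
        cumulative = int(row["cases"])
        # The first row for a state is treated as that day's new cases.
        result[state].append((row["date"], cumulative - previous.get(state, 0)))
        previous[state] = cumulative
    return dict(result)
```

    A daily value can come out negative if a health department revises its cumulative count downward, so downstream analysis should not assume non-negative increments.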

  3. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Nishat Anjum
    Nafiz Sadman
    Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    Introduction

    There are several works based on Natural Language Processing on newspaper reports. Rameshbhai et al. [ 1 ], mining opinions from headlines using Stanford NLP and SVM, compared several algorithms on a small and a large dataset. Rubin et al., in their paper [ 2 ], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [ 3 ] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, there is not much NLP research invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus [ 5 ][ 6 ][ 7 ] and replicating its structure to fight and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [ 8 ] to understand the fear persisting in people due to the virus. Similar work has been done using the LSTM network to classify sentiments from online discussion forums by Jelodar et al. [ 9 ]. To the best of our knowledge, the NKK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We named this dataset “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named the dataset “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and the most widely read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news reports were highly relevant to the subject. The newspaper sites had dynamic content with advertisements in no particular order, so there was a high chance that online scrapers would collect inaccurate news reports. One of the challenges while collecting the data was the subscription requirement. Each newspaper required $1 per subscription. Some criteria provided as guidelines to the human data-collectors for collecting the news reports were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above mentioned newspapers.

    To collect these data we used a Google form for USA and BD. We had two human editors go through each entry to check for spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
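
    The pre-processing steps listed above might look roughly like the sketch below. It is not the authors' script: the stop-word list is a tiny illustrative subset, lowercasing is an added assumption, and lemmatization is left out because it needs an external library.

```python
import re
from typing import List

# Illustrative subset only; a real run would use a full stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "are", "on"}


def preprocess(text: str) -> List[str]:
    """Apply the first three listed steps and return the remaining tokens."""
    # Remove hyperlinks.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Remove non-English alphanumeric characters (keep ASCII letters, digits, whitespace).
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Lowercase (an assumption) and split on whitespace.
    tokens = text.lower().split()
    # Remove stop words. Lemmatization would follow here.
    return [token for token in tokens if token not in STOP_WORDS]
```

    Returning tokens rather than a rejoined string keeps the output ready for a lemmatizer or a keyword extractor.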

    The primary data statistics of the two datasets are shown in Tables 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics

    No. of words per headline: 7 to 20
    No. of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No. of words per headline: 10 to 20
    No. of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository under the account name NKK^1. Here, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We are regularly updating the CSV files and regenerating the JSON using a Python script. We provided a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

    Some well-known applications of NLP include fraud detection on online media sites [ 10 ], authorship attribution in fallback authentication systems [ 11 ], intelligent conversational agents or chatbots [ 12 ], and machine translation as used by Google Translate [ 13 ]. While these are all downstream tasks, several exciting developments have been made in algorithms built solely for Natural Language Processing tasks. The two most trending ones are BERT [ 14 ], which uses a bidirectional Transformer encoder architecture and can do near-perfect classification tasks and word predictions, and the GPT-3 models released by OpenAI [ 15 ], which can generate almost human-like text. However, these are all used as pre-trained models, since training them carries a huge computation cost. Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [ 16 ]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [ 17 ].

    Keyword extraction is a process of information extraction and sub-task of NLP to extract essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and pick the words with more weight to it.

    Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can point out a few observations:

    In February, both newspapers talked about China and the source of the outbreak.

    StarTribune emphasized Minnesota as the state of most concern. In April, it seemed to have been concerned more.

    Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.

    Washington Post discussed global issues more than StarTribune.

    StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread in the United States is given more weight from March through May, displaying the critical impact caused by the virus.

    We used a script to extract all numbers related to certain keywords like ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a number-of-cases series for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, as the count rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

    We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,

  4. Data for: COVID-19 Dataset: Worldwide Spread Log Including Countries First...

    • data.mendeley.com
    Updated Jul 20, 2020
    Cite
    Hasmot Ali (2020). Data for: COVID-19 Dataset: Worldwide Spread Log Including Countries First Case And First Death [Dataset]. http://doi.org/10.17632/vw427wzzkk.5
    Dataset updated
    Jul 20, 2020
    Authors
    Hasmot Ali
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contains informative data related to the COVID-19 pandemic, specifically the First Case and First Death information for every country. The datasets focus on two major fields. The first is First Case, which consists of the Date of First Case(s), Number of Confirmed Case(s) on the First Day, Age of the Patient(s) of the First Case, and Last Visited Country. The other is First Death, which consists of the Date of First Death and Age of the Patient who died first, for every country, with the corresponding continent. The datasets also contain the binary matrix of the spread chain among different countries and regions.

    *This is not a country. This is a ship. The name of the cruise ship was not given by the government.
    "N+": the age is not specified but greater than N
    “No Trace”: some data was not found
    “Unspecified”: not available from the authority
    “N/A”: in the “Last Visited Country(s) of Confirmed Case(s)” column, “N/A” indicates that the confirmed case(s) of those countries do not have any travel history in the recent past; in the “Age of First Death(s)” column, “N/A” indicates that those countries did not have any death case till May 16, 2020.

  5. COVID-19 by country

    • kaggle.com
    Updated Sep 13, 2021
    Cite
    Juan Carlos Santiago Culebras (2021). COVID-19 by country [Dataset]. https://www.kaggle.com/jcsantiago/covid19-by-country-with-government-response/activity
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 13, 2021
    Dataset provided by
    Kaggle
    Authors
    Juan Carlos Santiago Culebras
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Context

    In the current response to the pandemic caused by the SARS-CoV-2 coronavirus, which causes the disease called COVID-19, it is necessary to join forces to minimize the effects of this disease.

    Therefore, the intention of this dataset is to save data scientists time:

    • Gather the data at the country level, encoding the country with its ISO code to allow easy access to other data
    • Perform pre-processing of data, calculations of increments and other indicators that can facilitate modeling.
    • Add the response of the governments over time so that it can be taken into account in the modeling.
    • Daily update.

    This dataset is not intended to be static, so suggestions for expanding it are welcome. If someone considers it important to add information, please let me know.

    Content

    The data contained in this dataset comes mainly from the following sources:

    Source: Center for Systems Science and Engineering (CSSE) at Johns Hopkins University https://github.com/CSSEGISandData/COVID-19 Provided by Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE): https://systems.jhu.edu/

    Source: OXFORD COVID-19 GOVERNMENT RESPONSE TRACKER https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker Hale, Thomas and Samuel Webster (2020). Oxford COVID-19 Government Response Tracker. Data use policy: Creative Commons Attribution CC BY standard.

    The original data is updated daily.

    The features it includes are:

    • Country Name

    • Country Code ISO 3166 Alpha 3

    • Date

    • Incidence data:

      • confirmed
      • deaths
      • recoveries
    • Daily increments:

      • confirmed_inc
      • deaths_inc
      • recoveries_inc
    • Empirical Contagion Rate - ECR

    https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3508582%2F3e90ecbcdf76dfbbee54a21800f5e0d6%2FECR.jpg?generation=1586861653126435&alt=media

    • GOVERNMENT RESPONSE TRACKER - GRTStringencyIndex

      OXFORD COVID-19 GOVERNMENT RESPONSE TRACKER - Stringency Index

    • Indices from Start Contagion

      • Days since the first case of contagion is overcome
      • Days since 100 cases are exceeded
    • Percentages over the country's population:

      • confirmed_PopPct
      • deaths_PopPct
      • recoveries_PopPct
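
    Several of the derived features listed above can be computed from a cumulative series, as in the sketch below. It is a guess at the transformations, not the notebook's code: the function name, the treatment of the first day's increment, and the percentage scaling are assumptions. The Empirical Contagion Rate is omitted because its formula is given only in the linked image.

```python
from typing import Dict, List


def derive_features(confirmed: List[int], population: int) -> Dict[str, list]:
    """Derive increment, population-percentage, and days-since-100 columns.

    `confirmed` is a non-empty cumulative series for one country, oldest day first.
    """
    # Daily increments; the first day's increment is taken to be its cumulative value.
    confirmed_inc = [confirmed[0]] + [b - a for a, b in zip(confirmed, confirmed[1:])]
    # Cumulative confirmed cases as a percentage of the population.
    confirmed_pop_pct = [100.0 * c / population for c in confirmed]
    # Index of the first day on which 100 cases are exceeded, if any.
    first_over_100 = next((i for i, c in enumerate(confirmed) if c > 100), None)
    days_since_100 = [
        None if first_over_100 is None or i < first_over_100 else i - first_over_100
        for i in range(len(confirmed))
    ]
    return {
        "confirmed_inc": confirmed_inc,
        "confirmed_PopPct": confirmed_pop_pct,
        "days_since_100": days_since_100,
    }
```

    The same function applies unchanged to the deaths and recoveries series.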

    The method of obtaining the data and its transformations can be seen in the notebook:

    Notebook COVID-19 Data by country with Government Response

    Photo by Markus Spiske on Unsplash

  6. COVID-19 mortality correlation with cloudiness, sunlight, latitude in...

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    COVID-19 mortality correlation with cloudiness, sunlight, latitude in European countries [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4266757
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Iftime Adrian
    Omer Secil
    Burcea Victor
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Europe
    Description

    "COVID-19 mortality correlation with cloudiness, sunlight, latitude in European countries"

    Dataset for article titled "COVID-19 mortality: positive correlation with cloudiness, sunlight and no correlation with latitude in Europe"

    by SECIL OMER, ADRIAN IFTIME, VICTOR BURCEA

    Corresponding author: A. Iftime, University of Medicine and Pharmacy "Carol Davila", Biophysics Department, 8 Blvd. Eroii Sanitari, 050474 Bucharest, Romania. Email address: adrian.iftime [at] umfcd.ro.

    Preprint corresponding to this dataset: https://doi.org/10.1101/2021.01.27.21250658

    ===========

    Dataset file: 1.0.0.COVID-19_Mortality_Cloudiness_Insolation_EUROPE_March_August_2020.csv

    Dataset graphical preview: 1.0.0.INFOGRAFIC_CloudFraction_vs_COVID-19_mortality_Europe_March-August_2020.png

    DATASET fields:

    "Country" : Country name; 37 European countries included.

    "Date" : Date stamp at the collection time. Data collection was performed on the last day of every month. Date format: YYYY-MM-DD

    "Month_Key" : Date stamp at the collection time, formatted for easier monthly time series analysis. Date format: YYYY-MM

    "Month_Fct2020" : Date stamp at the collection time, formatted for easier graphing, as a string with names of the months (in English).

    "Deaths_per_1Mpop" : Monthly mortality from COVID-19 reported in the country, expressed as the number of COVID-19 deaths per 1 million population of the country, in that particular month / country. NB: it is reported per million population, not patients.

    "LogDeaths_per_1Mpop" : Log10 transformation of "Deaths_per_1Mpop"

    "Insolation_Average" : Insolation average (solar irradiance at ground level), in that particular month / country. It is expressed in Watt / square meter of the ground surface. Data derived from data available at NASA Langley Research Center, NASA’s Earth Observatory, CERES / FLASHFlux team, 2020, https://neo.sci.gsfc.nasa.gov/view.php?datasetId=CERES_INSOL_M

    "Cloud_Fraction" : Cloudiness (also known as cloud fraction, cloud cover, cloud amount or sky cover), as decimal fraction of the sky obscured by clouds, in that particular month / country. Data derived from NASA Goddard Space Flight Center, NASA’s Earth Observatory, MODIS Atmosphere Science Team, 2020, https://neo.sci.gsfc.nasa.gov/view.php?datasetId=MODAL2_M_CLD_FR

    "CENTR_latitude" and "CENTR_longitude" : Latitude and Longitude of the country centroid, for each country. Data derived from Google LLC, "Dataset publishing language: country centroids", https://developers.google.com/public-data/docs/canonical/countries_csv
    NOTE: This is identical in every month (obviously); it is redundantly included for easier monthly sectional analysis of the data.
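
    The "LogDeaths_per_1Mpop" field can be reproduced from "Deaths_per_1Mpop" with a base-10 logarithm, as in this small sketch. Returning None for months with zero deaths is an assumption made here, since the description does not say how the authors handled them.

```python
import math
from typing import Optional


def log_deaths_per_million(deaths_per_1m: float) -> Optional[float]:
    """Return log10 of a deaths-per-million value, or None when it is undefined."""
    # log10 is undefined for zero or negative inputs; None marks those rows (assumption).
    if deaths_per_1m <= 0:
        return None
    return math.log10(deaths_per_1m)
```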

    ===========

    Versioning: 1.0.0.COVID-19_Mortality_Cloudiness_Insolation_EUROPE_March_August_2020.csv

    MAJOR: changes yearly; 1 = 2020 MINOR: changes if new monthly data is added in that particular year. PATCH: Changes only if errors or minor edits were performed.

    DOI for this version: 10.5281/zenodo.4266758

    Dataset file source for this version (internal analysis source file): db_covid_all-ANALYSIS.2020-09-22_r10.csv

  7. Coronavirus Worldwide Dataset

    • kaggle.com
    Updated Aug 11, 2020
    Cite
    Saurabh Raj (2020). Coronavirus Worldwide Dataset [Dataset]. https://www.kaggle.com/saurabhraj19/coronavirus-worldwide-dataset/discussion
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 11, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Saurabh Raj
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

    So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

    The European CDC publishes daily statistics on the COVID-19 pandemic. Not just for Europe, but for the entire world. We rely on the ECDC as they collect and harmonize data from around the world which allows us to compare what is happening in different countries.

    Content

    This dataset has daily level information on the number of affected cases, deaths, recoveries, etc. from coronavirus. It also contains various other parameters like average life expectancy, population density, smoking population, etc., which users may find useful in the predictions they need to make.

    The data is available from 31 Dec, 2019.

    Inspiration

    Give people weekly data so that they can use it to make accurate predictions.

  8. Deaths Involving COVID-19 by Vaccination Status

    • data.ontario.ca
    • gimi9.com
    • +4 more
    csv, docx, xlsx
    Updated Dec 13, 2024
    Cite
    Health (2024). Deaths Involving COVID-19 by Vaccination Status [Dataset]. https://data.ontario.ca/dataset/deaths-involving-covid-19-by-vaccination-status
    Explore at:
    docx(26086), docx(29332), xlsx(10972), csv(321473), xlsx(11053) (available download formats)
    Dataset updated
    Dec 13, 2024
    Dataset authored and provided by
    Health
    License

    https://www.ontario.ca/page/open-government-licence-ontario

    Time period covered
    Nov 14, 2024
    Area covered
    Ontario
    Description

    This dataset reports the daily 7-day moving average rates of deaths involving COVID-19, by vaccination status and by age group.

    Learn how the Government of Ontario is helping to keep Ontarians safe during the 2019 Novel Coronavirus outbreak.

    Effective November 14, 2024 this page will no longer be updated. Information about COVID-19 and other respiratory viruses is available on Public Health Ontario’s interactive respiratory virus tool: https://www.publichealthontario.ca/en/Data-and-Analysis/Infectious-Disease/Respiratory-Virus-Tool

    Data includes:

    • Date on which the death occurred
    • Age group
    • 7-day moving average of the death rate per 100,000 for those not fully vaccinated
    • 7-day moving average of the death rate per 100,000 for those fully vaccinated
    • 7-day moving average of the death rate per 100,000 for those vaccinated with at least one booster
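
The dataset description does not spell out how the 7-day moving average rate is computed. The following is a minimal sketch under stated assumptions: the daily rate is deaths divided by the group's population, scaled to 100,000, and the average covers the current day and the six days before it. The function name, the population figure and the death counts are all hypothetical.

```python
from collections import deque


def trailing_mean_rate(daily_deaths, population, window=7, per=100_000):
    """Return the trailing `window`-day mean of the daily death rate per `per` people.

    Assumption: the rate for a day is deaths / population * per, and the
    moving average covers that day and the (window - 1) days before it.
    Days without a full window yield None.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` daily rates
    result = []
    for deaths in daily_deaths:
        recent.append(deaths / population * per)
        result.append(sum(recent) / window if len(recent) == window else None)
    return result


# Hypothetical daily death counts for one group with a population of 1,000,000.
rates = trailing_mean_rate([2, 0, 1, 3, 1, 0, 0, 7], population=1_000_000)
```

The first six entries are `None` because a full seven-day window is not yet available. Whether the published figures use a trailing or a centred window is not stated in the source.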

    Additional notes

    As of June 16, all COVID-19 datasets will be updated weekly on Thursdays by 2pm.

    As of January 12, 2024, data from the date of January 1, 2024 onwards reflect updated population estimates. This update specifically impacts data for the 'not fully vaccinated' category.

    On November 30, 2023 the count of COVID-19 deaths was updated to include missing historical deaths from January 15, 2020 to March 31, 2023.

    CCM is a dynamic disease reporting system which allows ongoing update to data previously entered. As a result, data extracted from CCM represents a snapshot at the time of extraction and may differ from previous or subsequent results. Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes and current totals being different from previously reported cases and deaths. Observed trends over time should be interpreted with caution for the most recent period due to reporting and/or data entry lags.

    The data does not include vaccination data for people who did not provide consent for vaccination records to be entered into the provincial COVaxON system. This includes individual records as well as records from some Indigenous communities where those communities have not consented to including vaccination information in COVaxON.

    “Not fully vaccinated” category includes people with no vaccine and one dose of double-dose vaccine. “People with one dose of double-dose vaccine” category has a small and constantly changing number. The combination will stabilize the results.

    Spikes, negative numbers and other data anomalies: Due to ongoing data entry and data quality assurance activities in the Case and Contact Management system (CCM), Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes, negative numbers and current totals being different from previously reported case and death counts.

    Public Health Units report cause of death in the CCM based on information available to them at the time of reporting and in accordance with definitions provided by Public Health Ontario. The medical certificate of death is the official record and the cause of death could be different.

    Deaths are defined per the outcome field in CCM marked as “Fatal”. Deaths in COVID-19 cases identified as unrelated to COVID-19 are not included in the Deaths involving COVID-19 reported.

    Rates for the most recent days are subject to reporting lags.

    All data reflects totals from 8 p.m. the previous day.

    This dataset is subject to change.

  9. Covid-19 Country Level Social Science Dataset

    • dataverse.no
    • dataverse.azure.uit.no
    application/dbf +10
    Updated Oct 20, 2020
    Cite
    Øystein Solvang; Kari Elida Eriksen; Jonas Stein; Camilla Brattland (2020). Covid-19 Country Level Social Science Dataset [Dataset]. http://doi.org/10.18710/VMUP44
    Explore at:
    Available download formats: type/x-r-syntax(11257), csv(36577), application/prj(146), type/x-r-syntax(12007), application/shx(2140), application/dbf(323441), bin(5), txt(9844), application/sbn(2796), application/prj(145), application/shp(8800376), type/x-r-syntax(4038), application/sbx(349), pdf(189956), bin(6), csv(41050), pdf(138533), application/dbf(10298), application/sbx(348), pdf(339251)
    Dataset updated
    Oct 20, 2020
    Dataset provided by
    DataverseNO
    Authors
    Øystein Solvang; Kari Elida Eriksen; Jonas Stein; Camilla Brattland
    License

    https://dataverse.no/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18710/VMUP44

    Time period covered
    Jan 1, 2020 - Jul 15, 2020
    Area covered
    Covers 199 countries
    Description

    The dataset is a cross-sectional dataset covering social and public health data pertaining to the Covid-19 outbreak in 199 countries. The dataset was compiled from public registers and other openly available sources. Data on Covid-19 cases and related fatalities is current as of mid-July 2020. Data on other variables is mainly from the last three years, depending on data availability. Standardized unique unit identifiers (ISO-3166-1 Alpha-3) are included, enabling merging with other data. The dataset was assembled concurrently with a similar one on the Norwegian municipal level, as part of the project «Ressurs for studentaktiv læring i undervisning i statistisk og romlig analyse for samfunnsfag», at the Department of Social Science and The Norwegian College of Fishery Science, UiT.
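
Because each row carries an ISO-3166-1 Alpha-3 identifier, the dataset can be joined to other country-level tables on that code. The sketch below shows one way to do an inner join on plain lists of dicts. The column name `iso3` and every value in the sample rows are hypothetical, since the actual column names are not given in the description.

```python
def merge_on_iso3(left_rows, right_rows, key="iso3"):
    """Inner-join two lists of dict rows on a shared ISO 3166-1 alpha-3 code."""
    right_by_code = {row[key]: row for row in right_rows}
    merged = []
    for row in left_rows:
        match = right_by_code.get(row[key])
        if match is not None:
            # Combine the columns of both rows; the key column is identical in both.
            merged.append({**row, **match})
    return merged


# Hypothetical rows; the column name "iso3" and all values are illustrative only.
covid = [{"iso3": "NOR", "cases": 100}, {"iso3": "SWE", "cases": 200}]
other = [{"iso3": "NOR", "region": "Europe"}, {"iso3": "FIN", "region": "Europe"}]
merged = merge_on_iso3(covid, other)
```

Only countries present in both tables survive the join, so it is worth checking which codes were dropped before analysing the result.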

  10. Covid-19 dataset

    • kaggle.com
    zip
    Updated Jul 11, 2021
    + more versions
    Cite
    Maheshashwin (2021). Covid-19 dataset [Dataset]. https://www.kaggle.com/maheshashwin/covid19-dataset
    Explore at:
    Available download formats: zip(4997945 bytes)
    Dataset updated
    Jul 11, 2021
    Authors
    Maheshashwin
    Description

    Coronavirus (COVID-19) pandemic

    Our complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data. It includes data on confirmed cases, deaths, hospitalizations, and testing. Data is collected from multiple sources that update at different times and may not always align. Some locations may not provide complete information.

  11. MD COVID-19 - Vaccination Percent Age Group Population

    • catalog.data.gov
    • opendata.maryland.gov
    Updated Apr 29, 2023
    + more versions
    Cite
    opendata.maryland.gov (2023). MD COVID-19 - Vaccination Percent Age Group Population [Dataset]. https://catalog.data.gov/dataset/md-covid-19-vaccination-percent-age-group-population
    Explore at:
    Dataset updated
    Apr 29, 2023
    Dataset provided by
    opendata.maryland.gov
    Description

    Regarding all Vaccination Data

    The date of Last Update is 4/21/2023. Additionally on 4/27/2023 several COVID-19 datasets were retired and no longer included in public COVID-19 data dissemination. See this link for more information: https://imap.maryland.gov/pages/covid-data

    Summary

    The cumulative number of COVID-19 vaccinations percent age group population: 16-17; 18-49; 50-64; 65 Plus.

    Description

    COVID-19 - Vaccination Percent Age Group Population data layer is a collection of COVID-19 vaccinations that have been reported each day into ImmuNet. COVID-19 is a disease caused by a respiratory virus first identified in Wuhan, Hubei Province, China in December 2019. COVID-19 is a new virus that hasn't caused illness in humans before. Worldwide, COVID-19 has resulted in thousands of infections, causing illness and in some cases death. Cases have spread to countries throughout the world, with more cases reported daily. The Maryland Department of Health reports daily on COVID-19 cases by county.

    Terms of Use

    The Spatial Data, and the information therein, (collectively the Data) is provided as is without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result to changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form. The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata. This map is for planning purposes only. MEMA does not guarantee the accuracy of any forecast or predictive elements.

  12. CORONAVIRUS DEATHS by Country Dataset

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Mar 4, 2020
    + more versions
    Cite
    CORONAVIRUS DEATHS by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/coronavirus-deaths
    Explore at:
    Available download formats: csv, excel, xml, json
    Dataset updated
    Mar 4, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    World
    Description

    This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.

  13. Coronavirus COVID-19 Global Cases

    • redivis.com
    application/jsonl +7
    Updated Jul 13, 2020
    Cite
    Stanford Center for Population Health Sciences (2020). Coronavirus COVID-19 Global Cases [Dataset]. http://doi.org/10.57761/pyf5-4e40
    Explore at:
    Available download formats: application/jsonl, parquet, csv, stata, avro, spss, sas, arrow
    Dataset updated
    Jul 13, 2020
    Dataset provided by
    Redivis Inc.
    Authors
    Stanford Center for Population Health Sciences
    Time period covered
    Jan 22, 2020 - Jul 12, 2020
    Description

    Abstract

    JHU Coronavirus COVID-19 Global Cases, by country

    Documentation

    PHS updates the Coronavirus Global Cases dataset every week on Monday, Wednesday and Friday from Cloud Marketplace.

    This data comes from the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). This database was created in response to the Coronavirus public health emergency to track reported cases in real-time. The data include the location and number of confirmed COVID-19 cases, deaths, and recoveries for all affected countries, aggregated at the appropriate province or state. It was developed to enable researchers, public health authorities and the general public to track the outbreak as it unfolds. Additional information is available in the blog post.

    Visual Dashboard (desktop): https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

    Section 2

    Included Data Sources are:

    Section 3

    Terms of Use:

    This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.

    Section 4

    U.S. county-level characteristics relevant to COVID-19

    Chin, Kahn, Krieger, Buckee, Balsari and Kiang (forthcoming) show that counties differ significantly in biological, demographic and socioeconomic factors that are associated with COVID-19 vulnerability. A range of publicly available county-specific data identifying these key factors, guided by international experiences and consideration of epidemiological parameters of importance, have been combined by the authors and are available for use:

    https://github.com/mkiang/county_preparedness/

  14. Covid-19 Czech Republic

    • kaggle.com
    Updated Jul 3, 2020
    Cite
    Michal Brezak (2020). Covid-19 Czech Republic [Dataset]. https://www.kaggle.com/michalbrezk/covid19-czech-republic/code
    Explore at:
    Dataset updated
    Jul 3, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Michal Brezak
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Czechia
    Description

    Context

    This dataset has been collected from multiple sources provided by MVCR on their websites and contains daily summarized statistics as well as detailed statistics down to the age and sex level.

    Content

    Columns description

    • Date - Calendar date when data were collected
    • Daily tested - Sum of tests performed
    • Daily infected - Sum of confirmed cases that were positive
    • Daily cured - Sum of cured people who no longer have Covid-19
    • Daily deaths - Sum of people who died of Covid-19
    • Daily cum tested - Cumulative sum of tests performed
    • Daily cum infected - Cumulative sum of confirmed cases that were positive
    • Daily cum cured - Cumulative sum of cured people who no longer have Covid-19
    • Daily cum deaths - Cumulative sum of people who died of Covid-19
    • Region - Region of the Czech Republic
    • Sub-Region - Sub-region of the Czech Republic
    • Region accessories qty - Quantity of health care accessories delivered to the region over the whole period
    • Age - Age of person
    • Sex - Sex of person
    • Infected - Sum of infected people for a specific date, region, sub-region, age and sex
    • Cured - Sum of cured people for a specific date, region, sub-region, age and sex
    • Death - Sum of people who died of Covid-19 for a specific date, region, sub-region, age and sex
    • Infected abroad - Identifies whether the person was infected with Covid-19 in the Czech Republic or abroad
    • Infected in country - Code of the country the person came from (origin country of Covid-19)

    Data granularity

    The dataset contains data at different levels of granularity. Make sure you do not mix them. Suppose you have loaded the data into a pandas DataFrame called df.

    Day level

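    import pandas as pd

    # Select the day-level columns before aggregating, so max() only touches them.
    daily_cols = ['daily_tested','daily_infected','daily_cured','daily_deaths','daily_cum_tested','daily_cum_infected','daily_cum_cured','daily_cum_deaths']
    df_daily = df.groupby('date')[daily_cols].max().reset_index()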
    

    Region level

    df_region = df[df['region'] != ''].groupby(['region']).agg(
      region_accessories_qty=pd.NamedAgg(column='region_accessories_qty', aggfunc='max'), 
      infected=pd.NamedAgg(column='infected', aggfunc='sum'),
      cured=pd.NamedAgg(column='cured', aggfunc='sum'),
      death=pd.NamedAgg(column='death', aggfunc='sum')
    ).reset_index()
    

    Detail level

    df_detail = df[['date','region','sub_region','age','sex','infected','cured','death','infected_abroad','infected_in_country']].reset_index(drop=True)
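
The warning about mixing granularities can be illustrated with a small sketch. It assumes, as the use of max() in the day-level snippet suggests, that day-level columns repeat the same value on every detail row for a date. The sample frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical two-day sample: the day-level column repeats on every detail row.
df = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-01", "2020-03-02"],
    "region": ["A", "B", "A"],
    "daily_tested": [50, 50, 80],  # same daily total repeated per detail row
    "infected": [1, 2, 3],         # detail-level counts
})

# Summing a day-level column over detail rows counts each day more than once.
wrong_total = df["daily_tested"].sum()

# Collapse to one row per date first, then sum.
df_daily = df.groupby("date")[["daily_tested"]].max().reset_index()
right_total = df_daily["daily_tested"].sum()
```

Here `wrong_total` counts the first day's tests twice, while `right_total` counts each day once. If the real file does not repeat day-level values per detail row, this aggregation would need to be revisited.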
    

    Acknowledgements

    Thanks to websites of MVCR for sharing such great information.

    Inspiration

    Can you see a relation between the health care accessories delivered to a region and the number of cured/infected people in that region? Why is the Czech Republic among the relatively safe countries in the Covid-19 pandemic? Can you find out how the evolution of the pandemic in the Czech Republic differs from that in surrounding countries, like Germany or Slovakia?

  15. MD COVID19 ContactTracing CasesReportedGatherings Summary

    • coronavirus.maryland.gov
    • data.imap.maryland.gov
    • +3more
    Updated Sep 28, 2020
    + more versions
    Cite
    ArcGIS Online for Maryland (2020). MD COVID19 ContactTracing CasesReportedGatherings Summary [Dataset]. https://coronavirus.maryland.gov/datasets/maryland::md-covid19-contacttracing-casesreportedgatherings-summary/about
    Explore at:
    Dataset updated
    Sep 28, 2020
    Dataset authored and provided by
    ArcGIS Online for Maryland
    Area covered
    Description

    Summary

    The number of cases interviewed who had a completed answer to the question asking if they attended any gatherings of more than 10 people in the 14 days before they became ill (or had a positive test) during their covidLINK interviews.

    Description

    MD COVID-19 - Contact Tracing Cases Social Gatherings of More than 10 People layer reflects the number of cases interviewed who had a completed answer to the question asking if they attended any gatherings of more than 10 people in the 14 days before they became ill (or had a positive test) during their covidLINK interviews. Respondents may indicate that they attended more than one category of social gathering. For a variety of reasons, some individuals choose not to answer particular questions during the course of their interview.

    Events and locations where there is prolonged exposure to other people (including weddings, parties, stores, restaurants, etc.) are considered “high risk” for COVID-19 transmission. The more interaction at a gathering or location, the more likely a person may be to transmit or become infected with the virus. More information about considerations for events and gatherings, including how to assess risk levels and promote healthy behaviors that reduce spread, is available from the Centers for Disease Control and Prevention.

    Answers to interview questions do not provide evidence of cause and effect. Due to the nature of COVID-19 and the wide range of scenarios in which a person can become infected, most of the time it will not be possible to pinpoint exactly where and when a case became infected. Though a person may report attendance at a particular location, that does not mean that transmission happened at that location.

    The covidLINK interview questionnaire is updated as necessary to capture relevant information related to case exposure and potential onward transmission. These revisions should be taken into consideration when evaluating trends in case responses over time.

    COVID-19 is a disease caused by a respiratory virus first identified in Wuhan, Hubei Province, China in December 2019. COVID-19 is a new virus that hasn't caused illness in humans before. Worldwide, COVID-19 has resulted in thousands of infections, causing illness and in some cases death. Cases have spread to countries throughout the world, with more cases reported daily. The Maryland Department of Health reports daily on COVID-19 cases by county.

  16. COVID-19 Pandemic - CH/Switzerland

    • public.aws-ec2-eu-1.opendatasoft.com
    • data.smartidf.services
    • +2more
    csv, excel, geojson +1
    Updated Apr 17, 2024
    Cite
    (2024). COVID-19 Pandemic - CH/Switzerland [Dataset]. https://public.aws-ec2-eu-1.opendatasoft.com/explore/dataset/covid-19-pandemic-ch-switzerland/?flg=fr
    Explore at:
    Available download formats: json, geojson, csv, excel
    Dataset updated
    Apr 17, 2024
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Switzerland
    Description

    This dataset is based on the Github repository maintained by OpenZH. Data has been enriched with geographical data for the cantons, in order to produce visualisations.

    | Field Name | Description | Format | Note |
    | update | Date and time of notification | YYYY-MM-DD-HH-MM | |
    | name | Name of the reporting canton | Text | |
    | abbreviation_canton_and_fl | Abbreviation of the reporting canton | Text | |
    | ncumul_tested | Reported number of tests performed as of date | Number | Irrespective of canton of residence |
    | ncumul_conf | Reported number of confirmed cases as of date | Number | Only cases that reside in the current canton |
    | current_hosp (formerly ncumul_hosp) * | Reported number of hospitalised patients on date | Number | Irrespective of canton of residence |
    | current_icu (formerly ncumul_icu) * | Reported number of hospitalised patients in ICUs on date | Number | Irrespective of canton of residence |
    | current_vent (formerly ncumul_vent) * | Reported number of patients requiring ventilation on date | Number | Irrespective of canton of residence |
    | ncumul_released | Reported number of patients released from hospitals or reported recovered as of date | Number | Irrespective of canton of residence |
    | ncumul_deceased | Reported number of deceased as of date | Number | Only cases that reside in the current canton |
    | new_hosp * | Number of new hospitalisations since last date | Number | Irrespective of canton of residence |
    | source | Source of the information | URL link | |
    | geo_point_2d | Geographical centroid of the canton | geo_point_2d | |
    | current_isolated | Reported number of isolated persons on date | Number | Infected persons, who are not hospitalised |
    | current_quarantined | Reported number of quarantined persons on date | Number | Persons, who were in 'close contact' with an infected person, while that person was infectious, and are not hospitalised themselves |
    | current_quarantined_riskareatravel | Reported number of quarantined persons on date | Number | People arriving in Switzerland from certain countries and areas, required to go into quarantine (introduced in May 2021) |

    * These variables were affected by the format change on April 9th, 2020, which consists of:
    - new variable "new_hosp"
    - variables "ncumul_hosp", "ncumul_icu", "ncumul_vent" have been renamed to "current_hosp", "current_icu", "current_vent", to fit with their nature. To ensure compatibility with existing dashboards or reuses, these fields have been duplicated to avoid errors when their old names are used, but we strongly recommend replacing the old names with the new ones as soon as possible.
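
Code that consumes this dataset may encounter both the old and the new field names. The sketch below shows one possible way to normalise a record to the new names and to parse the `update` timestamp. It assumes records arrive as dicts keyed by field name and that `update` is a string in the documented YYYY-MM-DD-HH-MM layout. The helper name and the sample record are hypothetical.

```python
from datetime import datetime

# Old field names and their replacements, as listed in the description.
RENAMED = {
    "ncumul_hosp": "current_hosp",
    "ncumul_icu": "current_icu",
    "ncumul_vent": "current_vent",
}


def normalize_record(record):
    """Return a copy of `record` that uses only the new field names.

    If both the old and the new name are present, the new one wins.
    """
    out = {}
    for key, value in record.items():
        new_key = RENAMED.get(key, key)
        if key in RENAMED and new_key in record:
            continue  # duplicate of a field already present under its new name
        out[new_key] = value
    # Assumption: "update" is a string in the documented YYYY-MM-DD-HH-MM layout.
    if isinstance(out.get("update"), str):
        out["update"] = datetime.strptime(out["update"], "%Y-%m-%d-%H-%M")
    return out


# Hypothetical record that still uses one old field name.
row = normalize_record({"update": "2020-04-09-15-00", "name": "Zürich", "ncumul_hosp": 12})
```

Preferring the new name when both are present follows the publisher's advice to migrate away from the old names.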

  17. ‘COVID-19: Holidays of countries’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Aug 4, 2020
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2020). ‘COVID-19: Holidays of countries’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-covid-19-holidays-of-countries-d8bd/e5a9e831/?iid=005-848&v=presentation
    Explore at:
    Dataset updated
    Aug 4, 2020
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘COVID-19: Holidays of countries’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/vbmokin/covid19-holidays-of-countries on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This research is devoted to the analysis of the impact of holidays on the statistics of confirmed coronavirus cases. Prophet uses the holidays library, which contains the holidays of countries and their regions. As of 30 June 2020, only 62 countries (some with regions) are available in the holidays library:

    ['AR', 'AT', 'AU', 'BD', 'BE', 'BG', 'BR', 'BY', 'CA', 'CH', 'CL', 'CN', 'CO', 'CZ', 'DE', 'DK', 'DO', 'EE', 'EG', 'ES', 'FI', 'FR', 'GB', 'GR', 'HN', 'HR', 'HU', 'ID', 'IE', 'IL', 'IN', 'IS', 'IT', 'JP', 'KE', 'KR', 'LT', 'LU', 'MX', 'MY', 'NG', 'NI', 'NL', 'NO', 'NZ', 'PE', 'PH', 'PK', 'PL', 'PT', 'PY', 'RS', 'RU', 'SE', 'SG', 'SI', 'SK', 'TH', 'TR', 'UA', 'US', 'ZA'] or ['Argentina', 'Australia', 'Austria', 'Bangladesh', 'Belarus', 'Belgium', 'Brazil', 'Bulgaria', 'Canada', 'Chile', 'China', 'Colombia', 'Croatia', 'Czechia', 'Denmark', 'Dominican Republic', 'Egypt', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Ireland', 'Israel', 'Italy', 'Japan', 'Kenya', 'Korea, Republic of', 'Lithuania', 'Luxembourg', 'Malaysia', 'Mexico', 'Netherlands', 'New Zealand', 'Nicaragua', 'Nigeria', 'Norway', 'Pakistan', 'Paraguay', 'Peru', 'Philippines', 'Poland', 'Portugal', 'Russian Federation', 'Serbia', 'Singapore', 'Slovakia', 'Slovenia', 'South Africa', 'Spain', 'Sweden', 'Switzerland', 'Thailand', 'Turkey', 'Ukraine', 'United Kingdom', 'United States']

    I will note at once that the list of available countries in the description of the holidays library contains a lot of mistakes, which I reported to the authors.

    When I asked if this list would expand, the Prophet team made it clear that they were waiting for help from the community to expand the holidays library.

    As of Jan 2021 (version 8.4.1), 67 countries (some with regions) are available in the holidays library: a number of data have been refined and countries ['BI', 'LV', 'MA', 'RO', 'VN' - two-letter country codes or alpha_2 of the country (ISO 3166)] added.

    Unfortunately, the format of the holidays library is not very suitable for coronavirus problems, as it has a number of disadvantages. First, the names of the countries are given in one word, which makes it difficult to match many of them to their common names (ISO 3166). It is best that the dataset contains the common name and two-letter abbreviation in English according to ISO 3166 (see pycountry). Second, the dates are not adapted to the potential impact of the holidays on coronavirus statistics. It is known that after the moment of infection, the active manifestation of symptoms occurs with a delay of 4-10 days, that is, a person is likely to get into the statistics on the number of diseases only after 4-7 days. Therefore, it is advisable to use the dates window of impacts:

    Lower_window = [4, 7]
    Upper_window = [7, 10]

    However, Prophet requires:

    Lower_window <= 0

    My [request](https://github.com/facebook/prophet/issues/1588#issue-661098613) to allow positive numbers in this parameter [was refused](https://github.com/facebook/prophet/issues/1588#issuecomment-661984730) by the Prophet team, who [advised](https://github.com/facebook/prophet/issues/1588#issuecomment-661984730) simply moving the dates themselves.
    Therefore, it is advisable to shift the holiday dates by 7 days. If the researcher thinks that 7 days is too much and 4 days is enough, then they simply set the lower window to -3. By default, it makes sense to specify these parameters:

    Lower_window = -3
    Upper_window = 3

    If necessary, these settings are easy to change.
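
The date shift described here can be sketched without calling Prophet at all. The following is a minimal illustration, not the dataset author's code: it moves each holiday 7 days forward and attaches windows of -3 and 3, so the affected range runs from 4 to 10 days after the real holiday. The function name and the sample holiday are hypothetical. The output keys follow the column names Prophet uses for a holidays table (`holiday`, `ds`, `lower_window`, `upper_window`).

```python
from datetime import date, timedelta


def shift_holidays(holidays, shift_days=7, lower_window=-3, upper_window=3):
    """Shift each holiday date forward and attach Prophet-style window columns.

    `holidays` is a list of (name, date) pairs. Each returned dict holds the
    shifted date under "ds" plus the window bounds around that shifted date.
    """
    if lower_window > 0:
        # Mirrors the constraint discussed in the text: lower_window must be <= 0.
        raise ValueError("lower_window must be <= 0")
    return [
        {
            "holiday": name,
            "ds": day + timedelta(days=shift_days),
            "lower_window": lower_window,
            "upper_window": upper_window,
        }
        for name, day in holidays
    ]


# Hypothetical input: one holiday on 1 January 2021.
rows = shift_holidays([("New Year's Day", date(2021, 1, 1))])
first_affected = rows[0]["ds"] + timedelta(days=rows[0]["lower_window"])
last_affected = rows[0]["ds"] + timedelta(days=rows[0]["upper_window"])
```

With the defaults, a holiday on 1 January is modelled as affecting 5 January through 11 January, which matches the 4-10 day delay mentioned in the text.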
    
    ### Content
    
    This dataset:
    1. Contains ISO codes, ISO names (common and official) (ISO 3166) of **70** countries (3 European countries **['Albania' - 'AL', 'Georgia' - 'GE', 'Moldova' - 'MD']** have been added).
    2. Contains imported dates from the holidays library for 2020-01-20 to 2021-12-31 (all countries from the holidays library as of Jan 2021), and the same dates moved 7 days forward.
    3. Holidays of countries that are not in the list of holidays of the library, but which are in the data of the World Health Organization and on which considerable statistics of diseases on coronavirus are already collected.
    4. Parameters for Prophet model:
    `lower_window, upper_window, prior_scale`
    If you find errors, please write to the [Discussion](https://www.kaggle.com/vbmokin/covid19-holidays-of-countries/discussion).
    
    It is planned to periodically update (and, if necessary, correct) this dataset. 
    
    ### Acknowledgements
    
    Thanks to the authors of the information resources
    * [https://github.com/dr-prodigy/python-holidays](https://github.com/dr-prodigy/python-holidays)
    * [https://en.wikipedia.org/wiki/List_of_holidays_by_country](https://en.wikipedia.org/wiki/List_of_holidays_by_country)
    about the dates and names of holidays in different countries, which I used.
    
    Thanks for the image to iXimus from Pixabay.
    
    
    ### Inspiration
    
    The main task for which this dataset was created is to study the impact of holidays on the accuracy of predicting coronavirus diseases, identifying new patterns, and forming optimal solutions to counteract or minimize its spread.
    
    Tasks that need to be solved to improve this dataset in order to increase the accuracy of modeling the impact of holidays on the number of coronavirus patients:
    1) Expanding the list of countries
    2) Clarification of holiday dates
    3) Clarification of parameters 
    `lower_window, upper_window, prior_scale`
    they must be unique for each country and each holiday.
    
    Also, it is advisable to carry out similar work for each region of countries, but this will not be done in this dataset.
    
    --- Original source retains full ownership of the source dataset ---
    
  18. Coronavirus COVID-19 Global Cases by the Center for Systems Science and...

    • github.com
    • systems.jhu.edu
    • +1more
    Cite
    Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) [Dataset]. https://github.com/CSSEGISandData/COVID-19
    Explore at:
    Dataset provided by
    Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)
    Area covered
    Global
    Description

    2019 Novel Coronavirus COVID-19 (2019-nCoV) Visual Dashboard and Map:
    https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

    • Confirmed Cases by Country/Region/Sovereignty
    • Confirmed Cases by Province/State/Dependency
    • Deaths
    • Recovered

    Downloadable data:
    https://github.com/CSSEGISandData/COVID-19

    Additional Information about the Visual Dashboard:
    https://systems.jhu.edu/research/public-health/ncov

  19. Dataset of development of business during the COVID-19 crisis

    • data.mendeley.com
    • narcis.nl
    Updated Nov 9, 2020
    Cite
    Tatiana N. Litvinova (2020). Dataset of development of business during the COVID-19 crisis [Dataset]. http://doi.org/10.17632/9vvrd34f8t.1
    Explore at:
    Dataset updated
    Nov 9, 2020
    Authors
    Tatiana N. Litvinova
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    To create the dataset, the top 10 countries by COVID-19 incidence worldwide as of October 22, 2020 (on the eve of the second wave of the pandemic) were considered, and those represented in the Global 500 ranking for 2020 were selected: USA, India, Brazil, Russia, Spain, France and Mexico. For each of these countries, no more than 10 of the largest transnational corporations included in the Global 500 rating for 2020 and 2019 were selected separately. The arithmetic averages and the change (increase) were calculated for indicators such as the profitability of enterprises, their ranking position (competitiveness), asset value and number of employees. The arithmetic mean values of these indicators across all countries in the sample were then found, characterizing the state of international entrepreneurship as a whole during the COVID-19 crisis in 2020, on the eve of the second wave of the pandemic. The data is collected in a single Microsoft Excel table. The dataset is a unique database that combines COVID-19 statistics and entrepreneurship statistics. It is flexible and can be supplemented with data from other countries and with newer statistics on the COVID-19 pandemic. Because the values in the dataset are formulas rather than fixed numbers, adding or changing values in the original table at the beginning of the dataset automatically recalculates most of the subsequent tables and updates the graphs. This allows the dataset to be used not just as an array of data but as an analytical tool for automating research on the impact of the COVID-19 pandemic and crisis on international entrepreneurship. The dataset includes charts that visualize the data as well as tabular data. It contains both actual and forecast data on morbidity and mortality from COVID-19 for the second wave of the pandemic in 2020.

    The forecasts are presented as a normal distribution of predicted values together with the probability of each value occurring in practice. This allows a broad scenario analysis of the impact of the COVID-19 pandemic and crisis on international entrepreneurship: various predicted morbidity and mortality rates can be substituted into the risk assessment tables to obtain the automatically calculated consequences (changes) for the characteristics of international entrepreneurship. The actual values observed during and after the second wave of the pandemic can also be substituted to check the reliability of the earlier forecasts and to conduct a plan-fact analysis. In addition to the numerical initial and predicted values of the studied indicators, the dataset contains their qualitative interpretation, which reflects the presence and level of the risks that the pandemic and the COVID-19 crisis pose for international entrepreneurship.

  20. MD COVID19 ContactTracing CasesReportedHighRiskLocations Summary

    • dev-maryland.opendata.arcgis.com
    • data.imap.maryland.gov
    • +2more
    Updated Mar 30, 2021
    Cite
    ArcGIS Online for Maryland (2021). MD COVID19 ContactTracing CasesReportedHighRiskLocations Summary [Dataset]. https://dev-maryland.opendata.arcgis.com/datasets/md-covid19-contacttracing-casesreportedhighrisklocations-summary
    Explore at:
    Dataset updated
    Mar 30, 2021
    Dataset provided by
    Authors
    ArcGIS Online for Maryland
    Description

    Summary
    The number of cases interviewed who had a completed answer to the question asking if they visited or worked at any of a list of high risk locations in the 14 days before they became ill (or had a positive test) during their covidLINK interviews.

    Description
    The MD COVID-19 - Contact Tracing Cases High Risk Locations layer reflects the number of cases interviewed who had a completed answer to the question asking if they visited or worked at any of a list of high risk locations in the 14 days before they became ill (or had a positive test) during their covidLINK interviews. Respondents may indicate that they visited or worked at more than one category of high risk location. For a variety of reasons, some individuals choose not to answer particular questions during the course of their interview.

    Events and locations where there is prolonged exposure to other people (including weddings, parties, stores, restaurants, etc.) are considered “high risk” for COVID-19 transmission. The more interaction at a gathering or location, the more likely a person may be to transmit or become infected with the virus. More information about considerations for events and gatherings, including how to assess risk levels and promote healthy behaviors that reduce spread, is available from the Centers for Disease Control and Prevention.

    Answers to interview questions do not provide evidence of cause and effect. Due to the nature of COVID-19 and the wide range of scenarios in which a person can become infected, most of the time it will not be possible to pinpoint exactly where and when a case became infected. Though a person may report attendance at a particular location, that does not mean that transmission happened at that location.

    The covidLINK interview questionnaire is updated as necessary to capture relevant information related to case exposure and potential onward transmission. These revisions should be taken into consideration when evaluating trends in case responses over time.

    COVID-19 is a disease caused by a respiratory virus first identified in Wuhan, Hubei Province, China in December 2019. The virus is new and had not caused illness in humans before. Worldwide, COVID-19 has resulted in thousands of infections, causing illness and in some cases death. Cases have spread to countries throughout the world, with more cases reported daily. The Maryland Department of Health reports daily on COVID-19 cases by county.

Cite
ESRI (2020). COVID-19 Trends in Each Country [Dataset]. https://data.amerigeoss.org/dataset/covid-19-trends-in-each-country

COVID-19 Trends in Each Country

Explore at:
esri rest, html (available download formats)
Dataset updated
Jul 29, 2020
Dataset provided by
Esri (http://esri.com/)
Description

COVID-19 Trends Methodology
Our goal is to analyze and present daily updates in the form of recent trends within countries, states, or counties during the COVID-19 global pandemic. The data we are analyzing is taken directly from the Johns Hopkins University Coronavirus COVID-19 Global Cases Dashboard, though we expect to be one day behind the dashboard’s live feeds to allow for quality assurance of the data.


6/24/2020 - Expanded Case Rates discussion to include fix on 6/23 for calculating active cases.
6/22/2020 - Added Executive Summary and Subsequent Outbreaks sections
Revisions on 6/10/2020 based on updated CDC reporting. This affects the estimate of active cases by revising the average duration of cases with hospital stays downward from 30 days to 25 days. As a result, 76 U.S. counties shifted from the Epidemic trend to the Spreading trend; national-level trends did not change.
Methodology update on 6/2/2020: This sets the length of the tail of new cases to between 6 and a maximum of 14 days, rather than 21 days, as determined by the last 1/3 of cases. This was done to align the trends and their criteria with U.S. CDC guidance. As a result, areas transition into the Controlled trend sooner because they no longer carry the burden of new cases from 15-21 days earlier.
Correction on 6/1/2020
Discussion of our assertion of an abundance of caution in assigning trends in rural counties added 5/7/2020.
Revisions added on 4/30/2020 are highlighted.
Revisions added on 4/23/2020 are highlighted.

Executive Summary
COVID-19 Trends is a methodology for characterizing the current trend for places during the COVID-19 global pandemic. Each day we assign one of five trends (Emergent, Spreading, Epidemic, Controlled, or End Stage) to geographic areas based on the number of new cases, the number of active cases, the total population, and an algorithm (described below) that contextualizes the most recent fourteen days within the overall COVID-19 case history. Currently we analyze the countries of the world and U.S. counties.
The purpose is to give policymakers, citizens, and analysts a fact-based, data-driven sense of the direction each place is currently going. When a place has its initial cases, it is assigned Emergent; if that place controls the rate of new cases, it can move directly to Controlled, and even to End Stage, in a short time. However, if the reporting or the measures to curtail spread are not adequate and significant numbers of new cases continue, it is assigned Spreading, and where the spread is clearly uncontrolled, Epidemic.

We analyze the data reported by Johns Hopkins University to produce the trends, and we report the rates of cases, spikes of new cases, the number of days since the last reported case, and number of deaths. We also make adjustments to the assignments based on population so rural areas are not assigned trends based solely on case rates, which can be quite high relative to local populations.

Two key factors are not consistently known or available and should be taken into consideration with the assigned trend. The first is the amount of resources (e.g., hospital beds, physicians) currently available in each area. The second is the number of recoveries, which are often not tested or reported. On the latter, we provide a probable number of active cases based on CDC guidance for the typical duration of mild to severe cases.

Reasons for undertaking this work in March of 2020:
  1. The popular online maps and dashboards show counts of confirmed cases, deaths, and recoveries by country or administrative sub-region. Comparing the counts of one country to another can only provide a basis for comparison during the initial stages of the outbreak when counts were low and the number of local outbreaks in each country was low. By late March 2020, countries with small populations were being left out of the mainstream news because it was not easy to recognize they had high per capita rates of cases (Switzerland, Luxembourg, Iceland, etc.). Additionally, comparing countries that have had confirmed COVID-19 cases for high numbers of days to countries where the outbreak occurred recently is also a poor basis for comparison.
  2. The graphs of confirmed cases and daily increases in cases were fit into a standard size rectangle, though the Y-axis for one country had a maximum value of 50 and for another country 100,000, which potentially misled people interpreting the slope of the curve. This problem affected comparisons of large population countries with small population countries, and of countries with low numbers of cases with China, which had a large count of cases in the early part of the outbreak. Interpreting and comparing these graphs is work each reader must do based on their own experience and ability. Thus, we felt it would be a service to attempt to automate the thought process experts would use when visually analyzing these graphs, particularly the most recent tail of the graph, and to provide readers with a resulting synthesis that characterizes the state of the pandemic in that country, state, or county.
  3. Reliable data for confirmed recoveries, and therefore for active cases, is lacking. Merely subtracting deaths from total cases to arrive at this figure progressively loses accuracy after two weeks, because 81% of cases recover in 10 to 14 days after experiencing mild symptoms. Severe cases are 14% and last 15-30 days (based on an average of 11 days with symptoms at hospital admission, plus a median stay of 12 days, plus one week to include the full range of severely affected people who recover). Critical cases are 5% and last 31-56 days. Sources:
  • U.S. CDC. April 3, 2020 Interim Clinical Guidance for Management of Patients with Confirmed Coronavirus Disease (COVID-19). Accessed online.
  • Initial older guidance was also obtained online.
Additionally, many people who recover may not be tested, and many who are, may not be tracked due to privacy laws.
Thus, the formula used to compute an estimate of active cases is:

Active Cases = 100% of new cases in past 14 days + 19% from past 15-25 days + 5% from past 26-49 days - total deaths.
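
The formula above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the input is a list of daily new-case counts ordered from oldest to newest, that "past 14 days" means the 14 most recent entries, and that missing older days count as zero. The function name `estimate_active_cases` is hypothetical.

```python
from typing import Sequence


def estimate_active_cases(daily_new_cases: Sequence[float], total_deaths: float) -> float:
    """Estimate active cases from daily new-case counts (oldest first).

    Weights follow the formula in the text: 100% of days 1-14,
    19% of days 15-25, 5% of days 26-49, minus total deaths.
    """
    newest_first = list(reversed(daily_new_cases))
    recent = sum(newest_first[0:14])     # past 14 days
    severe = sum(newest_first[14:25])    # past 15-25 days
    critical = sum(newest_first[25:49])  # past 26-49 days
    return recent + 0.19 * severe + 0.05 * critical - total_deaths


# Example: 10 new cases per day for 49 days and 5 deaths in total.
estimate = estimate_active_cases([10] * 49, total_deaths=5)
```

The sketch does not clamp the result at zero, so a place with few recent cases and many deaths can produce a negative estimate; how the published figures handle that case is not stated here.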