100+ datasets found
  1. Novel Covid-19 Dataset

    • kaggle.com
    Updated Sep 18, 2025
    Cite
    GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    GHOST5612
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Context:

    From the World Health Organization: on 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

    Daily-level information on the affected people can therefore give some interesting insights when it is made available to the broader data science community.

    Johns Hopkins University has made an excellent dashboard using the affected cases data. The data is extracted from the associated Google Sheets and made available here.

    Edited:

    The data is now available as CSV files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for the Terms of Use details. It is uploaded here for use in Kaggle kernels and to gather insights from the broader data science community.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has daily-level information on the number of affected cases, deaths, and recoveries from the 2019 novel coronavirus. Please note that this is time-series data, so the number of cases on any given day is the cumulative number.

    The data is available from 22 January 2020.
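The counts are cumulative, so the number of new cases on a given day has to be recovered by differencing consecutive days. Below is a minimal Python sketch. It assumes the values are already sorted by ObservationDate and belong to a single Province/State and Country/Region pair. The sample numbers are invented for illustration.

```python
def daily_new_counts(cumulative):
    """Convert a cumulative series (e.g. Confirmed) into per-day increments.

    The first value is treated as that day's increment. Negative differences,
    which occur when a source revises its totals downward, are kept so that
    the correction stays visible instead of being silently clipped.
    """
    daily = []
    previous = 0
    for value in cumulative:
        daily.append(value - previous)
        previous = value
    return daily


# Hypothetical cumulative Confirmed values for one location on six consecutive days
confirmed = [1, 1, 4, 10, 10, 25]
print(daily_new_counts(confirmed))  # [1, 0, 3, 6, 0, 15]
```

Whether to keep, clip, or redistribute negative increments is an analysis choice. This sketch keeps them.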


    Dataset Description

    This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

    Files and Columns

    1. covid_19_data.csv (Main File)

    This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

    • Sno – Serial number of the record
    • ObservationDate – Date of the observation (MM/DD/YYYY)
    • Province/State – Province or state of the observation (may be missing for some entries)
    • Country/Region – Country of the observation
    • Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)
    • Confirmed – Cumulative number of confirmed cases on that date
    • Deaths – Cumulative number of deaths on that date
    • Recovered – Cumulative number of recoveries on that date
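The column list above implies two routine tasks: parsing the dates and rolling Province/State rows up to country level. The following standard-library sketch does both. The candidate "Last Update" formats are assumptions, because the description only says that column is not standardized. The sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime

# Candidate layouts for "Last Update". These are assumptions: the description
# only says the column is not standardized, not which formats occur.
LAST_UPDATE_FORMATS = (
    "%m/%d/%Y %H:%M",
    "%m/%d/%y %H:%M",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
)


def parse_last_update(text):
    """Return a naive datetime (to be read as UTC) for the first matching format, else None."""
    for fmt in LAST_UPDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def totals_by_country(csv_text, observation_date):
    """Sum Confirmed, Deaths and Recovered per Country/Region for one date.

    ObservationDate is parsed with the documented MM/DD/YYYY layout. Counts
    are read through float() in case they are stored as "12.0".
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        if day != observation_date:
            continue
        country = row["Country/Region"].strip()
        for i, column in enumerate(("Confirmed", "Deaths", "Recovered")):
            totals[country][i] += int(float(row[column] or 0))
    return dict(totals)


# Hypothetical rows in the documented column layout (values are invented)
sample = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,03/01/2020,Hubei,China,3/1/2020 10:13,100.0,5.0,20.0
2,03/01/2020,Guangdong,China,2020-03-01 14:13:18,10.0,1.0,2.0
3,03/01/2020,,Italy,2020-03-01T23:23:02,7.0,0.0,1.0
4,03/02/2020,,Italy,2020-03-02T20:23:16,9.0,0.0,1.0
"""

print(totals_by_country(sample, date(2020, 3, 1)))
# {'China': [110, 6, 22], 'Italy': [7, 0, 1]}
print(parse_last_update("3/1/2020 10:13"))  # 2020-03-01 10:13:00
```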

    2. 2019_ncov_data.csv (Legacy File)

    This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

    3. COVID_open_line_list_data.csv

    This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

    4. COVID19_line_list_data.csv

    Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

    ✅ Use covid_19_data.csv for up-to-date aggregated global trends.

    ✅ Use the line list datasets for detailed, individual-level case analysis.

    Country level datasets:

    If you are interested in country-level data, please refer to the following Kaggle datasets:

    India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

    South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

    Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

    Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

    USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

    Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

    Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

    Acknowledgements:

    Johns Hopkins University for making the data available for educational and academic research purposes

    MoBS lab - https://www.mobs-lab.org/2019ncov.html

    World Health Organization (WHO): https://www.who.int/

    DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

    BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

    National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

    China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

    Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

    Macau Government: https://www.ssm.gov.mo/portal/

    Taiwan CDC: https://sites.google....

  2. Country data on COVID-19

    • kaggle.com
    zip
    Updated Aug 6, 2023
    Cite
    Carla Oliveira (2023). Country data on COVID-19 [Dataset]. https://www.kaggle.com/datasets/carlaoliveira/country-data-on-covid19
    Available download formats: zip (8,634,707 bytes)
    Dataset updated
    Aug 6, 2023
    Authors
    Carla Oliveira
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The data is in CSV format and includes all historical data on the pandemic up to 03/01/2023, with one row per country and date.

    During pre-processing, the data were checked for missing values. For example, missing values of new_cases occurred where the total number of cases had not changed. Most of the missing data related to vaccination, for which no data existed at the beginning of the pandemic. Missing values ("NaN") were therefore replaced by zero.

    Some of these features were combined to generate new features. This process of creating new features from existing data, aimed at improving the data before applying machine learning algorithms, is called feature engineering. The new features created were:

    • Vaccination rate (vaccination_ratio): total number of people who received at least one dose of vaccine divided by the population at risk. This dose count was chosen because it has a higher correlation with new deaths.
    • Prevalence: existing cases of the disease at a given time divided by the population at risk of having the disease. Formula: COVID-19 cases ÷ population at risk × 100. Example: 168,331 ÷ 210,000,000 × 100 ≈ 0.08.
    • Incidence: new cases of the disease in a defined population during a specific period (one day, for example) divided by the population at risk. Formula: new COVID-19 cases in one day ÷ (population − total cases) × 100. Example: 5,632 ÷ 209,837,301 × 100 ≈ 0.0027.
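The feature-engineering formulas above translate directly into small functions. This is a sketch, not the author's code, and it assumes plain Python numbers rather than a dataframe. One point is an inference rather than something the description states. The incidence example's denominator, 209,837,301, equals 210,000,000 − 162,699, and 162,699 = 168,331 − 5,632. So the "total cases" subtracted appears to be the cumulative count before that day's new cases.

```python
import math


def fill_missing(value):
    """Replace a missing value (None or NaN) with zero, as described above."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 0.0
    return value


def vaccination_ratio(people_vaccinated, population_at_risk):
    """People with at least one vaccine dose divided by the population at risk."""
    return people_vaccinated / population_at_risk


def prevalence(total_cases, population_at_risk):
    """Existing cases divided by the population at risk, times 100."""
    return total_cases / population_at_risk * 100


def incidence(new_cases, population, total_cases):
    """New cases in one day divided by (population - total cases), times 100."""
    return new_cases / (population - total_cases) * 100


print(round(prevalence(168_331, 210_000_000), 2))  # 0.08
# Assumed: 162,699 is the prior cumulative total, so the denominator is 209,837,301
print(round(incidence(5_632, 210_000_000, 162_699), 4))  # 0.0027
```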

  3. Coronavirus (Covid-19) Data in the United States

    • nytimes.com
    • openicpsr.org
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
    Dataset provided by
    New York Times
    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

  4. CORONAVIRUS DEATHS by Country Dataset

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Mar 4, 2020
    Cite
    TRADING ECONOMICS (2020). CORONAVIRUS DEATHS by Country Dataset [Dataset]. https://tradingeconomics.com/country-list/coronavirus-deaths
    Available download formats: csv, excel, xml, json
    Dataset updated
    Mar 4, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    World
    Description

    This dataset provides values for CORONAVIRUS DEATHS reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.

  5. Data for: COVID-19 Dataset: Worldwide Spread Log Including Countries First Case And First Death

    • data.mendeley.com
    Updated Jul 20, 2020
    Cite
    Hasmot Ali (2020). Data for: COVID-19 Dataset: Worldwide Spread Log Including Countries First Case And First Death [Dataset]. http://doi.org/10.17632/vw427wzzkk.5
    Dataset updated
    Jul 20, 2020
    Authors
    Hasmot Ali
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contains informative data related to the COVID-19 pandemic, specifically the First Case and First Death information for every country. The datasets mainly focus on two fields. The first, First Case, consists of the date of the first case(s), the number of confirmed case(s) on the first day, the age of the patient(s) of the first case, and the last visited country. The second, First Death, consists of the date of the first death and the age of the patient who died first. Both are given for every country, together with its continent. The datasets also contain a binary matrix of the spread chain among different countries and regions.

    *This is not a country but a ship. The name of the cruise ship was not given by the government.
    "N+": the age is not specified but is greater than N.
    "No Trace": some data was not found.
    "Unspecified": not available from the authority.
    "N/A": in the "Last Visited Country(s) of Confirmed Case(s)" column, "N/A" indicates that the confirmed case(s) of those countries had no travel history in the recent past; in the "Age of First Death(s)" column, "N/A" indicates that those countries had no death cases as of May 16, 2020.
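Because the age columns mix numbers with the special markers listed above, they need explicit parsing before any numeric analysis. The sketch below assumes each cell holds a single value, and that the markers appear in the files spelled exactly as in these notes (case-insensitive). Cells listing several patients' ages are not handled and come back as "unparsed".

```python
import re
from typing import NamedTuple, Optional

# Markers listed in the dataset notes; the exact spelling in the files is assumed.
MISSING_MARKERS = {
    "no trace": "not_found",         # some data was not found
    "unspecified": "not_available",  # not available from the authority
    "n/a": "not_applicable",         # e.g. no death reported as of May 16, 2020
}


class Age(NamedTuple):
    value: Optional[int]  # exact age, or N for an "N+" entry
    kind: str             # "exact", "greater_than", "unparsed", or a missing-marker label


def parse_age(raw: str) -> Age:
    """Interpret one age cell from the First Case / First Death columns."""
    text = raw.strip()
    if text.lower() in MISSING_MARKERS:
        return Age(None, MISSING_MARKERS[text.lower()])
    match = re.fullmatch(r"(\d+)\+", text)
    if match:
        return Age(int(match.group(1)), "greater_than")
    if text.isdecimal():
        return Age(int(text), "exact")
    return Age(None, "unparsed")


for cell in ["45", "60+", "No Trace", "N/A"]:
    print(cell, "->", parse_age(cell))
```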

  6. Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU)

    • github.com
    • systems.jhu.edu
    Cite
    Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE), Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU) [Dataset]. https://github.com/CSSEGISandData/COVID-19
    Dataset provided by
    Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE)
    Area covered
    Global
    Description

    2019 Novel Coronavirus COVID-19 (2019-nCoV) Visual Dashboard and Map:
    https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

    • Confirmed Cases by Country/Region/Sovereignty
    • Confirmed Cases by Province/State/Dependency
    • Deaths
    • Recovered

    Downloadable data:
    https://github.com/CSSEGISandData/COVID-19

    Additional Information about the Visual Dashboard:
    https://systems.jhu.edu/research/public-health/ncov

  7. COVID-19 Coronavirus data - weekly (from 17 December 2020)

    • data.europa.eu
    csv, excel xlsx, html +3
    Updated Dec 17, 2020
    Cite
    European Centre for Disease Prevention and Control (2020). COVID-19 Coronavirus data - weekly (from 17 December 2020) [Dataset]. https://data.europa.eu/data/datasets/covid-19-coronavirus-data-weekly-from-17-december-2020?locale=en
    Available download formats: html, csv, json, unknown, xml, excel xlsx
    Dataset updated
    Dec 17, 2020
    Dataset authored and provided by
    European Centre for Disease Prevention and Control
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains a weekly situation update on COVID-19, the epidemiological curve and the global geographical distribution (EU/EEA and the UK, worldwide).

    Since the beginning of the coronavirus pandemic, ECDC’s Epidemic Intelligence team has collected the number of COVID-19 cases and deaths, based on reports from health authorities worldwide. This comprehensive and systematic process was carried out on a daily basis until 14/12/2020. See the discontinued daily dataset: COVID-19 Coronavirus data - daily. ECDC’s decision to discontinue daily data collection is based on the fact that the daily number of cases reported or published by countries is frequently subject to retrospective corrections, delays in reporting and/or clustered reporting of data for several days. Therefore, the daily number of cases may not reflect the true number of cases at EU/EEA level on a given day of reporting. Consequently, day-to-day variations in the number of cases do not constitute a valid basis for policy decisions.
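The argument for weekly data is easier to see with a small example. Clustered or delayed reporting distorts individual days but largely cancels out within a week. The sketch below groups daily counts into ISO weeks (Monday to Sunday) via `date.isocalendar()`. The ISO-week choice is an assumption: the text does not state ECDC's exact week definition. The daily figures are invented.

```python
from datetime import date, timedelta


def weekly_totals(daily_counts):
    """Sum daily new-case counts into ISO (year, week) buckets.

    daily_counts maps datetime.date -> number of cases reported that day.
    ISO weeks run Monday to Sunday; this is an assumption, not ECDC's
    documented week definition.
    """
    weeks = {}
    for day in sorted(daily_counts):
        iso_year, iso_week, _ = day.isocalendar()
        key = (iso_year, iso_week)
        weeks[key] = weeks.get(key, 0) + daily_counts[day]
    return weeks


# Hypothetical reports for two weeks starting Monday 7 December 2020. The first
# week shows clustered reporting (one large batch), the second a steadier pattern.
reported = [0, 0, 300, 0, 0, 0, 0, 50, 60, 55, 40, 45, 30, 20]
daily = {date(2020, 12, 7) + timedelta(days=i): n for i, n in enumerate(reported)}
print(weekly_totals(daily))  # {(2020, 50): 300, (2020, 51): 300}
```

Both weeks total 300 even though their daily series look very different. This is the kind of reporting artifact that weekly aggregation smooths out.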

    ECDC continues to monitor the situation. Every week between Monday and Wednesday, a team of epidemiologists screens up to 500 relevant sources to collect the latest figures for publication on Thursday. The data screening is followed by ECDC’s standard epidemic intelligence process for which every single data entry is validated and documented in an ECDC database. An extract of this database, complete with up-to-date figures and data visualisations, is then shared on the ECDC website, ensuring a maximum level of transparency.

    ECDC receives regular updates from EU/EEA countries through the Early Warning and Response System (EWRS), The European Surveillance System (TESSy), the World Health Organization (WHO) and email exchanges with other international stakeholders. This information is complemented by screening up to 500 sources every day to collect COVID-19 figures from 196 countries. This includes websites of ministries of health (43% of the total number of sources), websites of public health institutes (9%), websites from other national authorities (ministries of social services and welfare, governments, prime minister cabinets, cabinets of ministries, websites on health statistics and official response teams) (6%), WHO websites and WHO situation reports (2%), and official dashboards and interactive maps from national and international institutions (10%). In addition, ECDC screens social media accounts maintained by national authorities, for example Twitter, Facebook, YouTube or Telegram accounts run by ministries of health (28%) and other official sources (e.g. official media outlets) (2%). Several media and social media sources are screened to gather additional information, which can be validated with the official sources previously mentioned. Only cases and deaths reported by the national and regional competent authorities from the countries and territories listed are aggregated in our database.

    Disclaimer: National updates are published at different times and in different time zones. This, and the time ECDC needs to process these data, might lead to discrepancies between the national numbers and the numbers published by ECDC. Users are advised to use all data with caution and awareness of their limitations. Data are subject to retrospective corrections; corrected datasets are released as soon as processing of updated national data has been completed.

    If you reuse or enrich this dataset, please share it with us.

  8. World Coronavirus COVID-19 Cases

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Mar 9, 2020
    Cite
    TRADING ECONOMICS (2020). World Coronavirus COVID-19 Cases [Dataset]. https://tradingeconomics.com/world/coronavirus-cases
    Available download formats: csv, excel, xml, json
    Dataset updated
    Mar 9, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 4, 2020 - May 17, 2023
    Area covered
    World
    Description

    The World Health Organization reported 766,440,796 coronavirus cases since the epidemic began. In addition, countries reported 6,932,591 coronavirus deaths. This dataset provides World Coronavirus Cases: actual values, historical data, forecast, chart, statistics, economic calendar and news.

  9. COVID-19 First Case Date By Country (Coronavirus)

    • kaggle.com
    zip
    Updated May 20, 2020
    Cite
    Joseph Glynn (2020). COVID-19 First Case Date By Country (Coronavirus) [Dataset]. https://www.kaggle.com/datasets/josephglynn/covid19-first-case-date-by-country-coronavirus/code
    Available download formats: zip (3,258 bytes)
    Dataset updated
    May 20, 2020
    Authors
    Joseph Glynn
    Description

    Context

    This data was collected as part of a university research paper in which COVID-19 cases were analysed using a cross-sectional regression model as at 17th May 2020. To better understand the growth of COVID-19 cases at the country level, I decided to create a dataset containing key dates in the progression of the virus globally.

    Content

    210 rows, 6 columns.

    This dataset contains data relating to COVID-19 cases for 210 countries globally. Data was collected using the most recent and reliable information as at 17th May 2020. The majority of data was collected from Worldometer. https://www.worldometers.info/coronavirus/#countries

    This dataset contains the dates of the 1st coronavirus case, the 100th coronavirus case, and the date on which 50 coronavirus cases per 1 million people were reached, for 210 countries. It also provides the number of days between the 1st case and the 100th case, and between the 1st case and reaching 50 cases per 1 million people.

    Data prior to 15th February 2020 was not easily accessible at the country level from Worldometer. Therefore, any dates prior to 15th February 2020 were sourced not from Worldometer but from reputable government and local media sources.

    Blanks (null values) indicate that the country in question has not reached either 50 coronavirus cases per 1 million people or 100 coronavirus cases. These were left blank.
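The derived columns reduce to two small calculations: a day count between two dates, and the case count a country must reach for the 50-per-million threshold. In the sketch below, blank cells are represented as `None` so the day count stays blank, matching the convention above. The dates and population are hypothetical.

```python
from datetime import date
from typing import Optional


def days_between(start: Optional[date], end: Optional[date]) -> Optional[int]:
    """Days from start to end, or None when either date is blank (threshold not reached)."""
    if start is None or end is None:
        return None
    return (end - start).days


def cases_needed_for_rate(population: int, per_million: float = 50) -> float:
    """Number of cases at which a country reaches `per_million` cases per 1 million people."""
    return per_million * population / 1_000_000


# Hypothetical country (dates and population are invented for illustration)
first_case = date(2020, 2, 26)
hundredth_case = date(2020, 3, 10)
print(days_between(first_case, hundredth_case))  # 13
print(cases_needed_for_rate(5_000_000))  # 250.0
print(days_between(first_case, None))  # None
```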

    Acknowledgements

    I would like to acknowledge Worldometer for providing the vast majority of the data in this file. Worldometer is a website that provides real time statistics on topics such as coronavirus cases. Its sources include government official reports as well as trusted local media sources all of which are referenced on their website.

    Inspiration

    Hopefully this data can be used to better understand the growth of COVID-19 cases globally.

  10. [Archived] COVID-19 cases in Pacific Island Countries and Territories

    • pacificdata.org
    • pacific-data.sprep.org
    csv
    Updated May 22, 2025
    Cite
    SPC (2025). [Archived] COVID-19 cases in Pacific Island Countries and Territories [Dataset]. https://pacificdata.org/data/dataset/archived-covid-19-cases-in-pacific-island-countries-and-territories-df-covid
    Available download formats: csv
    Dataset updated
    May 22, 2025
    Dataset provided by
    SPC
    Time period covered
    Jan 1, 2020 - May 31, 2024
    Description

    Disclaimer: As of January 2025, SPC will no longer provide updated information on COVID-19 cases and deaths. The information presented on this page is for reference only. For current epidemic and emerging disease alerts in the Pacific region, please visit: https://www.spc.int/epidemics/

    Statistics from SPC's Public Health Division (PHD) on the number of cases of COVID-19 and the number of deaths attributed to COVID-19 in Pacific Island Countries and Territories.

    Find more Pacific data on PDH.stat.

  11. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Independent University, Bangladesh
    University of Memphis, USA
    Silicon Orchard Lab, Bangladesh
    Authors
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    Introduction

    There are several works based on Natural Language Processing of newspaper reports. Rameshbhai et al. [1] mined opinions from headlines using Stanford NLP and SVM, comparing several algorithms on a small and a large dataset. Rubin et al. [2] created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, little NLP research has focused on COVID-19. Most applications are classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting among people due to the virus. Jelodar et al. [9] did similar work using an LSTM network to classify sentiments from online discussion forums. To the best of our knowledge, the NNK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

    2 Dataset Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA), drawn from The Washington Post (USA) and StarTribune (USA). We named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports on the issue from Bangladesh, drawn from The Daily Star (BD) and Prothom Alo (BD), and named this collection “Covid-News-BD-NNK”. All these newspapers are among the top providers and the most read in their respective countries. The collection was done manually by 10 human data collectors of age group 23- with university degrees. Manual collection was preferred over automation to ensure the news was highly relevant to the subject: the newspapers' websites had dynamic content with advertisements in no particular order, so online scrapers had a high chance of collecting inaccurate news reports. One of the challenges in collecting the data was the subscription requirement: each newspaper required $1 per subscription. The guidelines given to the human data collectors for selecting news reports were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above-mentioned newspapers.

    To collect these data, we used a Google Form for both the USA and BD collections. Two human editors went through each entry to check for spam or troll entries.
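The keyword thresholds in the collection guidelines can be approximated automatically, for example to pre-screen candidate articles. This is a hypothetical sketch, not the authors' process, since the collectors applied the rules by hand. The keyword list is invented, because the guidelines allow "indirectly related" words that no fixed list captures. The sketch also counts keyword occurrences rather than distinct keywords, which is one possible reading of "5 or more keywords".

```python
import re

# Hypothetical keyword list: the guidelines allow "directly or indirectly
# related" words, which human collectors judged; no list was published.
COVID_KEYWORDS = {
    "covid", "covid-19", "coronavirus", "pandemic", "virus",
    "outbreak", "quarantine", "lockdown", "masks", "infected",
}


def keyword_hits(text):
    """Count keyword occurrences (not distinct keywords) in lowercase word tokens."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    return sum(1 for token in tokens if token in COVID_KEYWORDS)


def meets_guidelines(headline, body, seen_headlines):
    """Apply the >=1 headline keyword, >=5 body keyword and no-duplicate rules."""
    key = headline.strip().lower()
    if key in seen_headlines:
        return False
    if keyword_hits(headline) < 1 or keyword_hits(body) < 5:
        return False
    seen_headlines.add(key)
    return True


headline = "Coronavirus cases rise in Minnesota"
body = ("The virus outbreak continues. Officials urge masks and "
        "quarantine as the pandemic spreads.")
seen = set()
print(meets_guidelines(headline, body, seen))  # True
print(meets_guidelines(headline, body, seen))  # False: duplicate headline
```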

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. Although this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.

    The primary data statistics of the two datasets are shown in Tables 1 and 2.
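The first three pre-processing steps listed above can be sketched with regular expressions from the standard library. The URL pattern, the definition of "non-English alphanumeric" as anything outside A–Z, a–z, 0–9 and whitespace, and the short stop-word list are all assumptions for illustration. Lemmatization is left out, because it needs a lexicon-based tool such as NLTK or spaCy.

```python
import re

# Tiny illustrative stop-word list (an assumption); real pipelines usually
# use a fuller list, such as the one shipped with NLTK.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "in", "on", "to", "is",
    "are", "was", "were", "for", "with", "as", "at", "by",
}

URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")
NON_ENGLISH_ALNUM = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text):
    """Remove hyperlinks, non-English alphanumeric characters and stop words."""
    text = URL_PATTERN.sub(" ", text)        # 1. remove hyperlinks
    text = NON_ENGLISH_ALNUM.sub(" ", text)  # 2. keep only English letters, digits, whitespace
    tokens = text.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words
    # 4. Lemmatization is omitted: it needs a lexicon-based tool
    #    (for example NLTK's WordNetLemmatizer or spaCy).
    return tokens


print(preprocess("Cases rose in Minnesota (see https://example.com/report), officials said."))
# ['cases', 'rose', 'minnesota', 'see', 'officials', 'said']
```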

    Table 1: Covid-News-USA-NNK data statistics

    No. of words per headline: 7 to 20
    No. of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No. of words per headline: 10 to 20
    No. of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository, under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON files using a Python script. We also provide a Python script file for essential operations. We welcome outside collaboration to enrich the dataset.
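Regenerating JSON from the CSV files, as described above, takes only the standard `csv` and `json` modules. This is a minimal sketch, not the authors' script. The column names in the sample are hypothetical, because the repository schema is not described here.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Turn CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical columns; the repository's real schema is not described here.
sample = (
    "newspaper,date,headline\n"
    'StarTribune,2020-04-02,"Minnesota schools close, officials say"\n'
)
print(csv_to_json(sample))
```

For real files, open the CSV with `open(path, newline="", encoding="utf-8")` and write the JSON with the same encoding. `ensure_ascii=False` keeps non-ASCII text, such as Bengali, readable in the output.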

    3 Literature Review

    Natural Language Processing (NLP) is the area of computer science that deals with text (also known as categorical) data. It uses diverse methods, such as one-hot encoding and word embeddings, that transform text into a machine-readable form that can be fed to machine learning and deep learning algorithms.

    Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and machine translation as used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed specifically for NLP tasks. The two most prominent are BERT [14], a bidirectional Transformer encoder that performs strongly on classification and other language-understanding tasks, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. These models are usually used pre-trained, since training them carries a huge computational cost. Information extraction is the general concept of retrieving information from a dataset. Information extraction from an image could mean retrieving vital feature spaces or targeted portions of the image; information extraction from speech could mean retrieving information about names, places, etc. [16]. Information extraction from text could mean identifying named entities, locations, or essential data. Topic modeling is a sub-task of NLP and also a form of information extraction. It is an unsupervised learning method that clusters words and phrases from the same context into groups, giving a brief idea of what a set of texts is about. One commonly used topic model is Latent Dirichlet Allocation (LDA) [17].

    Keyword extraction is a form of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that represents words as nodes in a graph, calculates the weight of each word, and picks the words with the highest weights.

    Word clouds are a useful visualization technique for understanding the overall 'talk of the topic'. The clustered words give a quick understanding of the content.
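TextRank links words that co-occur within a small window and then scores each word with PageRank. A minimal sketch follows. It assumes an undirected, unweighted co-occurrence graph and the conventional damping factor of 0.85. The window size, iteration count and sample tokens are illustrative choices, not parameters from the authors' work.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words with a TextRank-style PageRank over a word co-occurrence graph.

    tokens: pre-processed words in reading order (stop words already removed).
    Two distinct words are linked when they occur within `window` positions of
    each other; window=2 links only adjacent words.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Power iteration of PageRank on the undirected graph
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[other] / len(neighbors[other]) for other in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically for a stable result
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


tokens = ["virus", "spread", "china", "virus", "economy", "virus", "masks"]
print(textrank_keywords(tokens, top_k=3))  # ['virus', 'china', 'spread']
```

"virus" ranks first because it is linked to the most distinct words. Words linked to well-connected neighbors, such as "china" and "spread", outrank words with a single link.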

    4 Our Experiments and Result Analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can note the following:

    In February, both newspapers talked about China and the source of the outbreak.

    StarTribune emphasized Minnesota as the state of most concern. In April, this concern seemed to grow.

    Both newspapers talked about the virus's impact on the economy, e.g., banks, elections, administrations, and markets.

    Washington Post discussed global issues more than StarTribune.

    In February, StarTribune mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread in the United States was highlighted more from March through May, displaying the critical impact caused by the virus.

    We used a script to extract all numbers associated with keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’ and ’Diagnosed’ from the news reports, and built a series of case counts for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows a positive response against the virus.

    We used VADER sentiment analysis to extract the sentiment of the headlines and the body text. On average, the sentiment scores ranged from -0.5 to -0.9. The VADER sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us separate the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and its serious impact. Moreover, sentiment analysis can also show how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important, relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
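    The keyword-based number extraction described above could look roughly like the following sketch. The authors' script is not given, so several choices here are assumptions: the regular expression (a number followed by one of the keywords within up to three words), the keyword spellings, and summing every matched count per month.

```python
import re

# Keyword list taken from the text above; matching is case-insensitive.
KEYWORDS = ("deaths", "infected", "died", "infections", "quarantined", "lock-down", "diagnosed")

# A number (optionally with thousands separators) followed by a keyword within 0-3 words.
PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+)\s+(?:\w+\s+){0,3}?("
    + "|".join(map(re.escape, KEYWORDS))
    + r")\b",
    re.IGNORECASE,
)

def extract_counts(text):
    """Return (keyword, number) pairs found in `text`, e.g. ('deaths', 1200)."""
    return [(kw.lower(), int(num.replace(",", ""))) for num, kw in PATTERN.findall(text)]

def monthly_totals(articles):
    """Sum every extracted count per month; `articles` is an iterable of (month, text)."""
    totals = {}
    for month, text in articles:
        totals[month] = totals.get(month, 0) + sum(n for _, n in extract_counts(text))
    return totals

sample = "Officials reported 1,200 new deaths and 35 people were quarantined on Friday."
print(extract_counts(sample))  # [('deaths', 1200), ('quarantined', 35)]
```

    Summing different keywords into one monthly figure is crude. A closer reproduction would keep separate series per keyword.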
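    The text does not say how the graph for PageRank was built. This sketch assumes a common TextRank-style approach. Words become nodes, and words that appear next to each other are linked. PageRank scores are then computed by power iteration with the usual damping factor of 0.85. The window size, stopword list and sample text are illustrative.

```python
import re
from collections import defaultdict

# Small illustrative stopword list (an assumption).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is", "was", "with", "as", "by"}

def pagerank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank over an undirected word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS and len(w) > 2]
    neighbors = defaultdict(set)
    # Link each word to the words that follow it within `window` tokens.
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    nodes = list(neighbors)
    if not nodes:
        return []
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    # Power iteration: each node shares its current score equally among its neighbors.
    for _ in range(iterations):
        score = {
            v: (1 - damping) / n + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[v])
            for v in nodes
        }
    # Highest score first; ties broken alphabetically for deterministic output.
    return sorted(nodes, key=lambda v: (-score[v], v))[:top_k]

text = "Economy and jobs: economy, market, economy crisis, economy election."
print(pagerank_keywords(text))  # ['economy', 'crisis', 'election', 'jobs', 'market']
```

    Here "economy" sits at the center of a star-shaped graph, so it collects the most score. Multi-word keywords such as ’Stock market’ would need an extra step that merges adjacent top-ranked words.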

  12. Spatiotemporal data for 2019-Novel Coronavirus Covid-19 Cases and deaths

    • data.amerigeoss.org
    csv, pdf, txt
    Updated Jan 4, 2022
    UN Humanitarian Data Exchange (2022). Spatiotemporal data for 2019-Novel Coronavirus Covid-19 Cases and deaths [Dataset]. https://data.amerigeoss.org/it/dataset/2019-novel-coronavirus-cases
    Available download formats: txt (23645), csv (4916), pdf (15032), txt (7422), csv (795112664)
    Dataset updated
    Jan 4, 2022
    Dataset provided by
    UN Humanitarian Data Exchange
    Description

    Data Overview

    This repository contains spatiotemporal data from many official sources for the 2019-Novel Coronavirus outbreak beginning in 2019 in Hubei, China ("nCoV_2019").

    You may not use this data for commercial purposes. If there is a need for commercial use of the data, please contact Metabiota at info@metabiota.com to obtain a commercial use license.

    The incidence data are in a CSV file format. One row in an incidence file contains a piece of epidemiological data extracted from the specified source.

    The file contains data from multiple sources at multiple spatial resolutions in cumulative and non-cumulative formats by confirmation status. To select a single time series of case or death data, filter the incidence dataset by source, spatial resolution, location, confirmation status, and cumulative flag.
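    The filtering step just described can be sketched as follows: it keeps a single time series from a toy incidence file. SOURCE, SPATIAL_RESOLUTION, CONFIRMATION_STATUS and OUTCOME are named in this description. The location, cumulative-flag, date and value column names (LOCATION_NAME, CUMULATIVE_FLAG, DATE, VALUE) are hypothetical placeholders, and the rows are made up.

```python
import csv
import io

# Toy incidence file with made-up values and partly hypothetical column names.
SAMPLE = """SOURCE,SPATIAL_RESOLUTION,LOCATION_NAME,CONFIRMATION_STATUS,OUTCOME,CUMULATIVE_FLAG,DATE,VALUE
Source 1,AL0,Country A,TOTAL,CASE,TRUE,2020-01-26,150
Source 1,AL0,Country A,TOTAL,CASE,TRUE,2020-01-25,100
Source 1,AL0,Country A,TOTAL,CASE,FALSE,2020-01-26,50
Source 1,AL1,Province X,TOTAL,CASE,TRUE,2020-01-26,90
Source 1,AL0,Country A,TOTAL,DEATH,TRUE,2020-01-26,4
Source 2,AL0,Country A,CONFIRMED,CASE,TRUE,2020-01-26,140
"""

def select_series(rows, source, resolution, location, status, outcome, cumulative):
    """Apply every filter from the description; return (date, value) pairs sorted by date."""
    flag = "TRUE" if cumulative else "FALSE"
    return sorted(
        (row["DATE"], int(row["VALUE"]))
        for row in rows
        if row["SOURCE"] == source
        and row["SPATIAL_RESOLUTION"] == resolution
        and row["LOCATION_NAME"] == location
        and row["CONFIRMATION_STATUS"] == status
        and row["OUTCOME"] == outcome
        and row["CUMULATIVE_FLAG"] == flag
    )

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
series = select_series(rows, "Source 1", "AL0", "Country A", "TOTAL", "CASE", cumulative=True)
print(series)  # [('2020-01-25', 100), ('2020-01-26', 150)]
```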

    Data are collected, structured, and validated by Metabiota’s digital surveillance experts. The data structuring process is designed to produce the most reliable estimates of reported cases and deaths over space and time. The data are cleaned and provided in a uniform format such that information can be compared across multiple sources. Data are collected at the time of publication in the highest geographic and temporal resolutions available in the original report.

    This repository is intended to provide a single access point for data from a wide range of data sources. Data will be updated periodically with the latest epidemiological data. Metabiota maintains a database of epidemiological information for over two thousand high-priority infectious disease events. Please contact us (info@metabiota.com) if you are interested in licensing the complete dataset.

    Cumulative vs. Non-Cumulative Incidence

    Reporting sources provide either cumulative incidence, non-cumulative incidence, or both. If the source only provides a non-cumulative incidence value, the cumulative values are inferred using prior reports from the same source. Use the CUMULATIVE FLAG variable to subset the data to cumulative (TRUE) or non-cumulative (FALSE) values.
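    The description does not spell out how cumulative values are inferred from prior reports. This sketch assumes the simplest reading: a running sum over one source's reports in date order, with the series starting at the first report. The inverse direction, recovering per-report increments from a cumulative series, is included for comparison.

```python
from itertools import accumulate

def cumulative_from_noncumulative(values):
    """Running total of per-report (non-cumulative) values from one source, in report order."""
    return list(accumulate(values))

def noncumulative_from_cumulative(values):
    """Per-report increments; the first report's cumulative value is kept as its increment."""
    if not values:
        return []
    return [values[0]] + [curr - prev for prev, curr in zip(values, values[1:])]

reports = [5, 3, 0, 7]  # made-up non-cumulative values
print(cumulative_from_noncumulative(reports))        # [5, 8, 8, 15]
print(noncumulative_from_cumulative([5, 8, 8, 15]))  # [5, 3, 0, 7]
```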

    Case Confirmation Status

    The incidence datasets include the confirmation status of cases and deaths when this information is provided by the reporting source. Subset the data by the CONFIRMATION_STATUS variable to either TOTAL, CONFIRMED, SUSPECTED, or PROBABLE to obtain the data of your choice.

    Total incidence values include confirmed, suspected, and probable incidence values. If a source only provides suspected, probable, or confirmed incidence, the total incidence is inferred to be the sum of the provided values. If the report does not specify confirmation status, the value is included in the "total" confirmation status value.
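    The rule for TOTAL values amounts to a simple sum. The minimal sketch below handles one report, keyed by CONFIRMATION_STATUS. The dictionary layout is an assumption. Following the description, values reported without a status are assumed to already sit under TOTAL.

```python
def infer_total(values_by_status):
    """TOTAL incidence for one report, given values keyed by CONFIRMATION_STATUS."""
    if "TOTAL" in values_by_status:
        # Reported (or unspecified-status) totals are used as-is.
        return values_by_status["TOTAL"]
    # Otherwise the total is the sum of whichever statuses the source provided.
    return sum(values_by_status.get(s, 0) for s in ("CONFIRMED", "SUSPECTED", "PROBABLE"))

print(infer_total({"CONFIRMED": 40, "SUSPECTED": 10}))  # 50
print(infer_total({"TOTAL": 70, "CONFIRMED": 40}))      # 70
```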

    The data provided under the "Metabiota Composite Source" often does not include suspected incidence due to inconsistencies in reporting cases and deaths with this confirmation status.

    Outcome - Cases vs. Deaths

    The incidence datasets include cases and deaths. Subset the data to either CASE or DEATH using the OUTCOME variable. It should be noted that deaths are included in case counts.

    Spatial Resolution

    Data are provided at multiple spatial resolutions. Data should be subset to a single spatial resolution of interest using the SPATIAL_RESOLUTION variable.

    Information is included at the finest spatial resolution provided in the original epidemic report. We also aggregate incidence to coarser geographic resolutions. For example, if a source only provides data at the province level, then province-level data are included in the dataset as well as country-level totals. Users should avoid summing all cases or deaths in a given country for a given date without specifying the SPATIAL_RESOLUTION value. For example, subset the data to SPATIAL_RESOLUTION equal to “AL0” in order to view only the aggregated country-level data.
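    To see why mixing spatial resolutions inflates totals, consider a toy country reported both as a whole and by province. "AL0" is the country-level code named above. Using "AL1" for provinces is an assumption, and the numbers are made up.

```python
# Made-up rows: (SPATIAL_RESOLUTION, location, value).
rows = [
    ("AL0", "Country A", 100),   # country-level aggregate
    ("AL1", "Province X", 60),   # province rows that roll up into Country A
    ("AL1", "Province Y", 40),
]

def total_at(rows, resolution):
    """Sum values at a single spatial resolution only."""
    return sum(value for res, _, value in rows if res == resolution)

naive_total = sum(value for _, _, value in rows)  # mixes levels, so it double counts
country_total = total_at(rows, "AL0")
province_total = total_at(rows, "AL1")
print(naive_total, country_total, province_total)  # 200 100 100
```

    When a source reports only at province level, the provinces summing to the country row also serves as a quick consistency check.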

    There are differences in administrative division naming practices by country. Administrative levels in this dataset are defined using the Google Geolocation API (https://developers.google.com/maps/documentation/geolocation/). For example, one source for 2019-nCoV provides data for the city of Beijing, which the Google Geolocation API classifies as a “locality.” Beijing is also the name of the municipality in which the city of Beijing is located. Thus, the 2019-nCoV dataset includes rows of data for both the city of Beijing and the municipality of the same name. If additional cities in the Beijing municipality reported data, those data would be aggregated with the city of Beijing data to form the municipality of Beijing data.

    Sources

    Data sources in this repository were selected to provide comprehensive spatiotemporal data for each outbreak. Data from a specific source can be selected using the SOURCE variable.

    In addition to the original reporting sources, Metabiota compiles multiple sources to generate the most comprehensive view of an outbreak. This compilation is stored in the database under the source name “Metabiota Composite Source.” The purpose of generating this new view of the outbreak is to provide the most accurate and precise spatiotemporal data for the outbreak. At this time, Metabiota does not incorporate unofficial - including media - sources into the “Metabiota Composite Source” dataset.

    Quality Assurance

    Data are collected by a team of digital surveillance experts and undergo many quality assurance tests. After data are collected, they are independently verified by at least one additional analyst. The data also pass an automated validation program to ensure data consistency and integrity.

    NonCommercial Use License

    • Creative Commons License Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)

    • This is a human-readable summary of the Legal Code.

    • You are free:

      to Share: to copy, distribute and transmit the work

      to Remix: to adapt the work

    • Under the following conditions:

      Attribution: You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

      Noncommercial: You may not use this work for commercial purposes.

      Share Alike: If you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one.

    • With the understanding that:

      Waiver: Any of the above conditions can be waived if you get permission from the copyright holder.

      Public Domain: Where the work or any of its elements is in the public domain under applicable law, that status is in no way affected by the license.

      Other Rights: In no way are any of the following rights affected by the license: your fair dealing or fair use rights, or other applicable copyright exceptions and limitations; the author's moral rights; rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights.

    • Notice: For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to this web page.

    For details and the full license text, see http://creativecommons.org/licenses/by-nc-sa/3.0/

    Liability

    Metabiota shall in no event be liable for any decision taken by the user based on the data made available. Under no circumstances shall Metabiota be liable for any damages (whatsoever) arising out of the use or inability to use the database. The entire risk arising out of the use of the database remains with the user.

  13. COVID-19 Cases US

    • data-brookhavenga.opendata.arcgis.com
    • coronavirus-resources.esri.com
    Updated Mar 21, 2020
    CSSE_covid19 (2020). COVID-19 Cases US [Dataset]. https://data-brookhavenga.opendata.arcgis.com/items/628578697fb24d8ea4c32fa0c5ae1843
    Dataset updated
    Mar 21, 2020
    Dataset authored and provided by
    CSSE_covid19
    Description

    On March 10, 2023, the Johns Hopkins Coronavirus Resource Center ceased collecting and reporting global COVID-19 data. For updated cases, deaths, and vaccine data, please visit the following sources:

    • Global: World Health Organization (WHO)

    • U.S.: U.S. Centers for Disease Control and Prevention (CDC)

    For more information, visit the Johns Hopkins Coronavirus Resource Center.

    This feature layer contains the most up-to-date COVID-19 cases for the US and Canada. Data sources: WHO, CDC, ECDC, NHC, DXY, 1point3acres, Worldometers.info, BNO, state and national government health departments, and local media reports. This layer is created and maintained by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. This feature layer is supported by the Esri Living Atlas team and JHU Data Services. This layer is open to the public and free to share. Contact Johns Hopkins.

    IMPORTANT NOTICE:

    1. Fields for Active Cases and Recovered Cases are set to 0 in all locations. Johns Hopkins has not found a reliable source for this information at the county level but will continue to look and carry the fields.

    2. Fields for Incident Rate and People Tested are placeholders for when this becomes available at the county level.

    3. In some instances, cases have not been assigned a location at the county scale. Those are still assigned a state but are listed as unassigned and given a Lat Long of 0,0.

    Data Field Descriptions by Alias Name:

    • Province/State: (Text) Country Province or State Name (Level 2 Key)

    • Country/Region: (Text) Country or Region Name (Level 1 Key)

    • Last Update: (Datetime) Last data update Date/Time in UTC

    • Latitude: (Float) Geographic Latitude in Decimal Degrees (WGS1984)

    • Longitude: (Float) Geographic Longitude in Decimal Degrees (WGS1984)

    • Confirmed: (Long) Best collected count of Confirmed Cases reported by geography

    • Recovered: (Long) Not Currently in Use, JHU is looking for a source

    • Deaths: (Long) Best collected count for Case Deaths reported by geography

    • Active: (Long) Confirmed - Recovered - Deaths (computed). Not Currently in Use due to lack of Recovered data

    • County: (Text) US County Name (Level 3 Key)

    • FIPS: (Text) US State/County Codes

    • Combined Key: (Text) Comma-separated concatenation of Key Field values (L3, L2, L1)

    • Incident Rate: (Long)

    • People Tested: (Long) Not Currently in Use. Placeholder for additional data

    • People Hospitalized: (Long) Not Currently in Use. Placeholder for additional data
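    One practical consequence of notice 3 is that unassigned cases should count toward state totals but should not be drawn at latitude/longitude 0,0. The minimal sketch below assumes each feature has been read into a Python dict keyed by the alias names above; the layer's actual attribute keys may differ, and the rows are made up. Active is deliberately not computed here. With Recovered set to 0, Confirmed - Recovered - Deaths would overstate active cases.

```python
from collections import defaultdict

def summarize_by_state(rows):
    """Return per-state Confirmed/Deaths totals and the rows that can be placed on a map.

    Unassigned rows (Latitude, Longitude == 0, 0) still count toward their state's
    totals but are left out of the mappable list so they are not drawn at 0,0.
    """
    totals = defaultdict(lambda: {"Confirmed": 0, "Deaths": 0})
    mappable = []
    for row in rows:
        state = totals[row["Province/State"]]
        state["Confirmed"] += row["Confirmed"]
        state["Deaths"] += row["Deaths"]
        if (row["Latitude"], row["Longitude"]) != (0, 0):
            mappable.append(row)
    return {name: dict(values) for name, values in totals.items()}, mappable

# Made-up example rows keyed by the alias names (an assumption about the export format).
rows = [
    {"Province/State": "State A", "County": "County 1", "Latitude": 40.1, "Longitude": -75.2,
     "Confirmed": 12, "Deaths": 1},
    {"Province/State": "State A", "County": "Unassigned", "Latitude": 0, "Longitude": 0,
     "Confirmed": 3, "Deaths": 0},
]
totals, mappable = summarize_by_state(rows)
print(totals)         # {'State A': {'Confirmed': 15, 'Deaths': 1}}
print(len(mappable))  # 1
```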

  14. CoVid Plots and Analysis

    • orda.shef.ac.uk
    • datasetcatalog.nlm.nih.gov
    txt
    Updated Feb 26, 2023
    Colin Angus (2023). CoVid Plots and Analysis [Dataset]. http://doi.org/10.15131/shef.data.12328226.v60
    Available download formats: txt
    Dataset updated
    Feb 26, 2023
    Dataset provided by
    The University of Sheffield
    Authors
    Colin Angus
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    COVID-19 plots and analysis relating to the coronavirus pandemic. Includes five sets of plots and the associated R code to generate them.

    1) Heatmaps. Updated every few days: heatmaps of COVID-19 case and death trajectories for Local Authorities (or equivalent) in England, Wales, Scotland, Ireland and Germany.

    2) All-cause mortality. Updated on Tuesday (for England & Wales), Wednesday (for Scotland) and Friday (for Northern Ireland): analysis and plots of weekly all-cause deaths in 2020 compared to previous years by country, age, sex and region. Also a set of international comparisons using data from mortality.org.

    3) Exposures. No longer updated: mapping of potential COVID-19 mortality exposure at local levels (LSOAs) in England based on the age-sex structure of the population and levels of poor health. There is also a Shiny app which creates slightly lower resolution versions of the same plots online, which you can find here: https://victimofmaths.shinyapps.io/covidmapper/, on GitHub https://github.com/VictimOfMaths/COVIDmapper and uploaded to this record.

    4) Index of Multiple Deprivation. No longer updated: preliminary analysis of the inequality impacts of COVID-19 based on Local Authority level cases and levels of deprivation.

    5) Socioeconomic inequalities. No longer updated (unless ONS release more data): analysis of published ONS figures of COVID-19 and other cause mortality in 2020 compared to previous years by deprivation decile.

    Latest versions of plots and associated analysis can be found on Twitter: https://twitter.com/victimofmaths

    This work is described in more detail on the UK Data Service Impact and Innovation Lab blog: https://blog.ukdataservice.ac.uk/visualising-high-risk-areas-for-covid-19-mortality/

    Adapted from data from the Office for National Statistics licensed under the Open Government Licence v.1.0. http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/

  15. COVID-19 Tracking Germany

    • kaggle.com
    zip
    Updated Feb 7, 2023
    Heads or Tails (2023). COVID-19 Tracking Germany [Dataset]. https://www.kaggle.com/datasets/headsortails/covid19-tracking-germany
    Available download formats: zip (14492010 bytes)
    Dataset updated
    Feb 7, 2023
    Authors
    Heads or Tails
    Area covered
    Germany
    Description

    Read the associated blogpost for a detailed description of how this dataset was prepared, plus extra code for producing animated maps.

    Context

    The 2019 Novel Coronavirus (COVID-19) continues to spread in countries around the world. This dataset provides daily updated numbers of reported cases and deaths in Germany at the federal state (Bundesland) and county (Landkreis/Stadtkreis) level. In April 2021 I added a dataset on vaccination progress. In addition, I provide geospatial shape files and general state-level population demographics to aid the analysis.

    Content

    The dataset consists of three main csv files: covid_de.csv, demographics_de.csv, and covid_de_vaccines.csv. The geospatial shapes are included in the de_state.* files. See the column descriptions below for more detailed information.

    • covid_de.csv: COVID-19 cases and deaths, which will be updated daily. The original data are being collected by Germany's Robert Koch Institute and can be downloaded through the National Platform for Geographic Data (the latter site also hosts an interactive dashboard). I reshaped and translated the data (using R tidyverse tools) to make it more accessible. This blogpost explains how I prepared the data and describes how to produce animated maps.

    • demographics_de.csv: General demographic data about Germany at the federal state level. These were downloaded from Germany's Federal Office for Statistics (Statistisches Bundesamt) through their Open Data platform GENESIS. The data reflect the most recent available estimates, as of 2018-12-31. You can find the corresponding table here.

    • covid_de_vaccines.csv: In April 2021 I added this file that contains the Covid-19 vaccination progress for Germany as a whole. It details daily doses, broken down cumulatively by manufacturer, as well as the cumulative number of people having received their first and full vaccination. The earliest data are from 2020-12-27.

    • de_state.*: Geospatial shape files for Germany's 16 federal states. Downloaded via Germany's Federal Agency for Cartography and Geodesy. Specifically, the shape file was obtained from this link.

    Column Description

    COVID-19 dataset covid_de.csv:

    • state: Name of the German federal state. Germany has 16 federal states. I converted special characters from the original data.

    • county: The name of the German Landkreis (LK) or Stadtkreis (SK), which correspond roughly to US counties.

    • age_group: The COVID-19 data are reported for 6 age groups: 0-4, 5-14, 15-34, 35-59, 60-79, and above 80 years old. As a shortcut, I use "80-99" for the last category, but there might well be persons above 99 years old in this dataset. This column has a few NA entries.

    • gender: Reported as male (M) or female (F). This column has a few NA entries.

    • date: The calendar date on which a case or death was reported. There might be delays that will be corrected by retroactively assigning cases to earlier dates.

    • cases: COVID-19 cases that have been confirmed through laboratory work. This and the following 2 columns are counts per day, not cumulative counts.

    • deaths: COVID-19 related deaths.

    • recovered: Recovered cases.
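    Cases, deaths and recovered are daily counts split by age group and gender. A county's cumulative curve therefore needs a sum over those groups, followed by a running total. This sketch uses made-up rows in the covid_de.csv layout. Including NA age and gender rows in the sum is a choice made here, not something the dataset prescribes.

```python
from collections import defaultdict
from itertools import accumulate

def cumulative_cases(rows, county):
    """Running total of daily `cases` for one county, summed over all age groups and genders."""
    daily = defaultdict(int)
    for row in rows:
        if row["county"] == county:
            daily[row["date"]] += int(row["cases"])
    dates = sorted(daily)  # ISO dates sort chronologically as strings
    return list(zip(dates, accumulate(daily[d] for d in dates)))

# Made-up rows; values are strings, as csv.DictReader would yield them.
rows = [
    {"county": "SK Example", "age_group": "15-34", "gender": "F", "date": "2020-03-01", "cases": "2"},
    {"county": "SK Example", "age_group": "35-59", "gender": "M", "date": "2020-03-01", "cases": "1"},
    {"county": "SK Example", "age_group": "NA", "gender": "NA", "date": "2020-03-02", "cases": "5"},
    {"county": "LK Other", "age_group": "60-79", "gender": "F", "date": "2020-03-02", "cases": "4"},
]
print(cumulative_cases(rows, "SK Example"))  # [('2020-03-01', 3), ('2020-03-02', 8)]
```

    Because delayed cases are assigned retroactively to earlier dates, a curve computed this way can change after each update.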

    Demographic dataset demographics_de.csv:

    • state, gender, age_group: same as above. The demographic data is available in higher age resolution, but I have binned it here to match the corresponding age groups in the covid_de.csv file.

    • population: Population counts for the respective categories. These numbers reflect the (most recent available) estimates on 2018-12-31.
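    Joining covid_de.csv with demographics_de.csv on state gives a population-adjusted rate. This sketch computes total cases per 100,000 inhabitants by state, using made-up rows. Because the population figures are the 2018-12-31 estimates, the rate is only approximate for later dates.

```python
from collections import defaultdict

def cases_per_100k(covid_rows, demo_rows):
    """Total reported cases per 100,000 inhabitants for each federal state."""
    cases = defaultdict(int)
    for row in covid_rows:
        cases[row["state"]] += int(row["cases"])
    # demographics_de.csv is split by gender and age group, so sum to state level first.
    population = defaultdict(int)
    for row in demo_rows:
        population[row["state"]] += int(row["population"])
    return {
        state: 100_000 * total / population[state]
        for state, total in cases.items()
        if population[state] > 0
    }

covid_rows = [
    {"state": "State A", "cases": "5"},
    {"state": "State A", "cases": "15"},
]
demo_rows = [
    {"state": "State A", "gender": "F", "age_group": "15-34", "population": "100000"},
    {"state": "State A", "gender": "M", "age_group": "15-34", "population": "100000"},
]
rates = cases_per_100k(covid_rows, demo_rows)
print(rates)  # {'State A': 10.0}
```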

    Vaccination progress dataset covid_de_vaccines.csv:

    • date: calendar date of vaccination

    • doses, doses_first, doses_second: Daily count of administered doses: total, 1st shot, 2nd shot.

    • pfizer_cumul, moderna_cumul, astrazeneca_cumul: Daily cumulative number of administered vaccinations by manufacturer.

    • persons_first_cumul, persons_full_cumul: Daily cumulative number of people having received their 1st shot and full vaccination, respectively.

    Acknowledgements

    All the data have been extracted from open data sources which are being gratefully acknowledged:

    • The [Robert ...
  16. CORONAVIRUS CASES by Country in AFRICA

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Mar 27, 2020
    TRADING ECONOMICS (2020). CORONAVIRUS CASES by Country in AFRICA [Dataset]. https://tradingeconomics.com/country-list/coronavirus-cases?continent=africa
    Available download formats: xml, csv, excel, json
    Dataset updated
    Mar 27, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    2025
    Area covered
    Africa
    Description

    This dataset provides values for CORONAVIRUS CASES reported in several countries. The data includes current values, previous releases, historical highs and record lows, release frequency, reported unit and currency.

  17. Covid-19 Country Level Social Science Dataset

    • dataverse.no
    • dataverse.azure.uit.no
    application/dbf +10
    Updated Oct 20, 2020
    Øystein Solvang; Kari Elida Eriksen; Jonas Stein; Camilla Brattland (2020). Covid-19 Country Level Social Science Dataset [Dataset]. http://doi.org/10.18710/VMUP44
    Available download formats: type/x-r-syntax (11257), csv (36577), application/prj (146), type/x-r-syntax (12007), application/shx (2140), application/dbf (323441), bin (5), txt (9844), application/sbn (2796), application/prj (145), application/shp (8800376), type/x-r-syntax (4038), application/sbx (349), pdf (189956), bin (6), csv (41050), pdf (138533), application/dbf (10298), application/sbx (348), pdf (339251)
    Dataset updated
    Oct 20, 2020
    Dataset provided by
    DataverseNO
    Authors
    Øystein Solvang; Kari Elida Eriksen; Jonas Stein; Camilla Brattland
    License

    https://dataverse.no/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.18710/VMUP44

    Time period covered
    Jan 1, 2020 - Jul 15, 2020
    Area covered
    Covers 199 countries
    Description

    The dataset is a cross-sectional dataset covering social and public health data pertaining to the Covid-19 outbreak in 199 countries. The dataset was compiled from public registers and other openly available sources. Data on Covid-19 cases and related fatalities are current as of mid-July 2020. Data on other variables are mainly from the last three years, depending on data availability at the time of collection. Standardized unique unit identifiers (ISO-3166-1 Alpha-3) are included, enabling merging with other data. The dataset was assembled concurrently with a similar one at the Norwegian municipal level, as part of the project «Ressurs for studentaktiv læring i undervisning i statistisk og romlig analyse for samfunnsfag» (Resource for student-active learning in teaching statistical and spatial analysis for the social sciences), at the Department of Social Science and The Norwegian College of Fishery Science, UiT.
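    The ISO-3166-1 Alpha-3 identifiers make merging a plain key join. The minimal sketch below operates on lists of row dicts. The column name "iso3" and all values are hypothetical, and codes are normalized to upper case before matching.

```python
def merge_on_iso3(left, right, key="iso3"):
    """Inner join of two lists of row dicts on an ISO-3166-1 alpha-3 code column."""
    index = {row[key].strip().upper(): row for row in right}
    merged = []
    for row in left:
        code = row[key].strip().upper()
        if code in index:
            # Combine both rows and store the normalized code.
            merged.append({**row, **index[code], key: code})
    return merged

covid = [{"iso3": "NOR", "cases": 10}, {"iso3": "xxx", "cases": 1}]  # made-up values
other = [{"iso3": "nor", "indicator": 0.5}]
merged = merge_on_iso3(covid, other)
print(merged)  # [{'iso3': 'NOR', 'cases': 10, 'indicator': 0.5}]
```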

  18. Coronavirus Panoply.io for Database Warehousing and Post Analysis using...

    • data.mendeley.com
    Updated Feb 4, 2020
    Pranav Pandya (2020). Coronavirus Panoply.io for Database Warehousing and Post Analysis using Sequal Language (SQL) [Dataset]. http://doi.org/10.17632/4gphfg5tgs.2
    Dataset updated
    Feb 4, 2020
    Authors
    Pranav Pandya
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    It has never been easier to solve any database-related problem using SQL, and the following gives you an opportunity to understand how I was able to figure out some of the interline relationships between databases using the Panoply.io tool.

    I was able to insert the coronavirus dataset and create a submittable, reusable result. I hope it helps you work in a Data Warehouse environment.

    The following is the list of SQL commands performed on the dataset attached below, with the final output stored in the Exports folder.

    Query 1

      SELECT "Province/State" AS "Region", Deaths, Recovered, Confirmed
      FROM "public"."coronavirus_updated"
      WHERE Recovered > (Deaths * 2) AND Deaths > 0

    Description: How can we estimate where the coronavirus has infiltrated but patients are recovering effectively? We can view those places by selecting those where recoveries are more than twice the death toll.

    Query 2

      SELECT country, SUM(Confirmed) AS "Confirmed Count", SUM(Recovered) AS "Recovered Count", SUM(Deaths) AS "Death Toll"
      FROM "public"."coronavirus_updated"
      WHERE Recovered > (Deaths / 2) AND Confirmed > 0
      GROUP BY country

    Description: Coronavirus Epidemic has infiltrated multiple countries, and the only way to be safe is by knowing the countries which have confirmed Coronavirus Cases. So here is a list of those countries

    Query 3

      SELECT country AS "Countries where Coronavirus has reached"
      FROM "public"."coronavirus_updated"
      WHERE confirmed > 0
      GROUP BY country

    Description: The coronavirus epidemic has infiltrated multiple countries, and the only way to be safe is by knowing the countries which have confirmed coronavirus cases. So here is a list of those countries.

    Query 4

      SELECT country, SUM(suspected) AS "Suspected Cases under potential CoronaVirus outbreak"
      FROM "public"."coronavirus_updated"
      WHERE suspected > 0 AND deaths = 0 AND confirmed = 0
      GROUP BY country
      ORDER BY SUM(suspected) DESC

    Description: The coronavirus is spreading at an alarming rate. Knowing which countries are newly getting the virus is important because, if timely measures are taken in these countries, casualties could be prevented. Here is a list of suspected cases in countries with no virus-related deaths.

    Query 5

      SELECT country,
             SUM(suspected) AS "Coronavirus uncontrolled spread count and human life loss",
             100.0 * SUM(suspected) / (SELECT SUM(suspected) FROM "public"."coronavirus_updated") AS "Global suspected Exposure of Coronavirus in percentage"
      FROM "public"."coronavirus_updated"
      WHERE suspected > 0 AND deaths = 0
      GROUP BY country
      ORDER BY SUM(suspected) DESC

    Description: The coronavirus is getting stronger in particular countries, but how can we measure that? We can measure it by the percentage of suspected patients among countries which still don't have any coronavirus-related deaths. The following is a list.
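    Two of these queries can be tried without a Panoply account. This sketch uses Python's built-in sqlite3 module as a stand-in warehouse, which is an assumption. The table layout is inferred from the queries, the "public" schema prefix is dropped because SQLite has no such schema, and the rows are made up. The Query 1 filter is written as its description states it: recoveries more than twice the death toll. The percentage multiplies by 100.0 because, with integer columns, 100 * SUM(...) / SUM(...) would truncate to an integer in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE coronavirus_updated ("Province/State" TEXT, country TEXT, '
    "confirmed INTEGER, suspected INTEGER, recovered INTEGER, deaths INTEGER)"
)
conn.executemany(
    "INSERT INTO coronavirus_updated VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Region A", "Country X", 100, 0, 30, 10),  # recovered > 2 * deaths
        ("Region B", "Country X", 50, 0, 8, 5),     # recovered <= 2 * deaths
        ("Region C", "Country Y", 0, 40, 0, 0),     # suspected only
        ("Region D", "Country Z", 0, 10, 0, 0),     # suspected only
    ],
)

# Query 1: regions where recoveries exceed twice the death toll.
q1 = conn.execute(
    'SELECT "Province/State" AS "Region", deaths, recovered, confirmed '
    "FROM coronavirus_updated WHERE recovered > 2 * deaths AND deaths > 0"
).fetchall()
print(q1)  # [('Region A', 10, 30, 100)]

# Query 5: share of all suspected cases, in countries with no deaths, as a float percentage.
q5 = conn.execute(
    "SELECT country, SUM(suspected), "
    "100.0 * SUM(suspected) / (SELECT SUM(suspected) FROM coronavirus_updated) "
    "FROM coronavirus_updated WHERE suspected > 0 AND deaths = 0 "
    "GROUP BY country ORDER BY SUM(suspected) DESC"
).fetchall()
print(q5)  # [('Country Y', 40, 80.0), ('Country Z', 10, 20.0)]
```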

    Data Provided by: SRK, Data Scientist at H2O.ai, Chennai, India

  19. COVID-19 cases in India as of October 2023, by type

    • statista.com
    Statista, COVID-19 cases in India as of October 2023, by type [Dataset]. https://www.statista.com/statistics/1101713/india-covid-19-cases-by-type/
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    India
    Description

    India reported over 44 million confirmed cases of the coronavirus (COVID-19) as of October 20, 2023. The number of people infected with the virus was declining across the South Asian country.

    What is the coronavirus?

    COVID-19 is caused by SARS-CoV-2, a member of the large family of coronaviruses (CoV), which can be transmitted from animals to people. The name COVID-19 is derived from the words corona, virus, and disease, while the number 19 represents the year that it emerged. Symptoms of COVID-19 resemble those of the common cold, with fever, coughing, and shortness of breath. However, serious infections can lead to pneumonia, multi-organ failure, severe acute respiratory syndrome, and even death if appropriate medical help is not provided.

    COVID-19 in India

    India reported its first case of this coronavirus in late January 2020 in the southern state of Kerala. The spread of the virus led to a nationwide lockdown between March and June that year to curb rising numbers. After marginal success, the economy reopened, leading to some recovery for the rest of 2020. In March 2021, however, the second wave hit the country, causing record-breaking numbers of infections and deaths and overwhelming the healthcare system. The central government has been criticized for not taking action this time around, with "#ResignModi" trending on social media platforms in late April. The government's response was to block this line of content on the basis of fighting misinformation and reducing panic across the country.

  20. The World Dataset of COVID-19

    • kaggle.com
    zip
    Updated May 25, 2021
    C-3PO (2021). The World Dataset of COVID-19 [Dataset]. https://www.kaggle.com/aditeloo/the-world-dataset-of-covid19
    Available download formats: zip (24211978 bytes)
    Dataset updated
    May 25, 2021
    Authors
    C-3PO
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    World
    Description

    Context

    These datasets are from Our World in Data. Their complete COVID-19 dataset is a collection of the COVID-19 data maintained by Our World in Data. It is updated daily and includes data on confirmed cases, deaths, hospitalizations, testing, and vaccinations as well as other variables of potential interest.

    Content

    Confirmed cases and deaths:

    Our data come from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU). We discuss how and when JHU collects and publishes this data. The cases and deaths dataset is updated daily. Note: the number of cases or deaths reported by any institution (including JHU, the WHO, the ECDC, and others) on a given day does not necessarily represent the actual number on that date. This is because of the long reporting chain that exists between a new case or death and its inclusion in statistics. This also means that negative values in cases and deaths can sometimes appear when a country corrects historical data because it had previously overestimated the number of cases or deaths. Alternatively, large changes can sometimes (although rarely) be made to a country's entire time series if JHU decides (and has access to the necessary data) to correct values retrospectively.
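    The note on negative values is easiest to see by differencing a cumulative series. In this sketch with made-up values, a downward revision on one date shows up as a negative daily change and is flagged. The function names are illustrative and are not part of the Our World in Data files.

```python
def daily_changes(cumulative):
    """Convert a cumulative series into day-over-day changes."""
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]

def flag_corrections(dates, cumulative):
    """Return the dates whose daily change is negative, i.e. a downward historical correction."""
    return [d for d, change in zip(dates[1:], daily_changes(cumulative)) if change < 0]

dates = ["2020-06-01", "2020-06-02", "2020-06-03", "2020-06-04"]
cumulative = [100, 120, 110, 130]  # made-up values with a downward revision on 2020-06-03
print(daily_changes(cumulative))            # [20, -10, 20]
print(flag_corrections(dates, cumulative))  # ['2020-06-03']
```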

    Hospitalizations and intensive care unit (ICU) admissions:

    Our data come from the European Centre for Disease Prevention and Control (ECDC) for a select number of European countries; the government of the United Kingdom; the Department of Health & Human Services for the United States; and the COVID-19 Tracker for Canada. Unfortunately, we are unable to provide data on hospitalizations for other countries: there is currently no global, aggregated database on COVID-19 hospitalization, and our team at Our World in Data does not have the capacity to build such a dataset.

    Testing for COVID-19:

    This data is collected by the Our World in Data team from official reports; you can find further details in our post on COVID-19 testing, including our checklist of questions to understand testing data, information on geographical and temporal coverage, and detailed country-by-country source information. The testing dataset is updated around twice a week.

    Acknowledgements

    Our World in Data GitHub repository for covid-19.

    Inspiration

    We all love data because we love to dig into it and discover the truth; that is my main inspiration.

GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset

Novel Covid-19 Dataset

Day-level information on COVID-19 cases worldwide

Dataset updated
Sep 18, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
GHOST5612
License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Context:

From the World Health Organization: On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

Making daily-level information on affected people available to the broader data science community can therefore yield useful insights.

Johns Hopkins University has built an excellent dashboard from the affected-cases data. The data is extracted from the associated Google Sheets and made available here.

Edited:

The data is now available as CSV files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for Terms of Use details. It is uploaded here for use in Kaggle kernels and to gather insights from the broader data science community.

Content

2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

This dataset contains daily-level information on the number of confirmed cases, deaths, and recoveries from the 2019 novel coronavirus. Note that this is time-series data, so the number of cases on any given day is cumulative.

The data is available from 22 January 2020 onward.
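Because each day's figure is cumulative, daily new counts have to be derived by differencing consecutive days. The following is a minimal sketch, assuming a single location's cumulative values are already sorted by date with no missing days; the function name `daily_new_counts` is hypothetical.

```python
def daily_new_counts(cumulative):
    """Convert a date-ordered list of cumulative counts into daily new counts.

    The first value is kept as-is because there is no earlier total to
    subtract. Negative results (from retrospective corrections) are left
    in place so they stay visible rather than being silently clipped.
    """
    if not cumulative:
        return []
    new_counts = [cumulative[0]]
    # Each day's new count is today's total minus yesterday's total.
    for prev, curr in zip(cumulative, cumulative[1:]):
        new_counts.append(curr - prev)
    return new_counts


# Hypothetical cumulative Confirmed values for one location on consecutive days.
print(daily_new_counts([1, 3, 3, 10]))  # prints [1, 2, 0, 7]
```

If days are missing, a single difference spans several days, so it is worth checking date continuity first. The first value is returned unchanged, which for a series starting on 22 January 2020 means it also includes every case reported up to that date.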


Dataset Description

This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

Files and Columns

1. covid_19_data.csv (Main File)

This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

  • Sno – Serial number of the record
  • ObservationDate – Date of the observation (MM/DD/YYYY)
  • Province/State – Province or state of the observation (may be missing for some entries)
  • Country/Region – Country of the observation
  • Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)
  • Confirmed – Cumulative number of confirmed cases on that date
  • Deaths – Cumulative number of deaths on that date
  • Recovered – Cumulative number of recoveries on that date
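As a starting point for working with covid_19_data.csv, here is a minimal sketch using only the Python standard library. It parses ObservationDate as MM/DD/YYYY, sums Province/State rows into per-country daily totals, and tries a few timestamp layouts for the non-standardized Last Update column. The function names, the sample rows, and the list of Last Update formats are assumptions for illustration, not part of the dataset's documentation; counts are passed through `float` first in case they are stored with a decimal part (also an assumption).

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Candidate layouts for the non-standardized "Last Update" column.
# These are assumptions; extend the tuple if other layouts turn up.
LAST_UPDATE_FORMATS = (
    "%m/%d/%Y %H:%M",
    "%m/%d/%y %H:%M",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
)


def parse_last_update(text):
    """Return a naive datetime (UTC per the column description), or None."""
    for fmt in LAST_UPDATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def load_country_totals(lines):
    """Sum Confirmed/Deaths/Recovered per (Country/Region, observation date).

    `lines` is any iterable of CSV lines with the covid_19_data.csv header,
    such as an open file. Province/State rows for the same country and date
    are added together, so a missing Province/State does not matter here.
    """
    totals = defaultdict(lambda: [0, 0, 0])
    for row in csv.DictReader(lines):
        # ObservationDate is documented as MM/DD/YYYY.
        day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        key = (row["Country/Region"].strip(), day)
        for i, column in enumerate(("Confirmed", "Deaths", "Recovered")):
            value = row[column].strip()
            # int(float(...)) tolerates counts written like "14.0";
            # an empty cell is counted as 0.
            totals[key][i] += int(float(value)) if value else 0
    return dict(totals)


# Hypothetical rows, for illustration only.
SAMPLE = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Province X,Country A,1/22/2020 17:00,1.0,0.0,0.0
2,01/22/2020,Province Y,Country A,1/22/2020 17:00,14.0,0.0,0.0
3,01/22/2020,,Country B,2020-01-22T17:00:00,2.0,0.0,0.0
"""

if __name__ == "__main__":
    print(load_country_totals(io.StringIO(SAMPLE)))
    print(parse_last_update("2020-01-22T17:00:00"))
```

To read the real file, pass an open file object instead of the sample, e.g. `with open("covid_19_data.csv", newline="") as f: totals = load_country_totals(f)`. Returning `None` from `parse_last_update` for unrecognized layouts makes it easy to count and inspect the rows that still need cleaning.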

2. 2019_ncov_data.csv (Legacy File)

This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

3. COVID_open_line_list_data.csv

This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

4. COVID19_line_list_data.csv

Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

✅ Use covid_19_data.csv for up-to-date aggregated global trends.

✅ Use the line list datasets for detailed, individual-level case analysis.

Country-level datasets:

For country-level data, please refer to the following Kaggle datasets:

India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

Acknowledgements:

Johns Hopkins University for making the data available for educational and academic research purposes

MoBS lab - https://www.mobs-lab.org/2019ncov.html

World Health Organization (WHO): https://www.who.int/

DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

Macau Government: https://www.ssm.gov.mo/portal/

Taiwan CDC: https://sites.google....
