94 datasets found
  1. Novel Covid-19 Dataset

    • kaggle.com
    Updated Sep 18, 2025
    Cite
    GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    GHOST5612
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Context:

    From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

    So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

    Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the associated Google Sheets and made available here.

    Edited:

    The data is now available as CSV files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for the Terms of Use details. It is uploaded here for use in Kaggle kernels and to gather insights from the broader DS community.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has daily-level information on the number of affected cases, deaths and recoveries from the 2019 novel coronavirus. Please note that this is time-series data, so the number of cases on any given day is the cumulative number.

    The data is available from 22 Jan, 2020.
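Because the counts are cumulative, per-day figures have to be derived by differencing consecutive days. The following is a minimal sketch of that step; the function name and the sample numbers are illustrative assumptions, not part of the dataset.

```python
def daily_increments(cumulative):
    """Convert a cumulative series into per-day increments.

    The first value is kept as-is because there is no earlier day to
    subtract from.
    """
    increments = []
    previous = 0
    for value in cumulative:
        increments.append(value - previous)
        previous = value
    return increments


# Hypothetical cumulative confirmed counts for one location on consecutive days.
cumulative_confirmed = [1, 1, 4, 9, 9, 15]
print(daily_increments(cumulative_confirmed))  # [1, 0, 3, 5, 0, 6]
```

A negative increment in real data would usually point to a retrospective correction by the reporting source, so it is worth flagging rather than silently dropping.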


    Dataset Description

    This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

    Files and Columns

    1. covid_19_data.csv (Main File)

    This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

    • Sno – Serial number of the record
    • ObservationDate – Date of the observation (MM/DD/YYYY)
    • Province/State – Province or state of the observation (may be missing for some entries)
    • Country/Region – Country of the observation
    • Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)
    • Confirmed – Cumulative number of confirmed cases on that date
    • Deaths – Cumulative number of deaths on that date
    • Recovered – Cumulative number of recoveries on that date
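As a rough illustration of the column layout listed above, the sketch below parses a few rows with the Python standard library and sums confirmed cases per country and date. The sample rows are invented, and parsing counts through `float` is an assumption made in case the file stores them as decimals; `Last Update` is left unparsed because the description says it is not standardized.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows that follow the column layout described above.
SAMPLE = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Hubei,Mainland China,1/22/2020 17:00,444.0,17.0,28.0
2,01/22/2020,Beijing,Mainland China,1/22/2020 17:00,14.0,0.0,0.0
3,01/22/2020,,Japan,1/22/2020 17:00,2.0,0.0,0.0
"""


def load_records(text):
    """Parse CSV text into dictionaries with typed fields."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            # ObservationDate is documented as MM/DD/YYYY.
            "date": datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date(),
            # Province/State may be missing; store None in that case.
            "province": row["Province/State"] or None,
            "country": row["Country/Region"],
            "confirmed": int(float(row["Confirmed"])),
            "deaths": int(float(row["Deaths"])),
            "recovered": int(float(row["Recovered"])),
        })
    return records


def confirmed_by_country(records):
    """Sum cumulative confirmed cases over provinces for each (country, date)."""
    totals = defaultdict(int)
    for rec in records:
        totals[(rec["country"], rec["date"])] += rec["confirmed"]
    return dict(totals)


records = load_records(SAMPLE)
totals = confirmed_by_country(records)
print(totals)
```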

    2. 2019_ncov_data.csv (Legacy File)

    This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

    3. COVID_open_line_list_data.csv

    This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

    4. COVID19_line_list_data.csv

    Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

    ✅ Use covid_19_data.csv for up-to-date aggregated global trends.

    ✅ Use the line list datasets for detailed, individual-level case analysis.

    Country level datasets:

    If you are interested in knowing country level data, please refer to the following Kaggle datasets:

    India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

    South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

    Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

    Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

    USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

    Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

    Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

    Acknowledgements :

    Johns Hopkins University for making the data available for educational and academic research purposes

    MoBS lab - https://www.mobs-lab.org/2019ncov.html

    World Health Organization (WHO): https://www.who.int/

    DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

    BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

    National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

    China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

    Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

    Macau Government: https://www.ssm.gov.mo/portal/

    Taiwan CDC: https://sites.google....

  2. China Coronavirus COVID-19 Cases

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Mar 4, 2020
    Cite
    TRADING ECONOMICS (2020). China Coronavirus COVID-19 Cases [Dataset]. https://tradingeconomics.com/china/coronavirus-cases
    Available download formats: excel, csv, xml, json
    Dataset updated
    Mar 4, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 4, 2020 - May 17, 2023
    Area covered
    China
    Description

    China recorded 99256991 Coronavirus Cases since the epidemic began, according to the World Health Organization (WHO). In addition, China reported 5226 Coronavirus Deaths. This dataset includes a chart with historical data for China Coronavirus Cases.

  3. COVID -19 Coronavirus Pandemic Dataset

    • kaggle.com
    zip
    Updated Sep 30, 2022
    Cite
    Aman Chauhan (2022). COVID -19 Coronavirus Pandemic Dataset [Dataset]. https://www.kaggle.com/datasets/whenamancodes/covid-19-coronavirus-pandemic-dataset/code
    Available download formats: zip (10926 bytes)
    Dataset updated
    Sep 30, 2022
    Authors
    Aman Chauhan
    Description

    Context

    The 2019–20 coronavirus pandemic is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus first emerged in Wuhan, Hubei, China, in December 2019. On 11 March 2020, the World Health Organization declared the outbreak a pandemic. As of 11 March 2020, over 126,000 cases have been confirmed in more than 110 countries and territories, with major outbreaks in mainland China, Italy, South Korea, and Iran. More than 4,600 have died from the disease and 67,000 have recovered.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has information on the number of affected cases, deaths and recoveries from the 2019 novel coronavirus. Please note that this data was scraped from https://www.worldometers.info/coronavirus/. This data is for educational purposes only.


    Acknowledgements

    This data belongs solely to https://www.worldometers.info/coronavirus/. For licensing, visit https://www.worldometers.info/licensing/

  4. Coronavirus (COVID-19) dataset

    • kaggle.com
    Updated Apr 29, 2020
    Cite
    Balaaje (2020). Coronavirus (COVID-19) dataset [Dataset]. https://www.kaggle.com/balaaje/coronavirus-covid19-dataset/metadata
    Dataset updated
    Apr 29, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Balaaje
    Description

    Context

    The 2019–20 coronavirus pandemic is an ongoing global pandemic of coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus first emerged in Wuhan, Hubei, China, in December 2019. On 11 March 2020, the World Health Organization declared the outbreak a pandemic. As of 11 March 2020, over 126,000 cases have been confirmed in more than 110 countries and territories, with major outbreaks in mainland China, Italy, South Korea, and Iran. More than 4,600 have died from the disease and 67,000 have recovered.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has information on the number of affected cases, deaths and recoveries from the 2019 novel coronavirus. Please note that this data was scraped from https://www.worldometers.info/coronavirus/. This data is for educational purposes only.

    Acknowledgements

    This data belongs solely to https://www.worldometers.info/coronavirus/. For licensing, visit https://www.worldometers.info/licensing/

  5. China Coronavirus COVID-19 Vaccination Total

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Apr 20, 2021
    Cite
    TRADING ECONOMICS (2021). China Coronavirus COVID-19 Vaccination Total [Dataset]. https://tradingeconomics.com/china/coronavirus-vaccination-total
    Available download formats: csv, xml, excel, json
    Dataset updated
    Apr 20, 2021
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 15, 2020 - Feb 9, 2023
    Area covered
    China
    Description

    The number of COVID-19 vaccination doses administered in China rose to 3491077000 as of Oct 27 2023. This dataset includes a chart with historical data for China Coronavirus Vaccination Total.

  6. Data_Sheet_1_COVID-19 Vaccination Acceptance Among Chinese Population and...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated May 30, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Jian Wu; Mingze Ma; Yudong Miao; Beizhu Ye; Quanman Li; Clifford Silver Tarimo; Meiyun Wang; Jianqin Gu; Wei Wei; Lipei Zhao; Zihan Mu; Xiaoli Fu (2023). Data_Sheet_1_COVID-19 Vaccination Acceptance Among Chinese Population and Its Implications for the Pandemic: A National Cross-Sectional Study.docx [Dataset]. http://doi.org/10.3389/fpubh.2022.796467.s001
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Jian Wu; Mingze Ma; Yudong Miao; Beizhu Ye; Quanman Li; Clifford Silver Tarimo; Meiyun Wang; Jianqin Gu; Wei Wei; Lipei Zhao; Zihan Mu; Xiaoli Fu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objective: To examine the COVID-19 vaccination rate among a representative sample of adults from 31 provinces on the Chinese mainland and identify its influencing factors.

    Methods: We gathered sociodemographic information, data on people's awareness and behavior regarding COVID-19 and the COVID-19 vaccine, the accessibility of COVID-19 vaccination services, community environmental factors influencing people's awareness and behavior regarding the vaccination, information about people's skepticism on COVID-19 vaccine, and information about people's trust in doctors as well as vaccine developers through an online nationwide cross-sectional survey among Chinese adults (18 years and older). The odds ratios (OR) and 95% confidence intervals (CI) for the statistical associations were estimated using logistic regression models.

    Results: A total of 29,925 participants (51.4% females and 48.6% males) responded. 89.4% of the participants had already received a COVID-19 vaccination. After adjusting for demographic characteristics, awareness of COVID-19 pandemic/COVID-19 vaccine, community environmental factors, awareness and behavior of general vaccinations, we discovered that having no religious affiliation, having the same occupational status as a result of coronavirus epidemic, being a non-smoker, always engaging in physical activity, having a lower social status, perceiving COVID-19 to be easily curable, and having easier access to vaccination are all associated with high vaccination rate (all P

  7. Epidemiological data on the novel coronavirus 2019-nCoV infection cases in...

    • datasetcatalog.nlm.nih.gov
    Updated Jun 23, 2020
    Cite
    Xie, Min; Liu, Jun; Yang, Qin; Luo, Wei; Guo, Limin; Duan, Qinwei; Liu, Xi; Wu, Ying; Zhu, Rong; Feng, Shipin; Wang, Li; Li, Jia (2020). Epidemiological data on the novel coronavirus 2019-nCoV infection cases in China [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000578958
    Dataset updated
    Jun 23, 2020
    Authors
    Xie, Min; Liu, Jun; Yang, Qin; Luo, Wei; Guo, Limin; Duan, Qinwei; Liu, Xi; Wu, Ying; Zhu, Rong; Feng, Shipin; Wang, Li; Li, Jia
    Area covered
    China
    Description

    This data record contains one dataset, deta of 2019-nCOV in China.xlsx, in .xlsx file format.

    The dataset includes the following information (in 8 separate columns) on the novel coronavirus (2019-nCoV) infection cases in China:

    • total number of confirmed cases
    • total number of suspected cases
    • total number of cured cases
    • total number of deaths
    • total number of new confirmed cases
    • total number of new suspected cases
    • total number of new cured cases
    • total number of new deaths

    The number of cases is reported for each day from January 20th to February 21st 2020.

    Study background, aims and methodology: The 2019–20 coronavirus outbreak is an ongoing public health emergency of international concern involving coronavirus disease 2019 (COVID-19). At the end of December 2019, the epidemic of the novel coronavirus 2019-nCoV infection spread from its initial location in Wuhan, Hubei province, China, resulting in an epidemic throughout China, with sporadic cases reported globally.

    The elderly, as well as people with primary diseases, are more likely to die from the infection. Children with chronic kidney disease (CKD), and children on dialysis, are vulnerable due to their primary diseases and low immunity, especially those who undergo long-term hormone therapy, immunosuppressive therapy, or maintenance hemodialysis.

    The aim of this study was to analyse the epidemiological and clinical characteristics of the novel coronavirus, and to explore the infection prevention and control strategies of 2019-nCoV in children with chronic kidney disease (CKD) and children on dialysis.

    Data were collected from the 2019-nCoV management plan of the National Health Commission of the People’s Republic of China and relevant guidelines. Data on the COVID-19 cases in China, including the number of people, clinical characteristics, effective prevention and control measures from January 20th to February 21st, 2020, and statistical data on CKD in children were collected.

  8. China Coronavirus COVID-19 Recovered

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Mar 11, 2020
    Cite
    TRADING ECONOMICS (2020). China Coronavirus COVID-19 Recovered [Dataset]. https://tradingeconomics.com/china/coronavirus-recovered
    Available download formats: xml, json, csv, excel
    Dataset updated
    Mar 11, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 31, 2019 - Dec 15, 2021
    Area covered
    China
    Description

    China recorded 86689 Coronavirus Recovered since the epidemic began, according to the World Health Organization (WHO). In addition, China reported 4636 Coronavirus Deaths. This dataset includes a chart with historical data for China Coronavirus Recovered.

  9. Corona Dataset

    • kaggle.com
    zip
    Updated Mar 12, 2020
    Cite
    Aman Khunt (2020). Corona Dataset [Dataset]. https://www.kaggle.com/datasets/amankhunt/corona-dataset
    Available download formats: zip (304595 bytes)
    Dataset updated
    Mar 12, 2020
    Authors
    Aman Khunt
    Description

    Context

    Data is extracted from the associated Google Sheets and made available here.

    The data is now available as CSV files in the Johns Hopkins GitHub repository.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has daily-level information on the number of affected cases, deaths and recoveries from the 2019 novel coronavirus. Please note that this is time-series data, so the number of cases on any given day is the cumulative number.

  10. COVID-19 Combined Data-set with Improved Measurement Errors

    • data.mendeley.com
    Updated May 13, 2020
    Cite
    Afshin Ashofteh (2020). COVID-19 Combined Data-set with Improved Measurement Errors [Dataset]. http://doi.org/10.17632/nw5m4hs3jr.3
    Dataset updated
    May 13, 2020
    Authors
    Afshin Ashofteh
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Public health-related decision-making on policies aimed at controlling the COVID-19 pandemic outbreak depends on complex epidemiological models that are compelled to be robust and use all relevant available data. This data article provides a new combined worldwide COVID-19 dataset obtained from official data sources with improved systematic measurement errors and a dedicated dashboard for online data visualization and summary. The dataset adds new measures and attributes to the normal attributes of official data sources, such as daily mortality, and fatality rates. We used comparative statistical analysis to evaluate the measurement errors of COVID-19 official data collections from the Chinese Center for Disease Control and Prevention (Chinese CDC), World Health Organization (WHO) and European Centre for Disease Prevention and Control (ECDC). The data is collected by using text mining techniques and reviewing pdf reports, metadata, and reference data. The combined dataset includes complete spatial data such as countries area, international number of countries, Alpha-2 code, Alpha-3 code, latitude, longitude, and some additional attributes such as population. The improved dataset benefits from major corrections on the referenced data sets and official reports such as adjustments in the reporting dates, which suffered from a one to two days lag, removing negative values, detecting unreasonable changes in historical data in new reports and corrections on systematic measurement errors, which have been increasing as the pandemic outbreak spreads and more countries contribute data for the official repositories. Additionally, the root mean square error of attributes in the paired comparison of datasets was used to identify the main data problems. The data for China is presented separately and in more detail, and it has been extracted from the attached reports available on the main page of the CCDC website. 
This dataset is a comprehensive and reliable source of worldwide COVID-19 data that can be used in epidemiological models assessing the magnitude and timeline for confirmed cases, long-term predictions of deaths or hospital utilization, the effects of quarantine, stay-at-home orders and other social distancing measures, the pandemic’s turning point or in economic and social impact analysis, helping to inform national and local authorities on how to implement an adaptive response approach to re-opening the economy, re-open schools, alleviate business and social distancing restrictions, design economic programs or allow sports events to resume.
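The description mentions removing negative values and using the root mean square error (RMSE) in a paired comparison of datasets. The sketch below shows one way such a comparison could look. The sample series are invented, and clipping negatives to zero is an assumption; the source does not say how negative values were handled.

```python
import math


def clip_negative(values):
    """Replace negative counts with zero (one possible cleaning rule)."""
    return [max(v, 0) for v in values]


def rmse(series_a, series_b):
    """Root mean square error between two equally long series."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have the same length")
    if not series_a:
        raise ValueError("series must not be empty")
    squared = [(a - b) ** 2 for a, b in zip(series_a, series_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical daily new-case counts for the same country from two sources.
source_a = [10, 12, -3, 20]
source_b = [10, 14, 0, 18]

cleaned_a = clip_negative(source_a)
print(rmse(cleaned_a, source_b))
```

A large RMSE for one attribute relative to the others would suggest that the two sources disagree systematically on that attribute, for example because of a reporting-date lag.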

  11. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Independent University, Bangladesh
    University of Memphis, USA
    Silicon Orchard Lab, Bangladesh
    Authors
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    Introduction

    There are several works based on Natural Language Processing of newspaper reports. In mining opinions from headlines [1] using Stanford NLP and SVM, Rameshbhai et al. compared several algorithms on a small and a large dataset. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [3] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight the virus and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using the LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NNK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, contributing to awareness of the virus.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named the collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and the most read in their respective countries. The collection was done manually by 10 human data collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news reports were highly relevant to the subject. The newspaper sites had dynamic content with advertisements in no particular order, so online scrapers were likely to collect inaccurate news reports. One challenge in collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some of the criteria given as guidelines to the human data collectors were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social and economic genres are to be prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above mentioned newspapers.
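The collection in the source was done by hand, but the headline, body and duplicate criteria above can be expressed as a simple check. The sketch below is only an illustration: the keyword list is hypothetical (the source does not enumerate its keywords), and counting keyword occurrences rather than distinct keywords is an assumption.

```python
import re

# Hypothetical keyword list; the source does not enumerate its keywords.
COVID_KEYWORDS = {
    "covid", "coronavirus", "pandemic", "quarantine",
    "lockdown", "virus", "outbreak",
}


def keyword_hits(text):
    """Count tokens in `text` that appear in the keyword list."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for token in tokens if token in COVID_KEYWORDS)


def accept_report(headline, body, seen_headlines):
    """Apply the headline, body and duplicate criteria to one report."""
    if headline in seen_headlines:
        return False  # avoid taking duplicate reports
    if keyword_hits(headline) < 1:
        return False  # headline needs one or more related words
    if keyword_hits(body) < 5:
        return False  # body needs 5 or more related keywords
    seen_headlines.add(headline)
    return True


seen = set()
headline = "Coronavirus outbreak closes schools"
body = (
    "The virus spread quickly. Officials ordered a lockdown and quarantine "
    "as the pandemic and the outbreak grew."
)
print(accept_report(headline, body, seen))  # True
print(accept_report(headline, body, seen))  # False, already collected
```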

    To collect these data we used a Google Form for USA and BD. Two human editors went through each entry to check for spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.
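The first three steps above can be sketched with the standard library alone. The stop-word list here is a small illustrative one, lowercasing is an added assumption, and reading "non-English alphanumeric characters" as "anything other than English letters, digits and whitespace" is also an assumption. Lemmatization is left out because it needs a linguistic resource that the source does not name.

```python
import re

# Small illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in",
    "is", "are", "was", "on", "for",
}

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
NON_ENGLISH_ALNUM = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text):
    """Remove hyperlinks, non-English alphanumeric characters and stop words."""
    text = URL_PATTERN.sub(" ", text)        # step 1: remove hyperlinks
    text = NON_ENGLISH_ALNUM.sub(" ", text)  # step 2: drop other characters
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOP_WORDS]  # step 3: stop words


sample = "The virus is spreading → see https://example.com/report for updates!"
print(preprocess(sample))  # ['virus', 'spreading', 'see', 'updates']
```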

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could cause a loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.

    The primary data statistics of the two datasets are shown in Tables 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics

    No. of words per headline: 7 to 20
    No. of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No. of words per headline: 10 to 20
    No. of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON format. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
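The authors' regeneration script is not reproduced in this listing. As a rough idea of what regenerating JSON from CSV involves, here is a minimal standard-library sketch; the column names `headline` and `body` are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


# Hypothetical two-column file; the real column names may differ.
sample_csv = "headline,body\nVirus spreads,Officials respond\n"
print(csv_to_json(sample_csv))
```

Regenerating the JSON from the CSV on every update, rather than editing both by hand, keeps the two formats from drifting apart.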

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.
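One-hot encoding, mentioned above, maps each distinct word to a vector that is zero everywhere except at that word's index. A minimal sketch with invented tokens follows; the helper names are illustrative only.

```python
def build_vocabulary(tokens):
    """Assign each distinct token an index, in alphabetical order."""
    return {word: index for index, word in enumerate(sorted(set(tokens)))}


def one_hot(word, vocabulary):
    """Return a vector with a single 1 at the word's vocabulary index."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[word]] = 1
    return vector


tokens = ["virus", "mask", "economy", "virus"]
vocabulary = build_vocabulary(tokens)
print(vocabulary)                    # {'economy': 0, 'mask': 1, 'virus': 2}
print(one_hot("virus", vocabulary))  # [0, 0, 1]
```

The vector length grows with the vocabulary, which is one reason dense word embeddings are often preferred for larger corpora.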

    Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12] and machine translation as used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent are BERT [14], which uses a bidirectional Transformer encoder architecture and can perform near-perfect classification and word prediction, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are typically used as pre-trained models, since training them carries a huge computation cost. Information extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [16]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [17].

    Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and picks the words with the highest weight.
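To make the graph idea concrete, the sketch below builds an undirected co-occurrence graph over tokens and scores the nodes with a PageRank-style iteration. It is a simplified illustration in the spirit of TextRank, not the authors' implementation; the window size, damping factor, iteration count and sample tokens are all assumptions.

```python
from collections import defaultdict


def build_graph(tokens, window=2):
    """Link each token to the tokens that follow it within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def text_rank(tokens, window=2, damping=0.85, iterations=50):
    """Return the tokens ordered from highest to lowest graph score."""
    graph = build_graph(tokens, window)
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        updated = {}
        for word in graph:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(scores[n] / len(graph[n]) for n in graph[word])
            updated[word] = (1 - damping) + damping * incoming
        scores = updated
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical pre-processed tokens; "virus" co-occurs with every other word.
tokens = ["virus", "economy", "virus", "masks", "virus", "election"]
ranking = text_rank(tokens, window=1)
print(ranking[0])  # virus
```

Words that co-occur with many different neighbours accumulate score from all of them, which is why the hub word ranks first in this example.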

    Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2 and 3, we can note a few points:

    In February, both newspapers discussed China and the source of the outbreak.

    StarTribune emphasized Minnesota as the state of most concern. In April, its concern appeared to increase.

    Both newspapers discussed the virus impacting the economy, i.e., banks, elections, administrations and markets.

    Washington Post discussed global issues more than StarTribune.

    StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread in the United States is highlighted more heavily from March through May, displaying the critical impact caused by the virus.

    We used a script to extract all numbers related to certain keywords such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and built a series of case counts for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, with counts rising gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

    We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiment scores ranged from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide us information about how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both the datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
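The keyword-based number extraction mentioned above is not shown in the source, so the following is only a hypothetical sketch of how such a script might work. The keyword list comes from the text; the matching rule (a number followed by a keyword within a few words), the regular expression, and the function name `extract_counts` are assumptions.

```python
import re

# Keywords taken from the text; the matching rule itself is an assumption.
KEYWORDS = ["deaths", "infected", "died", "infections", "quarantined", "lock-down", "diagnosed"]

# A number (with optional thousands separators), then up to three alphabetic
# filler words, then one of the keywords.
PATTERN = re.compile(
    r"(\d[\d,]*)\s+(?:[A-Za-z'-]+\s+){0,3}?(" + "|".join(map(re.escape, KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def extract_counts(text):
    """Return (keyword, number) pairs for numbers that shortly precede a keyword."""
    results = []
    for number, keyword in PATTERN.findall(text):
        results.append((keyword.lower(), int(number.replace(",", ""))))
    return results
```

Summing the extracted numbers per keyword and per month would give a series comparable in spirit to the one described for Figure 4, although the authors' actual rules may differ.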

  12. Data_Sheet_1_Meta-analysis of KAP toward COVID-19 in Chinese residents.docx

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Mar 1, 2024
    Cite
    Deng, Jie; Yang, Yuting; Fang, Yu; Li, Songzhe; Tian, Yanyan; Wang, QiaoLing; Wang, Shumin; Yang, Dongdong (2024). Data_Sheet_1_Meta-analysis of KAP toward COVID-19 in Chinese residents.docx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001288334
    Dataset updated
    Mar 1, 2024
    Authors
    Deng, Jie; Yang, Yuting; Fang, Yu; Li, Songzhe; Tian, Yanyan; Wang, QiaoLing; Wang, Shumin; Yang, Dongdong
    Description

    Background: During the coronavirus disease-2019 (COVID-19) pandemic, there have been many studies on knowledge, attitudes, and practices (KAP) toward prevention of COVID-19 infection in China. Except for symptomatic treatment and vaccination, KAP toward COVID-19 plays an important role in the prevention of COVID-19. There is no systematic evaluation and meta-analysis of KAP toward COVID-19 in China. This study is the earliest meta-analysis of KAP toward COVID-19 in China’s general population. Hence, this systematic review aimed to summarize the knowledge, attitudes, and practices (KAP) of Chinese residents toward COVID-19 during the pandemic.

    Methodology: Following the PRISMA guidelines, articles relevant to COVID-19 KAP that were conducted among the Chinese population were found in databases such as Scopus, ProQuest, PubMed, EMbase, Web of Science, Cochrane Library, China Biology Medicine, China National Knowledge Infrastructure, CQVIP, Wanfang and Google Scholar. A random-effect meta-analysis is used to summarize studies on knowledge, attitudes, and practice levels toward COVID-19 infection in China’s general population.

    Results: Fifty-seven articles published between August 2020 and November 2022 were included in this review. Overall, 75% (95% CI: 72–79%) of Chinese residents had good knowledge about COVID-19, 80% (95% CI: 73–87%) of Chinese residents had a positive attitude toward COVID-19 pandemic control and prevention (they believe that Chinese people will win the battle against the epidemic), and the aggregated proportion of residents with a correct practice toward COVID-19 was 84% (95% CI: 82–87%, I² = 99.7%). In the gender subgroup analysis, there is no significant difference between Chinese men and Chinese women in terms of their understanding of COVID-19. However, Chinese women tend to have slightly higher levels of knowledge and a more positive attitude toward the virus compared to Chinese men. When considering the urban and rural subgroup analysis, it was found that Chinese urban residents have a better understanding of COVID-19 compared to Chinese rural residents. Interestingly, the rural population displayed higher rates of correct behavior and positive attitudes toward COVID-19 compared to the urban population. Furthermore, in the subgroup analysis based on different regions in China, the eastern, central, and southwestern regions exhibited higher levels of knowledge awareness compared to other regions. It is worth noting that all regions in China demonstrated good rates of correct behavior and positive attitudes toward COVID-19.

    Conclusion: This study reviews the level of KAP toward COVID-19 during the pandemic period in China. The results show that the KAP toward COVID-19 in Chinese residents was above a favorable level, but the lack of translation of knowledge into practice should be further reflected on and improved. A subgroup analysis suggests that certain groups need more attention, such as males and people living in rural areas. Policy makers should pay attention to the results of this study and use them as a reference for the development of prevention and control strategies for major public health events that may occur in the future.

    Systematic Review Registration: https://www.crd.york.ac.uk/PROSPERO/display_record.php?RecordID=348246, CRD42022348246.

  13. DataSheet1_The Impact of the COVID-19 Pandemic on Depressive Symptoms in...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Oct 4, 2022
    Cite
    Xie, Liyang; Cai, Weicheng; Zhou, Yi (2022). DataSheet1_The Impact of the COVID-19 Pandemic on Depressive Symptoms in China: A Longitudinal, Population-Based Study.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000300946
    Dataset updated
    Oct 4, 2022
    Authors
    Xie, Liyang; Cai, Weicheng; Zhou, Yi
    Area covered
    China
    Description

    Objectives: We aimed to examine how COVID-19 incidence is associated with depressive symptoms in China, whether the association is transient, and whether the association differs across groups.

    Methods: We used a longitudinal sample from 2018 to 2020 waves of the China Family Panel Study. We constructed COVID-19 incidence rates as the number of new cases per 100,000 population in respondents’ resident provinces in the past 7, 14, and 28 days when a respondent was surveyed. We performed linear or logistic regressions to examine the associations, and performed stratified analyses to explore the heterogeneity of the associations.

    Results: Our sample included 13,655 adults. The 7-day incidence rate was positively associated with the CES-D score (coef. = 2.551, 95% CI: 1.959–3.142), and likelihood of being more depressed (adjusted odds ratio = 6.916, 95% CI: 4.715–10.144). The associations were larger among those with less education, pre-existing depression, or chronic conditions. We did not find any significant association between the 14- or 28-day local incidence rates and depressive symptoms.

    Conclusion: The impact of COVID-19 incidence on mental health in China’s general population was statistically significant and moderate in magnitude and transient. Disadvantaged groups experienced higher increases in depressive symptoms.

  14. Dataset for P003/China: Effect of message framing on motivation to follow...

    • demo-b2find.dkrz.de
    Updated Nov 11, 2025
    Cite
    (2025). Dataset for P003/China: Effect of message framing on motivation to follow vs. defy social distancing guidelines during the COVID 19 pandemic PSA COVID-19 Rapid Project 003/China - Dataset - B2FIND [Dataset]. http://demo-b2find.dkrz.de/dataset/8c7aed20-3ef5-5fdb-b78e-f8e6c04da056
    Dataset updated
    Nov 11, 2025
    Description

    This proposal relates to the call of the PSA COVID-19 Rapid Project 003 data collection effort in China. To slow the transmission of COVID-19, governments around the world are asking their citizens to participate in social distancing, that is, to stay at home as much as possible. In most countries, individuals have some choice over whether or not they follow recommendations for social distancing. Thus, understanding how to best motivate social distancing has become a public health priority. This study tests, in a confirmatory manner, whether self-determination theory-guided message framing impacts people’s motivation to participate in social distancing. Specifically, we expect autonomy-supportive messages that help people understand the value of behavior change to a) increase ‘buy in’, or autonomous motivation, for social distancing, b) reduce feelings of defiance in response to those messages, and c) increase behavioral intentions to socially distance, relative to neutral and controlling messages. Further, we expect that controlling messages that pressure people to change using shame, guilt, and demands may backfire and a) decrease ‘buy in’ for social distancing, b) increase defiance, relative to the control condition, and c) reduce behavioral intentions to socially distance. We also expect ‘buy in’, or autonomous motivation, to explain why messages impact defiance and behavioral intentions. Exploratory tests will examine whether the effects of message framing on motivation, defiance, and behavioral intentions are moderated by culture, provided sufficient variability on this measure is obtained. This work has direct relevance for how public officials, health professionals, journalists, and others can communicate about solving this and future public health crises in ways that motivate people more effectively.

  15. ARCHIVED - Weekly COVID-19 Statistical Data in Scotland

    • dtechtive.com
    • find.data.gov.scot
    csv
    Updated Dec 22, 2022
    Cite
    Public Health Scotland (2022). ARCHIVED - Weekly COVID-19 Statistical Data in Scotland [Dataset]. https://dtechtive.com/datasets/19628
    Explore at:
    Available download formats: csv(0.0537 MB), csv(0.0008 MB), csv(0.0535 MB), csv(0.014 MB), csv(0.1093 MB), csv(0.0265 MB), csv(0.0016 MB), csv(0.0022 MB), csv(0.0729 MB), csv(0.0026 MB), csv(0.0038 MB), csv(0.4845 MB), csv(0.0296 MB), csv(0.0126 MB), csv(0.0732 MB), csv(0.0005 MB), csv(0.0553 MB), csv(0.0002 MB), csv(0.0015 MB), csv(0.0348 MB), csv(0.033 MB), csv(0.0304 MB), csv(0.0551 MB), csv(0.0112 MB), csv(0.0037 MB), csv(0.0317 MB), csv(0.109 MB), csv(0.002 MB), csv(0.0192 MB)
    Dataset updated
    Dec 22, 2022
    Dataset provided by
    Public Health Scotland
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Area covered
    Scotland
    Description

    This open data publication has moved to COVID-19 Statistical Data in Scotland (from 02/11/2022). Novel coronavirus (COVID-19) is a new strain of coronavirus first identified in Wuhan, China. Clinical presentation may range from mild-to-moderate illness to pneumonia or severe acute respiratory infection. This dataset provides information on demographic characteristics (age, sex, deprivation) of confirmed novel coronavirus (COVID-19) cases, as well as trend data regarding the wider impact of the virus on the healthcare system. Data includes information on primary care out of hours consultations, respiratory calls made to NHS24, contact with COVID-19 Hubs and Assessment Centres, incidents received by Scottish Ambulance Services (SAS), as well as COVID-19 related hospital admissions and admissions to ICU (Intensive Care Unit). Further data on the wider impact of the COVID-19 response, focusing on hospital admissions, unscheduled care and volume of calls to NHS24, is available on the COVID-19 Wider Impact Dashboard. There is a large amount of data being regularly published regarding COVID-19 (for example, Coronavirus in Scotland - Scottish Government and Deaths involving coronavirus in Scotland - National Records of Scotland). Additional data sources relating to this topic area are provided in the Links section of the Metadata below. Information on COVID-19, including stay at home advice for people who are self-isolating and their households, can be found on NHS Inform. All publications and supporting material to this topic area can be found in the weekly COVID-19 Statistical Report. The date of the next release can be found on our list of forthcoming publications. Data visualisation is available to view in the interactive dashboard accompanying the COVID-19 Statistical Report. Please note that information on COVID-19 in children and young people of educational age, education staff and educational settings is presented in a new COVID-19 Education Surveillance dataset going forward.

  16. COVID-19 Wider Impacts - Excess Deaths

    • find.data.gov.scot
    csv
    Updated Oct 5, 2023
    + more versions
    Cite
    National Records of Scotland (2023). COVID-19 Wider Impacts - Excess Deaths [Dataset]. https://find.data.gov.scot/datasets/19559
    Explore at:
    Available download formats: csv(0.6786 MB), csv(1.1421 MB), csv(0.0262 MB)
    Dataset updated
    Oct 5, 2023
    Dataset provided by
    National Records of Scotland
    License

    Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
    License information was derived automatically

    Description

    Novel coronavirus (COVID-19) is a new strain of coronavirus first identified in Wuhan, China. Clinical presentation may range from mild-to-moderate illness to pneumonia or severe acute respiratory infection. The COVID-19 pandemic has wider impacts on individuals' health, and their use of healthcare services, than those that occur as the direct result of infection. Reasons for this may include:

    • Individuals being reluctant to use health services because they do not want to burden the NHS or are anxious about the risk of infection.
    • The health service delaying preventative and non-urgent care such as some screening services and planned surgery.
    • Other indirect effects of interventions to control COVID-19, such as mental or physical consequences of distancing measures.

    This dataset provides information on trend data regarding the wider impact of the pandemic on the number of deaths in Scotland, derived from the National Records of Scotland (NRS) weekly deaths registration data. Data show recent trends in deaths (2020), whether COVID or non-COVID related, and historic trends for comparison (five-year average, 2015-2019). The recent trend data are shown by age group and sex, and the national data are also shown by broad area deprivation category (Scottish Index of Multiple Deprivation, SIMD). This data is also available on the COVID-19 Wider Impact Dashboard. Additional data sources relating to this topic area are provided in the Links section of the Metadata below. Information on COVID-19, including stay at home advice for people who are self-isolating and their households, can be found on NHS Inform. All publications and supporting material to this topic area can be found in the weekly COVID-19 Statistical Report. The date of the next release can be found on our list of forthcoming publications.

  17. Data_Sheet_1_Epidemiological Characteristics and Transmissibility for...

    • figshare.com
    • datasetcatalog.nlm.nih.gov
    docx
    Updated May 30, 2023
    Cite
    Shanshan Yu; Shufeng Cui; Jia Rui; Zeyu Zhao; Bin Deng; Chan Liu; Kangguo Li; Yao Wang; Zimei Yang; Qun Li; Tianmu Chen; Shan Wang (2023). Data_Sheet_1_Epidemiological Characteristics and Transmissibility for SARS-CoV-2 of Population Level and Cluster Level in a Chinese City.docx [Dataset]. http://doi.org/10.3389/fpubh.2021.799536.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers
    Authors
    Shanshan Yu; Shufeng Cui; Jia Rui; Zeyu Zhao; Bin Deng; Chan Liu; Kangguo Li; Yao Wang; Zimei Yang; Qun Li; Tianmu Chen; Shan Wang
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Background: To date, there is a lack of sufficient evidence on the type of clusters in which severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most likely to spread. Notably, the differences between cluster-level and population-level outbreaks in epidemiological characteristics and transmissibility remain unclear. Identifying the characteristics of these two levels, including epidemiology and transmission dynamics, allows us to develop better surveillance and control strategies following the current removal of suppression measures in China.

    Methods: We described the epidemiological characteristics of SARS-CoV-2 and calculated its transmissibility by taking a Chinese city as an example. We used descriptive analysis to characterize epidemiological features for coronavirus disease 2019 (COVID-19) incidence database from 1 Jan 2020 to 2 March 2020 in Chaoyang District, Beijing City, China. The susceptible-exposed-infected-asymptomatic-recovered (SEIAR) model was fitted with the dataset, and the effective reproduction number (Reff) was calculated as the transmissibility of a single population. Also, the basic reproduction number (R0) was calculated by definition for three clusters, such as household, factory and community, as the transmissibility of subgroups.

    Results: The epidemic curve in Chaoyang District was divided into three stages. We included nine clusters (subgroups), which comprised of seven household-level and one factory-level and one community-level cluster, with sizes ranging from 2 to 17 cases. For the nine clusters, the median incubation period was 17.0 days [Interquartile range (IQR): 8.4–24.0 days (d)], and the average interval between date of onset (report date) and diagnosis date was 1.9 d (IQR: 1.7 to 6.4 d). At the population level, the transmissibility of the virus was high in the early stage of the epidemic (Reff = 4.81). The transmissibility was higher in factory-level clusters (R0 = 16) than in community-level clusters (R0 = 3), and household-level clusters (R0 = 1).

    Conclusions: In Chaoyang District, the epidemiological features of SARS-CoV-2 showed multi-stage pattern. Many clusters were reported to occur indoors, mostly from households and factories, and few from the community. The risk of transmission varies by setting, with indoor settings being more severe than outdoor settings. Reported household clusters were the predominant type, but the population size of the different types of clusters limited transmission. The transmissibility of SARS-CoV-2 was different between a single population and its subgroups, with cluster-level transmissibility higher than population-level transmissibility.

  18. MDCOVID19 NumberOfPersonsTestedNegative

    • dev-maryland.opendata.arcgis.com
    • coronavirus.maryland.gov
    • +3more
    Updated May 31, 2020
    Cite
    ArcGIS Online for Maryland (2020). MDCOVID19 NumberOfPersonsTestedNegative [Dataset]. https://dev-maryland.opendata.arcgis.com/datasets/67c49f40064c45f9aadfcc9298cba9e6
    Dataset updated
    May 31, 2020
    Dataset authored and provided by
    ArcGIS Online for Maryland
    Description

    Summary: The cumulative number of Maryland residents who tested negative for COVID-19.

    Description: The MD COVID-19 - Number of Persons Tested Negative data layer is a collection of the number of people statewide who have tested negative for COVID-19 reported each day by each local health department via the NEDSS system. COVID-19 is a disease caused by a respiratory virus first identified in Wuhan, Hubei Province, China in December 2019. COVID-19 is a new virus that hasn't caused illness in humans before. Worldwide, COVID-19 has resulted in thousands of infections, causing illness and in some cases death. Cases have spread to countries throughout the world, with more cases reported daily. The Maryland Department of Health reports daily on COVID-19 cases by county.

  19. Dataset from French Multicentre Observational Study on SARS-Cov-2 Infections...

    • data.niaid.nih.gov
    Updated Nov 27, 2024
    + more versions
    Cite
    IDDO; CLAIRE ROGER (2024). Dataset from French Multicentre Observational Study on SARS-Cov-2 Infections (COVID-19) ICU Management: the FRENCH CORONA Study [Dataset]. http://doi.org/10.25934/PR00007463
    Dataset updated
    Nov 27, 2024
    Dataset provided by
    Centre Hospitalier Universitaire de Nîmes
    Authors
    IDDO; CLAIRE ROGER
    Area covered
    French, France
    Description

    Since December 2019, a new agent, the SARS-CoV-2 coronavirus, has been spreading rapidly from China to other countries, causing an international outbreak of respiratory illness named COVID-19. In France, the first cases were reported at the end of January, with more than 60,000 cases reported since then. A significant proportion (20-30%) of hospitalized COVID-19 patients will be admitted to an intensive care unit. However, few data are available for this special population in France.

    We conduct a large observational cohort study of ICU patients with suspected or proven COVID-19, which will make it possible to describe the initial management of COVID-19 patients admitted to the ICU and to identify factors correlated with clinical outcome.

  20. Data from the study on Framing COVID-19 In China: A Multi-platform Content...

    • narcis.nl
    • data.4tu.nl
    • +1more
    Updated Jun 2, 2020
    Cite
    Xun Li (2020). Data from the study on Framing COVID-19 In China: A Multi-platform Content Analysis of The Political Conversation [Dataset]. http://doi.org/10.4121/uuid:cccabb6d-232c-4870-939d-f3c074509d33
    Explore at:
    Available download formats (media types): application/vnd.google-earth.kml+xml, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, text/plain
    Dataset updated
    Jun 2, 2020
    Dataset provided by
    4TU.Centre for Research Data
    Authors
    Xun Li
    Area covered
    China
    Description

    The study examines 30 consecutive episodes of Xinwen Lianbo (541 stories), 332 posts on “cctvxwlianbo” WeChat official account, 161 articles on the front page of People’s Daily newspaper in 30 days, as well as 1,015 hashtags and news articles on Sina Weibo’s two categories of ranking (top searched and real-time hot topic).

    The study's methodology is related to these publications: An, Seon-Kyoung, and Karla K. Gower. “How Do the News Media Frame Crises? A Content Analysis of Crisis News Coverage.” Public Relations Review 35, no. 2 (2009): 107–12. https://doi.org/10.1016/j.pubrev.2009.01.010. Goodall, Catherine; Sabo, Jason; Cline, Rebecca & Egbert Nichole (2012), Threat, Efficacy, and Uncertainty in the First 5 Months of National Print and Electronic News Coverage of the H1N1 Virus, Journal of Health Communication, 17:3, 338-355, DOI: 10.1080/10810730.2011.626499

Cite
GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset

Novel Covid-19 Dataset

Day level Info On Covid-19 affected cases Worldwide

Explore at:
Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Sep 18, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
GHOST5612
License

MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically

Description

Context:

From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

Johns Hopkins University has made an excellent dashboard using the affected cases data. The data is extracted from the associated Google Sheets and made available here.

Edited:

The data is now available as CSV files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for the Terms of Use details. It is uploaded here for use in Kaggle kernels and to gather insights from the broader DS community.

Content

2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

This dataset has daily level information on the number of affected cases, deaths and recoveries from the 2019 novel coronavirus. Please note that this is time-series data, so the number of cases on any given day is the cumulative number.

The data is available from 22 Jan, 2020.

Dataset Description

This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

Files and Columns

1. covid_19_data.csv (Main File)

This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

  • Sno – Serial number of the record
  • ObservationDate – Date of the observation (MM/DD/YYYY)
  • Province/State – Province or state of the observation (may be missing for some entries)
  • Country/Region – Country of the observation
  • Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)
  • Confirmed – Cumulative number of confirmed cases on that date
  • Deaths – Cumulative number of deaths on that date
  • Recovered – Cumulative number of recoveries on that date
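Because Confirmed, Deaths, and Recovered are cumulative, daily new counts have to be derived by differencing consecutive dates. The sketch below shows one way to do that with the standard library. It assumes the column names listed above; the sample rows are made-up placeholders rather than real data, and the function name `daily_new_confirmed` is hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Illustrative rows only (made-up numbers, not real data), using the columns listed above.
SAMPLE = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,A,Exampleland,1/22/2020 17:00,10,0,0
2,01/22/2020,B,Exampleland,1/22/2020 17:00,5,0,0
3,01/23/2020,A,Exampleland,1/23/2020 17:00,14,1,0
4,01/23/2020,B,Exampleland,1/23/2020 17:00,9,0,1
"""


def daily_new_confirmed(csv_file):
    """Sum cumulative Confirmed per country and date, then difference consecutive dates."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(csv_file):
        # ObservationDate is documented as MM/DD/YYYY.
        date = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        # float() first, in case counts are stored as "10.0" rather than "10".
        totals[row["Country/Region"]][date] += int(float(row["Confirmed"]))

    daily = {}
    for country, by_date in totals.items():
        previous = 0
        series = []
        for date in sorted(by_date):
            series.append((date, by_date[date] - previous))
            previous = by_date[date]
        daily[country] = series
    return daily
```

With the real file, the call would look like `daily_new_confirmed(open("covid_19_data.csv", newline=""))`. The first entry of each series equals the cumulative total on the first recorded date, since no earlier value exists to subtract.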

2. 2019_ncov_data.csv (Legacy File)

This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

3. COVID_open_line_list_data.csv

This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

4. COVID19_line_list_data.csv

Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

✅ Use covid_19_data.csv for up-to-date aggregated global trends.

✅ Use the line list datasets for detailed, individual-level case analysis.

Country level datasets:

If you are interested in knowing country level data, please refer to the following Kaggle datasets:

India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

Acknowledgements :

Johns Hopkins University for making the data available for educational and academic research purposes

MoBS lab - https://www.mobs-lab.org/2019ncov.html

World Health Organization (WHO): https://www.who.int/

DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

Macau Government: https://www.ssm.gov.mo/portal/

Taiwan CDC: https://sites.google....
