24 datasets found
  1. COVID-19 Press Briefings Corpus

    • data.niaid.nih.gov
    • live.european-language-grid.eu
    • +1 more
    Updated Jun 2, 2020
    Cite
    Chatsiou, Kakia (2020). COVID-19 Press Briefings Corpus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3872416
    Dataset updated
    Jun 2, 2020
    Dataset provided by
    University of Essex
    Authors
    Chatsiou, Kakia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Coronavirus (COVID-19) Press Briefings Corpus is a work in progress that collects the daily briefings given by government authorities around the world and presents them as a machine-readable text dataset. During the peak of the pandemic, most countries informed their citizens about the status of the pandemic (usually with an update on the numbers of infection cases and deaths) and about other policy decisions for dealing with the health crisis, such as advice on what to do to reduce the spread of the epidemic.

    Daily briefings usually did not take place on Sundays.

    At the moment the dataset includes:

    UK/England: Daily Press Briefings by UK Government between 12 March 2020 - 01 June 2020 (70 briefings in total)

    Scotland: Daily Press Briefings by Scottish Government between 3 March 2020 - 01 June 2020 (76 briefings in total)

    Wales: Daily Press Briefings by Welsh Government between 23 March 2020 - 01 June 2020 (56 briefings in total)

    Northern Ireland: Daily Press Briefings by N. Ireland Assembly between 23 March 2020 - 01 June 2020 (56 briefings in total)

    World Health Organisation: Press Briefings, usually held every 2 days, between 22 January 2020 - 01 June 2020 (63 briefings in total)

    More countries will be added in due course, and we will be keeping this updated to cover the latest daily briefings available.

    The corpus is compiled to allow for further automated political discourse analysis (classification).

  2. Share of people watching the daily Government briefing in the UK March-June...

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Share of people watching the daily Government briefing in the UK March-June 2020 [Dataset]. https://www.statista.com/statistics/1111869/government-coronavirus-briefing-audience-uk/
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2020 - Jun 2020
    Area covered
    United Kingdom
    Description

    The UK Government has been holding daily press briefings in order to provide updates on the coronavirus (COVID-19) pandemic and outline any new measures being put in place to deal with the outbreak. Boris Johnson announced that the UK would be going into lockdown in a broadcast on March 23 which was watched live by more than half of the respondents to a daily survey. On June 28, just ** percent of respondents said they had not watched or read about the previous day's briefing. For further information about the coronavirus (COVID-19) pandemic, please visit our dedicated Facts and Figures page.

  3. Descriptive statistics of the WHO COVID-19 press conference corpus.

    • datasetcatalog.nlm.nih.gov
    • plos.figshare.com
    Updated Mar 13, 2023
    Cite
    Feng, Jiaming; Li, Dapeng; He, Sike; Wen, Ju; Liu, Chang-Hai; Xiong, Ying; Liu, Dan (2023). Descriptive statistics of the WHO COVID-19 press conference corpus. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000946120
    Dataset updated
    Mar 13, 2023
    Authors
    Feng, Jiaming; Li, Dapeng; He, Sike; Wen, Ju; Liu, Chang-Hai; Xiong, Ying; Liu, Dan
    Description

    Descriptive statistics of the WHO COVID-19 press conference corpus.

  4. Hot topics in the WHO COVID-19 press conferences.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    xls
    Updated Jun 21, 2023
    Cite
    Sike He; Dapeng Li; Chang-Hai Liu; Ying Xiong; Dan Liu; Jiaming Feng; Ju Wen (2023). Hot topics in the WHO COVID-19 press conferences. [Dataset]. http://doi.org/10.1371/journal.pone.0282855.t002
    Available download formats: xls
    Dataset updated
    Jun 21, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Sike He; Dapeng Li; Chang-Hai Liu; Ying Xiong; Dan Liu; Jiaming Feng; Ju Wen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Objectives: The objective of this study is to investigate, from a longitudinal perspective, how WHO communicated COVID-19-related information to the public through its press conferences during the first two years of the pandemic.

    Methods: The transcripts of 195 WHO COVID-19 press conferences held between January 22, 2020 and February 23, 2022 were collected. All transcripts were syntactically parsed to extract highly frequent noun chunks that were potential topics of the press conferences. First-order autoregression models were fitted to identify “hot” and “cold” topics. In addition, sentiments and emotions expressed in the transcripts were analyzed using lexicon-based sentiment and emotion analyses. Mann-Kendall tests were performed to capture possible trends in sentiments and emotions over time.

    Results: First, eleven “hot” topics were identified. These topics were pertinent to anti-pandemic measures, disease surveillance and development, and vaccine-related issues. Second, no significant trend was found in sentiments. Finally, significant downward trends were found in anticipation, surprise, anger, disgust, and fear, whereas no significant trends were found in joy, trust, and sadness.

    Conclusions: This retrospective study provided new empirical evidence on how WHO communicated issues pertaining to COVID-19 to the general public through its press conferences. With the help of the study, members of the general public, health organizations, and other stakeholders will be able to better understand the way in which WHO responded to various critical events during the first two years of the pandemic.
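    The Mann-Kendall test named in the Methods checks a time series for a monotonic trend without assuming a particular distribution. The sketch below is not the study's code: it assumes no tied values (so the tie-corrected variance is omitted), uses the normal approximation with a continuity correction for the two-sided p-value, and the function name `mann_kendall` and the example series are illustrative.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test.

    Assumes no tied values, so Var(S) = n(n-1)(2n+5)/18.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S sums the sign of every forward pairwise difference
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: move S one step toward zero
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF, written with erf
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return s, z, p


# Example: a mostly decreasing emotion score gives a negative S and small p
s, z, p = mann_kendall([0.9, 0.8, 0.85, 0.7, 0.6, 0.55, 0.5, 0.4])
print(s, round(z, 3), round(p, 4))
```

    A negative Z with p below the chosen significance level would correspond to the kind of "significant downward trend" reported for emotions such as fear.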

  5. Press Conferences COVID-19 USA

    • kaggle.com
    zip
    Updated Jun 23, 2020
    Cite
    Varisha25 (2020). Press Conferences COVID-19 USA [Dataset]. https://www.kaggle.com/datasets/varisha25/trumpusa
    Available download formats: zip (277855 bytes)
    Dataset updated
    Jun 23, 2020
    Authors
    Varisha25
    Area covered
    United States
    Description

    Dataset

    This dataset was created by Varisha25

    Contents

  6. Press Releases COVID.xlsx

    • figshare.com
    xlsx
    Updated Oct 31, 2022
    Cite
    Greta Maras (2022). Press Releases COVID.xlsx [Dataset]. http://doi.org/10.6084/m9.figshare.21433614.v1
    Available download formats: xlsx
    Dataset updated
    Oct 31, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Greta Maras
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset contains all press releases published by every member of the House of Representatives from their first mention of the COVID-19 pandemic in 2020 until May 30, 2020. Notes about missing data:

    Press releases for former Representative Anthony Brindisi (NY-22) could not be obtained via the Wayback Machine. Several seats were vacant during the time period noted, and others were filled in the middle of it: Chris Collins (NY-27) had no successor; Kweisi Mfume (MD-7) was sworn in on May 5, 2020; Mike Garcia (CA-25) assumed office on May 19, 2020, filling the vacancy left by former Representative Katie Hill; Mark Meadows (NC-11) left his position as a Representative to become White House Chief of Staff on March 30, 2020; John Ratcliffe (TX-4) left his position on May 22, 2020; and Thomas Tiffany (WI-7) was sworn in on May 19, 2020, after former Representative Sean Duffy resigned in 2019.

  7. Transport use during the coronavirus (COVID-19) pandemic and developing...

    • s3.amazonaws.com
    Updated Nov 10, 2021
    Cite
    Department for Transport (2021). Transport use during the coronavirus (COVID-19) pandemic and developing faster indicators of transport activity [Dataset]. https://s3.amazonaws.com/thegovernmentsays-files/content/176/1765494.html
    Dataset updated
    Nov 10, 2021
    Dataset provided by
    GOV.UK (http://gov.uk/)
    Authors
    Department for Transport
    Description

    These statistics on transport use are published weekly.

    For each day, the Department for Transport produces statistics on domestic transport:

    1. road traffic in Great Britain
    2. rail passenger journeys in Great Britain
    3. Transport for London (TfL) tube and bus routes
    4. bus travel in Great Britain (excluding London)
    5. cycling in England

    The full time series for these statistics, starting 1 March 2020, is usually published here every Wednesday at 9.30am.

    The associated methodology notes set out information on the data sources and methodology used to generate these headline measures.

    For the charts previously published alongside daily coronavirus press conferences, please see the slides and datasets to accompany coronavirus press conferences.

    Publications by mode (publication and link; latest period covered and next publication):

    Road traffic: Road traffic statistics. Quarterly data up to September 2020 was published in December 2020. Full annual data up to December 2020 will be published on 28 April 2021. Statistics for the first quarter of 2021 are expected in June 2021.

    Rail usage: The Office of Rail and Road (ORR) publishes a range of statistics, including passenger and freight rail performance and usage, on the ORR website (https://www.orr.gov.uk/published-statistics). Statistics on rail passenger numbers and crowding on weekdays in major cities in England and Wales are published by DfT. ORR's quarterly rail usage statistics for 2020 to 2021 were published on 11 March 2021; quarterly data up to March 2021 and annual data for 2020 to 2021 will be published on 3 June 2021. DfT's most recent annual passenger numbers and crowding statistics, for 2019, were published on 24 September 2020; statistics for 2020 will be released in summer 2021.

    Bus usage: Bus statistics. The most recent annual publication covered the year ending March 2020; data for the year ending March 2021 is due to be published in October 2021. The most recent quarterly publication covered October to December 2020; data for January to March 2021 is due to be published in June 2021.

    TfL tube and bus usage: Data on buses is covered under bus usage above. Station-level busyness data is available from TfL (https://tfl.gov.uk/status-updates/busiest-times-to-travel).

    Cycling usage: Walking and cycling statistics, England. Latest period covered: 2019 calendar year. 2020 calendar year data is due to be published in August 2021.

    Cross-modal and journey by purpose: National Travel Survey. Latest period covered: 2019 calendar year. 2020 calendar year data is due to be published in August 2021.

  8. Coronavirus Source Data (COVID-19) Daily reports

    • kaggle.com
    zip
    Updated Mar 12, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Yassine Hamdaoui (2020). Coronavirus Source Data (COVID-19) Daily reports [Dataset]. https://www.kaggle.com/yassinehamdaoui1/coronavirus-source-data-covid19-daily-reports
    Available download formats: zip (22189 bytes)
    Dataset updated
    Mar 12, 2020
    Authors
    Yassine Hamdaoui
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0): https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    Context

    On January 30, 2020, the International Health Regulations Emergency Committee of the World Health Organization declared the outbreak a public health emergency of international concern (PHEIC). On January 31, 2020, Health and Human Services Secretary Alex M. Azar II declared a public health emergency (PHE) for the United States to aid the nation’s healthcare community in responding to COVID-19. On March 11, 2020 WHO publicly characterized COVID-19 as a pandemic.

    Content

    The data files present total confirmed cases, total deaths, and daily new cases and deaths by country. The data is sourced from the World Health Organization (WHO) Situation Reports, which are published daily and report data as of 10am CET (Geneva time). The main section of each Situation Report consists of long tables of the latest numbers of confirmed cases and confirmed deaths by country.

    This dataset has five files:

    - total_cases.csv: total confirmed cases
    - total_deaths.csv: total deaths
    - new_cases.csv: new confirmed cases
    - new_deathes.csv: new deaths
    - full_data.csv: all of the above combined

    Acknowledgements

    This dataset is sourced from the WHO and confirmed by Our World in Data. Special thanks to Hannah Ritchie, whose reports explain these datasets in detail.

    Inspiration

    Insights on:

    - Confirmed cases: what we do know
    - Confirmed COVID-19 cases by country
    - How we can take preventive measures
    - Growth of cases: how long did it take for the number of confirmed cases to double?
    - Understanding exponential growth
    - Trying to predict the spread of COVID-19 ahead of time
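    The doubling question can be answered from a cumulative series such as the one in total_cases.csv. This is a minimal sketch, assuming one country's counts are already loaded as a chronological list of daily totals (the CSV column layout is not described here, so loading is left out); `doubling_times` is a hypothetical helper, not part of the dataset.

```python
def doubling_times(cumulative):
    """For each day, return how many days ago the cumulative count was
    at most half of today's value, or None if the count is zero or no
    earlier day in the series is that low."""
    result = []
    for today, count in enumerate(cumulative):
        if count <= 0:
            result.append(None)
            continue
        days = None
        # Walk backwards to the most recent day with at most half the count
        for past in range(today - 1, -1, -1):
            if cumulative[past] * 2 <= count:
                days = today - past
                break
        result.append(days)
    return result


cases = [1, 2, 3, 5, 8, 12, 20, 30, 45, 70]
print(doubling_times(cases))  # → [None, 1, 2, 2, 2, 2, 2, 2, 2, 2]
```

    A steady doubling time, as in this toy series, is the signature of exponential growth; a lengthening doubling time indicates that growth is slowing.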

  9. Table1_Shortcomings in Public Health Authorities’ Videos on COVID-19:...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 30, 2023
    Cite
    Marie Therese Shortt; Ionica Smeets; Siri Wiig; Siv Hilde Berg; Daniel Adrian Lungu; Henriette Thune; Jo Røislien (2023). Table1_Shortcomings in Public Health Authorities’ Videos on COVID-19: Limited Reach and a Creative Gap.XLSX [Dataset]. http://doi.org/10.3389/fcomm.2021.764220.s001
    Available download formats: xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Marie Therese Shortt; Ionica Smeets; Siri Wiig; Siv Hilde Berg; Daniel Adrian Lungu; Henriette Thune; Jo Røislien
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Video communication has played a key role in relaying important and complex information on the COVID-19 pandemic to the general public. The aim of the present study is to compare Norwegian health authorities' and WHO's use of video communication during the COVID-19 pandemic with the most viewed COVID-19 videos on YouTube, in order to identify how videos created by health authorities measure up to contemporary video content, both creatively and in reaching video consumers. Through a structured search on YouTube, we found that Norwegian health authorities have published 26 videos, and the WHO 29 videos, on the platform. Press briefings, live videos, news reports, and videos recreated or translated into languages other than English or Norwegian were not included. A content analysis comparing the 55 videos by the health authorities with the 27 most viewed videos on COVID-19 on YouTube demonstrates the poor reach of health authorities' videos in terms of views and reveals a clear creative gap. While videos created by various YouTube creators communicate using a wide range of creative presentation means (such as professional presenters, contextual backgrounds, advanced graphic animations, and humour), videos created by the health authorities are significantly more homogeneous in style, often using field experts or public figures, plain backgrounds, or PowerPoint-style animations. We suggest that future studies investigate various creative presentation means and their influence on reach, recall, and different groups of the population, in order to evaluate the specific factors behind this creative gap.

  10. Communicating the COVID-19 Crisis: A Comparative Analysis of Crisis...

    • data.aussda.at
    • dv05.aussda.at
    pdf, tsv, zip
    Updated Oct 2, 2025
    Cite
    Lore Hayek; Sarah C. Dingler; Christian Schwaderer; Martin Senn; Andreas M. Kraxberger; Nada Ragheb; Fiona C. Nordone (2025). Communicating the COVID-19 Crisis: A Comparative Analysis of Crisis Communication by Governments and Heads of State (SUF edition) [Dataset]. http://doi.org/10.11587/RWHCSF
    Available download formats: tsv (118112), tsv (287823), zip (27074), zip (43474), pdf (395403), tsv (14932922), pdf (141823), zip (4353382), pdf (88033)
    Dataset updated
    Oct 2, 2025
    Dataset provided by
    AUSSDA
    Authors
    Lore Hayek; Sarah C. Dingler; Christian Schwaderer; Martin Senn; Andreas M. Kraxberger; Nada Ragheb; Fiona C. Nordone
    License

    https://data.aussda.at/api/datasets/:persistentId/versions/2.0/customlicense?persistentId=doi:10.11587/RWHCSF

    Area covered
    Austria, New Zealand, Italy, Sweden, Republic of Korea, Hungary, Israel, France, Spain, Canada
    Dataset funded by
    Austrian Science Fund (FWF)
    Description

    Full edition for scientific use. The COVID-19 pandemic confronted the world with a crisis that was massive in scale, rapid in pace, and global in scope. Governments all over the world faced the challenging task of steering anxious publics through the crisis, and they chose televised press conferences as their preferred means of communication. This dataset contains transcripts of 1211 press conferences with 522 speakers in 18 OECD countries and three U.S. states during the initial phase of COVID-19, beginning with the first public mention of COVID-19 in a press conference and ending with the first announcement of easing restrictions. In addition, the dataset includes contextual data on the countries and publicly available biographical data on the speakers.

  11. Transcripts Press Conferences NL on COVID-19

    • kaggle.com
    zip
    Updated May 3, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Varisha25 (2020). Transcripts Press Conferences NL on COVID-19 [Dataset]. https://www.kaggle.com/varisha25/transcripts-press-conferences-nl
    Available download formats: zip (502385 bytes)
    Dataset updated
    May 3, 2020
    Authors
    Varisha25
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    In this dataset you can find all press conferences on the COVID-19 situation, including TV speeches and a speech by the King of the Netherlands.

    Content

    Content ranges from the very first press conference on the coronavirus situation until the 1st of May. All files will be updated weekly. Note that all transcripts are in Dutch.

    Acknowledgements

    All transcripts were retrieved from the Rijksoverheid website.

    Inspiration

    What can we retrieve from the transcripts using text mining?

  12. Coronavirus in Italy (COVID-19)

    • kaggle.com
    zip
    Updated Oct 11, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Paul Mooney (2025). Coronavirus in Italy (COVID-19) [Dataset]. https://www.kaggle.com/paultimothymooney/coronavirus-in-italy
    Available download formats: zip (457027318 bytes)
    Dataset updated
    Oct 11, 2025
    Authors
    Paul Mooney
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Area covered
    Italy
    Description

    Context

    To inform citizens and make the collected data available, the Department of Civil Protection has developed an interactive geographic dashboard accessible at the addresses http://arcg.is/C1unv (desktop version) and http://arcg.is/081a51 (mobile version) and makes available, with CC-BY-4.0 license, the following information updated daily at 18:30 (after the Head of Department press conference). For more detail, see https://github.com/pcm-dpc/COVID-19.

    Content

    COVID-19 data Italy

    National trend, JSON data, Provinces data, Regions data, Summary cards, Areas

    Repository structure:

    COVID-19/
    ├── national-trend/
    │   └── dpc-covid19-eng-national-trend-yyyymmdd.csv
    ├── areas/
    │   ├── geojson/
    │   │   └── dpc-covid19-ita-aree.geojson
    │   └── shp/
    │       └── dpc-covid19-eng-areas.shp
    ├── data-provinces/
    │   └── dpc-covid19-ita-province-yyyymmdd.csv
    ├── data-json/
    │   └── dpc-covid19-eng-*.json
    ├── data-regions/
    │   └── dpc-covid19-eng-regions-yyyymmdd.csv
    └── summary-sheets/
        ├── provinces/
        │   └── dpc-covid19-ita-scheda-province-yyyymmdd.pdf
        └── regions/
            └── dpc-covid19-eng-card-regions-yyyymmdd.pdf

    Data by region
    Directory: data-regions
    Daily file structure: dpc-covid19-ita-regions-yyyymmdd.csv (e.g. dpc-covid19-ita-regions-20200224.csv)
    Overall file: dpc-covid19-eng-regions.csv
    An overall JSON file covering all dates is available in the "data-json" folder: dpc-covid19-eng-regions.json

    Data by province
    Directory: data-provinces
    Daily file structure: dpc-covid19-ita-province-yyyymmdd.csv (e.g. dpc-covid19-ita-province-20200224.csv)
    Overall file: dpc-covid19-ita-province.csv
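    Because the daily files are named by date, the file names for a range of days can be generated rather than typed. This is a minimal sketch using the patterns exactly as written in this description; the upstream GitHub repository may use the original Italian names, so check there before downloading. `daily_file_names` and the two pattern constants are hypothetical names introduced for illustration.

```python
from datetime import date, timedelta

# Patterns copied from the description above; {d} becomes yyyymmdd
REGION_PATTERN = "dpc-covid19-ita-regions-{d}.csv"
PROVINCE_PATTERN = "dpc-covid19-ita-province-{d}.csv"


def daily_file_names(pattern, start, end):
    """Return one file name per day from start to end, inclusive."""
    names = []
    day = start
    while day <= end:
        names.append(pattern.format(d=day.strftime("%Y%m%d")))
        day += timedelta(days=1)
    return names


print(daily_file_names(REGION_PATTERN, date(2020, 2, 24), date(2020, 2, 26)))
# → ['dpc-covid19-ita-regions-20200224.csv',
#    'dpc-covid19-ita-regions-20200225.csv',
#    'dpc-covid19-ita-regions-20200226.csv']
```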

    Acknowledgements

    Banner photo by CDC on Unsplash

    Data from https://github.com/pcm-dpc/COVID-19 released under a CC 4.0 license. See https://github.com/pcm-dpc/COVID-19 for more detail.

  13. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    University of Memphis, USA
    Independent University, Bangladesh
    Silicon Orchard Lab, Bangladesh
    Authors
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    Introduction

    There are several works applying Natural Language Processing to newspaper reports. Rameshbhai et al. [ 1 ] mined opinions from headlines using Stanford NLP and SVM, comparing several algorithms on a small and a large dataset. Rubin et al. [ 2 ] created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types; the purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. [ 3 ] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, relatively little NLP research has focused on COVID-19. Most applications involve classifying chest X-rays and CT scans to detect pneumonia in the lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus [ 5 ][ 6 ][ 7 ] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based publications include sentiment classification of online tweets by Samuel et al. [ 8 ] to understand the fear the virus caused in people. Jelodar et al. [ 9 ] did similar work, using an LSTM network to classify sentiments from online discussion forums. To the best of our knowledge, the NNK dataset is the first study of a comparatively large dataset of newspaper reports on COVID-19, contributing to awareness of the virus.

    2 Dataset Introduction

    2.1 Data Collection

    We collected 1000 online newspaper reports on COVID-19 from the United States of America (USA), drawn from The Washington Post and the StarTribune, and named this dataset “Covid-News-USA-NNK”. We also collected 50 online newspaper reports on the issue from Bangladesh, drawn from The Daily Star and Prothom Alo, and named it “Covid-News-BD-NNK”. All of these newspapers are among the top providers and the most widely read in their respective countries. The collection was done manually by 10 human data collectors with university degrees, in the age group 23-. Manual collection was preferred over automation to ensure that the reports were highly relevant to the subject: the newspapers' websites had dynamic content with advertisements in no particular order, so online scrapers had a high chance of collecting inaccurate news reports. One challenge in collecting the data was the subscription requirement; each newspaper required a $1 subscription. The guidelines given to the human data collectors included the following criteria:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above-mentioned newspapers.
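    The first two criteria are mechanical enough to check in code. This is a minimal sketch, assuming a small hypothetical keyword list (the collectors' actual keywords are not given) and counting keyword occurrences rather than distinct keywords, since the criterion does not say which. It can only catch direct mentions; "indirectly related" words, genre priority, duplicates, and the time frame were judged by the human collectors and are not modeled. `meets_criteria` is a hypothetical name.

```python
import re

# Hypothetical keyword set; the collectors' real list is not published here
COVID_KEYWORDS = {"covid", "covid-19", "coronavirus", "pandemic", "lockdown",
                  "quarantine", "virus", "outbreak", "masks", "vaccine"}


def tokens(text):
    """Lower-case word tokens, keeping hyphenated words such as 'covid-19'."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def meets_criteria(headline, body, min_body_hits=5):
    """Headline needs >= 1 keyword; body needs >= min_body_hits keyword occurrences."""
    headline_hits = sum(t in COVID_KEYWORDS for t in tokens(headline))
    body_hits = sum(t in COVID_KEYWORDS for t in tokens(body))
    return headline_hits >= 1 and body_hits >= min_body_hits


body = "The pandemic forced a lockdown; quarantine rules and masks slowed the virus."
print(meets_criteria("Coronavirus cases rise in Minnesota", body))  # → True
print(meets_criteria("Stock market closes higher", body))           # → False
```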

    To collect the data, we used a Google Form for both the USA and BD datasets. Two human editors went through each entry to check for spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.

    While more preprocessing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. Although this was done with the help of a script, we also assigned the same human collectors to cross-check for any remaining instances of the items listed above.
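    The first three preprocessing steps can be approximated with the Python standard library. This is a minimal sketch, not the authors' script: it assumes a tiny illustrative stop-word list rather than a full one, and it interprets "non-English alphanumeric characters" as anything outside A-Z, a-z, 0-9 and whitespace. Lemmatization is left out because it needs a lexical resource (for example NLTK's WordNet lemmatizer or spaCy).

```python
import re

# Illustrative stop-word list; a real pipeline would use a fuller list
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to",
              "is", "are", "was", "for"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def preprocess(text):
    """Remove hyperlinks, non-English alphanumeric characters, and stop words."""
    text = URL_RE.sub(" ", text)        # step 1: drop hyperlinks
    text = NON_ALNUM_RE.sub(" ", text)  # step 2: keep only A-Z, a-z, 0-9, whitespace
    words = [w for w in text.lower().split() if w not in STOP_WORDS]  # step 3
    return " ".join(words)


print(preprocess("Cases rose in Minnesota!! See https://example.com/report for the details."))
# → cases rose minnesota see details
```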

    The primary statistics of the two datasets are shown in Table 1 and Table 2.

    Table 1: Covid-News-USA-NNK data statistics

    No. of words per headline: 7 to 20
    No. of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No. of words per headline: 10 to 20
    No. of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository, under the account name NKK^1, where we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON files using a Python script. We also provide a Python script for essential operations. We welcome outside collaboration to enrich the dataset.
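    Regenerating JSON from CSV can be done with Python's standard `csv` and `json` modules. This sketch is not the authors' script; it assumes each CSV has a header row and that each row should become one JSON object keyed by those headers. The function names are hypothetical.

```python
import csv
import io
import json


def csv_text_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


def convert_file(csv_path, json_path):
    """Regenerate json_path from csv_path (both UTF-8)."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        json_text = csv_text_to_json(src.read())
    with open(json_path, "w", encoding="utf-8") as dst:
        dst.write(json_text)
```

    Running `convert_file` after every CSV update keeps the two formats in sync, with the CSV files as the single source of truth.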

    3 Literature Review

    Natural Language Processing (NLP) deals with text data in computer science. Because text is categorical rather than numerical, NLP uses diverse methods, such as one-hot encoding and word embeddings, to transform text into numerical representations that can be fed to machine learning and deep learning algorithms.
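    As a concrete illustration of one-hot encoding, each word in a vocabulary maps to a vector of zeros with a single 1 at that word's index. This is a minimal sketch with a toy vocabulary built from the input itself; the function names are hypothetical.

```python
def build_vocab(words):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for w in words:
        if w not in vocab:
            vocab[w] = len(vocab)
    return vocab


def one_hot(word, vocab):
    """Return a list with 1 at the word's index and 0 elsewhere."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


words = "masks reduce virus spread masks".split()
vocab = build_vocab(words)
print(vocab)                    # → {'masks': 0, 'reduce': 1, 'virus': 2, 'spread': 3}
print(one_hot("virus", vocab))  # → [0, 0, 1, 0]
```

    The vectors grow with the vocabulary and treat every pair of words as equally unrelated, which is the limitation that dense word embeddings address.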

    Some well-known applications of NLP include fraud detection on online media sites [ 10 ], authorship attribution in fallback authentication systems [ 11 ], intelligent conversational agents or chatbots [ 12 ], and machine translation as used by Google Translate [ 13 ]. While these are all downstream tasks, several notable developments have also been made in algorithms designed specifically for NLP tasks. The two most prominent are BERT [ 14 ], which uses a bidirectional Transformer encoder and performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI [ 15 ], which can generate almost human-like text. These models are typically used in pre-trained form because training them carries a huge computational cost. Information extraction is the general concept of retrieving information from a dataset. Information extraction from an image could mean retrieving vital feature spaces or targeted portions of the image; from speech, it could mean retrieving information such as names and places [ 16 ]. Information extraction from text could mean identifying named entities, locations, or other essential data. Topic modeling is a sub-task of NLP and also a form of information extraction: it clusters words and phrases from the same context into groups. Topic modeling is an unsupervised learning method that gives a brief overview of a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [ 17 ].

    Keyword extraction is a form of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that builds a graph of words, calculates a weight for each word, and picks the words with the highest weights.
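    TextRank applies PageRank to a graph whose nodes are words and whose edges join words that co-occur within a small window. This is a minimal sketch, assuming tokens are already cleaned and stop-word-filtered; the original TextRank also filters candidates by part of speech and merges adjacent keywords into phrases, which is omitted here. The defaults (a window of 2, meaning adjacent words, and a damping factor of 0.85) are commonly used values, and `textrank_keywords` is a hypothetical name.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by PageRank over an undirected co-occurrence graph.

    Two words are linked if they appear fewer than `window` positions apart.
    """
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    nodes = list(neighbors)
    if not nodes:
        return []
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each node receives a share of each neighbour's score, split by degree
        score = {
            n: (1 - damping)
            + damping * sum(score[m] / len(neighbors[m]) for m in neighbors[n])
            for n in nodes
        }
    ranked = sorted(nodes, key=lambda n: score[n], reverse=True)
    return ranked[:top_k]


print(textrank_keywords("virus spread virus masks virus economy".split(), top_k=1))
# → ['virus']
```

    The most connected word accumulates the most score, which is why a term that co-occurs with many others, like "virus" here, surfaces as the top keyword.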

    Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.
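    A word cloud sizes each word by how often it occurs, so the computation underneath is a frequency count. This is a minimal sketch using `collections.Counter`; the resulting mapping could then be passed to a renderer such as the `wordcloud` library's `WordCloud.generate_from_frequencies`, which is not called here. `word_frequencies` is a hypothetical helper.

```python
from collections import Counter


def word_frequencies(texts, stop_words=frozenset()):
    """Count lower-cased word occurrences across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in text.lower().split() if w not in stop_words)
    return counts


freqs = word_frequencies(["China outbreak spreads", "Outbreak hits China markets"],
                         stop_words={"hits"})
print(freqs.most_common(2))  # → [('china', 2), ('outbreak', 2)]
```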

    4 Experiments and Result Analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month, from February to May. From Figures 1, 2 and 3, we can draw a few observations:

    In February, both newspapers talked about China and the source of the outbreak.

    The StarTribune emphasized Minnesota as the state of most concern, and this concern appears to have grown in April.

    Both newspapers discussed the virus's impact on the economy, e.g., banks, elections, administrations, and markets.

    The Washington Post discussed global issues more than the StarTribune.

    In February, the StarTribune mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread within the United States is more heavily highlighted from March through May, reflecting the critical impact caused by the virus.

    We used a script to extract all numbers related to certain keywords, such as 'Deaths', 'Infected', 'Died', 'Infections', 'Quarantined', 'Lock-down', and 'Diagnosed', from the news reports, and compiled case counts for both newspapers. Figure 4 shows the statistics of this series. From this extraction, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows a positive response against the attack.

    We used VADER sentiment analysis to extract the sentiment of the headlines and the article bodies. On average, the sentiment scores ranged from -0.5 to -0.9 on a VADER scale that runs from -1 (highly negative) to 1 (highly positive). In some cases the sentiment scores of the headline and body contradicted each other, i.e., the headline was negative but the body was slightly positive. Overall, sentiment analysis can help us separate the most concerning (most negative) news from the positive news. From this we can learn more about the indicators related to COVID-19 and the serious impact it caused. Sentiment analysis can also indicate how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from the headlines as well as the body content; PageRank efficiently highlights relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: 'China', 'Government', 'Masks', 'Economy', 'Crisis', 'Theft', 'Stock market', 'Jobs', 'Election', 'Missteps', 'Health', 'Response'. Keyword extraction acts as a filter allowing quick searches for indicators when locating situations of the economy,
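    The extraction script itself is not included here. The following is a minimal sketch of keyword-anchored number extraction. It assumes a simple regular expression that pairs a number with one of the listed keywords appearing at most three words after it. The pattern, the keyword subset, and the helper name `extract_counts` are hypothetical. Real news text also contains phrasings such as "deaths rose to 120", where the number follows the keyword and would need additional patterns.

```python
import re

# Subset of the keywords listed above (an assumption for illustration).
KEYWORDS = ["deaths", "infected", "died", "infections", "quarantined", "diagnosed"]

# A number (optionally with thousands separators), up to three words, then a keyword.
PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+)\s+(?:\w+\s+){0,3}?(" + "|".join(KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def extract_counts(text):
    """Hypothetical helper: return (keyword, number) pairs found in `text`."""
    return [(kw.lower(), int(num.replace(",", ""))) for num, kw in PATTERN.findall(text)]


sample = "Officials said 1,204 people were infected and 37 patients have died."
print(extract_counts(sample))  # [('infected', 1204), ('died', 37)]
```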

  14. COVID-19 Taiwan data, including individual course of disease

    • figshare.com
    xlsx
    Updated Jun 20, 2024
    Yu-Heng Wu; Torbjörn Nordling (2024). COVID-19 Taiwan data, including individual course of disease [Dataset]. http://doi.org/10.6084/m9.figshare.24623964.v2
    Available download formats: xlsx
    Dataset updated
    Jun 20, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yu-Heng Wu; Torbjörn Nordling
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Taiwan
    Description

    This dataset encompasses information on 579 confirmed COVID-19 cases in Taiwan, spanning from January 21 to November 9, 2020. The dataset includes various features such as travel history, age, gender, onset of symptoms, confirmed date, symptoms, critically ill date, recovered date, death date, and details on contact types between cases. In addition to individual case data, supplementary daily summary information is provided, sourced from the Taiwan CDC and covering the period from January 21, 2020, to May 23, 2022. This supplementary dataset furnishes population-level insights into the progression of the COVID-19 pandemic in Taiwan.

    Data Fields:

    Travel History
    Age
    Gender
    Onset of Symptoms
    Confirmed Date
    Symptoms
    Critically Ill Date
    Recovered Date
    Death Date
    Contact Types Between Cases

    Temporal Coverage:

    Individual Case Data: January 21, 2020, to November 9, 2020
    Daily Summary Data: January 21, 2020, to May 23, 2022

    Source:

    Taiwan Centers for Disease Control press release (CDC press release)
    United Daily News (COVID-19 Visualization)
    Taiwan CDC Open Data Portal, Regents of the National Center for High-performance Computing (COVID-19 Dashboard)
    Taiwan Centers for Disease Control open data portal (CDC open data portal)
    Taiwan Centers for Disease Control press conference (CDC press conference)

  15. Country-wise weather data for covid19

    • kaggle.com
    zip
    Updated Apr 2, 2020
    Sudhir Kakumanu (2020). Country-wise weather data for covid19 [Dataset]. https://www.kaggle.com/ksudhir/weather-data-countries-covid19
    Available download formats: zip (35118 bytes)
    Dataset updated
    Apr 2, 2020
    Authors
    Sudhir Kakumanu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    The COVID-19 coronavirus pandemic has over 1 million cases worldwide. This dataset was created in an attempt to uncover whether there is a correlation between country-wise weather parameters and the number of cases growing day by day.

    Many questions have been raised about the effects of seasonality on SARS-CoV-2.

    According to the transcript of the WHO press conference of 5 March 2020, speaker Dr Maria van Kerkhove answered:

    "So we’ve had some questions previously about what this virus will do in different climates, in different temperatures. We have no reason to believe that this virus would behave differently in different temperatures, which is why we want aggressive action in all countries to make sure that we prevent onward transmission, and that it’s taken seriously in every country. But this is something that will be of interest. We have the... In the northern hemisphere we have the flu season, which was ending fairly soon, and in the southern hemisphere we’ll have the flu season starting. And so it will be interesting to see what will happen in the northern hemisphere and the southern hemisphere. But to look at seasonality you need to look at patterns over time, and we do need some of that time to be able to see what happens. So it’s important that we aggressively look for cases, and so that we can understand the extent of infection and how the virus behaves in different populations."

    Some believe that temperature will play a role in the outbreak, and the subject is worth investigating. A few studies by Harvard CSPH, BBC, Bloomberg, and the Centre for Evidence-Based Medicine develop

    Content

    Basic weather parameters, such as minimum/maximum temperature and humidity, have been captured since 1/22/2020. Each country has three rows defining the weather parameters over time. The structure is kept in line with the Johns Hopkins CSSE data repository.
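    The correlation this dataset is meant to probe can be sketched as a plain Pearson coefficient between a weather series and daily new cases for one country. This is a minimal sketch under stated assumptions. The case series is taken to be cumulative, as in the JHU CSSE time series, and the numbers below are placeholders, not values from the dataset. A correlation on its own cannot separate weather effects from confounders such as testing volume or lockdown timing.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def daily_new(cumulative):
    """Turn a cumulative case series into day-over-day new cases."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


# Placeholder values for one hypothetical country (not taken from the dataset).
max_temp = [12.0, 14.5, 13.0, 16.0, 18.5]    # one value per day
cumulative_cases = [10, 14, 21, 25, 30, 33]  # one extra day, so the diff has 5 values
print(round(pearson(max_temp, daily_new(cumulative_cases)), 3))
```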

    Acknowledgements

    Country names are picked from: https://github.com/CSSEGISandData/COVID-19

    Inspiration

    Weather data extraction code: https://github.com/kakumanu-sudhir/covid19/tree/master/weather_data_extraction

    The data begins with the first reported coronavirus case on Jan. 21, 2020. I plan to publish regular updates to the data in this repository (twice weekly until week 23).

  16. Easier Said than Done: Are Responses to the COVID-19 Pandemic Inclusive?

    • data.mendeley.com
    Updated Jul 18, 2020
    Jaclyn Yap (2020). Easier Said than Done: Are Responses to the COVID-19 Pandemic Inclusive? [Dataset]. http://doi.org/10.17632/7v6rzdst99.3
    Dataset updated
    Jul 18, 2020
    Authors
    Jaclyn Yap
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This folder contains primary data for Sign Language Interpretation (SLI) collected from various web sources. The scope of the data collection includes electronically and publicly available government press conferences and press briefings on COVID-19 issued between February 28 and May 26, 2020, as COVID-19 cases spread from Wuhan, China to other countries across the globe. This folder also includes the R code for replicating the spatial graph and the processed SLI data used by the code.

  17. CIC Media: Policy Brief

    • figshare.le.ac.uk
    pdf
    Updated Jul 5, 2022
    Charlotte King; Diane Levine; Fransiska Louwagie; Sarah Weidman; Kara Blackmore (2022). CIC Media: Policy Brief [Dataset]. http://doi.org/10.25392/leicester.data.20038538.v1
    Available download formats: pdf
    Dataset updated
    Jul 5, 2022
    Dataset provided by
    University of Leicester
    Authors
    Charlotte King; Diane Levine; Fransiska Louwagie; Sarah Weidman; Kara Blackmore
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Covid in Cartoons project engaged 15- to 18-year-olds with political cartoons and cartoonists to foster processes of meaning-making in relation to the pandemic. Working with Cartooning for Peace and ShoutOut UK, we engaged young people in building critical narratives of the crisis and its impact on their lives. We aimed to promote an inclusive, socially responsive curriculum that supports young people's ability to cope in difficult circumstances. We used surveys, focus groups, and records of the participants' experiences in the form of workbooks to gather data. The project was led by Dr Fransiska Louwagie (PI) and Dr Diane Levine (Co-I), with postdoctoral associates Dr Kara Blackmore and Dr Sarah Weidman, and ran between January 2021 and July 2022. The Covid in Cartoons team produced a briefing for policy makers in January 2022. We provided insights from our data relating to the recovery curriculum and priorities represented by the Departments for Education and for Culture, Media, and Sport.

  18. Media Briefings by Deena Hinshaw the Chief Medical Officer of Health of...

    • borealisdata.ca
    • search.dataone.org
    Updated Dec 13, 2023
    Geoffrey Rockwell; Bennett Kuwan Tchoh; Katrina Ingram (2023). Media Briefings by Deena Hinshaw the Chief Medical Officer of Health of Alberta [Dataset]. http://doi.org/10.5683/SP3/5IMCW6
    Dataset updated
    Dec 13, 2023
    Dataset provided by
    Borealis
    Authors
    Geoffrey Rockwell; Bennett Kuwan Tchoh; Katrina Ingram
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    Alberta
    Description

    This dataset contains the transcripts of the media briefings given by Dr. Deena Hinshaw, the Chief Medical Officer of Health of Alberta, during the COVID-19 pandemic. The dataset also includes word frequencies (raw frequency, relative frequency, TF-IDF) and a sentiment analysis of the transcripts. The dataset spans the period from March 2020 to March 2022. See the readme document for more information on the dataset. (2023-10-30)
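    The three word-frequency measures named in this description can be illustrated with a small sketch. It computes raw frequency, relative frequency, and TF-IDF using one common weighting, tf × log(N / df). The dataset's readme may use a different TF-IDF variant. The toy transcripts and the helper name `word_stats` are placeholders, not material from the dataset.

```python
import math
import re
from collections import Counter


def word_stats(docs):
    """Hypothetical helper: per-document raw frequency, relative frequency and TF-IDF."""
    tokenized = [re.findall(r"[a-z']+", d.lower()) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    stats = []
    for toks in tokenized:
        raw = Counter(toks)
        total = len(toks)
        stats.append({
            w: {
                "raw": c,                                 # occurrences in this document
                "relative": c / total,                    # share of this document's tokens
                "tfidf": c * math.log(n_docs / df[w]),    # zero if the word is in every document
            }
            for w, c in raw.items()
        })
    return stats


transcripts = [
    "cases are rising and hospital capacity is a concern",
    "vaccine supply is arriving and cases are falling",
]
stats = word_stats(transcripts)
print(stats[0]["hospital"])
```

    Words that appear in every briefing, such as "cases" here, get a TF-IDF of zero. Words specific to one briefing are weighted up.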

  19. COVID-19 Data & scrapy for France South Korea

    • kaggle.com
    zip
    Updated Aug 22, 2021
    Grégory LANG (2021). COVID-19 Data & scrapy for France South Korea [Dataset]. https://www.kaggle.com/jeugregg/covid19-data-scrapy-for-france-south-korea
    Available download formats: zip (6214128 bytes)
    Dataset updated
    Aug 22, 2021
    Authors
    Grégory LANG
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    South Korea, France
    Description

    Context

    An attempt to scrape data on COVID-19 confirmed cases and deaths in 2020 from the official websites of South Korea and France.

    Content

    Script to scrape data (Santé publique France and South Korean KCDC)
    Scrapy results: data on COVID-19 confirmed cases and deaths
    Direct links to the different sources: see Acknowledgements

    I use a very simple R0 model to try to evaluate what would have happened without a lockdown in Hubei, France, South Korea, and Italy (see https://www.kaggle.com/jeugregg/coronavirus-visualization-modeling).
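    The linked analysis's exact model is not described here. As an illustration only, the following is a minimal discrete-time SIR sketch, one common way to build a simple R0-based model. The transmission rate is beta = R0 * gamma, and lowering R0 from a chosen day crudely mimics a lockdown. The function name, its parameters, and all values are hypothetical. The output is not a result for Hubei, France, South Korea, or Italy.

```python
def sir(population, initial_infected, r0, infectious_days, days, r0_after=None, change_day=None):
    """Hypothetical discrete-time SIR model with one-day steps; returns infected counts per day.

    beta = R0 * gamma with gamma = 1 / infectious_days. If r0_after and change_day are given,
    R0 drops to r0_after from change_day onward (a crude stand-in for a lockdown).
    """
    gamma = 1.0 / infectious_days
    s = float(population - initial_infected)  # susceptible
    i = float(initial_infected)               # infected
    r = 0.0                                   # recovered (tracked so s + i + r stays constant)
    infected = [i]
    for day in range(1, days + 1):
        lockdown_active = r0_after is not None and change_day is not None and day >= change_day
        beta = (r0_after if lockdown_active else r0) * gamma
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        infected.append(i)
    return infected


no_lockdown = sir(1_000_000, 10, r0=3.0, infectious_days=7, days=120)
with_lockdown = sir(1_000_000, 10, r0=3.0, infectious_days=7, days=120, r0_after=0.8, change_day=30)
print(f"peak infected without lockdown: {max(no_lockdown):,.0f}")
print(f"peak infected with lockdown:    {max(with_lockdown):,.0f}")
```

    Once R0 drops below 1, each infected person infects fewer than one other on average, so the infected count starts to shrink. This is why the lockdown scenario peaks far lower.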

    Acknowledgements

    The world data is taken from https://github.com/CSSEGISandData/COVID-19 provided by JHU CSSE

    South Korean regional data are retrieved with Scrapy from online KCDC press release articles at https://www.cdc.go.kr/board/board.es?mid=a30402000000&bid=0030.

    French regional data are taken with Scrapy from online santepubliquefrance.fr press articles at https://www.santepubliquefrance.fr/maladies-et-traumatismes/maladies-et-infections-respiratoires/infection-a-coronavirus/articles/infection-au-nouveau-coronavirus-sars-cov-2-covid-19-france-et-monde and https://www.worldometers.info/coronavirus/country/france/, but only until 25 March 2020.

    Overall data for France are from https://www.data.gouv.fr/fr/datasets/donnees-relatives-aux-resultats-des-tests-virologiques-covid-19/

    Overall data for Italy, Germany, and Hubei are from https://www.worldometers.info/coronavirus/

    Inspiration

    What are the results of each country's efforts to fight this virus?

  20. COVID-19 Bangladesh Dataset

    • kaggle.com
    zip
    Updated Apr 18, 2020
    Shuvro Pal (2020). COVID-19 Bangladesh Dataset [Dataset]. https://www.kaggle.com/ridoy11/covid19-bangladesh-dataset
    Available download formats: zip (1375 bytes)
    Dataset updated
    Apr 18, 2020
    Authors
    Shuvro Pal
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Area covered
    Bangladesh
    Description

    Context

    WHO declared COVID-19 a global pandemic. Data science and research communities all over the world came together to fight it in this tough time. This dataset contains date-wise updates of the numbers of confirmed, death, recovered, quarantined, and released-from-quarantine cases for Bangladesh. Hopefully it will help the local community find meaningful insights and patterns in the pandemic, which may save millions of lives.

    Content

    All data are taken from the government website, WHO, DGHS, and Worldometer open-source data. The dataset covers March 1, 2020 to April 3, 2020.

    Column Description

    Date - Specific date
    Confirmed - The number of confirmed cases
    Recovered - The number of recovered cases
    Deaths - The number of deaths
    Quarantine - The number of quarantined cases
    Released From Quarantine - The number of cases released from quarantine
    

    Acknowledgements

    Inspiration

    As the dataset contains date-wise updates of coronavirus cases in Bangladesh, feel free to derive meaningful insights from the data. Share and collaborate to find the factors driving the pandemic in Bangladesh, compute time-series statistics, and so on. Don't forget to suggest useful datasets to merge with this one. Thanks.
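    As one example of the time-series calculations suggested above, the sketch below derives daily new cases and a doubling time from a confirmed-case series. It assumes the Confirmed column is cumulative, which the description does not state explicitly. It uses made-up values rather than rows from the dataset. The helper names and the 7-day window are illustrative choices.

```python
import math


def daily_increase(cumulative):
    """Day-over-day increase of a cumulative series."""
    return [b - a for a, b in zip(cumulative, cumulative[1:])]


def doubling_time(cumulative, window=7):
    """Doubling time in days, assuming exponential growth over the last `window` days."""
    if len(cumulative) <= window:
        raise ValueError("series must be longer than the window")
    start, end = cumulative[-window - 1], cumulative[-1]
    if start <= 0:
        raise ValueError("need a positive count at the start of the window")
    if end <= start:
        return math.inf  # no growth within the window
    daily_growth_rate = math.log(end / start) / window
    return math.log(2) / daily_growth_rate


confirmed = [3, 3, 5, 8, 10, 14, 17, 20, 25, 33, 39, 44]  # placeholder values, not real data
print(daily_increase(confirmed))           # [0, 2, 3, 2, 4, 3, 3, 5, 8, 6, 5]
print(round(doubling_time(confirmed), 1))  # 3.3
```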

COVID-19 Press Briefings Corpus

The corpus is compiled to allow for further automated political discourse analysis (classification).
