37 datasets found
  1. COVID-19 Press Briefings Corpus

    • data.niaid.nih.gov
    • live.european-language-grid.eu
    • +1 more
    Updated Jun 2, 2020
    Cite
    Chatsiou, Kakia (2020). COVID-19 Press Briefings Corpus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3872416
    Dataset updated
    Jun 2, 2020
    Dataset provided by
    University of Essex
    Authors
    Chatsiou, Kakia
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Coronavirus (COVID-19) Press Briefings Corpus is a work in progress to collect the daily briefings given by government authorities around the world and present them as a machine-readable text dataset. During the peak of the pandemic, most countries informed their citizens of the status of the pandemic (usually an update on the number of infection cases and deaths) and of policy-oriented decisions about dealing with the health crisis, such as advice about what to do to reduce the spread of the epidemic.

    Usually daily briefings did not occur on a Sunday.

    At the moment the dataset includes:

    UK/England: Daily Press Briefings by UK Government between 12 March 2020 - 01 June 2020 (70 briefings in total)

    Scotland: Daily Press Briefings by Scottish Government between 3 March 2020 - 01 June 2020 (76 briefings in total)

    Wales: Daily Press Briefings by Welsh Government between 23 March 2020 - 01 June 2020 (56 briefings in total)

    Northern Ireland: Daily Press Briefings by N. Ireland Assembly between 23 March 2020 - 01 June 2020 (56 briefings in total)

    World Health Organisation: Press Briefings, usually occurring every 2 days, between 22 January 2020 - 01 June 2020 (63 briefings in total)

    More countries will be added in due course, and we will be keeping this updated to cover the latest daily briefings available.

    The corpus is compiled to allow for further automated political discourse analysis (classification).

  2. Share of people watching the daily Government briefing in the UK March-June...

    • statista.com
    Updated Jul 9, 2025
    Cite
    Statista (2025). Share of people watching the daily Government briefing in the UK March-June 2020 [Dataset]. https://www.statista.com/statistics/1111869/government-coronavirus-briefing-audience-uk/
    Dataset updated
    Jul 9, 2025
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 2020 - Jun 2020
    Area covered
    United Kingdom
    Description

    The UK Government has been holding daily press briefings in order to provide updates on the coronavirus (COVID-19) pandemic and outline any new measures being put in place to deal with the outbreak. Boris Johnson announced that the UK would be going into lockdown in a broadcast on March 23 which was watched live by more than half of the respondents to a daily survey. On June 28, just ** percent of respondents said they had not watched or read about the previous day's briefing. For further information about the coronavirus (COVID-19) pandemic, please visit our dedicated Facts and Figures page.

  3. Coronavirus Source Data (COVID-19) Daily reports

    • kaggle.com
    zip
    Updated Mar 12, 2020
    Cite
    Yassine Hamdaoui (2020). Coronavirus Source Data (COVID-19) Daily reports [Dataset]. https://www.kaggle.com/yassinehamdaoui1/coronavirus-source-data-covid19-daily-reports
    Explore at:
    Available download formats: zip (22189 bytes)
    Dataset updated
    Mar 12, 2020
    Authors
    Yassine Hamdaoui
    License

    Attribution-NoDerivs 4.0 (CC BY-ND 4.0) https://creativecommons.org/licenses/by-nd/4.0/
    License information was derived automatically

    Description

    Context

    On January 30, 2020, the International Health Regulations Emergency Committee of the World Health Organization declared the outbreak a public health emergency of international concern (PHEIC). On January 31, 2020, Health and Human Services Secretary Alex M. Azar II declared a public health emergency (PHE) for the United States to aid the nation’s healthcare community in responding to COVID-19. On March 11, 2020 WHO publicly characterized COVID-19 as a pandemic.

    Content

    The data files present the total confirmed cases, total deaths, and daily new cases and deaths by country. This data is sourced from the World Health Organization (WHO) Situation Reports. The WHO Situation Reports are published daily [reporting data as of 10am (CET; Geneva time)]. The main section of each Situation Report consists of long tables of the latest numbers of confirmed cases and confirmed deaths by country.

    This dataset has five files:

    - total_cases.csv: total confirmed cases
    - total_deaths.csv: total deaths
    - new_cases.csv: new confirmed cases
    - new_deathes.csv: new deaths
    - full_data.csv: all of the above files combined

    Acknowledgements

    This dataset is sourced from WHO and confirmed by Our World in Data. Special thanks to Hannah Ritchie, who wrote great reports explaining these datasets.

    Inspiration

    Insights on:

    - Confirmed cases are what we do know
    - Confirmed COVID-19 cases by country
    - How we can take preventive measures
    - Growth of cases: how long did it take for the number of confirmed cases to double?
    - Understanding exponential growth
    - Trying to predict the spread of COVID-19 ahead of time
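
    The doubling-time question above can be sketched numerically. The snippet below is a minimal illustration that assumes constant exponential growth between two observations; the function name and the sample counts are hypothetical and are not taken from the dataset.

```python
import math

def doubling_time(cases_start: float, cases_end: float, days: float) -> float:
    """Days needed for cases to double, assuming constant exponential growth."""
    if cases_start <= 0 or cases_end <= cases_start or days <= 0:
        raise ValueError("need 0 < cases_start < cases_end and days > 0")
    growth_rate = math.log(cases_end / cases_start) / days  # per-day rate r
    return math.log(2) / growth_rate

# Hypothetical counts: growing from 100 to 800 cases in 9 days is three doublings.
print(doubling_time(100, 800, 9))
```

    Real case counts rarely grow at a constant rate, so a value like this is probably best computed over a short sliding window of total_cases.csv rather than over the whole series.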

  4. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Independent University, Bangladesh
    University of Memphis, USA
    Silicon Orchard Lab, Bangladesh
    Authors
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    1 Introduction

    There are several works based on Natural Language Processing on newspaper reports. In mining opinions from headlines [1] using Stanford NLP and SVM, Rameshbhai et al. compared several algorithms on a small and a large dataset. Rubin et al., in their paper [2], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. in [3] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [4], a consequence of the virus. Other research areas include studying the genome sequence of the virus [5][6][7] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [8] to understand the fear persisting in people due to the virus. Similar work has been done using an LSTM network to classify sentiments from online discussion forums by Jelodar et al. [9]. To the best of our knowledge, the NKK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We have named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named this collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and the most read in their respective countries. The collection was done manually by 10 human data collectors of age group 23- with university degrees. This approach was preferred over automation to ensure that the news reports were highly relevant to the subject. The newspaper sites had dynamic content with advertisements in no particular order, so online scrapers were highly likely to collect inaccurate news reports. One of the challenges in collecting the data was the subscription requirement: each newspaper charged $1 per subscription. Some of the criteria provided as guidelines to the human data collectors were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news report must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above mentioned newspapers.

    To collect these data we used a Google Form for USA and BD. Two human editors went through each entry to check for spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
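
    The pre-processing steps listed above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' script: the stop-word list is a tiny hypothetical one, lowercasing is an added assumption, and lemmatization is left out because it needs an external resource (for example NLTK's WordNetLemmatizer).

```python
import re

# Tiny illustrative stop-word list (hypothetical); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "to", "and", "at"}

def preprocess(text: str) -> list[str]:
    """Strip hyperlinks and non-alphanumeric characters, then drop stop words."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove hyperlinks
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)         # keep English letters and digits
    tokens = text.lower().split()                       # lowercasing is an assumption
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(preprocess("The virus is spreading: see https://example.com/report (updated)!"))
```

    Because stop words and punctuation are removed, the output is a token list rather than a sentence, which illustrates why the authors chose to limit how much pre-processing they applied.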

    The primary data statistics of the two datasets are shown in Tables 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics

    No. of words per headline: 7 to 20
    No. of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No. of words per headline: 10 to 20
    No. of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository under the account name NKK^1. There, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
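
    Regenerating JSON from CSV, as described above, might look something like the sketch below. It is not the repository's actual script; the function name and the two sample columns are hypothetical, since the real column names are not given here.

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (with a header row) into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)

# Hypothetical two-column sample; the real column names are not stated in the text.
sample = "headline,body\nVirus spreads,Officials urge caution\n"
print(csv_to_json(sample))
```

    Treating the CSV as the single source of truth and always deriving the JSON from it keeps the two formats from drifting apart as reports are added.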

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, using diverse methods such as one-hot encoding and word embedding to transform text into a numerical representation that can be fed to machine learning and deep learning algorithms.

    Some well-known applications of NLP include fraud detection on online media sites [10], authorship attribution in fallback authentication systems [11], intelligent conversational agents or chatbots [12], and machine translation as used by Google Translate [13]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent ones are BERT [14], which uses a bidirectional Transformer encoder and can perform near-perfect classification as well as masked-word and next-sentence prediction, and the GPT-3 models released by OpenAI [15], which can generate almost human-like text. However, these are all pre-trained models since they carry a huge computation cost.

    Information extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [16]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [17].

    Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [18] is an efficient keyword extraction technique that uses a graph to calculate the weight of each word and picks the words with the highest weights.
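
    To make the graph-based idea concrete, here is a simplified TextRank-style sketch: words that co-occur within a small window are linked, and a PageRank-style iteration scores each word. The function name, window size, damping factor, and sample tokens are illustrative assumptions, and this is not the implementation used by the dataset authors.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank words by PageRank over an undirected co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` tokens.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's previous score.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]

# Hypothetical, already pre-processed tokens (stop words removed).
tokens = "virus masks virus economy virus jobs virus election".split()
print(textrank_keywords(tokens))
```

    Words that co-occur with many different words accumulate the most weight, which is why a hub term such as "virus" in the sample rises to the top.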

    Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can point out a few observations:

    In February, both newspapers talked about China and the source of the outbreak.

    StarTribune emphasized Minnesota as the state of most concern. In April, it seemed to have been more concerned.

    Both newspapers talked about the virus impacting the economy, i.e., banks, elections, administrations, and markets.

    Washington Post discussed global issues more than StarTribune.

    StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread in the United States is given more weight from March through May, displaying the critical impact caused by the virus.

    We used a script to extract all numbers related to certain keywords like ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a series of case numbers for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which had risen gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

    We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide us with information about how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,

  5. COVID-19 Cases and Deaths by Age Group - ARCHIVE

    • catalog.data.gov
    • data.ct.gov
    • +1 more
    Updated Aug 12, 2023
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    data.ct.gov (2023). COVID-19 Cases and Deaths by Age Group - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-by-age-group
    Dataset updated
    Aug 12, 2023
    Dataset provided by
    data.ct.gov
    Description

    Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

    The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

    The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6.

    The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22.

    The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

    This dataset contains COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken out by age group. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the daily COVID-19 update.

    Data are reported daily, with timestamps indicated in the daily briefings posted at: portal.ct.gov/coronavirus. Data are subject to future revision as reporting changes. Starting in July 2020, this dataset will be updated every weekday.

    Additional notes: A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020. A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 at 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamps on the datasets on the Open Data Portal differ from the timestamps in DPH's daily PDF reports.

    Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.

  6. Johns Hopkins COVID-19 Case Tracker

    • kaggle.com
    • data.world
    Updated Aug 16, 2020
    Cite
    Cansin Wayne (2020). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://www.kaggle.com/datasets/thecansin/johns-hopkins-covid19-case-tracker
    Dataset updated
    Aug 16, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Cansin Wayne
    Description

    DESCRIPTION

    Johns Hopkins' county-level COVID-19 case and death data, paired with population and rates per 100,000

    SUMMARY

    Updates

    April 9, 2020: The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.

    April 20, 2020: Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.

    April 29, 2020: The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.

    Overview

    The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

    The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

    This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

    The AP is updating this dataset hourly at 45 minutes past the hour.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

    Queries

    Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic.

    Filter cases by state here

    Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac

    Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true

    Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.

    Pull the 100 counties with the highest per-capita confirmed cases here

    Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
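
    The rolling per-capita metric used by several of the queries above can be approximated as in the sketch below. It is not AP's query: the function name and the sample county figures are hypothetical, and it assumes one cumulative case count per day with no gaps.

```python
def rolling_new_cases_per_100k(cumulative, population, window=7):
    """Rolling mean of daily new cases per 100,000 people.

    `cumulative` is a list of cumulative case counts, one per day.
    Returns one value for each day that has a full `window` of daily changes.
    """
    if population <= 0 or window <= 0:
        raise ValueError("population and window must be positive")
    # Daily new cases are the day-over-day differences of the cumulative series.
    new_cases = [b - a for a, b in zip(cumulative, cumulative[1:])]
    rates = []
    for end in range(window, len(new_cases) + 1):
        mean_new = sum(new_cases[end - window : end]) / window
        rates.append(mean_new / population * 100_000)
    return rates

# Hypothetical county: 50,000 residents, 10 new cases every day for 8 days.
cumulative = [0, 10, 20, 30, 40, 50, 60, 70, 80]
print(rolling_new_cases_per_100k(cumulative, population=50_000))
```

    Dividing by population is what lets very small counties rise to the top of a ranking, so it helps to read these rates alongside the raw counts.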


    Caveats

    This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.

    In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.

    In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county".

    This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.

    Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

    Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.

    The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories --...

  7. COVID-19 Cases and Deaths by Gender - ARCHIVE

    • catalog.data.gov
    • data.ct.gov
    Updated Aug 12, 2023
    Cite
    data.ct.gov (2023). COVID-19 Cases and Deaths by Gender - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-by-gender
    Dataset updated
    Aug 12, 2023
    Dataset provided by
    data.ct.gov
    Description

    Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

    The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

    The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6.

    The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22.

    The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

    This dataset contains COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by gender. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the daily COVID-19 update.

    Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines and a reminder of the required reporting to OCME.25,26 As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain if COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19, but for whom an antemortem diagnosis could not be made.31 The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate.

    For standardization and tabulation of mortality statistics, written cause of death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC, which assigns cause of death codes according to the International Classification of Diseases, 10th Revision (ICD-10) classification system.25,26 COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics

    Data are reported daily, with timestamps indicated in the daily briefings posted at: portal.ct.gov/coronavirus. Data are subject to future revision as reporting changes. Starting in Ju

  8. COVID-19 Mexico

    • kaggle.com
    zip
    Updated Apr 19, 2020
    Cite
    Carlos Lira (2020). COVID-19 Mexico [Dataset]. https://www.kaggle.com/carloslira/covid19-mexico
    Explore at:
    Available download formats: zip (52739 bytes)
    Dataset updated
    Apr 19, 2020
    Authors
    Carlos Lira
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Mexico
    Description

    Content

    COVID-19 data for Mexico, consisting of two main datasets:

    time_series_confirmed_MX: time series of confirmed cases by state.
    time_series_deaths_MX: time series of deaths by state.

    The data will be updated every day at the start of the Secretaría de Salud conference (18:00), with the latest information received at 13:00.
    The data is also available on GitHub: https://github.com/carloscerlira/COVIDMX

    Source

    https://www.gob.mx/salud/archivo/documentos?idiom=es&filter_id=395&filter_origin=archive

  9. Table1_Shortcomings in Public Health Authorities’ Videos on COVID-19:...

    • frontiersin.figshare.com
    • datasetcatalog.nlm.nih.gov
    xlsx
    Updated May 30, 2023
    + more versions
    Cite
    Marie Therese Shortt; Ionica Smeets; Siri Wiig; Siv Hilde Berg; Daniel Adrian Lungu; Henriette Thune; Jo Røislien (2023). Table1_Shortcomings in Public Health Authorities’ Videos on COVID-19: Limited Reach and a Creative Gap.XLSX [Dataset]. http://doi.org/10.3389/fcomm.2021.764220.s001
    Explore at:
    xlsxAvailable download formats
    Dataset updated
    May 30, 2023
    Dataset provided by
    Frontiers Mediahttp://www.frontiersin.org/
    Authors
    Marie Therese Shortt; Ionica Smeets; Siri Wiig; Siv Hilde Berg; Daniel Adrian Lungu; Henriette Thune; Jo Røislien
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Video communication has played a key role in relaying important and complex information on the COVID-19 pandemic to the general public. The aim of the present study is to compare Norwegian health authorities’ and WHO’s use of video communication during the COVID-19 pandemic to the most viewed COVID-19 videos on YouTube, in order to identify how videos created by health authorities measure up to contemporary video content, both creatively and in reaching video consumers. Through structured search on YouTube we found that Norwegian health authorities have published 26 videos, and the WHO 29 videos on the platform. Press briefings, live videos, news reports, and videos recreated/translated into other languages than English or Norwegian, were not included. A content analysis comparing the 55 videos by the health authorities to the 27 most viewed videos on COVID-19 on YouTube demonstrates poor reach of health authorities’ videos in terms of views and it elucidates a clear creative gap. While the videos created by various YouTube creators communicate using a wide range of creative presentation means (such as professional presenters, contextual backgrounds, advanced graphic animations, and humour), videos created by the health authorities are significantly more homogenous in style often using field experts or public figures, plain backgrounds or PowerPoint style animations. We suggest that further studies into various creative presentation means and their influence on reach, recall, and on different groups of the population, are carried out in the future to evaluate specific factors of this creative gap.

  10. Media Briefings by Deena Hinshaw the Chief Medical Officer of Health of...

    • search.dataone.org
    • borealisdata.ca
    Updated Dec 28, 2023
    Cite
    Rockwell, Geoffrey; Tchoh, Bennett Kuwan; Ingram, Katrina (2023). Media Briefings by Deena Hinshaw the Chief Medical Officer of Health of Alberta [Dataset]. http://doi.org/10.5683/SP3/5IMCW6
    Explore at:
    Dataset updated
    Dec 28, 2023
    Dataset provided by
    Borealis
    Authors
    Rockwell, Geoffrey; Tchoh, Bennett Kuwan; Ingram, Katrina
    Description

    This dataset contains the transcripts of the media briefings given by Dr. Deena Hinshaw, the Chief Medical Officer of Health of Alberta, during the COVID-19 pandemic. The dataset also includes word frequency (raw frequency, relative frequency, TF-IDF) and sentiment analysis of the transcripts. The dataset spans the period from March 2020 to March 2022. Check the readme document for more information on the dataset. (2023-10-30)
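
    The three word-frequency measures named above can be illustrated with a minimal sketch. The tokenizer, the toy transcripts, and the TF-IDF variant used here (relative frequency multiplied by log(N / document frequency)) are assumptions for illustration only; the dataset's readme is the authority on how its published values were actually computed.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase, split on whitespace, strip simple punctuation (illustrative only).
    stripped = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in stripped if w]

def word_stats(docs):
    """Return one dict per document: {word: (raw, relative, tfidf)}."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    stats = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        stats.append({
            w: (c, c / total, (c / total) * math.log(n_docs / df[w]))
            for w, c in counts.items()
        })
    return stats

# Hypothetical transcripts, not taken from the dataset.
docs = ["Cases are rising in Alberta", "Cases are falling"]
stats = word_stats(docs)
```

    A word that appears in every transcript (here "cases") gets a TF-IDF of zero under this variant, which is why TF-IDF is often preferred over raw counts for finding terms distinctive to a single briefing.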

  11. Data from: PANACEA dataset - Heterogeneous COVID-19 Claims

    • data.niaid.nih.gov
    Updated Jul 15, 2022
    Cite
    Arana-Catania, Miguel; Kochkina, Elena; Zubiaga, Arkaitz; Liakata, Maria; Procter, Rob; He, Yulan (2022). PANACEA dataset - Heterogeneous COVID-19 Claims [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_6493846
    Explore at:
    Dataset updated
    Jul 15, 2022
    Dataset provided by
    Queen-Mary University of London
    University of Warwick
    Authors
    Arana-Catania, Miguel; Kochkina, Elena; Zubiaga, Arkaitz; Liakata, Maria; Procter, Rob; He, Yulan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The peer-reviewed publication for this dataset has been presented in the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), and can be accessed here: https://arxiv.org/abs/2205.02596. Please cite this when using the dataset.

    This dataset contains a heterogeneous set of True and False COVID claims and online sources of information for each claim.

    The claims have been obtained from online fact-checking sources, existing datasets and research challenges. It combines different data sources with different foci, thus enabling a comprehensive approach that combines different media (Twitter, Facebook, general websites, academia), information domains (health, scholar, media), information types (news, claims) and applications (information retrieval, veracity evaluation).

    The processing of the claims included an extensive de-duplication process eliminating repeated or very similar claims. The dataset is presented in a LARGE and a SMALL version, accounting for different degrees of similarity between the remaining claims (excluding respectively claims with a 90% and 99% probability of being similar, as obtained through the MonoT5 model). The similarity of claims was analysed using BM25 (Robertson et al., 1995; Crestani et al., 1998; Robertson and Zaragoza, 2009) with MonoT5 re-ranking (Nogueira et al., 2020), and BERTScore (Zhang et al., 2019).

    The processing of the content also involved removing claims making only a direct reference to existing content in other media (audio, video, photos); automatically obtained content not representing claims; and entries with claims or fact-checking sources in languages other than English.

    The claims were analysed to identify types of claims that may be of particular interest, either for inclusion or exclusion depending on the type of analysis. The following types were identified: (1) Multimodal; (2) Social media references; (3) Claims including questions; (4) Claims including numerical content; (5) Named entities, including: PERSON − People, including fictional; ORGANIZATION − Companies, agencies, institutions, etc.; GPE − Countries, cities, states; FACILITY − Buildings, highways, etc. These entities have been detected using a RoBERTa base English model (Liu et al., 2019) trained on the OntoNotes Release 5.0 dataset (Weischedel et al., 2013) using Spacy.

    The original labels for the claims have been reviewed and homogenised from the different criteria used by each original fact-checker into the final True and False labels.

    The data sources used are:

    The LARGE dataset contains 5,143 claims (1,810 False and 3,333 True), and the SMALL version 1,709 claims (477 False and 1,232 True).

    The entries in the dataset contain the following information:

    • Claim. Text of the claim.

    • Claim label. The labels are: False, and True.

    • Claim source. The sources include mostly fact-checking websites, health information websites, health clinics, public institutions sites, and peer-reviewed scientific journals.

    • Original information source. Information about which general information source was used to obtain the claim.

    • Claim type. The different types, previously explained, are: Multimodal, Social Media, Questions, Numerical, and Named Entities.

    Funding. This work was supported by the UK Engineering and Physical Sciences Research Council (grant no. EP/V048597/1, EP/T017112/1). ML and YH are supported by Turing AI Fellowships funded by the UK Research and Innovation (grant no. EP/V030302/1, EP/V020579/1).

    References

    • Arana-Catania M., Kochkina E., Zubiaga A., Liakata M., Procter R., He Y.. Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims. NAACL 2022 https://arxiv.org/abs/2205.02596

    • Stephen E Robertson, Steve Walker, Susan Jones, Micheline M Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at trec-3. Nist Special Publication Sp,109:109.

    • Fabio Crestani, Mounia Lalmas, Cornelis J Van Rijsbergen, and Iain Campbell. 1998. “is this document relevant?. . . probably” a survey of probabilistic models in information retrieval. ACM Computing Surveys (CSUR), 30(4):528–552.

    • Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.

    • Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pre-trained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 708–718.

    • Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations.

    • Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.

    • Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA, 23.

    • Limeng Cui and Dongwon Lee. 2020. Coaid: Covid-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885.

    • Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. Mm-covid: A multilingual and multimodal data repository for combating covid-19 disinformation.

    • Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, and Sameer Singh. 2020. COVIDLies: Detecting COVID-19 misinformation on social media. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online. Association for Computational Linguistics.

    • Ellen Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. 2021. Trec-covid: constructing a pandemic information retrieval test collection. In ACM SIGIR Forum, volume 54, pages 1–12. ACM New York, NY, USA.

  12. COVID-19 Tests, Cases, Hospitalizations, and Deaths (Statewide) - ARCHIVE

    • data.ct.gov
    • catalog.data.gov
    csv, xlsx, xml
    Updated Jun 24, 2022
    Cite
    Department of Public Health (2022). COVID-19 Tests, Cases, Hospitalizations, and Deaths (Statewide) - ARCHIVE [Dataset]. https://data.ct.gov/w/rf3k-f8fg/wqz6-rhce?cur=vOuL1lYLRwf
    Explore at:
    csv, xlsx, xmlAvailable download formats
    Dataset updated
    Jun 24, 2022
    Dataset authored and provided by
    Department of Public Health
    License

    U.S. Government Workshttps://www.usa.gov/government-works
    License information was derived automatically

    Description

    Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

    The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

    The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

    The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

    The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.
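
    The small-count suppression rule described above can be sketched as follows. The town names and counts are hypothetical, and the sketch applies the stated rule literally (any 7-day count below 5, including zero, is suppressed); the description does not say whether zero counts are treated differently in practice.

```python
SUPPRESSION_THRESHOLD = 5  # "fewer than 5" per the dataset description

def suppress_small_counts(town_counts, threshold=SUPPRESSION_THRESHOLD):
    """Replace 7-day counts below the threshold with None (suppressed)."""
    return {
        town: (count if count >= threshold else None)
        for town, count in town_counts.items()
    }

# Hypothetical 7-day case counts, not real data.
weekly_cases = {"Town A": 12, "Town B": 4, "Town C": 5}
published = suppress_small_counts(weekly_cases)
```

    Consumers of the town-level table should therefore treat missing values as "small but unknown" rather than as zero when aggregating.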

    This dataset contains COVID-19 tests, cases, and associated deaths that have been reported among Connecticut residents. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Hospitalization data were collected by the Connecticut Hospital Association and reflect the number of patients currently hospitalized with laboratory-confirmed COVID-19. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the daily COVID-19 update.

    Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines and a reminder of the required reporting to OCME.25,26 As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain if COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19 but for whom an antemortem diagnosis could not be made.31 The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause of death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC, which assigns cause of death codes according to the International Classification of Diseases, 10th Revision (ICD-10) classification system.25,26 COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics
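
    The case definition above (ICD-10 code U07.1 as either the underlying or a contributing cause) can be expressed as a simple predicate. This is a sketch under stated assumptions: the record layout and field names are hypothetical, and the sample records are invented for illustration rather than drawn from the registry.

```python
COVID_ICD10 = "U07.1"

def is_covid_death(underlying_cause, contributing_causes):
    """True if U07.1 is the underlying cause or one of the contributing causes."""
    return underlying_cause == COVID_ICD10 or COVID_ICD10 in contributing_causes

# Hypothetical coded death records (field names are assumptions).
records = [
    {"underlying": "U07.1", "contributing": ["J18.9"]},
    {"underlying": "I21.9", "contributing": ["U07.1", "E11.9"]},
    {"underlying": "I21.9", "contributing": ["E11.9"]},
]
covid_deaths = [
    r for r in records if is_covid_death(r["underlying"], r["contributing"])
]
```

    Under this definition the first two records count as COVID-19 deaths and the third does not, so counts based on underlying cause alone would be lower than the figures reported here.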

    Data are reported daily, with timestamps indicated in the daily briefings posted at: portal.ct.gov/coronavirus. Data are subject to future revision as reporting changes.

    Starting in July 2020, this dataset will be updated every weekday.

    Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen tests. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplification (NAAT) tests.

    A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020.

    A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 at 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamps on the datasets on the Open Data Portal differ from the timestamps in DPH's daily PDF reports.

    Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.

    Starting April 4, 2022, negative rapid antigen and rapid PCR test results for SARS-CoV-2 are no longer required to be reported to the Connecticut Department of Public Health. Negative test results from laboratory-based molecular (PCR/NAAT) tests are still required to be reported, as are all positive test results from both molecular (PCR/NAAT) and antigen tests.

    On 5/16/2022, 8,622 historical cases were included in the data. The date range for these cases was August 2021 – April 2022.

  13. COVID-19 News by CDC and WHO

    • kaggle.com
    zip
    Updated Apr 29, 2020
    Cite
    xhlulu (2020). COVID-19 News by CDC and WHO [Dataset]. https://www.kaggle.com/xhlulu/covid19-public-health-news-by-cdc-and-who
    Explore at:
    zip(258874 bytes)Available download formats
    Dataset updated
    Apr 29, 2020
    Authors
    xhlulu
    License

    https://www.usa.gov/government-works/

    Description

    Context

    This contains all the press releases, conference call transcripts, official statements, and other newsroom details published by CDC and WHO.

    Method

    Everything was scraped using webscraper.io, and pre-processed with pandas. I did not modify any of the original content. The latest scrape was on April 22.

    Sources

    All the WHO data were retrieved from here, and the CDC data were taken from here.

  14. Datasets for Tweets from Anonymous Physicians about COVID-19 in the U.S.

    • zenodo.org
    • live.european-language-grid.eu
    csv
    Updated Oct 1, 2020
    Cite
    Katherine J. Sullivan (2020). Datasets for Tweets from Anonymous Physicians about COVID-19 in the U.S. [Dataset]. http://doi.org/10.5281/zenodo.4060340
    Explore at:
    csvAvailable download formats
    Dataset updated
    Oct 1, 2020
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Katherine J. Sullivan
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    This dataset was created for a project that assessed Twitter data from physicians posted anonymously by administrators of a specific Twitter user page to better understand physician perspectives and sentiments about COVID-19 in the United States.

    Tweet identifiers are contained in the 'tweet_identifiers.csv' file.

    Other files contain sentiment analysis data; one file used vaderSentiment in Python 3, and the other file used NRC in R (see sources below for further information and use of these packages).

    1. Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
    2. NRC Emotion Lexicon, Saif M. Mohammad and Peter D. Turney, NRC Technical Report, December 2013, Ottawa, Canada.
    3. Jockers ML (2015). Syuzhet: Extract Sentiment and Plot Arcs from Text. https://github.com/mjockers/syuzhet.

    Code used specifically for this project may be found at: https://github.com/sullkath/tweet_analysis

    Link to paper publication:

    Pre-print in bioRxiv available at:

  15. CIC Media: Policy Brief

    • figshare.le.ac.uk
    pdf
    Updated Jul 5, 2022
    Cite
    Charlotte King; Diane Levine; Fransiska Louwagie; Sarah Weidman; Kara Blackmore (2022). CIC Media: Policy Brief [Dataset]. http://doi.org/10.25392/leicester.data.20038538.v1
    Explore at:
    pdfAvailable download formats
    Dataset updated
    Jul 5, 2022
    Dataset provided by
    University of Leicester
    Authors
    Charlotte King; Diane Levine; Fransiska Louwagie; Sarah Weidman; Kara Blackmore
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The Covid in Cartoons project engaged 15-18 year olds with political cartoons and cartoonists to foster processes of meaning-making in relation to the pandemic. Working with Cartooning for Peace and ShoutOut UK we engaged young people in building critical narratives of the crisis and its impact on their lives. We aimed to promote an inclusive, socially-responsive curriculum that supports young people's ability to cope in difficult circumstances. We used surveys, focus groups, and records of the participants' experiences in the form of workbooks to gather data. The project was led by Dr Fransiska Louwagie (PI) and Dr Diane Levine (Co-I), with postdoctoral associates Dr Kara Blackmore and Dr Sarah Weidman, and ran between January 2021 and July 2022. The Covid in Cartoons team produced a briefing for policy makers in January 2022. We provided insights from our data relating to the recovery curriculum and priorities represented by the Departments for Education and Culture, Media, and Sport.

  16. COVID-19: potential media rights revenue loss per school for March Madness...

    • statista.com
    Updated Mar 12, 2020
    Cite
    Statista (2020). COVID-19: potential media rights revenue loss per school for March Madness 2020 [Dataset]. https://www.statista.com/statistics/1104209/coronavirus-revenue-loss-march-madness-media-rights/
    Explore at:
    Dataset updated
    Mar 12, 2020
    Dataset authored and provided by
    Statistahttp://statista.com/
    Area covered
    United States
    Description

    The COVID-19 pandemic at the beginning of 2020 hit the sports industry hard. Many leagues across the globe suspended their seasons, including the NCAA’s Division I men’s basketball tournament, also known as March Madness. This college basketball tournament is very lucrative for the NCAA and its potential complete cancellation could mean a loss of ** million U.S. dollars per school in the Southeastern Conference (SEC) due to the media rights deal with CBS that is now in jeopardy.

  17. COVID-19 Sentiment: 500K Instagram Posts (2020-24)

    • kaggle.com
    zip
    Updated Oct 21, 2024
    Cite
    Nirmalya Thakur, PhD (2024). COVID-19 Sentiment: 500K Instagram Posts (2020-24) [Dataset]. https://www.kaggle.com/datasets/thakurnirmalya/covid-19-sentiment-500k-instagram-posts-2020-24
    Explore at:
    zip(118444389 bytes)Available download formats
    Dataset updated
    Oct 21, 2024
    Authors
    Nirmalya Thakur, PhD
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Please cite the following paper when using this dataset:

    N. Thakur, “Five Years of COVID-19 Discourse on Instagram: A Labeled Instagram Dataset of Over Half a Million Posts for Multilingual Sentiment Analysis”, Proceedings of the 7th International Conference on Machine Learning and Natural Language Processing (MLNLP 2024), Chengdu, China, October 18-20, 2024 (Paper accepted for publication, Preprint available at: https://arxiv.org/abs/2410.03293)

    Abstract

    The outbreak of COVID-19 served as a catalyst for content creation and dissemination on social media platforms, as such platforms serve as virtual communities where people can connect and communicate with one another seamlessly. While there have been several works related to the mining and analysis of COVID-19-related posts on social media platforms such as Twitter (or X), YouTube, Facebook, and TikTok, there is still limited research that focuses on the public discourse on Instagram in this context. Furthermore, the prior works in this field have only focused on the development and analysis of datasets of Instagram posts published during the first few months of the outbreak. The work presented in this paper aims to address this research gap and presents a novel multilingual dataset of 500,153 Instagram posts about COVID-19 published between January 2020 and September 2024. This dataset contains Instagram posts in 161 different languages. After the development of this dataset, multilingual sentiment analysis was performed using VADER and twitter-xlm-roberta-base-sentiment. This process involved classifying each post as positive, negative, or neutral. The results of sentiment analysis are presented as a separate attribute in this dataset.

    For each of these posts, the Post ID, Post Description, Date of publication, language code, full version of the language, and sentiment label are presented as separate attributes in the dataset.

    The Instagram posts in this dataset are present in 161 different languages out of which the top 10 languages in terms of frequency are English (343041 posts), Spanish (30220 posts), Hindi (15832 posts), Portuguese (15779 posts), Indonesian (11491 posts), Tamil (9592 posts), Arabic (9416 posts), German (7822 posts), Italian (5162 posts), Turkish (4632 posts)

    There are 535,021 distinct hashtags in this dataset with the top 10 hashtags in terms of frequency being #covid19 (169865 posts), #covid (132485 posts), #coronavirus (117518 posts), #covid_19 (104069 posts), #covidtesting (95095 posts), #coronavirusupdates (75439 posts), #corona (39416 posts), #healthcare (38975 posts), #staysafe (36740 posts), #coronavirusoutbreak (34567 posts)

    The following is a description of the attributes present in this dataset:

    • Post ID: Unique ID of each Instagram post
    • Post Description: Complete description of each post in the language in which it was originally published
    • Date: Date of publication in MM/DD/YYYY format
    • Language code: Language code (for example: “en”) that represents the language of the post as detected using the Google Translate API
    • Full Language: Full form of the language (for example: “English”) that represents the language of the post as detected using the Google Translate API
    • Sentiment: Results of sentiment analysis (using the preprocessed version of each post) where each post was classified as positive, negative, or neutral
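
    As a minimal sketch of how one row with these attributes might be read, the following parses the MM/DD/YYYY date and checks the sentiment label. The dictionary keys are assumed to match the attribute names listed above, which may differ from the actual column headers in the released file, and the sample row is invented.

```python
from dataclasses import dataclass
from datetime import date, datetime

VALID_SENTIMENTS = {"positive", "negative", "neutral"}

@dataclass
class Post:
    post_id: str
    description: str
    published: date
    language_code: str
    full_language: str
    sentiment: str

def parse_post(row):
    """Convert one row (a dict of strings) into a typed Post record."""
    sentiment = row["Sentiment"].strip().lower()
    if sentiment not in VALID_SENTIMENTS:
        raise ValueError(f"unexpected sentiment label: {row['Sentiment']!r}")
    return Post(
        post_id=row["Post ID"],
        description=row["Post Description"],
        # The dataset description states dates are in MM/DD/YYYY format.
        published=datetime.strptime(row["Date"], "%m/%d/%Y").date(),
        language_code=row["Language code"],
        full_language=row["Full Language"],
        sentiment=sentiment,
    )

# Hypothetical row, not taken from the dataset.
sample = {
    "Post ID": "abc123",
    "Post Description": "Stay safe everyone #covid19",
    "Date": "03/15/2021",
    "Language code": "en",
    "Full Language": "English",
    "Sentiment": "Positive",
}
post = parse_post(sample)
```

    Parsing the date explicitly with the month-first format avoids the day/month ambiguity that automatic date inference can introduce for values such as 03/04/2021.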

    Open Research Questions

    This dataset is expected to be helpful for the investigation of the following research questions and even beyond:

    • How does sentiment toward COVID-19 vary across different languages?
    • How has public sentiment toward COVID-19 evolved from 2020 to the present?
    • How do cultural differences affect social media discourse about COVID-19 across various languages?
    • How has COVID-19 impacted mental health, as reflected in social media posts across different languages?
    • How effective were public health campaigns in shifting public sentiment in different languages?
    • What patterns of vaccine hesitancy or support are present in different languages?
    • How did geopolitical events influence public sentiment about COVID-19 in multilingual social media discourse?
    • What role does social media discourse play in shaping public behavior toward COVID-19 in different linguistic communities?
    • How does the sentiment of minority or underrepresented languages compare to that of major world languages regarding COVID-19?
    • What insights can be gained by comparing the sentiment of COVID-19 posts in widely spoken languages (e.g., English, Spanish) to those in less common languages?

    All the Instagram posts that were collected during this data mining process to develop this dataset were publicly available on Instagram and did not require a user to log in to Instagram to view the same (at the time of writing this paper).

  18. Dataset: Characterizing Anti-Asian Rhetoric During The COVID-19 Pandemic: A...

    • zenodo.org
    • data.niaid.nih.gov
    application/gzip, pdf +2
    Updated Jul 16, 2024
    Cite
    Ramya Tekumalla; Zia Baig; Michelle Pan; Luis Alberto Robles Hernandez; Michael Wang; Juan M. Banda (2024). Dataset: Characterizing Anti-Asian Rhetoric During The COVID-19 Pandemic: A Sentiment Analysis Case Study on Twitter [Dataset]. http://doi.org/10.5281/zenodo.6523152
    Explore at:
    pdf, text/x-python, tsv, application/gzipAvailable download formats
    Dataset updated
    Jul 16, 2024
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Ramya Tekumalla; Zia Baig; Michelle Pan; Luis Alberto Robles Hernandez; Michael Wang; Juan M. Banda
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This is the dataset, trained model, and software companion for the paper titled: Characterizing Anti-Asian Rhetoric During The COVID-19 Pandemic: A Sentiment Analysis Case Study on Twitter accepted for the Workshop on Data for the Wellbeing of Most Vulnerable of the ICWSM 2022 conference.

    The COVID-19 pandemic has shown a measurable increase in the usage of sinophobic comments or terms on online social media platforms. In the United States, Asian Americans have been primarily targeted by violence and hate speech stemming from negative sentiments about the origins of the novel SARS-CoV-2 virus. While most published research focuses on extracting these sentiments from social media data, it does not connect the specific news events during the pandemic with changes in negative sentiment on social media platforms. In this work we combine and enhance publicly available resources with our own manually annotated set of tweets to create machine learning classification models to characterize the sinophobic behavior. We then applied our classifier to a pre-filtered longitudinal dataset spanning two years of pandemic related tweets and overlay our findings with relevant news events.

  19. CT-FAN: A Multilingual dataset for Fake News Detection

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Oct 23, 2022
    Cite
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel (2022). CT-FAN: A Multilingual dataset for Fake News Detection [Dataset]. http://doi.org/10.5281/zenodo.6555293
    Explore at:
    zipAvailable download formats
    Dataset updated
    Oct 23, 2022
    Dataset provided by
    Zenodohttp://zenodo.org/
    Authors
    Gautam Kishore Shahi; Julia Maria Struß; Thomas Mandl; Juliane Köhler; Michael Wiegand; Melanie Siegel
    Description

    By downloading the data, you agree with the terms & conditions mentioned below:

    Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes.

    Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

    We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

    Citation

    Please cite our work as

    @InProceedings{clef-checkthat:2022:task3,
    author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
    title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
    year = {2022},
    booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
    series = {CLEF~'2022},
    address = {Bologna, Italy},}
    
    @article{shahi2021overview,
     title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
     author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
     journal={Working Notes of CLEF},
     year={2021}
    }

    Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

    Task 3: Multi-class fake news detection of news articles (English). Sub-task A is designed as a four-class classification problem: given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and comprises roughly 1264 English-language articles with their respective labels. Our definitions for the categories are as follows:

    • False - The main claim made in an article is untrue.

    • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

    • True - This rating indicates that the primary elements of the main claim are demonstrably true.

    • Other - An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.

    Cross-Lingual Task (German)

    Along with the multi-class task for the English language, we have introduced a task for a low-resource language. We will provide the test data in German. The idea of the task is to use the English data and the concept of transfer to build a classification model for German.

    Input Data

    The data will be provided with the columns ID, title, text, rating, and domain; the columns are described as follows:

    • ID - Unique identifier of the news article
    • Title - Title of the news article
    • text - Text mentioned inside the news article
    • our rating - Class of the news article: false, partially false, true, or other
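
    The column description above can be turned into a small loading step. The following is a minimal sketch, not the organisers' code: it assumes the data arrives as comma-separated text with a header row, and the header names `public_id`, `title`, `text`, and `our rating` are assumptions that should be adjusted to match the released files. The helper name `load_articles` is hypothetical.

```python
import csv
import io

# The four classes named in the task description.
VALID_RATINGS = {"false", "partially false", "true", "other"}


def load_articles(csv_text, id_col="public_id", rating_col="our rating"):
    """Parse CSV text into a list of dicts, normalising and checking the rating.

    Column names are assumptions; adjust them to the released files.
    """
    articles = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Ratings are compared case-insensitively against the four classes.
        rating = row[rating_col].strip().lower()
        if rating not in VALID_RATINGS:
            raise ValueError(f"unexpected rating {rating!r} for id {row[id_col]}")
        articles.append({
            "id": row[id_col].strip(),
            "title": row["title"].strip(),
            "text": row["text"].strip(),
            "rating": rating,
        })
    return articles


# Made-up rows for illustration only; they are not taken from the dataset.
sample = (
    "public_id,title,text,our rating\n"
    "1,Example headline,Example body text,False\n"
    "2,Another headline,More body text,Partially False\n"
)
articles = load_articles(sample)
```

    Rejecting unknown labels early keeps a misspelt or unexpected class from silently becoming a fifth category during training.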

    Output data format

    • public_id - Unique identifier of the news article
    • predicted_rating - Predicted class

    Sample File

    public_id, predicted_rating
    1, false
    2, true
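
    A file in the shape of the sample above can be produced with the standard library. This is a minimal sketch under stated assumptions: the helper name `format_submission` is hypothetical, and the output uses plain commas without the space shown in the sample, since the description does not say whether that space is required.

```python
import csv
import io

# The four classes named in the task description.
VALID_RATINGS = {"false", "partially false", "true", "other"}


def format_submission(predictions):
    """Return submission text: a header row, then one line per article.

    `predictions` maps an article id to one of the four class labels.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["public_id", "predicted_rating"])
    for public_id, rating in predictions.items():
        # Refuse labels outside the four defined classes.
        if rating not in VALID_RATINGS:
            raise ValueError(f"unexpected rating {rating!r} for id {public_id}")
        writer.writerow([public_id, rating])
    return buffer.getvalue()


submission = format_submission({1: "false", 2: "true"})
```

    Writing the returned string to disk, for example with `pathlib.Path("predictions.csv").write_text(submission)`, gives a file to submit; the file name here is illustrative.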

    IMPORTANT!

    1. We have used data from 2010 to 2022, and the fake news content spans several topics, such as elections and COVID-19.

    Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498

    Related Work

    • Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1. https://arxiv.org/pdf/2010.00502.pdf
    • G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
    • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media, 22, 100104. doi: 10.1016/j.osnem.2020.100104
    • Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
    • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
    • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.
  20. COVID-19 Taiwan data, including individual course of disease

    • figshare.com
    xlsx
    Updated Jun 20, 2024
    Cite
    Yu-Heng Wu; Torbjörn Nordling (2024). COVID-19 Taiwan data, including individual course of disease [Dataset]. http://doi.org/10.6084/m9.figshare.24623964.v2
    Available download formats: xlsx
    Dataset updated
    Jun 20, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Yu-Heng Wu; Torbjörn Nordling
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Taiwan
    Description

    This dataset encompasses information on 579 confirmed COVID-19 cases in Taiwan, spanning from January 21 to November 9, 2020. The dataset includes various features such as travel history, age, gender, onset of symptoms, confirmed date, symptoms, critically ill date, recovered date, death date, and details on contact types between cases.

    In addition to individual case data, supplementary daily summary information is provided, sourced from the Taiwan CDC and covering the period from January 21, 2020, to May 23, 2022. This supplementary dataset furnishes population-level insights into the progression of the COVID-19 pandemic in Taiwan.

    Data Fields:

    • Travel History
    • Age
    • Gender
    • Onset of Symptoms
    • Confirmed Date
    • Symptoms
    • Critically Ill Date
    • Recovered Date
    • Death Date
    • Contact Types Between Cases

    Temporal Coverage:

    • Individual Case Data: January 21, 2020, to November 9, 2020
    • Daily Summary Data: January 21, 2020, to May 23, 2022

    Source:

    • Taiwan Centers for Disease Control press release (CDC press release)
    • United Daily News (COVID-19 Visualization)
    • Taiwan CDC Open Data Portal, Regents of the National Center for High-performance Computing (COVID-19 Dashboard)
    • Taiwan Centers for Disease Control open data portal (CDC open data portal)
    • Taiwan Centers for Disease Control press conference (CDC press conference)

Cite
Chatsiou, Kakia (2020). COVID-19 Press Briefings Corpus [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3872416

COVID-19 Press Briefings Corpus

Dataset updated
Jun 2, 2020
Dataset provided by
University of Essex
Authors
Chatsiou, Kakia
License

Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically

Description

The Coronavirus (COVID-19) Press Briefings Corpus is a work in progress to collect the daily briefings given by government authorities around the world and present them as a machine-readable text dataset. During the peak of the pandemic, most countries informed their citizens of the status of the pandemic (usually an update on the number of infection cases and deaths) and of other policy-oriented decisions about dealing with the health crisis, such as advice on how to reduce the spread of the epidemic.

Usually daily briefings did not occur on a Sunday.

At the moment the dataset includes:

UK/England: Daily Press Briefings by UK Government between 12 March 2020 - 01 June 2020 (70 briefings in total)

Scotland: Daily Press Briefings by Scottish Government between 3 March 2020 - 01 June 2020 (76 briefings in total)

Wales: Daily Press Briefings by Welsh Government between 23 March 2020 - 01 June 2020 (56 briefings in total)

Northern Ireland: Daily Press Briefings by N. Ireland Assembly between 23 March 2020 - 01 June 2020 (56 briefings in total)

World Health Organisation: Press Briefings, usually occurring every 2 days, between 22 January 2020 - 01 June 2020 (63 briefings in total)

More countries will be added in due course, and we will be keeping this updated to cover the latest daily briefings available.

The corpus is compiled to allow for further automated political discourse analysis (classification).
