Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Coronavirus (COVID-19) Press Briefings Corpus is a work in progress that collects and presents, as a machine-readable text dataset, the daily briefings given around the world by government authorities. During the peak of the pandemic, most countries informed their citizens of the status of the pandemic (usually an update on the numbers of infection cases and deaths) and of policy decisions for dealing with the health crisis, such as advice on reducing the spread of the epidemic.
Daily briefings usually did not occur on Sundays.
At the moment the dataset includes:
UK/England: Daily Press Briefings by the UK Government, 12 March 2020 - 01 June 2020 (70 briefings in total)
Scotland: Daily Press Briefings by the Scottish Government, 3 March 2020 - 01 June 2020 (76 briefings in total)
Wales: Daily Press Briefings by the Welsh Government, 23 March 2020 - 01 June 2020 (56 briefings in total)
Northern Ireland: Daily Press Briefings by the N. Ireland Assembly, 23 March 2020 - 01 June 2020 (56 briefings in total)
World Health Organisation: Press Briefings, usually occurring every 2 days, 22 January 2020 - 01 June 2020 (63 briefings in total)
More countries will be added in due course, and we will be keeping this updated to cover the latest daily briefings available.
The corpus is compiled to allow for further automated political discourse analysis (classification).
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains case and test data by date of sample submission; the death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information from June 21, 2022, to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6.

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information from June 16, 2022, to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22.

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information from June 16, 2022, to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

This dataset reports COVID-19 cases and associated deaths among Connecticut residents, broken out by age group. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the daily COVID-19 update.
Data are reported daily, with timestamps indicated in the daily briefings posted at portal.ct.gov/coronavirus. Data are subject to future revision as reporting changes. Starting in July 2020, this dataset will be updated every weekday.

Additional notes:

A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020.

A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 at 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamps on the datasets on the Open Data Portal differ from the timestamps in DPH's daily PDF reports.

Starting 5/10/2021, the date field represents the date the data was updated on data.ct.gov. Previously, the date the data was pulled by DPH was listed, which typically coincided with the day before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.
The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.
The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information from June 21, 2022, to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6.
The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information from June 16, 2022, to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22.
The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information from June 16, 2022, to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.
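The stated suppression rule (withholding 7-day town counts below 5) can be sketched as a simple filter. This is a hypothetical illustration only; the function name and town figures below are invented, and DPH's actual pipeline is not published here.

```python
def suppress_small_counts(count, threshold=5):
    """Hypothetical helper: withhold (return None) any 7-day count below the threshold."""
    return None if count < threshold else count

# Illustrative 7-day town counts, not taken from the dataset
weekly_cases = {"Andover": 3, "Hartford": 412}
published = {town: suppress_small_counts(n) for town, n in weekly_cases.items()}
# small counts are withheld (None); larger ones pass through unchanged
```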
COVID-19 tests, cases, and associated deaths that have been reported among Connecticut residents. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Hospitalization data were collected by the Connecticut Hospital Association and reflect the number of patients currently hospitalized with laboratory-confirmed COVID-19. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the daily COVID-19 update.
Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines, and a reminder of the required reporting to OCME. As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain whether COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19 but for whom an antemortem diagnosis could not be made. The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, the written cause-of-death statements made by certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC, which assigns cause-of-death codes according to the International Classification of Diseases, Tenth Revision (ICD-10). COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics
Data are reported daily, with timestamps indicated in the daily briefings posted at: portal.ct.gov/coronavirus. Data are subject to future revision as reporting changes.
Starting in July 2020, this dataset will be updated every weekday.
Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to the reported test counts in this dataset. The tests included in this dataset comprise both molecular and antigen tests. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplification (NAAT) tests.
A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020.
A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 at 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamps on the datasets on the Open Data Portal differ from the timestamps in DPH's daily PDF reports.
Starting 5/10/2021, the date field represents the date the data was updated on data.ct.gov. Previously, the date the data was pulled by DPH was listed, which typically coincided with the day before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.
Starting April 4, 2022, negative rapid antigen and rapid PCR test results for SARS-CoV-2 are no longer required to be reported to the Connecticut Department of Public Health. Negative results from laboratory-based molecular (PCR/NAAT) tests are still required to be reported, as are all positive results from both molecular (PCR/NAAT) and antigen tests.
On 5/16/2022, 8,622 historical cases were included in the data. The date range for these cases was from August 2021 to April 2022.
Twitter Data on Covid-19 for Aspect-Based Sentiment Analysis in Brazilian Portuguese
Opinion mining, or sentiment analysis, is an especially relevant task when the world is living through a pandemic. Extracting and classifying fine-grained opinions and polarity may provide insight that helps public and private agencies make better decisions against the coronavirus pandemic. Therefore, in this paper, we provide a new dataset of Twitter data related to the novel coronavirus (2019-nCoV), composed of 600 tweets manually annotated with aspect-level targets and binary polarity (positive or negative).
The OPCovid-Br dataset consists of Twitter data for fine-grained opinion mining and sentiment analysis applications in the Portuguese language. It is composed of 600 tweets annotated with opinion aspects, as well as binary document polarity (positive or negative).
The authors are grateful to CAPES and CNPq for supporting this work.
Vargas, F.A., Santos, R.S.S., and Rocha, P.R. (2020). Identifying fine-grained opinion and classifying polarity of twitter data on coronavirus pandemic. Proceedings of the 9th Brazilian Conference on Intelligent Systems (BRACIS 2020), Rio Grande, RS, Brazil.
@inproceedings{VargasEtAll2020, author = {Francielle Alves Vargas and Rodolfo Sanches Saraiva Dos Santos and Pedro Regattieri Rocha}, title = {Identifying fine-grained opinion and classifying polarity of twitter data on coronavirus pandemic}, booktitle = {Proceedings of the 9th Brazilian Conference on Intelligent Systems (BRACIS 2020)}, pages = {01-10}, year = {2020}, address = {Rio Grande, RS, Brazil}, crossref = {http://bracis2020.c3.furg.br/acceptedPapers.html}, }
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The peer-reviewed publication for this dataset has been presented in the 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), and can be accessed here: https://arxiv.org/abs/2205.02596. Please cite this when using the dataset.
This dataset contains a heterogeneous set of True and False COVID claims and online sources of information for each claim.
The claims have been obtained from online fact-checking sources, existing datasets, and research challenges. The dataset combines data sources with different foci, enabling a comprehensive approach that spans different media (Twitter, Facebook, general websites, academia), information domains (health, scholarly, media), information types (news, claims), and applications (information retrieval, veracity evaluation).
The processing of the claims included an extensive de-duplication process eliminating repeated or very similar claims. The dataset is presented in a LARGE and a SMALL version, accounting for different degrees of similarity between the remaining claims (excluding respectively claims with a 90% and 99% probability of being similar, as obtained through the MonoT5 model). The similarity of claims was analysed using BM25 (Robertson et al., 1995; Crestani et al., 1998; Robertson and Zaragoza, 2009) with MonoT5 re-ranking (Nogueira et al., 2020), and BERTScore (Zhang et al., 2019).
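As a rough illustration of the first-stage similarity screening, a minimal pure-Python Okapi BM25 scorer can rank candidate near-duplicate claims. This is a simplified sketch only; the actual pipeline also used MonoT5 re-ranking and BERTScore, which are not reproduced here, and the claims below are invented.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    df = Counter()                      # document frequency per term
    for doc in docs_tokens:
        df.update(set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores

# Invented example claims: the first two are near-duplicates
claims = [
    "garlic cures covid-19",
    "eating garlic prevents coronavirus infection",
    "the vaccine contains microchips",
]
tokenized = [c.split() for c in claims]
scores = bm25_scores(tokenized[0], tokenized)
# the near-duplicate garlic claim outscores the unrelated one
```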
The processing of the content also involved removing claims making only a direct reference to existing content in other media (audio, video, photos); automatically obtained content not representing claims; and entries with claims or fact-checking sources in languages other than English.
The claims were analysed to identify types of claims that may be of particular interest, either for inclusion or exclusion depending on the type of analysis. The following types were identified: (1) Multimodal; (2) Social media references; (3) Claims including questions; (4) Claims including numerical content; (5) Named entities, including: PERSON − people, including fictional; ORGANIZATION − companies, agencies, institutions, etc.; GPE − countries, cities, states; FACILITY − buildings, highways, etc. These entities were detected using a RoBERTa base English model (Liu et al., 2019) trained on the OntoNotes Release 5.0 dataset (Weischedel et al., 2013), via spaCy.
The original labels for the claims have been reviewed and homogenised from the different criteria used by each original fact-checker into the final True and False labels.
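The homogenisation step can be pictured as a verdict-to-label mapping. The mapping below is hypothetical (the actual review was editorial and the verdict strings are invented); it only illustrates the idea of collapsing heterogeneous fact-checker labels into True/False.

```python
# Hypothetical mapping from original fact-checker verdicts to the final labels
VERDICT_MAP = {
    "false": "False", "fake": "False", "misleading": "False",
    "pants on fire": "False",
    "true": "True", "correct": "True", "mostly true": "True",
}

def homogenise(verdict):
    """Map an original fact-checker verdict onto the final True/False labels;
    unknown verdicts yield None and would need manual review."""
    return VERDICT_MAP.get(verdict.strip().lower())
```

For instance, homogenise("Pants on Fire") yields "False", while an unmapped verdict such as "unproven" yields None.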
The data sources used are:
The CoronaVirusFacts/DatosCoronaVirus Alliance Database. https://www.poynter.org/ifcn-covid-19-misinformation/
CoAID dataset (Cui and Lee, 2020) https://github.com/cuilimeng/CoAID
MM-COVID (Li et al., 2020) https://github.com/bigheiniu/MM-COVID
CovidLies (Hossain et al., 2020) https://github.com/ucinlp/covid19-data
TREC Health Misinformation track https://trec-health-misinfo.github.io/
TREC COVID challenge (Voorhees et al., 2021; Roberts et al., 2020) https://ir.nist.gov/covidSubmit/data.html
The LARGE dataset contains 5,143 claims (1,810 False and 3,333 True), and the SMALL version 1,709 claims (477 False and 1,232 True).
The entries in the dataset contain the following information:
Claim. Text of the claim.
Claim label. The labels are: False, and True.
Claim source. The sources include mostly fact-checking websites, health information websites, health clinics, public institutions sites, and peer-reviewed scientific journals.
Original information source. Information about which general information source was used to obtain the claim.
Claim type. The different types, previously explained, are: Multimodal, Social Media, Questions, Numerical, and Named Entities.
Funding. This work was supported by the UK Engineering and Physical Sciences Research Council (grant no. EP/V048597/1, EP/T017112/1). ML and YH are supported by Turing AI Fellowships funded by the UK Research and Innovation (grant no. EP/V030302/1, EP/V020579/1).
References
Arana-Catania M., Kochkina E., Zubiaga A., Liakata M., Procter R., He Y.. Natural Language Inference with Self-Attention for Veracity Assessment of Pandemic Claims. NAACL 2022 https://arxiv.org/abs/2205.02596
Stephen E. Robertson, Steve Walker, Susan Jones, Micheline M. Hancock-Beaulieu, Mike Gatford, et al. 1995. Okapi at TREC-3. NIST Special Publication SP, 109:109.
Fabio Crestani, Mounia Lalmas, Cornelis J. Van Rijsbergen, and Iain Campbell. 1998. “Is this document relevant? ... Probably”: a survey of probabilistic models in information retrieval. ACM Computing Surveys (CSUR), 30(4):528–552.
Stephen Robertson and Hugo Zaragoza. 2009. The probabilistic relevance framework: BM25 and beyond. Now Publishers Inc.
Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. 2020. Document ranking with a pre-trained sequence-to-sequence model. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings, pages 708–718.
Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. Bertscore: Evaluating text generation with bert. In International Conference on Learning Representations.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.
Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. Ontonotes release 5.0 ldc2013t19. Linguistic Data Consortium, Philadelphia, PA, 23.
Limeng Cui and Dongwon Lee. 2020. CoAID: COVID-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885.
Yichuan Li, Bohan Jiang, Kai Shu, and Huan Liu. 2020. Mm-covid: A multilingual and multimodal data repository for combating covid-19 disinformation.
Tamanna Hossain, Robert L. Logan IV, Arjuna Ugarte, Yoshitomo Matsubara, Sean Young, and Sameer Singh. 2020. COVIDLies: Detecting COVID-19 misinformation on social media. In Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020, Online. Association for Computational Linguistics.
Ellen Voorhees, Tasmeer Alam, Steven Bedrick, Dina Demner-Fushman, William R Hersh, Kyle Lo, Kirk Roberts, Ian Soboroff, and Lucy Lu Wang. 2021. Trec-covid: constructing a pandemic information retrieval test collection. In ACM SIGIR Forum, volume 54, pages 1–12. ACM New York, NY, USA.
This dataset has been created within Project TRACES (more information: https://traces.gate-ai.eu/). The dataset contains 61,411 IDs of tweets written in Bulgarian, together with annotations. It can be used for general purposes or for building applications that detect lies and disinformation.
Note: this dataset is not fact-checked, the social media messages have been retrieved via keywords. For fact-checked datasets, see our other datasets.
The tweets (written between 1 Jan 2020 and 28 June 2022) have been collected via Twitter API under academic access in June 2022 with the following keywords:
(Covid OR коронавирус OR Covid19 OR Covid-19 OR Covid_19) - without replies and without retweets
(Корона OR корона OR Corona OR пандемия OR пандемията OR Spikevax OR SARS-CoV-2 OR бустерна доза) - with replies, but without retweets
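Queries of this shape can be composed for the Twitter API v2 full-archive search using its real -is:reply and -is:retweet operators; build_query below is a hypothetical helper, and the exact query strings used for collection are the ones listed above. Note that in a real request a multi-word phrase such as "бустерна доза" would additionally need to be quoted.

```python
def build_query(keywords, replies=True, retweets=False):
    """Hypothetical helper: compose a Twitter API v2 search query from
    OR'd keywords, excluding replies and/or retweets via -is: operators."""
    query = "(" + " OR ".join(keywords) + ")"
    if not replies:
        query += " -is:reply"
    if not retweets:
        query += " -is:retweet"
    return query

# The two keyword sets described above
q1 = build_query(["Covid", "коронавирус", "Covid19", "Covid-19", "Covid_19"],
                 replies=False)
q2 = build_query(["Корона", "корона", "Corona", "пандемия", "пандемията",
                  "Spikevax", "SARS-CoV-2", "бустерна доза"])
```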
Explanations of which fields can be used as markers of lies (or of intentional disinformation) are provided in our forthcoming paper (please cite it when using this dataset):
Irina Temnikova, Silvia Gargova, Ruslana Margova, Veneta Kireva, Ivo Dzhumerov, Tsvetelina Stefanova and Hristiana Nikolaeva (2023). New Bulgarian Resources for Detecting Disinformation. 10th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC'23). Poznań, Poland.
Late in December 2019, the World Health Organisation (WHO) China Country Office obtained information about severe pneumonia of an unknown cause, detected in the city of Wuhan in Hubei province, China. This later turned out to be the novel coronavirus disease (COVID-19), an infectious disease caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) of the coronavirus family. The disease causes respiratory illness characterized by primary symptoms like cough, fever, and in more acute cases, difficulty in breathing. WHO later declared COVID-19 as a Pandemic because of its fast rate of spread across the Globe.
The COVID-19 datasets organized by continent contain daily-level information about COVID-19 cases on the different continents of the world. These are time-series data, and the number of cases on any given day is cumulative. The original datasets can be found in this Johns Hopkins University GitHub repository. I will be updating the COVID-19 datasets on a daily basis, with every update from Johns Hopkins University. I have also included the world COVID-19 test data scraped from Worldometer, as well as the 2020 world population, also from Worldometer (https://www.worldometers.info/world-population/population-by-country/).
COVID-19 cases
covid19_world.csv: contains the cumulative number of COVID-19 cases from around the world since January 22, 2020, as compiled by Johns Hopkins University.
covid19_asia.csv, covid19_africa.csv, covid19_europe.csv, covid19_northamerica.csv, covid19.southamerica.csv, covid19_oceania.csv, and covid19_others.csv: contain the cumulative number of COVID-19 cases organized by continent.
Field description
- ObservationDate: date of observation in YY/MM/DD
- Country_Region: name of country or region
- Province_State: name of province or state
- Confirmed: the number of COVID-19 confirmed cases
- Deaths: the number of deaths from COVID-19
- Recovered: the number of recovered cases
- Active: the number of people still infected with COVID-19
Note: Active = Confirmed - (Deaths + Recovered)
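The Active relation can be checked mechanically with the standard library; the row below is illustrative, not taken from the dataset.

```python
import csv
import io

# One toy row in the covid19_world.csv layout described above (values illustrative)
sample = io.StringIO(
    "ObservationDate,Country_Region,Province_State,Confirmed,Deaths,Recovered,Active\n"
    "20/05/30,Italy,,232664,33340,155633,43691\n"
)
for row in csv.DictReader(sample):
    # Active = Confirmed - (Deaths + Recovered)
    active = int(row["Confirmed"]) - (int(row["Deaths"]) + int(row["Recovered"]))
    assert active == int(row["Active"])
```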
COVID-19 tests
covid19_tests.csv: contains the cumulative number of COVID tests conducted since the onset of the pandemic, as reported by Worldometer. Data available from June 01, 2020.
Field description
- Date: date in YY/MM/DD
- Country, Other: country, region, or dependency
- TotalTests: cumulative number of tests up to that date
- Population: population of the country, region, or dependency
- Tests/1M pop: tests per 1 million of the population
- 1 Test every X ppl: 1 test for every X number of people
2020 world population
world_population(2020).csv: contains the 2020 world population as reported by Worldometer.
Field description
- Country (or dependency): country or dependency
- Population (2020): population in 2020
- Yearly Change: yearly change in population as a percentage
- Net Change: the net change in population
- Density (P/km2): population density
- Land Area (km2): land area
- Migrants (net): net number of migrants
- Fert. Rate: fertility rate
- Med. Age: median age
- Urban pop: urban population
- World Share: share of the world population as a percentage
Possible insights
1. The current number of COVID-19 cases in Africa
2. The current number of COVID-19 cases by country
3. The number of COVID-19 cases in Africa or in a given African country by May 30, 2020 (or any future date)
The briefing materials prepared for the Minister of Northern Affairs for the Special Committee on the COVID-19 pandemic and Committees of the Whole related to the pandemic included Question Period notes that were published December 13, 2019, and May 26, 2020. These materials were subsequently updated for appearances by the Minister at Committees of the Whole and meetings of the Special Committee on the COVID-19 Pandemic that were held between May 14 and June 18, 2020.
Briefing materials on the Indigenous Services or Crown-Indigenous Relations portfolios are included when the Minister of Northern Affairs intervened on behalf of the Minister of Indigenous Services or Minister of Crown-Indigenous Relations.
Appearance dates: April 29, May 6, 14, 21 (no updates, COVI Committee 11), June 9, 18 (no updates, COVI Committee 25).
These are the key findings from the second of three rounds of the DCMS Coronavirus Business Survey. These surveys are being conducted to help DCMS understand how our sectors are responding to the ongoing Coronavirus pandemic. The data collected is not longitudinal as responses are voluntary, meaning that businesses have no obligation to complete multiple rounds of the survey and businesses that did not submit a response to one round are not excluded from response collection in following rounds.
The indicators and analysis presented in this bulletin are based on responses to the voluntary business survey, which captures organisations' responses on how their turnover, costs, workforce, and resilience have been affected by the coronavirus (COVID-19) outbreak. The results presented in this release are based on 3,870 completed responses collected between 17 August and 8 September 2020.
This is the first time we have published these results as Official Statistics. An earlier round of the business survey can be found on gov.uk.
We have designated these as Experimental Statistics, which are newly developed or innovative statistics. These are published so that users and stakeholders can be involved in the assessment of their suitability and quality at an early stage.
We expect to publish a third round of the survey before the end of the financial year. To inform that release, we would welcome any user feedback on the presentation of these results to evidence@dcms.gov.uk by the end of November 2020.
The survey was run simultaneously through DCMS stakeholder engagement channels and via a YouGov panel.
The two sets of results have been merged to create one final dataset.
Invitations to submit a response to the survey were circulated to businesses in relevant sectors through DCMS stakeholder engagement channels, prompting 2,579 responses.
YouGov’s business omnibus panel elicited a further 1,288 responses. YouGov’s respondents are part of its panel of over one million adults in the UK. Pre-screened information on these panellists allows YouGov to target senior decision-makers of organisations in DCMS sectors.
One purpose of the survey is to highlight the characteristics of organisations in DCMS sectors whose viability is under threat in order to shape further government support. The timeliness of these results is essential, and there are some limitations, arising from the need for this timely information:
This release is published in accordance with the Code of Practice for Statistics, as produced by the UK Statistics Authority. The Authority has the overall objective of promoting and safeguarding the production and publication of official statistics that serve the public good. It monitors and reports on all official statistics, and promotes good practice in this area.
The responsible statistician for this release is Alex Bjorkegren. For further details about the estimates, or to be added to a distribution list for future updates, please email us at evidence@dcms.gov.uk.
The document above contains a list of ministers and officials who have received privileged early access to this release. In line with best practice, the list has been kept to a minimum and those given access for briefing purposes had a maximum of 24 hours.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
To inform citizens and make the collected data available, the Department of Civil Protection has developed an interactive geographic dashboard accessible at the addresses http://arcg.is/C1unv (desktop version) and http://arcg.is/081a51 (mobile version) and makes available, with CC-BY-4.0 license, the following information updated daily at 18:30 (after the Head of Department press conference). For more detail, see https://github.com/pcm-dpc/COVID-19.
COVID-19 data Italy
National trend | JSON data | Provinces data | Regions data | Summary cards | Areas

Repository structure:
COVID-19/
├── national-trend/
│   ├── dpc-covid19-eng-national-trend-yyyymmdd.csv
├── areas/
│   ├── geojson/
│   │   ├── dpc-covid19-ita-aree.geojson
│   ├── shp/
│   │   ├── dpc-covid19-eng-areas.shp
├── data-provinces/
│   ├── dpc-covid19-ita-province-yyyymmdd.csv
├── data-json/
│   ├── dpc-covid19-eng-*.json
├── data-regions/
│   ├── dpc-covid19-eng-regions-yyyymmdd.csv
├── summary-sheets/
│   ├── provinces/
│   │   ├── dpc-covid19-ita-scheda-province-yyyymmdd.pdf
│   ├── regions/
│   │   ├── dpc-covid19-eng-card-regions-yyyymmdd.pdf
Data by Region
Directory: data-regions
Daily file structure: dpc-covid19-ita-regions-yyyymmdd.csv (e.g., dpc-covid19-ita-regions-20200224.csv)
Overall file: dpc-covid19-eng-regions.csv
An overall JSON file covering all dates is made available in the "data-json" folder: dpc-covid19-eng-regions.json
Data by Province
Directory: data-provinces
Daily file structure: dpc-covid19-ita-province-yyyymmdd.csv (e.g., dpc-covid19-ita-province-20200224.csv)
Overall file: dpc-covid19-ita-province.csv
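Given the daily-file naming scheme, a download URL can be composed per date. This is a sketch: daily_region_file is a hypothetical helper, and note that the upstream repository uses the Italian directory and file names (e.g. dati-regioni/dpc-covid19-ita-regioni-YYYYMMDD.csv), of which the English names above are translations.

```python
from datetime import date

BASE = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master"

def daily_region_file(day: date) -> str:
    """Hypothetical helper: URL of the daily per-region CSV for a given date,
    assuming the upstream repository's Italian-named layout."""
    return f"{BASE}/dati-regioni/dpc-covid19-ita-regioni-{day:%Y%m%d}.csv"

url = daily_region_file(date(2020, 2, 24))
```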
Data from https://github.com/pcm-dpc/COVID-19, released under a CC-BY-4.0 license; see the repository for more detail.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
During the first half of 2020, the COVID-19 pandemic shifted social gatherings toward online business and social interaction. Worldwide travel bans and national lockdowns prevented social gatherings, pushing learning institutions and businesses to adopt online platforms for learning and business transactions. This development led to the incorporation of video conferencing into daily activities. This data article presents broadband data-usage measurements, collected using GlassWire software, for various conference calls made between July and August. The services considered in this work are Google Meet, Zoom, Mixir, and Hangouts. The data were recorded in Microsoft Excel 2016 running on a personal computer, then cleaned and processed using Google Colaboratory, which runs Python scripts in the browser. Exploratory data analysis was conducted on the dataset using linear regression to build a predictive model and assess which service offers the best quality of service for online video and voice conferencing. The data are useful to learning institutions running online programs, and to learners accessing such programs, in smart cities and developing countries. The data are presented in tables and graphs.
This data package includes the underlying data to replicate the charts presented in Lessons from China's fiscal policy during the COVID-19 pandemic, PIIE Working Paper 24-7.
If you use the data, please cite as: Huang, Tianlei. 2024. Lessons from China's fiscal policy during the COVID-19 pandemic. PIIE Working Paper 24-7. Washington: Peterson Institute for International Economics.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Please cite the following paper when using this dataset: Vanessa Su and Nirmalya Thakur, “COVID-19 on YouTube: A Data-Driven Analysis of Sentiment, Toxicity, and Content Recommendations”, Proceedings of the IEEE 15th Annual Computing and Communication Workshop and Conference 2025, Las Vegas, USA, Jan 06-08, 2025 (paper accepted for publication; preprint: https://arxiv.org/abs/2412.17180).

Abstract: This dataset comprises metadata and analytical attributes for 9,325 publicly available YouTube videos related to COVID-19, published between January 1, 2023, and October 25, 2024. The dataset was created using the YouTube API and refined through rigorous data cleaning and preprocessing.

Key attributes of the dataset:
- Video URL: the full URL linking to each video.
- Video ID: a unique identifier for each video.
- Title: the title of the video.
- Description: a detailed textual description provided by the video uploader.
- Publish Date: the date the video was published, ranging from January 1, 2023, to October 25, 2024.
- View Count: the total number of views per video, ranging from 0 to 30,107,100 (mean: ~59,803).
- Like Count: the number of likes per video, ranging from 0 to 607,138 (mean: ~1,413).
- Comment Count: the number of comments, varying from 1 to 25,000 (mean: ~147).
- Duration: video length in seconds, ranging from 0 to 42,900 seconds (median: 137 seconds).
- Categories: one of 15 unique categories, with "News & Politics" being the most common (4,035 videos).
- Tags: tags associated with each video.
- Language: the language of the video, predominantly English ("en").
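The YouTube Data API reports video durations as ISO 8601 strings; converting them to the seconds used in this dataset takes only the standard library. A sketch that handles the common PT#H#M#S form only (edge cases such as day-length durations are not covered):

```python
import re

def iso8601_duration_to_seconds(duration: str) -> int:
    """Convert a YouTube Data API ISO 8601 duration (e.g. 'PT2M17S') to seconds."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", duration)
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds

# 'PT2M17S' corresponds to 137 seconds, the median duration reported above
```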
The total amount of data created, captured, copied, and consumed globally is forecast to increase rapidly, reaching 149 zettabytes in 2024. Over the following five years, to 2028, global data creation is projected to grow to more than 394 zettabytes. In 2020, the amount of data created and replicated reached a new high. The growth was higher than previously expected, driven by increased demand during the COVID-19 pandemic, as more people worked and learned from home and used home entertainment options more often.
Storage capacity is also growing. Only a small percentage of this newly created data is kept, though: just two percent of the data produced and consumed in 2020 was saved and retained into 2021. In line with the strong growth of the data volume, the installed base of storage capacity is forecast to increase at a compound annual growth rate of 19.2 percent over the forecast period from 2020 to 2025. In 2020, the installed base of storage capacity reached 6.7 zettabytes.
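The installed-base projection implied by those figures can be checked with a one-line compound-growth calculation. A sketch: the 6.7-zettabyte base and the 19.2 percent CAGR come from the text above; treating 2025 as five full years after 2020 is an assumption:

```python
def compound_growth(base, annual_rate, years):
    """Project a quantity forward under a constant compound annual growth rate."""
    return base * (1 + annual_rate) ** years

# 6.7 ZB installed in 2020, growing at 19.2% per year through 2025
projected_2025 = compound_growth(6.7, 0.192, 5)  # ~16.1 zettabytes
```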
Gold Standard annotations for the SMM4H-Spanish shared task, plus unannotated test and background files. SMM4H 2021 was accepted at NAACL (scheduled in Mexico City in June): https://2021.naacl.org/.
Please cite: Miranda-Escalada, A., Farré-Maduell, E., Lima-López, S., Gascó, L., Briva-Iglesias, V., Agüero-Torales, M., & Krallinger, M. (2021, June). The ProfNER shared task on automatic recognition of occupation mentions in social media: systems, evaluation, guidelines, embeddings and corpora. In Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task (pp. 13-20).
@inproceedings{miranda2021profner,
  title={The profner shared task on automatic recognition of occupation mentions in social media: systems, evaluation, guidelines, embeddings and corpora},
  author={Miranda-Escalada, Antonio and Farr{\'e}-Maduell, Eul{\`a}lia and Lima-L{\'o}pez, Salvador and Gasc{\'o}, Luis and Briva-Iglesias, Vicent and Ag{\"u}ero-Torales, Marvin and Krallinger, Martin},
  booktitle={Proceedings of the Sixth Social Media Mining for Health (\#SMM4H) Workshop and Shared Task},
  pages={13--20},
  year={2021}
}
Introduction: The entire corpus contains 10,000 annotated tweets, split into training, validation and test sets (60-20-20). The current version contains the training and development sets of the shared task with Gold Standard annotations, together with the unannotated test and background sets. Participants must submit predictions for the files under the directory "test-background-txt-files".
For subtask-1 (classification), annotations are distributed in a tab-separated (TSV) file. The TSV format follows the one employed in SMM4H 2019 Task 2: tweet_id class
For subtask-2 (Named Entity Recognition, profession detection), annotations are distributed in two formats: Brat standoff and TSV. See the Brat webpage for more information about the Brat standoff format (https://brat.nlplab.org/standoff.html).
The TSV format follows the one employed in SMM4H 2019 Task 2: tweet_id begin end type extraction
In addition, we provide a tokenized version of the dataset for participants' convenience. It follows the BIO format (similar to CoNLL). The files were generated with the brat_to_conll.py script (included), which employs the es_core_news_sm-2.3.1 spaCy model for tokenization.
Zip structure:
subtask-1: files of the tweet classification subtask. Content:
One TSV file per corpus split (train and valid).
train-valid-txt-files: folder with training and validation text files. One text file per tweet. One sub-directory per corpus split (train and valid).
train-valid-txt-files-english: folder with training and validation text files machine-translated to English.
test-background-txt-files: folder with the test and background text files. You must make your predictions for these files and upload them to CodaLab.
subtask-2: files of the Named Entity Recognition subtask. Content:
brat: folder with annotations in Brat format. One sub-directory per corpus split (train and valid).
TSV: folder with annotations in TSV. One file per corpus split (train and valid).
BIO: folder with the corpus in BIO tagging. One file per corpus split (train and valid).
train-valid-txt-files: folder with training and validation text files. One text file per tweet. One sub-directory per corpus split (train and valid).
train-valid-txt-files-english: folder with training and validation text files machine-translated to English.
test-background-txt-files: folder with the test and background text files. You must make your predictions for these files and upload them to CodaLab.
Annotation quality: We have performed a consistency analysis of the corpus. 10% of the documents were annotated by an internal annotator as well as by the linguist experts, following the same annotation guidelines. The preliminary inter-annotator agreement (pairwise agreement) is 0.919.
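A minimal sketch of reading the subtask-2 TSV annotations into span records. The columns follow the tweet_id begin end type extraction layout described above; the sample rows (IDs, offsets, and entity strings) are invented for illustration, and real files may carry a header row, which this sketch does not handle:

```python
import csv
import io

def read_ner_tsv(tsv_text):
    """Parse subtask-2 annotation rows: tweet_id, begin/end character
    offsets, entity type, and the extracted text span."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {"tweet_id": row[0], "begin": int(row[1]), "end": int(row[2]),
         "type": row[3], "extraction": row[4]}
        for row in reader if row
    ]

# Invented example rows, not real corpus data
sample = "123\t10\t17\tPROFESION\tmedicos\n456\t0\t9\tPROFESION\tenfermera\n"
spans = read_ner_tsv(sample)  # two span dicts with integer offsets
```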
For further information, please visit https://temu.bsc.es/smm4h-spanish/ or email us at encargo-pln-life@bsc.es. Do not share the data with other individuals/teams without permission from the task organizer. Tweet IDs are the primary source of information; tweet texts are provided as support material. By downloading this resource, you agree to the Twitter Terms of Service, Privacy Policy, Developer Agreement, and Developer Policy.
Resources:
Web
Annotation guidelines (in Spanish)
Annotation guidelines (in English)
FastText COVID-19 Twitter embeddings
Occupations gazetteer
Conference Proceedings
Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The briefing materials below were initially prepared for the Minister of Indigenous Services for Committee of the Whole on April 20, 2020. These materials were subsequently updated for appearances by the Minister at additional Committees of the Whole and meetings of the Special Committee on the COVID-19 Pandemic that were held between April 29 and June 18, 2020. Briefing materials on the Northern portfolio are included when the Minister of Indigenous Services intervened on behalf of the Minister of Northern Affairs. Appearance dates: April 20, 28 (COVI Committee #1, no updates) and 29. May 5 (COVI Committee #3, no updates), 6, 12, 14, 20, 25 (Committee of the Whole, no updates). June 3, 11, 16 and 17.