License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
This dataset contains news articles from Swedish news sites published during the COVID-19 pandemic in 2020–2021. The purpose was to develop and test new methods for collecting and analyzing large news corpora by computational means. In total, 677,151 articles were collected from 19 news sites between 2020-01-01 and 2021-04-26. The articles were collected by scraping all links on the homepage and main sections of each site every two hours, day and night.
The dataset also includes about 45 million timestamps recording when the articles were present on the front pages (the homepage and main sections of each news site, such as domestic news, sports, and editorials). This allows detailed analysis of which articles a reader was likely exposed to when visiting a news site. Since the sites were scraped every two hours, changes in which articles appeared on the front pages can be detected at a two-hour resolution.
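Because front-page presence was sampled every two hours, the number of distinct scrapes at which an article was observed gives a rough estimate of how long it stayed on a front page. The Python sketch below illustrates this under stated assumptions: it takes frontpage_timestamps.csv to contain an article identifier and an ISO 8601 timestamp, named `id` and `timestamp` here (hypothetical names; the README documents the actual four variables). Repeated timestamps for the same article, for example when it appears in several sections during one scrape, are counted once, and each observation is treated as one full two-hour slot, so the estimate can be off by up to one interval.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta

# Scraping interval stated in the dataset description.
SCRAPE_INTERVAL = timedelta(hours=2)

def frontpage_exposure(rows):
    """Approximate how long each article was on a front page.

    `rows` yields dicts with hypothetical keys 'id' and 'timestamp'
    (ISO 8601); check the README for the real column names.
    Each distinct scrape time at which an article was observed
    counts as one two-hour slot.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row["id"]].add(datetime.fromisoformat(row["timestamp"]))
    return {
        article_id: {
            "first_seen": min(stamps),
            "last_seen": max(stamps),
            "approx_exposure": len(stamps) * SCRAPE_INTERVAL,
        }
        for article_id, stamps in seen.items()
    }

# Made-up example rows, not taken from the dataset.
sample = io.StringIO(
    "id,timestamp\n"
    "a1,2020-03-01T08:00:00\n"
    "a1,2020-03-01T10:00:00\n"
    "a1,2020-03-01T10:00:00\n"
    "a2,2020-03-01T10:00:00\n"
)
exposure = frontpage_exposure(csv.DictReader(sample))
print(exposure["a1"]["approx_exposure"])  # 4:00:00
```

Here article a1 is seen at two distinct scrapes (the duplicate 10:00 row is collapsed), giving roughly four hours of front-page exposure.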
The 19 news sites are aftonbladet.se, arbetet.se, da.se, di.se, dn.se, etc.se, expressen.se, feministisktperspektiv.se, friatider.se, gp.se, nyatider.se, nyheteridag.se, samnytt.se, samtiden.nu, svd.se, sverigesradio.se, svt.se, sydsvenskan.se and vlt.se.
Due to copyright restrictions, the full text is not available. Instead, it has been transformed into a document-term matrix (in long format) containing the frequency of every word in each article (about 80 million rows in total). Each article also includes extensive metadata extracted from the article itself (URL, document title, article heading, author, publish date, edit date, language, section, tags, category) as well as metadata inferred by simple heuristic algorithms (page type, article genre, paywall).
The dataset consists of the following files:
article_metadata.csv (53 MB): Information about each news article, one article per row. In total, there are 677,151 observations and 17 variables.
article_text.csv (236 MB): The id of each news article and how many times (count) a specific word occurs in that article. The file contains 80,090,784 observations and 3 variables in long format.
frontpage_timestamps.csv (175 MB): When each news article was found on the front page (homepage and main sections) of the news sites. The file contains 45,337,740 observations and 4 variables in long format.
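The long format of article_text.csv (one row per article and word) can be regrouped into a sparse document-term matrix using only the Python standard library. This is a minimal sketch, assuming the three variables are named `id`, `word` and `count` (hypothetical names; check the README). For the full file of roughly 80 million rows, nested dictionaries may require a large amount of memory, so chunked or columnar processing may be preferable.

```python
import csv
import io
from collections import Counter, defaultdict

def load_term_counts(rows):
    """Group long-format rows (one article-word pair per row) into a
    sparse document-term mapping: {article_id: Counter({word: count})}.

    Assumes hypothetical column names 'id', 'word' and 'count'.
    """
    dtm = defaultdict(Counter)
    for row in rows:
        dtm[row["id"]][row["word"]] += int(row["count"])
    return dtm

def corpus_frequencies(dtm):
    """Sum each word's counts over all articles."""
    total = Counter()
    for counts in dtm.values():
        total.update(counts)  # Counter.update adds counts
    return total

# Made-up rows for illustration only.
sample = io.StringIO(
    "id,word,count\n"
    "a1,corona,3\n"
    "a1,vaccin,1\n"
    "a2,corona,2\n"
)
dtm = load_term_counts(csv.DictReader(sample))
print(corpus_frequencies(dtm).most_common(1))  # [('corona', 5)]
```

Since each row carries the article id, the resulting mapping can presumably be joined with article_metadata.csv on that id to compare word use across sites or over time.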
More information about the contents of the files can be found in the README file, which also includes an R script for using the data.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically.
sv-covid-19 is a collection of Swedish news texts, scientific and popular science articles, and posts from certain blogs and social media such as Flashback and Twitter, published from the beginning of the coronavirus pandemic (early 2020) onward. The latest version of the corpus consists of approximately eight million words in 9,000 articles. The corpus contains various text types and texts of different stylistic levels. The texts have been annotated with part-of-speech tags, morphological analysis and lemmas, as well as some structural and functional information, such as author names.
License: CC0 1.0 (public domain dedication), https://creativecommons.org/publicdomain/zero/1.0/
This dataset consists of historical data from the pre-pandemic period and does not reflect current conditions, which may have changed due to spikes in demand. It was generated through a collaborative effort within the CoronaWhy community.
Last updated: April 26th 2020
Updates:
April 14th 2020 - Added missing population data
April 15th 2020 - Added Brazil statewise ICU hospital beds dataset
April 21st 2020 - Added Italy and Spain statewise ICU hospital beds datasets, India statewise TOTAL hospital beds dataset
April 26th 2020 - Added Sweden ICU (2019) and TOTAL (2018) beds datasets
I am trying to produce a dataset that will provide a foundation for policymakers to understand the realistic capacity of healthcare providers to deal with spikes in demand for intensive care. As a way to help, I've prepared a dataset of hospital beds across countries and states. This is a work-in-progress dataset that will be updated weekly as more data becomes publicly available.
This dataset is intended to be used as a baseline for understanding the typical bed capacity and coverage globally. This information is critical for understanding the impact of a high utilization event, like COVID-19.
Datasets are scattered across the web and are very hard to normalize; I did my best, but help would be much appreciated.
arcgis (USA) - https://services1.arcgis.com/Hp6G80Pky0om7QvQ/arcgis/rest/services/Hospitals_1/FeatureServer/0
KHN (USA) - https://khn.org/news/as-coronavirus-spreads-widely-millions-of-older-americans-live-in-counties-with-no-icu-beds/
datahub.io (World) - https://datahub.io/world-bank/sh.med.beds.zs
eurostat - https://data.europa.eu/euodp/en/data/dataset/vswUL3c6yKoyahrvIRyew
OECD - https://data.oecd.org/healtheqt/hospital-beds.htm
WDI (World) - https://data.worldbank.org/indicator/SH.MED.BEDS.ZS
NHP (India) - http://www.cbhidghs.nic.in/showfile.php?lid=1147
data.gov.sg (Singapore) - https://data.gov.sg/dataset/health-facilities?view_id=91b4feed-dcb9-4720-8cb0-ac2f04b7efd0&resource_id=dee5ccce-4dfb-467f-bcb4-dc025b56b977
dati.salute.gov.it (Italy) - http://www.dati.salute.gov.it/dati/dettaglioDataset.jsp?menu=dati&idPag=96
portal.icuregswe.org (Sweden) - https://portal.icuregswe.org/seiva/en/Rapport
publications:
Intensive Care Medicine Journal (Europe) - https://link.springer.com/article/10.1007/s00134-012-2627-8
Critical Care Medicine Journal (Asia) - https://www.researchgate.net/figure/Number-of-critical-care-beds-per-100-000-population_fig1_338520008
Medicina Intensiva (Spain) - https://www.medintensiva.org/en-pdf-S2173572713000878
news:
https://lanuovaferrara.gelocal.it/italia-mondo/cronaca/2020/03/19/news/dietro-la-corsa-a-nuovi-posti-in-terapia-intensiva-gli-errori-del-passato-1.38611596
kaggle:
germany - https://www.kaggle.com/manuelblechschmidt/icu-beds-in-germany
brazil (IBGE) - https://www.kaggle.com/thiagobodruk/brazilianstates
Manual population data search from Wikipedia
country,state,county,lat,lng,type,measure,beds,population,year,source,source_url
- country - country of origin, if present
- state - more granular location, if present
- lat - latitude
- lng - longitude
- type - [TOTAL, ICU, ACUTE (some data could include ICU beds too), PSYCHIATRIC, OTHER (merged ‘SPECIAL’, ‘CHRONIC DISEASE’, ‘CHILDREN’, ‘LONG TERM CARE’, ‘REHABILITATION’, ‘WOMEN’, ‘MILITARY’)]
- measure - type of measure (per 1000 inhabitants)
- beds - number of beds per 1000 inhabitants
- population - population of the location, based on multiple sources and Wikipedia
- year - source year for beds and population data
- source - source of data
- source_url - URL of the original source
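Because `beds` is a rate per 1000 inhabitants, an approximate absolute bed count for a row is `beds * population / 1000`. The Python sketch below applies this to rows read with `csv.DictReader`, using the column names listed above; the sample rows and country codes are made up, and rows with an empty `beds` or `population` field are returned as `None` rather than guessed. It assumes the `beds` value is per 1000 inhabitants as the `measure` column states; if a source used a different denominator, the formula would need adjusting.

```python
import csv
import io

def estimated_bed_count(row):
    """Approximate the absolute number of beds for one row.

    'beds' is a rate per 1000 inhabitants, so multiplying it by
    population / 1000 gives an estimated count. Returns None when
    either field is empty.
    """
    if not row["beds"] or not row["population"]:
        return None
    return float(row["beds"]) * float(row["population"]) / 1000

# Made-up rows using a subset of the columns; not real data.
sample = io.StringIO(
    "country,state,type,beds,population\n"
    "XX,,ICU,0.25,4000000\n"
    "XX,,TOTAL,2.5,4000000\n"
    "YY,,ICU,,5000000\n"
)
icu_rows = [r for r in csv.DictReader(sample) if r["type"] == "ICU"]
estimates = {r["country"]: estimated_bed_count(r) for r in icu_rows}
print(estimates)  # {'XX': 1000.0, 'YY': None}
```

Filtering on `type` first matters because, as noted above, ACUTE figures from some sources may already include ICU beds, so summing across types could double-count.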
For each data source: hospital_beds_per_source.csv
US only: US arcgis + khn (state/county granularity): hospital_beds_USA.csv
Global (state(region)/county granularity): hospital_beds_global_regional.csv
Global (country granularity): hospital_beds_global_v1.csv
Igor Kiulian - extracting/normalizing/formatting/merging data
Artur Kiulian - helped with Kaggle setup
Augaly S. Kiedi - helped with country population data
Kristoffer Jan Zieba - found Swedish data sources
Find and merge more detailed (state/county-wise) or newer data sources
TikTok saw an unprecedented increase in popularity during the coronavirus (COVID-19) outbreak in the Nordic region. The largest increase, of up to *** percent, was observed among Danish youth: while *** percent of them used TikTok before the COVID-19 outbreak, the corresponding share during the pandemic was ** percent. Overall, TikTok became more popular in Denmark, Sweden, Norway, and Finland during the pandemic, regardless of the users' age.
The rise of TikTok
TikTok is a Chinese video-sharing social network that merged with Musical.ly in 2018. Over the period from 2017 to 2020, the app generated increasingly high engagement, reaching nearly ** million daily active users on iOS globally as of May 2020. Among the most followed accounts in Norway was the pop duo Marcus & Martinus.
COVID-19 on social media
As of March 2020, almost all of the most popular hashtags on social media in Sweden were related to the coronavirus. In fact, a survey from that same month showed that younger individuals worldwide in particular seemed to rely on social media for updates on the coronavirus, whereas the figures were much lower for people aged 55 or older. Overall, social media use increased during the pandemic and facilitated the spread of news regarding the coronavirus and, in some cases, of false information.
News audiences in Norway were the most likely to pay for online news according to a global study on paid digital news content consumption, with 42 percent having paid for online news in the last year. Sweden ranked second, followed by Switzerland, Australia, and Austria. As the changing media landscape leads more and more consumers to turn to digital sources for news, publishers are adding paywalls to their sites. However, not all consumers are equally inclined to pay for digital news content. News audiences in Italy and the UK, for example, were substantially less likely to pay for online news than U.S. consumers.
Why pay for online news?
The reasons for paying for news are diverse and depend on various factors. The digitalization of news allows stories to be shared and disseminated on a global scale, but not all sources are reliable or credible. Consumers often find it difficult to identify trustworthy news sources, and therefore which sources they would happily pay for. Consumers may also be reluctant to pay for news because of the sheer amount of free content online. While the availability of free content has made news more accessible, it has also affected journalists and publishers. In Finland, for example, this has been linked to a decrease in sales of printed content. As traditional print publications move online, there is also a growing reliance on advertising to generate revenue. Publishers restrict some content to members only, encouraging users to pay for access. Consumers' willingness to pay was seen to depend on content, with Americans more willing to pay for news than for features or e-magazines.
Impact of the coronavirus
With the coronavirus pandemic forcing millions across the globe to stay at home, access to digital news has never been more crucial, so an increase in subscribers paying for premium news content could be expected. However, the health crisis has also led to economic hardship for many, which may instead lead people to cut luxuries such as paid news subscriptions. In the UK, for example, fewer people paid for news content in 2020 than in the previous year. With the pandemic dominating news reports, 2020 also saw audiences experience news fatigue, and after a year of news coverage saturated with coronavirus updates, consumers may feel the need to switch off entirely.