40 datasets found

Most used sources of coronavirus news U.S. 2020, by age group
statista.com
Updated Aug 29, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2023). Most used sources of coronavirus news U.S. 2020, by age group [Dataset]. https://www.statista.com/statistics/1104391/coronavirus-news-sources-by-age-us/
Explore at:
Dataset updated
Aug 29, 2023
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Mar 13, 2020
Area covered
United States
Description
According to a survey conducted in March 2020, 58 percent of U.S. news consumers said that they were seeking out the latest information about the coronavirus via news media in general, including TV news, radio news, online news, and newspapers. In fact, 70 percent of adults aged 55 or above were getting most of their news about the virus this way, compared to just 37 percent of 18 to 24-year-olds who were more likely than their older peers to turn to websites or social media posts from government or health agencies.
n
Coronavirus (Covid-19) Data in the United States
nytimes.com
openicpsr.org
+2more
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
Explore at:
Dataset provided by
New York Times
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Most trusted sources of coronavirus news U.S. 2020
statista.com
Updated Jun 18, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2020). Most trusted sources of coronavirus news U.S. 2020 [Dataset]. https://www.statista.com/statistics/1104557/coronavirus-trusted-news-sources-by-us/
Explore at:
Dataset updated
Jun 18, 2020
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Mar 13, 2020 - Mar 16, 2020
Area covered
United States
Description
As the United States battles the coronavirus, news consumers across the country have been attempting to keep themselves updated with how the pandemic is progressing, and a survey held in March 2020 revealed that the most trusted news source for details on COVID-19 was the CDC, with 85 percent of respondents saying that they trusted the centers to provide accurate information on the topic. Following closely behind was the World Health Organization and then the state government, but just 25 percent of consumers said that they trusted social media sites to publish reliable and accurate news about the coronavirus outbreak.
Z
INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET
data.niaid.nih.gov
Updated Jul 19, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nafiz Sadman (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
Explore at:
Dataset updated
Jul 19, 2024
Dataset provided by
Nafiz Sadman
Nishat Anjum
Kishor Datta Gupta
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Bangladesh, United States
Description
Introduction

There are several works based on Natural Language Processing on newspaper reports. Mining opinions from headlines [ 1 ] using Standford NLP and SVM by Rameshbhaiet. Al.compared several algorithms on a small and large dataset. Rubinet. al., in their paper [ 2 ], created a mechanism to differentiate fake news from real ones by building a set of characteristics of news according to their types. The purpose was to contribute to the low resource data available for training machine learning algorithms. Doumitet. al.in [ 3 ] have implemented LDA, a topic modeling approach to study bias present in online news media.

However, there are not many NLP research invested in studying COVID-19. Most applications include classification of chest X-rays and CT-scans to detect presence of pneumonia in lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus[ 5 ][ 6 ][ 7 ] and replicating its structure to fight and find a vaccine. This research is crucial in battling the pandemic. The few NLP based research publications are sentiment classification of online tweets by Samuel et el [ 8 ] to understand fear persisting in people due to the virus. Similar work has been done using the LSTM network to classify sentiments from online discussion forums by Jelodaret. al.[ 9 ]. NKK dataset is the first study on a comparatively larger dataset of a newspaper report on COVID-19, which contributed to the virus’s awareness to the best of our knowledge.

2 Data-set Introduction

2.1 Data Collection

We accumulated 1000 online newspaper report from United States of America (USA) on COVID-19. The newspaper includes The Washington Post (USA) and StarTribune (USA). We have named it as “Covid-News-USA-NNK”. We also accumulated 50 online newspaper report from Bangladesh on the issue and named it “Covid-News-BD-NNK”. The newspaper includes The Daily Star (BD) and Prothom Alo (BD). All these newspapers are from the top provider and top read in the respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was suitable compared to automation to ensure the news were highly relevant to the subject. The newspaper online sites had dynamic content with advertisements in no particular order. Therefore there were high chances of online scrappers to collect inaccurate news reports. One of the challenges while collecting the data is the requirement of subscription. Each newspaper required $1 per subscriptions. Some criteria in collecting the news reports provided as guideline to the human data-collectors were as follows:

The headline must have one or more words directly or indirectly related to COVID-19.

The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

The genre of the news can be anything as long as it is relevant to the topic. Political, social, economical genres are to be more prioritized.

Avoid taking duplicate reports.

Maintain a time frame for the above mentioned newspapers.

To collect these data we used a google form for USA and BD. We have two human editor to go through each entry to check any spam or troll entry.

2.2 Data Pre-processing and Statistics

Some pre-processing steps performed on the newspaper report dataset are as follows:

Remove hyperlinks.

Remove non-English alphanumeric characters.

Remove stop words.

Lemmatize text.

While more pre-processing could have been applied, we tried to keep the data as much unchanged as possible since changing sentence structures could result us in valuable information loss. While this was done with help of a script, we also assigned same human collectors to cross check for any presence of the above mentioned criteria.

The primary data statistics of the two dataset are shown in Table 1 and 2.

Table 1: Covid-News-USA-NNK data statistics

No of words per headline

7 to 20

No of words per body content

150 to 2100

Table 2: Covid-News-BD-NNK data statistics No of words per headline

10 to 20

No of words per body content

100 to 1500

2.3 Dataset Repository

We used GitHub as our primary data repository in account name NKK^1. Here, we created two repositories USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON format. We are regularly updating the CSV files and regenerating JSON using a py script. We provided a python script file for essential operation. We welcome all outside collaboration to enrich the dataset.

3 Literature Review

Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

Some well-known applications of NLP includes fraud detection on online media sites[ 10 ], using authorship attribution in fallback authentication systems[ 11 ], intelligent conversational agents or chatbots[ 12 ] and machine translations used by Google Translate[ 13 ]. While these are all downstream tasks, several exciting developments have been made in the algorithm solely for Natural Language Processing tasks. The two most trending ones are BERT[ 14 ], which uses bidirectional encoder-decoder architecture to create the transformer model, that can do near-perfect classification tasks and next-word predictions for next generations, and GPT-3 models released by OpenAI[ 15 ] that can generate texts almost human-like. However, these are all pre-trained models since they carry huge computation cost. Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc[ 16 ]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of text. One commonly used topic modeling is Latent Dirichlet Allocation or LDA[17].

Keyword extraction is a process of information extraction and sub-task of NLP to extract essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and pick the words with more weight to it.

Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

4 Our experiments and Result analysis

We used the wordcloud library^4 to create the word clouds. Figure 1 and 3 presents the word cloud of Covid-News-USA- NNK dataset by month from February to May. From the figures 1,2,3, we can point few information:

In February, both the news paper have talked about China and source of the outbreak.

StarTribune emphasized on Minnesota as the most concerned state. In April, it seemed to have been concerned more.

Both the newspaper talked about the virus impacting the economy, i.e, bank, elections, administrations, markets.

Washington Post discussed global issues more than StarTribune.

StarTribune in February mentioned the first precautionary measurement: wearing masks, and the uncontrollable spread of the virus throughout the nation.

While both the newspaper mentioned the outbreak in China in February, the weight of the spread in the United States are more highlighted through out March till May, displaying the critical impact caused by the virus.

We used a script to extract all numbers related to certain keywords like ’Deaths’, ’Infected’, ’Died’ , ’Infections’, ’Quarantined’, Lock-down’, ’Diagnosed’ etc from the news reports and created a number of cases for both the newspaper. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for the covid cases as it gradually rose from February. Both the newspaper clearly shows us that the rise in covid cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack. We used Vader Sentiment Analysis to extract sentiment of the headlines and the body. On average, the sentiments were from -0.5 to -0.9. Vader Sentiment scale ranges from -1(highly negative to 1(highly positive). There were some cases

where the sentiment scores of the headline and body contradicted each other,i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can assist us sort the most concerning (most negative) news from the positive ones, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide us information about how a state or country is reacting to the pandemic. We used PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both the datasets are: ’China’, Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’ , ’Stock market’ , ’Jobs’ , ’Election’, ’Missteps’, ’Health’, ’Response’. Keywords extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
Levels of confidence fact checking coronavirus news U.S. 2020
statista.com
Updated Jul 1, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2020). Levels of confidence fact checking coronavirus news U.S. 2020 [Dataset]. https://www.statista.com/statistics/1121717/fact-check-corona-news-us/
Explore at:
Dataset updated
Jul 1, 2020
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Apr 20, 2020 - Apr 26, 2020
Area covered
United States
Description
According to the most recently available data, around one third of Americans feel very confident in their ability to check the accuracy of news stories regarding coronavirus. In an online survey conducted in April 2020, 28 percent of respondents stated they would know how to confirm the accuracy of news and information regarding the COVID-19 pandemic. The majority of participants expressed a moderate level of self confidence in their capacity to fact check, with 49 percent somewhat confident.
d
Johns Hopkins COVID-19 Case Tracker
data.world
csv, zip
Updated Jun 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
Explore at:
zip, csvAvailable download formats
Dataset updated
Jun 29, 2025
Authors
The Associated Press
Time period covered
Jan 22, 2020 - Mar 9, 2023
Area covered
Description
Updates

Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

CDC Weekly case and death counts (national and state level)

CDC County level cases and deaths

HHS New hospital admissions

CDC NowCast COVID variant proportions (national and regional level)

April 9, 2020

The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.

April 20, 2020

Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.

April 29, 2020

The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.

September 1st, 2020

Johns Hopkins is now providing counts for the five New York City counties individually.

February 12, 2021

The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."

Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.

February 16, 2021

- Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

The AP is updating this dataset hourly at 45 minutes past the hour.

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

Queries

Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

Filter cases by state here

Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac

Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true

Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.

Pull the 100 counties with the highest per-capita confirmed cases here

Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.

Interactive

The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

@(https://datawrapper.dwcdn.net/nRyaf/15/)

Interactive Embed Code

<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>

Caveats

This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.

In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.

In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"

This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.

Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.

The Urban/Rural classification scheme is from the Center for Disease Control and Preventions's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here

Attribution

This data should be credited to Johns Hopkins University COVID-19 tracking project
COVID19 - The New York Times
kaggle.com
zip
Updated May 18, 2020
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Google BigQuery (2020). COVID19 - The New York Times [Dataset]. https://www.kaggle.com/datasets/bigquery/covid19-nyt
Explore at:
zip(0 bytes)Available download formats
Dataset updated
May 18, 2020
Dataset authored and provided by
Google BigQuery
Description
Context

This is the US Coronavirus data repository from The New York Times . This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies. More information on the data repository is available here . For additional reporting and data visualizations, see The New York Times’ U.S. coronavirus interactive site

Sample Queries

Query 1

Which US counties have the most confirmed cases per capita? This query determines which counties have the most cases per 100,000 residents. Note that this may differ from similar queries of other datasets because of differences in reporting lag, methodologies, or other dataset differences.

SELECT covid19.county, covid19.state_name, total_pop AS county_population, confirmed_cases, ROUND(confirmed_cases/total_pop *100000,2) AS confirmed_cases_per_100000, deaths, ROUND(deaths/total_pop *100000,2) AS deaths_per_100000 FROM bigquery-public-data.covid19_nyt.us_counties covid19 JOIN bigquery-public-data.census_bureau_acs.county_2017_5yr acs ON covid19.county_fips_code = acs.geo_id WHERE date = DATE_SUB(CURRENT_DATE(),INTERVAL 1 day) AND covid19.county_fips_code != "00000" ORDER BY confirmed_cases_per_100000 desc

Query 2

How do I calculate the number of new COVID-19 cases per day? This query determines the total number of new cases in each state for each day available in the dataset SELECT b.state_name, b.date, MAX(b.confirmed_cases - a.confirmed_cases) AS daily_confirmed_cases FROM (SELECT state_name AS state, state_fips_code , confirmed_cases, DATE_ADD(date, INTERVAL 1 day) AS date_shift FROM bigquery-public-data.covid19_nyt.us_states WHERE confirmed_cases + deaths > 0) a JOIN bigquery-public-data.covid19_nyt.us_states b ON a.state_fips_code = b.state_fips_code AND a.date_shift = b.date GROUP BY b.state_name, date ORDER BY date desc
Coronavirus COVID-19 Global Cases
redivis.com
application/jsonl +7
Updated Jul 13, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Stanford Center for Population Health Sciences (2020). Coronavirus COVID-19 Global Cases [Dataset]. http://doi.org/10.57761/pyf5-4e40
Explore at:
sas, csv, application/jsonl, spss, stata, parquet, arrow, avroAvailable download formats
Unique identifier
https://doi.org/10.57761/pyf5-4e40
Dataset updated
Jul 13, 2020
Dataset provided by
Redivis Inc.
Authors
Stanford Center for Population Health Sciences
Time period covered
Jan 22, 2020 - Jul 12, 2020
Description
Abstract

JHU Coronavirus COVID-19 Global Cases, by country

Documentation

PHS is updating the Coronavirus Global Cases dataset weekly, Monday, Wednesday and Friday from Cloud Marketplace.

This data comes from the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). This database was created in response to the Coronavirus public health emergency to track reported cases in real-time. The data include the location and number of confirmed COVID-19 cases, deaths, and recoveries for all affected countries, aggregated at the appropriate province or state. It was developed to enable researchers, public health authorities and the general public to track the outbreak as it unfolds. Additional information is available in the blog post.

Visual Dashboard (desktop): https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

Section 2

Included Data Sources are:

World Health Organization (WHO): https://www.who.int/

DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

Macau Government: https://www.ssm.gov.mo/portal/

Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0

US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html

Government of Canada: https://www.canada.ca/en/public-health/services/diseases/coronavirus.html

Australia Government Department of Health: https://www.health.gov.au/news/coronavirus-update-at-a-glance

European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases

Ministry of Health Singapore (MOH): https://www.moh.gov.sg/covid-19

Italy Ministry of Health: http://www.salute.gov.it/nuovocoronavirus

1Point3Arces: https://coronavirus.1point3acres.com/en

WorldoMeters: https://www.worldometers.info/coronavirus/

%3C!-- --%3E

Section 3

**Terms of Use: **

This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.

Section 4

**U.S. county-level characteristics relevant to COVID-19 **

Chin, Kahn, Krieger, Buckee, Balsari and Kiang (forthcoming) show that counties differ significantly in biological, demographic and socioeconomic factors that are associated with COVID-19 vulnerability. A range of publicly available county-specific data identifying these key factors, guided by international experiences and consideration of epidemiological parameters of importance, have been combined by the authors and are available for use:

https://github.com/mkiang/county_preparedness/
c
The COVID Tracking Project
covidtracking.com
google sheets
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The COVID Tracking Project [Dataset]. https://covidtracking.com/
Explore at:
google sheetsAvailable download formats
Description
The COVID Tracking Project collects information from 50 US states, the District of Columbia, and 5 other US territories to provide the most comprehensive testing data we can collect for the novel coronavirus, SARS-CoV-2. We attempt to include positive and negative results, pending tests, and total people tested for each state or district currently reporting that data.
Testing is a crucial part of any public health response, and sharing test data is essential to understanding this outbreak. The CDC is currently not publishing complete testing data, so we’re doing our best to collect it from each state and provide it to the public. The information is patchy and inconsistent, so we’re being transparent about what we find and how we handle it—the spreadsheet includes our live comments about changing data and how we’re working with incomplete information.
From here, you can also learn about our methodology, see who makes this, and find out what information states provide and how we handle it.
Consumers looking for news unrelated to COVID-19 in the U.S. 2020
statista.com
Updated Jan 25, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2022). Consumers looking for news unrelated to COVID-19 in the U.S. 2020 [Dataset]. https://www.statista.com/statistics/1133602/news-unrelated-to-covid-us/
Explore at:
Dataset updated
Jan 25, 2022
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
May 13, 2020 - May 15, 2020
Area covered
United States
Description
A recent survey found that although around half of Americans wanted to keep up with news concerning the coronavirus epidemic, 51 percent of adults surveyed said that they were seeking out news that is unrelated to the virus. As of May 2020, news fatigue surrounding the COVID-19 outbreak is apparent, with a significant proportion of the population tired of news concerning coronavirus, and the majority searching for information on unrelated topics.
Least trusted sources of coronavirus news U.S. 2020
statista.com
Updated Jun 21, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2022). Least trusted sources of coronavirus news U.S. 2020 [Dataset]. https://www.statista.com/statistics/1104569/least-trusted-news-sources-coronavirus-us/
Explore at:
Dataset updated
Jun 21, 2022
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Mar 12, 2020 - Mar 15, 2020
Area covered
United States
Description
As the coronavirus has spread throughout the United States and across the globe, consumers have turned to the media to inform about them how the pandemic is progressing and have been seeking news from sources they trust, and 60 percent of respondents to a U.S. survey said that they did not trust social media to provide correct information about the outbreak. Social media was by far the least trusted news outlet for coronavirus updates, followed by podcasts and online-only news sites. Conversely, traditional media outlets like newspapers and radio fared better in terms of consumer trust, along with cable and network news.
d
The Marshall Project: COVID Cases in Prisons
data.world
csv, zip
Updated Apr 6, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Associated Press (2023). The Marshall Project: COVID Cases in Prisons [Dataset]. https://data.world/associatedpress/marshall-project-covid-cases-in-prisons
Explore at:
csv, zipAvailable download formats
Dataset updated
Apr 6, 2023
Authors
The Associated Press
Time period covered
Jul 31, 2019 - Aug 1, 2021
Description
Overview

The Marshall Project, the nonprofit investigative newsroom dedicated to the U.S. criminal justice system, has partnered with The Associated Press to compile data on the prevalence of COVID-19 infection in prisons across the country. The Associated Press is sharing this data as the most comprehensive current national source of COVID-19 outbreaks in state and federal prisons.

Lawyers, criminal justice reform advocates and families of the incarcerated have worried about what was happening in prisons across the nation as coronavirus began to take hold in the communities outside. Data collected by The Marshall Project and AP shows that hundreds of thousands of prisoners, workers, correctional officers and staff have caught the illness as prisons became the center of some of the country’s largest outbreaks. And thousands of people — most of them incarcerated — have died.

In December, as COVID-19 cases spiked across the U.S., the news organizations also shared cumulative rates of infection among prison populations, to better gauge the total effects of the pandemic on prison populations. The analysis found that by mid-December, one in five state and federal prisoners in the United States had tested positive for the coronavirus -- a rate more than four times higher than the general population.

This data, which is updated weekly, is an effort to track how those people have been affected and where the crisis has hit the hardest.

Methodology and Caveats

The data tracks the number of COVID-19 tests administered to people incarcerated in all state and federal prisons, as well as the staff in those facilities. It is collected on a weekly basis by Marshall Project and AP reporters who contact each prison agency directly and verify published figures with officials.

Each week, the reporters ask every prison agency for the total number of coronavirus tests administered to its staff members and prisoners, the cumulative number who tested positive among staff and prisoners, and the numbers of deaths for each group.

The time series data is aggregated to the system level; there is one record for each prison agency on each date of collection. Not all departments could provide data for the exact date requested, and the data indicates the date for the figures.

To estimate the rate of infection among prisoners, we collected population data for each prison system before the pandemic, roughly in mid-March, in April, June, July, August, September and October. Beginning the week of July 28, we updated all prisoner population numbers, reflecting the number of incarcerated adults in state or federal prisons. Prior to that, population figures may have included additional populations, such as prisoners housed in other facilities, which were not captured in our COVID-19 data. In states with unified prison and jail systems, we include both detainees awaiting trial and sentenced prisoners.

To estimate the rate of infection among prison employees, we collected staffing numbers for each system. Where current data was not publicly available, we acquired other numbers through our reporting, including calling agencies or from state budget documents. In six states, we were unable to find recent staffing figures: Alaska, Hawaii, Kentucky, Maryland, Montana, Utah.

To calculate the cumulative COVID-19 impact on prisoner and prison worker populations, we aggregated prisoner and staff COVID case and death data up through Dec. 15. Because population snapshots do not account for movement in and out of prisons since March, and because many systems have significantly slowed the number of new people being sent to prison, it’s difficult to estimate the total number of people who have been held in a state system since March. To be conservative, we calculated our rates of infection using the largest prisoner population snapshots we had during this time period.

As with all COVID-19 data, our understanding of the spread and impact of the virus is limited by the availability of testing. Epidemiology and public health experts say that aside from a few states that have recently begun aggressively testing in prisons, it is likely that there are more cases of COVID-19 circulating undetected in facilities. Sixteen prison systems, including the Federal Bureau of Prisons, would not release information about how many prisoners they are testing.

Corrections departments in Indiana, Kansas, Montana, North Dakota and Wisconsin report coronavirus testing and case data for juvenile facilities; West Virginia reports figures for juvenile facilities and jails. For consistency of comparison with other state prison systems, we removed those facilities from our data that had been included prior to July 28. For these states we have also removed staff data. Similarly, Pennsylvania’s coronavirus data includes testing and cases for those who have been released on parole. We removed these tests and cases for prisoners from the data prior to July 28. The staff cases remain.

About the Data

There are four tables in this data:

covid_prison_cases.csv contains weekly time series data on tests, infections and deaths in prisons. The first dates in the table are on March 26. Any questions that a prison agency could not or would not answer are left blank.

prison_populations.csv contains snapshots of the population of people incarcerated in each of these prison systems for whom data on COVID testing and cases are available. This varies by state and may not always be the entire number of people incarcerated in each system. In some states, it may include other populations, such as those on parole or held in state-run jails. This data is primarily for use in calculating rates of testing and infection, and we would not recommend using these numbers to compare the change in how many people are being held in each prison system.

staff_populations.csv contains a one-time, recent snapshot of the headcount of workers for each prison agency, collected as close to April 15 as possible.

covid_prison_rates.csv contains the rates of cases and deaths for prisoners. There is one row for every state and federal prison system and an additional row with the National totals.

Queries

The Associated Press and The Marshall Project have created several queries to help you use this data:

Get your state's prison COVID data: Provides each week's data from just your state and calculates a cases-per-100000-prisoners rate, a deaths-per-100000-prisoners rate, a cases-per-100000-workers rate and a deaths-per-100000-workers rate here

Rank all systems' most recent data by cases per 100,000 prisoners here

Find what percentage of your state's total cases and deaths -- as reported by Johns Hopkins University -- occurred within the prison system here

Attribution

In stories, attribute this data to: “According to an analysis of state prison cases by The Marshall Project, a nonprofit investigative newsroom dedicated to the U.S. criminal justice system, and The Associated Press.”

Contributors

Many reporters and editors at The Marshall Project and The Associated Press contributed to this data, including: Katie Park, Tom Meagher, Weihua Li, Gabe Isman, Cary Aspinwall, Keri Blakinger, Jake Bleiberg, Andrew R. Calderón, Maurice Chammah, Andrew DeMillo, Eli Hager, Jamiles Lartey, Claudia Lauer, Nicole Lewis, Humera Lodhi, Colleen Long, Joseph Neff, Michelle Pitcher, Alysia Santo, Beth Schwartzapfel, Damini Sharma, Colleen Slevin, Christie Thompson, Abbie VanSickle, Adria Watson, Andrew Welsh-Huggins.

Questions

If you have questions about the data, please email The Marshall Project at info+covidtracker@themarshallproject.org or file a Github issue.

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
T
World Coronavirus COVID-19 Deaths
tradingeconomics.com
csv, excel, json, xml
Updated May 15, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
TRADING ECONOMICS (2020). World Coronavirus COVID-19 Deaths [Dataset]. https://tradingeconomics.com/world/coronavirus-deaths
Explore at:
excel, csv, xml, jsonAvailable download formats
Dataset updated
May 15, 2025
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 4, 2020 - May 17, 2023
Area covered
World, World
Description
The World Health Organization reported 6932591 Coronavirus Deaths since the epidemic began. In addition, countries reported 766440796 Coronavirus Cases. This dataset provides - World Coronavirus Deaths- actual values, historical data, forecast, chart, statistics, economic calendar and news.
Millennials' most trusted coronavirus news outlets in the U.S. 2020
statista.com
Updated Mar 25, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2020). Millennials' most trusted coronavirus news outlets in the U.S. 2020 [Dataset]. https://www.statista.com/statistics/1106880/coronavirus-news-trust-millennials-us/
Explore at:
Dataset updated
Mar 25, 2020
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
Mar 2020
Area covered
United States
Description
A survey held in March 2020 revealed that Millennials in the United States most trusted local TV broadcast news for updates on the coronavirus outbreak, whereas just 52 percent trusted cable news channels for this information. The COVID-19 pandemic has seriously impacted news consumption in general, with consumers across the world seeking figures, opinions, and updates on the spread of the disease.
d
Face Mask Perception Scale Attitudes in U.S. Covid-19 News
dataone.org
dataverse.harvard.edu
+1more
Updated Mar 6, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Rabb, Nicholas (2024). Face Mask Perception Scale Attitudes in U.S. Covid-19 News [Dataset]. http://doi.org/10.7910/DVN/X1KKAB
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/X1KKAB
Dataset updated
Mar 6, 2024
Dataset provided by
Harvard Dataverse
Authors
Rabb, Nicholas
Description
This data set aggregates mainstream U.S. media news articles and opinion show transcripts concerning Covid-19 mask-wearing between April 6 and June 8, 2020. Additionally, for several paragraphs of the news articles, it includes crowd-sourced annotation of the statements according to 14 mask-wearing attitude questions taken from Howard 2020's Face Mask Perception Scale (FMPS). Each annotated paragraph thus contains 14 labels (e.g., for "Does the text presented convey the idea that it is difficult to breathe while wearing a face mask?" options include 0 (it is difficult), 1 (it is not difficult), and 2 (does not mention)) with a confidence score ranging from 0-6 for each label. In total, this data set contains 2,361 news articles from eight sources (Daily Kos, Vox, New York Times, Fox, Breitbart, Tucker Carlson, Laura Ingraham, Sean Hannity), including article title, publication date, source, and raw text. Another file of the 8,473 paragraphs contained in all the articles is included with unique paragraph IDs. A separate file of crowd-sourced annotations is also included where labels are given for certain paragraph IDs, and contains 7,559 total annotations across 297 paragraphs and 202 articles. Instructions for how to load the data, as well as filter the annotations for high-quality versions (where there is high confidence or inner-annotator agreement), can be found at https://github.com/ricknabb/media-ideology-coding.
COVID_19_CSSEGISandData
kaggle.com
Updated Mar 15, 2022
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Nuzul Muhammad Ramadhan (2022). COVID_19_CSSEGISandData [Dataset]. https://www.kaggle.com/datasets/newzoel/covid-19-cssegisanddata
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Mar 15, 2022
Dataset provided by
Kaggle
Authors
Nuzul Muhammad Ramadhan
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). Also, Supported by ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).

Data Source

World Health Organization (WHO): https://www.who.int/

DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

National Health Commission of the People’s Republic of China (NHC):

http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

Macau Government: https://www.ssm.gov.mo/portal/

Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0

US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html

Government of Canada: https://www.canada.ca/en/public-health/services/diseases/coronavirus.html

Australia Government Department of Health: https://www.health.gov.au/news/coronavirus-update-at-a-glance

European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/geographical-distribution-2019-ncov-cases

Ministry of Health Singapore (MOH): https://www.moh.gov.sg/covid-19

Italy Ministry of Health: http://www.salute.gov.it/nuovocoronavirus

1Point3Arces: https://coronavirus.1point3acres.com/en

WorldoMeters: https://www.worldometers.info/coronavirus/

COVID Tracking Project: https://covidtracking.com/data. (US Testing and Hospitalization Data. We use the maximum reported value from "Currently" and - - - "Cumulative" Hospitalized for our hospitalization number reported for each state.)

French Government: https://dashboard.covid19.data.gouv.fr/

Terms of Use

This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.
Latin America: online news consumption during Covid 2020
ai-chatbox.pro
statista.com
Updated Jun 3, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
A. Guttmann (2025). Latin America: online news consumption during Covid 2020 [Dataset]. https://www.ai-chatbox.pro/?_=%2Fstudy%2F73951%2Fcoronavirus-impact-on-internet-and-media-in-latin-america%2F%23XgboD02vawLKoDs%2BT%2BQLIV8B6B4Q9itA
Explore at:
Dataset updated
Jun 3, 2025
Dataset provided by
Statistahttp://statista.com/
Authors
A. Guttmann
Area covered
Latin America
Description
In 2020, the daily time Brazilians spent consuming online news between March 19 and 31 was 46 percent higher compared to the period between January 1 and March 18. The first case of a person infected with the novel coronavirus (COVID-19) in Brazil was reported on February 26, 2020. The lock for non-essential business started on March 24. For further information about the coronavirus (COVID-19) pandemic, please visit our dedicated Facts and Figures page.
h
trec-covid
huggingface.co
opendatalab.com
Updated Aug 16, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
BEIR (2023). trec-covid [Dataset]. https://huggingface.co/datasets/BeIR/trec-covid
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 16, 2023
Dataset authored and provided by
BEIR
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
Dataset Card for BEIR Benchmark

Dataset Summary

BEIR is a heterogeneous benchmark that has been built from 18 diverse datasets representing 9 information retrieval tasks:

Fact-checking: FEVER, Climate-FEVER, SciFact Question-Answering: NQ, HotpotQA, FiQA-2018 Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus News Retrieval: TREC-NEWS, Robust04 Argument Retrieval: Touche-2020, ArguAna Duplicate Question Retrieval: Quora, CqaDupstack Citation-Prediction: SCIDOCS Tweet… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/trec-covid.
Covid Daily Deaths in USA Till August 9, 2020
kaggle.com
Updated Aug 10, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
WildGrok (2020). Covid Daily Deaths in USA Till August 9, 2020 [Dataset]. https://www.kaggle.com/wildgrok/covid-daily-deaths-in-usa-till-august-9-2020/discussion
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Aug 10, 2020
Dataset provided by
Kagglehttp://kaggle.com/
Authors
WildGrok
Area covered
United States
Description
Dataset

This dataset was created by WildGrok

Contents
M
Crowdtangle Coronavirus (COVID-19) Live Displays
catalog.midasnetwork.us
Updated Sep 9, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
MIDAS Coordination Center (2024). Crowdtangle Coronavirus (COVID-19) Live Displays [Dataset]. https://catalog.midasnetwork.us/collection/132
Explore at:
Dataset updated
Sep 9, 2024
Dataset authored and provided by
MIDAS Coordination Center
License
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Variables measured
media, disease, COVID-19, pathogen, Homo sapiens, social media, host organism, infectious disease, Severe acute respiratory syndrome coronavirus 2
Dataset funded by
National Institute of General Medical Sciences
Description
A hub with live displays of news, information, and content related to COVID-19 extracted from Facebook, Reddit and Instagram, from around the world (or in specific regions or country) from local news outlets, regional WHO pages, government agencies, local politicians and more. Each Live Display features post streams sorted by COVID-19 or vaccine related keywords and public accounts to each region.

Facebook

Twitter

Click to copy link

Link copied

Cite

Statista (2023). Most used sources of coronavirus news U.S. 2020, by age group [Dataset]. https://www.statista.com/statistics/1104391/coronavirus-news-sources-by-age-us/

Most used sources of coronavirus news U.S. 2020, by age group

Explore at:

2 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Aug 29, 2023

Dataset authored and provided by

Statistahttp://statista.com/

Time period covered

Mar 13, 2020

Area covered

United States

Description

According to a survey conducted in March 2020, 58 percent of U.S. news consumers said that they were seeking out the latest information about the coronavirus via news media in general, including TV news, radio news, online news, and newspapers. In fact, 70 percent of adults aged 55 or above were getting most of their news about the virus this way, compared to just 37 percent of 18 to 24-year-olds who were more likely than their older peers to turn to websites or social media posts from government or health agencies.

Clear search

Close search

Google apps

Main menu

Most used sources of coronavirus news U.S. 2020, by age group

Coronavirus (Covid-19) Data in the United States

Most trusted sources of coronavirus news U.S. 2020

INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

Levels of confidence fact checking coronavirus news U.S. 2020

Johns Hopkins COVID-19 Case Tracker

Updates

- Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

Queries

Interactive

Interactive Embed Code

Caveats

Attribution

COVID19 - The New York Times

Context

Sample Queries

Query 1

Query 2

Coronavirus COVID-19 Global Cases

Abstract

Documentation

Section 2

Section 3

Section 4

The COVID Tracking Project

Consumers looking for news unrelated to COVID-19 in the U.S. 2020

Least trusted sources of coronavirus news U.S. 2020

The Marshall Project: COVID Cases in Prisons

Overview

Methodology and Caveats

About the Data

Queries

Attribution

Contributors

Questions

World Coronavirus COVID-19 Deaths

Millennials' most trusted coronavirus news outlets in the U.S. 2020

Face Mask Perception Scale Attitudes in U.S. Covid-19 News

COVID_19_CSSEGISandData

Context

Data Source

Terms of Use

Latin America: online news consumption during Covid 2020

trec-covid

Covid Daily Deaths in USA Till August 9, 2020

Dataset

Contents

Crowdtangle Coronavirus (COVID-19) Live Displays

Most used sources of coronavirus news U.S. 2020, by age group