72 datasets found
  1. Coronavirus (Covid-19) Data in the United States

    • nytimes.com
    • openicpsr.org
    • +4more
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
    Explore at:
    Dataset provided by
    New York Times
    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

  2. U.S. cable news networks: coronavirus viewership impact 2019-2020

    • statista.com
    Updated Mar 23, 2020
    Cite
    Statista (2020). U.S. cable news networks: coronavirus viewership impact 2019-2020 [Dataset]. https://www.statista.com/statistics/1105352/cable-news-network-viewership-coronavirus-usa/
    Explore at:
    Dataset updated
    Mar 23, 2020
    Dataset authored and provided by
    Statista (http://statista.com/)
    Area covered
    United States
    Description

    In the week running from March 9 to 15, 2020, Fox News averaged **** million viewers in primetime, and CNN outperformed MSNBC with its primetime audience of **** million. Compared with the corresponding week of the previous year, primetime viewership was noticeably higher at all three major cable news networks. Cable news viewership varies monthly, though Fox News generally comes out on top. The TV industry as a whole will be keeping a close eye on developments and ratings in spring 2020 in light of the coronavirus outbreak: the pandemic is driving people indoors as they self-isolate, contrary to normal spring behaviour, which tends to send viewers outdoors and away from their television sets.

    Important to note here is that on March 11, 2020, the World Health Organization announced that the coronavirus was a global pandemic, right in the middle of the week in March 2020 presented in the graph. In that week, Fox News averaged over *** million more primetime viewers than in the corresponding period in 2019, and CNN's primetime audience was around ***** times higher.

  3. Most trusted sources of coronavirus news U.S. 2020

    • statista.com
    Updated Mar 13, 2020
    Cite
    Statista (2020). Most trusted sources of coronavirus news U.S. 2020 [Dataset]. https://www.statista.com/statistics/1104557/coronavirus-trusted-news-sources-by-us/
    Explore at:
    Dataset updated
    Mar 13, 2020
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 13, 2020 - Mar 16, 2020
    Area covered
    United States
    Description

    As the United States battles the coronavirus, news consumers across the country have been trying to keep up with how the pandemic is progressing. A survey held in March 2020 revealed that the most trusted news source for details on COVID-19 was the CDC, with ** percent of respondents saying that they trusted the agency to provide accurate information on the topic. Following closely behind were the World Health Organization and then state government, but just ** percent of consumers said that they trusted social media sites to publish reliable and accurate news about the coronavirus outbreak.

  4. The COVID Tracking Project

    • covidtracking.com
    google sheets
    Cite
    The COVID Tracking Project [Dataset]. https://covidtracking.com/
    Explore at:
    Available download formats: google sheets
    Description

    The COVID Tracking Project collects information from 50 US states, the District of Columbia, and 5 other US territories to provide the most comprehensive testing data we can collect for the novel coronavirus, SARS-CoV-2. We attempt to include positive and negative results, pending tests, and total people tested for each state or district currently reporting that data.

    Testing is a crucial part of any public health response, and sharing test data is essential to understanding this outbreak. The CDC is currently not publishing complete testing data, so we’re doing our best to collect it from each state and provide it to the public. The information is patchy and inconsistent, so we’re being transparent about what we find and how we handle it—the spreadsheet includes our live comments about changing data and how we’re working with incomplete information.

    From here, you can also learn about our methodology, see who makes this, and find out what information states provide and how we handle it.

  5. Johns Hopkins COVID-19 Case Tracker

    • data.world
    • kaggle.com
    csv, zip
    Updated Dec 3, 2025
    Cite
    The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Dec 3, 2025
    Authors
    The Associated Press
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Area covered
    Description

    Updates

    • Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

    • April 9, 2020

      • The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
    • April 20, 2020

      • Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
    • April 29, 2020

      • The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
    • September 1st, 2020

      • Johns Hopkins is now providing counts for the five New York City counties individually.
    • February 12, 2021

      • The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."
      • Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.
    • February 16, 2021

      • Johns Hopkins has reconciled Ohio's historical deaths data with the state.

    Overview

    The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

    The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
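
    The per-100,000 rate mentioned above can be illustrated with a minimal sketch. The county names and figures below are hypothetical, and the function name is an assumption made for illustration; it is not taken from the AP dataset or its queries.

```python
from typing import Optional

def rate_per_100k(count: int, population: int) -> Optional[float]:
    """Return count per 100,000 residents, or None when no population is available."""
    if not population:
        return None  # e.g. cases not assigned to a single county
    return count * 100_000 / population

# Hypothetical rows: (county name, cumulative cases, population estimate)
counties = [
    ("County A", 1_250, 500_000),
    ("County B", 40, 8_000),
    ("unassigned to a single county", 12, 0),
]

rates = {name: rate_per_100k(cases, pop) for name, cases, pop in counties}
for name, rate in rates.items():
    print(name, rate)
```

    Normalizing by population makes a small county with few cases comparable to a large one with many, but the caveat about test availability still applies to the resulting rates.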

    This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

    The AP is updating this dataset hourly at 45 minutes past the hour.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

    Queries

    Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

    Interactive

    The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

    @(https://datawrapper.dwcdn.net/nRyaf/15/)

    Interactive Embed Code

    <iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
    

    Caveats

    • This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
    • In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
    • In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"
    • This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
    • Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
    • Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
    • The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

    Johns Hopkins timeseries data

    • Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
    • Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
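
    Because the timeseries holds daily snapshots of cumulative counts, a downward revision by a source shows up as a negative day-over-day change. The sketch below illustrates that effect with made-up numbers; the function and variable names are assumptions for illustration and do not come from the Johns Hopkins or AP files.

```python
def daily_changes(cumulative):
    """Day-over-day differences of a cumulative series (one fewer item than the input)."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

# Hypothetical cumulative case counts; the third snapshot reflects a downward revision.
cumulative_cases = [100, 130, 125, 160]

changes = daily_changes(cumulative_cases)

# Zero-based positions in cumulative_cases where the count fell below the previous snapshot.
revised_days = [i + 1 for i, delta in enumerate(changes) if delta < 0]

print(changes)
print(revised_days)
```

    Flagging negative changes in this way is one simple check before computing daily rates or averages from the cumulative file.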

    Attribution

    This data should be credited to Johns Hopkins University COVID-19 tracking project

  6. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Silicon Orchard Lab, Bangladesh
    University of Memphis, USA
    Independent University, Bangladesh
    Authors
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    1 Introduction

    There are several works based on Natural Language Processing on newspaper reports. In mining opinions from headlines [ 1 ] using Stanford NLP and SVM, Rameshbhai et al. compared several algorithms on a small and a large dataset. Rubin et al., in their paper [ 2 ], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. in [ 3 ] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, not much NLP research has been devoted to studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus[ 5 ][ 6 ][ 7 ] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [ 8 ] to understand the fear persisting in people due to the virus. Similar work has been done by Jelodar et al. [ 9 ] using an LSTM network to classify sentiments from online discussion forums. To the best of our knowledge, the NKK dataset is the first study on a comparatively large dataset of newspaper reports on COVID-19, contributing to awareness of the virus.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports on COVID-19 from the United States of America (USA). The newspapers include The Washington Post (USA) and StarTribune (USA). We have named this dataset “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named that dataset “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All of these newspapers are among the top providers and most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. Manual collection was preferred over automation to ensure that the news reports were highly relevant to the subject. The newspapers' online sites had dynamic content with advertisements in no particular order, so online scrapers were likely to collect inaccurate news reports. One challenge in collecting the data was the subscription requirement: each newspaper charged $1 per subscription. The criteria given as guidelines to the human data-collectors were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, economical genres are to be more prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above mentioned newspapers.

    To collect these data we used a Google form for USA and BD. Two human editors went through each entry to check for spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any remaining instances of the items listed above.
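
    The four pre-processing steps listed above can be sketched as follows. This is not the authors' script: the stop-word list is a tiny illustrative subset, the lowercasing step is an added assumption, and `toy_lemmatize` is a hypothetical stand-in for a real lemmatizer (for example NLTK's WordNetLemmatizer) so that the sketch runs with the standard library alone.

```python
import re

# Tiny illustrative stop-word list (assumption); a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "are", "for"}

def toy_lemmatize(word: str) -> str:
    """Hypothetical placeholder: strips a plural 's'. Not a real lemmatizer."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def preprocess(text: str) -> list:
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # 1. remove hyperlinks
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)          # 2. remove non-English alphanumeric characters
    tokens = text.lower().split()                        # lowercasing is an added assumption
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words
    return [toy_lemmatize(t) for t in tokens]            # 4. lemmatize (toy version)

sample = "Masks are required in the schools: see https://example.com/rules for details"
tokens = preprocess(sample)
print(tokens)
```

    Hyperlinks are removed before the character filter so that URL fragments are not left behind as stray tokens.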

    The primary data statistics of the two datasets are shown in Tables 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics

    No. of words per headline: 7 to 20
    No. of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No. of words per headline: 10 to 20
    No. of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository, under the account name NKK^1. There we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON formats. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script file for essential operations. We welcome all outside collaboration to enrich the dataset.
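
    Regenerating JSON from an updated CSV file could look roughly like the sketch below. The column names `headline` and `body` are assumptions for illustration, and this is not the script from the repository.

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2, ensure_ascii=False)

# Hypothetical two-column sample; the real files may use different columns.
sample_csv = "headline,body\nVirus spreads,Officials urge caution\n"
json_text = csv_to_json(sample_csv)
print(json_text)
```

    Setting `ensure_ascii=False` keeps any non-ASCII characters readable in the output instead of escaping them.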

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing numerous diverse methods like one-hot encoding, word embedding, etc., that transform text to machine language, which can be fed to multiple machine learning and deep learning algorithms.

    Some well-known applications of NLP include fraud detection on online media sites[ 10 ], authorship attribution in fallback authentication systems[ 11 ], intelligent conversational agents or chatbots[ 12 ], and the machine translation used by Google Translate[ 13 ]. While these are all downstream tasks, several exciting developments have been made in algorithms designed solely for Natural Language Processing tasks. The two most prominent are BERT[ 14 ], which uses a bidirectional Transformer encoder and performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI[ 15 ], which can generate almost human-like text. However, these are typically used as pre-trained models, since training them carries a huge computation cost.

    Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc[ 16 ]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of text. One commonly used topic modeling method is Latent Dirichlet Allocation or LDA[ 17 ].

    Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses a graph to calculate the weight of each word and picks the words with the highest weights.
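
    The graph-based weighting idea can be sketched in a few lines. The code below is a simplified illustration in the spirit of TextRank, assuming an unweighted co-occurrence graph, a fixed number of PageRank iterations and already tokenized input; the function name, window size and damping value are choices made for this sketch, not details taken from the paper or from any library.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by PageRank score on an undirected co-occurrence graph."""
    # Link each word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []  # no co-occurring pairs, so nothing to rank

    n = len(neighbors)
    scores = {word: 1.0 / n for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's current score.
        scores = {
            word: (1 - damping) / n
            + damping * sum(scores[nb] / len(neighbors[nb]) for nb in neighbors[word])
            for word in neighbors
        }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical token sequence in which "virus" co-occurs with every other word.
ranked = textrank_keywords("virus economy virus masks virus election virus".split())
print(ranked[0][0])
```

    Words that co-occur with many different words accumulate the most score, which is why the hub word ranks first in this toy input.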

    Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2 and 3, we can note the following:

    In February, both newspapers discussed China and the source of the outbreak.

    StarTribune emphasized Minnesota as the state of most concern. In April, this concern appeared to increase.

    Both newspapers discussed the virus's impact on the economy, e.g., banks, elections, administrations and markets.

    Washington Post discussed global issues more than StarTribune.

    In February, StarTribune mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread in the United States received more weight from March through May, reflecting the critical impact of the virus.

    We used a script to extract all numbers related to certain keywords like ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc. from the news reports and created a case-count series for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in COVID cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

    We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiment scores ranged from -0.5 to -0.9. The Vader Sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and its serious impact. Moreover, sentiment analysis can also tell us how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
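
    The keyword-driven number extraction described in this section could be approximated with a regular expression. The sketch below is an assumption about how such a script might work, not the authors' code: it pairs a number with a listed keyword that follows it within at most two intervening words, and the sample sentence is made up.

```python
import re

# Keywords taken from the description above, lowercased for matching.
KEYWORDS = ["deaths", "infected", "died", "infections", "quarantined", "lock-down", "diagnosed"]

# A number (commas allowed), then up to two intervening words, then a keyword.
PATTERN = re.compile(
    r"(\d[\d,]*)\s+(?:\w+\s+){0,2}?(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def extract_counts(text: str):
    """Return (number, keyword) pairs found in the text, in order of appearance."""
    return [
        (int(m.group(1).rstrip(",").replace(",", "")), m.group(2).lower())
        for m in PATTERN.finditer(text)
    ]

# Hypothetical sentence for illustration.
found = extract_counts("Officials reported 1,200 new infections and 35 deaths on Monday.")
print(found)
```

    A pattern like this will also pick up numbers that are not case counts, such as percentages or dates placed near a keyword, so its output would need manual review.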

  7. New York Times Covid-19 Data (United States)

    • kaggle.com
    zip
    Updated Nov 22, 2025
    Cite
    Michael Peteuil (2025). New York Times Covid-19 Data (United States) [Dataset]. https://www.kaggle.com/datasets/mpeteuil/nytimes-covid-19-data
    Explore at:
    Available download formats: zip (162971226 bytes)
    Dataset updated
    Nov 22, 2025
    Authors
    Michael Peteuil
    Area covered
    United States
    Description

    Source

    This data comes from the New York Times Coronavirus (Covid-19) Data in the United States GitHub repository. They use it to power their interactive page(s) on Covid-19, such as Coronavirus in the U.S.: Latest Map and Case Count.

    What's Included?

    The primary data published here are the daily cumulative number of cases and deaths reported in each county and state across the U.S. since the beginning of the pandemic. We have also published these additional data sets:

    • Prisons: Cases in prisons
    • Colleges: Cases on college and university campuses.
    • Excess deaths: The elevated overall number of deaths during the pandemic.
    • Mask use: A July 2020 survey of how regularly people in each county wore masks.
    • Averages and anomalies: A set of pre-computed rolling averages of cases and deaths for ease of analysis or use in making graphics, along with a set of days with anomalous data that have been excluded from the averages.

    The cumulative & rolling averages for cases and deaths are continually updated, but the more specific data mentioned above for prisons, etc. is no longer being updated.
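
    As a rough illustration of a rolling average that leaves out days flagged as anomalous, consider the sketch below. It is an assumption-laden simplification and not The New York Times' actual method: the default window length, the way flagged days are skipped and all numbers are hypothetical.

```python
def rolling_average(daily, window=7, anomalous=frozenset()):
    """Trailing mean over up to `window` days, ignoring positions listed in `anomalous`."""
    averages = []
    for end in range(len(daily)):
        start = max(0, end - window + 1)
        usable = [daily[i] for i in range(start, end + 1) if i not in anomalous]
        averages.append(sum(usable) / len(usable) if usable else None)
    return averages

# Hypothetical daily counts; position 3 holds a backlog dump flagged as anomalous.
daily_cases = [10, 20, 30, 400, 50]
smoothed = rolling_average(daily_cases, window=3, anomalous={3})
print(smoothed)
```

    Skipping the flagged day keeps a one-off reporting backlog from inflating the average for every window that contains it.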

    This includes data at the national, state, and county levels.

    License and Attribution

    If you use this data, you must attribute it to “The New York Times” in any publication. If you would like a more expanded description of the data, you could say “Data from The New York Times, based on reports from state and local health agencies.”

    Acknowledgements

    Header Image: https://www.pexels.com/photo/n95-face-mask-3993241/

    More Information

    See the original New York Times source README which is also included in this dataset.

  8. COVID-19 US Daily Data

    • kaggle.com
    zip
    Updated Sep 2, 2020
    Cite
    Altadata (2020). COVID-19 US Daily Data [Dataset]. https://www.kaggle.com/altadata/covid19us
    Explore at:
    Available download formats: zip (232018 bytes)
    Dataset updated
    Sep 2, 2020
    Authors
    Altadata
    Area covered
    United States
    Description

    ALTADATA is a curated data marketplace where our subscribers and our data partners can easily exchange ready-to-analyze datasets and create insights with EPO, our visual data analytics platform.

    COVID-19 US Daily Data

    State-level daily COVID-19 data for the United States, provided by the Johns Hopkins University (JHU) Center for Systems Science and Engineering (CSSE). If you want the updated version of the data, you can access our daily updated data through Altadata with an API key.

    Overview

    In this data product, you may find the latest and historical daily data on the COVID-19 pandemic for the United States with a state-level breakdown.

    The COVID‑19 pandemic, also known as the coronavirus pandemic, is an ongoing global pandemic of coronavirus disease 2019 (COVID‑19), caused by severe acute respiratory syndrome coronavirus 2 (SARS‑CoV‑2). The outbreak was first identified in December 2019 in Wuhan, China. The World Health Organization declared the outbreak a Public Health Emergency of International Concern on 30 January 2020 and a pandemic on 11 March. As of 12 August 2020, more than 20.2 million cases of COVID‑19 have been reported in more than 188 countries and territories, resulting in more than 741,000 deaths; more than 12.5 million people have recovered.

    The Johns Hopkins Coronavirus Resource Center is a continuously updated source of COVID-19 data and expert guidance. They aggregate and analyze the best data available on COVID-19 - including cases, as well as testing, contact tracing and vaccine efforts - to help the public, policymakers and healthcare professionals worldwide respond to the pandemic.

    Methodology

    • Cases and Death counts include confirmed and probable (where reported)
    • Recovered cases are estimates based on local media reports, and state and local reporting when available, and therefore may be substantially lower than the true number. US state-level recovered cases are from COVID Tracking Project.
    • Active cases = total cases - total recovered - total deaths
    • Incidence Rate = cases per 100,000 persons
    • Case-Fatality Ratio (%) = Number recorded deaths / Number cases
    • US Testing Rate = total test results per 100,000 persons. The "total test results" are equal to "Total test results (Positive + Negative)" from COVID Tracking Project.
    • US Hospitalization Rate (%) = Total number hospitalized / Number cases. The "Total number hospitalized" is the "Hospitalized – Cumulative" count from COVID Tracking Project. The "hospitalization rate" and "Total number hospitalized" are only presented for those states which provide cumulative hospital data.
    • States Population data is retrieved from U.S. Census Bureau on top of the JHU CSSE's COVID-19 data
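
    The formulas in the methodology list can be written out as a short sketch. The input numbers are hypothetical, the dictionary keys other than `active` are names chosen for illustration, and expressing the two percentage metrics by multiplying by 100 is an assumption based on the "(%)" labels.

```python
def derived_metrics(confirmed, deaths, recovered, population, total_tests, hospitalized=None):
    """Compute the derived state-level metrics from raw counts."""
    metrics = {
        "active": confirmed - recovered - deaths,
        "incidence_rate": confirmed * 100_000 / population,    # cases per 100,000 persons
        "case_fatality_ratio": deaths * 100 / confirmed,       # percent
        "testing_rate": total_tests * 100_000 / population,    # test results per 100,000 persons
    }
    # Only some states report cumulative hospitalizations.
    if hospitalized is not None:
        metrics["hospitalization_rate"] = hospitalized * 100 / confirmed  # percent
    return metrics

# Hypothetical state figures for illustration only.
example = derived_metrics(
    confirmed=20_000, deaths=400, recovered=12_000,
    population=4_000_000, total_tests=300_000, hospitalized=1_500,
)
print(example)
```

    Because recovered counts are estimates, the `active` figure inherits their uncertainty and is best read as approximate.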

    Data Dictionary

    • Reported Date (reported_date): Covid-19 Report Date
    • Province State (province_state): State name
    • Population (population): Estimated state populations as of July 2019, as per U.S. Census Bureau Population Division
    • Latitude (lat): Dot location latitude, not representative of a specific address
    • Longitude (lng): Dot location longitude, not representative of a specific address
    • Confirmed Case (confirmed): Confirmed cases include presumptive positive cases and probable cases
    • Active cases (active): Active cases = total confirmed - total recovered - total deaths
    • Deaths (deaths): Death counts
    • Recovered (recovered): Recovered case counts
    • Hospitalization Rate (hospitalization_rate): Total number of people hospitalized * 100...
  9. Most used sources of coronavirus news U.S. 2020, by age group

    • statista.com
    Updated Mar 13, 2020
    Cite
    Statista (2020). Most used sources of coronavirus news U.S. 2020, by age group [Dataset]. https://www.statista.com/statistics/1104391/coronavirus-news-sources-by-age-us/
    Explore at:
    Dataset updated
    Mar 13, 2020
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 13, 2020
    Area covered
    United States
    Description

    According to a survey conducted in March 2020, ** percent of U.S. news consumers said that they were seeking out the latest information about the coronavirus via news media in general, including TV news, radio news, online news, and newspapers. In fact, ** percent of adults aged 55 or above were getting most of their news about the virus this way, compared to just ** percent of ** to 24-year-olds who were more likely than their older peers to turn to websites or social media posts from government or health agencies.

  10. Novel Covid-19 Dataset

    • kaggle.com
    Updated Sep 18, 2025
    + more versions
    Cite
    GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    GHOST5612
    License

    MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    Context:

    From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

    So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

    Johns Hopkins University has made an excellent dashboard using the affected cases data. The data was extracted from the associated Google Sheets and made available here.

    Edited:

    The data is now available as CSV files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for the Terms of Use details. It is uploaded here for use in Kaggle kernels and to gather insights from the broader data science community.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has daily level information on the number of affected cases, deaths and recoveries from the 2019 novel coronavirus. Please note that this is time series data, so the number of cases on any given day is the cumulative number.

    The data is available from 22 Jan, 2020.

    Dataset Description

    This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

    Files and Columns

    1. covid_19_data.csv (Main File)

    This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

    • Sno – Serial number of the record
    • ObservationDate – Date of the observation (MM/DD/YYYY)
    • Province/State – Province or state of the observation (may be missing for some entries)
    • Country/Region – Country of the observation
    • Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)
    • Confirmed – Cumulative number of confirmed cases on that date
    • Deaths – Cumulative number of deaths on that date
    • Recovered – Cumulative number of recoveries on that date
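
    Because Confirmed, Deaths and Recovered are cumulative, daily new counts have to be derived by differencing consecutive days. The sketch below shows one way to do that with the standard library. The sample rows and the country name are invented, the function name `daily_new_confirmed` is hypothetical, and treating the first observed day as entirely new cases is an assumption, not something the dataset specifies.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Invented sample using the column names listed above; not real data.
SAMPLE = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,,Exampleland,1/22/2020 17:00,1,0,0
2,01/23/2020,,Exampleland,1/23/2020 17:00,3,0,0
3,01/24/2020,,Exampleland,1/24/2020 17:00,6,1,0
"""


def daily_new_confirmed(csv_text):
    """Return {(country, date): new confirmed cases} from cumulative counts."""
    # Sum cumulative counts over provinces for each country and day.
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        totals[(row["Country/Region"], day)] += float(row["Confirmed"])

    # Difference consecutive days per country (assumption: first day counts as all new).
    result = {}
    previous = {}
    for country, day in sorted(totals):
        cumulative = totals[(country, day)]
        result[(country, day)] = cumulative - previous.get(country, 0.0)
        previous[country] = cumulative
    return result


new_cases = daily_new_confirmed(SAMPLE)
for (country, day), value in new_cases.items():
    print(country, day.isoformat(), value)
```

    With real data, corrections by the source can make a cumulative count drop, so negative differences are possible and may need separate handling.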

    2. 2019_ncov_data.csv (Legacy File)

    This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

    3. COVID_open_line_list_data.csv

    This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

    4. COVID19_line_list_data.csv

    Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

    ✅ Use covid_19_data.csv for up-to-date aggregated global trends.

    ✅ Use the line list datasets for detailed, individual-level case analysis.

    Country level datasets:

    If you are interested in knowing country level data, please refer to the following Kaggle datasets:

    India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

    South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

    Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

    Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

    USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

    Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

    Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

    Acknowledgements :

    Johns Hopkins University for making the data available for educational and academic research purposes

    MoBS lab - https://www.mobs-lab.org/2019ncov.html

    World Health Organization (WHO): https://www.who.int/

    DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

    BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

    National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

    China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

    Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

    Macau Government: https://www.ssm.gov.mo/portal/

    Taiwan CDC: https://sites.google....

  11. COVID-19 in USA

    • kaggle.com
    zip
    Updated Dec 7, 2020
    + more versions
    Cite
    SRK (2020). COVID-19 in USA [Dataset]. https://www.kaggle.com/sudalairajkumar/covid19-in-usa
    Explore at:
    Available download formats: zip (10190889 bytes)
    Dataset updated
    Dec 7, 2020
    Authors
    SRK
    Area covered
    United States
    Description

    Context

    Data is obtained from COVID-19 Tracking project and NYTimes. Sincere thanks to them for making it available to the public.

    Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19 - World Health Organization

    The number of new cases is increasing day by day around the world. This dataset has information from 50 US states and the District of Columbia at daily level.

    LICENSE:

    Apache License 2.0: a permissive license whose main conditions require preservation of copyright and license notices. Contributors provide an express grant of patent rights. Licensed works, modifications, and larger works may be distributed under different terms and without source code.

    For counties dataset, please refer here

    Content

    us_states_covid19_daily.csv

    This dataset has the number of tests conducted in each state at daily level. Column descriptions are:

    • date - date of observation
    • state - US state two-letter code
    • positive - number of tests with positive results
    • negative - number of tests with negative results
    • pending - number of tests with pending results
    • death - number of deaths
    • total - total number of tests
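
    As one illustration of how these columns can be combined, the sketch below computes the share of positive results among resolved tests. The function name `positive_share` is hypothetical, and excluding pending tests from the denominator is an assumption made here, not a definition given by the dataset.

```python
from typing import Optional


def positive_share(positive: int, negative: int) -> Optional[float]:
    """Share of positive results among tests with a known result.

    Assumption: pending tests are left out of the denominator.
    Returns None when no resolved tests are reported.
    """
    resolved = positive + negative
    if resolved == 0:
        return None
    return positive / resolved


# Made-up numbers, for illustration only.
print(positive_share(25, 75))
```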

    Acknowledgements

    Sincere thanks to COVID-19 Tracking project from which the data is obtained.

    Sincere thanks to NYTimes for the counties dataset

    There is a nice Tableau Public dashboard on the data. Images for this dataset are obtained from the same. Thank you.

    Inspiration

    Some of the questions that could be answered are:

    1. How is the spread over time to various states
    2. Change in number of people tested over time

  12. US COVID19 cases by date

    • kaggle.com
    zip
    Updated May 7, 2020
    + more versions
    Cite
    Charlie Craine (2020). US COVID19 cases by date [Dataset]. https://www.kaggle.com/crained/us-covid19-cases-by-date
    Explore at:
    Available download formats: zip (841 bytes)
    Dataset updated
    May 7, 2020
    Authors
    Charlie Craine
    License

    http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html

    Area covered
    United States
    Description

    Data structure

    date,cases,deaths
    2020-01-21,1,0
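
    A file with this structure can be read into typed records with the standard library. The sketch below is illustrative only: the `DailyCount` type and `parse_counts` function are hypothetical names, and the input is just the header and sample row shown above.

```python
import csv
import io
from datetime import date
from typing import List, NamedTuple


class DailyCount(NamedTuple):
    """Hypothetical record type for one row of date,cases,deaths."""
    day: date
    cases: int
    deaths: int


def parse_counts(text: str) -> List[DailyCount]:
    # DictReader uses the first line (date,cases,deaths) as field names.
    reader = csv.DictReader(io.StringIO(text))
    return [
        DailyCount(date.fromisoformat(row["date"]), int(row["cases"]), int(row["deaths"]))
        for row in reader
    ]


rows = parse_counts("date,cases,deaths\n2020-01-21,1,0\n")
print(rows)
```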

    Acknowledgements

    The New York Times data

    The data is the product of dozens of journalists working across several time zones to monitor news conferences, analyze data releases and seek clarification from public officials on how they categorize cases.

    https://github.com/nytimes/covid-19-data

  13. COVID-19 WORLDWIDE DATASET

    • kaggle.com
    zip
    Updated Nov 20, 2025
    Cite
    James Valles (2025). COVID-19 WORLDWIDE DATASET [Dataset]. https://www.kaggle.com/jamesvalles/covid19-worldwide-dataset
    Explore at:
    Available download formats: zip (2621167 bytes)
    Dataset updated
    Nov 20, 2025
    Authors
    James Valles
    Description

    Context

    Each workbook contains daily COVID-19 stats for each country affected. Additional sheets have also been added for a more specific breakdown by location within Australia, Canada, China, and the USA. Worked with BNO News to put this together. Additional credits include: Michael Van Poppel and Carlos Robles. The GitHub repository, updated every 24 hours, can be found here: https://github.com/jamesvalles/CORONAVIUS-COVID-19-DAILYSTATS

  14. Least trusted sources of coronavirus news U.S. 2020

    • statista.com
    Updated Mar 12, 2020
    Cite
    Statista (2020). Least trusted sources of coronavirus news U.S. 2020 [Dataset]. https://www.statista.com/statistics/1104569/least-trusted-news-sources-coronavirus-us/
    Explore at:
    Dataset updated
    Mar 12, 2020
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Mar 12, 2020 - Mar 15, 2020
    Area covered
    United States
    Description

    As the coronavirus has spread throughout the United States and across the globe, consumers have turned to the media to inform them about how the pandemic is progressing and have been seeking news from sources they trust, and ** percent of respondents to a U.S. survey said that they did not trust social media to provide correct information about the outbreak. Social media was by far the least trusted news outlet for coronavirus updates, followed by podcasts and online-only news sites. Conversely, traditional media outlets like newspapers and radio fared better in terms of consumer trust, along with cable and network news.

  15. Covid Daily Deaths in USA Till August 9, 2020

    • kaggle.com
    zip
    Updated Aug 10, 2020
    Cite
    WildGrok (2020). Covid Daily Deaths in USA Till August 9, 2020 [Dataset]. https://www.kaggle.com/datasets/wildgrok/covid-daily-deaths-in-usa-till-august-9-2020
    Explore at:
    Available download formats: zip (28913 bytes)
    Dataset updated
    Aug 10, 2020
    Authors
    WildGrok
    Area covered
    United States
    Description

    Dataset

    This dataset was created by WildGrok

    Contents

  16. Data for: COVID Diaries, Part II: U.S. Media Response to COVID Vaccination...

    • data.qdr.syr.edu
    pdf, tsv, txt
    Updated Nov 25, 2025
    Cite
    Avalon S. Moore; Bridget Vitu; Felicia Fraizer-Bisner; Peter J. Williams; Madeline Chun; Claire Archer; Abdelrhman Gouda; Akhil Vallabh; Alixandra Wilens; Christopher Pittenger; Helen Pushkarskaya (2025). Data for: COVID Diaries, Part II: U.S. Media Response to COVID Vaccination Program, December 2020 to September 2021 [Dataset]. http://doi.org/10.5064/F63IIXNY
    Explore at:
    Available download formats: tsv (111033), pdf (327734), pdf (236549), txt (2885)
    Dataset updated
    Nov 25, 2025
    Dataset provided by
    Qualitative Data Repository
    Authors
    Avalon S. Moore; Bridget Vitu; Felicia Fraizer-Bisner; Peter J. Williams; Madeline Chun; Claire Archer; Abdelrhman Gouda; Akhil Vallabh; Alixandra Wilens; Christopher Pittenger; Helen Pushkarskaya
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 1, 2020 - Sep 30, 2021
    Area covered
    United States
    Description

    Project Overview

    This portion of the COVID DIARIES project provides full bibliographic information (including original and permanent links) to media items related to the COVID-19 vaccination program, published on the official websites of 20 major U.S. news outlets, including television networks, magazines, and newspapers. It spans the period from December 2020, when states began implementing Phase 1a of the vaccine allocation plan, through September 2021, when vaccines became widely available to all adults and were frequently mandated. News items were collected to preserve a contemporaneous record of how the vaccination effort was discussed across national media. The dataset enables researchers to analyze media communication strategies during a nationwide public health emergency, with the broader aim of informing more effective public health messaging through mass media. This project represents a collaborative effort between the Yale School of Medicine and the Tobin Center for Economic Policy.

    Data and Data Collection Overview

    This collection comprises 5,383 unique publication links from 20 major news outlets (television networks, magazines, and newspapers) published between December 1, 2020, and September 30, 2021. Only articles that were freely accessible online without subscription or paywall restrictions were included. Articles were collected by the research team (specifically AM) between August 2021 and November 2023 and in April 2024 (by AM and AG). These 20 news outlets were selected based on a 2020–2021 survey of 511 U.S. adults, which identified the outlets most commonly used to obtain information about the COVID-19 vaccination program. A full list of news outlets, along with their reported usage and perceived trustworthiness, is provided in Sources_Selection.docx.

    Online publications were identified using Google search with a custom date range in week-long increments (e.g., 12/01/2020–12/07/2020), using the keyword “vaccine” in combination with the link to the respective news outlet’s website. Search results were manually reviewed by AM according to the following inclusion and exclusion criteria.

    Inclusion criteria:

    • Articles published on the selected U.S. news outlets' websites ending in “.com” or “.co” that relate to the COVID-19 vaccination program
    • Articles from the selected international news outlets that serve both their country of origin and the U.S. audience (e.g., BBC, The Daily Mail)

    Exclusion criteria:

    • Articles published on the websites of international news outlets that exclusively serve their country of origin (e.g., domains ending in .uk, .ca, etc. without .com, .co)
    • Publications from universities, government agencies, or other organizations not affiliated with major U.S. news outlets (e.g., domains ending in .edu, .gov, .org)
    • Videos without accompanying transcripts
    • Publications without textual content
    • Articles referencing vaccines unrelated to COVID-19
    • Non-English language publications

    Selection and Organization of Shared Data

    The full list of publications is provided in the data file named "News_Outlets_Publications_Full_List." Entries are organized by news outlet (one per tab), then by publication year, month, week, and article title within each tab. For each entry, the list includes the article’s original download date by the research team, file format (e.g., PDF), original link to the publication, and a permanent link record. The list was verified by MC, CA, AV, AG, and AM, with final quality control performed by AM. Each article was assigned a unique identifier in the format: "Article Title – News Outlet Name", ensuring that each entry appears only once in the final dataset.

    Additional documentation includes this Data Narrative, a document explaining the source selection, and an administrative README file.
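
    The week-long search windows described in the data collection overview (e.g., 12/01/2020–12/07/2020) can be generated programmatically. The sketch below is only an illustration of that scheme: the function name `weekly_ranges` is hypothetical, and clipping the final window at the end of the collection period is an assumption, since the description does not say how the last partial week was handled.

```python
from datetime import date, timedelta
from typing import List, Tuple


def weekly_ranges(start: date, end: date) -> List[Tuple[date, date]]:
    """Split [start, end] into consecutive 7-day windows (inclusive bounds).

    Assumption: the last window is clipped so it does not run past `end`.
    """
    ranges = []
    window_start = start
    while window_start <= end:
        window_end = min(window_start + timedelta(days=6), end)
        ranges.append((window_start, window_end))
        window_start = window_end + timedelta(days=1)
    return ranges


# Publication period stated in the description: Dec 1, 2020 to Sep 30, 2021.
ranges = weekly_ranges(date(2020, 12, 1), date(2021, 9, 30))
for window_start, window_end in ranges[:2]:
    # Format as MM/DD/YYYY to match the example in the description.
    print(window_start.strftime("%m/%d/%Y"), "-", window_end.strftime("%m/%d/%Y"))
```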

  17. covid_news

    • huggingface.co
    Updated Jan 3, 2020
    Cite
    Omar Sanseviero (2020). covid_news [Dataset]. https://huggingface.co/datasets/osanseviero/covid_news
    Explore at:
    Croissant, a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 3, 2020
    Authors
    Omar Sanseviero
    License

    https://choosealicense.com/licenses/cc0-1.0/

    Description

    Dataset Card for COVID News Articles (2020 - 2022)

    Dataset Summary

    The dataset encapsulates approximately half a million news articles collected over a period of 2 years during the Coronavirus pandemic onset and surge. It consists of 3 columns - title, content and category. title refers to the headline of the news article. content refers to the article in itself and category denotes the overall context of the news article at a high level. The dataset encapsulates… See the full description on the dataset page: https://huggingface.co/datasets/osanseviero/covid_news.

  18. Table_1_Social Media News Use Induces COVID-19 Vaccine Hesitancy Through...

    • frontiersin.figshare.com
    docx
    Updated Jun 13, 2023
    Cite
    Saifuddin Ahmed; Muhammad Ehab Rasul; Jaeho Cho (2023). Table_1_Social Media News Use Induces COVID-19 Vaccine Hesitancy Through Skepticism Regarding Its Efficacy: A Longitudinal Study From the United States.DOCX [Dataset]. http://doi.org/10.3389/fpsyg.2022.900386.s001
    Explore at:
    Available download formats: docx
    Dataset updated
    Jun 13, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Saifuddin Ahmed; Muhammad Ehab Rasul; Jaeho Cho
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    United States
    Description

    There are mounting concerns about the adverse effects of social media on the public understanding of the COVID-19 pandemic and its potential effects on vaccination coverage. Yet early studies have focused on generic social media use and been based on cross-sectional data limiting any causal inferences. This study is among the first to provide causal support for the speculation that social media news use leads to vaccine hesitancy among US citizens. This two-wave survey study was conducted in the US using Qualtrics online panel-based recruitment. We employ mediation and moderated mediation analyses to test our assumptions. The results suggest that using social media to consume news content can translate into vaccine hesitancy by increasing citizens’ skepticism regarding the efficacy of vaccines. However, these effects are contingent upon the news literacy of users, as the effects on vaccine hesitancy are more substantial among those with lower news literacy. The current study recommends to public policymakers and vaccine communication strategists that any attempt to reduce vaccine hesitancy in society should factor in the adverse effects of social media news use that can increase vaccine safety concerns.

  19. Daily United States COVID-19 Testing and Outcomes Data By State, March 7,...

    • data.niaid.nih.gov
    • datadryad.org
    zip
    Updated Jul 28, 2021
    + more versions
    Cite
    The COVID Tracking Project at The Atlantic (2021). Daily United States COVID-19 Testing and Outcomes Data By State, March 7, 2020 to March 7, 2021 [Dataset]. http://doi.org/10.5061/dryad.9kd51c5hk
    Explore at:
    Available download formats: zip
    Dataset updated
    Jul 28, 2021
    Authors
    The COVID Tracking Project at The Atlantic
    License

    https://spdx.org/licenses/CC0-1.0.html

    Area covered
    United States
    Description

    The COVID Tracking Project was a volunteer organization launched from The Atlantic and dedicated to collecting and publishing the data required to understand the COVID-19 outbreak in the United States. Our dataset was in use by national and local news organizations across the United States and by research projects and agencies worldwide.

    Every day, we collected data on COVID-19 testing and patient outcomes from all 50 states, 5 territories, and the District of Columbia by visiting official public health websites for those jurisdictions and entering reported values in a spreadsheet. The files in this dataset represent the entirety of our COVID-19 testing and outcomes data collection from March 7, 2020 to March 7, 2021. This dataset includes official values reported by each state on each day of antigen, antibody, and PCR test result totals; the total number of probable and confirmed cases of COVID-19; the number of people currently hospitalized, in intensive care, and on a ventilator; the total number of confirmed and probable COVID-19 deaths; and more.

    Methods

    This dataset was compiled by about 300 volunteers with The COVID Tracking Project from official sources of state-level COVID-19 data such as websites and press conferences. Every day, a team of about a dozen available volunteers visited these official sources and recorded the publicly reported values in a shared Google Sheet, which was used as a data source to publish the full dataset each day between about 5:30pm and 7pm Eastern time. All our data came from state and territory public health authorities or official statements from state officials. We did not automatically scrape data or attempt to offer a live feed. Our data was gathered and double-checked by humans, and we emphasized accuracy and context over speed. Some data was corrected or backfilled from structured data provided by public health authorities. Additional information about our methods can be found in a series of posts at http://covidtracking.com/analysis-updates.

    We offer thanks and heartfelt gratitude for the labor and sacrifice of our volunteers. Volunteers on the Data Entry, Data Quality, and Data Infrastructure teams who granted us permission to use their name publicly are listed in VOLUNTEERS.md.

  20. Levels of confidence fact checking coronavirus news U.S. 2020

    • statista.com
    Cite
    Statista, Levels of confidence fact checking coronavirus news U.S. 2020 [Dataset]. https://www.statista.com/statistics/1121717/fact-check-corona-news-us/
    Explore at:
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Apr 20, 2020 - Apr 26, 2020
    Area covered
    United States
    Description

    According to the most recently available data, around ********* of Americans feel very confident in their ability to check the accuracy of news stories regarding coronavirus. In an online survey conducted in **********, ** percent of respondents stated they would know how to confirm the accuracy of news and information regarding the COVID-19 pandemic. The majority of participants expressed a moderate level of self confidence in their capacity to fact check, with ** percent somewhat confident.
