100+ datasets found
  1. Coronavirus (Covid-19) Data in the United States

    • nytimes.com
    • openicpsr.org
    • +4 more
    Cite
    New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
    Explore at:
    Dataset provided by
    New York Times
    Description

    The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.

    Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.

    We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.

    The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.

  2. Coronavirus (COVID-19) new cases in Italy as of January 2025, by date of...

    • statista.com
    Updated Feb 15, 2022
    Cite
    Statista (2022). Coronavirus (COVID-19) new cases in Italy as of January 2025, by date of report [Dataset]. https://www.statista.com/statistics/1101690/coronavirus-new-cases-development-italy/
    Explore at:
    Dataset updated
    Feb 15, 2022
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Feb 22, 2020 - Jan 8, 2025
    Area covered
    Europe, Italy
    Description

    The first two cases of the new coronavirus (COVID-19) in Italy were recorded between the end of January and the beginning of February 2020. Since then, the number of cases in Italy increased steadily, reaching over 26.9 million as of January 8, 2025. The region hit hardest by the virus was Lombardy, with almost 4.4 million cases. On January 11, 2022, 220,532 new cases were registered, the biggest daily increase in cases in Italy since the start of the pandemic. The virus originated in Wuhan, a Chinese city populated by millions and located in the province of Hubei. More statistics and facts about the virus in Italy are available here. For a global overview, visit Statista's webpage exclusively dedicated to coronavirus, its development, and its impact.

  3. The COVID Tracking Project

    • covidtracking.com
    google sheets
    Cite
    The COVID Tracking Project [Dataset]. https://covidtracking.com/
    Explore at:
    Available download formats: google sheets
    Description

    The COVID Tracking Project collects information from 50 US states, the District of Columbia, and 5 other US territories to provide the most comprehensive testing data we can collect for the novel coronavirus, SARS-CoV-2. We attempt to include positive and negative results, pending tests, and total people tested for each state or district currently reporting that data.

    Testing is a crucial part of any public health response, and sharing test data is essential to understanding this outbreak. The CDC is currently not publishing complete testing data, so we’re doing our best to collect it from each state and provide it to the public. The information is patchy and inconsistent, so we’re being transparent about what we find and how we handle it—the spreadsheet includes our live comments about changing data and how we’re working with incomplete information.

    From here, you can also learn about our methodology, see who makes this, and find out what information states provide and how we handle it.

  4. Johns Hopkins COVID-19 Case Tracker

    • data.world
    • kaggle.com
    csv, zip
    Updated Dec 3, 2025
    Cite
    The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
    Explore at:
    Available download formats: zip, csv
    Dataset updated
    Dec 3, 2025
    Authors
    The Associated Press
    Time period covered
    Jan 22, 2020 - Mar 9, 2023
    Area covered
    Description

    Updates

    • Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

    • April 9, 2020

      • The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.
    • April 20, 2020

      • Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.
    • April 29, 2020

      • The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.
    • September 1st, 2020

      • Johns Hopkins is now providing counts for the five New York City counties individually.
    • February 12, 2021

      • The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."
      • Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.
    • February 16, 2021

      • Johns Hopkins has reconciled Ohio's historical deaths data with the state.

    Overview

    The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

    The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
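    The per-100,000 rates mentioned above can be illustrated with a minimal sketch. The function name and the sample figures are hypothetical and are not taken from the AP dataset; the only part drawn from the description is the idea of scaling a count by population.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Scale a raw case or death count to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so whole-number results stay exact.
    return count * 100_000 / population


# Hypothetical county: 250 cumulative cases among 50,000 residents.
example_rate = rate_per_100k(250, 50_000)
print(example_rate)  # 500.0
```

    A rate computed this way still inherits the caveat in the text: it reflects reported cases, which depend on test availability.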

    This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

    The AP is updating this dataset hourly at 45 minutes past the hour.

    To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

    Queries

    Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

    Interactive

    The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

    @(https://datawrapper.dwcdn.net/nRyaf/15/)

    Interactive Embed Code

    <iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
    

    Caveats

    • This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.
    • In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.
    • In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"
    • This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.
    • Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
    • Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.
    • The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

    Johns Hopkins timeseries data

    • Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
    • Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.
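    Because the timeseries stores cumulative snapshots, a downward revision by a source shows up as a negative day-over-day change. The following is a minimal sketch of deriving daily counts and flagging such days; the function name and the sample numbers are hypothetical, and this is not the AP's or Johns Hopkins's own processing.

```python
def daily_new_counts(cumulative):
    """Return (new_counts, revised_days) from a list of cumulative totals.

    revised_days holds the indices where the cumulative total dropped,
    which suggests the source revised its historical data downward.
    """
    new_counts = []
    revised_days = []
    previous = 0
    for day, total in enumerate(cumulative):
        delta = total - previous
        if delta < 0:
            revised_days.append(day)
        new_counts.append(delta)
        previous = total
    return new_counts, revised_days


# Hypothetical snapshots: the total falls from 22 to 20 on day 3.
new_counts, revised_days = daily_new_counts([10, 15, 22, 20, 27])
print(new_counts)    # [10, 5, 7, -2, 7]
print(revised_days)  # [3]
```

    How to treat a flagged day (drop it, smooth it, or keep the raw value) is a separate analysis decision that the description leaves open.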

    Attribution

    This data should be credited to Johns Hopkins University COVID-19 tracking project

  5. COVID-19 reporting

    • mass.gov
    Updated Mar 4, 2020
    Cite
    Executive Office of Health and Human Services (2020). COVID-19 reporting [Dataset]. https://www.mass.gov/info-details/covid-19-reporting
    Explore at:
    Dataset updated
    Mar 4, 2020
    Dataset provided by
    Executive Office of Health and Human Services
    Department of Public Health
    Area covered
    Massachusetts
    Description

    The COVID-19 dashboard includes data on city/town COVID-19 activity, confirmed and probable cases of COVID-19, confirmed and probable deaths related to COVID-19, and the demographic characteristics of cases and deaths.

  6. Covid-19 latest news dataset

    • data.mendeley.com
    Updated Oct 27, 2021
    + more versions
    Cite
    Rajat Thakur (2021). Covid-19 latest news dataset [Dataset]. http://doi.org/10.17632/8rbm7d874k.1
    Explore at:
    Dataset updated
    Oct 27, 2021
    Authors
    Rajat Thakur
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Coronavirus disease 2019 (COVID-19) time series that lists confirmed cases, reported deaths, and reported recoveries. Data is broken down by country (and sometimes by sub-region).

    Coronavirus disease (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and has had an effect worldwide. On March 11, 2020, the World Health Organization (WHO) declared it a pandemic, at that time indicating more than 118,000 cases of coronavirus disease in more than 110 countries and territories around the world.

    This dataset contains the latest news related to Covid-19 and it was fetched with the help of Newsdata.io news API.

  7. COVID Fake News Dataset

    • zenodo.org
    • data.niaid.nih.gov
    • +1 more
    Updated Nov 27, 2020
    + more versions
    Cite
    Sumit Banik; Sumit Banik (2020). COVID Fake News Dataset [Dataset]. http://doi.org/10.5281/zenodo.4282522
    Explore at:
    Dataset updated
    Nov 27, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Sumit Banik; Sumit Banik
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Context

    The dataset contains a list of COVID fake news items and claims shared across the internet.

    Content

    1. Headlines: String attribute consisting of the headlines/fact shared.
    2. Outcome: It is binary data where 0 means the headline is fake and 1 means that it is true.
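
    The two attributes above can be read into labeled pairs along these lines. This is a sketch under assumptions: the file is assumed to be a CSV whose header row uses the attribute names Headlines and Outcome, and the sample rows are hypothetical placeholders, not entries from the dataset.

```python
import csv
import io

# Hypothetical sample in the assumed layout (header row plus two records).
SAMPLE = """Headlines,Outcome
"Hypothetical claim A",0
"Hypothetical claim B",1
"""


def load_labeled_headlines(text):
    """Return a list of (headline, is_true) pairs.

    Outcome is "0" for a fake headline and "1" for a true one.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [(row["Headlines"], row["Outcome"] == "1") for row in reader]


rows = load_labeled_headlines(SAMPLE)
fake_count = sum(1 for _, is_true in rows if not is_true)
print(fake_count)  # 1
```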

    Inspiration

    On many research portals, a common question was whether a combined fake news dataset was available. This led to the publication of this dataset.

  8. Coronavirus (COVID-19) cases in Italy as of January 2025, by region

    • statista.com
    Updated Nov 15, 2023
    + more versions
    Cite
    Statista (2023). Coronavirus (COVID-19) cases in Italy as of January 2025, by region [Dataset]. https://www.statista.com/statistics/1099375/coronavirus-cases-by-region-in-italy/
    Explore at:
    Dataset updated
    Nov 15, 2023
    Dataset authored and provided by
    Statista (http://statista.com/)
    Time period covered
    Jan 1, 2025
    Area covered
    Italy
    Description

    After entering Italy, the coronavirus (COVID-19) spread fast. The strict lockdown implemented by the government during spring 2020 helped to slow down the outbreak. However, the country had to face four new harsh waves of contagion. As of January 1, 2025, the total number of cases reported by the authorities reached over 26.9 million. The north of the country was hit hardest, and the region with the highest number of cases was Lombardy, which registered almost 4.4 million of them. The north-eastern region of Veneto and the southern region of Campania followed in the list. When adjusting these figures for the population size of each region, however, the picture changed, with the region of Veneto being the area where the virus had the highest relative incidence.

    Coronavirus in Italy

    Italy has been among the countries most impacted by the coronavirus outbreak. Moreover, the number of deaths due to coronavirus recorded in Italy is significantly high, making it one of the countries with the highest fatality rates worldwide, especially in the first stages of the pandemic. In particular, a very high mortality rate was recorded among patients aged 80 years or older.

    Impact on the economy

    The lockdown imposed during spring 2020, and other measures taken in the following months to contain the pandemic, forced many businesses to shut their doors and caused industrial production to slow down significantly. As a result, consumption fell, with the sectors most severely hit being hospitality and tourism, air transport, and automotive. Several predictions about the evolution of the global economy were published at the beginning of the pandemic, based on different scenarios about the development of the pandemic. According to the official results, it appeared that the coronavirus outbreak had caused Italy’s GDP to shrink by approximately nine percent in 2020.

  9. Covid-19 Fake News Infodemic Research Dataset (CoVID19-FNIR Dataset)

    • ieee-dataport.org
    Updated Jul 29, 2025
    + more versions
    Cite
    DIKSHA SHUKLA (2025). Covid-19 Fake News Infodemic Research Dataset (CoVID19-FNIR Dataset) [Dataset]. https://ieee-dataport.org/open-access/covid-19-fake-news-infodemic-research-dataset-covid19-fnir-dataset
    Explore at:
    Dataset updated
    Jul 29, 2025
    Authors
    DIKSHA SHUKLA
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The United States of America

  10. COVID-19 Trends in Each Country

    • coronavirus-response-israel-systematics.hub.arcgis.com
    • coronavirus-disasterresponse.hub.arcgis.com
    • +2 more
    Updated Mar 28, 2020
    Cite
    Urban Observatory by Esri (2020). COVID-19 Trends in Each Country [Dataset]. https://coronavirus-response-israel-systematics.hub.arcgis.com/maps/a16bb8b137ba4d8bbe645301b80e5740
    Explore at:
    Dataset updated
    Mar 28, 2020
    Dataset provided by
    Esri (http://esri.com/)
    Authors
    Urban Observatory by Esri
    Area covered
    Earth
    Description

    On March 10, 2023, the Johns Hopkins Coronavirus Resource Center ceased its collecting and reporting of global COVID-19 data. For updated cases, deaths, and vaccine data please visit: World Health Organization (WHO). For more information, visit the Johns Hopkins Coronavirus Resource Center.

    COVID-19 Trends Methodology

    Our goal is to analyze and present daily updates in the form of recent trends within countries, states, or counties during the COVID-19 global pandemic. The data we are analyzing is taken directly from the Johns Hopkins University Coronavirus COVID-19 Global Cases Dashboard, though we expect to be one day behind the dashboard’s live feeds to allow for quality assurance of the data.

    DOI: https://doi.org/10.6084/m9.figshare.12552986

    • 3/7/2022 - Adjusted the rate of active cases calculation in the U.S. to reflect the rates of serious and severe cases due to the nearly completely dominant Omicron variant.
    • 6/24/2020 - Expanded Case Rates discussion to include fix on 6/23 for calculating active cases.
    • 6/22/2020 - Added Executive Summary and Subsequent Outbreaks sections.
    • Revisions on 6/10/2020 based on updated CDC reporting. This affects the estimate of active cases by revising the average duration of cases with hospital stays downward from 30 days to 25 days. The result shifted 76 U.S. counties out of Epidemic to Spreading trend, with no change for national level trends.
    • Methodology update on 6/2/2020: This sets the length of the tail of new cases to between 6 and a maximum of 14 days, rather than 21 days as determined by the last 1/3 of cases. This was done to align trends and criteria for them with U.S. CDC guidance. The impact is that areas transition into Controlled trend sooner because they no longer bear the burden of new cases from 15-21 days earlier.
    • Correction on 6/1/2020.
    • Discussion of our assertion of an abundance of caution in assigning trends in rural counties added 5/7/2020.
    • Revisions added on 4/30/2020 are highlighted.
    • Revisions added on 4/23/2020 are highlighted.

    Executive Summary

    COVID-19 Trends is a methodology for characterizing the current trend for places during the COVID-19 global pandemic. Each day we assign one of five trends: Emergent, Spreading, Epidemic, Controlled, or End Stage to geographic areas based on the number of new cases, the number of active cases, the total population, and an algorithm (described below) that contextualizes the most recent fourteen days with the overall COVID-19 case history. Currently we analyze the countries of the world and the U.S. counties. The purpose is to give policymakers, citizens, and analysts a fact-based, data-driven sense for the direction each place is currently going. When a place has the initial cases, they are assigned Emergent, and if that place controls the rate of new cases, they can move directly to Controlled, and even to End Stage in a short time. However, if the reporting or measures to curtail spread are not adequate and significant numbers of new cases continue, they are assigned to Spreading, and in cases where the spread is clearly uncontrolled, Epidemic trend.

    We analyze the data reported by Johns Hopkins University to produce the trends, and we report the rates of cases, spikes of new cases, the number of days since the last reported case, and number of deaths. We also make adjustments to the assignments based on population so rural areas are not assigned trends based solely on case rates, which can be quite high relative to local populations.

    Two key factors are not consistently known or available and should be taken into consideration with the assigned trend. First is the amount of resources, e.g., hospital beds, physicians, etc., that are currently available in each area. Second is the number of recoveries, which are often not tested or reported.
    On the latter, we provide a probable number of active cases based on CDC guidance for the typical duration of mild to severe cases.

    Reasons for undertaking this work in March of 2020:

    • The popular online maps and dashboards show counts of confirmed cases, deaths, and recoveries by country or administrative sub-region. Comparing the counts of one country to another can only provide a basis for comparison during the initial stages of the outbreak when counts were low and the number of local outbreaks in each country was low. By late March 2020, countries with small populations were being left out of the mainstream news because it was not easy to recognize they had high per capita rates of cases (Switzerland, Luxembourg, Iceland, etc.). Additionally, comparing countries that have had confirmed COVID-19 cases for high numbers of days to countries where the outbreak occurred recently is also a poor basis for comparison.
    • The graphs of confirmed cases and daily increases in cases were fit into a standard size rectangle, though the Y-axis for one country had a maximum value of 50, and for another country 100,000, which potentially misled people interpreting the slope of the curve. Such misleading circumstances affected comparing large population countries to small population countries, or countries with low numbers of cases to China, which had a large count of cases in the early part of the outbreak. These challenges for interpreting and comparing these graphs represent work each reader must do based on their experience and ability. Thus, we felt it would be a service to attempt to automate the thought process experts would use when visually analyzing these graphs, particularly the most recent tail of the graph, and provide readers with a resulting synthesis to characterize the state of the pandemic in that country, state, or county.
    • The lack of reliable data for confirmed recoveries and therefore active cases.
    Merely subtracting deaths from total cases to arrive at this figure progressively loses accuracy after two weeks. The reason is that 81% of cases recover after experiencing mild symptoms in 10 to 14 days. Severe cases are 14% and last 15-30 days (based on average days with symptoms of 11 when admitted to hospital plus 12 days median stay, plus one week to include a full range of severely affected people who recover). Critical cases are 5% and last 31-56 days. Sources: U.S. CDC. April 3, 2020 Interim Clinical Guidance for Management of Patients with Confirmed Coronavirus Disease (COVID-19). Accessed online. Initial older guidance was also obtained online. Additionally, many people who recover may not be tested, and many who are may not be tracked due to privacy laws. Thus, the formula used to compute an estimate of active cases is:

    Active Cases = 100% of new cases in past 14 days + 19% from past 15-25 days + 5% from past 26-49 days - total deaths.

    On 3/17/2022, the U.S. calculation was adjusted to:

    Active Cases = 100% of new cases in past 14 days + 6% from past 15-25 days + 3% from past 26-49 days - total deaths.

    Sources: https://www.cdc.gov/mmwr/volumes/71/wr/mm7104e4.htm https://covid.cdc.gov/covid-data-tracker/#variant-proportions

    If a new variant arrives and appears to cause higher rates of serious cases, we will roll back this adjustment.

    We’ve never been inside a pandemic with the ability to learn of new cases as they are confirmed anywhere in the world. After reviewing epidemiological and pandemic scientific literature, three needs arose. We need to specify which portions of the pandemic lifecycle this map covers. The World Health Organization (WHO) specifies six phases. The source data for this map begins just after the beginning of Phase 5: human to human spread and encompasses Phase 6: pandemic phase. Phase six is only characterized in terms of pre- and post-peak.
    However, these two phases are after-the-fact analyses and cannot be ascertained during the event. Instead, we describe (below) a series of five trends for Phase 6 of the COVID-19 pandemic.

    Choosing terms to describe the five trends was informed by the scientific literature, particularly the use of epidemic, which signifies uncontrolled spread. The five trends are: Emergent, Spreading, Epidemic, Controlled, and End Stage. Not every locale will experience all five, but all will experience at least three: emergent, controlled, and end stage.

    This layer presents the current trends for the COVID-19 pandemic by country (or appropriate level). There are five trends:

    • Emergent: Early stages of outbreak.
    • Spreading: Early stages and depending on an administrative area’s capacity, this may represent a manageable rate of spread.
    • Epidemic: Uncontrolled spread.
    • Controlled: Very low levels of new cases.
    • End Stage: No new cases.

    These trends can be applied at several levels of administration:

    • Local: Ex., City, District or County – a.k.a. Admin level 2
    • State: Ex., State or Province – a.k.a. Admin level 1
    • National: Country – a.k.a. Admin level 0

    Recommend that at least 100,000 persons be represented by a unit; granted this may not be possible, and then the case rate per 100,000 will become more important.

    Key Concepts and Basis for Methodology:

    • 10 Total Cases minimum threshold: Empirically, there must be enough cases to constitute an outbreak. Ideally, this would be 5.0 per 100,000, but not every area has a population of 100,000 or more. Ten, or fewer, cases are also relatively less difficult to track and trace to sources.
    • 21 Days of Cases minimum threshold: Empirically based on COVID-19 and would need to be adjusted for any other event. 21 days is also the minimum threshold for analyzing the “tail” of the new cases curve, providing seven cases as the basis for a likely trend (note that 21 days in the tail is preferred). This is the minimum needed to encompass the onset and duration of a normal case (5-7 days plus 10-14 days). Specifically, a median of 5.1 days incubation time, and 11.2 days for 97.5% of cases to incubate. This is also driven by pressure to understand trends and could easily be adjusted to 28 days. Source
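
    The active-cases estimate stated in this description (100% of new cases from the past 14 days, plus a share of those from days 15-25 and a smaller share from days 26-49, minus total deaths) can be sketched as follows. This is an illustration of the stated formula only, not Esri's implementation; the function name, parameter names, and sample series are hypothetical, and the description does not say how negative results are handled, so none are clamped here.

```python
def estimate_active_cases(daily_new, total_deaths,
                          mid_share=0.19, late_share=0.05):
    """Estimate active cases from daily new-case counts (most recent day last).

    mid_share applies to cases from 15-25 days ago and late_share to cases
    from 26-49 days ago; the defaults are the original weights in the text.
    """
    recent_first = daily_new[::-1]
    past_14 = sum(recent_first[:14])        # days 1-14
    days_15_25 = sum(recent_first[14:25])   # days 15-25 (11 days)
    days_26_49 = sum(recent_first[25:49])   # days 26-49 (24 days)
    return (past_14 + mid_share * days_15_25
            + late_share * days_26_49 - total_deaths)


# Hypothetical area: 100 new cases every day for 49 days, 50 total deaths.
series = [100] * 49
original = estimate_active_cases(series, 50)
# Weights from the adjusted U.S. calculation quoted in the text.
adjusted = estimate_active_cases(series, 50, mid_share=0.06, late_share=0.03)
print(round(original), round(adjusted))  # 1679 1488
```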

  11. World Coronavirus COVID-19 Cases

    • tradingeconomics.com
    csv, excel, json, xml
    Updated Mar 9, 2020
    + more versions
    Cite
    TRADING ECONOMICS (2020). World Coronavirus COVID-19 Cases [Dataset]. https://tradingeconomics.com/world/coronavirus-cases
    Explore at:
    Available download formats: csv, excel, xml, json
    Dataset updated
    Mar 9, 2020
    Dataset authored and provided by
    TRADING ECONOMICS
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jan 4, 2020 - May 17, 2023
    Area covered
    World
    Description

    The World Health Organization reported 766,440,796 coronavirus cases since the epidemic began. In addition, countries reported 6,932,591 coronavirus deaths. This dataset provides World Coronavirus Cases: actual values, historical data, forecast, chart, statistics, economic calendar and news.

  12. Twitter COVID-19 (Updated every 24hrs)

    • kaggle.com
    zip
    Updated Mar 18, 2020
    Cite
    Benjamín Casazza (2020). Twitter COVID-19 (Updated every 24hrs) [Dataset]. https://www.kaggle.com/uracilo/twitter-covid19
    Explore at:
    Available download formats: zip (113567 bytes)
    Dataset updated
    Mar 18, 2020
    Authors
    Benjamín Casazza
    Description

    Dataset

    This dataset was created by Benjamín Casazza

    Contents

  13. China CN: COVID-19: Confirmed Case: New Increase

    • ceicdata.com
    Updated Oct 15, 2025
    Cite
    CEICdata.com (2025). China CN: COVID-19: Confirmed Case: New Increase [Dataset]. https://www.ceicdata.com/en/china/covid19-no-of-patient/cn-covid19-confirmed-case-new-increase
    Explore at:
    Dataset updated
    Oct 15, 2025
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 29, 2020 - May 10, 2020
    Area covered
    China
    Description

    China COVID-19: Confirmed Case: New Increase data was reported at 17.000 Person in 10 May 2020. This records an increase from the previous number of 14.000 Person for 09 May 2020. China COVID-19: Confirmed Case: New Increase data is updated daily, averaging 51.000 Person from Jan 2020 (Median) to 10 May 2020, with 112 observations. The data reached an all-time high of 15,152.000 Person in 12 Feb 2020 and a record low of 1.000 Person in 08 May 2020. China COVID-19: Confirmed Case: New Increase data remains active status in CEIC and is reported by National Health Commission. The data is categorized under China Premium Database’s Socio-Demographic – Table CN.GZ: COVID-19: No of Patient.

  14. INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET

    • data.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta (2024). INTRODUCTION OF COVID-NEWS-US-NNK AND COVID-NEWS-BD-NNK DATASET [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4047647
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    Silicon Orchard Lab, Bangladesh
    Independent University, Bangladesh
    University of Memphis, USA
    Authors
    Nafiz Sadman; Nishat Anjum; Kishor Datta Gupta
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Bangladesh, United States
    Description

    Introduction

    There are several works based on Natural Language Processing on newspaper reports. In mining opinions from headlines [ 1 ] using Stanford NLP and SVM, Rameshbhai et al. compared several algorithms on a small and a large dataset. Rubin et al., in their paper [ 2 ], created a mechanism to differentiate fake news from real news by building a set of characteristics of news according to their types. The purpose was to contribute to the low-resource data available for training machine learning algorithms. Doumit et al. in [ 3 ] implemented LDA, a topic modeling approach, to study bias present in online news media.

    However, not much NLP research has been invested in studying COVID-19. Most applications include classification of chest X-rays and CT scans to detect the presence of pneumonia in the lungs [ 4 ], a consequence of the virus. Other research areas include studying the genome sequence of the virus [ 5 ][ 6 ][ 7 ] and replicating its structure to fight it and find a vaccine. This research is crucial in battling the pandemic. The few NLP-based research publications include sentiment classification of online tweets by Samuel et al. [ 8 ] to understand the fear persisting in people due to the virus. Similar work has been done using the LSTM network to classify sentiments from online discussion forums by Jelodar et al. [ 9 ]. To the best of our knowledge, the NNK dataset is the first study on a comparatively larger dataset of newspaper reports on COVID-19, which contributed to awareness of the virus.

    2 Data-set Introduction

    2.1 Data Collection

    We accumulated 1000 online newspaper reports from the United States of America (USA) on COVID-19. The newspapers include The Washington Post (USA) and StarTribune (USA). We named this collection “Covid-News-USA-NNK”. We also accumulated 50 online newspaper reports from Bangladesh on the issue and named the collection “Covid-News-BD-NNK”. The newspapers include The Daily Star (BD) and Prothom Alo (BD). All these newspapers are among the top providers and the most read in their respective countries. The collection was done manually by 10 human data-collectors of age group 23- with university degrees. This approach was preferred over automation to ensure the news was highly relevant to the subject. The newspaper websites had dynamic content with advertisements in no particular order, so there was a high chance that online scrapers would collect inaccurate news reports. One of the challenges in collecting the data was the subscription requirement: each newspaper required $1 per subscription. Some of the criteria given as guidelines to the human data-collectors were as follows:

    The headline must have one or more words directly or indirectly related to COVID-19.

    The content of each news report must have 5 or more keywords directly or indirectly related to COVID-19.

    The genre of the news can be anything as long as it is relevant to the topic. Political, social, and economic genres are to be prioritized.

    Avoid taking duplicate reports.

    Maintain a time frame for the above-mentioned newspapers.

    To collect these data we used a Google form for USA and BD. Two human editors went through each entry to check for spam or troll entries.

    2.2 Data Pre-processing and Statistics

    Some pre-processing steps performed on the newspaper report dataset are as follows:

    Remove hyperlinks.

    Remove non-English alphanumeric characters.

    Remove stop words.

    Lemmatize text.

    While more pre-processing could have been applied, we tried to keep the data as unchanged as possible, since changing sentence structures could result in the loss of valuable information. While this was done with the help of a script, we also assigned the same human collectors to cross-check for any presence of the above-mentioned criteria.
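    The pre-processing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the function name `preprocess`, the tiny stop-word list, and the regular expressions are all assumptions, and lemmatization is left as a pluggable callable (identity by default) because the source does not say which lemmatizer was used.

```python
import re

# Assumed patterns; the source does not specify how hyperlinks were detected.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Anything that is not an English letter, a digit, or whitespace.
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")

# Tiny illustrative stop-word list (an assumption, not the list used by the authors).
STOP_WORDS = frozenset(
    {"a", "an", "the", "and", "or", "of", "to", "in", "on",
     "is", "are", "was", "were", "for", "at", "by", "with"}
)


def preprocess(text, stop_words=STOP_WORDS, lemmatize=lambda token: token):
    """Apply the four listed steps in order and return the cleaned text."""
    text = URL_RE.sub(" ", text)            # 1. remove hyperlinks
    text = NON_ALNUM_RE.sub(" ", text)      # 2. remove non-English alphanumeric characters
    tokens = [t for t in text.split() if t.lower() not in stop_words]  # 3. remove stop words
    return " ".join(lemmatize(t) for t in tokens)  # 4. lemmatize (identity unless supplied)


if __name__ == "__main__":
    print(preprocess("The cases in Ohio rose 5% – see https://example.com/a?b=1 for details."))
```

    A real lemmatizer from an NLP library could be passed through the `lemmatize` argument without changing the rest of the pipeline.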

    The primary data statistics of the two datasets are shown in Tables 1 and 2.

    Table 1: Covid-News-USA-NNK data statistics

    No of words per headline: 7 to 20
    No of words per body content: 150 to 2100

    Table 2: Covid-News-BD-NNK data statistics

    No of words per headline: 10 to 20
    No of words per body content: 100 to 1500

    2.3 Dataset Repository

    We used GitHub as our primary data repository under the account name NKK^1. Here, we created two repositories, USA-NKK^2 and BD-NNK^3. The dataset is available in both CSV and JSON format. We regularly update the CSV files and regenerate the JSON using a Python script. We also provide a Python script for essential operations. We welcome all outside collaboration to enrich the dataset.
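    Regenerating JSON from an updated CSV file might look like the sketch below. The authors' actual script is not shown in this description, so the function name `csv_to_json` and the column names `headline` and `body` in the sample are hypothetical.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of row objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Hypothetical two-column sample; the real files may use different columns.
    sample = "headline,body\nExample headline,Example body text\n"
    print(csv_to_json(sample))
```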

    3 Literature Review

    Natural Language Processing (NLP) deals with text (also known as categorical) data in computer science, utilizing diverse methods such as one-hot encoding and word embedding that transform text into numerical representations, which can be fed to machine learning and deep learning algorithms.

    Some well-known applications of NLP include fraud detection on online media sites [ 10 ], authorship attribution in fallback authentication systems [ 11 ], intelligent conversational agents or chatbots [ 12 ], and machine translation as used by Google Translate [ 13 ]. While these are all downstream tasks, several exciting developments have been made in algorithms built solely for Natural Language Processing tasks. The two most prominent are BERT [ 14 ], which uses a bidirectional Transformer encoder and performs strongly on classification and masked-word prediction tasks, and the GPT-3 models released by OpenAI [ 15 ], which can generate almost human-like text. However, these are typically used as pre-trained models, since training them carries a huge computational cost.

    Information Extraction is a generalized concept of retrieving information from a dataset. Information extraction from an image could be retrieving vital feature spaces or targeted portions of an image; information extraction from speech could be retrieving information about names, places, etc. [ 16 ]. Information extraction in texts could be identifying named entities and locations or essential data. Topic modeling is a sub-task of NLP and also a process of information extraction. It clusters words and phrases of the same context together into groups. Topic modeling is an unsupervised learning method that gives us a brief idea about a set of texts. One commonly used topic modeling method is Latent Dirichlet Allocation, or LDA [ 17 ].

    Keyword extraction is a process of information extraction and a sub-task of NLP that extracts essential words and phrases from a text. TextRank [ 18 ] is an efficient keyword extraction technique that uses graphs to calculate the weight of each word and picks the words with the highest weight.
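    The graph-based weighting behind TextRank can be sketched in a few lines. This is a simplified, unweighted illustration and not the authors' code: the function name `textrank_keywords`, the co-occurrence window, the damping factor of 0.85, and the fixed iteration count are all assumptions chosen for the example.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank-style scores on an undirected co-occurrence graph."""
    # Link each word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window + 1]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []

    # Iterate S(w) = (1 - d) + d * sum(S(n) / degree(n)) over the neighbors n of w.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    words = "virus spread economy virus mask virus election economy".split()
    print(textrank_keywords(words, window=1, top_k=3))
```

    In practice the input tokens would be the pre-processed words of a headline or article body, with stop words already removed.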

    Word clouds are a great visualization technique to understand the overall ’talk of the topic’. The clustered words give us a quick understanding of the content.

    4 Our experiments and Result analysis

    We used the wordcloud library^4 to create the word clouds. Figures 1 and 3 present the word clouds of the Covid-News-USA-NNK dataset by month from February to May. From Figures 1, 2, and 3, we can note the following:

    In February, both newspapers discussed China and the source of the outbreak.

    StarTribune emphasized Minnesota as the state of most concern. In April, the concern appeared to increase.

    Both newspapers discussed the virus's impact on the economy, i.e., banks, elections, administrations, and markets.

    Washington Post discussed global issues more than StarTribune.

    StarTribune in February mentioned the first precautionary measure, wearing masks, and the uncontrollable spread of the virus throughout the nation.

    While both newspapers mentioned the outbreak in China in February, the spread in the United States is highlighted more heavily from March through May, displaying the critical impact caused by the virus.

    We used a script to extract all numbers related to certain keywords, such as ’Deaths’, ’Infected’, ’Died’, ’Infections’, ’Quarantined’, ’Lock-down’, ’Diagnosed’, etc., from the news reports and built a case-count series for both newspapers. Figure 4 shows the statistics of this series. From this extraction technique, we can observe that April was the peak month for COVID cases, which rose gradually from February. Both newspapers clearly show that the rise in cases from February to March was slower than the rise from March to April. This is an important indicator of possible recklessness in preparations to battle the virus. However, the steep fall from April to May also shows the positive response against the attack.

    We used Vader Sentiment Analysis to extract the sentiment of the headlines and the body. On average, the sentiments ranged from -0.5 to -0.9. The Vader sentiment scale ranges from -1 (highly negative) to 1 (highly positive). There were some cases where the sentiment scores of the headline and body contradicted each other, i.e., the sentiment of the headline was negative but the sentiment of the body was slightly positive. Overall, sentiment analysis can help us sort the most concerning (most negative) news from the positive news, from which we can learn more about the indicators related to COVID-19 and the serious impact caused by it. Moreover, sentiment analysis can also provide information about how a state or country is reacting to the pandemic.

    We used the PageRank algorithm to extract keywords from the headlines as well as the body content. PageRank efficiently highlights important relevant keywords in the text. Some frequently occurring important keywords extracted from both datasets are: ’China’, ’Government’, ’Masks’, ’Economy’, ’Crisis’, ’Theft’, ’Stock market’, ’Jobs’, ’Election’, ’Missteps’, ’Health’, ’Response’. Keyword extraction acts as a filter allowing quick searches for indicators in case of locating situations of the economy,
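    The keyword-linked number extraction described in this section could be approximated with a regular expression. The sketch below is an assumption about how such a script might work, not the authors' implementation: the function name `extract_counts`, the rule that a number must be followed by a keyword within two words, and the subset of keywords are all illustrative.

```python
import re

# Subset of the keywords named in the text, lowercased for matching.
KEYWORDS = ("deaths", "infected", "died", "infections", "quarantined", "diagnosed")

# A number (optionally with thousands separators), then at most two
# intervening words, then one of the keywords.
PATTERN = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+)\s+(?:\w+\s+){0,2}?(" + "|".join(KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def extract_counts(text):
    """Return (keyword, number) pairs found in the text, in order of appearance."""
    return [
        (match.group(2).lower(), int(match.group(1).replace(",", "")))
        for match in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sentence = (
        "Officials reported 1,250 new infections and 37 deaths on Monday; "
        "400 people were quarantined."
    )
    print(extract_counts(sentence))
```

    Summing the extracted numbers per month would then give a series comparable in spirit to the one the text attributes to Figure 4.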

  15. United States COVID-19 Community Levels by County as Originally Posted

    • catalog.data.gov
    Updated Mar 19, 2022
    + more versions
    Cite
    Centers for Disease Control and Prevention (2022). United States COVID-19 Community Levels by County as Originally Posted [Dataset]. https://catalog.data.gov/dataset/united-states-covid-19-community-levels-by-county-as-originally-posted-ebafa
    Explore at:
    Dataset updated
    Mar 19, 2022
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Area covered
    United States
    Description

    This public use dataset has 11 data elements reflecting COVID-19 community levels for all available counties. This dataset contains the same values used to display information available at https://www.cdc.gov/coronavirus/2019-ncov/science/community-levels-county-map.html.

    CDC looks at the combination of three metrics (new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days) to determine the COVID-19 community level. The COVID-19 community level is determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.

    Using these data, the COVID-19 community level is classified as low, medium, or high.

    COVID-19 Community Levels can help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals. See https://www.cdc.gov/coronavirus/2019-ncov/science/community-levels.html for more information.

    Visit CDC’s COVID Data Tracker County View* to learn more about the individual metrics used for CDC’s COVID-19 community level in your county. Please note that county-level data are not available for territories. Go to https://covid.cdc.gov/covid-data-tracker/#county-view.

    For the most accurate and up-to-date data for any county or state, visit the relevant health department website.

    *COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.
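    The "higher of the two metrics, given the case rate" rule described above can be sketched as below. The description does not state the numeric cut-offs, so every threshold in this example (200 cases per 100,000; 10 and 20 admissions per 100,000; 10% and 15% bed occupancy) is an assumption that should be checked against CDC's own documentation, and the function name `community_level` is hypothetical.

```python
LEVELS = ("low", "medium", "high")


def community_level(cases_per_100k, admissions_per_100k, bed_occupancy_pct):
    """Classify a county as low, medium, or high from three 7-day metrics."""
    if cases_per_100k < 200:  # assumed case-rate cut-off
        # Assumed cut-offs for the lower case-rate band.
        admissions = (
            "low" if admissions_per_100k < 10
            else "medium" if admissions_per_100k < 20
            else "high"
        )
        beds = (
            "low" if bed_occupancy_pct < 10
            else "medium" if bed_occupancy_pct < 15
            else "high"
        )
    else:
        # Assumed cut-offs for the higher case-rate band, where "low" is not available.
        admissions = "medium" if admissions_per_100k < 10 else "high"
        beds = "medium" if bed_occupancy_pct < 10 else "high"

    # The community level is the higher of the two health-system metrics.
    return max(admissions, beds, key=LEVELS.index)


if __name__ == "__main__":
    print(community_level(150, 12, 5))
    print(community_level(250, 5, 12))
```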

  16. COVID-19 Trends in Each Country-Copy

    • hub.arcgis.com
    Updated Jun 4, 2020
    + more versions
    Cite
    United Nations Population Fund (2020). COVID-19 Trends in Each Country-Copy [Dataset]. https://hub.arcgis.com/maps/1c4a4134d2de4e8cb3b4e4814ba6cb81
    Explore at:
    Dataset updated
    Jun 4, 2020
    Dataset authored and provided by
    United Nations Population Fund
    Area covered
    Description

    COVID-19 Trends Methodology

    Our goal is to analyze and present daily updates in the form of recent trends within countries, states, or counties during the COVID-19 global pandemic. The data we are analyzing is taken directly from the Johns Hopkins University Coronavirus COVID-19 Global Cases Dashboard, though we expect to be one day behind the dashboard’s live feeds to allow for quality assurance of the data.

    Revisions added on 4/23/2020 are highlighted.

    Revisions added on 4/30/2020 are highlighted.

    Discussion of our assertion of an abundance of caution in assigning trends in rural counties added 5/7/2020.

    Correction on 6/1/2020

    Methodology update on 6/2/2020: This sets the length of the tail of new cases to 6 to a maximum of 14 days, rather than 21 days as determined by the last 1/3 of cases. This was done to align trends and criteria for them with U.S. CDC guidance. The impact is that areas transition into the Controlled trend sooner, because they no longer bear the burden of new cases from 15-21 days earlier.

    Reasons for undertaking this work:

    The popular online maps and dashboards show counts of confirmed cases, deaths, and recoveries by country or administrative sub-region. Comparing the counts of one country to another can only provide a basis for comparison during the initial stages of the outbreak when counts were low and the number of local outbreaks in each country was low. By late March 2020, countries with small populations were being left out of the mainstream news because it was not easy to recognize they had high per capita rates of cases (Switzerland, Luxembourg, Iceland, etc.). Additionally, comparing countries that have had confirmed COVID-19 cases for high numbers of days to countries where the outbreak occurred recently is also a poor basis for comparison.

    The graphs of confirmed cases and daily increases in cases were fit into a standard size rectangle, though the Y-axis for one country had a maximum value of 50, and for another country 100,000, which potentially misled people interpreting the slope of the curve. Such misleading circumstances affected comparing large population countries to small population countries, or countries with low numbers of cases to China, which had a large count of cases in the early part of the outbreak. These challenges for interpreting and comparing these graphs represent work each reader must do based on their experience and ability. Thus, we felt it would be a service to attempt to automate the thought process experts would use when visually analyzing these graphs, particularly the most recent tail of the graph, and provide readers with a resulting synthesis to characterize the state of the pandemic in that country, state, or county.

    The lack of reliable data for confirmed recoveries and therefore active cases. Merely subtracting deaths from total cases to arrive at this figure progressively loses accuracy after two weeks. The reason is that 81% of cases recover after experiencing mild symptoms in 10 to 14 days. Severe cases are 14% and last 15-30 days (based on average days with symptoms of 11 when admitted to hospital plus 12 days median stay, plus one week to include a full range of severely affected people who recover). Critical cases are 5% and last 31-56 days. Sources: U.S. CDC. April 3, 2020 Interim Clinical Guidance for Management of Patients with Confirmed Coronavirus Disease (COVID-19). Accessed online. Initial older guidance was also obtained online. Additionally, many people who recover may not be tested, and many who are may not be tracked due to privacy laws. Thus, the formula used to compute an estimate of active cases is:

    Active Cases = 100% of new cases in past 14 days + 19% from past 15-30 days + 5% from past 31-56 days - total deaths.

    We’ve never been inside a pandemic with the ability to learn of new cases as they are confirmed anywhere in the world. After reviewing epidemiological and pandemic scientific literature, three needs arose. We need to specify which portions of the pandemic lifecycle this map covers. The World Health Organization (WHO) specifies six phases. The source data for this map begins just after the beginning of Phase 5: human to human spread and encompasses Phase 6: pandemic phase. Phase six is only characterized in terms of pre- and post-peak. However, these two phases are after-the-fact analyses and cannot be ascertained during the event. Instead, we describe (below) a series of five trends for Phase 6 of the COVID-19 pandemic.

    Choosing terms to describe the five trends was informed by the scientific literature, particularly the use of epidemic, which signifies uncontrolled spread. The five trends are: Emergent, Spreading, Epidemic, Controlled, and End Stage. Not every locale will experience all five, but all will experience at least three: emergent, controlled, and end stage.

    This layer presents the current trends for the COVID-19 pandemic by country (or appropriate level). There are five trends:

    • Emergent: Early stages of outbreak.
    • Spreading: Early stages and depending on an administrative area’s capacity, this may represent a manageable rate of spread.
    • Epidemic: Uncontrolled spread.
    • Controlled: Very low levels of new cases
    • End Stage: No New cases

    These trends can be applied at several levels of administration:

    • Local: Ex., City, District or County – a.k.a. Admin level 2
    • State: Ex., State or Province – a.k.a. Admin level 1
    • National: Country – a.k.a. Admin level 0

    Recommend that at least 100,000 persons be represented by a unit; granted this may not be possible, and then the case rate per 100,000 will become more important.

    Key Concepts and Basis for Methodology:

    10 Total Cases minimum threshold: Empirically, there must be enough cases to constitute an outbreak. Ideally, this would be 5.0 per 100,000, but not every area has a population of 100,000 or more. Ten, or fewer, cases are also relatively less difficult to track and trace to sources.

    21 Days of Cases minimum threshold: Empirically based on COVID-19 and would need to be adjusted for any other event. 21 days is also the minimum threshold for analyzing the “tail” of the new cases curve, providing seven cases as the basis for a likely trend (note that 21 days in the tail is preferred). This is the minimum needed to encompass the onset and duration of a normal case (5-7 days plus 10-14 days). Specifically, a median of 5.1 days incubation time, and 11.2 days for 97.5% of cases to incubate. This is also driven by pressure to understand trends and could easily be adjusted to 28 days. Source used as basis: Stephen A. Lauer, MS, PhD *; Kyra H. Grantz, BA *; Qifang Bi, MHS; Forrest K. Jones, MPH; Qulu Zheng, MHS; Hannah R. Meredith, PhD; Andrew S. Azman, PhD; Nicholas G. Reich, PhD; Justin Lessler, PhD. 2020. The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application. Annals of Internal Medicine DOI: 10.7326/M20-0504.

    New Cases per Day (NCD) = Measures the daily spread of COVID-19. This is the basis for all rates.

    Back-casting revisions: In the Johns Hopkins’ data, the structure is to provide the cumulative number of cases per day, which presumes an ever-increasing sequence of numbers, e.g., 0,0,1,1,2,5,7,7,7, etc. However, revisions do occur and would look like 0,0,1,1,2,5,7,7,6. To accommodate this, we revised the lists to eliminate decreases, which makes this list look like 0,0,1,1,2,5,6,6,6.

    Reporting Interval: In the early weeks, Johns Hopkins' data provided reporting every day regardless of change. In late April, this changed, allowing for days to be skipped if no new data was available. The day was still included, but the value of total cases was set to Null. The processing therefore was updated to include tracking of the spacing between intervals with valid values.

    100 New Cases in a day as a spike threshold: Empirically, this is based on COVID-19’s rate of spread, or r0 of ~2.5, which indicates each case will infect between two and three other people. There is a point at which each administrative area’s capacity will not have the resources to trace and account for all contacts of each patient. Thus, this is an indicator of uncontrolled or epidemic trend. Spiking activity in combination with the rate of new cases is the basis for determining whether an area has a spreading or epidemic trend (see below). Source used as basis: World Health Organization (WHO). 16-24 Feb 2020. Report of the WHO-China Joint Mission on Coronavirus Disease 2019 (COVID-19). Obtained online.

    Mean of Recent Tail of NCD = Empirical, and a COVID-19-specific basis for establishing a recent trend. The recent mean of NCD is taken from the most recent fourteen days. A minimum of 21 days of cases is required for analysis but cannot be considered reliable. Thus, a preference of 42 days of cases ensures much higher reliability. This analysis is not explanatory and thus merely represents a likely trend. The tail is analyzed for the following:

    Most recent 2 days: In terms of likelihood, this does not mean much, but can indicate a reason for hope and a basis to share positive change that is not yet a trend. There are two worthwhile indicators:

    • Last 2 days count of new cases is less than any in either the past five or 14 days.
    • Past 2 days has only one or fewer new cases – this is an extremely positive outcome if the rate of testing has continued at the same rate as the previous 5 days or 14 days.

    Most recent 5 days: In terms of likelihood, this is more meaningful, as it does represent a short-term trend. There are five worthwhile indicators:

    • Past five days is greater than past 2 days and past 14 days indicates the potential of the past 2 days being an aberration.
    • Past five days is greater than past 14 days and less than past 2 days indicates slight positive trend, but likely still within peak trend time frame.
    • Past five days is less than the past 14 days. This means a downward trend. This would be an
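    Three of the calculations in this methodology (back-casting revisions, New Cases per Day, and the active-case estimate) can be sketched as follows. This is an illustrative reading of the text, not the publisher's code: the function names are hypothetical, Null values are represented as `None` and simply left in place by the back-casting step, and the daily series is assumed to be ordered oldest first.

```python
def backcast_revisions(cumulative):
    """Eliminate decreases by carrying later, lower totals backwards."""
    revised = list(cumulative)
    lowest_later = None
    for i in range(len(revised) - 1, -1, -1):
        if revised[i] is None:  # skipped reporting day (Null); left untouched
            continue
        if lowest_later is not None and revised[i] > lowest_later:
            revised[i] = lowest_later
        lowest_later = revised[i]
    return revised


def new_cases_per_day(cumulative):
    """Difference a gap-free, non-decreasing cumulative series into daily counts."""
    daily, previous = [], 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


def estimate_active_cases(daily_new_cases, total_deaths):
    """100% of the past 14 days + 19% of days 15-30 + 5% of days 31-56 - deaths."""
    recent_first = list(reversed(daily_new_cases))
    return (
        sum(recent_first[:14])
        + 0.19 * sum(recent_first[14:30])
        + 0.05 * sum(recent_first[30:56])
        - total_deaths
    )


if __name__ == "__main__":
    revised = backcast_revisions([0, 0, 1, 1, 2, 5, 7, 7, 6])
    print(revised)
    print(new_cases_per_day(revised))
    print(estimate_active_cases([10] * 56, total_deaths=20))
```

    Running the back-casting step on the example sequence from the text reproduces the revised list given there.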

  17. Novel Covid-19 Dataset

    • kaggle.com
    Updated Sep 18, 2025
    + more versions
    Cite
    GHOST5612 (2025). Novel Covid-19 Dataset [Dataset]. https://www.kaggle.com/datasets/ghost5612/novel-covid-19-dataset
    Explore at:
    Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Sep 18, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    GHOST5612
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Context:

    From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.

    So daily level information on the affected people can give some interesting insights when it is made available to the broader data science community.

    Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the google sheets associated and made available here.

    Edited:

    Now data is available as csv files in the Johns Hopkins Github repository. Please refer to the github repository for the Terms of Use details. Uploading it here for using it in Kaggle kernels and getting insights from the broader DS community.

    Content

    2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC

    This dataset has daily level information on the number of affected cases, deaths and recovery from 2019 novel coronavirus. Please note that this is a time series data and so the number of cases on any given day is the cumulative number.

    The data is available from 22 Jan, 2020.

    Dataset Description

    This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.

    Files and Columns

    1. covid_19_data.csv (Main File)

    This is the primary dataset and contains aggregated COVID-19 statistics by location and date.

    • Sno – Serial number of the record
    • ObservationDate – Date of the observation (MM/DD/YYYY)
    • Province/State – Province or state of the observation (may be missing for some entries)
    • Country/Region – Country of the observation
    • Last Update – Timestamp (UTC) when the record was last updated (not standardized, requires cleaning before use)
    • Confirmed – Cumulative number of confirmed cases on that date
    • Deaths – Cumulative number of deaths on that date
    • Recovered – Cumulative number of recoveries on that date
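    Because the Confirmed, Deaths, and Recovered columns are cumulative, daily counts have to be derived by differencing. The sketch below shows one way to do that with the column names listed above; the function name `daily_new_confirmed` and the sample rows (including the country name) are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Made-up sample rows using the documented column names (not real data).
SAMPLE = """Sno,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,,Exampleland,1/22/2020 17:00,1,0,0
2,01/23/2020,,Exampleland,1/23/20 17:00,4,0,0
3,01/24/2020,,Exampleland,2020-01-24T17:00:00,9,1,0
"""


def daily_new_confirmed(csv_file):
    """Return {(country, province): [(date, new_confirmed), ...]} from cumulative rows."""
    cumulative = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        # ObservationDate is documented as MM/DD/YYYY.
        date = datetime.strptime(row["ObservationDate"], "%m/%d/%Y").date()
        key = (row["Country/Region"], row["Province/State"])
        cumulative[key][date] = int(float(row["Confirmed"]))

    daily = {}
    for key, by_date in cumulative.items():
        previous, series = 0, []
        for date in sorted(by_date):
            series.append((date, by_date[date] - previous))
            previous = by_date[date]
        daily[key] = series
    return daily


if __name__ == "__main__":
    for location, series in daily_new_confirmed(io.StringIO(SAMPLE)).items():
        print(location, [count for _, count in series])
```

    The Last Update column is deliberately ignored here, since the description notes that it is not standardized and requires cleaning before use.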

    2. 2019_ncov_data.csv (Legacy File)

    This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.

    3. COVID_open_line_list_data.csv

    This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.

    4. COVID19_line_list_data.csv

    Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.

    ✅ Use covid_19_data.csv for up-to-date aggregated global trends.

    ✅ Use the line list datasets for detailed, individual-level case analysis.

    Country level datasets:

    If you are interested in knowing country level data, please refer to the following Kaggle datasets:

    India - https://www.kaggle.com/sudalairajkumar/covid19-in-india

    South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset

    Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy

    Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil

    USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa

    Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland

    Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases

    Acknowledgements :

    Johns Hopkins University for making the data available for educational and academic research purposes

    MoBS lab - https://www.mobs-lab.org/2019ncov.html

    World Health Organization (WHO): https://www.who.int/

    DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.

    BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/

    National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml

    China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm

    Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html

    Macau Government: https://www.ssm.gov.mo/portal/

    Taiwan CDC: https://sites.google....

  18. United States COVID-19 Community Levels by County

    • data.virginia.gov
    • healthdata.gov
    • +2more
    csv, json, rdf, xsl
    Updated Feb 23, 2025
    + more versions
    Cite
    Centers for Disease Control and Prevention (2025). United States COVID-19 Community Levels by County [Dataset]. https://data.virginia.gov/dataset/united-states-covid-19-community-levels-by-county
    Explore at:
    Available download formats: xsl, csv, json, rdf
    Dataset updated
    Feb 23, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Area covered
    United States
    Description

    Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.

    This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.

    The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.

    Using these data, the COVID-19 community level was classified as low, medium, or high.

    COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.

    For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.

    Archived Data Notes:

    This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.

    March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previously released.

    March 31, 2022: A new column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on the 2019 Census estimate.

    March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.

    March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.

    March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).

    March 31, 2022: Data values for the columns “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were first added the week of 3/3/2022, so the values were missing for records released the week prior.

    April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.

    April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials t
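    The March 31, 2022 notes above mention two format changes that matter when combining older and newer downloads: FIPS codes normalized to 5 digits, and the “%” sign dropped from “covid_inpatient_bed_utilization”. A minimal Python sketch of tolerant parsing follows. The helper names are hypothetical, and the exact pre-change formats are not described in the notes, so the handling here is an assumption.

```python
def normalize_fips(value) -> str:
    """Return a county FIPS code as a 5-character, zero-padded string.

    Accepts ints, floats that hold whole numbers, or strings.
    """
    if isinstance(value, float):
        value = int(value)
    digits = "".join(ch for ch in str(value).strip() if ch.isdigit())
    return digits.zfill(5)


def parse_bed_utilization(value):
    """Parse a percentage that may or may not carry a trailing '%' sign.

    Returns a float percentage, or None for blank input.
    """
    text = str(value).strip().rstrip("%").strip()
    return float(text) if text else None


if __name__ == "__main__":
    print(normalize_fips("1001"), normalize_fips(60010))
    print(parse_bed_utilization("4.5%"), parse_bed_utilization("4.5"))
```

    Keeping FIPS codes as strings avoids losing leading zeros when the file is reloaded.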

  19. Status of COVID-19 cases in Ontario

    • data.ontario.ca
    • ouvert.canada.ca
    • +1 more
    csv, xlsx
    Updated Dec 13, 2024
    Cite
    Health (2024). Status of COVID-19 cases in Ontario [Dataset]. https://data.ontario.ca/en/dataset/status-of-covid-19-cases-in-ontario
    Explore at:
    Available download formats: csv(33820), csv(133498), xlsx(19387), csv(162260)
    Dataset updated
    Dec 13, 2024
    Dataset authored and provided by
    Health
    License

    https://www.ontario.ca/page/open-government-licence-ontario

    Time period covered
    Nov 14, 2024
    Area covered
    Ontario
    Description

    Status of COVID-19 cases in Ontario

    This dataset compiles daily snapshots of publicly reported data on 2019 Novel Coronavirus (COVID-19) testing in Ontario.

    Learn how the Government of Ontario is helping to keep Ontarians safe during the 2019 Novel Coronavirus outbreak.

    Effective April 13, 2023, this dataset will be discontinued. The public can continue to access the data within this dataset in the following locations updated weekly on the Ontario Data Catalogue:

    For information on Long-Term Care Home COVID-19 Data, please visit: Long-Term Care Home COVID-19 Data.

    Data includes:

    • reporting date
    • daily tests completed
    • total tests completed
    • test outcomes
    • total case outcomes (resolutions and deaths)
    • current tests under investigation
    • current hospitalizations
      • current patients in Intensive Care Units (ICUs) due to COVID-related critical illness
      • current patients in Intensive Care Units (ICUs) testing positive for COVID-19
      • current patients in Intensive Care Units (ICUs) no longer testing positive for COVID-19
      • current patients in Intensive Care Units (ICUs) on ventilators due to COVID-related critical illness
      • current patients in Intensive Care Units (ICUs) on ventilators testing positive for COVID-19
      • current patients in Intensive Care Units (ICUs) on ventilators no longer testing positive for COVID-19
    • Long-Term Care (LTC) resident and worker COVID-19 case and death totals
    • Variants of Concern case totals
    • number of new deaths reported (occurred in the last month)
    • number of historical deaths reported (occurred more than one month ago)
    • change in number of cases from previous day by Public Health Unit (PHU).

    This dataset is subject to change. Please review the daily epidemiologic summaries for information on variables, methodology, and technical considerations.

    Cumulative Deaths

    Effective November 14, 2024, this page will no longer be updated. Information about COVID-19 and other respiratory viruses is available on Public Health Ontario’s interactive respiratory virus tool: https://www.publichealthontario.ca/en/Data-and-Analysis/Infectious-Disease/Respiratory-Virus-Tool

    The methodology used to count COVID-19 deaths has changed to exclude deaths not caused by COVID. This affects data in the columns “Deaths”, “Deaths_Data_Cleaning” and “newly_reported_deaths”, starting with data for March 11, 2022. A new column, “Deaths_New_Methodology”, has been added to the file to reflect the methodological change.

    The method used to count COVID-19 deaths has changed, effective December 1, 2022. Prior to December 1, 2022, deaths were counted based on the date the death was updated in the public health unit’s system. Going forward, deaths are counted on the date they occurred.

    On November 30, 2023, the count of COVID-19 deaths was updated to include missing historical deaths from January 15, 2020 to March 31, 2023. A small number of COVID deaths (fewer than 20) do not have a recorded death date and are excluded from this file.

    CCM is a dynamic disease reporting system that allows ongoing updates to previously entered data. As a result, data extracted from CCM represent a snapshot at the time of extraction and may differ from previous or subsequent results. Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes and in current totals that differ from previously reported cases and deaths. Observed trends over time should be interpreted with caution for the most recent period due to reporting and/or data entry lags.
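    Because the death-count methodology changes partway through the file, day-over-day differences should not be computed across the change. The Python sketch below shows one way to handle that. It assumes a date column named "Reported Date" and blank cells for missing values; neither is confirmed by the description above. The sample rows are invented purely for illustration and are not real Ontario figures.

```python
import csv
import io


def cumulative_deaths(rows):
    """Return (date, cumulative_deaths, method) tuples.

    Prefers "Deaths_New_Methodology" when it is populated and falls back
    to "Deaths" otherwise. "Reported Date" is an assumed column name.
    """
    out = []
    for row in rows:
        new = (row.get("Deaths_New_Methodology") or "").strip()
        old = (row.get("Deaths") or "").strip()
        if new:
            out.append((row["Reported Date"], int(float(new)), "new"))
        elif old:
            out.append((row["Reported Date"], int(float(old)), "old"))
    return out


def daily_changes(series):
    """Day-over-day differences, skipping pairs that straddle the method change."""
    changes = []
    for (_, prev, m_prev), (date, cur, m_cur) in zip(series, series[1:]):
        if m_prev == m_cur:
            changes.append((date, cur - prev))
    return changes


# Invented sample data, for illustration only.
SAMPLE = """Reported Date,Deaths,Deaths_New_Methodology
2022-03-09,100,
2022-03-10,103,
2022-03-11,,90
2022-03-12,,92
"""

series = cumulative_deaths(csv.DictReader(io.StringIO(SAMPLE)))
print(daily_changes(series))
```

    Tagging each row with the method in use keeps the break visible downstream, so a drop in the cumulative total at the change is not mistaken for a correction or a negative daily count.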

    Related dataset(s)

    • Confirmed positive cases of COVID-19 in Ontario
  20. Data from: COVID-19 Case Surveillance Public Use Data with Geography

    • data.virginia.gov
    • healthdata.gov
    • +5 more
    csv, json, rdf, xsl
    Updated Feb 23, 2025
    Cite
    Centers for Disease Control and Prevention (2025). COVID-19 Case Surveillance Public Use Data with Geography [Dataset]. https://data.virginia.gov/dataset/covid-19-case-surveillance-public-use-data-with-geography
    Explore at:
    Available download formats: json, xsl, rdf, csv
    Dataset updated
    Feb 23, 2025
    Dataset provided by
    Centers for Disease Control and Prevention (http://www.cdc.gov/)
    Description

    Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

    Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

    This case surveillance public use dataset has 19 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors.

    Currently, CDC provides the public with three versions of COVID-19 case surveillance line-listed data: this 19 data element dataset with geography, a 12 data element public use dataset, and a 33 data element restricted access dataset.

    The following apply to the public use datasets and the restricted access dataset:

    Overview

    The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (<a href="https://cdn.ymaws.com/www.cste.org/resource/resmgr/ps/positionstatement2020/Interim-20-ID-01_COVID
