Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. During the entire course of the pandemic, one of the main problems that healthcare providers have faced is the shortage of medical resources and a proper plan to efficiently distribute them. In these tough times, being able to predict what kind of resource an individual might require at the time of being tested positive or even before that will be of immense help to the authorities as they would be able to procure and arrange for the resources necessary to save the life of that patient.
The main goal of this project is to build a machine learning model that, given a Covid-19 patient's current symptom, status, and medical history, will predict whether the patient is in high risk or not.
The dataset was provided by the Mexican government (link). This dataset contains an enormous number of anonymized patient-related information including pre-conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". values as 97 and 99 are missing data.
Facebook
Twitterhttps://github.com/nytimes/covid-19-data/blob/master/LICENSEhttps://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Description Source data: https://www.chicago.gov/city/en/sites/covid-19/home/latest-data.html.
Only Chicago residents are included based on the home ZIP Code as provided by the medical provider. If a ZIP was missing or was not valid, it is displayed as "Unknown".
Cases with a positive molecular (PCR) or antigen test are included in this dataset. Cases are counted based on the week the test specimen was collected. For privacy reasons, until a ZIP Code reaches five cumulative cases, both the weekly and cumulative case counts will be blank. Therefore, summing the “Cases - Weekly” column is not a reliable way to determine case totals. Deaths are those that have occurred among cases based on the week of death.
For tests, each test is counted once, based on the week the test specimen was collected. Tests performed prior to 3/1/2020 are not included. Test counts include multiple tests for the same person (a change made on 10/29/2020). PCR and antigen tests reported to Chicago Department of Public Health (CDPH) through electronic lab reporting are included. Electronic lab reporting has taken time to onboard and testing availability has shifted over time, so these counts are likely an underestimate of community infection.
The “Percent Tested Positive” columns are calculated by dividing the number of positive tests by the number of total tests . Because of the data limitations for the Tests columns, such as persons being tested multiple times as a requirement for employment, these percentages may vary in either direction from the actual disease prevalence in the ZIP Code.
All data are provisional and subject to change. Information is updated as additional details are received.
To compare ZIP Codes to Chicago Community Areas, please see http://data.cmap.illinois.gov/opendata/uploads/CKAN/NONCENSUS/ADMINISTRATIVE_POLITICAL_BOUNDARIES/CCAzip.pdf. Both ZIP Codes and Community Areas are also geographic datasets on this data portal.
Data Source: Illinois National Electronic Disease Surveillance System, Cook County Medical Examiner’s Office, Illinois Vital Records, American Community Survey (2018)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This comprehensive dataset provides global information on both COVID-19 related deaths and vaccinations from January 5, 2020, to August 4, 2024. It consists of two parts: one tracking COVID-19 cases, deaths, and population statistics, and another monitoring vaccination progress worldwide. This dataset allows for an in-depth analysis of the pandemic’s spread, fatality rates, and the effectiveness of vaccination campaigns across various countries and regions.
Researchers and data analysts can use this dataset to study trends, compare countries, and evaluate public health responses throughout the COVID-19 pandemic.
Analyzing death rates relative to confirmed cases. Examining the percentage of population affected by COVID-19. Evaluating vaccination rates and coverage across different regions. This dataset is ideal for data exploration, statistical analysis, and visualizations related to the COVID-19 pandemic.
Facebook
TwitterThis case surveillance public use dataset has 19 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors. Currently, CDC provides the public with three versions of COVID-19 case surveillance line-listed data: this 19 data element dataset with geography, a 12 data element public use dataset, and a 32 data element restricted access dataset. The following apply to the public use datasets and the restricted access dataset: - Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf. - Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers. - Some data are suppressed to protect individual privacy. - Datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the previously updated datasets. This 14-day lag allows case reporting to be stabilized and ensure that time-dependent outcome data are accurately captured. - Datasets are updated monthly. - Datasets are created using CDC’s Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy. - For more information about data collection and reporting, please see wwwn.cdc.gov/nndss/data-collection.html. - For more information about the COVID-19 case surveillance data, please see www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html. Overview The COVID-19 case surveillance database includes patient-level data reported by U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as "immediately notifiable, urgent (within 24 hours)" by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020 to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data collected by jurisdictions are shared voluntarily with CDC. For more information, visit: wwwn.cdc.gov/nndss/conditions/coronavirus-disease-2019-covid-19/case-definition/2020/08/05/. COVID-19 Case Reports COVID-19 case reports are routinely submitted to CDC by pu
Facebook
TwitterNotice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
This data should be credited to Johns Hopkins University COVID-19 tracking project
Facebook
TwitterOpen Government Licence - Canada 2.0https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
The data contained in the table describes COVID-19 in Canada in terms of number of cases and deaths at the provincial and national levels from January 31, 2020 to present time. It also describes the number of tests performed and the number of people recovered. The values displayed in the table are provided by the Public Health Infobase, managed by the Health Promotion and Chronic Disease Prevention Branch (HPCDPB) of the Public Health Agency of Canada (PHAC). The values are updated daily.
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The current dataset contains 237M Tweet IDs for Twitter posts that mentioned "COVID" as a keyword or as part of a hashtag (e.g., COVID-19, COVID19) between March and July of 2020. Sampling Method: hourly requests sent to Twitter Search API using Social Feed Manager, an open source software that harvests social media data and related content from Twitter and other platforms. NOTE: 1) In accordance with Twitter API Terms, only Tweet IDs are provided as part of this dataset. 2) To recollect tweets based on the list of Tweet IDs contained in these datasets, you will need to use tweet 'rehydration' programs like Hydrator (https://github.com/DocNow/hydrator) or Python library Twarc (https://github.com/DocNow/twarc). 3) This dataset, like most datasets collected via the Twitter Search API, is a sample of the available tweets on this topic and is not meant to be comprehensive. Some COVID-related tweets might not be included in the dataset either because the tweets were collected using a standardized but intermittent (hourly) sampling protocol or because tweets used hashtags/keywords other than COVID (e.g., Coronavirus or #nCoV). 4) To broaden this sample, consider comparing/merging this dataset with other COVID-19 related public datasets such as: https://github.com/thepanacealab/covid19_twitter https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset https://github.com/echen102/COVID-19-TweetIDs
Facebook
TwitterCollection of scholarly articles about COVID-19 and coronavirus family of viruses for use by global research community. Dataset is updated on weekly basis.
Facebook
TwitterNote: This dataset is no longer being updated as of June 2, 2025.
This dataset contains numbers of COVID-19 outbreaks and associated cases, categorized by setting, reported to CDPH since January 1, 2021.
AB 685 (Chapter 84, Statutes of 2020) and the Cal/OSHA COVID-19 Emergency Temporary Standards (Title 8, Subchapter 7, Sections 3205-3205.4) required non-healthcare employers in California to report workplace COVID-19 outbreaks to their local health department (LHD) between January 1, 2021 – December 31, 2022. Beginning January 1, 2023, non-healthcare employer reporting of COVID-19 outbreaks to local health departments is voluntary, unless a local order is in place. More recent data collected without mandated reporting may therefore be less representative of all outbreaks that have occurred, compared to earlier data collected during mandated reporting. Licensed health facilities continue to be mandated to report outbreaks to LHDs.
LHDs report confirmed outbreaks to the California Department of Public Health (CDPH) via the California Reportable Disease Information Exchange (CalREDIE), the California Connected (CalCONNECT) system, or other established processes. Data are compiled and categorized by setting by CDPH. Settings are categorized by U.S. Census industry codes. Total outbreaks and cases are included for individual industries as well as for broader industrial sectors.
The first dataset includes numbers of outbreaks in each setting by month of onset, for outbreaks reported to CDPH since January 1, 2021. This dataset includes some outbreaks with onset prior to January 1 that were reported to CDPH after January 1; these outbreaks are denoted with month of onset “Before Jan 2021.” The second dataset includes cumulative numbers of COVID-19 outbreaks with onset after January 1, 2021, categorized by setting. Due to reporting delays, the reported numbers may not reflect all outbreaks that have occurred as of the reporting date; additional outbreaks may have occurred that have not yet been reported to CDPH.
While many of these settings are workplaces, cases may have occurred among workers, other community members who visited the setting, or both. Accordingly, these data do not distinguish between outbreaks involving only workers, outbreaks involving only residents or patrons, or outbreaks involving both.
Several additional data limitations should be kept in mind:
Outbreaks are classified as “Insufficient information” for outbreaks where not enough information was available for CDPH to assign an industry code.
Some sectors, particularly congregate residential settings, may have increased testing and therefore increased likelihood of outbreak recognition and reporting. As a result, in congregate residential settings, the number of outbreak-associated cases may be more accurate.
However, in most settings, outbreak and case counts are likely underestimates. For most cases, it is not possible to identify the source of exposure, as many cases have multiple possible exposures.
Because some settings have been at times been closed or open with capacity restrictions, numbers of outbreak reports in those settings do not reflect COVID-19 transmission risk.
The number of outbreaks in different settings will depend on the number of different workplaces in each setting. More outbreaks would be expected in settings with many workplaces compared to settings with few workplaces.
Facebook
TwitterCovid-19 Daily metrics at the county level As of 6/1/2023, this data set is no longer being updated. The COVID-19 Data Report is posted on the Open Data Portal every day at 3pm. The report uses data from multiple sources, including external partners; if data from external partners are not received by 3pm, they are not available for inclusion in the report and will not be displayed. Data that are received after 3pm will still be incorporated and published in the next report update. The cumulative number of COVID-19 cases (cumulative_cases) includes all cases of COVID-19 that have ever been reported to DPH. The cumulative number of COVID_19 cases in the last 7 days (cases_7days) only includes cases where the specimen collection date is within the past 7 days. While most cases are reported to DPH within 48 hours of specimen collection, there are a small number of cases that routinely are delayed, and will have specimen collection dates that fall outside of the rolling 7 day reporting window. Additionally, reporting entities may submit correction files to contribute historic data during initial onboarding or to address data quality issues; while this is rare, these correction files may cause a large amount of data from outside of the current reporting window to be uploaded in a single day; this would result in the change in cumulative_cases being much larger than the value of cases_7days. On June 4, 2020, the US Department of Health and Human Services issued guidance requiring the reporting of positive and negative test results for SARS-CoV-2; this guidance expired with the end of the federal PHE on 5/11/2023, and negative SARS-CoV-2 results were removed from the List of Reportable Laboratory Findings. DPH will no longer be reporting metrics that were dependent on the collection of negative test results, specifically total tests performed or percent positivity. Positive antigen and PCR/NAAT results will continue to be reportable.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The rapid outbreak of COVID-19 due to the novel coronavirus SARS-COV-2 is the biggest issue faced by mankind today. It is important to detect the positive cases as early as possible to prevent the further spread of this pandemic.
Facebook
TwitterThe Nursing Home COVID-19 Public File from the Centers for Medicare & Medicaid Services, filtered for Connecticut. View the full dataset and detailed metadata here. The Nursing Home COVID-19 Public File includes data reported by nursing homes to the CDC’s National Healthcare Safety Network (NHSN) system COVID-19 Long Term Care Facility Module, including Resident Impact, Facility Capacity, Staff & Personnel, and Supplies & Personal Protective Equipment, and Ventilator Capacity and Supplies Data Elements.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19 - WHO
People can catch COVID-19 from others who have the virus. This has been spreading rapidly around the world and Italy is one of the most affected country.
On March 8, 2020 - Italy’s prime minister announced a sweeping coronavirus quarantine early Sunday, restricting the movements of about a quarter of the country’s population in a bid to limit contagions at the epicenter of Europe’s outbreak. - TIME
This dataset is from https://github.com/pcm-dpc/COVID-19 collected by Sito del Dipartimento della Protezione Civile - Emergenza Coronavirus: la risposta nazionale
This dataset has two files
covid19_italy_province.csv - Province level data of COVID-19 casescovid_italy_region.csv - Region level data of COVID-19 casesData is collected by Sito del Dipartimento della Protezione Civile - Emergenza Coronavirus: la risposta nazionale and is uploaded into this github repo.
Dashboard on the data can be seen here. Picture courtesy is from the dashboard.
Insights on * Spread to various regions over time * Try to predict the spread of COVID-19 ahead of time to take preventive measures
Facebook
TwitterApache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Saikat Sinha Ray
Released under Apache 2.0
Facebook
Twitterhttps://www.usa.gov/government-workshttps://www.usa.gov/government-works
After over two years of public reporting, the State Profile Report will no longer be produced and distributed after February 2023. The final release was on February 23, 2023. We want to thank everyone who contributed to the design, production, and review of this report and we hope that it provided insight into the data trends throughout the COVID-19 pandemic. Data about COVID-19 will continue to be updated at CDC’s COVID Data Tracker.
The State Profile Report (SPR) is generated by the Data Strategy and Execution Workgroup in the Joint Coordination Cell, in collaboration with the White House. It is managed by an interagency team with representatives from multiple agencies and offices (including the United States Department of Health and Human Services (HHS), the Centers for Disease Control and Prevention, the HHS Assistant Secretary for Preparedness and Response, and the Indian Health Service). The SPR provides easily interpretable information on key indicators for each state, down to the county level.
It is a weekly snapshot in time that:
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Reporting of Aggregate Case and Death Count data was discontinued May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.This archived public use dataset has 11 data elements reflecting United States COVID-19 community levels for all available counties.The COVID-19 community levels were developed using a combination of three metrics — new COVID-19 admissions per 100,000 population in the past 7 days, the percent of staffed inpatient beds occupied by COVID-19 patients, and total new COVID-19 cases per 100,000 population in the past 7 days. The COVID-19 community level was determined by the higher of the new admissions and inpatient beds metrics, based on the current level of new cases per 100,000 population in the past 7 days. New COVID-19 admissions and the percent of staffed inpatient beds occupied represent the current potential for strain on the health system. Data on new cases acts as an early warning indicator of potential increases in health system strain in the event of a COVID-19 surge.Using these data, the COVID-19 community level was classified as low, medium, or high.COVID-19 Community Levels were used to help communities and individuals make decisions based on their local context and their unique needs. Community vaccination coverage and other local information, like early alerts from surveillance, such as through wastewater or the number of emergency department visits for COVID-19, when available, can also inform decision making for health officials and individuals.For the most accurate and up-to-date data for any county or state, visit the relevant health department website. COVID Data Tracker may display data that differ from state and local websites. This can be due to differences in how data were collected, how metrics were calculated, or the timing of web updates.Archived Data Notes:This dataset was renamed from "United States COVID-19 Community Levels by County as Originally Posted" to "United States COVID-19 Community Levels by County" on March 31, 2022.March 31, 2022: Column name for county population was changed to “county_population”. No change was made to the data points previous released.March 31, 2022: New column, “health_service_area_population”, was added to the dataset to denote the total population in the designated Health Service Area based on 2019 Census estimate.March 31, 2022: FIPS codes for territories American Samoa, Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands were re-formatted to 5-digit numeric for records released on 3/3/2022 to be consistent with other records in the dataset.March 31, 2022: Changes were made to the text fields in variables “county”, “state”, and “health_service_area” so the formats are consistent across releases.March 31, 2022: The “%” sign was removed from the text field in column “covid_inpatient_bed_utilization”. No change was made to the data. As indicated in the column description, values in this column represent the percentage of staffed inpatient beds occupied by COVID-19 patients (7-day average).March 31, 2022: Data values for columns, “county_population”, “health_service_area_number”, and “health_service_area” were backfilled for records released on 2/24/2022. These columns were added since the week of 3/3/2022, thus the values were previously missing for records released the week prior.April 7, 2022: Updates made to data released on 3/24/2022 for Guam, Commonwealth of the Northern Mariana Islands, and United States Virgin Islands to correct a data mapping error.April 21, 2022: COVID-19 Community Level (CCL) data released for counties in Nebraska for the week of April 21, 2022 have 3 counties identified in the high category and 37 in the medium category. CDC has been working with state officials to verify the data submitted, as other data systems are not providing alerts for substantial increases in disease transmission or severity in the state.May 26, 2022: COVID-19 Community Level (CCL) data released for McCracken County, KY for the week of May 5, 2022 have been updated to correct a data processing error. McCracken County, KY should have appeared in the low community level category during the week of May 5, 2022. This correction is reflect
Facebook
TwitterThis data is synced hourly from https://github.com/CSSEGISandData/COVID-19. All credit is to them.
Latest Confirmed Cases
I have also added confirmed_pivot.csv which gives a slightly more workable view of the data. Extra columns/day makes things difficult.
#
Facebook
TwitterDaily count of NYC residents who tested positive for SARS-CoV-2, who were hospitalized with COVID-19, and deaths among COVID-19 patients. Note that this dataset currently pulls from https://raw.githubusercontent.com/nychealth/coronavirus-data/master/trends/data-by-day.csv on a daily basis.
Facebook
Twitterhttps://www.usa.gov/government-workshttps://www.usa.gov/government-works
After October 13, 2022, this dataset will no longer be updated as the related CDC COVID Data Tracker site was retired on October 13, 2022.
This dataset contains historical trends in vaccinations and cases by age group, at the US national level. Data is stratified by at least one dose and fully vaccinated. Data also represents all vaccine partners including jurisdictional partner clinics, retail pharmacies, long-term care facilities, dialysis centers, Federal Emergency Management Agency and Health Resources and Services Administration partner sites, and federal entity facilities.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. During the entire course of the pandemic, one of the main problems that healthcare providers have faced is the shortage of medical resources and a proper plan to efficiently distribute them. In these tough times, being able to predict what kind of resource an individual might require at the time of being tested positive or even before that will be of immense help to the authorities as they would be able to procure and arrange for the resources necessary to save the life of that patient.
The main goal of this project is to build a machine learning model that, given a Covid-19 patient's current symptom, status, and medical history, will predict whether the patient is in high risk or not.
The dataset was provided by the Mexican government (link). This dataset contains an enormous number of anonymized patient-related information including pre-conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". values as 97 and 99 are missing data.