100+ datasets found

Data_Sheet_1_Toward a Country-Based Prediction Model of COVID-19 Infections...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
Updated Jun 10, 2021
Cite
Howard, Scott C.; Li, Zhijun; Wang, Lishi; Xie, Ning; Gu, Tianshu; Wang, Yongjun; Postlethwaite, Arnold; Gu, Weikuan; Meng, Xia; Aleya, Lotfi (2021). Data_Sheet_1_Toward a Country-Based Prediction Model of COVID-19 Infections and Deaths Between Disease Apex and End: Evidence From Countries With Contained Numbers of COVID-19.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000850298
Explore at:
Dataset updated
Jun 10, 2021
Authors
Howard, Scott C.; Li, Zhijun; Wang, Lishi; Xie, Ning; Gu, Tianshu; Wang, Yongjun; Postlethwaite, Arnold; Gu, Weikuan; Meng, Xia; Aleya, Lotfi
Description
The complexity of COVID-19 and variations in control measures and containment efforts in different countries have caused difficulties in the prediction and modeling of the COVID-19 pandemic. We attempted to predict the scale of the latter half of the pandemic based on real data using the ratio between the early and latter halves from countries where the pandemic is largely over. We collected daily pandemic data from China, South Korea, and Switzerland and subtracted the ratio of pandemic days before and after the disease apex day of COVID-19. We obtained the ratio of pandemic data and created multiple regression models for the relationship between before and after the apex day. We then tested our models using data from the first wave of the disease from 14 countries in Europe and the US. We then tested the models using data from these countries from the entire pandemic up to March 30, 2021. Results indicate that the actual number of cases from these countries during the first wave mostly falls within the predicted ranges of linear regression, except for Spain and Russia. Similarly, the actual deaths in these countries mostly fall within the range of predicted data. Using the accumulated data up to the day of apex and total accumulated data up to March 30, 2021, the case numbers in these countries fall within the range of predicted data, except for Brazil. The actual number of deaths in all the countries is at or below the predicted data. In conclusion, a linear regression model built with real data from countries or regions from early pandemics can predict pandemic scales of the countries where the pandemics occur late. Such a prediction with a high degree of accuracy provides valuable information for governments and the public.
United States Excess Death excl COVID: Predicted: Single Excess Est:...
ceicdata.com
Updated Sep 16, 2023
Cite
CEICdata.com (2023). United States Excess Death excl COVID: Predicted: Single Excess Est: Massachusetts [Dataset]. https://www.ceicdata.com/en/united-states/number-of-excess-deaths-by-states-all-causes-excluding-covid19-predicted/excess-death-excl-covid-predicted-single-excess-est-massachusetts
Explore at:
Dataset updated
Sep 16, 2023
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 1, 2023 - Sep 16, 2023
Area covered
United States
Variables measured
Vital Statistics
Description
United States Excess Death excl COVID: Predicted: Single Excess Est: Massachusetts data was reported at 0.000 Number in 16 Sep 2023. This stayed constant from the previous number of 0.000 Number for 09 Sep 2023. United States Excess Death excl COVID: Predicted: Single Excess Est: Massachusetts data is updated weekly, averaging 0.000 Number from Jan 2017 (Median) to 16 Sep 2023, with 350 observations. The data reached an all-time high of 209.000 Number in 13 Jan 2018 and a record low of 0.000 Number in 16 Sep 2023. United States Excess Death excl COVID: Predicted: Single Excess Est: Massachusetts data remains active status in CEIC and is reported by Centers for Disease Control and Prevention. The data is categorized under Global Database’s United States – Table US.G012: Number of Excess Deaths: by States: All Causes excluding COVID-19: Predicted (Discontinued).
COVID-19 Data
kaggle.com
zip
Updated Nov 21, 2025
Cite
Umut Toygar Göz (2025). COVID-19 Data [Dataset]. https://www.kaggle.com/datasets/umuttoygargoz/covid19-data
Explore at:
Available download formats: zip (8484157 bytes)
Dataset updated
Nov 21, 2025
Authors
Umut Toygar Göz
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Dataset

This dataset was created by Umut Toygar Göz

Released under Attribution 4.0 International (CC BY 4.0)

Contents
COVID-19 Dataset
kaggle.com
zip
Updated Nov 13, 2022
Cite
Meir Nizri (2022). COVID-19 Dataset [Dataset]. https://www.kaggle.com/datasets/meirnizri/covid19-dataset
Explore at:
Available download formats: zip (4890659 bytes)
Dataset updated
Nov 13, 2022
Authors
Meir Nizri
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. During the entire course of the pandemic, one of the main problems that healthcare providers have faced is the shortage of medical resources and a proper plan to efficiently distribute them. In these tough times, being able to predict what kind of resource an individual might require at the time of being tested positive or even before that will be of immense help to the authorities as they would be able to procure and arrange for the resources necessary to save the life of that patient.

The main goal of this project is to build a machine learning model that, given a COVID-19 patient's current symptoms, status, and medical history, will predict whether the patient is at high risk or not.

Content

The dataset was provided by the Mexican government. It contains a large amount of anonymized patient-related information, including pre-existing conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". Values of 97 and 99 indicate missing data.

sex: 1 for female and 2 for male.

age: of the patient.

classification: covid test findings. Values 1-3 mean that the patient was diagnosed with covid in different degrees. 4 or higher means that the patient is not a carrier of covid or that the test is inconclusive.

patient type: type of care the patient received in the unit. 1 for returned home and 2 for hospitalization.

pneumonia: whether the patient already has air sac inflammation or not.

pregnancy: whether the patient is pregnant or not.

diabetes: whether the patient has diabetes or not.

copd: Indicates whether the patient has Chronic obstructive pulmonary disease or not.

asthma: whether the patient has asthma or not.

inmsupr: whether the patient is immunosuppressed or not.

hypertension: whether the patient has hypertension or not.

cardiovascular: whether the patient has a heart or blood vessel-related disease.

renal chronic: whether the patient has chronic renal disease or not.

other disease: whether the patient has other disease or not.

obesity: whether the patient is obese or not.

tobacco: whether the patient is a tobacco user.

usmr: Indicates whether the patient was treated in medical units of the first, second or third level.

medical unit: type of institution of the National Health System that provided the care.

intubed: whether the patient was connected to the ventilator.

icu: Indicates whether the patient had been admitted to an Intensive Care Unit.

date died: the date of death if the patient died, and 9999-99-99 otherwise.
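
The encoding described above (1 for "yes", 2 for "no", 97 and 99 for missing data, and 9999-99-99 for patients who did not die) can be decoded along these lines. This is a minimal sketch, not the dataset author's code: the function names are hypothetical, and the year-month-day order of the date string is an assumption based on the placeholder value, so it may need adjusting to match the actual file.

```python
from datetime import date
from typing import Optional

# Per the description, 97 and 99 mark missing data in the Boolean features.
MISSING_CODES = {97, 99}


def decode_flag(value: int) -> Optional[bool]:
    """Map the 1/2 encoding to True/False; missing codes become None."""
    if value in MISSING_CODES:
        return None
    if value == 1:
        return True
    if value == 2:
        return False
    raise ValueError(f"unexpected code: {value}")


def decode_date_died(value: str) -> Optional[date]:
    """Return the date of death, or None for the 9999-99-99 placeholder.

    Assumes a year-month-day string (an assumption, see the note above).
    """
    if value == "9999-99-99":
        return None
    year, month, day = (int(part) for part in value.split("-"))
    return date(year, month, day)


# Hypothetical record used only to illustrate the decoding.
record = {"diabetes": 2, "pneumonia": 1, "icu": 97}
decoded = {name: decode_flag(code) for name, code in record.items()}
print(decoded)
```

Mapping missing codes to None rather than to False keeps "no" and "unknown" apart, which matters when the flags are later used as model features.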
United States Excess Death excl COVID: Predicted: Single Estimate: Wyoming
ceicdata.com
Updated Sep 16, 2023
Cite
CEICdata.com (2023). United States Excess Death excl COVID: Predicted: Single Estimate: Wyoming [Dataset]. https://www.ceicdata.com/en/united-states/number-of-excess-deaths-by-states-all-causes-excluding-covid19-predicted/excess-death-excl-covid-predicted-single-estimate-wyoming
Explore at:
Dataset updated
Sep 16, 2023
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 1, 2023 - Sep 16, 2023
Area covered
United States
Variables measured
Vital Statistics
Description
United States Excess Death excl COVID: Predicted: Single Estimate: Wyoming data was reported at 0.000 Number in 16 Sep 2023. This stayed constant from the previous number of 0.000 Number for 09 Sep 2023. United States Excess Death excl COVID: Predicted: Single Estimate: Wyoming data is updated weekly, averaging 2.000 Number from Jan 2017 (Median) to 16 Sep 2023, with 350 observations. The data reached an all-time high of 51.000 Number in 04 Jan 2020 and a record low of 0.000 Number in 16 Sep 2023. United States Excess Death excl COVID: Predicted: Single Estimate: Wyoming data remains active status in CEIC and is reported by Centers for Disease Control and Prevention. The data is categorized under Global Database’s United States – Table US.G012: Number of Excess Deaths: by States: All Causes excluding COVID-19: Predicted (Discontinued).
United States Excess Death excl COVID: Predicted: Single Estimate: Maine
ceicdata.com
Updated Sep 16, 2023
Cite
CEICdata.com (2023). United States Excess Death excl COVID: Predicted: Single Estimate: Maine [Dataset]. https://www.ceicdata.com/en/united-states/number-of-excess-deaths-by-states-all-causes-excluding-covid19-predicted/excess-death-excl-covid-predicted-single-estimate-maine
Explore at:
Dataset updated
Sep 16, 2023
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 1, 2023 - Sep 16, 2023
Area covered
United States
Variables measured
Vital Statistics
Description
United States Excess Death excl COVID: Predicted: Single Estimate: Maine data was reported at 0.000 Number in 16 Sep 2023. This stayed constant from the previous number of 0.000 Number for 09 Sep 2023. United States Excess Death excl COVID: Predicted: Single Estimate: Maine data is updated weekly, averaging 0.000 Number from Jan 2017 (Median) to 16 Sep 2023, with 350 observations. The data reached an all-time high of 54.000 Number in 06 Nov 2021 and a record low of 0.000 Number in 16 Sep 2023. United States Excess Death excl COVID: Predicted: Single Estimate: Maine data remains active status in CEIC and is reported by Centers for Disease Control and Prevention. The data is categorized under Global Database’s United States – Table US.G012: Number of Excess Deaths: by States: All Causes excluding COVID-19: Predicted (Discontinued).
clinical lab parameters covid
kaggle.com
zip
Updated Nov 16, 2020
Cite
Paul Larmuseau (2020). clinical lab parameters covid [Dataset]. https://www.kaggle.com/plarmuseau/forecast-covid-death
Explore at:
Available download formats: zip (3047780 bytes)
Dataset updated
Nov 16, 2020
Authors
Paul Larmuseau
Description
Context

Is a decision tree for COVID-19 possible with these datasets?

Validation demonstrates limited clinical utility of the interpretable mortality prediction model for patients with COVID-19:
https://github.com/HAIRLAB/Pre_Surv_COVID_19/blob/master/response/EDA.ipynb

The sudden increase of COVID-19 cases is putting a high pressure on health-care services worldwide. At the current stage, fast, accurate and early clinical assessment of the disease severity is vital. To support decision making and logistical planning in healthcare systems, this study leverages a database of blood samples from 485 infected patients in the region of Wuhan, China to identify crucial predictive biomarkers of disease mortality. For this purpose, machine learning tools selected three biomarkers that predict the mortality of individual patients with more than 90% accuracy: lactic dehydrogenase (LDH), lymphocyte and high-sensitivity C-reactive protein (hs-CRP). In particular, relatively high levels of LDH alone seem to play a crucial role in distinguishing the vast majority of cases that require immediate medical attention. This finding is consistent with current medical knowledge that high LDH levels are associated with tissue breakdown occurring in various diseases, including pulmonary disorders such as pneumonia. Overall, this paper suggests a simple and operable decision rule to quickly predict patients at the highest risk, allowing them to be prioritised and potentially reducing the mortality rate.
COVID-19 State Data
kaggle.com
zip
Updated Nov 3, 2020
Cite
Night Ranger (2020). COVID-19 State Data [Dataset]. https://www.kaggle.com/nightranger77/covid19-state-data
Explore at:
Available download formats: zip (4501 bytes)
Dataset updated
Nov 3, 2020
Authors
Night Ranger
Description
This dataset is a per-state amalgamation of demographic, public health and other relevant predictors for COVID-19.

Deaths, Infections and Tests by State

The COVID Tracking Project: https://covidtracking.com/data/api

The positive, death and totalTestResults fields from the API supply the Infected, Deaths and Tested columns, respectively, in this dataset. Please read the API documentation for more context on those columns.

Predictor Data and Sources

Population (2020)

Density is people per meter squared https://worldpopulationreview.com/states/

ICU Beds and Age 60+

https://khn.org/news/as-coronavirus-spreads-widely-millions-of-older-americans-live-in-counties-with-no-icu-beds/

GDP

https://worldpopulationreview.com/states/gdp-by-state/

Income per capita (2018)

https://worldpopulationreview.com/states/per-capita-income-by-state/

Gini

https://en.wikipedia.org/wiki/List_of_U.S._states_by_Gini_coefficient

Unemployment (2020)

Rates from Feb 2020 and are percentage of labor force
https://www.bls.gov/web/laus/laumstrk.htm

Sex (2017)

Ratio is Male / Female
https://www.kff.org/other/state-indicator/distribution-by-gender/

Smoking Percentage (2020)

https://worldpopulationreview.com/states/smoking-rates-by-state/

Influenza and Pneumonia Death Rate (2018)

Death rate per 100,000 people
https://www.cdc.gov/nchs/pressroom/sosmap/flu_pneumonia_mortality/flu_pneumonia.htm

Chronic Lower Respiratory Disease Death Rate (2018)

Death rate per 100,000 people
https://www.cdc.gov/nchs/pressroom/sosmap/lung_disease_mortality/lung_disease.htm

Active Physicians (2019)

https://www.kff.org/other/state-indicator/total-active-physicians/

Hospitals (2018)

https://www.kff.org/other/state-indicator/total-hospitals

Health spending per capita

Includes spending for all health care services and products by state of residence. Hospital spending is included and reflects the total net revenue. Costs such as insurance, administration, research, and construction expenses are not included.
https://www.kff.org/other/state-indicator/avg-annual-growth-per-capita/

Pollution (2019)

Pollution: Average exposure of the general public to particulate matter of 2.5 microns or less (PM2.5) measured in micrograms per cubic meter (3-year estimate)
https://www.americashealthrankings.org/explore/annual/measure/air/state/ALL

Medium and Large Airports

For each state, number of medium and large airports https://en.wikipedia.org/wiki/List_of_the_busiest_airports_in_the_United_States

Temperature (2019)

Note that FL was incorrect in the table, but is corrected in the Hottest States paragraph
https://worldpopulationreview.com/states/average-temperatures-by-state/
District of Columbia temperature computed as the average of Maryland and Virginia

Urbanization (2010)

Urbanization as a percentage of the population https://www.icip.iastate.edu/tables/population/urban-pct-states

Age Groups (2018)

https://www.kff.org/other/state-indicator/distribution-by-age/

School Closure Dates

Schools that haven't closed are marked NaN https://www.edweek.org/ew/section/multimedia/map-coronavirus-and-school-closures.html

Note that some datasets above did not contain data for the District of Columbia; this missing data was found via Google searches and entered manually.
Covid-19 World-Wide Deaths Prediction
kaggle.com
zip
Updated Apr 10, 2020
Cite
suresh dv (2020). Covid-19 World-Wide Deaths Prediction [Dataset]. https://www.kaggle.com/sureshdv/covid19-worldwide-deaths-prediction
Explore at:
Available download formats: zip (98049 bytes)
Dataset updated
Apr 10, 2020
Authors
suresh dv
Area covered
World
Description
Dataset

This dataset was created by suresh dv

Contents
Excess Deaths Associated with COVID-19
datalumos.org
delimited
Updated Apr 24, 2025
Cite
United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics (2025). Excess Deaths Associated with COVID-19 [Dataset]. http://doi.org/10.3886/E227667V1
Explore at:
Available download formats: delimited
Unique identifier
https://doi.org/10.3886/E227667V1
Dataset updated
Apr 24, 2025
Dataset provided by
Centers for Disease Control and Prevention: http://www.cdc.gov/
United States Department of Health and Human Services: http://www.hhs.gov/
National Center for Health Statistics: https://www.cdc.gov/nchs/
Authors
United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Time period covered
2017 - 2023
Area covered
United States
Description
Estimates of excess deaths can provide information about the burden of mortality potentially related to the COVID-19 pandemic, including deaths that are directly or indirectly attributed to COVID-19. Excess deaths are typically defined as the difference between the observed numbers of deaths in specific time periods and expected numbers of deaths in the same time periods. This visualization provides weekly estimates of excess deaths by the jurisdiction in which the death occurred. Weekly counts of deaths are compared with historical trends to determine whether the number of deaths is significantly higher than expected.

Counts of deaths from all causes of death, including COVID-19, are presented. As some deaths due to COVID-19 may be assigned to other causes of deaths (for example, if COVID-19 was not diagnosed or not mentioned on the death certificate), tracking all-cause mortality can provide information about whether an excess number of deaths is observed, even when COVID-19 mortality may be undercounted. Additionally, deaths from all causes excluding COVID-19 were also estimated. Comparing these two sets of estimates, excess deaths with and without COVID-19, can provide insight about how many excess deaths are identified as due to COVID-19, and how many excess deaths are reported as due to other causes of death. These deaths could represent misclassified COVID-19 deaths, or potentially could be indirectly related to the COVID-19 pandemic (e.g., deaths from other causes occurring in the context of health care shortages or overburdened health care systems).

Estimates of excess deaths can be calculated in a variety of ways, and will vary depending on the methodology and assumptions about how many deaths are expected to occur. Estimates of excess deaths presented in this webpage were calculated using Farrington surveillance algorithms (1). A range of values for the number of excess deaths was calculated as the difference between the observed count and one of two thresholds (either the average expected count or the upper bound of the 95% prediction interval), by week and jurisdiction.

Provisional death counts are weighted to account for incomplete data. However, data for the most recent week(s) are still likely to be incomplete. Weights are based on completeness of provisional data in prior years, but the timeliness of data may have changed in 2020 relative to prior years, so the resulting weighted estimates may be too high in some jurisdictions and too low in others. As more information about the accuracy of the weighted estimates is obtained, further refinements to the weights may be made, which will impact the estimates. Any changes to the methods or weighting algorithm will be noted in the Technical Notes when they occur. More detail about the methods, weighting, data, and limitations can be found in the Technical Notes.

This visualization includes several different estimates:

Number of excess deaths: A range of estimates for the number of excess deaths was calculated as the difference between the observed count and one of two thresholds (either the average expected count or the upper bound threshold), by week and jurisdiction. Negative values, where the observed count fell below the threshold, were set to zero.

Percent excess: The percent excess was defined as the number of excess deaths divided by the threshold.

Total number of excess deaths: The total number of excess deaths in each jurisdiction was calculated by summing the excess deaths in each week, from February 1, 2020 to present. Similarly, the total number of excess deaths for the US overall was computed as a sum of jurisdiction-specific numbers of excess deaths (with negative values set to zero), and not directly estimated using the Farrington surveillance algorithms.

Select a dashboard from the menu, then click on “Update Dashboard” to navigate through the different graphics.

The first dashboard shows the weekly predicted counts of deaths from all causes, and the threshold for the expected number of deaths. Select a jurisdiction from the drop-down menu to show data for that jurisdiction.

The second dashboard shows the weekly predicted counts of deaths from all causes and the weekly count of deaths from all causes excluding COVID-19. Select a jurisdiction from the drop-down menu to show data for that jurisdiction.

The th
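
The three estimates defined in the description (number of excess deaths with negative values set to zero, percent excess, and the running total) reduce to simple arithmetic once a threshold is known. The sketch below illustrates only that arithmetic. It does not implement the Farrington surveillance algorithms that produce the thresholds, the function names are hypothetical, the weekly counts are made-up illustration values rather than CDC data, and the multiplication by 100 is an assumption about how the ratio is shown as a percentage.

```python
from typing import Iterable, Tuple


def weekly_excess(observed: float, threshold: float) -> Tuple[float, float]:
    """Return (excess deaths, percent excess) for one week and jurisdiction.

    Excess is observed minus threshold, with negative values set to zero.
    Percent excess is excess divided by the threshold, times 100.
    """
    excess = max(observed - threshold, 0.0)
    percent = 100.0 * excess / threshold if threshold > 0 else 0.0
    return excess, percent


def total_excess(weeks: Iterable[Tuple[float, float]]) -> float:
    """Sum weekly excess deaths over (observed, threshold) pairs."""
    return sum(weekly_excess(obs, thr)[0] for obs, thr in weeks)


# Made-up (observed, threshold) pairs for three weeks in one jurisdiction.
weeks = [(1100.0, 1000.0), (950.0, 1000.0), (1250.0, 1000.0)]
print(weekly_excess(*weeks[0]))
print(total_excess(weeks))
```

Because negative weekly values are clipped before summing, a total built this way can exceed the simple difference between total observed and total expected deaths, which is consistent with the description's note that the US total is a sum of jurisdiction-specific values rather than a direct estimate.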
United States Excess Death excl COVID: Predicted: Total Estimate: Florida
ceicdata.com
Updated Sep 16, 2023
Cite
CEICdata.com (2023). United States Excess Death excl COVID: Predicted: Total Estimate: Florida [Dataset]. https://www.ceicdata.com/en/united-states/number-of-excess-deaths-by-states-all-causes-excluding-covid19-predicted/excess-death-excl-covid-predicted-total-estimate-florida
Explore at:
Dataset updated
Sep 16, 2023
Dataset provided by
CEICdata.com
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 1, 2023 - Sep 16, 2023
Area covered
United States
Variables measured
Vital Statistics
Description
United States Excess Death excl COVID: Predicted: Total Estimate: Florida data was reported at 20,737.000 Number in 16 Sep 2023. This stayed constant from the previous number of 20,737.000 Number for 09 Sep 2023. United States Excess Death excl COVID: Predicted: Total Estimate: Florida data is updated weekly, averaging 20,737.000 Number from Jan 2017 (Median) to 16 Sep 2023, with 350 observations. The data reached an all-time high of 20,737.000 Number in 16 Sep 2023 and a record low of 20,737.000 Number in 16 Sep 2023. United States Excess Death excl COVID: Predicted: Total Estimate: Florida data remains active status in CEIC and is reported by Centers for Disease Control and Prevention. The data is categorized under Global Database’s United States – Table US.G012: Number of Excess Deaths: by States: All Causes excluding COVID-19: Predicted (Discontinued).
Data_Sheet_1_The risk profile of patients with COVID-19 as predictors of...
datasetcatalog.nlm.nih.gov
Updated Jul 26, 2022
Cite
Sturkenboom, Miriam; Bouhaddani, Said el; Royo, Albert Cid; Rahimi, Ezat; Ahmadizar, Fariba; Sigari, Naseh; Shahisavandi, Mina; Azizi, Mohammad (2022). Data_Sheet_1_The risk profile of patients with COVID-19 as predictors of lung lesions severity and mortality—Development and validation of a prediction model.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000358035
Explore at:
Dataset updated
Jul 26, 2022
Authors
Sturkenboom, Miriam; Bouhaddani, Said el; Royo, Albert Cid; Rahimi, Ezat; Ahmadizar, Fariba; Sigari, Naseh; Shahisavandi, Mina; Azizi, Mohammad
Description
Objective: We developed and validated a prediction model based on individuals' risk profiles to predict the severity of lung involvement and death in patients hospitalized with coronavirus disease 2019 (COVID-19) infection.

Methods: In this retrospective study, we studied hospitalized COVID-19 patients with data on chest CT scans performed during hospital stay (February 2020-April 2021) in a training dataset (TD) (n = 2,251) and an external validation dataset (eVD) (n = 993). We used the most relevant demographical, clinical, and laboratory variables (n = 25) as potential predictors of COVID-19-related outcomes. The primary and secondary endpoints were the severity of lung involvement quantified as mild (≤25%), moderate (26–50%), severe (>50%), and in-hospital death, respectively. We applied random forest (RF) classifier, a machine learning technique, and multivariable logistic regression analysis to study our objectives.

Results: In the TD and the eVD, respectively, the mean [standard deviation (SD)] age was 57.9 (18.0) and 52.4 (17.6) years; patients with severe lung involvement [n (%):185 (8.2) and 116 (11.7)] were significantly older [mean (SD) age: 64.2 (16.9), and 56.2 (18.9)] than the other two groups (mild and moderate). The mortality rate was higher in patients with severe (64.9 and 38.8%) compared to moderate (5.5 and 12.4%) and mild (2.3 and 7.1%) lung involvement. The RF analysis showed age, C reactive protein (CRP) levels, and duration of hospitalizations as the three most important predictors of lung involvement severity at the time of the first CT examination. Multivariable logistic regression analysis showed a significant strong association between the extent of the severity of lung involvement (continuous variable) and death; adjusted odds ratio (OR): 9.3; 95% CI: 7.1–12.1 in the TD and 2.6 (1.8–3.5) in the eVD.

Conclusion: In hospitalized patients with COVID-19, the severity of lung involvement is a strong predictor of death. Age, CRP levels, and duration of hospitalizations are the most important predictors of severe lung involvement. A simple prediction model based on available clinical and imaging data provides a validated tool that predicts the severity of lung involvement and death probability among hospitalized patients with COVID-19.
Table_1_Neurological Comorbidity Is a Predictor of Death in Covid-19...
datasetcatalog.nlm.nih.gov
frontiersin.figshare.com
Updated Jul 7, 2020
Cite
Abad-Molina, Cristina; de Lera, Mercedes; Pedraza, María; García-Azorín, David; Chavarría-Miranda, Alba; Talavera, Blanca; Vicente, Jose Manuel; Dueñas-Gutierrez, Carlos; Gómez-Herreras, Jose Ignacio; Ruiz-Martin, Guadalupe; Martínez-Velasco, Elena; Martínez-Pías, Enrique; Arenillas, Juan Francisco; Ezpeleta, David; de Paula, Jose María Prieto; Trigo, Javier; Bustamante-Munguira, Elena; Juarros, Santiago; Simón-Campo, Paula; Gómez-Vicente, Beatriz; Hernández-Pérez, Isabel; del Pozo-Vegas, Carlos; Cantón-Álvarez, Belén; Peñarrubia, María Jesús; López-Sanz, Cristina; Orduña-Domingo, Antonio; Valle-Peñacoba, Gonzalo; Gutiérrez-Sánchez, María; Jiménez-Cuenca, María Isabel; Sierra, Álvaro; Guerrero, Ángel (2020). Table_1_Neurological Comorbidity Is a Predictor of Death in Covid-19 Disease: A Cohort Study on 576 Patients.DOCX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000574084
Explore at:
Dataset updated
Jul 7, 2020
Authors
Abad-Molina, Cristina; de Lera, Mercedes; Pedraza, María; García-Azorín, David; Chavarría-Miranda, Alba; Talavera, Blanca; Vicente, Jose Manuel; Dueñas-Gutierrez, Carlos; Gómez-Herreras, Jose Ignacio; Ruiz-Martin, Guadalupe; Martínez-Velasco, Elena; Martínez-Pías, Enrique; Arenillas, Juan Francisco; Ezpeleta, David; de Paula, Jose María Prieto; Trigo, Javier; Bustamante-Munguira, Elena; Juarros, Santiago; Simón-Campo, Paula; Gómez-Vicente, Beatriz; Hernández-Pérez, Isabel; del Pozo-Vegas, Carlos; Cantón-Álvarez, Belén; Peñarrubia, María Jesús; López-Sanz, Cristina; Orduña-Domingo, Antonio; Valle-Peñacoba, Gonzalo; Gutiérrez-Sánchez, María; Jiménez-Cuenca, María Isabel; Sierra, Álvaro; Guerrero, Ángel
Description
Introduction: Prognosis of Coronavirus disease 2019 (Covid-19) patients with vascular risk factors and certain comorbidities is worse. The impact of chronic neurological disorders (CND) on prognosis is unclear. We evaluated whether the presence of CND in Covid-19 patients is a predictor of higher in-hospital mortality. As secondary endpoints, we analyzed the association between CND, Covid-19 severity, and laboratory abnormalities during admission.

Methods: Retrospective cohort study that included all the consecutive hospitalized patients with confirmed Covid-19 disease from March 8th to April 11th, 2020. The study setting was Hospital Clínico, a tertiary academic hospital in Valladolid. CND was defined as those neurological conditions causing permanent disability. We assessed demography, clinical variables, Covid-19 severity, laboratory parameters and outcome. The primary endpoint was in-hospital all-cause mortality, evaluated by multivariate Cox regression and log-rank test. We analyzed the association between CND, Covid-19 severity and laboratory abnormalities.

Results: We included 576 patients, 43.3% female, with a mean age of 67.2 years. CND were present in 105 (18.3%) patients. Patients with CND were older, more disabled, had more vascular risk factors and comorbidities and fewer clinical symptoms of Covid-19. They presented 1.43 days earlier to the emergency department. Need of ventilation support was similar. Presence of CND was an independent predictor of death (HR 2.129, 95% CI: 1.382–3.280) but not of more severe Covid-19 disease (OR: 1.75, 95% CI: 0.970–3.158). Frequency of laboratory abnormalities was similar, except for procalcitonin and INR.

Conclusions: The presence of CND is an independent predictor of mortality in hospitalized Covid-19 patients. This was explained neither by a worse immune response to Covid-19 nor by differences in the level of care received by patients with CND.
Covid19 Global Excess Deaths (daily updates)
kaggle.com
zip
Updated Dec 2, 2025
Cite
Joakim Arvidsson (2025). Covid19 Global Excess Deaths (daily updates) [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/covid19-global-excess-deaths-daily-updates
Explore at:
Available download formats: zip (2989004967 bytes)
Dataset updated
Dec 2, 2025
Authors
Joakim Arvidsson
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Daily updates of Covid-19 Global Excess Deaths from the Economist's GitHub repository: https://github.com/TheEconomist/covid-19-the-economist-global-excess-deaths-model

Interpreting estimates

Estimating excess deaths for every country every day since the pandemic began is a complex and difficult task. Rather than being overly confident in a single number, limited data means that we can often only give a very wide range of plausible values. Focusing on central estimates in such cases would be misleading: unless ranges are very narrow, the 95% range should be reported when possible. The ranges assume that the conditions for bootstrap confidence intervals are met. Please see our tracker page and methodology for more information.
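As a rough illustration of why a range can be more informative than a single central estimate, the sketch below computes a percentile bootstrap 95% interval for a mean. It is not the Economist's model; the function name `bootstrap_interval` and the sample values are hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    # Cut off equal tails on both sides of the sorted resample means.
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical weekly counts, for illustration only (not real data).
sample = [120, 95, 143, 110, 87, 160, 132, 101, 99, 150]
low, high = bootstrap_interval(sample)
```

Reporting `(low, high)` alongside the sample mean conveys how much the estimate could move under resampling, which is the point the description makes about wide ranges.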

New variants

The Omicron variant, first detected in southern Africa in November 2021, appears to have characteristics that are different to earlier versions of sars-cov-2. Where this variant is now dominant, this change makes estimates uncertain beyond the ranges indicated. Other new variants may do the same. As more data is incorporated from places where new variants are dominant, predictions improve.

Non-reporting countries

Turkmenistan and the Democratic People's Republic of Korea have not reported any covid-19 figures since the start of the pandemic. They also have not published all-cause mortality data. Exports of estimates for the Democratic People's Republic of Korea have been temporarily disabled as it now issues contradictory data: reporting a significant outbreak through its state media, but zero confirmed covid-19 cases/deaths to the WHO.

Acknowledgements

A special thanks to all our sources and to those who have made the data to create these estimates available. We list all our sources in our methodology. Within script 1, the source for each variable is also given as the data is loaded, with the exception of our sources for excess deaths data, which we detail on our free-to-read excess deaths tracker as well as on GitHub. The gradient booster implementation used to fit the models is aGTBoost, detailed here.

Calculating excess deaths for the entire world over multiple years is both complex and imprecise. We welcome any suggestions on how to improve the model, be it data, algorithm, or logic. If you have one, please open an issue.

The Economist would also like to acknowledge the many people who have helped us refine the model so far, be it through discussions, facilitating data access, or offering coding assistance. A special thanks to Ariel Karlinsky, Philip Schellekens, Oliver Watson, Lukas Appelhans, Berent Å. S. Lunde, Gideon Wakefield, Johannes Hunger, Carol D'Souza, Yun Wei, Mehran Hosseini, Samantha Dolan, Mollie Van Gordon, Rahul Arora, Austin Teda Atmaja, Dirk Eddelbuettel and Tom Wenseleers.

All coding and data collection to construct these models (and make them update dynamically) was done by Sondre Ulvund Solstad. Should you have any questions about them after reading the methodology, please open an issue or contact him at sondresolstad@economist.com.

Suggested citation: The Economist and Solstad, S. (corresponding author), 2021. The pandemic’s true death toll. [online] The Economist. Available at: https://www.economist.com/graphic-detail/coronavirus-excess-deaths-estimates [Accessed ---]. First published in the article "Counting the dead", The Economist, issue 20, 2021.
The geographic latitude-associated anti-COVID capacity index : an...
dataone.org
dataverse.harvard.edu
Updated Nov 8, 2023
Cite
Errasfa, Mourad (2023). The geographic latitude-associated anti-COVID capacity index : an epidemiologic, demographic, and climate-based parameter negatively correlated with the COVID-19 death tolls [Dataset]. http://doi.org/10.7910/DVN/AXNZUA
Explore at:
Unique identifier
https://doi.org/10.7910/DVN/AXNZUA
Dataset updated
Nov 8, 2023
Dataset provided by
Harvard Dataverse
Authors
Errasfa, Mourad
Description
During the first two years of the Covid-19 pandemic, death tolls differed from one country to another. In previous research on 39 countries, we found that some population characteristics were either negatively (birth rate/mortality rate, fertility rate) or positively (cancer score, Alzheimer disease score, percent of people above 65 years old, levels of alcohol intake) correlated with Covid-19 mortality. We also found that low levels of climate factors (average annual temperature, average hours of sunshine, average annual level of UV index) were positively correlated with Covid-19 death numbers as well. In the present study, we have developed an anti-Covid capacity index that takes into account all the above-mentioned parameters. The polynomial analysis of the anti-Covid capacity and the corresponding geographic latitude of each country generated a bell-shaped curve, with a high coefficient of determination (R² = 0.78). Lower anti-Covid capacity values were recorded in countries of low and high latitudes, respectively. Conversely, plotting Covid-19 death numbers against geographic latitude generated an inverted bell-shaped curve, with higher death numbers at low and high latitudes, respectively. Simple linear regression showed that Covid-19 death numbers were significantly (p = 2.40 × 10⁻⁹) and negatively correlated with the anti-Covid capacity index values. Our data demonstrate that negative prepandemic human conditions and the low scores of both annual temperature and UV index in many countries were the key factors behind high Covid-19 mortality, and they can be expressed as a simple index of the anti-Covid capacity of a country that can predict the death-associated severity of Covid-19 disease, and thus according to a country’s geographic latitude.
Infected and death cases Covid-19 of Bangladesh
kaggle.com
zip
Updated Nov 15, 2023
Cite
Md. Akbar Hossain (2023). Infected and death cases Covid-19 of Bangladesh [Dataset]. https://www.kaggle.com/datasets/mdakbarhossain12/infected-and-death-cases-covid-19-of-bangladesh
Explore at:
Available download formats: zip (2840 bytes)
Dataset updated
Nov 15, 2023
Authors
Md. Akbar Hossain
License
https://creativecommons.org/publicdomain/zero/1.0/
Area covered
Bangladesh
Description
Dataset Description: Infected and Death Cases of Covid-19 in Bangladesh

This dataset contains detailed information on Covid-19 cases in Bangladesh, focusing on the number of new cases and deaths reported. The data spans from September 27, 2020, to November 19, 2021. The dataset is structured with three primary columns:

Date: The date when the data was recorded, formatted as YYYY-MM-DD.

New Cases: The number of new Covid-19 cases reported on the corresponding date.

Deaths: The number of deaths attributed to Covid-19 on the corresponding date.

Key Features:

Time Range: Covers over a year of data, capturing various waves of the pandemic.

Granularity: Daily records, providing detailed insights into the daily progression of the pandemic.

Size: The dataset is compact, with a file size of 7.91 KB, making it easy to handle and analyze.

Cite this paper

@InProceedings{10.1007/978-981-19-2445-3_38,
  author="Rahman, Ashifur and Hossain, Md. Akbar and Moon, Mohasina Jannat",
  editor="Hossain, Sazzad and Hossain, Md. Shahadat and Kaiser, M. Shamim and Majumder, Satya Prasad and Ray, Kanad",
  title="An LSTM-Based Forecast Of COVID-19 For Bangladesh",
  booktitle="Proceedings of International Conference on Fourth Industrial Revolution and Beyond 2021",
  year="2022",
  publisher="Springer Nature Singapore",
  address="Singapore",
  pages="551--561",
  abstract="Preoperative events can be predicted using deep learning-based forecasting techniques. It can help to improve future decision-making. Deep learning has traditionally been used to identify and evaluate adverse risks in a variety of major applications. Numerous prediction approaches are commonly applied to deal with forecasting challenges. The number of infected people, as well as the mortality rate of COVID-19, is increasing every day. Many countries, including India, Brazil, and the United States, were severely affected; however, since the very first case was identified, the transmission rate has decreased dramatically after a set time period. Bangladesh, on the other hand, was unable to keep the rate of infection low. In this situation, several methods have been developed to forecast the number of affected, time to recover, and the number of deaths. This research illustrates the ability of DL models to forecast the number of affected and dead people as a result of COVID-19, which is now regarded as a possible threat to humanity. As part of this study, we developed an LSTM based method to predict the next 100 days of death and newly identified COVID-19 cases in Bangladesh. To do this experiment we collect data on death and newly detected COVID-19 cases through Bangladesh's national COVID-19 help desk website. After collecting data we processed it to make a dataset for training our LSTM model. After completing the training, we predict our model with the test dataset. The result of our model is very robust on the basis of the training and testing dataset. Finally, we forecast the subsequent 100 days of deaths and newly infected COVID-19 cases in Bangladesh.",
  isbn="978-981-19-2445-3"
}
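For the three-column layout described for this dataset (Date, New Cases, Deaths), a minimal loading sketch might look like the following. The exact header spellings are assumed from the description, and the inline rows in `SAMPLE_CSV` are hypothetical placeholders, not values from the real file.

```python
import csv
import io
from datetime import date

# Hypothetical rows in the described layout; the real file's values differ.
SAMPLE_CSV = """Date,New Cases,Deaths
2020-09-27,100,3
2020-09-28,110,4
2020-09-29,120,5
"""


def load_daily_records(text):
    """Parse CSV text into (date, new_cases, deaths) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (
            date.fromisoformat(row["Date"]),  # YYYY-MM-DD, as described
            int(row["New Cases"]),
            int(row["Deaths"]),
        )
        for row in reader
    ]


records = load_daily_records(SAMPLE_CSV)
total_deaths = sum(deaths for _, _, deaths in records)
```

To use it with the downloaded file, read the file's text and pass it to `load_daily_records`, adjusting the column names if the actual headers differ.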
Data from: COVID-19 Forecasts: Deaths
catalog.midasnetwork.us
Updated Mar 9, 2023
Cite
Centers for Disease Control and Prevention (CDC) (2023). COVID-19 Forecasts: Deaths [Dataset]. https://catalog.midasnetwork.us/collection/147
Explore at:
Dataset updated
Mar 9, 2023
Dataset provided by
MIDAS COORDINATION CENTER
Authors
Centers for Disease Control and Prevention (CDC)
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Area covered
Country, State
Variables measured
Viruses, disease, COVID-19, modeling, pathogen, forecasting, Homo sapiens, host organism, mortality data, Population count, and 6 more
Dataset funded by
National Institute of General Medical Sciences
Description
The dataset contains observed and 4-week-ahead forecast new and total weekly COVID-19 deaths at the national and state levels through March 9, 2023. Forecasting teams predict numbers of deaths using different types of data (e.g., COVID-19 data, demographic data, mobility data), methods, and estimates of the impacts of interventions (e.g., social distancing, use of face coverings).
COVID-19 Tweets, Vaccination, and Deaths Data
kaggle.com
zip
Updated May 29, 2025
Cite
Arya Gavande (2025). COVID-19 Tweets, Vaccination, and Deaths Data [Dataset]. https://www.kaggle.com/datasets/aryagavande/covid-19-tweets-vaccination-and-deaths-data/code
Explore at:
Available download formats: zip (357725 bytes)
Dataset updated
May 29, 2025
Authors
Arya Gavande
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Description
This dataset merges three distinct data sources to explore the relationship between COVID-19 death rates, vaccination efforts, and public sentiment on Twitter from December 25, 2020 to March 29, 2022. It includes 2,000 cleaned rows with 16 variables, created by combining global health statistics and social media sentiment data.

Sources & Variables:

COVID-19 Deaths Data (scraped from Worldometer - COVID-19 Deaths via BeautifulSoup):

Date: Date of record

daily_increase_percent: % change in deaths from previous day

Season: Derived from date (Winter, Spring, Summer, Fall)

Tweet Sentiment Data : COVID Vaccine Tweets Dataset

Date: Tweet timestamp

text_sentiment: Sentiment label (positive, neutral, negative) from NLTK’s SentimentIntensityAnalyzer

user_verified: Whether the user is verified

user_since_days: Age of the Twitter account (in days)

country: Cleaned user location

Vaccination Data : Vaccination Dataset

Date: Date of record

total_vaccinations_per_hundred: Doses per 100 people

daily_vaccinations: Daily dose count

vaccine_group: Grouped vaccine type (e.g., mRNA, Viral Vector)

country: Country name

Preprocessing Summary:

Merged by Date and country

Cleaned invalid country names (e.g., “moon”, “nowhere”)

Standardized all datetime formats

Removed entries with missing or unreliable values

Created derived variables: Season, user_since_days, vaccine_group

This dataset was used in a final data science project to:

Classify public sentiment toward vaccines using health indicators

Predict daily COVID-19 death counts using sentiment and vaccination data
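The preprocessing summary above (merge by Date and country, derive Season from the date) can be sketched roughly as follows. This is not the dataset author's code: the season boundaries (Northern Hemisphere, by calendar month) are an assumption, the helper names `season_of` and `merge_on_date_country` are hypothetical, and the records are made up for illustration.

```python
from datetime import date

# Assumed mapping: Northern Hemisphere seasons by calendar month.
_SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}


def season_of(d):
    """Map a date to one of Winter, Spring, Summer, Fall."""
    return _SEASON_BY_MONTH[d.month]


def merge_on_date_country(tweets, vaccinations):
    """Inner-join two lists of dicts on the (Date, country) pair."""
    by_key = {(v["Date"], v["country"]): v for v in vaccinations}
    merged = []
    for t in tweets:
        match = by_key.get((t["Date"], t["country"]))
        if match is not None:  # rows without a partner are dropped
            merged.append({**match, **t, "Season": season_of(t["Date"])})
    return merged


# Hypothetical records for illustration only.
tweets = [
    {"Date": date(2021, 1, 5), "country": "India", "text_sentiment": "positive"},
    {"Date": date(2021, 1, 5), "country": "moon", "text_sentiment": "neutral"},
]
vaccinations = [
    {"Date": date(2021, 1, 5), "country": "India", "daily_vaccinations": 1000},
]
merged = merge_on_date_country(tweets, vaccinations)
```

An inner join has the side effect of discarding tweets whose cleaned location matches no country in the vaccination data, which is one plausible way invalid names such as “moon” drop out.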
Development and validation of a machine learning model for use as an...
zenodo.org
data.niaid.nih.gov
+1 more
bin
Updated Jun 15, 2020
Cite
Anna Stachel (2020). Development and validation of a machine learning model for use as an automated artificial intelligence tool to predict mortality risk in patients with COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.3893846
Explore at:
Available download formats: bin
Unique identifier
https://doi.org/10.5281/zenodo.3893846
Dataset updated
Jun 15, 2020
Dataset provided by
Zenodo: http://zenodo.org/
Authors
Anna Stachel
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Background

New York City quickly became an epicenter of the COVID-19 pandemic. Due to a sudden and massive increase in patients during the COVID-19 pandemic, healthcare providers incurred an exponential increase in workload, which created a strain on the staff and limited resources. As this is a new infection, predictors of morbidity and mortality are not well characterized.

Methods

We developed a prediction model to predict patients at risk for mortality using only laboratory, vital and demographic information readily available in the electronic health record on more than 3000 hospital admissions with COVID-19. A variable importance algorithm was used for interpretability and understanding of performance and predictors.

Findings

We built a model with 84-97% accuracy to identify predictors and patients with high risk of mortality, and developed an automated artificial intelligence (AI) notification tool that does not require manual calculation by the busy clinician. Oximetry, respirations, blood urea nitrogen, lymphocyte percent, calcium, troponin and neutrophil percentage were important features, and key ranges were identified that contributed to a 50% increase in patients’ mortality prediction score. With a negative predictive value (NPV) that increases from 0.90 after the second day of admission, we are able to identify likely survivors more confidently. This study serves as a use case of a model with visualizations to aid clinicians with a better understanding of the model and predictors of mortality. Additionally, an example of the operationalization of the model via an AI notification tool is illustrated.
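For readers unfamiliar with the negative predictive value mentioned above, the standard definition is NPV = TN / (TN + FN), the share of patients predicted to survive who actually survived. The sketch below computes it; the function name and the counts are hypothetical and are not taken from the study.

```python
def negative_predictive_value(true_negatives, false_negatives):
    """NPV = TN / (TN + FN): fraction of negative predictions that were right."""
    total = true_negatives + false_negatives
    if total == 0:
        raise ValueError("no negative predictions to evaluate")
    return true_negatives / total


# Hypothetical counts: 90 correct "likely survivor" calls, 10 incorrect ones.
npv = negative_predictive_value(90, 10)
```

An NPV near 0.90 therefore means that roughly nine in ten patients flagged as likely survivors did survive.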
Covid-19 Data - Excess Death Increase 2020
kaggle.com
zip
Updated Jan 6, 2021
Cite
SnowyOwl (2021). Covid-19 Data - Excess Death Increase 2020 [Dataset]. https://www.kaggle.com/kyleberdy/covid19-data-excess-death-increase-2020
Explore at:
Available download formats: zip (435002 bytes)
Dataset updated
Jan 6, 2021
Authors
SnowyOwl
License
https://www.usa.gov/government-works/
Description
Content

This data represents excess deaths in an area (more deaths than expected). Although causes are recorded, one can see that excess deaths increased dramatically in 2020.

Acknowledgements

Original Source: https://catalog.data.gov/dataset/excess-deaths-associated-with-covid-19-35b8c

Inspiration

Using the Excess Death counts, make predictions, per day, as to what the hospital case load, and death load, for covid cases will be. Can use any source (local, federal, international) as the benchmark.
