100+ datasets found
  1. Data_Sheet_1_Toward a Country-Based Prediction Model of COVID-19 Infections...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jun 10, 2021
    Cite
    Howard, Scott C.; Li, Zhijun; Wang, Lishi; Xie, Ning; Gu, Tianshu; Wang, Yongjun; Postlethwaite, Arnold; Gu, Weikuan; Meng, Xia; Aleya, Lotfi (2021). Data_Sheet_1_Toward a Country-Based Prediction Model of COVID-19 Infections and Deaths Between Disease Apex and End: Evidence From Countries With Contained Numbers of COVID-19.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000850298
    Explore at:
    Dataset updated
    Jun 10, 2021
    Authors
    Howard, Scott C.; Li, Zhijun; Wang, Lishi; Xie, Ning; Gu, Tianshu; Wang, Yongjun; Postlethwaite, Arnold; Gu, Weikuan; Meng, Xia; Aleya, Lotfi
    Description

    The complexity of COVID-19 and variations in control measures and containment efforts in different countries have caused difficulties in the prediction and modeling of the COVID-19 pandemic. We attempted to predict the scale of the latter half of the pandemic based on real data using the ratio between the early and latter halves from countries where the pandemic is largely over. We collected daily pandemic data from China, South Korea, and Switzerland and subtracted the ratio of pandemic days before and after the disease apex day of COVID-19. We obtained the ratio of pandemic data and created multiple regression models for the relationship between before and after the apex day. We then tested our models using data from the first wave of the disease from 14 countries in Europe and the US. We then tested the models using data from these countries from the entire pandemic up to March 30, 2021. Results indicate that the actual number of cases from these countries during the first wave mostly falls within the predicted ranges of linear regression, except for Spain and Russia. Similarly, the actual deaths in these countries mostly fall into the range of predicted data. Using the accumulated data up to the day of apex and total accumulated data up to March 30, 2021, the case numbers in these countries fall into the range of predicted data, except for data from Brazil. The actual number of deaths in all the countries is at or below the predicted data. In conclusion, a linear regression model built with real data from countries or regions from early pandemics can predict pandemic scales of the countries where the pandemics occur late. Such a prediction with a high degree of accuracy provides valuable information for governments and the public.
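
    The description above relates totals before the apex day to totals after it through linear regression. The following is only a minimal sketch of that general idea, not the authors' model: the function names `fit_line` and `predict_after_apex` are hypothetical, and the numbers are made up for illustration and are not pandemic data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_after_apex(before_apex_total, intercept, slope):
    """Predict the post-apex total from the pre-apex total."""
    return intercept + slope * before_apex_total


# Made-up illustrative values (NOT real case counts):
# cumulative totals before the apex day and after it, per reference country.
before = [100.0, 200.0, 300.0]
after = [150.0, 250.0, 350.0]

intercept, slope = fit_line(before, after)
prediction = predict_after_apex(400.0, intercept, slope)
```

    With a fit like this, a country still near its apex would supply only the pre-apex total, and the fitted line would give a rough estimate of what follows.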

  2. United States Excess Death excl COVID: Predicted: Single Excess Est:...

    • ceicdata.com
    Updated Sep 16, 2023
    Cite
    CEICdata.com (2023). United States Excess Death excl COVID: Predicted: Single Excess Est: Massachusetts [Dataset]. https://www.ceicdata.com/en/united-states/number-of-excess-deaths-by-states-all-causes-excluding-covid19-predicted/excess-death-excl-covid-predicted-single-excess-est-massachusetts
    Explore at:
    Dataset updated
    Sep 16, 2023
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 1, 2023 - Sep 16, 2023
    Area covered
    United States
    Variables measured
    Vital Statistics
    Description

    United States Excess Death excl COVID: Predicted: Single Excess Est: Massachusetts data was reported at 0.000 Number in 16 Sep 2023. This stayed constant from the previous number of 0.000 Number for 09 Sep 2023. United States Excess Death excl COVID: Predicted: Single Excess Est: Massachusetts data is updated weekly, averaging 0.000 Number from Jan 2017 (Median) to 16 Sep 2023, with 350 observations. The data reached an all-time high of 209.000 Number in 13 Jan 2018 and a record low of 0.000 Number in 16 Sep 2023. United States Excess Death excl COVID: Predicted: Single Excess Est: Massachusetts data remains active status in CEIC and is reported by Centers for Disease Control and Prevention. The data is categorized under Global Database’s United States – Table US.G012: Number of Excess Deaths: by States: All Causes excluding COVID-19: Predicted (Discontinued).

  3. COVID-19 Data

    • kaggle.com
    zip
    Updated Nov 21, 2025
    Cite
    Umut Toygar Göz (2025). COVID-19 Data [Dataset]. https://www.kaggle.com/datasets/umuttoygargoz/covid19-data
    Explore at:
    Available download formats: zip (8484157 bytes)
    Dataset updated
    Nov 21, 2025
    Authors
    Umut Toygar Göz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Umut Toygar Göz

    Released under Attribution 4.0 International (CC BY 4.0)

    Contents

  4. COVID-19 Dataset

    • kaggle.com
    zip
    Updated Nov 13, 2022
    Cite
    Meir Nizri (2022). COVID-19 Dataset [Dataset]. https://www.kaggle.com/datasets/meirnizri/covid19-dataset
    Explore at:
    Available download formats: zip (4890659 bytes)
    Dataset updated
    Nov 13, 2022
    Authors
    Meir Nizri
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness. During the entire course of the pandemic, one of the main problems that healthcare providers have faced is the shortage of medical resources and a proper plan to efficiently distribute them. In these tough times, being able to predict what kind of resource an individual might require at the time of being tested positive or even before that will be of immense help to the authorities as they would be able to procure and arrange for the resources necessary to save the life of that patient.

    The main goal of this project is to build a machine learning model that, given a Covid-19 patient's current symptoms, status, and medical history, will predict whether the patient is at high risk or not.

    Content

    The dataset was provided by the Mexican government (link). This dataset contains an enormous amount of anonymized patient-related information, including pre-conditions. The raw dataset consists of 21 unique features and 1,048,576 unique patients. In the Boolean features, 1 means "yes" and 2 means "no". Values of 97 and 99 indicate missing data.

    • sex: 1 for female and 2 for male.
    • age: of the patient.
    • classification: covid test findings. Values 1-3 mean that the patient was diagnosed with covid in different degrees. 4 or higher means that the patient is not a carrier of covid or that the test is inconclusive.
    • patient type: type of care the patient received in the unit. 1 for returned home and 2 for hospitalization.
    • pneumonia: whether the patient already has air sacs inflammation or not.
    • pregnancy: whether the patient is pregnant or not.
    • diabetes: whether the patient has diabetes or not.
    • copd: Indicates whether the patient has Chronic obstructive pulmonary disease or not.
    • asthma: whether the patient has asthma or not.
    • inmsupr: whether the patient is immunosuppressed or not.
    • hypertension: whether the patient has hypertension or not.
    • cardiovascular: whether the patient has heart or blood vessels related disease.
    • renal chronic: whether the patient has chronic renal disease or not.
    • other disease: whether the patient has other disease or not.
    • obesity: whether the patient is obese or not.
    • tobacco: whether the patient is a tobacco user.
    • usmr: Indicates whether the patient was treated in medical units of the first, second or third level.
    • medical unit: type of institution of the National Health System that provided the care.
    • intubed: whether the patient was connected to the ventilator.
    • icu: Indicates whether the patient had been admitted to an Intensive Care Unit.
    • date died: the date of death if the patient died, and 9999-99-99 otherwise.
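
    The coding rules listed above can be sketched as small helper functions. This is an illustrative sketch only: the function names are hypothetical, and treating any code other than 1, 2, 97 or 99 as unknown is an assumption, since the description does not say what else may appear.

```python
MISSING_CODES = {97, 99}          # stated in the description as missing data
NOT_DIED_SENTINEL = "9999-99-99"  # stated placeholder for patients who did not die


def decode_boolean(value):
    """Map the dataset coding to Python: 1 -> True, 2 -> False, missing -> None."""
    if value == 1:
        return True
    if value == 2:
        return False
    # 97 and 99 are documented as missing; any other code is also treated
    # as unknown here (an assumption, not stated in the description).
    return None


def has_died(date_died):
    """True when the 'date died' field holds an actual date."""
    return date_died != NOT_DIED_SENTINEL


def is_covid_positive(classification):
    """Values 1-3 mean diagnosed; 4 or higher means negative or inconclusive."""
    return 1 <= classification <= 3


# Hypothetical record using the field names from the list above.
record = {"diabetes": 2, "obesity": 1, "icu": 97,
          "classification": 3, "date died": "9999-99-99"}
decoded = {
    "diabetes": decode_boolean(record["diabetes"]),
    "obesity": decode_boolean(record["obesity"]),
    "icu": decode_boolean(record["icu"]),
    "covid_positive": is_covid_positive(record["classification"]),
    "died": has_died(record["date died"]),
}
```

    Keeping missing values as None rather than False avoids silently counting unknown answers as "no".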

  5. United States Excess Death excl COVID: Predicted: Single Estimate: Wyoming

    • ceicdata.com
    Updated Sep 16, 2023
    Cite
    CEICdata.com (2023). United States Excess Death excl COVID: Predicted: Single Estimate: Wyoming [Dataset]. https://www.ceicdata.com/en/united-states/number-of-excess-deaths-by-states-all-causes-excluding-covid19-predicted/excess-death-excl-covid-predicted-single-estimate-wyoming
    Explore at:
    Dataset updated
    Sep 16, 2023
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 1, 2023 - Sep 16, 2023
    Area covered
    United States
    Variables measured
    Vital Statistics
    Description

    United States Excess Death excl COVID: Predicted: Single Estimate: Wyoming data was reported at 0.000 Number in 16 Sep 2023. This stayed constant from the previous number of 0.000 Number for 09 Sep 2023. United States Excess Death excl COVID: Predicted: Single Estimate: Wyoming data is updated weekly, averaging 2.000 Number from Jan 2017 (Median) to 16 Sep 2023, with 350 observations. The data reached an all-time high of 51.000 Number in 04 Jan 2020 and a record low of 0.000 Number in 16 Sep 2023. United States Excess Death excl COVID: Predicted: Single Estimate: Wyoming data remains active status in CEIC and is reported by Centers for Disease Control and Prevention. The data is categorized under Global Database’s United States – Table US.G012: Number of Excess Deaths: by States: All Causes excluding COVID-19: Predicted (Discontinued).

  6. United States Excess Death excl COVID: Predicted: Single Estimate: Maine

    • ceicdata.com
    Updated Sep 16, 2023
    Cite
    CEICdata.com (2023). United States Excess Death excl COVID: Predicted: Single Estimate: Maine [Dataset]. https://www.ceicdata.com/en/united-states/number-of-excess-deaths-by-states-all-causes-excluding-covid19-predicted/excess-death-excl-covid-predicted-single-estimate-maine
    Explore at:
    Dataset updated
    Sep 16, 2023
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 1, 2023 - Sep 16, 2023
    Area covered
    United States
    Variables measured
    Vital Statistics
    Description

    United States Excess Death excl COVID: Predicted: Single Estimate: Maine data was reported at 0.000 Number in 16 Sep 2023. This stayed constant from the previous number of 0.000 Number for 09 Sep 2023. United States Excess Death excl COVID: Predicted: Single Estimate: Maine data is updated weekly, averaging 0.000 Number from Jan 2017 (Median) to 16 Sep 2023, with 350 observations. The data reached an all-time high of 54.000 Number in 06 Nov 2021 and a record low of 0.000 Number in 16 Sep 2023. United States Excess Death excl COVID: Predicted: Single Estimate: Maine data remains active status in CEIC and is reported by Centers for Disease Control and Prevention. The data is categorized under Global Database’s United States – Table US.G012: Number of Excess Deaths: by States: All Causes excluding COVID-19: Predicted (Discontinued).

  7. clinical lab parameters covid

    • kaggle.com
    zip
    Updated Nov 16, 2020
    Cite
    Paul Larmuseau (2020). clinical lab parameters covid [Dataset]. https://www.kaggle.com/plarmuseau/forecast-covid-death
    Explore at:
    Available download formats: zip (3047780 bytes)
    Dataset updated
    Nov 16, 2020
    Authors
    Paul Larmuseau
    Description

    Context

    Is a decision tree for COVID-19 possible with these datasets? Validation demonstrates limited clinical utility of the interpretable mortality prediction model for patients with COVID-19.

    https://github.com/HAIRLAB/Pre_Surv_COVID_19/blob/master/response/EDA.ipynb

    The sudden increase of COVID-19 cases is putting high pressure on health-care services worldwide. At the current stage, fast, accurate and early clinical assessment of the disease severity is vital. To support decision making and logistical planning in healthcare systems, this study leverages a database of blood samples from 485 infected patients in the region of Wuhan, China to identify crucial predictive biomarkers of disease mortality. For this purpose, machine learning tools selected three biomarkers that predict the mortality of individual patients with more than 90% accuracy: lactic dehydrogenase (LDH), lymphocyte and high-sensitivity C-reactive protein (hs-CRP). In particular, relatively high levels of LDH alone seem to play a crucial role in distinguishing the vast majority of cases that require immediate medical attention. This finding is consistent with current medical knowledge that high LDH levels are associated with tissue breakdown occurring in various diseases, including pulmonary disorders such as pneumonia. Overall, this paper suggests a simple and operable decision rule to quickly predict patients at the highest risk, allowing them to be prioritised and potentially reducing the mortality rate.

  8. COVID-19 State Data

    • kaggle.com
    zip
    Updated Nov 3, 2020
    Cite
    Night Ranger (2020). COVID-19 State Data [Dataset]. https://www.kaggle.com/nightranger77/covid19-state-data
    Explore at:
    Available download formats: zip (4501 bytes)
    Dataset updated
    Nov 3, 2020
    Authors
    Night Ranger
    Description

    This dataset is a per-state amalgamation of demographic, public health and other relevant predictors for COVID-19.

    Deaths, Infections and Tests by State

    The COVID Tracking Project: https://covidtracking.com/data/api

    The positive, death and totalTestResults fields from the API are used for, respectively, Infected, Deaths and Tested in this dataset. Please read the documentation of the API for more context on those columns.

    Predictor Data and Sources

    Population (2020)

    Density is people per meter squared https://worldpopulationreview.com/states/

    ICU Beds and Age 60+

    https://khn.org/news/as-coronavirus-spreads-widely-millions-of-older-americans-live-in-counties-with-no-icu-beds/

    GDP

    https://worldpopulationreview.com/states/gdp-by-state/

    Income per capita (2018)

    https://worldpopulationreview.com/states/per-capita-income-by-state/

    Gini

    https://en.wikipedia.org/wiki/List_of_U.S._states_by_Gini_coefficient

    Unemployment (2020)

    Rates from Feb 2020 and are percentage of labor force
    https://www.bls.gov/web/laus/laumstrk.htm

    Sex (2017)

    Ratio is Male / Female
    https://www.kff.org/other/state-indicator/distribution-by-gender/

    Smoking Percentage (2020)

    https://worldpopulationreview.com/states/smoking-rates-by-state/

    Influenza and Pneumonia Death Rate (2018)

    Death rate per 100,000 people
    https://www.cdc.gov/nchs/pressroom/sosmap/flu_pneumonia_mortality/flu_pneumonia.htm

    Chronic Lower Respiratory Disease Death Rate (2018)

    Death rate per 100,000 people
    https://www.cdc.gov/nchs/pressroom/sosmap/lung_disease_mortality/lung_disease.htm

    Active Physicians (2019)

    https://www.kff.org/other/state-indicator/total-active-physicians/

    Hospitals (2018)

    https://www.kff.org/other/state-indicator/total-hospitals

    Health spending per capita

    Includes spending for all health care services and products by state of residence. Hospital spending is included and reflects the total net revenue. Costs such as insurance, administration, research, and construction expenses are not included.
    https://www.kff.org/other/state-indicator/avg-annual-growth-per-capita/

    Pollution (2019)

    Pollution: Average exposure of the general public to particulate matter of 2.5 microns or less (PM2.5) measured in micrograms per cubic meter (3-year estimate)
    https://www.americashealthrankings.org/explore/annual/measure/air/state/ALL

    Medium and Large Airports

    For each state, number of medium and large airports https://en.wikipedia.org/wiki/List_of_the_busiest_airports_in_the_United_States

    Temperature (2019)

    Note that FL was incorrect in the table, but is corrected in the Hottest States paragraph
    https://worldpopulationreview.com/states/average-temperatures-by-state/
    District of Columbia temperature computed as the average of Maryland and Virginia

    Urbanization (2010)

    Urbanization as a percentage of the population https://www.icip.iastate.edu/tables/population/urban-pct-states

    Age Groups (2018)

    https://www.kff.org/other/state-indicator/distribution-by-age/

    School Closure Dates

    Schools that haven't closed are marked NaN https://www.edweek.org/ew/section/multimedia/map-coronavirus-and-school-closures.html

    Note that some of the datasets above did not contain data for the District of Columbia; this missing data was found via Google searches and entered manually.

  9. Covid-19 World-Wide Deaths Prediction

    • kaggle.com
    zip
    Updated Apr 10, 2020
    Cite
    suresh dv (2020). Covid-19 World-Wide Deaths Prediction [Dataset]. https://www.kaggle.com/sureshdv/covid19-worldwide-deaths-prediction
    Explore at:
    Available download formats: zip (98049 bytes)
    Dataset updated
    Apr 10, 2020
    Authors
    suresh dv
    Area covered
    World
    Description

    Dataset

    This dataset was created by suresh dv

    Contents

  10. Excess Deaths Associated with COVID-19

    • datalumos.org
    delimited
    Updated Apr 24, 2025
    Cite
    United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics (2025). Excess Deaths Associated with COVID-19 [Dataset]. http://doi.org/10.3886/E227667V1
    Explore at:
    Available download formats: delimited
    Dataset updated
    Apr 24, 2025
    Authors
    United States Department of Health and Human Services. Centers for Disease Control and Prevention. National Center for Health Statistics
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2017 - 2023
    Area covered
    United States
    Description

    Estimates of excess deaths can provide information about the burden of mortality potentially related to the COVID-19 pandemic, including deaths that are directly or indirectly attributed to COVID-19. Excess deaths are typically defined as the difference between the observed numbers of deaths in specific time periods and expected numbers of deaths in the same time periods. This visualization provides weekly estimates of excess deaths by the jurisdiction in which the death occurred. Weekly counts of deaths are compared with historical trends to determine whether the number of deaths is significantly higher than expected.

    Counts of deaths from all causes of death, including COVID-19, are presented. As some deaths due to COVID-19 may be assigned to other causes of deaths (for example, if COVID-19 was not diagnosed or not mentioned on the death certificate), tracking all-cause mortality can provide information about whether an excess number of deaths is observed, even when COVID-19 mortality may be undercounted. Additionally, deaths from all causes excluding COVID-19 were also estimated. Comparing these two sets of estimates (excess deaths with and without COVID-19) can provide insight about how many excess deaths are identified as due to COVID-19, and how many excess deaths are reported as due to other causes of death. These deaths could represent misclassified COVID-19 deaths, or potentially could be indirectly related to the COVID-19 pandemic (e.g., deaths from other causes occurring in the context of health care shortages or overburdened health care systems).

    Estimates of excess deaths can be calculated in a variety of ways, and will vary depending on the methodology and assumptions about how many deaths are expected to occur. Estimates of excess deaths presented in this webpage were calculated using Farrington surveillance algorithms (1). A range of values for the number of excess deaths was calculated as the difference between the observed count and one of two thresholds (either the average expected count or the upper bound of the 95% prediction interval), by week and jurisdiction.

    Provisional death counts are weighted to account for incomplete data. However, data for the most recent week(s) are still likely to be incomplete. Weights are based on completeness of provisional data in prior years, but the timeliness of data may have changed in 2020 relative to prior years, so the resulting weighted estimates may be too high in some jurisdictions and too low in others. As more information about the accuracy of the weighted estimates is obtained, further refinements to the weights may be made, which will impact the estimates. Any changes to the methods or weighting algorithm will be noted in the Technical Notes when they occur. More detail about the methods, weighting, data, and limitations can be found in the Technical Notes.

    This visualization includes several different estimates:

    • Number of excess deaths: A range of estimates for the number of excess deaths was calculated as the difference between the observed count and one of two thresholds (either the average expected count or the upper bound threshold), by week and jurisdiction. Negative values, where the observed count fell below the threshold, were set to zero.
    • Percent excess: The percent excess was defined as the number of excess deaths divided by the threshold.
    • Total number of excess deaths: The total number of excess deaths in each jurisdiction was calculated by summing the excess deaths in each week, from February 1, 2020 to present. Similarly, the total number of excess deaths for the US overall was computed as a sum of jurisdiction-specific numbers of excess deaths (with negative values set to zero), and not directly estimated using the Farrington surveillance algorithms.

    Select a dashboard from the menu, then click on “Update Dashboard” to navigate through the different graphics.

    The first dashboard shows the weekly predicted counts of deaths from all causes, and the threshold for the expected number of deaths. Select a jurisdiction from the drop-down menu to show data for that jurisdiction.

    The second dashboard shows the weekly predicted counts of deaths from all causes and the weekly count of deaths from all causes excluding COVID-19. Select a jurisdiction from the drop-down menu to show data for that jurisdiction.

    The th
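
    The three estimates defined in the description (weekly excess with negatives set to zero, percent excess, and the summed total) can be written out as a short sketch. It covers only the arithmetic stated in the text and does not implement the Farrington surveillance algorithm that produces the thresholds. The function names, the scaling of the ratio to a percentage, and the weekly numbers are assumptions made for illustration, not CDC figures.

```python
def excess_deaths(observed, threshold):
    """Observed count minus threshold, with negative values set to zero."""
    return max(observed - threshold, 0.0)


def percent_excess(observed, threshold):
    """Excess deaths divided by the threshold, scaled to a percentage
    (the scaling by 100 is an assumption about presentation)."""
    return 100.0 * excess_deaths(observed, threshold) / threshold


def total_excess(observed_by_week, threshold_by_week):
    """Sum of the weekly excess deaths for one jurisdiction."""
    return sum(
        excess_deaths(obs, thr)
        for obs, thr in zip(observed_by_week, threshold_by_week)
    )


# Made-up weekly counts for a single hypothetical jurisdiction.
observed = [120.0, 90.0, 150.0]
threshold = [100.0, 100.0, 100.0]

weekly = [excess_deaths(o, t) for o, t in zip(observed, threshold)]
total = total_excess(observed, threshold)
```

    Because each week is floored at zero before summing, a week below the threshold does not offset a week above it, which matches the description of how the totals were computed.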

  11. United States Excess Death excl COVID: Predicted: Total Estimate: Florida

    • ceicdata.com
    Updated Sep 16, 2023
    Cite
    CEICdata.com (2023). United States Excess Death excl COVID: Predicted: Total Estimate: Florida [Dataset]. https://www.ceicdata.com/en/united-states/number-of-excess-deaths-by-states-all-causes-excluding-covid19-predicted/excess-death-excl-covid-predicted-total-estimate-florida
    Explore at:
    Dataset updated
    Sep 16, 2023
    Dataset provided by
    CEICdata.com
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Jul 1, 2023 - Sep 16, 2023
    Area covered
    United States
    Variables measured
    Vital Statistics
    Description

    United States Excess Death excl COVID: Predicted: Total Estimate: Florida data was reported at 20,737.000 Number in 16 Sep 2023. This stayed constant from the previous number of 20,737.000 Number for 09 Sep 2023. United States Excess Death excl COVID: Predicted: Total Estimate: Florida data is updated weekly, averaging 20,737.000 Number from Jan 2017 (Median) to 16 Sep 2023, with 350 observations. The data reached an all-time high of 20,737.000 Number in 16 Sep 2023 and a record low of 20,737.000 Number in 16 Sep 2023. United States Excess Death excl COVID: Predicted: Total Estimate: Florida data remains active status in CEIC and is reported by Centers for Disease Control and Prevention. The data is categorized under Global Database’s United States – Table US.G012: Number of Excess Deaths: by States: All Causes excluding COVID-19: Predicted (Discontinued).

  12. Data_Sheet_1_The risk profile of patients with COVID-19 as predictors of...

    • datasetcatalog.nlm.nih.gov
    Updated Jul 26, 2022
    Cite
    Sturkenboom, Miriam; Bouhaddani, Said el; Royo, Albert Cid; Rahimi, Ezat; Ahmadizar, Fariba; Sigari, Naseh; Shahisavandi, Mina; Azizi, Mohammad (2022). Data_Sheet_1_The risk profile of patients with COVID-19 as predictors of lung lesions severity and mortality—Development and validation of a prediction model.PDF [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000358035
    Explore at:
    Dataset updated
    Jul 26, 2022
    Authors
    Sturkenboom, Miriam; Bouhaddani, Said el; Royo, Albert Cid; Rahimi, Ezat; Ahmadizar, Fariba; Sigari, Naseh; Shahisavandi, Mina; Azizi, Mohammad
    Description

    Objective: We developed and validated a prediction model based on individuals' risk profiles to predict the severity of lung involvement and death in patients hospitalized with coronavirus disease 2019 (COVID-19) infection.

    Methods: In this retrospective study, we studied hospitalized COVID-19 patients with data on chest CT scans performed during hospital stay (February 2020-April 2021) in a training dataset (TD) (n = 2,251) and an external validation dataset (eVD) (n = 993). We used the most relevant demographical, clinical, and laboratory variables (n = 25) as potential predictors of COVID-19-related outcomes. The primary and secondary endpoints were the severity of lung involvement quantified as mild (≤25%), moderate (26–50%), severe (>50%), and in-hospital death, respectively. We applied random forest (RF) classifier, a machine learning technique, and multivariable logistic regression analysis to study our objectives.

    Results: In the TD and the eVD, respectively, the mean [standard deviation (SD)] age was 57.9 (18.0) and 52.4 (17.6) years; patients with severe lung involvement [n (%): 185 (8.2) and 116 (11.7)] were significantly older [mean (SD) age: 64.2 (16.9), and 56.2 (18.9)] than the other two groups (mild and moderate). The mortality rate was higher in patients with severe (64.9 and 38.8%) compared to moderate (5.5 and 12.4%) and mild (2.3 and 7.1%) lung involvement. The RF analysis showed age, C reactive protein (CRP) levels, and duration of hospitalizations as the three most important predictors of lung involvement severity at the time of the first CT examination. Multivariable logistic regression analysis showed a significant strong association between the extent of the severity of lung involvement (continuous variable) and death; adjusted odds ratio (OR): 9.3; 95% CI: 7.1–12.1 in the TD and 2.6 (1.8–3.5) in the eVD.

    Conclusion: In hospitalized patients with COVID-19, the severity of lung involvement is a strong predictor of death. Age, CRP levels, and duration of hospitalizations are the most important predictors of severe lung involvement. A simple prediction model based on available clinical and imaging data provides a validated tool that predicts the severity of lung involvement and death probability among hospitalized patients with COVID-19.
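
    The severity cut-offs given in the abstract (mild ≤25%, moderate 26–50%, severe >50%) can be expressed as a small lookup. This is a sketch of the thresholds only, not the study's prediction model; the function name is hypothetical, and assigning values strictly between 25 and 26 to "moderate" is an assumption, since the abstract does not address them.

```python
def lung_involvement_category(percent_involved):
    """Classify the extent of lung involvement using the stated cut-offs."""
    if not 0 <= percent_involved <= 100:
        raise ValueError("percent_involved must be between 0 and 100")
    if percent_involved <= 25:
        return "mild"
    if percent_involved <= 50:
        # Values between 25 and 26 also land here (an assumption).
        return "moderate"
    return "severe"


categories = [lung_involvement_category(p) for p in (10, 25, 26, 50, 51)]
```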

  13. Table_1_Neurological Comorbidity Is a Predictor of Death in Covid-19...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Jul 7, 2020
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Abad-Molina, Cristina; de Lera, Mercedes; Pedraza, María; García-Azorín, David; Chavarría-Miranda, Alba; Talavera, Blanca; Vicente, Jose Manuel; Dueñas-Gutierrez, Carlos; Gómez-Herreras, Jose Ignacio; Ruiz-Martin, Guadalupe; Martínez-Velasco, Elena; Martínez-Pías, Enrique; Arenillas, Juan Francisco; Ezpeleta, David; de Paula, Jose María Prieto; Trigo, Javier; Bustamante-Munguira, Elena; Juarros, Santiago; Simón-Campo, Paula; Gómez-Vicente, Beatriz; Hernández-Pérez, Isabel; del Pozo-Vegas, Carlos; Cantón-Álvarez, Belén; Peñarrubia, María Jesús; López-Sanz, Cristina; Orduña-Domingo, Antonio; Valle-Peñacoba, Gonzalo; Gutiérrez-Sánchez, María; Jiménez-Cuenca, María Isabel; Sierra, Álvaro; Guerrero, Ángel (2020). Table_1_Neurological Comorbidity Is a Predictor of Death in Covid-19 Disease: A Cohort Study on 576 Patients.DOCX [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0000574084
    Explore at:
    Dataset updated
    Jul 7, 2020
    Authors
    Abad-Molina, Cristina; de Lera, Mercedes; Pedraza, María; García-Azorín, David; Chavarría-Miranda, Alba; Talavera, Blanca; Vicente, Jose Manuel; Dueñas-Gutierrez, Carlos; Gómez-Herreras, Jose Ignacio; Ruiz-Martin, Guadalupe; Martínez-Velasco, Elena; Martínez-Pías, Enrique; Arenillas, Juan Francisco; Ezpeleta, David; de Paula, Jose María Prieto; Trigo, Javier; Bustamante-Munguira, Elena; Juarros, Santiago; Simón-Campo, Paula; Gómez-Vicente, Beatriz; Hernández-Pérez, Isabel; del Pozo-Vegas, Carlos; Cantón-Álvarez, Belén; Peñarrubia, María Jesús; López-Sanz, Cristina; Orduña-Domingo, Antonio; Valle-Peñacoba, Gonzalo; Gutiérrez-Sánchez, María; Jiménez-Cuenca, María Isabel; Sierra, Álvaro; Guerrero, Ángel
    Description

    Introduction: Prognosis of Coronavirus disease 2019 (Covid-19) patients with vascular risk factors and certain comorbidities is worse. The impact of chronic neurological disorders (CND) on prognosis is unclear. We evaluated whether the presence of CND in Covid-19 patients is a predictor of higher in-hospital mortality. As secondary endpoints, we analyzed the association between CND, Covid-19 severity, and laboratory abnormalities during admission.

    Methods: Retrospective cohort study that included all the consecutive hospitalized patients with confirmed Covid-19 disease from March 8th to April 11th, 2020. The study setting was Hospital Clínico, a tertiary academic hospital in Valladolid. CND was defined as those neurological conditions causing permanent disability. We assessed demography, clinical variables, Covid-19 severity, laboratory parameters and outcome. The primary endpoint was in-hospital all-cause mortality, evaluated by multivariate cox-regression log rank test. We analyzed the association between CND, Covid-19 severity and laboratory abnormalities.

    Results: We included 576 patients, 43.3% female, with a mean age of 67.2 years. CND were present in 105 (18.3%) patients. Patients with CND were older, more disabled, had more vascular risk factors and comorbidities and fewer clinical symptoms of Covid-19. They presented 1.43 days earlier to the emergency department. Need of ventilation support was similar. Presence of CND was an independent predictor of death (HR 2.129, 95% CI: 1.382–3.280) but not of more severe Covid-19 disease (OR: 1.75, 95% CI: 0.970–3.158). Frequency of laboratory abnormalities was similar, except for procalcitonin and INR.

    Conclusions: The presence of CND is an independent predictor of mortality in hospitalized Covid-19 patients. This was explained neither by a worse immune response to Covid-19 nor by differences in the level of care received by patients with CND.

  14. Covid19 Global Excess Deaths (daily updates)

    • kaggle.com
    zip
    Updated Dec 2, 2025
    Cite
    Joakim Arvidsson (2025). Covid19 Global Excess Deaths (daily updates) [Dataset]. https://www.kaggle.com/datasets/joebeachcapital/covid19-global-excess-deaths-daily-updates
    Explore at:
    zip (2989004967 bytes)
    Dataset updated
    Dec 2, 2025
    Authors
    Joakim Arvidsson
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Daily updates of Covid-19 Global Excess Deaths from the Economist's GitHub repository: https://github.com/TheEconomist/covid-19-the-economist-global-excess-deaths-model

    Interpreting estimates

    Estimating excess deaths for every country every day since the pandemic began is a complex and difficult task. Rather than being overly confident in a single number, limited data means that we can often only give a very very wide range of plausible values. Focusing on central estimates in such cases would be misleading: unless ranges are very narrow, the 95% range should be reported when possible. The ranges assume that the conditions for bootstrap confidence intervals are met. Please see our tracker page and methodology for more information.
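    To make the idea of reporting a 95% range rather than a single central estimate concrete, here is a minimal sketch of a percentile bootstrap interval for a sample mean. It is an illustration only and is not the Economist's model or code; the function name `bootstrap_interval`, its parameters, and the use of the mean as the statistic are assumptions made for this example.

```python
import random
import statistics


def bootstrap_interval(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`.

    Hypothetical helper for illustration; not taken from the source model.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper
```

    Calling `bootstrap_interval(sample)` returns a `(lower, upper)` pair; a wide pair signals that the central estimate alone would be misleading.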

    New variants

    The Omicron variant, first detected in southern Africa in November 2021, appears to have characteristics that are different to earlier versions of sars-cov-2. Where this variant is now dominant, this change makes estimates uncertain beyond the ranges indicated. Other new variants may do the same. As more data is incorporated from places where new variants are dominant, predictions improve.

    Non-reporting countries

    Turkmenistan and the Democratic People's Republic of Korea have not reported any covid-19 figures since the start of the pandemic. They also have not published all-cause mortality data. Exports of estimates for the Democratic People's Republic of Korea have been temporarily disabled as it now issues contradictory data: reporting a significant outbreak through its state media, but zero confirmed covid-19 cases/deaths to the WHO.

    Acknowledgements

    A special thanks to all our sources and to those who have made the data to create these estimates available. We list all our sources in our methodology. Within script 1, the source for each variable is also given as the data is loaded, with the exception of our sources for excess deaths data, which we detail on our free-to-read excess deaths tracker as well as on GitHub. The gradient booster implementation used to fit the models is aGTBoost, detailed here.

    Calculating excess deaths for the entire world over multiple years is both complex and imprecise. We welcome any suggestions on how to improve the model, be it data, algorithm, or logic. If you have one, please open an issue.

    The Economist would also like to acknowledge the many people who have helped us refine the model so far, be it through discussions, facilitating data access, or offering coding assistance. A special thanks to Ariel Karlinsky, Philip Schellekens, Oliver Watson, Lukas Appelhans, Berent Å. S. Lunde, Gideon Wakefield, Johannes Hunger, Carol D'Souza, Yun Wei, Mehran Hosseini, Samantha Dolan, Mollie Van Gordon, Rahul Arora, Austin Teda Atmaja, Dirk Eddelbuettel and Tom Wenseleers.

    All coding and data collection to construct these models (and make them update dynamically) was done by Sondre Ulvund Solstad. Should you have any questions about them after reading the methodology, please open an issue or contact him at sondresolstad@economist.com.

    Suggested citation The Economist and Solstad, S. (corresponding author), 2021. The pandemic’s true death toll. [online] The Economist. Available at: https://www.economist.com/graphic-detail/coronavirus-excess-deaths-estimates [Accessed ---]. First published in the article "Counting the dead", The Economist, issue 20, 2021.

  15. d

    The geographic latitude-associated anti-COVID capacity index : an...

    • dataone.org
    • dataverse.harvard.edu
    Updated Nov 8, 2023
    Cite
    Errasfa, Mourad (2023). The geographic latitude-associated anti-COVID capacity index : an epidemiologic, demographic, and climate-based parameter negatively correlated with the COVID-19 death tolls [Dataset]. http://doi.org/10.7910/DVN/AXNZUA
    Explore at:
    Dataset updated
    Nov 8, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Errasfa, Mourad
    Description

    During the first two years of the Covid-19 pandemic, death tolls differed from one country to another. In previous research on 39 countries, we found that some population characteristics were either negatively (birth rate/mortality rate, fertility rate) or positively (cancer score, Alzheimer disease score, percent of people above 65 years old, levels of alcohol intake) correlated with Covid-19 mortality. We also found that low levels of climate factors (average annual temperature, average hours of sunshine, average annual level of UV index) were positively correlated with Covid-19 death numbers as well. In the present study, we have developed an anti-Covid Capacity index that takes into account all the above-mentioned parameters. The polynomial analysis of the anti-Covid Capacity and the corresponding geographic latitude of each country generated a bell-shaped curve, with a high coefficient of determination (R² = 0.78). Lower anti-Covid capacity values were recorded in countries of low and high latitudes. In contrast, plotting Covid-19 death numbers against geographic latitude generated an inverted bell-shaped curve, with higher death numbers at low and high latitudes. Simple linear regression showed that Covid-19 death numbers were significantly (p = 2.40 × 10⁻⁹) and negatively correlated with the anti-Covid Capacity index values. Our data demonstrate that negative prepandemic human conditions and low scores of both annual temperature and UV index in many countries were the key factors behind high Covid-19 mortality, and that they can be expressed as a simple index of a country's anti-Covid capacity that can predict the death-associated severity of Covid-19 disease according to the country's geographic latitude.

  16. Infected and death cases Covid-19 of Bangladesh

    • kaggle.com
    zip
    Updated Nov 15, 2023
    Cite
    Md. Akbar Hossain (2023). Infected and death cases Covid-19 of Bangladesh [Dataset]. https://www.kaggle.com/datasets/mdakbarhossain12/infected-and-death-cases-covid-19-of-bangladesh
    Explore at:
    zip (2840 bytes)
    Dataset updated
    Nov 15, 2023
    Authors
    Md. Akbar Hossain
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Bangladesh
    Description

    Dataset Description: Infected and Death Cases of Covid-19 in Bangladesh

    This dataset contains detailed information on Covid-19 cases in Bangladesh, focusing on the number of new cases and deaths reported. The data spans from September 27, 2020, to November 19, 2021. The dataset is structured with three primary columns:

    • Date: The date when the data was recorded, formatted as YYYY-MM-DD.
    • New Cases: The number of new Covid-19 cases reported on the corresponding date.
    • Deaths: The number of deaths attributed to Covid-19 on the corresponding date.

    Key Features:

    • Time Range: Covers over a year of data, capturing various waves of the pandemic.
    • Granularity: Daily records, providing detailed insights into the daily progression of the pandemic.
    • Size: The dataset is compact, with a file size of 7.91 KB, making it easy to handle and analyze.

    Cite this paper

    @InProceedings{10.1007/978-981-19-2445-3_38,
      author="Rahman, Ashifur and Hossain, Md. Akbar and Moon, Mohasina Jannat",
      editor="Hossain, Sazzad and Hossain, Md. Shahadat and Kaiser, M. Shamim and Majumder, Satya Prasad and Ray, Kanad",
      title="An LSTM-Based Forecast Of COVID-19 For Bangladesh",
      booktitle="Proceedings of International Conference on Fourth Industrial Revolution and Beyond 2021",
      year="2022",
      publisher="Springer Nature Singapore",
      address="Singapore",
      pages="551--561",
      abstract="Preoperative events can be predicted using deep learning-based forecasting techniques. It can help to improve future decision-making. Deep learning has traditionally been used to identify and evaluate adverse risks in a variety of major applications. Numerous prediction approaches are commonly applied to deal with forecasting challenges. The number of infected people, as well as the mortality rate of COVID-19, is increasing every day. Many countries, including India, Brazil, and the United States, were severely affected; however, since the very first case was identified, the transmission rate has decreased dramatically after a set time period. Bangladesh, on the other hand, was unable to keep the rate of infection low. In this situation, several methods have been developed to forecast the number of affected, time to recover, and the number of deaths. This research illustrates the ability of DL models to forecast the number of affected and dead people as a result of COVID-19, which is now regarded as a possible threat to humanity. As part of this study, we developed an LSTM based method to predict the next 100 days of death and newly identified COVID-19 cases in Bangladesh. To do this experiment we collect data on death and newly detected COVID-19 cases through Bangladesh's national COVID-19 help desk website. After collecting data we processed it to make a dataset for training our LSTM model. After completing the training, we predict our model with the test dataset. The result of our model is very robust on the basis of the training and testing dataset. Finally, we forecast the subsequent 100 days of deaths and newly infected COVID-19 cases in Bangladesh.",
      isbn="978-981-19-2445-3"
    }
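
    The three-column layout described for the Bangladesh dataset (Date, New Cases, Deaths) could be read with a sketch like the following. The exact header spellings in the downloaded file are an assumption, and the rows in `SAMPLE` are hypothetical placeholders, not values from the dataset; `load_rows` and `totals` are illustrative names.

```python
import csv
import io
from datetime import date

# Hypothetical placeholder rows; NOT real figures from the dataset.
SAMPLE = """Date,New Cases,Deaths
2020-09-27,100,5
2020-09-28,120,7
"""


def load_rows(text):
    """Parse CSV text with Date (YYYY-MM-DD), New Cases and Deaths columns."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": date.fromisoformat(raw["Date"]),
            "new_cases": int(raw["New Cases"]),
            "deaths": int(raw["Deaths"]),
        })
    return rows


def totals(rows):
    """Return (total new cases, total deaths) over all parsed rows."""
    return (
        sum(r["new_cases"] for r in rows),
        sum(r["deaths"] for r in rows),
    )
```

    Parsing dates and counts up front means that later steps, such as building a training window for a forecasting model, can work with typed values instead of strings.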

  17. M

    Data from: COVID-19 Forecasts: Deaths

    • catalog.midasnetwork.us
    Updated Mar 9, 2023
    Cite
    Centers for Disease Control and Prevention (CDC) (2023). COVID-19 Forecasts: Deaths [Dataset]. https://catalog.midasnetwork.us/collection/147
    Explore at:
    Dataset updated
    Mar 9, 2023
    Dataset provided by
    MIDAS COORDINATION CENTER
    Authors
    Centers for Disease Control and Prevention (CDC)
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Area covered
    Country, State
    Variables measured
    Viruses, disease, COVID-19, modeling, pathogen, forecasting, Homo sapiens, host organism, mortality data, Population count, and 6 more
    Dataset funded by
    National Institute of General Medical Sciences
    Description

    The dataset contains observed and 4-week-ahead forecast values of new and total weekly COVID-19 deaths at the national and state levels through March 9, 2023. Forecasting teams predict numbers of deaths using different types of data (e.g., COVID-19 data, demographic data, mobility data), methods, and estimates of the impacts of interventions (e.g., social distancing, use of face coverings).

  18. COVID-19 Tweets, Vaccination, and Deaths Data

    • kaggle.com
    zip
    Updated May 29, 2025
    Cite
    Arya Gavande (2025). COVID-19 Tweets, Vaccination, and Deaths Data [Dataset]. https://www.kaggle.com/datasets/aryagavande/covid-19-tweets-vaccination-and-deaths-data/code
    Explore at:
    zip (357725 bytes)
    Dataset updated
    May 29, 2025
    Authors
    Arya Gavande
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    This dataset merges three distinct data sources to explore the relationship between COVID-19 death rates, vaccination efforts, and public sentiment on Twitter from December 25, 2020 to March 29, 2022. It includes 2,000 cleaned rows with 16 variables, created by combining global health statistics and social media sentiment data.

    Sources & Variables:

    1. COVID-19 Deaths Data (scraped from Worldometer - COVID-19 Deaths via BeautifulSoup):

      • Date: Date of record
      • daily_increase_percent: % change in deaths from previous day
      • Season: Derived from date (Winter, Spring, Summer, Fall)
    2. Tweet Sentiment Data : COVID Vaccine Tweets Dataset

      • Date: Tweet timestamp
      • text_sentiment: Sentiment label (positive, neutral, negative) from NLTK’s SentimentIntensityAnalyzer
      • user_verified: Whether the user is verified
      • user_since_days: Age of the Twitter account (in days)
      • country: Cleaned user location
    3. Vaccination Data : Vaccination Dataset

      • Date: Date of record
      • total_vaccinations_per_hundred: Doses per 100 people
      • daily_vaccinations: Daily dose count
      • vaccine_group: Grouped vaccine type (e.g., mRNA, Viral Vector)
      • country: Country name

    Preprocessing Summary:

    • Merged by Date and country
    • Cleaned invalid country names (e.g., “moon”, “nowhere”)
    • Standardized all datetime formats
    • Removed entries with missing or unreliable values
    • Created derived variables: Season, user_since_days, vaccine_group
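
    The preprocessing steps listed above (deriving Season from the date and merging sources by Date and country) can be sketched as follows. This is not the dataset author's code: the month-to-season mapping assumes Northern Hemisphere meteorological seasons, and `season_of` and `merge_on_date_country` are hypothetical helper names.

```python
from datetime import date


def season_of(d: date) -> str:
    """Map a date to a season (assumes Northern Hemisphere, meteorological)."""
    if d.month in (12, 1, 2):
        return "Winter"
    if d.month in (3, 4, 5):
        return "Spring"
    if d.month in (6, 7, 8):
        return "Summer"
    return "Fall"


def merge_on_date_country(left, right):
    """Inner-join two lists of dict rows on the (Date, country) key."""
    index = {}
    for row in right:
        index.setdefault((row["Date"], row["country"]), []).append(row)
    merged = []
    for row in left:
        # Rows without a partner in `right` are dropped (inner join).
        for match in index.get((row["Date"], row["country"]), []):
            merged.append({**row, **match})
    return merged
```

    An inner join keeps only the date and country pairs present in both sources, which is one plausible reading of "Merged by Date and country"; an outer join would retain unmatched rows instead.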

    This dataset was used in a final data science project to:

    • Classify public sentiment toward vaccines using health indicators
    • Predict daily COVID-19 death counts using sentiment and vaccination data

  19. Development and validation of a machine learning model for use as an...

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    bin
    Updated Jun 15, 2020
    Cite
    Anna Stachel; Anna Stachel (2020). Development and validation of a machine learning model for use as an automated artificial intelligence tool to predict mortality risk in patients with COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.3893846
    Explore at:
    bin
    Dataset updated
    Jun 15, 2020
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Anna Stachel; Anna Stachel
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    New York City quickly became an epicenter of the COVID-19 pandemic. Due to a sudden and massive increase in patients during COVID-19 pandemic, healthcare providers incurred an exponential increase in workload which created a strain on the staff and limited resources. As this is a new infection, predictors of morbidity and mortality are not well characterized.

    Methods

    We developed a prediction model to predict patients at risk for mortality using only laboratory, vital and demographic information readily available in the electronic health record on more than 3000 hospital admissions with COVID-19. A variable importance algorithm was used for interpretability and understanding of performance and predictors.

    Findings

    We built a model with 84-97% accuracy to identify predictors and patients with high risk of mortality, and developed an automated artificial intelligence (AI) notification tool that does not require manual calculation by the busy clinician. Oximetry, respirations, blood urea nitrogen, lymphocyte percent, calcium, troponin and neutrophil percentage were important features, and key ranges were identified that contributed to a 50% increase in patients’ mortality prediction score. With an increasing negative predictive value (NPV) starting at 0.90 after the second day of admission, we are able to identify likely survivors more confidently. This study serves as a use case of a model with visualizations to aid clinicians with a better understanding of the model and predictors of mortality. Additionally, an example of the operationalization of the model via an AI notification tool is illustrated.

  20. Covid-19 Data - Excess Death Increase 2020

    • kaggle.com
    zip
    Updated Jan 6, 2021
    Cite
    SnowyOwl (2021). Covid-19 Data - Excess Death Increase 2020 [Dataset]. https://www.kaggle.com/kyleberdy/covid19-data-excess-death-increase-2020
    Explore at:
    zip (435002 bytes)
    Dataset updated
    Jan 6, 2021
    Authors
    SnowyOwl
    License

    https://www.usa.gov/government-works/

    Description

    Content

    This data represents excess deaths in an area (more deaths than expected). Although causes are recorded, one can see that excess deaths increased dramatically in 2020.
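
    As a rough illustration of the definition given above (more deaths than expected), excess deaths can be computed as observed minus expected deaths. The function names and the percentage form below are assumptions for this sketch; the column layout of the actual file is not described here.

```python
def excess_deaths(observed: float, expected: float) -> float:
    """Excess deaths: observed deaths minus the expected count."""
    return observed - expected


def percent_excess(observed: float, expected: float) -> float:
    """Excess deaths as a percentage of the expected count."""
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * (observed - expected) / expected
```

    A negative result simply means that fewer deaths were observed than expected for that area and period.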

    Acknowledgements

    Original Source: https://catalog.data.gov/dataset/excess-deaths-associated-with-covid-19-35b8c

    Inspiration

    Using the Excess Death counts, make predictions, per day, as to what the hospital case load, and death load, for covid cases will be. Can use any source (local, federal, international) as the benchmark.
