Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths
column.February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
This data should be credited to Johns Hopkins University COVID-19 tracking project
The World Health Organization (WHO) characterized the COVID-19, caused by the SARS-CoV-2, as a pandemic on March 11, while the exponential increase in the number of cases was risking to overwhelm health systems around the world with a demand for ICU beds far above the existing capacity, with regions of Italy being prominent examples.
Brazil recorded the first case of SARS-CoV-2 on February 26, and the virus transmission evolved from imported cases only, to local and finally community transmission very rapidly, with the federal government declaring nationwide community transmission on March 20.
Until March 27, the state of São Paulo had recorded 1,223 confirmed cases of COVID-19, with 68 related deaths, while the county of São Paulo, with a population of approximately 12 million people and where Hospital Israelita Albert Einstein is located, had 477 confirmed cases and 30 associated death, as of March 23. Both the state and the county of São Paulo decided to establish quarantine and social distancing measures, that will be enforced at least until early April, in an effort to slow the virus spread.
One of the motivations for this challenge is the fact that in the context of an overwhelmed health system with the possible limitation to perform tests for the detection of SARS-CoV-2, testing every case would be impractical and tests results could be delayed even if only a target subpopulation would be tested.
This dataset contains anonymized data from patients seen at the Hospital Israelita Albert Einstein, at São Paulo, Brazil, and who had samples collected to perform the SARS-CoV-2 RT-PCR and additional laboratory tests during a visit to the hospital.
All data were anonymized following the best international practices and recommendations. All clinical data were standardized to have a mean of zero and a unit standard deviation.
TASK 1 • Predict confirmed COVID-19 cases among suspected cases. Based on the results of laboratory tests commonly collected for a suspected COVID-19 case during a visit to the emergency room, would it be possible to predict the test result for SARS-Cov-2 (positive/negative)?
TASK 2 • Predict admission to general ward, semi-intensive unit or intensive care unit among confirmed COVID-19 cases. Based on the results of laboratory tests commonly collected among confirmed COVID-19 cases during a visit to the emergency room, would it be possible to predict which patients will need to be admitted to a general ward, semi-intensive unit or intensive care unit?
Submit a notebook that implements the full lifecycle of data preparation, model creation and evaluation. Feel free to use this dataset plus any other data you have available. Since this is not a formal competition, you're not submitting a single submission file, but rather your whole approach to building a model.
This is not a formal competition, so we won't measure the results strictly against a given validation set using a strict metric. Rather, what we'd like to see is a well-defined process to build a model that can deliver decent results (evaluated by yourself).
Our team will be looking at: 1. Model Performance - How well does the model perform on the real data? Can it be generalized over time? Can it be applied to other scenarios? Was it overfit? 2. Data Preparation - How well was the data analysed prior to feeding it into the model? Are there any useful visualisations? Does the reader learn any new techniques through this submission? A great entry will be informative, thought provoking, and fresh all at the same time. 3. Documentation - Are your code, and notebook, and additional data sources well documented so a reader can understand what you did? Are your sources clearly cited? A high quality analysis should be concise and clear at each step so the rationale is easy to follow and the process is reproducible.
Additional questions and clarifications can be obtained at data4u@einstein.br
Decision making by health care professionals is a complex process, when physicians see a patient for the first time with an acute complaint (e.g., recent onset of fever and respiratory symptoms) they will take a medical history, perform a physical examination, and will base their decisions on this information. To order or not laboratory tests, and which ones to order, is among these decisions, and there is no standard set of tests that are ordered to every individual or to a specific condition. This will depend on the complaints, the findings on the physical examination, personal medical history (e.g., current and prior diagnosed diseases, medications under use, prior surgeries, vaccination), lifestyle habits (e.g., smoking, alcohol use, exercising), family medical history, and prior exposures (e.g., traveling, occupation). The dataset reflects the complexity of decision making during routine clinical care, as opposed to what happens on a more controlled research setting, and data sparsity is, therefore, expected.
We understand that clinical and exposure data, in addition to the laboratory results, are invaluable information to be added to the models, but at this moment they are not available.
A main objective of this challenge is to develop a generalizable model that could be useful during routine clinical care, and although which laboratory exams are ordered can vary for different individuals, even with the same condition, we aimed at including laboratory tests more commonly order during a visit to the emergency room. So, if you found some additional laboratory test that was not included, it is because it was not considered as commonly order in this situation.
Hospital Israelita Albert Einstein would like to thank you for all the effort and time dedicated to this challenge, the community interest and the number of contributions have surpassed our expectations, and we are extremely satisfied with the results.
These have been challenging times, and we believe that promoting information sharing and collaboration will be crucial to gain insights, as fast as possible, that could help to implement measures to diminish the burden of COVID-19.
The multitude of solutions presented focusing on different aspects of the problem could represent a valuable resource in the evaluation of different strategies to implement predictive models for COVID-19. Besides the data visualization methods employed could make it easier for multidisciplinary teams to collaborate around COVID-19 real-world data.
Although this was not a competition, we would like to highlight some solutions, based on the community and our review of results.
Lucas Moda (https://www.kaggle.com/lukmoda/covid-19-optimizing-recall-with-smote) utilized interesting data visualization methods for the interpretability of models. Fellipe Gomes (https://www.kaggle.com/gomes555/task2-covid-19-admission-ac-94-sens-0-92-auc-0-96) used concise descriptions of the data and model results. We saw interesting ideas for visualizing and understanding the data, like the dendrogram used by CaesarLupum (https://www.kaggle.com/caesarlupum/brazil-against-the-advance-of-covid-19). Ossamu (https://www.kaggle.com/ossamum/eda-and-feat-import-recall-0-95-roc-auc-0-61) also sought to evaluate several data resampling techniques, to verify how it can improve the performance of predictive models, which was also done by Kaike Reis (https://www.kaggle.com/kaikewreis/a-second-end-to-end-solution-for-covid-19) . Jairo Freitas & Christian Espinoza (https://www.kaggle.com/jairofreitas/covid-19-influence-of-exams-in-recall-precision) sought to understand the distribution of exams regarding the outcomes of task 2, to support the decisions to be made in the construction of predictive models.
We thank you all for the feedback on available data, helping to show its potential, and taking the challenge of dealing with real data feed. Your efforts let the feeling that it is possible to build good predictive models in real life healthcare settings.
Open Government Licence 3.0http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
This publication was archived on 12 October 2023. Please see the Viral Respiratory Diseases (Including Influenza and COVID-19) in Scotland publication for the latest data. This dataset provides information on number of new daily confirmed cases, negative cases, deaths, testing by NHS Labs (Pillar 1) and UK Government (Pillar 2), new hospital admissions, new ICU admissions, hospital and ICU bed occupancy from novel coronavirus (COVID-19) in Scotland, including cumulative totals and population rates at Scotland, NHS Board and Council Area levels (where possible). Seven day positive cases and population rates are also presented by Neighbourhood Area (Intermediate Zone 2011). Information on how PHS publish small are COVID figures is available on the PHS website. Information on demographic characteristics (age, sex, deprivation) of confirmed novel coronavirus (COVID-19) cases, as well as trend data regarding the wider impact of the virus on the healthcare system is provided in this publication. Data includes information on primary care out of hours consultations, respiratory calls made to NHS24, contact with COVID-19 Hubs and Assessment Centres, incidents received by Scottish Ambulance Services (SAS), as well as COVID-19 related hospital admissions and admissions to ICU (Intensive Care Unit). Further data on the wider impact of the COVID-19 response, focusing on hospital admissions, unscheduled care and volume of calls to NHS24, is available on the COVID-19 Wider Impact Dashboard. Novel coronavirus (COVID-19) is a new strain of coronavirus first identified in Wuhan, China. Clinical presentation may range from mild-to-moderate illness to pneumonia or severe acute respiratory infection. COVID-19 was declared a pandemic by the World Health Organisation on 12 March 2020. We now have spread of COVID-19 within communities in the UK. Public Health Scotland no longer reports the number of COVID-19 deaths within 28 days of a first positive test from 2nd June 2022. Please refer to NRS death certificate data as the single source for COVID-19 deaths data in Scotland. In the process of updating the hospital admissions reporting to include reinfections, we have had to review existing methodology. In order to provide the best possible linkage of COVID-19 cases to hospital admissions, each admission record is required to have a discharge date, to allow us to better match the most appropriate COVID positive episode details to an admission. This means that in cases where the discharge date is missing (either due to the patient still being treated, delays in discharge information being submitted or data quality issues), it has to be estimated. Estimating a discharge date for historic records means that the average stay for those with missing dates is reduced, and fewer stays overlap with records of positive tests. The result of these changes has meant that approximately 1,200 historic COVID admissions have been removed due to improvements in methodology to handle missing discharge dates, while approximately 820 have been added to the cumulative total with the inclusion of reinfections. COVID-19 hospital admissions are now identified as the following: A patient's first positive PCR or LFD test of the episode of infection (including reinfections at 90 days or more) for COVID-19 up to 14 days prior to admission to hospital, on the day of their admission or during their stay in hospital. If a patient's first positive PCR or LFD test of the episode of infection is after their date of discharge from hospital, they are not included in the analysis. Information on COVID-19, including stay at home advice for people who are self-isolating and their households, can be found on NHS Inform. Data visualisation of Scottish COVID-19 cases is available on the Public Health Scotland - Covid 19 Scotland dashboard. Further information on coronavirus in Scotland is available on the Scottish Government - Coronavirus in Scotland page, where further breakdown of past coronavirus data has also been published.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Coronavirus Disease 2019 (COVID-19) has spread all over the world and impacted many people’s lives. The characteristics of COVID-19 and other types of pneumonia have both similarities and differences, which confused doctors initially to separate and understand them. Here we presented a retrospective analysis for both COVID-19 and other types of pneumonia by combining the COVID-19 clinical data, eICU and MIMIC-III databases. Machine learning models, including logistic regression, random forest, XGBoost and deep learning neural networks, were developed to predict the severity of COVID-19 infections as well as the mortality of pneumonia patients in intensive care units (ICU). Statistical analysis and feature interpretation, including the analysis of two-level attention mechanisms on both temporal and non-temporal features, were utilized to understand the associations between different clinical variables and disease outcomes. For the COVID-19 data, the XGBoost model obtained the best performance on the test set (AUROC = 1.000 and AUPRC = 0.833). On the MIMIC-III and eICU pneumonia datasets, our deep learning model (Bi-LSTM_Attn) was able to identify clinical variables associated with death of pneumonia patients (AUROC = 0.924 and AUPRC = 0.802 for 24-hour observation window and 12-hour prediction window). The results highlighted clinical indicators, such as the lymphocyte counts, that may help the doctors to predict the disease progression and outcomes for both COVID-19 and other types of pneumonia.
Not seeing a result you expected?
Learn how you can add new datasets to our index.
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths
column.February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
This data should be credited to Johns Hopkins University COVID-19 tracking project