The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
By Valtteri Kurkela [source]
The dataset is constantly updated and synced hourly to keep the information current. With many columns available for analysis and exploration, users can extract valuable insights from this extensive dataset.
Some of the key metrics covered in the dataset include:
1. Vaccinations: The dataset covers total vaccinations administered worldwide, as well as people vaccinated per hundred people and fully vaccinated people per hundred people.
2. Testing & Positivity: Information on total tests conducted, along with new tests conducted per thousand people, is provided. The positive rate (the percentage of Covid-19 tests that were positive out of all tests conducted) is also included.
3. Hospital & ICU: Data on ICU patients and hospital patients are available, along with corresponding figures normalized per million people. Weekly admissions to intensive care units and hospitals are also provided.
4. Confirmed Cases: The number of confirmed Covid-19 cases globally is captured both in absolute numbers and as normalized values representing cases per million people.
5. Confirmed Deaths: Total confirmed deaths due to Covid-19 worldwide are provided, along with figures adjusted for population size (total deaths per million).
6. Reproduction Rate: The estimated reproduction rate (R) indicates the contagiousness of the virus within a particular country or region.
7. Policy Responses: Besides healthcare-related metrics, the dataset includes policy responses implemented by countries or regions, such as lockdown measures or travel restrictions.
8. Other Variables of Interest: The data encompasses various socioeconomic factors that may influence Covid-19 outcomes, including population density, continent membership, and gross domestic product (GDP) per capita. Demographic factors include:
   - Age structure: percentage of the population aged 65 and older, percentage aged 70 and older, and median age
   - Gender-specific factors: percentage of female smokers
   - Lifestyle-related factors: diabetes prevalence rate and extreme poverty rate
9. Excess Mortality: The dataset further provides insights into excess mortality, indicating the percentage increase in deaths above the expected number based on historical data.
The dataset consists of numerous columns providing specific information for analysis, such as ISO codes for countries/regions, location names, and units of measurement for different parameters.
Overall, this dataset serves as a valuable resource for researchers, analysts, and policymakers seeking to explore various aspects of Covid-19.
Introduction:
Understanding the Basic Structure:
- The dataset consists of various columns containing different data related to vaccinations, testing, hospitalization, cases, deaths, policy responses, and other key variables.
- Each row represents data for a specific country or region at a certain point in time.
Selecting Desired Columns:
- Identify the specific columns that are relevant to your analysis or research needs.
- Some important columns include population, total cases, total deaths, new cases per million people, and vaccination-related metrics.
Filtering Data:
- Use filters based on specific conditions such as date ranges or continents to focus on relevant subsets of data.
- This can help you analyze trends over time or compare data between different regions.
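The filtering step can be sketched with Python's standard library. This is a minimal illustration, not tooling that ships with the dataset: it assumes the file is a CSV with one row per location and date, and that the relevant columns are named `date` (ISO format) and `continent`. The sample rows are made up.

```python
import csv
import io
from datetime import date


def filter_rows(rows, start, end, continents):
    """Keep rows whose `date` falls within [start, end] and whose `continent` is listed."""
    kept = []
    for row in rows:
        day = date.fromisoformat(row["date"])
        if start <= day <= end and row["continent"] in continents:
            kept.append(row)
    return kept


# Hypothetical sample in the assumed layout (one row per location per date).
SAMPLE = """iso_code,continent,location,date,new_cases
FRA,Europe,France,2021-03-01,100
FRA,Europe,France,2021-04-01,120
USA,North America,United States,2021-03-15,300
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
subset = filter_rows(rows, date(2021, 3, 1), date(2021, 3, 31), {"Europe"})
print([(r["location"], r["date"]) for r in subset])  # [('France', '2021-03-01')]
```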
Analyzing Vaccination Metrics:
- Explore variables like total_vaccinations, people_vaccinated, and people_fully_vaccinated to assess vaccination coverage in different countries.
- Calculate metrics such as people_vaccinated_per_hundred or total_boosters_per_hundred for standardized comparisons across populations.
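Per-hundred and per-million figures are simple population scalings. The helper below is a hypothetical sketch (the function name and sample numbers are illustrative, not taken from the dataset); passing `per=1_000_000` gives per-million rates such as those used for hospital and ICU comparisons.

```python
def per_population(value, population, per=100):
    """Scale a raw count to a rate per `per` people (100 = per hundred, 1_000_000 = per million)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return value * per / population


# Hypothetical figures for illustration only.
population = 6_000_000
print(per_population(4_200_000, population))             # 70.0 people vaccinated per hundred
print(per_population(1_500, population, per=1_000_000))  # 250.0 ICU patients per million
```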
Investigating Testing Information:
- Examine columns such as total_tests, new_tests, and tests_per_case to understand testing efforts in various countries.
- Calculate rates like tests_per_case to assess testing efficiency or identify changes in testing strategies over time.
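One way to compute a tests-per-case ratio is to divide summed tests by summed cases over a trailing window. The sketch below assumes a 7-day window and daily `new_tests` and `new_cases` lists; check the dataset's own documentation for its exact definition before comparing results against its published `tests_per_case` column.

```python
def rolling_tests_per_case(new_tests, new_cases, window=7):
    """Tests per confirmed case over trailing `window`-day sums.

    Returns None for days without a full window or with zero cases in the window.
    """
    out = []
    for i in range(len(new_tests)):
        if i + 1 < window:
            out.append(None)
            continue
        tests = sum(new_tests[i + 1 - window : i + 1])
        cases = sum(new_cases[i + 1 - window : i + 1])
        out.append(tests / cases if cases else None)
    return out


# Hypothetical daily series: testing is flat while cases double on the last day.
print(rolling_tests_per_case([1000] * 8, [50] * 7 + [100]))
# [None, None, None, None, None, None, 20.0, 17.5]
```

A falling ratio, as on the last day here, means fewer tests per confirmed case, which can signal that testing is not keeping pace with spread.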
Exploring Hospitalization and ICU Data:
- Analyze variables like hosp_patients, icu_patients, and hospital_beds_per_thousand to understand the strain on healthcare systems.
- Calculate rates like icu_patients_per_million or hosp_patients_per_million for cross-country comparisons.
Assessing Covid-19 Cases and Deaths:
- Analyze variables like total_cases, new_ca...
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.
February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join it to other datasets we've made available to help cover the coronavirus pandemic:
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
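The hotspot queries above rank places by a 7-day rolling average of new cases per capita. As a rough local equivalent (not AP's actual query), the sketch below computes that average per 100,000 residents for two hypothetical counties and keeps the raw 7-day total alongside it, which makes the small-county effect visible.

```python
def rolling_avg_per_100k(daily_new, population, window=7):
    """Mean of the last `window` daily new-case counts, scaled per 100,000 residents."""
    if len(daily_new) < window:
        raise ValueError("need at least `window` days of data")
    return sum(daily_new[-window:]) / window * 100_000 / population


# Hypothetical counties: name -> (population, last 7 days of new cases).
counties = {
    "Small County": (2_000, [0, 1, 0, 2, 0, 1, 3]),
    "Large County": (1_000_000, [400, 380, 420, 410, 390, 405, 395]),
}

ranked = sorted(
    (
        (name, rolling_avg_per_100k(cases, pop), sum(cases))
        for name, (pop, cases) in counties.items()
    ),
    key=lambda item: item[1],
    reverse=True,
)
for name, rate, raw in ranked:
    print(f"{name}: {rate:.1f} per 100k (raw 7-day total: {raw})")
```

The small county tops the per-capita ranking on a 7-day total of just 7 cases, which is why raw caseloads are worth checking alongside the ranking.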
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins: https://datawrapper.dwcdn.net/nRyaf/15/
Johns Hopkins time series data:
- Johns Hopkins pulls data regularly to update its dashboard. Once a day, around 8 p.m. EDT, Johns Hopkins adds the counts for all areas it covers to the time series file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates its historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits its historical time series data for accuracy. It provides a file documenting all errors in its time series files that it has identified and fixed here.
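Because the time series stores daily snapshots of cumulative counts, daily figures are usually derived by differencing consecutive days, and a downward revision by a source then shows up as a negative daily value. A minimal sketch, assuming the series starts from zero before its first entry (the function name and sample series are hypothetical):

```python
def daily_from_cumulative(cumulative):
    """Difference a cumulative series into daily counts and list the days where the total fell."""
    daily, revisions = [], []
    previous = 0  # assumption: nothing was counted before the first entry
    for i, total in enumerate(cumulative):
        change = total - previous
        daily.append(change)
        if change < 0:
            revisions.append(i)  # likely a downward correction of historical data
        previous = total
    return daily, revisions


daily, revisions = daily_from_cumulative([10, 15, 15, 12, 20])
print(daily)      # [10, 5, 0, -3, 8]
print(revisions)  # [3]
```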
This data should be credited to the Johns Hopkins University COVID-19 tracking project.
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6.

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22.

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

This dataset contains COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the COVID-19 update.
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age adjustment is important in Connecticut because the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic Black residents and 29 years among Hispanic residents. Because most non-Hispanic white residents who died were over 75 years of age, their age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tended to be younger than 75 years of age, which results in higher age-adjusted rates.

The population data used to calculate rates are based on the CT DPH population statistics for 2019, which are available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 U.S. Standard Million population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf

Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts because data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH = Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records.
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
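Direct standardization, as described in the rate notes above, weights each age-specific rate by that age group's share of a standard population. The sketch below uses three hypothetical age groups and made-up standard counts purely to show the arithmetic; DPH's calculation uses 19 age groups and the 2000 U.S. standard population, which are not reproduced here.

```python
def age_adjusted_rate(events, population, standard_population, per=100_000):
    """Directly standardized rate: age-specific rates weighted by standard-population shares."""
    if not len(events) == len(population) == len(standard_population):
        raise ValueError("need one entry per age group in every input")
    total_standard = sum(standard_population)
    weighted = sum(
        (e / p) * (s / total_standard)
        for e, p, s in zip(events, population, standard_population)
    )
    return weighted * per


def crude_rate(events, population, per=100_000):
    """Total events per `per` people, ignoring age structure."""
    return sum(events) * per / sum(population)


# Hypothetical group: deaths and residents in three age bands (young, middle, old).
deaths = [1, 10, 50]
residents = [50_000, 30_000, 20_000]
# Made-up standard population counts for the same bands (NOT the 2000 U.S. standard).
standard = [600, 300, 100]

print(crude_rate(deaths, residents))                             # 61.0
print(round(age_adjusted_rate(deaths, residents, standard), 1))  # 36.2
```

This hypothetical group is older than the standard population, so its age-adjusted rate falls below its crude rate, the same direction the note describes for non-Hispanic white residents.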
License: CC0 1.0 Universal (https://creativecommons.org/publicdomain/zero/1.0/)
After observing many naive conversations about COVID-19 claiming that the pandemic can be blamed on just a few factors, I decided to create a data set that maps a number of different data points to every U.S. state (including D.C. and Puerto Rico).
This data set contains basic COVID-19 information about each state, such as total population, total COVID-19 cases, cases per capita, COVID-19 deaths and death rate, mask mandate start and end dates, mask mandate duration (in days), and vaccination rates.
However, when evaluating a pandemic (specifically a respiratory virus) it would be wise to also explore the population density of each state, which is also included. For those interested, I also included political party affiliation for each state ("D" for Democrat, "R" for Republican, and "I" for Puerto Rico). Vaccination rates are split into 1-dose and 2-dose rates.
Also included is data ranking the Well-Being Index and Social Determinants of Health Index for each state (2019). Several other columns rank states, such as total cases per state (ascending), total cases per capita per state (ascending), population density (ascending), and 2-dose vaccine rate (ascending). There are also columns that measure the deviation between ranks: case count rank vs. population density rank (negative numbers indicate that a state has more COVID-19 cases despite being lower in population density, while positive numbers indicate the opposite), as well as per-capita case count rank vs. density rank.
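The rank-deviation columns can be reproduced along these lines. This sketch assumes ascending ranks (1 = lowest value, ties sharing the lowest rank) and a deviation defined as density rank minus case rank, which is the sign convention implied by the description above; the helper name, states, and values are hypothetical.

```python
def ascending_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the lowest rank."""
    ordered = sorted(values.values())
    return {key: ordered.index(v) + 1 for key, v in values.items()}


# Hypothetical states: total cases and population density.
cases = {"A": 900, "B": 300, "C": 600}
density = {"A": 10.0, "B": 500.0, "C": 80.0}

case_rank = ascending_ranks(cases)        # A=3, B=1, C=2
density_rank = ascending_ranks(density)   # A=1, B=3, C=2
deviation = {s: density_rank[s] - case_rank[s] for s in cases}
print(deviation)  # {'A': -2, 'B': 2, 'C': 0}
```

State A has the most cases but the lowest density, so its deviation is negative, matching the interpretation given for that column.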
Several Statista Sources: * COVID-19 Cases in the US * Population Density of US States * COVID-19 Cases in the US per-capita * COVID-19 Vaccination Rates by State
Other sources I'd like to acknowledge: * Ballotpedia * DC Policy Center * Sharecare Well-Being Index * USA Facts * World Population Overview
I would like to see if any new insights could be made about this pandemic, where states failed, or if these case numbers are 100% expected for each state.
License: U.S. Government Works (https://www.usa.gov/government-works)
License information was derived automatically
This dataset includes a count and rate per 100,000 population for COVID-19 cases, a count of COVID-19 molecular diagnostic tests, and a percent positivity rate for tests among people living in community settings for the previous two-week period. Dates are based on date of specimen collection (cases and positivity).
A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.
Percent positivity is calculated as the number of positive tests among community residents conducted during the 14 days divided by the total number of positive and negative tests among community residents during the same period. If someone was tested more than once during that 14 day period, then those multiple test results (regardless of whether they were positive or negative) are included in the calculation.
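The case definition and the positivity calculation count tests differently: a person becomes a case only once, at their first positive result, but every test in the window counts toward positivity. A minimal sketch with hypothetical person IDs and test records, assuming inclusive window bounds and a full test history (so a first positive that predates the window can be recognized):

```python
from datetime import date


def window_metrics(tests, start, end):
    """Return (new_cases, percent_positivity) for the inclusive window [start, end].

    tests: list of (person_id, collection_date, is_positive) covering the full history.
    """
    first_positive = {}
    for person, day, positive in sorted(tests, key=lambda t: t[1]):
        if positive and person not in first_positive:
            first_positive[person] = day
    new_cases = sum(1 for day in first_positive.values() if start <= day <= end)

    # Every test in the window counts, including repeat tests of the same person.
    results = [positive for _, day, positive in tests if start <= day <= end]
    positivity = 100 * sum(results) / len(results) if results else None
    return new_cases, positivity


tests = [
    ("p1", date(2020, 9, 28), True),   # p1's first positive, before the window
    ("p1", date(2020, 10, 5), True),   # repeat positive: counts toward positivity only
    ("p2", date(2020, 10, 6), False),
    ("p2", date(2020, 10, 9), True),   # p2's first positive: a new case
    ("p3", date(2020, 10, 10), False),
    ("p3", date(2020, 10, 12), False),
]
print(window_metrics(tests, date(2020, 10, 4), date(2020, 10, 17)))  # (1, 40.0)
```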
These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities.
These data are updated weekly and reflect the previous two full Sunday-Saturday (MMWR) weeks (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf).
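The reporting window can be derived from the report date. The helper below is a hypothetical sketch that returns the last `n_weeks` complete Sunday-Saturday (MMWR) weeks before a given date; it assumes a report issued on a Saturday does not count that same week as complete.

```python
from datetime import date, timedelta


def previous_mmwr_weeks(report_date, n_weeks=2):
    """Return (start_sunday, end_saturday) spanning the last `n_weeks` complete weeks."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6.
    # Step back to the most recent Saturday strictly before report_date.
    days_back = (report_date.weekday() - 5) % 7 or 7
    end = report_date - timedelta(days=days_back)       # a Saturday
    start = end - timedelta(days=7 * n_weeks - 1)       # a Sunday
    return start, end


print(previous_mmwr_weeks(date(2020, 10, 15)))
# (datetime.date(2020, 9, 27), datetime.date(2020, 10, 10))
```

With `n_weeks=1`, the same helper gives a single MMWR week, as used by the earlier 7-day metrics.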
DPH note about change from 7-day to 14-day metrics: Prior to 10/15/2020, these metrics were calculated using a 7-day average rather than a 14-day average. The 7-day metrics are no longer being updated as of 10/15/2020 but the archived dataset can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/s22x-83rd
As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.
With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).
Additional notes: As of 11/5/2020, CT DPH has added antigen testing for SARS-CoV-2 to reported test counts in this dataset. The tests included in this dataset include both molecular and antigen tests. Molecular tests reported include polymerase chain reaction (PCR) and nucleic acid amplification (NAAT) tests.
The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.
Data suppression is applied when the rate is <5 cases per 100,000 or if there are <5 cases within the town. Information on why data suppression rules are applied can be found online here: https://www.cdc.gov/cancer/uscs/technical_notes/stat_methods/suppression.htm
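The two suppression rules can be expressed as a single check. This is a sketch under the assumption that both the count and the rate are withheld when either threshold is not met; the function name and defaults are illustrative, not DPH code.

```python
def apply_suppression(cases, population, min_cases=5, min_rate=5.0, per=100_000):
    """Return (cases, rate), or (None, None) if either suppression rule applies."""
    rate = cases * per / population
    if cases < min_cases or rate < min_rate:
        return None, None
    return cases, rate


print(apply_suppression(3, 10_000))    # (None, None): fewer than 5 cases
print(apply_suppression(6, 200_000))   # (None, None): rate is 3.0 per 100,000
print(apply_suppression(20, 100_000))  # (20, 20.0)
```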
License: MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
From World Health Organization - On 31 December 2019, WHO was alerted to several cases of pneumonia in Wuhan City, Hubei Province of China. The virus did not match any other known virus. This raised concern because when a virus is new, we do not know how it affects people.
Daily-level information on affected people can yield interesting insights when it is made available to the broader data science community.
Johns Hopkins University has made an excellent dashboard using the affected cases data. Data is extracted from the associated Google Sheets and made available here.
The data is now available as CSV files in the Johns Hopkins GitHub repository. Please refer to the GitHub repository for the Terms of Use details. It is uploaded here for use in Kaggle kernels and to gather insights from the broader data science community.
2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people - CDC
This dataset has daily-level information on the number of affected cases, deaths, and recoveries from the 2019 novel coronavirus. Please note that this is time-series data, so the number of cases on any given day is the cumulative number.
The data is available from 22 Jan, 2020.
This dataset contains time-series and case-level records of the COVID-19 pandemic. The primary file is covid_19_data.csv, with supporting files for earlier records and individual-level line list data.
This is the primary dataset and contains aggregated COVID-19 statistics by location and date.
This file contains earlier COVID-19 records. It is no longer updated and is provided only for historical reference. For current analysis, please use covid_19_data.csv.
This file provides individual-level case information, obtained from an open data source. It includes patient demographics, travel history, and case outcomes.
Another individual-level case dataset, also obtained from public sources, with detailed patient-level information useful for micro-level epidemiological analysis.
✅ Use covid_19_data.csv for up-to-date aggregated global trends.
✅ Use the line list datasets for detailed, individual-level case analysis.
If you are interested in country-level data, please refer to the following Kaggle datasets:
India - https://www.kaggle.com/sudalairajkumar/covid19-in-india
South Korea - https://www.kaggle.com/kimjihoo/coronavirusdataset
Italy - https://www.kaggle.com/sudalairajkumar/covid19-in-italy
Brazil - https://www.kaggle.com/unanimad/corona-virus-brazil
USA - https://www.kaggle.com/sudalairajkumar/covid19-in-usa
Switzerland - https://www.kaggle.com/daenuprobst/covid19-cases-switzerland
Indonesia - https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases
Johns Hopkins University for making the data available for educational and academic research purposes
MoBS lab - https://www.mobs-lab.org/2019ncov.html
World Health Organization (WHO): https://www.who.int/
DXY.cn. Pneumonia. 2020. http://3g.dxy.cn/newh5/view/pneumonia.
BNO News: https://bnonews.com/index.php/2020/02/the-latest-coronavirus-cases/
National Health Commission of the People’s Republic of China (NHC): http://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
China CDC (CCDC): http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html
Macau Government: https://www.ssm.gov.mo/portal/
Taiwan CDC: https://sites.google....
DPH note about change from 7-day to 14-day metrics: As of 10/15/2020, this dataset is no longer being updated. Starting on 10/15/2020, these metrics will be calculated using a 14-day average rather than a 7-day average. The new dataset using 14-day averages can be accessed here: https://data.ct.gov/Health-and-Human-Services/COVID-19-case-rate-per-100-000-population-and-perc/hree-nys2

As you know, we are learning more about COVID-19 all the time, including the best ways to measure COVID-19 activity in our communities. CT DPH has decided to shift to 14-day rates because these are more stable, particularly at the town level, as compared to 7-day rates. In addition, since the school indicators were initially published by DPH last summer, CDC has recommended 14-day rates and other states (e.g., Massachusetts) have started to implement 14-day metrics for monitoring COVID transmission as well.

With respect to geography, we also have learned that many people are looking at the town-level data to inform decision making, despite emphasis on the county-level metrics in the published addenda. This is understandable as there has been variation within counties in COVID-19 activity (for example, rates that are higher in one town than in most other towns in the county).

This dataset includes a weekly count and weekly rate per 100,000 population for COVID-19 cases, a weekly count of COVID-19 PCR diagnostic tests, and a weekly percent positivity rate for tests among people living in community settings. Dates are based on date of specimen collection (cases and positivity).

A person is considered a new case only upon their first COVID-19 testing result because a case is defined as an instance or bout of illness. If they are tested again subsequently and are still positive, it still counts toward the test positivity metric but they are not considered another case.
These case and test counts do not include cases or tests among people residing in congregate settings, such as nursing homes, assisted living facilities, or correctional facilities. These data are updated weekly; the previous week period for each dataset is the previous Sunday-Saturday, known as an MMWR week (https://wwwn.cdc.gov/nndss/document/MMWR_week_overview.pdf). The date listed is the date the dataset was last updated and corresponds to a reporting period of the previous MMWR week. For instance, the data for 8/20/2020 corresponds to a reporting period of 8/9/2020-8/15/2020. Notes: 9/25/2020: Data for Mansfield and Middletown for the week of Sept 13-19 were unavailable at the time of reporting due to delays in lab reporting.
License: ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
A. SUMMARY This archived dataset includes data for population characteristics that are no longer being reported publicly. The date on which each population characteristic type was archived can be found in the field “data_loaded_at”.
B. HOW THE DATASET IS CREATED Data on the population characteristics of COVID-19 cases are from: * Case interviews * Laboratories * Medical providers These multiple streams of data are merged, deduplicated, and undergo data verification processes.
Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases. * The population estimates for the "Other" or “Multi-racial” groups should be considered with caution. The Census definition is likely not exactly aligned with how the City collects this data. For that reason, we do not recommend calculating population rates for these groups.
Gender * The City collects information on gender identity using these guidelines.
Skilled Nursing Facility (SNF) occupancy * A Skilled Nursing Facility (SNF) is a type of long-term care facility that provides care to individuals, generally in their 60s and older, who need functional assistance in their daily lives. * This dataset includes data for COVID-19 cases reported in Skilled Nursing Facilities (SNFs) through 12/31/2022, archived on 1/5/2023. These data were identified where “Characteristic_Type” = ‘Skilled Nursing Facility Occupancy’.
Sexual orientation * The City began asking adults 18 years old or older for their sexual orientation identification during case interviews as of April 28, 2020. Sexual orientation data prior to this date is unavailable. * The City doesn't collect or report information about sexual orientation for persons under 12 years of age. * Case investigation interviews transitioned to the California Department of Public Health's Virtual Assistant for information gathering beginning December 2021. The Virtual Assistant is only sent to adults who are 18+ years old. Learn more about our data collection guidelines pertaining to sexual orientation: https://www.sfdph.org/dph/files/PoliciesProcedures/COM9_SexualOrientationGuidelines.pdf
Comorbidities * Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.
Homelessness Persons are identified as homeless based on several data sources: * self-reported living situation * the location at the time of testing * Department of Public Health homelessness and health databases * Residents in Single-Room Occupancy hotels are not included in these figures. These methods serve as an estimate of persons experiencing homelessness. They may not meet other homelessness definitions.
Single Room Occupancy (SRO) tenancy * SRO buildings are defined by the San Francisco Housing Code as having six or more "residential guest rooms" which may be attached to shared bathrooms, kitchens, and living spaces. * The details of a person's living arrangements are verified during case interviews.
Transmission Type * Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, transmission category is recorded as contact with a known case. If they report no contact with a known case, transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, they are counted as unknown.
C. UPDATE PROCESS This dataset has been archived and will no longer update as of 9/11/2023.
D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of cases on each date.
New cases are the count of cases within that characteristic group where the positive tests were collected on that specific specimen collection date. Cumulative cases are the running total of all San Francisco cases in that characteristic group up to the specimen collection date listed.
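The relationship between new and cumulative cases can be sketched as a running sum per characteristic group. This is a minimal illustration, not the City's pipeline: it assumes the rows have already been filtered to a single "Characteristic Type" and are given as hypothetical `(specimen_collection_date, characteristic_group, new_cases)` tuples.

```python
from collections import defaultdict
from datetime import date


def add_cumulative(rows):
    """Append a running cumulative total per characteristic group.

    rows: iterable of (specimen_collection_date, characteristic_group, new_cases)
    tuples, assumed to belong to one Characteristic Type (hypothetical layout).
    Returns (date, group, new_cases, cumulative_cases) tuples sorted by group, then date.
    """
    totals = defaultdict(int)  # running total for each characteristic group
    result = []
    for day, group, new_cases in sorted(rows, key=lambda r: (r[1], r[0])):
        totals[group] += new_cases
        result.append((day, group, new_cases, totals[group]))
    return result
```

Sorting by group and then by date matters: the cumulative value on a given date must include every earlier specimen collection date in that group, whatever order the rows arrive in.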
This data may not be immediately available for recently reported cases. Data updates as more information becomes available.
To explore data on the total number of cases, use the ARCHIVED: COVID-19 Cases Over Time dataset.
E. CHANGE LOG
This dataset is a per-state amalgamation of demographic, public health and other relevant predictors for COVID-19.
The Infected, Deaths and Tested columns in this dataset come from the API's positive, death and totalTestResults fields, respectively.
Please read the API documentation for more context on those columns.
Density is people per meter squared https://worldpopulationreview.com/states/
https://worldpopulationreview.com/states/gdp-by-state/
https://worldpopulationreview.com/states/per-capita-income-by-state/
https://en.wikipedia.org/wiki/List_of_U.S._states_by_Gini_coefficient
Rates are from February 2020 and are expressed as a percentage of the labor force.
https://www.bls.gov/web/laus/laumstrk.htm
Ratio is Male / Female
https://www.kff.org/other/state-indicator/distribution-by-gender/
https://worldpopulationreview.com/states/smoking-rates-by-state/
Death rate per 100,000 people
https://www.cdc.gov/nchs/pressroom/sosmap/flu_pneumonia_mortality/flu_pneumonia.htm
Death rate per 100,000 people
https://www.cdc.gov/nchs/pressroom/sosmap/lung_disease_mortality/lung_disease.htm
https://www.kff.org/other/state-indicator/total-active-physicians/
https://www.kff.org/other/state-indicator/total-hospitals
Includes spending for all health care services and products by state of residence. Hospital spending is included and reflects the total net revenue. Costs such as insurance, administration, research, and construction expenses are not included.
https://www.kff.org/other/state-indicator/avg-annual-growth-per-capita/
Pollution: Average exposure of the general public to particulate matter of 2.5 microns or less (PM2.5) measured in micrograms per cubic meter (3-year estimate)
https://www.americashealthrankings.org/explore/annual/measure/air/state/ALL
For each state, number of medium and large airports https://en.wikipedia.org/wiki/List_of_the_busiest_airports_in_the_United_States
Note that the FL value was incorrect in the table but is corrected in the Hottest States paragraph.
https://worldpopulationreview.com/states/average-temperatures-by-state/
District of Columbia temperature computed as the average of Maryland and Virginia
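The District of Columbia fill-in described above amounts to a simple unweighted mean of two values. The sketch below assumes a hypothetical mapping keyed by full state names; the dataset's actual column and key names may differ.

```python
def impute_dc_temperature(avg_temp_by_state: dict[str, float]) -> dict[str, float]:
    """Return a copy with District of Columbia set to the mean of Maryland and Virginia.

    Assumes keys are full state names (hypothetical layout); the input is not modified.
    """
    filled = dict(avg_temp_by_state)
    filled["District of Columbia"] = (
        avg_temp_by_state["Maryland"] + avg_temp_by_state["Virginia"]
    ) / 2
    return filled
```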
Urbanization as a percentage of the population https://www.icip.iastate.edu/tables/population/urban-pct-states
https://www.kff.org/other/state-indicator/distribution-by-age/
Schools that haven't closed are marked NaN https://www.edweek.org/ew/section/multimedia/map-coronavirus-and-school-closures.html
Note that some of the datasets above did not contain data for the District of Columbia; this missing data was found via Google searches and entered manually.
Data for CDC’s COVID Data Tracker site on Rates of COVID-19 Cases and Deaths by Updated (Bivalent) Booster Status.
Webpage: https://covid.cdc.gov/covid-data-tracker/#rates-by-vaccine-status
Dataset and data visualization details:
These data were posted and archived on May 30, 2023 and reflect cases among persons with a positive specimen collection date through April 22, 2023, and deaths among persons with a positive specimen collection date through April 1, 2023. These data will no longer be updated after May 2023.
Vaccination status: A person vaccinated with at least a primary series had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after verifiably completing the primary series of an FDA-authorized or approved COVID-19 vaccine. An unvaccinated person had SARS-CoV-2 RNA or antigen detected on a respiratory specimen and has not been verified to have received COVID-19 vaccine. Excluded were partially vaccinated people who received at least one FDA-authorized vaccine dose but did not complete a primary series ≥14 days before collection of a specimen where SARS-CoV-2 RNA or antigen was detected. A person vaccinated with a primary series and a monovalent booster dose had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after verifiably receiving a primary series of an FDA-authorized or approved vaccine and at least one additional dose of any monovalent FDA-authorized or approved COVID-19 vaccine on or after August 13, 2021. (Note: this definition does not distinguish between vaccine recipients who are immunocompromised and are receiving an additional dose versus those who are not immunocompromised and receiving a booster dose.) A person vaccinated with a primary series and an updated (bivalent) booster dose had SARS-CoV-2 RNA or antigen detected in a respiratory specimen collected ≥14 days after verifiably receiving a primary series of an FDA-authorized or approved vaccine and an additional dose of any bivalent FDA-authorized or approved COVID-19 vaccine on or after September 1, 2022. (Note: Records in which bivalent doses were reported as first or second doses are classified as vaccinated with a bivalent booster dose.) People with a primary series only, or with a primary series and a monovalent booster dose, were combined in the “vaccinated without an updated booster” category.
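The status definitions above can be read as a small decision procedure. The sketch below is one possible reading, not CDC's implementation: the function name, parameters and category strings are hypothetical, and it assumes the ≥14-day rule is applied to the bivalent dose date (a bivalent dose received fewer than 14 days before the specimen falls back to "vaccinated without an updated booster"). Monovalent boosters are not tracked because they are merged into that same category.

```python
from datetime import date
from typing import Optional

BIVALENT_BOOSTER_START = date(2022, 9, 1)  # bivalent doses count on or after this date
MIN_DAYS_AFTER_DOSE = 14                   # specimen must be collected >= 14 days after


def vaccination_status(
    specimen_date: date,
    primary_series_completed: Optional[date] = None,
    bivalent_booster: Optional[date] = None,
    received_any_dose: bool = False,
) -> str:
    """Assign a simplified vaccination-status category to one case (hypothetical sketch)."""
    if primary_series_completed is None:
        # At least one dose but no completed primary series: partially vaccinated, excluded
        return "excluded" if received_any_dose else "unvaccinated"
    if (specimen_date - primary_series_completed).days < MIN_DAYS_AFTER_DOSE:
        # Primary series not completed >= 14 days before the specimen: also excluded
        return "excluded"
    if (
        bivalent_booster is not None
        and bivalent_booster >= BIVALENT_BOOSTER_START
        and (specimen_date - bivalent_booster).days >= MIN_DAYS_AFTER_DOSE
    ):
        return "vaccinated with an updated booster"
    # Primary series alone, or primary series plus a monovalent booster
    return "vaccinated without an updated booster"
```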
Deaths: A COVID-19–associated death occurred in a person with a documented COVID-19 diagnosis who died; health department staff reviewed to make a determination using vital records, public health investigation, or other data sources. Per the interim guidance of the Council of State and Territorial Epidemiologists (CSTE), this should include persons whose death certificate lists COVID-19 disease or SARS-CoV-2 as the underlying cause of death or as a significant condition contributing to death. Rates of COVID-19 deaths by vaccination status are primarily reported based on when the patient was tested for COVID-19. In select jurisdictions, deaths are included that are not laboratory confirmed and are reported based on alternative dates (i.e., onset date for most; or date of death or report date, where onset date is unavailable). Deaths usually occur up to 30 days after COVID-19 diagnosis.
Participating jurisdictions: Currently, these 24 health departments that regularly link their case surveillance to immunization information system data are included in these incidence rate estimates: Alabama, Arizona, Colorado, District of Columbia, Georgia, Idaho, Indiana, Kansas, Kentucky, Louisiana, Massachusetts, Michigan, Minnesota, Nebraska, New Jersey, New Mexico, New York, New York City (NY), North Carolina, Rhode Island, Tennessee, Texas, Utah, and West Virginia; 23 jurisdictions also report deaths among vaccinated and unvaccinated people. These jurisdictions represent 48% of the total U.S. population and all ten of the Health and Human Services Regions. This list will be
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). This database was created in response to the Coronavirus public health emergency to track reported cases in real-time. The data include the location and number of confirmed COVID-19 cases, deaths and recoveries for all affected countries, aggregated at the appropriate province or state. It was developed to enable researchers, public health authorities and the general public to track the outbreak as it unfolds. Additional information is available in the blog post, Mapping 2019-nCoV (https://systems.jhu.edu/research/public-health/ncov/), and included data sources are listed here: https://github.com/CSSEGISandData/COVID-19
How many confirmed COVID-19 cases were there in the US, by state?
This query returns the cumulative number of confirmed cases for each US "province_state" as of February 29, 2020. In this dataset, a "province_state" can refer to any subset of the US, including a county or a state.
SELECT
  province_state,
  confirmed AS feb_confirmed_cases
FROM
  `bigquery-public-data.covid19_jhu_csse.summary`
WHERE
  country_region = "US"
  AND date = '2020-02-29'
ORDER BY
  feb_confirmed_cases DESC
Which countries with the highest number of confirmed cases have the most cases per capita?
This query joins the Johns Hopkins dataset with the World Bank's global population data to determine which countries, among those with the highest total number of confirmed cases, have the most confirmed cases per capita.
-- Align World Bank country names with JHU country_region values
WITH country_pop AS (
  SELECT
    CASE country
      WHEN "United States" THEN "US"
      WHEN "Iran, Islamic Rep." THEN "Iran"
      ELSE country
    END AS country,
    year_2018
  FROM
    `bigquery-public-data.world_bank_global_population.population_by_country`
)
SELECT
  cases.date AS date,
  cases.country_region AS country_region,
  -- Sum across provinces/states so that each country yields a single row
  SUM(cases.confirmed) AS total_confirmed_cases,
  SUM(cases.confirmed) / AVG(country_pop.year_2018) * 100000 AS confirmed_cases_per_100000
FROM
  `bigquery-public-data.covid19_jhu_csse.summary` AS cases
JOIN
  country_pop
ON
  cases.country_region = country_pop.country
WHERE
  cases.country_region IN ("US", "France", "China", "Italy", "Spain", "Germany", "Iran")
  AND cases.date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY
  country_region, date
ORDER BY
  confirmed_cases_per_100000 DESC
Source: JHU CSSE
Update frequency: Daily
Note: As of 10/28/2021 this dataset is no longer being updated. For more information about COVID-19 cases by vaccination status, visit the Department of Public Health's daily report: https://data.ct.gov/stories/s/q5as-kyim
Cases of COVID-19 by vaccination status by weekly reporting period. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Since February 2021, cases of COVID-19 among fully vaccinated persons (i.e., vaccine breakthrough cases) were identified based on medical provider reports to DPH identifying such cases. Recently, DPH developed a process that matches COVID-19 case data with the vaccine registry to determine which cases meet the definition of being fully vaccinated and are also vaccine breakthrough cases. A case of COVID-19 in a fully vaccinated person (i.e., a vaccine breakthrough case) is defined as a person who has a positive PCR/NAAT or antigen test in a respiratory specimen collected ≥14 days after completing the final dose of an FDA-authorized or approved COVID-19 vaccine series and who did not have a previous positive COVID-19 test <45 days prior to the positive test currently under investigation. This newer process provides more accurate and complete data on the vaccine status of persons who have tested positive for COVID-19.
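The breakthrough-case definition above combines two date checks. A minimal sketch, assuming the final vaccine dose date and any earlier positive specimen dates for the person are already available (the function and parameter names are hypothetical, not DPH's registry-matching process):

```python
from datetime import date
from typing import Iterable, Optional


def is_breakthrough_case(
    specimen_date: date,
    final_dose_date: Optional[date],
    prior_positive_dates: Iterable[date] = (),
) -> bool:
    """Apply the >=14-day and <45-day rules from the definition (hypothetical sketch)."""
    if final_dose_date is None:
        return False  # vaccine series not completed, so not fully vaccinated
    if (specimen_date - final_dose_date).days < 14:
        return False  # specimen collected too soon after the final dose
    for prior in prior_positive_dates:
        days_before = (specimen_date - prior).days
        if 0 < days_before < 45:
            return False  # a previous positive test <45 days earlier disqualifies the case
    return True
```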
Note: Starting April 27, 2023, updates changed from daily to weekly.
Summary: The cumulative number of confirmed COVID-19 deaths among Maryland residents.
Description: The MD COVID-19 - Total Confirmed Deaths Statewide data layer is a collection of the statewide confirmed COVID-19 related deaths that have been reported each day by the Vital Statistics Administration. A death is classified as confirmed if the person had a laboratory-confirmed positive COVID-19 test result. Some data on deaths may be unavailable due to the time lag between the death, typically reported by a hospital or other facility, and the submission of the complete death certificate. Probable deaths are available from the MD COVID-19 - Total Probable Deaths Statewide data layer.
Update 5/27/21: The Maryland Department of Health (MDH) Vital Statistics Administration (VSA) revised the state’s COVID-19 data to include deaths that were not properly classified by medical certifiers over the past year. VSA identified these deaths as COVID-19 deaths through an information reconciliation process utilizing other sources of data. Learn more: https://health.maryland.gov/newsroom/Pages/Maryland-Department-of-Health-Vital-Statistics-Administration-issues-revision-of-COVID-19-death-data.aspx
Terms of Use: The Spatial Data, and the information therein, (collectively the "Data") is provided "as is" without warranty of any kind, either expressed, implied, or statutory. The user assumes the entire risk as to quality and performance of the Data. No guarantee of accuracy is granted, nor is any responsibility for reliance thereon assumed. In no event shall the State of Maryland be liable for direct, indirect, incidental, consequential or special damages of any kind. The State of Maryland does not accept liability for any damages or misrepresentation caused by inaccuracies in the Data or as a result of changes to the Data, nor is there responsibility assumed to maintain the Data in any manner or form.
The Data can be freely distributed as long as the metadata entry is not modified or deleted. Any data derived from the Data must acknowledge the State of Maryland in the metadata.
License: Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Indonesia-Coronavirus’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ardisragen/indonesia-coronavirus-cases on 30 September 2021.
--- Dataset description provided by original source is as follows ---
COVID-19 has infected many people in Indonesia, and the number of confirmed cases is increasing exponentially. Indonesia has raised its coronavirus alert to "Darurat Nasional" (National Emergency) until 29 May 2020. The island of Java, and especially Jakarta, the capital city of Indonesia, is the region most affected by the coronavirus.
The following online portals, from the public community and from provincial (regional) government websites in Indonesia, publish information on COVID-19.
We built a structured dataset from the reports published on these portals so that the research community can apply recent AI and statistical techniques to generate new insights in support of the ongoing fight against this infectious disease in Indonesia.
Dataset:
1) Total Confirmed Positive Cases
2) Google Trend Related keywords
3) Patient Epidemiological Data
4) Daily Case Statistics
5) Case per Province
6) Case in Jakarta Capital City
7) Daily New Confirmed Cases in Each Province (Timeline)
Kernel:
1) Predicting Coronavirus Positive Cases in Indonesia
2) Visualization & Analysis of Covid-19 in Indonesia
3) Logistic Model for Indonesia COVID-19
4) DataSet Characteristics of Corona patients in several countries, including Indonesia
5) Novel Corona Virus (Covid-19) Indonesia EDA
6) Simple Visualization and Forecasting
7) Characteristics of Corona patients DS
Related Publication:
1) Response to Covid-19: Data Analytics and Transparency, Koderea Talks, 18 March 2020, https://www.researchgate.net/publication/340003505_Response_to_Covid-19_Data_Analytics_and_Transparency
2) Covid-19 Data Science, ID Institute Obrolin Data Coronavirus, 24 March 2020, https://www.researchgate.net/publication/340116231_IDInstitute_Covid-19_Data_Science
Sincere thanks to all the members of the DSCI Team, KawalCovid19.id, Pemda DKI Jakarta, Pemprov Jawa Barat, Pemprov Jawa Tengah, Pemprov Sumatera Barat, and Pemprov DIY.
We welcome anyone to join us as collaborators! Join WAG Chat: https://s.id/fgPoP For more information please contact ardi@ejnu.net or WA +8210-4297-0504
--- Original source retains full ownership of the source dataset ---
Data for CDC’s COVID Data Tracker site on Rates of COVID-19 Cases and Deaths by Vaccination Status.
Dataset and data visualization details: These data were posted on October 21, 2022, archived on November 18, 2022, and revised on February 22, 2023. These data reflect cases among persons with a positive specimen collection date through September 24, 2022, and deaths among persons with a positive specimen collection date through September 3, 2022.
Vaccination status: A person vaccinated with a primary series had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after verifiably completing the primary series of an FDA-authorized or approved COVID-19 vaccine. An unvaccinated person had SARS-CoV-2 RNA or antigen detected on a respiratory specimen and has not been verified to have received COVID-19 vaccine. Excluded were partially vaccinated people who received at least one FDA-authorized vaccine dose but did not complete a primary series ≥14 days before collection of a specimen where SARS-CoV-2 RNA or antigen was detected. Additional or booster dose: A person vaccinated with a primary series and an additional or booster dose had SARS-CoV-2 RNA or antigen detected on a respiratory specimen collected ≥14 days after receipt of an additional or booster dose of any COVID-19 vaccine on or after August 13, 2021. For people ages 18 years and older, data are graphed starting the week including September 24, 2021, when a COVID-19 booster dose was first recommended by CDC for adults 65+ years old and people in certain populations and high risk occupational and institutional settings. For people ages 12-17 years, data are graphed starting the week of December 26, 2021, 2 weeks after the first recommendation for a booster dose for adolescents ages 16-17 years. For people ages 5-11 years, data are included starting the week of June 5, 2022, 2 weeks after the first recommendation for a booster dose for children aged 5-11 years. For people ages 50 years and older, data on second booster doses are graphed starting the week including March 29, 2022, when the recommendation was made for second boosters. Vertical lines represent dates when changes occurred in U.S. policy for COVID-19 vaccination (details provided above). Reporting is by primary series vaccine type rather than additional or booster dose vaccine type. The booster dose vaccine type may be different than the primary series vaccine type. 
** Because data on the immune status of cases and associated deaths are unavailable, an additional dose in an immunocompromised person cannot be distinguished from a booster dose. This is a relevant consideration because vaccines can be less effective in this group. Deaths: A COVID-19–associated death occurred in a person with a documented COVID-19 diagnosis who died; health department staff reviewed to make a determination using vital records, public health investigation, or other data sources. Rates of COVID-19 deaths by vaccination status are reported based on when the patient was tested for COVID-19, not the date they died. Deaths usually occur up to 30 days after COVID-19 diagnosis. Participating jurisdictions: Currently, these 31 health departments that regularly link their case surveillance to immunization information system data are included in these incidence rate estimates: Alabama, Arizona, Arkansas, California, Colorado, Connecticut, District of Columbia, Florida, Georgia, Idaho, Indiana, Kansas, Kentucky, Louisiana, Massachusetts, Michigan, Minnesota, Nebraska, New Jersey, New Mexico, New York, New York City (New York), North Carolina, Philadelphia (Pennsylvania), Rhode Island, South Dakota, Tennessee, Texas, Utah, Washington, and West Virginia; 30 jurisdictions also report deaths among vaccinated and unvaccinated people. These jurisdictions represent 72% of the total U.S. population and all ten of the Health and Human Services Regions. Data on cases
License: Public Domain Mark 1.0, https://creativecommons.org/publicdomain/mark/1.0/
License information was derived automatically
This dataset contains the reported case numbers grouped by age group.
The following fields are included:
• Age group
• Reported cases (female)
• Reported cases (male)
• Hospitalisations
• Deceased
Fields are separated by commas, strings are enclosed in double quotation marks ("), and numbers use a period as the thousands separator.
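Because numbers use a period as the thousands separator, a count such as "1.234" must not be parsed as a decimal. A minimal sketch using Python's standard `csv` module; the header names in the usage below are hypothetical, since the published column labels may differ from the translated field names above.

```python
import csv
import io


def read_rows(raw_text: str) -> list[dict[str, str]]:
    """Parse comma-separated text whose string fields are wrapped in double quotes."""
    return list(csv.DictReader(io.StringIO(raw_text), delimiter=",", quotechar='"'))


def parse_count(field: str) -> int:
    """Convert a count written with a period thousands separator, e.g. "12.345", to 12345."""
    return int(field.replace(".", ""))
```

For example, `parse_count(read_rows(text)[0]["Reported cases female"])` would return the first row's female case count as an integer, assuming a column with that (hypothetical) header exists.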
—
The data published here are based on the figures reported by the districts and urban districts through the official reporting channel of the state reporting office. Because of the time needed for data collection and transmission, the figures may deviate from locally communicated case numbers. In individual cases, the number of reported cases may also decrease, for example where a notification is not confirmed or the person’s place of residence turns out to be outside the district.
The COVID-19 pandemic has brought about massive declines in well-being around the world. This paper seeks to quantify and compare two important components of those losses, increased mortality and higher poverty, using years of human life as a common metric. The paper estimates that almost 20 million life-years were lost to COVID-19 by December 2020. Over the same period and by the most conservative definition, more than 120 million additional years were spent in poverty because of the pandemic. The mortality burden, whether estimated in lives or years of life lost, increases sharply with gross domestic product per capita. By contrast, the poverty burden declines with per capita national income when a constant absolute poverty line is used, or is uncorrelated with national income when a more relative approach is taken to poverty lines. In both cases, the poverty burden of the pandemic, relative to the mortality burden, is much higher for poor countries. The distribution of aggregate welfare losses (combining mortality and poverty and expressed in terms of life-years) depends on the choice of poverty line(s) and the relative weights placed on mortality and poverty. With a constant absolute poverty line and a relatively low welfare weight on mortality, poorer countries are found to bear a greater welfare loss from the pandemic. When poverty lines are set differently for poor, middle-income, and high-income countries and/or a greater welfare weight is placed on mortality, upper-middle-income and rich countries suffer the most.
Data visualizations of the COVID-19 pandemic in the United States have often presented case and death rates by state in separate visualizations, making it difficult to discern the temporal relationship between these two epidemiological metrics. By combining the COVID-19 case and death rates into a single visualization, we have provided an intuitive format for depicting the relationship between cases and deaths. Moreover, by using animation we have made the temporal lag between cases and subsequent deaths more apparent. This work helps to inform expectations for the trajectory of death rates in the United States given the recent surge in case rates.
License: ODC Public Domain Dedication and Licence (PDDL) v1.0, http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY This dataset includes San Francisco COVID-19 tests by race/ethnicity and by date. This dataset represents the daily count of tests collected, and the breakdown of test results (positive, negative, or indeterminate). Tests in this dataset include all those collected from persons who listed San Francisco as their home address at the time of testing. It also includes tests that were collected by San Francisco providers for persons who were missing a locating address. This dataset does not include tests for residents listing a locating address outside of San Francisco, even if they were tested in San Francisco.
The data were de-duplicated by individual and date, so if a person gets tested multiple times on different dates, all tests will be included in this dataset (on the day each test was collected). If a person tested multiple times on the same date, only one test is included from that date. When there are multiple tests on the same date, a positive result, if one exists, will always be selected as the record for the person. If a PCR and antigen test are taken on the same day, the PCR test will supersede. If a person tests multiple times on the same day and the results are all the same (e.g. all negative or all positive) then the first test done is selected as the record for the person.
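The per-person, per-day selection rules above can be sketched as picking the minimum of a sort key within each (person, date) group. This is an illustrative reading under stated assumptions: the `LabTest` record and its fields are hypothetical, indeterminate results are treated as non-positive, and where the rules could conflict (a positive antigen and a negative PCR on the same day) the sketch assumes the positive result wins, because the text says a positive result "will always be selected".

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LabTest:
    person_id: str
    collected_at: datetime
    result: str     # "positive", "negative" or "indeterminate"
    test_type: str  # "PCR" or "antigen"


def selection_key(test: LabTest):
    # Smaller keys win: positive before non-positive, then PCR before antigen, then earliest
    return (test.result != "positive", test.test_type != "PCR", test.collected_at)


def dedupe_by_person_and_date(tests):
    """Keep one test per person per collection date (hypothetical sketch)."""
    chosen = {}
    for test in tests:
        key = (test.person_id, test.collected_at.date())
        if key not in chosen or selection_key(test) < selection_key(chosen[key]):
            chosen[key] = test
    return sorted(chosen.values(), key=lambda t: (t.person_id, t.collected_at))
```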
The total number of positive test results is not equal to the total number of COVID-19 cases in San Francisco.
When a person gets tested for COVID-19, they may be asked to report information about themselves. One piece of information that might be requested is a person's race and ethnicity. These data are often incomplete in the laboratory and provider reports of the test results sent to the health department. The data can be missing or incomplete for several possible reasons:
• The person was not asked about their race and ethnicity.
• The person was asked, but refused to answer.
• The person answered, but the testing provider did not include the person's answers in the reports.
• The testing provider reported the person's answers in a format that could not be used by the health department.
For any of these reasons, a person's race/ethnicity will be recorded in the dataset as “Unknown.”
B. NOTE ON RACE/ETHNICITY
The different values for Race/Ethnicity in this dataset are "Asian;" "Black or African American;" "Hispanic or Latino/a, all races;" "American Indian or Alaska Native;" "Native Hawaiian or Other Pacific Islander;" "White;" "Multi-racial;" "Other;" and “Unknown."
The Race/Ethnicity categorization increases data clarity by emulating the methodology used by the U.S. Census in the American Community Survey. Specifically, persons who identify as "Asian," "Black or African American," "American Indian or Alaska Native," "Native Hawaiian or Other Pacific Islander," "White," "Multi-racial," or "Other" do NOT include any person who identified as Hispanic/Latino at any time in their testing reports that either (1) identified them as SF residents or (2) identified them as someone tested by an SF provider without a locating address. All persons across all races who identify as Hispanic/Latino are recorded as "Hispanic or Latino/a, all races." This categorization increases data accuracy by correcting the way “Other” persons were counted. Previously, when a person reported “Other” for Race/Ethnicity, they would be recorded as “Unknown.” Under the new categorization, they are counted as “Other” and are distinct from “Unknown.”
If a person records their race/ethnicity as “Asian,” “Black or African American,” “American Indian or Alaska Native,” “Native Hawaiian or Other Pacific Islander,” “White,” or “Other” for their first COVID-19 test, then this data will not change—even if a different race/ethnicity is reported for this person for any future COVID-19 test. There are two exceptions to this rule. The first exception is if a person’s race/ethnicity value is reported as “Unknown” on their first test and then on a subsequent test they report “Asian;” "Black or African American;" "Hispanic or Latino/a, all races;" "American Indian or Alaska Native;" "Native Hawaiian or Other Pacific Islander;" or "White”, then this subsequent reported race/ethnicity will overwrite the previous recording of “Unknown”. If a person has only ever selected “Unknown” as their race/ethnicity, then it will be recorded as “Unknown.” This change provides more specific and actionable data on who is tested in San Francisco.
The second exception is if a person ever marks “Hispanic or Latino/a, all races” for race/ethnicity then this choice will always overwrite any previous or future response. This is because it is an overarching category that can include any and all other races and is mutually exclusive with the other responses.
A person's race/ethnicity will be recorded as “Multi-racial” if they select two or more values among the following choices: “Asian,” “Black or African American,” “American Indian or Alaska Native,” “Native Hawaiian or Other Pacific Islander,” “White,” or “Other.” If a person selects a combination of two or more race/ethnicity answers that includes “Hispanic or Latino/a, all races” then they will still be recorded as “Hispanic or Latino/a, all races”—not as “Multi-racial.”
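The assignment rules above (first response fixed, "Unknown" overwritten by a later answer, "Hispanic or Latino/a, all races" always winning, two or more other values giving "Multi-racial") can be sketched as follows. The input shape, one set of selected values per test in chronological order, is a hypothetical assumption. The sketch also assumes a later "Other" overwrites "Unknown", although the first exception above lists only the other categories.

```python
HISPANIC = "Hispanic or Latino/a, all races"
UNKNOWN = "Unknown"


def assign_race_ethnicity(responses: list[set[str]]) -> str:
    """Apply the recording rules to one person's responses, oldest test first (sketch)."""
    # Second exception: Hispanic/Latino on any test overrides every other response
    if any(HISPANIC in selected for selected in responses):
        return HISPANIC
    for selected in responses:
        known = selected - {UNKNOWN}
        if not known:
            continue  # first exception: "Unknown" can be overwritten by a later test
        # The first informative response is kept, even if later tests differ
        return "Multi-racial" if len(known) >= 2 else next(iter(known))
    return UNKNOWN  # only ever "Unknown" (or no responses at all)
```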
C. HOW THE DATASET IS CREATED
COVID-19 laboratory test data is based on electronic laboratory test reports. Deduplication, quality assurance measures and other data verification processes maximize accuracy of laboratory test information.
D. UPDATE PROCESS
Updates automatically at 5:00AM Pacific Time each day. Redundant runs are scheduled at 7:00AM and 9:00AM in case of pipeline failure.
E. HOW TO USE THIS DATASET
San Francisco population estimates for race/ethnicity can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
Due to the high degree of variation in the time needed to complete tests by different labs there is a delay in this reporting. On March 24, 2020 the Health Officer ordered all labs in the City to report complete COVID-19 testing information to the local and state health departments.
In order to track trends over time, a user can analyze this data by sorting or filtering by the "specimen_collection_date" field.
Calculating Percent Positivity: The positivity rate is the percentage of tests that return a positive result for COVID-19 (positive tests divided by the sum of positive and negative tests). Indeterminate results, which could not conclusively determine whether COVID-19 virus was present, are not included in the calculation of percent positive. When there are fewer than 20 positive tests for a given race/ethnicity and time period, the positivity rate is not calculated for the public tracker because rates based on small test counts are less reliable.
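The positivity calculation and its suppression threshold translate directly into a few lines. A minimal sketch; the function name is hypothetical, and `None` stands in for "not calculated for the public tracker".

```python
from typing import Optional

MIN_POSITIVE_TESTS = 20  # below this, the rate is suppressed


def percent_positivity(positive: int, negative: int) -> Optional[float]:
    """Positive / (positive + negative) * 100; indeterminate results are left out entirely."""
    if positive < MIN_POSITIVE_TESTS:
        return None  # too few positive tests for a reliable rate
    return 100 * positive / (positive + negative)
```

For example, 25 positive and 475 negative tests give a positivity rate of 5.0%, while 19 positive tests give no published rate regardless of the negative count.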
Calculating Testing Rates: To calculate the testing rate per 10,000 residents, divide the total number of tests collected (positive, negative, and indeterminate results) for the specified race/ethnicity by the total number of residents who identify as that race/ethnicity (according to the 2016-2020 American Community Survey (ACS) population estimate), then multiply by 10,000. When there are fewer than 20 total tests for a given race/ethnicity and time period, the testing rate is not calculated for the public tracker because rates of small test counts are less reliable.
Read more about how this data is updated and validated daily: https://sf.gov/information/covid-19-data-questions
F. CHANGE LOG