A. SUMMARY This archived dataset includes data for population characteristics that are no longer being reported publicly. The date on which each population characteristic type was archived can be found in the field “data_loaded_at”.
To access the dataset that continues to refresh daily, navigate to this page: COVID-19 Deaths by Population Characteristics Over Time. The dataset contains data on the following population characteristics that are no longer being reported publicly:
B. HOW THE DATASET IS CREATED COVID-19 deaths are suspected to be associated with COVID-19. This means COVID-19 is listed as a cause of death or significant condition on the death certificate.
Data on the population characteristics of COVID-19 deaths are from: * Case interviews * Laboratories * Medical providers
These multiple streams of data are merged, deduplicated, and undergo data verification processes.
Skilled Nursing Facility (SNF) occupancy
* A Skilled Nursing Facility (SNF) is a type of long-term care facility that provides care to individuals, generally in their 60s and older, who need functional assistance in their daily lives.
* This dataset includes data for COVID-19 deaths reported in Skilled Nursing Facilities (SNFs) through 12/31/2022, archived on 1/5/2023. These data were identified where “Characteristic_Type” = ‘Skilled Nursing Facility Occupancy’.
Sexual orientation * The City began asking adults 18 years old or older for their sexual orientation identification during case interviews as of April 28, 2020. Sexual orientation data prior to this date is unavailable. * The City doesn’t collect or report information about sexual orientation for persons under 12 years of age. * Case investigation interviews transitioned to Virtual Assistant information gathering starting December 2021. The California Department of Public Health Virtual Assistant is only sent to adults who are 18+ years old. Learn more about our data collection guidelines pertaining to sexual orientation.
Comorbidities * Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.
Homelessness Persons are identified as homeless based on several data sources: * self-reported living situation * the location at the time of testing * Department of Public Health homelessness and health databases * Residents in Single-Room Occupancy hotels are not included in these figures. These methods serve as an estimate of persons experiencing homelessness. They may not meet other homelessness definitions.
Single Room Occupancy (SRO) tenancy * SRO buildings are defined by the San Francisco Housing Code as having six or more "residential guest rooms" which may be attached to shared bathrooms, kitchens, and living spaces. * The details of a person's living arrangements are verified during case interviews.
Transmission type * Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, transmission category is recorded as contact with a known case. If they report no contact with a known case, transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, they are counted as unknown.
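The recording rule described above can be sketched as a small function. This is only an illustration of the stated logic, not the City's actual implementation: the function name, its parameters, and the exact label strings are assumptions.

```python
from typing import Optional


def transmission_category(interviewed: bool, contact_answer: Optional[bool]) -> str:
    """Map an interview outcome to a transmission category.

    contact_answer is True/False for a yes/no answer to the close-contact
    question, or None if the question was not asked. The parameter names
    and the returned labels are hypothetical.
    """
    # Not interviewed, or interviewed but never asked the question.
    if not interviewed or contact_answer is None:
        return "Unknown"
    # Reported close contact with a known COVID-19 case.
    if contact_answer:
        return "Contact with a known case"
    # Reported no contact with a known case.
    return "Community transmission"


for outcome in [(True, True), (True, False), (True, None), (False, None)]:
    print(outcome, "->", transmission_category(*outcome))
```

Note that "Unknown" here covers two different situations (no interview, or question not asked), so it should not be read as "no contact".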
C. UPDATE PROCESS This dataset will only update when any population characteristics are archived. Data for existing characteristic types will not change but new characteristic types may be added.
D. HOW TO USE THIS DATASET This dataset may include different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of deaths on each date.
New deaths are the count of deaths within that characteristic group on that specific date. Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.
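As a minimal sketch of the usage described above, the following pandas snippet filters to one characteristic type and rebuilds the running total from the daily counts. The column names (characteristic_type, characteristic_group, new_deaths, date_of_death) and every value in the sample frame are assumptions made for illustration; check them against the actual export before relying on them.

```python
import pandas as pd

# Tiny hand-made sample; all values are invented for illustration only.
df = pd.DataFrame({
    "date_of_death": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-03"] * 2),
    "characteristic_type": ["Transmission Type"] * 3 + ["Homelessness"] * 3,
    "characteristic_group": ["Community"] * 3 + ["Homeless"] * 3,
    "new_deaths": [1, 0, 2, 0, 1, 0],
})

# Step 1: filter to a single topic area.
topic = df[df["characteristic_type"] == "Transmission Type"].copy()

# Step 2: order by date, then take the running total within each group.
topic = topic.sort_values("date_of_death")
topic["cumulative_deaths"] = (
    topic.groupby("characteristic_group")["new_deaths"].cumsum()
)

print(topic[["date_of_death", "characteristic_group", "new_deaths", "cumulative_deaths"]])
```

Summing across different characteristic types would count the same deaths more than once, which is why the filter comes first.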
E. CHANGE LOG
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘COVID-19 Deaths by Population Characteristics Over Time’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/60f5842f-a359-4b03-ad21-1bcfc3bf7fe6 on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Note: On January 22, 2022, system updates to improve the timeliness and accuracy of San Francisco COVID-19 cases and deaths data were implemented. You might see some fluctuations in historic data as a result of this change.
A. SUMMARY This dataset shows San Francisco COVID-19 deaths by population characteristics and by date. Deaths are included on the date the individual died.
Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups.
Data is lagged by five days, meaning the most recent date included is 5 days prior to today. All data update daily as more information becomes available.
B. HOW THE DATASET IS CREATED COVID-19 deaths are suspected to be associated with COVID-19. This means COVID-19 is listed as a cause of death or significant condition on the death certificate.
Data on the population characteristics of COVID-19 deaths are from: * Case interviews * Laboratories * Medical providers
These multiple streams of data are merged, deduplicated, and undergo data verification processes. It takes time to process this data. Because of this, data is lagged by 5 days and death totals for previous days may increase or decrease. More recent data is less reliable.
Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.
Data notes on each population characteristic type are listed below.
Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases.
Sexual orientation * Sexual orientation data is collected from individuals who are 18 years old or older. These individuals can choose whether to provide this information during case interviews. Learn more about our data collection guidelines. * The City began asking for this information on April 28, 2020.
Gender * The City collects information on gender identity using these guidelines.
Comorbidities * Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.
Transmission type * Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, transmission category is recorded as contact with a known case. If they report no contact with a known case, transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, they are counted as unknown.
Homelessness
Persons are identified as homeless based on several data sources:
* self-reported living situation
* the location at the time of testing
* Department of Public Health homelessness and health databases
* Residents in Single-Room Occupancy hotels are not included in these figures.
These methods serve as an estimate of persons experiencing homelessness. They may not meet other homelessness definitions.
Skilled Nursing Facility (SNF) occupancy
* A Skilled Nursing Facility (SNF) is a type of long-term care facility that provides care to individuals, generally in their 60s and older, who need functional assistance in their daily lives.
* Facilities are mandated to report COVID-19 cases or deaths among their residents. The City follows up with these facilities to confirm.
* There may be differences between the City’s SNF data and the California Department of Public Health (CDPH) dashboard. The difference may be because the City and the State use dif
--- Original source retains full ownership of the source dataset ---
As of July 2nd, 2024 the COVID-19 Deaths by Population Characteristics Over Time dataset has been retired. This dataset is archived and will no longer update. We will be publishing a cumulative deaths by population characteristics dataset that will update moving forward.
A. SUMMARY This dataset shows San Francisco COVID-19 deaths by population characteristics and by date. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals for previous days may increase or decrease. More recent data is less reliable.
Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups.
B. HOW THE DATASET IS CREATED As of January 1, 2023, COVID-19 deaths are defined as persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition is in alignment with the California Department of Public Health and the national Council of State and Territorial Epidemiologists (https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf). Death certificates are maintained by the California Department of Public Health.
Data on the population characteristics of COVID-19 deaths are from: *Case reports *Medical records *Electronic lab reports *Death certificates
Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.
To protect resident privacy, we summarize COVID-19 data by only one characteristic at a time. Data are not shown until cumulative citywide deaths reach five or more.
Data notes on each population characteristic type are listed below.
Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases.
Gender * The City collects information on gender identity using these guidelines.
C. UPDATE PROCESS Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week.
Dataset will not update on the business day following any federal holiday.
D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of deaths on each date.
New deaths are the count of deaths within that characteristic group on that specific date. Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.
This data may not be immediately available for more recent deaths. Data updates as more information becomes available.
To explore data on the total number of deaths, use the COVID-19 Deaths Over Time dataset.
E. CHANGE LOG
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths column.
February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
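The 7-day rolling average of new cases per capita used by the hotspot queries above can be approximated in pandas as follows. This is a sketch under stated assumptions, not the AP's query: the column names (state, date, new_cases, population), the per-100,000 scaling, and all sample values are hypothetical.

```python
import pandas as pd

# Ten days of invented data for one hypothetical state.
dates = pd.date_range("2021-01-01", periods=10, freq="D")
df = pd.DataFrame({
    "state": ["A"] * 10,
    "date": dates,
    "new_cases": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    "population": [100_000] * 10,
})

# Rolling windows must be computed in date order within each state.
df = df.sort_values(["state", "date"])
df["avg_7d"] = df.groupby("state")["new_cases"].transform(
    lambda s: s.rolling(window=7, min_periods=7).mean()
)

# Scale the average to cases per 100,000 residents.
df["avg_7d_per_100k"] = df["avg_7d"] * 100_000 / df["population"]

# Latest value per state, highest first, as a simple hotspot ranking.
latest = (
    df.groupby("state").tail(1)
      .sort_values("avg_7d_per_100k", ascending=False)
)
print(latest[["state", "date", "avg_7d_per_100k"]])
```

With min_periods=7 the first six days of each series are left empty rather than averaged over a shorter window, which avoids overstating early trends.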
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
https://datawrapper.dwcdn.net/nRyaf/15/
Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
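To illustrate the inconsistency described above, the sketch below derives daily new counts from a series of cumulative snapshots and flags days where a source revision makes the difference negative. The series name and all numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical cumulative snapshots; the drop on the fourth day mimics
# a source lowering its historical count.
cumulative = pd.Series(
    [100, 105, 112, 110, 118],
    index=pd.date_range("2020-06-01", periods=5, freq="D"),
    name="cumulative_cases",
)

# Day-over-day change; the first day has no previous snapshot, so it is NaN.
new_cases = cumulative.diff()

# Negative values indicate a downward revision, not "negative cases".
revised = new_cases[new_cases < 0]

print(new_cases)
print("Days with downward revisions:")
print(revised)
```

How to treat such days (leave as-is, clip to zero, or redistribute) is an analysis choice that the source text does not prescribe.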
This data should be credited to Johns Hopkins University COVID-19 tracking project
***As of May 2022, these datasets moved from daily updates to weekly updates.*** For greatest accuracy, please use the latest dataset for all analysis and reporting rather than any data you downloaded prior to September 29, 2020. All datasets now reflect counts by test collection date instead of the previously displayed result date. These changes will adjust, for example, the count of cases for each day. PDPH has also added 376 confirmed COVID-19 cases (positive tests) that were previously missing from the data. These are deidentified, aggregate datasets showing COVID deaths by date, zip, race, or age. You can find COVID cases datasets here. To protect the confidentiality of residents, PDPH suppresses the exact data for any category with fewer than 6 counts (i.e., of cases or fatalities).
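The small-count suppression rule mentioned above can be sketched as follows. This is an assumption-laden illustration rather than PDPH's actual process: the column names, the placeholder zip labels, and the counts are all hypothetical, and only the threshold of 6 comes from the text.

```python
import pandas as pd

# Counts below this value are hidden, per the description above.
SUPPRESSION_THRESHOLD = 6

# Invented counts for four placeholder zip codes.
counts = pd.DataFrame({
    "zip": ["zip_1", "zip_2", "zip_3", "zip_4"],
    "deaths": [12, 3, 6, 5],
})

# Keep counts at or above the threshold; replace the rest with a missing value.
counts["deaths_published"] = counts["deaths"].where(
    counts["deaths"] >= SUPPRESSION_THRESHOLD
)

print(counts)
```

When analyzing the published files, suppressed cells should be treated as "between the minimum and the threshold", not as zero.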
A. SUMMARY This dataset shows San Francisco COVID-19 deaths by population characteristics. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals may increase or decrease.
Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups.
B. HOW THE DATASET IS CREATED As of January 1, 2023, COVID-19 deaths are defined as persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition is in alignment with the California Department of Public Health and the national Council of State and Territorial Epidemiologists (https://preparedness.cste.org/wp-content/uploads/2022/12/CSTE-Revised-Classification-of-COVID-19-associated-Deaths.Final_.11.22.22.pdf). Death certificates are maintained by the California Department of Public Health.
Data on the population characteristics of COVID-19 deaths are from: *Case reports *Medical records *Electronic lab reports *Death certificates
Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths.
To protect resident privacy, we summarize COVID-19 data by only one population characteristic at a time. Data are not shown until cumulative citywide deaths reach five or more.
Data notes on select population characteristic types are listed below.
Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases.
Gender * The City collects information on gender identity using these guidelines.
C. UPDATE PROCESS Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week.
Dataset will not update on the business day following any federal holiday.
D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a dataset based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2018-2022 5-year American Community Survey (ACS).
This dataset includes several characteristic types. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of cumulative deaths.
Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.
To explore data on the total number of deaths, use the COVID-19 Deaths Over Time dataset.
E. CHANGE LOG
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
This mapping tool enables you to see how COVID-19 deaths in your area may relate to factors in the local population that research has shown are associated with COVID-19 mortality. It maps COVID-19 death rates for small areas of London (known as MSOAs) and enables you to compare these to a number of other factors, including the Index of Multiple Deprivation, the age and ethnicity of the local population, the extent of pre-existing health conditions in the local population, and occupational data. Research has shown that the mortality risk from COVID-19 is higher for people in older age groups, for men, for people with pre-existing health conditions, and for people from BAME backgrounds. London boroughs had some of the highest mortality rates from COVID-19, based on data to April 17th 2020 from the Office for National Statistics (ONS). Analysis from the ONS has also shown that mortality is related to socio-economic issues such as occupations classified ‘at risk’ and area deprivation. There is much about COVID-19-related mortality that is still not fully understood, including the intersection between the different factors, e.g. the relationship between BAME groups and occupation. On their own, none of these individual factors correlate strongly with deaths for these small areas. This is most likely because the most relevant factors will vary from area to area. In some cases it may relate to the age of the population; in others it may relate to the prevalence of underlying health conditions, area deprivation or the proportion of the population working in ‘at risk’ occupations; and in some cases to a combination of these or none of them. Further descriptive analysis of the factors in this tool can be found here: https://data.london.gov.uk/dataset/covid-19--socio-economic-risk-factors-briefing
The datasets contain a selection of various data on COVID-19 from January 1st 2020 to June 28th 2021
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘COVID-19 Cases by Population Characteristics Over Time’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/a3291d85-0076-43c5-a59c-df49480cdc6d on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Note: On January 22, 2022, system updates to improve the timeliness and accuracy of San Francisco COVID-19 cases and deaths data were implemented. You might see some fluctuations in historic data as a result of this change. Due to the changes, starting on January 22, 2022, the number of new cases reported daily will be higher than under the old system as cases that would have taken longer to process will be reported earlier.
A. SUMMARY This dataset shows San Francisco COVID-19 cases by population characteristics and by specimen collection date. Cases are included on the date the positive test was collected.
Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how cases have been distributed among different subgroups. This information can reveal trends and disparities among groups.
Data is lagged by five days, meaning the most recent specimen collection date included is 5 days prior to today. Tests take time to process and report, so more recent data is less reliable.
B. HOW THE DATASET IS CREATED Data on the population characteristics of COVID-19 cases and deaths are from: * Case interviews * Laboratories * Medical providers
These multiple streams of data are merged, deduplicated, and undergo data verification processes. This data may not be immediately available for recently reported cases because of the time needed to process tests and validate cases. Daily case totals on previous days may increase or decrease. Learn more.
Data are continually updated to maximize completeness of information and reporting on San Francisco residents with COVID-19.
Data notes on each population characteristic type are listed below.
Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases. * The population estimates for the "Other" or “Multi-racial” groups should be considered with caution. The Census definition is likely not exactly aligned with how the City collects this data. For that reason, we do not recommend calculating population rates for these groups.
Sexual orientation * Sexual orientation data is collected from individuals who are 18 years old or older. These individuals can choose whether to provide this information during case interviews. Learn more about our data collection guidelines. * The City began asking for this information on April 28, 2020.
Gender * The City collects information on gender identity using these guidelines.
Comorbidities * Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.
Transmission type * Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, transmission category is recorded as contact with a known case. If they report no contact with a known case, transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, they are counted as unknown.
Homelessness
Persons are identified as homeless based on several data sources:
* self-reported living situation
* the location at the time of testing
* Department of Public Health homelessness and health databases
* Residents in Single-Room Occupancy hotels are not included in these figures.
These methods serve as an estimate of persons experiencing homelessness. They may not meet other homelessness definitions.
Skilled Nursing Facility (SNF) occupancy * A Skilled Nursing
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
Analysis of ‘COVID-19 mortality by vaccination status’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/mathurinache/covid19-mortality-by-vaccination-status on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Why we need to compare the rates of death between vaccinated and unvaccinated
During a pandemic, you might see headlines like “Half of those who died from the virus were vaccinated”.
It would be wrong to draw any conclusions about whether the vaccines are protecting people from the virus based on this headline. The headline is not providing enough information to draw any conclusions.
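A short worked example shows why such a headline is uninformative without population sizes. All figures below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the dataset.

```python
# Hypothetical population: 1,000,000 people, 90% of them vaccinated.
vaccinated_pop = 900_000
unvaccinated_pop = 100_000

# Hypothetical deaths: exactly half occur in each group,
# matching the headline "half of those who died were vaccinated".
vaccinated_deaths = 50
unvaccinated_deaths = 50


def rate_per_100k(deaths: int, population: int) -> float:
    """Deaths per 100,000 people in a group."""
    return deaths * 100_000 / population


vax_rate = rate_per_100k(vaccinated_deaths, vaccinated_pop)
unvax_rate = rate_per_100k(unvaccinated_deaths, unvaccinated_pop)

print(f"Vaccinated:   {vax_rate:.1f} deaths per 100,000")
print(f"Unvaccinated: {unvax_rate:.1f} deaths per 100,000")
print(f"Rate ratio (unvaccinated / vaccinated): {unvax_rate / vax_rate:.1f}")
```

In this invented scenario the death counts are equal, yet the death rate among the unvaccinated is about nine times higher, because the vaccinated group is nine times larger.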
Data comes from https://ourworldindata.org/covid-deaths-by-vaccination Thanks to them for compiling this kind of interesting dataset. If you want to know more, please visit https://ourworldindata.org/covid-deaths-by-vaccination
Exploration Data, Forecasting, Impact of vaccination in USA. Compare Moderna vs Johnson&Johnson vs Moderna
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This dataset has been collected from multiple sources provided by MVCR on their websites and contains daily summarized statistics as well as detailed statistics down to the age and sex level.
* Date - Calendar date when data were collected
* Daily tested - Sum of tests performed
* Daily infected - Sum of confirmed cases that were positive
* Daily cured - Sum of cured people who no longer have Covid-19
* Daily deaths - Sum of people who died of Covid-19
* Daily cum tested - Cumulative sum of tests performed
* Daily cum infected - Cumulative sum of confirmed cases that were positive
* Daily cum cured - Cumulative sum of cured people who no longer have Covid-19
* Daily cum deaths - Cumulative sum of people who died of Covid-19
* Region - Region of the Czech Republic
* Sub-Region - Sub-region of the Czech Republic
* Region accessories qty - Quantity of health care accessories delivered to the region over the whole period
* Age - Age of person
* Sex - Sex of person
* Infected - Sum of infected people for a specific date, region, sub-region, age and sex
* Cured - Sum of cured people for a specific date, region, sub-region, age and sex
* Death - Sum of people who died of Covid-19 for a specific date, region, sub-region, age and sex
The dataset contains data at different levels of granularity. Make sure you do not mix them. Suppose you have loaded the data into a pandas DataFrame called df.
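import pandas as pd

# Daily level: one row per date; max() picks the per-date value,
# assuming the daily totals are repeated on each row of that date.
daily_cols = ['daily_tested', 'daily_infected', 'daily_cured', 'daily_deaths', 'daily_cum_tested', 'daily_cum_infected', 'daily_cum_cured', 'daily_cum_deaths']
df_daily = df.groupby('date')[daily_cols].max().reset_index()

# Region level: skip rows without a region, then aggregate per region.
df_region = df[df['region'] != ''].groupby('region').agg(
    region_accessories_qty=pd.NamedAgg(column='region_accessories_qty', aggfunc='max'),
    infected=pd.NamedAgg(column='infected', aggfunc='sum'),
    cured=pd.NamedAgg(column='cured', aggfunc='sum'),
    death=pd.NamedAgg(column='death', aggfunc='sum')
).reset_index()

# Detail level: keep only the per-person-group columns.
df_detail = df[['date', 'region', 'sub_region', 'age', 'sex', 'infected', 'cured', 'death']].reset_index(drop=True)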
Thanks to websites of MVCR for sharing such great information.
Can you see a relation between the health care accessories delivered to a region and the number of cured/infected in that region? Why does the Czech Republic rank among the fairly safe countries with respect to the Covid-19 pandemic? Can you find out how the pandemic's evolution in the Czech Republic differs from that in surrounding countries, such as Germany or Slovakia?
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘COVID-19 Cases and Deaths Summarized by Geography’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/d2e381bb-f395-4b40-979e-920a79a3db88 on 11 February 2022.
--- Dataset description provided by original source is as follows ---
Note: On January 22, 2022, system updates to improve the timeliness and accuracy of San Francisco COVID-19 cases and deaths data were implemented. You might see some fluctuations in historic data as a result of this change. Due to the changes, starting on January 22, 2022, the number of new cases reported daily will be higher than under the old system as cases that would have taken longer to process will be reported earlier.
Note: As of April 16, 2021, this dataset will update daily with a five-day data lag.
A. SUMMARY Medical provider confirmed COVID-19 cases and confirmed COVID-19 related deaths in San Francisco, CA aggregated by several different geographic areas and normalized by 2019 American Community Survey (ACS) 5-year estimates for population data to calculate rate per 10,000 residents.
Cases and deaths are both mapped to the residence of the individual, not to where they were infected or died. For example, if a person was infected at work in San Francisco but lives in the East Bay, that case is not counted as an SF case. Likewise, if a person dies at Zuckerberg San Francisco General but is from another county, that death is not counted in this dataset.
Dataset is cumulative and covers cases going back to March 2nd, 2020 when testing began.
Geographic areas summarized are: 1. Analysis Neighborhoods 2. Census Tracts 3. Census Zip Code Tabulation Areas
B. HOW THE DATASET IS CREATED Addresses from medical data are geocoded by the San Francisco Department of Public Health (SFDPH). Those addresses are spatially joined to the geographic areas. Counts are generated based on the number of address points that match each geographic area. The 2019 ACS estimates for population provided by the Census are used to create a rate equal to ([count] / [acs_population]) * 10000, representing the number of cases per 10,000 residents.
C. UPDATE PROCESS Geographic analysis is scripted by SFDPH staff and synced to this dataset daily at 7:30 Pacific Time.
D. HOW TO USE THIS DATASET Privacy rules in effect To protect privacy, certain rules are in effect: 1. Case counts greater than 0 and less than 10 are dropped - these will be null (blank) values 2. Death counts greater than 0 and less than 10 are dropped - these will be null (blank) values 3. Cases and deaths dropped altogether for areas where acs_population < 1000
Rate suppression in effect where counts lower than 20 Rates are not calculated unless the case count is greater than or equal to 20. Rates are generally unstable at small numbers, so we avoid calculating them directly. We advise you to apply the same approach as this is best practice in epidemiology.
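The rate formula and the suppression rule described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name, its parameters, and the use of None for a suppressed or missing value are hypothetical and are not taken from the SFDPH scripts.

```python
from typing import Optional


def rate_per_10k(count: Optional[int], acs_population: int,
                 min_count: int = 20) -> Optional[float]:
    """Cases per 10,000 residents, or None when the rate is suppressed.

    Hypothetical helper: a count of None stands for a value already
    dropped by the privacy rules (a null in the dataset).
    """
    if count is None or acs_population <= 0:
        return None
    # Rates are not calculated unless the count is at least 20
    if count < min_count:
        return None
    return (count / acs_population) * 10000


# Example with made-up numbers: 50 cases in an area of 20,000 residents
example_rate = rate_per_10k(50, 20000)
# A count of 12 is below the threshold, so no rate is produced
suppressed = rate_per_10k(12, 20000)
```

Returning None rather than 0 keeps a suppressed rate distinguishable from a true rate of zero when the values are later aggregated or plotted.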
A note on Census ZIP Code Tabulation Areas (ZCTAs) ZIP Code Tabulation Areas are special boundaries created by the U.S. Census based on ZIP Codes developed by the USPS. They are not, however, the same thing. ZCTAs are areal representations of routes. Read how the Census develops ZCTAs on their website.
Row included for Citywide case counts, incidence rate, and deaths A single row is included that has the Citywide case counts and incidence rate. This can be used for comparisons. Citywide will capture all cases regardless of address quality. While some cases cannot be mapped to sub-areas like Census Tracts, ongoing data quality efforts result in improved mapping on a rolling basis.
--- Original source retains full ownership of the source dataset ---
A. SUMMARY This archived dataset includes data for population characteristics that are no longer being reported publicly. The date on which each population characteristic type was archived can be found in the field “data_loaded_at”.
B. HOW THE DATASET IS CREATED Data on the population characteristics of COVID-19 cases are from: * Case interviews * Laboratories * Medical providers These multiple streams of data are merged, deduplicated, and undergo data verification processes.
Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases. * The population estimates for the "Other" or “Multi-racial” groups should be considered with caution. The Census definition is likely not exactly aligned with how the City collects this data. For that reason, we do not recommend calculating population rates for these groups.
Gender * The City collects information on gender identity using these guidelines.
Skilled Nursing Facility (SNF) occupancy * A Skilled Nursing Facility (SNF) is a type of long-term care facility that provides care to individuals, generally in their 60s and older, who need functional assistance in their daily lives. * This dataset includes data for COVID-19 cases reported in Skilled Nursing Facilities (SNFs) through 12/31/2022, archived on 1/5/2023. These data were identified where “Characteristic_Type” = ‘Skilled Nursing Facility Occupancy’.
Sexual orientation * The City began asking adults 18 years old or older for their sexual orientation identification during case interviews as of April 28, 2020. Sexual orientation data prior to this date is unavailable. * The City doesn’t collect or report information about sexual orientation for persons under 12 years of age. * Case investigation interviews transitioned to the California Department of Public Health Virtual Assistant information gathering beginning December 2021. The Virtual Assistant is only sent to adults who are 18+ years old. Learn more about our data collection guidelines pertaining to sexual orientation: https://www.sfdph.org/dph/files/PoliciesProcedures/COM9_SexualOrientationGuidelines.pdf
Comorbidities * Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.
Homelessness Persons are identified as homeless based on several data sources: * self-reported living situation * the location at the time of testing * Department of Public Health homelessness and health databases * Residents in Single-Room Occupancy hotels are not included in these figures. These methods serve as an estimate of persons experiencing homelessness. They may not meet other homelessness definitions.
Single Room Occupancy (SRO) tenancy * SRO buildings are defined by the San Francisco Housing Code as having six or more "residential guest rooms" which may be attached to shared bathrooms, kitchens, and living spaces. * The details of a person's living arrangements are verified during case interviews.
Transmission Type * Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, transmission category is recorded as contact with a known case. If they report no contact with a known case, transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, they are counted as unknown.
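The three-way rule above can be written as a short Python function. This is a sketch only: the function name, the use of None for "not interviewed or not asked", and the exact label strings are assumptions made for illustration, not values confirmed by the dataset.

```python
from typing import Optional


def transmission_category(contact_with_known_case: Optional[bool]) -> str:
    """Map an interview answer to a transmission category.

    None means the case was not interviewed or was not asked the question.
    The returned labels are hypothetical.
    """
    if contact_with_known_case is None:
        return "Unknown"
    if contact_with_known_case:
        return "Contact with a known case"
    return "Community transmission"
```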
C. UPDATE PROCESS This dataset has been archived and will no longer update as of 9/11/2023.
D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco po
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
[ U.S. State-Level Data (Raw CSV) | U.S. County-Level Data (Raw CSV) ]
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real-time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists, and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Data on cumulative coronavirus cases and deaths can be found in two files for states and counties.
Each row of data reports cumulative counts based on our best reporting up to the moment we publish an update. We do our best to revise earlier entries in the data when we receive new information.
Both files contain FIPS codes, a standard geographic identifier, to make it easier for an analyst to combine this data with other data sets like a map file or population data.
Download all the data or clone this repository by clicking the green "Clone or download" button above.
State-level data can be found in the states.csv file. (Raw CSV file here.)
date,state,fips,cases,deaths
2020-01-21,Washington,53,1,0
...
County-level data can be found in the counties.csv file. (Raw CSV file here.)
date,county,state,fips,cases,deaths
2020-01-21,Snohomish,Washington,53061,1,0
...
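Because both files hold cumulative counts, daily new cases have to be derived by differencing within each geography. The following pandas sketch shows one way to do that. Only the header and the first data row come from the samples above; the remaining rows are made-up placeholders for illustration, and the column name new_cases is an assumption.

```python
import io

import pandas as pd

# Inline stand-in for counties.csv. Only the header and first row are from
# the README; the other rows are hypothetical values for illustration.
sample = io.StringIO(
    "date,county,state,fips,cases,deaths\n"
    "2020-01-21,Snohomish,Washington,53061,1,0\n"
    "2020-01-22,Snohomish,Washington,53061,1,0\n"
    "2020-01-23,Snohomish,Washington,53061,3,0\n"
    "2020-01-23,New York City,New York,,5,0\n"
)

# Read FIPS as text so codes keep leading zeros; empty codes become NaN
counties = pd.read_csv(sample, dtype={"fips": str}, parse_dates=["date"])
counties = counties.sort_values(["state", "county", "date"])

# Difference the cumulative counts within each county; the first row of a
# county has no previous value, so it keeps its cumulative count
per_county = counties.groupby(["state", "county"])["cases"]
counties["new_cases"] = per_county.diff().fillna(counties["cases"])
```

Since earlier entries can be revised and cases can move between counties, a derived daily value may occasionally be negative. Grouping on state and county rather than on fips keeps geographies with an empty FIPS code, such as New York City, in the result.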
In some cases, the geographies where cases are reported do not map to standard county boundaries. See the list of geographic exceptions for more detail on these.
The data is the product of dozens of journalists working across several time zones to monitor news conferences, analyze data releases and seek clarification from public officials on how they categorize cases.
It is also a response to a fragmented American public health system in which overwhelmed public servants at the state, county and territorial levels have sometimes struggled to report information accurately, consistently and speedily. On several occasions, officials have corrected information hours or days after first reporting it. At times, cases have disappeared from a local government database, or officials have moved a patient first identified in one state or county to another, often with no explanation. In those instances, which have become more common as the number of cases has grown, our team has made every effort to update the data to reflect the most current, accurate information while ensuring that every known case is counted.
When the information is available, we count patients where they are being treated, not necessarily where they live.
In most instances, the process of recording cases has been straightforward. But because of the patchwork of reporting methods for this data across more than 50 state and territorial governments and hundreds of local health departments, our journalists sometimes had to make difficult interpretations about how to count and record cases.
For those reasons, our data will in some cases not exactly match the information reported by states and counties. Those differences include these cases: When the federal government arranged flights to the United States for Americans exposed to the coronavirus in China and Japan, our team recorded those cases in the states where the patients subsequently were treated, even though local health departments generally did not. When a resident of Florida died in Los Angeles, we recorded her death as having occurred in California rather than Florida, though officials in Florida counted her case in their records. And when officials in some states reported new cases without immediately identifying where the patients were being treated, we attempted to add information about their locations later, once it became available.
Confirmed cases are patients who test positive for the coronavirus. We consider a case confirmed when it is reported by a federal, state, territorial or local government agency.
For each date, we show the cumulative number of confirmed cases and deaths as reported that day in that county or state. All cases and deaths are counted on the date they are first announced.
In some instances, we report data from multiple counties or other non-county geographies as a single county. For instance, we report a single value for New York City, comprising the cases for New York, Kings, Queens, Bronx and Richmond Counties. In these instances, the FIPS code field will be empty. (We may assign FIPS codes to these geographies in the future.) See the list of geographic exceptions.
Cities like St. Louis and Baltimore that are administered separately from an adjacent county of the same name are counted separately.
Many state health departments choose to report cases separately when the patient’s county of residence is unknown or pending determination. In these instances, we record the county name as “Unknown.” As more information about these cases becomes available, the cumulative number of cases in “Unknown” counties may fluctuate.
Sometimes, cases are first reported in one county and then moved to another county. As a result, the cumulative number of cases may change for a given county.
All cases for the five boroughs of New York City (New York, Kings, Queens, Bronx and Richmond counties) are assigned to a single area called New York City.
Four counties (Cass, Clay, Jackson, and Platte) overlap the municipality of Kansas City, Mo. The cases and deaths that we show for these four counties are only for the portions exclusive of Kansas City. Cases and deaths for Kansas City are reported as their own line.
Counts for Alameda County include cases and deaths from Berkeley and the Grand Princess cruise ship.
All cases and deaths for Chicago are reported as part of Cook County.
In general, we are making this data publicly available for broad, noncommercial public use including by medical and public health researchers, policymakers, analysts and local news media.
If you use this data, you must attribute it to “The New York Times” in any publication. If you would like a more expanded description of the data, you could say “Data from The New York Times, based on reports from state and local health agencies.”
If you use it in an online presentation, we would appreciate it if you would link to our U.S. tracking page at https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.
If you use this data, please let us know at covid-data@nytimes.com and indicate if you would be willing to talk to a reporter about your research.
See our LICENSE for the full terms of use for this data.
This license is co-extensive with the Creative Commons Attribution-NonCommercial 4.0 International license, and licensees should refer to that license (CC BY-NC) if they have questions about the scope of the license.
If you have questions about the data or licensing conditions, please contact us at:
covid-data@nytimes.com
Mitch Smith, Karen Yourish, Sarah Almukhtar, Keith Collins, Danielle Ivory, and Amy Harmon have been leading our U.S. data collection efforts.
Data has also been compiled by Jordan Allen, Jeff Arnold, Aliza Aufrichtig, Mike Baker, Robin Berjon, Matthew Bloch, Nicholas Bogel-Burroughs, Maddie Burakoff, Christopher Calabrese, Andrew Chavez, Robert Chiarito, Carmen Cincotti, Alastair Coote, Matt Craig, John Eligon, Tiff Fehr, Andrew Fischer, Matt Furber, Rich Harris, Lauryn Higgins, Jake Holland, Will Houp, Jon Huang, Danya Issawi, Jacob LaGesse, Hugh Mandeville, Patricia Mazzei, Allison McCann, Jesse McKinley, Miles McKinley, Sarah Mervosh, Andrea Michelson, Blacki Migliozzi, Steven Moity, Richard A. Oppel Jr., Jugal K. Patel, Nina Pavlich, Azi Paybarah, Sean Plambeck, Carrie Price, Scott Reinhard, Thomas Rivas, Michael Robles, Alison Saldanha, Alex Schwartz, Libby Seline, Shelly Seroussi, Rachel Shorey, Anjali Singhvi, Charlie Smart, Ben Smithgall, Steven Speicher, Michael Strickland, Albert Sun, Thu Trinh, Tracey Tully, Maura Turcotte, Miles Watkins, Jeremy White, Josh Williams, and Jin Wu.
Since the beginning of the Covid-19 pandemic, I have collected Covid-19 data every day from the daily report issued by the Egyptian Ministry of Health, in order to learn more about the spread of the virus and to assess the pandemic situation in my homeland, Egypt.
The data consists of several columns, and the content of each column is as follows:
New cases: Number of new Covid-19 cases on that day.
New Death: Number of people who died of Covid-19 on that day.
New Recovered: Number of people who recovered from Covid-19 on that day.
New Active cases: New cases minus the sum of New Recovered and New Death, that is, New Active cases = New cases - (New Recovered + New Death).
The data covers the period from 13/5/2020 to 9/7/2021.
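The New Active cases formula can be checked with a few lines of Python. The function and parameter names are assumptions chosen for illustration, and the numbers in the example are made up rather than taken from the dataset.

```python
def new_active_cases(new_cases: int, new_recovered: int, new_deaths: int) -> int:
    """New Active cases = New cases - (New Recovered + New Death)."""
    return new_cases - (new_recovered + new_deaths)


# Made-up day: 100 new cases, 60 recoveries, 5 deaths
example = new_active_cases(100, 60, 5)  # 100 - (60 + 5) = 35
```

The result is negative on days when recoveries and deaths together exceed new cases, which means the number of active cases fell that day.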
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Summary: This is a collection of publicly reported data relevant to the COVID-19 pandemic scraped from state and federal prisons in the United States. Data are collected each night from every state and federal correctional agency’s site that has data available. Data from Massachusetts come directly from the ACLU Massachusetts COVID-19 website (https://data.aclum.org/sjc-12926-tracker/), not the Massachusetts DOC website. Data from a small number of states come from Recidiviz (https://www.recidiviz.org/) whose team manually collects data from these states. Not all dates are available for some states due to websites being down or changes to the website that cause some data to be missed by the scraper.
The data primarily cover the number of people incarcerated in these facilities who have tested positive, negative, recovered, and have died from COVID-19. Many - but not all - states also provide this information for staff members. This dataset includes every variable that any state makes available. While there are dozens of variables in the data, most apply to only a small number of states or a single state.
The data is primarily at the facility-date unit, meaning that each row represents a single prison facility on a single date. The date is the date we scraped the data (we do so each night between 9pm-3am EST) and not necessarily the date the data was updated. While many states update daily, some do so less frequently. As such, you may see some dates for certain states contain the same values. A small number of states do not provide facility-level data, or do so for only a subset of all the variables they make available. In these cases we have also collected state-level data and made that available separately. Please note: When facility data is available, the state-level file combines the aggregated facility-level data with any state-level data that is available. You should therefore use this file when doing a state-level analysis instead of aggregating the facility-level data, as some states report values only at the state level (these states may still have some data at the facility-level), and some states report cumulative numbers at the state level but do not report them at the facility level. As a result, when we identify this, we typically add the cumulative information to the state level file. The state level file is still undergoing quality checks and will be released soon.
These data were scraped from nearly all state and federal prison websites that make their data available each night for several months, and we continue to collect data. Over time some states have changed what variables are available, both adding and removing some variables, as well as the definition of variables. For all states and time periods you are using this data for, please carefully examine the data to detect these kinds of issues. We have spent extensive time doing a careful check of the data to remove any issues we find, primarily ones that could be caused by a scraper not working properly. However, please check all data for issues before using it. Contact us at covidprisondata@gmail.com to let us know if you find any issues, have questions, or if you would like to collaborate on research.
The World Health Organization (WHO) characterized COVID-19, caused by SARS-CoV-2, as a pandemic on March 11, while the exponential increase in the number of cases risked overwhelming health systems around the world with a demand for ICU beds far above the existing capacity, with regions of Italy being prominent examples.
Brazil recorded the first case of SARS-CoV-2 on February 26, and the virus transmission evolved from imported cases only, to local and finally community transmission very rapidly, with the federal government declaring nationwide community transmission on March 20.
Until March 27, the state of São Paulo had recorded 1,223 confirmed cases of COVID-19, with 68 related deaths, while the county of São Paulo, with a population of approximately 12 million people and where Hospital Israelita Albert Einstein is located, had 477 confirmed cases and 30 associated deaths as of March 23. Both the state and the county of São Paulo decided to establish quarantine and social distancing measures that will be enforced at least until early April, in an effort to slow the spread of the virus.
One of the motivations for this challenge is the fact that, in the context of an overwhelmed health system with possible limitations on performing tests for the detection of SARS-CoV-2, testing every case would be impractical, and test results could be delayed even if only a target subpopulation were tested.
This dataset contains anonymized data from patients seen at the Hospital Israelita Albert Einstein, at São Paulo, Brazil, and who had samples collected to perform the SARS-CoV-2 RT-PCR and additional laboratory tests during a visit to the hospital.
All data were anonymized following the best international practices and recommendations. All clinical data were standardized to have a mean of zero and a unit standard deviation.
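Standardizing a variable to a mean of zero and a unit standard deviation is the usual z-score transformation, z = (x - mean) / sd. The sketch below illustrates it in plain Python. It is not the hospital's actual procedure: the function name is hypothetical, and the choices to use the population standard deviation and to leave missing values (None) untouched are assumptions.

```python
import statistics
from typing import List, Optional


def standardize(values: List[Optional[float]]) -> List[Optional[float]]:
    """Z-score the observed values; missing entries (None) stay missing."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mean = statistics.fmean(observed)
    # Assumption: population standard deviation; the source does not say which
    sd = statistics.pstdev(observed)
    if sd == 0:
        # A constant column carries no spread; map observed values to 0.0
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mean) / sd for v in values]


# Made-up lab values with one missing result
z_scores = standardize([1.0, None, 2.0, 3.0])
```

One consequence for modelling is that the published values carry no units, so they can be compared within a column but not converted back to raw laboratory measurements.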
TASK 1 • Predict confirmed COVID-19 cases among suspected cases. Based on the results of laboratory tests commonly collected for a suspected COVID-19 case during a visit to the emergency room, would it be possible to predict the test result for SARS-CoV-2 (positive/negative)?
TASK 2 • Predict admission to general ward, semi-intensive unit or intensive care unit among confirmed COVID-19 cases. Based on the results of laboratory tests commonly collected among confirmed COVID-19 cases during a visit to the emergency room, would it be possible to predict which patients will need to be admitted to a general ward, semi-intensive unit or intensive care unit?
Submit a notebook that implements the full lifecycle of data preparation, model creation and evaluation. Feel free to use this dataset plus any other data you have available. Since this is not a formal competition, you're not submitting a single submission file, but rather your whole approach to building a model.
This is not a formal competition, so we won't measure the results strictly against a given validation set using a strict metric. Rather, what we'd like to see is a well-defined process to build a model that can deliver decent results (evaluated by yourself).
Our team will be looking at: 1. Model Performance - How well does the model perform on the real data? Can it be generalized over time? Can it be applied to other scenarios? Was it overfit? 2. Data Preparation - How well was the data analysed prior to feeding it into the model? Are there any useful visualisations? Does the reader learn any new techniques through this submission? A great entry will be informative, thought provoking, and fresh all at the same time. 3. Documentation - Are your code, and notebook, and additional data sources well documented so a reader can understand what you did? Are your sources clearly cited? A high quality analysis should be concise and clear at each step so the rationale is easy to follow and the process is reproducible.
Additional questions and clarifications can be obtained at data4u@einstein.br
Decision making by health care professionals is a complex process. When physicians see a patient for the first time with an acute complaint (e.g., recent onset of fever and respiratory symptoms), they take a medical history, perform a physical examination, and base their decisions on this information. Whether to order laboratory tests, and which ones to order, is among these decisions, and there is no standard set of tests that is ordered for every individual or for a specific condition. This will depend on the complaints, the findings on the physical examination, personal medical history (e.g., current and prior diagnosed diseases, medications under use, prior surgeries, vaccination), lifestyle habits (e.g., smoking, alcohol use, exercising), family medical history, and prior exposures (e.g., traveling, occupation). The dataset reflects the complexity of decision making during routine clinical care, as opposed to what happens in a more controlled research setting, and data sparsity is, therefore, expected.
We understand that clinical and exposure data, in addition to the laboratory results, are invaluable information to be added to the models, but at this moment they are not available.
A main objective of this challenge is to develop a generalizable model that could be useful during routine clinical care. Although the laboratory exams ordered can vary between individuals, even with the same condition, we aimed to include the laboratory tests most commonly ordered during a visit to the emergency room. So, if you find that some additional laboratory test was not included, it is because it was not considered to be commonly ordered in this situation.
Hospital Israelita Albert Einstein would like to thank you for all the effort and time dedicated to this challenge. The community interest and the number of contributions have surpassed our expectations, and we are extremely satisfied with the results.
These have been challenging times, and we believe that promoting information sharing and collaboration will be crucial to gain insights, as fast as possible, that could help to implement measures to diminish the burden of COVID-19.
The multitude of solutions presented, focusing on different aspects of the problem, could represent a valuable resource in the evaluation of different strategies to implement predictive models for COVID-19. In addition, the data visualization methods employed could make it easier for multidisciplinary teams to collaborate around COVID-19 real-world data.
Although this was not a competition, we would like to highlight some solutions, based on the community and our review of results.
Lucas Moda (https://www.kaggle.com/lukmoda/covid-19-optimizing-recall-with-smote) utilized interesting data visualization methods for the interpretability of models. Fellipe Gomes (https://www.kaggle.com/gomes555/task2-covid-19-admission-ac-94-sens-0-92-auc-0-96) used concise descriptions of the data and model results. We saw interesting ideas for visualizing and understanding the data, like the dendrogram used by CaesarLupum (https://www.kaggle.com/caesarlupum/brazil-against-the-advance-of-covid-19). Ossamu (https://www.kaggle.com/ossamum/eda-and-feat-import-recall-0-95-roc-auc-0-61) also sought to evaluate several data resampling techniques, to verify how it can improve the performance of predictive models, which was also done by Kaike Reis (https://www.kaggle.com/kaikewreis/a-second-end-to-end-solution-for-covid-19) . Jairo Freitas & Christian Espinoza (https://www.kaggle.com/jairofreitas/covid-19-influence-of-exams-in-recall-precision) sought to understand the distribution of exams regarding the outcomes of task 2, to support the decisions to be made in the construction of predictive models.
We thank you all for the feedback on the available data, for helping to show its potential, and for taking on the challenge of dealing with a real data feed. Your efforts leave us with the feeling that it is possible to build good predictive models in real-life healthcare settings.
A. SUMMARY This archived dataset includes data for population characteristics that are no longer being reported publicly. The date on which each population characteristic type was archived can be found in the field “data_loaded_at”.
To access the dataset that continues to refresh daily, navigate to this page: COVID-19 Deaths by Population Characteristics Over Time. The dataset contains data on the following population characteristics that are no longer being reported publicly:
B. HOW THE DATASET IS CREATED COVID-19 deaths are suspected to be associated with COVID-19. This means COVID-19 is listed as a cause of death or significant condition on the death certificate. Data on the population characteristics of COVID-19 deaths are from: * Case interviews * Laboratories * Medical providers These multiple streams of data are merged, deduplicated, and undergo data verification processes. Skilled Nursing Facility (SNF) occupancy * A Skilled Nursing Facility (SNF) is a type of long-term care facility that provides care to individuals, generally in their 60s and older, who need functional assistance in their daily lives. * This dataset includes data for COVID-19 deaths reported in Skilled Nursing Facilities (SNFs) through 12/31/2022, archived on 1/5/2023. These data were identified where “Characteristic_Type” = ‘Skilled Nursing Facility Occupancy’.
Sexual orientation * The City began asking adults 18 years old or older for their sexual orientation identification during case interviews as of April 28, 2020. Sexual orientation data prior to this date is unavailable. * The City doesn’t collect or report information about sexual orientation for persons under 12 years of age. * Case investigation interviews transitioned to Virtual Assistant information gathering starting December 2021. The California Department of Public Health Virtual Assistant is only sent to adults who are 18+ years old. Learn more about our data collection guidelines pertaining to sexual orientation.
Comorbidities * Underlying conditions are reported when a person has one or more underlying health conditions at the time of diagnosis or death.
Homelessness Persons are identified as homeless based on several data sources: * self-reported living situation * the location at the time of testing * Department of Public Health homelessness and health databases * Residents in Single-Room Occupancy hotels are not included in these figures. These methods serve as an estimate of persons experiencing homelessness. They may not meet other homelessness definitions.
Single Room Occupancy (SRO) tenancy * SRO buildings are defined by the San Francisco Housing Code as having six or more "residential guest rooms" which may be attached to shared bathrooms, kitchens, and living spaces. * The details of a person's living arrangements are verified during case interviews.
Transmission type * Information on transmission of COVID-19 is based on case interviews with individuals who have a confirmed positive test. Individuals are asked if they have been in close contact with a known COVID-19 case. If they answer yes, transmission category is recorded as contact with a known case. If they report no contact with a known case, transmission category is recorded as community transmission. If the case is not interviewed or was not asked the question, they are counted as unknown.
C. UPDATE PROCESS This dataset will only update when any population characteristics are archived. Data for existing characteristic types will not change but new characteristic types may be added.
D. HOW TO USE THIS DATASET This dataset may include different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of deaths on each date.
New deaths are the count of deaths within that characteristic group on that specific date. Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed.
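The filtering step and the relationship between new and cumulative deaths can be illustrated with pandas. This is a sketch under stated assumptions: the column names below are hypothetical stand-ins for the dataset's actual field names, and every row is a made-up placeholder value.

```python
import pandas as pd

# Hypothetical column names and made-up rows, for illustration only
deaths = pd.DataFrame({
    "date_of_death": pd.to_datetime(
        ["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02", "2022-01-01"]),
    "characteristic_type": [
        "Transmission Type", "Transmission Type",
        "Transmission Type", "Transmission Type", "Homelessness"],
    "characteristic_group": ["Community", "Unknown", "Community", "Unknown", "Yes"],
    "new_deaths": [2, 1, 3, 0, 1],
})

# Step 1: filter to one topic area
topic = deaths[deaths["characteristic_type"] == "Transmission Type"]
topic = topic.sort_values(["characteristic_group", "date_of_death"])

# Step 2: cumulative deaths are the running total of new deaths per group
running_total = topic.groupby("characteristic_group")["new_deaths"].cumsum()
topic = topic.assign(cumulative_deaths=running_total)
```

Filtering to a single characteristic type before summing matters because the same deaths appear once under each characteristic type, so totals taken across types would count them more than once.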
E. CHANGE LOG