New York has reported the most COVID-19 cases of any state in the U.S. There have also been claims that the flu has caused a much larger, largely unnoticed impact. My dataset allows us to check whether or not this is true according to the most recent data.
The COVID-19 data is from Kaggle, whereas the New York influenza data comes from the U.S. government health data website. I merged the two datasets by county and FIPS code and listed the most recent reported 2020 COVID-19 cases and deaths alongside the known 2019 influenza cases for comparison.
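A merge on county and FIPS code like the one described above could be sketched roughly as follows. This is only an illustration, not the author's actual code: the field names (`fips`, `county`, `cases`, `deaths`, `flu_cases`) and every count are hypothetical placeholders, and the real files may be laid out differently.

```python
def merge_by_fips(covid_rows, flu_rows):
    """Inner-join two lists of dicts on (FIPS code, county name)."""
    # Index the flu rows by a normalised key so that differences in
    # capitalisation between the two sources do not break the match.
    flu_index = {(r["fips"], r["county"].lower()): r for r in flu_rows}
    merged = []
    for row in covid_rows:
        flu = flu_index.get((row["fips"], row["county"].lower()))
        if flu is None:
            continue  # county missing from the flu data: skip it
        merged.append({
            "fips": row["fips"],
            "county": row["county"],
            "covid_cases_2020": row["cases"],
            "covid_deaths_2020": row["deaths"],
            "flu_cases_2019": flu["flu_cases"],
        })
    return merged


# Placeholder rows; the counts are made up for illustration only.
covid_rows = [
    {"fips": "36001", "county": "Albany", "cases": 100, "deaths": 5},
    {"fips": "36005", "county": "Bronx", "cases": 200, "deaths": 10},
]
flu_rows = [
    {"fips": "36001", "county": "ALBANY", "flu_cases": 50},
]

merged = merge_by_fips(covid_rows, flu_rows)
print(merged)
```

Keeping FIPS codes as strings preserves any leading zeros, and an inner join drops counties that appear in only one source, so it is worth checking how many rows survive the merge.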
I am thankful to Kaggle and the U.S. government for making the data behind this dataset openly available.
This data can be extended to address common misconceptions about the relative scale of COVID-19 and the common flu. My inspiration stems from supporting conclusions with data rather than simply intuition.
I would like my data to help answer how we can make U.S. citizens realize which diseases are most impactful.
This dataset is a per-state amalgamation of demographic, public health and other relevant predictors for COVID-19.
The positive, death and totalTestResults fields from the API are used, respectively, for Infected, Deaths and Tested in this dataset.
Please read the documentation of the API for more context on those columns.
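The field mapping described above can be sketched as a small renaming step. The three API field names and the three dataset column names come from the description; the function name, the sample record and its values are hypothetical.

```python
# API field name -> column name used in this dataset.
FIELD_MAP = {
    "positive": "Infected",
    "death": "Deaths",
    "totalTestResults": "Tested",
}


def to_dataset_row(api_record):
    """Keep only the mapped API fields and rename them to dataset columns."""
    # Fields absent from the API record come through as None.
    return {new: api_record.get(old) for old, new in FIELD_MAP.items()}


# Hypothetical API record with placeholder values.
sample = {"positive": 10, "death": 1, "totalTestResults": 100, "other": 3}
row = to_dataset_row(sample)
print(row)
```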
Density is people per meter squared https://worldpopulationreview.com/states/
https://worldpopulationreview.com/states/gdp-by-state/
https://worldpopulationreview.com/states/per-capita-income-by-state/
https://en.wikipedia.org/wiki/List_of_U.S._states_by_Gini_coefficient
Rates are from Feb 2020 and are given as a percentage of the labor force
https://www.bls.gov/web/laus/laumstrk.htm
Ratio is Male / Female
https://www.kff.org/other/state-indicator/distribution-by-gender/
https://worldpopulationreview.com/states/smoking-rates-by-state/
Death rate per 100,000 people
https://www.cdc.gov/nchs/pressroom/sosmap/flu_pneumonia_mortality/flu_pneumonia.htm
Death rate per 100,000 people
https://www.cdc.gov/nchs/pressroom/sosmap/lung_disease_mortality/lung_disease.htm
https://www.kff.org/other/state-indicator/total-active-physicians/
https://www.kff.org/other/state-indicator/total-hospitals
Includes spending for all health care services and products by state of residence. Hospital spending is included and reflects the total net revenue. Costs such as insurance, administration, research, and construction expenses are not included.
https://www.kff.org/other/state-indicator/avg-annual-growth-per-capita/
Pollution: Average exposure of the general public to particulate matter of 2.5 microns or less (PM2.5) measured in micrograms per cubic meter (3-year estimate)
https://www.americashealthrankings.org/explore/annual/measure/air/state/ALL
For each state, number of medium and large airports https://en.wikipedia.org/wiki/List_of_the_busiest_airports_in_the_United_States
Note that FL was incorrect in the table, but is corrected in the Hottest States paragraph
https://worldpopulationreview.com/states/average-temperatures-by-state/
District of Columbia temperature computed as the average of Maryland and Virginia
Urbanization as a percentage of the population https://www.icip.iastate.edu/tables/population/urban-pct-states
https://www.kff.org/other/state-indicator/distribution-by-age/
Schools that haven't closed are marked NaN https://www.edweek.org/ew/section/multimedia/map-coronavirus-and-school-closures.html
Note that some of the datasets above did not contain data for the District of Columbia; this missing data was found via Google searches and entered manually.
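Two of the conventions noted above, the District of Columbia temperature taken as the average of Maryland and Virginia, and NaN for states whose schools have not closed, could be handled along these lines. This is a minimal sketch: the function names, state labels, date format and temperature values are placeholders, not values from the dataset.

```python
import math


def fill_dc_temperature(temps):
    """Return a copy of temps with DC set to the mean of Maryland and Virginia."""
    filled = dict(temps)
    filled["District of Columbia"] = (temps["Maryland"] + temps["Virginia"]) / 2
    return filled


def states_with_closures(closure_dates):
    """Return the states whose school closure date is not NaN."""
    return [
        state
        for state, date in closure_dates.items()
        if not (isinstance(date, float) and math.isnan(date))
    ]


# Placeholder temperatures, for illustration only.
temps = {"Maryland": 54.0, "Virginia": 56.0}
filled = fill_dc_temperature(temps)

# Hypothetical closure dates; NaN marks a state whose schools have not closed.
closures = {"State A": "2020-03-16", "State B": math.nan}
closed = states_with_closures(closures)

print(filled["District of Columbia"], closed)
```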
NNDSS - TABLE 1R. Hepatitis C, perinatal infection to Influenza-associated pediatric mortality - 2020.
In this table, provisional cases* of notifiable diseases are displayed for United States, U.S. territories, and Non-U.S. residents.
Notice: Data from California published in week 29 for years 2019 and 2020 were incomplete when originally published on July 24, 2020. On August 4, 2020, incomplete case counts were replaced with a "U" indicating case counts are not available for specified time period.
Note: This table contains provisional cases of national notifiable diseases from the National Notifiable Diseases Surveillance System (NNDSS). NNDSS data from the 50 states, New York City, the District of Columbia and the U.S. territories are collated and published weekly on the NNDSS Data and Statistics web page (https://wwwn.cdc.gov/nndss/data-and-statistics.html). Cases reported by state health departments to CDC for weekly publication are provisional because of the time needed to complete case follow-up. Therefore, numbers presented in later weeks may reflect changes made to these counts as additional information becomes available. The national surveillance case definitions used to define a case are available on the NNDSS web site at https://wwwn.cdc.gov/nndss/. Information about the weekly provisional data and guides to interpreting data are available at: https://wwwn.cdc.gov/nndss/infectious-tables.html.
Footnotes:
U: Unavailable - The reporting jurisdiction was unable to send the data to CDC or CDC was unable to process the data.
-: No reported cases - The reporting jurisdiction did not submit any cases to CDC.
N: Not reportable - The disease or condition was not reportable by law, statute, or regulation in the reporting jurisdiction.
NN: Not nationally notifiable - This condition was not designated as being nationally notifiable.
NP: Nationally notifiable but not published.
NC: Not calculated - There is insufficient data available to support the calculation of this statistic.
Cum: Cumulative year-to-date counts.
Max: Maximum - Maximum case count during the previous 52 weeks.
* Case counts for reporting years 2019 and 2020 are provisional and subject to change. Cases are assigned to the reporting jurisdiction submitting the case to NNDSS, if the case's country of usual residence is the U.S., a U.S. territory, unknown, or null (i.e. country not reported); otherwise, the case is assigned to the 'Non-U.S. Residents' category. Country of usual residence is currently not reported by all jurisdictions or for all conditions. For further information on interpretation of these data, see https://wwwn.cdc.gov/nndss/document/Users_guide_WONDER_tables_cleared_final.pdf.
† Previous 52 week maximum and cumulative YTD are determined from periods of time when the condition was reportable in the jurisdiction (i.e., may be less than 52 weeks of data or incomplete YTD data).
§ Please refer to the CDC WONDER publication for weekly updates to the footnote for this condition.
http://opendatacommons.org/licenses/dbcl/1.0/
Why did I create this dataset? This is my first time creating a notebook in Kaggle, and I am interested in learning more about COVID-19: how different countries are affected by it, and why. It might be useful to compare different metrics between countries. I also wanted to participate in a challenge, so I decided to join the COVID-19 datasets challenge. While looking through the projects, I noticed https://www.kaggle.com/koryto/countryinfo and it inspired me to start this project.
My approach was to scour the Internet and Kaggle for country data that could potentially have an impact on how the COVID-19 pandemic spreads. I ended up with the following for each country:
See covid19_data - data_sources.csv for data source details.
Notebook: https://www.kaggle.com/bitsnpieces/covid19-data
I did not personally collect each data point, and each data source has different objectives, was collected at a different time, and was measured in a different way, so any inferences from this dataset will need further investigation.
I want to acknowledge the authors of the datasets who made their data publicly available, which has made this project possible. Banner image is by Brian.
I hope that the community finds this dataset useful. Feel free to recommend other datasets that you think will be useful / relevant! Thanks for looking.
Rank, number of deaths, percentage of deaths, and age-specific mortality rates for the leading causes of death, by age group and sex, 2000 to most recent year.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
All-cause, COVID-19, and non-COVID-19 ASDR for ages 25+ by state and time period.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Characteristics of patients hospitalised for COVID-19 and controls.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Post hoc analysis of specific hospitalisation/mortality outcomes within the mental health and cognitive category.