  1. COVID-19 US County JHU Data & Demographics

    • kaggle.com
    Updated Mar 1, 2023
    Cite
    Heads or Tails (2023). COVID-19 US County JHU Data & Demographics [Dataset]. https://www.kaggle.com/headsortails/covid19-us-county-jhu-data-demographics/code
    Dataset updated
    Mar 1, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Heads or Tails
    Area covered
    United States
    Description

    Context

    The United States has recently become the country with the most reported cases of 2019 Novel Coronavirus (COVID-19). This dataset contains the daily updated numbers of reported cases and deaths in the US at the state and county level, as provided by Johns Hopkins University. In addition, I provide matching demographic information for US counties.

    Content

    The dataset consists of two main csv files: covid_us_county.csv and us_county.csv. See the column descriptions below for more detailed information. In addition, I've added US county shape files for geospatial plots: us_county.shp/dbf/prj/shx.

    • covid_us_county.csv: COVID-19 cases and deaths, which will be updated daily. The data is provided by Johns Hopkins University through their excellent GitHub repo. I combined the separate "confirmed cases" and "deaths" files into a single table, removed a few geo identifier columns that I consider redundant, and reshaped the data into long format with a single date column. The earliest recorded cases are from 2020-01-22.

    • us_counties.csv: Demographic information at the US county level based on the (most recent) 2014-18 release of the American Community Survey. Derived via the great tidycensus package.
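
The reshaping step described for covid_us_county.csv (two wide files combined into one long table with a single date column) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code used to build the dataset: the two toy inputs, their column names, and the helper names `wide_to_long` and `combine` are all assumptions made for the example.

```python
import csv
import io

# Toy stand-ins for the two wide source files (one column per date).
# Column names and values are invented for illustration only.
CONFIRMED_WIDE = """fips,county,state,2020-01-22,2020-01-23
1001,Autauga,Alabama,0,1
1003,Baldwin,Alabama,2,3
"""
DEATHS_WIDE = """fips,county,state,2020-01-22,2020-01-23
1001,Autauga,Alabama,0,0
1003,Baldwin,Alabama,0,1
"""

ID_COLUMNS = ("fips", "county", "state")


def wide_to_long(text):
    """Turn one wide table into {(fips, county, state, date): count}."""
    long_rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[c] for c in ID_COLUMNS)
        for column, value in row.items():
            if column not in ID_COLUMNS:  # every remaining column is a date
                long_rows[key + (column,)] = int(value)
    return long_rows


def combine(confirmed_text, deaths_text):
    """Merge both long tables into one list of rows with cases and deaths."""
    cases = wide_to_long(confirmed_text)
    deaths = wide_to_long(deaths_text)
    combined = []
    for key in sorted(cases):
        fips, county, state, date = key
        combined.append({
            "fips": fips, "county": county, "state": state, "date": date,
            "cases": cases[key],
            "deaths": deaths.get(key),  # None if the deaths file lacks this key
        })
    return combined


if __name__ == "__main__":
    for record in combine(CONFIRMED_WIDE, DEATHS_WIDE):
        print(record)
```

Each output row holds one county on one date, which is the long layout the column descriptions refer to.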

    Column Description

    COVID-19 dataset covid_us_county.csv:

    • fips: County code in numeric format (i.e. no leading zeros). A small number of rows have NA values here, but they can still be used for state-wise aggregation. Currently, this only affects the states of Massachusetts and Missouri.

    • county: Name of the US county. This is NA for the (aggregated counts of the) territories of American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and Virgin Islands.

    • state: Name of US state or territory.

    • state_code: Two-letter abbreviation of the US state (e.g. "CA" for "California"). This feature has NA values for the territories listed above.

    • lat and long: Coordinates of the county or territory.

    • date: Reporting date.

    • cases & deaths: Cumulative numbers for cases & deaths.
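
Because cases and deaths are cumulative, and rows with a missing fips still carry a state, a state-wise aggregation followed by a day-over-day difference is a natural first step. The sketch below assumes the long format described above; the sample rows are invented, the `"NA"` string as the missing-value marker is an assumption about how the CSV is read, and `state_totals` and `daily_new` are hypothetical helper names.

```python
from collections import defaultdict

# Invented sample rows in the long format; "NA" marks a missing fips (assumed encoding).
ROWS = [
    {"fips": "25001", "state": "Massachusetts", "date": "2020-04-01", "cases": 10, "deaths": 1},
    {"fips": "NA", "state": "Massachusetts", "date": "2020-04-01", "cases": 5, "deaths": 0},
    {"fips": "25001", "state": "Massachusetts", "date": "2020-04-02", "cases": 14, "deaths": 1},
    {"fips": "NA", "state": "Massachusetts", "date": "2020-04-02", "cases": 6, "deaths": 1},
]


def state_totals(rows):
    """Sum cumulative counts per (state, date); rows without a fips are included."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for row in rows:
        bucket = totals[(row["state"], row["date"])]
        bucket["cases"] += row["cases"]
        bucket["deaths"] += row["deaths"]
    return dict(totals)


def daily_new(totals, field="cases"):
    """Difference consecutive cumulative values per state.

    ISO dates (YYYY-MM-DD) sort correctly as strings. The first date seen for
    a state has no predecessor, so its cumulative value is reported as is.
    """
    new = {}
    previous = {}
    for state, date in sorted(totals):
        value = totals[(state, date)][field]
        new[(state, date)] = value - previous.get(state, 0)
        previous[state] = value
    return new


if __name__ == "__main__":
    totals = state_totals(ROWS)
    print(daily_new(totals, "cases"))
```

Corrections in the source data can make a cumulative series decrease, in which case a difference computed this way comes out negative; how to handle that is left to the analysis.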

    Demographic dataset us_counties.csv:

    • fips, county, state, state_code: same as above. The county names differ slightly, mostly because this dataset adds the word "County". I recommend joining on fips.

    • male & female: Population numbers for male and female.

    • population: Total population for the county. Provided as a convenience feature; it is always the sum of male + female.

    • female_percentage: Another convenience feature: female / population in percent.

    • median_age: Overall median age for the county.
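
A join on fips, as recommended above, might look like the following sketch. The sample rows and population figures are invented, `fips_to_text`, `join_on_fips`, and the `cases_per_100k` field are hypothetical names introduced for illustration, and the zero-padding relies on standard county FIPS codes being five digits long.

```python
# Invented sample rows; note that the county names differ between the two tables.
COVID_ROWS = [
    {"fips": 1001, "county": "Autauga", "state": "Alabama",
     "date": "2020-04-01", "cases": 110, "deaths": 2},
    {"fips": None, "county": "Example", "state": "Massachusetts",
     "date": "2020-04-01", "cases": 5, "deaths": 0},  # missing fips
]
DEMO_ROWS = [
    {"fips": 1001, "county": "Autauga County", "state": "Alabama",
     "male": 27000, "female": 28000},
]


def fips_to_text(fips):
    """Restore the five-digit FIPS string from the numeric code (1001 -> '01001')."""
    return str(int(fips)).zfill(5)


def join_on_fips(covid_rows, demo_rows):
    """Attach demographics to each COVID row that has a matching fips."""
    demo_by_fips = {row["fips"]: row for row in demo_rows}
    joined = []
    for row in covid_rows:
        demo = demo_by_fips.get(row["fips"])
        if demo is None:  # rows without a fips cannot be matched to a county
            continue
        population = demo["male"] + demo["female"]
        joined.append({
            **row,
            "fips_text": fips_to_text(row["fips"]),
            "population": population,
            "female_percentage": 100 * demo["female"] / population,
            "cases_per_100k": 100_000 * row["cases"] / population,
        })
    return joined


if __name__ == "__main__":
    for record in join_on_fips(COVID_ROWS, DEMO_ROWS):
        print(record)
```

Joining on the county name instead would miss this match, since "Autauga" and "Autauga County" are different strings.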

    Acknowledgements

    Data provided for educational and academic research purposes by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE).

    Licence

    The GitHub repo states that:

    This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.
    

    Version history

    • In version 1, a small number of cases had values of `county == "Unassigned"`. Those have been superseded.
    • Version 5: added US county shape files