The United States has recently become the country with the most reported cases of 2019 Novel Coronavirus (COVID-19). This dataset contains daily updated numbers of reported cases & deaths in the US at the state and county level, as provided by Johns Hopkins University. In addition, I provide matching demographic information for US counties.
The dataset consists of two main csv files: covid_us_county.csv and us_county.csv. See the column descriptions below for more detailed information. In addition, I've added US county shape files for geospatial plots: us_county.shp/dbf/prj/shx.
covid_us_county.csv: COVID-19 cases and deaths, updated daily. The data is provided by Johns Hopkins University through their excellent GitHub repo. I combined the separate "confirmed cases" and "deaths" files into a single table, removed a few geo identifier columns that I believe to be redundant, and reshaped the data into long format with a single date column. The earliest recorded cases are from 2020-01-22.
us_counties.csv: Demographic information at the US county level, based on the (most recent) 2014-18 release of the American Community Survey. Derived via the great tidycensus package.
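The reshaping described above (two wide tables with one column per date, merged into one long table with a single date column) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code that produced the dataset; the toy tables, their column names, and the `m/d/yy` date headers are assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# Toy stand-ins for the wide source files. The layout (one column per date)
# and all column names are assumptions for illustration; the numbers are made up.
CONFIRMED_WIDE = """fips,county,state,1/22/20,1/23/20
6037,Los Angeles,California,0,1
"""
DEATHS_WIDE = """fips,county,state,1/22/20,1/23/20
6037,Los Angeles,California,0,0
"""

ID_COLUMNS = ("fips", "county", "state")


def wide_to_long(text):
    """Map (fips, county, state, iso_date) to the count found in a wide table."""
    long_rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        ids = tuple(row[column] for column in ID_COLUMNS)
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            # Every non-identifier column header is treated as a date.
            date = datetime.strptime(column, "%m/%d/%y").date().isoformat()
            long_rows[ids + (date,)] = int(value)
    return long_rows


def combine(confirmed_text, deaths_text):
    """Merge both wide tables into one long table keyed by county and date."""
    cases = wide_to_long(confirmed_text)
    deaths = wide_to_long(deaths_text)
    return [
        # deaths is None when the deaths table has no matching county/date.
        dict(zip(ID_COLUMNS + ("date",), key), cases=cases[key], deaths=deaths.get(key))
        for key in sorted(cases)
    ]


long_table = combine(CONFIRMED_WIDE, DEATHS_WIDE)
```

Each element of `long_table` is one county on one date, with both `cases` and `deaths` in the same record.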
COVID-19 dataset covid_us_county.csv:
fips: County code in numeric format (i.e. no leading zeros). A small number of rows have NA values here, but they can still be used for state-wise aggregation. Currently, this only affects the states of Massachusetts and Missouri.
county: Name of the US county. This is NA for the (aggregated counts of the) territories of American Samoa, Guam, Northern Mariana Islands, Puerto Rico, and Virgin Islands.
state: Name of US state or territory.
state_code: Two-letter abbreviation of the US state (e.g. "CA" for "California"). This feature has NA values for the territories listed above.
lat and long: Coordinates of the county or territory.
date: Reporting date.
cases & deaths: Cumulative numbers of cases & deaths.
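Because cases and deaths are cumulative, daily new counts have to be derived by differencing consecutive dates, and rows without a fips value can still contribute to state totals. The sketch below shows one way to do both with the standard library; the rows are invented for illustration, and the helper names `state_totals` and `daily_new` are hypothetical.

```python
from collections import defaultdict

# Toy rows in the long format described above; all numbers are made up.
ROWS = [
    {"fips": 25001, "state": "Massachusetts", "date": "2020-04-01", "cases": 10, "deaths": 0},
    {"fips": 25001, "state": "Massachusetts", "date": "2020-04-02", "cases": 14, "deaths": 1},
    {"fips": None, "state": "Massachusetts", "date": "2020-04-01", "cases": 3, "deaths": 0},
    {"fips": None, "state": "Massachusetts", "date": "2020-04-02", "cases": 5, "deaths": 0},
]


def state_totals(rows):
    """Sum cumulative counts per (state, date), including rows with no fips."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for row in rows:
        key = (row["state"], row["date"])
        totals[key]["cases"] += row["cases"]
        totals[key]["deaths"] += row["deaths"]
    return dict(totals)


def daily_new(totals, field="cases"):
    """Difference a cumulative series between consecutive dates within each state.

    Assumes ISO dates (which sort chronologically as strings) and treats the
    count before the first available date as zero.
    """
    new = {}
    previous = {}
    for state, date in sorted(totals):
        value = totals[(state, date)][field]
        new[(state, date)] = value - previous.get(state, 0)
        previous[state] = value
    return new


totals = state_totals(ROWS)
new_cases = daily_new(totals)
```

With real data, a negative difference would point to a retroactive correction in the source, so it is worth checking for those before plotting.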
Demographic dataset us_counties.csv:
fips, county, state, state_code: Same as above. The county names are slightly different; mostly the difference is that this dataset has the word "County" added. I recommend joining on fips.
male & female: Population numbers for male and female.
population: Total population for the county. Provided as a convenience feature; it is always the sum of male + female.
female_percentage: Another convenience feature: female / population in percent.
median_age: Overall median age for the county.
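As an illustration of joining the two tables on fips and of how the convenience columns relate to male and female, here is a small standard-library sketch. The records and numbers are invented, the helper names are hypothetical, and the rounding tolerance for female_percentage is an assumption, since the description does not say how that column is rounded.

```python
# Toy records using the column names described above; all numbers are made up.
COVID_ROWS = [
    {"fips": 6037, "county": "Los Angeles", "date": "2020-04-01", "cases": 3500},
    {"fips": None, "county": "Unassigned", "date": "2020-04-01", "cases": 12},
]
DEMOGRAPHICS = [
    {"fips": 6037, "county": "Los Angeles County", "male": 4900000,
     "female": 5100000, "population": 10000000, "female_percentage": 51.0},
]


def convenience_columns_ok(row, tolerance=0.01):
    """Check population == male + female and female_percentage == female / population in percent."""
    total_ok = row["population"] == row["male"] + row["female"]
    percentage = 100 * row["female"] / row["population"]
    # tolerance is an assumed allowance for rounding in the published column.
    return total_ok and abs(percentage - row["female_percentage"]) <= tolerance


def cases_per_100k(covid_rows, demographics):
    """Join on fips (not on county name) and scale cases by county population."""
    by_fips = {row["fips"]: row for row in demographics}
    joined = []
    for row in covid_rows:
        demo = by_fips.get(row["fips"])
        if demo is None:
            # Rows with a missing or unmatched fips have no county demographics.
            continue
        rate = 100_000 * row["cases"] / demo["population"]
        joined.append({**row, "population": demo["population"], "cases_per_100k": rate})
    return joined


rates = cases_per_100k(COVID_ROWS, DEMOGRAPHICS)
```

Joining on the county name instead would fail here, because "Los Angeles" and "Los Angeles County" do not match.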
Data provided for educational and academic research purposes by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE).
The GitHub repo states that:
This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.