There has been little research on United States homicide rates from a long-term perspective, primarily because there has been no consistent data series on a particular place preceding the Uniform Crime Reports (UCR), which began its first full year in 1931. To fill this research gap, this project created a data series on homicides per capita for New York City that spans two centuries. The goal was to create a site-specific, individual-based data series that could be used to examine major social shifts related to homicide, such as mass immigration, urban growth, war, demographic changes, and changes in laws. Data were also gathered on various other sites, particularly in England, to allow for comparisons on important issues, such as the post-World War II wave of violence. The basic approach to the data collection was to obtain the best possible estimate of annual counts and the most complete information on individual homicides. The annual count data (Parts 1 and 3) were derived from multiple sources, including the Federal Bureau of Investigation's Uniform Crime Reports and Supplementary Homicide Reports, as well as other official counts from the New York City Police Department and the City Inspector in the early 19th century. The data include a combined count of murder and manslaughter because charge bargaining often blurs this legal distinction. The individual-level data (Part 2) were drawn from coroners' indictments held by the New York City Municipal Archives, and from daily newspapers. Duplication was avoided by keeping a record for each victim. The estimation technique known as "capture-recapture" was used to estimate homicides not listed in either source. Part 1 variables include counts of New York City homicides, arrests, and convictions, as well as the homicide rate, race or ethnicity and gender of victims, type of weapon used, and source of data. Part 2 includes the date of the murder, the age, sex, and race of the offender and victim, and whether the case led to an arrest, trial, conviction, execution, or pardon. Part 3 contains annual homicide counts and rates for various comparison sites including Liverpool, London, Kent, Canada, Baltimore, Los Angeles, Seattle, and San Francisco.
This dataset includes all valid felony, misdemeanor, and violation crimes reported to the New York City Police Department (NYPD) for all complete quarters so far this year (2017). For additional details, please see the attached data dictionary in the ‘About’ section.
List of every shooting incident that occurred in NYC during the current calendar year. This is a breakdown of every shooting incident that occurred in NYC during the current calendar year. This data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents a shooting incident in NYC and includes information about the event, the location and time of occurrence. In addition, information related to suspect and victim demographics is also included. This data can be used by the public to explore the nature of police enforcement activity. Please refer to the attached data footnotes for additional information about this dataset.
The leading causes of death by sex and ethnicity in New York City in since 2007. Cause of death is derived from the NYC death certificate which is issued for every death that occurs in New York City.
Report last ran: 09/24/2019https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Daily inmates in custody with attributes (custody level, mental health designation, race, gender, age, leagal status, sealed status, security risk group membership, top charge, and infraction flag). This data set excludes Sealed Cases. Resulting summaries may differ slightly from other published statistics.
This is a dataset hosted by the City of New York. The city has an open data platform found here and they update their information according the amount of data that is brought in. Explore New York City using Kaggle and all of the data sources available through the City of New York organization page!
This dataset is maintained using Socrata's API and Kaggle's API. Socrata has assisted countless organizations with hosting their open data and has been an integral part of the process of bringing more data to the public.
Cover photo by Fredrik Öhlander on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
Daily count of NYC residents who tested positive for SARS-CoV-2, who were hospitalized with COVID-19, and deaths among COVID-19 patients.
Note that this dataset currently pulls from https://raw.githubusercontent.com/nychealth/coronavirus-data/master/trends/data-by-day.csv on a daily basis.
Daily count of NYC residents who tested positive for SARS-CoV-2, who were hospitalized with COVID-19, and deaths among COVID-19 patients.
Note that this dataset currently pulls from https://raw.githubusercontent.com/nychealth/coronavirus-data/master/case-hosp-death.csv on a daily basis.
THIS DATASET WAS LAST UPDATED AT 2:11 AM EASTERN ON JULY 12
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings . This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
In 2023, the violent crime rate in the United States was 363.8 cases per 100,000 of the population. Even though the violent crime rate has been decreasing since 1990, the United States tops the ranking of countries with the most prisoners. In addition, due to the FBI's transition to a new crime reporting system in which law enforcement agencies voluntarily submit crime reports, data may not accurately reflect the total number of crimes committed in recent years. Reported violent crime rate in the United States The United States Federal Bureau of Investigation tracks the rate of reported violent crimes per 100,000 U.S. inhabitants. In the timeline above, rates are shown starting in 1990. The rate of reported violent crime has fallen since a high of 758.20 reported crimes in 1991 to a low of 363.6 reported violent crimes in 2014. In 2023, there were around 1.22 million violent crimes reported to the FBI in the United States. This number can be compared to the total number of property crimes, roughly 6.41 million that year. Of violent crimes in 2023, aggravated assaults were the most common offenses in the United States, while homicide offenses were the least common. Law enforcement officers and crime clearance Though the violent crime rate was down in 2013, the number of law enforcement officers also fell. Between 2005 and 2009, the number of law enforcement officers in the United States rose from around 673,100 to 708,800. However, since 2009, the number of officers fell to a low of 626,900 officers in 2013. The number of law enforcement officers has since grown, reaching 720,652 in 2023. In 2023, the crime clearance rate in the U.S. was highest for murder and non-negligent manslaughter charges, with around 57.8 percent of murders being solved by investigators and a suspect being charged with the crime. Additionally, roughly 46.1 percent of aggravated assaults were cleared in that year. A statistics report on violent crime in the U.S. can be found here.
This dataset includes all valid felony, misdemeanor, and violation crimes reported to the New York City Police Department (NYPD) for all complete quarters so far this year (2019). For additional details, please see the attached data dictionary in the ‘About’ section.
https://github.com/nytimes/covid-19-data/blob/master/LICENSEhttps://github.com/nytimes/covid-19-data/blob/master/LICENSE
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since the first reported coronavirus case in Washington State on Jan. 21, 2020, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1st, 2020
February 12, 2021
new_deaths
column.February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
@(https://datawrapper.dwcdn.net/nRyaf/15/)
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data - Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count. - Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
This data should be credited to Johns Hopkins University COVID-19 tracking project
https://www.usa.gov/government-workshttps://www.usa.gov/government-works
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance public use dataset has 12 elements for all COVID-19 cases shared with CDC and includes demographics, any exposure history, disease severity indicators and outcomes, presence of any underlying medical conditions and risk behaviors, and no geographic data.
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.
For more information:
NNDSS Supports the COVID-19 Response | CDC.
The deidentified data in the “COVID-19 Case Surveillance Public Use Data” include demographic characteristics, any exposure history, disease severity indicators and outcomes, clinical data, laboratory diagnostic test results, and presence of any underlying medical conditions and risk behaviors. All data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
COVID-19 case reports have been routinely submitted using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19 included. Current versions of these case definitions are available here: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/.
All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for laboratory-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. Case reporting using this new form is ongoing among U.S. states and territories.
To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.
CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<5) records and indirect identifiers (e.g., date of first positive specimen). Suppression includes rare combinations of demographic characteristics (sex, age group, race/ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
For questions, please contact Ask SRRG (eocevent394@cdc.gov).
COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These
This file contains COVID-19 death counts, death rates, and percent of total deaths by jurisdiction of residence. The data is grouped by different time periods including 3-month period, weekly, and total (cumulative since January 1, 2020). United States death counts and rates include the 50 states, plus the District of Columbia and New York City. New York state estimates exclude New York City. Puerto Rico is included in HHS Region 2 estimates.
Deaths with confirmed or presumed COVID-19, coded to ICD–10 code U07.1. Number of deaths reported in this file are the total number of COVID-19 deaths received and coded as of the date of analysis and may not represent all deaths that occurred in that period. Counts of deaths occurring before or after the reporting period are not included in the file.
Data during recent periods are incomplete because of the lag in time between when the death occurred and when the death certificate is completed, submitted to NCHS and processed for reporting purposes. This delay can range from 1 week to 8 weeks or more, depending on the jurisdiction and cause of death.
Death counts should not be compared across states. Data timeliness varies by state. Some states report deaths on a daily basis, while other states report deaths weekly or monthly.
The ten (10) United States Department of Health and Human Services (HHS) regions include the following jurisdictions. Region 1: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont; Region 2: New Jersey, New York, New York City, Puerto Rico; Region 3: Delaware, District of Columbia, Maryland, Pennsylvania, Virginia, West Virginia; Region 4: Alabama, Florida, Georgia, Kentucky, Mississippi, North Carolina, South Carolina, Tennessee; Region 5: Illinois, Indiana, Michigan, Minnesota, Ohio, Wisconsin; Region 6: Arkansas, Louisiana, New Mexico, Oklahoma, Texas; Region 7: Iowa, Kansas, Missouri, Nebraska; Region 8: Colorado, Montana, North Dakota, South Dakota, Utah, Wyoming; Region 9: Arizona, California, Hawaii, Nevada; Region 10: Alaska, Idaho, Oregon, Washington.
Rates were calculated using the population estimates for 2021, which are estimated as of July 1, 2021 based on the Blended Base produced by the US Census Bureau in lieu of the April 1, 2020 decennial population count. The Blended Base consists of the blend of Vintage 2020 postcensal population estimates, 2020 Demographic Analysis Estimates, and 2020 Census PL 94-171 Redistricting File (see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/2020-2021/methods-statement-v2021.pdf).
Rates are based on deaths occurring in the specified week/month and are age-adjusted to the 2000 standard population using the direct method (see https://www.cdc.gov/nchs/data/nvsr/nvsr70/nvsr70-08-508.pdf). These rates differ from annual age-adjusted rates, typically presented in NCHS publications based on a full year of data and annualized weekly/monthly age-adjusted rates which have been adjusted to allow comparison with annual rates. Annualization rates presents deaths per year per 100,000 population that would be expected in a year if the observed period specific (weekly/monthly) rate prevailed for a full year.
Sub-national death counts between 1-9 are suppressed in accordance with NCHS data confidentiality standards. Rates based on death counts less than 20 are suppressed in accordance with NCHS standards of reliability as specified in NCHS Data Presentation Standards for Proportions (available from: https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf.).
This is a breakdown of every arrest effected in NYC by the NYPD during the current year. This data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning. Each record represents an arrest effected in NYC by the NYPD and includes information about the type of crime, the location and time of enforcement. In addition, information related to suspect demographics is also included. This data can be used by the public to explore the nature of police enforcement activity. Please refer to the attached data footnotes for additional information about this dataset.
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
This dataset contains the number of cases, number of in hospital/30 day deaths, observed, expected and risk- adjusted mortality rates for cardiac surgery and percutaneous coronary interventions (PCI) by hospital. Regions represent where the hospitals are located. The initial Health Data NY dataset includes patients discharged between January 1, 2008, and December 31, 2010. Analyses of risk-adjusted mortality rates and associated risk factors are provided for 2010 and for the three-year period from 2008 through 2010. For PCI, analyses of all cases, non-emergency cases (which represent the majority of procedures) and emergency cases are included. Subsequent year reports data will be appended to this dataset. For more information check out: http://www.health.ny.gov/health_care/consumer_information/cardiac_surgery/ or go to the “About” tab.
This is the US Coronavirus data repository from The New York Times . This data includes COVID-19 cases and deaths reported by state and county. The New York Times compiled this data based on reports from state and local health agencies. More information on the data repository is available here . For additional reporting and data visualizations, see The New York Times’ U.S. coronavirus interactive site
Which US counties have the most confirmed cases per capita? This query determines which counties have the most cases per 100,000 residents. Note that this may differ from similar queries of other datasets because of differences in reporting lag, methodologies, or other dataset differences.
SELECT
covid19.county,
covid19.state_name,
total_pop AS county_population,
confirmed_cases,
ROUND(confirmed_cases/total_pop *100000,2) AS confirmed_cases_per_100000,
deaths,
ROUND(deaths/total_pop *100000,2) AS deaths_per_100000
FROM
bigquery-public-data.covid19_nyt.us_counties
covid19
JOIN
bigquery-public-data.census_bureau_acs.county_2017_5yr
acs ON covid19.county_fips_code = acs.geo_id
WHERE
date = DATE_SUB(CURRENT_DATE(),INTERVAL 1 day)
AND covid19.county_fips_code != "00000"
ORDER BY
confirmed_cases_per_100000 desc
How do I calculate the number of new COVID-19 cases per day?
This query determines the total number of new cases in each state for each day available in the dataset
SELECT
b.state_name,
b.date,
MAX(b.confirmed_cases - a.confirmed_cases) AS daily_confirmed_cases
FROM
(SELECT
state_name AS state,
state_fips_code ,
confirmed_cases,
DATE_ADD(date, INTERVAL 1 day) AS date_shift
FROM
bigquery-public-data.covid19_nyt.us_states
WHERE
confirmed_cases + deaths > 0) a
JOIN
bigquery-public-data.covid19_nyt.us_states
b ON
a.state_fips_code = b.state_fips_code
AND a.date_shift = b.date
GROUP BY
b.state_name, date
ORDER BY
date desc
https://www.usa.gov/government-workshttps://www.usa.gov/government-works
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.
Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.
This case surveillance public use dataset has 19 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors.
Currently, CDC provides the public with three versions of COVID-19 case surveillance line-listed data: this 19 data element dataset with geography, a 12 data element public use dataset, and a 33 data element restricted access dataset.
The following apply to the public use datasets and the restricted access dataset:
Overview
The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.
For more information:
NNDSS Supports the COVID-19 Response | CDC.
COVID-19 Case Reports COVID-19 case reports are routinely submitted to CDC by public health jurisdictions using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19. Current versions of these case definitions are available at: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/. All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for lab-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. States and territories continue to use this form.
Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.
To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.
CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<11 COVID-19 case records with a given values). Suppression includes low frequency combinations of case month, geographic characteristics (county and state of residence), and demographic characteristics (sex, age group, race, and ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These and other COVID-19 data are available from multiple public locations: COVID Data Tracker; United States COVID-19 Cases and Deaths by State; COVID-19 Vaccination Reporting Data Systems; and COVID-19 Death Data and Resources.
Notes:
March 1, 2022: The "COVID-19 Case Surveillance Public Use Data with Geography" will be updated on a monthly basis.
April 7, 2022: An adjustment was made to CDC’s cleaning algorithm for COVID-19 line level case notification data. An assumption in CDC's algorithm led to misclassifying deaths that were not COVID-19 related. The algorithm has since been revised, and this dataset update reflects corrected individual level information about death status for all cases collected to date.
June 25, 2024: An adjustment
Note: The cumulative case count for some counties (with small population) is higher than expected due to the inclusion of non-permanent residents in COVID-19 case counts.
Reporting of Aggregate Case and Death Count data was discontinued on May 11, 2023, with the expiration of the COVID-19 public health emergency declaration. Although these data will continue to be publicly available, this dataset will no longer be updated.
Aggregate Data Collection Process Since the beginning of the COVID-19 pandemic, data were reported through a robust process with the following steps:
This process was collaborative, with CDC and jurisdictions working together to ensure the accuracy of COVID-19 case and death numbers. County counts provided the most up-to-date numbers on cases and deaths by report date. Throughout data collection, CDC retrospectively updated counts to correct known data quality issues. CDC also worked with jurisdictions after the end of the public health emergency declaration to finalize county data.
Important note: The counts reflected during a given time period in this dataset may not match the counts reflected for the same time period in the daily archived dataset noted above. Discrepancies may exist due to differences between county and state COVID-19 case surveillance and reconciliation efforts.
The surveillance case definition for COVID-19, a nationally notifiable disease, was first described in a position statement from the Council for State and Territorial Epidemiologists, which was later revised. However, there is some variation in how jurisdictions implement these case classifications. More information on how CDC collects COVID-19 case surveillance data can be found at FAQ: COVID-19 Data and Surveillance.
Confirmed and Probable Counts In this dataset, counts by jurisdiction are not displayed by confirmed or probable status. Instead, counts of confirmed and probable cases and deaths are included in the Total Cases and Total Deaths columns, when available. Not all jurisdictions reported probable cases and deaths to CDC. Confirmed and probable case definition criteria are described here: "https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-covid-19/">Coronavirus Disease 2019 (COVID-19) 2023 Case Definition | CDC Council of State and Territorial Epidemiologists (ymaws.com).
Deaths COVID-19 deaths were reported to CDC from several sources since the beginning of the pandemic including aggregate death data and NCHS Provisional Death Counts. Historic information presented on the COVID Data Tracker pages were based on the same source (Aggregate Data) as the present dataset until the expiration of the public health emergency declaration on May 11, 2023; however, the NCHS Death Counts are based on death certificate data that use information reported by physicians, medical examiners, or coroners in the cause-of-death section of each certificate. Counts from previous weeks were continually revised as more records were received and processed.
Number of Jurisdictions Reporting There were 60 public health jurisdictions that reported cases and deaths of COVID-19. This included the 50 states, the District of Columbia, New York City, the U.S. territories of American Samoa, Guam, the Commonwealth of the Northern Mariana Islands, Puerto Rico, and the U.S Virgin Islands as well as three independent countries in compacts of free association with the United States, Federated States of Micronesia, Republic of the Marshall Islands, and Republic of Palau. In total there were 3,222 counties for which counts were tracked within the 60 public health jurisdictions.
Additional COVID-19 public use datasets, include line-level (patient-level) data, are available at: https://data.cdc.gov/browse?tags=covid-19.
Note: In early 2020, Alaska enacted changes to their counties/boroughs due to low populations in certain areas:
Case and death counts for Yakutat City and Borough, Alaska, are shown as 0 by default. Case and death counts for Hoonah-Angoon Census Area, Alaska, represent total cases and deaths in residents of Hoonah-Angoon Census Area, Alaska, and Yakutat City and Borough, Alaska. Case and death counts for Bristol Bay Borough, Alaska, are shown as 0 by default. Case and death counts for Lake and Peninsula Borough, Alaska, represent total cases and deaths in residents of Lake and Peninsula Borough, Alaska, and Bristol Bay Borough, Alaska.
Historical cases and deaths are not tracked separately in the county level datasets, and differences in weekly new cases and deaths could exist when county-level data are aggregated to the state-level (i.e., when compared to this dataset: https://data.cdc.gov/Case-Surveillance/United-States-COVID-19-Cases-and-Deaths-by-State-o/9mfq-cb36).
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. They are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
As described on the NYTimes Github page.
For each date, we show the cumulative number of confirmed cases and deaths as reported that day in that county or state. All cases and deaths are counted on the date they are first announced.
In some instances, we report data from multiple counties or other non-county geographies as a single county. For instance, we report a single value for New York City, comprising the cases for New York, Kings, Queens, Bronx and Richmond Counties. In these instances the FIPS code field will be empty. (We may assign FIPS codes to these geographies in the future.) See the list of geographic exceptions.
Cities like St. Louis and Baltimore that are administered separately from an adjacent county of the same name are counted separately.
“Unknown” Counties Many state health departments choose to report cases separately when the patient’s county of residence is unknown or pending determination. In these instances, we record the county name as “Unknown.” As more information about these cases becomes available, the cumulative number of cases in “Unknown” counties may fluctuate.
Sometimes, cases are first reported in one county and then moved to another county. As a result, the cumulative number of cases may change for a given county.
Geographic Exceptions New York City All cases for the five boroughs of New York City (New York, Kings, Queens, Bronx and Richmond counties) are assigned to a single area called New York City.
Kansas City, Mo. Four counties (Cass, Clay, Jackson and Platte) overlap the municipality of Kansas City, Mo. The cases and deaths that we show for these four counties are only for the portions exclusive of Kansas City. Cases and deaths for Kansas City are reported as their own line.
Joplin, Mo. Joplin is reported separately from Jasper and Newton Counties.
Chicago All cases and deaths for Chicago are reported as part of Cook County.
Thanks to the New York Times for providing this data. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
The Gitbub repository can be found here: https://github.com/nytimes/covid-19-data
There has been little research on United States homicide rates from a long-term perspective, primarily because there has been no consistent data series on a particular place preceding the Uniform Crime Reports (UCR), which began its first full year in 1931. To fill this research gap, this project created a data series on homicides per capita for New York City that spans two centuries. The goal was to create a site-specific, individual-based data series that could be used to examine major social shifts related to homicide, such as mass immigration, urban growth, war, demographic changes, and changes in laws. Data were also gathered on various other sites, particularly in England, to allow for comparisons on important issues, such as the post-World War II wave of violence. The basic approach to the data collection was to obtain the best possible estimate of annual counts and the most complete information on individual homicides. The annual count data (Parts 1 and 3) were derived from multiple sources, including the Federal Bureau of Investigation's Uniform Crime Reports and Supplementary Homicide Reports, as well as other official counts from the New York City Police Department and the City Inspector in the early 19th century. The data include a combined count of murder and manslaughter because charge bargaining often blurs this legal distinction. The individual-level data (Part 2) were drawn from coroners' indictments held by the New York City Municipal Archives, and from daily newspapers. Duplication was avoided by keeping a record for each victim. The estimation technique known as "capture-recapture" was used to estimate homicides not listed in either source. Part 1 variables include counts of New York City homicides, arrests, and convictions, as well as the homicide rate, race or ethnicity and gender of victims, type of weapon used, and source of data. Part 2 includes the date of the murder, the age, sex, and race of the offender and victim, and whether the case led to an arrest, trial, conviction, execution, or pardon. Part 3 contains annual homicide counts and rates for various comparison sites including Liverpool, London, Kent, Canada, Baltimore, Los Angeles, Seattle, and San Francisco.