AP VoteCast is a survey of the American electorate conducted by NORC at the University of Chicago for Fox News, NPR, PBS NewsHour, Univision News, USA Today Network, The Wall Street Journal and The Associated Press.
AP VoteCast combines interviews with a random sample of registered voters drawn from state voter files with self-identified registered voters selected using nonprobability approaches. In general elections, it also includes interviews with self-identified registered voters conducted using NORC’s probability-based AmeriSpeak® panel, which is designed to be representative of the U.S. population.
Interviews are conducted in English and Spanish. Respondents may receive a small monetary incentive for completing the survey. Participants selected as part of the random sample can be contacted by phone and mail and can take the survey by phone or online. Participants selected as part of the nonprobability sample complete the survey online.
In the 2020 general election, the survey of 133,103 interviews with registered voters was conducted between Oct. 26 and Nov. 3, concluding as polls closed on Election Day. AP VoteCast delivered data about the presidential election in all 50 states as well as all Senate and governors’ races in 2020.
This is survey data and must be properly weighted during analysis. Do not report this data as raw or aggregate numbers. Instead, use statistical software such as R or SPSS to weight the data.
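At its simplest, weighting means multiplying each response by its survey weight before aggregating. A minimal Python sketch with invented answers and weights (these are not actual VoteCast variables or values):

```python
# Weighted proportion: share of the weighted sample giving a target answer.
# Data below is invented for illustration only.

def weighted_proportion(responses, weights, target):
    """Share of weighted respondents giving the `target` answer."""
    total = sum(weights)
    hits = sum(w for r, w in zip(responses, weights) if r == target)
    return hits / total

answers = ["A", "B", "A", "A", "B"]
weights = [1.2, 0.8, 0.5, 1.5, 1.0]   # survey weights, not all equal to 1
share = weighted_proportion(answers, weights, "A")
```

An unweighted tally of the same answers would give 3/5 = 60% for "A"; the weighted estimate differs because respondents do not count equally.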
National Survey
The national AP VoteCast survey of voters and nonvoters in 2020 is based on the results of the 50 state-based surveys and a nationally representative survey of 4,141 registered voters conducted between Nov. 1 and Nov. 3 on the probability-based AmeriSpeak panel. It included 41,776 probability interviews completed online and via telephone, and 87,186 nonprobability interviews completed online. The margin of sampling error is plus or minus 0.4 percentage points for voters and 0.9 percentage points for nonvoters.
State Surveys
In 20 states in 2020, AP VoteCast is based on roughly 1,000 probability-based interviews conducted online and by phone, and roughly 3,000 nonprobability interviews conducted online. In these states, the margin of sampling error is about plus or minus 2.3 percentage points for voters and 5.5 percentage points for nonvoters.
In an additional 20 states, AP VoteCast is based on roughly 500 probability-based interviews conducted online and by phone, and roughly 2,000 nonprobability interviews conducted online. In these states, the margin of sampling error is about plus or minus 2.9 percentage points for voters and 6.9 percentage points for nonvoters.
In the remaining 10 states, AP VoteCast is based on about 1,000 nonprobability interviews conducted online. In these states, the margin of sampling error is about plus or minus 4.5 percentage points for voters and 11.0 percentage points for nonvoters.
Although there is no statistically agreed upon approach for calculating margins of error for nonprobability samples, these margins of error were estimated using a measure of uncertainty that incorporates the variability associated with the poll estimates, as well as the variability associated with the survey weights as a result of calibration. After calibration, the nonprobability sample yields approximately unbiased estimates.
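For intuition, a conventional simple-random-sample margin of error can be computed as below; the published AP VoteCast margins are wider because they also incorporate the variability introduced by calibration weighting:

```python
import math

def simple_moe(n, p=0.5, z=1.96):
    """Conventional simple-random-sample margin of error, in percentage points."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# For roughly 4,000 interviews, the naive margin is about 1.5 points;
# the published figures above are larger once weighting variability is
# folded in.
moe_4000 = simple_moe(4000)
```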
As with all surveys, AP VoteCast is subject to multiple sources of error, including from sampling, question wording and order, and nonresponse.
Sampling Details
Probability-based Registered Voter Sample
In each of the 40 states in which AP VoteCast included a probability-based sample, NORC obtained a sample of registered voters from Catalist LLC’s registered voter database. This database includes demographic information, as well as addresses and phone numbers for registered voters, allowing potential respondents to be contacted via mail and telephone. The sample is stratified by state, partisanship, and a modeled likelihood to respond to the postcard based on factors such as age, race, gender, voting history, and census block group education. In addition, NORC attempted to match sampled records to a registered voter database maintained by L2, which provided additional phone numbers and demographic information.
Prior to dialing, all probability sample records were mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Postcards were addressed by name to the sampled registered voter if that individual was under age 35; postcards were addressed to “registered voter” in all other cases. Telephone interviews were conducted with the adult who answered the phone, following confirmation of registered voter status in the state.
Nonprobability Sample
Nonprobability participants include panelists from Dynata or Lucid, including members of their third-party panels. In addition, some registered voters were selected from the voter file, matched to email addresses by V12, and recruited via an email invitation to the survey. Digital fingerprint software and panel-level ID validation are used to prevent respondents from completing the AP VoteCast survey multiple times.
AmeriSpeak Sample
During the initial recruitment phase of the AmeriSpeak panel, randomly selected U.S. households were sampled with a known, non-zero probability of selection from the NORC National Sample Frame and then contacted by mail, email, telephone and field interviewers (face-to-face). The panel provides sample coverage of approximately 97% of the U.S. household population. Those excluded from the sample include people with P.O. Box-only addresses, some addresses not listed in the U.S. Postal Service Delivery Sequence File and some newly constructed dwellings. Registered voter status was confirmed in field for all sampled panelists.
Weighting Details
AP VoteCast employs a four-step weighting approach that combines the probability sample with the nonprobability sample and refines estimates at a subregional level within each state. In a general election, the 50 state surveys and the AmeriSpeak survey are weighted separately and then combined into a survey representative of voters in all 50 states.
State Surveys
First, weights are constructed separately for the probability sample (when available) and the nonprobability sample for each state survey. These weights are adjusted to population totals to correct for demographic imbalances in age, gender, education and race/ethnicity of the responding sample compared to the population of registered voters in each state. In 2020, the adjustment targets are derived from a combination of data from the U.S. Census Bureau’s November 2018 Current Population Survey Voting and Registration Supplement, Catalist’s voter file and the Census Bureau’s 2018 American Community Survey. Prior to adjusting to population totals, the probability-based registered voter list sample weights are adjusted for differential non-response related to factors such as availability of phone numbers, age, race and partisanship.
Second, all respondents receive a calibration weight. The calibration weight is designed to ensure the nonprobability sample is similar to the probability sample in regard to variables that are predictive of vote choice, such as partisanship or direction of the country, which cannot be fully captured through the prior demographic adjustments. The calibration benchmarks are based on regional level estimates from regression models that incorporate all probability and nonprobability cases nationwide.
Third, all respondents in each state are weighted to improve estimates for substate geographic regions. This weight combines the weighted probability (if available) and nonprobability samples, and then uses a small area model to improve the estimate within subregions of a state.
Fourth, the survey results are weighted to the actual vote count following the completion of the election. This weighting is done in 10–30 subregions within each state.
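The demographic adjustment in step one is an instance of raking (iterative proportional fitting): weights are repeatedly scaled so that each demographic margin matches its population target. A minimal sketch, with invented categories and target shares rather than the actual AP VoteCast targets:

```python
# Raking (iterative proportional fitting) sketch. Categories, shares,
# and starting weights are invented for illustration.

def rake(rows, margins, n_iter=50):
    """rows: list of dicts with category fields and a 'weight'.
    margins: {field: {category: target_share}}; shares sum to 1 per field."""
    total = sum(r["weight"] for r in rows)
    for _ in range(n_iter):
        for field, targets in margins.items():
            for cat, share in targets.items():
                cur = sum(r["weight"] for r in rows if r[field] == cat)
                if cur > 0:
                    factor = share * total / cur
                    for r in rows:
                        if r[field] == cat:
                            r["weight"] *= factor
    return rows

sample = [
    {"age": "18-44", "gender": "F", "weight": 1.0},
    {"age": "18-44", "gender": "M", "weight": 1.0},
    {"age": "45+",   "gender": "F", "weight": 1.0},
]
targets = {"age": {"18-44": 0.5, "45+": 0.5},
           "gender": {"F": 0.55, "M": 0.45}}
rake(sample, targets)
```

After raking, the weighted age and gender margins match the targets even though no single scaling of the raw sample could achieve both at once.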
National Survey
In a general election, the national survey is weighted to combine the 50 state surveys with the nationwide AmeriSpeak survey. Each of the state surveys is weighted as described. The AmeriSpeak survey receives a nonresponse-adjusted weight that is then adjusted to national totals for registered voters that in 2020 were derived from the U.S. Census Bureau’s November 2018 Current Population Survey Voting and Registration Supplement, the Catalist voter file and the Census Bureau’s 2018 American Community Survey. The state surveys are further adjusted to represent their appropriate proportion of the registered voter population for the country and combined with the AmeriSpeak survey. After all votes are counted, the national data file is adjusted to match the national popular vote for president.
https://dataverse-staging.rdmc.unc.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=hdl:1902.29/D-31425
This survey focuses on the congressional election. Issues addressed include approval of President Clinton, Monica Lewinsky, likelihood of voting in the election, and the most important issues in the election. Demographic variables include sex, age, education, race, income, and party affiliation.
Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
April 9, 2020
April 20, 2020
April 29, 2020
September 1, 2020
February 12, 2021 (update note refers to the new_deaths column)
February 16, 2021
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests (and the ability to turn around test results quickly) rather than actual disease spread or true infection rates.
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
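The 7-day rolling average of new cases per 100,000 people used in the hotspot queries above can be sketched in plain Python (figures invented):

```python
# Rolling average of daily new cases, normalized per 100,000 residents.
# The case counts and population below are made up for illustration.

def rolling_avg_per_100k(daily_new_cases, population, window=7):
    out = []
    for i in range(len(daily_new_cases)):
        chunk = daily_new_cases[max(0, i - window + 1): i + 1]
        avg = sum(chunk) / len(chunk)
        out.append(avg / population * 100_000)
    return out

cases = [10, 12, 8, 20, 15, 30, 25, 40]
hotspot_series = rolling_avg_per_100k(cases, population=50_000)
```

As the note above warns, per-capita figures can push very small counties to the top of a ranking, so raw caseloads should be checked alongside this series.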
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>
Johns Hopkins timeseries data:
- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8 p.m. EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.
- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.
This data should be credited to Johns Hopkins University COVID-19 tracking project
The Associated Press NFL Most Valuable Player Award is an annual award which has been presented since 1957 to the NFL player deemed to have been the best during the regular football season. Since the award was first presented, a total of 11 players have won the trophy more than once. Legendary quarterback Peyton Manning won the award a record five times during his career, including four times while playing for the Indianapolis Colts and most recently in 2013 with the Denver Broncos.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
In this paper, we revisit the effect of ballot access laws on voter confidence in the outcome of elections. We argue that voter confidence is conditioned by partisanship: Democrats and Republicans view election laws through a partisan lens, which is especially triggered when their coalitions lose. We used The Integrity of Voting data set, along with other data sets, to test our hypotheses.

The sample frame for the Integrity of Voting Survey was eligible persons who voted in the 2020 presidential election and had accessible internet email addresses. Our sample consisted of two samples from two different vendors. Surveys were conducted with 17,526 voters drawing on two independent samples of registered voters who reported voting in the 2020 presidential election. Email addresses for registered voters in each state were purchased from L2, a commercial vendor specializing in obtaining email addresses for registered voters. Interviews were solicited from one million voters in all 50 states, with 10,770 completed interviews, a response rate of roughly 1.1%. A second sample of internet interviews was solicited and completed with 6,756 2020 voters using Dynata’s proprietary opt-in survey of voters in selected states with smaller populations of registered voters. A minimum of roughly 100 voters from the 2020 election were interviewed in each state.

Our state samples were weighted using a raking technique on age, race, gender, education, and vote-mode demographics from the U.S. Census Bureau’s Voting and Registration in the Election of November 2020 supplement to the Current Population Survey (2021), as well as party identification totals from post-election exit polls conducted by the Associated Press (2020). Surveys were conducted between the first week of December 2020 and the first week of February 2021.
THIS DATASET WAS LAST UPDATED AT 8:10 PM EASTERN ON MARCH 24
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. The summer of 2019 was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the nine public mass shootings during the year were the deadliest type of mass murder, resulting in the deaths of 73 people, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half of them by suicide.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
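A hedged sketch of applying the database's core inclusion rule (four or more victims killed, offender excluded, within a 24-hour window); the data structure and field names here are invented, not the database's actual schema:

```python
# Check whether a set of victim death times meets the mass-killing
# definition: four or more victims within any 24-hour window.
from datetime import datetime, timedelta

def qualifies_as_mass_killing(victim_deaths):
    """victim_deaths: list of datetimes, offender already excluded."""
    deaths = sorted(victim_deaths)
    for start in deaths:
        window = [d for d in deaths if start <= d <= start + timedelta(hours=24)]
        if len(window) >= 4:
            return True
    return False

# Four deaths within a few hours qualifies; three does not.
times = [datetime(2019, 8, 3, h) for h in (1, 2, 3, 5)]
result = qualifies_as_mass_killing(times)
```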
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
Update September 20, 2021: Data and overview updated to reflect data used in the September 15 story Over Half of States Have Rolled Back Public Health Powers in Pandemic. It includes 303 state or local public health leaders who resigned, retired or were fired between April 1, 2020 and Sept. 12, 2021. Previous versions of this dataset reflected data used in the Dec. 2020 and April 2021 stories.
Across the U.S., state and local public health officials have found themselves at the center of a political storm as they combat the worst pandemic in a century. Amid a fractured federal response, the usually invisible army of workers charged with preventing the spread of infectious disease has become a public punching bag.
In the midst of the coronavirus pandemic, at least 303 state or local public health leaders in 41 states have resigned, retired or been fired since April 1, 2020, according to an ongoing investigation by The Associated Press and KHN.
According to experts, that is the largest exodus of public health leaders in American history.
Many left due to political blowback or pandemic pressure, as they became the target of groups that have coalesced around a common goal — fighting and even threatening officials over mask orders and well-established public health activities like quarantines and contact tracing. Some left to take higher profile positions, or due to health concerns. Others were fired for poor performance. Dozens retired. An untold number of lower level staffers have also left.
The result is a further erosion of the nation’s already fragile public health infrastructure, which KHN and the AP documented beginning in 2020 in the Underfunded and Under Threat project.
The AP and KHN found that:
To get total numbers of exits by state, broken down by state and local departments, use this query
KHN and AP counted how many state and local public health leaders have left their jobs between April 1, 2020 and Sept. 12, 2021.
The government tasks public health workers with improving the health of the general population, through their work to encourage healthy living and prevent infectious disease. To that end, public health officials do everything from inspecting water and food safety to testing the nation’s babies for metabolic diseases and contact tracing cases of syphilis.
Many parts of the country have a health officer and a health director/administrator by statute. The analysis counted both of those positions if they existed. For state-level departments, the count tracks people in the top and second-highest-ranking job.
The analysis includes exits of top department officials regardless of reason, because no matter the reason, each left a vacancy at the top of a health agency during the pandemic. Reasons for departures include political pressure, health concerns and poor performance. Others left to take higher profile positions or to retire. Some departments had multiple top officials exit over the course of the pandemic; each is included in the analysis.
Reporters compiled the exit list by reaching out to public health associations and experts in every state and interviewing hundreds of public health employees. They also received information from the National Association of City and County Health Officials, and combed news reports and records.
Public health departments can be found at multiple levels of government. Each state has a department that handles these tasks, but most states also have local departments that operate under either local or state control. The population served by each local health department is calculated using the U.S. Census Bureau 2019 Population Estimates based on each department’s jurisdiction.
KHN and the AP have worked since the spring on a series of stories documenting the funding, staffing and problems around public health. A previous data distribution detailed a decade's worth of cuts to state and local spending and staffing on public health. That data can be found here.
Findings and the data should be cited as: "According to a KHN and Associated Press report."
If you know of a public health official in your state or area who has left that position between April 1, 2020 and Sept. 12, 2021 and isn't currently in our dataset, please contact authors Anna Maria Barry-Jester annab@kff.org, Hannah Recht hrecht@kff.org, Michelle Smith mrsmith@ap.org and Lauren Weber laurenw@kff.org.
During the period under review, the press outlet with the highest number of publications on the war in Ukraine was the newspaper Rzeczpospolita, followed by Gazeta Polska Codziennie and Gazeta Wyborcza.
April 29, 2020
October 13, 2020
The COVID Tracking Project is releasing more precise total testing counts, and has changed the way it is distributing the data that ends up on this site. Previously, total testing had been represented by positive tests plus negative tests. As states are beginning to report more specific testing counts, The COVID Tracking Project is moving toward reporting those numbers directly.
This may make it more difficult to compare your state against others in terms of positivity rate, but the net effect is that we now have more precise counts:
Total Test Encounters: Total tests increase by one for every individual tested that day. Additional tests for that individual on that day (i.e., multiple swabs taken at the same time) are not included
Total PCR Specimens: Total tests increase by one for every testing sample retrieved from an individual. Multiple samples from an individual on a single day can be included in the count
Unique People Tested: Total tests increase by one the first time an individual is tested. The count will not increase in later days if that individual is tested again – even months later
These three totals are not all available for every state. The COVID Tracking Project prioritizes the different count types for each state in this order:
Total Test Encounters
Total PCR Specimens
Unique People Tested
If the state does not provide any of those totals directly, The COVID Tracking Project falls back to the initial calculation of total tests that it has provided up to this point: positive + negative tests.
One of the above total counts will be the number present in the cumulative_total_test_results and total_test_results_increase columns. The positivity rates provided on this site will divide confirmed cases by one of these total_test_results columns.
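The fallback order described above amounts to picking the first available count type for each state. A minimal sketch with illustrative key names (not the project's actual field names):

```python
# Pick the preferred total-tests metric for a state record, falling back
# to positive + negative when no direct total is reported. Key names are
# hypothetical placeholders.

PRIORITY = ["total_test_encounters", "total_pcr_specimens", "unique_people_tested"]

def total_tests(state_record):
    for key in PRIORITY:
        value = state_record.get(key)
        if value is not None:
            return value
    # Original calculation used before direct totals were reported.
    return state_record.get("positive", 0) + state_record.get("negative", 0)

preferred = total_tests({"total_pcr_specimens": 120_000,
                         "positive": 9_000, "negative": 90_000})
```

Because different states land on different count types, comparing positivity rates across states remains imprecise, as the note above explains.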
The AP is using data collected by the COVID Tracking Project to measure COVID-19 testing across the United States.
The COVID Tracking Project data is available at the state level in the United States. The AP has paired this data with population figures and has calculated testing rates and death rates per 1,000 people.
This data is from The COVID Tracking Project API that is updated regularly throughout the day. Like all organizations dealing with data, The COVID Tracking Project is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find The COVID Tracking Project daily data reports, and a clean version of their feed.
A Note on timing:
- The COVID Tracking Project updates regularly throughout the day, but state numbers will come in at different times. The entire Tracking Project dataset will be updated between 4-5 p.m. EDT daily. Keep this time in mind when reporting on stories comparing states. At certain times of day, one state may be more up to date than another. We have included the date_modified timestamp for state-level data, which represents the last time the state updated its data. The date_checked value in the state-level data reflects the last time The COVID Tracking Project checked the state source. We have also included the last_modified timestamp for the national-level data, which marks the last time the national data was updated.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
total_people_tested counts do not include pending tests. They are the total number of tests that have returned positive or negative. This data should be credited to The COVID Tracking Project.
Nicky Forster — nforster@ap.org
A U.S. survey found that Gen Z adults had, for the most part, lower awareness of established news brands than U.S. adults in general, with 16 percent of Gen Z saying they had never heard of NBC and around a third admitting they were not aware of The New Yorker. MSNBC, The Associated Press, Bloomberg, and Breitbart also fared poorly in this respect, though Gen Z's news consumption habits (predominantly online and via social media) mean that this lack of awareness of major brands is less surprising than it may seem.
The Marshall Project, the nonprofit investigative newsroom dedicated to the U.S. criminal justice system, has partnered with The Associated Press to compile data on the prevalence of COVID-19 infection in prisons across the country. The Associated Press is sharing this data as the most comprehensive current national source of COVID-19 outbreaks in state and federal prisons.
Lawyers, criminal justice reform advocates and families of the incarcerated have worried about what was happening in prisons across the nation as coronavirus began to take hold in the communities outside. Data collected by The Marshall Project and AP shows that hundreds of thousands of prisoners, workers, correctional officers and staff have caught the illness as prisons became the center of some of the country’s largest outbreaks. And thousands of people — most of them incarcerated — have died.
In December, as COVID-19 cases spiked across the U.S., the news organizations also shared cumulative rates of infection among prison populations, to better gauge the total effects of the pandemic. The analysis found that by mid-December, one in five state and federal prisoners in the United States had tested positive for the coronavirus, a rate more than four times higher than that of the general population.
This data, which is updated weekly, is an effort to track how those people have been affected and where the crisis has hit the hardest.
The data tracks the number of COVID-19 tests administered to people incarcerated in all state and federal prisons, as well as the staff in those facilities. It is collected on a weekly basis by Marshall Project and AP reporters who contact each prison agency directly and verify published figures with officials.
Each week, the reporters ask every prison agency for the total number of coronavirus tests administered to its staff members and prisoners, the cumulative number who tested positive among staff and prisoners, and the numbers of deaths for each group.
The time series data is aggregated to the system level; there is one record for each prison agency on each date of collection. Not all departments could provide data for the exact date requested, so each record notes the date the figures reflect.
To estimate the rate of infection among prisoners, we collected population data for each prison system before the pandemic (roughly mid-March) and again in April, June, July, August, September and October. Beginning the week of July 28, we updated all prisoner population numbers to reflect the number of incarcerated adults in state or federal prisons. Prior to that, population figures may have included additional populations, such as prisoners housed in other facilities, which were not captured in our COVID-19 data. In states with unified prison and jail systems, we include both detainees awaiting trial and sentenced prisoners.
To estimate the rate of infection among prison employees, we collected staffing numbers for each system. Where current data was not publicly available, we acquired figures through our reporting, including by calling agencies and consulting state budget documents. In six states, we were unable to find recent staffing figures: Alaska, Hawaii, Kentucky, Maryland, Montana and Utah.
To calculate the cumulative COVID-19 impact on prisoner and prison worker populations, we aggregated prisoner and staff COVID case and death data up through Dec. 15. Because population snapshots do not account for movement in and out of prisons since March, and because many systems have significantly slowed the number of new people being sent to prison, it’s difficult to estimate the total number of people who have been held in a state system since March. To be conservative, we calculated our rates of infection using the largest prisoner population snapshots we had during this time period.
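The conservative rate calculation described above reduces to simple arithmetic. A minimal Python sketch, using made-up figures for a single hypothetical prison system:

```python
# Hypothetical figures for one prison system, for illustration only.
# Population snapshots collected between roughly mid-March and October:
population_snapshots = [24_000, 23_200, 22_500, 21_900]

# Cumulative prisoner cases reported through Dec. 15:
cumulative_cases = 5_100

# To be conservative, divide by the LARGEST snapshot, which yields
# the lowest plausible infection rate for the period.
largest_population = max(population_snapshots)
cases_per_100k = cumulative_cases / largest_population * 100_000
```

Dividing by the largest snapshot understates, rather than overstates, the rate, since the true number of people who passed through the system since March is unknown.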
As with all COVID-19 data, our understanding of the spread and impact of the virus is limited by the availability of testing. Epidemiology and public health experts say that aside from a few states that have recently begun aggressively testing in prisons, it is likely that there are more cases of COVID-19 circulating undetected in facilities. Sixteen prison systems, including the Federal Bureau of Prisons, would not release information about how many prisoners they are testing.
Corrections departments in Indiana, Kansas, Montana, North Dakota and Wisconsin report coronavirus testing and case data for juvenile facilities; West Virginia reports figures for juvenile facilities and jails. For consistency of comparison with other state prison systems, we removed from our data those facilities' figures, which had been included prior to July 28. For these states we have also removed staff data. Similarly, Pennsylvania's coronavirus data includes testing and cases for people who have been released on parole. We removed these parolee tests and cases from the prisoner data prior to July 28; the staff cases remain.
There are four tables in this data:
covid_prison_cases.csv
contains weekly time series data on tests, infections and deaths in prisons. The earliest records in the table are dated March 26. Any questions that a prison agency could not or would not answer are left blank.
prison_populations.csv
contains snapshots of the population of people incarcerated in each of these prison systems for whom data on COVID testing and cases are available. This varies by state and may not always be the entire number of people incarcerated in each system. In some states, it may include other populations, such as those on parole or held in state-run jails. This data is primarily for use in calculating rates of testing and infection, and we would not recommend using these numbers to compare the change in how many people are being held in each prison system.
staff_populations.csv
contains a one-time, recent snapshot of the headcount of workers for each prison agency, collected as close to April 15 as possible.
covid_prison_rates.csv
contains the rates of cases and deaths for prisoners. There is one row for every state and federal prison system and an additional row with the national totals.
The Associated Press and The Marshall Project have created several queries to help you use this data:
Get your state's prison COVID data: Provides each week's data from just your state and calculates rates of cases and deaths per 100,000 prisoners and per 100,000 workers here
Rank all systems' most recent data by cases per 100,000 prisoners here
Find what percentage of your state's total cases and deaths -- as reported by Johns Hopkins University -- occurred within the prison system here
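For readers who want to reproduce the per-100,000 math behind those queries, here is a minimal Python sketch. The records and field names below are invented for illustration and do not match the CSV headers exactly:

```python
# Most recent weekly record per system (hypothetical figures),
# shaped loosely like covid_prison_cases.csv:
latest_cases = {
    "Alabama": {"prisoner_cases": 3_800, "prisoner_deaths": 32},
    "Colorado": {"prisoner_cases": 4_600, "prisoner_deaths": 21},
}
# Population snapshots, shaped loosely like prison_populations.csv:
populations = {"Alabama": 21_000, "Colorado": 17_000}

rates = {}
for system, row in latest_cases.items():
    pop = populations[system]
    rates[system] = {
        "cases_per_100k": row["prisoner_cases"] / pop * 100_000,
        "deaths_per_100k": row["prisoner_deaths"] / pop * 100_000,
    }

# Rank systems by case rate, highest first
ranked = sorted(rates, key=lambda s: rates[s]["cases_per_100k"], reverse=True)
```

The same join against staff_populations.csv would yield the per-100,000 worker rates.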
In stories, attribute this data to: “According to an analysis of state prison cases by The Marshall Project, a nonprofit investigative newsroom dedicated to the U.S. criminal justice system, and The Associated Press.”
Many reporters and editors at The Marshall Project and The Associated Press contributed to this data, including: Katie Park, Tom Meagher, Weihua Li, Gabe Isman, Cary Aspinwall, Keri Blakinger, Jake Bleiberg, Andrew R. Calderón, Maurice Chammah, Andrew DeMillo, Eli Hager, Jamiles Lartey, Claudia Lauer, Nicole Lewis, Humera Lodhi, Colleen Long, Joseph Neff, Michelle Pitcher, Alysia Santo, Beth Schwartzapfel, Damini Sharma, Colleen Slevin, Christie Thompson, Abbie VanSickle, Adria Watson, Andrew Welsh-Huggins.
If you have questions about the data, please email The Marshall Project at info+covidtracker@themarshallproject.org or file a GitHub issue.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
At the end of the 2018 fiscal year, the U.S. had resettled 22,491 refugees -- a small fraction of the number of people who had entered in prior years. This is the smallest annual number of refugees since Congress passed a law in 1980 creating the modern resettlement system.
It's also well below the cap of 45,000 set by the administration for 2018, and less than 30 percent of the number granted entry in the final year of Barack Obama’s presidency. It's also significantly below the 30,000 cap announced by President Trump's administration for 2019.
The Associated Press is updating its data on refugees through fiscal year 2018, which ended Sept. 30, to help reporters continue coverage of this story. Previous Associated Press data on refugees can be found here.
Data obtained from the State Department's Bureau of Population, Refugees and Migration show that the mix of refugees also has changed substantially.
The past fiscal year marks a dramatic change in the refugee program, with only a fraction as many people entering. That affects refugees currently in the U.S., who may be waiting on relatives to arrive. It affects refugees in other countries, hoping to get to the United States for safety or other reasons. And it affects the organizations that work to house and resettle these refugees, who only a few years ago were dealing with record numbers of people. Several agencies have already closed their doors; others have laid off workers and cut back their programs.
Because there are wide geographic variations in resettlement depending on refugees' country of origin, some U.S. cities have been more affected than others. For instance, in past years, Iraqis have resettled most often in San Diego, Calif., or Houston. Now, with only a handful of Iraqis admitted in 2018, those cities have seen some of the biggest drop-offs in resettlement numbers.
Datasheets include:
The data tracks the refugees' stated destination in the United States. In many cases, this is where the refugees first lived, although many may have since moved.
Be aware that some cities with particularly high totals may be the locations of refugee resettlement programs -- for instance, Glendale, Calif., is home to both Catholic Charities of Los Angeles and the International Rescue Committee of Los Angeles, which work at resettling refugees.
The data for refugees from other countries -- or for any particular timeframe since 2002 -- can be accessed through the State Department's Refugee Processing Center's site by clicking on "Arrivals by Destination and Nationality."
The Refugee Processing Center used to publish a state-by-state list of affiliate refugee organizations -- the groups that help refugees settle in the U.S. That list was last updated in January 2017, so it may now be out of date. It can be found here.
For general information about the U.S. refugee resettlement program, see this State Department description. For more detailed information about the program and proposed 2018 caps and changes, see the FY 2018 Report to Congress.
The Associated Press has set up a number of pre-written queries to help you filter this data and find local stories. Queries can be accessed by clicking on their names in the upper right hand bar.
AP VoteCast is a survey of the American electorate conducted by NORC at the University of Chicago for Fox News, NPR, PBS NewsHour, Univision News, USA Today Network, The Wall Street Journal and The Associated Press.
AP VoteCast combines interviews with a random sample of registered voters drawn from state voter files with self-identified registered voters selected using nonprobability approaches. In general elections, it also includes interviews with self-identified registered voters conducted using NORC’s probability-based AmeriSpeak® panel, which is designed to be representative of the U.S. population.
Interviews are conducted in English and Spanish. Respondents may receive a small monetary incentive for completing the survey. Participants selected as part of the random sample can be contacted by phone and mail and can take the survey by phone or online. Participants selected as part of the nonprobability sample complete the survey online.
In the 2020 general election, the survey of 133,103 interviews with registered voters was conducted between Oct. 26 and Nov. 3, concluding as polls closed on Election Day. AP VoteCast delivered data about the presidential election in all 50 states as well as all Senate and governors’ races in 2020.
This is survey data and must be properly weighted during analysis: DO NOT REPORT THIS DATA AS RAW OR AGGREGATE NUMBERS!!
Instead, use statistical software such as R or SPSS to weight the data.
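As a minimal illustration of why raw tallies mislead, here is a Python sketch with made-up responses; the "weight" field name is illustrative (the actual weight variable's name depends on the data file), and real analyses should use survey software that also handles variance estimation:

```python
# Made-up survey responses: each record has an answer and a survey weight.
responses = [
    {"choice": "A", "weight": 0.6},
    {"choice": "A", "weight": 0.8},
    {"choice": "B", "weight": 1.9},
    {"choice": "B", "weight": 0.7},
]

total_weight = sum(r["weight"] for r in responses)
# Weighted share: sum of weights for choice A over the total weight.
weighted_share_a = (
    sum(r["weight"] for r in responses if r["choice"] == "A") / total_weight
)
# Raw (unweighted) share: a simple count, which must NOT be reported.
raw_share_a = sum(1 for r in responses if r["choice"] == "A") / len(responses)
```

Here the raw tally says 50 percent chose A, while the weighted estimate is 35 percent, which is why unweighted numbers must never be reported.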
National Survey
The national AP VoteCast survey of voters and nonvoters in 2020 is based on the results of the 50 state-based surveys and a nationally representative survey of 4,141 registered voters conducted between Nov. 1 and Nov. 3 on the probability-based AmeriSpeak panel. It included 41,776 probability interviews completed online and via telephone, and 87,186 nonprobability interviews completed online. The margin of sampling error is plus or minus 0.4 percentage points for voters and 0.9 percentage points for nonvoters.
State Surveys
In 20 states in 2020, AP VoteCast is based on roughly 1,000 probability-based interviews conducted online and by phone, and roughly 3,000 nonprobability interviews conducted online. In these states, the margin of sampling error is about plus or minus 2.3 percentage points for voters and 5.5 percentage points for nonvoters.
In an additional 20 states, AP VoteCast is based on roughly 500 probability-based interviews conducted online and by phone, and roughly 2,000 nonprobability interviews conducted online. In these states, the margin of sampling error is about plus or minus 2.9 percentage points for voters and 6.9 percentage points for nonvoters.
In the remaining 10 states, AP VoteCast is based on about 1,000 nonprobability interviews conducted online. In these states, the margin of sampling error is about plus or minus 4.5 percentage points for voters and 11.0 percentage points for nonvoters.
Although there is no statistically agreed-upon approach for calculating margins of error for nonprobability samples, these margins of error were estimated using a measure of uncertainty that incorporates the variability associated with the poll estimates, as well as the variability associated with the survey weights as a result of calibration. After calibration, the nonprobability sample yields approximately unbiased estimates.
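One widely used way to fold weight variability into a margin of error is Kish's approximate design effect, deff = 1 + cv(w)^2, where cv(w) is the coefficient of variation of the weights. The sketch below uses that generic approximation; it is not necessarily the exact estimator used for AP VoteCast:

```python
import math

def weighted_moe(weights, p=0.5, z=1.96):
    """95% margin of error for a proportion p, inflated by Kish's
    approximate design effect deff = 1 + cv(w)^2."""
    n = len(weights)
    mean_w = sum(weights) / n
    var_w = sum((w - mean_w) ** 2 for w in weights) / n
    deff = 1 + var_w / mean_w ** 2
    return z * math.sqrt(p * (1 - p) * deff / n)

# Equal weights: deff = 1, so this reduces to the textbook formula.
moe_equal = weighted_moe([1.0] * 400)
# Unequal weights inflate the margin of error.
moe_unequal = weighted_moe([0.5, 1.5] * 200)
```

With 400 equally weighted interviews the margin is about plus or minus 4.9 points; the more the weights vary, the larger it grows.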
As with all surveys, AP VoteCast is subject to multiple sources of error, including from sampling, question wording and order, and nonresponse.
Sampling Details
Probability-based Registered Voter Sample
In each of the 40 states in which AP VoteCast included a probability-based sample, NORC obtained a sample of registered voters from Catalist LLC’s registered voter database. This database includes demographic information, as well as addresses and phone numbers for registered voters, allowing potential respondents to be contacted via mail and telephone. The sample is stratified by state, partisanship, and a modeled likelihood to respond to the postcard based on factors such as age, race, gender, voting history, and census block group education. In addition, NORC attempted to match sampled records to a registered voter database maintained by L2, which provided additional phone numbers and demographic information.
Prior to dialing, all probability sample records were mailed a postcard inviting them to complete the survey either online using a unique PIN or via telephone by calling a toll-free number. Postcards were addressed by name to the sampled registered voter if that individual was under age 35; postcards were addressed to “registered voter” in all other cases. Telephone interviews were conducted with the adult who answered the phone following confirmation of registered voter status in the state.
Nonprobability Sample
Nonprobability participants include panelists from Dynata or Lucid, including members of their third-party panels. In addition, some registered voters were selected from the voter file, matched to email addresses by V12, and recruited via an email invitation to the survey. Digital fingerprint software and panel-level ID validation are used to prevent respondents from completing the AP VoteCast survey multiple times.
AmeriSpeak Sample
During the initial recruitment phase of the AmeriSpeak panel, randomly selected U.S. households were sampled with a known, non-zero probability of selection from the NORC National Sample Frame and then contacted by mail, email, telephone and field interviewers (face-to-face). The panel provides sample coverage of approximately 97% of the U.S. household population. Those excluded from the sample include people with P.O. Box-only addresses, some addresses not listed in the U.S. Postal Service Delivery Sequence File and some newly constructed dwellings. Registered voter status was confirmed in field for all sampled panelists.
Weighting Details
AP VoteCast employs a four-step weighting approach that combines the probability sample with the nonprobability sample and refines estimates at a subregional level within each state. In a general election, the 50 state surveys and the AmeriSpeak survey are weighted separately and then combined into a survey representative of voters in all 50 states.
State Surveys
First, weights are constructed separately for the probability sample (when available) and the nonprobability sample for each state survey. These weights are adjusted to population totals to correct for demographic imbalances in age, gender, education and race/ethnicity of the responding sample compared to the population of registered voters in each state. In 2020, the adjustment targets are derived from a combination of data from the U.S. Census Bureau’s November 2018 Current Population Survey Voting and Registration Supplement, Catalist’s voter file and the Census Bureau’s 2018 American Community Survey. Prior to adjusting to population totals, the probability-based registered voter list sample weights are adjusted for differential non-response related to factors such as availability of phone numbers, age, race and partisanship.
Second, all respondents receive a calibration weight. The calibration weight is designed to ensure the nonprobability sample is similar to the probability sample in regard to variables that are predictive of vote choice, such as partisanship or direction of the country, which cannot be fully captured through the prior demographic adjustments. The calibration benchmarks are based on regional level estimates from regression models that incorporate all probability and nonprobability cases nationwide.
Third, all respondents in each state are weighted to improve estimates for substate geographic regions. This weight combines the weighted probability (if available) and nonprobability samples, and then uses a small area model to improve the estimate within subregions of a state.
Fourth, the survey results are weighted to the actual vote count following the completion of the election. This weighting is done in 10–30 subregions within each state.
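The demographic adjustments in the steps above are forms of calibration. Below is a generic sketch of one common calibration technique, raking (iterative proportional fitting), with invented categories and targets; it illustrates the idea only and is not NORC's implementation:

```python
# Generic raking sketch: repeatedly rescale weights so weighted totals
# match target totals on each dimension in turn. Categories and targets
# here are made up for illustration.
def rake(rows, targets, dims, iterations=50):
    """rows: dicts with a 'weight' key and one key per dim.
    targets: {dim: {category: desired weighted total}}."""
    for _ in range(iterations):
        for dim in dims:
            # Current weighted total in each category of this dimension
            totals = {}
            for r in rows:
                totals[r[dim]] = totals.get(r[dim], 0.0) + r["weight"]
            # Scale each weight so category totals match the targets
            for r in rows:
                r["weight"] *= targets[dim][r[dim]] / totals[r[dim]]
    return rows

sample = [
    {"age": "18-44", "educ": "no_degree", "weight": 1.0},
    {"age": "18-44", "educ": "degree", "weight": 1.0},
    {"age": "45+", "educ": "no_degree", "weight": 1.0},
    {"age": "45+", "educ": "degree", "weight": 1.0},
]
targets = {
    "age": {"18-44": 1.8, "45+": 2.2},
    "educ": {"no_degree": 2.6, "degree": 1.4},
}
rake(sample, targets, ["age", "educ"])
```

After raking, the weighted totals in each age and education category match the targets, correcting the kind of demographic imbalance described above.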
National Survey
In a general election, the national survey is weighted to combine the 50 state surveys with the nationwide AmeriSpeak survey. Each of the state surveys is weighted as described. The AmeriSpeak survey receives a nonresponse-adjusted weight that is then adjusted to national totals for registered voters that in 2020 were derived from the U.S. Census Bureau’s November 2018 Current Population Survey Voting and Registration Supplement, the Catalist voter file and the Census Bureau’s 2018 American Community Survey. The state surveys are further adjusted to represent their appropriate proportion of the registered voter population for the country and combined with the AmeriSpeak survey. After all votes are counted, the national data file is adjusted to match the national popular vote for president.