Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.
Update log: April 9, 2020; April 20, 2020; April 29, 2020; September 1, 2020; February 12, 2021 (new_deaths column); February 16, 2021.
The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.
The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.
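The per-100,000 rates described above reduce to a simple calculation; a minimal sketch in Python (the figures are illustrative, not real county data):

```python
def rate_per_100k(count: int, population: int) -> float:
    """Caseload or death rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so integer inputs divide exactly where possible
    return count * 100_000 / population

# Illustrative county: 1,250 cases among 500,000 residents
print(rate_per_100k(1_250, 500_000))  # 250.0
```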
This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.
The AP is updating this dataset hourly at 45 minutes past the hour.
To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.
Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic
Filter cases by state here
Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac
Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases per capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true
Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.
Pull the 100 counties with the highest per-capita confirmed cases here
Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
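The hotspot queries above compute a 7-day rolling average of new cases per capita; the same statistic can be sketched in plain Python (the population figure and input series are illustrative):

```python
from collections import deque

def rolling_new_cases_per_100k(daily_new_cases, population, window=7):
    """7-day rolling average of new cases per 100,000 people.
    Emits one value per day once a full window is available."""
    recent = deque(maxlen=window)
    out = []
    for n in daily_new_cases:
        recent.append(n)
        if len(recent) == window:
            # average over the window, scaled to per-100k
            out.append(sum(recent) / window * 100_000 / population)
    return out

# Hypothetical county of 100,000 people reporting 10 new cases a day
print(rolling_new_cases_per_100k([10] * 8, 100_000))  # [10.0, 10.0]
```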
The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.
The interactive map ("Mapping COVID-19 cases by county") is published at https://datawrapper.dwcdn.net/nRyaf/15/.
Johns Hopkins timeseries data:
Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates its historical data for accuracy, either increasing or decreasing the latest cumulative count.
Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here.
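Because the timeseries stores cumulative snapshots, daily new counts have to be derived by differencing, and a downward historical revision can produce a negative "new" value; a minimal sketch:

```python
def daily_new_counts(cumulative):
    """Difference successive cumulative snapshots to get daily new counts.
    A negative value signals that the source revised its history downward;
    it is reported rather than silently clamped to zero."""
    return [curr - prev for prev, curr in zip(cumulative, cumulative[1:])]

# Day 4's snapshot was revised below day 3's, producing a negative diff
print(daily_new_counts([100, 110, 125, 120]))  # [10, 15, -5]
```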
This data should be credited to the Johns Hopkins University COVID-19 tracking project.
Number and percentage of deaths, by month and place of residence, 1991 to most recent year.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Provisional counts of the number of deaths registered in England and Wales, by age, sex, region and Index of Multiple Deprivation (IMD), in the latest weeks for which data are available.
THIS DATASET WAS LAST UPDATED AT 8:10 PM EASTERN ON MARCH 24
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the nine public mass shootings during the year were the deadliest type of mass murder, resulting in the deaths of 73 people, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after; half of those deaths were suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety, may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
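The inclusion rule described above reduces to a small predicate; a sketch with illustrative field names (the database's real schema is not shown here):

```python
from datetime import datetime, timedelta

def qualifies_as_mass_murder(victims_killed, first_death, last_death,
                             in_50_states_or_dc=True):
    """Four or more victims (offender excluded) killed within a 24-hour
    period, within the 50 states or Washington, D.C. Method, motive,
    location type and victim-offender relationship are deliberately
    not tested, matching the definition above."""
    within_24h = (last_death - first_death) <= timedelta(hours=24)
    return victims_killed >= 4 and within_24h and in_50_states_or_dc

t0 = datetime(2019, 8, 3, 10, 30)
print(qualifies_as_mass_murder(4, t0, t0 + timedelta(hours=3)))  # True
print(qualifies_as_mass_murder(4, t0, t0 + timedelta(days=2)))   # False
```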
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
The dataset supports measure S.D.4.a of SD23. The Austin Municipal Court offers services in person, by phone, mail, email, and online, in the community, in multiple locations, and during non-traditional hours to make it easier and more convenient for individuals to handle court business. This measure tracks the percentage of customers who use court services outside of normal business hours (8am-5pm, Monday-Friday) and how many payments are made by methods other than in person. This measure helps determine how Court services are being used and enables the Court to allocate its resources to best meet the needs of the public. Historically, almost 30% of the operational hours are outside of traditional hours, and the average percentage of payments made by mail and online has been over 59%.
View more details and insights related to this measure on the story page: https://data.austintexas.gov/stories/s/c7z3-geii
Data source: electronic case management system and manual tracking of payments received via mail.
Calculation: Business hours are manually calculated annually. A query is run from the court's case management system to calculate how many monetary transactions were posted.
S.D.4.a Numerator: Number of payments received by mail, entered manually by the Customer Service unit that processes all incoming mail.
S.D.4.a Denominator: Total number of web payments, calculated using a query for payments with payment type 'web' in the case management system.
Measure time period: Annual (Fiscal Year)
Automated: No
Date of last description update: 4/10/2020
This dataset contains hourly pedestrian counts since 2009 from pedestrian sensor devices located across the city. The data is updated on a monthly basis and can be used to determine variations in pedestrian activity throughout the day.
The sensor_id column can be used to merge the data with the Pedestrian Counting System - Sensor Locations dataset, which details the location, status and directional readings of sensors. Any changes to sensor locations are important to consider when analysing and interpreting pedestrian counts over time.
Important notes about this dataset:
• Where no pedestrians have passed underneath a sensor during an hour, a count of zero will be shown for the sensor for that hour.
• Directional readings are not included, though we hope to make these available later in the year. Directional readings are provided in the Pedestrian Counting System – Past Hour (counts per minute) dataset.
The Pedestrian Counting System helps to understand how people use different city locations at different times of day to better inform decision-making and plan for the future. A representation of pedestrian volume which compares each location on any given day and time can be found in our Online Visualisation.
Related datasets:
Pedestrian Counting System – Past Hour (counts per minute)
Pedestrian Counting System - Sensor Locations
Current issue 23/09/2020
Please note: Sensors 67, 68 and 69 are showing duplicate records. We are currently working on a fix to resolve this.
This dataset contains minute by minute directional pedestrian counts for the last hour from pedestrian sensor devices located across the city. The data is updated every 15 minutes and can be used to determine variations in pedestrian activity throughout the day.
The sensor_id column can be used to merge the data with the Sensor Locations dataset which details the location, status and directional readings of sensors. Any changes to sensor locations are important to consider when analysing and interpreting historical pedestrian counting data.
Note this dataset may not contain a reading for every sensor for every minute as sensor devices only create a record when one or more pedestrians have passed underneath the sensor.
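Because a sensor only writes a record when at least one pedestrian passes, minutes missing from the per-minute feed are implicit zeros; a minimal sketch of densifying one sensor's hour (the dict layout is illustrative):

```python
def densify_hour(minute_counts):
    """Expand a sparse {minute_offset: count} mapping for one sensor into
    a full 60-element list, treating absent minutes as zero pedestrians."""
    return [minute_counts.get(m, 0) for m in range(60)]

# Hypothetical feed: records exist for only three minutes of the hour
series = densify_hour({3: 2, 10: 1, 59: 4})
print(len(series), sum(series))  # 60 7
```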
The Pedestrian Counting System helps us to understand how people use different city locations at different times of day to better inform decision-making and plan for the future. A representation of pedestrian volume which compares each location on any given day and time can be found in our Online Visualisation.
Related datasets:
Pedestrian Counting System – 2009 to Present (counts per hour).
Pedestrian Counting System - Sensor Locations
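As both descriptions note, sensor_id is the join key to the Sensor Locations dataset; a sketch of the merge in plain Python (all fields other than sensor_id are illustrative):

```python
def join_on_sensor_id(counts, locations):
    """Attach each count record's sensor metadata via a sensor_id lookup,
    dropping counts whose sensor has no location record."""
    loc_by_id = {loc["sensor_id"]: loc for loc in locations}
    return [{**rec, **loc_by_id[rec["sensor_id"]]}
            for rec in counts if rec["sensor_id"] in loc_by_id]

counts = [{"sensor_id": 7, "hour": 14, "count": 42},
          {"sensor_id": 99, "hour": 14, "count": 5}]   # no location known
locations = [{"sensor_id": 7, "status": "A", "lat": -37.81, "lng": 144.96}]
print(join_on_sensor_id(counts, locations))
```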
The Downtown Austin Community Court (DACC) was established to address quality of life and public order offenses occurring in the downtown Austin area utilizing a restorative justice court model. DACC’s priority population consists of individuals experiencing homelessness, and the program’s main goal is to permanently stabilize individuals experiencing homelessness. To effectively serve these individuals, DACC created an Intensive Case Management (ICM) Program, which uses a client-centered and housing-focused approach. The ICM Program focuses on rehabilitating and stabilizing individuals using an evidence-based model of wraparound interventions to help them achieve long-term stability. Because individuals participating in case management are literally homeless, case managers must actively seek their clients in the community through outreach activities and often work on behalf of the client via collateral engagement with other social service and housing providers. This measure highlights case management activities accomplished via outreach and collateral engagement.
https://www.futurebeeai.com/data-license-agreement
Welcome to the English Language General Conversation Speech Dataset, a comprehensive and diverse collection of voice data specifically curated to advance the development of English language speech recognition models, with a particular focus on Canadian accents and dialects.
With high-quality audio recordings, detailed metadata, and accurate transcriptions, it empowers researchers and developers to enhance natural language processing, conversational AI, and Generative Voice AI algorithms. Moreover, it facilitates the creation of sophisticated voice assistants and voice bots tailored to the unique linguistic nuances found in the English language spoken in Canada.
Speech Data: This training dataset comprises 30 hours of audio recordings covering a wide range of topics and scenarios, ensuring robustness and accuracy in speech technology applications. To achieve this, we collaborated with a diverse network of 40 native English speakers from different provinces of Canada. This collaborative effort guarantees a balanced representation of Canadian accents, dialects, and demographics, reducing biases and promoting inclusivity.
Each audio recording captures the essence of spontaneous, unscripted conversations between two individuals, with an average duration ranging from 15 to 60 minutes. The speech data is available in WAV format, with stereo channel files having a bit depth of 16 bits and a sample rate of 8 kHz. The recording environment is generally quiet, without background noise and echo.
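The stated audio format (stereo WAV, 16-bit depth, 8 kHz sample rate) can be checked with Python's standard-library wave module; a self-contained sketch that writes a silent one-second clip matching those specs in memory, then reads its header back:

```python
import io
import struct
import wave

# Write a silent one-second clip with the specs the description states:
# stereo, 16-bit samples, 8 kHz sample rate.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)      # stereo
    w.setsampwidth(2)      # 16 bits = 2 bytes per sample
    w.setframerate(8000)   # 8 kHz
    w.writeframes(struct.pack("<2h", 0, 0) * 8000)  # 8000 silent frames

# Read the header back to verify channels, bit depth, and sample rate
buf.seek(0)
with wave.open(buf, "rb") as r:
    print(r.getnchannels(), 8 * r.getsampwidth(), r.getframerate())
    # 2 16 8000
```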
Metadata: In addition to the audio recordings, our dataset provides comprehensive metadata for each participant. This metadata includes the participant's age, gender, country, state, and dialect. Furthermore, additional metadata such as recording device detail, topic of recording, bit depth, and sample rate will be provided.
The metadata serves as a valuable tool for understanding and characterizing the data, facilitating informed decision-making in the development of English language speech recognition models.
Transcription: This dataset provides a manual verbatim transcription of each audio file to enhance your workflow efficiency. The transcriptions are available in JSON format and capture speaker-wise transcription with time-coded segmentation, along with non-speech labels and tags.
Our goal is to expedite the deployment of English language conversational AI and NLP models by offering ready-to-use transcriptions, ultimately saving valuable time and resources in the development process.
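The exact JSON schema is not documented here, so the sketch below assumes a plausible shape: a list of segments carrying speaker, start/end times in seconds, and text, with non-speech events in bracketed tags.

```python
import json

# Hypothetical transcription payload in the assumed segment layout
sample = json.loads("""
[
  {"speaker": "spk_1", "start": 0.0, "end": 3.2, "text": "hello there"},
  {"speaker": "spk_2", "start": 3.5, "end": 4.1, "text": "[laughter]"}
]
""")

def speech_seconds(segments, speaker):
    """Total speech time for one speaker, skipping non-speech tags."""
    return sum(seg["end"] - seg["start"] for seg in segments
               if seg["speaker"] == speaker
               and not seg["text"].startswith("["))

print(speech_seconds(sample, "spk_1"))  # 3.2
```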
Updates and Customization: We understand the importance of collecting data in various environments to build robust ASR models. Therefore, our voice dataset is regularly updated with new audio data captured in diverse real-world conditions.
If you require a custom training dataset with specific environmental conditions such as in-car, busy street, restaurant, or any other scenario, we can accommodate your request. We can provide voice data with customized sample rates ranging from 8kHz to 48kHz, allowing you to fine-tune your models for different audio recording setups. Additionally, we can also customize the transcription following your specific guidelines and requirements, to further support your ASR development process.
License: This audio dataset, created by FutureBeeAI, is now available for commercial use.
Conclusion: Whether you are training or fine-tuning speech recognition models, advancing NLP algorithms, exploring generative voice AI, or building cutting-edge voice assistants and bots, our dataset serves as a reliable and valuable resource.
The 24-Hour Log data can only be retained if the data is relevant to the Homeland Security mission and can be legally retained under Intelligence Oversight regulations.
The information entered into the log is dependent upon the content of the source report used to generate the log entry. The information for each incident varies depending upon the incident and circumstances surrounding the collection of information about the incident.
Information may be collected about the person who reported the incident and people involved in a reported incident, which may turn up varying levels of personal information, most often name and citizenship. Additional personal information may be collected and may include, but is not limited to, Social Security Number, passport or driver's license numbers or other identifying information; location of residency, names of associates, political or religious affiliations or membership in some group or organization, and other information deemed important by the reporting official.
I created this dataset to help data scientists learn more about Lyme disease. The field lacks funding, and I couldn't find any online datasets of EM rashes, one of the most common symptoms of Lyme disease. Lyme disease, also known as the "Silent Epidemic," affects more than 300,000 people each year.
The data contains images of the EM (Erythema Migrans) rash, also known as the "Bull's Eye Rash." It is one of the most prominent symptoms of Lyme disease. The data also contains several other types of rashes that may be confused with the EM rash by doctors and much of the medical field.
I created this dataset by web-scraping images from the internet and manually filtering them, making the dataset the best that it can be.
This is NOT a raw population dataset. We use our proprietary stack to combine detailed 'WorldPop' UN-adjusted, sex and age structured population data with a spatiotemporal OD matrix.
The result is a dataset where each record indicates how many people can be reached in a fixed timeframe (3 hours in this case) from that record's location.
The dataset is broken down into sex and age bands at 5-year intervals, e.g. male 25-29 (m_25), and also contains a set of features detailing the percentage of the total that each count represents.
The dataset provides 48,420 records, one for each sampled location. These are labelled with an h3 index at resolution 7, which allows easy plotting and filtering in Kepler.gl / Deck.gl / Mapbox, or easy conversion to a centroid (lat/lng) or the representative geometry of the hexagonal cell for integration with your geospatial applications and analyses.
At resolution 7, an h3 hexagonal cell has an area equivalent to approximately 1.9928 sq miles (5.1613 sq km).
Higher resolutions or alternate geographies are available on request.
More information on the h3 system is available here: https://eng.uber.com/h3/
WorldPop data provides for a population count using a grid of 1 arc second intervals and is available for every geography.
More information on the WorldPop data is available here: https://www.worldpop.org/
One of the main use cases historically has been prospecting for site selection, comparative analysis and network validation by asset investors and logistics companies. The data structure makes it very simple to filter out areas which do not meet a requirement such as being able to access 70% of the UK population within 4 hours by truck, and to show only the areas which do exhibit this characteristic.
Clients often combine different datasets either for different timeframes of interest, or to understand different populations, such as that of the unemployed, or those with particular qualifications within areas reachable as a commute.
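The 70%-reachability screen described above is a one-line filter once the records are loaded; a sketch with illustrative field names (only the h3 labelling is documented above, and the example cell values are hypothetical):

```python
def reachability_filter(records, total_population, fraction=0.70):
    """Keep cells from which at least `fraction` of the geography's total
    population is reachable within the dataset's fixed timeframe."""
    cutoff = fraction * total_population
    return [r for r in records if r["reachable_total"] >= cutoff]

# Hypothetical cells for a 10,000,000-person geography
records = [
    {"h3_7": "87283472bffffff", "reachable_total": 8_200_000},
    {"h3_7": "872834730ffffff", "reachable_total": 3_100_000},
]
print(len(reachability_filter(records, 10_000_000)))  # 1
```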
This dataset contains many indicators in health, such as infant mortality rate, proportion of population with advanced HIV infection with access to antiretroviral drugs, death rate associated with malaria per 100,000 population, tuberculosis prevalence rate per 100,000 population, etc. The whole list and their descriptions can be found at this link: https://bit.ly/2NZBRH3
https://www.futurebeeai.com/data-license-agreement
Welcome to the Vietnamese Call Center Speech Dataset for the Telecom domain, designed to enhance the development of call center speech recognition models specifically for the Telecom industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.
This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Telecom domain, designed to build robust and accurate customer service speech technology.
This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.
This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.
To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:
These ready-to-use transcriptions accelerate the development of the Telecom domain call center conversational AI and ASR models for the Vietnamese language.
The dataset provides comprehensive metadata for each conversation and participant:
ECMWF Standard Licence: https://www.ecmwf.int/sites/default/files/ECMWF_Standard_Licence.pdf
A single prediction that uses:
observations
prior information about the Earth system
ECMWF's highest-resolution model
HRES direct model output offers "High Frequency products":
4 forecast runs per day (00/06/12/18) (see dissemination schedule for details)
Hourly steps to step 144 for all four runs
Note that not all post-processed products are available at the 06/18 runs or in hourly steps.
https://www.futurebeeai.com/data-license-agreement
Welcome to the Malay Call Center Speech Dataset for the Telecom domain, designed to enhance the development of call center speech recognition models specifically for the Telecom industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.
This training dataset comprises 30 hours of call center audio recordings covering various topics and scenarios related to the Telecom domain, designed to build robust and accurate customer service speech technology.
This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.
This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.
To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:
These ready-to-use transcriptions accelerate the development of the Telecom domain call center conversational AI and ASR models for the Malay language.
The dataset provides comprehensive metadata for each conversation and participant:
https://www.futurebeeai.com/data-license-agreement
Welcome to the Australian English Call Center Speech Dataset for the Healthcare domain, designed to enhance the development of call center speech recognition models specifically for the Healthcare industry. This dataset is meticulously curated to support advanced speech recognition, natural language processing, conversational AI, and generative voice AI algorithms.
This training dataset comprises 40 hours of call center audio recordings covering various topics and scenarios related to the Healthcare domain, designed to build robust and accurate customer service speech technology.
This dataset offers a diverse range of conversation topics, call types, and outcomes, including both inbound and outbound calls with positive, neutral, and negative outcomes.
This extensive coverage ensures the dataset includes realistic call center scenarios, which is essential for developing effective customer support speech recognition models.
To facilitate your workflow, the dataset includes manual verbatim transcriptions of each call center audio file in JSON format. These transcriptions feature:
These ready-to-use transcriptions accelerate the development of the Healthcare domain call center conversational AI and ASR models for the Australian English language.
The dataset provides comprehensive metadata for each conversation and participant:
This metadata is a powerful tool for understanding and characterizing the data, enabling informed decision-making in the development of Australian English call center speech recognition models.
This dataset can be used for various applications in the fields of speech recognition, natural language processing, and conversational AI, specifically tailored to the Healthcare domain. Potential use cases include:
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
This dataset supports measures S.D.4.b and S.D.6 of SD23. The Downtown Austin Community Court (DACC) was established to address quality of life and public order offenses occurring in the downtown Austin area utilizing a restorative justice court model. DACC offers alternatives to fines and fees for defendants to handle their cases, such as community service restitution and participation in rehabilitation services. Defendants who reside outside of a 40-mile radius from DACC are offered an opportunity to handle their case through correspondence action, meaning the entire judicial process can be handled through email or postal mail. Correspondence action eliminates the undue burden of requiring a defendant to travel back to Austin to appear for their case, and it allows quicker access to court services for Austin residents by reducing the number of individuals required to appear in person. This measure tracks how many cases involving non-homeless individuals have been handled through correspondence action, as recorded in the court's case management system. The data source for the number and percentage of instances where people access court services other than in person has an annual range based on fiscal year 2015 through the first quarter of fiscal year 2020. View more details and insights related to this measure on the story page: https://data.austintexas.gov/stories/s/vxci-zmm3
Data source: Data for this measure is collected by DACC staff inputting information from citations issued in DACC’s jurisdiction and from court processes. All data is entered in DACC’s electronic court case management platform.
Calculation S.D.4.b: Numerator = number of cases handled by correspondence action; Denominator = total number of cases involving non-homeless individuals.
Measure Time Period: Annually (Fiscal Year)
Automated: no
Date of last description update: 4/1/2020
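The numerator/denominator definition above amounts to a simple percentage; a sketch with illustrative figures (not actual DACC counts):

```python
def correspondence_rate(correspondence_cases, total_cases):
    """S.D.4.b as described: cases handled by correspondence action as a
    percentage of all cases involving non-homeless individuals."""
    if total_cases == 0:
        raise ValueError("total_cases must be nonzero")
    # Multiply first so integer inputs divide exactly where possible
    return correspondence_cases * 100 / total_cases

# Illustrative figures, not real DACC caseload data
print(correspondence_rate(180, 900))  # 20.0
```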
Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
Dataset Card for People's Speech
Dataset Summary
The People's Speech Dataset is among the world's largest English speech recognition corpora licensed for academic and commercial usage under CC-BY-SA and CC-BY 4.0. It includes 30,000+ hours of transcribed English speech from a diverse set of speakers. This open dataset is large enough to train speech-to-text systems and, crucially, is available with a permissive license. … See the full description on the dataset page: https://huggingface.co/datasets/MLCommons/peoples_speech.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
subject to appropriate attribution.