60 datasets found

Provisional COVID-19 death counts, rates, and percent of total deaths, by...
catalog.data.gov
data.virginia.gov
+2more
Updated Sep 26, 2025
Cite
Centers for Disease Control and Prevention (2025). Provisional COVID-19 death counts, rates, and percent of total deaths, by jurisdiction of residence [Dataset]. https://catalog.data.gov/dataset/provisional-covid-19-death-counts-rates-and-percent-of-total-deaths-by-jurisdiction-of-res
Dataset updated
Sep 26, 2025
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Description
This file contains COVID-19 death counts, death rates, and percent of total deaths by jurisdiction of residence. The data are grouped by time period: 3-month period, weekly, and total (cumulative since January 1, 2020). United States death counts and rates include the 50 states, plus the District of Columbia and New York City. New York state estimates exclude New York City. Puerto Rico is included in HHS Region 2 estimates. Deaths are those with confirmed or presumed COVID-19, coded to ICD–10 code U07.1.

The number of deaths reported in this file is the total number of COVID-19 deaths received and coded as of the date of analysis and may not represent all deaths that occurred in that period. Counts of deaths occurring before or after the reporting period are not included in the file. Data during recent periods are incomplete because of the lag in time between when the death occurred and when the death certificate is completed, submitted to NCHS and processed for reporting purposes. This delay can range from 1 week to 8 weeks or more, depending on the jurisdiction and cause of death. Death counts should not be compared across states. Data timeliness varies by state. Some states report deaths on a daily basis, while other states report deaths weekly or monthly.

The ten (10) United States Department of Health and Human Services (HHS) regions include the following jurisdictions:
Region 1: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont
Region 2: New Jersey, New York, New York City, Puerto Rico
Region 3: Delaware, District of Columbia, Maryland, Pennsylvania, Virginia, West Virginia
Region 4: Alabama, Florida, Georgia, Kentucky, Mississippi, North Carolina, South Carolina, Tennessee
Region 5: Illinois, Indiana, Michigan, Minnesota, Ohio, Wisconsin
Region 6: Arkansas, Louisiana, New Mexico, Oklahoma, Texas
Region 7: Iowa, Kansas, Missouri, Nebraska
Region 8: Colorado, Montana, North Dakota, South Dakota, Utah, Wyoming
Region 9: Arizona, California, Hawaii, Nevada
Region 10: Alaska, Idaho, Oregon, Washington

Rates were calculated using the population estimates for 2021, which are estimated as of July 1, 2021 based on the Blended Base produced by the US Census Bureau in lieu of the April 1, 2020 decennial population count. The Blended Base consists of the blend of Vintage 2020 postcensal population estimates, 2020 Demographic Analysis Estimates, and 2020 Census PL 94-171 Redistricting File (see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/2020-2021/methods-statement-v2021.pdf). Rates are based on deaths occurring in the specified week/month and are age-adjusted to the 2000 standard population using the direct method (see https://www.cdc.gov/nchs/data/nvsr/nvsr70/nvsr70-08-508.pdf). These rates differ from the annual age-adjusted rates typically presented in NCHS publications, which are based on a full year of data, and from annualized weekly/monthly age-adjusted rates, which have been adjusted to allow comparison with annual rates. Annualized rates present the deaths per year per 100,000 population that would be expected if the observed period-specific (weekly/monthly) rate prevailed for a full year.

Sub-national death counts from 1 to 9 are suppressed in accordance with NCHS data confidentiality standards. Rates based on death counts less than 20 are suppressed in accordance with NCHS standards of reliability as specified in NCHS Data Presentation Standards for Proportions (available from: https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf).
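
The suppression and annualization rules above can be sketched in a few lines of Python. This is a minimal illustration, not NCHS code: the function names and the `periods_per_year` parameter are hypothetical, and scaling a weekly rate by 52 is an assumed simplification of the annualization the description refers to.

```python
from typing import Optional


def suppress_count(deaths: int, national: bool = False) -> Optional[int]:
    """Return None for sub-national counts from 1 to 9, else the count."""
    if not national and 1 <= deaths <= 9:
        return None
    return deaths


def suppress_rate(rate: float, deaths: int) -> Optional[float]:
    """Return None when the rate is based on fewer than 20 deaths."""
    return None if deaths < 20 else rate


def annualize_rate(period_rate: float, periods_per_year: float = 52) -> float:
    """Scale a period-specific rate (per 100,000) to a per-year rate.

    Assumption: the period rate would prevail unchanged for a full year,
    and a year holds `periods_per_year` such periods (52 for weekly data).
    """
    return period_rate * periods_per_year
```

A count of zero is left visible in this sketch, since the description only mentions suppressing counts from 1 to 9.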
Coronavirus (Covid-19) Data in the United States
nytimes.com
openicpsr.org
+4more
Cite
New York Times, Coronavirus (Covid-19) Data in the United States [Dataset]. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
Dataset provided by
New York Times
Description
The New York Times is releasing a series of data files with cumulative counts of coronavirus cases in the United States, at the state and county level, over time. We are compiling this time series data from state and local governments and health departments in an attempt to provide a complete record of the ongoing outbreak.
Since late January, The Times has tracked cases of coronavirus in real time as they were identified after testing. Because of the widespread shortage of testing, however, the data is necessarily limited in the picture it presents of the outbreak.
We have used this data to power our maps and reporting tracking the outbreak, and it is now being made available to the public in response to requests from researchers, scientists and government officials who would like access to the data to better understand the outbreak.
The data begins with the first reported coronavirus case in Washington State on Jan. 21, 2020. We will publish regular updates to the data in this repository.
Leading causes of death, total population, by age group
www150.statcan.gc.ca
ouvert.canada.ca
+1more
Updated Feb 19, 2025
Cite
Government of Canada, Statistics Canada (2025). Leading causes of death, total population, by age group [Dataset]. http://doi.org/10.25318/1310039401-eng
Unique identifier
https://doi.org/10.25318/1310039401-eng
Dataset updated
Feb 19, 2025
Dataset provided by
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Rank, number of deaths, percentage of deaths, and age-specific mortality rates for the leading causes of death, by age group and sex, 2000 to most recent year.
American Time Use Survey: Daily Activities
kaggle.com
zip
Updated Dec 12, 2023
Cite
The Devastator (2023). American Time Use Survey: Daily Activities [Dataset]. https://www.kaggle.com/datasets/thedevastator/american-time-use-survey-daily-activities
Explore at:
Available download formats: zip (17763 bytes)
Dataset updated
Dec 12, 2023
Authors
The Devastator
Description
American Time Use Survey: Daily Activities

Americans' Daily Activities: Education, Employment, Gender, and Leisure Time

By Throwback Thursday [source]

About this dataset

The American Time Use Survey dataset provides comprehensive information on how individuals in America allocate their time throughout the day. It includes various aspects of daily activities such as education level, age, employment status, gender, number of children, weekly earnings and hours worked. The dataset also includes data on specific activities individuals engage in like sleeping, grooming, housework, food and drink preparation, caring for children, playing with children, job searching, shopping and eating and drinking. Additionally it captures time spent on leisure activities like socializing and relaxing as well as engaging in specific hobbies such as watching television or golfing. The dataset also records the amount of time spent volunteering or running for exercise purposes.

Each entry is organized based on categorical variables such as education level (ranging from lower levels to higher degrees), age (capturing different age brackets), employment status (including employed full-time or part-time), gender (male or female) and the number of children an individual has. Furthermore it provides information regarding an individual's weekly earnings and hours worked.

This extensive dataset aims to provide insights into how Americans prioritize their time across various aspects of their lives. Whether the focus is on work-related tasks or recreational activities, it offers a comprehensive look at the allocation of time among different demographic groups within American society.

This dataset can be used for understanding trends in daily activity patterns across demographic groups over multiple years without directly referencing specific dates.

How to use the dataset

How to use this dataset: American Time Use Survey - Daily Activities

Welcome to the American Time Use Survey dataset! This dataset provides valuable information on how Americans spend their time on a daily basis. Here's a guide on how to effectively utilize this dataset for your analysis:

Familiarize yourself with the columns:

Education Level: The level of education attained by the individual.

Age: The age of the individual.

Age Range: The age range the individual falls into.

Employment Status: The employment status of the individual.

Gender: The gender of the individual.

Children: The number of children that an individual has.

Weekly Earnings: The amount of money earned by an individual on a weekly basis.

Year: The year in which the data was collected.

Weekly Hours Worked: The number of hours worked by an individual on a weekly basis.

Identify variables related to daily activities: This dataset provides information about various daily activities undertaken by individuals. Some important variables related to daily activities include:

Sleeping

Grooming

Housework

Food & Drink Prep

Caring for Children

Playing with Children

Job Searching …and many more!

Analyze time spent on different activities: This dataset includes numerical values representing time spent in minutes for specific activities such as sleeping, grooming, housework, food and drink preparation, etc. You can use this data to analyze and compare how different groups of individuals allocate their time throughout the day.

Explore demographic factors: In addition to daily activities, this dataset also includes columns such as education level, age range, employment status, gender, and number of children. You can cross-reference these demographic factors with activity data to gain insights into how different population subgroups spend their time differently.

Identify trends and patterns: You can use this dataset to identify trends and patterns in how Americans allocate their time over the years. By analyzing data from different years, you may discover changes in certain activities and how they relate to demographic factors or societal shifts.

Visualize the data: Creating visualizations such as bar graphs, line plots, or pie charts can provide a clear representation of how time is allocated for different activities among various groups of individuals. Visualizations help in understanding the distribution of time spent on different activities and identifying any significant differences or similarities across demographics.
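
The cross-referencing step described above can be sketched as follows. This is a minimal example using only the Python standard library; the sample rows and their values are made up for illustration, the column names follow the list above, and `mean_minutes_by_group` is a hypothetical helper rather than anything shipped with the dataset.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; real rows would be read from the CSV.
sample_rows = [
    {"Gender": "Female", "Year": 2010, "Sleeping": 500},
    {"Gender": "Female", "Year": 2011, "Sleeping": 540},
    {"Gender": "Male", "Year": 2010, "Sleeping": 510},
    {"Gender": "Male", "Year": 2011, "Sleeping": 490},
]


def mean_minutes_by_group(rows, group_col, activity_col):
    """Average the minutes spent on one activity within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[activity_col])
    return {group: mean(values) for group, values in groups.items()}


by_gender = mean_minutes_by_group(sample_rows, "Gender", "Sleeping")
by_year = mean_minutes_by_group(sample_rows, "Year", "Sleeping")
```

Grouping by "Year" instead of a demographic column gives the per-year averages needed for the trend analysis, and either result can be passed to a plotting library of your choice.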

Remember that each column represents a specific variable, whi...
US Age-Standardized Stroke Mortality Rates
kaggle.com
zip
Updated Jan 12, 2023
Cite
The Devastator (2023). US Age-Standardized Stroke Mortality Rates [Dataset]. https://www.kaggle.com/datasets/thedevastator/us-age-standardized-stroke-mortality-rates-2013
Explore at:
Available download formats: zip (894260 bytes)
Dataset updated
Jan 12, 2023
Authors
The Devastator
Area covered
United States
Description
US Age-Standardized Stroke Mortality Rates (2013-15) by State/County/Gender/Race

Investigating Variations in Rates

By US Open Data Portal, data.gov [source]

About this dataset

This dataset contains the age-standardized stroke mortality rate in the United States from 2013 to 2015, by state/territory, county, gender and race/ethnicity. The data source is the highly respected National Vital Statistics System. The rates are reported as a 3-year average and have been age-standardized. Moreover, county rates are spatially smoothed for further accuracy. The interactive map of heart disease and stroke produced by this dataset provides invaluable information about the geographic disparities in stroke mortality across America at different scales: county, state/territory and national. By using the adjustable filter settings provided in this interactive map, you can quickly explore demographic details such as gender (Male/Female) or race/ethnicity (e.g., Non-Hispanic White). Conquer your fear of the unknown with evidence! Investigate these locations now to inform meaningful action plans for greater public health resilience in America and find out if strokes remain a threat to our millions of citizens every day! Updated regularly since 2020-02-26, so check it out now!

How to use the dataset

The US Age-Standardized Stroke Mortality Rates (2013-2015) by State/County/Gender/Race dataset provides valuable insights into stroke mortality rates among adults ages 35 and over in the USA between 2013 and 2015. This dataset contains age-standardized data from the National Vital Statistics System at the state, county, gender, and race level. Use this guide to learn how best to use this dataset for your purposes!

Understand the Data

This dataset provides information about stroke mortality rates among adult Americans aged 35+. The data cover 2013 to 2015 and are reported as three-year averages. County-level data can be viewed, but spatial smoothing techniques have been applied to them. The following columns of data are provided:
- Year – The year of the data collection
- LocationAbbr – The abbreviation of the location where the data was collected
- LocationDesc – A description of this location
- GeographicLevel – Geographic level of granularity at which these numbers are recorded
- DataSource – Source of these statistics
- Class – Class or group into which these stats fall
- Topic – Overall topic on which we have stats
- Data_Value – Age-standardized value associated with each row
- Data_Value_Unit – Units associated with each value
- Stratification1 – First stratification defined for a given row
- Stratification2 – Second stratification defined for a given row

Additionally, several other footnote and category fields, such as ‘Data_Value_Type’, ‘Data_Value_Footnote_Symbol’, ‘StratificationCategory1’ and ‘StratificationCategory2’, may be present.
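
As a rough sketch of how the columns listed above might be used together, the following Python function pulls one rate per state for a chosen stratum. The column names come from the list above; the literal values "State" and "Male", the sample rows, and the helper name `state_rates` are assumptions made for illustration and should be checked against the actual file.

```python
def state_rates(rows, stratum, level="State"):
    """Map LocationAbbr to Data_Value for one stratum at one geographic level.

    Rows with an empty Data_Value (for example, suppressed values) are skipped.
    """
    rates = {}
    for row in rows:
        if row["GeographicLevel"] != level:
            continue
        if row["Stratification1"] != stratum:
            continue
        if row["Data_Value"] in ("", None):
            continue
        rates[row["LocationAbbr"]] = float(row["Data_Value"])
    return rates


# Made-up rows for illustration; real rows could come from csv.DictReader.
sample_rows = [
    {"LocationAbbr": "AL", "GeographicLevel": "State",
     "Stratification1": "Male", "Data_Value": "100.5"},
    {"LocationAbbr": "AL", "GeographicLevel": "County",
     "Stratification1": "Male", "Data_Value": "98.0"},
    {"LocationAbbr": "AK", "GeographicLevel": "State",
     "Stratification1": "Male", "Data_Value": ""},
    {"LocationAbbr": "AZ", "GeographicLevel": "State",
     "Stratification1": "Female", "Data_Value": "70.0"},
]

male_state_rates = state_rates(sample_rows, "Male")
```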

Exploring Correlations

Now that you understand what the individual columns mean, you can analyze relationships within different categories using standard methods such as linear regression or boxplots. To compare regions, use the LocationAbbr column with rows restricted to a geographic level such as State or Region. Alternatively, for comparisons across genders, refer to the Stratification1 column along with the desired values within this

Research Ideas

Creating a visualization to show the relationship between stroke mortality and specific variations in race/ethnicity, gender, and geography.

Comparing two or more states based on their average stroke mortality rate over time.

Building a predictive model that disregards temporal biases to anticipate further changes in stroke mortality for certain communities or entire states across the US

Acknowledgements

If you use this dataset in your research, please credit the original authors. Data Source

License

Unknown License - Please check the dataset description for more information.

Columns

File: csv-1.csv | Column name | Description | |:--...
Mortality rates, by age group
www150.statcan.gc.ca
open.canada.ca
Updated Dec 4, 2024
Cite
Government of Canada, Statistics Canada (2024). Mortality rates, by age group [Dataset]. http://doi.org/10.25318/1310071001-eng
Unique identifier
https://doi.org/10.25318/1310071001-eng
Dataset updated
Dec 4, 2024
Dataset provided by
Government of Canada (http://www.gg.ca/)
Statistics Canada (https://statcan.gc.ca/en)
Area covered
Canada
Description
Number of deaths and mortality rates, by age group, sex, and place of residence, 1991 to most recent year.
Johns Hopkins COVID-19 Case Tracker
data.world
kaggle.com
csv, zip
Updated Dec 3, 2025
Cite
The Associated Press (2025). Johns Hopkins COVID-19 Case Tracker [Dataset]. https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker
Explore at:
Available download formats: zip, csv
Dataset updated
Dec 3, 2025
Authors
The Associated Press
Time period covered
Jan 22, 2020 - Mar 9, 2023
Description
Updates

Notice of data discontinuation: Since the start of the pandemic, AP has reported case and death counts from data provided by Johns Hopkins University. Johns Hopkins University has announced that they will stop their daily data collection efforts after March 10. As Johns Hopkins stops providing data, the AP will also stop collecting daily numbers for COVID cases and deaths. The HHS and CDC now collect and visualize key metrics for the pandemic. AP advises using those resources when reporting on the pandemic going forward.

CDC Weekly case and death counts (national and state level)

CDC County level cases and deaths

HHS New hospital admissions

CDC NowCast COVID variant proportions (national and regional level)

April 9, 2020

The population estimate data for New York County, NY has been updated to include all five New York City counties (Kings County, Queens County, Bronx County, Richmond County and New York County). This has been done to match the Johns Hopkins COVID-19 data, which aggregates counts for the five New York City counties to New York County.

April 20, 2020

Johns Hopkins death totals in the US now include confirmed and probable deaths in accordance with CDC guidelines as of April 14. One significant result of this change was an increase of more than 3,700 deaths in the New York City count. This change will likely result in increases for death counts elsewhere as well. The AP does not alter the Johns Hopkins source data, so probable deaths are included in this dataset as well.

April 29, 2020

The AP is now providing timeseries data for counts of COVID-19 cases and deaths. The raw counts are provided here unaltered, along with a population column with Census ACS-5 estimates and calculated daily case and death rates per 100,000 people. Please read the updated caveats section for more information.

September 1st, 2020

Johns Hopkins is now providing counts for the five New York City counties individually.

February 12, 2021

The Ohio Department of Health recently announced that as many as 4,000 COVID-19 deaths may have been underreported through the state’s reporting system, and that the "daily reported death counts will be high for a two to three-day period."

Because deaths data will be anomalous for consecutive days, we have chosen to freeze Ohio's rolling average for daily deaths at the last valid measure until Johns Hopkins is able to back-distribute the data. The raw daily death counts, as reported by Johns Hopkins and including the backlogged death data, will still be present in the new_deaths column.

February 16, 2021

Johns Hopkins has reconciled Ohio's historical deaths data with the state.

Overview

The AP is using data collected by the Johns Hopkins University Center for Systems Science and Engineering as our source for outbreak caseloads and death counts for the United States and globally.

The Hopkins data is available at the county level in the United States. The AP has paired this data with population figures and county rural/urban designations, and has calculated caseload and death rates per 100,000 people. Be aware that caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

This data is from the Hopkins dashboard that is updated regularly throughout the day. Like all organizations dealing with data, Hopkins is constantly refining and cleaning up their feed, so there may be brief moments where data does not appear correctly. At this link, you’ll find the Hopkins daily data reports, and a clean version of their feed.

The AP is updating this dataset hourly at 45 minutes past the hour.

To learn more about AP's data journalism capabilities for publishers, corporations and financial institutions, go here or email kromano@ap.org.

Queries

Use AP's queries to filter the data or to join to other datasets we've made available to help cover the coronavirus pandemic

Filter cases by state here

Rank states by their status as current hotspots. Calculates the 7-day rolling average of new cases per capita in each state: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=481e82a4-1b2f-41c2-9ea1-d91aa4b3b1ac

Find recent hotspots within your state by running a query to calculate the 7-day rolling average of new cases by capita in each county: https://data.world/associatedpress/johns-hopkins-coronavirus-case-tracker/workspace/query?queryid=b566f1db-3231-40fe-8099-311909b7b687&showTemplatePreview=true

Join county-level case data to an earlier dataset released by AP on local hospital capacity here. To find out more about the hospital capacity dataset, see the full details.

Pull the 100 counties with the highest per-capita confirmed cases here

Rank all the counties by the highest per-capita rate of new cases in the past 7 days here. Be aware that because this ranks per-capita caseloads, very small counties may rise to the very top, so take into account raw caseload figures as well.
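
The hotspot queries above rest on a 7-day rolling average of new cases per capita. A minimal Python sketch of that calculation follows; it is not AP's actual query, and the function name, the per-100,000 scaling, and the assumption that the input is a daily series of cumulative counts are all illustrative choices.

```python
def rolling_avg_new_cases_per_100k(cumulative, population, window=7):
    """Rolling average of daily new cases per 100,000 people.

    `cumulative` is a list of cumulative case counts, one per day.
    Returns one value per day from the first day with a full window onward.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Daily new cases are the differences between consecutive cumulative counts.
    new_cases = [today - prev for prev, today in zip(cumulative, cumulative[1:])]
    averages = []
    for end in range(window, len(new_cases) + 1):
        avg_new = sum(new_cases[end - window:end]) / window
        averages.append(avg_new * 100_000 / population)
    return averages


# Made-up series: 10 new cases every day in a place with 100,000 residents.
example = rolling_avg_new_cases_per_100k(list(range(0, 90, 10)), 100_000)
```

As the note on very small counties warns, a per-capita figure like this is best read alongside the raw counts.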

Interactive

The AP has designed an interactive map to track COVID-19 cases reported by Johns Hopkins.

@(https://datawrapper.dwcdn.net/nRyaf/15/)

Interactive Embed Code

<iframe title="USA counties (2018) choropleth map Mapping COVID-19 cases by county" aria-describedby="" id="datawrapper-chart-nRyaf" src="https://datawrapper.dwcdn.net/nRyaf/10/" scrolling="no" frameborder="0" style="width: 0; min-width: 100% !important;" height="400"></iframe><script type="text/javascript">(function() {'use strict';window.addEventListener('message', function(event) {if (typeof event.data['datawrapper-height'] !== 'undefined') {for (var chartId in event.data['datawrapper-height']) {var iframe = document.getElementById('datawrapper-chart-' + chartId) || document.querySelector("iframe[src*='" + chartId + "']");if (!iframe) {continue;}iframe.style.height = event.data['datawrapper-height'][chartId] + 'px';}}});})();</script>

Caveats

This data represents the number of cases and deaths reported by each state and has been collected by Johns Hopkins from a number of sources cited on their website.

In some cases, deaths or cases of people who've crossed state lines -- either to receive treatment or because they became sick and couldn't return home while traveling -- are reported in a state they aren't currently in, because of state reporting rules.

In some states, there are a number of cases not assigned to a specific county -- for those cases, the county name is "unassigned to a single county"

This data should be credited to Johns Hopkins University's COVID-19 tracking project. The AP is simply making it available here for ease of use for reporters and members.

Caseloads may reflect the availability of tests -- and the ability to turn around test results quickly -- rather than actual disease spread or true infection rates.

Population estimates at the county level are drawn from 2014-18 5-year estimates from the American Community Survey.

The Urban/Rural classification scheme is from the Centers for Disease Control and Prevention's National Center for Health Statistics. It puts each county into one of six categories -- from Large Central Metro to Non-Core -- according to population and other characteristics. More details about the classifications can be found here.

Johns Hopkins timeseries data:

- Johns Hopkins pulls data regularly to update their dashboard. Once a day, around 8pm EDT, Johns Hopkins adds the counts for all areas they cover to the timeseries file. These counts are snapshots of the latest cumulative counts provided by the source on that day. This can lead to inconsistencies if a source updates their historical data for accuracy, either increasing or decreasing the latest cumulative count.

- Johns Hopkins periodically edits their historical timeseries data for accuracy. They provide a file documenting all errors in their timeseries files that they have identified and fixed here
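
Because the timeseries holds cumulative snapshots, a historical correction by a source can make the cumulative count drop from one day to the next. The following Python sketch flags such days; the function name and sample numbers are hypothetical and only illustrate the inconsistency described above.

```python
def find_downward_revisions(cumulative):
    """Return (day_index, change) pairs where the cumulative count decreased."""
    revisions = []
    for day in range(1, len(cumulative)):
        change = cumulative[day] - cumulative[day - 1]
        if change < 0:
            revisions.append((day, change))
    return revisions


# Made-up cumulative counts: the source revised its total down on day 2.
flagged = find_downward_revisions([10, 15, 14, 20])
```

Days flagged this way would show up as negative values in a naive daily-difference column, so they are worth reviewing before computing rolling averages.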

Attribution

This data should be credited to Johns Hopkins University COVID-19 tracking project
Mass Killings in America, 2006 - present
data.world
csv, zip
Updated Dec 1, 2025
Cite
The Associated Press (2025). Mass Killings in America, 2006 - present [Dataset]. https://data.world/associatedpress/mass-killings-public
Explore at:
Available download formats: zip, csv
Dataset updated
Dec 1, 2025
Authors
The Associated Press
Time period covered
Jan 1, 2006 - Nov 29, 2025
Description
THIS DATASET WAS LAST UPDATED AT 7:11 AM EASTERN ON DEC. 1

OVERVIEW

2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.

In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.

A total of 229 people died in mass killings in 2019.

The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.

One-third of the offenders died at the scene of the killing or soon after, half from suicides.

About this Dataset

The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.

The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.

This data will be updated periodically and can be used as an ongoing resource to help cover these events.

Using this Dataset

To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:

Mass killings by year

Mass shootings by year

To get these counts just for your state:

Filter killings by state

Definition of "mass murder"

Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.

This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”

Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
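
The inclusion criteria above can be read as a simple predicate. The Python sketch below is one hedged interpretation of that definition, not the project's code: the `Incident` record and its field names are hypothetical, and the fields are assumed to have been coded already (for example, the victim count already excludes the offender(s) and unborn children).

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # All field names are hypothetical, chosen to mirror the definition.
    victims_killed_within_24h: int  # excludes offender(s) and unborn children
    intentional: bool               # False for DUI or accidental-fire deaths
    in_50_states_or_dc: bool


def is_mass_murder(incident: Incident) -> bool:
    """Apply the four-or-more-victims, 24-hour, intent and location criteria."""
    return (
        incident.intentional
        and incident.in_50_states_or_dc
        and incident.victims_killed_within_24h >= 4
    )
```

Method, motive, victim-offender relationship and number of locations do not appear in the predicate, matching the statement that the definition does not exclude cases on those grounds.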

Methodology

Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.

Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.

In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.

Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.

Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.

This project started at USA TODAY in 2012.

Contacts

Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE
catalog.data.gov
data.ct.gov
+2more
Updated Aug 12, 2023
Cite
data.ct.gov (2023). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-by-race-ethnicity
Dataset updated
Aug 12, 2023
Dataset provided by
data.ct.gov
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 . The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 . The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed. COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to the either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update. 
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age adjustment is important in Connecticut because the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates.

The population data used to calculate rates are based on the CT DPH population statistics for 2019, which are available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf. Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records.
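The difference between crude and age-adjusted rates described above can be illustrated with a minimal Python sketch of direct standardization. The three age groups, the counts, and the standard-population weights below are made-up values chosen for illustration (the dataset itself uses 19 age groups and the 2000 US standard population), and the function names are hypothetical.

```python
def crude_rate(deaths_by_age, population_by_age, per=100_000):
    """Total deaths divided by total population, scaled to `per` people."""
    return sum(deaths_by_age.values()) / sum(population_by_age.values()) * per


def age_adjusted_rate(deaths_by_age, population_by_age, standard_weights, per=100_000):
    """Direct standardization: weight each age-specific rate by the share of
    that age group in a standard population, then sum."""
    total_weight = sum(standard_weights.values())
    rate = 0.0
    for group, weight in standard_weights.items():
        age_specific = deaths_by_age[group] / population_by_age[group]
        rate += age_specific * (weight / total_weight)
    return rate * per


# Illustrative (made-up) numbers for an older-than-standard population.
deaths = {"0-44": 10, "45-74": 100, "75+": 400}
population = {"0-44": 100_000, "45-74": 100_000, "75+": 50_000}
standard = {"0-44": 600, "45-74": 300, "75+": 100}  # hypothetical weights

print(crude_rate(deaths, population))
print(age_adjusted_rate(deaths, population, standard))
```

In this toy population the oldest group is over-represented relative to the standard, so the adjusted rate comes out lower than the crude rate, which is the pattern the description reports for groups whose deaths are concentrated at older ages.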
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
ARCHIVED: COVID-19 Cases and Deaths Summarized by Geography
data.sfgov.org
Updated Sep 11, 2023
Cite
Department of Public Health - Population Health Division (2023). ARCHIVED: COVID-19 Cases and Deaths Summarized by Geography [Dataset]. https://data.sfgov.org/COVID-19/ARCHIVED-COVID-19-Cases-and-Deaths-Summarized-by-G/tpyr-dvnc
Available download formats: xml, csv, kml, kmz, application/geo+json, xlsx
Dataset updated
Sep 11, 2023
Dataset authored and provided by
Department of Public Health - Population Health Division
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
A. SUMMARY Medical provider confirmed COVID-19 cases and confirmed COVID-19 related deaths in San Francisco, CA aggregated by several different geographic areas and normalized by 2016-2020 American Community Survey (ACS) 5-year estimates for population data to calculate rate per 10,000 residents.

On September 12, 2021, a new case definition of COVID-19 was introduced that includes criteria for enumerating new infections after previous probable or confirmed infections (also known as reinfections). A reinfection is defined as a confirmed positive PCR lab test more than 90 days after a positive PCR or antigen test. The first reinfection case was identified on December 7, 2021.

Cases and deaths are both mapped to the residence of the individual, not to where they were infected or died. For example, a person who was infected at work in San Francisco but lives in the East Bay is not counted as an SF case, and a person who dies at Zuckerberg San Francisco General but is from another county is not counted in this dataset.

Dataset is cumulative and covers cases going back to 3/2/2020 when testing began.

Geographic areas summarized are: 1. Analysis Neighborhoods 2. Census Tracts 3. Census Zip Code Tabulation Areas

B. HOW THE DATASET IS CREATED Addresses from medical data are geocoded by the San Francisco Department of Public Health (SFDPH). Those addresses are spatially joined to the geographic areas. Counts are generated based on the number of address points that match each geographic area. The 2016-2020 American Community Survey (ACS) population estimates provided by the Census are used to create a rate equal to ([count] / [acs_population]) * 10000, representing the number of cases per 10,000 residents.

C. UPDATE PROCESS Geographic analysis is scripted by SFDPH staff and synced to this dataset daily at 7:30 Pacific Time.

D. HOW TO USE THIS DATASET San Francisco population estimates for geographic regions can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).

Privacy rules in effect
To protect privacy, certain rules are in effect:
1. Case counts greater than 0 and less than 10 are dropped - these will be null (blank) values
2. Death counts greater than 0 and less than 10 are dropped - these will be null (blank) values
3. Cases and deaths dropped altogether for areas where acs_population < 1000

Rate suppression in effect where counts are lower than 20
Rates are not calculated unless the case count is greater than or equal to 20. Rates are generally unstable at small numbers, so we avoid calculating them directly. We advise you to apply the same approach as this is best practice in epidemiology.
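The rate formula and the suppression rules above can be sketched together in a few lines of Python. This is an illustration only, not SFDPH's actual script: the function names are hypothetical, `None` stands in for the null (blank) values, and `acs_population` is assumed to be a positive integer.

```python
from typing import Optional


def suppress_count(count: Optional[int], acs_population: int) -> Optional[int]:
    """Apply the privacy rules to a case or death count (None means null)."""
    if acs_population < 1000:
        return None  # areas with small populations are dropped altogether
    if count is None:
        return None
    if 0 < count < 10:
        return None  # small non-zero counts become null
    return count


def rate_per_10k(count: Optional[int], acs_population: int) -> Optional[float]:
    """Rate per 10,000 residents, calculated only when the count is at least 20."""
    if count is None or count < 20:
        return None
    return count / acs_population * 10000


print(rate_per_10k(suppress_count(150, 30_000), 30_000))  # rate is calculated
print(rate_per_10k(suppress_count(15, 30_000), 30_000))   # count kept, rate suppressed
print(suppress_count(5, 30_000))                          # count suppressed
```

Keeping the count rule and the rate rule in separate functions mirrors the text: a count between 10 and 19 is published, but no rate is derived from it.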

A note on Census ZIP Code Tabulation Areas (ZCTAs)
ZIP Code Tabulation Areas are special boundaries created by the U.S. Census based on ZIP Codes developed by the USPS. They are not, however, the same thing. ZCTAs are areal representations of routes. Read how the Census develops ZCTAs on their website.

Row included for Citywide case counts, incidence rate, and deaths
A single row is included that has the Citywide case counts and incidence rate. This can be used for comparisons. Citywide will capture all cases regardless of address quality. While some cases cannot be mapped to sub-areas like Census Tracts, ongoing data quality efforts result in improved mapping on a rolling basis.

E. CHANGE LOG
9/11/2023 - data on COVID-19 cases and deaths summarized by geography are no longer being updated. This data is currently through 9/6/2023 and will not include any new data after this date.
4/6/2023 - the State implemented system updates to improve the integrity of historical data.
2/21/2023 - system updates to improve reliability and accuracy of cases data were implemented.
1/31/2023 - updated “acs_population” column to reflect the 2020 Census Bureau American Community Survey (ACS) San Francisco Population estimates.
1/31/2023 - implemented system updates to streamline and improve our geo-coded data, resulting in small shifts in our case and death data by geography.
1/31/2023 - renamed column “last_updated_at” to “data_as_of”.
2/23/2022 - the New Cases Map dashboard began pulling from this dataset. To access Cases by Geography Over Time, please refer to this dataset.
1/22/2022 - system updates to improve timeliness and accuracy of cases and deaths data were implemented.
7/15/2022 - reinfections added to cases dataset. See section SUMMARY for more information on how reinfections are identified.
4/16/2021 - dataset updated to refresh with a five-day data lag.
United States Coronavirus COVID-19 Deaths
tradingeconomics.com
csv, excel, json, xml
Updated Dec 15, 2024
Cite
TRADING ECONOMICS (2024). United States Coronavirus COVID-19 Deaths [Dataset]. https://tradingeconomics.com/united-states/coronavirus-deaths
Available download formats: json, xml, csv, excel
Dataset updated
Dec 15, 2024
Dataset authored and provided by
TRADING ECONOMICS
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Jan 22, 2020 - May 17, 2023
Area covered
United States
Description
The United States recorded 1,127,152 coronavirus deaths since the epidemic began, according to the World Health Organization (WHO). In addition, the United States reported 103,436,829 coronavirus cases. This dataset includes a chart with historical data for United States coronavirus deaths.
Deaths Involving COVID-19 by Vaccination Status
open.canada.ca
gimi9.com
+1more
csv, docx, html, xlsx
Updated Nov 12, 2025
Cite
Government of Ontario (2025). Deaths Involving COVID-19 by Vaccination Status [Dataset]. https://open.canada.ca/data/dataset/1375bb00-6454-4d3e-a723-4ae9e849d655
Available download formats: docx, csv, html, xlsx
Dataset updated
Nov 12, 2025
Dataset provided by
Government of Ontario (https://www.ontario.ca/)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Time period covered
Mar 1, 2021 - Nov 12, 2024
Description
This dataset reports the daily 7-day moving average rates of deaths involving COVID-19 by vaccination status and by age group. Learn how the Government of Ontario is helping to keep Ontarians safe during the 2019 Novel Coronavirus outbreak.

Effective November 14, 2024 this page will no longer be updated. Information about COVID-19 and other respiratory viruses is available on Public Health Ontario’s interactive respiratory virus tool: https://www.publichealthontario.ca/en/Data-and-Analysis/Infectious-Disease/Respiratory-Virus-Tool

Data includes:
* Date on which the death occurred
* Age group
* 7-day moving average of the last seven days of the death rate per 100,000 for those not fully vaccinated
* 7-day moving average of the last seven days of the death rate per 100,000 for those fully vaccinated
* 7-day moving average of the last seven days of the death rate per 100,000 for those vaccinated with at least one booster

Additional notes
As of June 16, all COVID-19 datasets will be updated weekly on Thursdays by 2pm.

As of January 12, 2024, data from the date of January 1, 2024 onwards reflect updated population estimates. This update specifically impacts data for the 'not fully vaccinated' category.

On November 30, 2023 the count of COVID-19 deaths was updated to include missing historical deaths from January 15, 2020 to March 31, 2023.

CCM is a dynamic disease reporting system which allows ongoing updates to data previously entered. As a result, data extracted from CCM represent a snapshot at the time of extraction and may differ from previous or subsequent results. Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes and current totals being different from previously reported cases and deaths. Observed trends over time should be interpreted with caution for the most recent period due to reporting and/or data entry lags.

The data does not include vaccination data for people who did not provide consent for vaccination records to be entered into the provincial COVaxON system. This includes individual records as well as records from some Indigenous communities where those communities have not consented to including vaccination information in COVaxON.

The “Not fully vaccinated” category includes people with no vaccine and people with one dose of a double-dose vaccine. The “People with one dose of double-dose vaccine” category has a small and constantly changing number, so combining the two stabilizes the results.

Spikes, negative numbers and other data anomalies: Due to ongoing data entry and data quality assurance activities in the Case and Contact Management system (CCM) file, Public Health Units continually clean up COVID-19 data, correcting for missing or overcounted cases and deaths. These corrections can result in data spikes, negative numbers and current totals being different from previously reported case and death counts.

Public Health Units report cause of death in the CCM based on information available to them at the time of reporting and in accordance with definitions provided by Public Health Ontario. The medical certificate of death is the official record and the cause of death could be different. Deaths are defined per the outcome field in CCM marked as “Fatal”. Deaths in COVID-19 cases identified as unrelated to COVID-19 are not included in the Deaths involving COVID-19 reported.

Rates for the most recent days are subject to reporting lags. All data reflects totals from 8 p.m. the previous day. This dataset is subject to change.
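As a rough illustration of the 7-day moving averages this dataset reports, the Python sketch below averages each day's rate with the six preceding days. The function name is hypothetical, the input values are made up, and leaving the first six days empty is an assumption; the publisher's handling of the start of the series is not described here.

```python
def trailing_mean(values, window=7):
    """Trailing moving average: each entry averages the current value and the
    window - 1 values before it; entries without a full window are None."""
    averaged = []
    for i in range(len(values)):
        if i + 1 < window:
            averaged.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            averaged.append(sum(chunk) / window)
    return averaged


# Made-up daily death rates per 100,000 for one age group and status.
daily_rates = [0, 7, 0, 0, 0, 0, 0, 14]
print(trailing_mean(daily_rates))
```

A single late correction to one day's count shifts seven consecutive averaged values, which is one reason the most recent days should be read with caution.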
Data from: Daymet: Daily Surface Weather Data on a 1-km Grid for North...
data.nasa.gov
cmr.earthdata.nasa.gov
+3more
Updated Apr 1, 2025
Cite
nasa.gov (2025). Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 4 R1 [Dataset]. https://data.nasa.gov/dataset/daymet-daily-surface-weather-data-on-a-1-km-grid-for-north-america-version-4-r1-b1d6b
Dataset updated
Apr 1, 2025
Dataset provided by
NASA (http://nasa.gov/)
Area covered
North America
Description
This dataset provides Daymet Version 4 R1 data as gridded estimates of daily weather parameters for North America, Hawaii, and Puerto Rico. Daymet variables include the following parameters: minimum temperature, maximum temperature, precipitation, shortwave radiation, vapor pressure, snow water equivalent, and day length. The dataset covers the period from January 1, 1980, to December 31 (or December 30 in leap years) of the most recent full calendar year for the Continental North America and Hawaii spatial regions. Data for Puerto Rico is available starting in 1950. Each subsequent year is processed individually at the close of a calendar year. Daymet variables are provided as individual files, by variable and year, at a 1 km x 1 km spatial resolution and a daily temporal resolution. Areas of Hawaii and Puerto Rico are available as files separate from the continental North America. Data are in a North America Lambert Conformal Conic projection and are distributed in a standardized Climate and Forecast (CF)-compliant netCDF file format. In Version 4 R1, all 2020 and 2021 files were updated to improve predictions especially in high-latitude areas. It was found that input files used for deriving 2020 and 2021 data had, for a significant portion of Canadian weather stations, missing daily variable readings for the month of January. NCEI has corrected issues with the Environment Canada ingest feed which led to the missing readings. The revised 2020 and 2021 Daymet V4 R1 files were derived with new GHCNd inputs. Files outside of 2020 and 2021 have not changed from the previous V4 release.
People shot to death by U.S. police 2017-2024, by race
statista.com
Cite
Statista, People shot to death by U.S. police 2017-2024, by race [Dataset]. https://www.statista.com/statistics/585152/people-shot-to-death-by-us-police-by-race/
Dataset authored and provided by
Statista (http://statista.com/)
Area covered
United States
Description
Sadly, the trend of fatal police shootings in the United States seems to only be increasing, with a total of 1,173 civilians having been shot, 248 of whom were Black, as of December 2024. In 2023, there were 1,164 fatal police shootings. Additionally, the rate of fatal police shootings among Black Americans was much higher than that for any other ethnicity, standing at 6.1 fatal shootings per million of the population per year between 2015 and 2024.

Police brutality in the U.S.
In recent years, particularly since the fatal shooting of Michael Brown in Ferguson, Missouri in 2014, police brutality has become a hot button issue in the United States. The number of homicides committed by police in the United States is often compared to those in countries such as England, where the number is significantly lower.

Black Lives Matter
The Black Lives Matter Movement, formed in 2013, has been a vocal part of the movement against police brutality in the U.S. by organizing “die-ins”, marches, and demonstrations in response to the killings of black men and women by police. While Black Lives Matter has become a controversial movement within the U.S., it has brought more attention to the number and frequency of police shootings of civilians.
ARCHIVED: COVID-19 Cases by Geography Over Time
data.sfgov.org
csv, xlsx, xml
Updated Oct 24, 2023
Cite
Department of Public Health - Population Health Division (2023). ARCHIVED: COVID-19 Cases by Geography Over Time [Dataset]. https://data.sfgov.org/w/d2ef-idww/ikek-yizv?cur=6pe39zMjfCR&from=f5tFBDuJcU8
Available download formats: csv, xml, xlsx
Dataset updated
Oct 24, 2023
Dataset authored and provided by
Department of Public Health - Population Health Division
License
ODC Public Domain Dedication and Licence (PDDL) v1.0 (http://www.opendatacommons.org/licenses/pddl/1.0/)
License information was derived automatically
Description
A. SUMMARY This dataset contains COVID-19 positive confirmed cases aggregated by several different geographic areas and by day. COVID-19 cases are mapped to the residence of the individual and shown on the date the positive test was collected. In addition, 2016-2020 American Community Survey (ACS) population estimates are included to calculate the cumulative rate per 10,000 residents.

Dataset covers cases going back to 3/2/2020 when testing began. Data may not be immediately available for recently reported cases and will change as new information becomes available. Data updated daily.

Geographic areas summarized are: 1. Analysis Neighborhoods 2. Census Tracts 3. Census Zip Code Tabulation Areas

B. HOW THE DATASET IS CREATED Addresses from the COVID-19 case data are geocoded by the San Francisco Department of Public Health (SFDPH). Those addresses are spatially joined to the geographic areas. Counts are generated based on the number of address points that match each geographic area for a given date.

The 2016-2020 American Community Survey (ACS) population estimates provided by the Census are used to create a cumulative rate equal to ([cumulative count up to that date] / [acs_population]) * 10000, representing the number of total cases per 10,000 residents (as of the specified date).

COVID-19 case data undergo quality assurance and other data verification processes and are continually updated to maximize completeness and accuracy of information. This means data may change for previous days as information is updated.

C. UPDATE PROCESS Geographic analysis is scripted by SFDPH staff and synced to this dataset daily at 05:00 Pacific Time.

D. HOW TO USE THIS DATASET San Francisco population estimates for geographic regions can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).

This dataset can be used to track the spread of COVID-19 throughout the city, in a variety of geographic areas. Note that the new cases column in the data represents the number of new cases confirmed in a certain area on the specified day, while the cumulative cases column is the cumulative total of cases in a certain area as of the specified date.

Privacy rules in effect
To protect privacy, certain rules are in effect:
1. Any area with a cumulative case count less than 10 is dropped for all days the cumulative count was less than 10. These will be null values.
2. Once an area has a cumulative case count of 10 or greater, that area will have a new row of case data every day following.
3. Cases are dropped altogether for areas where acs_population < 1000
4. Deaths data are not included in this dataset for privacy reasons. The low COVID-19 death rate in San Francisco, along with other publicly available information on deaths, means that deaths data by geography and day is too granular and potentially risky. Read more in our privacy guidelines

Rate suppression in effect where counts are lower than 20
Rates are not calculated unless the cumulative case count is greater than or equal to 20. Rates are generally unstable at small numbers, so we avoid calculating them directly. We advise you to apply the same approach as this is best practice in epidemiology.
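How the new cases column, the cumulative cases column, and the by-day privacy rules fit together can be sketched as follows. This is an assumption-laden illustration rather than the publisher's code: the function and field names are hypothetical, days are plain indexes instead of dates, and suppressed days are simply omitted from the output.

```python
from itertools import accumulate


def daily_rows(new_cases, acs_population):
    """Build per-day rows for one area from its daily new-case counts."""
    rows = []
    if acs_population < 1000:
        return rows  # small-population areas are dropped altogether
    cumulative = accumulate(new_cases)  # running total as of each day
    for day, (new, total) in enumerate(zip(new_cases, cumulative)):
        if total < 10:
            continue  # no row while the cumulative count is below 10
        # The cumulative rate is only calculated at 20 or more cases.
        rate = total / acs_population * 10000 if total >= 20 else None
        rows.append({"day": day, "new_cases": new,
                     "cumulative_cases": total, "rate_per_10k": rate})
    return rows


for row in daily_rows([4, 5, 3, 10, 0], acs_population=20_000):
    print(row)
```

Once the running total reaches 10, every later day produces a row, including days with zero new cases.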

A note on Census ZIP Code Tabulation Areas (ZCTAs)
ZIP Code Tabulation Areas are special boundaries created by the U.S. Census based on ZIP Codes developed by the USPS. They are not, however, the same thing. ZCTAs are areal representations of routes. Read how the Census develops ZCTAs on their website.

Rows included for Citywide case counts
Rows are included for the Citywide case counts and incidence rate every day. These Citywide rows can be used for comparisons. Citywide will capture all cases regardless of address quality. While some cases cannot be mapped to sub-areas like Census Tracts, ongoing data quality efforts result in improved mapping on a rolling basis.

Related dataset See the dataset of the most recent cumulative counts for all geographic areas here: https://data.sfgov.org/COVID-19/COVID-19-Cases-and-Deaths-Summarized-by-Geography/tpyr-dvnc

E. CHANGE LOG
9/11/2023 - data on COVID-19 cases by geography over time are no longer being updated. This data is currently through 9/6/2023 and will not include any new data after this date.
4/6/2023 - the State implemented system updates to improve the integrity of historical data.
2/21/2023 - system updates to improve reliability and accuracy of cases data were implemented.
1/31/2023 - updated “acs_population” column to reflect the 2020 Census Bureau American Community Survey (ACS) San Francisco Population estimates.
1/31/2023 - implemented system updates to streamline and improve our geo-coded data, resulting in small shifts in our case data by geography.
1/31/2023 - renamed column “last_updated_at” to “data_as_of”.
1/31/2023 - removed the “multipolygon” column. To access the multipolygon geometry column for each geography unit, refer to COVID-19 Cases and Deaths Summarized by Geography.
1/22/2022 - system updates to improve timeliness and accuracy of cases and deaths data were implemented.
4/16/2021 - dataset updated to refresh with a five-day data lag.
Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version...
data.cnra.ca.gov
daac.ornl.gov
+1more
html, pdf, png
Updated Mar 24, 2025
Cite
National Aeronautics and Space Administration (2025). Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 3 [Dataset]. https://data.cnra.ca.gov/dataset/daymet-daily-surface-weather-data-on-a-1-km-grid-for-north-america
Available download formats: pdf, png, html
Dataset updated
Mar 24, 2025
Dataset authored and provided by
National Aeronautics and Space Administration
Area covered
North America
Description
This dataset provides Daymet Version 3 model output data as gridded estimates of daily weather parameters for North America and Hawaii: including Canada, Mexico, the United States of America, and Puerto Rico. The island areas of Hawaii and Puerto Rico are available as files separate from the continental land mass. Daymet output variables include the following parameters: minimum temperature, maximum temperature, precipitation, shortwave radiation, vapor pressure, snow water equivalent, and day length. The dataset covers the period from January 1, 1980 to December 31 of the most recent full calendar year. Each subsequent year is processed individually at the close of a calendar year. Daymet variables are continuous surfaces provided as individual files, by variable and year, at a 1-km x 1-km spatial resolution and a daily temporal resolution. Data are in a Lambert Conformal Conic projection for North America and are distributed in a netCDF file format compliant with Climate and Forecast (CF) metadata conventions (version 1.6).
American English General Conversation Speech Dataset for ASR
futurebeeai.com
wav
Updated Aug 1, 2022
Cite
FutureBee AI (2022). American English General Conversation Speech Dataset for ASR [Dataset]. https://www.futurebeeai.com/dataset/speech-dataset/general-conversation-english-usa
Available download formats: wav
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreement
Area covered
United States
Dataset funded by
FutureBeeAI
Description
Introduction
Welcome to the US English General Conversation Speech Dataset — a rich, linguistically diverse corpus purpose-built to accelerate the development of English speech technologies. This dataset is designed to train and fine-tune ASR systems, spoken language understanding models, and generative voice AI tailored to real-world US English communication.
Curated by FutureBeeAI, this 30-hour dataset offers unscripted, spontaneous two-speaker conversations across a wide array of real-life topics. It enables researchers, AI developers, and voice-first product teams to build robust, production-grade English speech models that understand and respond to authentic American accents and dialects.
Speech Data
The dataset comprises 30 hours of high-quality audio, featuring natural, free-flowing dialogue between native speakers of US English. These sessions range from informal daily talks to deeper, topic-specific discussions, ensuring variability and context richness for diverse use cases.
•Participant Diversity:
  •Speakers: 60 verified native US English speakers from FutureBeeAI’s contributor community.
  •Regions: Representing various states of the United States of America to ensure dialectal diversity and demographic balance.
  •Demographics: A balanced gender ratio (60% male, 40% female) with participant ages ranging from 18 to 70 years.
•Recording Details:
  •Conversation Style: Unscripted, spontaneous peer-to-peer dialogues.
  •Duration: Each conversation ranges from 15 to 60 minutes.
  •Audio Format: Stereo WAV files, 16-bit depth, recorded at 16kHz sample rate.
  •Environment: Quiet, echo-free settings with no background noise.

Topic Diversity
The dataset spans a wide variety of everyday and domain-relevant themes. This topic diversity ensures the resulting models are adaptable to broad speech contexts.
•Sample Topics Include:
•Family & Relationships
•Food & Recipes
•Education & Career
•Healthcare Discussions
•Social Issues
•Technology & Gadgets
•Travel & Local Culture
•Shopping & Marketplace Experiences, and many more.
Transcription
Each audio file is paired with a human-verified, verbatim transcription available in JSON format.
•Transcription Highlights:
•Speaker-segmented dialogues
•Time-coded utterances
•Non-speech elements (pauses, laughter, etc.)
•High transcription accuracy, achieved through double QA pass, average WER < 5%
These transcriptions are production-ready, enabling seamless integration into ASR model pipelines or conversational AI workflows.
Metadata
The dataset comes with granular metadata for both speakers and recordings:
•Speaker Metadata: Age, gender, accent, dialect, state/province, and participant ID.
•Recording Metadata: Topic, duration, audio format, device type, and sample rate.
Such metadata helps developers fine-tune model training and supports use-case-specific filtering or demographic analysis.
Usage and Applications
This dataset is a versatile resource for multiple English speech and language AI applications:
•ASR Development: Train accurate speech-to-text systems for US English.
•Voice Assistants: Build smart assistants capable of understanding natural American conversations.
Counts of Measles reported in UNITED STATES OF AMERICA: 1888-2002
zenodo.org
data.niaid.nih.gov
+1more
json, xml, zip
Updated Jun 3, 2024
Cite
Willem Van Panhuis; Anne Cross; Donald Burke (2024). Counts of Measles reported in UNITED STATES OF AMERICA: 1888-2002 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/us.14189004
Available download formats: xml, json, zip
Unique identifier
https://doi.org/10.25337/t7/ptycho.v2.0/us.14189004
Dataset updated
Jun 3, 2024
Dataset provided by
Project Tycho
Authors
Willem Van Panhuis; Anne Cross; Donald Burke
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Jul 15, 1888 - Dec 28, 2002
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
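The two preprocessing steps above can be sketched in Python as shown below. Only the attribute name "PartOfCumulativeCountSeries" comes from the description; the `start` and `count` keys, the already-parsed boolean flag, the weekly interval length, and both function names are assumptions made for the example.

```python
from datetime import date, timedelta


def split_series(rows):
    """Separate cumulative rows from fixed-interval rows using the flag."""
    cumulative = [r for r in rows if r["PartOfCumulativeCountSeries"]]
    fixed = [r for r in rows if not r["PartOfCumulativeCountSeries"]]
    return cumulative, fixed


def fill_missing_weeks(fixed_rows):
    """Return (start_date, count) pairs for every week between the first and
    last reported interval; weeks with no report get a count of None, which
    stays distinct from a reported zero."""
    by_start = {r["start"]: r["count"] for r in fixed_rows}
    if not by_start:
        return []
    day, last = min(by_start), max(by_start)
    filled = []
    while day <= last:
        filled.append((day, by_start.get(day)))
        day += timedelta(days=7)
    return filled


rows = [
    {"start": date(1950, 1, 1), "count": 12, "PartOfCumulativeCountSeries": False},
    {"start": date(1950, 1, 15), "count": 0, "PartOfCumulativeCountSeries": False},
    {"start": date(1950, 1, 1), "count": 30, "PartOfCumulativeCountSeries": True},
]
cumulative, fixed = split_series(rows)
print(fill_missing_weeks(fixed))
```

Using `None` for the unreported week of January 8 keeps it separate from the week of January 15, where a count of zero was actually reported.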
GHRSST L3C NOAA/ACSPO GOES-18/ABI West America Region Sea Surface...
podaac.jpl.nasa.gov
cmr.earthdata.nasa.gov
+1more
html
+ more versions
Cite
PO.DAAC, GHRSST L3C NOAA/ACSPO GOES-18/ABI West America Region Sea Surface Temperature v2.90 dataset [Dataset]. http://doi.org/10.5067/GHG18-3C290
Explore at:
Available download formats: html
Unique identifier
https://doi.org/10.5067/GHG18-3C290
Dataset provided by
PO.DAAC
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jun 7, 2022 - Present
Variables measured
SEA SURFACE TEMPERATURE
Description
The G18-ABI-L3C-ACSPO-v2.90 dataset is produced by the NOAA ACSPO system, which derives Subskin and Depth Sea Surface Temperature (SST) from the ABI sensor onboard the G18 satellite. NOAA’s G18 (known as GOES-T before launch) was launched on March 1, 2022, and replaced G17 as GOES West in January 2023. It is the third satellite in the US GOES–R Series, the Western Hemisphere’s most sophisticated weather-observing and environmental-monitoring system. The ABI is the primary instrument on the GOES-R Series for imaging Earth’s weather, oceans, and environment.

The G18-ABI-L3C-ACSPO-v2.90 dataset is a gridded version of the G18-ABI-L2P-ACSPO-v2.90 dataset (https://podaac.jpl.nasa.gov/dataset/G18-ABI-L2P-ACSPO-v2.90). The L3C (Level 3 Collated) product outputs 24 hourly granules per day, with a daily volume of 0.7 GB. Valid SSTs are found over oceans, seas, lakes, or rivers, with fill values reported elsewhere. All valid SSTs in L3C are recommended for users, although data over internal waters may not have enough in situ data to be adequately validated. Per GDS2 specifications, two additional Sensor-Specific Error Statistics layers (bias and standard deviation) are reported in each pixel with valid SST.

The ACSPO G18/ABI L3C product is validated against iQuam in situ data (Xu and Ignatov, 2014) and continuously monitored in the NOAA SQUAM system (Dash et al., 2010). The Near Real Time (NRT) files are replaced with Delayed Mode (DM) files, with a latency of about 2 months. File names remain unchanged, and DM vs NRT can be identified by different time stamps and global attributes inside the files (MERRA instead of GFS for atmospheric profiles, and same-day CMC L4 analyses in DM instead of one-day-delayed analyses in NRT processing).
Collections (from American Folklife Center)
zenodo.org
csv
Updated Nov 13, 2024
Cite
Patrick Egan (2024). Collections (from American Folklife Center) [Dataset]. http://doi.org/10.5281/zenodo.14140570
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.14140570
Dataset updated
Nov 13, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Patrick Egan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
2019
Description
Dataset originally created 03/01/2019
UPDATE: Packaged on 04/18/2019
UPDATE: Edited README on 04/18/2019

I. About this Data Set

This data set is a snapshot of work that is ongoing as a collaboration between Kluge Fellow in Digital Studies Patrick Egan and an intern at the Library of Congress in the American Folklife Center. It contains a combination of metadata from various collections that contain audio recordings of Irish traditional music. The development of this dataset is iterative, and it integrates visualizations that follow the key principles of trust and approachability. The project, entitled “Connections In Sound”, invites you to use and re-use this data.

The text available in the Items dataset is generated from multiple collections of audio material that were discovered at the American Folklife Center. Each instance of a performance was listed and “sets” or medleys of tunes or songs were split into distinct instances in order to allow machines to read each title separately (whilst still noting that they were part of a group of tunes). The work of the intern was then reviewed before publication, and cross-referenced with the tune index at www.irishtune.info. The Items dataset consists of just over 1000 rows, with new data being added daily in a separate file.

The collections dataset contains at least 37 rows of collections that were located by a reference librarian at the American Folklife Center. This search was complemented by searches of the collections by the scholar both on the internet at https://catalog.loc.gov and by using card catalogs.

Updates to these datasets will be announced and published as the project progresses.

II. What’s included?

This data set includes:

The Items Dataset – a .CSV containing Media Note, OriginalFormat, On Website, Collection Ref, Missing In Duplication, Collection, Outside Link, Performer, Solo/multiple, Sub-item, type of tune, Tune, Position, Location, State, Date, Notes/Composer, Potential Linked Data, Instrument, Additional Notes, Tune Cleanup. This .CSV is the direct export of the Items Google Spreadsheet.

III. How Was It Created?

These data were created by a Kluge Fellow in Digital Studies and an intern on this program over the course of three months. By listening, transcribing, reviewing, and tagging audio recordings, these scholars improve access and connect sounds in the American Folklife Collections by focusing on Irish traditional music. Once transcribed and tagged, information in these datasets is reviewed before publication.

IV. Data Set Field Descriptions

a) Collections dataset field descriptions

ItemId – this is the identifier for the collection that was found at the AFC
Viewed – if the collection has been viewed, or accessed in any way by the researchers.
On LOC – whether or not there are audio recordings of this collection available on the Library of Congress website.
On Other Website – if any of the recordings in this collection are available elsewhere on the internet
Original Format – the format that was used during the creation of the recordings that were found within each collection
Search – this indicates the type of search that resulted in locating recordings and collections within the AFC
Collection – the official title for the collection as noted on the Library of Congress website
State – The primary state where recordings from the collection were located
Other States – The secondary states where recordings from the collection were located
Era / Date – The decade or year associated with each collection
Call Number – This is the official reference number that is used to locate the collections, both in the urls used on the Library website, and in the reference search for catalog cards (catalog cards can be searched at this address: https://memory.loc.gov/diglib/ihas/html/afccards/afccards-home.html)
Finding Aid Online? – Whether or not a finding aid is available for this collection on the internet

b) Items dataset field descriptions

id – the specific identification of the instance of a tune, song or dance within the dataset
Media Note – Any information that is included with the original format, such as identification, name of physical item, additional metadata written on the physical item
Original Format – The physical format that was used when recording each specific performance. Note: this field is used in order to calculate the number of physical items that were created in each collection such as 32 wax cylinders.
On Website? – Whether or not each instance of a performance is available on the Library of Congress website
Collection Ref – The official reference number of the collection
Missing In Duplication – This column marks if parts of some recordings had been made available on other websites, but not all of the recordings were included in duplication (see recordings from Philadelphia Céilí Group on Villanova University website)
Collection – The official title of the collection given by the American Folklife Center
Outside Link – If recordings are available on other websites externally
Performer – The name of the contributor(s)
Solo/multiple – This field is used to calculate the number of solo performers vs group performers in each collection
Sub-item – In some cases, physical recordings contained extra details; the sub-item column was used to denote these details
Type of item – This column describes each individual item type, as noted by performers and collectors
Item – The item title, as noted by performers and collectors. If an item was not described, it was entered as “unidentified”
Position – The position on the recording (in some cases during playback, audio cassette player counter markers were used)
Location – Local address of the recording
State – The state where the recording was made
Date – The date that the recording was made
Notes/Composer – The stated composer or source of the item recorded
Potential Linked Data – If items may be linked to other recordings or data, this column was used to provide examples of potential relationships between them
Instrument – The instrument(s) that was used during the performance
Additional Notes – Notes about the process of capturing, transcribing and tagging recordings (for researcher and intern collaboration purposes)
Tune Cleanup – This column was used to tidy each item so that it could be read by machines, but also so that spelling mistakes from the Item column could be corrected, and as an aid to preserving iterations of the editing process
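As an illustration of how the Solo/multiple field could support the per-collection tally described above, the following sketch counts solo and group performances for each collection. It is an assumption-laden example rather than the project's actual workflow: the column names "Collection" and "Solo/multiple" come from the field list, but the cell values ("solo", "multiple"), the collection names, and the in-memory sample standing in for the exported .CSV are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical stand-in for the exported Items .CSV. Column names follow the
# field list above; the collection names and cell values are invented.
sample_csv = io.StringIO(
    "Collection,Solo/multiple,Item\n"
    "Collection A,solo,unidentified\n"
    "Collection A,multiple,unidentified\n"
    "Collection A,solo,unidentified\n"
    "Collection B,multiple,unidentified\n"
)


def tally_by_collection(rows, value_column):
    """Count how often each value of value_column occurs within each collection."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["Collection"]][row[value_column]] += 1
    return counts


rows = list(csv.DictReader(sample_csv))
performer_counts = tally_by_collection(rows, "Solo/multiple")

for collection, counter in sorted(performer_counts.items()):
    print(collection, dict(counter))
```

With a real export, `sample_csv` would be replaced by an opened file handle, and the actual values used in the Solo/multiple column would need to be checked first.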

V. Rights statement

The text in this data set was created by the researcher and intern and can be used in many different ways under creative commons with attribution. All contributions to Connections In Sound are released into the public domain as they are created. Anyone is free to use and re-use this data set in any way they want, provided reference is given to the creators of these datasets.

VI. Creator and Contributor Information

Creator: Connections In Sound

Contributors: Library of Congress Labs

VII. Contact Information

Please direct all questions and comments to Patrick Egan via www.twitter.com/drpatrickegan or via his website at www.patrickegan.org. You can also get in touch with the Library of Congress Labs team via LC-Labs@loc.gov.

Cite

Centers for Disease Control and Prevention (2025). Provisional COVID-19 death counts, rates, and percent of total deaths, by jurisdiction of residence [Dataset]. https://catalog.data.gov/dataset/provisional-covid-19-death-counts-rates-and-percent-of-total-deaths-by-jurisdiction-of-res

Provisional COVID-19 death counts, rates, and percent of total deaths, by jurisdiction of residence

Explore at:

5 scholarly articles cite this dataset (View in Google Scholar)

Dataset updated

Sep 26, 2025

Dataset provided by

Centers for Disease Control and Prevention http://www.cdc.gov/

Description

This file contains COVID-19 death counts, death rates, and percent of total deaths by jurisdiction of residence. The data is grouped by different time periods including 3-month period, weekly, and total (cumulative since January 1, 2020). United States death counts and rates include the 50 states, plus the District of Columbia and New York City. New York state estimates exclude New York City. Puerto Rico is included in HHS Region 2 estimates. Deaths with confirmed or presumed COVID-19, coded to ICD–10 code U07.1. Number of deaths reported in this file are the total number of COVID-19 deaths received and coded as of the date of analysis and may not represent all deaths that occurred in that period. Counts of deaths occurring before or after the reporting period are not included in the file. Data during recent periods are incomplete because of the lag in time between when the death occurred and when the death certificate is completed, submitted to NCHS and processed for reporting purposes. This delay can range from 1 week to 8 weeks or more, depending on the jurisdiction and cause of death. Death counts should not be compared across states. Data timeliness varies by state. Some states report deaths on a daily basis, while other states report deaths weekly or monthly. The ten (10) United States Department of Health and Human Services (HHS) regions include the following jurisdictions. 
Region 1: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont; Region 2: New Jersey, New York, New York City, Puerto Rico; Region 3: Delaware, District of Columbia, Maryland, Pennsylvania, Virginia, West Virginia; Region 4: Alabama, Florida, Georgia, Kentucky, Mississippi, North Carolina, South Carolina, Tennessee; Region 5: Illinois, Indiana, Michigan, Minnesota, Ohio, Wisconsin; Region 6: Arkansas, Louisiana, New Mexico, Oklahoma, Texas; Region 7: Iowa, Kansas, Missouri, Nebraska; Region 8: Colorado, Montana, North Dakota, South Dakota, Utah, Wyoming; Region 9: Arizona, California, Hawaii, Nevada; Region 10: Alaska, Idaho, Oregon, Washington. Rates were calculated using the population estimates for 2021, which are estimated as of July 1, 2021 based on the Blended Base produced by the US Census Bureau in lieu of the April 1, 2020 decennial population count. The Blended Base consists of the blend of Vintage 2020 postcensal population estimates, 2020 Demographic Analysis Estimates, and 2020 Census PL 94-171 Redistricting File (see https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/2020-2021/methods-statement-v2021.pdf). Rates are based on deaths occurring in the specified week/month and are age-adjusted to the 2000 standard population using the direct method (see https://www.cdc.gov/nchs/data/nvsr/nvsr70/nvsr70-08-508.pdf). These rates differ from annual age-adjusted rates, typically presented in NCHS publications based on a full year of data and annualized weekly/monthly age-adjusted rates which have been adjusted to allow comparison with annual rates. Annualized rates present the deaths per year per 100,000 population that would be expected in a year if the observed period-specific (weekly/monthly) rate prevailed for a full year. Sub-national death counts between 1 and 9 are suppressed in accordance with NCHS data confidentiality standards.
Rates based on death counts less than 20 are suppressed in accordance with NCHS standards of reliability as specified in NCHS Data Presentation Standards for Proportions (available from: https://www.cdc.gov/nchs/data/series/sr_02/sr02_175.pdf).
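The suppression thresholds and the annualization idea described above can be sketched as follows. This is an illustrative simplification, not NCHS code. The function names are hypothetical, the rate shown is a crude (not age-adjusted) rate, and the number of periods per year is left as a caller-supplied assumption because the description does not state the exact factor used.

```python
def publishable_count(count, subnational=True):
    """Return the count, or None when a sub-national count of 1-9 is suppressed."""
    if subnational and 1 <= count <= 9:
        return None
    return count


def crude_rate_per_100k(deaths, population):
    """Crude rate per 100,000, or None when based on fewer than 20 deaths.

    The published rates are age-adjusted; this crude rate only illustrates
    the reliability threshold.
    """
    if deaths < 20:
        return None
    return deaths * 100_000 / population


def annualize_rate(period_rate, periods_per_year):
    """Scale a period-specific rate to the rate expected over a full year.

    periods_per_year is an assumption supplied by the caller (for example,
    52 for weekly data or 12 for monthly data).
    """
    return period_rate * periods_per_year


# Hypothetical figures for one jurisdiction and one week.
weekly_rate = crude_rate_per_100k(50, 1_000_000)
if weekly_rate is not None:
    print(annualize_rate(weekly_rate, 52))
```

Returning None for suppressed values keeps them distinct from a true zero, which the file does report.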
