41 datasets found

Death in the United States
kaggle.com
zip
Updated Aug 3, 2017
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Centers for Disease Control and Prevention (2017). Death in the United States [Dataset]. https://www.kaggle.com/datasets/cdc/mortality
Explore at:
zip(766333584 bytes)Available download formats
Dataset updated
Aug 3, 2017
Dataset authored and provided by
Centers for Disease Control and Preventionhttp://www.cdc.gov/
License
https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
Area covered
United States
Description
Every year the CDC releases the country’s most detailed report on death in the United States under the National Vital Statistics Systems. This mortality dataset is a record of every death in the country for 2005 through 2015, including detailed information about causes of death and the demographic background of the deceased.

It's been said that "statistics are human beings with the tears wiped off." This is especially true with this dataset. Each death record represents somebody's loved one, often connected with a lifetime of memories and sometimes tragically too short.

Putting the sensitive nature of the topic aside, analyzing mortality data is essential to understanding the complex circumstances of death across the country. The US Government uses this data to determine life expectancy and understand how death in the U.S. differs from the rest of the world. Whether you’re looking for macro trends or analyzing unique circumstances, we challenge you to use this dataset to find your own answers to one of life’s great mysteries.

Overview

This dataset is a collection of CSV files each containing one year's worth of data and paired JSON files containing the code mappings, plus an ICD 10 code set. The CSVs were reformatted from their original fixed-width file formats using information extracted from the CDC's PDF manuals using this script. Please note that this process may have introduced errors as the text extracted from the pdf is not a perfect match. If you have any questions or find errors in the preparation process, please leave a note in the forums. We hope to publish additional years of data using this method soon.

A more detailed overview of the data can be found here. You'll find that the fields are consistent within this time window, but some of data codes change every few years. For example, the 113_cause_recode entry 069 only covers ICD codes (I10,I12) in 2005, but by 2015 it covers (I10,I12,I15). When I post data from years prior to 2005, expect some of the fields themselves to change as well.

All data comes from the CDC’s National Vital Statistics Systems, with the exception of the Icd10Code, which are sourced from the World Health Organization.

Project ideas

The CDC's mortality data was the basis of a widely publicized paper, by Anne Case and Nobel prize winner Angus Deaton, arguing that middle-aged whites are dying at elevated rates. One of the criticisms against the paper is that it failed to properly account for the exact ages within the broad bins available through the CDC's WONDER tool. What do these results look like with exact/not-binned age data?

Similarly, how sensitive are the mortality trends being discussed in the news to the choice of bin-widths?

As noted above, the data preparation process could have introduced errors. Can you find any discrepancies compared to the aggregate metrics on WONDER? If so, please let me know in the forums!

WONDER is cited in numerous economics, sociology, and public health research papers. Can you find any papers whose conclusions would be altered if they used the exact data available here rather than binned data from Wonder?

Differences from the first version of the dataset

This version of the dataset was prepared in a completely different many. This has allowed us to provide a much larger volume of data and ensure that codes are available for every field.

We've replaced the batch of sql files with a single JSON per year. Kaggle's platform currently offer's better support for JSON files, and this keeps the number of files manageable.

A tutorial kernel providing a quick introduction to the new format is available here.

Lastly, I apologize if the transition has interrupted anyone's work! If need be, you can still download v1.
d
[Archived] COVID-19 Deaths by Population Characteristics Over Time
catalog.data.gov
data.sfgov.org
+1more
Updated Mar 29, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data.sfgov.org (2025). [Archived] COVID-19 Deaths by Population Characteristics Over Time [Dataset]. https://catalog.data.gov/dataset/covid-19-deaths-by-population-characteristics-over-time
Explore at:
Dataset updated
Mar 29, 2025
Dataset provided by
data.sfgov.org
Description
As of July 2nd, 2024 the COVID-19 Deaths by Population Characteristics Over Time dataset has been retired. This dataset is archived and will no longer update. We will be publishing a cumulative deaths by population characteristics dataset that will update moving forward. A. SUMMARY This dataset shows San Francisco COVID-19 deaths by population characteristics and by date. This data may not be immediately available for recently reported deaths. Data updates as more information becomes available. Because of this, death totals for previous days may increase or decrease. More recent data is less reliable. Population characteristics are subgroups, or demographic cross-sections, like age, race, or gender. The City tracks how deaths have been distributed among different subgroups. This information can reveal trends and disparities among groups. B. HOW THE DATASET IS CREATED As of January 1, 2023, COVID-19 deaths are defined as persons who had COVID-19 listed as a cause of death or a significant condition contributing to their death on their death certificate. This definition is in alignment with the California Department of Public Health and the national Council of State and Territorial Epidemiologists. Death certificates are maintained by the California Department of Public Health. Data on the population characteristics of COVID-19 deaths are from: Case reports Medical records Electronic lab reports Death certificates Data are continually updated to maximize completeness of information and reporting on San Francisco COVID-19 deaths. To protect resident privacy, we summarize COVID-19 data by only one characteristic at a time. Data are not shown until cumulative citywide deaths reach five or more. Data notes on each population characteristic type is listed below. Race/ethnicity * We include all race/ethnicity categories that are collected for COVID-19 cases. Gender * The City collects information on gender identity using these guidelines. C. UPDATE PROCESS Updates automatically at 06:30 and 07:30 AM Pacific Time on Wednesday each week. Dataset will not update on the business day following any federal holiday. D. HOW TO USE THIS DATASET Population estimates are only available for age groups and race/ethnicity categories. San Francisco population estimates for race/ethnicity and age groups can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS). This dataset includes many different types of characteristics. Filter the “Characteristic Type” column to explore a topic area. Then, the “Characteristic Group” column shows each group or category within that topic area and the number of deaths on each date. New deaths are the count of deaths within that characteristic group on that specific date. Cumulative deaths are the running total of all San Francisco COVID-19 deaths in that characteristic group up to the date listed. This data may not be immediately available for more recent deaths. Data updates as more information becomes available. To explore data on the total number of deaths, use the COVID-19 Deaths Over Time dataset. E. CHANGE LOG 9/11/2023 - on this date, we began using an updated definition of a COVID-19 death to align with the California Department o
d
Mass Killings in America, 2006 - present
data.world
csv, zip
Updated Sep 22, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Associated Press (2025). Mass Killings in America, 2006 - present [Dataset]. https://data.world/associatedpress/mass-killings-public
Explore at:
zip, csvAvailable download formats
Dataset updated
Sep 22, 2025
Authors
The Associated Press
Time period covered
Jan 1, 2006 - Aug 1, 2025
Area covered

Description
THIS DATASET WAS LAST UPDATED AT 8:11 AM EASTERN ON SEPT. 22

OVERVIEW

2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.

In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings . This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.

A total of 229 people died in mass killings in 2019.

The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.

One-third of the offenders died at the scene of the killing or soon after, half from suicides.

About this Dataset

The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.

The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.

This data will be updated periodically and can be used as an ongoing resource to help cover these events.

Using this Dataset

To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:

Mass killings by year

Mass shootings by year

To get these counts just for your state:

Filter killings by state

Definition of "mass murder"

Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.

This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”

Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.

Methodology

Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.

Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.

In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.

Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.

Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.

This project started at USA TODAY in 2012.

Contacts

Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
g
Centers for Disease Control and Prevention (CDC), Weekly Flu Report, USA,...
geocommons.com
Updated May 29, 2008
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
data (2008). Centers for Disease Control and Prevention (CDC), Weekly Flu Report, USA, Week ending 5.17.2008 [Dataset]. http://geocommons.com/search.html
Explore at:
Dataset updated
May 29, 2008
Dataset provided by
Centers for Disease Control and Prevention (CDC)
data
Description
This datasets displays the summary of the Geographic Spread of Influenza throughout the United States for the week ending on 5.17.2008. Each state is given a flu Status that is dependent on the spread of influenza throughout the state for the particular week. There are five levels of status that a state can obtain. They are: * No Activity: No laboratory-confirmed cases of influenza and no reported increase in the number of cases of ILI. * Sporadic: Small numbers of laboratory-confirmed influenza cases or a single laboratory-confirmed influenza outbreak has been reported, but there is no increase in cases of ILI. * Local: Outbreaks of influenza or increases in ILI cases and recent laboratory-confirmed influenza in a single region of the state. * Regional: Outbreaks of influenza or increases in ILI and recent laboratory confirmed influenza in at least 2 but less than half the regions of the state. * Widespread: Outbreaks of influenza or increases in ILI cases and recent laboratory-confirmed influenza in at least half the regions of the state. The information comes from the Centers for Disease Control and Prevention (CDC) To see more information about how this status is determined see http://www.cdc.gov/flu/weekly/fluactivity.htm for a full explanation of their system.
d
Data from: What We Eat In America (WWEIA) Database
catalog.data.gov
agdatacommons.nal.usda.gov
Updated Apr 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). What We Eat In America (WWEIA) Database [Dataset]. https://catalog.data.gov/dataset/what-we-eat-in-america-wweia-database-f7f35
Explore at:
Dataset updated
Apr 21, 2025
Dataset provided by
Agricultural Research Service
Area covered
United States
Description
What We Eat in America (WWEIA) is the dietary intake interview component of the National Health and Nutrition Examination Survey (NHANES). WWEIA is conducted as a partnership between the U.S. Department of Agriculture (USDA) and the U.S. Department of Health and Human Services (DHHS). Two days of 24-hour dietary recall data are collected through an initial in-person interview, and a second interview conducted over the telephone within three to 10 days. Participants are given three-dimensional models (measuring cups and spoons, a ruler, and two household spoons) and/or USDA's Food Model Booklet (containing drawings of various sizes of glasses, mugs, bowls, mounds, circles, and other measures) to estimate food amounts. WWEIA data are collected using USDA's dietary data collection instrument, the Automated Multiple-Pass Method (AMPM). The AMPM is a fully computerized method for collecting 24-hour dietary recalls either in-person or by telephone. For each 2-year data release cycle, the following dietary intake data files are available: Individual Foods File - Contains one record per food for each survey participant. Foods are identified by USDA food codes. Each record contains information about when and where the food was consumed, whether the food was eaten in combination with other foods, amount eaten, and amounts of nutrients provided by the food. Total Nutrient Intakes File - Contains one record per day for each survey participant. Each record contains daily totals of food energy and nutrient intakes, daily intake of water, intake day of week, total number foods reported, and whether intake was usual, much more than usual or much less than usual. The Day 1 file also includes salt use in cooking and at the table; whether on a diet to lose weight or for other health-related reason and type of diet; and frequency of fish and shellfish consumption (examinees one year or older, Day 1 file only). DHHS is responsible for the sample design and data collection, and USDA is responsible for the survey’s dietary data collection methodology, maintenance of the databases used to code and process the data, and data review and processing. USDA also funds the collection and processing of Day 2 dietary intake data, which are used to develop variance estimates and calculate usual nutrient intakes. Resources in this dataset:Resource Title: What We Eat In America (WWEIA) main web page. File Name: Web Page, url: https://www.ars.usda.gov/northeast-area/beltsville-md-bhnrc/beltsville-human-nutrition-research-center/food-surveys-research-group/docs/wweianhanes-overview/ Contains data tables, research articles, documentation data sets and more information about the WWEIA program. (Link updated 05/13/2020)
z
Counts of Malaria reported in UNITED STATES OF AMERICA: 1951-2017
zenodo.org
json, xml, zip
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke (2024). Counts of Malaria reported in UNITED STATES OF AMERICA: 1951-2017 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/us.61462000
Explore at:
zip, xml, jsonAvailable download formats
Unique identifier
https://doi.org/10.25337/t7/ptycho.v2.0/us.61462000
Dataset updated
Jun 3, 2024
Dataset provided by
Project Tycho
Authors
Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 30, 1951 - Dec 30, 2017
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
p
Counts of Smallpox reported in UNITED STATES OF AMERICA: 1888-1952
tycho.pitt.edu
Updated Apr 1, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Smallpox reported in UNITED STATES OF AMERICA: 1888-1952 [Dataset]. https://www.tycho.pitt.edu/dataset/US.67924001
Explore at:
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
1888 - 1952
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
Z
Counts of Infantile paralysis reported in UNITED STATES OF AMERICA:...
data.niaid.nih.gov
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Burke, Donald (2024). Counts of Infantile paralysis reported in UNITED STATES OF AMERICA: 1923-1932 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11452428
Explore at:
Dataset updated
Jun 3, 2024
Dataset provided by
Burke, Donald
Cross, Anne
Van Panhuis, Willem
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format. Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc. Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
z
Counts of Inflammatory disease of liver reported in UNITED STATES OF...
zenodo.org
json, xml, zip
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke (2024). Counts of Inflammatory disease of liver reported in UNITED STATES OF AMERICA: 1983-1983 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/us.128241005
Explore at:
zip, xml, jsonAvailable download formats
Unique identifier
https://doi.org/10.25337/t7/ptycho.v2.0/us.128241005
Dataset updated
Jun 3, 2024
Dataset provided by
Project Tycho
Authors
Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 17, 1983 - Jul 23, 1983
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
b
North American Rail Network Lines
geodata.bts.gov
geodata.colorado.gov
+7more
Updated Jul 1, 1995
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Department of Transportation: ArcGIS Online (1995). North American Rail Network Lines [Dataset]. https://geodata.bts.gov/datasets/usdot::north-american-rail-network-lines/about
Explore at:
Dataset updated
Jul 1, 1995
Dataset authored and provided by
U.S. Department of Transportation: ArcGIS Online
Area covered

Description
The North American Rail Network (NARN) Rail Lines dataset was created in 2016 and was updated on July 18, 2025 from the Federal Railroad Administration (FRA) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). The NARN Rail Lines dataset is a database that provides ownership, trackage rights, type, passenger, STRACNET, and geographic reference for North America's railway system at 1:24,000 or better within the United States. The data set covers all 50 States, the District of Columbia, Mexico, and Canada. A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1528950
N
American Fork, UT annual median income by work experience and sex dataset:...
neilsberg.com
csv, json
Updated Feb 27, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). American Fork, UT annual median income by work experience and sex dataset: Aged 15+, 2010-2023 (in 2023 inflation-adjusted dollars) // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/a4fd39a1-f4ce-11ef-8577-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Utah, American Fork
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time
Measurement technique
The data presented in this dataset is derived from the U.S. Census Bureau American Community Survey (ACS) 5-Year Estimates. The dataset covers the years 2010 to 2023, representing 14 years of data. To analyze income differences between genders (male and female), we conducted an initial data analysis and categorization. Subsequently, we adjusted these figures for inflation using the Consumer Price Index retroactive series (R-CPI-U-RS) based on current methodologies. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents median income data over a decade or more for males and females categorized by Total, Full-Time Year-Round (FT), and Part-Time (PT) employment in American Fork. It showcases annual income, providing insights into gender-specific income distributions and the disparities between full-time and part-time work. The dataset can be utilized to gain insights into gender-based pay disparity trends and explore the variations in income for male and female individuals.

Key observations: Insights from 2023

Based on our analysis ACS 2019-2023 5-Year Estimates, we present the following observations: - All workers, aged 15 years and older: In American Fork, the median income for all workers aged 15 years and older, regardless of work hours, was $54,439 for males and $27,701 for females.
These income figures highlight a substantial gender-based income gap in American Fork. Women, regardless of work hours, earn 51 cents for each dollar earned by men. This significant gender pay gap, approximately 49%, underscores concerning gender-based income inequality in the city of American Fork.
- Full-time workers, aged 15 years and older: In American Fork, among full-time, year-round workers aged 15 years and older, males earned a median income of $71,091, while females earned $50,384, leading to a 29% gender pay gap among full-time workers. This illustrates that women earn 71 cents for each dollar earned by men in full-time roles. This analysis indicates a widening gender pay gap, showing a substantial income disparity where women, despite working full-time, face a more significant wage discrepancy compared to men in the same roles.
Surprisingly, the gender pay gap percentage was higher across all roles, including non-full-time employment, for women compared to men. This suggests that full-time employment offers a more equitable income scenario for women compared to other employment patterns in American Fork.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. All incomes have been adjusting for inflation and are presented in 2023-inflation-adjusted dollars.

Gender classifications include:

Male

Female

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

Variables / Data Columns

Year: This column presents the data year. Expected values are 2010 to 2023

Male Total Income: Annual median income, for males regardless of work hours

Male FT Income: Annual median income, for males working full time, year-round

Male PT Income: Annual median income, for males working part time

Female Total Income: Annual median income, for females regardless of work hours

Female FT Income: Annual median income, for females working full time, year-round

Female PT Income: Annual median income, for females working part time

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for American Fork median household income by race. You can refer the same here
d
Travel Danger
data.world
csv, zip
Updated Apr 19, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
State Department Travel Warnings (2025). Travel Danger [Dataset]. https://data.world/travelwarnings/travel-danger
Explore at:
zip, csvAvailable download formats
Dataset updated
Apr 19, 2025
Authors
State Department Travel Warnings
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Time period covered
2008 - 2016
Description
This dataset contains data and analysis from the article Do State Department Travel Warnings Reflect Real Danger?

Key findings

On the whole, there is a significant relationship between the number of American deaths abroad per capita and the number of travel warnings a country receives

Mexico, Mali, and Israel have been targeted by the most travel warnings in recent years, but Americans are more likely to be killed in Thailand, Pakistan, and the Philippines

Several countries with relatively high rates of American death have not been issued a single travel warning in ~7 years, including Belize, Guyana, and Guatemala

Several countries with relatively low rates of American death have been issued a relatively high number of travel warnings in ~7 years, including Israel, Turkey, and Saudi Arabia

Overall, countries subject to travel warnings do not see notable declines in American visitors in the 6 months after a warning is issued

Data sources

BTSOriginUS_10_09_to_06_16.csv Air Carrier Statistics Database export, Bureau of Transportation Statistics

SDamerican_deaths_abroad_10_09_to_06_16.csv U.S. State Department

SDwarnings_10_09to06_16.csv U.S. State Department via Internet Archive

Charts / data visualizations

https://cdn-images-1.medium.com/max/800/1*moPQYbzXW0Jx6AFhY8VKWQ.png" alt="alt text">

https://cdn-images-1.medium.com/max/800/1*s1OX6ke8wlHhK4VubpVWcg.png" alt="alt text">

https://cdn-images-1.medium.com/max/800/1*JwvpqE4YIuYfx2UEqCp9nA.png" alt="alt text">

https://cdn-images-1.medium.com/max/800/1*LHLsJ0IzLsSlNl0UN8XrAw.png" alt="alt text">

https://cdn-images-1.medium.com/max/800/1*l0sqn7voWyMCbwoQ2OKGfg.png" alt="alt text">
o
Counties - United States of America
public.opendatasoft.com
data.smartidf.services
+1more
csv, excel, geojson +1
Updated Jun 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Counties - United States of America [Dataset]. https://public.opendatasoft.com/explore/dataset/georef-united-states-of-america-county/
Explore at:
excel, json, geojson, csvAvailable download formats
Dataset updated
Jun 6, 2024
License
https://en.wikipedia.org/wiki/Public_domainhttps://en.wikipedia.org/wiki/Public_domain
Area covered
United States
Description
This dataset is part of the Geographical repository maintained by Opendatasoft. This dataset contains data for counties and equivalent entities in United States of America. The primary legal divisions of most states are termed counties. In Louisiana, these divisions are known as parishes. In Alaska, which has no counties, the equivalent entities are the organized boroughs, city and boroughs, municipalities, and for the unorganized area, census areas. The latter are delineated cooperatively for statistical purposes by the State of Alaska and the Census Bureau. In four states (Maryland, Missouri, Nevada, and Virginia), there are one or more incorporated places that are independent of any county organization and thus constitute primary divisions of their states. These incorporated places are known as independent cities and are treated as equivalent entities for purposes of data presentation. The District of Columbia and Guam have no primary divisions, and each area is considered an equivalent entity for purposes of data presentation. The Census Bureau treats the following entities as equivalents of counties for purposes of data presentation: Municipios in Puerto Rico, Districts and Islands in American Samoa, Municipalities in the Commonwealth of the Northern Mariana Islands, and Islands in the U.S. Virgin Islands. The entire area of the United States, Puerto Rico, and the Island Areas is covered by counties or equivalent entities.Processors and tools are using this data. Enhancements Add ISO 3166-3 codes. Simplify geometries to provide better performance across the services. Add administrative hierarchy.
N
Dead Lake Township, Minnesota Age Group Population Dataset: A complete...
neilsberg.com
csv, json
Updated Sep 16, 2023
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2023). Dead Lake Township, Minnesota Age Group Population Dataset: A complete breakdown of Dead Lake township age demographics from 0 to 85 years, distributed across 18 age groups [Dataset]. https://www.neilsberg.com/research/datasets/70219958-3d85-11ee-9abe-0aa64bf2eeb2/
Explore at:
json, csvAvailable download formats
Dataset updated
Sep 16, 2023
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Minnesota, Dead Lake Township
Variables measured
Population Under 5 Years, Population over 85 years, Population Between 5 and 9 years, Population Between 10 and 14 years, Population Between 15 and 19 years, Population Between 20 and 24 years, Population Between 25 and 29 years, Population Between 30 and 34 years, Population Between 35 and 39 years, Population Between 40 and 44 years, and 9 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the age groups. For age groups we divided it into roughly a 5 year bucket for ages between 0 and 85. For over 85, we aggregated data into a single group for all ages. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the Dead Lake township population distribution across 18 age groups. It lists the population in each age group along with the percentage population relative of the total population for Dead Lake township. The dataset can be utilized to understand the population distribution of Dead Lake township by age. For example, using this dataset, we can identify the largest age group in Dead Lake township.

Key observations

The largest age group in Dead Lake Township, Minnesota was for the group of age 65-69 years with a population of 96 (15.02%), according to the 2021 American Community Survey. At the same time, the smallest age group in Dead Lake Township, Minnesota was the 25-29 years with a population of 7 (1.10%). Source: U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2017-2021 5-Year Estimates.

Age groups:

Under 5 years

5 to 9 years

10 to 14 years

15 to 19 years

20 to 24 years

25 to 29 years

30 to 34 years

35 to 39 years

40 to 44 years

45 to 49 years

50 to 54 years

55 to 59 years

60 to 64 years

65 to 69 years

70 to 74 years

75 to 79 years

80 to 84 years

85 years and over

Variables / Data Columns

Age Group: This column displays the age group in consideration

Population: The population for the specific age group in the Dead Lake township is shown in this column.

% of Total Population: This column displays the population of each age group as a proportion of Dead Lake township total population. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Dead Lake township Population by Age. You can refer the same here
p
Counts of Dysentery reported in UNITED STATES OF AMERICA: 1942-1948
tycho.pitt.edu
Updated Apr 1, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Dysentery reported in UNITED STATES OF AMERICA: 1942-1948 [Dataset]. https://www.tycho.pitt.edu/dataset/US.111939009
Explore at:
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
1942 - 1948
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
Counties
catalog.data.gov
datasets.ai
+4more
Updated Jul 17, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
United States Census Bureau (USCB) (Point of Contact) (2025). Counties [Dataset]. https://catalog.data.gov/dataset/counties2
Explore at:
Dataset updated
Jul 17, 2025
Dataset provided by
United States Census Bureauhttp://census.gov/
Description
The Counties dataset was updated on October 31, 2023 from the United States Census Bureau (USCB) and is part of the U.S. Department of Transportation (USDOT)/Bureau of Transportation Statistics (BTS) National Transportation Atlas Database (NTAD). This resource is a member of a series. The TIGER/Line shapefiles and related database files (.dbf) are an extract of selected geographic and cartographic information from the U.S. Census Bureau's Master Address File / Topologically Integrated Geographic Encoding and Referencing (MAF/TIGER) Database (MTDB). The MTDB represents a seamless national file with no overlaps or gaps between parts, however, each TIGER/Line shapefile is designed to stand alone as an independent data set, or they can be combined to cover the entire nation. The primary legal divisions of most states are termed counties. In Louisiana, these divisions are known as parishes. In Alaska, which has no counties, the equivalent entities are the organized boroughs, city and boroughs, municipalities, and for the unorganized area, census areas. The latter are delineated cooperatively for statistical purposes by the State of Alaska and the Census Bureau. In four states (Maryland, Missouri, Nevada, and Virginia), there are one or more incorporated places that are independent of any county organization and thus constitute primary divisions of their states. These incorporated places are known as independent cities and are treated as equivalent entities for purposes of data presentation. The District of Columbia and Guam have no primary divisions, and each area is considered an equivalent entity for purposes of data presentation. The Census Bureau treats the following entities as equivalents of counties for purposes of data presentation: Municipios in Puerto Rico, Districts and Islands in American Samoa, Municipalities in the Commonwealth of the Northern Mariana Islands, and Islands in the U.S. Virgin Islands. The entire area of the United States, Puerto Rico, and the Island Areas is covered by counties or equivalent entities. The boundaries for counties and equivalent entities are mostly as of January 1, 2023, as reported through the Census Bureau's Boundary and Annexation Survey (BAS). A data dictionary, or other source of attribute information, is accessible at https://doi.org/10.21949/1529015
N
American Fork, UT annual income distribution by work experience and gender...
neilsberg.com
csv, json
Updated Feb 27, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). American Fork, UT annual income distribution by work experience and gender dataset: Number of individuals ages 15+ with income, 2023 // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/ba9308fa-f4ce-11ef-8577-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Feb 27, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
Utah, American Fork
Variables measured
Income for Male Population, Income for Female Population, Income for Male Population working full time, Income for Male Population working part time, Income for Female Population working full time, Income for Female Population working part time, Number of males working full time for a given income bracket, Number of males working part time for a given income bracket, Number of females working full time for a given income bracket, Number of females working part time for a given income bracket
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To portray the number of individuals for both the genders (Male and Female), within each income bracket we conducted an initial analysis and categorization of the American Community Survey data. Households are categorized, and median incomes are reported based on the self-identified gender of the head of the household. For additional information about these estimations, please contact us via email at research@neilsberg.com
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset presents the detailed breakdown of the count of individuals within distinct income brackets, categorizing them by gender (men and women) and employment type - full-time (FT) and part-time (PT), offering valuable insights into the diverse income landscapes within American Fork. The dataset can be utilized to gain insights into gender-based income distribution within the American Fork population, aiding in data analysis and decision-making..

Key observations

Employment patterns: Within American Fork, among individuals aged 15 years and older with income, there were 12,169 men and 10,456 women in the workforce. Among them, 7,760 men were engaged in full-time, year-round employment, while 3,803 women were in full-time, year-round roles.

Annual income under $24,999: Of the male population working full-time, 6.07% fell within the income range of under $24,999, while 7.18% of the female population working full-time was represented in the same income bracket.

Annual income above $100,000: 34.82% of men in full-time roles earned incomes exceeding $100,000, while 11.89% of women in full-time positions earned within this income bracket.

Refer to the research insights for more key observations on more income brackets ( Annual income under $24,999, Annual income between $25,000 and $49,999, Annual income between $50,000 and $74,999, Annual income between $75,000 and $99,999 and Annual income above $100,000) and employment types (full-time year-round and part-time)

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Income brackets:

$1 to $2,499 or loss

$2,500 to $4,999

$5,000 to $7,499

$7,500 to $9,999

$10,000 to $12,499

$12,500 to $14,999

$15,000 to $17,499

$17,500 to $19,999

$20,000 to $22,499

$22,500 to $24,999

$25,000 to $29,999

$30,000 to $34,999

$35,000 to $39,999

$40,000 to $44,999

$45,000 to $49,999

$50,000 to $54,999

$55,000 to $64,999

$65,000 to $74,999

$75,000 to $99,999

$100,000 or more

Variables / Data Columns

Income Bracket: This column showcases 20 income brackets ranging from $1 to $100,000+..

Full-Time Males: The count of males employed full-time year-round and earning within a specified income bracket

Part-Time Males: The count of males employed part-time and earning within a specified income bracket

Full-Time Females: The count of females employed full-time year-round and earning within a specified income bracket

Part-Time Females: The count of females employed part-time and earning within a specified income bracket

Employment type classifications include:

Full-time, year-round: A full-time, year-round worker is a person who worked full time (35 or more hours per week) and 50 or more weeks during the previous calendar year.

Part-time: A part-time worker is a person who worked less than 35 hours per week during the previous calendar year.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for American Fork median household income by race. You can refer the same here
N
Day, New York Population Breakdown By Race (Excluding Ethnicity) Dataset:...
neilsberg.com
csv, json
Updated Feb 21, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Neilsberg Research (2025). Day, New York Population Breakdown By Race (Excluding Ethnicity) Dataset: Population Counts and Percentages for 7 Racial Categories as Identified by the US Census Bureau // 2025 Edition [Dataset]. https://www.neilsberg.com/research/datasets/756b02b6-ef82-11ef-9e71-3860777c1fe6/
Explore at:
csv, jsonAvailable download formats
Dataset updated
Feb 21, 2025
Dataset authored and provided by
Neilsberg Research
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
New York, Day, United States
Variables measured
Asian Population, Black Population, White Population, Some other race Population, Two or more races Population, American Indian and Alaska Native Population, Asian Population as Percent of Total Population, Black Population as Percent of Total Population, White Population as Percent of Total Population, Native Hawaiian and Other Pacific Islander Population, and 4 more
Measurement technique
The data presented in this dataset is derived from the latest U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates. To measure the two variables, namely (a) population and (b) population as a percentage of the total population, we initially analyzed and categorized the data for each of the racial categories idetified by the US Census Bureau. It is ensured that the population estimates used in this dataset pertain exclusively to the identified racial categories, and do not rely on any ethnicity classification. For further information regarding these estimates, please feel free to reach out to us via email at research@neilsberg.com.
Dataset funded by
Neilsberg Research
Description
About this dataset

Context

The dataset tabulates the population of Day town by race. It includes the population of Day town across racial categories (excluding ethnicity) as identified by the Census Bureau. The dataset can be utilized to understand the population distribution of Day town across relevant racial categories.

Key observations

The percent distribution of Day town population by race (across all racial categories recognized by the U.S. Census Bureau): 92.57% are white, 0.83% are Black or African American, 0.24% are American Indian and Alaska Native, 0.35% are Native Hawaiian and other Pacific Islander and 6.01% are multiracial.

Content

When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.

Racial categories include:

White

Black or African American

American Indian and Alaska Native

Asian

Native Hawaiian and Other Pacific Islander

Some other race

Two or more races (multiracial)

Variables / Data Columns

Race: This column displays the racial categories (excluding ethnicity) for the Day town

Population: The population of the racial category (excluding ethnicity) in the Day town is shown in this column.

% of Total Population: This column displays the percentage distribution of each race as a proportion of Day town total population. Please note that the sum of all percentages may not equal one due to rounding of values.

Good to know

Margin of Error

Data in the dataset are based on the estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presening these estimates in your research.

Custom data

If you do need custom data for any of your research project, report or presentation, you can contact our research staff at research@neilsberg.com for a feasibility of a custom tabulation on a fee-for-service basis.

Inspiration

Neilsberg Research Team curates, analyze and publishes demographics and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights is made available for free download at https://www.neilsberg.com/research/.

Recommended for further research

This dataset is a part of the main dataset for Day town Population by Race & Ethnicity. You can refer the same here
z
Counts of Gonorrhea reported in UNITED STATES OF AMERICA: 1972-2017
zenodo.org
json, xml, zip
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke (2024). Counts of Gonorrhea reported in UNITED STATES OF AMERICA: 1972-2017 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/us.15628003
Explore at:
json, xml, zipAvailable download formats
Unique identifier
https://doi.org/10.25337/t7/ptycho.v2.0/us.15628003
Dataset updated
Jun 3, 2024
Dataset provided by
Project Tycho
Authors
Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jan 2, 1972 - Dec 30, 2017
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
w
College Football All-Americans Rankings Dataset
winsipedia.com
html
Updated Sep 15, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Winsipedia (2025). College Football All-Americans Rankings Dataset [Dataset]. https://winsipedia.com/ranking/all-americans
Explore at:
htmlAvailable download formats
Dataset updated
Sep 15, 2025
Dataset authored and provided by
Winsipedia
License
https://winsipedia.com/termshttps://winsipedia.com/terms
Variables measured
All-Americans
Measurement technique
Statistical analysis of college football performance data
Description
Comprehensive dataset of college football teams ranked by all-americans. Includes historical data, statistics, and performance metrics for NCAA Division I FBS teams.

Facebook

Twitter

Click to copy link

Link copied

Cite

Centers for Disease Control and Prevention (2017). Death in the United States [Dataset]. https://www.kaggle.com/datasets/cdc/mortality

Death in the United States

Learn more about the leading causes of death from 2005-2015

Explore at:

zip(766333584 bytes)Available download formats

Dataset updated

Aug 3, 2017

Dataset authored and provided by

Centers for Disease Control and Preventionhttp://www.cdc.gov/

License

https://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/

Area covered

United States

Description

Every year the CDC releases the country’s most detailed report on death in the United States under the National Vital Statistics Systems. This mortality dataset is a record of every death in the country for 2005 through 2015, including detailed information about causes of death and the demographic background of the deceased.

It's been said that "statistics are human beings with the tears wiped off." This is especially true with this dataset. Each death record represents somebody's loved one, often connected with a lifetime of memories and sometimes tragically too short.

Putting the sensitive nature of the topic aside, analyzing mortality data is essential to understanding the complex circumstances of death across the country. The US Government uses this data to determine life expectancy and understand how death in the U.S. differs from the rest of the world. Whether you’re looking for macro trends or analyzing unique circumstances, we challenge you to use this dataset to find your own answers to one of life’s great mysteries.

Overview

This dataset is a collection of CSV files each containing one year's worth of data and paired JSON files containing the code mappings, plus an ICD 10 code set. The CSVs were reformatted from their original fixed-width file formats using information extracted from the CDC's PDF manuals using this script. Please note that this process may have introduced errors as the text extracted from the pdf is not a perfect match. If you have any questions or find errors in the preparation process, please leave a note in the forums. We hope to publish additional years of data using this method soon.

A more detailed overview of the data can be found here. You'll find that the fields are consistent within this time window, but some of data codes change every few years. For example, the 113_cause_recode entry 069 only covers ICD codes (I10,I12) in 2005, but by 2015 it covers (I10,I12,I15). When I post data from years prior to 2005, expect some of the fields themselves to change as well.

All data comes from the CDC’s National Vital Statistics Systems, with the exception of the Icd10Code, which are sourced from the World Health Organization.

Project ideas

The CDC's mortality data was the basis of a widely publicized paper, by Anne Case and Nobel prize winner Angus Deaton, arguing that middle-aged whites are dying at elevated rates. One of the criticisms against the paper is that it failed to properly account for the exact ages within the broad bins available through the CDC's WONDER tool. What do these results look like with exact/not-binned age data?
Similarly, how sensitive are the mortality trends being discussed in the news to the choice of bin-widths?
As noted above, the data preparation process could have introduced errors. Can you find any discrepancies compared to the aggregate metrics on WONDER? If so, please let me know in the forums!
WONDER is cited in numerous economics, sociology, and public health research papers. Can you find any papers whose conclusions would be altered if they used the exact data available here rather than binned data from Wonder?

Differences from the first version of the dataset

This version of the dataset was prepared in a completely different many. This has allowed us to provide a much larger volume of data and ensure that codes are available for every field.
We've replaced the batch of sql files with a single JSON per year. Kaggle's platform currently offer's better support for JSON files, and this keeps the number of files manageable.
A tutorial kernel providing a quick introduction to the new format is available here.
Lastly, I apologize if the transition has interrupted anyone's work! If need be, you can still download v1.

Clear search

Close search

Google apps

Main menu

Death in the United States

Overview

Project ideas

Differences from the first version of the dataset

[Archived] COVID-19 Deaths by Population Characteristics Over Time

Mass Killings in America, 2006 - present

OVERVIEW

About this Dataset

Using this Dataset

Definition of "mass murder"

Methodology

Contacts

Centers for Disease Control and Prevention (CDC), Weekly Flu Report, USA,...

Data from: What We Eat In America (WWEIA) Database

Counts of Malaria reported in UNITED STATES OF AMERICA: 1951-2017

Counts of Smallpox reported in UNITED STATES OF AMERICA: 1888-1952

Counts of Infantile paralysis reported in UNITED STATES OF AMERICA:...

Counts of Inflammatory disease of liver reported in UNITED STATES OF...

North American Rail Network Lines

American Fork, UT annual median income by work experience and sex dataset:...

About this dataset

Content

Inspiration

Recommended for further research

Travel Danger

Key findings

Data sources

Charts / data visualizations

Counties - United States of America

Dead Lake Township, Minnesota Age Group Population Dataset: A complete...

About this dataset

Content

Inspiration

Recommended for further research

Counts of Dysentery reported in UNITED STATES OF AMERICA: 1942-1948

Counties

American Fork, UT annual income distribution by work experience and gender...

About this dataset

Content

Inspiration

Recommended for further research

Day, New York Population Breakdown By Race (Excluding Ethnicity) Dataset:...

About this dataset

Content

Inspiration

Recommended for further research

Counts of Gonorrhea reported in UNITED STATES OF AMERICA: 1972-2017

College Football All-Americans Rankings Dataset

Death in the United StatesSee More Versions

Learn more about the leading causes of death from 2005-2015

Overview

Project ideas

Differences from the first version of the dataset

Death in the United States