54 datasets found

Number of missing persons files U.S. 2024, by race
statista.com
Updated Aug 14, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2025). Number of missing persons files U.S. 2024, by race [Dataset]. https://www.statista.com/statistics/240396/number-of-missing-persons-files-in-the-us-by-race/
Explore at:
Dataset updated
Aug 14, 2025
Dataset authored and provided by
Statistahttp://statista.com/
Time period covered
2024
Area covered
United States
Description
In 2024, there were 301,623 cases filed by the National Crime Information Center (NCIC) where the race of the reported missing person was white. In the same year, 17,097 people whose race was unknown were also reported missing in the United States. What is the NCIC? The National Crime Information Center (NCIC) is a digital database that stores crime data for the United States, so criminal justice agencies can access it. As a part of the FBI, it helps criminal justice professionals find criminals, missing people, stolen property, and terrorists. The NCIC database is broken down into 21 files. Seven files belong to stolen property and items, and 14 belong to persons, including the National Sex Offender Register, Missing Person, and Identify Theft. It works alongside federal, tribal, state, and local agencies. The NCIC’s goal is to maintain a centralized information system between local branches and offices, so information is easily accessible nationwide. Missing people in the United States A person is considered missing when they have disappeared and their location is unknown. A person who is considered missing might have left voluntarily, but that is not always the case. The number of the NCIC unidentified person files in the United States has fluctuated since 1990, and in 2022, there were slightly more NCIC missing person files for males as compared to females. Fortunately, the number of NCIC missing person files has been mostly decreasing since 1998.
Number of missing person files U.S. 1990-2023
statista.com
Updated Nov 25, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Statista (2024). Number of missing person files U.S. 1990-2023 [Dataset]. https://www.statista.com/statistics/240401/number-of-missing-person-files-in-the-us-since-1990/
Explore at:
Dataset updated
Nov 25, 2024
Dataset authored and provided by
Statistahttp://statista.com/
Area covered
United States
Description
In 2023, the number of missing person files in the United States equaled 563,389 cases, an increase from 2021 which had the lowest number of missing person files in the U.S. since 1990.
Data from: COVID-19 Case Surveillance Public Use Data with Geography
data.cdc.gov
healthdata.gov
+5more
csv, xlsx, xml
Updated Jul 9, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
CDC Data, Analytics and Visualization Task Force (2024). COVID-19 Case Surveillance Public Use Data with Geography [Dataset]. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data-with-Ge/n8mc-b4w4
Explore at:
csv, xlsx, xmlAvailable download formats
Dataset updated
Jul 9, 2024
Dataset provided by
Centers for Disease Control and Preventionhttp://www.cdc.gov/
Authors
CDC Data, Analytics and Visualization Task Force
License
https://www.usa.gov/government-workshttps://www.usa.gov/government-works
Description
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

This case surveillance public use dataset has 19 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors.

Currently, CDC provides the public with three versions of COVID-19 case surveillance line-listed data: this 19 data element dataset with geography, a 12 data element public use dataset, and a 33 data element restricted access dataset.

The following apply to the public use datasets and the restricted access dataset:
Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
Some data are suppressed to protect individual privacy.
Datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the current datasets. This 14-day lag allows case reporting to be stabilized and ensure that time-dependent outcome data are accurately captured.
Datasets are updated monthly.
Datasets are created using CDC’s Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
For more information about data collection and reporting, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/about-us-cases-deaths.html.
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html

Overview

The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

For more information: NNDSS Supports the COVID-19 Response | CDC.

COVID-19 Case Reports COVID-19 case reports are routinely submitted to CDC by public health jurisdictions using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19. Current versions of these case definitions are available at: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/. All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for lab-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. States and territories continue to use this form.

Data are Considered Provisional

The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.

Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

Data Limitations

To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

Data Quality Assurance Procedures

CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question "Was the individual hospitalized?" where the possible answer choices include "Yes," "No," or "Unknown," the blank value is recoded to "Missing" because the case report form did not include a response to the question.
Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately.
Additional data quality processing to recode free text data is ongoing. Data on symptoms, race, ethnicity, and healthcare worker status have been prioritized.

Data Suppression

To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<11 COVID-19 case records with a given values). Suppression includes low frequency combinations of case month, geographic characteristics (county and state of residence), and demographic characteristics (sex, age group, race, and ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.

Additional COVID-19 Data

COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These and other COVID-19 data are available from multiple public locations: COVID Data Tracker; United States COVID-19 Cases and Deaths by State; COVID-19 Vaccination Reporting Data Systems; and COVID-19 Death Data and Resources.

Notes:

March 1, 2022: The "COVID-19 Case Surveillance Public Use Data with Geography" will be updated on a monthly basis.

April 7, 2022: An adjustment was made to CDC’s cleaning algorithm for COVID-19 line level case notification data. An assumption in CDC's algorithm led to misclassifying deaths that were not COVID-19 related. The algorithm has since been revised, and this dataset update reflects corrected individual level information about death status for all cases collected to date.

June 25, 2024: An adjustment
A
‘Missing Migrants Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Feb 14, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Missing Migrants Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-missing-migrants-dataset-c736/2e62d69f/?iid=008-234&v=presentation
Explore at:
Dataset updated
Feb 14, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Missing Migrants Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jmataya/missingmigrants on 14 February 2022.

--- Dataset description provided by original source is as follows ---

About the Missing Migrants Data

This data is sourced from the International Organization for Migration. The data is part of a specific project called the Missing Migrants Project which tracks deaths of migrants, including refugees , who have gone missing along mixed migration routes worldwide. The research behind this project began with the October 2013 tragedies, when at least 368 individuals died in two shipwrecks near the Italian island of Lampedusa. Since then, Missing Migrants Project has developed into an important hub and advocacy source of information that media, researchers, and the general public access for the latest information.

Where is the data from?

Missing Migrants Project data are compiled from a variety of sources. Sources vary depending on the region and broadly include data from national authorities, such as Coast Guards and Medical Examiners; media reports; NGOs; and interviews with survivors of shipwrecks. In the Mediterranean region, data are relayed from relevant national authorities to IOM field missions, who then share it with the Missing Migrants Project team. Data are also obtained by IOM and other organizations that receive survivors at landing points in Italy and Greece. In other cases, media reports are used. IOM and UNHCR also regularly coordinate on such data to ensure consistency. Data on the U.S./Mexico border are compiled based on data from U.S. county medical examiners and sheriff’s offices, as well as media reports for deaths occurring on the Mexico side of the border. Estimates within Mexico and Central America are based primarily on media and year-end government reports. Data on the Bay of Bengal are drawn from reports by UNHCR and NGOs. In the Horn of Africa, data are obtained from media and NGOs. Data for other regions is drawn from a combination of sources, including media and grassroots organizations. In all regions, Missing Migrants Projectdata represents minimum estimates and are potentially lower than in actuality.

Updated data and visuals can be found here: https://missingmigrants.iom.int/

Who is included in Missing Migrants Project data?

IOM defines a migrant as any person who is moving or has moved across an international border or within a State away from his/her habitual place of residence, regardless of

(1) the person’s legal status; (2) whether the movement is voluntary or involuntary; (3) what the causes for the movement are; or (4) what the length of the stay is.[1]

Missing Migrants Project counts migrants who have died or gone missing at the external borders of states, or in the process of migration towards an international destination. The count excludes deaths that occur in immigration detention facilities, during deportation, or after forced return to a migrant’s homeland, as well as deaths more loosely connected with migrants’ irregular status, such as those resulting from labour exploitation. Migrants who die or go missing after they are established in a new home are also not included in the data, so deaths in refugee camps or housing are excluded. This approach is chosen because deaths that occur at physical borders and while en route represent a more clearly definable category, and inform what migration routes are most dangerous. Data and knowledge of the risks and vulnerabilities faced by migrants in destination countries, including death, should not be neglected, rather tracked as a distinct category.

How complete is the data on dead and missing migrants?

Data on fatalities during the migration process are challenging to collect for a number of reasons, most stemming from the irregular nature of migratory journeys on which deaths tend to occur. For one, deaths often occur in remote areas on routes chosen with the explicit aim of evading detection. Countless bodies are never found, and rarely do these deaths come to the attention of authorities or the media. Furthermore, when deaths occur at sea, frequently not all bodies are recovered - sometimes with hundreds missing from one shipwreck - and the precise number of missing is often unknown. In 2015, over 50 per cent of deaths recorded by the Missing Migrants Project refer to migrants who are presumed dead and whose bodies have not been found, mainly at sea.

Data are also challenging to collect as reporting on deaths is poor, and the data that does exist are highly scattered. Few official sources are collecting data systematically. Many counts of death rely on media as a source. Coverage can be spotty and incomplete. In addition, the involvement of criminal actors in incidents means there may be fear among survivors to report deaths and some deaths may be actively covered-up. The irregular immigration status of many migrants, and at times their families as well, also impedes reporting of missing persons or deaths.

The varying quality and comprehensiveness of data by region in attempting to estimate deaths globally may exaggerate the share of deaths that occur in some regions, while under-representing the share occurring in others.

What can be understood through this data?

The available data can give an indication of changing conditions and trends related to migration routes and the people travelling on them, which can be relevant for policy making and protection plans. Data can be useful to determine the relative risks of irregular migration routes. For example, Missing Migrants Project data show that despite the increase in migrant flows through the eastern Mediterranean in 2015, the central Mediterranean remained the more deadly route. In 2015, nearly two people died out of every 100 travellers (1.85%) crossing the Central route, as opposed to one out of every 1,000 that crossed from Turkey to Greece (0.095%). From the data, we can also get a sense of whether groups like women and children face additional vulnerabilities on migration routes.

However, it is important to note that because of the challenges in data collection for the missing and dead, basic demographic information on the deceased is rarely known. Often migrants in mixed migration flows do not carry appropriate identification. When bodies are found it may not be possible to identify them or to determine basic demographic information. In the data compiled by Missing Migrants Project, sex of the deceased is unknown in over 80% of cases. Region of origin has been determined for the majority of the deceased. Even this information is at times extrapolated based on available information – for instance if all survivors of a shipwreck are of one origin it was assumed those missing also came from the same region.

The Missing Migrants Project dataset includes coordinates for where incidents of death took place, which indicates where the risks to migrants may be highest. However, it should be noted that all coordinates are estimates.

Why collect data on missing and dead migrants?

By counting lives lost during migration, even if the result is only an informed estimate, we at least acknowledge the fact of these deaths. What before was vague and ill-defined is now a quantified tragedy that must be addressed. Politically, the availability of official data is important. The lack of political commitment at national and international levels to record and account for migrant deaths reflects and contributes to a lack of concern more broadly for the safety and well-being of migrants, including asylum-seekers. Further, it drives public apathy, ignorance, and the dehumanization of these groups.

Data are crucial to better understand the profiles of those who are most at risk and to tailor policies to better assist migrants and prevent loss of life. Ultimately, improved data should contribute to efforts to better understand the causes, both direct and indirect, of fatalities and their potential links to broader migration control policies and practices.

Counting and recording the dead can also be an initial step to encourage improved systems of identification of those who die. Identifying the dead is a moral imperative that respects and acknowledges those who have died. This process can also provide a some sense of closure for families who may otherwise be left without ever knowing the fate of missing loved ones.

Identification and tracing of the dead and missing

As mentioned above, the challenge remains to count the numbers of dead and also identify those counted. Globally, the majority of those who die during migration remain unidentified. Even in cases in which a body is found identification rates are low. Families may search for years or a lifetime to find conclusive news of their loved one. In the meantime, they may face psychological, practical, financial, and legal problems.

Ultimately Missing Migrants Project would like to see that every unidentified body, for which it is possible to recover, is adequately “managed”, analysed and tracked to ensure proper documentation, traceability and dignity. Common forensic protocols and standards should be agreed upon, and used within and between States. Furthermore, data relating to the dead and missing should be held in searchable and open databases at local, national and international levels to facilitate identification.

For more in-depth analysis and discussion of the numbers of missing and dead migrants around the world, and the challenges involved in identification and tracing, read our two reports on the issue, Fatal Journeys: Tracking Lives Lost during Migration (2014) and Fatal Journeys Volume 2, Identification and Tracing of Dead and Missing Migrants

Content

The data set records
d
Mass Killings in America, 2006 - present
data.world
csv, zip
Updated Aug 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
The Associated Press (2025). Mass Killings in America, 2006 - present [Dataset]. https://data.world/associatedpress/mass-killings-public
Explore at:
zip, csvAvailable download formats
Dataset updated
Aug 27, 2025
Authors
The Associated Press
Time period covered
Jan 1, 2006 - Aug 1, 2025
Area covered

Description
THIS DATASET WAS LAST UPDATED AT 8:11 PM EASTERN ON AUG. 26

OVERVIEW

2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.

In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings . This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.

A total of 229 people died in mass killings in 2019.

The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.

One-third of the offenders died at the scene of the killing or soon after, half from suicides.

About this Dataset

The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.

The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.

This data will be updated periodically and can be used as an ongoing resource to help cover these events.

Using this Dataset

To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:

Mass killings by year

Mass shootings by year

To get these counts just for your state:

Filter killings by state

Definition of "mass murder"

Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.

This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”

Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.

Methodology

Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.

Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.

In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.

Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.

Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.

This project started at USA TODAY in 2012.

Contacts

Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
c
Data from: Global Land Cover Mapping and Estimation Yearly 30 m V001
s.cnmilf.com
data.staging.idas-ds1.appdat.jsc.nasa.gov
+1more
Updated Jun 28, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
LP DAAC;BU/EE/LCSC (2025). Global Land Cover Mapping and Estimation Yearly 30 m V001 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/global-land-cover-mapping-and-estimation-yearly-30-m-v001-80e06
Explore at:
Dataset updated
Jun 28, 2025
Dataset provided by
LP DAAC;BU/EE/LCSC
Description
NASA's Making Earth System Data Records for Use in Research Environments (MEaSUREs) Global Land Cover Mapping and Estimation (GLanCE) annual 30 meter (m) Version 1 data product provides global land cover and land cover change data derived from Landsat 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8 Operational Land Imager (OLI). These maps provide the user community with land cover type, land cover change, metrics characterizing the magnitude and seasonality of greenness of each pixel, and the magnitude of change. GLanCE data products will be provided using a set of seven continental grids that use Lambert Azimuthal Equal Area projections parameterized to minimize distortion for each continent. Currently, North America, South America, Europe, and Oceania are available. This dataset is useful for a wide range of applications, including ecosystem, climate, and hydrologic modeling; monitoring the response of terrestrial ecosystems to climate change; carbon accounting; and land management. The GLanCE data product provides seven layers: the land cover class, the estimated day of year of change, integer identifier for class in previous year, median and amplitude of the Enhanced Vegetation Index (EVI2) in the year, rate of change in EVI2, and the change in EVI2 median from previous year to current year. A low-resolution browse image representing EVI2 amplitude is also available for each granule.Known Issues Version 1.0 of the data set does not include Quality Assurance, Leaf Type or Leaf Phenology. These layers are populated with fill values. These layers will be included in future releases of the data product. * Science Data Set (SDS) values may be missing, or of lower quality, at years when land cover change occurs. This issue is a by-product of the fact that Continuous Change Detection and Classification (CCDC) does not fit models or provide synthetic reflectance values during short periods of time between time segments. * The accuracy of mapping results varies by land cover class and geography. Specifically, distinguishing between shrubs and herbaceous cover is challenging at high latitudes and in arid and semi-arid regions. Hence, the accuracy of shrub cover, herbaceous cover, and to some degree bare cover, is lower than for other classes. * Due to the combined effects of large solar zenith angles, short growing seasons, lower availability of high-resolution imagery to support training data, the representation of land cover at land high latitudes in the GLanCE product is lower than in mid latitudes. * Shadows and large variation in local zenith angles decrease the accuracy of the GLanCE product in regions with complex topography, especially at high latitudes. * Mapping results may include artifacts from variation in data density in overlap zones between Landsat scenes relative to mapping results in non-overlap zones. * Regions with low observation density due to cloud cover, especially in the tropics, and/or poor data density (e.g. Alaska, Siberia, West Africa) have lower map quality. * Artifacts from the Landsat 7 Scan Line Corrector failure are occasionally evident in the GLanCE map product. High proportions of missing data in regions with snow and ice at high elevations result in missing data in the GLanCE SDSs.* The GlanCE data product tends to modestly overpredict developed land cover in arid regions.
F
Native American Children Facial Image Dataset for Facial Recognition
futurebeeai.com
wav
Updated Aug 1, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
FutureBee AI (2022). Native American Children Facial Image Dataset for Facial Recognition [Dataset]. https://www.futurebeeai.com/dataset/image-dataset/facial-images-minor-native-american
Explore at:
wavAvailable download formats
Dataset updated
Aug 1, 2022
Dataset provided by
FutureBeeAI
Authors
FutureBee AI
License
https://www.futurebeeai.com/policies/ai-data-license-agreementhttps://www.futurebeeai.com/policies/ai-data-license-agreement
Dataset funded by
FutureBeeAI
Description
Introduction
The Native American Children Facial Image Dataset is a thoughtfully curated collection designed to support the development of advanced facial recognition systems, biometric identity verification, age estimation tools, and child-specific AI models. This dataset enables researchers and developers to build highly accurate, inclusive, and ethically sourced AI solutions for real-world applications.
Facial Image Data
The dataset includes over 1000 high-resolution image sets of children under the age of 18. Each participant contributes approximately 15 unique facial images, captured to reflect natural variations in appearance and context.
Diversity and Representation
•
Geographic Coverage: Children from USA, Canada, Mexico and more

•
Age Group: All participants are minors, with a wide age spread across childhood and adolescence.

•
Gender Balance: Includes both boys and girls, representing a balanced gender distribution.

•
File Formats: Images are available in JPEG and HEIC formats.

Quality and Image Conditions
To ensure robust model training and generalizability, images are captured under varied natural conditions:
•
Lighting: A mix of lighting setups, including indoor, outdoor, bright, and low-light scenarios.

•
Backgrounds: Diverse backgrounds—plain, natural, and everyday environments—are included to promote realism.

•
Capture Devices: All photos are taken using modern mobile devices, ensuring high resolution and sharp detail.

Metadata
Each child’s image set is paired with detailed, structured metadata, enabling granular control and filtering during model training:
•Unique Participant ID
•File Name
•Age
•Gender
•Country
•Demographic Attributes
•File Format
This metadata is essential for applications that require demographic awareness, such as region-specific facial recognition or bias mitigation in AI models.
Applications
This dataset is ideal for a wide range of computer vision use cases, including:
•
Facial Recognition: Improving identification accuracy across diverse child demographics.

•
KYC and Identity Verification: Enabling more inclusive onboarding processes for child-specific platforms.

•
Biometric Systems: Supporting child-focused identity verification in education, healthcare, or travel.

•
Age Estimation: Training AI models to estimate age ranges of children from facial features.

•
Child Safety Models: Assisting in missing child identification or online content moderation.

•
Generative AI Training: Creating more representative synthetic data using real-world diverse inputs.

Ethical Collection and Data Security
We maintain the highest ethical and security standards throughout the data lifecycle:
•
Guardian Consent: Every participant’s guardian provided informed, written consent, clearly outlining the dataset’s use cases.

•
Privacy-First Approach: Personally identifiable information is not shared. Only anonymized metadata is included.

•
Secure Storage: All data is
f
ReferenceUSA Historical Business Datasets
arizona.figshare.com
Updated Aug 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
University of Arizona Libraries (2024). ReferenceUSA Historical Business Datasets [Dataset]. http://doi.org/10.25422/azu.data.14515541.v2
Explore at:
Unique identifier
https://doi.org/10.25422/azu.data.14515541.v2
Dataset updated
Aug 6, 2024
Dataset provided by
University of Arizona Research Data Repository
Authors
University of Arizona Libraries
License
http://rightsstatements.org/vocab/InC/1.0/http://rightsstatements.org/vocab/InC/1.0/
Description
Dataset available only to University of Arizona affiliates. To obtain access, you must log in to ReDATA with your NetID. Data is for research use by each individual downloader only. Sharing and/or redistribution of any portion of this dataset is prohibited.The ReferenceUSA datasets from Data Axle (formerly Infogroup) contain establishment-level data about US businesses in annual snapshots from 1997-2021 and can help users create marketing plans and conduct competitive analyses. Each year has 53-94 indicators, with documentation for layouts found in Documentation.zip. Codebooks for 2015, 2017, and 2019 are missing, but all file layouts for 2014-2021 are identical.The University of Arizona University Libraries also subscribe to Data Axle Reference Solutions which provides this data in a searchable, online database with historical data available going back to 2003.NOTE: The uncompressed datasets are very large.Detailed file descriptions and MD5 hash values for each file can be found in the README.txt file.For inquiries regarding the contents of this dataset, please contact the Corresponding Author listed in the README.txt file. Administrative inquiries (e.g., removal requests, trouble downloading, etc.) can be directed to data-management@arizona.edu
z
Counts of Cholera reported in UNITED STATES OF AMERICA: 1895-1905
zenodo.org
json, xml, zip
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke (2024). Counts of Cholera reported in UNITED STATES OF AMERICA: 1895-1905 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/us.63650001
Explore at:
json, zip, xmlAvailable download formats
Unique identifier
https://doi.org/10.25337/t7/ptycho.v2.0/us.63650001
Dataset updated
Jun 3, 2024
Dataset provided by
Project Tycho
Authors
Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Dec 29, 1895 - Nov 25, 1905
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
Z
Counts of Pneumonia reported in UNITED STATES OF AMERICA: 1912-1951
data.niaid.nih.gov
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Burke, Donald (2024). Counts of Pneumonia reported in UNITED STATES OF AMERICA: 1912-1951 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11452320
Explore at:
Dataset updated
Jun 3, 2024
Dataset provided by
Cross, Anne
Van Panhuis, Willem
Burke, Donald
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format. Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc. Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
O
COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE
data.ct.gov
s.cnmilf.com
+1more
application/rdfxml +5
Updated Jun 24, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department of Public Health (2022). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-and-Deaths-by-Race-Ethnicity-ARCHIV/7rne-efic
Explore at:
xml, tsv, csv, application/rdfxml, json, application/rssxmlAvailable download formats
Dataset updated
Jun 24, 2022
Dataset authored and provided by
Department of Public Health
License
U.S. Government Workshttps://www.usa.gov/government-works
License information was derived automatically
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to the either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update.

The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age of among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age which results in higher age-adjusted rates.

The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.

Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf

Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age adjusted rates calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic.

Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines and a reminder of the required reporting to OCME.25,26 As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain if COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19, but antemortem diagnosis was unable to be made.31 The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause of death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC which assigns cause of death codes according to the International Causes of Disease 10th Revision (ICD-10) classification system.25,26 COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics

Data are subject to future revision as reporting changes.

Starting in July 2020, this dataset will be updated every weekday.

Additional notes: A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020.

A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamp on the datasets on the Open Data Portal differ from the timestamp in DPH's daily PDF reports.

Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.
Z
Counts of Murine typhus reported in UNITED STATES OF AMERICA: 1945-1961
data.niaid.nih.gov
zenodo.org
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Burke, Donald (2024). Counts of Murine typhus reported in UNITED STATES OF AMERICA: 1945-1961 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11452349
Explore at:
Dataset updated
Jun 3, 2024
Dataset provided by
Cross, Anne
Van Panhuis, Willem
Burke, Donald
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format. Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc. Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
p
Counts of Meningococcal meningitis reported in UNITED STATES OF AMERICA:...
tycho.pitt.edu
Updated Apr 1, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Meningococcal meningitis reported in UNITED STATES OF AMERICA: 1926-1964 [Dataset]. https://www.tycho.pitt.edu/dataset/US.192644005
Explore at:
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
1926 - 1964
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
p
Counts of Smallpox without rash reported in UNITED STATES OF AMERICA:...
tycho.pitt.edu
Updated Apr 1, 2018
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem G Van Panhuis; Anne L Cross; Donald S Burke (2018). Counts of Smallpox without rash reported in UNITED STATES OF AMERICA: 1889-1906 [Dataset]. https://www.tycho.pitt.edu/dataset/US.74724001
Explore at:
Dataset updated
Apr 1, 2018
Dataset provided by
Project Tycho, University of Pittsburgh
Authors
Willem G Van Panhuis; Anne L Cross; Donald S Burke
Time period covered
1889 - 1906
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis: - Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. - Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
Z
Counts of Anthrax reported in UNITED STATES OF AMERICA: 1942-1945
data.niaid.nih.gov
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Burke, Donald (2024). Counts of Anthrax reported in UNITED STATES OF AMERICA: 1942-1945 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11452452
Explore at:
Dataset updated
Jun 3, 2024
Dataset provided by
Cross, Anne
Van Panhuis, Willem
Burke, Donald
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format. Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc. Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
z
Counts of Pertussis reported in UNITED STATES OF AMERICA: 1888-2017
zenodo.org
data.niaid.nih.gov
json, xml, zip
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke (2024). Counts of Pertussis reported in UNITED STATES OF AMERICA: 1888-2017 [Dataset]. http://doi.org/10.25337/t7/ptycho.v2.0/us.27836007
Explore at:
json, zip, xmlAvailable download formats
Unique identifier
https://doi.org/10.25337/t7/ptycho.v2.0/us.27836007
Dataset updated
Jun 3, 2024
Dataset provided by
Project Tycho
Authors
Willem Van Panhuis; Willem Van Panhuis; Anne Cross; Anne Cross; Donald Burke; Donald Burke
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Jul 8, 1888 - Dec 30, 2017
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format.
Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.
Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:
Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
Z
Counts of Typhoid and paratyphoid fevers reported in UNITED STATES OF...
data.niaid.nih.gov
zenodo.org
Updated Jun 3, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Cross, Anne (2024). Counts of Typhoid and paratyphoid fevers reported in UNITED STATES OF AMERICA: 1937-1951 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11452277
Explore at:
Dataset updated
Jun 3, 2024
Dataset provided by
Cross, Anne
Van Panhuis, Willem
Burke, Donald
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretabilty. We also formatted the data into a standard data format. Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. The United States). Case counts are reported per time interval. In addition to case counts, datsets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of aquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc. Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

Analyze missing data: Project Tycho datasets do not inlcude time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported. Separate cumulative from non-cumulative time interval series. Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exxclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
d
Crustal Architecture Beneath the Southern Midcontinent (USA) -- Data Grids...
catalog.data.gov
data.usgs.gov
+1more
Updated Jul 6, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Geological Survey (2024). Crustal Architecture Beneath the Southern Midcontinent (USA) -- Data Grids and 3D Geophysical Models: Gravity and Magnetic Grids [Dataset]. https://catalog.data.gov/dataset/crustal-architecture-beneath-the-southern-midcontinent-usa-data-grids-and-3d-geophysical-m-2ed0b
Explore at:
Dataset updated
Jul 6, 2024
Dataset provided by
United States Geological Surveyhttp://www.usgs.gov/
Area covered
United States
Description
Regional geophysical data grid files provided in Grid eXchange File (GXF) format were used to study crustal architecture beneath the Southern Midcontinent (USA) by McCafferty and others (2019). The study covered a rectangular, multi-state area of 924 by 924 kilometers centered on Missouri, and a corresponding volume extending from the topographic surface to a depth of 50 kilometers below sea level. The observed geophysical data grids (Bouguer gravity anomaly and reduced-to-pole (RTP) aeromagnetic anomaly) were used in 3D inversions, and the calculated geophysical data grids were produced by the inversions. All grids are sampled at a 2,000 by 2,000 meter grid interval. Missing data nodes are assigned a value of 1.000000E+38. Two versions of each GXF file are provided: (1) a version with embeded map projection information for use with Geosoft Oasis montaj and the free Geosoft viewer software and (2) a version without map projection information (filenames ending in NoProj.gxf) for use with ArcMap and QGIS. An ArcMap projection file, which contains the missing projection information, is also provided.
VEMAP 1: U.S. Soil - Dataset - NASA Open Data Portal
data.nasa.gov
Updated Apr 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). VEMAP 1: U.S. Soil - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/vemap-1-u-s-soil-2877b
Explore at:
Dataset updated
Apr 1, 2025
Dataset provided by
NASAhttp://nasa.gov/
Area covered
United States
Description
The Vegetation/Ecosystem Modeling and Analysis Project (VEMAP) is an ongoing multiinstitutional, international effort addressing the response of biogeography and biogeochemistry to environmental variability in climate and other drivers in both space and time domains. The objectives of VEMAP are the intercomparison of biogeochemistry models and vegetationtype distribution models (biogeography models) and determination of their sensitivity to changing climate, elevated atmospheric carbon dioxide concentrations, and other sources of altered forcing. Soil properties were based on a 10-km gridded EPA soil database developed by Kern (1994, 1995). Two soil coverages are provided in the Kern data set: one from the USDA Soil Conservation Service (SCS) national soil database (NATSGO) and the other from the United Nations Food and Agriculture Organization soil database (FAO 1974- 78). Only the SCS NATSGO soils are included in the VEMAP set. Physical consistency in soils data was incorporated by representing a grid cell's soil by a set of dominant (modal) soil profiles, rather than by a simple average of soil properties. Because soil processes, such as soil organic matter turnover and water balance, are non-linearly related to soil texture and other soil parameters, simulations based on dominant soil profiles and their frequency distribution can account for soil dynamics that would be lost if averaged soil properties were used. To spatially aggregate Kern data to the 0.5 degree grid, we used cluster analysis to group the subgrid 10-km elements into up to 4 modal soil catagories (Kittel et al. 1995). In this statistical approach, cell soil properties are represented by the set of modal soils, rather than by an "average soil." We also provide cell- averaged soil data. Please see the associated Data Set Revision page for an explanation of recent changes made to this data set. A complete users guide to the VEMAP Phase I database which includes more information about this data set can be found at ftp://daac.ornl.gov/data/vemap-1/comp/Phase_1_User_Guide.pdf. ORNL DAAC maintains additional information associated with the VEMAP Project. Data Citation: This data set should be cited as follows: Kittel, T. G. F., N. A. Rosenbloom, T. H. Painter, D. S. Schimel, H. H. Fisher, A. Grimsdell, VEMAP Participants, C. Daly, and E. R. Hunt, Jr. 2002. VEMAP Phase I Database, revised. Available on-line from Oak Ridge National Laboratory Distributed Active Archive Center, Oak Ridge, Tennessee, U.S.A.
Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version...
data.nasa.gov
Updated Apr 1, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
nasa.gov (2025). Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 4 R1 - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/daymet-daily-surface-weather-data-on-a-1-km-grid-for-north-america-version-4-r1-b1d6b
Explore at:
Dataset updated
Apr 1, 2025
Dataset provided by
NASAhttp://nasa.gov/
Area covered
North America
Description
This dataset provides Daymet Version 4 R1 data as gridded estimates of daily weather parameters for North America, Hawaii, and Puerto Rico. Daymet variables include the following parameters: minimum temperature, maximum temperature, precipitation, shortwave radiation, vapor pressure, snow water equivalent, and day length. The dataset covers the period from January 1, 1980, to December 31 (or December 30 in leap years) of the most recent full calendar year for the Continental North America and Hawaii spatial regions. Data for Puerto Rico is available starting in 1950. Each subsequent year is processed individually at the close of a calendar year. Daymet variables are provided as individual files, by variable and year, at a 1 km x 1 km spatial resolution and a daily temporal resolution. Areas of Hawaii and Puerto Rico are available as files separate from the continental North America. Data are in a North America Lambert Conformal Conic projection and are distributed in a standardized Climate and Forecast (CF)-compliant netCDF file format. In Version 4 R1, all 2020 and 2021 files were updated to improve predictions especially in high-latitude areas. It was found that input files used for deriving 2020 and 2021 data had, for a significant portion of Canadian weather stations, missing daily variable readings for the month of January. NCEI has corrected issues with the Environment Canada ingest feed which led to the missing readings. The revised 2020 and 2021 Daymet V4 R1 files were derived with new GHCNd inputs. Files outside of 2020 and 2021 have not changed from the previous V4 release.

Facebook

Twitter

Click to copy link

Link copied

Cite

Statista (2025). Number of missing persons files U.S. 2024, by race [Dataset]. https://www.statista.com/statistics/240396/number-of-missing-persons-files-in-the-us-by-race/

Number of missing persons files U.S. 2024, by race

Explore at:

Dataset updated

Aug 14, 2025

Dataset authored and provided by

Statistahttp://statista.com/

Time period covered

2024

Area covered

United States

Description

In 2024, there were 301,623 cases filed by the National Crime Information Center (NCIC) where the race of the reported missing person was white. In the same year, 17,097 people whose race was unknown were also reported missing in the United States. What is the NCIC? The National Crime Information Center (NCIC) is a digital database that stores crime data for the United States, so criminal justice agencies can access it. As a part of the FBI, it helps criminal justice professionals find criminals, missing people, stolen property, and terrorists. The NCIC database is broken down into 21 files. Seven files belong to stolen property and items, and 14 belong to persons, including the National Sex Offender Register, Missing Person, and Identify Theft. It works alongside federal, tribal, state, and local agencies. The NCIC’s goal is to maintain a centralized information system between local branches and offices, so information is easily accessible nationwide. Missing people in the United States A person is considered missing when they have disappeared and their location is unknown. A person who is considered missing might have left voluntarily, but that is not always the case. The number of the NCIC unidentified person files in the United States has fluctuated since 1990, and in 2022, there were slightly more NCIC missing person files for males as compared to females. Fortunately, the number of NCIC missing person files has been mostly decreasing since 1998.

Clear search

Close search

Google apps

Main menu

Number of missing persons files U.S. 2024, by race

Number of missing person files U.S. 1990-2023

Data from: COVID-19 Case Surveillance Public Use Data with Geography

Data are Considered Provisional

Data Limitations

Data Quality Assurance Procedures

Data Suppression

Additional COVID-19 Data

‘Missing Migrants Dataset’ analyzed by Analyst-2

About the Missing Migrants Data

Where is the data from?

Who is included in Missing Migrants Project data?

How complete is the data on dead and missing migrants?

What can be understood through this data?

Why collect data on missing and dead migrants?

Identification and tracing of the dead and missing

Content

Mass Killings in America, 2006 - present

OVERVIEW

About this Dataset

Using this Dataset

Definition of "mass murder"

Methodology

Contacts

Data from: Global Land Cover Mapping and Estimation Yearly 30 m V001

Native American Children Facial Image Dataset for Facial Recognition

Introduction

Facial Image Data

Diversity and Representation

Quality and Image Conditions

Metadata

Applications

Ethical Collection and Data Security

ReferenceUSA Historical Business Datasets

Counts of Cholera reported in UNITED STATES OF AMERICA: 1895-1905

Counts of Pneumonia reported in UNITED STATES OF AMERICA: 1912-1951

COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE

Counts of Murine typhus reported in UNITED STATES OF AMERICA: 1945-1961

Counts of Meningococcal meningitis reported in UNITED STATES OF AMERICA:...

Counts of Smallpox without rash reported in UNITED STATES OF AMERICA:...

Counts of Anthrax reported in UNITED STATES OF AMERICA: 1942-1945

Counts of Pertussis reported in UNITED STATES OF AMERICA: 1888-2017

Counts of Typhoid and paratyphoid fevers reported in UNITED STATES OF...

Crustal Architecture Beneath the Southern Midcontinent (USA) -- Data Grids...

VEMAP 1: U.S. Soil - Dataset - NASA Open Data Portal

Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version...

Number of missing persons files U.S. 2024, by race