100+ datasets found

Number of missing persons files in the U.S. 2022, by race
statista.com
Updated Jul 5, 2024
Statista (2024). Number of missing persons files in the U.S. 2022, by race [Dataset]. https://www.statista.com/statistics/240396/number-of-missing-persons-files-in-the-us-by-race/
Dataset updated
Jul 5, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2022
Area covered
United States
Description
In 2022, there were 313,017 cases filed by the NCIC where the race of the reported missing was White. In the same year, 18,928 people were missing whose race was unknown.

What is the NCIC?

The National Crime Information Center (NCIC) is a digital database that stores crime data for the United States so that criminal justice agencies can access it. As a part of the FBI, it helps criminal justice professionals find criminals, missing people, stolen property, and terrorists. The NCIC database is broken down into 21 files. Seven files cover stolen property and items, and 14 cover persons, including the National Sex Offender Registry, Missing Person, and Identity Theft files. It works alongside federal, tribal, state, and local agencies. The NCIC’s goal is to maintain a centralized information system between local branches and offices, so information is easily accessible nationwide.

Missing people in the United States

A person is considered missing when they have disappeared and their location is unknown. A person who is considered missing might have left voluntarily, but that is not always the case. The number of NCIC unidentified person files in the United States has fluctuated since 1990. In 2022, there were slightly more NCIC missing person files for males than for females. Fortunately, the number of NCIC missing person files has mostly decreased since 1998.
NCRB: State and Gender-wise Number of Persons Reported Missing and Traced
dataful.in
Updated Aug 1, 2025
Dataful (Factly) (2025). NCRB: State and Gender-wise Number of Persons Reported Missing and Traced [Dataset]. https://dataful.in/datasets/18466
Available download formats: csv, application/x-parquet, xlsx
Dataset updated
Aug 1, 2025
Dataset authored and provided by
Dataful (Factly)
License
https://dataful.in/terms-and-conditions
Area covered
States of India
Variables measured
Number of persons missing, share of persons traced
Description
The dataset contains the state-wise number of persons reported missing in a particular year, the total number of persons missing including those from previous years, the number of persons recovered/traced and those unrecovered/untraced. The dataset also contains the percentage recovery of missing persons which is calculated as the percentage share of total number of persons traced over the total number of persons missing. NCRB started providing detailed data on missing & traced persons including children from 2016 onwards following the Supreme Court’s direction in a Writ Petition. It should also be noted that the data published by NCRB is restricted to those cases where FIRs have been registered by the police in respective States/UTs.

Note: Figures for projected_mid_year_population are sourced from the Report of the Technical Group on Population Projections for India and States 2011-2036
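
The percentage recovery described above can be illustrated with a short Python sketch. The function name and the sample figures are hypothetical and are not taken from the NCRB data; the sketch assumes that the total number of persons missing is the sum of cases carried over from previous years and cases reported in the current year.

```python
def recovery_percentage(total_missing: int, total_traced: int) -> float:
    """Share of traced persons over total missing persons, as a percentage.

    total_missing is assumed to include cases carried over from previous years.
    """
    if total_missing <= 0:
        raise ValueError("total_missing must be positive")
    return 100.0 * total_traced / total_missing


# Hypothetical figures for illustration only (not taken from the dataset).
carried_over = 400        # untraced cases from previous years
reported_this_year = 600  # cases newly reported in the year
traced = 750              # persons recovered/traced in the year

total_missing = carried_over + reported_this_year
untraced = total_missing - traced
print(recovery_percentage(total_missing, traced), untraced)
```

Because the denominator includes carried-over cases, the figure is not the same as the share of one year's new reports that were resolved.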
NCRB: State and Gender-wise number of children reported missing and traced
dataful.in
Updated Aug 1, 2025
Dataful (Factly) (2025). NCRB: State and Gender-wise number of children reported missing and traced [Dataset]. https://dataful.in/datasets/18468
Available download formats: csv, application/x-parquet, xlsx
Dataset updated
Aug 1, 2025
Dataset authored and provided by
Dataful (Factly)
License
https://dataful.in/terms-and-conditions
Area covered
States of India
Variables measured
Number of children missing, share of children traced
Description
The Ministry of Home Affairs, Government of India, has defined a missing child as “a person below eighteen years of age, whose whereabouts are not known to the parents, legal guardians and any other persons who may be legally entrusted with the custody of the child, whatever may be the circumstances/causes of disappearance”. The dataset contains the state-wise and gender-wise number of children reported missing in a particular year, total number of persons missing including those from previous years, number of persons recovered/traced and those unrecovered/untraced. The dataset also contains the percentage recovery of missing persons which is calculated as the percentage share of total number of persons traced over the total number of persons missing. NCRB started providing detailed data on missing & traced persons including children from 2016 onwards following the Supreme Court’s direction in a Writ Petition. It should also be noted that the data published by NCRB is restricted to those cases where FIRs have been registered by the police in respective States/UTs.
OPP Missing Persons Annual Report Data
open.canada.ca
ouvert.canada.ca
csv, html, txt, xlsx
Updated Jun 25, 2025
Government of Ontario (2025). OPP Missing Persons Annual Report Data [Dataset]. https://open.canada.ca/data/en/dataset/1bf5a9a3-14bc-482d-9fe6-c182034f3a66
Available download formats: csv, xlsx, txt, html
Dataset updated
Jun 25, 2025
Dataset provided by
Government of Ontario (https://www.ontario.ca/)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Time period covered
Jul 1, 2019 - Dec 31, 2023
Description
Under Section 8 of the Missing Persons Act, 2018, police services are required to report annually on their use of urgent demands for records under the Act, and the Ministry of the Solicitor General is required to make the OPP’s annual report data publicly available. The data includes:
* year in which the urgent demands were reported
* category of records
* description of records accessed under each category
* total number of times each category of records was demanded
* total number of missing persons investigations which had urgent demands for records
* total number of urgent demands for records made by OPP in a year
National Missing and Unidentified Persons System (NamUs)
catalog.data.gov
datasets.ai
Updated Mar 12, 2025
Office of Justice Programs (2025). National Missing and Unidentified Persons System (NamUs) [Dataset]. https://catalog.data.gov/dataset/national-missing-and-unidentified-persons-system-namus
Dataset updated
Mar 12, 2025
Dataset provided by
Office of Justice Programs (https://ojp.gov/)
Description
NamUs is the only national repository for missing, unidentified, and unclaimed persons cases. The program provides a singular resource hub for law enforcement, medical examiners, coroners, and investigating professionals. It is the only national database for missing, unidentified, and unclaimed persons that allows limited access to the public, empowering family members to take a more proactive role in the search for their missing loved ones.
Missing and Unaccounted-for People in Mexico (1960s–2025)
figshare.com
txt
Updated Jul 2, 2025
Montserrat Mora (2025). Missing and Unaccounted-for People in Mexico (1960s–2025) [Dataset]. http://doi.org/10.6084/m9.figshare.28283000.v4
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.28283000.v4
Dataset updated
Jul 2, 2025
Dataset provided by
figshare
Authors
Montserrat Mora
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Mexico
Description
This project provides a comprehensive dataset of over 125,000 missing and unaccounted-for people in Mexico from the 1960s to 2025. The dataset is sourced from the publicly available records on the RNPDO website and represents individuals who were actively missing as of the date of collection (July 1, 2025). To protect individual identities, personal identifiers, such as names, have been removed.

Dataset Features:
The data has been cleaned and translated to facilitate analysis by a global audience.
Fields include:
* Sex
* Date of birth
* Date of incidence
* State and municipality of the incident
Data spans over six decades, offering insights into trends and regional disparities.

Additional Materials:
* Python Script: A Python script to generate customizable visualizations based on the dataset. Users can specify the state to generate tailored charts.
* Sample Chart: An example chart showcasing the evolution of missing persons per 100,000 inhabitants in Mexico between 2006 and 2025.
* Requirements File: A requirements.txt file listing the necessary Python libraries to run the script seamlessly.

This dataset and accompanying tools aim to support researchers, policymakers, and journalists in analyzing and addressing the issue of missing persons in Mexico.
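
As a rough illustration of the "missing persons per 100,000 inhabitants" measure mentioned for the sample chart, the following Python sketch counts incidents per year for one state and scales them by population. It is not the script that ships with the dataset: the record layout, the state name, and the population figure are all hypothetical, and a single constant population is assumed for every year.

```python
from collections import Counter
from datetime import date


def rate_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


def yearly_rates(records, state, population):
    """Count incidents per year for one state and convert them to rates.

    records is an iterable of (state, date_of_incidence) pairs; this layout
    is an assumption made for the sketch, not the dataset's actual schema.
    """
    counts = Counter(d.year for s, d in records if s == state)
    return {year: rate_per_100k(n, population) for year, n in sorted(counts.items())}


# Hypothetical records and population, for illustration only.
sample_records = [
    ("State A", date(2023, 1, 15)),
    ("State A", date(2023, 6, 2)),
    ("State A", date(2023, 11, 30)),
    ("State A", date(2024, 3, 8)),
    ("State B", date(2024, 3, 9)),
]
rates = yearly_rates(sample_records, "State A", population=200_000)
print(rates)
```

A real analysis would need year-specific population estimates for each state, which the dataset description does not say are included.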
Geographies of missing people: processes, experiences and responses -...
b2find.eudat.eu
Updated Oct 23, 2023
(2023). Geographies of missing people: processes, experiences and responses - Dataset - B2FIND [Dataset]. https://b2find.eudat.eu/dataset/76fd493f-bf8f-5337-a8d8-8db0fc56dbd5
Dataset updated
Oct 23, 2023
Description
This data collection represents the empirical materials collected from the ESRC project 'Geographies of Missing People'. It comprises 45 interviews with people previously reported as missing, 9 charity workers, 23 police officers of various ranks and 25 families of missing people. We request that other researchers who wish to reuse our data get in touch to dialogue with the research team about how and why they want to reuse this data. The data is accessible with direct permission from the PI of the original ESRC award: Hester.parr@glasgow.ac.uk

This project seeks to understand the realities involved in 'going missing', and does so from multiple perspectives, using the voices and opinions of the police, families and returned missing people themselves. Qualitative data has been collected to shed light on this significant social (and spatial) problem and help us understand more about the nature of missing experiences for different groups. The purpose of the research project has been to understand more about how people go missing and how the police and families respond to such events (the geographies of searching). Such a focus holds value for both the police and families (the 'left behind') in that it updates and checks current knowledge about the likely spatial experiences of missing people. The project has recruited 45 people formally reported as missing to the project; 9 charity workers in the field of missing persons; 23 police officers of various ranks and 25 family members and these are held by the data archive service. Permission to access from Hester.parr@glasgow.ac.uk.

Interviews and focus groups. Sampling methods are profiled in the main reports lodged on www.geographiesofmissingpeople.org.uk
Data from: COVID-19 Case Surveillance Public Use Data with Geography
data.cdc.gov
data.virginia.gov
Updated Jul 9, 2024
CDC Data, Analytics and Visualization Task Force (2024). COVID-19 Case Surveillance Public Use Data with Geography [Dataset]. https://data.cdc.gov/Case-Surveillance/COVID-19-Case-Surveillance-Public-Use-Data-with-Ge/n8mc-b4w4
Available download formats: application/rssxml, csv, tsv, application/rdfxml, xml, json
Dataset updated
Jul 9, 2024
Dataset provided by
Centers for Disease Control and Prevention (http://www.cdc.gov/)
Authors
CDC Data, Analytics and Visualization Task Force
License
https://www.usa.gov/government-works
Description
Note: Reporting of new COVID-19 Case Surveillance data will be discontinued July 1, 2024, to align with the process of removing SARS-CoV-2 infections (COVID-19 cases) from the list of nationally notifiable diseases. Although these data will continue to be publicly available, the dataset will no longer be updated.

Authorizations to collect certain public health data expired at the end of the U.S. public health emergency declaration on May 11, 2023. The following jurisdictions discontinued COVID-19 case notifications to CDC: Iowa (11/8/21), Kansas (5/12/23), Kentucky (1/1/24), Louisiana (10/31/23), New Hampshire (5/23/23), and Oklahoma (5/2/23). Please note that these jurisdictions will not routinely send new case data after the dates indicated. As of 7/13/23, case notifications from Oregon will only include pediatric cases resulting in death.

This case surveillance public use dataset has 19 elements for all COVID-19 cases shared with CDC and includes demographics, geography (county and state of residence), any exposure history, disease severity indicators and outcomes, and presence of any underlying medical conditions and risk behaviors.

Currently, CDC provides the public with three versions of COVID-19 case surveillance line-listed data: this 19 data element dataset with geography, a 12 data element public use dataset, and a 33 data element restricted access dataset.

The following apply to the public use datasets and the restricted access dataset:
Data elements can be found on the COVID-19 case report form located at www.cdc.gov/coronavirus/2019-ncov/downloads/pui-form.pdf.
Data are considered provisional by CDC and are subject to change until the data are reconciled and verified with the state and territorial data providers.
Some data are suppressed to protect individual privacy.
Datasets will include all cases with the earliest date available in each record (date received by CDC or date related to illness/specimen collection) at least 14 days prior to the creation of the current datasets. This 14-day lag allows case reporting to be stabilized and ensures that time-dependent outcome data are accurately captured.
Datasets are updated monthly.
Datasets are created using CDC’s Policy on Public Health Research and Nonresearch Data Management and Access and include protections designed to protect individual privacy.
For more information about data collection and reporting, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/about-us-cases-deaths.html.
For more information about the COVID-19 case surveillance data, please see https://www.cdc.gov/coronavirus/2019-ncov/covid-data/faq-surveillance.html
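
The 14-day inclusion rule in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not CDC's implementation: the field names `received_by_cdc` and `illness_or_specimen` are hypothetical stand-ins for the two kinds of date the description mentions.

```python
from datetime import date, timedelta
from typing import Optional

LAG = timedelta(days=14)
# Hypothetical field names for the dates named in the description.
DATE_FIELDS = ("received_by_cdc", "illness_or_specimen")


def earliest_date(record: dict) -> Optional[date]:
    """Return the earliest non-missing date in the record, if any."""
    dates = [record[f] for f in DATE_FIELDS if record.get(f) is not None]
    return min(dates) if dates else None


def is_included(record: dict, created: date) -> bool:
    """A case is included if its earliest date is at least 14 days before creation."""
    first = earliest_date(record)
    return first is not None and first <= created - LAG


created = date(2024, 7, 1)  # hypothetical dataset creation date
old_case = {"received_by_cdc": date(2024, 6, 20), "illness_or_specimen": date(2024, 6, 10)}
recent_case = {"received_by_cdc": date(2024, 6, 25), "illness_or_specimen": None}
print(is_included(old_case, created), is_included(recent_case, created))
```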

Overview

The COVID-19 case surveillance database includes individual-level data reported to U.S. states and autonomous reporting entities, including New York City and the District of Columbia (D.C.), as well as U.S. territories and affiliates. On April 5, 2020, COVID-19 was added to the Nationally Notifiable Condition List and classified as “immediately notifiable, urgent (within 24 hours)” by a Council of State and Territorial Epidemiologists (CSTE) Interim Position Statement (Interim-20-ID-01). CSTE updated the position statement on August 5, 2020, to clarify the interpretation of antigen detection tests and serologic test results within the case classification (Interim-20-ID-02). The statement also recommended that all states and territories enact laws to make COVID-19 reportable in their jurisdiction, and that jurisdictions conducting surveillance should submit case notifications to CDC. COVID-19 case surveillance data are collected by jurisdictions and reported voluntarily to CDC.

For more information: NNDSS Supports the COVID-19 Response | CDC.

COVID-19 Case Reports

COVID-19 case reports are routinely submitted to CDC by public health jurisdictions using nationally standardized case reporting forms. On April 5, 2020, CSTE released an Interim Position Statement with national surveillance case definitions for COVID-19. Current versions of these case definitions are available at: https://ndc.services.cdc.gov/case-definitions/coronavirus-disease-2019-2021/. All cases reported on or after were requested to be shared by public health departments to CDC using the standardized case definitions for lab-confirmed or probable cases. On May 5, 2020, the standardized case reporting form was revised. States and territories continue to use this form.

Data are Considered Provisional

The COVID-19 case surveillance data are dynamic; case reports can be modified at any time by the jurisdictions sharing COVID-19 data with CDC. CDC may update prior cases shared with CDC based on any updated information from jurisdictions. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window. Data may also be shared late with CDC due to the volume of COVID-19 cases.
Annual finalized data: To create the final NNDSS data used in the annual tables, CDC works carefully with the reporting jurisdictions to reconcile the data received during the year until each state or territorial epidemiologist confirms that the data from their area are correct.

Access Addressing Gaps in Public Health Reporting of Race and Ethnicity for COVID-19, a report from the Council of State and Territorial Epidemiologists, to better understand the challenges in completing race and ethnicity data for COVID-19 and recommendations for improvement.

Data Limitations

To learn more about the limitations in using case surveillance data, visit FAQ: COVID-19 Data and Surveillance.

Data Quality Assurance Procedures

CDC’s Case Surveillance Section routinely performs data quality assurance procedures (i.e., ongoing corrections and logic checks to address data errors). To date, the following data cleaning steps have been implemented:
Questions that have been left unanswered (blank) on the case report form are reclassified to a Missing value, if applicable to the question. For example, in the question "Was the individual hospitalized?" where the possible answer choices include "Yes," "No," or "Unknown," the blank value is recoded to "Missing" because the case report form did not include a response to the question.
Logic checks are performed for date data. If an illogical date has been provided, CDC reviews the data with the reporting jurisdiction. For example, if a symptom onset date in the future is reported to CDC, this value is set to null until the reporting jurisdiction updates the date appropriately.
Additional data quality processing to recode free text data is ongoing. Data on symptoms, race, ethnicity, and healthcare worker status have been prioritized.
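
The first two cleaning steps above can be illustrated with a minimal Python sketch. The function names are hypothetical and the logic is a simplified reading of the description, not CDC's actual cleaning algorithm; in particular, the review with the reporting jurisdiction is outside the scope of the code.

```python
from datetime import date
from typing import Optional


def recode_blank(answer: Optional[str]) -> str:
    """Recode an unanswered (blank) question to "Missing"; keep other answers."""
    if answer is None or answer.strip() == "":
        return "Missing"
    return answer


def check_onset_date(onset: Optional[date], today: date) -> Optional[date]:
    """Set an illogical (future) symptom onset date to null (None)."""
    if onset is not None and onset > today:
        return None
    return onset


today = date(2024, 6, 1)  # hypothetical processing date
print(recode_blank(""), recode_blank("Unknown"))
print(check_onset_date(date(2025, 1, 1), today), check_onset_date(date(2024, 5, 1), today))
```

Note that "Unknown" is a real answer choice and is left untouched; only a blank response becomes "Missing".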

Data Suppression

To prevent release of data that could be used to identify people, data cells are suppressed for low frequency (<11 COVID-19 case records with a given value). Suppression includes low frequency combinations of case month, geographic characteristics (county and state of residence), and demographic characteristics (sex, age group, race, and ethnicity). Suppressed values are re-coded to the NA answer option; records with data suppression are never removed.
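
A simplified Python sketch of low-frequency suppression follows. The field names are hypothetical, and the sketch recodes every listed field of a record whose combination occurs fewer than 11 times; the actual procedure may suppress individual fields more selectively. As in the description, records are recoded rather than dropped.

```python
from collections import Counter

THRESHOLD = 11  # combinations with fewer records than this are suppressed
# Hypothetical field names for the characteristics listed in the description.
KEYS = ("case_month", "county", "state", "sex", "age_group", "race", "ethnicity")


def suppress(records):
    """Recode low-frequency combinations to "NA" without removing any record."""
    counts = Counter(tuple(r[k] for k in KEYS) for r in records)
    result = []
    for r in records:
        combo = tuple(r[k] for k in KEYS)
        if counts[combo] < THRESHOLD:
            # Copy the record and overwrite only the identifying fields.
            r = {**r, **{k: "NA" for k in KEYS}}
        result.append(r)
    return result


# Hypothetical records: eleven identical cases and one unique case.
common = {"case_month": "2021-01", "county": "County X", "state": "ST", "sex": "Female",
          "age_group": "18 to 49 years", "race": "White", "ethnicity": "Non-Hispanic",
          "hospitalized": "No"}
rare = {**common, "county": "County Y", "hospitalized": "Yes"}
records = [dict(common) for _ in range(11)] + [rare]
cleaned = suppress(records)
print(len(cleaned), cleaned[-1]["county"], cleaned[-1]["hospitalized"])
```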

Additional COVID-19 Data

COVID-19 data are available to the public as summary or aggregate count files, including total counts of cases and deaths by state and by county. These and other COVID-19 data are available from multiple public locations: COVID Data Tracker; United States COVID-19 Cases and Deaths by State; COVID-19 Vaccination Reporting Data Systems; and COVID-19 Death Data and Resources.

Notes:

March 1, 2022: The "COVID-19 Case Surveillance Public Use Data with Geography" will be updated on a monthly basis.

April 7, 2022: An adjustment was made to CDC’s cleaning algorithm for COVID-19 line level case notification data. An assumption in CDC's algorithm led to misclassifying deaths that were not COVID-19 related. The algorithm has since been revised, and this dataset update reflects corrected individual level information about death status for all cases collected to date.

June 25, 2024: An adjustment
‘Missing Migrants Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Apr 23, 2019
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2019). ‘Missing Migrants Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-missing-migrants-dataset-c736/2e62d69f/?v=grid
Dataset updated
Apr 23, 2019
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Missing Migrants Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jmataya/missingmigrants on 14 February 2022.

--- Dataset description provided by original source is as follows ---

About the Missing Migrants Data

This data is sourced from the International Organization for Migration. The data is part of a specific project called the Missing Migrants Project which tracks deaths of migrants, including refugees, who have gone missing along mixed migration routes worldwide. The research behind this project began with the October 2013 tragedies, when at least 368 individuals died in two shipwrecks near the Italian island of Lampedusa. Since then, Missing Migrants Project has developed into an important hub and advocacy source of information that media, researchers, and the general public access for the latest information.

Where is the data from?

Missing Migrants Project data are compiled from a variety of sources. Sources vary depending on the region and broadly include data from national authorities, such as Coast Guards and Medical Examiners; media reports; NGOs; and interviews with survivors of shipwrecks. In the Mediterranean region, data are relayed from relevant national authorities to IOM field missions, who then share it with the Missing Migrants Project team. Data are also obtained by IOM and other organizations that receive survivors at landing points in Italy and Greece. In other cases, media reports are used. IOM and UNHCR also regularly coordinate on such data to ensure consistency. Data on the U.S./Mexico border are compiled based on data from U.S. county medical examiners and sheriff’s offices, as well as media reports for deaths occurring on the Mexico side of the border. Estimates within Mexico and Central America are based primarily on media and year-end government reports. Data on the Bay of Bengal are drawn from reports by UNHCR and NGOs. In the Horn of Africa, data are obtained from media and NGOs. Data for other regions are drawn from a combination of sources, including media and grassroots organizations. In all regions, Missing Migrants Project data represent minimum estimates and are potentially lower than the actual figures.

Updated data and visuals can be found here: https://missingmigrants.iom.int/

Who is included in Missing Migrants Project data?

IOM defines a migrant as any person who is moving or has moved across an international border or within a State away from his/her habitual place of residence, regardless of

(1) the person’s legal status; (2) whether the movement is voluntary or involuntary; (3) what the causes for the movement are; or (4) what the length of the stay is.[1]

Missing Migrants Project counts migrants who have died or gone missing at the external borders of states, or in the process of migration towards an international destination. The count excludes deaths that occur in immigration detention facilities, during deportation, or after forced return to a migrant’s homeland, as well as deaths more loosely connected with migrants’ irregular status, such as those resulting from labour exploitation. Migrants who die or go missing after they are established in a new home are also not included in the data, so deaths in refugee camps or housing are excluded. This approach is chosen because deaths that occur at physical borders and while en route represent a more clearly definable category, and inform what migration routes are most dangerous. Data and knowledge of the risks and vulnerabilities faced by migrants in destination countries, including death, should not be neglected, rather tracked as a distinct category.

How complete is the data on dead and missing migrants?

Data on fatalities during the migration process are challenging to collect for a number of reasons, most stemming from the irregular nature of migratory journeys on which deaths tend to occur. For one, deaths often occur in remote areas on routes chosen with the explicit aim of evading detection. Countless bodies are never found, and rarely do these deaths come to the attention of authorities or the media. Furthermore, when deaths occur at sea, frequently not all bodies are recovered - sometimes with hundreds missing from one shipwreck - and the precise number of missing is often unknown. In 2015, over 50 per cent of deaths recorded by the Missing Migrants Project refer to migrants who are presumed dead and whose bodies have not been found, mainly at sea.

Data are also challenging to collect as reporting on deaths is poor, and the data that do exist are highly scattered. Few official sources are collecting data systematically. Many counts of death rely on media as a source. Coverage can be spotty and incomplete. In addition, the involvement of criminal actors in incidents means there may be fear among survivors to report deaths and some deaths may be actively covered up. The irregular immigration status of many migrants, and at times their families as well, also impedes reporting of missing persons or deaths.

The varying quality and comprehensiveness of data by region in attempting to estimate deaths globally may exaggerate the share of deaths that occur in some regions, while under-representing the share occurring in others.

What can be understood through this data?

The available data can give an indication of changing conditions and trends related to migration routes and the people travelling on them, which can be relevant for policy making and protection plans. Data can be useful to determine the relative risks of irregular migration routes. For example, Missing Migrants Project data show that despite the increase in migrant flows through the eastern Mediterranean in 2015, the central Mediterranean remained the more deadly route. In 2015, nearly two people died out of every 100 travellers (1.85%) crossing the Central route, as opposed to one out of every 1,000 that crossed from Turkey to Greece (0.095%). From the data, we can also get a sense of whether groups like women and children face additional vulnerabilities on migration routes.
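
The two rates quoted above can be compared with a little arithmetic, sketched here in Python. Only the percentages (1.85% and 0.095%) come from the text; the helper names are hypothetical.

```python
def deaths_per(rate_percent: float, travellers: int) -> float:
    """Expected deaths among a given number of travellers at a percentage rate."""
    return rate_percent / 100 * travellers


def one_in(rate_percent: float) -> float:
    """Express a percentage rate as 'one death in every N travellers'."""
    return 100 / rate_percent


central = 1.85   # per cent, Central Mediterranean route, 2015
eastern = 0.095  # per cent, Turkey to Greece, 2015

print(round(deaths_per(central, 100), 2))     # about 1.85 per 100 travellers
print(round(deaths_per(eastern, 1_000), 2))   # about 0.95 per 1,000 travellers
print(round(one_in(central)), round(one_in(eastern)))
print(round(central / eastern, 1))            # how many times deadlier the Central route was
```

On these figures the Central route was roughly 19 times deadlier per traveller than the crossing from Turkey to Greece.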

However, it is important to note that because of the challenges in data collection for the missing and dead, basic demographic information on the deceased is rarely known. Often migrants in mixed migration flows do not carry appropriate identification. When bodies are found it may not be possible to identify them or to determine basic demographic information. In the data compiled by Missing Migrants Project, sex of the deceased is unknown in over 80% of cases. Region of origin has been determined for the majority of the deceased. Even this information is at times extrapolated based on available information – for instance if all survivors of a shipwreck are of one origin it was assumed those missing also came from the same region.

The Missing Migrants Project dataset includes coordinates for where incidents of death took place, which indicates where the risks to migrants may be highest. However, it should be noted that all coordinates are estimates.

Why collect data on missing and dead migrants?

By counting lives lost during migration, even if the result is only an informed estimate, we at least acknowledge the fact of these deaths. What before was vague and ill-defined is now a quantified tragedy that must be addressed. Politically, the availability of official data is important. The lack of political commitment at national and international levels to record and account for migrant deaths reflects and contributes to a lack of concern more broadly for the safety and well-being of migrants, including asylum-seekers. Further, it drives public apathy, ignorance, and the dehumanization of these groups.

Data are crucial to better understand the profiles of those who are most at risk and to tailor policies to better assist migrants and prevent loss of life. Ultimately, improved data should contribute to efforts to better understand the causes, both direct and indirect, of fatalities and their potential links to broader migration control policies and practices.

Counting and recording the dead can also be an initial step to encourage improved systems of identification of those who die. Identifying the dead is a moral imperative that respects and acknowledges those who have died. This process can also provide some sense of closure for families who may otherwise be left without ever knowing the fate of missing loved ones.

Identification and tracing of the dead and missing

As mentioned above, the challenge remains to count the numbers of dead and also identify those counted. Globally, the majority of those who die during migration remain unidentified. Even in cases in which a body is found, identification rates are low. Families may search for years or a lifetime to find conclusive news of their loved one. In the meantime, they may face psychological, practical, financial, and legal problems.

Ultimately, Missing Migrants Project would like to see that every unidentified body that it is possible to recover is adequately “managed”, analysed and tracked to ensure proper documentation, traceability and dignity. Common forensic protocols and standards should be agreed upon, and used within and between States. Furthermore, data relating to the dead and missing should be held in searchable and open databases at local, national and international levels to facilitate identification.

For more in-depth analysis and discussion of the numbers of missing and dead migrants around the world, and the challenges involved in identification and tracing, read our two reports on the issue, Fatal Journeys: Tracking Lives Lost during Migration (2014) and Fatal Journeys Volume 2, Identification and Tracing of Dead and Missing Migrants

Content

The data set records
HomeDatasetCategoryStoriesSuggestContact Sign in
opendatanepal.com
Updated Jul 20, 2025
(2025). HomeDatasetCategoryStoriesSuggestContact Sign in [Dataset]. https://opendatanepal.com/dataset/climate-change-and-health-profile-2015
Dataset updated
Jul 20, 2025
License
Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
License information was derived automatically
Description
The dataset shows the effects of various natural disasters, including deaths, injuries, people going missing, and houses being destroyed, from 1971 to 2013. There were a total of 24,257 deaths in that period. Epidemic, landslide, and flood were the top three causes of human deaths, and flood, earthquake, and fire were the top three causes of household damage.
Data from: National Incidence Studies of Missing, Abducted, Runaway, and...
catalog.data.gov
icpsr.umich.edu
Updated Mar 12, 2025
+ more versions
Cite
Office of Juvenile Justice and Delinquency Prevention (2025). National Incidence Studies of Missing, Abducted, Runaway, and Thrownaway Children (NISMART), 1999 [Dataset]. https://catalog.data.gov/dataset/national-incidence-studies-of-missing-abducted-runaway-and-thrownaway-children-nismart-199-2621e
Explore at:
Dataset updated
Mar 12, 2025
Dataset provided by
Office of Juvenile Justice and Delinquency Prevention http://ojjdp.gov/
Description
The National Incidence Studies of Missing, Abducted, Runaway, and Thrownaway Children (NISMART) were undertaken in response to the mandate of the 1984 Missing Children's Assistance Act (Pub.L. 98-473) that requires the Office of Juvenile Justice and Delinquency Prevention (OJJDP) to conduct periodic national incidence studies to determine the actual number of children reported missing and the number of missing children who are recovered for a given year. The first such study, NISMART-1 (NATIONAL INCIDENCE STUDIES OF MISSING, ABDUCTED, RUNAWAY, AND THROWNAWAY CHILDREN (NISMART), 1988 [ICPSR 9682]), was conducted from 1988 to 1989 and addressed this mandate by defining major types of missing child episodes and estimating the number of children who experienced missing child episodes of each type in 1988. At that time, the lack of a standardized definition of a "missing child" made it impossible to provide a single estimate of missing children. As a result, one of the primary goals of NISMART-2 was to develop a standardized definition and provide unified estimates of the number of missing children in the United States. Both NISMART-1 and NISMART-2 comprise several component datasets designed to provide a comprehensive picture of the population of children who experienced qualifying episodes, with each component focusing on a different aspect of the missing child population. The Household Survey -- Youth Data and the Household Survey -- Adult Data (Parts 1-2) are similar but separate surveys, one administered to the adult primary caretaker of the children in the sampled household and the other to a randomly selected household youth aged 10 through 18 at the time of interview. The Juvenile Facilities Data on Runaways (Part 3) sought to estimate the number of runaways from juvenile residential facilities in order to supplement the household survey estimate of the number of runaways from households. 
The Law Enforcement Study Data, by case, perpetrator, and victim (Parts 4-6), were intended to estimate the number of children who were victims of stereotypical kidnappings and to obtain a sample of these cases for in-depth study.
ARCHIVED: COVID-19 Testing by Geography Over Time
healthdata.gov
data.sfgov.org
+2more
application/rdfxml +5
Updated Apr 8, 2025
+ more versions
Cite
data.sfgov.org (2025). ARCHIVED: COVID-19 Testing by Geography Over Time [Dataset]. https://healthdata.gov/dataset/ARCHIVED-COVID-19-Testing-by-Geography-Over-Time/nw7x-qrh3
Explore at:
Available download formats: application/rssxml, xml, json, csv, tsv, application/rdfxml
Dataset updated
Apr 8, 2025
Dataset provided by
data.sfgov.org
Description
A. SUMMARY This dataset includes COVID-19 tests by resident neighborhood and specimen collection date (the day the test was collected). Specifically, this dataset includes tests of San Francisco residents who listed a San Francisco home address at the time of testing. These resident addresses were then geo-located and mapped to neighborhoods. The resident address associated with each test is hand-entered and susceptible to errors, so neighborhood data should be interpreted as an approximation, not a precise or comprehensive total.

In recent months, about 5% of tests are missing addresses and therefore cannot be included in any neighborhood totals. In earlier months, more tests were missing address data. Because of this high percentage of tests missing resident address data, this neighborhood testing data for March, April, and May should be interpreted with caution (see below).

Percentage of tests missing address information, by month in 2020:
Mar - 33.6%
Apr - 25.9%
May - 11.1%
Jun - 7.2%
Jul - 5.8%
Aug - 5.4%
Sep - 5.1%
Oct (Oct 1-12) - 5.1%

To protect the privacy of residents, the City does not disclose the number of tests in neighborhoods with resident populations of fewer than 1,000 people. These neighborhoods are omitted from the data (they include Golden Gate Park, John McLaren Park, and Lands End).

Tests for residents that listed a Skilled Nursing Facility as their home address are not included in this neighborhood-level testing data. Skilled Nursing Facilities have required and repeated testing of residents, which would change neighborhood trends and not reflect the broader neighborhood's testing data.

This data was de-duplicated by individual and date, so if a person gets tested multiple times on different dates, all tests will be included in this dataset (on the day each test was collected).

The total number of positive test results is not equal to the total number of COVID-19 cases in San Francisco. Each positive test result is investigated. During this investigation, some test results are found to be for persons living outside of San Francisco and some people in San Francisco may be tested multiple times (which is common). To see the number of new confirmed cases by neighborhood, reference this map: https://sf.gov/data/covid-19-case-maps#new-cases-maps

B. HOW THE DATASET IS CREATED COVID-19 laboratory test data is based on electronic laboratory test reports. Deduplication, quality assurance measures and other data verification processes maximize accuracy of laboratory test information. All testing data is then geo-coded by resident address. Then data is aggregated by analysis neighborhood and specimen collection date.

Data are prepared by close of business Monday through Saturday for public display.

C. UPDATE PROCESS Updates automatically at 05:00 Pacific Time each day. Redundant runs are scheduled at 07:00 and 09:00 in case of pipeline failure.

D. HOW TO USE THIS DATASET San Francisco population estimates for geographic regions can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).

Because the time needed to complete tests varies widely between labs, there is a delay in this reporting. On March 24 the Health Officer ordered all labs in the City to report complete COVID-19 testing information to the local and state health departments.

In order to track trends over time, a data user can analyze this data by "specimen_collection_date".

Calculating Percent Positivity: The positivity rate is the percentage of tests that return a positive result for COVID-19 (positive tests divided by the sum of positive and negative tests). Indeterminate results, which could not conclusively determine whether COVID-19 virus was present, are not included in the calculation of pe
COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE
catalog.data.gov
data.ct.gov
+1more
Updated Aug 12, 2023
Cite
data.ct.gov (2023). COVID-19 Cases and Deaths by Race/Ethnicity - ARCHIVE [Dataset]. https://catalog.data.gov/dataset/covid-19-cases-and-deaths-by-race-ethnicity
Explore at:
Dataset updated
Aug 12, 2023
Dataset provided by
data.ct.gov
Description
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.

The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.

The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .

The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .

The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.

This dataset contains COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or Department of Public Health (DPH) are included in the COVID-19 update.
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates.

The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf

Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic.

Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records.
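The difference between crude and age-adjusted rates described above can be sketched with direct standardization, where each age-specific rate is weighted by that age group's share of a standard population. The two age bands, counts, and standard-population figures below are made-up illustrative values, not Connecticut or SEER data, and the function names are hypothetical.

```python
def crude_rate(deaths, population, per=100_000):
    """Total deaths divided by total population, scaled to `per` people."""
    return sum(deaths) / sum(population) * per


def age_adjusted_rate(deaths, population, standard_population, per=100_000):
    """Direct standardization: weight each age-specific rate by the
    share of the standard population that falls in that age group."""
    total_standard = sum(standard_population)
    adjusted = 0.0
    for d, p, s in zip(deaths, population, standard_population):
        age_specific_rate = d / p
        weight = s / total_standard
        adjusted += age_specific_rate * weight
    return adjusted * per


# Illustrative two-band example (younger, older); all numbers are invented.
deaths = [1, 10]
population = [1000, 1000]
standard_population = [3000, 1000]  # the standard population skews younger

crude = crude_rate(deaths, population)
adjusted = age_adjusted_rate(deaths, population, standard_population)
```

In this toy case the group is older than the standard population, so the adjusted rate comes out below the crude rate, which mirrors the direction the description reports for non-Hispanic white residents.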
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
Austin Animal Center Shelter Intakes and Outcomes
kaggle.com
Updated Apr 13, 2018
Cite
AaronSchlegel (2018). Austin Animal Center Shelter Intakes and Outcomes [Dataset]. https://www.kaggle.com/datasets/aaronschlegel/austin-animal-center-shelter-intakes-and-outcomes/code
Explore at:
Croissant: Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Apr 13, 2018
Dataset provided by
Kaggle http://kaggle.com/
Authors
AaronSchlegel
License
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Area covered
Austin
Description
Context

The Austin Animal Center is the largest no-kill animal shelter in the United States, providing care and shelter to over 18,000 animals each year. As part of the AAC's efforts to help and care for animals in need, the organization makes available its accumulated data and statistics as part of the city of Austin's Open Data Initiative.

Content

The data contains intakes and outcomes of animals entering the Austin Animal Center from the beginning of October 2013 to the present day. The datasets are also freely available on the Socrata Open Data Access API and are updated daily.

The following are links to the datasets hosted on Socrata's Open Data:

Austin Animal Center Intakes

Austin Animal Center Outcomes

The data contained in this dataset is the outcomes and intakes data as noted above, as well as a combined dataset. The merging of the outcomes and intakes data was done on a unique key that is a combination of the given Animal ID and the intake number. Several of the animals in the dataset have been taken into the shelter multiple times, which creates duplicate Animal IDs that cause problems when merging the two datasets.
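One way to picture the composite key described above is to number each animal's visits in order and then join intakes to outcomes on the pair (Animal ID, visit number). This is only a sketch, not the dataset author's actual code: the record layout, the field names `animal_id` and `date`, and the assumption that both lists are already sorted by date are all hypothetical.

```python
from collections import defaultdict


def key_by_visit(records):
    """Map (animal_id, visit_number) -> record, numbering repeat visits from 1.

    Assumes `records` is already sorted chronologically.
    """
    seen = defaultdict(int)
    keyed = {}
    for rec in records:
        seen[rec["animal_id"]] += 1
        keyed[(rec["animal_id"], seen[rec["animal_id"]])] = rec
    return keyed


def merge_intakes_outcomes(intakes, outcomes):
    """Left-join intakes to outcomes on the composite key."""
    keyed_outcomes = key_by_visit(outcomes)
    merged = []
    for key, intake in key_by_visit(intakes).items():
        outcome = keyed_outcomes.get(key)  # None if no outcome recorded yet
        merged.append({
            "animal_id": key[0],
            "visit": key[1],
            "intake_date": intake["date"],
            "outcome_date": outcome["date"] if outcome else None,
        })
    return merged


# Hypothetical records: animal A001 entered the shelter twice.
intakes = [
    {"animal_id": "A001", "date": "2014-01-05"},
    {"animal_id": "B002", "date": "2014-02-01"},
    {"animal_id": "A001", "date": "2015-03-10"},
]
outcomes = [
    {"animal_id": "A001", "date": "2014-01-20"},
    {"animal_id": "A001", "date": "2015-04-02"},
]
merged = merge_intakes_outcomes(intakes, outcomes)
```

Joining on Animal ID alone would pair each of A001's two intakes with both outcomes, producing four rows instead of two; the visit number is what prevents that.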

Copied from the description of the Shelter Outcomes dataset, here are some definitions of the outcome types:

Adoption

the animal was adopted to a home

Barn Adoption

the animal was adopted to live in a barn

Offsite Missing

the animal went missing for unknown reasons at an offsite partner location

In-Foster Missing

the animal is missing after being placed in a foster home

In-Kennel Missing

the animal is missing after being transferred to a kennel facility

Possible Theft

Although not confirmed, the animal went missing as a result of theft from the facility

Barn Transfer

The animal was transferred to a facility for adoption into a barn environment

SNR

SNR refers to the city of Austin's Shelter-Neuter-Release program. I believe the outcome is representative of the animal being released.

Acknowledgements

The data presented here is only possible through the hard work and dedication of the Austin Animal Center in saving and caring for animal lives.

Inspiration

Following from the first dataset I posted to Kaggle, Austin Animal Shelter Outcomes, which was initially filtered for just cats as part of an analysis I was performing, I wanted to post the complete outcome and complementing intake datasets. My hope is the great users of Kaggle will find this data interesting and want to explore shelter animal statistics further and perhaps get more involved in the animal welfare community. The analysis of this data and other shelter animal provided datasets helps uncover useful insights that have the potential to save lives directly.
Dataset for: Avoiding pitfalls when combining multiple imputation and...
wiley.figshare.com
docx
Updated Jun 2, 2023
Cite
Emily Granger; Jamie Sergeant; Mark Lunt (2023). Dataset for: Avoiding pitfalls when combining multiple imputation and propensity scores [Dataset]. http://doi.org/10.6084/m9.figshare.9253178.v1
Explore at:
Available download formats: docx
Unique identifier
https://doi.org/10.6084/m9.figshare.9253178.v1
Dataset updated
Jun 2, 2023
Dataset provided by
Wiley
Authors
Emily Granger; Jamie Sergeant; Mark Lunt
License
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Overcoming bias due to confounding and missing data is challenging when analysing observational data. Propensity scores are commonly used to account for the first problem and multiple imputation for the latter. Unfortunately, it is not known how best to proceed when both techniques are required. We investigate whether two different approaches to combining propensity scores and multiple imputation (Across and Within) lead to differences in the accuracy or precision of exposure effect estimates. Both approaches start by imputing missing values multiple times. Propensity scores are then estimated for each resulting dataset. Using the Across approach, the mean propensity score across imputations for each subject is used in a single subsequent analysis. Alternatively, the Within approach uses propensity scores individually to obtain exposure effect estimates in each imputation, which are combined to produce an overall estimate. These approaches were compared in a series of Monte Carlo simulations and applied to data from the British Society for Rheumatology Biologics Register. Results indicated that the Within approach produced unbiased estimates with appropriate confidence intervals, whereas the Across approach produced biased results and unrealistic confidence intervals. Researchers are encouraged to implement the Within approach when conducting propensity score analyses with incomplete data.
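The structural difference between the Across and Within approaches can be sketched as follows. This is a simplified illustration, not the authors' simulation code. It assumes that only covariates were imputed (so outcome and exposure are the same in every imputed dataset), that propensity scores have already been estimated per imputation, and that they are applied through inverse probability weighting, which is one of several possible uses. All function names and numbers are hypothetical.

```python
def ipw_effect(outcome, treated, ps):
    """Inverse-probability-weighted difference in mean outcome
    (treated minus untreated), given one propensity score per subject."""
    num_t = sum(y / p for y, t, p in zip(outcome, treated, ps) if t)
    den_t = sum(1 / p for t, p in zip(treated, ps) if t)
    num_c = sum(y / (1 - p) for y, t, p in zip(outcome, treated, ps) if not t)
    den_c = sum(1 / (1 - p) for t, p in zip(treated, ps) if not t)
    return num_t / den_t - num_c / den_c


def across_estimate(outcome, treated, ps_by_imputation):
    """Across: average each subject's propensity score over the
    imputations, then run a single analysis with the averaged score."""
    m = len(ps_by_imputation)
    mean_ps = [sum(scores) / m for scores in zip(*ps_by_imputation)]
    return ipw_effect(outcome, treated, mean_ps)


def within_estimate(outcome, treated, ps_by_imputation):
    """Within: estimate the effect separately in each imputation,
    then pool the point estimates by taking their mean."""
    effects = [ipw_effect(outcome, treated, ps) for ps in ps_by_imputation]
    return sum(effects) / len(effects)


# Invented toy data: four subjects, two imputations.
outcome = [1.0, 2.0, 3.0, 4.0]
treated = [1, 1, 0, 0]
ps_by_imputation = [
    [0.6, 0.5, 0.4, 0.3],
    [0.7, 0.5, 0.5, 0.2],
]
across = across_estimate(outcome, treated, ps_by_imputation)
within = within_estimate(outcome, treated, ps_by_imputation)
```

The sketch pools only point estimates; a full Within analysis would also combine the per-imputation variances to obtain a confidence interval.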
‘Covid-19 Tests by Race Ethnicity and Date’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 27, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Covid-19 Tests by Race Ethnicity and Date’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/data-gov-covid-19-tests-by-race-ethnicity-and-date-f47f/e38e3d0a/?iid=004-383&v=presentation
Explore at:
Dataset updated
Jan 27, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘Covid-19 Tests by Race Ethnicity and Date’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/68410b4b-052f-4ce3-8d0c-873b5664f1a4 on 27 January 2022.

--- Dataset description provided by original source is as follows ---

Note: As of April 16, 2021, this dataset will update daily with a five-day data lag.

A. SUMMARY This dataset includes San Francisco COVID-19 tests by race/ethnicity and date. For each day, this dataset represents the daily count of tests collected by race/ethnicity, and how many of those were positive, negative, and indeterminate. Tests in this dataset include all tests collected from San Francisco residents who listed a San Francisco home address at the time of testing, and tests that were collected in San Francisco but had a missing home address. Data are based on information collected at the time of testing.

For recent data, about 25-30% of tests are missing race/ethnicity information. Tests where the race/ethnicity of the patient is unknown are included in the dataset under the "Unknown" category.

This data was de-duplicated by individual and date, so if a person gets tested multiple times on different dates, all tests will be included in this dataset (on the day each test was collected).

The total number of positive test results is not equal to the total number of COVID-19 cases in San Francisco. Each positive test result is investigated. During this investigation, some test results are found to be for persons living outside of San Francisco and some people in San Francisco may be tested multiple times. In both cases, these results are not included in San Francisco’s total COVID-19 case count. To track the number of cases by race/ethnicity, see this dashboard: https://data.sfgov.org/stories/s/w6za-6st8

B. HOW THE DATASET IS CREATED COVID-19 laboratory test data is based on electronic laboratory test reports. Deduplication, quality assurance measures and other data verification processes maximize accuracy of laboratory test information.

C. UPDATE PROCESS Updates automatically at 05:00 Pacific Time each day. Redundant runs are scheduled at 07:00 and 09:00 in case of pipeline failure.

D. HOW TO USE THIS DATASET Because the time needed to complete tests varies widely between labs, there is a delay in this reporting. On March 24 the Health Officer ordered all labs in the City to report complete COVID-19 testing information to the local and state health departments.

In order to track trends over time, a data user can analyze this data by "specimen_collection_date".

Calculating Percent Positivity: The positivity rate is the percentage of tests that return a positive result for COVID-19 (positive tests divided by the sum of positive and negative tests). Indeterminate results, which could not conclusively determine whether COVID-19 virus was present, are not included in the calculation of percent positive. When there are fewer than 20 positive tests for a given race/ethnicity and time period, the positivity rate is not calculated for the public tracker because rates of small test counts are less reliable.
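The positivity calculation described above can be sketched as a small function. The function and parameter names are hypothetical; the formula and the 20-positive-test suppression threshold come from the description.

```python
def percent_positivity(positive, negative, indeterminate=0, min_positives=20):
    """Percent of conclusive tests that were positive.

    Indeterminate results are accepted as an argument only to make
    explicit that they are excluded from the denominator. Returns None
    when there are too few positive tests for a reliable rate.
    """
    if positive < min_positives:
        return None
    conclusive = positive + negative  # indeterminate results are left out
    return 100.0 * positive / conclusive


rate = percent_positivity(positive=50, negative=950, indeterminate=30)
suppressed = percent_positivity(positive=10, negative=90)
```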

Calculating Testing Rates: To calculate the testing rate per 10,000 residents, divide the total number of tests collected (positive, negative, and indeterminate results) for the specified race/ethnicity by the total number of residents who identify as that race/ethnicity (according to the 2018 5-year estimates from the American Community Survey), then multiply by 10,000. When there are fewer than 20 total tests for a given race/ethnicity and time period, the testing rate is not calculated for the public tracker because rates of small test counts are less reliable.

Read more about how this data is updated and validated daily: https://data.sfgov.org/stories/s/nudz-9tg2

There are two other datasets related to tests: 1. COVID-19 Tests 2. <a href="https://data.sfgov.org/dataset/Covid-19-Testing-by

--- Original source retains full ownership of the source dataset ---
Counts of Influenza reported in UNITED STATES OF AMERICA: 1919-1951
data.niaid.nih.gov
zenodo.org
Updated Jun 3, 2024
+ more versions
Cite
Burke, Donald (2024). Counts of Influenza reported in UNITED STATES OF AMERICA: 1919-1951 [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_11452498
Explore at:
Dataset updated
Jun 3, 2024
Dataset provided by
Burke, Donald
Cross, Anne
Van Panhuis, Willem
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Area covered
United States
Description
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to counts published in the original source and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format. Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. the United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc. Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete, due to incompleteness of source documents) and users will need to add time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.

Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals starting on the same date, but ending on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated this with an attribute for each count value, named "PartOfCumulativeCountSeries".
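The two recommended processing steps can be sketched in a few lines. Only the attribute name "PartOfCumulativeCountSeries" comes from the description; the other field names (`PeriodStartDate`, `CountValue`), the weekly interval length, and the sample counts are assumptions made for illustration. The sketch also assumes the flag has already been parsed into a boolean or integer, since a raw CSV string such as "0" would be truthy in Python.

```python
from datetime import date, timedelta


def split_series(records):
    """Separate cumulative from fixed-interval count records."""
    cumulative = [r for r in records if r["PartOfCumulativeCountSeries"]]
    fixed = [r for r in records if not r["PartOfCumulativeCountSeries"]]
    return cumulative, fixed


def fill_missing_weeks(fixed, start, end):
    """Return one (week_start, count) pair per 7-day interval.

    Weeks with no reported value get None, which stays distinct
    from a reported count of zero.
    """
    by_start = {r["PeriodStartDate"]: r["CountValue"] for r in fixed}
    rows = []
    current = start
    while current <= end:
        rows.append((current, by_start.get(current)))
        current += timedelta(days=7)
    return rows


# Invented records: two weekly counts with a gap, plus one cumulative count.
records = [
    {"PeriodStartDate": date(1920, 1, 4), "CountValue": 0, "PartOfCumulativeCountSeries": 0},
    {"PeriodStartDate": date(1920, 1, 18), "CountValue": 5, "PartOfCumulativeCountSeries": 0},
    {"PeriodStartDate": date(1920, 1, 1), "CountValue": 12, "PartOfCumulativeCountSeries": 1},
]
cumulative, fixed = split_series(records)
weekly = fill_missing_weeks(fixed, date(1920, 1, 4), date(1920, 1, 18))
```

Keeping unreported weeks as None instead of zero preserves the distinction the description draws between "no count reported" and "a count of zero reported".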
‘MISSING MIGRANTS (2014-2021)’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘MISSING MIGRANTS (2014-2021)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-missing-migrants-2014-2021-19da/1a9479e3/?iid=039-565&v=presentation
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Analysis of ‘MISSING MIGRANTS (2014-2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/methoomirza/missing-migrants-20142021 on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

Missing Migrants Project tracks deaths of migrants, including refugees and asylum-seekers, who have died or gone missing in the process of migration towards an international destination. Please note that these data represent minimum estimates, as many deaths during migration go unrecorded.

What is included in Missing Migrants Project data?

Missing Migrants Project counts migrants who have died at the external borders of states, or in the process of migration towards an international destination, regardless of their legal status. The Project records only those migrants who die during their journey to a country different from their country of residence. Missing Migrants Project data include the deaths of migrants who die in transportation accidents, shipwrecks, violent attacks, or due to medical complications during their journeys. It also includes the number of corpses found at border crossings that are categorized as the bodies of migrants, on the basis of belongings and/or the characteristics of the death. For instance, a death of an unidentified person might be included if the decedent is found without any identifying documentation in an area known to be on a migration route. Deaths during migration may also be identified based on the cause of death, especially if it is related to trafficking, smuggling, or means of travel such as on top of a train, in the back of a cargo truck, as a stowaway on a plane, in unseaworthy boats, or crossing a border fence. While the location and cause of death can provide strong evidence that an unidentified decedent should be included in Missing Migrants Project data, this should always be evaluated in conjunction with migration history and trends.

What is excluded?

The count excludes deaths that occur in immigration detention facilities or after deportation to a migrant’s homeland, as well as deaths more loosely connected with migrants’ irregular status, such as those resulting from labour exploitation. Migrants who die or go missing after they are established in a new home are also not included in the data, so deaths in refugee camps or housing are excluded. The deaths of internally displaced persons who die within their country of origin are also excluded. There remains a significant gap in knowledge and data on such deaths. Data and knowledge of the risks and vulnerabilities faced by migrants in destination countries, including death, should not be neglected, but rather tracked as a distinct category.

What sources of information are used in the Missing Migrants Project database?

The Missing Migrants Project currently gathers information from diverse sources such as official records – including from coast guards and medical examiners – and other sources such as media reports, NGOs, and surveys and interviews of migrants. In the Mediterranean region, data are relayed from relevant national authorities to IOM field missions, who then share it with the Missing Migrants Project team. Data are also obtained by IOM and other organizations that receive survivors at landing points in Italy and Greece. IOM and UNHCR also regularly coordinate to validate data on missing migrants in the Mediterranean. Data on the United States/Mexico border are compiled based on data from U.S. county medical examiners, coroners, and sheriff’s offices, as well as media reports for deaths occurring on the Mexican side of the border. In Africa, data are obtained from media and NGOs, including the Regional Mixed Migration Secretariat and the International Red Cross/Red Crescent. The quality of the data source(s) for each incident is assessed through the ‘Source quality’ variable, which can be viewed in the data. Across the world, the Missing Migrants Project uses social and traditional media reports to find data, which are then verified by local IOM staff whenever possible. In all cases, new entries are checked against existing records to ensure that no deaths are double-counted. In all regions, Missing Migrants Project data represent a minimum estimate of the number of migrant deaths. To learn more about data sources, visit the thematic page on migrant deaths and disappearances in the Global Migration Data Portal.

Content

What are the variables used in the Missing Migrants Project database?

This section presents the list of variables that constitute the Missing Migrants Project database. While ideally, all incidents recorded would include entries for each of these variables, the challenges described above mean that this is not always possible. The minimum information necessary to register an incident is the date of the incident, the number of dead and/or the number of missing, and the location of death. If the information is unavailable, the cell is left blank or “unknown” is recorded, as indicated below.

1. Web ID - An automatically generated number used to identify each unique entry in the dataset.

2. Region - Region in which an incident took place. For more about regional classifications used in the dataset, click here.

3. Incident Date - Estimated date of death. In cases where the exact date of death is not known, this variable indicates the date on which the body or bodies were found. In cases where data are drawn from surviving migrants, witnesses or other interviews, this variable is entered as the date of the death as reported by the interviewee. At a minimum, the month and the year of death are recorded. In some cases, official statistics are not disaggregated by the incident, meaning that data is reported as a total number of deaths occurring during a certain time period. In such cases the entry is marked as a “cumulative total,” and the latest date of the range is recorded, with the full dates recorded in the comments.

4. Year - The year in which the incident occurred.

5. Reported month - The month in which the incident occurred.

6. Number dead - The total number of people confirmed dead in one incident, i.e. the number of bodies recovered. If migrants are missing and presumed dead, such as in cases of shipwrecks, leave blank.

7. Number missing - The total number of those who are missing and are thus assumed to be dead. This variable is generally recorded in incidents involving shipwrecks. The number of missing is calculated by subtracting the number of bodies recovered from a shipwreck and the number of survivors from the total number of migrants reported to have been on the boat. This number may be reported by surviving migrants or witnesses. If no missing persons are reported, it is left blank.

8. Total dead & missing - The sum of the ‘number dead’ and ‘number missing’ variables.

9. Number of survivors - The number of migrants that survived the incident, if known. The age, gender, and country of origin of survivors are recorded in the ‘Comments’ variable if known. If unknown, it is left blank.

10. Number of females - Indicates the number of females found dead or missing. If unknown, it is left blank. This gender identification is based on a third-party interpretation of the victim's gender from information available in official documents, autopsy reports, witness testimonies, and/or media reports.

11. Number of males - Indicates the number of males found dead or missing. If unknown, it is left blank. This gender identification is based on a third-party interpretation of the victim's gender from information available in official documents, autopsy reports, witness testimonies, and/or media reports.

12. Number of children - Indicates the number of individuals under the age of 18 found dead or missing. If unknown, it is left blank.

13. Age - The age of the decedent(s). Occasionally, an estimated age range is recorded. If unknown, it is left blank.

14. Country of origin - Country of birth of the decedent. If unknown, the entry will be marked “unknown”.

15. Region of origin - Region of origin of the decedent(s). In some incidents, region of origin may be marked as “Presumed” or “(P)” if migrants travelling through that location are known to hail from a certain region. If unknown, the entry will be marked “unknown”.

16. Cause of death - The determination of conditions resulting in the migrant's death i.e. the circumstances of the event that produced the fatal injury. If unknown, the reason why is included where possible. For example, “Unknown – skeletal remains only”, is used in cases in which only the skeleton of the decedent was found.

17. Location description - Place where the death(s) occurred or where the body or bodies were found. Nearby towns or cities or borders are included where possible. When incidents are reported in an unspecified location, this will be noted.

18. Location coordinates - Place where the death(s) occurred or where the body or bodies were found. In many regions, most notably the Mediterranean, geographic coordinates are estimated as precise locations are not often known. The location description should always be checked against the location coordinates.

19. Migration route - Name of the migrant route on which incident occurred, if known. If unknown, it is left blank.

20. UNSD geographical grouping - Geographical region in which the incident took place, as designated by the United Nations Statistics Division (UNSD) geoscheme. For more about regional classifications used in the dataset, click here.

21. Information source - Name of source of information for each incident. Multiple sources may be listed.

22. Link - Links to original reports of migrant deaths /
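
The arithmetic behind variables 7 and 8 can be sketched as follows. This is only an illustration of the definitions given in the list, not code from the Missing Migrants Project; the function names are hypothetical, and treating a blank cell as zero when summing is an assumption made here for convenience.

```python
from typing import Optional


def number_missing(total_on_board: int, bodies_recovered: int, survivors: int) -> int:
    """Variable 7: people reported on board minus bodies recovered minus survivors."""
    missing = total_on_board - bodies_recovered - survivors
    if missing < 0:
        # The reported counts are inconsistent; refuse to return a negative value.
        raise ValueError("bodies recovered plus survivors exceed the reported total")
    return missing


def total_dead_and_missing(dead: Optional[int], missing: Optional[int]) -> int:
    """Variable 8: sum of 'number dead' and 'number missing'.

    A blank cell is represented as None and counted as zero (an assumption).
    """
    return (dead or 0) + (missing or 0)


# Hypothetical shipwreck: 100 people reported on board, 12 bodies recovered, 60 survivors.
missing = number_missing(100, 12, 60)
total = total_dead_and_missing(12, missing)
```

With these made-up figures, 28 people are counted as missing and the total of dead and missing is 40.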
Data_Sheet_2_A Random Shuffle Method to Expand a Narrow Dataset and Overcome...
frontiersin.figshare.com
pdf
Updated May 31, 2023
Lorenzo Fassina; Alessandro Faragli; Francesco Paolo Lo Muzio; Sebastian Kelle; Carlo Campana; Burkert Pieske; Frank Edelmann; Alessio Alogna (2023). Data_Sheet_2_A Random Shuffle Method to Expand a Narrow Dataset and Overcome the Associated Challenges in a Clinical Study: A Heart Failure Cohort Example.PDF [Dataset]. http://doi.org/10.3389/fcvm.2020.599923.s001
Available download formats: pdf
Unique identifier
https://doi.org/10.3389/fcvm.2020.599923.s001
Dataset updated
May 31, 2023
Dataset provided by
Frontiers
Authors
Lorenzo Fassina; Alessandro Faragli; Francesco Paolo Lo Muzio; Sebastian Kelle; Carlo Campana; Burkert Pieske; Frank Edelmann; Alessio Alogna
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Heart failure (HF) affects at least 26 million people worldwide, so predicting adverse events in HF patients is a major target of clinical data science. However, achieving large sample sizes is sometimes a challenge because of difficulties in patient recruitment and long follow-up times, which also aggravate the problem of missing data. To overcome the issue of narrow dataset cardinality (in a clinical dataset, the cardinality is the number of patients in that dataset), population-enhancing algorithms are therefore crucial. The aim of this study was to design a random shuffle method that enhances the cardinality of an HF dataset in a statistically legitimate way, without the need for specific hypotheses and regression models. The cardinality enhancement was validated against an established random repeated-measures method with regard to correctness in predicting clinical conditions and endpoints. In particular, machine learning and regression models were employed to highlight the benefits of the enhanced datasets. The proposed random shuffle method was able to enhance the HF dataset cardinality (711 patients before dataset preprocessing) roughly 10-fold, and roughly 21-fold when followed by a random repeated-measures approach. We believe that the random shuffle method could be used in the cardiovascular field and in other data science problems where missing data and narrow dataset cardinality are an issue.
Water-quality data imputation with a high percentage of missing values: a...
zenodo.org
explore.openaire.eu
csv
Updated Jun 8, 2021
Rafael Rodríguez; Marcos Pastorini; Lorena Etcheverry; Christian Chreties; Mónica Fossati; Alberto Castro; Angela Gorgoglione (2021). Water-quality data imputation with a high percentage of missing values: a machine learning approach [Dataset]. http://doi.org/10.5281/zenodo.4731169
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.4731169
Dataset updated
Jun 8, 2021
Dataset provided by
Zenodo (http://zenodo.org/)
Authors
Rafael Rodríguez; Marcos Pastorini; Lorena Etcheverry; Christian Chreties; Mónica Fossati; Alberto Castro; Angela Gorgoglione
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The monitoring of surface-water quality followed by water-quality modeling and analysis is essential for generating effective strategies in water resource management. However, water-quality studies are limited by the lack of complete and reliable data sets on surface-water-quality variables. These deficiencies are particularly noticeable in developing countries.

This work focuses on surface-water-quality data from the Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. Data collected at six monitoring stations are publicly available at https://www.dinama.gub.uy/oan/datos-abiertos/calidad-agua/. The high temporal and spatial variability that characterizes water-quality variables and the high rate of missing values (between 50% and 70%) raise significant challenges.

To deal with missing values, we applied several statistical and machine-learning imputation methods. The competing algorithms implemented belonged to both univariate and multivariate imputation methods (inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR), and K-nearest neighbors Regressor (KNNR)).

IDW outperformed the others, achieving a very good performance (NSE greater than 0.8) in most cases.
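
For readers unfamiliar with the two terms, the sketch below shows a generic inverse distance weighting estimate and a skill score. It is not the authors' implementation: it assumes that NSE denotes the Nash-Sutcliffe efficiency, that distances are plain Euclidean distances between station coordinates, and that the weighting exponent is 2. The function names and the sample values are hypothetical.

```python
import math


def idw_estimate(known, target, power=2.0):
    """Estimate a value at `target` from (coordinates, value) pairs.

    Each known value is weighted by 1 / distance**power, so nearer
    stations contribute more to the estimate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in known:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a known station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one known value is required")
    return weighted_sum / weight_total


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations around their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Two hypothetical stations equally distant from the point being imputed.
stations = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]
estimate = idw_estimate(stations, (1.0, 0.0))
```

An NSE of 1 means the imputed series matches the observations exactly, while 0 means it does no better than the mean of the observations.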

In this dataset, we include the original and imputed values for the following variables:

Water temperature (Tw)

Dissolved oxygen (DO)

Electrical conductivity (EC)

pH

Turbidity (Turb)

Nitrite (NO2-)

Nitrate (NO3-)

Total Nitrogen (TN)

Each variable is identified as [STATION] VARIABLE FULL NAME (VARIABLE SHORT NAME) [UNIT METRIC].

More details about the study area, the original datasets, and the methodology adopted can be found in our paper https://www.mdpi.com/2071-1050/13/11/6318.

If you use this dataset in your work, please cite our paper:
Rodríguez, R.; Pastorini, M.; Etcheverry, L.; Chreties, C.; Fossati, M.; Castro, A.; Gorgoglione, A. Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach. Sustainability 2021, 13, 6318. https://doi.org/10.3390/su13116318
