In 2022, there were 313,017 cases filed by the NCIC where the race of the reported missing was White. In the same year, 18,928 people were missing whose race was unknown.
What is the NCIC?
The National Crime Information Center (NCIC) is a digital database that stores crime data for the United States so that criminal justice agencies can access it. As a part of the FBI, it helps criminal justice professionals find criminals, missing people, stolen property, and terrorists. The NCIC database is broken down into 21 files. Seven files cover stolen property and items, and 14 cover persons, including the National Sex Offender Registry, Missing Person, and Identity Theft files. It works alongside federal, tribal, state, and local agencies. The NCIC’s goal is to maintain a centralized information system between local branches and offices, so information is easily accessible nationwide.
Missing people in the United States
A person is considered missing when they have disappeared and their location is unknown. A person who is considered missing might have left voluntarily, but that is not always the case. The number of the NCIC unidentified person files in the United States has fluctuated since 1990, and in 2022, there were slightly more NCIC missing person files for males as compared to females. Fortunately, the number of NCIC missing person files has been mostly decreasing since 1998.
https://dataful.in/terms-and-conditions
The Ministry of Home Affairs, Government of India, has defined a missing child as “a person below eighteen years of age, whose whereabouts are not known to the parents, legal guardians and any other persons who may be legally entrusted with the custody of the child, whatever may be the circumstances/causes of disappearance”. The dataset contains the state-wise and gender-wise number of children reported missing in a particular year, the total number of persons missing including those from previous years, the number of persons recovered/traced and those unrecovered/untraced. The dataset also contains the percentage recovery of missing persons, which is calculated as the percentage share of the total number of persons traced over the total number of persons missing. NCRB started providing detailed data on missing and traced persons, including children, from 2016 onwards following the Supreme Court’s direction in a Writ Petition. It should also be noted that the data published by NCRB is restricted to those cases where FIRs have been registered by the police in the respective States/UTs.
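The percentage recovery described above can be sketched in a few lines of Python. This is a minimal illustration, not NCRB's own calculation: the function name and the sample figures are hypothetical, and the total missing is assumed to be the sum of persons reported in the year and those carried over from previous years.

```python
def recovery_percentage(traced: int, total_missing: int) -> float:
    """Percentage share of persons traced over the total number of persons missing."""
    if total_missing <= 0:
        raise ValueError("total_missing must be positive")
    return 100.0 * traced / total_missing


# Hypothetical figures for one state (not taken from the dataset).
reported_this_year = 1200
carried_over_from_previous_years = 300
traced = 900

# Assumption: total missing = reported this year + carried over from earlier years.
total_missing = reported_this_year + carried_over_from_previous_years
print(round(recovery_percentage(traced, total_missing), 2))
```

Because the denominator includes cases carried over from earlier years, the percentage can be low even in a year when most newly reported persons are traced.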
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset provides a comprehensive record of missing migrants and their tragic journeys towards international destinations, collected by the Missing Migrants Project, an initiative implemented by the International Organization for Migration (IOM) since 2014. The dataset documents deaths and disappearances, shedding light on the challenges migrants face during their journeys. Please note that due to the complexities of data collection, the figures presented are likely an undercount. The dataset serves as a tribute to the individuals who lost their lives, as well as the families and communities impacted by their absence.
- Incident Type: Type of migration incident
- Incident Year: Year when the incident occurred
- Reported Month: Month when the incident was reported
- Region of Origin: Geographical region where the migrants originated
- Region of Incident: Geographical region where the incident occurred
- Country of Origin: Country from which the migrants originated
- Number of Dead: Number of confirmed deceased migrants
- Minimum Estimated Number of Missing: Minimum estimated count of missing migrants
- Total Number of Dead and Missing: Total count of both deceased and missing migrants
- Number of Survivors: Number of migrants who survived the incident
- Number of Females: Number of female migrants involved
- Number of Males: Number of male migrants involved
- Number of Children: Number of child migrants involved
- Cause of Death: Cause of death for the migrants
- Migration Route: Route taken by migrants during their journey (if available)
- Location of Death: Approximate location where the incident occurred
- Information Source: Source of information about the incident
- Coordinates: Geographical coordinates of the incident location
- UNSD Geographical Grouping: Geographical grouping according to the United Nations Statistics Division
- Migration Patterns Analysis: Explore trends and patterns in migration incidents to understand the most affected regions and routes.
- Gender and Age Analysis: Investigate the demographics of migrants to identify gender and age-related vulnerabilities.
- Survival and Mortality Analysis: Analyze survival rates and causes of death to highlight risks and challenges migrants face.
- Temporal Analysis: Examine incidents over time to identify any temporal patterns or changes.
- Geospatial Analysis: Utilize geographical coordinates to map migration routes and incident locations.
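As a starting point for the temporal analysis suggested above, the following sketch sums the "Total Number of Dead and Missing" column per incident year and flags rows where that total does not equal dead plus minimum missing. The column names come from the list above; the sample rows are invented for illustration, and the assumption that the total is exactly the sum of the two component columns is ours, not something the dataset description guarantees.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows; only the column names are taken from the dataset description.
SAMPLE = """Incident Year,Region of Incident,Number of Dead,Minimum Estimated Number of Missing,Total Number of Dead and Missing
2014,Mediterranean,10,5,15
2014,North America,3,0,3
2015,Mediterranean,7,12,19
"""


def to_int(value: str) -> int:
    """Treat blank cells as zero (an assumption about how gaps are encoded)."""
    value = value.strip()
    return int(value) if value else 0


def totals_by_year(rows):
    """Sum the dead-and-missing total for each incident year."""
    totals = defaultdict(int)
    for row in rows:
        totals[int(row["Incident Year"])] += to_int(row["Total Number of Dead and Missing"])
    return dict(totals)


def inconsistent_rows(rows):
    """Return rows whose total differs from dead + minimum estimated missing."""
    bad = []
    for row in rows:
        dead = to_int(row["Number of Dead"])
        missing = to_int(row["Minimum Estimated Number of Missing"])
        if dead + missing != to_int(row["Total Number of Dead and Missing"]):
            bad.append(row)
    return bad


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
yearly = totals_by_year(rows)
print(yearly)
print(len(inconsistent_rows(rows)), "inconsistent rows")
```

To run this against the real file, replace the `io.StringIO(SAMPLE)` source with an opened CSV file and check that the header names match exactly.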
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Each year, people go missing inside national parks all across the United States. This dataset contains information on 264 active missing person cases that were reported inside national parks, including the coordinates of each national park to facilitate geographical analysis.
https://dataful.in/terms-and-conditions
The dataset contains the state-wise number of persons reported missing in a particular year, the total number of persons missing including those from previous years, the number of persons recovered/traced and those unrecovered/untraced. The dataset also contains the percentage recovery of missing persons which is calculated as the percentage share of total number of persons traced over the total number of persons missing. NCRB started providing detailed data on missing & traced persons including children from 2016 onwards following the Supreme Court’s direction in a Writ Petition. It should also be noted that the data published by NCRB is restricted to those cases where FIRs have been registered by the police in respective States/UTs.
Note: Figures for projected_mid_year_population are sourced from the Report of the Technical Group on Population Projections for India and States 2011-2036
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This project provides a comprehensive dataset of over 125,000 missing and unaccounted-for people in Mexico from the 1960s to 2025. The dataset is sourced from the publicly available records on the RNPDO website and represents individuals who were actively missing as of the date of collection (May 1, 2025). To protect individual identities, personal identifiers, such as names, have been removed.
Dataset Features:
The data has been cleaned and translated to facilitate analysis by a global audience.
Fields include:
- Sex
- Date of birth
- Date of incidence
- State and municipality of the incident
Data spans over six decades, offering insights into trends and regional disparities.
Additional Materials:
- Python Script: A Python script to generate customizable visualizations based on the dataset. Users can specify the state to generate tailored charts.
- Sample Chart: An example chart showcasing the evolution of missing persons per 100,000 inhabitants in Mexico between 2006 and 2025.
- Requirements File: A requirements.txt file listing the necessary Python libraries to run the script seamlessly.
This dataset and accompanying tools aim to support researchers, policymakers, and journalists in analyzing and addressing the issue of missing persons in Mexico.
NamUs is the only national repository for missing, unidentified, and unclaimed persons cases. The program provides a singular resource hub for law enforcement, medical examiners, coroners, and investigating professionals. It is the only national database for missing, unidentified, and unclaimed persons that allows limited access to the public, empowering family members to take a more proactive role in the search for their missing loved ones.
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Under Section 8 of the Missing Persons Act, 2018, police services are required to report annually on their use of urgent demands for records under the Act and the Ministry of the Solicitor General is required to make the OPP’s annual report data publicly available. The data includes:
- year in which the urgent demands were reported
- category of records
- description of records accessed under each category
- total number of times each category of records was demanded
- total number of missing persons investigations which had urgent demands for records
- total number of urgent demands for records made by OPP in a year
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Lost Nation by gender, including both male and female populations. This dataset can be utilized to understand the population distribution of Lost Nation across both sexes and to determine which sex constitutes the majority.
Key observations
There is a slight majority of female population, with 52.93% of total population being female. Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis. No further analysis is done on the data reported from the Census Bureau.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lost Nation Population by Race & Ethnicity.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Context
The dataset tabulates the population of Lost Springs by gender across 18 age groups. It lists the male and female population in each age group along with the gender ratio for Lost Springs. The dataset can be utilized to understand the population distribution of Lost Springs by gender and age. For example, using this dataset, we can identify the largest age group for both Men and Women in Lost Springs. Additionally, it can be used to see how the gender ratio changes from birth to senior most age group and male to female ratio across each age group for Lost Springs.
Key observations
Largest age group (population): Male # 50-54 years (7) | Female # 70-74 years (7). Source: U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
When available, the data consists of estimates from the U.S. Census Bureau American Community Survey (ACS) 2019-2023 5-Year Estimates.
Age groups:
Scope of gender:
Please note that the American Community Survey asks a question about the respondent's current sex, but not about gender, sexual orientation, or sex at birth. The question is intended to capture data for biological sex, not gender. Respondents are expected to answer either Male or Female. Our research and this dataset mirror the data reported as Male and Female for gender distribution analysis.
Variables / Data Columns
Good to know
Margin of Error
Data in the dataset are based on estimates and are subject to sampling variability and thus a margin of error. Neilsberg Research recommends using caution when presenting these estimates in your research.
Custom data
If you need custom data for a research project, report, or presentation, you can contact our research staff at research@neilsberg.com to assess the feasibility of a custom tabulation on a fee-for-service basis.
The Neilsberg Research Team curates, analyzes, and publishes demographic and economic data from a variety of public and proprietary sources, each of which often includes multiple surveys and programs. The large majority of Neilsberg Research aggregated datasets and insights are made available for free download at https://www.neilsberg.com/research/.
This dataset is a part of the main dataset for Lost Springs Population by Gender.
https://creativecommons.org/publicdomain/zero/1.0/
Dataset Description
This dataset consists of academic and demographic information about 300 students from a university, which can be used for predicting academic outcomes, such as probation status. The dataset was simulated to represent a variety of student attributes across multiple categories like personal data, academic history, and other related information. The primary goal of this dataset is to analyze factors contributing to academic performance and identify students at risk of probation.
Column Descriptions
- Student No.: (Numeric) A unique identifier for each student. In this dataset, each student has a different ID number, making it a 100% unique column.
- Cohort: (Numeric) The year a student enrolled in the university. No missing values and consistent across the dataset.
- College: (Nominal) The name of the college the student belongs to. Examples include "Engineering," "Science," etc. No missing values.
- College Code: (Nominal) A numerical or alphanumerical code representing the college. This is an alternative representation of the "College" column.
- Major: (Nominal) The major field of study of the student. Some missing values (23%) represent students who haven’t declared a major or are in an undeclared status.
- Major Code: (Nominal) A code representing the major subject. Similar to the "Major" column, this has 23% missing values due to undeclared majors.
- Minor: (Nominal) The minor subject, if any, chosen by the student. This column has a high percentage of missing data (91%) since most students do not have minors.
- Spec: (Nominal) Specialization within the major field of study. Like the "Minor" column, this has 93% missing data as most students do not declare a specialization.
- Degree: (Numeric) The type of degree the student is pursuing (e.g., Bachelor's). In this dataset, all students are pursuing the same degree, so there are no missing values.
- Status: (Nominal) The current academic standing of the student (e.g., "Active," "Inactive"). No missing values.
- Load Status: (Nominal) The academic load status (e.g., "Full-time," "Part-time"). This column has very few missing values (1%).
- Gender: (Nominal) The gender of the student (e.g., "Male," "Female"). No missing values.
- Country: (Nominal) The country of origin of the student. Only 2 missing values, making it nearly complete.
- Governorate: (Nominal) The administrative region (governorate) the student comes from. This column has a small percentage of missing values (1%).
- Wellayah: (Nominal) The district or locality within the governorate. Around 1% of the data is missing.
- CGPA: (Numeric) The cumulative grade point average (CGPA) of the student. This field has 145 missing values, representing students without available CGPA records.
- Estimated Graduation Year: (Numeric) The expected year in which the student will graduate. No missing values.
- From HEAC: (Nominal) Indicates whether the student was admitted through the Higher Education Admission Center (HEAC). This column has 4% missing values.
- Admission Category: (Nominal) The category of admission (e.g., scholarship, self-funded). This column has a significant amount of missing data (98%), indicating that admission category data is either unavailable or irrelevant for most students.
- Birth Date: (Nominal) The birth date of the student. The dataset includes very few missing values (0%) and has been replaced by the derived feature "Age."
- Actual Graduation Date: (Nominal) The actual date on which a student graduates. More than half of the values are missing (54%), representing students who haven’t graduated yet.
- Withdrawal: (Nominal) Indicates whether the student has withdrawn from the university. This column has 89% missing data since the majority of students haven’t withdrawn.
- Marital Status: (Nominal) The marital status of the student (e.g., "Single," "Married"). No missing values.
- SQU Hostel: (Nominal) Indicates whether the student lives in the university hostel. No missing values.
- Percentage (Secondary School Score): (Nominal) The student’s percentage score from secondary school. No missing values.
- Probation Student: (Nominal) Indicates whether the student is under academic probation. This is the target variable for classification, with no missing values.
Record Details
- Total Records: 300
- Total Attributes: 26
- Missing Values: Some columns have a significant proportion of missing data (e.g., Minor, Spec, Major Code), while others have very few or no missing values (e.g., Gender, Cohort, College). Missing values were handled using a placeholder for clarity in certain columns.
Overall attendance data include students in Districts 1-32 and 75 (Special Education). Students in District 79 (Alternative Schools & Programs), charter schools, home schooling, and home and hospital instruction are excluded. Pre-K data do not include NYC Early Education Centers or District Pre-K Centers; therefore, Pre-K data are limited to those who attend K-12 schools that offer Pre-K. Transfer schools are included in citywide, borough, and district counts but removed from school-level files. Attendance is attributed to the school the student attended at the time. If a student attends multiple schools in a school year, the student will contribute data towards multiple schools. Starting in 2020-21, the NYC DOE transitioned to NYSED's definition of chronic absenteeism. Students are considered chronically absent if they have an attendance of 90 percent or less (i.e. students who are absent 10 percent or more of the total days). In order to be included in chronic absenteeism calculations, students must be enrolled for at least 10 days (regardless of whether present or absent) and must have been present for at least 1 day. The NYSED chronic absenteeism definition is applied to all prior years in the report. School-level chronic absenteeism data reflect chronic absenteeism at a particular school. In order to eliminate double-counting students in chronic absenteeism counts, calculations at the district, borough, and citywide levels include all attendance data that contribute to the given geographic category. For example, if a student was chronically absent at one school but not at another, the student would only be counted once in the citywide calculation. For this reason, chronic absenteeism counts will not align across files. All demographic data are based on a student's most recent record in a given year. Students With Disabilities (SWD) data do not include Pre-K students since Pre-K students are screened for IEPs only at the parents' request. 
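The chronic absenteeism rule described above can be expressed as a short check. This is an illustrative sketch only: the function names and the per-student inputs (days enrolled, days present) are hypothetical, and attendance is assumed to be days present divided by days enrolled.

```python
from typing import Optional


def is_eligible(days_enrolled: int, days_present: int) -> bool:
    """Enrolled for at least 10 days and present for at least 1 day."""
    return days_enrolled >= 10 and days_present >= 1


def is_chronically_absent(days_enrolled: int, days_present: int) -> Optional[bool]:
    """Return None for students excluded from the calculation, otherwise
    True when attendance is 90 percent or less."""
    if not is_eligible(days_enrolled, days_present):
        return None
    # Integer form of days_present / days_enrolled <= 0.90, avoiding float rounding.
    return days_present * 10 <= days_enrolled * 9


print(is_chronically_absent(180, 162))  # exactly 90 percent attendance
print(is_chronically_absent(180, 163))  # just above 90 percent
print(is_chronically_absent(5, 5))      # enrolled fewer than 10 days
```

Note that the threshold is inclusive: a student at exactly 90 percent attendance counts as chronically absent under this reading of the definition.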
English language learner (ELL) data do not include Pre-K students since the New York State Education Department only begins administering assessments to identify ELLs in Kindergarten. Only grades PK-12 are shown, but calculations for "All Grades" also include students missing a grade level, so PK-12 may not add up to "All Grades". Data include students missing a gender, but these are not shown due to small cell counts. Data for Asian students include Native Hawaiian or Other Pacific Islanders. Multi-racial and Native American students, as well as students missing ethnicity/race data, are included in the "Other" ethnicity category. In order to comply with the Family Educational Rights and Privacy Act (FERPA) regulations on public reporting of education outcomes, rows with five or fewer students are suppressed, and have been replaced with an "s". Using total days of attendance as a proxy, rows with 900 or fewer total days are suppressed. In addition, other rows have been replaced with an "s" when they could reveal, through addition or subtraction, the underlying numbers that have been redacted. Chronic absenteeism values are suppressed, regardless of total days, if the number of students who contribute at least 20 days is five or fewer. Due to the COVID-19 pandemic and resulting shift to remote learning in March 2020, 2019-20 attendance data was only available for September 2019 through March 13, 2020. Interactions data from the spring of 2020 are reported on a separate tab. Interactions were reported by schools during remote learning, from April 6, 2020 through June 26, 2020 (a total of 57 instructional days, excluding special professional development days of June 4 and June 9). Schools were required to indicate any student from their roster that did not have an interaction on a given day. Schools were able to define interactions in a way that made sense for their students and families.
Definitions of an interaction included: • Student submission of an assignment or completion of an
THIS DATASET WAS LAST UPDATED AT 2:11 AM EASTERN ON JUNE 7
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
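The core threshold in this definition (four or more victims killed within 24 hours) can be sketched as a sliding-window check over times of death. This is a hypothetical illustration, not the database's actual coding procedure: the function name is invented, the input is assumed to already exclude offenders and unborn children, and "within a 24-hour period" is read as any window of 24 hours or less.

```python
from datetime import datetime, timedelta


def meets_mass_killing_threshold(death_times, min_victims=4, window=timedelta(hours=24)):
    """Return True if any 24-hour window contains at least min_victims deaths.

    death_times: datetimes for qualifying victims only (offenders and
    unborn children are assumed to have been removed by the caller).
    """
    times = sorted(death_times)
    start = 0
    for end in range(len(times)):
        # Shrink the window from the left until it spans 24 hours or less.
        while times[end] - times[start] > window:
            start += 1
        if end - start + 1 >= min_victims:
            return True
    return False


base = datetime(2019, 1, 1, 12, 0)
same_day = [base, base + timedelta(hours=1), base + timedelta(hours=2), base + timedelta(hours=23)]
spread_out = [base, base + timedelta(hours=1), base + timedelta(hours=2), base + timedelta(hours=72)]
print(meets_mass_killing_threshold(same_day))
print(meets_mass_killing_threshold(spread_out))
```

The seven-day rule for spree victims affects only the reported victim count once an incident qualifies, so it is deliberately left out of the threshold check.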
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘MISSING MIGRANTS (2014-2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/methoomirza/missing-migrants-20142021 on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Missing Migrants Project tracks deaths of migrants, including refugees and asylum-seekers, who have died or gone missing in the process of migration towards an international destination. Please note that these data represent minimum estimates, as many deaths during migration go unrecorded.
Missing Migrants Project counts migrants who have died at the external borders of states, or in the process of migration towards an international destination, regardless of their legal status. The Project records only those migrants who die during their journey to a country different from their country of residence. Missing Migrants Project data include the deaths of migrants who die in transportation accidents, shipwrecks, violent attacks, or due to medical complications during their journeys. The data also include the number of corpses found at border crossings that are categorized as the bodies of migrants, on the basis of belongings and/or the characteristics of the death. For instance, a death of an unidentified person might be included if the decedent is found without any identifying documentation in an area known to be on a migration route. Deaths during migration may also be identified based on the cause of death, especially if it is related to trafficking, smuggling, or means of travel such as on top of a train, in the back of a cargo truck, as a stowaway on a plane, in unseaworthy boats, or crossing a border fence. While the location and cause of death can provide strong evidence that an unidentified decedent should be included in Missing Migrants Project data, this should always be evaluated in conjunction with migration history and trends.
The count excludes deaths that occur in immigration detention facilities or after deportation to a migrant’s homeland, as well as deaths more loosely connected with migrants’ irregular status, such as those resulting from labour exploitation. Migrants who die or go missing after they are established in a new home are also not included in the data, so deaths in refugee camps or housing are excluded. The deaths of internally displaced persons who die within their country of origin are also excluded. There remains a significant gap in knowledge and data on such deaths. Data and knowledge of the risks and vulnerabilities faced by migrants in destination countries, including death, should not be neglected, but rather tracked as a distinct category.
The Missing Migrants Project currently gathers information from diverse sources such as official records – including from coast guards and medical examiners – and other sources such as media reports, NGOs, and surveys and interviews of migrants. In the Mediterranean region, data are relayed from relevant national authorities to IOM field missions, who then share it with the Missing Migrants Project team. Data are also obtained by IOM and other organizations that receive survivors at landing points in Italy and Greece. IOM and UNHCR also regularly coordinate to validate data on missing migrants in the Mediterranean. Data on the United States/Mexico border are compiled based on data from U.S. county medical examiners, coroners, and sheriff’s offices, as well as media reports for deaths occurring on the Mexican side of the border. In Africa, data are obtained from media and NGOs, including the Regional Mixed Migration Secretariat and the International Red Cross/Red Crescent. The quality of the data source(s) for each incident is assessed through the ‘Source quality’ variable, which can be viewed in the data. Across the world, the Missing Migrants Project uses social and traditional media reports to find data, which are then verified by local IOM staff whenever possible. In all cases, new entries are checked against existing records to ensure that no deaths are double-counted. In all regions, Missing Migrants Project data represent a minimum estimate of the number of migrant deaths. To learn more about data sources, visit the thematic page on migrant deaths and disappearances in the Global Migration Data Portal.
This section presents the list of variables that constitute the Missing Migrants Project database. While ideally, all incidents recorded would include entries for each of these variables, the challenges described above mean that this is not always possible. The minimum information necessary to register an incident is the date of the incident, the number of dead and/or the number of missing, and the location of death. If the information is unavailable, the cell is left blank or “unknown” is recorded, as indicated below.
1. Web ID - An automatically generated number used to identify each unique entry in the dataset.
2. Region - Region in which an incident took place. For more about regional classifications used in the dataset, click here.
3. Incident Date - Estimated date of death. In cases where the exact date of death is not known, this variable indicates the date on which the body or bodies were found. In cases where data are drawn from surviving migrants, witnesses or other interviews, this variable is entered as the date of the death as reported by the interviewee. At a minimum, the month and the year of death are recorded. In some cases, official statistics are not disaggregated by incident, meaning that data are reported as a total number of deaths occurring during a certain time period. In such cases the entry is marked as a “cumulative total,” and the latest date of the range is recorded, with the full dates recorded in the comments.
4. Year - The year in which the incident occurred.
5. Reported month - The month in which the incident occurred.
6. Number dead - The total number of people confirmed dead in one incident, i.e. the number of bodies recovered. If migrants are missing and presumed dead, such as in cases of shipwrecks, leave blank.
7. Number missing - The total number of those who are missing and are thus assumed to be dead. This variable is generally recorded in incidents involving shipwrecks. The number of missing is calculated by subtracting the number of bodies recovered from a shipwreck and the number of survivors from the total number of migrants reported to have been on the boat. This number may be reported by surviving migrants or witnesses. If no missing persons are reported, it is left blank.
8. Total dead & missing - The sum of the ‘number dead’ and ‘number missing’ variables.
9. Number of survivors - The number of migrants that survived the incident, if known. The age, gender, and country of origin of survivors are recorded in the ‘Comments’ variable if known. If unknown, it is left blank.
10. Number of females - Indicates the number of females found dead or missing. If unknown, it is left blank. This gender identification is based on a third-party interpretation of the victim's gender from information available in official documents, autopsy reports, witness testimonies, and/or media reports.
11. Number of males - Indicates the number of males found dead or missing. If unknown, it is left blank. This gender identification is based on a third-party interpretation of the victim's gender from information available in official documents, autopsy reports, witness testimonies, and/or media reports.
12. Number of children - Indicates the number of individuals under the age of 18 found dead or missing. If unknown, it is left blank.
13. Age - The age of the decedent(s). Occasionally, an estimated age range is recorded. If unknown, it is left blank.
14. Country of origin - Country of birth of the decedent. If unknown, the entry will be marked “unknown”.
15. Region of origin - Region of origin of the decedent(s). In some incidents, region of origin may be marked as “Presumed” or “(P)” if migrants travelling through that location are known to hail from a certain region. If unknown, the entry will be marked “unknown”.
16. Cause of death - The determination of conditions resulting in the migrant's death i.e. the circumstances of the event that produced the fatal injury. If unknown, the reason why is included where possible. For example, “Unknown – skeletal remains only”, is used in cases in which only the skeleton of the decedent was found.
17. Location description - Place where the death(s) occurred or where the body or bodies were found. Nearby towns or cities or borders are included where possible. When incidents are reported in an unspecified location, this will be noted.
18. Location coordinates - Place where the death(s) occurred or where the body or bodies were found. In many regions, most notably the Mediterranean, geographic coordinates are estimated as precise locations are not often known. The location description should always be checked against the location coordinates.
19. Migration route - Name of the migrant route on which incident occurred, if known. If unknown, it is left blank.
20. UNSD geographical grouping - Geographical region in which the incident took place, as designated by the United Nations Statistics Division (UNSD) geoscheme. For more about regional classifications used in the dataset, click here.
21. Information source - Name of source of information for each incident. Multiple sources may be listed.
22. Link - Links to original reports of migrant deaths /
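The arithmetic behind variables 7 and 8 above can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the Project's own tooling: the function names are hypothetical, and treating a blank cell as zero when summing is an assumption made here for convenience.

```python
from typing import Optional


def number_missing(on_board: int, bodies_recovered: int, survivors: int) -> int:
    """Missing = people reported on the boat - bodies recovered - survivors."""
    missing = on_board - bodies_recovered - survivors
    if missing < 0:
        # The reported figures contradict each other; do not guess.
        raise ValueError("reported counts are inconsistent")
    return missing


def total_dead_and_missing(dead: Optional[int], missing: Optional[int]) -> int:
    """Sum of 'number dead' and 'number missing'.

    Assumption: a blank cell (None) contributes 0 to the sum.
    """
    return (dead or 0) + (missing or 0)


# Made-up shipwreck: 120 people reported on board, 15 bodies recovered, 80 survivors.
missing = number_missing(on_board=120, bodies_recovered=15, survivors=80)
total = total_dead_and_missing(dead=15, missing=missing)
```

With these made-up figures, 25 people are counted as missing and the incident total is 40.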
http://opendatacommons.org/licenses/dbcl/1.0/
I need help analyzing this data set with R code. If someone can help me, I'd appreciate it a lot and I'd send some money for their kindness. I really need to know how to do a regression and clustering with this data. Sorry about the format, it's a text file. Thanks in advance :)
Context: Measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available.
Data Set Characteristics:
Multivariate, Time-Series
Associated Tasks: Regression, Clustering
Data Set Information:
This archive contains 2075259 measurements gathered between December 2006 and November 2010 (47 months).
Notes:
1. (global_active_power*1000/60 - sub_metering_1 - sub_metering_2 - sub_metering_3) represents the active energy consumed every minute (in watt-hour) in the household by electrical equipment not measured in sub-meterings 1, 2 and 3.
2. The dataset contains some missing values in the measurements (nearly 1.25% of the rows). All calendar timestamps are present in the dataset, but for some timestamps the measurement values are missing: a missing value is represented by the absence of a value between two consecutive semicolon attribute separators. For instance, the dataset shows missing values on April 28, 2007.
Attribute Information:
1. date: Date in format dd/mm/yyyy
2. time: time in format hh:mm:ss
3. global_active_power: household global minute-averaged active power (in kilowatt)
4. global_reactive_power: household global minute-averaged reactive power (in kilowatt)
5. voltage: minute-averaged voltage (in volt)
6. global_intensity: household global minute-averaged current intensity (in ampere)
7. sub_metering_1: energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave (hot plates are not electric but gas powered).
8. sub_metering_2: energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry room, containing a washing-machine, a tumble-drier, a refrigerator and a light.
9. sub_metering_3: energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.
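As a rough illustration of the two notes above (the unmetered-energy formula and the empty-field convention for missing values), here is a minimal Python sketch. It assumes the columns appear in the file in the same order as the attribute list, and it also treats a "?" marker as missing as a precaution; the sample rows and the helper names `parse_row` and `unmetered_wh` are made up for the example.

```python
from typing import Dict, Optional, Union

# Assumed column order, taken from the attribute list above.
COLUMNS = [
    "date", "time", "global_active_power", "global_reactive_power",
    "voltage", "global_intensity",
    "sub_metering_1", "sub_metering_2", "sub_metering_3",
]

Row = Dict[str, Union[str, Optional[float]]]


def parse_row(line: str) -> Row:
    """Split one semicolon-separated record; empty fields become None."""
    fields = line.strip().split(";")
    row: Row = {}
    for name, raw in zip(COLUMNS, fields):
        if name in ("date", "time"):
            row[name] = raw
        else:
            # Assumption: "" (nothing between two semicolons) or "?" means missing.
            row[name] = None if raw in ("", "?") else float(raw)
    return row


def unmetered_wh(row: Row) -> Optional[float]:
    """Active energy per minute (Wh) not captured by sub-meterings 1, 2 and 3."""
    needed = ["global_active_power", "sub_metering_1",
              "sub_metering_2", "sub_metering_3"]
    if any(row[key] is None for key in needed):
        return None  # cannot compute on a row with missing measurements
    return (row["global_active_power"] * 1000 / 60
            - row["sub_metering_1"] - row["sub_metering_2"] - row["sub_metering_3"])


# Illustrative rows only; the second has every measurement missing.
complete = parse_row("16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000")
blank = parse_row("28/04/2007;00:21:00" + ";" * 7)
```

Returning `None` for incomplete rows, instead of imputing a value, keeps the missing minutes visible to whatever regression or clustering step comes next.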
https://www.usa.gov/government-works
After May 3, 2024, this dataset and webpage will no longer be updated because hospitals are no longer required to report data on COVID-19 hospital admissions, and hospital capacity and occupancy data, to HHS through CDC’s National Healthcare Safety Network. Data voluntarily reported to NHSN after May 1, 2024, will be available starting May 10, 2024, at COVID Data Tracker Hospitalizations.
The following dataset provides state-aggregated data for hospital utilization. These are derived from reports with facility-level granularity across two main sources: (1) HHS TeleTracking, and (2) reporting provided directly to HHS Protect by state/territorial health departments on behalf of their healthcare facilities.
The file will be updated regularly and provides the latest values reported by each facility within the last four days for all time. This allows for a more comprehensive picture of the hospital utilization within a state by ensuring a hospital is represented, even if they miss a single day of reporting.
No statistical analysis is applied to account for non-response and/or to account for missing data.
The below table displays one value for each field (i.e., column). Sometimes, reports for a given facility will be provided to more than one reporting source: HHS TeleTracking, NHSN, and HHS Protect. When this occurs, to ensure that there are not duplicate reports, prioritization is applied to the numbers for each facility.
On June 26, 2023 the field "reporting_cutoff_start" was replaced by the field "date".
On April 27, 2022 the following pediatric fields were added:
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6. The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22. The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada. To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed. This dataset covers COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the COVID-19 update.
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates. The population data used to calculate rates are based on the CT DPH population statistics for 2019, which are available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf. Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records.
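The difference between crude and age-adjusted rates described above can be illustrated with a minimal Python sketch of direct standardization. The two age groups and all counts below are invented for the example (the published rates use 19 age groups and the 2000 US standard population), and the function names are hypothetical rather than DPH's own code.

```python
from typing import Sequence


def crude_rate(events: int, population: int) -> float:
    """Total events per 100,000 people, ignoring age structure."""
    return events / population * 100_000


def age_adjusted_rate(
    events_by_age: Sequence[int],
    population_by_age: Sequence[int],
    standard_population_by_age: Sequence[float],
) -> float:
    """Direct standardization: weight each age-specific rate by the
    standard population's share of that age group, then sum."""
    total_standard = sum(standard_population_by_age)
    rate = 0.0
    for events, pop, std in zip(
        events_by_age, population_by_age, standard_population_by_age
    ):
        rate += (events / pop) * (std / total_standard)
    return rate * 100_000


# Invented two-group example: [younger, older].
events = [10, 90]             # most deaths occur in the older group
population = [10_000, 10_000]  # the group being studied is relatively old
standard = [3.0, 1.0]          # the standard population is relatively young

crude = crude_rate(sum(events), sum(population))
adjusted = age_adjusted_rate(events, population, standard)
```

In this invented case the crude rate is 500 per 100,000 while the age-adjusted rate is about 300, mirroring the point above that an older population tends to have an age-adjusted rate below its unadjusted rate.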
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
SUMMARY
This analysis, designed and executed by Ribble Rivers Trust, identifies areas across England with the greatest levels of physical illnesses that are linked with obesity and inactivity. Please read the below information to gain a full understanding of what the data shows and how it should be interpreted.
ANALYSIS METHODOLOGY
The analysis was carried out using Quality and Outcomes Framework (QOF) data, derived from NHS Digital, relating to:
- Asthma (in persons of all ages)
- Cancer (in persons of all ages)
- Chronic kidney disease (in adults aged 18+)
- Coronary heart disease (in persons of all ages)
- Diabetes mellitus (in persons aged 17+)
- Hypertension (in persons of all ages)
- Stroke and transient ischaemic attack (in persons of all ages)
This information was recorded at the GP practice level. However, GP catchment areas are not mutually exclusive: they overlap, with some areas covered by 30+ GP practices. Therefore, to increase the clarity and usability of the data, the GP-level statistics were converted into statistics based on Middle Layer Super Output Area (MSOA) census boundaries.
For each of the above illnesses, the percentage of each MSOA’s population with that illness was estimated. This was achieved by calculating a weighted average based on:
- The percentage of the MSOA area that was covered by each GP practice’s catchment area
- Of the GPs that covered part of that MSOA: the percentage of patients registered with each GP that have that illness
The estimated percentage of each MSOA’s population with each illness was then combined with Office for National Statistics Mid-Year Population Estimates (2019) data for MSOAs, to estimate the number of people in each MSOA with each illness, within the relevant age range.
For each illness, each MSOA was assigned a relative score between 1 and 0 (1 = worst, 0 = best) based on:
A) the PERCENTAGE of the population within that MSOA who are estimated to have that illness
B) the NUMBER of people within that MSOA who are estimated to have that illness
An average of scores A & B was taken, and converted to a relative score between 1 and 0 (1 = worst, 0 = best). The closer to 1 the score, the greater both the number and percentage of the population in the MSOA predicted to have that illness, compared to other MSOAs. In other words, those are areas where a large number of people are predicted to suffer from an illness, and where those people make up a large percentage of the population, indicating there is a real issue with that illness within the population and the investment of resources to address that issue could have the greatest benefits.
The scores for each of the 7 illnesses were added together then converted to a relative score between 1 – 0 (1 = worst, 0 = best), to give an overall score for each MSOA: a score close to 1 would indicate that an area has high predicted levels of all obesity/inactivity-related illnesses, and these are areas where the local population could benefit the most from interventions to address those illnesses. A score close to 0 would indicate very low predicted levels of obesity/inactivity-related illnesses and therefore interventions might not be required.
LIMITATIONS
1. GPs do not have catchments that are mutually exclusive from each other: they overlap, with some geographic areas being covered by 30+ practices. This dataset should be viewed in combination with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset to identify where there are areas that are covered by multiple GP practices but at least one of those GP practices did not provide data. Results of the analysis in these areas should be interpreted with caution, particularly if the levels of obesity/inactivity-related illnesses appear to be significantly lower than the immediate surrounding areas.
2. GP data for the financial year 1st April 2018 – 31st March 2019 was used in preference to data for the financial year 1st April 2019 – 31st March 2020, as the onset of the COVID19 pandemic during the latter year could have affected the reporting of medical statistics by GPs. However, for 53 GPs (out of 7670) that did not submit data in 2018/19, data from 2019/20 was used instead. Note also that some GPs (997 out of 7670) did not submit data in either year. This dataset should be viewed in conjunction with the ‘Health and wellbeing statistics (GP-level, England): Missing data and potential outliers’ dataset, to determine areas where data from 2019/20 was used, where one or more GPs did not submit data in either year, or where there were large discrepancies between the 2018/19 and 2019/20 data (differences in statistics that were > mean +/- 1 St.Dev.), which suggests erroneous data in one of those years (it was not feasible for this study to investigate this further), and thus where data should be interpreted with caution. Note also that there are some rural areas (with little or no population) that do not officially fall into any GP catchment area (although this will not affect the results of this analysis if there are no people living in those areas).
3. Although all of the obesity/inactivity-related illnesses listed can be caused or exacerbated by inactivity and obesity, it was not possible to distinguish from the data the cause of the illnesses in patients: obesity and inactivity are highly unlikely to be the cause of all cases of each illness. By combining the data with data relating to levels of obesity and inactivity in adults and children (see the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset), we can identify where obesity/inactivity could be a contributing factor, and where interventions to reduce obesity and increase activity could be most beneficial for the health of the local population.
4. It was not feasible to incorporate ultra-fine-scale geographic distribution of populations that are registered with each GP practice or who live within each MSOA. Populations might be concentrated in certain areas of a GP practice’s catchment area or MSOA and relatively sparse in other areas. Therefore, the dataset should be used to identify general areas where there are high levels of obesity/inactivity-related illnesses, rather than interpreting the boundaries between areas as ‘hard’ boundaries that mark definite divisions between areas with differing levels of these illnesses.
TO BE VIEWED IN COMBINATION WITH:
This dataset should be viewed alongside the following datasets, which highlight areas of missing data and potential outliers in the data:
- Health and wellbeing statistics (GP-level, England): Missing data and potential outliers
DOWNLOADING THIS DATA
To access this data on your desktop GIS, download the ‘Levels of obesity, inactivity and associated illnesses: Summary (England)’ dataset.
DATA SOURCES
This dataset was produced using:
- Quality and Outcomes Framework data: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
- GP Catchment Outlines. Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital. Data was cleaned by Ribble Rivers Trust before use.
COPYRIGHT NOTICE
The reproduction of this data must be accompanied by the following statement:
© Ribble Rivers Trust 2021. Analysis carried out using data that is: Copyright © 2020, Health and Social Care Information Centre. The Health and Social Care Information Centre is a non-departmental body created by statute, also known as NHS Digital.
CaBA HEALTH & WELLBEING EVIDENCE BASE
This dataset forms part of the wider CaBA Health and Wellbeing Evidence Base.
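The weighted average and relative scoring described in the methodology can be sketched roughly as follows in Python. This is not Ribble Rivers Trust's code: the description does not say how values were "converted to a relative score", so min-max rescaling is assumed here, and the function names and sample figures are hypothetical.

```python
from typing import List, Sequence


def weighted_prevalence(coverage: Sequence[float], prevalence: Sequence[float]) -> float:
    """Estimated % of an MSOA's population with an illness.

    coverage:   % of the MSOA area covered by each overlapping GP catchment
    prevalence: % of each GP practice's registered patients with the illness
    """
    total = sum(coverage)
    if total == 0:
        raise ValueError("MSOA is not covered by any GP catchment")
    return sum(c * p for c, p in zip(coverage, prevalence)) / total


def relative_score(values: Sequence[float]) -> List[float]:
    """Rescale to 0..1 (1 = worst). Assumption: min-max scaling."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def illness_score(percentages: Sequence[float], counts: Sequence[float]) -> List[float]:
    """Average score A (percentage) and score B (number), then rescale."""
    a = relative_score(percentages)
    b = relative_score(counts)
    return relative_score([(x + y) / 2 for x, y in zip(a, b)])


# One MSOA covered 60% by a practice with 10% prevalence, 40% by one with 5%.
msoa_pct = weighted_prevalence([60.0, 40.0], [10.0, 5.0])

# Three made-up MSOAs: estimated percentage and estimated number of people.
scores = illness_score([8.0, 4.0, 6.0], [800, 200, 500])
```

The overall score would repeat the same rescaling step on the sum of the seven per-illness scores.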
This dataset contains information on antibody testing for COVID-19: the number of people who received a test, the number of people with positive results, the percentage of people tested who tested positive, and the rate of testing per 100,000 people, stratified by week of testing. These data can also be accessed here: https://github.com/nychealth/coronavirus-data/blob/master/trends/antibody-by-week.csv Exposure to COVID-19 can be detected by measuring antibodies to the disease in a person’s blood, which can indicate that a person may have had an immune response to the virus. Antibodies are proteins produced by the body’s immune system that can be found in the blood. People can test positive for antibodies after they have been exposed, sometimes when they no longer test positive for the virus itself. It is important to note that the science around COVID-19 antibody tests is evolving rapidly and there is still much uncertainty about what individual antibody test results mean for a single person and what population-level antibody test results mean for understanding the epidemiology of COVID-19. These data only provide information on people tested. People receiving an antibody test do not reflect all people in New York City; therefore, these data may not reflect antibody prevalence among all New Yorkers. Increasing instances of screening programs further impact the generalizability of these data, as screening programs influence who and how many people are tested over time. Examples of screening programs in NYC include employers screening their workers (e.g., hospitals) and long-term care facilities screening their residents. In addition, there may be potential biases toward people receiving an antibody test who have a positive result because people who were previously ill are preferentially seeking testing, in addition to the testing of persons with higher exposure (e.g., health care workers, first responders).
Rates were calculated using interpolated intercensal population estimates updated in 2019. These rates differ from previously reported rates based on the 2000 Census or previous versions of population estimates. The Health Department produced these population estimates based on estimates from the U.S. Census Bureau and NYC Department of City Planning. Antibody tests are categorized based on the date of specimen collection and are aggregated by full weeks starting each Sunday and ending on Saturday. For example, a person whose blood was collected for antibody testing on Wednesday, May 6 would be categorized as tested during the week ending May 9. A person tested twice in one week would only be counted once in that week. This dataset includes testing data beginning April 5, 2020. Data are updated daily, and the dataset preserves historical records and source data changes, so each extract date reflects the current copy of the data as of that date. For example, an extract date of 11/04/2020 and extract date of 11/03/2020 will both contain all records as they were as of that extract date. Without filtering or grouping by extract date, an analysis will almost certainly be miscalculating or counting the same values multiple times. To analyze the most current data, only use the latest extract date. Antibody tests that are missing dates are not included in the dataset; as dates are identified, these events are added. Lags between occurrence and report of cases and tests can be assessed by comparing counts and rates across multiple data extract dates. For further details, visit: • https://www1.nyc.gov/site/doh/covid/covid-19-data.page • https://github.com/nychealth/coronavirus-data
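The week assignment and the extract-date caveat above lend themselves to a short Python sketch. It only illustrates the stated rules: the field name `extract_date` and the sample records are hypothetical, not the dataset's actual schema.

```python
from datetime import date, timedelta
from typing import Dict, List


def week_ending(specimen_date: date) -> date:
    """Saturday that closes the Sunday-to-Saturday week containing the date."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    days_to_saturday = (5 - specimen_date.weekday()) % 7
    return specimen_date + timedelta(days=days_to_saturday)


def latest_extract(rows: List[Dict]) -> List[Dict]:
    """Keep only the rows from the most recent extract date.

    Assumption: each row carries an 'extract_date' value (hypothetical name).
    """
    latest = max(row["extract_date"] for row in rows)
    return [row for row in rows if row["extract_date"] == latest]


# The example from the text: blood collected on Wednesday, May 6, 2020.
example_week = week_ending(date(2020, 5, 6))

# Two extracts holding the same week; summing both would double count.
rows = [
    {"extract_date": date(2020, 11, 3), "week_ending": date(2020, 5, 9), "tested": 100},
    {"extract_date": date(2020, 11, 4), "week_ending": date(2020, 5, 9), "tested": 102},
]
current = latest_extract(rows)
```

Filtering to a single extract date before aggregating is what prevents the repeated counting that the description warns about.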