22 datasets found

Number of missing persons files in the U.S. 2022, by race
statista.com
Updated Jul 5, 2024
Cite
Statista (2024). Number of missing persons files in the U.S. 2022, by race [Dataset]. https://www.statista.com/statistics/240396/number-of-missing-persons-files-in-the-us-by-race/
Dataset updated
Jul 5, 2024
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2022
Area covered
United States
Description
In 2022, there were 313,017 cases filed by the NCIC where the race of the reported missing was White. In the same year, 18,928 people were missing whose race was unknown.

What is the NCIC?

The National Crime Information Center (NCIC) is a digital database that stores crime data for the United States so that criminal justice agencies can access it. As a part of the FBI, it helps criminal justice professionals find criminals, missing people, stolen property, and terrorists. The NCIC database is broken down into 21 files. Seven files cover stolen property and items, and 14 cover persons, including the National Sex Offender Registry, Missing Person, and Identity Theft files. It works alongside federal, tribal, state, and local agencies. The NCIC’s goal is to maintain a centralized information system between local branches and offices, so information is easily accessible nationwide.

Missing people in the United States

A person is considered missing when they have disappeared and their location is unknown. A person who is considered missing might have left voluntarily, but that is not always the case. The number of the NCIC unidentified person files in the United States has fluctuated since 1990, and in 2022, there were slightly more NCIC missing person files for males as compared to females. Fortunately, the number of NCIC missing person files has been mostly decreasing since 1998.
NCRB: State and Gender-wise number of children reported missing and traced
dataful.in
Updated Jul 3, 2025
Cite
Dataful (Factly) (2025). NCRB: State and Gender-wise number of children reported missing and traced [Dataset]. https://dataful.in/datasets/18468
Available download formats: csv, application/x-parquet, xlsx
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Dataful (Factly)
License
https://dataful.in/terms-and-conditions
Area covered
States of India
Variables measured
Number of children missing, share of children traced
Description
The Ministry of Home Affairs, Government of India, has defined a missing child as “a person below eighteen years of age, whose whereabouts are not known to the parents, legal guardians and any other persons who may be legally entrusted with the custody of the child, whatever may be the circumstances/causes of disappearance”. The dataset contains the state-wise and gender-wise number of children reported missing in a particular year, the total number of persons missing including those from previous years, and the number of persons recovered/traced and those unrecovered/untraced. The dataset also contains the percentage recovery of missing persons, which is calculated as the percentage share of the total number of persons traced over the total number of persons missing. NCRB started providing detailed data on missing and traced persons, including children, from 2016 onwards following the Supreme Court’s direction in a Writ Petition. It should also be noted that the data published by NCRB is restricted to those cases where FIRs have been registered by the police in the respective States/UTs.
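The recovery percentage described above is a simple ratio. A minimal Python sketch follows, assuming hypothetical counts; the function and argument names are illustrative and are not column names taken from the dataset.

```python
def recovery_percentage(traced: int, total_missing: int) -> float:
    """Percentage share of persons traced over total persons missing.

    total_missing is assumed to include cases carried over from previous
    years, as the description states. All names here are illustrative.
    """
    if total_missing <= 0:
        raise ValueError("total_missing must be positive")
    return 100.0 * traced / total_missing


# Hypothetical figures, not taken from the NCRB data.
print(recovery_percentage(traced=750, total_missing=1000))  # 75.0
```

Because the denominator includes cases from earlier years, the result may differ from a ratio computed only on cases reported in the same year.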
Data from: National Incidence Studies of Missing, Abducted, Runaway, and...
catalog.data.gov
icpsr.umich.edu
Updated Mar 12, 2025
+ more versions
Cite
Office of Juvenile Justice and Delinquency Prevention (2025). National Incidence Studies of Missing, Abducted, Runaway, and Thrownaway Children (NISMART), 1999 [Dataset]. https://catalog.data.gov/dataset/national-incidence-studies-of-missing-abducted-runaway-and-thrownaway-children-nismart-199-2621e
Dataset updated
Mar 12, 2025
Dataset provided by
Office of Juvenile Justice and Delinquency Prevention (http://ojjdp.gov/)
Description
The National Incidence Studies of Missing, Abducted, Runaway, and Thrownaway Children (NISMART) were undertaken in response to the mandate of the 1984 Missing Children's Assistance Act (Pub.L. 98-473) that requires the Office of Juvenile Justice and Delinquency Prevention (OJJDP) to conduct periodic national incidence studies to determine the actual number of children reported missing and the number of missing children who are recovered for a given year. The first such study, NISMART-1 (NATIONAL INCIDENCE STUDIES OF MISSING, ABDUCTED, RUNAWAY, AND THROWNAWAY CHILDREN (NISMART), 1988 [ICPSR 9682]), was conducted from 1988 to 1989 and addressed this mandate by defining major types of missing child episodes and estimating the number of children who experienced missing child episodes of each type in 1988. At that time, the lack of a standardized definition of a "missing child" made it impossible to provide a single estimate of missing children. As a result, one of the primary goals of NISMART-2 was to develop a standardized definition and provide unified estimates of the number of missing children in the United States. Both NISMART-1 and NISMART-2 comprise several component datasets designed to provide a comprehensive picture of the population of children who experienced qualifying episodes, with each component focusing on a different aspect of the missing child population. The Household Survey -- Youth Data and the Household Survey -- Adult Data (Parts 1-2) are similar but separate surveys, one administered to the adult primary caretaker of the children in the sampled household and the other to a randomly selected household youth aged 10 through 18 at the time of interview. The Juvenile Facilities Data on Runaways (Part 3) sought to estimate the number of runaways from juvenile residential facilities in order to supplement the household survey estimate of the number of runaways from households. 
The Law Enforcement Study Data, by case, perpetrator, and victim (Parts 4-6), were intended to estimate the number of children who were victims of stereotypical kidnappings and to obtain a sample of these cases for in-depth study.
NCRB: State and Gender-wise Number of Persons Reported Missing and Traced
dataful.in
Updated Jul 3, 2025
Cite
Dataful (Factly) (2025). NCRB: State and Gender-wise Number of Persons Reported Missing and Traced [Dataset]. https://dataful.in/datasets/18466
Available download formats: csv, application/x-parquet, xlsx
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Dataful (Factly)
License
https://dataful.in/terms-and-conditions
Area covered
India
Variables measured
Number of persons missing, share of persons traced
Description
The dataset contains the state-wise number of persons reported missing in a particular year, the total number of persons missing including those from previous years, the number of persons recovered/traced and those unrecovered/untraced. The dataset also contains the percentage recovery of missing persons which is calculated as the percentage share of total number of persons traced over the total number of persons missing. NCRB started providing detailed data on missing & traced persons including children from 2016 onwards following the Supreme Court’s direction in a Writ Petition. It should also be noted that the data published by NCRB is restricted to those cases where FIRs have been registered by the police in respective States/UTs.

Note: Figures for projected_mid_year_population are sourced from the Report of the Technical Group on Population Projections for India and States 2011-2036
National Missing and Unidentified Persons System (NamUs)
catalog.data.gov
datasets.ai
Updated Mar 12, 2025
Cite
Office of Justice Programs (2025). National Missing and Unidentified Persons System (NamUs) [Dataset]. https://catalog.data.gov/dataset/national-missing-and-unidentified-persons-system-namus
Dataset updated
Mar 12, 2025
Dataset provided by
Office of Justice Programs (https://ojp.gov/)
Description
NamUs is the only national repository for missing, unidentified, and unclaimed persons cases. The program provides a singular resource hub for law enforcement, medical examiners, coroners, and investigating professionals. It is the only national database for missing, unidentified, and unclaimed persons that allows limited access to the public, empowering family members to take a more proactive role in the search for their missing loved ones.
Missing and Unaccounted-for People in Mexico (1960s–2025)
figshare.com
txt
Updated May 2, 2025
Cite
Montserrat Mora (2025). Missing and Unaccounted-for People in Mexico (1960s–2025) [Dataset]. http://doi.org/10.6084/m9.figshare.28283000.v3
Available download formats: txt
Unique identifier
https://doi.org/10.6084/m9.figshare.28283000.v3
Dataset updated
May 2, 2025
Dataset provided by
figshare
Authors
Montserrat Mora
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Area covered
Mexico
Description
This project provides a comprehensive dataset of over 125,000 missing and unaccounted-for people in Mexico from the 1960s to 2025. The dataset is sourced from the publicly available records on the RNPDO website and represents individuals who were actively missing as of the date of collection (May 1, 2025). To protect individual identities, personal identifiers, such as names, have been removed.

Dataset Features:
The data has been cleaned and translated to facilitate analysis by a global audience.
Fields include: sex, date of birth, date of incidence, and state and municipality of the incident.
Data spans over six decades, offering insights into trends and regional disparities.

Additional Materials:
Python Script: A Python script to generate customizable visualizations based on the dataset. Users can specify the state to generate tailored charts.
Sample Chart: An example chart showcasing the evolution of missing persons per 100,000 inhabitants in Mexico between 2006 and 2025.
Requirements File: A requirements.txt file listing the necessary Python libraries to run the script seamlessly.

This dataset and accompanying tools aim to support researchers, policymakers, and journalists in analyzing and addressing the issue of missing persons in Mexico.
‘Missing Migrants Dataset’ analyzed by Analyst-2
analyst-2.ai
Updated Apr 23, 2019
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2019). ‘Missing Migrants Dataset’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-missing-migrants-dataset-c736/2e62d69f/?v=grid
Dataset updated
Apr 23, 2019
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘Missing Migrants Dataset’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/jmataya/missingmigrants on 14 February 2022.

--- Dataset description provided by original source is as follows ---

About the Missing Migrants Data

This data is sourced from the International Organization for Migration. The data is part of a specific project called the Missing Migrants Project, which tracks deaths of migrants, including refugees, who have gone missing along mixed migration routes worldwide. The research behind this project began with the October 2013 tragedies, when at least 368 individuals died in two shipwrecks near the Italian island of Lampedusa. Since then, Missing Migrants Project has developed into an important hub and advocacy source of information that media, researchers, and the general public access for the latest information.

Where is the data from?

Missing Migrants Project data are compiled from a variety of sources. Sources vary depending on the region and broadly include data from national authorities, such as Coast Guards and Medical Examiners; media reports; NGOs; and interviews with survivors of shipwrecks. In the Mediterranean region, data are relayed from relevant national authorities to IOM field missions, who then share them with the Missing Migrants Project team. Data are also obtained by IOM and other organizations that receive survivors at landing points in Italy and Greece. In other cases, media reports are used. IOM and UNHCR also regularly coordinate on such data to ensure consistency. Data on the U.S./Mexico border are compiled based on data from U.S. county medical examiners and sheriff’s offices, as well as media reports for deaths occurring on the Mexico side of the border. Estimates within Mexico and Central America are based primarily on media and year-end government reports. Data on the Bay of Bengal are drawn from reports by UNHCR and NGOs. In the Horn of Africa, data are obtained from media and NGOs. Data for other regions are drawn from a combination of sources, including media and grassroots organizations. In all regions, Missing Migrants Project data represent minimum estimates and are potentially lower than the actual figures.

Updated data and visuals can be found here: https://missingmigrants.iom.int/

Who is included in Missing Migrants Project data?

IOM defines a migrant as any person who is moving or has moved across an international border or within a State away from his/her habitual place of residence, regardless of

(1) the person’s legal status; (2) whether the movement is voluntary or involuntary; (3) what the causes for the movement are; or (4) what the length of the stay is.[1]

Missing Migrants Project counts migrants who have died or gone missing at the external borders of states, or in the process of migration towards an international destination. The count excludes deaths that occur in immigration detention facilities, during deportation, or after forced return to a migrant’s homeland, as well as deaths more loosely connected with migrants’ irregular status, such as those resulting from labour exploitation. Migrants who die or go missing after they are established in a new home are also not included in the data, so deaths in refugee camps or housing are excluded. This approach is chosen because deaths that occur at physical borders and while en route represent a more clearly definable category, and inform what migration routes are most dangerous. Data and knowledge of the risks and vulnerabilities faced by migrants in destination countries, including death, should not be neglected, but rather tracked as a distinct category.

How complete is the data on dead and missing migrants?

Data on fatalities during the migration process are challenging to collect for a number of reasons, most stemming from the irregular nature of migratory journeys on which deaths tend to occur. For one, deaths often occur in remote areas on routes chosen with the explicit aim of evading detection. Countless bodies are never found, and rarely do these deaths come to the attention of authorities or the media. Furthermore, when deaths occur at sea, frequently not all bodies are recovered - sometimes with hundreds missing from one shipwreck - and the precise number of missing is often unknown. In 2015, over 50 per cent of deaths recorded by the Missing Migrants Project refer to migrants who are presumed dead and whose bodies have not been found, mainly at sea.

Data are also challenging to collect as reporting on deaths is poor, and the data that do exist are highly scattered. Few official sources are collecting data systematically. Many counts of death rely on media as a source. Coverage can be spotty and incomplete. In addition, the involvement of criminal actors in incidents means there may be fear among survivors to report deaths, and some deaths may be actively covered up. The irregular immigration status of many migrants, and at times their families as well, also impedes reporting of missing persons or deaths.

The varying quality and comprehensiveness of data by region in attempting to estimate deaths globally may exaggerate the share of deaths that occur in some regions, while under-representing the share occurring in others.

What can be understood through this data?

The available data can give an indication of changing conditions and trends related to migration routes and the people travelling on them, which can be relevant for policy making and protection plans. Data can be useful to determine the relative risks of irregular migration routes. For example, Missing Migrants Project data show that despite the increase in migrant flows through the eastern Mediterranean in 2015, the central Mediterranean remained the more deadly route. In 2015, nearly two people died out of every 100 travellers (1.85%) crossing the Central route, as opposed to one out of every 1,000 that crossed from Turkey to Greece (0.095%). From the data, we can also get a sense of whether groups like women and children face additional vulnerabilities on migration routes.
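The route comparison above rests on a deaths-per-crossing rate. The following Python sketch shows how such a rate could be computed and how the two percentages quoted in the text compare. The function name is illustrative, and the source does not give the underlying death and crossing counts, so only the quoted percentages are used.

```python
def death_rate_percent(deaths: int, travellers: int) -> float:
    """Deaths as a percentage of travellers on a route (illustrative helper)."""
    if travellers <= 0:
        raise ValueError("travellers must be positive")
    return 100.0 * deaths / travellers


# Percentages quoted in the description for 2015.
CENTRAL_ROUTE_RATE = 1.85    # Central Mediterranean route
EASTERN_ROUTE_RATE = 0.095   # Turkey to Greece crossing

# How many times deadlier the Central route was, per traveller.
ratio = CENTRAL_ROUTE_RATE / EASTERN_ROUTE_RATE
print(f"{ratio:.1f}")  # 19.5
```

Since the recorded deaths are minimum estimates, rates derived this way are lower bounds rather than exact measurements.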

However, it is important to note that because of the challenges in data collection for the missing and dead, basic demographic information on the deceased is rarely known. Often migrants in mixed migration flows do not carry appropriate identification. When bodies are found, it may not be possible to identify them or to determine basic demographic information. In the data compiled by Missing Migrants Project, sex of the deceased is unknown in over 80% of cases. Region of origin has been determined for the majority of the deceased. Even this information is at times extrapolated based on available information – for instance, if all survivors of a shipwreck are of one origin, it is assumed those missing also came from the same region.

The Missing Migrants Project dataset includes coordinates for where incidents of death took place, which indicates where the risks to migrants may be highest. However, it should be noted that all coordinates are estimates.

Why collect data on missing and dead migrants?

By counting lives lost during migration, even if the result is only an informed estimate, we at least acknowledge the fact of these deaths. What before was vague and ill-defined is now a quantified tragedy that must be addressed. Politically, the availability of official data is important. The lack of political commitment at national and international levels to record and account for migrant deaths reflects and contributes to a lack of concern more broadly for the safety and well-being of migrants, including asylum-seekers. Further, it drives public apathy, ignorance, and the dehumanization of these groups.

Data are crucial to better understand the profiles of those who are most at risk and to tailor policies to better assist migrants and prevent loss of life. Ultimately, improved data should contribute to efforts to better understand the causes, both direct and indirect, of fatalities and their potential links to broader migration control policies and practices.

Counting and recording the dead can also be an initial step to encourage improved systems of identification of those who die. Identifying the dead is a moral imperative that respects and acknowledges those who have died. This process can also provide some sense of closure for families who may otherwise be left without ever knowing the fate of missing loved ones.

Identification and tracing of the dead and missing

As mentioned above, the challenge remains to count the numbers of dead and also identify those counted. Globally, the majority of those who die during migration remain unidentified. Even in cases in which a body is found, identification rates are low. Families may search for years or a lifetime to find conclusive news of their loved one. In the meantime, they may face psychological, practical, financial, and legal problems.

Ultimately, Missing Migrants Project would like to see that every unidentified body that can be recovered is adequately “managed”, analysed and tracked to ensure proper documentation, traceability and dignity. Common forensic protocols and standards should be agreed upon, and used within and between States. Furthermore, data relating to the dead and missing should be held in searchable and open databases at local, national and international levels to facilitate identification.

For more in-depth analysis and discussion of the numbers of missing and dead migrants around the world, and the challenges involved in identification and tracing, read our two reports on the issue, Fatal Journeys: Tracking Lives Lost during Migration (2014) and Fatal Journeys Volume 2, Identification and Tracing of Dead and Missing Migrants

Content

The data set records
Crimes Against Children from NCRB: Year-and Type-of-crime-wise Number of...
dataful.in
Updated Jul 3, 2025
+ more versions
Cite
Dataful (Factly) (2025). Crimes Against Children from NCRB: Year-and Type-of-crime-wise Number of Crimes Committed against Children [Dataset]. https://dataful.in/datasets/19540
Available download formats: application/x-parquet, xlsx, csv
Dataset updated
Jul 3, 2025
Dataset authored and provided by
Dataful (Factly)
License
https://dataful.in/terms-and-conditions
Area covered
States of India
Variables measured
Types of Crimes against Children
Description
The dataset contains year-, type-of-crime- and gender-wise compiled data on the number of different types of crimes committed against children and the number of victims affected by those crimes. The types of crimes covered in the dataset include kidnapping and abduction crimes such as kidnapping and abduction for the purpose of murder, begging, ransom, compelling for marriage, procuration of minor girls, importation of girls from foreign countries, missing deemed as kidnapped, etc.; fatal crimes such as murder, attempt to commit murder, murder with rape, abetment of suicide of child, infanticide, and foeticide; trafficking and sexual crimes such as buying and selling of minors for prostitution, use of children for pornography, transmitting sexual content and material involving children in sexually explicit acts, sexual assault, penetrative sexual assault, and rape; and other crimes such as child labour, child marriage, exposure, abandonment, simple hurt, grievous hurt, insult and assault to damage modesty, crimes under the Juvenile Justice Act and the Transplantation of Organs Act, etc.
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data:...
openicpsr.org
Updated Jun 5, 2017
Cite
Jacob Kaplan (2017). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Offenses Known and Clearances by Arrest (Return A), 1960-2022 [Dataset]. http://doi.org/10.3886/E100707V20
Unique identifier
https://doi.org/10.3886/E100707V20
Dataset updated
Jun 5, 2017
Dataset provided by
Princeton University
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
1960 - 2021
Area covered
United States
Description
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

Version 20 release notes:
Adds 2022 data.

Version 19 release notes:
Starting in year 2018 I used the "card_actual_type" to identify missing months rather than using the "card_actual_pt" column. As noted in previous release notes, a change by the FBI starting in 2018 led the "card_actual_pt" to always say that this month is reported. The "card_actual_type" appears to be unchanged so can be used to actually measure months missing. The tradeoff is that pre-2018 the "card_actual_type" and the "card_actual_pt" columns did not always agree so could have different values. Still, I consider the ability to measure months missing at all to be worth this tradeoff.

Version 18 release notes:
Adds data for 2021.

Version 17 release notes:
Adds data for 2020.
Please note that the FBI has retired UCR data ending in 2020 data so this will be the last Offenses Known and Clearances by Arrest data they release.
Changes .rda files to .rds.
Please note that in 2020 the card_actual_pt variable always returns that the month was reported. This causes 2020 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).

Version 16 release notes:
Changes release notes description, does not change data.

Version 15 release notes:
Adds data for 2019.
Please note that in 2019 the card_actual_pt variable always returns that the month was reported. This causes 2019 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).

Version 14 release notes:
Adds arson data from the UCR's Arson dataset. This adds just the arson variables about the number of arson incidents, not the complete set of variables in that dataset (which include damages from arson and whether structures were occupied or not during the arson).
As arson is an index crime, both the total index and the index property columns now include arson offenses. The "all_crimes" variables also now include arson.
Adds an arson_number_of_months_missing column indicating how many months were not reporting (i.e. missing from the annual data) in the arson data. In most cases, this is the same as the normal number_of_months_missing but not always so please check if you intend to use arson data.
Please note that in 2018 the card_actual_pt variable always returns that the month was reported. This causes 2018 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).
For some reason, a small number of agencies (primarily federal agencies) had the same ORI number in 2018 and I removed these duplicate agencies.

Version 13 release notes:
Adds 2018 data.
New Orleans (ORI = LANPD00) data had more unfounded crimes than actual crimes in 2018 so unfounded columns for 2018 are all NA.

Version 12 release notes:
Adds population 1-3 columns - if an agency is in multiple counties, these variables show the population in the county with the most people in that agency in it (population_1), second largest county (population_2), and third largest county (population_3). Also adds county 1-3 columns which identify which c
OPP Missing Persons Annual Report Data
open.canada.ca
ouvert.canada.ca
csv, html, txt, xlsx
Updated Jun 25, 2025
+ more versions
Cite
Government of Ontario (2025). OPP Missing Persons Annual Report Data [Dataset]. https://open.canada.ca/data/en/dataset/1bf5a9a3-14bc-482d-9fe6-c182034f3a66
Available download formats: csv, xlsx, txt, html
Dataset updated
Jun 25, 2025
Dataset provided by
Government of Ontario (https://www.ontario.ca/)
License
Open Government Licence - Canada 2.0 (https://open.canada.ca/en/open-government-licence-canada)
License information was derived automatically
Time period covered
Jul 1, 2019 - Dec 31, 2023
Description
Under Section 8 of the Missing Persons Act, 2018, police services are required to report annually on their use of urgent demands for records under the Act and the Ministry of the Solicitor General is required to make the OPP’s annual report data publicly available. The data includes:
* year in which the urgent demands were reported
* category of records
* description of records accessed under each category
* total number of times each category of records was demanded
* total number of missing persons investigations which had urgent demands for records
* total number of urgent demands for records made by OPP in a year.
M.C.6.a_Percentage of Missing Sidewalk Network Completed
catalog.data.gov
Updated Jun 25, 2025
+ more versions
Cite
data.austintexas.gov (2025). M.C.6.a_Percentage of Missing Sidewalk Network Completed [Dataset]. https://catalog.data.gov/dataset/m-c-6a-percentage-of-missing-sidewalk-network-completed
Dataset updated
Jun 25, 2025
Dataset provided by
data.austintexas.gov
Description
The 2016 Sidewalk Master Plan envisions a network of nearly 5,000 miles of sidewalks. As of January 1, 2020, over 2,700 miles (54.6%) of the network has been built. This measure shows the percentage of the missing network built by Austin Public Works each calendar year, based on the size of the absent portion of the network at the start of the year. The dataset Strategic Measure_Aggregated Sidewalk Construction Data shows the progress in this area for each calendar year, beginning in 2016. The dataset Strategic Measure_Sidewalk Segment Data provides more detailed data for the nearly 300,000 segments that make up the network, both built and unbuilt.
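The measure described above divides the miles built in a year by the miles still missing at the start of that year. A minimal Python sketch follows. The planned and built totals are the rounded figures from the description ("nearly 5,000" and "over 2,700" miles), and the yearly construction figure is hypothetical.

```python
def pct_missing_network_built(miles_built_in_year: float,
                              miles_missing_at_start: float) -> float:
    """Share of the absent sidewalk network completed during one year."""
    if miles_missing_at_start <= 0:
        raise ValueError("miles_missing_at_start must be positive")
    return 100.0 * miles_built_in_year / miles_missing_at_start


planned_miles = 5000          # rounded from "nearly 5,000 miles"
built_at_start = 2700         # rounded from "over 2,700 miles"
missing_at_start = planned_miles - built_at_start  # 2300

# Hypothetical yearly construction total, not taken from the dataset.
print(pct_missing_network_built(46, missing_at_start))  # 2.0
```

The denominator shrinks as the network is completed, so the same mileage built in a later year yields a higher percentage.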
Crime Data from 2010 to 2019
data.lacity.org
s.cnmilf.com
+1 more
application/rdfxml +5
Updated Jun 25, 2019
Cite
Los Angeles Police Department (2019). Crime Data from 2010 to 2019 [Dataset]. https://data.lacity.org/Public-Safety/Crime-Data-from-2010-to-2019/63jg-8b9z
Available download formats: application/rssxml, tsv, application/rdfxml, csv, json, xml
Dataset updated
Jun 25, 2019
Dataset authored and provided by
Los Angeles Police Department (http://lapdonline.org/)
License
CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
License information was derived automatically
Description
This dataset reflects incidents of crime in the City of Los Angeles from 2010 - 2019. This data is transcribed from original crime reports that are typed on paper and therefore there may be some inaccuracies within the data. Some location fields with missing data are noted as (0°, 0°). Address fields are only provided to the nearest hundred block in order to maintain privacy. This data is as accurate as the data in the database. Please note questions or concerns in the comments.
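Because missing locations are recorded as (0°, 0°), any mapping or distance analysis would likely need to exclude those rows first. A minimal Python sketch follows, assuming hypothetical field names `lat` and `lon` and made-up records; the actual column names in the dataset may differ.

```python
def has_location(record: dict) -> bool:
    """Return False when both coordinates are the (0, 0) placeholder."""
    return not (record["lat"] == 0.0 and record["lon"] == 0.0)


# Made-up example records; field names are assumptions.
incidents = [
    {"id": "a", "lat": 34.05, "lon": -118.24},
    {"id": "b", "lat": 0.0, "lon": 0.0},  # placeholder for a missing location
]

located = [r for r in incidents if has_location(r)]
print([r["id"] for r in located])  # ['a']
```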
Quarterly Labour Force Survey Household Dataset, April - June, 2021
beta.ukdataservice.ac.uk
Updated 2023
+ more versions
Cite
Office For National Statistics (2023). Quarterly Labour Force Survey Household Dataset, April - June, 2021 [Dataset]. http://doi.org/10.5255/ukda-sn-8852-3
Explore at:
Unique identifier
https://doi.org/10.5255/ukda-sn-8852-3
Dataset updated
2023
Dataset provided by
DataCite (https://www.datacite.org/)
UK Data Service (https://ukdataservice.ac.uk/)
Authors
Office For National Statistics
Description
Background
The Labour Force Survey (LFS) is a unique source of information using international definitions of employment and unemployment and economic inactivity, together with a wide range of related topics such as occupation, training, hours of work and personal characteristics of household members aged 16 years and over. It is used to inform social, economic and employment policy. The LFS was first conducted biennially from 1973-1983. Between 1984 and 1991 the survey was carried out annually and consisted of a quarterly survey conducted throughout the year and a 'boost' survey in the spring quarter (data were then collected seasonally). From 1992 quarterly data were made available, with a quarterly sample size approximately equivalent to that of the previous annual data. The survey then became known as the Quarterly Labour Force Survey (QLFS). From December 1994, data gathering for Northern Ireland moved to a full quarterly cycle to match the rest of the country, so the QLFS then covered the whole of the UK (though some additional annual Northern Ireland LFS datasets are also held at the UK Data Archive). Further information on the background to the QLFS may be found in the documentation.

Household datasets
Up to 2015, the LFS household datasets were produced twice a year (April-June and October-December) from the corresponding quarter's individual-level data. From January 2015 onwards, they are now produced each quarter alongside the main QLFS. The household datasets include all the usual variables found in the individual-level datasets, with the exception of those relating to income, and are intended to facilitate the analysis of the economic activity patterns of whole households. It is recommended that the existing individual-level LFS datasets continue to be used for any analysis at individual level, and that the LFS household datasets be used for analysis involving household or family-level data. From January 2011, a pseudonymised household identifier variable (HSERIALP) is also included in the main quarterly LFS dataset instead.

Change to coding of missing values for household series
From 1996-2013, all missing values in the household datasets were set to one '-10' category instead of the separate '-8' and '-9' categories. For that period, the ONS introduced a new imputation process for the LFS household datasets and it was necessary to code the missing values into one new combined category ('-10'), to avoid over-complication. This was also in line with the Annual Population Survey household series of the time. The change was applied to the back series during 2010 to ensure continuity for analytical purposes. From 2013 onwards, the -8 and -9 categories have been reinstated.
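For analyses that span both coding regimes, one option is to collapse the separate categories into the combined one before comparing periods. The sketch below assumes the codes are stored as integers and that a single combined category is acceptable for the analysis; the function name is hypothetical.

```python
# Missing-value codes named in the text above; integer storage is an assumption.
MISSING_CODES = {-8, -9, -10}


def collapse_missing(values, combined_code=-10):
    """Map every missing-value code to one combined category, leaving other values unchanged."""
    return [combined_code if v in MISSING_CODES else v for v in values]


print(collapse_missing([1, -8, 3, -9, -10]))
```

Collapsing loses the distinction between '-8' and '-9', so it only suits comparisons where that distinction is not needed.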

LFS Documentation
The documentation available from the Archive to accompany LFS datasets largely consists of the latest version of each volume alongside the appropriate questionnaire for the year concerned. However, LFS volumes are updated periodically by ONS, so users are advised to check the ONS LFS User Guidance page before commencing analysis.

Additional data derived from the QLFS
The Archive also holds further QLFS series: End User Licence (EUL) quarterly datasets; Secure Access datasets (see below); two-quarter and five-quarter longitudinal datasets; quarterly, annual and ad hoc module datasets compiled for Eurostat; and some additional annual Northern Ireland datasets.

End User Licence and Secure Access QLFS Household datasets
Users should note that there are two discrete versions of the QLFS household datasets. One is available under the standard End User Licence (EUL) agreement, and the other is a Secure Access version. Secure Access household datasets for the QLFS are available from 2009 onwards, and include additional, detailed variables not included in the standard EUL versions. Extra variables that typically can be found in the Secure Access versions but not in the EUL versions relate to: geography; date of birth, including day; education and training; household and family characteristics; employment; unemployment and job hunting; accidents at work and work-related health problems; nationality, national identity and country of birth; occurrence of learning difficulty or disability; and benefits. For full details of variables included, see data dictionary documentation. The Secure Access version (see SN 7674) has more restrictive access conditions than those made available under the standard EUL. Prospective users will need to gain ONS Accredited Researcher status, complete an extra application form and demonstrate to the data owners exactly why they need access to the additional variables. Users are strongly advised to first obtain the standard EUL version of the data to see if they are sufficient for their research requirements.

Changes to variables in QLFS Household EUL datasets
In order to further protect respondent confidentiality, ONS have made some changes to variables available in the EUL datasets. From July-September 2015 onwards, 4-digit industry class is available for main job only, meaning that 3-digit industry group is the most detailed level available for second and last job.

Review of imputation methods for LFS Household data - changes to missing values
A review of the imputation methods used in LFS Household and Family analysis resulted in a change from the January-March 2015 quarter onwards. It was no longer considered appropriate to impute any personal characteristic variables (e.g. religion, ethnicity, country of birth, nationality, national identity, etc.) using the LFS donor imputation method. This method is primarily focused on ensuring that the 'economic status' of all individuals within a household is known, allowing analysis of the combined economic status of households. This means that from 2015 larger amounts of missing values ('-8'/'-9') will be present in the data for these personal characteristic variables than before. Therefore, if users need to carry out any time series analysis of households/families which also includes personal characteristic variables covering this time period, it is advised to filter off 'ioutcome=3' cases from all periods to remove this inconsistent treatment of non-responders.
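The recommended filter can be sketched as follows. The rows are hypothetical, and the lowercase key `ioutcome` simply follows the spelling used above; the variable name and casing in an actual data file may differ.

```python
def drop_ioutcome_3(rows):
    """Remove cases with ioutcome equal to 3, as advised for consistent time series."""
    return [row for row in rows if row.get("ioutcome") != 3]


# Hypothetical rows; only the 'ioutcome' key comes from the text above.
rows = [
    {"case": 1, "ioutcome": 1},
    {"case": 2, "ioutcome": 3},
    {"case": 3, "ioutcome": 2},
]
print([row["case"] for row in drop_ioutcome_3(rows)])
```

Applying the same filter to every period, not just to 2015 onwards, is what keeps the treatment of non-responders consistent.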

Occupation data for 2021 and 2022 data files
The ONS has identified an issue with the collection of some occupational data in 2021 and 2022 data files in a number of their surveys. While they estimate any impacts will be small overall, this will affect the accuracy of the breakdowns of some detailed (four-digit Standard Occupational Classification (SOC)) occupations, and data derived from them. Further information can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).

Latest edition information
For the third edition (September 2023), the variables NSECM20, NSECMJ20, SC2010M, SC20SMJ, SC20SMN, SOC20M and SOC20O have been replaced with new versions. Further information on the SOC revisions can be found in the ONS article published on 11 July 2023: Revision of miscoded occupational data in the ONS Labour Force Survey, UK: January 2021 to September 2022 (https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/revisionofmiscodedoccupationaldataintheonslabourforcesurveyuk/january2021toseptember2022).
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data:...
openicpsr.org
Updated Jun 5, 2017
+ more versions
Cite
Jacob Kaplan (2017). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting Program Data: Offenses Known and Clearances by Arrest (Return A), 1960-2020 [Dataset]. http://doi.org/10.3886/E100707V17
Explore at:
Unique identifier
https://doi.org/10.3886/E100707V17
Dataset updated
Jun 5, 2017
Dataset provided by
Princeton University
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
1960 - 2020
Area covered
United States
Description
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

Version 17 release notes:
Adds data for 2020.
Please note that the FBI has retired UCR data ending with the 2020 data, so this will be the last Offenses Known and Clearances by Arrest data they release.
Changes .rda files to .rds.
Please note that in 2020 the card_actual_pt variable always returns that the month was reported. This causes 2020 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).

Version 16 release notes:
Changes release notes description, does not change data.

Version 15 release notes:
Adds data for 2019.
Please note that in 2019 the card_actual_pt variable always returns that the month was reported. This causes 2019 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).

Version 14 release notes:
Adds arson data from the UCR's Arson dataset. This adds just the arson variables about the number of arson incidents, not the complete set of variables in that dataset (which include damages from arson and whether structures were occupied or not during the arson).
As arson is an index crime, both the total index and the index property columns now include arson offenses. The "all_crimes" variables also now include arson.
Adds an arson_number_of_months_missing column indicating how many months were not reporting (i.e. missing from the annual data) in the arson data. In most cases, this is the same as the normal number_of_months_missing but not always, so please check if you intend to use arson data.
Please note that in 2018 the card_actual_pt variable always returns that the month was reported. This causes 2018 to report that all months are reported for all agencies because I use the card_actual_pt variable to measure how many months were reported. This variable is almost certainly incorrect since it is extremely unlikely that all agencies suddenly always report. However, I am keeping this incorrect value to maintain a consistent definition of how many months are missing (measuring missing months through card_actual_type, for example, gives different results for previous years so I don't want to change this).
For some reason, a small number of agencies (primarily federal agencies) had the same ORI number in 2018 and I removed these duplicate agencies.

Version 13 release notes:
Adds 2018 data.
New Orleans (ORI = LANPD00) data had more unfounded crimes than actual crimes in 2018 so unfounded columns for 2018 are all NA.

Version 12 release notes:
Adds population 1-3 columns - if an agency is in multiple counties, these variables show the population in the county with the most people in that agency in it (population_1), second largest county (population_2), and third largest county (population_3). Also adds county 1-3 columns which identify which counties the agency is in. The population column is the sum of the three population columns. Thanks to Mike Maltz for the suggestion!
Fixes bug in the crosswalk data that is merged to this file that had the incorrect FIPS code for Clinton, Tennessee (ORI = TN00101). Thanks to Brooke Watson for catching this bug!
Adds a last_month_reported column which says which month was reported last. This is actually how the FBI defines number_of_months_reported so is a more accurate representation of that.
Removes the number_of_months_reported variable as the name is misleading. You should use the last_month_reported or the number_of_months_missing (see below) variable instead.
Adds a number_of_months_missin
PV Generation and Consumption Dataset of an Estonian Residential Dwelling
data.taltech.ee
Updated Mar 22, 2025
Cite
Sayeed Hasan; Andrei Blinov; Andrii Chub; Dmitri Vinnikov (2025). PV Generation and Consumption Dataset of an Estonian Residential Dwelling [Dataset]. http://doi.org/10.48726/6hayh-x0h25
Explore at:
Unique identifier
https://doi.org/10.48726/6hayh-x0h25
Dataset updated
Mar 22, 2025
Dataset provided by
TalTech Data Repository
Authors
Sayeed Hasan; Andrei Blinov; Andrii Chub; Dmitri Vinnikov
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
Estonia
Description
This is a Residential PV generation and consumption data set from an Estonian house. At the time of submission, one year (2023) of data was available. The data was logged at a 10-second resolution. The untouched dataset can be found in the raw data folder, which is separated month-wise. A few missing points in the dataset were filled with a simple KNN algorithm. However, improved data imputation methods based on machine learning are also possible. To carry out the imputing, run the scripts in the script folder one by one in the numerical serial order (SC1..py, SC2..py, etc.).

Data Descriptor (Scientific Data): https://doi.org/10.1038/s41597-025-04747-w

General Information:

Duration: January 2023 – December 2023

Resolution: 10 seconds

Dataset Type: Aggregated consumption and PV generation data

Logging Device: Camille Bauer PQ1000 (×2)

Load/Appliance Information:

5 kW Rooftop PV array connected to AC Bus via 4.2kW 3-ϕ Inverter

Air conditioner: 0.44 kW (Cooling), 0.62 kW (Heating)

Air to Water (ATW) Heat Pump: 2.5kW (Cooling), 2.6 kW (Heating)

ATW Cylinder unit: 0.21 kW (Controller), 9 kW (Booster Heater)

Microwave oven: 0.9 kW

Coffee Maker: 1 kW

Cooktop Hot Plate: 4.6 kW

TV: 0.103 kW

Vacuum Cleaner: 1.5 kW

Ventilation: 0.1 kW

Washing Machine: 2.2 kW

Electric Sauna: 10 kW

Lighting: 0.25 kW

EV charger: 2.4 kW 1-ϕ

Measurement Points:

PV converter-side current transformer, potential transformer (Measurement of PV generation).

Utility meter-side current transformer, potential transformer (Measurement of power exchange with the grid).

Measured Parameters:

Per-phase mean power recorded within the sampling period

Per-phase minimum power recorded within the sampling period

Per-phase maximum power recorded within the sampling period

Quadrant-wise mean power recorded within the sampling period (1st + 3rd), (2nd + 4th)

Quadrant-wise minimum power recorded within the sampling period (1st + 3rd), (2nd + 4th)

Quadrant-wise maximum power recorded within the sampling period (1st + 3rd), (2nd + 4th)

Mean power factor recorded within the sampling period

Minimum power factor recorded within the sampling period

Maximum power factor recorded within the sampling period

System voltage

Minimum system voltage

Maximum system voltage

Mean voltage between phase and neutral

Minimum voltage between phase and neutral

Maximum voltage between phase and neutral

Zero displacement voltage 4-wire systems (mean, min, max)

Script Description:

SC1_PV_auto_sort.py : This fixes timestamp continuity by resampling at the original sampling rate for PV generation data.

SC2_L2_auto_sort.py : This fixes timestamp continuity by resampling at the original sampling rate for meter-side measurement data.

SC3_PV_KNN_impute.py : Filling missing data points by simple KNN for PV generation data.

SC4_L2_KNN_impute.py : Filling missing data points by simple KNN for meter-side measurement data.

SC5_Final_data_gen.py : Merge PV and meter-side measurement data, and calculate load consumption.

The dataset provides all the outcomes (CSV files) from the scripts. All processed variables (PV generation, load, power import, and export) are expressed in kW units.
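The load calculation mentioned for SC5_Final_data_gen.py is not spelled out here. A plausible reading, sketched below, is a simple energy balance in which the household load equals PV generation plus grid import minus grid export. The function name and sample values are hypothetical, and this is not taken from the published script.

```python
def load_kw(pv_kw, import_kw, export_kw):
    """Household load from an energy balance (assumed, not the published formula); values in kW."""
    return pv_kw + import_kw - export_kw


# Hypothetical instant: 3.0 kW generated, nothing imported, 1.2 kW exported.
print(round(load_kw(3.0, 0.0, 1.2), 3))
```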

Update: 'SC1_PV_auto_sort.py' & 'SC2_L2_auto_sort.py' are adequate for cleaning up data and making the missing point visible. 'SC3_PV_KNN_impute.py' & 'SC4_L2_KNN_impute.py' work fine for short-range missing data points; however, these two scripts won't help much for missing data points for a longer period. They are provided as examples of one method of processing data. Future updates will include proper ML-based forecasting to predict missing data points.
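To make the two processing stages concrete, here is a simplified stand-in in plain Python. It places samples on a regular 10-second grid so that gaps become visible, then fills only short gaps with the mean of the two bounding observations. It is not the KNN method used in the published scripts; the function names, the integer-second timestamps, and the `max_gap` threshold are all assumptions.

```python
STEP_SECONDS = 10  # logging resolution stated in the description


def to_regular_grid(samples, start, end, step=STEP_SECONDS):
    """Place (timestamp, value) samples on a regular grid; absent slots become None."""
    by_time = dict(samples)
    return [(t, by_time.get(t)) for t in range(start, end + 1, step)]


def fill_short_gaps(grid, max_gap=3):
    """Fill runs of at most max_gap missing slots with the mean of the bounding observations."""
    values = [v for _, v in grid]
    filled = list(values)
    i = 0
    while i < len(values):
        if values[i] is not None:
            i += 1
            continue
        j = i
        while j < len(values) and values[j] is None:
            j += 1  # j ends at the first observed slot after the gap
        has_bounds = i > 0 and j < len(values)
        if has_bounds and (j - i) <= max_gap:
            estimate = (values[i - 1] + values[j]) / 2
            for k in range(i, j):
                filled[k] = estimate
        i = j
    return [(t, v) for (t, _), v in zip(grid, filled)]


# Hypothetical samples with two missing slots (t=20 and t=30).
grid = to_regular_grid([(0, 1.0), (10, 2.0), (40, 4.0)], 0, 40)
print(fill_short_gaps(grid))
```

Longer gaps are left as None, which mirrors the caveat above that short-range filling does not help much for extended outages.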

Funding Agency and Grant Number:

European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement no. 955614.

Estonian Research Council under Grant PRG1086.

Estonian Centre of Excellence in Energy Efficiency, ENER, funded by the Estonian Ministry of Education and Research under Grant TK230.
‘MISSING MIGRANTS (2014-2021)’ analyzed by Analyst-2
analyst-2.ai
Updated Jan 28, 2022
Cite
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘MISSING MIGRANTS (2014-2021)’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-missing-migrants-2014-2021-19da/1a9479e3/?iid=039-565&v=presentation
Explore at:
Dataset updated
Jan 28, 2022
Dataset authored and provided by
Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Description
Analysis of ‘MISSING MIGRANTS (2014-2021)’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/methoomirza/missing-migrants-20142021 on 28 January 2022.

--- Dataset description provided by original source is as follows ---

Context

Missing Migrants Project tracks deaths of migrants, including refugees and asylum-seekers, who have died or gone missing in the process of migration towards an international destination. Please note that these data represent minimum estimates, as many deaths during migration go unrecorded.

What is included in Missing Migrants Project data?

Missing Migrants Project counts migrants who have died at the external borders of states, or in the process of migration towards an international destination, regardless of their legal status. The Project records only those migrants who die during their journey to a country different from their country of residence. Missing Migrants Project data include the deaths of migrants who die in transportation accidents, shipwrecks, violent attacks, or due to medical complications during their journeys. It also includes the number of corpses found at border crossings that are categorized as the bodies of migrants, on the basis of belongings and/or the characteristics of the death. For instance, a death of an unidentified person might be included if the decedent is found without any identifying documentation in an area known to be on a migration route. Deaths during migration may also be identified based on the cause of death, especially if it is related to trafficking, smuggling, or means of travel such as on top of a train, in the back of a cargo truck, as a stowaway on a plane, in unseaworthy boats, or crossing a border fence. While the location and cause of death can provide strong evidence that an unidentified decedent should be included in Missing Migrants Project data, this should always be evaluated in conjunction with migration history and trends.

What is excluded?

The count excludes deaths that occur in immigration detention facilities or after deportation to a migrant’s homeland, as well as deaths more loosely connected with migrants’ irregular status, such as those resulting from labour exploitation. Migrants who die or go missing after they are established in a new home are also not included in the data, so deaths in refugee camps or housing are excluded. The deaths of internally displaced persons who die within their country of origin are also excluded. There remains a significant gap in knowledge and data on such deaths. Data and knowledge of the risks and vulnerabilities faced by migrants in destination countries, including death, should not be neglected, but rather tracked as a distinct category.

What sources of information are used in the Missing Migrants Project database?

The Missing Migrants Project currently gathers information from diverse sources such as official records – including from coast guards and medical examiners – and other sources such as media reports, NGOs, and surveys and interviews of migrants. In the Mediterranean region, data are relayed from relevant national authorities to IOM field missions, who then share it with the Missing Migrants Project team. Data are also obtained by IOM and other organizations that receive survivors at landing points in Italy and Greece. IOM and UNHCR also regularly coordinate to validate data on missing migrants in the Mediterranean. Data on the United States/Mexico border are compiled based on data from U.S. county medical examiners, coroners, and sheriff’s offices, as well as media reports for deaths occurring on the Mexican side of the border. In Africa, data are obtained from media and NGOs, including the Regional Mixed Migration Secretariat and the International Red Cross/Red Crescent. The quality of the data source(s) for each incident is assessed through the ‘Source quality’ variable, which can be viewed in the data. Across the world, the Missing Migrants Project uses social and traditional media reports to find data, which are then verified by local IOM staff whenever possible. In all cases, new entries are checked against existing records to ensure that no deaths are double-counted. In all regions, Missing Migrants Project data represent a minimum estimate of the number of migrant deaths. To learn more about data sources, visit the thematic page on migrant deaths and disappearances in the Global Migration Data Portal.

Content

What are the variables used in the Missing Migrants Project database?

This section presents the list of variables that constitute the Missing Migrants Project database. While ideally, all incidents recorded would include entries for each of these variables, the challenges described above mean that this is not always possible. The minimum information necessary to register an incident is the date of the incident, the number of dead and/or the number of missing, and the location of death. If the information is unavailable, the cell is left blank or “unknown” is recorded, as indicated below.

1. Web ID - An automatically generated number used to identify each unique entry in the dataset.

2. Region - Region in which an incident took place. For more about regional classifications used in the dataset, click here.

3. Incident Date - Estimated date of death. In cases where the exact date of death is not known, this variable indicates the date in which the body or bodies were found. In cases where data are drawn from surviving migrants, witnesses or other interviews, this variable is entered as the date of the death as reported by the interviewee. At a minimum, the month and the year of death is recorded. In some cases, official statistics are not disaggregated by the incident, meaning that data is reported as a total number of deaths occurring during a certain time period. In such cases the entry is marked as a “cumulative total,” and the latest date of the range is recorded, with the full dates recorded in the comments.

4. Year - The year in which the incident occurred.

5. Reported month - The month in which the incident occurred.

6. Number dead - The total number of people confirmed dead in one incident, i.e. the number of bodies recovered. If migrants are missing and presumed dead, such as in cases of shipwrecks, leave blank.

7. Number missing - The total number of those who are missing and are thus assumed to be dead. This variable is generally recorded in incidents involving shipwrecks. The number of missing is calculated by subtracting the number of bodies recovered from a shipwreck and the number of survivors from the total number of migrants reported to have been on the boat. This number may be reported by surviving migrants or witnesses. If no missing persons are reported, it is left blank.

8. Total dead & missing - The sum of the ‘number dead’ and ‘number missing’ variables.

9. Number of survivors - The number of migrants that survived the incident, if known. The age, gender, and country of origin of survivors are recorded in the ‘Comments’ variable if known. If unknown, it is left blank.

10. Number of females - Indicates the number of females found dead or missing. If unknown, it is left blank. This gender identification is based on a third-party interpretation of the victim's gender from information available in official documents, autopsy reports, witness testimonies, and/or media reports.

11. Number of males - Indicates the number of males found dead or missing. If unknown, it is left blank. This gender identification is based on a third-party interpretation of the victim's gender from information available in official documents, autopsy reports, witness testimonies, and/or media reports.

12. Number of children - Indicates the number of individuals under the age of 18 found dead or missing. If unknown, it is left blank.

13. Age - The age of the decedent(s). Occasionally, an estimated age range is recorded. If unknown, it is left blank.

14. Country of origin - Country of birth of the decedent. If unknown, the entry will be marked “unknown”.

15. Region of origin - Region of origin of the decedent(s). In some incidents, region of origin may be marked as “Presumed” or “(P)” if migrants travelling through that location are known to hail from a certain region. If unknown, the entry will be marked “unknown”.

16. Cause of death - The determination of conditions resulting in the migrant's death i.e. the circumstances of the event that produced the fatal injury. If unknown, the reason why is included where possible. For example, “Unknown – skeletal remains only”, is used in cases in which only the skeleton of the decedent was found.

17. Location description - Place where the death(s) occurred or where the body or bodies were found. Nearby towns or cities or borders are included where possible. When incidents are reported in an unspecified location, this will be noted.

18. Location coordinates - Place where the death(s) occurred or where the body or bodies were found. In many regions, most notably the Mediterranean, geographic coordinates are estimated as precise locations are not often known. The location description should always be checked against the location coordinates.

19. Migration route - Name of the migrant route on which incident occurred, if known. If unknown, it is left blank.

20. UNSD geographical grouping - Geographical region in which the incident took place, as designated by the United Nations Statistics Division (UNSD) geoscheme. For more about regional classifications used in the dataset, click here.

21. Information source - Name of source of information for each incident. Multiple sources may be listed.

22. Link - Links to original reports of migrant deaths /
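The arithmetic described for variables 7 and 8 above can be sketched as follows. The function names are hypothetical, and treating a blank count (None) as zero when summing is an assumption made for illustration rather than documented behaviour.

```python
def number_missing(total_on_board, bodies_recovered, survivors):
    """Missing = reported total on the boat minus bodies recovered minus survivors."""
    missing = total_on_board - bodies_recovered - survivors
    if missing < 0:
        raise ValueError("recovered bodies and survivors exceed the reported total")
    return missing


def total_dead_and_missing(number_dead, missing):
    """Sum of 'number dead' and 'number missing'; blank (None) counts as zero here."""
    return (number_dead or 0) + (missing or 0)


# Hypothetical shipwreck: 120 reported on board, 15 bodies recovered, 80 survivors.
m = number_missing(120, 15, 80)
print(m, total_dead_and_missing(15, m))
```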
Missing SW Licensing Data in the Namoi PAE 20140711
data.gov.au
researchdata.edu.au
Updated Nov 20, 2019
Cite
Bioregional Assessment Program (2019). Missing SW Licensing Data in the Namoi PAE 20140711 [Dataset]. https://data.gov.au/data/dataset/131b847c-7fe3-4b5f-a610-e969b2e54ca4
Explore at:
Dataset updated
Nov 20, 2019
Dataset authored and provided by
Bioregional Assessment Program
Area covered
Namoi River
Description
Abstract

This dataset was supplied to the Bioregional Assessment Programme by a third party and is presented here as originally supplied. Metadata was not provided and has been compiled by the Bioregional Assessment Programme based on known details at the time of acquisition.

This dataset includes the works details associated with surface water licences from NSW in the NIC/NAM Additional PAE region. The short guide to NSW Office of Water's licensing data has been provided to accompany the dataset (both the spatial locations and the associated licence details).

A SHORT GUIDE TO NSW OFFICE OF WATER'S LICENSING DATA

Methodology

Using the supplied polygons a spatial select was taken for each polygon area for the Surface and Groundwater Approved Work locations. These Work Location points were exported to an ArcGIS 10.0 File Geodatabase for each polygon area. These work locations have a "Status" of either "Active" (under the Water Act) or "Current" (under the Water Management Act).

The Approved License number attached to each Work was then used to query the Office of Water's Water Licensing System (WLS) to extract details on each Approved license including any linked Water Access Licenses (WAL) if the Work was now under the Water Management Act (WMA). These files end in *_WLS-EXTRACT_n.xls.

If found the linked WAL number is used to re-query using WLS to extract details on each linked WAL. These files end in *_WLS-EXTRACT_n_WALs_volume.xls.

It should be noted that due to query size constraints in WLS the output files for each polygon area may be split into a number of subset files ("n" being the number of the subset).

The field headings are as per the WLS Extract report. They include some characters (e.g. "") that may cause problems if loaded into ArcGIS. Not knowing how the data is to be used I have not amended them.

Understanding Licensing data

A Licensed Work Approval may have more than one work (and therefore work location, i.e. point) associated with it. If the Licensed Work Approval is under the old Water Act it may have associated with it an "Entitlement" volume (if on a Regulated River) or an "Allocation" volume in an unregulated area. Please note that these volumes are for the whole licensed approval distributed amongst the related works but not against any particular one.

A Licensed Work Approval, if under the newer Water Management Act, may have more than one linked WAL. Each WAL may have a "Share Component" volume associated with it. This will need to be summed against each linked Licensed Work Approval to get the total WAL volume. Please note again that these volumes are for the whole licensed approval distributed amongst the related works but not against any particular one.
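The summation described above can be sketched as a simple group-and-sum. The tuple layout, identifiers, and volumes are hypothetical and do not reflect the actual WLS extract columns.

```python
from collections import defaultdict


def total_wal_volume(links):
    """Sum "Share Component" volumes per Licensed Work Approval.

    links: iterable of (approval_id, wal_id, share_component) tuples.
    """
    totals = defaultdict(float)
    for approval_id, _wal_id, share_component in links:
        totals[approval_id] += share_component
    return dict(totals)


# Hypothetical links: approval "A1" has two linked WALs, "A2" has one.
links = [("A1", "WAL-1", 100.0), ("A1", "WAL-2", 50.0), ("A2", "WAL-3", 20.0)]
print(total_wal_volume(links))
```

The resulting totals apply to the whole approval, not to any single work location under it.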

It is important to note that under the WMA it is possible for WALs not to have a linked Licensed Work Approval (to support Water Trading). This means a spatial select will not find these WALs and the volumes associated with them. The WAL is still related to a particular Water Source and can be re-associated with a different Licensed Work Approval at a later date.

This dataset has been provided to the BA Programme for use within the programme only. Third parties may request a copy of the data from DPI Water (previously known as the NSW Office of Water) at http://www.water.nsw.gov.au/.

Dataset History

This dataset was extracted from the NSW Office of Water's licensing system. Work Location points were exported to an ArcGIS 10.0 File Geodatabase for each polygon area supplied by the Bioregional Assessment project teams. Corresponding work locations found within each polygon were exported from the licensing system.

Dataset Citation

NSW Office of Water (2014) Missing SW Licensing Data in the Namoi PAE 20140711. Bioregional Assessment Source Dataset. Viewed 11 December 2018, http://data.bioregionalassessments.gov.au/dataset/131b847c-7fe3-4b5f-a610-e969b2e54ca4.
Water-quality data imputation with a high percentage of missing values: a...
zenodo.org
explore.openaire.eu
+1more
csv
Updated Jun 8, 2021
+ more versions
Cite
Rafael Rodríguez; Marcos Pastorini; Lorena Etcheverry; Christian Chreties; Mónica Fossati; Alberto Castro; Angela Gorgoglione (2021). Water-quality data imputation with a high percentage of missing values: a machine learning approach [Dataset]. http://doi.org/10.5281/zenodo.4731169
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.5281/zenodo.4731169
Dataset updated
Jun 8, 2021
Dataset provided by
Zenodo http://zenodo.org/
Authors
Rafael Rodríguez; Marcos Pastorini; Lorena Etcheverry; Christian Chreties; Mónica Fossati; Alberto Castro; Angela Gorgoglione
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The monitoring of surface-water quality followed by water-quality modeling and analysis is essential for generating effective strategies in water resource management. However, water-quality studies are limited by the lack of complete and reliable data sets on surface-water-quality variables. These deficiencies are particularly noticeable in developing countries.

This work focuses on surface-water-quality data from the Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. Data collected at six monitoring stations are publicly available at https://www.dinama.gub.uy/oan/datos-abiertos/calidad-agua/. The high temporal and spatial variability that characterizes water-quality variables and the high rate of missing values (between 50% and 70%) raise significant challenges.

To deal with missing values, we applied several statistical and machine-learning imputation methods. The competing algorithms implemented belonged to both univariate and multivariate imputation methods (inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR), and K-nearest neighbors Regressor (KNNR)).

IDW outperformed the others, achieving a very good performance (NSE greater than 0.8) in most cases.
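For readers unfamiliar with the two terms used above, the following is a minimal sketch of generic inverse distance weighting and of the Nash-Sutcliffe efficiency (NSE). It is not the authors' implementation: the function names, the `power` parameter and its default of 2 are assumptions, and the description does not state how distances between observations were defined in the study.

```python
def idw_estimate(values, distances, power=2.0):
    """Estimate a missing value as a weighted mean of known values.

    Each known value is weighted by 1 / distance**power, so nearer
    observations count more. `power=2.0` is an assumed default.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for value, distance in zip(values, distances):
        if distance == 0:
            # An observation at zero distance is returned unchanged.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations.

    A value of 1 is a perfect match; 0 means the estimates are no better
    than always predicting the mean of the observations.
    """
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread
```

Under this definition, the threshold quoted above (NSE greater than 0.8) means the squared error of the imputed values is less than 20% of the spread of the observed values around their mean.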

In this dataset, we include the original and imputed values for the following variables:

Water temperature (Tw)

Dissolved oxygen (DO)

Electrical conductivity (EC)

pH

Turbidity (Turb)

Nitrite (NO2-)

Nitrate (NO3-)

Total Nitrogen (TN)

Each variable is identified as [STATION] VARIABLE FULL NAME (VARIABLE SHORT NAME) [UNIT METRIC].
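A column name following this pattern could be split into its parts with a regular expression. The sketch below is an assumption about how the pattern maps onto real headers: the example header "[S1] Dissolved oxygen (DO) [mg/L]" is hypothetical and does not come from the CSV files, so the expression may need adjusting against the actual data.

```python
import re

# Assumed layout: [STATION] FULL NAME (SHORT NAME) [UNIT]
COLUMN_PATTERN = re.compile(
    r"^\[(?P<station>[^\]]+)\]\s*"    # station code in square brackets
    r"(?P<full_name>.+?)\s*"          # variable full name
    r"\((?P<short_name>[^)]+)\)\s*"   # short name in parentheses
    r"\[(?P<unit>[^\]]*)\]$"          # unit in square brackets
)


def parse_column_name(name):
    """Return the parts of a column name as a dict, or None if it does not match."""
    match = COLUMN_PATTERN.match(name.strip())
    return match.groupdict() if match else None


# Hypothetical header, for illustration only.
parsed = parse_column_name("[S1] Dissolved oxygen (DO) [mg/L]")
```

Columns that do not follow the pattern, such as a date column, return None and can be handled separately.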

More details about the study area, the original datasets, and the methodology adopted can be found in our paper https://www.mdpi.com/2071-1050/13/11/6318.

If you use this dataset in your work, please cite our paper:
Rodríguez, R.; Pastorini, M.; Etcheverry, L.; Chreties, C.; Fossati, M.; Castro, A.; Gorgoglione, A. Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach. Sustainability 2021, 13, 6318. https://doi.org/10.3390/su13116318
Data from: Variable Selection with Multiply-Imputed Datasets: Choosing...
tandf.figshare.com
pdf
Updated Jun 3, 2023
Cite
Jiacong Du; Jonathan Boss; Peisong Han; Lauren J. Beesley; Michael Kleinsasser; Stephen A. Goutman; Stuart Batterman; Eva L. Feldman; Bhramar Mukherjee (2023). Variable Selection with Multiply-Imputed Datasets: Choosing Between Stacked and Grouped Methods [Dataset]. http://doi.org/10.6084/m9.figshare.19111441.v2
Explore at:
Available download formats: pdf
Unique identifier
https://doi.org/10.6084/m9.figshare.19111441.v2
Dataset updated
Jun 3, 2023
Dataset provided by
Taylor & Francis
Authors
Jiacong Du; Jonathan Boss; Peisong Han; Lauren J. Beesley; Michael Kleinsasser; Stephen A. Goutman; Stuart Batterman; Eva L. Feldman; Bhramar Mukherjee
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Penalized regression methods are used in many biomedical applications for variable selection and simultaneous coefficient estimation. However, missing data complicates the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm on each imputed dataset will likely lead to different sets of selected predictors. This article considers a general class of penalized objective functions which, by construction, force selection of the same variables across imputed datasets. By pooling objective functions across imputations, optimization is then performed jointly over all imputed datasets rather than separately for each dataset. We consider two objective function formulations that exist in the literature, which we will refer to as “stacked” and “grouped” objective functions. Building on existing work, we (i) derive and implement efficient cyclic coordinate descent and majorization-minimization optimization algorithms for continuous and binary outcome data, (ii) incorporate adaptive shrinkage penalties, (iii) compare these methods through simulation, and (iv) develop an R package miselect. Simulations demonstrate that the “stacked” approaches are more computationally efficient and have better estimation and selection properties. We apply these methods to data from the University of Michigan ALS Patients Biorepository aiming to identify the association between environmental pollutants and ALS risk. Supplementary materials for this article are available online.
Benchmark datasets to study fairness in synthetic data generation
zenodo.org
csv, json
Updated Aug 28, 2024
Cite
Joao Fonseca (2024). Benchmark datasets to study fairness in synthetic data generation [Dataset]. http://doi.org/10.5281/zenodo.13385610
Explore at:
Available download formats: csv, json
Unique identifier
https://doi.org/10.5281/zenodo.13385610
Dataset updated
Aug 28, 2024
Dataset provided by
Zenodo http://zenodo.org/
Authors
Joao Fonseca
License
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The traveltime dataset is based on the Folktables project covering US census data. The target is a binary variable encoding whether or not the individual needs to travel more than 20 minutes for work; here, having a shorter travel time is the desirable outcome. We use a subset of 2018 data from the states of California, Florida, Maine, New York, Utah, and Wyoming. Although the folktables dataset does not have any missing values, there are some values recorded as NaN due to the Bureau's data collection methodology. We remove the "esp" column, which encodes the employment status of parents and has 99.55% missing values. We encode the missing values in povpip, the income-to-poverty ratio (0.85% missing), as -1, in accordance with the methodology of Ding et al. See https://arxiv.org/pdf/2108.04884 for metadata.
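The two preprocessing steps described for the traveltime dataset (dropping "esp" and encoding missing "povpip" values as -1) might look like the sketch below. It is not the author's code: the function name and the dict-per-record representation are assumptions, and only the column names "esp" and "povpip" come from the description.

```python
import math


def clean_record(record):
    """Drop the 'esp' column and encode a missing 'povpip' value as -1.

    A value counts as missing here if it is None or a float NaN.
    """
    cleaned = {key: value for key, value in record.items() if key != "esp"}
    if "povpip" in cleaned:
        value = cleaned["povpip"]
        if value is None or (isinstance(value, float) and math.isnan(value)):
            cleaned["povpip"] = -1
    return cleaned


# Hypothetical record, for illustration only.
example = clean_record({"esp": float("nan"), "povpip": float("nan"), "age": 30})
```

Using -1 as a sentinel keeps the affected rows in the dataset while leaving the missing values distinguishable from real ratios, which are non-negative.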

The cardio (a) dataset contains patient data recorded during medical examination, including 3 binary features supplied by the patient. The target class denotes the presence of cardiovascular disease. This dataset represents predictive tasks that allocate access to priority medical care for patients, and has been used for fairness evaluations in the domain.

The credit dataset contains historical financial data of borrowers, including past non-serious delinquencies. Here, a serious delinquency is considered to be 90 days past due, and this is the target variable.

The German Credit dataset (https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data) contains financial and personal information regarding loan-seeking applicants.

