In 2024, there were 301,623 cases filed by the National Crime Information Center (NCIC) in which the race of the reported missing person was white. In the same year, 17,097 people whose race was unknown were also reported missing in the United States.
What is the NCIC?
The National Crime Information Center (NCIC) is a digital database that stores crime data for the United States so that criminal justice agencies can access it. As part of the FBI, it helps criminal justice professionals find criminals, missing people, stolen property, and terrorists. The NCIC database is broken down into 21 files: seven cover stolen property and items, and 14 cover persons, including the National Sex Offender Registry, Missing Person, and Identity Theft files. It works alongside federal, tribal, state, and local agencies. The NCIC's goal is to maintain a centralized information system between local branches and offices, so information is easily accessible nationwide.
Missing people in the United States
A person is considered missing when they have disappeared and their location is unknown. A person who is considered missing might have left voluntarily, but that is not always the case. The number of NCIC unidentified person files in the United States has fluctuated since 1990, and in 2022 there were slightly more NCIC missing person files for males than for females. Fortunately, the number of NCIC missing person files has been mostly decreasing since 1998.
https://dataful.in/terms-and-conditions
The dataset contains the state-wise number of persons reported missing in a particular year, the total number of persons missing (including those from previous years), the number of persons recovered/traced, and the number unrecovered/untraced. The dataset also contains the percentage recovery of missing persons, which is calculated as the percentage share of the total number of persons traced over the total number of persons missing. NCRB started providing detailed data on missing and traced persons, including children, from 2016 onwards, following the Supreme Court's direction in a Writ Petition. It should also be noted that the data published by NCRB is restricted to those cases where FIRs have been registered by the police in the respective States/UTs.
Note: Figures for projected_mid_year_population are sourced from the Report of the Technical Group on Population Projections for India and States 2011-2036
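As a quick illustration of the percentage-recovery figure defined above, here is a minimal pandas sketch; the file name and column names are assumptions for the example, not the actual Dataful schema.

```python
import pandas as pd

# Minimal sketch of the percentage-recovery calculation described above.
# The file name and column names (state, year, persons_traced,
# total_persons_missing) are assumptions, not the actual Dataful schema.
df = pd.read_csv("ncrb_missing_persons_statewise.csv")

# Percentage recovery = persons traced / total persons missing * 100
df["percentage_recovery"] = (
    df["persons_traced"] / df["total_persons_missing"] * 100
).round(2)

print(df[["state", "year", "percentage_recovery"]].head())
```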
https://dataful.in/terms-and-conditions
The Ministry of Home Affairs, Government of India has defined a missing child as "a person below eighteen years of age, whose whereabouts are not known to the parents, legal guardians and any other persons who may be legally entrusted with the custody of the child, whatever may be the circumstances/causes of disappearance". The dataset contains the state-wise and gender-wise number of children reported missing in a particular year, the total number of persons missing (including those from previous years), the number of persons recovered/traced, and the number unrecovered/untraced. The dataset also contains the percentage recovery of missing persons, which is calculated as the percentage share of the total number of persons traced over the total number of persons missing. NCRB started providing detailed data on missing and traced persons, including children, from 2016 onwards, following the Supreme Court's direction in a Writ Petition. It should also be noted that the data published by NCRB is restricted to those cases where FIRs have been registered by the police in the respective States/UTs.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This project provides a comprehensive dataset of over 130,000 missing and unaccounted-for people in Mexico from the 1960s to 2025. The dataset is sourced from the publicly available records on the RNPDO website and represents individuals who were actively missing as of the date of collection (October 1, 2025). To protect individual identities, personal identifiers, such as names, have been removed.
Dataset Features: The data has been cleaned and translated to facilitate analysis by a global audience. Fields include:
- Sex
- Date of birth
- Date of incidence
- State and municipality of the incident
Data spans over six decades, offering insights into trends and regional disparities.
Additional Materials:
- Python Script: A Python script to generate customizable visualizations based on the dataset. Users can specify the state to generate tailored charts.
- Sample Chart: An example chart showcasing the evolution of missing persons per 100,000 inhabitants in Mexico between 2006 and 2025.
- Requirements File: A requirements.txt file listing the necessary Python libraries to run the script seamlessly.
This dataset and accompanying tools aim to support researchers, policymakers, and journalists in analyzing and addressing the issue of missing persons in Mexico.
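For orientation only, the sketch below shows one way such a per-100,000 chart could be produced. It is not the script bundled with the dataset; the file name, column names, and population figure are assumptions made for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative sketch only -- not the script bundled with the dataset.
# The file name, column names (state, date_of_incidence), and the population
# figure are assumptions made for this example.
df = pd.read_csv("rnpdo_missing_persons.csv", parse_dates=["date_of_incidence"])

state = "Jalisco"          # hypothetical state of interest
population = 8_348_151     # hypothetical mid-period population for that state

subset = df[df["state"] == state]
per_year = subset.groupby(subset["date_of_incidence"].dt.year).size()
rate = per_year / population * 100_000   # missing persons per 100,000 inhabitants

rate.loc[2006:2025].plot(marker="o")
plt.title(f"Missing persons per 100,000 inhabitants, {state}")
plt.xlabel("Year of incidence")
plt.ylabel("Rate per 100,000")
plt.tight_layout()
plt.show()
```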
Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
License information was derived automatically
Under Section 8 of the Missing Persons Act, 2018, police services are required to report annually on their use of urgent demands for records under the Act, and the Ministry of the Solicitor General is required to make the OPP's annual report data publicly available. The data includes:
* year in which the urgent demands were reported
* category of records
* description of records accessed under each category
* total number of times each category of records was demanded
* total number of missing persons investigations which had urgent demands for records
* total number of urgent demands for records made by OPP in a year.
https://dataful.in/terms-and-conditions
The dataset contains the age-group-wise and gender-wise number of persons reported missing in a particular year, the total number of persons missing (including those from previous years), the number of persons recovered/traced, and the number unrecovered/untraced. The dataset also contains the percentage recovery of missing persons, which is calculated as the percentage share of the total number of persons traced over the total number of persons missing. NCRB started providing detailed data on missing and traced persons, including children, from 2016 onwards, following the Supreme Court's direction in a Writ Petition. It should also be noted that the data published by NCRB is restricted to those cases where FIRs have been registered by the police in the respective States/UTs.
NamUs is the only national repository for missing, unidentified, and unclaimed persons cases. The program provides a singular resource hub for law enforcement, medical examiners, coroners, and investigating professionals. It is the only national database for missing, unidentified, and unclaimed persons that allows limited access to the public, empowering family members to take a more proactive role in the search for their missing loved ones.
This data is sourced from the International Organization for Migration. The data is part of a specific project called the Missing Migrants Project, which tracks deaths of migrants, including refugees, who have gone missing along mixed migration routes worldwide. The research behind this project began with the October 2013 tragedies, when at least 368 individuals died in two shipwrecks near the Italian island of Lampedusa. Since then, the Missing Migrants Project has developed into an important hub and advocacy source of information that media, researchers, and the general public access for the latest information.
Missing Migrants Project data are compiled from a variety of sources. Sources vary depending on the region and broadly include data from national authorities, such as Coast Guards and Medical Examiners; media reports; NGOs; and interviews with survivors of shipwrecks. In the Mediterranean region, data are relayed from relevant national authorities to IOM field missions, who then share it with the Missing Migrants Project team. Data are also obtained by IOM and other organizations that receive survivors at landing points in Italy and Greece. In other cases, media reports are used. IOM and UNHCR also regularly coordinate on such data to ensure consistency. Data on the U.S./Mexico border are compiled based on data from U.S. county medical examiners and sheriff’s offices, as well as media reports for deaths occurring on the Mexico side of the border. Estimates within Mexico and Central America are based primarily on media and year-end government reports. Data on the Bay of Bengal are drawn from reports by UNHCR and NGOs. In the Horn of Africa, data are obtained from media and NGOs. Data for other regions are drawn from a combination of sources, including media and grassroots organizations. In all regions, Missing Migrants Project data represent minimum estimates and are potentially lower than the true figures.
Updated data and visuals can be found here: https://missingmigrants.iom.int/
IOM defines a migrant as any person who is moving or has moved across an international border or within a State away from his/her habitual place of residence, regardless of
(1) the person’s legal status;
(2) whether the movement is voluntary or involuntary;
(3) what the causes for the movement are; or
(4) what the length of the stay is.[1]
Missing Migrants Project counts migrants who have died or gone missing at the external borders of states, or in the process of migration towards an international destination. The count excludes deaths that occur in immigration detention facilities, during deportation, or after forced return to a migrant’s homeland, as well as deaths more loosely connected with migrants’ irregular status, such as those resulting from labour exploitation. Migrants who die or go missing after they are established in a new home are also not included in the data, so deaths in refugee camps or housing are excluded. This approach is chosen because deaths that occur at physical borders and while en route represent a more clearly definable category, and inform what migration routes are most dangerous. Data and knowledge of the risks and vulnerabilities faced by migrants in destination countries, including death, should not be neglected, rather tracked as a distinct category.
Data on fatalities during the migration process are challenging to collect for a number of reasons, most stemming from the irregular nature of migratory journeys on which deaths tend to occur. For one, deaths often occur in remote areas on routes chosen with the explicit aim of evading detection. Countless bodies are never found, and rarely do these deaths come to the attention of authorities or the media. Furthermore, when deaths occur at sea, frequently not all bodies are recovered - sometimes with hundreds missing from one shipwreck - and the precise number of missing is often unknown. In 2015, over 50 per cent of deaths recorded by the Missing Migrants Project refer to migrants who are presumed dead and whose bodies have not been found, mainly at sea.
Data are also challenging to collect as reporting on deaths is poor, and the data that does exist are highly scattered. Few official sources are collecting data systematically. Many counts of death rely on media as a source. Coverage can be spotty and incomplete. In addition, the involvement of criminal actors in incidents means there may be fear among survivors to report deaths and some deaths may be actively covered-up. The irregular immigration status of many migrants, and at times their families as well, also impedes reporting of missing persons or deaths.
The vary...
Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
This is official open data from the Ministry of Internal Affairs of the Russian Federation on missing and wanted people and on identified and unidentified corpses. Original data is available at the source.
File meta.csv contains information about the data source and contact information of the original owners, in Russian.
File structure-20140727.csv describes the data structure in Russian. The main things you need to know about the data columns are:
"Name of the statistical factor" - this one speaks for itself. Available factors:
-- Identified persons from among those who were wanted, including those who disappeared from the bodies of inquiry, investigation, court.
-- Total cases on the identification of citizens on unidentified corpses that were on the register.
-- Total wanted persons, including those who disappeared from the bodies of inquiry, investigation, court.
-- Identified persons from among the wanted persons, including those missing.
-- Total wanted persons.
-- Number (balance) of unreturned missing persons in relation to 2011 (%)
-- Number (balance) of unresolved criminals against 2011 (%)
-- Total discontinued cases in connection with the identification of the person
-- Total wanted persons, including those missing
-- Identified persons from the number of wanted persons
"Importance of the statistical factor" - value of correspondent statistical factor.
Files data-%Y%m%d-structure-20140727.csv contain the actual data. The file names include the release date. Data are aggregated by quarters of each year; for example:
data-20150127-structure-20140727.csv - data for the whole of 2014
data-20150627-structure-20140727.csv - data for Q1 and Q2 of 2015
File translate.csv is used to simplify translation from Russian to English. See usage in the kernel.
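A minimal sketch of how one release file and translate.csv might be combined is shown below. Every column name in it is an assumed English stand-in; the real headers are in Russian and documented in structure-20140727.csv.

```python
import pandas as pd

# Sketch of loading one quarterly release and applying the Russian-to-English
# lookup from translate.csv. Every column name below is an English stand-in
# assumption; the real headers are in Russian and documented in
# structure-20140727.csv.
data = pd.read_csv("data-20150127-structure-20140727.csv")   # full-year 2014 release
translate = pd.read_csv("translate.csv")

ru_to_en = dict(zip(translate["russian"], translate["english"]))

# Map the statistical factor names to English so values can be grouped and read easily.
data["factor_en"] = data["factor_name"].map(ru_to_en)
print(data.groupby("factor_en")["factor_value"].sum())
```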
Thanks to newspaper Komsomolskaya Pravda for bringing up the issue of missing kids in Russia.
Thanks to Liza Alert - Volunteer Search and Rescue Squad for efforts in rescue of missing people in Russia.
Photo by Alessio Lin on Unsplash
Missing people, especially children, are a serious problem. However, there is not much detailed information about it. Russian officials provide overall figures without a breakdown by the victim's age. As a result, much speculation appears in the media on this topic.
Some insight into the official data can be found in a 2012 interview: "Annually in Russia about 20 thousand minors disappear, in 90% of cases the police find children".
Still, there is no information about children in recent years. If you have any reliable sources, please share.
https://creativecommons.org/publicdomain/zero/1.0/
This synthetic dataset contains 4,362 rows and five columns, including both numerical and categorical data. It is designed for data cleaning, imputation, and analysis tasks, featuring structured missing values at varying percentages (63%, 4%, 47%, 31%, and 9%).
The dataset includes:
- Category (Categorical): Product category (A, B, C, D)
- Price (Numerical): Randomized product prices
- Rating (Numerical): Ratings between 1 and 5
- Stock (Categorical): Availability status (In Stock, Out of Stock)
- Discount (Numerical): Discount percentage
This dataset is ideal for practicing missing data handling, exploratory data analysis (EDA), and machine learning preprocessing.
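As one possible starting point for the imputation practice mentioned above, here is a minimal pandas sketch; the file name is an assumption, and the column names follow the description.

```python
import pandas as pd

# Minimal imputation sketch for the synthetic dataset described above.
# The file name is an assumption; the column names follow the description.
df = pd.read_csv("synthetic_missing_data.csv")

# Numerical columns: fill missing values with the column median.
for col in ["Price", "Rating", "Discount"]:
    df[col] = df[col].fillna(df[col].median())

# Categorical columns: fill missing values with the most frequent value (mode).
for col in ["Category", "Stock"]:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df.isna().mean())   # remaining fraction of missing values per column (should be 0.0)
```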
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Mississippi Repository for Missing and Unidentified Persons (MS Repository) was developed in January 2022 to help identify, resolve, and archive Mississippi’s missing and unidentified persons cases. The MS Repository, housed at Mississippi State University, serves as a statewide missing and unidentified persons clearinghouse database. The MS Repository is under the purview of the Cobb Institute of Archaeology (including the Department of Anthropology and Middle Eastern Cultures) and the MSU Police Department (MSUPD). In collaboration with law enforcement agencies throughout the state, the goals of the MS Repository are to:
1. Provide a centralized location for data on missing and unidentified persons from Mississippi
2. Increase public access to missing persons information for all Mississippians
3. Visualize socioeconomic and medicolegal disparities affecting missing persons through geospatial analysis
4. Partner with neighboring states to facilitate data sharing of missing and unidentified persons information.
The lack of comprehensive missing and unidentified persons repository data at the state and national levels continues to hinder identifying missing and unidentified people. The MS Repository is the only secure, formalized, searchable Mississippi data repository for unidentified and missing persons information. It includes missing and unidentified persons information from the National Missing and Unidentified Persons System (NamUs), law enforcement missing persons reports on social media, cases from non-profit missing persons advocacy groups, and reports from families with missing loved ones. Like NamUs, the MS Repository provides demographic information about the missing individual and case circumstances, including last seen date and location. Each profile has a built-in capacity for holding copies of medical records and DNA record results (including family reference samples). All profiles (current and resolved) are stored electronically and available in perpetuity, regardless of case status. In addition to the database, there is a searchable clearinghouse website accessible to the public (missinginms.msstate.edu).
The National Incidence Studies of Missing, Abducted, Runaway, and Thrownaway Children (NISMART) were undertaken in response to the mandate of the 1984 Missing Children's Assistance Act (Pub.L. 98-473) that requires the Office of Juvenile Justice and Delinquency Prevention (OJJDP) to conduct periodic national incidence studies to determine the actual number of children reported missing and the number of missing children who are recovered for a given year. The first such study, NISMART-1 (NATIONAL INCIDENCE STUDIES OF MISSING, ABDUCTED, RUNAWAY, AND THROWNAWAY CHILDREN (NISMART), 1988 [ICPSR 9682]), was conducted from 1988 to 1989 and addressed this mandate by defining major types of missing child episodes and estimating the number of children who experienced missing child episodes of each type in 1988. At that time, the lack of a standardized definition of a "missing child" made it impossible to provide a single estimate of missing children. As a result, one of the primary goals of NISMART-2 was to develop a standardized definition and provide unified estimates of the number of missing children in the United States. Both NISMART-1 and NISMART-2 comprise several component datasets designed to provide a comprehensive picture of the population of children who experienced qualifying episodes, with each component focusing on a different aspect of the missing child population. The Household Survey -- Youth Data and the Household Survey -- Adult Data (Parts 1-2) are similar but separate surveys, one administered to the adult primary caretaker of the children in the sampled household and the other to a randomly selected household youth aged 10 through 18 at the time of interview. The Juvenile Facilities Data on Runaways (Part 3) sought to estimate the number of runaways from juvenile residential facilities in order to supplement the household survey estimate of the number of runaways from households. And the Law Enforcement Study Data, by case, perpetrator, and victim (Parts 4-6), were intended to estimate the number of children who were victims of stereotypical kidnappings and to obtain a sample of these cases for in-depth study.
Open Government Licence 3.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/
License information was derived automatically
Note: These statistics are published as Official Statistics. Users should be cautious when making comparisons between local authorities or across years, due to changing reporting practices - see the methodology document for further information. Children looked after who were missing. Figures by duration of missing periods, placement from which the child went missing, and age of child at the start of the missing incident. Data formerly in table G1.
Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Missing 411 is a series of books that describe people who go missing in unusual circumstances. The books contain a plethora of very precise and very comprehensive data about each victim and the circumstances surrounding each particular disappearance.
The dataset records everything that is stated directly or can be inferred about a victim. For example, if people go missing in various parts of Arizona, "tablelands" will often be true, since this is a prominent geological feature of Arizona. However, it will not be marked true unless tableland features are apparent in the case itself or in images of the area in which the case happened.
Thanks to David Paulides for serious empirical rigor. His data is fun to catalogue because there is very little speculation to sort through. He focuses on facts in the form of statements taken from newspapers, police reports, park service reports, and eye-witness accounts. David does not muddle his books with theories and speculation about unrelated phenomena.
I want to see machine learning and human inference used to identify patterns in this data that can help draw an increasingly clear profile of who the attacker is, what causes people to go missing, and what it means for society more broadly. Possible questions to answer:
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve.
The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj.
The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 .
The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 .
The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed.
COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the COVID-19 update.
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age-adjustment is important in Connecticut as the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks, and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates.
The population data used to calculate rates is based on the CT DPH population statistics for 2019, which is available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used.
Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf
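For readers unfamiliar with direct standardization, the toy sketch below works through the idea with made-up counts and only three broad age groups; the report itself uses 19 groups and the 2000 US Standard Million weights.

```python
# Toy sketch of direct age standardization as described above, using made-up
# counts and only three broad age groups instead of the 19 used in the report.
# The standard weights are illustrative stand-ins for the 2000 US Standard Million.
groups = ["0-34", "35-74", "75+"]
deaths = {"0-34": 10, "35-74": 120, "75+": 400}                       # hypothetical deaths
population = {"0-34": 1_500_000, "35-74": 1_600_000, "75+": 250_000}  # hypothetical group sizes
standard_weight = {"0-34": 0.49, "35-74": 0.45, "75+": 0.06}          # illustrative standard weights

# Crude rate: total deaths over total population, per 100,000.
crude_rate = sum(deaths.values()) / sum(population.values()) * 100_000

# Age-adjusted rate: age-specific rates weighted by the standard population.
age_adjusted_rate = sum(
    deaths[g] / population[g] * 100_000 * standard_weight[g] for g in groups
)

print(f"Crude rate: {crude_rate:.1f} per 100,000")
print(f"Age-adjusted rate: {age_adjusted_rate:.1f} per 100,000")
```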
Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age adjusted rates calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic.
Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records. Cause of death was determined by a death certifier (e.g., physician, APRN, medical examiner) using their best clinical judgment. Additionally, all COVID-19 deaths, including suspected or related, are required to be reported to OCME. On April 4, 2020, CT DPH and OCME released a joint memo to providers and facilities within Connecticut providing guidelines for certifying deaths due to COVID-19 that were consistent with the CDC’s guidelines and a reminder of the required reporting to OCME. As of July 1, 2021, OCME had reviewed every case reported and performed additional investigation on about one-third of reported deaths to better ascertain if COVID-19 did or did not cause or contribute to the death. Some of these investigations resulted in the OCME performing postmortem swabs for PCR testing on individuals whose deaths were suspected to be due to COVID-19, but an antemortem diagnosis could not be made. The OCME issued or re-issued about 10% of COVID-19 death certificates and, when appropriate, removed COVID-19 from the death certificate. For standardization and tabulation of mortality statistics, written cause of death statements made by the certifiers on death certificates are sent to the National Center for Health Statistics (NCHS) at the CDC, which assigns cause of death codes according to the International Classification of Diseases, 10th Revision (ICD-10). COVID-19 deaths in this report are defined as those for which the death certificate has an ICD-10 code of U07.1 as either a primary (underlying) or a contributing cause of death. More information on COVID-19 mortality can be found at the following link: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Mortality/Mortality-Statistics
Data are subject to future revision as reporting changes.
Starting in July 2020, this dataset will be updated every weekday.
Additional notes: A delay in the data pull schedule occurred on 06/23/2020. Data from 06/22/2020 was processed on 06/23/2020 at 3:30 PM. The normal data cycle resumed with the data for 06/23/2020.
A network outage on 05/19/2020 resulted in a change in the data pull schedule. Data from 5/19/2020 was processed on 05/20/2020 at 12:00 PM. Data from 5/20/2020 was processed on 5/20/2020 8:30 PM. The normal data cycle resumed on 05/20/2020 with the 8:30 PM data pull. As a result of the network outage, the timestamp on the datasets on the Open Data Portal differ from the timestamp in DPH's daily PDF reports.
Starting 5/10/2021, the date field will represent the date this data was updated on data.ct.gov. Previously the date the data was pulled by DPH was listed, which typically coincided with the date before the data was published on data.ct.gov. This change was made to standardize the COVID-19 data sets on data.ct.gov.
NASA's Making Earth System Data Records for Use in Research Environments (MEaSUREs) Global Land Cover Mapping and Estimation (GLanCE) annual 30 meter (m) Version 1 data product provides global land cover and land cover change data derived from Landsat 5 Thematic Mapper (TM), Landsat 7 Enhanced Thematic Mapper Plus (ETM+), and Landsat 8 Operational Land Imager (OLI). These maps provide the user community with land cover type, land cover change, metrics characterizing the magnitude and seasonality of greenness of each pixel, and the magnitude of change. GLanCE data products will be provided using a set of seven continental grids that use Lambert Azimuthal Equal Area projections parameterized to minimize distortion for each continent. Currently, North America, South America, Europe, and Oceania are available. This dataset is useful for a wide range of applications, including ecosystem, climate, and hydrologic modeling; monitoring the response of terrestrial ecosystems to climate change; carbon accounting; and land management. The GLanCE data product provides seven layers: the land cover class, the estimated day of year of change, an integer identifier for the class in the previous year, the median and amplitude of the Enhanced Vegetation Index (EVI2) in the year, the rate of change in EVI2, and the change in EVI2 median from the previous year to the current year. A low-resolution browse image representing EVI2 amplitude is also available for each granule.
Known Issues:
* Version 1.0 of the data set does not include Quality Assurance, Leaf Type, or Leaf Phenology. These layers are populated with fill values and will be included in future releases of the data product.
* Science Data Set (SDS) values may be missing, or of lower quality, in years when land cover change occurs. This issue is a by-product of the fact that Continuous Change Detection and Classification (CCDC) does not fit models or provide synthetic reflectance values during short periods of time between time segments.
* The accuracy of mapping results varies by land cover class and geography. Specifically, distinguishing between shrubs and herbaceous cover is challenging at high latitudes and in arid and semi-arid regions. Hence, the accuracy of shrub cover, herbaceous cover, and to some degree bare cover, is lower than for other classes.
* Due to the combined effects of large solar zenith angles, short growing seasons, and lower availability of high-resolution imagery to support training data, the representation of land cover at high latitudes in the GLanCE product is lower than in mid latitudes.
* Shadows and large variation in local zenith angles decrease the accuracy of the GLanCE product in regions with complex topography, especially at high latitudes.
* Mapping results may include artifacts from variation in data density in overlap zones between Landsat scenes relative to mapping results in non-overlap zones.
* Regions with low observation density due to cloud cover, especially in the tropics, and/or poor data density (e.g. Alaska, Siberia, West Africa) have lower map quality.
* Artifacts from the Landsat 7 Scan Line Corrector failure are occasionally evident in the GLanCE map product. High proportions of missing data in regions with snow and ice at high elevations result in missing data in the GLanCE SDSs.
* The GLanCE data product tends to modestly overpredict developed land cover in arid regions.
ODC Public Domain Dedication and Licence (PDDL) v1.0: http://www.opendatacommons.org/licenses/pddl/1.0/
License information was derived automatically
A. SUMMARY This dataset includes COVID-19 tests by resident neighborhood and specimen collection date (the day the test was collected). Specifically, this dataset includes tests of San Francisco residents who listed a San Francisco home address at the time of testing. These resident addresses were then geo-located and mapped to neighborhoods. The resident address associated with each test is hand-entered and susceptible to errors, therefore neighborhood data should be interpreted as an approximation, not a precise or comprehensive total.
In recent months, about 5% of tests are missing addresses and therefore cannot be included in any neighborhood totals. In earlier months, more tests were missing address data. Because of this high percentage of tests missing resident address data, the neighborhood testing data for March, April, and May should be interpreted with caution (see below).
Percentage of tests missing address information, by month in 2020:
- Mar - 33.6%
- Apr - 25.9%
- May - 11.1%
- Jun - 7.2%
- Jul - 5.8%
- Aug - 5.4%
- Sep - 5.1%
- Oct (Oct 1-12) - 5.1%
To protect the privacy of residents, the City does not disclose the number of tests in neighborhoods with resident populations of fewer than 1,000 people. These neighborhoods are omitted from the data (they include Golden Gate Park, John McLaren Park, and Lands End).
Tests for residents that listed a Skilled Nursing Facility as their home address are not included in this neighborhood-level testing data. Skilled Nursing Facilities have required and repeated testing of residents, which would change neighborhood trends and not reflect the broader neighborhood's testing data.
This data was de-duplicated by individual and date, so if a person gets tested multiple times on different dates, all tests will be included in this dataset (on the day each test was collected).
The total number of positive test results is not equal to the total number of COVID-19 cases in San Francisco. During this investigation, some test results are found to be for persons living outside of San Francisco and some people in San Francisco may be tested multiple times (which is common). To see the number of new confirmed cases by neighborhood, reference this map: https://sf.gov/data/covid-19-case-maps#new-cases-maps
B. HOW THE DATASET IS CREATED COVID-19 laboratory test data is based on electronic laboratory test reports. Deduplication, quality assurance measures and other data verification processes maximize accuracy of laboratory test information. All testing data is then geo-coded by resident address. Then data is aggregated by analysis neighborhood and specimen collection date.
Data are prepared by close of business Monday through Saturday for public display.
C. UPDATE PROCESS Updates automatically at 05:00 Pacific Time each day. Redundant runs are scheduled at 07:00 and 09:00 in case of pipeline failure.
D. HOW TO USE THIS DATASET San Francisco population estimates for geographic regions can be found in a view based on the San Francisco Population and Demographic Census dataset. These population estimates are from the 2016-2020 5-year American Community Survey (ACS).
Due to the high degree of variation in the time needed to complete tests by different labs, there is a delay in this reporting. On March 24, the Health Officer ordered all labs in the City to report complete COVID-19 testing information to the local and state health departments.
In order to track trends over time, a data user can analyze this data by "specimen_collection_date".
Calculating Percent Positivity: The positivity rate is the percentage of tests that return a positive result for COVID-19 (positive tests divided by the sum of positive and negative tests). Indeterminate results, which could not conclusively determine whether COVID-19 virus was present, are not included in the calculation of percent positive. Percent positivity indicates how widespread COVID-19 is in San Francisco and it helps public health officials determine if we are testing enough given the number of people who are testing positive. When there are fewer than 20 positives tests for a given neighborhood and time period, the positivity rate is not calculated for the public tracker because rates of small test counts are less reliable.
Calculating Testing Rates: To calculate the testing rate per 10,000 residents, divide the total number of tests collected (positive, negative, and indeterminate results) for neighborhood by the total number of residents who live in that neighborhood (included in the dataset), then multiply by 10,000. When there are fewer than 20 total tests for a given neighborhood and time period, the testing rate is not calculated for the public tracker because rates of small test counts are less reliable.
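A minimal sketch of both calculations described above is shown below; the file names, column names, and result labels are assumptions rather than the published schema.

```python
import pandas as pd

# Sketch of the two calculations described above. The file names, column names
# (neighborhood, result, acs_population), and result labels are assumptions,
# not the published schema.
tests = pd.read_csv("covid_tests_by_neighborhood.csv")

summary = tests.groupby("neighborhood")["result"].value_counts().unstack(fill_value=0)
summary["total_tests"] = summary.sum(axis=1)

# Percent positivity: positives / (positives + negatives); indeterminate results excluded.
summary["pct_positive"] = (
    summary["positive"] / (summary["positive"] + summary["negative"]) * 100
)

# Testing rate per 10,000 residents, using ACS population estimates by neighborhood.
population = pd.read_csv("neighborhood_population.csv").set_index("neighborhood")["acs_population"]
summary["tests_per_10k"] = summary["total_tests"] / population * 10_000

# Mirror the public tracker's suppression rule for small counts.
summary.loc[summary["positive"] < 20, "pct_positive"] = float("nan")
summary.loc[summary["total_tests"] < 20, "tests_per_10k"] = float("nan")

print(summary[["pct_positive", "tests_per_10k"]].head())
```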
Read more about how this data is updated and validated daily: https://sf.gov/information/covid-19-data-questions
E. CHANGE LOG
A current-year-only universe of Cook County parcels with attached geographic, governmental, and spatial data. When working with Parcel Index Numbers (PINs), make sure to zero-pad them to 14 digits; some datasets may lose leading zeros for PINs when downloaded (see the sketch below).
Additional notes:
- Non-taxing district data is attached via spatial join (st_contains) to each parcel's centroid. Tax district data (school district, park district, municipality, etc.) are attached by a parcel's assigned tax code.
- Centroids are based on Cook County parcel shapefiles. Older properties may be missing coordinates and thus also missing attached spatial data (usually they are missing a parcel boundary in the shapefile). Newer properties may be missing a mailing or property address, as they need to be assigned one by the postal service.
- This dataset contains data for the current tax year, which may not yet be complete or final. Assessed values for any given year are subject to change until review and certification of values by the Cook County Board of Review, though there are a few rare circumstances where values may change for the current or past years after that. Rowcount for a given year is final once the Assessor has certified the assessment roll for all townships.
- Data will be updated monthly.
- Depending on the time of year, some third-party and internal data will be missing for the most recent year. Assessments mailed this year represent values from last year, so this isn't an issue. By the time the Data Department models values for this year, those data will have populated.
- Current property class codes, their levels of assessment, and descriptions can be found on the Assessor's website. Note that class code details can change across time.
- Due to discrepancies between the systems used by the Assessor and Clerk's offices, tax_district_code is not currently up-to-date in this table.
- There are currently two different sources of parcel-level municipality available in this data set, and they will not always agree: tax and spatial records. Tax records from the Cook County Clerk indicate the municipality to which a parcel owner pays taxes, while spatial records, also from the Cook County Clerk, indicate the municipal boundaries within which a parcel lies.
- For more information on the sourcing of attached data and the preparation of this dataset, see the Assessor's Standard Operating Procedures for Open Data on GitHub. Read about the Assessor's 2025 Open Data Refresh.
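A minimal sketch of the zero-padding recommendation above, assuming a hypothetical export file and a column named pin:

```python
import pandas as pd

# Restore leading zeros on Parcel Index Numbers (PINs) after a CSV download.
# The file name and the column name "pin" are assumptions about the export.
parcels = pd.read_csv("cook_county_parcels.csv", dtype={"pin": str})
parcels["pin"] = parcels["pin"].str.zfill(14)   # zero-pad every PIN to 14 digits

assert parcels["pin"].str.len().eq(14).all()
```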
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.3/customlicense?persistentId=doi:10.7910/DVN/WIYLEH
Originally published by Harte-Hanks, the CiTDS dataset is now produced by Aberdeen Group, a subsidiary of Spiceworks Ziff Davis (SWZD). It is also referred to as CiTDB (Computer Intelligence Technology Database). CiTDS provides data on digital investments of businesses across the globe. It includes two types of technology datasets: (i) hardware expenditures and (ii) product installs. Hardware expenditure data is constructed through a combination of surveys and modeling. A survey is administered to a number of companies, and the data from surveys is used to develop a prediction model of expenditures as a function of firm characteristics. CiTDS uses this model to predict the expenditures of non-surveyed firms and reports them in the dataset. In contrast, CiTDS does not do any imputation for product install data, which comes entirely from web scraping and surveys. A confidence score between 1-3 is assigned to indicate how much the source of information can be trusted: a 3 corresponds to 90-100 percent install likelihood, 2 corresponds to 75-90 percent install likelihood, and 1 corresponds to 65-75 percent install likelihood. CiTDS reports technology adoption at the site level with a unique DUNS identifier. One of these sites is identified as an “enterprise,” corresponding to the firm that owns the sites. Therefore, it is possible to analyze technology adoption both at the site (establishment) and enterprise (firm) levels. CiTDS sources the site population from Dun and Bradstreet every year and drops sites that are not relevant to their clients. Due to this sample selection, there is quite a bit of variation in the number of sites from year to year, where, on average, 10-15 percent of sites enter and exit every year in the US data. This number is higher in the EU data. We observe similar year-to-year turnover in the products included in the dataset: some products have become obsolete, and some new products are added every year. There are two versions of the data: (i) version 3, which covers 2016-2020, and (ii) version 4, which covers 2020-2021. The quality of version 4 is significantly better regarding the information included about the technology products. In version 3, product categories have missing values, and they are abbreviated in a way that is sometimes difficult to interpret. Version 4 does not have any major issues. Since both versions of the data are available in 2020, CiTDS provides a crosswalk between the versions. This makes it possible to use information about products in Version 4 for the products in Version 3, with the caveat that there will be no crosswalk for the products that exist in 2016-2019 but not in 2020. Finally, special attention should be paid to data from 2016, where the coverage is significantly different from 2017. From 2017 onwards, coverage is more consistent.
Years of Coverage:
- APac: 2019 - 2021
- Canada: 2015 - 2021
- EMEA: 2019 - 2021
- Europe: 2015 - 2018
- Latin America: 2015, 2019 - 2021
- United States: 2015 - 2021
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The monitoring of surface-water quality followed by water-quality modeling and analysis is essential for generating effective strategies in water resource management. However, water-quality studies are limited by the lack of complete and reliable data sets on surface-water-quality variables. These deficiencies are particularly noticeable in developing countries.
This work focuses on surface-water-quality data from the Santa Lucía Chico river (Uruguay), a mixed lotic and lentic river system. Data collected at six monitoring stations are publicly available at https://www.dinama.gub.uy/oan/datos-abiertos/calidad-agua/. The high temporal and spatial variability that characterizes water-quality variables and the high rate of missing values (between 50% and 70%) raise significant challenges.
To deal with missing values, we applied several statistical and machine-learning imputation methods. The competing algorithms implemented belonged to both univariate and multivariate imputation methods (inverse distance weighting (IDW), Random Forest Regressor (RFR), Ridge (R), Bayesian Ridge (BR), AdaBoost (AB), Huber Regressor (HR), Support Vector Regressor (SVR), and K-nearest neighbors Regressor (KNNR)).
IDW outperformed the others, achieving a very good performance (NSE greater than 0.8) in most cases.
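For reference, the sketch below computes the Nash-Sutcliffe efficiency (NSE) used to judge that performance; it is the standard formula with made-up example numbers, not code from the paper.

```python
import numpy as np

# Sketch of the Nash-Sutcliffe efficiency (NSE) used above to score the
# imputations. This is the standard formula, not code from the paper; the
# example numbers are made up.
def nash_sutcliffe_efficiency(observed, estimated):
    observed = np.asarray(observed, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    # 1 - (sum of squared errors) / (total variance of the observations)
    return 1.0 - np.sum((observed - estimated) ** 2) / np.sum((observed - observed.mean()) ** 2)

obs = np.array([7.2, 6.8, 7.5, 8.1, 6.9])   # held-out observed values
imp = np.array([7.0, 6.9, 7.4, 8.0, 7.1])   # imputed values for the same points
print(f"NSE = {nash_sutcliffe_efficiency(obs, imp):.3f}")   # values above 0.8 read as very good
```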
In this dataset, we include the original and imputed values for the following variables:
Water temperature (Tw)
Dissolved oxygen (DO)
Electrical conductivity (EC)
pH
Turbidity (Turb)
Nitrite (NO2-)
Nitrate (NO3-)
Total Nitrogen (TN)
Each variable is identified as [STATION] VARIABLE FULL NAME (VARIABLE SHORT NAME) [UNIT METRIC].
More details about the study area, the original datasets, and the methodology adopted can be found in our paper https://www.mdpi.com/2071-1050/13/11/6318.
If you use this dataset in your work, please cite our paper:
Rodríguez, R.; Pastorini, M.; Etcheverry, L.; Chreties, C.; Fossati, M.; Castro, A.; Gorgoglione, A. Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach. Sustainability 2021, 13, 6318. https://doi.org/10.3390/su13116318