In 2024, there were 301,623 cases filed by the National Crime Information Center (NCIC) where the race of the reported missing person was white. In the same year, 17,097 people whose race was unknown were also reported missing in the United States.
What is the NCIC?
The National Crime Information Center (NCIC) is a digital database that stores crime data for the United States so that criminal justice agencies can access it. As a part of the FBI, it helps criminal justice professionals find criminals, missing people, stolen property, and terrorists. The NCIC database is broken down into 21 files. Seven files cover stolen property and items, and 14 cover persons, including the National Sex Offender Registry, Missing Person, and Identity Theft files. It works alongside federal, tribal, state, and local agencies. The NCIC’s goal is to maintain a centralized information system between local branches and offices, so information is easily accessible nationwide.
Missing people in the United States
A person is considered missing when they have disappeared and their location is unknown. A person who is considered missing might have left voluntarily, but that is not always the case. The number of NCIC unidentified person files in the United States has fluctuated since 1990, and in 2022, there were slightly more NCIC missing person files for males than for females. Fortunately, the number of NCIC missing person files has been mostly decreasing since 1998.
In 2023, the number of missing person files in the United States totaled 563,389, an increase from 2021, which had the lowest number of missing person files in the U.S. since 1990.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comprehensive dataset containing 81 verified missing persons organizations in the United States with complete contact information, ratings, reviews, and location data.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The Mississippi Repository for Missing and Unidentified Persons (MS Repository) was developed in January 2022 to help identify, resolve, and archive Mississippi’s missing and unidentified persons cases. The MS Repository, housed at Mississippi State University, serves as a statewide missing and unidentified persons clearinghouse database. The MS Repository is under the purview of the Cobb Institute of Archaeology (including the Department of Anthropology and Middle Eastern Cultures) and the MSU Police Department (MSUPD). In collaboration with law enforcement agencies throughout the state, the goals of the MS Repository are to:
1. Provide a centralized location for data on missing and unidentified persons from Mississippi
2. Increase public access to missing persons information for all Mississippians
3. Visualize socioeconomic and medicolegal disparities affecting missing persons through geospatial analysis
4. Partner with neighboring states to facilitate data sharing of missing and unidentified persons information
The lack of comprehensive missing and unidentified persons repository data at the state and national levels continues to hinder the identification of missing and unidentified people. The MS Repository is the only secure, formalized, searchable Mississippi data repository for unidentified and missing persons information. It includes missing and unidentified persons information from the National Missing and Unidentified Persons System (NamUs), law enforcement missing persons reports on social media, cases from non-profit missing persons advocacy groups, and reports from families with missing loved ones. Like NamUs, the MS Repository provides demographic information about the missing individual and case circumstances, including last seen date and location. Each profile has a built-in capacity for holding copies of medical records and DNA record results (including family reference samples).
All profiles (current and resolved) are stored electronically and available in perpetuity, regardless of case status. In addition to the database, there is a searchable clearinghouse website accessible to the public (missinginms.msstate.edu).
This data collection represents the empirical materials collected from the ESRC project 'Geographies of Missing People'. It comprises interviews with 45 people previously reported as missing, 9 charity workers, 23 police officers of various ranks and 25 families of missing people. We request that researchers who wish to reuse our data contact the research team to discuss how and why they want to reuse it. The data is accessible with direct permission from the PI of the original ESRC award: Hester.parr@glasgow.ac.uk
This project seeks to understand the realities involved in 'going missing', and does so from multiple perspectives, using the voices and opinions of the police, families and returned missing people themselves. Qualitative data has been collected to shed light on this significant social (and spatial) problem and help us understand more about the nature of missing experiences for different groups. The purpose of the research project has been to understand more about how people go missing and how the police and families respond to such events (the geographies of searching). Such a focus holds value for both the police and families (the 'left behind') in that it updates and checks current knowledge about the likely spatial experiences of missing people. The project recruited 45 people formally reported as missing, 9 charity workers in the field of missing persons, 23 police officers of various ranks and 25 family members; the resulting materials are held by the data archive service. Permission to access can be requested from Hester.parr@glasgow.ac.uk. Data were collected through interviews and focus groups. Sampling methods are profiled in the main reports lodged on www.geographiesofmissingpeople.org.uk
Note: DPH is updating and streamlining the COVID-19 cases, deaths, and testing data. As of 6/27/2022, the data will be published in four tables instead of twelve. The COVID-19 Cases, Deaths, and Tests by Day dataset contains cases and test data by date of sample submission. The death data are by date of death. This dataset is updated daily and contains information back to the beginning of the pandemic. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Cases-Deaths-and-Tests-by-Day/g9vi-2ahj. The COVID-19 State Metrics dataset contains over 93 columns of data. This dataset is updated daily and currently contains information starting June 21, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-State-Level-Data/qmgw-5kp6 . The COVID-19 County Metrics dataset contains 25 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-County-Level-Data/ujiq-dy22 . The COVID-19 Town Metrics dataset contains 16 columns of data. This dataset is updated daily and currently contains information starting June 16, 2022 to the present. The data can be found at https://data.ct.gov/Health-and-Human-Services/COVID-19-Town-Level-Data/icxw-cada . To protect confidentiality, if a town has fewer than 5 cases or positive NAAT tests over the past 7 days, those data will be suppressed. This dataset covers COVID-19 cases and associated deaths that have been reported among Connecticut residents, broken down by race and ethnicity. All data in this report are preliminary; data for previous dates will be updated as new reports are received and data errors are corrected. Deaths reported to either the Office of the Chief Medical Examiner (OCME) or the Department of Public Health (DPH) are included in the COVID-19 update.
The following data show the number of COVID-19 cases and associated deaths per 100,000 population by race and ethnicity. Crude rates represent the total cases or deaths per 100,000 people. Age-adjusted rates consider the age of the person at diagnosis or death when estimating the rate and use a standardized population to provide a fair comparison between population groups with different age distributions. Age adjustment is important in Connecticut because the median age among the non-Hispanic white population is 47 years, whereas it is 34 years among non-Hispanic blacks and 29 years among Hispanics. Because most non-Hispanic white residents who died were over 75 years of age, the age-adjusted rates are lower than the unadjusted rates. In contrast, Hispanic residents who died tend to be younger than 75 years of age, which results in higher age-adjusted rates. The population data used to calculate rates are based on the CT DPH population statistics for 2019, which are available online here: https://portal.ct.gov/DPH/Health-Information-Systems--Reporting/Population/Population-Statistics. Prior to 5/10/2021, the population estimates from 2018 were used. Rates are standardized to the 2000 US Millions Standard population (data available here: https://seer.cancer.gov/stdpopulations/). Standardization was done using 19 age groups (0, 1-4, 5-9, 10-14, ..., 80-84, 85 years and older). More information about direct standardization for age adjustment is available here: https://www.cdc.gov/nchs/data/statnt/statnt06rv.pdf Categories are mutually exclusive. The category “multiracial” includes people who answered ‘yes’ to more than one race category. Counts may not add up to total case counts as data on race and ethnicity may be missing. Age-adjusted rates are calculated only for groups with more than 20 deaths. Abbreviation: NH=Non-Hispanic. Data on Connecticut deaths were obtained from the Connecticut Deaths Registry maintained by the DPH Office of Vital Records.
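The difference between crude and age-adjusted rates described above can be sketched in a few lines of Python. This is only an illustration of direct standardization under simplifying assumptions: it uses three made-up age groups rather than the 19 groups named in the description, and the counts, populations, and standard-population weights are hypothetical values, not figures from the dataset.

```python
def crude_rate(events, population):
    """Total events per 100,000 people, ignoring age structure."""
    return 100_000 * sum(events) / sum(population)


def age_adjusted_rate(events, population, std_population):
    """Directly standardized rate per 100,000.

    Each age group's own rate is weighted by that group's share
    of the standard population, then the weighted rates are summed.
    """
    total_std = sum(std_population)
    return 100_000 * sum(
        (e / p) * (s / total_std)
        for e, p, s in zip(events, population, std_population)
    )


# Hypothetical example: an older-than-standard population (NOT real data).
deaths = [1, 6, 100]                      # youngest to oldest age group
population = [20_000, 30_000, 50_000]     # group sizes in the studied population
standard = [40_000, 40_000, 20_000]       # hypothetical standard population

crude = crude_rate(deaths, population)
adjusted = age_adjusted_rate(deaths, population, standard)
print(f"crude: {crude:.1f}, age-adjusted: {adjusted:.1f}")
```

Because the made-up population is older than the made-up standard, the adjusted rate comes out lower than the crude rate, which mirrors the pattern the description reports for the non-Hispanic white population.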
Cause of death was determined by a death certifier (e.g., physician, APRN, medical
The Integrated Public Use Microdata Series (IPUMS) Complete Count Data include more than 650 million individual-level and 7.5 million household-level records. The microdata are the result of collaboration between IPUMS and the nation’s two largest genealogical organizations, Ancestry.com and FamilySearch, and provide the largest and richest source of individual-level and household data.
All manuscripts (and other items you'd like to publish) must be submitted to phsdatacore@stanford.edu for approval prior to journal submission.
We will check your cell sizes and citations.
For more information about how to cite PHS and PHS datasets, please visit: https://phsdocs.developerhub.io/need-help/citing-phs-data-core
This dataset was created on 2020-01-10 22:52:11.461 by merging multiple datasets together. The source datasets for this version were:
IPUMS 1930 households: This dataset includes all households from the 1930 US census.
IPUMS 1930 persons: This dataset includes all individuals from the 1930 US census.
IPUMS 1930 Lookup: This dataset includes variable names, variable labels, variable values, and corresponding variable value labels for the IPUMS 1930 datasets.
Historic data are scarce and often exist only in aggregate tables. The key advantage of historic US census data is the availability of individual and household level characteristics that researchers can tabulate in ways that benefit their specific research questions. The data contain demographic variables, economic variables, migration variables and family variables. Within households, it is possible to create relational data as all relations between household members are known. For example, having data on the mother and her children in a household enables researchers to calculate the mother’s age at birth. Another advantage of the Complete Count data is the possibility to follow individuals over time using a historical identifier.
In sum: the historic US census data are a unique source for research on social and economic change and can provide population health researchers with information about social and economic determinants.
The historic US 1930 census data were collected in April 1930. Enumerators collected data by traveling to households and counting the residents who regularly slept at the household. Individuals lacking permanent housing were counted as residents of the place where they were when the data were collected. Household members absent on the day of data collection were either listed to the household with the help of other household members or were scheduled for the last census subdivision.
Notes
We provide IPUMS household and person data separately so that it is convenient to explore the descriptive statistics on each level. To obtain a full dataset, merge the household and person datasets on the variables SERIAL and SERIALP. To create a longitudinal dataset, merge datasets on the variable HISTID.
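The household-to-person merge can be sketched as a simple key lookup. The example below assumes that SERIAL is the household identifier in the household file and SERIALP is the matching key in the person file; that reading, the function name, and the toy rows are assumptions for illustration and are not taken from the IPUMS documentation. FARM and AGE are variable names that appear in this description, but the values shown are made up.

```python
def merge_households_persons(households, persons):
    """Attach the matching household record to every person record.

    Assumes households are keyed by SERIAL and persons carry the
    same identifier in SERIALP (an assumption, see the lead-in).
    """
    by_serial = {h["SERIAL"]: h for h in households}
    merged = []
    for person in persons:
        household = by_serial.get(person["SERIALP"], {})
        # Household fields first, then person fields (person wins on clashes).
        merged.append({**household, **person})
    return merged


# Toy rows with made-up values.
households = [{"SERIAL": 1, "FARM": 1}, {"SERIAL": 2, "FARM": 2}]
persons = [
    {"SERIALP": 1, "AGE": 34},
    {"SERIALP": 1, "AGE": 6},
    {"SERIALP": 2, "AGE": 51},
]

full = merge_households_persons(households, persons)
for row in full:
    print(row)
```

With a dataframe library the same step is usually a single join on the two key columns; the dictionary version is used here only to keep the sketch dependency-free.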
Households with more than 60 people in the original data were broken up for processing purposes. Every person in the large households is considered to be in their own household. The original large households can be identified using the variable SPLIT, reconstructed using the variable SPLITHID, and the original count is found in the variable SPLITNUM.
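A rough sketch of regrouping split households follows. It assumes that SPLIT equals 1 for persons who came from a broken-up household and that SPLITHID holds the identifier of the original household; both codings are assumptions that should be checked against the variable documentation, and the rows are made up.

```python
from collections import defaultdict


def rebuild_large_households(persons):
    """Group persons from split households back under their original household.

    Assumes SPLIT == 1 flags a split household and SPLITHID is the
    original household identifier (assumed codings, not confirmed here).
    """
    groups = defaultdict(list)
    for person in persons:
        if person["SPLIT"] == 1:
            groups[person["SPLITHID"]].append(person)
    return dict(groups)


# Toy rows with made-up values.
persons = [
    {"SERIALP": 10, "SPLIT": 1, "SPLITHID": 900},
    {"SERIALP": 11, "SPLIT": 1, "SPLITHID": 900},
    {"SERIALP": 12, "SPLIT": 0, "SPLITHID": 0},
    {"SERIALP": 13, "SPLIT": 1, "SPLITHID": 901},
]

original = rebuild_large_households(persons)
print({hid: len(members) for hid, members in original.items()})
```

In real data, the size of each rebuilt group could then be compared with SPLITNUM as a consistency check.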
Coded variables derived from string variables are still in progress. These variables include: occupation and industry.
Missing observations have been allocated and some inconsistencies have been edited for the following variables: SPEAKENG, YRIMMIG, CITIZEN, AGEMARR, AGE, BPL, MBPL, FBPL, LIT, SCHOOL, OWNERSHP, FARM, EMPSTAT, OCC1950, IND1950, MTONGUE, MARST, RACE, SEX, RELATE, CLASSWKR. The flag variables indicating an allocated observation for the associated variables can be included in your extract by clicking the ‘Select data quality flags’ box on the extract summary page.
Most inconsistent information was not edite
This dataset contains information on antibody testing for COVID-19: the number of people who received a test, the number of people with positive results, the percentage of people tested who tested positive, and the rate of testing per 100,000 people, stratified by modified ZIP Code Tabulation Area (ZCTA) of residence. Modified ZCTA reflects the first non-missing address within NYC for each person reported with an antibody test result. This unit of geography is similar to ZIP codes but combines census blocks with smaller populations to allow more stable estimates of population size for rate calculation. It can be challenging to map data that are reported by ZIP Code. A ZIP Code doesn’t refer to an area, but rather a collection of points that make up a mail delivery route. Furthermore, there are some buildings that have their own ZIP Code, and some non-residential areas with ZIP Codes. To deal with the challenges of ZIP Codes, the Health Department uses ZCTAs which solidify ZIP codes into units of area. Often, data reported by ZIP code are actually mapped by ZCTA. The ZCTA geography was developed by the U.S. Census Bureau. These data can also be accessed here: https://github.com/nychealth/coronavirus-data/blob/master/totals/antibody-by-modzcta.csv Exposure to COVID-19 can be detected by measuring antibodies to the disease in a person’s blood, which can indicate that a person may have had an immune response to the virus. Antibodies are proteins produced by the body’s immune system that can be found in the blood. People can test positive for antibodies after they have been exposed, sometimes when they no longer test positive for the virus itself. It is important to note that the science around COVID-19 antibody tests is evolving rapidly and there is still much uncertainty about what individual antibody test results mean for a single person and what population-level antibody test results mean for understanding the epidemiology of COVID-19 at a population level.
These data only provide information on people tested. People receiving an antibody test do not reflect all people in New York City; therefore, these data may not reflect antibody prevalence among all New Yorkers. Increasing instances of screening programs further impact the generalizability of these data, as screening programs influence who and how many people are tested over time. Examples of screening programs in NYC include: employers screening their workers (e.g., hospitals), and long-term care facilities screening their residents.
In addition, there may be potential biases toward people receiving an antibody test who have a positive result because people who were previously ill are preferentially seeking testing, in addition to the testing of persons with higher exposure (e.g., health care workers, first responders).
Rates were calculated using interpolated intercensal population estimates updated in 2019. These rates differ from previously reported rates based on the 2000 Census or previous versions of population estimates. The Health Department produced these population estimates based on estimates from the U.S. Census Bureau and NYC Department of City Planning.
Antibody tests are categorized based on the date of specimen collection and are aggregated by full weeks starting each Sunday and ending on Saturday. For example, a person whose blood was collected for antibody testing on Wednesday, May 6 would be categorized as tested during the week ending May 9. A person tested twice in one week would only be counted once in that week. This dataset includes testing data beginning April 5, 2020.
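The weekly bucketing described above can be illustrated with the standard library. The function names and the `(person_id, date)` input shape are hypothetical choices for this sketch; the year 2020 is inferred from the dataset's start date, since May 6, 2020 fell on a Wednesday.

```python
from datetime import date, timedelta


def week_ending(specimen_date):
    """Return the Saturday that closes the Sunday-to-Saturday week of a date."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    days_until_saturday = (5 - specimen_date.weekday()) % 7
    return specimen_date + timedelta(days=days_until_saturday)


def weekly_people_tested(tests):
    """Count distinct people per week; repeat tests in one week count once."""
    seen = {(person_id, week_ending(d)) for person_id, d in tests}
    counts = {}
    for _, week in seen:
        counts[week] = counts.get(week, 0) + 1
    return counts


# Made-up test records: person "a" is tested twice in the same week.
tests = [
    ("a", date(2020, 5, 4)),
    ("a", date(2020, 5, 6)),
    ("b", date(2020, 5, 6)),
    ("a", date(2020, 5, 12)),
]
print(week_ending(date(2020, 5, 6)))
print(weekly_people_tested(tests))
```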
Data are updated daily, and the dataset preserves historical records and source data changes, so each extract date reflects the current copy of the data as of that date. For example, an extract date of 11/04/2020 and extract date of 11/03/2020 will both contain all records as they were as of that extract date. Without filtering or grouping by extract date, an analysis will almost certainly be miscalculating or counting the same values multiple times. To analyze the most current data, only use the latest extract date. Antibody tests that are missing dates are not included in the dataset; as dates are identified, these events are added. Lags between occurrence and report of cases and tests can be assessed by comparing counts and rates across multiple data extract dates.
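The double-counting risk described above can be shown with a small filter that keeps only the most recent extract. The field names (`extract_date`, `modzcta`, `num_tested`) and the counts are hypothetical; the MM/DD/YYYY format follows the example dates in the description.

```python
from datetime import datetime


def latest_extract(rows, date_field="extract_date"):
    """Keep only the rows that belong to the most recent extract date."""
    def parse(row):
        return datetime.strptime(row[date_field], "%m/%d/%Y")

    newest = max(parse(row) for row in rows)
    return [row for row in rows if parse(row) == newest]


# Made-up rows: the same two areas appear once per extract date.
rows = [
    {"extract_date": "11/03/2020", "modzcta": "10001", "num_tested": 100},
    {"extract_date": "11/04/2020", "modzcta": "10001", "num_tested": 104},
    {"extract_date": "11/03/2020", "modzcta": "10002", "num_tested": 50},
    {"extract_date": "11/04/2020", "modzcta": "10002", "num_tested": 51},
]

current = latest_extract(rows)
naive_total = sum(r["num_tested"] for r in rows)         # mixes both extracts
correct_total = sum(r["num_tested"] for r in current)    # latest extract only
print(naive_total, correct_total)
```

Summing without the filter adds the same areas once per extract, which is the miscounting the description warns about.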
For further details, visit:
• https://www1.nyc.gov/site/doh/covid/covid-19-data.page
• https://github.com/nychealth/coronavirus-data
• https://data.cityofnewyork.us/Health/Modified-Zip-Code-Tabulation-Areas-MODZCTA-/pri4-ifjk
Source: CrisisMMD dataset (Alam et al., 2017)
✅ Original Labels (8 classes from annotations):
- Infrastructure and utility damage
- Vehicle damage
- Rescue, volunteering, or donation efforts
- Affected individuals
- Injured or dead people
- Missing or found people
- Other relevant information
- Not humanitarian
✅ Label Preprocessing (Class Merging):
- Vehicle damage merged into Infrastructure and utility damage
- Missing or found people merged into Affected individuals
- Not humanitarian retained as a separate class…
See the full description on the dataset page: https://huggingface.co/datasets/Henishma/crisisMMD_cleaned_task2.
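The class merging listed above amounts to a label lookup. The sketch below encodes only the two merges that are visible before the description is cut off; the full dataset page may define further steps, and the names `MERGE_MAP` and `merge_label` are hypothetical.

```python
ORIGINAL_LABELS = [
    "Infrastructure and utility damage",
    "Vehicle damage",
    "Rescue, volunteering, or donation efforts",
    "Affected individuals",
    "Injured or dead people",
    "Missing or found people",
    "Other relevant information",
    "Not humanitarian",
]

# Only the merges stated in the (truncated) description.
MERGE_MAP = {
    "Vehicle damage": "Infrastructure and utility damage",
    "Missing or found people": "Affected individuals",
}


def merge_label(label):
    """Map an original label to its merged class; unlisted labels pass through."""
    return MERGE_MAP.get(label, label)


merged_classes = sorted({merge_label(label) for label in ORIGINAL_LABELS})
print(merged_classes)
```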
https://creativecommons.org/publicdomain/zero/1.0/
Max Foundation is a Netherlands-based NGO that works towards a healthy start for every child in the most effective and long-lasting way. Over the past 15 years, our teams in Bangladesh and Ethiopia have reached almost 3 million people, supporting communities in reducing stunting and undernutrition by gaining better access to clean water, sanitation and hygiene, as well as healthy diets and care for mother and child.
Maximising our impact and cost efficiency are at the core of our work, which makes quantifying and analysing our programmes crucial. We therefore collect a lot of information on the communities we work with; to understand them better and see where and how we can improve as an organisation.
This data set is one of many we are making publicly available because we believe that data in the development sector should be open: not as a goal in itself, but as a way to help the sector be more effective and create more impact.
These data were collected between Q2 and Q3 in 2019 (with a few observations earlier and later) in the areas in Bangladesh where Max Foundation is active. The data were collected on a representative sample of the households in the area that include at least one child between the ages of 2 and 5. The data provide a very detailed picture of the nutritional status of households as well as their knowledge, attitudes and practices in nutrition and especially child nutrition. As this information was collected by a third party, some information is missing. We cleaned the data to the best of our ability, and feel very confident about the district, upazila and union information. Village numbers are often missing and ward numbers were inferred for much of the data, and may therefore not always be accurate. We regret this lapse in quality.
All datasets we publish can be linked together at the village-level, and we encourage everyone to not look at these data in isolation, but link it to our other datasets to create richer analyses.
All of Max Foundation's data are collected and processed according to GDPR standards and explicit informed consent is given by all respondents. They are also clearly informed that choosing not to participate in data collection will in no way affect their eligibility for, or receiving of, products or services from Max Foundation.
Furthermore, we enforce strong privacy protections on our open data to minimise the risk of these data being used to cause harm or re-identify individuals. Concretely this means:
- Administrative units up to the Union can be directly identified with the BD_ loc_xx data (which can be found in our Max Foundation Bangladesh 2018 WASH Census dataset). Villages are masked by random numbers. However, to ensure it is still possible to compare our data sets, these random numbers are consistent across all datasets. This means that village '1' in this data is the same as village '1' in all of our other Bangladesh datasets, unless stated otherwise;
- Sensitive variables are omitted, censored or bucketed.
The column descriptions specify any transformations done to the data.
These data could not have been collected without the generous support from the Embassy of the Kingdom of the Netherlands in Dhaka and numerous other donors who have supported us over the years. Special thanks to our Bangladesh team for their excellent work in guiding the data collection process.
We invite you to share any interesting insights you have derived from the data with us. From visualising our impact, to uncovering which parts of our programmes are most strongly related to reducing stunting, to making new connections we may not even have considered; we are eager to hear how we can be more effective in what we do and how we do it.
More detailed data insights are available from our internal data, such as the linking of households between datasets. Please note that we would be happy to share more detailed data with researchers, students and many others once proper agreements are in place.
As we value impact above all else, we are happy to work with anyone who can help us to improve our impact. We are constantly adapting our approach based on internal and external findings, and invite you to join us on this journey. Together we can ensure that every child has a healthy start.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
About the dataset: Hurricane Michael was the third-most intense Atlantic hurricane to make landfall in the United States in terms of pressure. This dataset was collected from Twitter during Hurricane Michael. The dataset was processed and analyzed using the AIDR (http://aidr.qcri.org) platform.
Dataset Description: This is a Twitter dataset collected during Hurricane Michael 2018. The data was collected, processed, and analyzed by the AIDR (http://aidr.qcri.org) platform using state-of-the-art machine learning techniques. The data includes the number of injured and dead people, infrastructure damage reports, missing or found people, urgent needs and donation offers for each hour. Due to Twitter TOS, we do not share full tweets content on HDX. Please contact us via HDX or on aidr.qcri@gmail.com to get tweet ids of the dataset along with a tool which can be used to rehydrate tweets from tweet ids.
https://creativecommons.org/publicdomain/zero/1.0/
The venerable insurance industry is no stranger to data-driven decision making. Yet in today's rapidly transforming digital landscape, Insurance is struggling to adapt and benefit from new technologies compared to other industries, even within the BFSI sphere (compared to the Banking sector, for example). Extremely complex underwriting rule-sets that differ radically across product lines, many non-KYC environments lacking a centralized customer information base, a complex relationship with consumers in traditional risk underwriting where customer centricity sometimes runs counter to business profit, and the inertia of regulatory compliance are some of the unique challenges faced by the Insurance business.
Despite this, emergent technologies like AI and Block Chain have brought a radical change in Insurance, and Data Analytics sits at the core of this transformation. We can identify 4 key factors behind the emergence of Analytics as a crucial part of InsurTech:
This dataset can be helpful in a simple yet illuminating study in understanding the risk underwriting in Health Insurance, the interplay of various attributes of the insured and see how they affect the insurance premium.
This dataset contains 1338 rows of insured data, where the Insurance charges are given against the following attributes of the insured: Age, Sex, BMI, Number of Children, Smoker and Region. There are no missing or undefined values in the dataset.
This relatively simple dataset should be an excellent starting point for EDA, Statistical Analysis and Hypothesis testing and training Linear Regression models for predicting Insurance Premium Charges.
Proposed Tasks: - Exploratory Data Analytics - Statistical hypothesis testing - Statistical Modeling - Linear Regression
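As a starting point for the linear regression task, a one-variable least-squares fit can be written without any external library. The four (age, charge) pairs below are made-up toy values chosen to lie on a straight line; they are not rows from the dataset, and a real analysis would use all listed attributes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up toy values (NOT taken from the dataset).
ages = [20, 30, 40, 50]
charges = [2500.0, 3500.0, 4500.0, 5500.0]

slope, intercept = fit_line(ages, charges)
predicted_at_35 = slope * 35 + intercept
print(slope, intercept, predicted_at_35)
```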