In 2024, there were 301,623 cases filed by the National Crime Information Center (NCIC) where the race of the reported missing person was white. In the same year, 17,097 people whose race was unknown were also reported missing in the United States.
What is the NCIC?
The National Crime Information Center (NCIC) is a digital database that stores crime data for the United States so that criminal justice agencies can access it. As a part of the FBI, it helps criminal justice professionals find criminals, missing people, stolen property, and terrorists. The NCIC database is broken down into 21 files. Seven files cover stolen property and items, and 14 cover persons, including the National Sex Offender Registry, Missing Person, and Identity Theft files. It works alongside federal, tribal, state, and local agencies. The NCIC’s goal is to maintain a centralized information system between local branches and offices, so information is easily accessible nationwide.
Missing people in the United States
A person is considered missing when they have disappeared and their location is unknown. A person who is considered missing might have left voluntarily, but that is not always the case. The number of NCIC unidentified person files in the United States has fluctuated since 1990, and in 2022, there were slightly more NCIC missing person files for males than for females. Fortunately, the number of NCIC missing person files has been mostly decreasing since 1998.
Data from https://github.com/rfordatascience/tidytuesday/edit/master/data/2021/ released under an open license: https://github.com/rfordatascience/tidytuesday/blob/master/LICENSE
The data this week comes from Data.World and was originally from the NCES.
High school completion and bachelor's degree attainment among persons age 25 and over by race/ethnicity & sex 1910-2016
Fall enrollment in degree-granting historically Black colleges and universities (HBCU)
Consider donating to HBCUs to help fund students' financial assistance programs.
Donation link: https://thehbcufoundation.org/donate/
There are additional HBCU datasets at Data.World as well.
... Donation will be placed in an endowment for students to fund need-based scholarships. President Reynold Verret believes the donation will provide an opportunity for students who don’t have the same financial support as others.
“Xavier has roughly more than half of our students who are Pell-eligible. Which means they are in the lowest fifth of the socioeconomic ladder in the country. The lowest quintile. So these students really have significant family needs,” said Verret. “They’re often the first generation in their families to attend college, and meeting the gap between what Pell and the small loans provide and making it affordable is where that need-based is, which is not just based on merit, on your highest ACT or GPA, but basically to qualify students who are able who have the talent and the ability to succeed at Xavier.”
I've left the datasets relatively "untidy" this week so you can practice some of the pivot_longer() functions from tidyr. Note that all of the individual CSVs are duplicates of the raw Excel files.
# Get the Data
```r
# Read in with tidytuesdayR package
# Install from CRAN via: install.packages("tidytuesdayR")
# This loads the readme and all the datasets for the week of interest

# Either ISO-8601 date or year/week works!
tuesdata <- tidytuesdayR::tt_load('2021-02-02')
tuesdata <- tidytuesdayR::tt_load(2021, week = 6)

hbcu_all <- tuesdata$hbcu_all

# Or read in the data manually
hbcu_all <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-02-02/hbcu_all.csv')
```
hbcu.csv
hs_students.csv, bach_students, female_bach_students, female_hs_students, male_bach_students, male_hs_students:
| variable | class | description |
|---|---|---|
| Total | double | Year |
| Total, percent of all persons age 25 and over | double | Total combined population |
| Standard Errors - Total, percent of all persons age 25 and over | character | Standard errors (SE) |
| White1 | character | White students |
| Standard Errors - White1 | character | SE |
| Black1 | character | Black students |
| Standard Errors - Black1 | character | SE |
| Hispanic | character | Hispanic students |
| Standard Errors - Hispanic | character | SE |
| Total - Asian/Pacific Islander | character | Asian Pacific Islander Total students |
| Standard Errors - Total - Asian/Pacific Islander | character | SE |
| Asian/Pacific Islander - Asian | character | Asian Pacific Islander - Asian students |
| Standard Errors - Asian/Pacific Islander - Asian | character | SE |
| Asian/Pacific Islander - Pacific Islander | character | Asian/Pacific Islander - Pacific Islander |
| Standard Errors - Asian/Pacific Islander - Pacific Islander | character | SE |
| American Indian/ Alaska Native | character | American Indian/ Alaska Native Students |
| Standard Errors - American Indian/Alaska Native | character | SE |
| Two or more race ... |
Lake County, Illinois Demographic Data. Explanation of field attributes:
Total Population – The entire population of Lake County.
White – Individuals who are of Caucasian race. This is a percent.
African American – Individuals who are of African American race. This is a percent.
Asian – Individuals who are of Asian race. This is a percent.
Hispanic – Individuals who are of Hispanic ethnicity. This is a percent.
Does not Speak English – Individuals who speak a language other than English in their household. This is a percent.
Under 5 years of age – Individuals who are under 5 years of age. This is a percent.
Under 18 years of age – Individuals who are under 18 years of age. This is a percent.
18-64 years of age – Individuals who are between 18 and 64 years of age. This is a percent.
65 years of age and older – Individuals who are 65 years old or older. This is a percent.
Male – Individuals who are male in gender. This is a percent.
Female – Individuals who are female in gender. This is a percent.
High School Degree – Individuals who have obtained a high school degree. This is a percent.
Associate Degree – Individuals who have obtained an associate degree. This is a percent.
Bachelor’s Degree or Higher – Individuals who have obtained a bachelor’s degree or higher. This is a percent.
Utilizes Food Stamps – Households receiving food stamps/ part of SNAP (Supplemental Nutrition Assistance Program). This is a percent.
Median Household Income – A median household income refers to the income level earned by a given household where half of the homes in the area earn more and half earn less. This is a dollar amount.
No High School – Individuals who have not obtained a high school degree. This is a percent.
Poverty – Poverty refers to families and people whose income in the past 12 months is below the poverty level. This is a percent.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Prior research has established the greater exposure of African Americans from all income groups to disadvantaged environments compared to whites, but the traditional focus in studies of neighborhood stratification obscures heterogeneity within racial/ethnic groups in residential attainment over time. Also obscured are the moderating influences of broader social changes on the life-course and the experiences of Latinos, a large and growing presence in American cities. We address these issues by examining group-based trajectory models of residential neighborhood disadvantage among white, Black, and Latino individuals in a multi-cohort longitudinal research design of over 1,000 children from Chicago as they transitioned to adulthood over the last quarter century. We find considerable temporal consistency among white individuals compared to dynamic heterogeneity among nonwhite individuals in exposure to residential disadvantage, especially Black individuals and those born in the 1980s compared to the 1990s. Racial and cohort differences are not accounted for by early-life characteristics that predict long-term attainment. Inequalities by race in trajectories of neighborhood disadvantage are thus at once more stable and more dynamic than previous research suggests, and they are modified by broader social changes. These findings offer insights on the changing pathways by which neighborhood racial inequality is produced.
THIS DATASET WAS LAST UPDATED AT 7:11 AM EASTERN ON DEC. 1
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as when four or more people are killed excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 killed and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, which is similar to prior years. Although they are far less common, the 9 public mass shootings during the year were the most deadly type of mass murder, resulting in 73 people's deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half from suicides.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
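As a rough illustration of the inclusion threshold described above, the sketch below flags incidents with four or more victims killed within 24 hours and tallies them by year, optionally for a single state. The `Incident` record layout, its field names, and the `firearm_used` flag are assumptions made for this example and are not the database's actual schema; the seven-day spree rule and the intent-based exclusions are not modeled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical record layout; the real database's columns may differ."""
    year: int
    state: str
    victims_killed: int   # deaths excluding the offender(s) and unborn children
    hours_spanned: float  # time between the first and last killing
    firearm_used: bool    # assumption: marks the incident as a shooting


def is_mass_killing(incident: Incident) -> bool:
    """Four or more victims killed within a 24-hour period."""
    return incident.victims_killed >= 4 and incident.hours_spanned <= 24


def counts_by_year(incidents, state=None):
    """Return (mass killings, mass shootings) per year, optionally for one state."""
    killings, shootings = Counter(), Counter()
    for incident in incidents:
        if state is not None and incident.state != state:
            continue
        if not is_mass_killing(incident):
            continue
        killings[incident.year] += 1
        if incident.firearm_used:
            shootings[incident.year] += 1
    return killings, shootings
```

Calling `counts_by_year(incidents)` gives nationwide tallies, and `counts_by_year(incidents, state="TX")` restricts them to one state, assuming `state` holds a two-letter abbreviation.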
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
https://creativecommons.org/publicdomain/zero/1.0/
Every Kaggler uses the internet. The internet is a necessity in our daily lives, and many people consider it a utility like water, electricity, and gas. But do you know how many households in the US do not have internet access, who these people are, and why they do not have it?
The U.S. Census Bureau began asking about internet use in the American Community Survey (ACS) in 2013, as part of the 2008 Broadband Data Improvement Act, and has published a 1-year estimate each year since 2013. The recent 2016 data shows that in many counties, over a quarter of households still do not have internet access.
This dataset contains data for counties with populations over 65,000, compiled from the 2016 ACS 1-year estimate. ACS 1-year estimates only summarize data for geographic areas with populations over 65,000. The 2013-2017 ACS 5-year estimate, which has data for all geographic areas down to the block group level, is expected to be published at the end of 2018. Until then, we use the latest 2016 1-year estimate, which provides sufficient data to gain insight into internet use.
This dataset was created with the totalcensus package for R. Here is the list of columns:
All data come from 2016 ACS 1-year estimate.
The U.S. Census Bureau has published a large amount of data that is available to the public. We can create datasets from this public data to address questions we are interested in.
Purpose: To determine if medical students of different races/ethnicities or genders have different perceptions of bias in the United States (US).
Methods: An IRB-approved, anonymous survey was sent to US medical students from November 2022 through February 2024. Students responded to statements regarding perceptions of bias toward them from attendings, patients, and classmates. Chi-square tests, or Fisher’s exact tests when appropriate, were used to determine whether significant differences exist among genders or races/ethnicities in response to these statements.
Results: 370 students responded to this survey. Most respondents were women (n=259, 70%), and nearly half were White (n=164, 44.3%). 8.5% of women agreed that they felt excluded by attendings due to their gender, compared to 2.9% of men (p=0.018). 87.5% and 73.3% of Hispanic and Black students agreed that bias due to race negatively impacted research opportunities compared to 37.2% of White students (p<0.001). 87% and 85.7% of W...
This data was collected through Google Forms, and respondents were asked to log in with their email addresses to make sure that they could only submit their responses once. Data was processed in RStudio.
# Experiences of US medical students - a national survey
https://doi.org/10.5061/dryad.cz8w9gjbq
This dataset contains responses to an anonymous, IRB-approved survey sent to medical students across the country. The survey included demographic information and students' responses to various questions regarding their medical school experience.
The data is structured so that each row is an individual response. A researcher could analyze the data to see what demographic factors are related to various survey responses.
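For readers who want to relate a demographic factor to a yes/no survey response, a minimal sketch of the two tests named in the methods is given below for a 2x2 table, using only the standard library. The function names are hypothetical, the chi-square version applies no continuity correction, and any counts passed in would have to be tabulated from the response rows first. The study itself was processed in R, so this is not the authors' code.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # small tolerance for floating-point ties
    )
```

Fisher's exact test is the usual choice when expected cell counts are small, which matches the "when appropriate" wording in the methods.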
There are certain questions on the survey that respondents could respond "NA" to if the question did not apply to them. For example, the last question on the survey asks,
| If you are an MS4, do you feel ready to be a doctor and take care of patients next year as an intern? |
|---|
...
https://creativecommons.org/publicdomain/zero/1.0/
This is a dataset I created myself, that is similar to the official MNIST handwritten digit dataset, except the images are significantly smaller (10x10), the images are black and white rather than grayscale, and there are only 101 images in each dataset.
The purpose of this dataset was to be able to create small neural networks in slow languages such as Scratch, without significant amounts of lag.
It took roughly half an hour to make, but I hope this will be useful for people.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Unemployment Rate in South Africa decreased to 31.90 percent in the third quarter of 2025 from 33.20 percent in the second quarter of 2025. This dataset provides - South Africa Unemployment Rate - actual values, historical data, forecast, chart, statistics, economic calendar and news.
https://creativecommons.org/publicdomain/zero/1.0/
There are two files: train.en contains English sentences, and train.gu contains their corresponding translations in Gujarati.
https://creativecommons.org/publicdomain/zero/1.0/
The ‘Images’ folder contains 14,253 images, each with a unique ID. Train.csv contains Image_IDs and the associated label. The labels indicate the growth stage of the crop in the image. Growth stages indicate the maturity of the wheat plants and are represented as a number from 1 to 7 (mature crop). The sample submission file contains the Image_IDs of the test set - you must predict the growth stage for the crops in each of these images.
It is important to note that some of the labels have been determined by experts, and may be more reliable than the other labels which have been indicated by the farmers themselves. All the test images have reliable labels. The ‘label_quality’ column in Train.csv indicates whether a label is high quality (2) or potentially less reliable (1).
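One way to act on the note above is to separate the expert-quality rows from the farmer-labelled ones before training. The sketch below does this with the standard `csv` module; only the `label_quality` column name and its values (2 and 1) come from the description, while the function name and the other column names in the usage note are assumptions for illustration.

```python
import csv


def split_by_label_quality(csv_file, quality_column="label_quality"):
    """Split rows into (high quality, potentially less reliable) lists.

    `csv_file` is any open text file or file-like object with a header row.
    A value of 2 marks a high-quality label; anything else is treated as
    potentially less reliable.
    """
    reliable, uncertain = [], []
    for row in csv.DictReader(csv_file):
        if int(row[quality_column]) == 2:
            reliable.append(row)
        else:
            uncertain.append(row)
    return reliable, uncertain
```

With the real file this could be called as `split_by_label_quality(open("Train.csv", newline=""))`, assuming the file sits in the working directory. A model could then be fitted on the reliable rows first, with the remaining rows added or down-weighted later.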
Background to the challenge
The images were collected as part of field trials focusing on the Rabi (winter) growing season in two states of India: Punjab (with data collection in Fatehgarh, Ludhiana and Patiala districts) and Haryana (Fatehabad, Sirsa and Yamunanagar districts). Most villages in the field trials were located in a hot arid steppe climate. Punjab and Haryana fields are typically double-cropped with rice (or cotton) planted during the Kharif monsoon (June - October), and wheat planted in the Rabi season (October - March). Smallholder agriculture in this area is largely mechanized and is heavily reliant on irrigation.
Over two growing seasons a total of 1685 farmers agreed to participate in the PBI studies. For these farmers, the study team listed all plots on which the farmer was planning to grow wheat, and randomly selected one field per farmer to be included in the study. Farmers were asked to take repeat pictures throughout the season, always from approximately the same location as an initial northward oriented picture, and with approximately the same view angle.
Image acquisitions were facilitated using a custom Android application (WheatCam). The farmer set up an observation site by taking an initial geo-referenced image of a field. Subsequent images were referenced relative to the initial “ghosted” image (a mildly transparent image of the initial picture). The application allowed the farmer to frame nearly identical repeat pictures relative to landscape features (or one or two installed reference poles in the first year). A fixed white balance between images was used to minimize in-camera adjustment of illumination and RGB ratios. All pictures were uploaded to a server for further processing.
Before further processing, we manually screened all images to ensure that no people were present in the image scenes, to guarantee their privacy. In addition, we removed images that were mistakenly taken indoors and other accidental acquisitions. We further screened out images that were excessively blurred or discoloured, covered by a finger, contained little vegetation, or were taken during crop cutting or application development. We anonymized the dataset by masking most non-vegetation details which might provide clues to the exact position of a farmer's field, while selecting the vegetation of interest for processing (see below).
The Region-of-Interest (ROI) was delineated automatically on an image-by-image basis using a horizon detection algorithm. The algorithm first resizes the image to 640 pixels along the x-axis, scaling the y-axis proportionally. It then finds change points in the blue channel along the vertical axis of the image using the Pruned Exact Linear Time (PELT) method, approximating the location of the horizon. We then define a trapezoid ROI whose top corners are set by the median horizon locations for the left and right half of the image, padded by 15% of the image height and 10% of the image width along the y- and x-axis directions respectively. Similarly, the two bottom corner points were defined by padding the bottom and sides of the image by 10% of the image width and height.
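The corner geometry described above can be sketched as follows. This is one reading of the text, not the authors' implementation: it assumes image coordinates with y increasing downward, assumes the horizon padding moves the top edge down into the field, and takes the two median horizon rows as inputs rather than running PELT. The function and parameter names are hypothetical.

```python
def resized_shape(width, height, target_width=640):
    """Scale an image size to `target_width`, keeping the aspect ratio."""
    scale = target_width / width
    return target_width, round(height * scale)


def trapezoid_roi(width, height, horizon_left, horizon_right,
                  pad_x=0.10, pad_top=0.15, pad_bottom=0.10):
    """Return the four ROI corners as (x, y) tuples, clockwise from top-left.

    `horizon_left` and `horizon_right` are the median horizon rows (in pixels)
    for the left and right halves of the resized image.
    """
    x_left = pad_x * width
    x_right = width - pad_x * width
    y_bottom = height - pad_bottom * height
    return [
        (x_left, horizon_left + pad_top * height),    # top-left
        (x_right, horizon_right + pad_top * height),  # top-right
        (x_right, y_bottom),                          # bottom-right
        (x_left, y_bottom),                           # bottom-left
    ]
```

Pixels outside this polygon would then be set to black, as the next paragraph describes. A polygon-fill routine from an imaging library could produce that mask, which is left out here to keep the sketch dependency-free.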
We use this ROI to exclude most other features from the original image which do not pertain to the area evaluated. Areas of no interest are set to black and the image is saved to disk. In addition, we manually screened all processed images and made manual corrections to guarantee the privacy of volunteer farmers where necessary.
Growth Stage definitions:
| Growth Phase | Common Name |
|---|---|
| 1 | Crown Root |
| 2 | Tillering |
| 3 | Mid Vegetative Phase |
| 4 | Booting |
| 5 | Heading |
| 6 | Anthesis |
| 7 | Milking |
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Youth Unemployment Rate in South Africa decreased to 58.50 percent in the third quarter of 2025 from 62.20 percent in the second quarter of 2025. This dataset provides - South Africa Youth Unemployment Rate- actual values, historical data, forecast, chart, statistics, economic calendar and news.
https://mmo-population.com/terms
World of Warcraft player activity dataset from MMO Populations, combining monthly enhanced players and 30-day daily estimates generated from public signals.