From the project page: https://github.com/jkbren/incarcerated-populations-data/
The United States has the highest incarceration rate in the world. Through combinations of structural biases in the criminal justice and police systems, we see even higher incarceration rates among Black and Hispanic people. During the first year of the COVID-19 pandemic, the number of incarcerated people in the United States decreased by at least 17%---the largest, fastest reduction in prison population in American history. Using an original dataset curated from public sources on prison demographics across all 50 states and the District of Columbia, we show that incarcerated white people benefited disproportionately from this decrease in the U.S. prison population, and the fraction of incarcerated Black and Latino people sharply increased. This pattern persists across prison systems in nearly every state and deviates from a decade-long trend before 2020 and the onset of COVID-19, when the proportion of incarcerated white people was increasing amid declining numbers of Black people in prison. While a variety of mechanisms underlie these alarming trends, we explore why racial inequities in average sentence length are a likely major contributor. Ultimately, this study reveals how disruptions caused by COVID-19 exacerbated racial inequalities in the criminal legal system, and highlights key forces that drive mass incarceration.
Released under MIT license
https://www.usa.gov/government-works/
This is a dataset of prisoner mugshots and associated data (height, weight, etc.). The copyright status is public domain, since it's produced by the government, the photographs do not have sufficient artistic merit, and a mere collection of facts isn't copyrightable.
The source is the Illinois Dept. of Corrections. In total, there are 68149 entries, of which a few hundred have shoddy data.
It's useful for neural network training, since it has pictures from both front and side, and they're (manually) labeled with date of birth, name (useful for clustering), weight, height, hair color, eye color, sex, race, and some various goodies such as sentence duration and whether they're sex offenders.
Here is the readme file:
---BEGIN README---
Scraped from the Illinois DOC.
https://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=
paste -d '\n' <(sed 's|^|http://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=|' ids.txt) <(sed 's/^/ out=/; s/$/.jpg/' ids.txt) > showside.txt
paste -d '\n' <(sed 's|^|http://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=|' ids.txt) <(sed 's/^/ out=/; s/$/.jpg/' ids.txt) > showfront.txt
paste -d '\n' <(sed 's|^|http://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=|' ids.txt) <(sed 's/^/ out=/; s/$/.html/' ids.txt) > inmates_print.txt
aria2c -i ../inmates_print.txt -j4 -x4 -l ../log-$(pwd|rev|cut -d/ -f 1|rev)-$(date +%s).txt
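For readers who prefer not to rely on bash process substitution, the same aria2c input files can be sketched in Python. This is only an illustration of the format the commands above produce (a URL line followed by an indented out= line for each ID); the function names build_input and write_input_file are made up for this example and are not part of the dataset's tooling.

```python
from pathlib import Path

# Base URL copied from the sed commands above.
BASE = "http://www.idoc.state.il.us/subsections/search/"


def build_input(ids, page, ext):
    """Return aria2c input-file text: a URL line, then an indented out= line, per ID."""
    lines = []
    for idoc in ids:
        lines.append(f"{BASE}{page}?idoc={idoc}")
        lines.append(f" out={idoc}{ext}")  # leading space marks an option line for aria2c
    return "\n".join(lines) + "\n"


def write_input_file(ids_path, page, ext, out_path):
    """Read whitespace-separated IDs from ids_path and write the input file to out_path."""
    ids = Path(ids_path).read_text().split()
    Path(out_path).write_text(build_input(ids, page, ext))
```

Under these assumptions, write_input_file("ids.txt", "pub_showside.asp", ".jpg", "showside.txt") would correspond to the first paste command, and the other two files follow by swapping the page name and extension.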
Then use htmltocsv.py to get the csv. Note that the script is very poorly written and may have errors. It also doesn't do anything with the warrant-related info, although there are some commented-out lines which may be relevant.
Also note that it assumes all the HTML files are located in the inmates directory, and it overwrites any existing CSV files in the csv directory.
front.7z contains mugshots from the front
side.7z contains mugshots from the side
inmates.7z contains all the html files
csv contains the html files converted to CSV
The reason for packaging the images is that many torrent clients would otherwise crash if attempting to load the torrent.
All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.
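If editing htmltocsv.py is not convenient, the trailing semicolon can also be handled at read time. The following is a minimal sketch using Python's standard csv module. It assumes the fields are not quoted in any unusual way, which has not been checked against the actual files, and the column names in the sample are placeholders rather than the real headers.

```python
import csv
import io


def read_semicolon_csv(handle):
    """Yield one dict per row, ignoring the empty field created by the trailing ';'."""
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)
    if header and header[-1] == "":
        header = header[:-1]  # drop the empty column name after the last ';'
    for row in reader:
        if not row:
            continue  # skip blank lines
        # zip stops at the header length, so the trailing empty field is discarded
        yield dict(zip(header, row))


# Placeholder data in the described layout (not taken from the dataset).
sample = io.StringIO("id;height;weight;\nX1;70;180;\nX2;N/A;N/A;\n")
for record in read_semicolon_csv(sample):
    print(record)
```

For the real files, open person.csv (or marks.csv, sentencing.csv) with newline="" and pass the file object in place of the sample.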
There are 68149 inmates in total, although some (a few hundred) are marked as "Unknown"/"N/A"/"" in one or more fields.
The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."
Some inmates were marked "Not Available"; this has been replaced with "N/A".
Likewise, the "weight" column has been altered "XXX lbs." -> "XXX". Again, some are marked "N/A".
The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.
The "weight" column is often rounded to the nearest 5 lbs.
Statistics for hair:
43305 Black
17371 Brown
2887 Blonde or Strawberry
2539 Gray or Partially Gray
740 Red or Auburn
624 Bald
396 Not Available
209 Salt and Pepper
70 White
7 Sandy
1 Unknown
Statistics for sex:
63409 Male
4740 Female
Statistics for race:
37991 Black
20992 White
8637 Hispanic
235 Asian
104 Amer Indian
94 Unknown
92 Bi-Racial
4 (blank)
Statistics for eyes:
51714 Brown
7808 Blue
4259 Hazel
2469 Green
1382 Black
420 Not Available
87 Gray
9 Maroon
1 Unknown
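Tallies like the ones above can be reproduced from the parsed rows with a counter. The sketch below assumes each row is a dict and that the columns are named "hair", "sex", "race", and "eyes"; the actual header names should be checked in the CSV files.

```python
from collections import Counter


def tally(rows, column):
    """Return (value, count) pairs for one column, most frequent first."""
    return Counter(row[column] for row in rows).most_common()


def print_tally(rows, column):
    """Print counts in the same 'count label' layout as the statistics above."""
    for value, count in tally(rows, column):
        print(count, value)


# Consistency check on the published figures: the sex counts add up to the total.
assert 63409 + 4740 == 68149
```

Each of the four lists above sums to 68149, so every inmate has exactly one entry per attribute, including the "Not Available" and "Unknown" values.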
---END README---
Here is a formal summary:
---BEGIN SUMMARY---
Documentation:
Title: Illinois DOC dataset
Source Information
-- Creators: Illinois DOC
-- Illinois Department of Corrections
1301 Concordia Court
P.O. Box 19277
Springfield, IL 62794-9277
(217) 558-2200 x 2008
-- Donor: Anonymous
-- Date: 2019
Past Usage:
-- None
Relevant Information:
-- All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
-- Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv...
Original census file name: tl_2020_
http://opendatacommons.org/licenses/dbcl/1.0/
1. Context
Capital punishment is one of the most controversial human rights issues in the United States. While surfing the Internet for an interesting dataset, I came across this database from the Texas Department of Criminal Justice, which comprises the offenders' last words before execution. Some of the statements are:
"...Young people, listen to your parents; always do what they tell you to do, go to school, learn from your mistakes. Be careful before you sign anything with your name. Never, despite what other people say..." (Ramiro Hernandez, executed on April 9th, 2014)
"First and foremost I'd like to say, "Justice has never advanced by taking a life" by Coretta Scott King. Lastly, to my wife and to my kids, I love y'all forever and always. That's it." (Taichin Preyor, executed on July 27th, 2017)
As I skimmed these lines, I decided to create this dataset.
2. Content
This dataset includes information on offenders executed by the Texas Department of Criminal Justice from 1982 to November 8th, 2017. In Furman v. Georgia in 1972, the Supreme Court considered a group of consolidated cases and severely restricted the death penalty. However, like other states, Texas adjusted its legislation to address the Court's concerns and once again allowed capital punishment in 1973. Texas adopted execution by lethal injection in 1977, and in 1982, the starting year of this dataset, the first offender was executed by this method.
The dataset consists of 545 observations with 21 variables. They are:
- Execution: The order of execution, numeric.
- LastName: Last name of the offender, character.
- FirstName: First name of the offender, character.
- TDCJNumber: TDCJ Number of the offender, numeric.
- Age: Age of the offender, numeric.
- Race: Race of the offender, categorical: Black, Hispanic, White, Other.
- CountyOfConviction: County of conviction, character.
- AgeWhenReceived: Age of offender when received, numeric.
- EducationLevel: Education level of offender, numeric.
- Native County: Native county of offender, categorical: 0 = Within Texas, 1 = Outside Texas.
- PreviousCrime: Whether the offender committed any crime before, categorical: 0 = No, 1 = Yes.
- Codefendants: Number of co-defendants, numeric.
- NumberVictim: Number of victims, numeric.
- WhiteVictim, HispanicVictim, BlackVictim, VictimOtherRace, FemaleVictim, MaleVictim: Number of victims with specified demographic features, numeric.
- LastStatement: Last statement of offender, character.
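As a starting point for working with these variables, here is a minimal sketch that computes the mean Age per Race. It assumes the data is available as a comma-delimited CSV whose header uses the variable names listed above. The delimiter is an assumption, and the sample rows are made up for illustration and are not from the dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_age_by_race(handle):
    """Return {race: mean age}, skipping rows whose Age is missing or non-numeric."""
    ages = defaultdict(list)
    for row in csv.DictReader(handle):
        try:
            race = row["Race"]
            age = int(row["Age"])
        except (KeyError, TypeError, ValueError):
            continue  # e.g. Age recorded as "NA" or a short row
        ages[race].append(age)
    return {race: mean(values) for race, values in ages.items()}


# Made-up rows in the assumed layout.
sample = io.StringIO("Race,Age\nWhite,30\nWhite,40\nBlack,35\nOther,NA\n")
print(mean_age_by_race(sample))
```

Parsing Age before touching the per-race list avoids creating empty groups for races whose only rows have unusable ages.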
3. Acknowledgement
This dataset is derived from the Texas Department of Criminal Justice database, which can be found at this link: http://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html . The original has fewer than 10 variables and embeds links to sub-datasets, so I manually entered more variables based on those links.
There are some complications with this dataset. Firstly, the dataset was manually created so mistakes are inevitable, though I have tried my best to minimize them. Secondly, the recording of offender information is not complete and consistent. For example, sometimes the education level of GED is interpreted as 11 years, at other times as 9 or 10 years. "None" and "NA" are used interchangeably, making it hard to distinguish between 0 and NA in the coded variable. The victim's information is often omitted, so I rely on the description of the crime for the names and pronouns to make a judgement of the number of victims and their gender. Finally, the last statements are sometimes recorded in the first person and sometimes in the third, so the word choice might not be original. That being said, I find this dataset meaningful and worth sharing.
4. Inspiration
What are the demographics of the death row inmates? What are the patterns of their last statements? What is the relationship between the two?