https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/FE4RLC
This dataset documents the records of mainly Black people incarcerated in the Tennessee State Penitentiary in the period directly before, during, and after the Civil War, from 1850 to 1870. It includes a staggering number of formerly enslaved Civil War soldiers and veterans who had enlisted in the segregated regiments of the United States military, the U.S.C.T. This demographic information on over 1,400 inmates incarcerated in an occupied border state allows us to examine trends, patterns, and relationships that speak to the historic ties between the US military and the TN State Penitentiary and, more broadly, to the role of enslavement’s legacies in the development of punitive federal systems. Further analysis of this dataset reveals the genesis of many modern trends in incarceration and law. The dataset and its historiographical implications will be of interest to scholars who study the regional dynamics of antebellum and post-Civil War prison systems, convict leasing and the development of the modern carceral state, Black resistance in the forms of fugitivity and participation in the Civil War, and the pre-war incarceration of free Black men and women and of non-Black people convicted of crimes related to enslavement.
https://www.usa.gov/government-works/
This is a dataset of prisoner mugshots and associated data (height, weight, etc.). The copyright status is public domain, since it's produced by the government, the photographs do not have sufficient artistic merit, and a mere collection of facts isn't copyrightable.
The source is the Illinois Dept. of Corrections. In total, there are 68149 entries, of which a few hundred have "Unknown", "N/A", or empty values in one or more fields.
It's useful for neural network training, since it has pictures from both front and side, and they're (manually) labeled with date of birth, name (useful for clustering), weight, height, hair color, eye color, sex, race, and some various goodies such as sentence duration and whether they're sex offenders.
Here is the readme file:
---BEGIN README---
Scraped from the Illinois DOC.
https://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=
https://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=
paste -d '\n' <(sed 's|^|http://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=|' ids.txt) <(sed 's|^| out=|; s|$|.jpg|' ids.txt) > showside.txt
paste -d '\n' <(sed 's|^|http://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=|' ids.txt) <(sed 's|^| out=|; s|$|.jpg|' ids.txt) > showfront.txt
paste -d '\n' <(sed 's|^|http://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=|' ids.txt) <(sed 's|^| out=|; s|$|.html|' ids.txt) > inmates_print.txt
aria2c -i ../inmates_print.txt -j4 -x4 -l ../log-$(pwd|rev|cut -d/ -f 1|rev)-$(date +%s).txt
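For readers who prefer not to rely on shell process substitution, the input files built above can also be sketched in Python. This is only an illustration, not part of the original scrape: the function names `aria2_input` and `write_inputs` are hypothetical, and the sketch assumes ids.txt holds one ID per line. It relies on the aria2c input-file convention that option lines for a URL are indented beneath it.

```python
from pathlib import Path

# Base URL and page names are copied from the paste commands above.
BASE = "http://www.idoc.state.il.us/subsections/search/"
TARGETS = {
    "showside.txt": ("pub_showside.asp?idoc=", ".jpg"),
    "showfront.txt": ("pub_showfront.asp?idoc=", ".jpg"),
    "inmates_print.txt": ("inms_print.asp?idoc=", ".html"),
}


def aria2_input(ids, page, suffix):
    """Return aria2c input-file text: a URL line, then an indented out= line."""
    lines = []
    for inmate_id in ids:
        lines.append(f"{BASE}{page}{inmate_id}")
        lines.append(f" out={inmate_id}{suffix}")
    return "\n".join(lines) + "\n"


def write_inputs(ids_path="ids.txt", out_dir="."):
    """Write the three input files (assumes one ID per line in ids_path)."""
    text = Path(ids_path).read_text()
    ids = [line.strip() for line in text.splitlines() if line.strip()]
    for filename, (page, suffix) in TARGETS.items():
        Path(out_dir, filename).write_text(aria2_input(ids, page, suffix))
```

Calling `write_inputs()` in the directory that holds ids.txt should produce the same three files as the shell commands. Because the front and side mugshots share the same output name per ID, each download still needs to run in its own directory.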
Then use htmltocsv.py to get the csv. Note that the script is very poorly written and may have errors. It also doesn't do anything with the warrant-related info, although there are some commented-out lines which may be relevant.
Also note that it assumes all the HTML files are located in the inmates directory, and overwrites any CSV files already present in the csv directory.
front.7z contains mugshots from the front
side.7z contains mugshots from the side
inmates.7z contains all the html files
csv contains the html files converted to CSV
The reason for packaging the images is that many torrent clients would otherwise crash if attempting to load the torrent.
All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.
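If editing arr2csvR is not convenient, the files can probably be read as they are. The following is a minimal sketch with Python's standard csv module, assuming UTF-8 encoding and an ID column named "id" (neither is confirmed here; check the headers). The helper names `read_semicolon_csv` and `group_by_id` are hypothetical.

```python
import csv
from collections import defaultdict


def read_semicolon_csv(path):
    """Read one of the CSV files into a list of dicts keyed by header name."""
    with open(path, newline="", encoding="utf-8") as handle:  # encoding assumed
        reader = csv.reader(handle, delimiter=";")
        # The trailing semicolon yields one empty final cell per row; drop it.
        rows = [row[:-1] if row and row[-1] == "" else row for row in reader]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def group_by_id(records, key="id"):
    """Group rows by ID, since marks.csv and sentencing.csv repeat IDs."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return dict(grouped)
```

For example, `group_by_id(read_semicolon_csv("csv/marks.csv"))` would map each ID to its list of rows, which can then be looked up from the unique IDs in person.csv. The path csv/marks.csv is an assumption about where the files were unpacked.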
There are 68149 inmates in total, although some (a few hundred) are marked as "Unknown"/"N/A"/"" in one or more fields.
The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."
Some inmates were marked "Not Available"; this has been replaced with "N/A".
Likewise, the "weight" column has been altered "XXX lbs." -> "XXX". Again, some are marked "N/A".
The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.
The "weight" column is often rounded to the nearest 5 lbs.
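The conversions described above can be sketched as follows. This is an illustration of the stated transformations, not the code in htmltocsv.py. The function names and the set of missing-value markers are assumptions drawn from the notes in this readme.

```python
import re
from datetime import datetime

# Markers this readme mentions for missing values.
MISSING = {"", "N/A", "Not Available", "Unknown"}

_HEIGHT = re.compile(r"(\d+)\s*ft\.?\s*(\d+)\s*in\.?")


def height_to_inches(raw):
    """Convert 'X ft YY in.' to total inches; return None if missing."""
    text = raw.strip()
    if text in MISSING:
        return None
    match = _HEIGHT.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized height: {raw!r}")
    feet, inches = map(int, match.groups())
    return feet * 12 + inches


def weight_to_pounds(raw):
    """Convert 'XXX lbs.' (or a bare 'XXX') to an int; None if missing."""
    text = raw.strip()
    if text in MISSING:
        return None
    return int(re.sub(r"\s*lbs\.?$", "", text))


def parse_dob(raw):
    """Parse an MM/DD/YYYY date of birth; None if missing."""
    text = raw.strip()
    if text in MISSING:
        return None
    return datetime.strptime(text, "%m/%d/%Y").date()
```

For instance, `height_to_inches("5 ft 11 in.")` returns 71, and `parse_dob("Not Available")` returns None, so missing values can be filtered out before any numeric analysis.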
Statistics for hair:
43305 Black
17371 Brown
2887 Blonde or Strawberry
2539 Gray or Partially Gray
740 Red or Auburn
624 Bald
396 Not Available
209 Salt and Pepper
70 White
7 Sandy
1 Unknown
Statistics for sex:
63409 Male
4740 Female
Statistics for race:
37991 Black
20992 White
8637 Hispanic
235 Asian
104 Amer Indian
94 Unknown
92 Bi-Racial
4 (blank)
Statistics for eyes:
51714 Brown
7808 Blue
4259 Hazel
2469 Green
1382 Black
420 Not Available
87 Gray
9 Maroon
1 Unknown
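Counts like the ones above can be recomputed from the parsed rows. The sketch below assumes the rows are already loaded as a list of dicts, one per inmate, and that the column is named something like "hair" or "sex"; the actual header names should be checked in person.csv. The function names are hypothetical.

```python
from collections import Counter


def column_counts(records, column):
    """Count the values in one column, most frequent first."""
    return Counter(record[column] for record in records).most_common()


def format_counts(counts):
    """Format counts in the 'count value' layout used in the lists above."""
    return "\n".join(f"{count} {value}" for value, count in counts)
```

Printing `format_counts(column_counts(rows, "sex"))` for the full set of rows should reproduce the sex statistics, which is a quick way to check that a re-parse of the HTML files matches this readme.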
---END README---
Here is a formal summary:
---BEGIN SUMMARY---
Documentation:
Title: Illinois DOC dataset
Source Information
-- Creators: Illinois DOC
-- Illinois Department of Corrections
1301 Concordia Court
P.O. Box 19277
Springfield, IL 62794-9277
(217) 558-2200 x 2008
-- Donor: Anonymous
-- Date: 2019
Past Usage:
-- None
Relevant Information:
-- All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
-- Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv...