4 datasets found
  1. Prison population in the US

    • kaggle.com
    zip
    Updated May 10, 2023
    Cite
    Konrad Banachewicz (2023). Prison population in the US [Dataset]. https://www.kaggle.com/datasets/konradb/prison-population-in-the-us
    Available download formats: zip (244630 bytes)
    Authors
    Konrad Banachewicz
    Area covered
    United States
    Description

    From the project page: https://github.com/jkbren/incarcerated-populations-data/

    The United States has the highest incarceration rate in the world. Through combinations of structural biases in the criminal justice and police systems, we see even higher incarceration rates among Black and Hispanic people. During the first year of the COVID-19 pandemic, the number of incarcerated people in the United States decreased by at least 17%, the largest, fastest reduction in prison population in American history. Using an original dataset curated from public sources on prison demographics across all 50 states and the District of Columbia, we show that incarcerated white people benefited disproportionately from this decrease in the U.S. prison population, and the fraction of incarcerated Black and Latino people sharply increased. This pattern persists across prison systems in nearly every state and deviates from a decade-long trend before 2020 and the onset of COVID-19, when the proportion of incarcerated white people was increasing amid declining numbers of Black people in prison. While a variety of mechanisms underlie these alarming trends, we explore why racial inequities in average sentence length are a likely major contributor. Ultimately, this study reveals how disruptions caused by COVID-19 exacerbated racial inequalities in the criminal legal system, and highlights key forces that drive mass incarceration.

    Released under MIT license

  2. Illinois DOC labeled faces dataset

    • kaggle.com
    zip
    Updated Dec 6, 2019
    Cite
    David J. Fisher (2019). Illinois DOC labeled faces dataset [Dataset]. https://www.kaggle.com/datasets/davidjfisher/illinois-doc-labeled-faces-dataset/code
    Available download formats: zip (6556377362 bytes)
    Authors
    David J. Fisher
    License

    https://www.usa.gov/government-works/

    Area covered
    Illinois
    Description

    This is a dataset of prisoner mugshots and associated data (height, weight, etc.). The copyright status is public domain: the data is produced by the government, the photographs do not have sufficient artistic merit, and a mere collection of facts is not copyrightable.

    The source is the Illinois Dept. of Corrections. In total, there are 68149 entries, of which a few hundred have missing or incomplete data.

    It is useful for neural network training, since it has pictures from both the front and the side, and they are (manually) labeled with date of birth, name (useful for clustering), weight, height, hair color, eye color, sex, and race, plus additional fields such as sentence duration and sex-offender status.

    Here is the readme file:

    ---BEGIN README---
    Scraped from the Illinois DOC.

    https://www.idoc.state.il.us/subsections/search/inms_print.asp?idoc=
    https://www.idoc.state.il.us/subsections/search/pub_showfront.asp?idoc=
    https://www.idoc.state.il.us/subsections/search/pub_showside.asp?idoc=

    paste <(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/pub_showside.asp\?idoc=/g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.jpg/g') -d ' ' > showside.txt
    paste <(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/pub_showfront.asp\?idoc=/g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.jpg/g') -d ' ' > showfront.txt
    paste <(cat ids.txt | sed 's/^/http:\/\/www.idoc.state.il.us\/subsections\/search\/inms_print.asp\?idoc=/g') <(cat ids.txt| sed 's/^/ out=/g' | sed 's/$/.html/g') -d ' ' > inmates_print.txt
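
For readers who prefer Python to shell, the following is a minimal sketch of what the three paste/sed pipelines above produce: one line per ID, consisting of the URL, two spaces, and an out= file name. The function name build_lines and the sample IDs are hypothetical; only the URL prefix and the out= pattern come from the commands above. Whether aria2c accepts this exact layout is not checked here.

```python
# URL prefix copied from the sed expressions in the README commands.
BASE = "http://www.idoc.state.il.us/subsections/search/"


def build_lines(ids, page, ext):
    """Return one 'URL  out=ID.ext' line per ID (hypothetical helper).

    paste -d ' ' contributes one space and the second column itself starts
    with ' out=', so two spaces separate the URL from the option.
    """
    return [f"{BASE}{page}?idoc={i} " + f" out={i}{ext}" for i in ids]


# Hypothetical IDs for illustration; the real ones come from ids.txt.
sample_ids = ["A00001", "B00002"]
showside = build_lines(sample_ids, "pub_showside.asp", ".jpg")
print("\n".join(showside))
```

Calling build_lines with "pub_showfront.asp" or with "inms_print.asp" and ".html" would mirror the other two commands.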

    aria2c -i ../inmates_print.txt -j4 -x4 -l ../log-$(pwd|rev|cut -d/ -f 1|rev)-$(date +%s).txt

    Then use htmltocsv.py to produce the CSV files. Note that the script is very poorly written and may contain errors. It also ignores the warrant-related information, although some commented-out lines may be relevant.
    Also note that it assumes all the HTML files are located in the inmates directory, and it overwrites any existing CSV files in the csv directory.

    front.7z contains mugshots from the front
    side.7z contains mugshots from the side
    inmates.7z contains all the html files
    csv contains the html files converted to CSV

    The reason for packaging the images is that many torrent clients would otherwise crash if attempting to load the torrent.

    All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
    Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv.py.
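
Because each row ends with a trailing semicolon, a naive parser sees one extra empty field per row. The sketch below shows one way to read such a file with Python's standard csv module and drop that field. The helper name read_semicolon_csv and the sample column names are assumptions for illustration; the real headers are whatever the CSV files declare.

```python
import csv
import io


def read_semicolon_csv(text):
    """Parse semicolon-delimited text whose rows end with a trailing ';'."""
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        if not row:
            continue  # skip blank lines
        if row[-1] == "":
            row = row[:-1]  # drop the empty field left by the trailing ';'
        rows.append(row)
    header, *records = rows
    return [dict(zip(header, rec)) for rec in records]


# Hypothetical sample in the described layout (column names are assumed).
sample = "id;height;weight;\nX1;71;180;\nX2;N/A;N/A;\n"
people = read_semicolon_csv(sample)
print(people[0])
```

To read a real file, the same loop can be fed an open file handle instead of io.StringIO.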

    There are 68149 inmates in total, although some (a few hundred) are marked as "Unknown"/"N/A"/"" in one or more fields.

    The "height" column has been processed to contain the height in inches, rather than the height in feet and inches expressed as "X ft YY in."
    Some inmates were marked "Not Available"; this has been replaced with "N/A".
    Likewise, the "weight" column has been altered "XXX lbs." -> "XXX". Again, some are marked "N/A".
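
The height and weight conversions described above can be sketched as follows. This is not the code from htmltocsv.py, which is not shown here; the function names and the exact regular expressions are assumptions based only on the "X ft YY in." and "XXX lbs." patterns and the "N/A" placeholder mentioned in this README.

```python
import re

MISSING = "N/A"  # placeholder the README says replaces "Not Available"


def height_to_inches(raw):
    """Convert 'X ft YY in.' to total inches as a string, else 'N/A'."""
    m = re.fullmatch(r"\s*(\d+)\s*ft\s*(\d+)\s*in\.?\s*", raw)
    if m is None:
        return MISSING
    feet, inches = int(m.group(1)), int(m.group(2))
    return str(feet * 12 + inches)


def weight_to_number(raw):
    """Convert 'XXX lbs.' to 'XXX', else 'N/A'."""
    m = re.fullmatch(r"\s*(\d+)\s*lbs\.?\s*", raw)
    return m.group(1) if m else MISSING


print(height_to_inches("5 ft 11 in."), weight_to_number("180 lbs."))
```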

    The "date of birth" column has some inmates marked as "Not Available" and others as "". There doesn't appear to be any pattern. It may be related to the institution they are kept in. Otherwise, the format is MM/DD/YYYY.
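
A minimal sketch of parsing this column, assuming only what is stated above: valid values use MM/DD/YYYY, and missing values appear as "Not Available" or an empty string. The function name parse_dob is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_dob(raw: str) -> Optional[date]:
    """Parse an MM/DD/YYYY date of birth; return None for missing values."""
    raw = raw.strip()
    if raw in ("", "Not Available"):
        return None
    # Raises ValueError on any other unexpected format, which surfaces bad rows.
    return datetime.strptime(raw, "%m/%d/%Y").date()


print(parse_dob("03/07/1985"), parse_dob("Not Available"))
```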

    The "weight" column is often rounded to the nearest 5 lbs.

    Statistics for hair:
    43305 Black
    17371 Brown
    2887 Blonde or Strawberry
    2539 Gray or Partially Gray
    740 Red or Auburn
    624 Bald
    396 Not Available
    209 Salt and Pepper
    70 White
    7 Sandy
    1 Unknown

    Statistics for sex:
    63409 Male
    4740 Female

    Statistics for race:
    37991 Black
    20992 White
    8637 Hispanic
    235 Asian
    104 Amer Indian
    94 Unknown
    92 Bi-Racial
    4 (blank)

    Statistics for eyes:
    51714 Brown
    7808 Blue
    4259 Hazel
    2469 Green
    1382 Black
    420 Not Available
    87 Gray
    9 Maroon
    1 Unknown
    ---END README---

    Here is a formal summary:

    ---BEGIN SUMMARY---
    Documentation:

    1. Title: Illinois DOC dataset

    2. Source Information
      -- Creators: Illinois DOC
      -- Illinois Department of Corrections
      1301 Concordia Court
      P.O. Box 19277
      Springfield, IL 62794-9277
      (217) 558-2200 x 2008
      -- Donor: Anonymous
      -- Date: 2019

    3. Past Usage:
      -- None

    4. Relevant Information:
      -- All CSV files contain headers describing the nature of the columns. For person.csv, the id is unique. For marks.csv and sentencing.csv, it is not.
      -- Note that the CSV files use semicolons as delimiters and also end with a trailing semicolon. If this is unsuitable, edit the arr2csvR function in htmltocsv...

  3. Percent Non-Hispanic Black

    • arc-gis-hub-home-arcgishub.hub.arcgis.com
    Updated Nov 5, 2021
    Cite
    OC Public Works (2021). Percent Non-Hispanic Black [Dataset]. https://arc-gis-hub-home-arcgishub.hub.arcgis.com/maps/OCPW::percent-non-hispanic-black-1
    Dataset authored and provided by
    OC Public Works
    Area covered
    Description

    Original census file name: tl_2020_

  4. Last Words of Death Row Inmates

    • kaggle.com
    zip
    Updated Dec 31, 2017
    Cite
    My Khe Nguyen (2017). Last Words of Death Row Inmates [Dataset]. https://www.kaggle.com/mykhe1097/last-words-of-death-row-inmates
    Available download formats: zip (293175 bytes)
    Authors
    My Khe Nguyen
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    1. Context

    Capital punishment is a controversial human rights issue in the United States. While surfing the Internet for an interesting dataset, I came across this database from the Texas Department of Criminal Justice, which comprises the offenders' last words before execution. Some of the statements are:

    "...Young people, listen to your parents; always do what they tell you to do, go to school, learn from your mistakes. Be careful before you sign anything with your name. Never, despite what other people say..." (Ramiro Hernandez, executed on April 9th, 2014)

    "First and foremost I'd like to say, "Justice has never advanced by taking a life" by Coretta Scott King. Lastly, to my wife and to my kids, I love y'all forever and always. That's it." (Taichin Preyor, executed on July 27th, 2017)

    As I skimmed these lines, I decided to create this dataset.

    2. Content

    This dataset includes information on offenders executed by the Texas Department of Criminal Justice from 1982 to November 8th, 2017. In Furman v. Georgia in 1972, the Supreme Court considered a group of consolidated cases and severely restricted the death penalty. However, like other states, Texas adjusted its legislation to address the Court's concerns and once again allowed capital punishment in 1973. Texas adopted execution by lethal injection in 1977, and in 1982, the starting year of this dataset, the first offender was executed by this method.

    The dataset consists of 545 observations with 21 variables. They are:
    - Execution: The order of execution, numeric.
    - LastName: Last name of the offender, character.
    - FirstName: First name of the offender, character.
    - TDCJNumber: TDCJ Number of the offender, numeric.
    - Age: Age of the offender, numeric.
    - Race: Race of the offender, categorical: Black, Hispanic, White, Other.
    - CountyOfConviction: County of conviction, character.
    - AgeWhenReceived: Age of offender when received, numeric.
    - EducationLevel: Education level of offender, numeric.
    - Native County: Native county of offender, categorical: 0 = Within Texas, 1 = Outside Texas.
    - PreviousCrime: Whether the offender committed any crime before, categorical: 0 = No, 1 = Yes.
    - Codefendants: Number of co-defendants, numeric.
    - NumberVictim: Number of victims, numeric.
    - WhiteVictim, HispanicVictim, BlackVictim, VictimOtherRace, FemaleVictim, MaleVictim: Number of victims with specified demographic features, numeric.
    - LastStatement: Last statement of offender, character.
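
The two coded categorical variables above can be turned back into labels with a small lookup. This is a sketch, not part of the dataset: the names NATIVE_COUNTY, PREVIOUS_CRIME, and decode are hypothetical, and it assumes codes are stored as integers or integer-like strings, with anything else (for example "NA") treated as missing.

```python
# Label mappings taken from the variable list above.
NATIVE_COUNTY = {0: "Within Texas", 1: "Outside Texas"}
PREVIOUS_CRIME = {0: "No", 1: "Yes"}


def decode(code, labels):
    """Map a raw code to its label; missing or unrecognized codes give None."""
    try:
        return labels.get(int(code))
    except (TypeError, ValueError):
        return None  # e.g. "NA", "", or None


print(decode("0", NATIVE_COUNTY), decode(1, PREVIOUS_CRIME), decode("NA", PREVIOUS_CRIME))
```

Returning None for unrecognized values keeps missing entries distinct from a genuine 0.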

    3. Acknowledgement

    This dataset is derived from the database maintained by the Texas Department of Criminal Justice, which can be found at this link: http://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html . The original has fewer than 10 variables and embeds links to sub-datasets, so I manually entered more variables based on those links.

    There are some complications with this dataset. Firstly, the dataset was manually created so mistakes are inevitable, though I have tried my best to minimize them. Secondly, the recording of offender information is not complete and consistent. For example, sometimes the education level of GED is interpreted as 11 years, at other times as 9 or 10 years. "None" and "NA" are used interchangeably, making it hard to distinguish between 0 and NA in the coded variable. The victim's information is often omitted, so I rely on the description of the crime for the names and pronouns to make a judgement of the number of victims and their gender. Finally, the last statements are sometimes recorded in the first person and sometimes in the third, so the word choice might not be original. That being said, I find this dataset meaningful and worth sharing.

    4. Inspiration

    What are the demographics of the death row inmates? What are the patterns of their last statements? What is the relationship between the two?



Prison population in the US

Dataset on incarcerated populations, state level

352 scholarly articles cite this dataset (View in Google Scholar)