15 datasets found

Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race,...
search.datacite.org
openicpsr.org
Updated 2018
Cite
Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016 [Dataset]. http://doi.org/10.3886/e102263v5-10021
Explore at:
Unique identifier
https://doi.org/10.3886/e102263v5-10021
Dataset updated
2018
Dataset provided by
Inter-university Consortium for Political and Social Research (https://www.icpsr.umich.edu/web/pages/)
DataCite (https://www.datacite.org/)
Authors
Jacob Kaplan
Description
Version 5 release notes:
Removes support for SPSS and Excel data.
Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year.
Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Version 3 release notes:
Add data for 2016.
Order rows by year (descending) and ORI.
Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.
All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.

I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests, which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
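The two recoding rules above can be sketched in a few lines. This is only an illustration in Python (the author's actual cleaning was done in R); the function name `clean_arrest_value` and the use of `None` to stand in for NA are assumptions of this sketch.

```python
# Values the description says are recoded to NA (represented as None here).
IMPLAUSIBLE = {10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000,
               90000, 100000, 99999, 99998}

def clean_arrest_value(raw):
    """Apply the two recoding rules to one raw arrest value (hypothetical helper)."""
    if raw == "None/not reported":
        return 0          # treated as zero arrests, possibly incorrectly
    value = int(raw)
    if value in IMPLAUSIBLE:
        return None       # implausible count, recoded to missing
    return value

cleaned = [clean_arrest_value(v)
           for v in ["None/not reported", "12", "99999", "20000", 7]]
print(cleaned)  # [0, 12, None, None, 7]
```

Note that after this recoding a zero can mean either "no arrests" or "not reported", which is exactly the ambiguity the description warns about.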

To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.

To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.
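One way to avoid the leading-zero problem is to read FIPS codes as text and, if they have already been converted to numbers, pad them back to their fixed width (2 digits for state, 3 for county). The sketch below uses Python's standard `csv` module; the column names and ORI values in the sample are made up for illustration and are not taken from the data.

```python
import csv
import io

# Hypothetical two-row extract; column names and ORIs are illustrative only.
sample = io.StringIO(
    "ori,fips_state_code,fips_county_code\n"
    "XX00001,01,073\n"
    "XX00002,06,037\n"
)

# csv.DictReader keeps every field as text, so leading zeros survive.
rows = list(csv.DictReader(sample))

def pad_fips(code, width):
    """Restore leading zeros on a FIPS code that may have been read as a number."""
    return str(code).strip().zfill(width)

state_county = [pad_fips(r["fips_state_code"], 2) + pad_fips(r["fips_county_code"], 3)
                for r in rows]
print(state_county)  # ['01073', '06037']
```

If a spreadsheet has already stripped the zeros, `pad_fips(1, 2)` returns `'01'` again.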

I created 9 arrest categories myself. The categories are:
Total Male Juvenile
Total Female Juvenile
Total Male Adult
Total Female Adult
Total Male
Total Female
Total Juvenile
Total Adult
Total Arrests
All of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than using the provided age-race categories (e.g. adult Black, juvenile Asian). As not all agencies report the race data, my method is more accurate. These categories also make up the data in the "simple" version of the data. The "simple" file only includes the above 9 columns as the arrest data (all other columns in the data are just agency identifier columns). Because this "simple" data set needs fewer columns, I include all offenses.
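As a rough illustration of how the nine totals follow from the sex-age counts, here is a minimal Python sketch for a single agency-year. The counts, the tuple layout, and the `total` helper are hypothetical, and flagging ages under 18 as juvenile is an assumption of this sketch rather than something stated above.

```python
# Hypothetical sex-age counts for one agency-year and one crime:
# (sex, age label, is_juvenile, arrests). Juvenile = under 18 is assumed here.
counts = [
    ("male", "under 10", True, 0),
    ("male", "15", True, 3),
    ("male", "22", False, 5),
    ("female", "15", True, 1),
    ("female", "22", False, 2),
]

def total(sex=None, juvenile=None):
    """Sum arrests over the sex-age cells matching the given filters."""
    return sum(n for s, _, j, n in counts
               if (sex is None or s == sex)
               and (juvenile is None or j == juvenile))

totals = {
    "total_male_juvenile": total("male", True),
    "total_female_juvenile": total("female", True),
    "total_male_adult": total("male", False),
    "total_female_adult": total("female", False),
    "total_male": total("male"),
    "total_female": total("female"),
    "total_juvenile": total(juvenile=True),
    "total_adult": total(juvenile=False),
    "total_arrests": total(),
}
print(totals["total_arrests"])  # 11
```

Because every total is built from the same sex-age cells, the nine columns are internally consistent even for agencies that do not report race.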

As the arrest data is very granular, and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files: eight that contain different crimes, plus the "simple" file. Each file contains the data for all years. The eight categories each have crimes belonging to a major crime category and do not overlap in crimes other than with the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Due to Stata limiting column names to 32 characters maximum, I have abbreviated the crime names in the data. The files and their included crimes are:

Index Crimes
Murder, Rape, Robbery, Aggravated Assault, Burglary, Theft, Motor Vehicle Theft, Arson
Alcohol Crimes
DUI, Drunkenness, Liquor
Drug Crimes
Total Drug, Total Drug Sales, Total Drug Possession, Cannabis Possession, Cannabis Sales, Heroin or Cocaine Possession, Heroin or Cocaine Sales, Other Drug Possession, Other Drug Sales, Synthetic Narcotic Possession, Synthetic Narcotic Sales
Grey Collar and Property Crimes
Forgery, Fraud, Stolen Property
Financial Crimes
Embezzlement, Total Gambling, Other Gambling, Bookmaking, Numbers Lottery
Sex or Family Crimes
Offenses Against the Family and Children, Other Sex Offenses, Prostitution, Rape
Violent Crimes
Aggravated Assault, Murder, Negligent Manslaughter, Robbery, Weapon Offenses
Other Crimes
Curfew, Disorderly Conduct, Other Non-traffic, Suspicion, Vandalism, Vagrancy
Simple
This data set has every crime and only the arrest categories that I created (see above).
If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...
openicpsr.org
Updated Mar 29, 2018
Cite
Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1974-2021 [Dataset]. http://doi.org/10.3886/E102263V15
Explore at:
Unique identifier
https://doi.org/10.3886/E102263V15
Dataset updated
Mar 29, 2018
Dataset provided by
Princeton University
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
1974 - 2021
Area covered
United States
Description
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

Version 15 release notes:
Adds 2021 data.
Version 14 release notes:
Adds 2020 data. Please note that the FBI has retired UCR data ending in 2020 data so this will be the last Arrests by Age, Sex, and Race data they release.
Version 13 release notes:
Changes R files from .rda to .rds.
Fixes bug where the number_of_months_reported variable incorrectly was the largest of the number of months reported for a specific crime variable. For example, if theft was reported Jan-June and robbery was reported July-December in an agency, in total there were 12 months reported. But since each crime (and let's assume no other crime was reported more than 6 months of the year) only was reported 6 months, the number_of_months_reported variable was incorrectly set at 6 months. Now it is the total number of months reported of any crime. So it would be set to 12 months in this example. Thank you to Nick Eubank for alerting me to this issue.
Adds rows even when an agency reported zero arrests that month; all arrest values are set to zero for these rows.
Version 12 release notes:
Adds 2019 data.
Version 11 release notes:
Changes release notes description, does not change data.
Version 10 release notes:
The data now has the following age categories (which were previously aggregated into larger groups to reduce file size): under 10, 10-12, 13-14, 40-44, 45-49, 50-54, 55-59, 60-64, over 64. These categories are available for female, male, and total (female+male) arrests. The previous aggregated categories (under 15, 40-49, and over 49) have been removed from the data.
Version 9 release notes:
For each offense, adds a variable indicating the number of months that offense was reported - these variables are labeled as "num_months_[crime]" where [crime] is the offense name. These variables are generated by the number of times one or more arrests were reported per month for that crime. For example, if there was at least one arrest for assault in January, February, March, and August (and no other months), there would be four months reported for assault. Please note that this does not differentiate between an agency not reporting that month and actually having zero arrests.
The variable "number_of_months_reported" is still in the data and is the number of months that any offense was reported. So if an agency reports murder arrests every month but no other crimes, the murder number of months variable and the "number_of_months_reported" variable will both be 12 while every other offense number of month variable will be 0.
Adds data for 2017 and 2018.
Version 8 release notes:
Adds annual data in R format.
Changes project name to avoid confusing this data for the ones done by NACJD.
Fixes bug where bookmaking was excluded as an arrest category.
Changed the number of categories to include more offenses per category to have fewer total files. Added a "total_race" file for each category - this file has total arrests by race for each crime and a breakdown of juvenile/adult by race.
Version 7 release notes:
Adds 1974-1979 data.
Adds monthly data (only totals by sex and race, not by age-categories).
All data now from FBI, not NACJD.
Changes some column names so all columns are <=32 characters to be usable in Stata.
Changes how number of months reported is calculated. Now it is the number of unique months with arrest data reported - months of data from the monthly header file (i.e. juvenile disposition data) are not considered in this calculation.
Version 6 release notes:
Fix bug where juvenile female columns had the same value as juvenile male columns.
Version 5 release notes:
Removes support for SPSS and Excel data.
Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year.
Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
Removes data on runaways.
Version 4 release notes:
Changes column names from "p
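The difference between the old and the corrected number_of_months_reported calculation in the Version 13 note can be made concrete with the theft/robbery example. The following is a minimal Python sketch, not the author's R code; the dictionary layout and variable names other than `number_of_months_reported` are assumptions.

```python
# Hypothetical agency-year: months (1-12) with at least one reported arrest,
# per offense, matching the theft (Jan-June) / robbery (July-December) example.
months_by_offense = {
    "theft": {1, 2, 3, 4, 5, 6},
    "robbery": {7, 8, 9, 10, 11, 12},
}

# Per-offense month counts, analogous to the "num_months_[crime]" variables.
num_months = {offense: len(months)
              for offense, months in months_by_offense.items()}

# Old (buggy) behaviour: the largest per-offense count.
old_value = max(num_months.values())

# Corrected behaviour: number of unique months reported for ANY offense.
number_of_months_reported = len(set().union(*months_by_offense.values()))

print(old_value, number_of_months_reported)  # 6 12
```

Taking the union of the month sets, rather than the maximum of their sizes, is what makes the two half-year offenses add up to a full 12 months.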
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...
openicpsr.org
Updated Mar 29, 2018
Cite
Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1974-2018 [Dataset]. http://doi.org/10.3886/E102263V11
Explore at:
Unique identifier
https://doi.org/10.3886/E102263V11
Dataset updated
Mar 29, 2018
Dataset provided by
University of Pennsylvania
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
1974 - 2018
Area covered
United States
Description
Version 11 release notes:
Changes release notes description, does not change data.
Version 10 release notes:
The data now has the following age categories (which were previously aggregated into larger groups to reduce file size): under 10, 10-12, 13-14, 40-44, 45-49, 50-54, 55-59, 60-64, over 64. These categories are available for female, male, and total (female+male) arrests. The previous aggregated categories (under 15, 40-49, and over 49) have been removed from the data.
Version 9 release notes:
For each offense, adds a variable indicating the number of months that offense was reported - these variables are labeled as "num_months_[crime]" where [crime] is the offense name. These variables are generated by the number of times one or more arrests were reported per month for that crime. For example, if there was at least one arrest for assault in January, February, March, and August (and no other months), there would be four months reported for assault. Please note that this does not differentiate between an agency not reporting that month and actually having zero arrests.
The variable "number_of_months_reported" is still in the data and is the number of months that any offense was reported. So if an agency reports murder arrests every month but no other crimes, the murder number of months variable and the "number_of_months_reported" variable will both be 12 while every other offense number of month variable will be 0.
Adds data for 2017 and 2018.
Version 8 release notes:
Adds annual data in R format.
Changes project name to avoid confusing this data for the ones done by NACJD.
Fixes bug where bookmaking was excluded as an arrest category.
Changed the number of categories to include more offenses per category to have fewer total files. Added a "total_race" file for each category - this file has total arrests by race for each crime and a breakdown of juvenile/adult by race.
Version 7 release notes:
Adds 1974-1979 data.
Adds monthly data (only totals by sex and race, not by age-categories).
All data now from FBI, not NACJD.
Changes some column names so all columns are <=32 characters to be usable in Stata.
Changes how number of months reported is calculated. Now it is the number of unique months with arrest data reported - months of data from the monthly header file (i.e. juvenile disposition data) are not considered in this calculation.
Version 6 release notes:
Fix bug where juvenile female columns had the same value as juvenile male columns.
Version 5 release notes:
Removes support for SPSS and Excel data.
Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year.
Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Version 3 release notes:
Add data for 2016.
Order rows by year (descending) and ORI.
Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.

The Arrests by Age, Sex, and Race (ASR) data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1974-2018 into a single file for each group of crimes. Each monthly file is only a single year as my laptop can't handle combining all the years together. These files are quite large and may take some time to load.

Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each age
Integration of Slurry Separation Technology & Refrigeration Units: Air...
catalog.data.gov
Updated Jun 25, 2024
Cite
data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - CO [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-co-b7d1e
Explore at:
Dataset updated
Jun 25, 2024
Dataset provided by
United States Agency for International Development (https://usaid.gov/)
Description
This is the carbon monoxide data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact. Do not change the column names or the codes used for the data, for example. And to be safe, don't even sort, just in case. One simple change in the Excel file could make the code full of bugs.
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...
openicpsr.org
Updated Apr 22, 2021
Cite
Jacob Kaplan (2021). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1974-2019 [Dataset]. https://www.openicpsr.org/openicpsr/project/102263/version/V12/view;jsessionid=4A146735840AA661F28BCE9C63F9814B?path=/openicpsr/102263/fcr:versions/V12/ucr_arrests_monthly_alcohol_or_property_1974_2019_rda.zip&type=file
Explore at:
Dataset updated
Apr 22, 2021
Dataset provided by
Princeton University
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
1974 - 2019
Area covered
United States
Description
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

Version 12 release notes:
Adds 2019 data.
Version 11 release notes:
Changes release notes description, does not change data.
Version 10 release notes:
The data now has the following age categories (which were previously aggregated into larger groups to reduce file size): under 10, 10-12, 13-14, 40-44, 45-49, 50-54, 55-59, 60-64, over 64. These categories are available for female, male, and total (female+male) arrests. The previous aggregated categories (under 15, 40-49, and over 49) have been removed from the data.
Version 9 release notes:
For each offense, adds a variable indicating the number of months that offense was reported - these variables are labeled as "num_months_[crime]" where [crime] is the offense name. These variables are generated by the number of times one or more arrests were reported per month for that crime. For example, if there was at least one arrest for assault in January, February, March, and August (and no other months), there would be four months reported for assault. Please note that this does not differentiate between an agency not reporting that month and actually having zero arrests.
The variable "number_of_months_reported" is still in the data and is the number of months that any offense was reported. So if an agency reports murder arrests every month but no other crimes, the murder number of months variable and the "number_of_months_reported" variable will both be 12 while every other offense number of month variable will be 0.
Adds data for 2017 and 2018.
Version 8 release notes:
Adds annual data in R format.
Changes project name to avoid confusing this data for the ones done by NACJD.
Fixes bug where bookmaking was excluded as an arrest category.
Changed the number of categories to include more offenses per category to have fewer total files. Added a "total_race" file for each category - this file has total arrests by race for each crime and a breakdown of juvenile/adult by race.
Version 7 release notes:
Adds 1974-1979 data
Adds monthly data (only totals by sex and race, not by age-categories).
All data now from FBI, not NACJD.
Changes some column names so all columns are <=32 characters to be usable in Stata.
Changes how number of months reported is calculated. Now it is the number of unique months with arrest data reported - months of data from the monthly header file (i.e. juvenile disposition data) are not considered in this calculation.
Version 6 release notes:
Fix bug where juvenile female columns had the same value as juvenile male columns.
Version 5 release notes:
Removes support for SPSS and Excel data.
Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year.
Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Integration of Slurry Separation Technology & Refrigeration Units: Air...
catalog.data.gov
Updated Jun 25, 2024
Cite
data.usaid.gov (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - SO2 [Dataset]. https://catalog.data.gov/dataset/integration-of-slurry-separation-technology-refrigeration-units-air-quality-so2-a3122
Explore at:
Dataset updated
Jun 25, 2024
Dataset provided by
United States Agency for International Development (https://usaid.gov/)
Description
This is the raw SO2 data. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact. Do not change the column names or the codes used for the data, for example. And to be safe, don't even sort. One simple change in the Excel file could make the code full of bugs.
Integration of Slurry Separation Technology & Refrigeration Units: Air...
datasets.ai
catalog.data.gov
Updated Sep 13, 2024
Cite
US Agency for International Development (2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - H2S [Dataset]. https://datasets.ai/datasets/integration-of-slurry-separation-technology-refrigeration-units-air-quality-h2s-4af17
Explore at:
Dataset updated
Sep 13, 2024
Dataset authored and provided by
US Agency for International Development
Description
This is the raw H2S data: the concentration of H2S in parts per million in the biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact. Do not change the column names or the codes used for the data, for example. And to be safe, don't even sort. One simple change in the Excel file could make the code full of bugs.
Integration of Slurry Separation Technology & Refrigeration Units: Air...
gimi9.com
Updated Jun 25, 2024
Cite
(2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - PMVa | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_integration-of-slurry-separation-technology-refrigeration-units-air-quality-pmva-87359/
Explore at:
Dataset updated
Jun 25, 2024
Description
This is the gravimetric data used to calibrate the real-time readings. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact. Do not change the column names or the codes used for the data, for example. And to be safe, don't even sort. One simple change in the Excel file could make the code full of bugs.
Integration of Slurry Separation Technology & Refrigeration Units: Air...
gimi9.com
Updated Jun 25, 2024
Cite
(2024). Integration of Slurry Separation Technology & Refrigeration Units: Air Quality - CH4 | gimi9.com [Dataset]. https://gimi9.com/dataset/data-gov_integration-of-slurry-separation-technology-refrigeration-units-air-quality-ch4-8abb6/
Explore at:
Dataset updated
Jun 25, 2024
Description
This is the methane concentration of the biogas. Each sheet (tab) is formatted to be exported as a .csv for use with the R code (AQ-June20.R). For this code to work properly, it is important that this file remain intact. Do not change the column names or the codes used for the data, for example. And to be safe, don't even sort, just in case. One simple change in the Excel file could make the code full of bugs.
Tennessee Eastman Process Simulation Dataset
kaggle.com
zip
Updated Feb 9, 2020
Cite
Sergei Averkiev (2020). Tennessee Eastman Process Simulation Dataset [Dataset]. https://www.kaggle.com/averkij/tennessee-eastman-process-simulation-dataset
Explore at:
Available download formats: zip (1370814903 bytes)
Dataset updated
Feb 9, 2020
Authors
Sergei Averkiev
Description
Intro

This dataverse contains the data referenced in Rieth et al. (2017). Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems. To be presented at Applied Human Factors and Ergonomics 2017.

Content

Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files.

Each dataframe contains 55 columns:

Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP. The “FaultFree” datasets only contain fault 0 (i.e. normal operating conditions).

Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping).

Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for a total duration of 25 hours and 48 hours respectively. Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively.

Columns 4 to 55 contain the process variables; the column names retain the original variable names.
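The sample counts and durations above are consistent with the 3-minute sampling interval, which a few lines of arithmetic confirm. This Python sketch is only a check of the stated figures; the helper names are hypothetical, and the fault positions are expressed as the number of sampling intervals that have elapsed, which assumes samples are spaced exactly 3 minutes apart.

```python
SAMPLE_INTERVAL_MIN = 3  # sampling interval stated in the description

def duration_hours(n_samples):
    """Total duration covered by n_samples taken every 3 minutes."""
    return n_samples * SAMPLE_INTERVAL_MIN / 60

def intervals_elapsed(hours):
    """Number of 3-minute sampling intervals in the given number of hours."""
    return int(hours * 60 / SAMPLE_INTERVAL_MIN)

print(duration_hours(500))   # 25.0 hours ("Training" datasets)
print(duration_hours(960))   # 48.0 hours ("Testing" datasets)
print(intervals_elapsed(1))  # 20 intervals before the fault in Faulty Training
print(intervals_elapsed(8))  # 160 intervals before the fault in Faulty Testing
```

So within each faulty simulation run, only the samples after roughly the 20th (training) or 160th (testing) reflect the fault.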

Acknowledgements

This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.

User Agreement

By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms.

The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission.

In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights.

Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law.

When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work.

This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website.
Additional Tennessee Eastman Process Simulation Data for Anomaly Detection...
dataverse.harvard.edu
dataone.org
Updated Jul 6, 2017
Cite
Harvard Dataverse (2017). Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation [Dataset]. http://doi.org/10.7910/DVN/6C3JR1
Explore at:
Available download formats: application/x-rlang-transport (24678017)
Unique identifier
https://doi.org/10.7910/DVN/6C3JR1
Dataset updated
Jul 6, 2017
Dataset provided by
Harvard Dataverse
License
https://dataverse.harvard.edu/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.7910/DVN/6C3JR1
Description
User Agreement, Public Domain Dedication, and Disclaimer of Liability. By accessing or downloading the data or work provided here, you, the User, agree that you have read this agreement in full and agree to its terms. The person who owns, created, or contributed a work to the data or work provided here dedicated the work to the public domain and has waived his or her rights to the work worldwide under copyright law. You can copy, modify, distribute, and perform the work, for any lawful purpose, without asking permission. In no way are the patent or trademark rights of any person affected by this agreement, nor are the rights that any other person may have in the work or in how the work is used, such as publicity or privacy rights. Pacific Science & Engineering Group, Inc., its agents and assigns, make no warranties about the work and disclaim all liability for all uses of the work, to the fullest extent permitted by law. When you use or cite the work, you shall not imply endorsement by Pacific Science & Engineering Group, Inc., its agents or assigns, or by another author or affirmer of the work. This Agreement may be amended, and the use of the data or work shall be governed by the terms of the Agreement at the time that you access or download the data or work from this Website.

Description
This dataverse contains the data referenced in Rieth et al. (2017). Issues and Advances in Anomaly Detection Evaluation for Joint Human-Automated Systems. To be presented at Applied Human Factors and Ergonomics 2017.

Each .RData file is an external representation of an R dataframe that can be read into an R environment with the 'load' function. The variables loaded are named ‘fault_free_training’, ‘fault_free_testing’, ‘faulty_testing’, and ‘faulty_training’, corresponding to the RData files.

Each dataframe contains 55 columns:
Column 1 ('faultNumber') ranges from 1 to 20 in the “Faulty” datasets and represents the fault type in the TEP. The “FaultFree” datasets only contain fault 0 (i.e. normal operating conditions).
Column 2 ('simulationRun') ranges from 1 to 500 and represents a different random number generator state from which a full TEP dataset was generated (Note: the actual seeds used to generate training and testing datasets were non-overlapping).
Column 3 ('sample') ranges either from 1 to 500 (“Training” datasets) or 1 to 960 (“Testing” datasets). The TEP variables (columns 4 to 55) were sampled every 3 minutes for a total duration of 25 hours (“Training” datasets) and 48 hours (“Testing” datasets). Note that the faults were introduced 1 and 8 hours into the Faulty Training and Faulty Testing datasets, respectively.
Columns 4 to 55 contain the process variables; the column names retain the original variable names.

Acknowledgments. This work was sponsored by the Office of Naval Research, Human & Bioengineered Systems (ONR 341), program officer Dr. Jeffrey G. Morrison under contract N00014-15-C-5003. The views expressed are those of the authors and do not reflect the official policy or position of the Office of Naval Research, Department of Defense, or US Government.
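The sampling figures quoted in this description can be checked with a few lines of arithmetic. The sketch below uses Python purely for illustration (the dataset itself ships as .RData files), and the constant and function names are hypothetical, not part of the dataset.

```python
# Sanity-check the sampling figures quoted in the description.
SAMPLE_INTERVAL_MIN = 3  # TEP variables are sampled every 3 minutes


def duration_hours(n_samples: int) -> float:
    """Total simulated duration covered by n_samples samples."""
    return n_samples * SAMPLE_INTERVAL_MIN / 60


def samples_elapsed(hours: float) -> int:
    """Number of samples recorded during the given number of hours."""
    return int(hours * 60 / SAMPLE_INTERVAL_MIN)


training_hours = duration_hours(500)       # "Training" datasets: 500 samples
testing_hours = duration_hours(960)        # "Testing" datasets: 960 samples
pre_fault_training = samples_elapsed(1)    # samples before the fault (1 hour in)
pre_fault_testing = samples_elapsed(8)     # samples before the fault (8 hours in)
```

Under these figures, roughly the first 20 samples of each Faulty Training run and the first 160 samples of each Faulty Testing run precede the fault; whether the fault takes effect exactly at that sample or the next one is not stated in the description.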
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale...
dataverse.harvard.edu
Updated Jul 8, 2024
Cite
Georgios Boumis; Brad Peter (2024). Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends [Dataset]. http://doi.org/10.7910/DVN/ZZDYM9
Unique identifier
https://doi.org/10.7910/DVN/ZZDYM9
Dataset updated
Jul 8, 2024
Dataset provided by
Harvard Dataverse
Authors
Georgios Boumis; Brad Peter
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
Time-Series Matrix (TSMx): A visualization tool for plotting multiscale temporal trends

TSMx is an R script that was developed to facilitate multi-temporal-scale visualizations of time-series data. The script requires only a two-column CSV of years and values to plot the slope of the linear regression line for all possible year combinations from the supplied temporal range. The outputs include a time-series matrix showing slope direction based on the linear regression, slope values plotted with colors indicating magnitude, and results of a Mann-Kendall test. The start year is indicated on the y-axis and the end year is indicated on the x-axis. In the example below, the cell in the top-right corner is the direction of the slope for the temporal range 2001–2019. The red line corresponds with the temporal range 2010–2019 and an arrow is drawn from the cell that represents that range. One cell is highlighted with a black border to demonstrate how to read the chart; that cell represents the slope for the temporal range 2004–2014.

This publication entry also includes an Excel template that produces the same visualizations without a need to interact with any code, though minor modifications will need to be made to accommodate year ranges other than what is provided. TSMx for R was developed by Georgios Boumis; TSMx was originally conceptualized and created by Brad G. Peter in Microsoft Excel.

Please refer to the associated publication: Peter, B.G., Messina, J.P., Breeze, V., Fung, C.Y., Kapoor, A. and Fan, P., 2024. Perspectives on modifiable spatiotemporal unit problems in remote sensing of agriculture: evaluating rice production in Vietnam and tools for analysis. Frontiers in Remote Sensing, 5, p.1042624. https://www.frontiersin.org/journals/remote-sensing/articles/10.3389/frsen.2024.1042624

TSMx sample chart from the supplied Excel template. Data represent the productivity of rice agriculture in Vietnam as measured via EVI (enhanced vegetation index) from the NASA MODIS data product (MOD13Q1.V006).

TSMx R script:

    # import packages
    library(dplyr)
    library(readr)
    library(ggplot2)
    library(tibble)
    library(tidyr)
    library(forcats)
    library(Kendall)

    options(warn = -1) # disable warnings

    # read data (.csv file with "Year" and "Value" columns)
    data <- read_csv("EVI.csv")

    # prepare row/column names for output matrices
    years <- data %>% pull("Year")
    r.names <- years[-length(years)]
    c.names <- years[-1]
    years <- years[-length(years)]

    # initialize output matrices
    sign.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
    pval.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))
    slope.matrix <- matrix(data = NA, nrow = length(years), ncol = length(years))

    # function to return remaining years given a start year
    getRemain <- function(start.year) {
      years <- data %>% pull("Year")
      start.ind <- which(data[["Year"]] == start.year) + 1
      remain <- years[start.ind:length(years)]
      return (remain)
    }

    # function to subset data for a start/end year combination
    splitData <- function(end.year, start.year) {
      keep <- which(data[['Year']] >= start.year & data[['Year']] <= end.year)
      batch <- data[keep,]
      return(batch)
    }

    # function to fit linear regression and return slope direction
    fitReg <- function(batch) {
      trend <- lm(Value ~ Year, data = batch)
      slope <- coefficients(trend)[[2]]
      return(sign(slope))
    }

    # function to fit linear regression and return slope magnitude
    fitRegv2 <- function(batch) {
      trend <- lm(Value ~ Year, data = batch)
      slope <- coefficients(trend)[[2]]
      return(slope)
    }

    # function to implement Mann-Kendall (MK) trend test and return significance
    # the test is implemented only for n>=8
    getMann <- function(batch) {
      if (nrow(batch) >= 8) {
        mk <- MannKendall(batch[['Value']])
        pval <- mk[['sl']]
      } else {
        pval <- NA
      }
      return(pval)
    }

    # function to return slope direction for all combinations given a start year
    getSign <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      signs <- lapply(combs, fitReg)
      return(signs)
    }

    # function to return MK significance for all combinations given a start year
    getPval <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      pvals <- lapply(combs, getMann)
      return(pvals)
    }

    # function to return slope magnitude for all combinations given a start year
    getMagn <- function(start.year) {
      remaining <- getRemain(start.year)
      combs <- lapply(remaining, splitData, start.year = start.year)
      magns <- lapply(combs, fitRegv2)
      return(magns)
    }

    # retrieve slope direction, MK significance, and slope magnitude
    signs <- lapply(years, getSign)
    pvals <- lapply(years, getPval)
    magns <- lapply(years, getMagn)

    # fill-in output matrices
    dimension <- nrow(sign.matrix)
    for (i in 1:dimension) {
      sign.matrix[i, i:dimension] <- unlist(signs[i])
      pval.matrix[i, i:dimension] <- unlist(pvals[i])
      slope.matrix[i, i:dimension] <- unlist(magns[i])
    }

    sign.matrix <-...
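The core idea of the script, one regression slope per start/end year pair arranged in an upper-triangular matrix, can be sketched independently of R. The following is a minimal Python illustration, not a port of TSMx: it assumes the years are sorted in ascending order, uses a hand-written ordinary least-squares slope instead of `lm`, and omits the Mann-Kendall test and all plotting. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_matrix(years: Sequence[int],
                 values: Sequence[float]) -> List[List[Optional[float]]]:
    """Upper-triangular matrix of slopes.

    Row i is the start year years[i]; column j is the end year years[j + 1].
    Cells below the diagonal stay None (end year not after start year).
    """
    n = len(years) - 1
    matrix: List[List[Optional[float]]] = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Fit on every observation from the start year to the end year.
            matrix[i][j] = ols_slope(years[i:j + 2], values[i:j + 2])
    return matrix


# Made-up values, for illustration only.
example_years = [2001, 2002, 2003, 2004]
example_values = [1.0, 3.0, 2.0, 6.0]
example = slope_matrix(example_years, example_values)
```

Here `example[0][2]` is the slope over the full range 2001 to 2004, which corresponds to the top-right cell described above, and the sign of each cell gives the slope direction.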
Data from: Superconductor-ferromagnet hybrids for non-reciprocal electronics...
ekoizpen-zientifikoa.ehu.eus
zenodo.org
Updated 2023
Cite
Zhuoran Geng; Hijano, Alberto; Ilic, Stefan; Ilyn, Maxim; Maasilta, Ilari J.; Monfardini, Alessandro; Spies, Maria; Strambini, Elia; Virtanen, Pauli; Calvo, Martino; Gonzales-Orellana, Carmen; Helenius, Ari P.; Khorshidian, Sara; Clodoaldo I. Levartoski De Araujo; Levy-Bertrand, Florence; Rogero, Celia; Giazotto, Francesco; F. Sebastian Bergeret; Heikkilä, Tero T. (2023). Superconductor-ferromagnet hybrids for non-reciprocal electronics and detectors [Dataset]. https://ekoizpen-zientifikoa.ehu.eus/documentos/668fc45cb9e7c03b01bdb054
Dataset updated
2023
Authors
Zhuoran Geng; Hijano, Alberto; Ilic, Stefan; Ilyn, Maxim; Maasilta, Ilari J.; Monfardini, Alessandro; Spies, Maria; Strambini, Elia; Virtanen, Pauli; Calvo, Martino; Gonzales-Orellana, Carmen; Helenius, Ari P.; Khorshidian, Sara; Clodoaldo I. Levartoski De Araujo; Levy-Bertrand, Florence; Rogero, Celia; Giazotto, Francesco; F. Sebastian Bergeret; Heikkilä, Tero T.
Description
Data for the manuscript "Superconductor-ferromagnet hybrids for non-reciprocal electronics and detectors", submitted to Superconductor Science and Technology, arXiv:2302.12732. This archive contains the data for all plots of numerical data in the manuscript.

## Fig. 4
Data of Fig. 4 in the WDX (Wolfram Data Exchange) format (unzip to extract the files). Contains critical exchange fields and critical thicknesses as functions of the temperature. Can be opened with Wolfram Mathematica with the command: Import[FileNameJoin[{NotebookDirectory[],"filename.wdx"}]]

## Fig. 5
Data of Fig. 5 in the WDX (Wolfram Data Exchange) format (unzip to extract the files). Contains theoretically calculated I(V) curves and the rectification coefficient R of N/FI/S junctions. Can be opened with Wolfram Mathematica with the command: Import[FileNameJoin[{NotebookDirectory[],"filename.wdx"}]]

## Fig. 7a
Data of Fig. 7a in the ascii format. Contains G in uS as a function of B in mT and V in mV.

## Fig. 7c
Data of Fig. 7c in the ascii format. Contains G in uS as a function of B in mT and V in mV.

## Fig. 7e
Data of Fig. 7e in the ascii format. Contains G in uS as a function of B in mT and V in mV. The plots 7b, d, and f are taken from the plots a, c, and e as indicated in the caption of the figure.

## Fig. 8
Data of Fig. 8 in the ascii format. Contains G in uS as a function of V in mV for several values of B in mT.

## Fig. 8 inset
Data of Fig. 8 inset in the ascii format. Contains G_0/G_N as a function of B in mT.

## Fig9a_b
First row: magnetic field values in T. First column: voltage drop in V. Remaining columns: differential conductance in S.

## Fig9b_FIT
First row: magnetic field values in T. First column: voltage drop in V. Remaining columns: differential conductance in S.

## Fig9c
First row: magnetic field values in T. First column: voltage drop in V. Remaining columns: R (real number).

## Fig9c inset
First row: magnetic field values in T. Odd columns: voltage drop in V. Even columns: injected current in A.

## Fig9d
First column: magnetic field in T. Second column: conductance ratio (real number). The sample name is given in the file name.

## Fig. 12
Data of Fig. 12 in the ascii format. Contains energy resolution as functions of temperature and tunnel resistance with current and voltage readout.

## Fig. 13
Data of Fig. 13 in the ascii format. Contains energy resolution as functions of (a) exchange field, (b) polarization, (c) Dynes parameter, and (d) absorber volume with different amplifier noises.

## Fig. 14
Data of Fig. 14 in the ascii format. Contains detector pulse current as functions of (a) temperature change and (b) time with different detector parameters.

## Fig. 17
Data of Fig. 17 in the ascii format. Contains dI/dV curves as functions of the voltage for different THz illumination frequencies and polarizations.

## Fig. 18
Data of Fig. 18 in the ascii format. Contains the current flowing through the junction as a function of time (arbitrary units) for ON and OFF illumination at 150 GHz for InPol and CrossPol polarization.

## Fig. 21
Data of Fig. 21c in the ascii format. Contains the magnitude of readout line S43 as a function of frequency.
Data of Fig. 21d in the ascii format. Contains the magnitude of iKID line S21 as a function of frequency.
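The grid layout described for the Fig. 9 files (first row holding the magnetic field values, first column holding the voltage, remaining cells holding the measured quantity) can be read with a short parser. The sketch below is in Python and rests on assumptions that the archive description does not confirm: that the files are whitespace-delimited, and that the header row may or may not carry a placeholder in the corner cell. The function name and the sample text are hypothetical.

```python
from typing import List, Tuple


def parse_grid(text: str) -> Tuple[List[float], List[float], List[List[float]]]:
    """Split a header-row / header-column grid into its three parts.

    Returns (fields, voltages, grid), where grid[i][j] is the value measured
    at voltages[i] and fields[j].
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, body = rows[0], rows[1:]
    n_data_cols = len(body[0]) - 1
    # Assumption: if the header has a corner placeholder cell, drop it.
    if len(header) == n_data_cols + 1:
        header = header[1:]
    fields = [float(v) for v in header]
    voltages = [float(r[0]) for r in body]
    grid = [[float(v) for v in r[1:]] for r in body]
    return fields, voltages, grid


# Made-up numbers in the assumed layout, for illustration only.
sample_text = """
0.0 0.1 0.2
-1e-3 1.0e-6 1.1e-6 1.2e-6
0.0 2.0e-6 2.1e-6 2.2e-6
"""
fields, voltages, grid = parse_grid(sample_text)
```

If the real files use a different delimiter (commas or tabs, for instance), only the `line.split()` call would need to change.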
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...
openicpsr.org
Updated May 18, 2018
Cite
Jacob Kaplan (2018). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Hate Crime Data 1991-2022 [Dataset]. http://doi.org/10.3886/E103500V10
Unique identifier
https://doi.org/10.3886/E103500V10
Dataset updated
May 18, 2018
Dataset provided by
Princeton University
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1991 - 2021
Area covered
United States
Description
!!!WARNING~~~ This dataset has a large number of flaws and is unable to properly answer many questions that people generally use it to answer, such as whether national hate crimes are changing (or at least they use the data so improperly that they get the wrong answer). A large number of people using this data (academics, advocates, reporters, US Congress) do so inappropriately and get the wrong answer to their questions as a result. Indeed, many published papers using this data should be retracted. Before using this data I highly recommend that you thoroughly read my book on UCR data, particularly the chapter on hate crimes (https://ucrbook.com/hate-crimes.html) as well as the FBI's own manual on this data. The questions you could potentially answer well are relatively narrow and generally exclude any causal relationships. ~~~WARNING!!!

For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

Version 10 release notes:
Adds 2022 data.
Version 9 release notes:
Adds 2021 data.
Version 8 release notes:
Adds 2019 and 2020 data. Please note that the FBI has retired UCR data ending in 2020 data so this will be the last UCR hate crime data they release.
Changes .rda file to .rds.
Version 7 release notes:
Changes release notes description, does not change data.
Version 6 release notes:
Adds 2018 data.
Version 5 release notes:
Adds data in the following formats: SPSS, SAS, and Excel.
Changes project name to avoid confusing this data for the ones done by NACJD.
Adds data for 1991.
Fixes bug where bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013, causing there to be two columns and zero values for years with the wrong label.
All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R.
Version 4 release notes:
Adds data for 2017.
Adds rows that submitted a zero-report (i.e. that agency reported no hate crimes in the year). This is for all years 1992-2017.
Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time. Different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian') which I made consistent.
Made the 'population' column which is the total population in that agency.
Version 3 release notes:
Adds data for 2016.
Order rows by year (descending) and ORI.
Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.

The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open.

Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.).

The only changes I made to the data are the following. Minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), made all character values lower case, reordered columns. I also generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
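The two derived items mentioned here, the "unique_id" column and the date-based variables, are simple to reproduce in outline. The sketch below is in Python rather than the author's R, and several details are assumptions because the description does not state them: the underscore separator in the ID, the lower-case month and weekday names, and the "MM-DD" form of the month-day variable. The ORI9 value is a placeholder, not a real agency code.

```python
from datetime import date

MONTHS = ("january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december")
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday", "friday",
            "saturday", "sunday")


def make_unique_id(year: int, ori9: str, incident_number: str) -> str:
    """Combine year, ORI9, and incident number (separator is an assumption)."""
    return f"{year}_{ori9}_{incident_number}"


def date_parts(incident_date: date) -> dict:
    """Derive month, weekday, and month-day variables from an incident date."""
    return {
        "month": MONTHS[incident_date.month - 1],
        "weekday": WEEKDAYS[incident_date.weekday()],  # Monday is index 0
        "month_day": f"{incident_date.month:02d}-{incident_date.day:02d}",
    }


example_id = make_unique_id(2016, "XX0000000", "000123")
example_parts = date_parts(date(2016, 7, 4))
```

Explicit name tuples are used instead of `strftime` so the result does not depend on the machine's locale.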
Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program...
openicpsr.org
Updated Jun 1, 2017
Cite
Jacob Kaplan (2017). Jacob Kaplan's Concatenated Files: Uniform Crime Reporting (UCR) Program Data: Supplementary Homicide Reports (SHR), 1976-2022 [Dataset]. http://doi.org/10.3886/E100699V14
Unique identifier
https://doi.org/10.3886/E100699V14
Dataset updated
Jun 1, 2017
Dataset provided by
Princeton University
Authors
Jacob Kaplan
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
1976 - 2020
Area covered
United States
Description
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com

Version 14 release notes:
Reupload data to fix issue with Stata file not opening.
Version 13 release notes:
Adds 2022 data.
Version 12 release notes:
Adds 2021 data.
Version 11 release notes:
Adds 2020 data. Please note that the FBI has retired UCR data ending in 2020 data so this will be the last SHR data they release.
Changes .rda file to .rds.
Version 10 release notes:
Changes release notes description, does not change data.
Version 9 release notes:
Adds 2019 data.
Version 8 release notes:
Adds 2018 data.
Changes source of data for years 1985-2018 to be directly from the FBI. 2018 data was received via email from the FBI, 2016-2017 is from the FBI who mailed me a DVD, and 1985-2015 data is from the FBI's Crime Data Explorer site (https://crime-data-explorer.fr.cloud.gov/downloads-and-docs).
Adds .csv version of the data.
Makes minor changes to value labels for consistency and to fix grammar.
Version 7 release notes:
Changes project name to avoid confusing this data for the ones done by NACJD.
Version 6 release notes:
Adds 2017 data.
Version 5 release notes:
Adds 2016 data.
Standardizes the "group" column which categorizes cities and counties by population.
Arrange rows in descending order by year and ascending order by ORI.
Version 4 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
Version 3 release notes:
Merges data with LEAIC data to add FIPS codes, census codes, agency type variables, and ORI9 variable.
Change column names for relationship variables from offender_n_relation_to_victim_1 to victim_1_relation_to_offender_n to better indicate that all relationships are victim 1's relationship to each offender.
Reorder columns.

This is a single file containing all data from the Supplementary Homicide Reports from 1976 to 2018. The Supplementary Homicide Report provides detailed information about the victim, offender, and circumstances of the murder. Details include victim and offender age, sex, race, ethnicity (Hispanic/not Hispanic), the weapon used, circumstances of the incident, and the number of both offenders and victims.

Years 1976-1984 were downloaded from NACJD, while more recent years are from the FBI. All files came as ASCII+SPSS Setup files and were cleaned using R. The "cleaning" just means that column names were standardized (different years have slightly different spellings for many columns). Standardization of column names is necessary to stack multiple years together. Categorical variables (e.g. state) were also standardized (i.e. fix spelling errors, have terminology be the same across years).

The following is the summary of the Supplementary Homicide Report copied from ICPSR's 2015 page for the data.

The Uniform Crime Reporting Program Data: Supplementary Homicide Reports (SHR) provide detailed information on criminal homicides reported to the police. These homicides consist of murders; non-negligent killings also called non-negligent manslaughter; and justifiable homicides. UCR Program contributors compile and submit their crime data by one of two means: either directly to the FBI or through their State UCR Programs. State UCR Programs frequently impose mandatory reporting requirements which have been effective in increasing both the number of reporting agencies as well as the number and accuracy of each participating agency's reports. Each agency may be identified by its numeric state code, alpha-numeric agency ("ORI") code, jurisdiction population, and population group. In addition, each homicide incident is identified by month of occurrence and situation type, allowing flexibility in creating aggregations and subsets.
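Why column names must be standardized before years can be stacked is easiest to see with a toy example. The sketch below is in Python, not the author's R, and it only handles differences in case, spacing, and punctuation; genuine spelling differences between years would need an explicit mapping table, which is not shown. The column names and values are made up for illustration.

```python
import re
from typing import Dict, List


def standardize(name: str) -> str:
    """Lower-case a column name and collapse non-alphanumerics to '_'."""
    cleaned = re.sub(r"[^a-z0-9]+", "_", name.strip().lower())
    return cleaned.strip("_")


def stack_years(tables: List[List[Dict[str, object]]]) -> List[Dict[str, object]]:
    """Concatenate yearly tables after renaming their columns consistently."""
    stacked = []
    for rows in tables:
        for row in rows:
            stacked.append({standardize(k): v for k, v in row.items()})
    return stacked


# Two hypothetical years whose column names differ only in formatting.
rows_year_a = [{"Victim Age": 30, "ORI": "XX00000"}]
rows_year_b = [{"victim_age": 41, "ori": "XX00001"}]
stacked = stack_years([rows_year_a, rows_year_b])
```

Without the renaming step, the two years would end up with four distinct columns instead of two shared ones.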
Cite

Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016 [Dataset]. http://doi.org/10.3886/e102263v5-10021

Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016

2 scholarly articles cite this dataset (View in Google Scholar)

Unique identifier

https://doi.org/10.3886/e102263v5-10021

Dataset updated

2018

Dataset provided by

Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
DataCitehttps://www.datacite.org/

Authors

Jacob Kaplan

Description

Version 5 release notes:
Removes support for SPSS and Excel data.
Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
Adds in agencies that report 0 months of the year.
Adds a column that indicates the number of months reported. This is generated by summing up the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime. They may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
Removes data on runaways.
Version 4 release notes:
Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
Version 3 release notes:
Add data for 2016.
Order rows by year (descending) and ORI.
Version 2 release notes:
Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.
All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.

I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. The original data does not have a value for when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests, which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
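The two recoding rules described above can be written as a single cleaning function. This is a Python sketch of the stated rules, not the author's R code (which is available in the linked repository); the function name is hypothetical, and `None` stands in for R's NA.

```python
from typing import Optional, Union

# Implausible arrest counts listed in the description; recoded to missing.
IMPLAUSIBLE = set(range(10000, 100001, 10000)) | {99999, 99998}


def clean_arrest_value(value: Union[str, int]) -> Optional[int]:
    """Apply the two recoding rules to a single arrest-count cell."""
    if value == "None/not reported":
        # Treated as zero, although it may actually be a missing value.
        return 0
    number = int(value)
    if number in IMPLAUSIBLE:
        return None  # stands in for NA
    return number


cleaned = [clean_arrest_value(v) for v in ["None/not reported", 12, 99999, 30000, "7"]]
```

Because "None/not reported" is mapped to zero, a zero in the cleaned data cannot be distinguished from a month or year in which the agency simply did not report.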

To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.

To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.
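The leading-zero problem can be avoided by keeping FIPS codes as text, and repaired after the fact by padding them back to their fixed widths (2 digits for state, 3 for county, 5 for place). The Python sketch below illustrates both; the column names and the sample row are hypothetical and may not match the names used in the actual files.

```python
import csv
import io

FIPS_WIDTHS = {"state": 2, "county": 3, "place": 5}


def restore_fips(value: object, kind: str) -> str:
    """Pad a FIPS code that lost its leading zeros back to full width."""
    return str(value).zfill(FIPS_WIDTHS[kind])


# Hypothetical column names and values, for illustration only.
raw = "ori,fips_state_code,fips_county_code\nXX0000000,01,003\n"

# The csv module returns every field as a string, so zeros are preserved.
rows = list(csv.DictReader(io.StringIO(raw)))
state_code = rows[0]["fips_state_code"]

# Codes that were already converted to numbers can be padded back.
repaired_state = restore_fips(1, "state")
repaired_county = restore_fips(3, "county")
```

When merging with other data sets, comparing the padded text form on both sides avoids mismatches such as "1" versus "01".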

I created 9 arrest categories myself. The categories are:
Total Male Juvenile
Total Female Juvenile
Total Male Adult
Total Female Adult
Total Male
Total Female
Total Juvenile
Total Adult
Total Arrests
All of these categories are based on the sums of the sex-age categories (e.g. Male under 10, Female aged 22) rather than using the provided age-race categories (e.g. adult Black, juvenile Asian). As not all agencies report the race data, my method is more accurate. These categories also make up the data in the "simple" version of the data. The "simple" file only includes the above 9 columns as the arrest data (all other columns in the data are just agency identifier columns). Because this "simple" data set needs fewer columns, I include all offenses.
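How the nine totals follow from the sex-age counts can be shown with a small sketch. It is written in Python for illustration and is not the author's code; it assumes that "juvenile" means under 18, which the description does not state explicitly, and it represents each sex-age category as a hypothetical (sex, age, count) tuple rather than as a column.

```python
from typing import Dict, Iterable, Tuple

JUVENILE_MAX_AGE = 17  # assumption: juveniles are arrestees under 18


def summarize(counts: Iterable[Tuple[str, int, int]]) -> Dict[str, int]:
    """Sum sex-age arrest counts into the nine created categories."""
    totals = {
        "total_male_juvenile": 0, "total_female_juvenile": 0,
        "total_male_adult": 0, "total_female_adult": 0,
        "total_male": 0, "total_female": 0,
        "total_juvenile": 0, "total_adult": 0,
        "total_arrests": 0,
    }
    for sex, age, n in counts:
        group = "juvenile" if age <= JUVENILE_MAX_AGE else "adult"
        totals[f"total_{sex}_{group}"] += n
        totals[f"total_{sex}"] += n
        totals[f"total_{group}"] += n
        totals["total_arrests"] += n
    return totals


# Made-up counts for one agency-year and one crime.
example_counts = [("male", 9, 0), ("male", 15, 2), ("male", 22, 3),
                  ("female", 17, 1), ("female", 30, 4)]
example_totals = summarize(example_counts)
```

Since every total is built from the sex-age counts alone, the result does not depend on whether the agency also reported the race breakdown.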

As the arrest data is very granular, and each category of arrest is its own column, there are dozens of columns per crime. To keep the data somewhat manageable, there are nine different files: eight that contain different crimes, and the "simple" file. Each file contains the data for all years. The eight categories each have crimes belonging to a major crime category and do not overlap in crimes other than with the index offenses. Please note that the crime names provided below are not the same as the column names in the data. Due to Stata limiting column names to 32 characters maximum, I have abbreviated the crime names in the data. The files and their included crimes are:

Index Crimes: Murder, Rape, Robbery, Aggravated Assault, Burglary, Theft, Motor Vehicle Theft, Arson
Alcohol Crimes: DUI, Drunkenness, Liquor
Drug Crimes: Total Drug, Total Drug Sales, Total Drug Possession, Cannabis Possession, Cannabis Sales, Heroin or Cocaine Possession, Heroin or Cocaine Sales, Other Drug Possession, Other Drug Sales, Synthetic Narcotic Possession, Synthetic Narcotic Sales
Grey Collar and Property Crimes: Forgery, Fraud, Stolen Property
Financial Crimes: Embezzlement, Total Gambling, Other Gambling, Bookmaking, Numbers Lottery
Sex or Family Crimes: Offenses Against the Family and Children, Other Sex Offenses, Prostitution, Rape
Violent Crimes: Aggravated Assault, Murder, Negligent Manslaughter, Robbery, Weapon Offenses
Other Crimes: Curfew, Disorderly Conduct, Other Non-traffic, Suspicion, Vandalism, Vagrancy
Simple: This data set has every crime and only the arrest categories that I created (see above).
