CAPITAL PUNISHMENT IN THE UNITED STATES, 1973-2018 provides annual data on prisoners under a sentence of death, as well as those who had their sentences commuted or vacated and prisoners who were executed. This study examines basic sociodemographic classifications, including age, sex, race and ethnicity, marital status at time of imprisonment, level of education, and state and region of incarceration. Criminal history information includes prior felony convictions, prior convictions for criminal homicide, and legal status at the time of the capital offense. Additional information is provided on inmates removed from death row by yearend 2018. The dataset consists of one part containing 9,583 cases. The file covers inmates whose death sentences were removed as well as inmates who were executed, and also identifies inmates who received a second death sentence by yearend 2018 and inmates who were already on death row.
This collection furnishes data on executions performed under civil authority in the United States between 1608 and 2002. The dataset describes each individual executed and the circumstances surrounding the crime for which the person was convicted. Variables include age, race, name, sex, and occupation of the offender, place, jurisdiction, date, and method of execution, and the crime for which the offender was executed. Also recorded are data on whether the only evidence for the execution was official records indicating that an individual (executioner or slave owner) was compensated for an execution.
https://data.gov.sg/open-data-licence
Dataset from Singapore Prison Service. For more information, visit https://data.gov.sg/datasets/d_f4081559b7db4f792a395138a540db1d/view
This dataset was last updated at 2:11 AM Eastern on July 12.
2019 had the most mass killings since at least the 1970s, according to the Associated Press/USA TODAY/Northeastern University Mass Killings Database.
In all, there were 45 mass killings, defined as incidents in which four or more people are killed, excluding the perpetrator. Of those, 33 were mass shootings. This summer was especially violent, with three high-profile public mass shootings occurring in the span of just four weeks, leaving 38 people dead and 66 injured.
A total of 229 people died in mass killings in 2019.
The AP's analysis found that more than 50% of the incidents were family annihilations, similar to prior years. Although far less common, the nine public mass shootings during the year were the deadliest type of mass murder, resulting in 73 deaths, not including the assailants.
One-third of the offenders died at the scene of the killing or soon after, half of them by suicide.
The Associated Press/USA TODAY/Northeastern University Mass Killings database tracks all U.S. homicides since 2006 involving four or more people killed (not including the offender) over a short period of time (24 hours) regardless of weapon, location, victim-offender relationship or motive. The database includes information on these and other characteristics concerning the incidents, offenders, and victims.
The AP/USA TODAY/Northeastern database represents the most complete tracking of mass murders by the above definition currently available. Other efforts, such as the Gun Violence Archive or Everytown for Gun Safety, may include events that do not meet our criteria, but a review of these sites and others indicates that this database contains every event that matches the definition, including some not tracked by other organizations.
This data will be updated periodically and can be used as an ongoing resource to help cover these events.
To get basic counts of incidents of mass killings and mass shootings by year nationwide, use these queries:
To get these counts just for your state:
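The queries themselves are not reproduced in this excerpt. As a hedged stand-in, the same yearly counts can be sketched in Python against a hypothetical CSV export of the database; the column names below are assumptions for illustration, not the database's real schema.

```python
# Hypothetical sketch: count mass killings and mass shootings per year.
# The columns ("date", "state", "num_victims_killed", "weapon") are invented;
# map them onto the database's actual export before use.
import csv
import io
from collections import Counter

sample_export = io.StringIO(
    "date,state,num_victims_killed,weapon\n"
    "2019-08-03,TX,23,shooting\n"
    "2019-08-04,OH,9,shooting\n"
    "2019-02-15,IL,5,arson\n"
    "2018-11-07,CA,12,shooting\n"
)

killings_by_year = Counter()
shootings_by_year = Counter()
for row in csv.DictReader(sample_export):
    year = row["date"][:4]
    killings_by_year[year] += 1           # every mass killing, any weapon
    if row["weapon"] == "shooting":
        shootings_by_year[year] += 1      # the mass-shooting subset

print(dict(killings_by_year))    # {'2019': 3, '2018': 1}
print(dict(shootings_by_year))   # {'2019': 2, '2018': 1}
```

Restricting the counts to a single state is the same loop with an added filter such as `row["state"] == "TX"`.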
Mass murder is defined as the intentional killing of four or more victims by any means within a 24-hour period, excluding the deaths of unborn children and the offender(s). The standard of four or more dead was initially set by the FBI.
This definition does not exclude cases based on method (e.g., shootings only), type or motivation (e.g., public only), victim-offender relationship (e.g., strangers only), or number of locations (e.g., one). The time frame of 24 hours was chosen to eliminate conflation with spree killers, who kill multiple victims in quick succession in different locations or incidents, and to satisfy the traditional requirement of occurring in a “single incident.”
Offenders who commit mass murder during a spree (before or after committing additional homicides) are included in the database, and all victims within seven days of the mass murder are included in the victim count. Negligent homicides related to driving under the influence or accidental fires are excluded due to the lack of offender intent. Only incidents occurring within the 50 states and Washington D.C. are considered.
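As a hedged illustration, the inclusion criteria above can be expressed as a simple predicate; the argument names are illustrative, not the database's real fields.

```python
# Illustrative predicate for the database's inclusion criteria.
from datetime import timedelta

def is_mass_murder(victims_killed: int, intentional: bool,
                   duration: timedelta, in_us_state_or_dc: bool) -> bool:
    return (
        victims_killed >= 4                  # count excludes offender(s) and unborn children
        and intentional                      # rules out DUI crashes and accidental fires
        and duration <= timedelta(hours=24)  # avoids conflation with spree killings
        and in_us_state_or_dc                # 50 states and Washington, D.C. only
    )

print(is_mass_murder(4, True, timedelta(hours=3), True))   # True
print(is_mass_murder(5, False, timedelta(hours=1), True))  # False: no intent
```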
Project researchers first identified potential incidents using the Federal Bureau of Investigation’s Supplementary Homicide Reports (SHR). Homicide incidents in the SHR were flagged as potential mass murder cases if four or more victims were reported on the same record, and the type of death was murder or non-negligent manslaughter.
Cases were subsequently verified utilizing media accounts, court documents, academic journal articles, books, and local law enforcement records obtained through Freedom of Information Act (FOIA) requests. Each data point was corroborated by multiple sources, which were compiled into a single document to assess the quality of information.
In case(s) of contradiction among sources, official law enforcement or court records were used, when available, followed by the most recent media or academic source.
Case information was subsequently compared with every other known mass murder database to ensure reliability and validity. Incidents listed in the SHR that could not be independently verified were excluded from the database.
Project researchers also conducted extensive searches for incidents not reported in the SHR during the time period, utilizing internet search engines, Lexis-Nexis, and Newspapers.com. Search terms include: [number] dead, [number] killed, [number] slain, [number] murdered, [number] homicide, mass murder, mass shooting, massacre, rampage, family killing, familicide, and arson murder. Offender, victim, and location names were also directly searched when available.
This project started at USA TODAY in 2012.
Contact AP Data Editor Justin Myers with questions, suggestions or comments about this dataset at jmyers@ap.org. The Northeastern University researcher working with AP and USA TODAY is Professor James Alan Fox, who can be reached at j.fox@northeastern.edu or 617-416-4400.
How much do natural disasters cost us? In lives, in dollars, in infrastructure? This dataset attempts to answer those questions, tracking the death toll and damage cost of major natural disasters since 1985. Disasters included are storms (hurricanes, typhoons, and cyclones), floods, earthquakes, droughts, wildfires, and extreme temperatures.
This dataset contains information on natural disasters that have occurred around the world from 1900 to 2017. The data includes the date of the disaster, the location, the type of disaster, the number of people killed, and the estimated cost in US dollars.
- An all-in-one disaster map displaying all recorded natural disasters dating back to 1900.
- Natural disaster hotspots - where do natural disasters most commonly occur and kill the most people?
- A live map tracking current natural disasters around the world
License
See the dataset description for more information.
This data collection provides annual data on prisoners under a sentence of death and prisoners whose death sentences were commuted or vacated during the period 1973-2001. Information is supplied for basic sociodemographic characteristics such as age, sex, education, and state of incarceration. Criminal history data include prior felony convictions for criminal homicide and legal status at the time of the capital offense. Additional information is available for inmates removed from death row by year-end 2001 and for inmates who were executed.
Number and percentage of homicide victims, by type of firearm used to commit the homicide (total firearms; handgun; rifle or shotgun; fully automatic firearm; sawed-off rifle or shotgun; firearm-like weapons; other firearms, type unknown), Canada, 1974 to 2018.
Number of homicide victims, by method used to commit the homicide (total methods used; shooting; stabbing; beating; strangulation; fire (burns or suffocation); other methods used; methods used unknown), Canada, 1974 to 2023.
List of every shooting incident that occurred in NYC during the current calendar year.
This is a breakdown of every shooting incident that occurred in NYC during the current calendar year. This data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents a shooting incident in NYC and includes information about the event, the location and time of occurrence. In addition, information related to suspect and victim demographics is also included. This data can be used by the public to explore the nature of police enforcement activity. Please refer to the attached data footnotes for additional information about this dataset.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘The Lost Journalists: Dataset of journalist deaths’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/yamqwe/journalist-deathse on 13 February 2022.
--- Dataset description provided by original source is as follows ---
Credit for the original dataset goes to CPJ
In-the-News
- NRT: AT LEAST 122 MEDIA PROFESSIONALS KILLED GLOBALLY IN 2016
- All Africa: Africa: Journalist Killings Ease From Record Highs As Murders Down, Combat Deaths Up
- BBC: The lost journalists of 2016
[Figure: journalist_deaths_by_year.png (https://data.world/api/journalism/dataset/journalist-deaths/file/raw/journalist_deaths_by_year.png)]
Methodology
CPJ began compiling detailed records on journalist deaths in 1992. We apply strict journalistic standards when investigating a death. One important aspect of our research is determining whether a death was work-related. As a result, we classify deaths as "motive confirmed" or "motive unconfirmed."
We consider a case "confirmed" only if we are reasonably certain that a journalist was murdered in direct reprisal for his or her work; was killed in crossfire during combat situations; or was killed while carrying out a dangerous assignment such as coverage of a street protest. We do not include journalists who are killed in accidents such as car or plane crashes.
We include only confirmed cases in the statistical analyses in this database.
When the motive is unclear, but it is possible that a journalist was killed because of his or her work, CPJ classifies the case as "unconfirmed" and continues to investigate. We regularly reclassify cases based on our ongoing research.
Our archives include narrative capsules of all journalists killed, including the cases in which the motive is unconfirmed. In cases where the place of death is incidental to the journalist's killing, we list the country where the fatal attack occurred as the place of the journalist's death (for example, if a journalist is hit by shrapnel in one country and evacuated to another, where he or she dies, CPJ lists the country in which he or she was hit as the place of death).
CPJ defines journalists as people who cover news or comment on public affairs through any media -- including in print, in photographs, on radio, on television, and online. We take up cases involving staff journalists, freelancers, stringers, bloggers, and citizen journalists. The combination of daily reporting and statistical data forms the basis of our case-driven and long-term advocacy.
In 2003, CPJ began documenting the deaths of media support workers. We did so in recognition of the vital role these individuals play in newsgathering. These workers include translators, drivers, fixers, and administrative workers.
Our archives include narrative capsules for media workers killed on duty. These cases are not included in our statistical analyses.
About CPJ
The Committee to Protect Journalists is an independent, nonprofit organization that promotes press freedom worldwide. We defend the right of journalists to report the news without fear of reprisal.
Additional Reading
Investigative journalism in Africa – “Walking through a minefield at midnight”
Iraq: The deadliest war for journalists
Being a journalist in Mexico is getting even more dangerous

Source: Committee to Protect Journalists
This dataset was created by Journalism, News, and Media and contains around 2,000 samples, with features including Date, Local/Foreign, technical information, and several unlabeled columns (e.g., Unnamed: 18, Unnamed: 20), among others.
- Analyze Coverage in relation to Taken Captive
- Study the influence of Organization on Unnamed: 21
- More datasets
If you use this dataset in your research, please credit Journalism, News, and Media
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains the AIT Alert Data Set (AIT-ADS), a collection of synthetic alerts suitable for evaluating alert aggregation, alert correlation, alert filtering, and attack graph generation approaches. The alerts were forensically generated from the AIT Log Data Set V2 (AIT-LDSv2) and originate from three intrusion detection systems, namely Suricata, Wazuh, and AMiner. The data sets comprise eight scenarios, each of which has been targeted by a multi-step attack with attack steps such as scans, web application exploits, password cracking, remote command execution, privilege escalation, etc. Each scenario and attack chain has certain variations, so that attack manifestations and the resulting alert sequences vary across scenarios; this means that the data set supports developing and evaluating approaches that compute similarities of attack chains or merge them into meta-alerts. Since only a few benchmark alert data sets are publicly available, the AIT-ADS was developed to address common issues in the research domain of multi-step attack analysis. Specifically, the alert data set contains many false positives caused by normal user behavior (e.g., user login attempts or software updates), heterogeneous alert formats (although all alerts are in JSON format, their fields differ for each IDS), repeated executions of attacks according to an attack plan, alerts collected from diverse log sources (application logs and network traffic) and from all components in the network (mail server, web server, DNS, firewall, file share, etc.), and labels for attack phases. For more information on how this alert data set was generated, see the paper accompanying this data set [1] or our GitHub repository. More information on the original log data set, including a detailed description of scenarios and attacks, can be found in [2].
The alert data set contains two files for each of the eight scenarios, and a file for their labels:
- _aminer.json contains alerts from the AMiner IDS
- _wazuh.json contains alerts from the Wazuh IDS and the Suricata IDS
- labels.csv contains the start and end times of attack phases in each scenario
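A minimal loading sketch follows. Whether the alert files hold a JSON array or one object per line, and the exact timestamp and column names, are assumptions here; consult the paper [1] and the GitHub repository for the real layout.

```python
# Minimal sketch: read alerts and attack-phase labels, then tag each alert.
# Field and column names are placeholders, not the data set's real schema.
import csv
import io
import json

# Stand-in for a scenario's _wazuh.json (one JSON object per line assumed):
raw_alerts = "\n".join([
    json.dumps({"timestamp": "2022-01-18T12:05:00", "rule": {"id": "5710"}}),
    json.dumps({"timestamp": "2022-01-18T14:30:00", "rule": {"id": "31101"}}),
])
alerts = [json.loads(line) for line in raw_alerts.splitlines()]

# Stand-in for labels.csv (column names assumed):
labels = list(csv.DictReader(io.StringIO(
    "scenario,phase,start,end\n"
    "fox,scan,2022-01-18T14:00:00,2022-01-18T15:00:00\n"
)))

# Tag each alert with the attack phase whose time window contains it;
# same-format ISO-8601 strings compare correctly as plain strings.
for alert in alerts:
    alert["phase"] = next(
        (row["phase"] for row in labels
         if row["start"] <= alert["timestamp"] <= row["end"]),
        "false_positive",
    )

print([a["phase"] for a in alerts])   # ['false_positive', 'scan']
```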
Besides false-positive alerts, the alerts in the AIT-ADS correspond to the following attacks:
Scans (nmap, WPScan, dirb)
Webshell upload (CVE-2020-24186)
Password cracking (John the Ripper)
Privilege escalation
Remote command execution
Data exfiltration (DNSteal) and stopped service
The total number of alerts in the data set is 2,655,821, of which 2,293,628 originate from Wazuh, 306,635 from Suricata, and 55,558 from AMiner. The numbers of alerts in each scenario are as follows: fox: 473,104; harrison: 593,948; russellmitchell: 45,544; santos: 130,779; shaw: 70,782; wardbeck: 91,257; wheeler: 616,161; wilson: 634,246.
Acknowledgements: Partially funded by the European Defence Fund (EDF) projects AInception (101103385) and NEWSROOM (101121403), and the FFG project PRESENT (FO999899544). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. The European Union cannot be held responsible for them.
If you use the AIT-ADS, please cite the following publications:
[1] Landauer, M., Skopik, F., Wurzenberger, M. (2024): Introducing a New Alert Data Set for Multi-Step Attack Analysis. Proceedings of the 17th Cyber Security Experimentation and Test Workshop. [PDF]
[2] Landauer M., Skopik F., Frank M., Hotwagner W., Wurzenberger M., Rauber A. (2023): Maintainable Log Datasets for Evaluation of Intrusion Detection Systems. IEEE Transactions on Dependable and Secure Computing, vol. 20, no. 4, pp. 3466-3482. [PDF]
https://creativecommons.org/publicdomain/zero/1.0/
Police fatalities from 2000 to 2016
This dataset aims to provide insight into individuals who were killed during altercations with police. It includes information on their age, race, mental health status, weapons they were armed with, and if they were fleeing.
Some of the features are in the original data; the others were added in this updated version.
1. UID: unique ID of the deceased (original data)
2. Name: name of the deceased (original data)
3. Age: age of the deceased (original data)
4. Stages of Life: age stage of the deceased (added in this updated version)
5. Gender: gender of the deceased (original data)
6. Race: race of the deceased (original data)
7. Date: date of death (original data)
8. Year: year in which the death occurred (added in this updated version)
9. Quarter: quarter in which the death occurred (added in this updated version)
10. Month: month in which the death occurred (added in this updated version)
11. Week: week in which the death occurred (added in this updated version)
12. Day: day on which the death occurred (added in this updated version)
13. City: city in which the death occurred (original data)
14. State: state in which the death occurred (original data)
15. Region: region in which the death occurred (added in this updated version)
16. Manner of death: how the victim was killed (original data)
17. Armed: did the victim have a weapon? (original data)
18. Mental illness: was the victim mentally ill? (original data)
19. Flee: did the victim try to escape? (original data)
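The date-derived columns added in this updated version (Year, Quarter, Month, Week, Day) can be reproduced from the original Date field. The sketch below is a plausible reconstruction; the exact week and day conventions the updater used are assumptions.

```python
# Sketch: derive the added calendar features from the original Date field.
# Using ISO week numbers and English day names here is an assumption.
from datetime import date

def derive_date_features(d: date) -> dict:
    return {
        "Year": d.year,
        "Quarter": (d.month - 1) // 3 + 1,
        "Month": d.strftime("%B"),
        "Week": d.isocalendar()[1],   # ISO week number
        "Day": d.strftime("%A"),
    }

print(derive_date_features(date(2016, 7, 5)))
# {'Year': 2016, 'Quarter': 3, 'Month': 'July', 'Week': 27, 'Day': 'Tuesday'}
```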
This dataset comes from https://data.world/awram/us-police-involved-fatalities.
These data offer objective and subjective information about current death row inmates and the management policies and procedures related to their incarceration. The major objectives of the study were to gather data about the inmate population and current management policies and procedures, to identify issues facing correctional administrators in supervising the growing number of condemned inmates, and to offer options for improved management. Four survey instruments were developed: (1) a form for the Department of Corrections in each of the 37 states that had a capital punishment statute as of March 1986, (2) a form for each warden of an institution that housed death-sentenced inmates, (3) a form for staff members who worked with such inmates, and (4) a form for a sample of the inmates. The surveys included questions about inmate demographics (e.g., date of birth, sex, race, Hispanic origin, level of education, marital status, and number of children), the institutional facilities available to death row inmates, state laws pertaining to them, training for staff who deal with them, the usefulness of various counseling, medical, and recreational programs, whether the inmates expected to be executed, and the challenges in managing the death row population. The surveys did not probe legal, moral, or political arguments about the death penalty itself.
This dataset focuses on the breakdown of traffic fatalities in Virginia based on the sex of the individuals involved in the crashes over a five-year period. This dataset gives valuable insights into the gender distribution of those killed in traffic-related accidents.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Introduction
This repository hosts the Testing Roads for Autonomous VEhicLes (TRAVEL) dataset. TRAVEL is an extensive collection of virtual roads that have been used for testing lane assist/keeping systems (i.e., driving agents), together with data from their execution in a state-of-the-art, physically accurate driving simulator called BeamNG.tech. Virtual roads consist of sequences of road points interpolated using cubic splines.
Along with the data, this repository contains instructions on how to install the tooling necessary to generate new data (i.e., test cases) and analyze them in the context of test regression. We focus on test selection and test prioritization, given their importance for developing high-quality software following the DevOps paradigms.
This dataset builds on top of our previous work in this area, including work on:
- test generation (e.g., AsFault, DeepJanus, and DeepHyperion) and the SBST CPS tool competition (SBST2021)
- test selection: SDC-Scissor and related tool
- test prioritization: automated test cases prioritization work for SDCs
Dataset Overview
The TRAVEL dataset is available under the data folder and is organized as a set of experiments folders. Each of these folders is generated by running the test-generator (see below) and contains the configuration used for generating the data (experiment_description.csv), various statistics on generated tests (generation_stats.csv) and found faults (oob_stats.csv). Additionally, the folders contain the raw test cases generated and executed during each experiment (test..json).
The following sections describe what each of those files contains.
Experiment Description
The experiment_description.csv contains the settings used to generate the data, including:
Time budget. The overall generation budget in hours. This budget includes both the time to generate and execute the tests as driving simulations.
The size of the map. The size of the squared map, in meters, defines the boundaries inside which the virtual roads develop.
The test subject. The driving agent that implements the lane-keeping system under test. The TRAVEL dataset contains data generated testing the BeamNG.AI and the end-to-end Dave2 systems.
The test generator. The algorithm that generated the test cases. The TRAVEL dataset contains data obtained using various algorithms, ranging from naive and advanced random generators to complex evolutionary algorithms, for generating tests.
The speed limit. The maximum speed at which the driving agent under test can travel.
Out of Bound (OOB) tolerance. The test cases' oracle that defines the tolerable amount of the ego-car that can lie outside the lane boundaries. This parameter ranges between 0.0 and 1.0. In the former case, a test failure triggers as soon as any part of the ego-vehicle goes out of the lane boundary; in the latter case, a test failure triggers only if the entire body of the ego-car falls outside the lane.
Experiment Statistics
The generation_stats.csv contains statistics about the test generation, including:
Total number of generated tests. The number of tests generated during an experiment. This number is broken down into the number of valid tests and invalid tests. Valid tests contain virtual roads that do not self-intersect and contain turns that are not too sharp.
Test outcome. The test outcome contains the number of passed tests, failed tests, and tests in error. Passed and failed tests are defined by the OOB tolerance and an additional (implicit) oracle that checks whether the ego-car is moving or standing still. Tests that did not pass because of other errors (e.g., the simulator crashed) are reported in a separate category.
The TRAVEL dataset also contains statistics about the failed tests, including the overall number of failed tests (total oob) and its breakdown into OOB that happened while driving left or right. Further statistics about the diversity (i.e., sparseness) of the failures are also reported.
Test Cases and Executions
Each test..json contains information about a test case and, if the test case is valid, the data observed during its execution as a driving simulation.
The data about the test case definition include:
The road points. The list of points in a 2D space that identifies the center of the virtual road, and their interpolation using cubic splines (interpolated_points)
The test ID. The unique identifier of the test in the experiment.
Validity flag and explanation. A flag that indicates whether the test is valid or not, and a brief message describing why the test is not considered valid (e.g., the road contains sharp turns or the road self intersects)
The test data are organized according to the following JSON Schema and can be interpreted as RoadTest objects provided by the tests_generation.py module.
{
  "type": "object",
  "properties": {
    "id": { "type": "integer" },
    "is_valid": { "type": "boolean" },
    "validation_message": { "type": "string" },
    "road_points": { "type": "array", "items": { "$ref": "schemas/pair" } },
    "interpolated_points": { "type": "array", "items": { "$ref": "schemas/pair" } },
    "test_outcome": { "type": "string" },
    "description": { "type": "string" },
    "execution_data": { "type": "array", "items": { "$ref": "schemas/simulationdata" } }
  },
  "required": [ "id", "is_valid", "validation_message", "road_points", "interpolated_points" ]
}
Finally, the execution data contain a list of timestamped state information recorded by the driving simulation. State information is collected at constant frequency and includes absolute position, rotation, and velocity of the ego-car, its speed in Km/h, and control inputs from the driving agent (steering, throttle, and braking). Additionally, execution data contain OOB-related data, such as the lateral distance between the car and the lane center and the OOB percentage (i.e., how much the car is outside the lane).
The simulation data adhere to the following (simplified) JSON Schema and can be interpreted as Python objects using the simulation_data.py module.
{
  "$id": "schemas/simulationdata",
  "type": "object",
  "properties": {
    "timer": { "type": "number" },
    "pos": { "type": "array", "items": { "$ref": "schemas/triple" } },
    "vel": { "type": "array", "items": { "$ref": "schemas/triple" } },
    "vel_kmh": { "type": "number" },
    "steering": { "type": "number" },
    "brake": { "type": "number" },
    "throttle": { "type": "number" },
    "is_oob": { "type": "number" },
    "oob_percentage": { "type": "number" }
  },
  "required": [ "timer", "pos", "vel", "vel_kmh", "steering", "brake", "throttle", "is_oob", "oob_percentage" ]
}
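Putting the two schemas together, here is a small sketch of consuming one test record. The record values are invented, and the execution states are abridged to the OOB-related fields (real records also carry pos, vel, steering, throttle, and brake).

```python
# Sketch: summarize one (invented) test record shaped like the schemas above --
# did the ego-car ever leave the lane, and how far outside did it get?
import json

record = json.loads("""
{
  "id": 7, "is_valid": true, "validation_message": "",
  "road_points": [[10, 10], [50, 80], [120, 90]],
  "interpolated_points": [[10, 10], [30, 45], [50, 80], [85, 85], [120, 90]],
  "test_outcome": "FAIL",
  "execution_data": [
    {"timer": 0.0, "vel_kmh": 0.0,  "is_oob": 0, "oob_percentage": 0.0},
    {"timer": 1.0, "vel_kmh": 42.5, "is_oob": 0, "oob_percentage": 0.08},
    {"timer": 2.0, "vel_kmh": 55.0, "is_oob": 1, "oob_percentage": 0.97}
  ]
}
""")

if record["is_valid"]:
    states = record["execution_data"]
    went_oob = any(s["is_oob"] for s in states)          # did the car leave the lane?
    max_oob = max(s["oob_percentage"] for s in states)   # worst out-of-bound fraction
    print(record["test_outcome"], went_oob, max_oob)     # FAIL True 0.97
```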
Dataset Content
The TRAVEL dataset is a lively initiative, so the content of the dataset is subject to change. Currently, the dataset contains the data collected during the SBST CPS tool competition, and data collected in the context of our recent work on test selection (SDC-Scissor work and tool) and test prioritization (automated test cases prioritization work for SDCs).
SBST CPS Tool Competition Data
The data collected during the SBST CPS tool competition are stored inside data/competition.tar.gz. The file contains the test cases generated by Deeper, Frenetic, AdaFrenetic, and Swat, the open-source test generators submitted to the competition and executed against BeamNG.AI with an aggression factor of 0.7 (i.e., conservative driver).
| Name    | Map Size (m x m) | Max Speed (Km/h) | Budget (h)    | OOB Tolerance (%) | Test Subject    |
| ------- | ---------------- | ---------------- | ------------- | ----------------- | --------------- |
| DEFAULT | 200 × 200        | 120              | 5 (real time) | 0.95              | BeamNG.AI - 0.7 |
| SBST    | 200 × 200        | 70               | 2 (real time) | 0.5               | BeamNG.AI - 0.7 |
Specifically, the TRAVEL dataset contains 8 repetitions for each of the above configurations for each test generator totaling 64 experiments.
SDC Scissor
With SDC-Scissor we collected data based on the Frenetic test generator. The data is stored inside data/sdc-scissor.tar.gz. The following table summarizes the parameters used.
| Name        | Map Size (m x m) | Max Speed (Km/h) | Budget (h)     | OOB Tolerance (%) | Test Subject    |
| ----------- | ---------------- | ---------------- | -------------- | ----------------- | --------------- |
| SDC-SCISSOR | 200 × 200        | 120              | 16 (real time) | 0.5               | BeamNG.AI - 1.5 |
The dataset contains 9 experiments with the above configuration. To generate your own data with SDC-Scissor, follow the instructions in its repository.
Dataset Statistics
Here is an overview of the TRAVEL dataset: generated tests, executed tests, and faults found by all the test generators grouped by experiment configuration. Some 25,845 test cases were generated by running 4 test generators 8 times in 2 configurations using the SBST CPS Tool Competition code pipeline (SBST in the table). We ran the test generators for 5 hours, allowing the ego-car a generous speed limit (120 Km/h) and defining a high OOB tolerance (i.e., 0.95), and we also ran the test generators using a smaller generation budget (i.e., 2 hours) and speed limit (i.e., 70 Km/h) while setting the OOB tolerance to a lower value (i.e., 0.85). We also collected some 5,971 additional tests with SDC-Scissor (SDC-Scissor in the table) by running it 9 times for 16 hours using Frenetic as the test generator and defining a more realistic OOB tolerance (i.e., 0.50).
Generating new Data
Generating new data, i.e., test cases, can be done using the SBST CPS Tool Competition pipeline and the driving simulator BeamNG.tech.
Extensive instructions on how to install both tools are provided in the SBST CPS Tool Competition pipeline documentation.
This dataset reflects reported incidents of crime (with the exception of murders where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. In order to protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. Should you have questions about this dataset, you may contact the Research & Development Division of the Chicago Police Department at 312.745.6071 or RandD@chicagopolice.org. Disclaimer: These crimes may be based upon preliminary information supplied to the Police Department by the reporting parties that have not been verified. The preliminary crime classifications may be changed at a later date based upon additional investigation and there is always the possibility of mechanical or human error. Therefore, the Chicago Police Department does not guarantee (either expressed or implied) the accuracy, completeness, timeliness, or correct sequencing of the information and the information should not be used for comparison purposes over time. The Chicago Police Department will not be responsible for any error or omission, or for the use of, or the results obtained from the use of this information. All data visualizations on maps should be considered approximate and attempts to derive specific addresses are strictly prohibited. The Chicago Police Department is not responsible for the content of any off-site pages that are referenced by or that reference this web page other than an official City of Chicago or Chicago Police Department web page. The user specifically acknowledges that the Chicago Police Department is not responsible for any defamatory, offensive, misleading, or illegal conduct of other users, links, or third parties and that the risk of injury from the foregoing rests entirely with the user. 
The unauthorized use of the words "Chicago Police Department," "Chicago Police," or any colorable imitation of these words, or the unauthorized use of the Chicago Police Department logo, is unlawful. This web page does not, in any way, authorize such use.

Data is updated daily, Tuesday through Sunday. The dataset contains more than 65,000 records/rows of data and cannot be viewed in full in Microsoft Excel. Therefore, when downloading the file, select CSV from the Export menu. Open the file in an ASCII text editor, such as WordPad, to view and search.

To access a list of Chicago Police Department - Illinois Uniform Crime Reporting (IUCR) codes, go to http://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
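Since the full extract is too large to browse comfortably in a spreadsheet, one practical option is to stream the exported CSV from a script instead. A minimal Python sketch using only the standard library (the "Primary Type" column name is an assumption based on the portal's export and may differ in other extracts):

```python
import csv

def count_by_primary_type(path):
    """Stream the exported CSV row by row so the full file never has
    to fit in memory, tallying incidents per offense category."""
    counts = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # "Primary Type" is assumed to be the offense category
            # column; adjust the key if your download differs.
            key = row.get("Primary Type", "UNKNOWN")
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Streaming with `csv.DictReader` keeps memory usage constant regardless of how many rows the export contains, which matters for multi-year extracts.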
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains models submitted by students on the Alloy4Fun platform to solve the challenge models from several editions of formal methods courses at the University of Minho (UM) and the University of Porto (UP) between the fall of 2019 and the spring of 2023, totalling about 100,000 entries. Participants include those enrolled in the optional MSc course "Specification and Modelling" (EM) and the mandatory MSc course "Formal Methods in Software Engineering" (MFES) at UM, and the optional MSc course "Formal Methods for Critical Systems" (MFS) at UP. Note that since the challenges' permalinks are publicly available, the dataset may contain submissions from participants outside the classroom context.
The analysis of the 2021 dataset is reported in the Science of Computer Programming paper "Experiences on Teaching Alloy with an Automated Assessment Platform" (extending the ABZ'20 conference version, which analysed the 2020 dataset).
| Name | Permalink | Courses (Students) | Entries |
| --- | --- | --- | --- |
| Trash FOL | sDLK7uBCbgZon3znd | EM 19/20 (~20) and 20/21 (~20), MFS 21/22 (~10) and 22/23 (~10) | 4092 |
| Classroom FOL | YH3ANm7Y5Qe5dSYem | EM 19/20 (~20) and 20/21 (~20), MFS 21/22 (~10) and 22/23 (~10) | 5893 |
| Trash RL | PQAJE67kz8w5NWJuM | EM 19/20 (~20) and 20/21 (~20) | 4361 |
| Classroom RL | zRAn69AocpkmxXZnW | EM 19/20 (~20) and 20/21 (~20) | 6341 |
| Graphs | gAeD3MTGCCv8YNTaK | EM 19/20 (~20) and 20/21 (~20) | 3211 |
| LTS | zoEADeCW2b2suJB2k | EM 19/20 (~20) and 20/21 (~20) | 3382 |
| Production line | jyS8Bmceejj9pLbTW | EM 19/20 (~20) and 20/21 (~20) | 898 |
| Production line (v2) | bNCCf9FMRZoxqobfX | MFES 21/22 (~200), MFS 21/22 (~10) and 22/23 (~10) | 4903 |
| Production line (v3) | aTwuoJgesSd8hXXEP | MFES 22/23 (~200) | 3175 |
| CV | JC8Tij8o8GZb99gEJ | EM 19/20 (~20) | 1199 |
| CV (v2) | WGdhwKZnCu7aKhXq9 | EM 20/21 (~20) | 393 |
| Trash LTL | 9jPK8KBWzjFmBx4Hb | EM 19/20 (~20) and 20/21 (~20) | 5279 |
| Train Station | FwCGymHmbqcziisH5 | EM 20/21 (~20) | 1264 |
| Train Station (v2) | QxGnrFQnXPGh2Lh8C | MFES 21/22 (~200) and 22/23 (~200), MFS 21/22 (~10) and 22/23 (~10) | 8158 |
| Courses | PSqwzYAfW9dFAa9im | MFES 21/22 (~200), MFS 21/22 (~10) and 22/23 (~10) | 14884 |
| Courses (v2) | JDKw8yJZF5fiP3jv3 | MFES 22/23 (~200) | 7632 |
| Social network | dkZH6HJNQNLLDX6Aj | MFES 21/22 (~200) and 22/23 (~200), MFS 21/22 (~10) and 22/23 (~10) | 22690 |
Each entry of the dataset registers either an execution (which may have returned a result or an error) or the creation of a permalink for sharing, and contains the following fields:
_id: the id of the interaction
time: the timestamp of its creation
derivationOf: the parent entry
original: the first ancestor with secrets (always the same within an exercise)
code: the complete code of the model (excluding the secrets defined in the original entry) (with student comments removed)
sat: whether the command was satisfiable (counter-example found for checks), or -1 when error thrown [only for executions]
cmd_i: the index of the executed command [only for executions]
cmd_n: the name of the executed command [only for successful executions, i.e. no error thrown]
cmd_c: whether the command was a check [only for successful executions, i.e. no error thrown]
msg: the error or warning message [only for successful executions with warnings or when error thrown]
theme: the visualisation theme [only for sharing entries]
User comments were removed from the code to guarantee anonymization.
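Under the assumption that the entries are distributed one JSON object per line with the fields listed above (the file layout is an assumption; the field semantics are as documented), a minimal Python sketch separating executions from sharing entries:

```python
import json

def split_entries(lines):
    """Partition raw dataset entries (assumed: one JSON object per
    line) into executions and sharing entries. Per the schema above,
    only sharing entries carry a "theme" field."""
    executions, shares = [], []
    for line in lines:
        entry = json.loads(line)
        if "theme" in entry:
            shares.append(entry)
        else:
            executions.append(entry)
    return executions, shares

def error_rate(executions):
    """Fraction of executions that threw an error (sat == -1)."""
    if not executions:
        return 0.0
    errors = sum(1 for e in executions if e.get("sat") == -1)
    return errors / len(executions)
```

A per-exercise error rate like this is the kind of aggregate the accompanying papers compute; grouping by the `original` field would restrict it to a single exercise.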
View more details and insights related to this measure on the story page: https://data.austintexas.gov/stories/s/a5fa-t7pt/
Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
License information was derived automatically
Programming Languages Infrastructure as Code (PL-IaC) enables IaC programs written in general-purpose programming languages like Python and TypeScript. The currently available PL-IaC solutions are Pulumi and the Cloud Development Kits (CDKs) of Amazon Web Services (AWS) and Terraform. This dataset provides metadata and initial analyses of all public GitHub repositories in August 2022 with an IaC program, including their programming languages, applied testing techniques, and licenses. Further, we provide a shallow copy of the head state of those 7104 repositories whose licenses permit redistribution. The dataset is available under the Open Data Commons Attribution License (ODC-By) v1.0. Contents:
- metadata.zip: The dataset metadata and analysis results as CSV files.
- scripts-and-logs.zip: Scripts and logs of the dataset creation.
- LICENSE: The Open Data Commons Attribution License (ODC-By) v1.0 text.
- README.md: This document.
- redistributable-repositiories.zip: Shallow copies of the head state of all redistributable repositories with an IaC program.

This artifact is part of the ProTI Infrastructure as Code testing project: https://proti-iac.github.io

Metadata

The dataset's metadata comprises three tabular CSV files containing metadata about all analyzed repositories, IaC programs, and testing source code files.

repositories.csv:
- ID (integer): GitHub repository ID
- url (string): GitHub repository URL
- downloaded (boolean): Whether cloning the repository succeeded
- name (string): Repository name
- description (string): Repository description
- licenses (string, list of strings): Repository licenses
- redistributable (boolean): Whether the repository's licenses permit redistribution
- created (string, date & time): Time of the repository's creation
- updated (string, date & time): Time of the last update to the repository
- pushed (string, date & time): Time of the last push to the repository
- fork (boolean): Whether the repository is a fork
- forks (integer): Number of forks
- archive (boolean): Whether the repository is archived
- programs (string, list of strings): Project file path of each IaC program in the repository

programs.csv:
- ID (string): Project file path of the IaC program
- repository (integer): GitHub repository ID of the repository containing the IaC program
- directory (string): Path of the directory containing the IaC program's project file
- solution (string, enum): PL-IaC solution of the IaC program ("AWS CDK", "CDKTF", "Pulumi")
- language (string, enum): Programming language of the IaC program (enum values: "csharp", "go", "haskell", "java", "javascript", "python", "typescript", "yaml")
- name (string): IaC program name
- description (string): IaC program description
- runtime (string): Runtime string of the IaC program
- testing (string, list of enum): Testing techniques of the IaC program (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
- tests (string, list of strings): File paths of IaC program's tests

testing-files.csv:
- file (string): Testing file path
- language (string, enum): Programming language of the testing file (enum values: "csharp", "go", "java", "javascript", "python", "typescript")
- techniques (string, list of enum): Testing techniques used in the testing file (enum values: "awscdk", "awscdk_assert", "awscdk_snapshot", "cdktf", "cdktf_snapshot", "cdktf_tf", "pulumi_crossguard", "pulumi_integration", "pulumi_unit", "pulumi_unit_mocking")
- keywords (string, list of enum): Keywords found in the testing file (enum values: "/go/auto", "/testing/integration", "@AfterAll", "@BeforeAll", "@Test", "@aws-cdk", "@aws-cdk/assert", "@pulumi.runtime.test", "@pulumi/", "@pulumi/policy", "@pulumi/pulumi/automation", "Amazon.CDK", "Amazon.CDK.Assertions", "Assertions_", "HashiCorp.Cdktf", "IMocks", "Moq", "NUnit", "PolicyPack(", "ProgramTest", "Pulumi", "Pulumi.Automation", "PulumiTest", "ResourceValidationArgs", "ResourceValidationPolicy", "SnapshotTest()", "StackValidationPolicy", "Testing", "Testing_ToBeValidTerraform(", "ToBeValidTerraform(", "Verifier.Verify(", "WithMocks(", "[Fact]", "[TestClass]", "[TestFixture]", "[TestMethod]", "[Test]", "afterAll(", "assertions", "automation", "aws-cdk-lib", "aws-cdk-lib/assert", "aws_cdk", "aws_cdk.assertions", "awscdk", "beforeAll(", "cdktf", "com.pulumi", "def test_", "describe(", "github.com/aws/aws-cdk-go/awscdk", "github.com/hashicorp/terraform-cdk-go/cdktf", "github.com/pulumi/pulumi", "integration", "junit", "pulumi", "pulumi.runtime.setMocks(", "pulumi.runtime.set_mocks(", "pulumi_policy", "pytest", "setMocks(", "set_mocks(", "snapshot", "software.amazon.awscdk.assertions", "stretchr", "test(", "testing", "toBeValidTerraform(", "toMatchInlineSnapshot(", "toMatchSnapshot(", "to_be_valid_terraform(", "unittest", "withMocks(")
- program (string): Project file path of the testing file's IaC program

Dataset Creation

scripts-and-logs.zip contains all scripts and logs of the creation of this dataset.
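As a quick illustration of working with the metadata, a hedged Python sketch that selects the redistributable repositories from repositories.csv (it assumes the boolean columns are serialized as the strings "True"/"False", which the description above does not guarantee; adjust the comparison if your copy uses another encoding):

```python
import csv

def redistributable_repos(csv_path):
    """Return the rows of repositories.csv whose licenses permit
    redistribution. Assumes booleans are stored as "True"/"False"
    strings; the .lower() comparison also tolerates "true"/"false"."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f)
                if row.get("redistributable", "").lower() == "true"]
```

The same pattern applies to the other boolean columns (downloaded, fork, archive) and, via the `repository` column, to joining programs.csv against repositories.csv.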
In it, executions/executions.log documents the commands that generated this dataset in detail. On a high level, the dataset was created as follows:
1. A list of all repositories with a PL-IaC program configuration file was created using search-repositories.py (documented below). The execution took two weeks due to the non-deterministic nature of GitHub's REST API, causing excessive retries.
2. A shallow copy of the head of all repositories was downloaded using download-repositories.py (documented below).
3. Using analysis.ipynb, the repositories were analyzed for the programs' metadata, including the used programming languages and licenses.
4. Based on the analysis, all repositories with at least one IaC program and a redistributable license were packaged into redistributable-repositiories.zip, excluding any node_modules and .git directories.

Searching Repositories

The repositories are searched through search-repositories.py and saved in a CSV file. The script takes these arguments in the following order:
1. GitHub access token.
2. Name of the CSV output file.
3. Filename to search for.
4. File extensions to search for, separated by commas.
5. Min file size for the search (for all files: 0).
6. Max file size for the search or * for unlimited (for all files: *).

Pulumi projects have a Pulumi.yaml or Pulumi.yml (case-sensitive file name) file in their root folder, i.e., (3) is Pulumi and (4) is yml,yaml. https://www.pulumi.com/docs/intro/concepts/project/

AWS CDK projects have a cdk.json (case-sensitive file name) file in their root folder, i.e., (3) is cdk and (4) is json. https://docs.aws.amazon.com/cdk/v2/guide/cli.html

CDK for Terraform (CDKTF) projects have a cdktf.json (case-sensitive file name) file in their root folder, i.e., (3) is cdktf and (4) is json. https://www.terraform.io/cdktf/create-and-deploy/project-setup

Limitations

The script uses the GitHub code search API and inherits its limitations:
- Only forks with more stars than the parent repository are included.
- Only the repositories' default branches are considered.
- Only files smaller than 384 KB are searchable.
- Only repositories with fewer than 500,000 files are considered.
- Only repositories that have had activity or have been returned in search results in the last year are considered.

More details: https://docs.github.com/en/search-github/searching-on-github/searching-code

The results of the GitHub code search API are not stable. However, the generally more robust GraphQL API does not support searching for files in repositories: https://stackoverflow.com/questions/45382069/search-for-code-in-github-using-graphql-v4-api

Downloading Repositories

download-repositories.py downloads all repositories in CSV files generated through search-repositories.py and generates an overview CSV file of the downloads. The script takes these arguments in the following order:
1. Names of the repositories CSV files generated through search-repositories.py, separated by commas.
2. Output directory to download the repositories to.
3. Name of the CSV output file.

The script only downloads a shallow recursive copy of the HEAD of each repository, i.e., only the main branch's most recent state, including submodules, without the rest of the git history. Each repository is downloaded to a subfolder named by the repository's ID.
The dataset comprises developer test results of Maven projects with flaky tests across a range of consecutive commits from the projects' git commit histories. The Maven projects are a subset of those investigated in an OOPSLA 2020 paper. For each subject, the commit range spans from the flakiness-introducing commit (FIC) to the iDFlakies commit (see the OOPSLA paper for details). The commit hashes have been obtained from the IDoFT dataset.
The dataset will be presented at the 1st International Flaky Tests Workshop 2024 (FTW 2024). Please refer to our extended abstract for more details about the motivation for and context of this dataset.
The following table provides a summary of the data.
| Slug (Module) | FIC Hash | Tests | Commits | Av. Commits/Test | Flaky Tests | Tests w/ Consistent Failures | Total Distinct Histories |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TooTallNate/Java-WebSocket | 822d40 | 146 | 75 | 75 | 24 | 1 | 2.6x10^9 |
| apereo/java-cas-client (cas-client-core) | 5e3655 | 157 | 65 | 61.7 | 3 | 2 | 1.0x10^7 |
| eclipse-ee4j/tyrus (tests/e2e/standard-config) | ce3b8c | 185 | 16 | 16 | 12 | 0 | 261 |
| feroult/yawp (yawp-testing/yawp-testing-appengine) | abae17 | 1 | 191 | 191 | 1 | 1 | 8 |
| fluent/fluent-logger-java | 5fd463 | 19 | 131 | 105.6 | 11 | 2 | 8.0x10^32 |
| fluent/fluent-logger-java | 87e957 | 19 | 160 | 122.4 | 11 | 3 | 2.1x10^31 |
| javadelight/delight-nashorn-sandbox | d0d651 | 81 | 113 | 100.6 | 2 | 5 | 4.2x10^10 |
| javadelight/delight-nashorn-sandbox | d19eee | 81 | 93 | 83.5 | 1 | 5 | 2.6x10^9 |
| sonatype-nexus-community/nexus-repository-helm | 5517c8 | 18 | 32 | 32 | 0 | 0 | 18 |
| spotify/helios (helios-services) | 23260 | 190 | 448 | 448 | 0 | 37 | 190 |
| spotify/helios (helios-testing) | 78a864 | 43 | 474 | 474 | 0 | 7 | 43 |
The columns are composed of the following variables:
Slug (Module): The project's GitHub slug (i.e., the project's URL is https://github.com/{Slug}) and, if specified, the module for which tests have been executed.
FIC Hash: The flakiness-introducing commit hash for a known flaky test as described in this OOPSLA 2020 paper. As different flaky tests have different FIC hashes, there may be multiple rows for the same slug/module with different FIC hashes.
Tests: The number of distinct test class and method combinations over the entire considered commit range.
Commits: The number of commits in the considered commit range.
Av. Commits/Test: The average number of commits per test class and method combination in the considered commit range. The number of commits may vary for each test class, as some tests may be added or removed within the considered commit range.
Flaky Tests: The number of distinct test class and method combinations that have more than one test result (passed/skipped/error/failure + exception type, if any + assertion message, if any) across 30 repeated test suite executions on at least one commit in the considered commit range.
Tests w/ Consistent Failures: The number of distinct test class and method combinations that have the same error or failure result (error/failure + exception type, if any + assertion message, if any) across all 30 repeated test suite executions on at least one commit in the considered commit range.
Total Distinct Histories: The number of distinct test results (passed/skipped/error/failure + exception type, if any + assertion message, if any) for all test class and method combinations along all commits for that test in the considered commit range.
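The per-commit definitions of "flaky" and "consistent failure" above can be sketched as follows. This is a hedged illustration, not the authors' tooling; the result tuples are an assumed encoding of status plus exception type plus assertion message:

```python
def classify_commit(results):
    """Classify one test's repeated results on a single commit.
    Each result is a (status, exception, assertion_message) tuple,
    where status is 'passed', 'skipped', 'error', or 'failure'.
    Returns 'flaky' if more than one distinct result was observed,
    'consistent_failure' if every run produced the same error or
    failure result, and 'stable' otherwise."""
    distinct = set(results)
    if len(distinct) > 1:
        return "flaky"
    status = next(iter(distinct))[0]
    if status in ("error", "failure"):
        return "consistent_failure"
    return "stable"
```

A test counts toward the "Flaky Tests" column if any commit in the range classifies as flaky across its 30 repeated executions, and toward "Tests w/ Consistent Failures" if any commit classifies as a consistent failure.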