https://www.icpsr.umich.edu/web/ICPSR/studies/37879/terms
CAPITAL PUNISHMENT IN THE UNITED STATES, 1973-2018 provides annual data on prisoners under a sentence of death, as well as those who had their sentences commuted or vacated and prisoners who were executed. This study examines basic sociodemographic classifications including age, sex, race and ethnicity, marital status at time of imprisonment, level of education, and state and region of incarceration. Criminal history information includes prior felony convictions, prior convictions for criminal homicide, and legal status at the time of the capital offense. Additional information is provided on inmates removed from death row by yearend 2018. The dataset consists of one part containing 9,583 cases. The file provides information on inmates whose death sentences were removed as well as on inmates who were executed. The file also gives information about inmates who received a second death sentence by yearend 2018 and inmates who were still on death row.
Investigator(s): Bureau of Justice Statistics. These data collections provide annual data on prisoners under a sentence of death and on those whose sentences were commuted or vacated during the years indicated. Information is supplied for basic sociodemographic characteristics such as age, sex, race, ethnicity, marital status at time of imprisonment, level of education, and state of incarceration. Criminal history data include prior felony convictions for criminal homicide and legal status at the time of the capital offense. Additional information is available for inmates removed from death row by yearend of the last year indicated and for inmates who were executed. The universe is all inmates on death row since 1972 in the United States. The inmate identification numbers were assigned by the Bureau of the Census and have no purpose outside these data collections. Years Produced: Annually (latest release contains all years). NACJD has produced a resource guide on the Capital Punishment in the United States Series.
This pre-analysis plan outlines a research strategy to test a "self-reinforcing" theory of death penalty executions, which holds that counties face decreasing marginal costs for executions. We test this theory by examining event dependence in executions among counties that have the death penalty. To test for these self-reinforcing processes, and for the exogenous factors that may explain executions, we use an event history model that accounts for event dependence. The empirical findings of this analysis may have profound consequences for how we understand executions. Evidence of event dependence would reveal that the main determinant of whether an individual is executed is the county's previous experience with execution, which would raise many important policy, legal, and moral questions.
These instructional materials were prepared for use with EXECUTIONS IN THE UNITED STATES, 1608-1991: THE ESPY FILE (ICPSR 8451), compiled by M. Watt Espy and John Ortiz Smykla. The data file (an SPSS portable file) and accompanying documentation are provided to assist educators in instructing students about the history of capital punishment in the United States. An instructor's handout is also included. This handout contains the following sections, among others: (1) general goals for student analysis of quantitative datasets, (2) specific goals in studying this dataset, (3) suggested appropriate courses for use of the dataset, (4) tips for using the dataset, and (5) related secondary source readings. This dataset furnishes data on executions performed under civil authority in the United States between 1608 and April 24, 1991, and describes each individual executed and the circumstances surrounding the crime for which the person was convicted. Variables include age, race, name, sex, and occupation of the offender, place, jurisdiction, date, and method of execution, and the crime for which the offender was executed. Also recorded are data on whether the only evidence for the execution was official records indicating that an individual (executioner or slave owner) was compensated for an execution.
This collection furnishes data on executions performed under civil authority in the United States between 1608 and 2002. The dataset describes each individual executed and the circumstances surrounding the crime for which the person was convicted. Variables include age, race, name, sex, and occupation of the offender, place, jurisdiction, date, and method of execution, and the crime for which the offender was executed. Also recorded are data on whether the only evidence for the execution was official records indicating that an individual (executioner or slave owner) was compensated for an execution.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘COVID-19 Cases and Deaths Summarized by Geography’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://catalog.data.gov/dataset/d2e381bb-f395-4b40-979e-920a79a3db88 on 11 February 2022.
--- Dataset description provided by original source is as follows ---
Note: On January 22, 2022, system updates to improve the timeliness and accuracy of San Francisco COVID-19 cases and deaths data were implemented. You might see some fluctuations in historic data as a result of this change. Due to the changes, starting on January 22, 2022, the number of new cases reported daily will be higher than under the old system as cases that would have taken longer to process will be reported earlier.
Note: As of April 16, 2021, this dataset will update daily with a five-day data lag.
A. SUMMARY This dataset contains medical-provider-confirmed COVID-19 cases and confirmed COVID-19-related deaths in San Francisco, CA, aggregated by several different geographic areas and normalized by 2019 American Community Survey (ACS) 5-year population estimates to calculate rates per 10,000 residents.
Cases and deaths are both mapped to the residence of the individual, not to where they were infected or died. For example, someone infected at work in San Francisco but living in the East Bay is not counted as an SF case, and someone who dies at Zuckerberg San Francisco General but resides in another county is not counted in this dataset either.
Dataset is cumulative and covers cases going back to March 2, 2020, when testing began.
Geographic areas summarized are: 1. Analysis Neighborhoods 2. Census Tracts 3. Census Zip Code Tabulation Areas
B. HOW THE DATASET IS CREATED Addresses from medical data are geocoded by the San Francisco Department of Public Health (SFDPH). Those addresses are spatially joined to the geographic areas, and counts are generated based on the number of address points that fall within each geographic area. The 2019 ACS population estimates provided by the Census Bureau are used to calculate a rate equal to ([count] / [acs_population]) * 10,000, representing the number of cases per 10,000 residents.
C. UPDATE PROCESS Geographic analysis is scripted by SFDPH staff and synced to this dataset daily at 7:30 Pacific Time.
D. HOW TO USE THIS DATASET Privacy rules in effect To protect privacy, certain rules are in effect: 1. Case counts greater than 0 and less than 10 are dropped - these will be null (blank) values 2. Death counts greater than 0 and less than 10 are dropped - these will be null (blank) values 3. Cases and deaths dropped altogether for areas where acs_population < 1000
Rate suppression in effect where counts are lower than 20 Rates are not calculated unless the case count is greater than or equal to 20. Rates are generally unstable at small numbers, so we avoid calculating them directly. We advise you to apply the same approach, as this is best practice in epidemiology.
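To make the calculation and suppression rules above concrete, here is a minimal Python sketch; the function names and the bare `count`/`acs_population` inputs are illustrative, not part of the published dataset schema.

```python
# Minimal sketch of the published rate and suppression logic; names are
# hypothetical, not from the source dataset.
def rate_per_10k(count, acs_population):
    """Rate per 10,000 residents, as described in section B."""
    return (count / acs_population) * 10000

def publishable(count, acs_population):
    """Apply the privacy and stability rules from section D."""
    if acs_population < 1000:
        return None            # area dropped altogether
    if 0 < count < 10:
        return None            # small counts suppressed
    rate = rate_per_10k(count, acs_population) if count >= 20 else None
    return {"count": count, "rate": rate}

print(publishable(25, 12000))  # {'count': 25, 'rate': 20.83...}
print(publishable(5, 12000))   # None (suppressed)
```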
A note on Census ZIP Code Tabulation Areas (ZCTAs) ZIP Code Tabulation Areas are special boundaries created by the U.S. Census based on ZIP Codes developed by the USPS. They are not, however, the same thing. ZCTAs are generalized areal representations of USPS ZIP Code service areas. Read how the Census develops ZCTAs on their website.
Row included for Citywide case counts, incidence rate, and deaths A single row is included with the Citywide case counts and incidence rate, which can be used for comparisons. Citywide figures capture all cases regardless of address quality. While some cases cannot be mapped to sub-areas like Census Tracts, ongoing data quality efforts result in improved mapping on a rolling basis.
--- Original source retains full ownership of the source dataset ---
https://data.gov.tw/license
A list of records of in vivo bioavailability, bioequivalence, or dissolution-profile comparison tests conducted domestically for drugs containing new drug ingredients that hold domestic licenses.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 2 rows and is filtered where the book is How to stage a military coup : from planning to execution. It features 7 columns including author, publication date, language, and book publisher.
https://data.gov.tw/license
A statistical table of all non-observational drug ingredients that have undergone domestic bioavailability or bioequivalence testing (excluding dissolution-profile comparison tests), classified by ingredient and covering drugs both with and without certification.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects. It has 2 rows and is filtered where the books is The secret report of Friar Otto : a reinterpretation of the report in confidence on the imprisonment and execution of William de Marisco and sixteen of his followers. It features 10 columns including number of authors, number of books, earliest publication date, and latest publication date.
https://datos.madrid.es/egob/catalogo/aviso-legal
This dataset provides information on the annual execution (settlement) of the budget of the City of Madrid and its autonomous bodies from 2011 to the last closed financial year. The data cover the revenue budget, the expenditure budget, and the investment projects implemented in each year.

For the revenue budget, the information is shown according to the organic (centre) and economic (sub-concept) classification. For each item, the amount initially forecast, the modifications made during the year, the final forecast, the rights recognized, and the amounts collected are provided.

For the expenditure budget, the data follow the threefold organic (centre and section), functional (programme), and economic (sub-concept) classification of the budget applications. For each of them, the initial credit, its modifications, the final credit, the authorized credit, the committed credit, and the recognized obligations are shown.

For each investment project, the data include, among other things, the code and title of the project, the investment line and sub-line under which it is classified, the district in which it is located, and the budget application (centre and section, programme and sub-concept) where its credit sits, as well as the initial credit, its modifications, the final credit, the authorized credit, the available credit, the recognized obligations, and the payments made.

Finally, some files show the eliminations that must be applied to the revenue and expenditure budgets in order to obtain the consolidated information for the City of Madrid and its autonomous bodies. Note that the amounts shown in the files are nominal. The website Presupuestosabiertosmadrid.es provides consolidated monthly execution data since 2011, in both nominal and real (inflation-adjusted) terms. In the associated documentation section you can find a structure document describing the information contained in the downloadable files and a glossary of terms to aid understanding of the data.
This dataset captures features generated during the execution of packages and libraries in isolated environments. It includes 9,461 package reports, of which 1,962 are identified as malicious, and encompasses both static and dynamic features such as files, sockets, commands, and DNS records. Each report is labeled with verified information and detailed sub-labels for attack types, facilitating the identification of malicious indicators when source code is unavailable. This dataset supports runtime detection, enhances detection model training, and enables efficient comparative analysis across ecosystems, contributing to the strengthening of supply chain security.
https://data.gov.tw/license
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about book subjects, has 2 rows and is filtered where the books is Hangmen of England : a history of execution from Jack Ketch to Albert Pierrepoint. It features 10 columns including book subject, number of authors, number of books, earliest publication date, and latest publication date. The preview is ordered by number of books (descending).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is The Bolton massacre and the execution of James Stanley, 7th Earl of Derby : a radio play. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The PHYTMO database contains data from physical therapy exercises and gait variations recorded with magneto-inertial sensors, including information from an optical reference system. PHYTMO includes recordings of 30 volunteers aged between 20 and 70 years old. A total of 6 exercises and 3 gait variations commonly prescribed in physical therapy were recorded. The volunteers performed two series with a minimum of 8 repetitions in each. Four magneto-inertial sensors were placed on the lower or upper limbs to record the motions, together with passive optical reflectors. The files include the specifications of the inertial sensors and the cameras. The database includes magneto-inertial data (linear acceleration, turn rate, and magnetic field), together with highly accurate location and orientation in 3D space provided by the optical system (errors are lower than 1 mm). The database files are stored in CSV format to ensure usability with common data processing software.

The main aim of this dataset is to make inertial data available for two purposes: the analysis of techniques for identifying and evaluating exercises monitored with inertial wearable sensors, and the validation of inertial-sensor-based algorithms for human motion monitoring that obtain segment orientations in 3D space. Furthermore, the database stores enough data to train and evaluate machine learning-based algorithms. The age range of the participants can be useful for establishing age-based metrics for exercise evaluation or for studying differences in motion between age groups. Finally, the MATLAB function features_extraction, developed by the authors, is also provided. This function splits signals using a sliding window, returning the resulting segments, and extracts signal features in the time and frequency domains, based on prior studies in the literature.
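As a rough illustration of what a sliding-window feature extractor like features_extraction does, here is a minimal Python sketch (the original is a MATLAB function; the window length, overlap, and feature set below are assumptions, not the authors' defaults):

```python
import numpy as np

def sliding_window_features(signal, fs, win_s=2.0, overlap=0.5):
    """Split a 1-D signal into overlapping windows and extract simple
    time- and frequency-domain features, loosely mirroring the idea of
    the PHYTMO features_extraction function (parameters illustrative)."""
    win = int(win_s * fs)
    step = int(win * (1 - overlap))
    features = []
    for start in range(0, len(signal) - win + 1, step):
        seg = signal[start:start + win]
        spectrum = np.abs(np.fft.rfft(seg))
        freqs = np.fft.rfftfreq(win, d=1 / fs)
        features.append({
            "mean": seg.mean(),
            "std": seg.std(),
            "rms": np.sqrt(np.mean(seg ** 2)),
            "dominant_freq": freqs[spectrum.argmax()],
        })
    return features

# Example: 10 s of a 1 Hz sine sampled at 100 Hz
t = np.arange(0, 10, 0.01)
feats = sliding_window_features(np.sin(2 * np.pi * t), fs=100)
```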
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains EEG measured during a motor execution task and processed to emulate EEG originating from a wireless EEG sensor network composed of mini-EEG devices, as presented in [1]. It is a processed version of the original High Gamma dataset of [2].
In mini-EEG devices, we cannot measure the potential between a given electrode and a distant reference (e.g., the mastoid or Cz electrode), as we would in traditional EEG caps. Instead, we can only record the local potential between two nearby electrodes belonging to the same sensor device. To emulate this setting using a standard cap-EEG recording, we can consider each pair of electrodes within a certain maximum distance as a candidate electrode pair or node. By subtracting one channel from the other, we remove the common far-distance reference and obtain a signal that emulates the local potential of the node.
We applied this method to the High Gamma dataset as follows. First, the 44 channels covering the motor cortex were selected. These channels are indicated in the channel_labels.json file. Then, the rereferencing between channels with a distance threshold of 3 cm was applied, yielding a set of 286 candidate electrode pairs or nodes. The nodes.json file indicates the specific pair of channels composing each of these nodes. These have an average inter-electrode distance of 1.98 cm and a standard deviation of 0.59 cm. Finally, we applied the preprocessing described in [2], i.e., resampling at 250 Hz, highpass filtering above 4 Hz, standardizing the per-node mean and variance to 0 and 1 respectively, and extracting a window of 4.5 seconds for each trial.
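A minimal Python sketch of the node construction described above, assuming hypothetical arrays of channel signals and 3D electrode positions (this is not the authors' code):

```python
import numpy as np
from itertools import combinations

def build_nodes(data, positions, max_dist_cm=3.0):
    """Emulate mini-EEG 'nodes' from cap-EEG: for every pair of channels
    closer than max_dist_cm, subtract one channel from the other to
    remove the common far-distance reference.

    data: (n_channels, n_samples) array of referenced cap-EEG signals
    positions: (n_channels, 3) electrode coordinates in cm
    """
    nodes, pairs = [], []
    for i, j in combinations(range(len(positions)), 2):
        if np.linalg.norm(positions[i] - positions[j]) <= max_dist_cm:
            nodes.append(data[i] - data[j])  # local potential of the node
            pairs.append((i, j))
    return np.array(nodes), pairs
```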
[1] Strypsteen, Thomas, and Alexander Bertrand. "A distributed neural network architecture for dynamic sensor selection with application to bandwidth-constrained body-sensor networks." arXiv preprint arXiv:2308.08379 (2023).
[2] Schirrmeister, Robin Tibor, et al. "Deep learning with convolutional neural networks for EEG decoding and visualization." Human brain mapping 38.11 (2017): 5391-5420.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is an RO-Crate that bundles artifacts of an AI-based computational pipeline execution. It is an example of application of the CPM RO-Crate profile, which integrates the Common Provenance Model (CPM), and the Process Run Crate profile.
As the CPM is the groundwork for the development of the ISO 23494 Biotechnology — Provenance information model for biological material and data standards series, the resulting profile and this example are intended to be presented at one of the ISO TC276 WG5 regular meetings, and will become an input for the development of ISO 23494-5 Biotechnology — Provenance information model for biological material and data — Part 5: Provenance of Data Processing.
Description of the AI pipeline
The goal of the AI pipeline whose execution is described in the dataset is to train an AI model to detect the presence of carcinoma cells in high resolution human prostate images. The pipeline is implemented as a set of Python scripts that work over a filesystem, where the datasets, intermediate results, configurations, logs, and other artifacts are stored. In particular, the AI pipeline consists of the following three general parts:
Image data preprocessing. The goal of this step is to prepare the input dataset – whole slide images (WSIs) and their annotations – for the AI model. As the model is not able to process the entire high resolution images, the preprocessing step splits the WSIs into groups (training and testing). Furthermore, each WSI is broken down into smaller overlapping parts called patches; the background patches are filtered out and the remaining tissue patches are labeled according to the provided pathologists' annotations (see the sketch after this list).
AI model training. The goal of this step is to train the AI model using the training dataset generated in the previous step of the pipeline. The result of this step is a trained AI model.
AI model evaluation. The goal of this step is to evaluate the trained model's performance on a dataset that was not provided to the model during training. The results of this step are statistics describing the AI model's performance.
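As a rough illustration of the patching idea in the preprocessing step, here is a minimal Python sketch; the patch size, stride, and background test are assumptions, not the pipeline's actual parameters:

```python
import numpy as np

def extract_patches(wsi, patch=512, stride=256, bg_threshold=0.8):
    """Split a whole-slide image (as an RGB array) into overlapping
    patches and drop patches that are mostly background (near-white).
    All parameter values here are illustrative."""
    h, w, _ = wsi.shape
    patches = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            tile = wsi[y:y + patch, x:x + patch]
            # fraction of near-white pixels as a crude background test
            background = (tile.mean(axis=-1) > 220).mean()
            if background < bg_threshold:
                patches.append(((y, x), tile))
    return patches
```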
In addition to the above, execution of the steps results in generation of log files. The log files contain detailed traces of the AI pipeline execution, such as file paths, model weight parameters, timestamps, etc. As suggested by the CPM, the logfiles and additional metadata present on the filesystem are then used by a provenance generation step that transforms available information into the CPM compliant data structures, and serializes them into files.
Finally, all these artifacts are packed together in an RO-Crate.
For the purpose of the example, we have included only a small fragment of the input image dataset in the resulting crate, as this has no effect on how the Process Run Crate and CPM RO-Crate profiles are applied to the use case. In real world execution, the input dataset would consist of terabytes of data. In this example, we have selected a representative image for each of the input dataset parts. As a result, the only difference between the real world application and this example would be that the resulting real world crate would contain more input files.
Description of the RO-Crate
Process Run Crate related aspects
The Process Run Crate profile can be used to pack artifacts of a computational workflow whose individual steps are not controlled centrally. Since the pipeline presented in this example consists of steps that are executed individually, and its execution is not managed centrally by a workflow engine, the Process Run Crate profile can be applied.
Each of the computational steps is expressed within the crate's ro-crate-metadata.json file as a pair of elements: 1) the software used to create files, and 2) a specific execution of that software. In particular, we use the SoftwareSourceCode type to indicate the executed Python scripts and the CreateAction type to indicate their actual executions.
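For illustration, a hypothetical fragment of such a pair, written here as a Python data structure mirroring the JSON-LD in ro-crate-metadata.json (identifiers, names, and file paths are invented):

```python
# Hypothetical SoftwareSourceCode / CreateAction pair; identifiers,
# names, and file paths are invented for illustration.
metadata_fragment = [
    {
        "@id": "https://example.org/scripts/preprocess.py",
        "@type": "SoftwareSourceCode",
        "name": "Image data preprocessing script",
        "programmingLanguage": "Python",
    },
    {
        "@id": "#preprocess-run-1",
        "@type": "CreateAction",
        "name": "Execution of the preprocessing script",
        "instrument": {"@id": "https://example.org/scripts/preprocess.py"},
        "object": [{"@id": "input/wsi-001.tif"}],
        "result": [{"@id": "patches/"}],
    },
]
```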
As a result, the crate contains the following seven "executables":
Three python scripts, each corresponding to a part of the pipeline: preprocessing, training, and evaluation.
Four provenance generation scripts, three of which implement the transformation of the proprietary log files generated by the AI pipeline scripts into CPM compliant provenance files. The fourth one is a meta provenance generation script.
For each of the executables, their execution is expressed in the resulting ro-crate-metadata.json using the CreateAction type. As a result, seven create-actions are present in the resulting crate.
The input dataset, intermediate results, configuration files, and resulting provenance files are expressed according to the underlying RO-Crate specification.
CPM RO-Crate related aspects
The main purpose of the CPM RO-Crate profile is to enable identification of the CPM compliant provenance files within a crate. To achieve this, the CPM RO-Crate profile specification prescribes specific file types for such files: CPMProvenanceFile and CPMMetaProvenanceFile.
In this case, the RO-Crate contains three CPM compliant provenance files, each documenting a step of the pipeline, and a single meta-provenance file. These files are generated by the three provenance generation scripts, which use the available log files and additional information to produce the CPM compliant files. In terms of the CPM, the provenance generation scripts implement the concept of a provenance finalization event. The three provenance generation scripts are assigned the SoftwareSourceCode type and have corresponding executions expressed in the crate using the CreateAction type.
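For illustration, a hypothetical metadata entry that marks one of the generated files with the profile's type (the file name and format are invented):

```python
# Hypothetical entry for a CPM compliant provenance file in
# ro-crate-metadata.json; the file name and format are invented.
cpm_file_entry = {
    "@id": "provenance/preprocessing-provenance.provn",
    "@type": ["File", "CPMProvenanceFile"],
    "name": "CPM provenance of the preprocessing step",
    "encodingFormat": "text/provenance-notation",
}
```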
Remarks
The resulting RO-Crate packs artifacts of one execution of the AI pipeline. The scripts that implement the individual pipeline steps and the provenance generation are not included in the crate directly; they are hosted on GitHub and referenced from the crate's ro-crate-metadata.json file at their remote location.
The input image files included in this RO-Crate come from the Camelyon16 dataset.
Future NASA plans call for long-duration deep space missions with human crews. Because of light-time delay and other considerations, increased autonomy will be needed. This will necessitate integration of tools in such areas as anomaly detection, diagnosis, planning, and execution. In this paper we investigate an approach that integrates planning and execution by embedding planner-derived temporal constraints in an execution procedure. To avoid the need for propagation, we convert the temporal constraints to dispatchable form. We handle some uncertainty in the durations without it affecting the execution; larger variations may cause activities to be skipped.
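A minimal Python sketch of the dispatching idea, not the paper's implementation: once a temporal network is in dispatchable form, executing an event only requires tightening the time windows of its immediate neighbors rather than full constraint propagation. The network and bounds below are invented for illustration.

```python
# Minimal sketch of local dispatching in a dispatchable simple temporal
# network; events, edges, and bounds are invented for illustration.
# edges[(a, b)] = (lower, upper) means: lower <= t(b) - t(a) <= upper
edges = {("start", "warmup"): (0, 5), ("warmup", "burn"): (2, 4)}
windows = {"start": (0, 0), "warmup": (0, 10), "burn": (0, 20)}

def execute(event, time, windows, edges):
    """Fix an event's execution time and tighten only its neighbors'
    windows -- the local update that dispatchable form makes sufficient."""
    lo, hi = windows[event]
    assert lo <= time <= hi, "event executed outside its window"
    windows[event] = (time, time)
    for (a, b), (l, u) in edges.items():
        if a == event:
            blo, bhi = windows[b]
            windows[b] = (max(blo, time + l), min(bhi, time + u))
    return windows

execute("start", 0, windows, edges)
execute("warmup", 3, windows, edges)   # "burn" window becomes (5, 7)
```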
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset and replication package of the study "A continuous open source data collection platform for architectural technical debt assessment".
Abstract
Architectural decisions are the most important source of technical debt. In recent years, researchers have spent an increasing amount of effort investigating this specific category of technical debt, with quantitative methods, and in particular static analysis, being the most common approach.
However, quantitative studies are susceptible, to varying degrees, to external validity threats, which hinder the generalisation of their findings.
In response to this concern, researchers strive to expand the scope of their study by incorporating a larger number of projects into their analyses. This practice is typically executed on a case-by-case basis, necessitating substantial data collection efforts that have to be repeated for each new study.
To address this issue, this paper presents our initial attempt at tackling this problem: enabling researchers to study architectural smells, a well-known indicator of architectural technical debt, at large scale. Specifically, we introduce a novel data collection pipeline that leverages Apache Airflow to continuously generate up-to-date, large-scale datasets using Arcan, a tool for architectural smell detection (or any other tool).
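For illustration, a minimal, hypothetical Airflow DAG in the spirit of such a pipeline; the task names, repository URL, and the Arcan command line are assumptions, not the paper's actual code:

```python
# Hypothetical sketch of a continuous analysis DAG; the repository URL
# and the Arcan invocation are invented for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="arcan_architectural_smells",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    clone = BashOperator(
        task_id="clone_project",
        bash_command=(
            "git clone --depth 1 "
            "https://github.com/example/project.git /tmp/project"
        ),
    )
    analyse = BashOperator(
        task_id="run_arcan",
        bash_command="arcan analyse -i /tmp/project -o /data/results",
    )
    clone >> analyse  # analysis runs after the checkout succeeds
```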
Finally, we present the publicly available dataset resulting from the first three months of execution of the pipeline, which includes over 30,000 analysed commits and releases from over 10,000 open source GitHub projects written in 5 different programming languages and amounting to over a billion lines of code analysed.