7 datasets found
  1. Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016

    • openicpsr.org
    • search.datacite.org
    Updated Aug 16, 2018
    Cite
    Jacob Kaplan (2018). Uniform Crime Reporting (UCR) Program Data: Arrests by Age, Sex, and Race, 1980-2016 [Dataset]. http://doi.org/10.3886/E102263V5
    Dataset updated
    Aug 16, 2018
    Dataset provided by
    University of Pennsylvania
    Authors
    Jacob Kaplan
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    1980 - 2016
    Area covered
    United States
    Description
    Version 5 release notes:
    • Removes support for SPSS and Excel data.
    • Changes the crimes that are stored in each file. There are more files now with fewer crimes per file. The files and their included crimes have been updated below.
    • Adds in agencies that report 0 months of the year.
    • Adds a column that indicates the number of months reported. This is generated by summing the number of unique months an agency reports data for. Note that this indicates the number of months an agency reported arrests for ANY crime; they may not necessarily report every crime every month. Agencies that did not report a crime will have a value of NA for every arrest column for that crime.
    • Removes data on runaways.
    Version 4 release notes:
    • Changes column names from "poss_coke" and "sale_coke" to "poss_heroin_coke" and "sale_heroin_coke" to clearly indicate that these columns include the sale of heroin as well as similar opiates such as morphine, codeine, and opium. Also changes column names for the narcotic columns to indicate that they are only for synthetic narcotics.
    Version 3 release notes:
    • Add data for 2016.
    • Order rows by year (descending) and ORI.
    Version 2 release notes:
    • Fix bug where Philadelphia Police Department had incorrect FIPS county code.

    The Arrests by Age, Sex, and Race data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains highly granular data on the number of people arrested for a variety of crimes (see below for a full list of included crimes). The data sets here combine data from the years 1980-2015 into a single file. These files are quite large and may take some time to load.

    All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
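
    A minimal sketch of that read-in step, with placeholder file names rather than the actual NACJD file names:

        # Sketch: read one year of ASR data from an ASCII + SPSS setup pair
        library(asciiSetupReader)

        asr <- read_ascii_setup(
          data       = "asr_1980.txt",  # fixed-width ASCII data file (placeholder name)
          setup_file = "asr_1980.sps"   # matching SPSS setup file (placeholder name)
        )
        head(asr)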

    I did not make any changes to the data other than the following. When an arrest column has a value of "None/not reported", I change that value to zero. This makes the (possibly incorrect) assumption that these values represent zero crimes reported. The original data does not have a value when the agency reports zero arrests other than "None/not reported." In other words, this data does not differentiate between real zeros and missing values. Some agencies also incorrectly report the following numbers of arrests, which I change to NA: 10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000, 100000, 99999, 99998.
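
    A hedged sketch of those two recoding rules in R, applied to a placeholder column name (murder_tot_male) in the data frame `asr` from the sketch above:

        # "None/not reported" becomes 0; implausible sentinel counts become NA
        bad_values <- c(seq(10000, 100000, by = 10000), 99999, 99998)

        x <- asr$murder_tot_male             # placeholder column
        x[x == "None/not reported"] <- "0"   # assume zero arrests were reported
        x <- as.numeric(x)
        x[x %in% bad_values] <- NA           # drop known-bad reported counts
        asr$murder_tot_male <- x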

    To reduce file size and make the data more manageable, all of the data is aggregated yearly. All of the data is in agency-year units such that every row indicates an agency in a given year. Columns are crime-arrest category units. For example, if you choose the data set that includes murder, you would have rows for each agency-year and columns with the number of people arrested for murder. The ASR data breaks down arrests by age and gender (e.g. Male aged 15, Male aged 18). They also provide the number of adults or juveniles arrested by race. Because most agencies and years do not report the arrestee's ethnicity (Hispanic or not Hispanic) or juvenile outcomes (e.g. referred to adult court, referred to welfare agency), I do not include these columns.

    To make it easier to merge with other data, I merged this data with the Law Enforcement Agency Identifiers Crosswalk (LEAIC) data. The data from the LEAIC add FIPS (state, county, and place) and agency type/subtype. Please note that some of the FIPS codes have leading zeros and if you open it in Excel it will automatically delete those leading zeros.

    I created 9 arrest categories myself. The categories are:
    • Total Male Juvenile
    • Total Female Juvenile
    • Total Male Adult
    • Total Female Adult
    • Total Ma

  2. Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8 Analysis Ready Dataset Raster Images from 2013-2023

    • s.cnmilf.com
    • data.usgs.gov
    • +2more
    Updated Feb 22, 2025
    Cite
    U.S. Geological Survey (2025). Water Temperature of Lakes in the Conterminous U.S. Using the Landsat 8 Analysis Ready Dataset Raster Images from 2013-2023 [Dataset]. https://s.cnmilf.com/user74170196/https/catalog.data.gov/dataset/water-temperature-of-lakes-in-the-conterminous-u-s-using-the-landsat-8-analysis-ready-2013
    Dataset updated
    Feb 22, 2025
    Dataset provided by
    United States Geological Survey (http://www.usgs.gov/)
    Area covered
    Contiguous United States
    Description

    This data release contains lake and reservoir water surface temperature summary statistics calculated from Landsat 8 Analysis Ready Dataset (ARD) images available within the Conterminous United States (CONUS) from 2013-2023. All zip files within this data release contain nested directories using .parquet files to store the data. The file example_script_for_using_parquet.R contains example code for using the R arrow package (Richardson and others, 2024) to open and query the nested .parquet files.

    Limitations of this dataset include:
    • All biases inherent to the Landsat Surface Temperature product are retained in this dataset, which can produce unrealistically high or low estimates of water temperature. This is observed to happen, for example, in cases with partial cloud coverage over a waterbody.
    • Some waterbodies are split between multiple Landsat Analysis Ready Data tiles or orbit footprints. In these cases, multiple waterbody-wide statistics may be reported, one for each data tile. The deepest point values are extracted and reported for the tile covering the deepest point. A total of 947 waterbodies are split between multiple tiles (see the multiple_tiles = "yes" column of site_id_tile_hv_crosswalk.csv).
    • Temperature data were not extracted from satellite images with more than 90% cloud cover.
    • Temperature data represent skin temperature at the water surface and may differ from temperature observations from below the water surface.

    Potential methods for addressing these limitations:
    • Identifying and removing unrealistic temperature estimates:
      • Calculate the total percentage of cloud pixels over a given waterbody as percent_cloud_pixels = wb_dswe9_pixels/(wb_dswe9_pixels + wb_dswe1_pixels), and filter percent_cloud_pixels by a desired percentage of cloud coverage.
      • Remove lakes with a limited number of water pixel values available (wb_dswe1_pixels < 10).
      • Filter to waterbodies where the deepest point is identified as water (dp_dswe = 1).
    • Handling waterbodies split between multiple tiles:
      • These waterbodies can be identified using the site_id_tile_hv_crosswalk.csv file (column multiple_tiles = "yes"). A user could combine sections of the same waterbody by spatially weighting the values using the number of water pixels available within each section (wb_dswe1_pixels). This should be done with caution, as some sections of the waterbody may have data available on different dates.

    Files in this release:
    • "year_byscene=XXXX.zip" – temperature summary statistics for individual waterbodies and for the deepest point (the furthest point from land within a waterbody) of each waterbody, by scene_date (when the satellite passed over). Individual waterbodies are identified by the National Hydrography Dataset (NHD) permanent_identifier included within the site_id column. Some of the .parquet files in the byscene datasets may include only one dummy row of data (identified by tile_hv="000-000"); this happens when no tabular data were extracted from the raster images because of clouds obscuring the image, a tile that covers mostly ocean with a very small amount of land, or other possible causes. An example file path for this dataset: year_byscene=2023/tile_hv=002-001/part-0.parquet
    • "year=XXXX.zip" – summary statistics for individual waterbodies and the deepest points within each waterbody by year (dataset=annual), month (year=0, dataset=monthly), and year-month (dataset=yrmon). The year_byscene=XXXX data are used as input for generating these summary tables, which aggregate temperature data by year, month, and year-month. Aggregated data are not available for the following tiles: 001-004, 001-010, 002-012, 028-013, and 029-012, because these tiles primarily cover ocean with limited land and no output data were generated. An example file path for this dataset: year=2023/dataset=lakes_annual/tile_hv=002-001/part-0.parquet
    • "example_script_for_using_parquet.R" – example code to download zip files directly from ScienceBase, identify HUC04 basins within a desired Landsat ARD grid tile, download NHDPlus High Resolution data for visualization, compile the nested .parquet files using the R arrow package, and create example static and interactive maps.
    • "nhd_HUC04s_ingrid.csv" – a cross-walk file identifying the HUC04 watersheds within each Landsat ARD tile grid.
    • "site_id_tile_hv_crosswalk.csv" – a cross-walk file identifying the site_id (nhdhr{permanent_identifier}) within each Landsat ARD tile grid; it also includes a column (multiple_tiles) identifying site_ids that fall within multiple Landsat ARD tile grids.
    • "lst_grid.png" – a map of the Landsat grid tiles labelled by the horizontal-vertical ID.
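
    A short sketch of querying these files with the R arrow package and applying the cloud filter suggested above, assuming "year_byscene=2023.zip" has been unzipped into the working directory (this is illustrative, not the release's example_script_for_using_parquet.R):

        library(arrow)
        library(dplyr)

        # hive-style partitions (tile_hv=...) are detected automatically
        lst <- open_dataset("year_byscene=2023")

        clean <- lst |>
          mutate(percent_cloud_pixels =
                   wb_dswe9_pixels / (wb_dswe9_pixels + wb_dswe1_pixels)) |>
          filter(percent_cloud_pixels < 0.1,  # keep mostly cloud-free scenes
                 wb_dswe1_pixels >= 10) |>    # require enough water pixels
          collect()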

  3. Young and older adult vowel categorization responses

    • datadryad.org
    Updated Mar 14, 2024
    Cite
    Mishaela DiNino (2024). Young and older adult vowel categorization responses [Dataset]. http://doi.org/10.5061/dryad.brv15dvh0
    Dataset updated
    Mar 14, 2024
    Dataset provided by
    Dryad
    Authors
    Mishaela DiNino
    Time period covered
    Feb 20, 2024
    Description

    Young and older adult vowel categorization responses

    https://doi.org/10.5061/dryad.brv15dvh0

    On each trial, participants heard a stimulus and clicked a box on the computer screen to indicate whether they heard "SET" or "SAT." Responses of "SET" are coded as 0 and responses of "SAT" are coded as 1. The continuum steps, from 1-7, for duration and spectral quality cues of the stimulus on each trial are named "DurationStep" and "SpectralStep," respectively. Group (young or older adult) and listening condition (quiet or noise) information are provided for each row of the dataset.
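
    A small sketch of how these responses might be analysed in R; the file name and the response column name are assumptions, while DurationStep and SpectralStep are named above:

        dat <- read.csv("vowel_responses.csv")   # placeholder file name

        # probability of a "SAT" (coded 1) response as a function of the two cues
        fit <- glm(Response ~ DurationStep + SpectralStep,
                   data = dat, family = binomial)
        summary(fit)   # positive slopes = more "SAT" responses at higher steps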

  4. [Water Column Data - CTD] - Water column data from CTD casts along the East Siberian Arctic Shelf on R/V Oden during 2011 (ESAS Water Column Methane project)

    • erddap.bco-dmo.org
    Updated Apr 22, 2019
    Cite
    BCO-DMO (2019). [Water Column Data - CTD] - Water column data from CTD casts along the East Siberian Arctic Shelf on R/V Oden during 2011 (ESAS Water Column Methane project) (The East Siberian Arctic Shelf as a Source of Atmospheric Methane: First Approach to Quantitative Assessment) [Dataset]. https://erddap.bco-dmo.org/erddap/info/bcodmo_dataset_660543/index.html
    Dataset updated
    Apr 22, 2019
    Dataset provided by
    Biological and Chemical Oceanographic Data Management Office (BCO-DMO)
    Authors
    BCO-DMO
    License

    https://www.bco-dmo.org/dataset/660543/license

    Area covered
    Variables measured
    Cl, Si, pH, CH4, DOC, DON, DOP, MOX, NH4, NO2, and 13 more
    Description

    Water column data from CTD casts along the East Siberian Arctic Shelf on R/V Oden during 2011 (ESAS Water Column Methane project). Available formats: .htmlTable, .csv, .json, .mat, .nc, .tsv, .esriCsv, .geoJson. Acquisition methods are described in the following publication: Orcutt, B. et al. 2005.

    Core sectioning, porewater collection and analysis

    At each sampling site, sediment sub-samples were collected for porewater analyses and at selected depths for microbial rate assays (AOM, anaerobic oxidation of methane; methanogenesis (MOG) from bicarbonate and acetate). Sediment was expelled from the core liner using a hydraulic extruder under anoxic conditions. The depth intervals for extrusion varied. At each depth interval, a sub-sample was collected into a cut-off syringe for dissolved methane concentration quantification. Another 5 mL sub-sample was collected into a pre-weighed and pre-combusted glass vial for determination of porosity (determined by the change in weight after drying at 80 degrees Celsius to a constant weight). The remaining material was used for porewater extraction. Sample fixation and analyses for dissolved constituents followed the methods of Joye et al. (2010).

    Microbial Activity Measurements

    To determine AOM and MOG rates, 8 to 12 sub-samples (5 cm3) were collected from a core by manual insertion of a glass tube. For AOM, 100 uL of dissolved 14CH4 tracer (about 2,000,000 DPM as gas) was injected into each core. Samples were incubated for 36 to 48 hours at in situ temperature. Following incubation, samples were transferred to 20 mL glass vials containing 2 mL of 2M NaOH (which served to arrest biological activity and fix 14CO2 as 14C-HCO3-). Each vial was sealed with a teflon-lined screw cap, vortexed to mix the sample and base, and immediately frozen. Time zero samples were fixed immediately after radiotracer injection. The specific activity of the tracer substrate (14CH4) was determined by injecting 50 uL directly into scintillation cocktail (Scintiverse BD) followed by liquid scintillation counting. The accumulation of 14C product (14CO2) was determined by acid digestion following the method of Joye et al. (2010). The AOM rate was calculated using equation 1:

    AOM Rate = [CH4] x alphaCH4/t x (a-14CO2/a-14CH4)    (Eq. 1)

    Here, the AOM Rate is expressed as nmol CH4 oxidized per cm3 sediment per day (nmol cm-3 d-1), [CH4] is the methane concentration (uM), alphaCH4 is the isotope fractionation factor for AOM (1.06; Alperin and Reeburgh, 1988), t is the incubation time (d), a-14CO2 is the activity of the product pool, and a-14CH4 is the activity of the substrate pool. If methane concentration was not available, the turnover time of the 14CH4 tracer is presented.
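
    Eq. 1 written as a small R helper; the input values below are illustrative only (note that 1 uM equals 1 nmol per cm3, so the result is in nmol cm-3 d-1):

        # AOM rate per Eq. 1: nmol CH4 oxidized per cm3 sediment per day
        aom_rate <- function(ch4_uM, a_14co2, a_14ch4, t_days, alpha = 1.06) {
          ch4_uM * alpha / t_days * (a_14co2 / a_14ch4)
        }
        aom_rate(ch4_uM = 50, a_14co2 = 1500, a_14ch4 = 2e6, t_days = 2)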

    Rates of bicarbonate-based methanogenesis and acetoclastic methanogenesis were determined by incubating samples in gas-tight, closed-tube vessels without headspace, to prevent the loss of gaseous 14CH4 product during sample manipulation. These sample tubes were sealed using custom-designed plungers (black Hungate stoppers with the lip removed, containing a plastic "tail" run through the stopper) that were inserted at the base of the tube; the sediment was then pushed via the plunger to the top of the tube until a small amount protruded through the tube opening. A butyl rubber septum was then eased into the tube opening to displace sediment in contact with the atmosphere and close the tube, which was then sealed with an open-top screw cap. The rubber materials used in these assays were boiled in 1N NaOH for 1 hour, followed by several rinses in boiling milliQ water, to leach potentially toxic substances.

    A volume of radiotracer solution (100 uL of 14C-HCO3- tracer (~1 x 10^7 dpm in slightly alkaline milliQ water) or 1,2-14C-CH3COO- tracer (~5 x 10^7 dpm in slightly alkaline milliQ water)) was injected into each sample. Samples were incubated as described above and then 2 mL of 2N NaOH was injected through the top stopper into each sample to terminate biological activity (time zero samples were fixed prior to tracer injection). Samples were mixed to evenly distribute NaOH through the sample. Production of 14CH4 was quantified by stripping methane from the tubes with an air carrier, converting the 14CH4 to 14CO2 in a combustion furnace, and subsequently trapping the 14CO2 in NaOH as carbonate (Cragg et al., 1990; Crill and Martens, 1986). Activity of 14CO2 was measured subsequently by liquid scintillation counting.

    The Bi-MOG and Ac-MOG rates were calculated using equations 2 and 3, respectively:

    Bi-MOG Rate = [HCO3-] x alphaHCO3/t x (a-14CH4/a-H14CO3-)    (Eq. 2)

    Ac-MOG Rate = [CH3COO-] x alphaCH3COO-/t x (a-14CH4/a-14CH314COO-)    (Eq. 3)

    Both rates are expressed as nmol HCO3- or CH3COO-, respectively, reduced cm-3 d-1; alphaHCO3 and alphaCH3COO- are the isotope fractionation factors for MOG (assumed to be 1.06). [HCO3-] and [CH3COO-] are the pore water bicarbonate (mM) and acetate (uM) concentrations, respectively, t is incubation time (d), a-14CH4 is the activity of the product pool, and a-H14CO3- and a-14CH314COO- are the activities of the substrate pools. If samples for substrate concentration determination were not available, the substrate turnover constant instead of the rate is presented.

    For water column methane oxidation rate assays, triplicate 20 mL samples of live water (in addition to one 20 mL sample which was killed with ethanol (750 uL of pure EtOH) before tracer addition) were transferred from the CTD into serum vials. Samples were amended with 2 x 10^6 DPM of 3H-labeled-methane tracer and incubated for 24 to 72 hours (linearity of activity was tested and confirmed). After incubation, samples were fixed with ethanol, as above, and a sub-sample to determine total sample activity (3H-methane + 3H-water) was collected. Next, the sample was purged with nitrogen to remove the 3H-methane tracer and a sub-sample was amended with scintillation fluid and counted on a shipboard scintillation counter to determine the activity of tracer in the product of 3H-methane oxidation, 3H-water. The methane oxidation rate was calculated as:

    MOX Rate = [CH4] x alphaCH4/t x (a-3H-H2O/a-3H-CH4)    (Eq. 4)

    (The remainder of this record is machine-generated ERDDAP metadata. Key recoverable details: NSF Division of Polar Programs award PLR-1023444; Principal Investigators Samantha B. Joye and Vladimir Samarkin, University of Georgia; BCO-DMO data manager Hannah Ake, WHOI; instrument: CTD profiler; coverage 65.0835-77.3829 degrees N, 125.0406-178.9479 degrees E, 10-651 m depth; doi: 10.1575/1912/bco-dmo.660543.1.)

  5. HRV-ACC: a dataset with R-R intervals and accelerometer data for the diagnosis of psychotic disorders using a Polar H10 wearable sensor

    • data.niaid.nih.gov
    • zenodo.org
    Updated Aug 9, 2023
    Cite
    Kamil Książek (2023). HRV-ACC: a dataset with R-R intervals and accelerometer data for the diagnosis of psychotic disorders using a Polar H10 wearable sensor [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8171265
    Dataset updated
    Aug 9, 2023
    Dataset provided by
    Kamil Książek
    Piotr Gorczyca
    Wilhelm Masarczyk
    Robert Pudlo
    Przemysław Głomb
    Michał Romaszewski
    Paweł Dębski
    Magdalena Piegza
    Iga Stokłosa
    Piotr Ścisło
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    ABSTRACT

    The issue of diagnosing psychotic diseases, including schizophrenia and bipolar disorder, and in particular the objectification of symptom severity assessment, is still a problem requiring the attention of researchers. Two measures that can be helpful in patient diagnosis are heart rate variability calculated based on electrocardiographic signal and accelerometer mobility data. The following dataset contains data from 30 psychiatric ward patients with schizophrenia or bipolar disorder and 30 healthy persons. The duration of the measurements for individuals was usually between 1.5 and 2 hours. R-R intervals necessary for heart rate variability calculation were collected simultaneously with accelerometer data using a wearable Polar H10 device. The Positive and Negative Syndrome Scale (PANSS) test was performed for each patient participating in the experiment, and its results were attached to the dataset. Furthermore, the code for loading and preprocessing data, as well as for statistical analysis, is included in the corresponding GitHub repository.

    BACKGROUND

    Heart rate variability (HRV), calculated based on electrocardiographic (ECG) recordings of R-R intervals stemming from the heart's electrical activity, may be used as a biomarker of mental illnesses, including schizophrenia and bipolar disorder (BD) [Benjamin et al]. The variations of R-R interval values correspond to the heart's autonomic regulation changes [Berntson et al, Stogios et al]. Moreover, the HRV measure reflects the activity of the sympathetic and parasympathetic parts of the autonomous nervous system (ANS) [Task Force of the European Society of Cardiology the North American Society of Pacing Electrophysiology, Matusik et al]. Patients with psychotic mental disorders show a tendency for a change in the centrally regulated ANS balance in the direction of less dynamic changes in the ANS activity in response to different environmental conditions [Stogios et al]. Larger sympathetic activity relative to the parasympathetic one leads to lower HRV, while, on the other hand, higher parasympathetic activity translates to higher HRV. This loss of dynamic response may be an indicator of mental health. Additional benefits may come from measuring the daily activity of patients using accelerometry. This may be used to register periods of physical activity and inactivity or withdrawal for further correlation with HRV values recorded at the same time.

    EXPERIMENTS

    In our experiment, the participants were 30 psychiatric ward patients with schizophrenia or BD and 30 healthy people. All measurements were performed using a Polar H10 wearable device. The sensor collects ECG recordings and accelerometer data and, additionally, performs detection of R wave peaks. Participants of the experiment had to wear the sensor for a given time, usually between 1.5 and 2 hours, though the shortest recording was 70 minutes. During this time, evaluated persons could perform any activity from a few minutes after starting the measurement. Participants were encouraged to undertake physical activity and, more specifically, to take a walk. Because the patients were in the medical ward, they were instructed to take a walk in the corridors at the beginning of the experiment, and to repeat the walk 30 minutes and 1 hour after the first walk. The subsequent walks were to be slightly longer (about 3, 5 and 7 minutes, respectively). We did not repeat this instruction or supervise compliance during the experiment, in either the treatment or the control group. Seven persons from the control group did not receive this instruction; their measurements correspond to freely selected activities with rest periods, but at least three of them performed physical activities during this time. Nevertheless, at the start of the experiment, all participants were requested to rest in a sitting position for 5 minutes. Moreover, for each patient, the disease severity was assessed using the PANSS test and its scores are attached to the dataset.

    The data from sensors were collected using Polar Sensor Logger application [Happonen]. Such extracted measurements were then preprocessed and analyzed using the code prepared by the authors of the experiment. It is publicly available on the GitHub repository [Książek et al].

    Firstly, we performed a manual artifact detection to remove abnormal heartbeats due to non-sinus beats and technical issues of the device (e.g. temporary disconnections and inappropriate electrode readings). We also performed anomaly detection using Daubechies wavelet transform. Nevertheless, the dataset includes raw data, while a full code necessary to reproduce our anomaly detection approach is available in the repository. Optionally, it is also possible to perform cubic spline data interpolation. After that step, rolling windows of a particular size and time intervals between them are created. Then, a statistical analysis is prepared, e.g. mean HRV calculation using the RMSSD (Root Mean Square of Successive Differences) approach, measuring a relationship between mean HRV and PANSS scores, mobility coefficient calculation based on accelerometer data and verification of dependencies between HRV and mobility scores.
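
    A minimal sketch of the RMSSD step in R, reading one participant's file (the column name follows the table below after read.csv's name mangling; adjust the separator and column name to the actual file if they differ):

        rr <- read.csv("HRV_anonymized_data/treatment_1.csv")

        # RMSSD: root mean square of successive R-R interval differences (ms)
        rmssd <- function(rr_ms) sqrt(mean(diff(rr_ms)^2, na.rm = TRUE))
        rmssd(rr$RR.interval..ms.)   # column "RR-interval [ms]" after read.csv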

    DATA DESCRIPTION

    The structure of the dataset is as follows. One folder, called HRV_anonymized_data, contains values of R-R intervals together with timestamps for each experiment participant. The data were properly anonymized, i.e. the day of the measurement was removed to prevent person identification. Files concerning patients are named treatment_X.csv, where X is the number of the person, while files related to the healthy controls are named control_Y.csv, where Y is the identification number of the person. Furthermore, for visualization purposes, an image of the raw R-R intervals for each participant is provided, named raw_RR_{control,treatment}_N.png, where N is the number of the person from the control/treatment group. The collected data are raw, i.e. before the anomaly removal. The code enabling reproduction of the anomaly detection stage and removal of suspicious heartbeats is publicly available in the repository [Książek et al]. The structure of the files containing R-R intervals is as follows:

        Phone timestamp    RR-interval [ms]
        12:43:26.538000    651
        12:43:27.189000    632
        12:43:27.821000    618
        12:43:28.439000    621
        12:43:29.060000    661
        ...                ...
    The first column contains the timestamp for which the distance between two consecutive R peaks was registered. The corresponding R-R interval is presented in the second column of the file and is expressed in milliseconds.
    The second folder, called accelerometer_anonymized_data contains values of accelerometer data collected at the same time as R-R intervals. The naming convention is similar to that of the R-R interval data: treatment_X.csv and control_X.csv represent the data coming from the persons from the treatment and control group, respectively, while X is the identification number of the selected participant. The numbers are exactly the same as for R-R intervals. The structure of the files with accelerometer recordings is as follows:

        Phone timestamp    X [mg]    Y [mg]    Z [mg]
        13:00:17.196000    -961      -23       182
        13:00:17.205000    -965      -21       181
        13:00:17.215000    -966      -22       187
        13:00:17.225000    -967      -26       193
        13:00:17.235000    -965      -27       191
        ...                ...       ...       ...

    The first column contains a timestamp, while the next three columns correspond to the currently registered acceleration in three axes: X, Y and Z, in milli-g unit.

    We also attached a file with the PANSS test scores (PANSS.csv) for all patients participating in the measurement. The structure of this file is as follows:

        no_of_person    PANSS_P    PANSS_N    PANSS_G    PANSS_total
        1               8          13         22         43
        2               11         7          18         36
        3               14         30         44         88
        4               18         13         27         58
        ...             ...        ...        ...        ...

    The first column contains the identification number of the patient, the next three columns contain the PANSS scores for positive, negative and general symptoms, respectively, and the final column is the total PANSS score.

    USAGE NOTES

    All the files necessary to run the HRV and/or accelerometer data analysis are available on the GitHub repository [Książek et al]. HRV data loading, preprocessing (i.e. anomaly detection and removal), as well as the calculation of mean HRV values in terms of the RMSSD, is performed in the main.py file. Also, Pearson's correlation coefficients between HRV values and PANSS scores and the statistical tests (Levene's and Mann-Whitney U tests) comparing the treatment and control groups are computed. By default, a sensitivity analysis is made, i.e. running the full pipeline for different settings of the window size for which the HRV is calculated and various time intervals between consecutive windows. Preparing the heatmaps of correlation coefficients and corresponding p-values can be done by running the utils_advanced_plots.py file after performing the sensitivity analysis. Furthermore, a detailed analysis for the one selected set of hyperparameters may be prepared (by setting sensitivity_analysis = False), i.e. for 15-minute window sizes, 1-minute time intervals between consecutive windows and without data interpolation method. Also, patients taking quetiapine may be excluded from further calculations by setting exclude_quetiapine = True because this medicine can have a strong impact on HRV [Hattori et al].

    The accelerometer data processing may be performed using the utils_accelerometer.py file. In this case, accelerometer recordings are downsampled to ensure the same timestamps as for R-R intervals and, for each participant, the mobility coefficient is calculated. Then, a correlation

  6. Population and GDP/GNI/CO2 emissions (2019, raw data)

    • figshare.com
    Updated Feb 23, 2023
    Cite
    Liang Zhao (2023). Population and GDP/GNI/CO2 emissions (2019, raw data) [Dataset]. http://doi.org/10.6084/m9.figshare.22085060.v6
    Dataset updated
    Feb 23, 2023
    Dataset provided by
    figshare
    Authors
    Liang Zhao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Original dataset

    The original year-2019 dataset was downloaded from the World Bank Databank by the following approach on July 23, 2022.

    Database: "World Development Indicators"
    Country: 266 (all available)
    Series: "CO2 emissions (kt)", "GDP (current US$)", "GNI, Atlas method (current US$)", and "Population, total"
    Time: 1960, 1970, 1980, 1990, 2000, 2010, 2017, 2018, 2019, 2020, 2021
    Layout: Custom -> Time: Column, Country: Row, Series: Column
    Download options: Excel

    Preprocessing

    With LibreOffice:

    • remove non-country entries (lines after Zimbabwe),
    • shorten column names for easier processing: Country Name -> Country, Country Code -> Code, "XXXX ... GNI ..." -> GNI_1990, etc. (note '_', not '-', for R),
    • remove unnecessary rows after the Zimbabwe line.
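
    The same tidy-up could equally be scripted; a sketch in R, assuming the Databank export was saved as wdi_2019.xlsx (a placeholder name):

        library(readxl)

        wdi <- read_excel("wdi_2019.xlsx")   # placeholder file name
        names(wdi)[names(wdi) == "Country Name"] <- "Country"
        names(wdi)[names(wdi) == "Country Code"] <- "Code"
        # keep rows up to the last country (Zimbabwe); drops trailing notes
        wdi <- wdi[seq_len(max(which(wdi$Country == "Zimbabwe"))), ]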

  7. PeakAffectDS

    • zenodo.org
    • explore.openaire.eu
    Updated Apr 24, 2025
    Cite
    Nick Greene; Steven R. Livingstone; Steven R. Livingstone; Lech Szymanski; Lech Szymanski; Nick Greene (2025). PeakAffectDS [Dataset]. http://doi.org/10.5281/zenodo.6403363
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Nick Greene; Steven R. Livingstone; Steven R. Livingstone; Lech Szymanski; Lech Szymanski; Nick Greene
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Contact Information

    If you would like further information about PeakAffectDS, to purchase a commercial license, or if you experience any issues downloading files, please contact us at peakaffectds@gmail.com.

    Description

    PeakAffectDS contains 663 files (total size: 1.84 GB), consisting of 612 physiology files, and 51 perceptual rating files. The dataset contains 51 untrained research participants (39 female, 12 male), who had their body physiology recorded while watching movie clips validated to induce strong emotional reactions. Emotional conditions included: calm, happy, sad, angry, fearful, and disgust; along with a neutral baseline condition. Four physiology channels were recorded with a Biopac MP36 system: two facial muscles with fEMG (zygomaticus major, corrugator supercilii) using Ag/AgCl electrodes, heart activity with ECG using a 1-lead (Lead II) configuration, and respiration with a wearable strain-gauge belt. While viewing movie clips, participants indicated in real-time when they experienced a "peak" emotional event, including: chills, tears, or the startle reflex. After each clip, participants further rated their felt emotional state using a forced-choice categorical response measure, along with their felt Arousal and Valence. All data are provided in plaintext (.csv) format.

    PeakAffectDS was created in the Affective Data Science Lab.

    Physiology files

    Each participant has 12 .CSV physiology files, consisting of 6 Emotional conditions, and 6 Neutral baseline conditions. All physiology channels were recorded at 2000 Hz. A 50Hz notch filter was then applied to fEMG and ECG channels to remove mains hum. Each .CSV file contains 6 columns, in order from left to right:

    1. Sample timestamp (units: seconds)
    2. EMG Zygomaticus (units: millivolts)
    3. EMG Corrugator (units: millivolts)
    4. ECG (units: millivolts)
    5. Respiration (strain-gauge belt)
    6. Peak event markers: 0 = no event, 1 = chills, 2 = tears, 3 = startle

    Perceptual files

    There are 51 perceptual ratings files, one for each participant. Each .CSV file contains 4 columns, in order from left to right:

    1. Filename of presented stimulus (see File naming Convention, below)
    2. Felt emotional response: 1 = neutral, 2 = calm, 3 = happy, 4 = sad, 5 = angry, 6 = fearful, 7 = disgust
    3. Felt Valence, ranging from: 1 = Very negative, to 7 = Very positive
    4. Felt Arousal, ranging from: 1 = Very low, to 7 = Very high

    File naming convention

    Each of the 612 physiology files has a unique filename. The filename consists of a 3-part numerical identifier (e.g., 09-02-03.csv). The first identifier refers to the participant's ID (09), while the remaining two identifiers refer to the stimulus presented for that recording (02-03.mp4); these identifiers define the stimulus characteristics:

    • Participant: 01 = participant 1, 02 = participant 2, ..., 51 = participant 51.
    • Emotion: 01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust.
    • Stimulus set. For Emotional files: 01 = group 1, 02 = group 2, 03 = group 3. For Neutral files: 01 = instance 1, 02 = instance 2, ..., 06 = instance 6.

    Filename example: 09-02-03.csv

    • Participant 9 (09)
    • Calm (02)
    • Stimulus Set 3 (03)

    Filename example: 09-01-05.csv

    • Participant 9 (09)
    • Neutral (01)
    • Instance 5 (05)
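
    A small sketch for decoding these identifiers in R, following the convention above:

        # split "09-02-03.csv" into participant, emotion, and stimulus identifiers
        parse_filename <- function(filename) {
          parts <- strsplit(sub("\\.csv$", "", basename(filename)), "-")[[1]]
          emotions <- c("neutral", "calm", "happy", "sad",
                        "angry", "fearful", "disgust")
          list(participant = as.integer(parts[1]),
               emotion     = emotions[as.integer(parts[2])],
               stimulus    = as.integer(parts[3]))  # set (emotional) or instance (neutral)
        }
        parse_filename("09-02-03.csv")  # participant 9, calm, stimulus set 3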

    Methods

    A 1-way mixed design was used, with a within-subjects factor Emotion (6 levels: Calm, Happy, Sad, Angry, Fearful, Disgust) and a between-subjects factor Stimulus Set (3 levels). Trials were blocked by Affect Condition (Baseline, Emotional), with each participant presented 6 blocked trials: Baseline (neutral), then Emotional (Calm, ..., Disgust). This design reduced potential contamination from preceding emotional trials by ensuring that each participant's physiology began close to a resting baseline for emotional conditions.

    Emotion was presented in pseudorandom order using a carryover-balanced generalised Youden design, generated by the crossdes package in R. Eighteen emotional movie clips were used as stimuli, with three instances for each emotion category (6x3). Clips were then grouped into one of three Stimulus Sets, with participants assigned to a given Set using block randomisation. For example, participants assigned to Stimulus Set 1 (PID: 1, 4, 7, ...) all saw the same movie clips, but these clips differed from those in Sets 2 and 3. Six Neutral baseline movie clips were used as stimuli, with all participants viewing the same neutral clips, with their order also generated with a Youden design.

    Stimulus duration varied, with clips lasting several minutes. Lengthy clips without repetition were used to help ensure that participants became engaged and experienced genuine, strong emotional responses. Participants were instructed to indicate immediately, using the keyboard, when they experienced a "peak" emotional event: chills, tears, or startle. Participants were permitted to indicate multiple events in a single trial, and identified the type of the events at the trial feedback stage, along with ratings of emotional category, arousal, and valence. The concept of peak physiological events was explained at the beginning of the experiment, but the three states were not described as being associated with any particular emotion or valence.

    License information

    PeakAffectDS is released under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, CC BY-NC-SA 4.0.

    Citing PeakAffectDS

    Greene, N., Livingstone, S. R., & Szymanski, L. (2022). PeakAffectDS [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6403363

