Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the author is R. Christian Bohlen. It features 2 columns including publication date.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Flexibility of Intrinsic Neural Timescales During Distinct Behavioral States
Description of the data and file structure
The GitHub page https://github.com/duodenum96/acwbehavior contains the MATLAB code to replicate the figures. The numerical data necessary for the figures are presented here in two formats: .mat for MATLAB users and .csv. Non-MATLAB users can also read the .mat files with tools such as scipy.io.loadmat. The correspondence between the .mat and .csv files is explained in get_figure_data.m. For any questions, contact catalyasir@gmail.com.
Files and variables
get_figure_data.m: Correspondence between .csv and .mat files.
mice_behavioral_states.csv (822 x 1): Each row corresponds to one behavioral state of mice, in agreement with mice_acw_values.csv.
mice_acw_values.csv (822 x 92): ACW values estimated from 10 second intervals. The states are in agreement with mice_behavioral_states.csv. Each row is an interval, each column is a ROI.
mousenames.csv (822 x 1): Identifiers of the mice.
labels.csv (1 x 5): Five labels for behavioral states.
\\_all_acws.csv (987 x 64): ACW values estimated from 10 second intervals in human EEG data. \\ corresponds to rest, self and other conditions. Each row is an interval, each column is an EEG channel.
subjcode.csv (987 x 1): Subject identifiers in numeric form for human EEG data.
acwdr_ca.mat: MATLAB struct containing mice data.
\\_acws_acwdr.mat: MATLAB struct containing human data. \\ corresponds to rest, self and other conditions.
Code/software
The .csv files do not require any specific software. MATLAB files can be opened with MATLAB or with software such as scipy.io.loadmat (Python), the R.matlab package (R, function readMat in particular) or MAT.jl (Julia).
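Because mice_behavioral_states.csv and mice_acw_values.csv are row-aligned, the two can be combined by position. The following is a minimal sketch, not the authors' analysis: it assumes the .csv files have no header row and are plain comma-separated text, and the function names read_column, read_matrix and mean_acw_by_state are hypothetical.

```python
import csv
from collections import defaultdict


def read_column(path):
    """Read a one-column CSV file (assumed to have no header) as a list of strings."""
    with open(path, newline="") as handle:
        return [row[0] for row in csv.reader(handle) if row]


def read_matrix(path):
    """Read a numeric CSV file (assumed to have no header) as a list of float rows."""
    with open(path, newline="") as handle:
        return [[float(value) for value in row] for row in csv.reader(handle) if row]


def mean_acw_by_state(states, acw_rows):
    """Average each column (ROI) over all intervals that share a behavioral state."""
    if len(states) != len(acw_rows):
        raise ValueError("states and ACW rows must have the same length")
    grouped = defaultdict(list)
    for state, values in zip(states, acw_rows):
        grouped[state].append(values)
    return {
        state: [sum(column) / len(column) for column in zip(*rows)]
        for state, rows in grouped.items()
    }
```

With the real files, the call would look like mean_acw_by_state(read_column("mice_behavioral_states.csv"), read_matrix("mice_acw_values.csv")), giving one list of 92 ROI means per state.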
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the author is Henry R. Guly. It features 2 columns including publication date.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
!!!WARNING~~~ This dataset has a large number of flaws and is unable to properly answer many questions that people generally use it to answer, such as whether national hate crimes are changing (or at least they use the data so improperly that they get the wrong answer). A large number of people using this data (academics, advocates, reporters, US Congress) do so inappropriately and get the wrong answer to their questions as a result. Indeed, many published papers using this data should be retracted. Before using this data I highly recommend that you thoroughly read my book on UCR data, particularly the chapter on hate crimes (https://ucrbook.com/hate-crimes.html), as well as the FBI's own manual on this data. The questions you could potentially answer well are relatively narrow and generally exclude any causal relationships. ~~~WARNING!!!
For a comprehensive guide to this data and other UCR data, please see my book at ucrbook.com
Version 8 release notes: Adds 2019 and 2020 data. Please note that the FBI has retired UCR data ending in 2020 data so this will be the last UCR hate crime data they release. Changes .rda file to .rds.
Version 7 release notes: Changes release notes description, does not change data.
Version 6 release notes: Adds 2018 data.
Version 5 release notes: Adds data in the following formats: SPSS, SAS, and Excel. Changes project name to avoid confusing this data for the ones done by NACJD. Adds data for 1991. Fixes bug where bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013 causing there to be two columns and zero values for years with the wrong label. All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R.
Version 4 release notes: Adds data for 2017. Adds rows for agencies that submitted a zero-report (i.e. that agency reported no hate crimes in the year). This is for all years 1992-2017. Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time. Different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian') which I made consistent. Made the 'population' column which is the total population in that agency.
Version 3 release notes: Adds data for 2016. Order rows by year (descending) and ORI.
Version 2 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open.
Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.).
The only changes I made to the data are the following. Minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), made all character values lower case, reordered columns. I also generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
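The derived columns described above can be illustrated with a short sketch. This is not the author's R cleaning code: the separator in the ID, the output key names and the month-day format are assumptions made here for illustration, and only the idea (combine year, ORI9 and incident number; split the incident date into month, weekday and month-day; keep character values lower case) comes from the description.

```python
from datetime import date

# Fixed English names so the output does not depend on the system locale.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
MONTHS = ["january", "february", "march", "april", "may", "june",
          "july", "august", "september", "october", "november", "december"]


def make_unique_id(year, ori9, incident_number, sep="_"):
    """Combine year, ORI9 and incident number into one key (separator is an assumption)."""
    return sep.join([str(year), str(ori9).lower(), str(incident_number).lower()])


def date_parts(incident_date):
    """Derive month, weekday and month-day values from an incident date."""
    return {
        "incident_month": MONTHS[incident_date.month - 1],
        "incident_weekday": WEEKDAYS[incident_date.weekday()],  # Monday is 0
        "incident_month_day": f"{incident_date.month:02d}-{incident_date.day:02d}",
    }
```

An ID built this way is only unique if incident numbers are unique within an agency and year, which is worth checking before using it as a join key.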
Which countries have the most social contacts in the world? In particular, do countries with more social contacts among the elderly report more deaths caused by a pandemic caused by a respiratory virus?
With the emergence of the COVID-19 pandemic, reports have shown that the elderly are at a higher risk of dying than any other age group. 8 out of 10 deaths reported in the U.S. have been in adults 65 years old and older. Countries have also begun to enforce 2 m social distancing to contain the pandemic.
To this end, I wanted to explore the relationship between social contacts among the elderly and the number of COVID-19 deaths across countries.
This dataset includes a subset of the projected social contact matrices for 152 countries from Prem et al. 2020. The projections were based on the POLYMOD study, in which information on social contacts was obtained using cross-sectional surveys in Belgium (BE), Germany (DE), Finland (FI), Great Britain (GB), Italy (IT), Luxembourg (LU), The Netherlands (NL), and Poland (PL) between May 2005 and September 2006.
This dataset includes contact rates from study participants ages 65+ for all countries from all sources of contact (work, home, school and others).
I used this R code to extract this data:
load('../input/contacts.Rdata') # https://github.com/kieshaprem/covid19-agestructureSEIR-wuhan-social-distancing/blob/master/data/contacts.Rdata
View(contacts)
contacts[["ALB"]][["home"]]
contacts[["ITA"]][["all"]]
rowSums(contacts[["ALB"]][["all"]])
# Rows 16, 15 and 14 of each "all" matrix are the participant age groups 75-79, 70-74 and 65-69.
out1 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[16,]; out1 <- rbind(out1, data.frame(x)) }
out2 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[15,]; out2 <- rbind(out2, data.frame(x)) }
out3 = data.frame(); for (n in names(contacts)) { x = (contacts[[n]][["all"]])[14,]; out3 <- rbind(out3, data.frame(x)) }
m1 = data.frame(t(matrix(unlist(out1), nrow=16)))
m2 = data.frame(t(matrix(unlist(out2), nrow=16)))
m3 = data.frame(t(matrix(unlist(out3), nrow=16)))
rownames(m1) = names(contacts)
colnames(m1) = c("00_04", "05_09", "10_14", "15_19", "20_24", "25_29", "30_34", "35_39", "40_44", "45_49", "50_54", "55_59", "60_64", "65_69", "70_74", "75_79")
rownames(m2) = rownames(m1)
rownames(m3) = rownames(m1)
colnames(m2) = colnames(m1)
colnames(m3) = colnames(m1)
write.csv(zapsmall(m1),"contacts_75_79.csv", row.names = TRUE)
write.csv(zapsmall(m2),"contacts_70_74.csv", row.names = TRUE)
write.csv(zapsmall(m3),"contacts_65_69.csv", row.names = TRUE)
Row names correspond to the 3-letter ISO country code, e.g. ITA represents Italy. Column names are the age groups of the individuals contacted, in 5-year intervals from 0 to 80 years old. Cell values are the projected mean social contact rates.
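For readers who do not use R, the same extraction can be sketched in Python. This is an illustrative equivalent rather than the code used to build the files: it assumes the contact matrices have already been loaded into a dictionary mapping country code to a 16 x 16 list of lists, and the names extract_age_row and total_contacts are hypothetical.

```python
# Column labels for the 16 five-year age groups, "00_04" through "75_79".
AGE_GROUPS = [f"{low:02d}_{low + 4:02d}" for low in range(0, 80, 5)]


def extract_age_row(contacts, row_index):
    """Return {country: {contacted_age_group: rate}} for one participant age group.

    row_index is zero-based, so 15 selects participants aged 75-79
    (row 16 in the R code), 14 selects 70-74 and 13 selects 65-69.
    """
    table = {}
    for country, matrix in contacts.items():
        row = matrix[row_index]
        if len(row) != len(AGE_GROUPS):
            raise ValueError(f"{country}: expected {len(AGE_GROUPS)} columns")
        table[country] = dict(zip(AGE_GROUPS, row))
    return table


def total_contacts(table):
    """Sum the contact rates over all contacted age groups for each country."""
    return {country: sum(rates.values()) for country, rates in table.items()}
```

Summing a row gives the total projected contact rate for that participant age group, which is the quantity one would compare against death counts across countries.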
Thanks go to Dr. Kiesha Prem for her correspondence and to her team for publishing their work on social contact matrices.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We used the modified Price equation to quantify how the presence of shrubs alters the richness, composition, and abundance of plant and nematode communities, and the resulting effects on ecosystem functioning (i.e., plant biomass and nematode carbon [C] metabolism) on the Qinghai-Tibet Plateau.
The “README.txt” file describes the analysis performed by each code file, together with the conditions, data and corresponding results needed to run it.
The “Metadata.docx” file describes what each column contains and what the rows represent.
The “.r” files contain all code required for the manuscript and supporting information.
The “.csv” files contain all data required for the manuscript and supporting information.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about politicians. It has 1 row and is filtered where the political party is R Renaissance (Monaco). It features 3 columns, including Instagram link and X followers.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The segment counts by social group and species or species group for the Waterfowl Breeding Population and Habitat Survey and associated segment effort information. Three data files are included with their associated metadata (html and xml formats). Segment counts are summed counts of waterfowl per segment and are separated into two files, described below, along with the effort table needed to analyze recent segment count information.
1. wbphs_segment_counts_1955to1999_forDistribution.csv, which represents the period prior to the collection of geolocated data. There is no associated effort file for these counts and segments with zero birds are included in the segment counts table, so effort can be inferred; there is no information to determine the proportion of each segment surveyed for this period and it must be presumed they were surveyed completely. Number of rows in table = 1,988,290.
2. wbphs_segment_counts_forDistribution.csv, which contains positive segment records only, by species or species group, beginning with 2000. The wbphs_segment_effort_forDistribution.csv file is important for this segment counts file and can be used to infer zero value segments, by species or species group. Number of rows in table = 365,863.
3. wbphs_segment_effort_forDistribution.csv. The segment survey effort and location from the Waterfowl Breeding Population and Habitat Survey beginning with 2000. If a segment was not flown, it is absent from the table for the corresponding year. Number of rows in table = 65,122.
Also included here is a small R code file, createSingleSegmentCountTable.R, which can be run to format the 2000+ data to match the 1955-1999 format and combine the data over the two time periods. Please consult the metadata for an explanation of the fields and other information to understand the limitations of the data.
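The way zero-value segments can be inferred from the effort table for the 2000+ data can be sketched as follows. This is not a substitute for createSingleSegmentCountTable.R: the key layout (year, segment, species) and the function name add_zero_counts are assumptions made for illustration, and the real field names should be taken from the metadata.

```python
def add_zero_counts(positive_counts, effort, species):
    """Return a count for every surveyed (year, segment), filling zeros where needed.

    positive_counts: dict mapping (year, segment, species) to a positive count.
    effort: iterable of (year, segment) pairs that were actually flown.
    species: the species or species group to expand.

    A segment listed in the effort table but missing from the counts was
    surveyed with no birds of that species recorded, so its count is 0.
    Segments absent from the effort table were not flown and stay absent.
    """
    return {
        (year, segment): positive_counts.get((year, segment, species), 0)
        for (year, segment) in effort
    }
```

The distinction between a zero count and a segment that was never flown matters for any abundance estimate, which is why the expansion is driven by the effort table and not by the list of all known segments.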
Open Database License (ODbL) v1.0https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
Data and coding scripts for Seddon et al. (2016) Nature (DOI 10.1038/nature16986). We derived monthly time-series of four key terrestrial ecosystem variables at 0.05 degree (~5 km) resolution from observations by the MODIS sensor on Terra (AM) for the period February 2000-December 2013 inclusive, and developed a method to identify vegetation sensitivity to climate variability over this period (see Methods in main paper).
This ORA item contains all data and files required to run the analysis described in the paper. Data required to run the script are provided in six zip files (evi.zip, temp.zip, aetpet.zip, cld.zip, stdev.zip, numpxl.zip), each containing 167 text files, one per month of available data, in addition to a supporting files folder. Details are as follows.
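The figure of 167 monthly files per archive can be checked with a few lines of arithmetic. The sketch below assumes the series runs from February 2000 to December 2013 inclusive; that start date is an assumption adopted here because it is the one that yields 167 months and matches the 14-year span mentioned in the abstract. The function name monthly_steps is hypothetical.

```python
def monthly_steps(start=(2000, 2), end=(2013, 12)):
    """List every (year, month) pair from start to end, both inclusive."""
    year, month = start
    steps = []
    while (year, month) <= end:
        steps.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return steps
```

Calling len(monthly_steps()) gives the number of monthly grids to expect in each zip file, which is a quick completeness check after unpacking.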
supporting_files.zip : This directory includes computer code and additional supporting files. Please see the 'read me.txt' file within this directory for more information.
evi.zip: ENHANCED VEGETATION INDEX (EVI). We used the MOD13C2 product (Huete et al 2002) which comprises monthly, global EVI at 0.05 degree resolution. In some cases where no clear-sky observations are available, the MOD13C2 version 5 product replaces no-data values with climatological monthly means, so we removed these values where appropriate.
EVI: format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = dimensionless; scale factor = 10000 (divide the value by 10000 to get EVI); filenames = yyyymmevi.txt
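The grid parameters above translate into a small amount of decoding logic. The following sketch assumes each text file is a whitespace-delimited grid of integers with no header lines and that the first row is the northern edge (y = 90); neither detail is stated in the description, so both should be checked against the 'read me.txt' file. The function names are hypothetical.

```python
NODATA_SEA = -9999     # sea pixels
MISSING_LAND = -999    # land pixels with no valid observation


def decode_value(raw, scale_factor=10000):
    """Convert a stored integer to a physical value, or None for flagged pixels."""
    if raw in (NODATA_SEA, MISSING_LAND):
        return None
    return raw / scale_factor


def decode_line(line, scale_factor=10000):
    """Decode one whitespace-delimited grid row (delimiter is an assumption)."""
    return [decode_value(int(token), scale_factor) for token in line.split()]


def cell_center(row, col, min_x=-180.0, max_y=90.0, resolution=0.05):
    """Return (lat, lon) of a cell center, assuming row 0 is the northern edge."""
    lon = min_x + (col + 0.5) * resolution
    lat = max_y - (row + 0.5) * resolution
    return lat, lon
```

The same functions apply to the other variables by changing scale_factor, for example 100 for cloudiness and 1 for air temperature.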
numpxl.zip - COUNTS OF THE NUMBER OF PIXELS USED IN EVI CALCULATION. The MOD13C2 product is the result of a spatially and temporally averaged mosaic of higher-resolution (1 km) pixels. Data in this directory represent the number of 1 km observations used to calculate the MODIS EVI product. See the online documentation for more details (Solano et al. 2010).
numpxl: format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = counts; filenames = yyyy_mm_numpxl_pt05deg.txt
stdev.zip - STANDARD DEVIATION OF EVI VALUES. Standard deviation of the monthly EVI observations. See discussion in numpxl.zip item (above) and the online documentation for more details (Solano et al. 2010).
stdev: format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = dimensionless; scale factor = 10000 (divide the value by 10000 to get EVI); filenames = yyyy_mm_stdev_pt05deg.txt
temp.zip: AIR TEMPERATURE. We used the MOD07_L2 Atmospheric Profile product (Seeman et al 2006) as a measure of air temperature. Five-minute swaths of Retrieved Temperature Profile were projected to geographic co-ordinates. Pixels from the highest available pressure level, corresponding to the temperature closest to the Earth's surface, were selected from each swath. Swaths were then mean-mosaicked into global daily grids, and the daily global grids were mean-composited to monthly grids of air temperature.
Air temperature: format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = degrees C; scale factor = 1 (divide the value by 1 to get Air temperature); filenames = yyyymmtemp.txt
aetpet.zip: WATER AVAILABILITY. We used the MOD16 Global Evapotranspiration product (Mu et al 2011) to calculate the monthly 0.05 degree ratio of Actual to Potential Evapotranspiration (AET/PET).
AET/PET: format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = dimensionless; scale factor = 10000 (divide the value by 10000 to get AET/PET); filenames = yyyymmaetpet.txt
cld.zip - CLOUDINESS. We used the MOD35_L2 Cloud Mask product (Ackerman et al 2010). This product provides daily records on the presence of cloudy vs cloudless skies, and we used this to make an index of the proportion of cloudy to clear days in a given pixel. After conversion to geographic co-ordinates, five-minute swaths at 1-km resolution were reclassed as clear sky or cloudy, and these daily swaths were mean-mosaicked to global coverages, mean-composited from daily to monthly, and mean-aggregated from 1 km to 0.05 degree.
cld: format = ascii text file; projection = geographic projection; spatial resolution = 0.05 degrees; min x = -180; max x = 180; min y = -60; max y = 90; rows = 3000; cols = 7200; bit depth = 16 bit signed integer; nodata (sea) = -9999; missing data (on land) = -999; units = percentage of days in the month which were cloudy; scale factor = 100 (divide the value by 100 to get percentage cloudy days); filenames = yyyymmcld.txt
References
Ackerman, S. et al. (2010) Discriminating clear-sky from cloud with MODIS: Algorithm Theoretical Basis Document (MOD35), Version 6.1. (URL: http://modis-atmos.gsfc.nasa.gov/_docs/MOD35_ATBD_Collection6.pdf)
Huete, A. et al. (2002) Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sensing of Environment 83, 195–213.
Mu, Q., Zhao, M., Running, S.R. (2011) Improvements to a MODIS global terrestrial evapotranspiration algorithm. Remote Sensing of Environment 115, 1781-1800
Seeman, S. W., Borbas, E. E., Li, J., Menzel, W. P. & Gumley, L. E. (2006) MODIS Atmospheric Profile Retrieval Algorithm Theoretical Basis Document, Version 6 (URL: http://modis-atmos.gsfc.nasa.gov/_docs/MOD07_atbd_v7_April2011.pdf)
Solano, R. et al. (2010) MODIS Vegetation Index User’s Guide (MOD13 Series) Version 2.00, May 2010 (Collection 5) (URL: http://vip.arizona.edu/documents/MODIS/MODIS_VI_UsersGuide_01_2012.pdf)
Seddon et al. (2016) Nature (DOI 10.1038/nature16986)
ABSTRACT: Identification of properties that contribute to the persistence and resilience of ecosystems despite climate change constitutes a research priority of global significance. Here, we present a novel, empirical approach to assess the relative sensitivity of ecosystems to climate variability, one property of resilience that builds on theoretical modelling work recognising that systems closer to critical thresholds respond more sensitively to external perturbations. We develop a new metric, the Vegetation Sensitivity Index (VSI), which identifies areas sensitive to climate variability over the past 14 years. The metric uses time-series data of MODIS-derived Enhanced Vegetation Index (EVI) and three climatic variables that drive vegetation productivity (air temperature, water availability and cloudiness). Underlying the analysis is an autoregressive modelling approach used to identify regions with memory effects and reduced response rates to external forcing. We find ecologically sensitive regions with amplified responses to climate variability in the arctic tundra, parts of the boreal forest belt, the tropical rainforest, alpine regions worldwide, steppe and prairie regions of central Asia and North and South America, the Caatinga deciduous forest in eastern South America, and eastern areas of Australia. Our study provides a quantitative methodology for assessing the relative response rate of ecosystems – be they natural or with a strong anthropogenic signature – to environmental variability, which is the first step to address why some regions appear to be more sensitive than others and what impact this has upon the resilience of ecosystem service provision and human wellbeing.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
For any questions about this data please email me at jacob@crimedatatool.com. If you use this data, please cite it.
Version 6 release notes: Adds 2018 data.
Version 5 release notes: Adds data in the following formats: SPSS, SAS, and Excel. Changes project name to avoid confusing this data for the ones done by NACJD. Adds data for 1991. Fixes bug where bias motivation "anti-lesbian, gay, bisexual, or transgender, mixed group (lgbt)" was labeled "anti-homosexual (gay and lesbian)" prior to 2013 causing there to be two columns and zero values for years with the wrong label. All data is now directly from the FBI, not NACJD. The data initially comes as ASCII+SPSS Setup files and is read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data.
Version 4 release notes: Adds data for 2017. Adds rows for agencies that submitted a zero-report (i.e. that agency reported no hate crimes in the year). This is for all years 1992-2017. Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time. Different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian') which I made consistent. Made the 'population' column which is the total population in that agency.
Version 3 release notes: Adds data for 2016. Order rows by year (descending) and ORI.
Version 2 release notes: Fix bug where Philadelphia Police Department had incorrect FIPS county code.
The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. Please note that the files are quite large and may take some time to open.
Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.).
The only changes I made to the data are the following. Minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), changed the name of some UCR offense codes (e.g. from "agg asslt" to "aggravated assault"), made all character values lower case, reordered columns. I also added state, county, and place FIPS code from the LEAIC (crosswalk) and generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
File List
CAraw_data.csv
CA_raw.R
CA_rel_to_raw.swf
Description
The CAraw_data.csv file is a comma-delimited data file containing the benthic abundance data analyzed in the statistical report.
--- Rows are the 13 sampling stations (stations R40 and R42 are references)
--- Columns are the 92 species, with abbreviated Latin names
Note that the first row contains the 92 column names, and the first column the 13 site names. Thus there is one fewer item in the first row, which is convenient for reading with the read.table function in R.
The CA_raw.R file is an R script for performing the computations of CA-raw.
The CA_rel_to_raw.swf file is a video file showing the transition from CA-relative (the usual form of correspondence analysis) to CA-raw. (This video, in Shockwave Flash format, can be opened in a browser such as Firefox.)
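The layout in which the header row has one fewer item than the data rows suits R's read.table, but other readers need to handle it explicitly. The sketch below shows one way to parse such a file in Python; the function name read_abundance_table is hypothetical, and the species labels in any example input are invented placeholders, not names from the real file.

```python
import csv
import io


def read_abundance_table(text):
    """Parse CSV text whose header row omits the label for the site-name column.

    Returns {site_name: {species_name: abundance}}.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    header, body = rows[0], rows[1:]
    table = {}
    for row in body:
        site, values = row[0], row[1:]
        # Each data row carries the site name plus one value per species.
        if len(values) != len(header):
            raise ValueError(f"{site}: expected {len(header)} values, got {len(values)}")
        table[site] = {name: float(value) for name, value in zip(header, values)}
    return table
```

For the real file, the text would be obtained with open("CAraw_data.csv").read(), and the result should contain 13 sites with 92 species each.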
Amount of calling activity (calling effort) is a strong determinant of male mating success in species such as orthopterans and anurans that use acoustic communication in the context of mating behaviour. While many studies in crickets have investigated the determinants of calling effort, patterns of variability in male calling effort in natural choruses remain largely unexplored. Within-individual variability in calling activity across multiple nights of calling can influence female mate search and mate choice strategies. Moreover, calling site fidelity across multiple nights of calling can also affect the female mate sampling strategy. We therefore investigated the spatio-temporal dynamics of acoustic signaling behaviour in a wild population of the field cricket species Plebeiogryllus guttiventris. We first studied the consistency of calling activity by quantifying variation in male calling effort across multiple nights of calling using repeatability analysis. Callers were inconsistent ...
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Walking and running are mechanically and energetically different locomotion modes. For selecting one or another, speed is a parameter of paramount importance. Yet, both are likely controlled by similar low-dimensional neuronal networks that reflect in patterned muscle activations called muscle synergies. Here, we investigated how humans synergistically activate muscles during locomotion at different submaximal and maximal speeds. We analysed the duration and complexity (or irregularity) over time of motor primitives, the temporal components of muscle synergies. We found that the challenge imposed by controlling high-speed locomotion forces the central nervous system to produce muscle activation patterns that are wider and less complex relative to the duration of the gait cycle. The motor modules, or time-independent coefficients, were redistributed as locomotion speed changed. These outcomes show that robust locomotion control at challenging speeds is achieved by modulating the relative contribution of muscle activations and producing less complex and wider control signals, whereas slow speeds allow for more irregular control.
In this supplementary data set we made available: a) the metadata with anonymized participant information, b) the raw EMG, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via NMF and f) the code to process the data, including the scripts to calculate the Higuchi's fractal dimension (HFD) of motor primitives. In total, 180 trials from 30 participants are included in the supplementary data set.
The file “metadata.dat” is available in ASCII and RData format and contains:
Code: the participant’s code
Group: the experimental group in which the participant was involved (G1 = walking and submaximal running; G2 = submaximal and maximal running)
Sex: the participant’s sex (M or F)
Speeds: the type of locomotion (W for walking or R for running) and speed at which the recordings were conducted in 10*[m/s]
Age: the participant’s age in years
Height: the participant’s height in [cm]
Mass: the participant’s body mass in [kg]
PB: 100 m-personal best time (for G2).
The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus. Please note that the following trials include fewer than 30 gait cycles (the actual number shown between parentheses): P16_R_83 (20), P16_R_95 (25), P17_R_28 (28), P17_R_83 (24), P17_R_95 (13), P18_R_95 (23), P19_R_95 (18), P20_R_28 (25), P20_R_42 (27), P20_R_95 (25), P22_R_28 (23), P23_R_28 (29), P24_R_28 (28), P24_R_42 (29), P25_R_28 (29), P25_R_95 (28), P26_R_28 (29), P26_R_95 (28), P27_R_28 (28), P27_R_42 (29), P27_R_95 (24), P28_R_28 (29), P29_R_95 (17). All the other trials consist of 30 gait cycles. Trials are named like “P20_R_20,” where the characters “P20” indicate the participant number (in this example the 20th), the character “R” indicates the locomotion type (W=walking, R=running), and the numbers “20” indicate the locomotion speed in 10*m/s (in this case the speed is 2.0 m/s). The filtered and time-normalized EMG data is named, following the same rules, like “FILT_EMG_P03_R_30”.
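The trial naming rule can be decoded mechanically. The sketch below is an illustration of that rule only: the function name parse_trial_name and the returned tuple layout are hypothetical, and the pattern assumes that any prefix (such as “FILT_EMG”) is separated from the participant code by an underscore, as in the examples given.

```python
import re

LOCOMOTION = {"W": "walking", "R": "running"}

# Optional prefix, then P<number>_<W or R>_<speed in tenths of m/s> at the end.
TRIAL_PATTERN = re.compile(r"(?:^|_)P(\d+)_([WR])_(\d+)$")


def parse_trial_name(name):
    """Return (participant number, locomotion type, speed in m/s) for a trial name."""
    match = TRIAL_PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognised trial name: {name!r}")
    participant, kind, speed = match.groups()
    return int(participant), LOCOMOTION[kind], int(speed) / 10
```

For example, “P20_R_20” decodes to participant 20, running, 2.0 m/s, and a prefixed name such as “FILT_EMG_P03_R_30” decodes to participant 3, running, 3.0 m/s.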
Old versions not compatible with the R package musclesyneRgies
The files containing the gait cycle breakdown are available in RData format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with as many rows as the available number of gait cycles and two columns. The first column, named “touchdown”, contains the touchdown incremental times in seconds. The second column, named “stance”, contains the duration of each stance phase of the right foot in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P20_R_20,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times, the characters “P20” indicate the participant number (in this example the 20th), the character “R” indicates the locomotion type (W=walking, R=running), and the numbers “20” indicate the locomotion speed in 10*m/s (in this case the speed is 2.0 m/s). Please note that the following trials include fewer than 30 gait cycles (the actual number shown between parentheses): P16_R_83 (20), P16_R_95 (25), P17_R_28 (28), P17_R_83 (24), P17_R_95 (13), P18_R_95 (23), P19_R_95 (18), P20_R_28 (25), P20_R_42 (27), P20_R_95 (25), P22_R_28 (23), P23_R_28 (29), P24_R_28 (28), P24_R_42 (29), P25_R_28 (29), P25_R_95 (28), P26_R_28 (29), P26_R_95 (28), P27_R_28 (28), P27_R_42 (29), P27_R_95 (24), P28_R_28 (29), P29_R_95 (17).
The files containing the raw and the filtered, time-normalized EMG data are available in RData format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with as many rows as the number of recorded data points and 13 columns. The first column, named “time”, contains the incremental time in seconds. The remaining 12 columns contain the raw EMG data, named with muscle abbreviations that follow those reported above. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P03_R_30”, where the characters “RAW_EMG” indicate that the trial contains raw EMG data, the characters “P03” indicate the participant number (in this example the 3rd), the character “R” indicates the locomotion type (see above), and the numbers “30” indicate the locomotion speed (see above). The filtered and time-normalized EMG data is named, following the same rules, like “FILT_EMG_P03_R_30”.
The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided into motor primitives and motor modules and are presented as direct output of the factorisation and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed in the methods section to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P12_W_07”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P12” indicate the participant number (in this example the 12th), the character “W” indicates the locomotion type (see above), and the numbers “07” indicate the speed (see above). Motor modules are data frames with 12 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported above, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. 
Trials are named like “SYNS_W_P22_R_20”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P22” indicate the participant number (in this example the 22nd), the character “R” indicates the locomotion type (see above), and the numbers “20” indicate the speed (see above). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.
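The row layout of a motor primitive (30 cycles of 200 points each) can be checked with a short sketch. The Python code below is illustrative only and assumes that, within each cycle, the 100 stance points precede the 100 swing points; that ordering and the helper name `split_into_cycles` are assumptions, and the signal is a placeholder rather than real data.

```python
POINTS_PER_PHASE = 100  # stance and swing are each normalized to 100 points
POINTS_PER_CYCLE = 2 * POINTS_PER_PHASE


def split_into_cycles(primitive):
    """Split one motor primitive (a flat sequence of samples) into gait cycles.

    Returns a list of (stance, swing) pairs, each holding 100 samples.
    Assumes stance comes first within every cycle.
    """
    if len(primitive) % POINTS_PER_CYCLE != 0:
        raise ValueError("length is not a multiple of 200 points")
    cycles = []
    for start in range(0, len(primitive), POINTS_PER_CYCLE):
        cycle = primitive[start:start + POINTS_PER_CYCLE]
        cycles.append((cycle[:POINTS_PER_PHASE], cycle[POINTS_PER_PHASE:]))
    return cycles


# Placeholder signal standing in for one "Syn" column of a 30-cycle trial.
primitive = [float(i % POINTS_PER_CYCLE) for i in range(30 * POINTS_PER_CYCLE)]
cycles = split_into_cycles(primitive)
print(len(primitive), len(cycles))
```

Trials with fewer than 30 recorded cycles would simply yield fewer pairs, as long as the row count stays a multiple of 200.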
The files containing the HFD calculated from motor primitives are available in RData format, in the file named “HFD.RData”. HFD results are presented in a list of lists containing, for each trial, 1) the HFD, and 2) the interval time k used for the calculations. HFDs are presented as one number (mean HFD of the primitives for that trial), as are the interval times k. Trials are named like “HFD_P01_R_95”, where the characters “HFD” indicate that the trial contains HFD data, the characters “P01” indicate the participant number (in this example the 1st), the character “R” indicates the locomotion type (see above), and the numbers “95” indicate the speed (see above).
All the code used for the pre-processing of EMG data, the extraction of muscle synergies and the calculation of HFD is available in R format. The script “muscle_synergies.R” is commented throughout.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Version 4 release notes: Adds data for 2017. Adds rows for agencies that submitted a zero-report (i.e. the agency reported no hate crimes in the year); this applies to all years 1992-2017. Made changes to categorical variables (e.g. bias motivation columns) to make categories consistent over time. Different years had slightly different names (e.g. 'anti-am indian' and 'anti-american indian'), which I made consistent. Made the 'population' column, which is the total population in that agency.
Version 3 release notes: Adds data for 2016. Orders rows by year (descending) and ORI.
Version 2 release notes: Fixes a bug where the Philadelphia Police Department had an incorrect FIPS county code.
The Hate Crime data is an FBI data set that is part of the annual Uniform Crime Reporting (UCR) Program data. This data contains information about hate crimes reported in the United States. The data sets here combine all data from the years 1992-2017 into a single file. Please note that the files are quite large and may take some time to open.
Each row indicates a hate crime incident for an agency in a given year. I have made a unique ID column ("unique_id") by combining the year, agency ORI9 (the 9 character Originating Identifier code), and incident number columns together. Each column is a variable related to that incident or to the reporting agency. Some of the important columns are the incident date, what crime occurred (up to 10 crimes), the number of victims for each of these crimes, the bias motivation for each of these crimes, and the location of each crime. It also includes the total number of victims, total number of offenders, and race of offenders (as a group). Finally, it has a number of columns indicating if the victim for each offense was a certain type of victim or not (e.g. individual victim, business victim, religious victim, etc.). All the data was downloaded from NACJD as ASCII+SPSS Setup files and read into R using the package asciiSetupReader. All work to clean the data and save it in various file formats was also done in R. For the R code used to clean this data, see https://github.com/jacobkap/crime_data. The only changes I made to the data are the following: minor changes to column names to make all column names 32 characters or fewer (so it can be saved in a Stata format), changed the name of some UCR offense codes (e.g. from "agg asslt" to "aggravated assault"), made all character values lower case, reordered columns. I also added state, county, and place FIPS code from the LEAIC (crosswalk) and generated incident month, weekday, and month-day variables from the incident date variable included in the original data.
The zip file contains a codebook and the data in the following formats: .dta (Stata) and .rda (R).
If you have any questions, comments, or suggestions please contact me at jkkaplan6@gmail.com.
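The derived columns described in this entry ("unique_id" and the month, weekday, and month-day variables) could be built along the following lines. This is a hedged Python sketch rather than the R code from the linked repository: the underscore separator, the "MM-DD" month-day format, the function names, and the example ORI value are all assumptions, since the description only states which columns are combined.

```python
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday",
            "saturday", "sunday"]


def make_unique_id(year, ori9, incident_number, sep="_"):
    """Combine year, ORI9 and incident number into one identifier.

    The separator is an assumption; values are lower-cased to match the
    statement that all character values were made lower case.
    """
    return sep.join([str(year), str(ori9).lower(), str(incident_number).lower()])


def date_variables(incident_date):
    """Derive month, weekday and month-day labels from an incident date."""
    return {
        "month": MONTHS[incident_date.month - 1],
        "weekday": WEEKDAYS[incident_date.weekday()],  # Monday is 0
        "month_day": f"{incident_date.month:02d}-{incident_date.day:02d}",
    }


# "XX1234500" is a made-up placeholder, not a real agency identifier.
print(make_unique_id(2017, "XX1234500", "A1"))
print(date_variables(date(2017, 7, 4)))
```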
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Background
Is the control of movement less stable when we walk or run in challenging settings? Intuitively, one might answer that it is, given that challenging locomotion externally (e.g. rough terrain) or internally (e.g. age-related impairments) makes our movements more unstable. Here, we investigated how young and old humans synergistically activate muscles during locomotion when different perturbation levels are introduced. Of these control signals, called muscle synergies, we analyzed the stability over time and the complexity (or irregularity). Surprisingly, we found that perturbations force the central nervous system to produce muscle activation patterns that are less unstable and less complex. These outcomes show that robust locomotion in challenging settings is achieved by producing less complex control signals which are more stable over time, whereas easier tasks allow for more unstable and irregular control.
How to use the data set
This supplementary data set contains: a) the metadata with anonymized participant information, b) the raw electromyographic (EMG) data acquired during locomotion, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via non-negative matrix factorization and f) the code written in R (R Found. for Stat. Comp.) to process the data, including the scripts to calculate the short-term Maximum Lyapunov Exponents (sMLE) and Higuchi's fractal dimension (HFD) of motor primitives. In total, 476 trials from 86 participants are included in the supplementary data set.
The file “participant_data.dat” is available in ASCII and RData (R Found. for Stat. Comp.) format and contains:
Code: the participant’s code
Experiment: the experimental setup in which the participant was involved (E1 = walking and running, overground and treadmill; E2 = walking and running, even- and uneven-surface; E3 = unperturbed and perturbed walking, young and old)
Group: the group to which the participant was assigned (see methods for the details)
Sex: the participant’s sex (M or F)
Speed: the speed at which the recordings were conducted in m/s
Age: the participant’s age in years (participants were considered old if older than 65 years, but younger than 80)
Height: the participant’s height in [cm]
Mass: the participant’s body mass in [kg].
The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus. Please note that the running overground trials of participants P0001 and P0009 and the second uneven-surface running trial of participant P0048 consist of 22, 27 and 23 cycles, respectively. All the other trials consist of 30 gait cycles. Trials are named like “P0053_OW_02”, where the characters “P0053” indicate the participant number (in this example the 53rd), the characters “OW” indicate the locomotion type (E1: OW=overground walking, OR=overground running, TW=treadmill walking, TR=treadmill running; E2: EW=even-surface walking, ER=even-surface running, UW=uneven-surface walking, UR=uneven-surface running; E3: NW=normal walking, PW=perturbed walking), and the numbers “02” indicate the trial number (in this case the 2nd). The 10 trials per participant recorded for each overground session (i.e. 10 for walking and 10 for running) were concatenated into one. The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P0053_OG_02”.
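The two-letter locomotion codes listed above map each trial to an experiment and a condition. The Python sketch below encodes that table and splits a name such as “P0053_OW_02”; the function name `parse_emg_trial_name` and the returned field names are illustrative assumptions.

```python
# Locomotion codes as listed in the description, grouped by experiment.
LOCOMOTION_CODES = {
    "OW": ("E1", "overground walking"),
    "OR": ("E1", "overground running"),
    "TW": ("E1", "treadmill walking"),
    "TR": ("E1", "treadmill running"),
    "EW": ("E2", "even-surface walking"),
    "ER": ("E2", "even-surface running"),
    "UW": ("E2", "uneven-surface walking"),
    "UR": ("E2", "uneven-surface running"),
    "NW": ("E3", "normal walking"),
    "PW": ("E3", "perturbed walking"),
}


def parse_emg_trial_name(name):
    """Split a name like "P0053_OW_02" into participant, condition and trial."""
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(f"unexpected trial name: {name!r}")
    participant, code, trial = parts
    if not (participant.startswith("P") and participant[1:].isdigit()):
        raise ValueError(f"unexpected participant code: {participant!r}")
    if code not in LOCOMOTION_CODES:
        raise ValueError(f"unknown locomotion code: {code!r}")
    experiment, locomotion = LOCOMOTION_CODES[code]
    return {
        "participant": int(participant[1:]),
        "experiment": experiment,
        "locomotion": locomotion,
        "trial": int(trial),
    }


print(parse_emg_trial_name("P0053_OW_02"))
```

Rejecting unknown codes explicitly makes it easy to spot names that do not follow the documented table.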
Old versions not compatible with the R package musclesyneRgies
The files containing the gait cycle breakdown are available in RData (R Found. for Stat. Comp.) format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with 30 rows (one for each gait cycle) and two columns. The first column contains the touchdown incremental times in seconds. The second column contains the duration of each stance phase in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P0020,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times and the characters “P0020” indicate the participant number (in this example the 20th). Please note that the overground trials of participants P0001 and P0009 and the second uneven-surface running trial of participant P0048 only contain 22, 27 and 23 cycles, respectively.
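Because these older files store touchdown times and stance durations, while the "RAW_DATA.RData" format stores touchdown and lift-off times, a conversion may be useful. The Python sketch below assumes that lift-off equals touchdown plus stance duration and that swing lasts from lift-off to the next touchdown; the function names and the example timings are made up for illustration.

```python
def to_touchdown_liftoff(cycle_times):
    """Convert (touchdown, stance duration) rows to (touchdown, lift-off) rows."""
    return [(touchdown, touchdown + stance) for touchdown, stance in cycle_times]


def swing_durations(cycle_times):
    """Swing duration of each cycle, from lift-off to the next touchdown.

    The last cycle has no following touchdown, so it yields no value.
    """
    events = to_touchdown_liftoff(cycle_times)
    return [
        next_touchdown - liftoff
        for (_, liftoff), (next_touchdown, _) in zip(events, events[1:])
    ]


# Made-up timings in seconds: (touchdown, stance duration).
example = [(0.5, 0.25), (1.25, 0.25), (2.0, 0.5)]
print(to_touchdown_liftoff(example))
print(swing_durations(example))
```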
The files containing the raw and the filtered, time-normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with 30000 rows (one for each recorded data point) and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with muscle abbreviations that follow those reported in the methods section. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P0053_OW_02”, where the characters “RAW_EMG” indicate that the trial contains raw EMG data, the characters “P0053” indicate the participant number (in this example the 53rd), the characters “OW” indicate the locomotion type (E1: OW=overground walking, OR=overground running, TW=treadmill walking, TR=treadmill running; E2: EW=even-surface walking, ER=even-surface running, UW=uneven-surface walking, UR=uneven-surface running; E3: NW=normal walking, PW=perturbed walking), and the numbers “02” indicate the trial number (in this case the 2nd). The 10 trials per participant recorded for each overground session (i.e. 10 for walking and 10 for running) were concatenated into one. The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P0053_OG_02”.
The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided into motor primitives and motor modules and are presented as direct output of the factorization and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “Time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed above to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P0012_PW_02”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P0012” indicate the participant number (in this example the 12th), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Motor modules are data frames with 13 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported in the methods section, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. 
Trials are named like “SYNS_W_P0082_PW_02”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P0082” indicate the participant number (in this example the 82nd), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.
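The 200-point layout of each gait cycle (100 points for stance, 100 for swing) results from time normalization. As an illustration only, the Python sketch below resamples each phase with linear interpolation; the description does not state which interpolation method the authors used, so the method, the function names, and the toy input are assumptions.

```python
def resample(samples, n_points):
    """Linearly interpolate a sequence onto n_points evenly spaced positions."""
    if len(samples) < 2 or n_points < 2:
        raise ValueError("need at least two samples and two output points")
    last = len(samples) - 1
    out = []
    for j in range(n_points):
        pos = j * last / (n_points - 1)  # fractional index into samples
        lo = min(int(pos), last - 1)     # left neighbour, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out


def normalize_cycle(stance, swing, points_per_phase=100):
    """Normalize one gait cycle to 100 stance points followed by 100 swing points."""
    return resample(stance, points_per_phase) + resample(swing, points_per_phase)


# Toy phases of different lengths, standing in for filtered EMG samples.
cycle = normalize_cycle([0.0, 1.0], [1.0, 0.0, 1.0])
print(len(cycle))
```

Normalizing the two phases separately keeps the stance-to-swing transition at the same row in every cycle, regardless of the actual stance and swing durations.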
The files containing the sMLE calculated from motor primitives are available in RData (R Found. for Stat. Comp.) format, in the file named “sMLE.RData”. sMLE results are presented in a list of lists containing, for each trial, 1) the divergences, 2) the sMLE, and 3) the value of the R2 between the divergence curve and its linear interpolation made using the specified number of points. The divergences are presented as a one-dimensional vector. The sMLE is a single number, as is the R2 value. Trials are named like “MLE_P0081_EW_01”, where the characters “sMLE” indicate that the trial contains sMLE data, the characters “P0081” indicate the participant number (in this example the
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The use of motorized treadmills as convenient tools for the study of locomotion has been in vogue for many decades. However, despite the widespread presence of these devices in many scientific and clinical environments, a full consensus on their validity to faithfully substitute free overground locomotion is still missing. Specifically, little information is available on whether and how the neural control of movement is affected when humans walk and run on a treadmill as compared to overground. Here, we made use of linear and nonlinear analysis tools to extract information from electromyographic recordings during walking and running overground and on an instrumented treadmill. We extracted synergistic activation patterns from the muscles of the lower limb via non-negative matrix factorization. We then investigated how the motor modules (or time-invariant muscle weightings) were used in the two locomotion environments. Subsequently, we examined the timing of motor primitives (or time-dependent coefficients of muscle synergies) by calculating their duration, the time of main activation, and their Hurst exponent, a nonlinear metric derived from fractal analysis. We found that motor modules were not influenced by the locomotion environment, while motor primitives were overall more regular in treadmill than in overground locomotion, with the main activity of the primitive for propulsion shifted earlier in time. Our results suggest that the spatial and sensory constraints imposed by the treadmill environment forced the central nervous system to adopt a different neural control strategy than that used for free overground locomotion. This is a data-driven indication that treadmills induce perturbations to the neural control of locomotion.
In this supplementary data set we made available: a) the metadata with anonymized participant information; b) the raw EMG, already concatenated for the overground trials; c) the touchdown and lift-off timings of the recorded limb; d) the filtered and time-normalized EMG; e) the muscle synergies extracted via NMF; f) the code to process the data. In total, 120 trials from 30 participants are included in the supplementary data set.
The file “metadata.dat” is available in ASCII and RData format and contains:
Code: the participant’s code
Sex: the participant’s sex (M or F)
Locomotion: the type of locomotion (W=walking, R=running)
Environment: to distinguish between overground (O) and treadmill (T)
Speed: the speed at which the recordings were conducted in m/s
Age: the participant’s age in years
Height: the participant’s height in [cm]
Mass: the participant’s body mass in [kg].
The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus. Please note that the running overground trials of participants P0001, P0007, P0008 and P0009 consist of 21, 29, 29 and 26 cycles, respectively. All the other trials consist of 30 gait cycles. Trials are named like “P0003_OR_01”, where the characters “P0003” indicate the participant number (in this example the 3rd), the characters “OR” indicate the locomotion type and environment (see above), and the numbers “01” indicate the trial number. The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P0003_OR_01”.
Old versions not compatible with the R package musclesyneRgies
The files containing the gait cycle breakdown are available in RData format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with 30 rows (one for each gait cycle) and two columns. The first column contains the touchdown incremental times in seconds. The second column contains the duration of each stance phase in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P0020_TW_01,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times, the characters “P0020” indicate the participant number (in this example the 20th), the characters “TW” indicate the locomotion type and environment (O=overground, T=treadmill, W=walking, R=running), and the numbers “01” indicate the trial number. Please note that the running overground trials of participants P0001, P0007, P0008 and P0009 only contain 21, 29, 29 and 26 cycles, respectively.
The files containing the raw and the filtered, time-normalized EMG data are available in RData format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with 30000 rows (one for each recorded data point) and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with muscle abbreviations that follow those reported above. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P0003_OR_01”, where the characters “RAW_EMG” indicate that the trial contains raw EMG data, the characters “P0003” indicate the participant number (in this example the 3rd), the characters “OR” indicate the locomotion type and environment (see above), and the numbers “01” indicate the trial number. The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P0003_OR_01”.
The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData format, in the file named “SYNS.RData”. Each element of this R list represents one trial and contains the factorization rank (list element named “synsR2”), the motor modules (list element named “M”), the motor primitives (list element named “P”), the reconstructed EMG (list element named “Vr”), the number of iterations needed by the NMF algorithm to converge (list element named “iterations”), and the reconstruction quality measured as the coefficient of determination (list element named “R2”). The motor modules and motor primitives are presented as direct output of the factorization and not in any functional order. Motor modules are data frames with 13 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported above, contain the time-independent coefficients (motor modules M), one for each synergy and for each muscle. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives P), one column for each synergy plus the time points (columns are named e.g. “time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed in the methods section to improve user readability. 
Trials are named like “SYNS_P0012_OW_01”, where the characters “SYNS” indicate that the trial contains muscle synergy data, the characters “P0012” indicate the participant number (in this example the 12th), the characters “OW” indicate the locomotion type and environment (see above), and the numbers “01” indicate the trial number. Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.
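The relationship between the list elements “M”, “P”, “Vr” and “R2” can be illustrated with a small sketch. The Python code below rebuilds a reconstructed EMG matrix from motor modules and motor primitives and computes a coefficient of determination; it assumes that primitives are stored with one row per time point (time column removed) and that R2 is defined as 1 - RSS/SST with the total sum of squares taken around the overall mean. Both points, along with the function names and the toy numbers, are assumptions rather than details confirmed by the description.

```python
def reconstruct(modules, primitives):
    """Rebuild EMG as Vr[muscle][t] = sum over synergies of M[muscle][s] * P[t][s].

    modules: one row per muscle, one column per synergy.
    primitives: one row per time point, one column per synergy.
    """
    return [
        [sum(w * p for w, p in zip(weights, sample)) for sample in primitives]
        for weights in modules
    ]


def r_squared(original, reconstructed):
    """Coefficient of determination, assumed here to be 1 - RSS/SST."""
    values = [v for row in original for v in row]
    approx = [v for row in reconstructed for v in row]
    mean = sum(values) / len(values)
    rss = sum((v - a) ** 2 for v, a in zip(values, approx))
    sst = sum((v - mean) ** 2 for v in values)
    return 1 - rss / sst


# Toy example: 2 muscles, 2 synergies, 3 time points.
modules = [[1.0, 0.0], [0.0, 2.0]]
primitives = [[1.0, 0.5], [2.0, 1.0], [3.0, 0.0]]
emg = reconstruct(modules, primitives)
print(emg, r_squared(emg, emg))
```

In the real files the modules have 13 rows and the primitives 6000 rows, but the arithmetic is the same.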
All the code used for the pre-processing of EMG data and the extraction of muscle synergies is available in R format. The script “muscle_synergies.R” is commented throughout.