35 datasets found
  1. R codes and dataset for Visualisation of Diachronic Constructional Change...

    • bridges.monash.edu
    • researchdata.edu.au
    zip
    Updated May 30, 2023
    Cite
    Gede Primahadi Wijaya Rajeg (2023). R codes and dataset for Visualisation of Diachronic Constructional Change using Motion Chart [Dataset]. http://doi.org/10.26180/5c844c7a81768
    Explore at:
    zip
    Dataset updated
    May 30, 2023
    Dataset provided by
    Monash University
    Authors
    Gede Primahadi Wijaya Rajeg
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    Publication: Primahadi Wijaya R., Gede. 2014. Visualisation of diachronic constructional change using Motion Chart. In Zane Goebel, J. Herudjati Purwoko, Suharno, M. Suryadi & Yusuf Al Aried (eds.). Proceedings: International Seminar on Language Maintenance and Shift IV (LAMAS IV), 267-270. Semarang: Universitas Diponegoro. doi: https://doi.org/10.4225/03/58f5c23dd8387

    Description of R codes and data files in the repository: This repository is imported from its GitHub repo. Versioning of this figshare repository is associated with the GitHub repo's Releases, so check the Releases page for updates (the next version is to include the unified version of the codes in the first release with the tidyverse).

    The raw input data consist of two files (will_INF.txt and go_INF.txt). They represent the co-occurrence frequency of the top-200 infinitival collocates for will and be going to, respectively, across the twenty decades of the Corpus of Historical American English (from the 1810s to the 2000s).

    These two input files are used in the R code file 1-script-create-input-data-raw.r. The code preprocesses and combines the two files into a long-format data frame with the following columns: (i) decade, (ii) coll (for "collocate"), (iii) BE going to (frequency of the collocates with be going to) and (iv) will (frequency of the collocates with will); it is available in input_data_raw.txt. Then, the script 2-script-create-motion-chart-input-data.R processes input_data_raw.txt to normalise the co-occurrence frequency of the collocates per million words (the COHA size and normalising base frequency are available in coha_size.txt). The output of the second script is input_data_futurate.txt.

    Next, input_data_futurate.txt contains the relevant input data for generating (i) the static motion chart as an image plot in the publication (using the script 3-script-create-motion-chart-plot.R), and (ii) the dynamic motion chart (using the script 4-script-motion-chart-dynamic.R).

    The repository adopts the project-oriented workflow in RStudio; double-click the Future Constructions.Rproj file to open an RStudio session whose working directory is associated with the contents of this repository.
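    For orientation, the preprocessing performed by 1-script-create-input-data-raw.r might look roughly like the hedged sketch below. The column layout of the two raw files is assumed (decade, coll, freq), so this is an illustration rather than a substitute for the repository's own script.

```r
# Hedged sketch only: the raw files are assumed to hold per-decade collocate
# frequencies in columns named decade, coll and freq; adjust to the real layout.
library(tidyverse)

will  <- read_tsv("will_INF.txt")
going <- read_tsv("go_INF.txt")

# Combine the two frequency tables into the data frame described above:
# decade, coll, `BE going to`, will
input_data_raw <- full_join(
  rename(going, `BE going to` = freq),
  rename(will,  will = freq),
  by = c("decade", "coll")
)

write_tsv(input_data_raw, "input_data_raw.txt")
```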

  2. Children's questionnaire (data frame)

    • dataverse.csuc.cat
    pdf, txt +2
    Updated Jul 12, 2023
    Cite
    Carme Montserrat; Carme Montserrat; Marta Garcia-Molsosa; Marta Garcia-Molsosa (2023). Children's questionnaire (data frame) [Dataset]. http://doi.org/10.34810/data247
    Explore at:
    pdf(485871), pdf(330192), pdf(331430), xlsx(2484824), pdf(485221), txt(7161), pdf(355715), xlsx(2504364), type/x-r-syntax(1161), pdf(355899), type/x-r-syntax(3928)
    Dataset updated
    Jul 12, 2023
    Dataset provided by
    CORA.Repositori de Dades de Recerca
    Authors
    Carme Montserrat; Carme Montserrat; Marta Garcia-Molsosa; Marta Garcia-Molsosa
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 20, 2021 - Oct 31, 2022
    Dataset funded by
    https://ror.org/03zhx9h04
    Description

    "WeAreHere!" Children's questionnaire. This dataset includes: (1) the WaH children's questionnaire (20 questions including 5-point Likert scale questions, dichotomous questions and an open space for comments). The Catalan version (original), and the Spanish and English versions of the questionnaire can be found in this dataset in pdf format. (2) The data frame in xlsx format, with the children's answers to the questionnaire (a total of 3664 answers) and a reduced version of it for doing the regression (with the 5-point likert scale variable "ask for help" transformed into a dichotomous variable). (3) The data frame in xlsx format, with the children's answers to the questionnaire and the categorization of their comments (sheet 1), the data frame with only the MCA variables selected (sheet 2), and the categories and subcategories table (sheet 3). (4) The data analysis procedure for the regression, the component and multiple component analysis (R script).

  3. Time Series Forecasting Using Prophet in R

    • kaggle.com
    zip
    Updated Jul 25, 2023
    Cite
    vikram amin (2023). Time Series Forecasting Using Prophet in R [Dataset]. https://www.kaggle.com/datasets/vikramamin/time-series-forecasting-using-prophet-in-r
    Explore at:
    zip(9000 bytes)
    Dataset updated
    Jul 25, 2023
    Authors
    vikram amin
    License

    CC0 1.0 Universal (Public Domain Dedication): https://creativecommons.org/publicdomain/zero/1.0/

    Description
    • Main objective: To forecast the page visits of a website
    • Tool: Time series forecasting using Prophet in R.
    • Steps:
    • Read the data
    • Data Cleaning: Checking data types, date formats and missing data
    • Run libraries (dplyr, ggplot2, tidyverse, lubridate, prophet, forecast)
    • Change the Date column from character to Date and adjust the date format using the lubridate package
    • Rename the column "Date" to "ds" and "Visits" to "y".
    • Treat "Christmas" and "Black.Friday" as holiday events. As the data ranges from 2016 to 2020, there are 5 Christmas and 5 Black Friday days.
    • We look at the impact on "Visits" from 3 days before to 3 days after Christmas, and from 3 days before to 1 day after Black Friday.
    • We create two data frames called Christmas and Black.Friday and merge the two into a data frame called "holidays".
    • We create train and test data, selecting only three variables: ds, y and Easter. The train data contains rows with ds before 2020-12-01; the test data contains rows on or after 2020-12-01 (31 days).
    • Train Data
    • Test Data
    • Use the prophet model, which accepts multiple parameters; we go with the defaults. Thereafter, we add the external regressor "Easter" (a hedged sketch of the full workflow follows this list).
    • We create the future data frame for forecasting and name it "future". It is built from the model "m" and extends 31 days to cover the test data. We then predict on this future data frame and store the result in a new data frame called "forecast".
    • The forecast data frame consists of 1827 rows and 34 variables. The external regressor (Easter) value is 0 through the entire time period, which shows that "Easter" has no impact on "Visits".
    • yhat stands for the predicted value (predicted visits).
    • We try to understand the impact of Holiday events "Christmas" and "Black.Friday"
    • We plot the forecast.
    • plot(m, forecast)
    • Blue is the predicted value (yhat), black is the actual value (y), and the blue shaded region spans the yhat_lower and yhat_upper values.
    • prophet_plot_components(m, forecast)
    • The trend indicates that page visits remained constant from January 2016 to mid-2017, and thereafter there was an upswing from mid-2019 to the end of 2020
    • From the holidays component, we can see that Christmas had a negative effect on page visits whereas Black Friday had a positive effect
    • Weekly seasonality indicates that page visits tend to be highest from Monday to Thursday and start going down thereafter
    • Yearly seasonality indicates that page visits are highest in April and then decline, with October reaching the bottom point
    • The external regressor "Easter" has no impact on page visits
    • plot(m,forecast) + add_changepoints_to_plot(m)
    • The trend, indicated by the red line, starts moving upwards from mid-2019 through 2020
    • We check for acc...
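    The workflow above can be sketched in R roughly as follows. This is a hedged reconstruction, not the author's exact script: the prepared data frame visits (columns ds, y, Easter), the holiday windows and the train/test split follow the description above, while the exact dates and other details are assumptions.

```r
# Hedged sketch of the Prophet workflow described above; 'visits' is the prepared
# data frame with columns ds (Date), y (page visits) and Easter (0/1 regressor).
library(prophet)
library(dplyr)

christmas <- data.frame(
  holiday = "Christmas",
  ds = as.Date(paste0(2016:2020, "-12-25")),
  lower_window = -3, upper_window = 3
)
black_friday <- data.frame(
  holiday = "Black.Friday",
  ds = as.Date(c("2016-11-25", "2017-11-24", "2018-11-23", "2019-11-29", "2020-11-27")),
  lower_window = -3, upper_window = 1
)
holidays <- bind_rows(christmas, black_friday)

train <- filter(visits, ds <  as.Date("2020-12-01"))
test  <- filter(visits, ds >= as.Date("2020-12-01"))    # 31 days

m <- prophet(holidays = holidays)                       # default parameters, not yet fitted
m <- add_regressor(m, "Easter")
m <- fit.prophet(m, train)

future <- make_future_dataframe(m, periods = 31)        # history + 31 forecast days
future$Easter <- c(train$Easter, test$Easter)           # regressor must cover the whole horizon
forecast <- predict(m, future)

plot(m, forecast)
prophet_plot_components(m, forecast)
plot(m, forecast) + add_changepoints_to_plot(m)
```
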
  4. Data from: The island-mainland species turnover relationship

    • datadryad.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Oct 10, 2012
    Cite
    Yoel E. Stuart; Jonathan B. Losos; Adam C. Algar (2012). The island-mainland species turnover relationship [Dataset]. http://doi.org/10.5061/dryad.gm2p8
    Explore at:
    zip
    Dataset updated
    Oct 10, 2012
    Dataset provided by
    Dryad
    Authors
    Yoel E. Stuart; Jonathan B. Losos; Adam C. Algar
    Time period covered
    Jul 13, 2012
    Area covered
    Caribbean, Caribbean islands, Neotropics
    Description

    Many oceanic islands are notable for their high endemism, suggesting that islands may promote unique assembly processes. However, mainland assemblages sometimes harbour comparable levels of endemism, suggesting that island biotas may not be as unique as often assumed. Here, we test the uniqueness of island biotic assembly by comparing the rate of species turnover among islands and the mainland, after accounting for distance decay and environmental gradients. We modeled species turnover as a function of geographic and environmental distance for mainland (M-M) communities of Anolis lizards and Terrarana frogs, two clades that have diversified extensively on Caribbean islands and the mainland Neotropics. We compared mainland-island (M–I) and island-island (I–I) species turnover to predictions of the M–M model. If island assembly is not unique, then the M–M model should successfully predict M–I and I–I turnover, given geographic and environmental distance. We found that M–I turnover and, to...

  5. Data for the Farewell and Herzberg example of a two-phase experiment using a...

    • adelaide.figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1more
    application/gzip
    Updated Jun 12, 2021
    Cite
    Chris Brien (2021). Data for the Farewell and Herberg example of a two-phase experiment using a plaid design [Dataset]. http://doi.org/10.25909/13122095
    Explore at:
    application/gzip
    Dataset updated
    Jun 12, 2021
    Dataset provided by
    The University of Adelaide
    Authors
    Chris Brien
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The experiment that Farewell and Herzberg (2003) describe is a pain-rating experiment that is a subset of the experiment reported by Solomon et al. (1997). It is a two-phase experiment. The first phase is a self-assessment phase in which patients self-assess for pain while moving a painful shoulder joint. The second phase is an evaluation phase in which occupational and physical therapy students (the raters) are evaluated for rating patients in a set of videos for pain. The measured response is the difference between a student's rating and the patient's rating.

    The R data file plaid.dat.rda contains the data.frame plaid.dat, which has a revised version of the data for the Farewell and Herzberg example downloaded from https://doi.org/10.17863/CAM.54494. The comma-delimited text file plaid.dat.csv has the same information in this more commonly accepted format, but without the metadata associated with the data.frame.

    The data.frame contains the factors Raters, Viewings, Trainings, Expressiveness, Patients, Occasions, and Motions and a column for the response variable Y. The two factors Viewings and Occasions are additional to those in the downloaded file; the remaining factors have been converted from integers or characters to factors and renamed to the names given above. The column Y is unchanged from the column in the original file.

    To load the data in R, use load("plaid.dat.rda"); this makes the data.frame plaid.dat available in the workspace.
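    A minimal sketch for loading and inspecting the data in R, using the file names given above:

```r
# Load the R-format data; this creates the data.frame 'plaid.dat' in the workspace
load("plaid.dat.rda")
str(plaid.dat)   # factors Raters, Viewings, Trainings, Expressiveness, Patients, Occasions, Motions; response Y

# The same data without the data.frame metadata, from the CSV version
plaid.csv <- read.csv("plaid.dat.csv")
```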

  6. Data from: Constraints on trait combinations explain climatic drivers of...

    • datadryad.org
    • search.dataone.org
    zip
    Updated Apr 27, 2018
    Cite
    John M. Dwyer; Daniel C. Laughlin (2018). Constraints on trait combinations explain climatic drivers of biodiversity: the importance of trait covariance in community assembly [Dataset]. http://doi.org/10.5061/dryad.76kt8
    Explore at:
    zip
    Dataset updated
    Apr 27, 2018
    Dataset provided by
    Dryad
    Authors
    John M. Dwyer; Daniel C. Laughlin
    Time period covered
    Apr 27, 2017
    Description

    quadrat.scale.data: Refer to the R script ("Dwyer_&_Laughlin_2017_Trait_covariance_script.r") for information about this dataframe.
    species.in.quadrat.scale.data: Refer to the R script ("Dwyer_&_Laughlin_2017_Trait_covariance_script.r") for information about this dataframe.
    Dwyer_&_Laughlin_2017_Trait_covariance_script: This script reads in the two dataframes of "raw" data, calculates diversity and trait metrics and runs the major analyses presented in Dwyer & Laughlin 2017.

  7. Rcode – Custom code written in the R programming language that will translate...

    • plos.figshare.com
    txt
    Updated Nov 19, 2025
    Cite
    Anthony Nearman; Alriana Buller-Jarrett; Dawn Boncristiani; Eugene Ryabov; Yanping Chen; Jay D. Evans (2025). Rcode – Custom code written the R programming language that will translate an open reading frame for an existing sequence, then compare it to a data frame of nucleotide polymorphisms at specific locations, and retranslate the amino acid changes into a new data frame. [Dataset]. http://doi.org/10.1371/journal.pone.0337191.s009
    Explore at:
    txt
    Dataset updated
    Nov 19, 2025
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Anthony Nearman; Alriana Buller-Jarrett; Dawn Boncristiani; Eugene Ryabov; Yanping Chen; Jay D. Evans
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Rcode – Custom code written in the R programming language that will translate an open reading frame for an existing sequence, then compare it to a data frame of nucleotide polymorphisms at specific locations, and retranslate the amino acid changes into a new data frame.
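    For illustration, the operation described (translate a reference open reading frame, apply nucleotide polymorphisms, re-translate, and tabulate the amino-acid changes) can be sketched in R with Biostrings as below. The sequence, the SNP table and its column names are made-up placeholders, not the deposited Rcode file.

```r
# Illustrative sketch only (not the deposited script): translate a reference ORF,
# apply nucleotide substitutions, re-translate, and report amino-acid changes.
library(Biostrings)  # Bioconductor package

ref_orf <- DNAString("ATGGCTGAAACGTTTAAATAG")            # hypothetical reference ORF
snps    <- data.frame(pos = c(5, 10), alt = c("A", "C")) # hypothetical polymorphisms

ref_aa <- translate(ref_orf)                             # reference protein sequence

# Apply each substitution to a character copy of the sequence
mut_seq <- as.character(ref_orf)
for (i in seq_len(nrow(snps))) {
  substr(mut_seq, snps$pos[i], snps$pos[i]) <- snps$alt[i]
}
mut_aa <- translate(DNAString(mut_seq))                  # mutated protein sequence

# Data frame of amino-acid changes at the affected codons
codon <- ceiling(snps$pos / 3)
data.frame(
  codon = codon,
  ref   = strsplit(as.character(ref_aa), "")[[1]][codon],
  alt   = strsplit(as.character(mut_aa), "")[[1]][codon]
)
```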

  8. Data from: Humans exploit robust locomotion by improving the stability of...

    • zenodo.org
    • explore.openaire.eu
    bin
    Updated Jun 17, 2022
    Cite
    Alessandro Santuz; Alessandro Santuz; Leon Brüll; Antonis Ekizos; Antonis Ekizos; Arno Schroll; Nils Eckardt; Nils Eckardt; Armin Kibele; Armin Kibele; Michael Schwenk; Michael Schwenk; Adamantios Arampatzis; Adamantios Arampatzis; Leon Brüll; Arno Schroll (2022). Humans exploit robust locomotion by improving the stability of control signals [Dataset]. http://doi.org/10.5281/zenodo.2687682
    Explore at:
    bin
    Dataset updated
    Jun 17, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Alessandro Santuz; Alessandro Santuz; Leon Brüll; Antonis Ekizos; Antonis Ekizos; Arno Schroll; Nils Eckardt; Nils Eckardt; Armin Kibele; Armin Kibele; Michael Schwenk; Michael Schwenk; Adamantios Arampatzis; Adamantios Arampatzis; Leon Brüll; Arno Schroll
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    Is the control of movement less stable when we walk or run in challenging settings? Intuitively, one might answer that it is, given that adding constraints to locomotion (e.g. rough terrain, age-related impairments, etc.) makes movements less stable. Here, we investigated how young and old humans synergistically activate muscles during locomotion when different perturbation levels are introduced. Of these control signals, called muscle synergies, we analyzed the stability over time. Surprisingly, we found that perturbations and older age force the central nervous system to produce muscle activation patterns that are more stable. These outcomes show that robust locomotion in challenging settings is achieved by increasing the stability of control signals, whereas easier tasks allow for more unstable control.

    How to use the data set

    This supplementary data set contains: a) the metadata with anonymized participant information, b) the raw electromyographic (EMG) data acquired during locomotion, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via non-negative matrix factorization and f) the code written in R (R Found. for Stat. Comp.) to process the data, including the scripts to calculate the Maximum Lyapunov Exponents of motor primitives. In total, 476 trials from 86 participants are included in the supplementary data set.

    The file “participant_data.dat” is available in ASCII and RData (R Found. for Stat. Comp.) format and contains:

    • Code: the participant’s code
    • Experiment: the experimental setup in which the participant was involved (E1 = walking and running, overground and treadmill; E2 = walking and running, even- and uneven-surface; E3 = unperturbed and perturbed walking, young and old)
    • Group: the group to which the participant was assigned (see above for the details)
    • Sex: the participant’s sex (M or F)
    • Speed: the speed at which the recordings were conducted in [m/s] (two values separated by a comma mean that recordings were done at two different speeds, i.e. walking and running)
    • Age: the participant’s age in years (participants were considered old if older than 65 years, but younger than 80)
    • Height: the participant’s height in [cm]
    • Mass: the participant’s body mass in [kg].

    The files containing the gait cycle breakdown are available in RData (R Found. for Stat. Comp.) format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with 30 rows (one for each gait cycle) and two columns. The first column contains the touchdown incremental times in seconds. The second column contains the duration of each stance phase in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P0020,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times and the characters “P0020” indicate the participant number (in this example the 20th). Please note that the overground trials of participants P0001 and P0009 and the second uneven-surface running trial of participant P0048 only contain 22, 27 and 23 cycles, respectively.
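    A minimal sketch for reading these data in R, assuming the list object inside the .RData file carries the same name as the file (check with ls() after loading):

```r
# Minimal sketch; the object name CYCLE_TIMES and the trial name are assumptions
# based on the naming scheme described above.
load("CYCLE_TIMES.RData")
ls()                                           # show the object(s) created by load()

trial <- CYCLE_TIMES[["CYCLE_TIMES_P0020"]]    # 30 x 2 data frame for participant P0020
head(trial)                                    # column 1: touchdown times [s]; column 2: stance durations [s]
```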

    The files containing the raw, filtered and the normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with 30000 rows (one for each recorded data point) and 14 columns. The first column contains the incremental time in seconds. The remaining thirteen columns contain the raw EMG data, named with muscle abbreviations that follow those reported in the Materials and Methods section of this Supplementary Materials file. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P0053_OG_02”, where the characters “RAW_EMG” indicate that the trial contains raw emg data, the characters “P0053” indicate the participant number (in this example the 53rd), the characters “OW” indicate the locomotion type (E1: OW=overground walking, OR=overground running, TW=treadmill walking, TR=treadmill running; E2: EW=even-surface walking, ER=even-surface running, UW=uneven-surface walking, UR=uneven-surface running; E3: NW=normal walking, PW=perturbed walking), and the numbers “02” indicate the trial number (in this case the 2nd). The 10 trials per participant recorded for each overground session (i.e. 10 for walking and 10 for running) were concatenated into one. The filtered and time-normalized emg data is named, following the same rules, like “FILT_EMG_P0053_OG_02”.

    The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided in motor primitives and motor modules and are presented as direct output of the factorization and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “Time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed above to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P0012_PW_02”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P0012” indicate the participant number (in this example the 12th), ), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Motor modules are data frames with 13 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported in the Materials and Methods section of this Supplementary Materials file, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. Trials are named like “SYNS_W_P0082_PW_02”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P0082” indicate the participant number (in this example the 82nd) ), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.

    The files containing the MLE calculated from motor primitives are available in RData (R Found. for Stat. Comp.) format, in the file named “MLE.RData”. MLE results are presented in a list of lists containing, for each trial, 1) the divergences, 2) the MLE, and 3) the value of the R2 between the divergence curve and its linear interpolation made using the specified number of points. The divergences are presented as a one-dimensional vector. The MLE is a single number, as is the R2 value. Trials are named like “MLE_P0081_EW_01”, where the characters “MLE” indicate that the trial contains MLE data, the characters “P0081” indicate the participant number (in this example the 81st), the characters “EW” indicate the locomotion type (see above), and the numbers “01” indicate the trial number (in this case the 1st).

    All the code used for the preprocessing of EMG data, the extraction of muscle synergies and the calculation of MLE is available in R (R Found. for Stat. Comp.) format. Explanatory comments are profusely present throughout the scripts (“SYNS.R”, which is the script to extract synergies, “fun_NMF.R”, which contains the NMF function, “MLE.R”, which is the script to calculate the MLE of motor primitives and “fun_MLE.R”, which contains the MLE function).

  9. Data from: Phonotactically driven cue weighting in a sound change in...

    • data.europa.eu
    • data.ub.uni-muenchen.de
    csv
    Updated May 8, 2023
    Cite
    (2023). Phonotactically driven cue weighting in a sound change in progress: Acoustic evidence from West Central Bavarian [Dataset]. https://data.europa.eu/data/datasets/https-open-bydata-de-api-hub-repo-datasets-https-data-ub-uni-muenchen-de-380-dataset?locale=ga
    Explore at:
    csv
    Dataset updated
    May 8, 2023
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R script and data frames to accompany the conference paper of the same name. The R file contains the script on which the analyses in Thon, K. & Kleber, F. (2023), Phonotactically driven cue weighting in a sound change in progress: Acoustic evidence from West Central Bavarian, Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS), Prague, Czech Republic, are based. The two .csv files contain the corresponding input data that need to be loaded in R prior to running the respective analysis script.

  10. Case study: Cyclistic bike-share analysis

    • kaggle.com
    zip
    Updated Mar 25, 2022
    + more versions
    Cite
    Jorge4141 (2022). Case study: Cyclistic bike-share analysis [Dataset]. https://www.kaggle.com/datasets/jorge4141/case-study-cyclistic-bikeshare-analysis
    Explore at:
    zip(131490806 bytes)
    Dataset updated
    Mar 25, 2022
    Authors
    Jorge4141
    Description

    Introduction

    This is a case study called Capstone Project from the Google Data Analytics Certificate.

    In this case study, I am working as a junior data analyst at a fictitious bike-share company in Chicago called Cyclistic.

    Cyclistic is a bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike.

    Scenario

    The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, our team will design a new marketing strategy to convert casual riders into annual members.

    **Primary Stakeholders:**

    1: Cyclistic Executive Team

    2: Lily Moreno, Director of Marketing and Manager

    ASK

    1. How do annual members and casual riders use Cyclistic bikes differently?
    2. Why would casual riders buy Cyclistic annual memberships?
    3. How can Cyclistic use digital media to influence casual riders to become members?

    # Prepare

    The last four quarters, covering April 01, 2019 - March 31, 2020, were selected for analysis. These are the datasets used:

    Divvy_Trips_2019_Q2
    Divvy_Trips_2019_Q3
    Divvy_Trips_2019_Q4
    Divvy_Trips_2020_Q1
    

    The data is stored in CSV files. Each file contains one month of data, for a total of 12 .csv files.

    Data appears to be reliable with no bias. It also appears to be original, current and cited.

    I used Cyclistic’s historical trip data found here: https://divvy-tripdata.s3.amazonaws.com/index.html

    The data has been made available by Motivate International Inc. under this license: https://ride.divvybikes.com/data-license-agreement

    Limitations

    Financial information is not available.

    Process

    Used R to analyze and clean data

    • After installing the R packages, data was collected, wrangled and combined into a single file.
    • Columns were renamed.
    • Looked for incongruencies in the dataframes and converted some columns to character type, so they can stack correctly.
    • Combined all quarters into one big data frame.
    • Removed unnecessary columns

    Analyze

    • Inspected new data table to ensure column names were correctly assigned.
    • Formatted columns to ensure proper data types were assigned (numeric, character, etc).
    • Consolidated the member_casual column.
    • Added day, month and year columns to aggregate data.
    • Added ride-length column to the entire dataframe for consistency.
    • Removed rides with negative trip durations and rides from bikes taken out of circulation, for quality control.
    • Replaced the word "member" with "Subscriber" and also replaced the word "casual" with "Customer".
    • Aggregated the data and compared average ride lengths between members and casual users (a hedged sketch of these steps follows this list).
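    A hedged sketch of these cleaning and aggregation steps, assuming the combined data frame is called all_trips and uses the post-rename Divvy column names (started_at, ended_at, start_station_name, member_casual); the quality-control station marker is also an assumption:

```r
# Hedged sketch only: column names and the out-of-circulation marker are assumptions.
library(dplyr)
library(lubridate)

all_trips <- all_trips %>%
  mutate(
    ride_length = as.numeric(difftime(ended_at, started_at, units = "mins")),
    day   = day(started_at),
    month = month(started_at, label = TRUE),
    year  = year(started_at)
  ) %>%
  filter(ride_length >= 0,                      # drop negative trip durations
         start_station_name != "HQ QR") %>%     # drop bikes taken out of circulation (assumed marker)
  mutate(member_casual = recode(member_casual,
                                member = "Subscriber",
                                casual = "Customer"))

# Compare average ride length between subscribers and customers
all_trips %>%
  group_by(member_casual) %>%
  summarise(mean_ride_length = mean(ride_length), rides = n())
```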

    Share

    After analysis, visuals were created as shown below with R.

    Act

    Conclusion:

    • Data appears to show that casual riders and members use bike share differently.
    • Casual riders' average ride length is more than twice that of members.
    • Members use bike share for commuting, while casual riders use it for leisure, mostly on the weekends.
    • Unfortunately, there's no financial data available to determine which of the two (casual or member) is spending more money.

    Recommendations

    • Offer casual riders a membership package with promotions and discounts.
  11. Data from: Lower complexity of motor primitives ensures robust control of...

    • data.niaid.nih.gov
    Updated Jun 18, 2022
    + more versions
    Cite
    Santuz, Alessandro; Ekizos, Antonis; Kunimasa, Yoko; Kijima, Kota; Ishikawa, Masaki; Arampatzis, Adamantios (2022). Lower complexity of motor primitives ensures robust control of high-speed human locomotion [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_3764760
    Explore at:
    Dataset updated
    Jun 18, 2022
    Dataset provided by
    Osaka University of Health and Sport Sciences
    Humboldt-Universität zu Berlin, Dalhousie University
    Humboldt-Universität zu Berlin
    Authors
    Santuz, Alessandro; Ekizos, Antonis; Kunimasa, Yoko; Kijima, Kota; Ishikawa, Masaki; Arampatzis, Adamantios
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Walking and running are mechanically and energetically different locomotion modes. For selecting one or another, speed is a parameter of paramount importance. Yet, both are likely controlled by similar low-dimensional neuronal networks that reflect in patterned muscle activations called muscle synergies. Here, we investigated how humans synergistically activate muscles during locomotion at different submaximal and maximal speeds. We analysed the duration and complexity (or irregularity) over time of motor primitives, the temporal components of muscle synergies. We found that the challenge imposed by controlling high-speed locomotion forces the central nervous system to produce muscle activation patterns that are wider and less complex relative to the duration of the gait cycle. The motor modules, or time-independent coefficients, were redistributed as locomotion speed changed. These outcomes show that robust locomotion control at challenging speeds is achieved by modulating the relative contribution of muscle activations and producing less complex and wider control signals, whereas slow speeds allow for more irregular control.

    In this supplementary data set we made available: a) the metadata with anonymized participant information, b) the raw EMG, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via NMF and f) the code to process the data, including the scripts to calculate the Higuchi's fractal dimension (HFD) of motor primitives. In total, 180 trials from 30 participants are included in the supplementary data set.

    The file “metadata.dat” is available in ASCII and RData format and contains:

    Code: the participant’s code

    Group: the experimental group in which the participant was involved (G1 = walking and submaximal running; G2 = submaximal and maximal running)

    Sex: the participant’s sex (M or F)

    Speeds: the type of locomotion (W for walking or R for running) and speed at which the recordings were conducted in 10*[m/s]

    Age: the participant’s age in years

    Height: the participant’s height in [cm]

    Mass: the participant’s body mass in [kg]

    PB: 100 m-personal best time (for G2).

    The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus. Please note that the following trials include less than 30 gait cycles (the actual number shown between parentheses): P16_R_83 (20), P16_R_95 (25), P17_R_28 (28), P17_R_83 (24), P17_R_95 (13), P18_R_95 (23), P19_R_95 (18), P20_R_28 (25), P20_R_42 (27), P20_R_95 (25), P22_R_28 (23), P23_R_28(29), P24_R_28 (28), P24_R_42 (29), P25_R_28 (29), P25_R_95 (28), P26_R_28 (29), P26_R_95 (28), P27_R_28 (28), P27_R_42 (29), P27_R_95 (24), P28_R_28 (29), P29_R_95 (17). All the other trials consist of 30 gait cycles. Trials are named like “P20_R_20,” where the characters “P20” indicate the participant number (in this example the 20th), the character “R” indicate the locomotion type (W=walking, R=running), and the numbers “20” indicate the locomotion speed in 10*m/s (in this case the speed is 2.0 m/s). The filtered and time-normalized emg data is named, following the same rules, like “FILT_EMG_P03_R_30”.

    Old versions not compatible with the R package musclesyneRgies

    The files containing the gait cycle breakdown are available in RData format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with as many rows as the available number of gait cycles and two columns. The first column named “touchdown” contains the touchdown incremental times in seconds. The second column named “stance” contains the duration of each stance phase of the right foot in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P20_R_20,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times, the characters “P20” indicate the participant number (in this example the 20th), the character “R” indicate the locomotion type (W=walking, R=running), and the numbers “20” indicate the locomotion speed in 10*m/s (in this case the speed is 2.0 m/s). Please note that the following trials include less than 30 gait cycles (the actual number shown between parentheses): P16_R_83 (20), P16_R_95 (25), P17_R_28 (28), P17_R_83 (24), P17_R_95 (13), P18_R_95 (23), P19_R_95 (18), P20_R_28 (25), P20_R_42 (27), P20_R_95 (25), P22_R_28 (23), P23_R_28(29), P24_R_28 (28), P24_R_42 (29), P25_R_28 (29), P25_R_95 (28), P26_R_28 (29), P26_R_95 (28), P27_R_28 (28), P27_R_42 (29), P27_R_95 (24), P28_R_28 (29), P29_R_95 (17).

    The files containing the raw, filtered and the normalized EMG data are available in RData format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with as many rows as the amount of recorded data points and 13 columns. The first column named “time” contains the incremental time in seconds. The remaining 12 columns contain the raw EMG data, named with muscle abbreviations that follow those reported above. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P03_R_30”, where the characters “RAW_EMG” indicate that the trial contains raw emg data, the characters “P03” indicate the participant number (in this example the 3rd), the character “R” indicate the locomotion type (see above), and the numbers “30” indicate the locomotion speed (see above). The filtered and time-normalized emg data is named, following the same rules, like “FILT_EMG_P03_R_30”.

    The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided in motor primitives and motor modules and are presented as direct output of the factorisation and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed in the methods section to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P12_W_07”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P12” indicate the participant number (in this example the 12th), the character “W” indicate the locomotion type (see above), and the numbers “07” indicate the speed (see above). Motor modules are data frames with 12 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported above, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. Trials are named like “SYNS_W_P22_R_20”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P22” indicate the participant number (in this example the 22nd), the character “W” indicates the locomotion type (see above), and the numbers “20” indicate the speed (see above). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.

    The files containing the HFD calculated from motor primitives are available in RData format, in the file named “HFD.RData”. HFD results are presented in a list of lists containing, for each trial, 1) the HFD, and 2) the interval time k used for the calculations. HFDs are presented as one number (mean HFD of the primitives for that trial), as are the interval times k. Trials are named like “HFD_P01_R_95”, where the characters “HFD” indicate that the trial contains HFD data, the characters “P01” indicate the participant number (in this example the 1st), the character “R” indicates the locomotion type (see above), and the numbers “95” indicate the speed (see above).
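    For reference, Higuchi's algorithm can be sketched in base R as below. This is a generic illustration, not the authors' muscle_synergies.R code; the input vector and k_max are placeholders.

```r
# Generic base-R sketch of Higuchi's fractal dimension (HFD) for a numeric vector x
# (e.g. one motor primitive); k_max is an illustrative maximum interval time.
higuchi_fd <- function(x, k_max = 10) {
  n  <- length(x)
  lk <- numeric(k_max)
  for (k in seq_len(k_max)) {
    l_m <- numeric(k)
    for (m in seq_len(k)) {
      idx    <- seq(m, n, by = k)               # subsampled series x[m], x[m+k], ...
      n_incr <- length(idx) - 1                 # number of increments
      norm   <- (n - 1) / (n_incr * k)          # Higuchi normalisation factor
      l_m[m] <- sum(abs(diff(x[idx]))) * norm / k
    }
    lk[k] <- mean(l_m)                          # average curve length at scale k
  }
  fit <- lm(log(lk) ~ log(1 / seq_len(k_max)))  # slope of the log-log curve is the HFD
  unname(coef(fit)[2])
}

higuchi_fd(sin(seq(0, 20 * pi, length.out = 6000)))  # smooth signal: HFD should be close to 1
```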

    All the code used for the pre-processing of EMG data, the extraction of muscle synergies and the calculation of HFD is available in R format. Explanatory comments are profusely present throughout the script “muscle_synergies.R”.

  12. Data and code from: Severity of charcoal rot disease in soybean genotypes...

    • catalog.data.gov
    • datasetcatalog.nlm.nih.gov
    • +1more
    Updated May 8, 2025
    + more versions
    Cite
    Agricultural Research Service (2025). Data and code from: Severity of charcoal rot disease in soybean genotypes inoculated with Macrophomina phaseolina isolates differs among growth environments [Dataset]. https://catalog.data.gov/dataset/data-and-code-from-severity-of-charcoal-rot-disease-in-soybean-genotypes-inoculated-with-i
    Explore at:
    Dataset updated
    May 8, 2025
    Dataset provided by
    Agricultural Research Service
    Description

    This dataset includes all the raw data and all the R statistical software code that we used to analyze the data and produce all the outputs that are in the figures, tables, and text of the associated manuscript: Mengistu, A., Q. D. Read, C. R. Little, H. M. Kelly, P. M. Henry, and N. Bellaloui. 2025. Severity of charcoal rot disease in soybean genotypes inoculated with Macrophomina phaseolina isolates differs among growth environments. Plant Disease. DOI: 10.1094/PDIS-10-24-2230-RE.

    The data included here come from a series of tests designed to evaluate methods for identifying soybean genotypes that are resistant or susceptible to charcoal rot, a widespread and economically significant disease. Four independent experiments were performed to determine the variability in disease severity by soybean genotype and by isolated variant of the charcoal rot fungus: two field tests, a greenhouse test, and a growth chamber test. The tests differed in the number of genotypes and isolates used, as well as the method of inoculation. The accuracy of identifying resistant and susceptible genotypes varied by study, and the same isolate tested across different studies often had highly variable disease severity. Our results indicate that the non-field methods are not reliable ways to identify sources of charcoal rot resistance in soybean.

    The models fit in the R script archived here are Bayesian general linear mixed models with AUDPC (area under the disease progress curve) as the response variable. One-dimensional clustering is used to divide the genotypes into resistant and susceptible based on their model-predicted AUDPC values, and this result is compared with the preexisting resistance classification. Posterior distributions of the marginal means for different combinations of genotype, isolate, and other covariates are estimated and compared. Code to reproduce the tables and figures of the manuscript is also included.

    The following files are included:
    • README.pdf: Full description, with column metadata for the data spreadsheets and a text description of each R script
    • data2023-04-18.xlsx: Excel sheet with data from three of the four trials
    • cleaned_data.RData: all data in analysis-ready format; generates a set of data frames when imported into an R environment
    • Modified Cut-Tip Inoculation on DT974290 and LS980358 on first 32 isolates.xlsx: Excel spreadsheet with data from the fourth trial
    • data_cleaning.R: Script required to format data from .xlsx files into analysis-ready format (running this script is not necessary to reproduce the analysis; instead you may begin with the following script importing the cleaned .RData object)
    • AUDPC_fits.R: Script containing code for all model fitting, model predictions and comparisons, and figure and table generation

  13. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Cite
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Explore at:
    zip(23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with Apriori algorithm

    The retailer wants to target customers with suggestions for itemsets that a customer is most likely to purchase. I was given a retailer's dataset; the transaction data covers all the transactions that happened over a period of time. The retailer will use the results to grow in the industry and provide customers with suggestions on itemsets, so that we are able to increase customer engagement, improve the customer experience and identify customer behavior. I will solve this problem using Association Rules, a type of unsupervised learning technique that checks for the dependency of one data item on another data item.

    Introduction

    Association Rules are most often used when you are planning to find associations between different objects in a set. They work well when you are looking for frequent patterns in a transaction database. They can tell you what items customers frequently buy together, which allows the retailer to identify relationships between the items.

    An Example of Association Rules

    Assume there are 100 customers; 10 of them bought a computer mouse, 9 bought a mouse mat and 8 bought both. For the rule bought Computer Mouse => bought Mouse Mat: support = P(Mouse & Mat) = 8/100 = 0.08; confidence = support/P(Computer Mouse) = 0.08/0.10 = 0.80; lift = confidence/P(Mouse Mat) = 0.80/0.09 ≈ 8.9. This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data – so that is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of Rule

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of Row: 522065
    • Number of Attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.


    Libraries in R

    First, we need to load required libraries. Shortly I describe all libraries.

    • arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.
    • readxl - Read Excel Files in R.
    • plyr - Tools for Splitting, Applying and Combining Data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic Report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.
    • tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.


    Data Pre-processing

    Next, we need to upload Assignment-1_Data.xlsx to R to read the dataset. Now we can see our data in R.


    Next, we clean our data frame by removing missing values.


    To apply association rule mining, we need to convert the data frame into transaction data so that all items that are bought together in one invoice will be in ...
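    A hedged sketch of this pipeline (read the Excel file, build one basket per invoice, coerce to transactions, run apriori); the thresholds and the deduplication step are illustrative choices rather than the original analysis settings:

```r
# Hedged sketch: column names follow the dataset description above; support and
# confidence thresholds are illustrative.
library(readxl)
library(arules)

retaildata <- read_excel("Assignment-1_Data.xlsx")
retaildata <- retaildata[!is.na(retaildata$Itemname) & !is.na(retaildata$BillNo), ]

# One basket per invoice: split item names by BillNo, drop duplicates, coerce to transactions
baskets <- lapply(split(retaildata$Itemname, retaildata$BillNo), unique)
trans   <- as(baskets, "transactions")

rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.8, minlen = 2))
inspect(head(sort(rules, by = "lift"), 10))
```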

  14. Data and code from: Partitioning variance in a signaling trade-off under...

    • search.dataone.org
    • data-staging.niaid.nih.gov
    • +2more
    Updated Aug 5, 2025
    Cite
    Michael Reichert; Ivan de la Hera; Maria Moiron (2025). Data and code from: Partitioning variance in a signaling trade-off under sexual selection reveals among-individual covariance in trait allocation [Dataset]. http://doi.org/10.5061/dryad.kd51c5b95
    Explore at:
    Dataset updated
    Aug 5, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Michael Reichert; Ivan de la Hera; Maria Moiron
    Time period covered
    Jan 1, 2022
    Description

    Understanding the evolution of traits subject to trade-offs is challenging because phenotypes can (co)vary at both the among- and within-individual levels. Among-individual covariation indicates consistent, possibly genetic, differences in how individuals resolve the trade-off, while within-individual covariation indicates trait plasticity. There is also the potential for consistent among-individual differences in behavioral plasticity, although this has rarely been investigated. We studied the sources of (co)variance in two characteristics of an acoustic advertisement signal that trade off with one another and are under sexual selection in the gray treefrog, Hyla chrysoscelis: call duration and call rate. We recorded males on multiple nights calling spontaneously and in response to playbacks simulating different competition levels. Call duration, call rate, and their product, call effort, were all repeatable both within and across social contexts. Call duration and call rate covaried n...

    # Data and code from: Partitioning variance in a signaling trade-off under sexual selection reveals among-individual covariance in trait allocation

    Michael S. Reichert, Ivan de la Hera, Maria Moiron

    Evolution 2024

    Summary: Data are measurements of the characteristics of individual calls from a study of individual variation in calling in Cope's gray treefrog, Hyla chrysoscelis.

    Description of the Data and file structure

    Note: There are some NA entries in the data files because these are outputs of R data frames. NA corresponds to an empty cell (i.e. no data are available for that variable for that row).

    List of files:
    • TreefrogVariance.csv - This is the main raw data file. Each row contains the data from a single call. Variables are as follows:
      • CD - call duration, in seconds
      • CR - call rate, in calls/second
    *Note that the intercall interval (ICI), which is analyzed in the supplement as an alternative to call rate, is not directly included in this data file but can be calculated a...

  15. NEOWISE-R Single Exposure (L1b) Frame Metadata Table - Dataset - NASA Open...

    • data.nasa.gov
    Updated Apr 1, 2025
    + more versions
    Cite
    nasa.gov (2025). NEOWISE-R Single Exposure (L1b) Frame Metadata Table - Dataset - NASA Open Data Portal [Dataset]. https://data.nasa.gov/dataset/neowise-r-single-exposure-l1b-frame-metadata-table
    Explore at:
    Dataset updated
    Apr 1, 2025
    Dataset provided by
    NASA (http://nasa.gov/)
    Description

    The Near-Earth Object Wide-field Infrared Survey Explorer Reactivation Mission (NEOWISE; Mainzer et al. 2014, ApJ, 792, 30) is a NASA Planetary Science Division space-based survey to detect, track and characterize asteroids and comets, and to learn more about the population of near-Earth objects that could pose an impact hazard to the Earth. NEOWISE systematically images the sky at 3.4 and 4.6 μm, obtaining multiple independent observations of each location that enable detection of previously known and new solar system small bodies by virtue of their motion. Because it is an infrared survey, NEOWISE detects asteroid thermal emission and is equally sensitive to high and low albedo objects.

    The following table contains brief descriptions of all metadata information that is relevant to the processing of Single-exposure (level 1) images and the extraction of sources from the corresponding Single-exposure images. The table contains the unique scan ID and frame number for each specific single-exposure image and the reconstructed right ascension and declination of the image center. Much of the information in this table is processing-specific and may not be of interest to general users (e.g. flags indicating whether frames have been processed or not, and the date and time at which the pipeline was started, etc.). The metadata table also contains some characterization and derived statistics of the Single-exposure image frames, basic parameters used for photometry, and derived statistics for extracted sources and artifacts. For example, it contains the number of sources with profile-fit photometry signal-to-noise ratio (SNR) greater than 3, and the total number of real sources affected by artifacts such as latent images and electronic ghosts.

  16. Data from: Neuromotor dynamics of human locomotion in challenging settings

    • data.niaid.nih.gov
    • zenodo.org
    Updated Jun 18, 2022
    Cite
    Santuz, Alessandro; Brüll, Leon; Ekizos, Antonis; Schroll, Arno; Eckardt, Nils; Kibele, Armin; Schwenk, Michael; Arampatzis, Adamantios (2022). Neuromotor dynamics of human locomotion in challenging settings [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_2669485
    Explore at:
    Dataset updated
    Jun 18, 2022
    Dataset provided by
    Network Aging Research
    University of Kassel
    University of Oldenburg
    Heidelberg University
    Humboldt-Universität zu Berlin
    Humboldt University of Berlin, Dalhousie University
    Authors
    Santuz, Alessandro; Brüll, Leon; Ekizos, Antonis; Schroll, Arno; Eckardt, Nils; Kibele, Armin; Schwenk, Michael; Arampatzis, Adamantios
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Background

    Is the control of movement less stable when we walk or run in challenging settings? Intuitively, one might answer that it is, given that challenging locomotion externally (e.g. rough terrain) or internally (e.g. age-related impairments) makes our movements more unstable. Here, we investigated how young and old humans synergistically activate muscles during locomotion when different perturbation levels are introduced. Of these control signals, called muscle synergies, we analyzed the stability over time and the complexity (or irregularity). Surprisingly, we found that perturbations force the central nervous system to produce muscle activation patterns that are less unstable and less complex. These outcomes show that robust locomotion in challenging settings is achieved by producing less complex control signals which are more stable over time, whereas easier tasks allow for more unstable and irregular control.

    How to use the data set

    This supplementary data set contains: a) the metadata with anonymized participant information, b) the raw electromyographic (EMG) data acquired during locomotion, c) the touchdown and lift-off timings of the recorded limb, d) the filtered and time-normalized EMG, e) the muscle synergies extracted via non-negative matrix factorization and f) the code written in R (R Found. for Stat. Comp.) to process the data, including the scripts to calculate the short-term Maximum Lyapunov Exponents (sMLE) and Higuchi's fractal dimension (HFD) of motor primitives. In total, 476 trials from 86 participants are included in the supplementary data set.

    The file “participant_data.dat” is available in ASCII and RData (R Found. for Stat. Comp.) format and contains:

    Code: the participant’s code

    Experiment: the experimental setup in which the participant was involved (E1 = walking and running, overground and treadmill; E2 = walking and running, even- and uneven-surface; E3 = unperturbed and perturbed walking, young and old)

    Group: the group to which the participant was assigned (see methods for the details)

    Sex: the participant’s sex (M or F)

    Speed: the speed at which the recordings were conducted in m/s

    Age: the participant’s age in years (participants were considered old if older than 65 years, but younger than 80)

    Height: the participant’s height in [cm]

    Mass: the participant’s body mass in [kg].

    The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus. Please note that the running overground trials of participants P0001 and P0009 and the second uneven-surface running trial of participant P0048 consist of 22, 27 and 23 cycles, respectively. All the other trials consist of 30 gait cycles. Trials are named like “P0053_OW_02”, where the characters “P0053” indicate the participant number (in this example the 53rd), the characters “OW” indicate the locomotion type (E1: OW=overground walking, OR=overground running, TW=treadmill walking, TR=treadmill running; E2: EW=even-surface walking, ER=even-surface running, UW=uneven-surface walking, UR=uneven-surface running; E3: NW=normal walking, PW=perturbed walking), and the numbers “02” indicate the trial number (in this case the 2nd). The 10 trials per participant recorded for each overground session (i.e. 10 for walking and 10 for running) were concatenated into one. The filtered and time-normalized EMG data are named, following the same rules, like “FILT_EMG_P0053_OG_02”.

    Old versions not compatible with the R package musclesyneRgies

    The files containing the gait cycle breakdown are available in RData (R Found. for Stat. Comp.) format, in the file named “CYCLE_TIMES.RData”. The files are structured as data frames with 30 rows (one for each gait cycle) and two columns. The first column contains the touchdown incremental times in seconds. The second column contains the duration of each stance phase in seconds. Each trial is saved as an element of a single R list. Trials are named like “CYCLE_TIMES_P0020,” where the characters “CYCLE_TIMES” indicate that the trial contains the gait cycle breakdown times and the characters “P0020” indicate the participant number (in this example the 20th). Please note that the overground trials of participants P0001 and P0009 and the second uneven-surface running trial of participant P0048 only contain 22, 27 and 23 cycles, respectively.

    The files containing the raw, filtered and the normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “RAW_EMG.RData” and “FILT_EMG.RData”. The raw EMG files are structured as data frames with 30000 rows (one for each recorded data point) and 14 columns. The first column contains the incremental time in seconds. The remaining thirteen columns contain the raw EMG data, named with muscle abbreviations that follow those reported in the methods section. Each trial is saved as an element of a single R list. Trials are named like “RAW_EMG_P0053_OW_02”, where the characters “RAW_EMG” indicate that the trial contains raw emg data, the characters “P0053” indicate the participant number (in this example the 53rd), the characters “OW” indicate the locomotion type (E1: OW=overground walking, OR=overground running, TW=treadmill walking, TR=treadmill running; E2: EW=even-surface walking, ER=even-surface running, UW=uneven-surface walking, UR=uneven-surface running; E3: NW=normal walking, PW=perturbed walking), and the numbers “02” indicate the trial number (in this case the 2nd). The 10 trials per participant recorded for each overground session (i.e. 10 for walking and 10 for running) were concatenated into one. The filtered and time-normalized EMG data is named, following the same rules, like “FILT_EMG_P0053_OG_02”.

The files containing the muscle synergies extracted from the filtered and normalized EMG data are available in RData (R Found. for Stat. Comp.) format, in the files named “SYNS_H.RData” and “SYNS_W.RData”. The muscle synergies files are divided into motor primitives and motor modules and are presented as direct output of the factorization and not in any functional order. Motor primitives are data frames with 6000 rows and a number of columns equal to the number of synergies (which might differ from trial to trial) plus one. The rows contain the time-dependent coefficients (motor primitives), one column for each synergy plus the time points (columns are named e.g. “Time, Syn1, Syn2, Syn3”, where “Syn” is the abbreviation for “synergy”). Each gait cycle contains 200 data points, 100 for the stance and 100 for the swing phase which, multiplied by the 30 recorded cycles, result in 6000 data points distributed in as many rows. This output is transposed as compared to the one discussed above to improve user readability. Each set of motor primitives is saved as an element of a single R list. Trials are named like “SYNS_H_P0012_PW_02”, where the characters “SYNS_H” indicate that the trial contains motor primitive data, the characters “P0012” indicate the participant number (in this example the 12th), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Motor modules are data frames with 13 rows (number of recorded muscles) and a number of columns equal to the number of synergies (which might differ from trial to trial). The rows, named with muscle abbreviations that follow those reported in the methods section, contain the time-independent coefficients (motor modules), one for each synergy and for each muscle. Each set of motor modules relative to one synergy is saved as an element of a single R list. Trials are named like “SYNS_W_P0082_PW_02”, where the characters “SYNS_W” indicate that the trial contains motor module data, the characters “P0082” indicate the participant number (in this example the 82nd), the characters “PW” indicate the locomotion type (see above), and the numbers “02” indicate the trial number (in this case the 2nd). Given the nature of the NMF algorithm for the extraction of muscle synergies, the supplementary data set might show non-significant differences as compared to the one used for obtaining the results of this paper.
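As a sketch of how the motor primitives could be visualised, assuming the object stored in SYNS_H.RData is named SYNS_H; the trial name and the "Time"/"Syn..." column labels follow the description above:

```r
# Minimal sketch: plot the time-dependent coefficients (motor primitives) of one trial.
load("SYNS_H.RData")

prim <- SYNS_H[["SYNS_H_P0012_PW_02"]]              # example trial name from the text
syn_cols <- grep("^Syn", names(prim), value = TRUE) # synergy columns (Syn1, Syn2, ...)

matplot(prim$Time, prim[, syn_cols], type = "l", lty = 1,
        xlab = "Time (200 points per gait cycle)", ylab = "Activation",
        main = "Motor primitives, P0012, perturbed walking")
legend("topright", legend = syn_cols, lty = 1, col = seq_along(syn_cols))
```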

The files containing the sMLE calculated from motor primitives are available in RData (R Found. for Stat. Comp.) format, in the file named “sMLE.RData”. sMLE results are presented in a list of lists containing, for each trial, 1) the divergences, 2) the sMLE, and 3) the value of the R2 between the divergence curve and its linear interpolation made using the specified number of points. The divergences are presented as a one-dimensional vector. The sMLE is a single number, as is the R2 value. Trials are named like “MLE_P0081_EW_01”, where the characters “sMLE” indicate that the trial contains sMLE data, the characters “P0081” indicate the participant number (in this example the

  17. Insurance Claims Data

    • kaggle.com
    zip
    Updated Jan 30, 2022
    Cite
    Satish Varma (2022). Insurance Claims Data [Dataset]. https://www.kaggle.com/datasets/saisatish09/insuranceclaimsdata
    Explore at:
zip (1959661 bytes). Available download formats
    Dataset updated
    Jan 30, 2022
    Authors
    Satish Varma
    Description

AutoBi (Automobile Bodily Injury Claims) -

The data contains demographic information about the claimant, attorney involvement and the economic loss (LOSS, in thousands), among other variables. The full data contains over 70,000 closed claims based on data from thirty-two insurers.

    A data frame with 1340 observations on the following 8 variables.

• CASENUM - Case number to identify the claim, a numeric vector
• ATTORNEY - Whether the claimant is represented by an attorney (=1 if yes and =2 if no), a numeric vector
• CLMSEX - Claimant's gender (=1 if male and =2 if female), a numeric vector
• MARITAL - Claimant's marital status (=1 if married, =2 if single, =3 if widowed, and =4 if divorced/separated), a numeric vector
• CLMINSUR - Whether or not the driver of the claimant's vehicle was uninsured (=1 if yes, =2 if no, and =3 if not applicable), a numeric vector
• SEATBELT - Whether or not the claimant was wearing a seatbelt/child restraint (=1 if yes, =2 if no, and =3 if not applicable), a numeric vector
• CLMAGE - Claimant's age, a numeric vector
• LOSS - The claimant's total economic loss (in thousands), a numeric vector

AutoClaims (Automobile Insurance Claims) -

    A data frame with 6773 observations on the following 5 variables.

• STATE
• CLASS - Rating class of operator, based on age, gender, marital status, use of vehicle
• GENDER
• AGE - Age of operator
• PAID - Amount paid to settle and close a claim

AutoCollision (Automobile UK Collision Claims)

    8,942 collision losses from private passenger United Kingdom (UK) automobile insurance policies. The average severity is in pounds sterling adjusted for inflation.

    A data frame with 32 observations on the following 4 variables.

• Age - Age of driver
• Vehicle_Use - Purpose of the vehicle use
• Severity - Average amount of claims
• Claim_Count - Number of claims

    Additional information can be found in the document: https://cran.r-project.org/web/packages/insuranceData/index.html
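Because the description points to the insuranceData package on CRAN, the same tables can be loaded directly in R; a sketch, assuming the package's dataset names AutoBi, AutoClaims and AutoCollision match its documentation:

```r
# Minimal sketch: load the three tables described above from the insuranceData CRAN package.
# install.packages("insuranceData")   # run once if the package is not installed
library(insuranceData)

data(AutoBi)
str(AutoBi)                  # expected: 1340 obs. of 8 variables (CASENUM ... LOSS)

data(AutoClaims)
summary(AutoClaims$PAID)     # amount paid to settle and close a claim

data(AutoCollision)
head(AutoCollision)          # 32 rows: Age, Vehicle_Use, Severity, Claim_Count
```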

18. Modular control of human movement during running: an open access data set

    • data.niaid.nih.gov
    Updated Jun 18, 2022
    + more versions
    Cite
    Santuz, Alessandro; Ekizos, Antonis; Janshen, Lars; Mersmann, Falk; Bohm, Sebastian; Baltzopoulos, Vasilios; Arampatzis, Adamantios (2022). Modular control of human movement during running: an open access data set [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_1254380
    Explore at:
    Dataset updated
    Jun 18, 2022
    Dataset provided by
    Liverpool John Moores University
    Humboldt-Universität zu Berlin
    Authors
    Santuz, Alessandro; Ekizos, Antonis; Janshen, Lars; Mersmann, Falk; Bohm, Sebastian; Baltzopoulos, Vasilios; Arampatzis, Adamantios
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

The human body is an outstandingly complex machine including around 1000 muscles and joints acting synergistically. Yet, the coordination of the enormous number of degrees of freedom needed for movement is mastered by a single brain and spinal cord. The idea that some synergistic neural components of movement exist was already suggested at the beginning of the twentieth century. Since then, it has been widely accepted that the central nervous system might simplify the production of movement by avoiding the control of each muscle individually. Instead, it might be controlling muscles in common patterns that have been called muscle synergies. Only with the advent of modern computational methods and hardware has it become possible to numerically extract synergies from electromyography (EMG) signals. However, typical experimental setups do not include a large number of individuals, with common sample sizes of five to 20 participants. With this study, we make publicly available a set of EMG activities recorded during treadmill running from the right lower limb of 135 healthy young adults (78 males, 57 females). Moreover, we include in this open access data set the code used to extract synergies from EMG data using non-negative matrix factorization and the corresponding outcomes. Muscle synergies, containing the time-invariant muscle weightings (motor modules) and the time-dependent activation coefficients (motor primitives), were extracted from 13 ipsilateral EMG activities using non-negative matrix factorization. Four synergies were enough to describe as many gait cycle phases during running: weight acceptance, propulsion, early swing and late swing. We foresee many possible applications of our data, which we can summarize in three key points. First, it can be a prime source for broadening the representation of human motor control due to the large sample size. Second, it could serve as a benchmark for scientists from multiple disciplines such as musculoskeletal modelling, robotics, clinical neuroscience, sport science, etc. Third, the data set could be used both to train students and to support established scientists in refining current muscle synergy extraction methods.

    The "RAW_DATA.RData" R list consists of elements of S3 class "EMG", each of which is a human locomotion trial containing cycle segmentation timings and raw electromyographic (EMG) data from 13 muscles of the right-side leg. Cycle times are structured as data frames containing two columns that correspond to touchdown (first column) and lift-off (second column). Raw EMG data sets are also structured as data frames with one row for each recorded data point and 14 columns. The first column contains the incremental time in seconds. The remaining 13 columns contain the raw EMG data, named with the following muscle abbreviations: ME = gluteus medius, MA = gluteus maximus, FL = tensor fasciæ latæ, RF = rectus femoris, VM = vastus medialis, VL = vastus lateralis, ST = semitendinosus, BF = biceps femoris, TA = tibialis anterior, PL = peroneus longus, GM = gastrocnemius medialis, GL = gastrocnemius lateralis, SO = soleus.

    The file "dataset.rar" contains data in older format, not compatible with the R package musclesyneRgies.

  19. Data_Patterns of gene flow across multiple anthropogenic infrastructures:...

    • figshare.com
    zip
    Updated Feb 11, 2020
    Cite
    Jonathan Remon; Sylvain Moulhérat; Jérémie H. Cornuau; Lucie Gendron; Murielle Richard; Michel Baguette; Jérôme G. Prunier (2020). Data_Patterns of gene flow across multiple anthropogenic infrastructures: insights from a multi-species approach [Dataset]. http://doi.org/10.6084/m9.figshare.11835840.v1
    Explore at:
zip. Available download formats
    Dataset updated
    Feb 11, 2020
    Dataset provided by
Figshare (http://figshare.com/)
    Authors
    Jonathan Remon; Sylvain Moulhérat; Jérémie H. Cornuau; Lucie Gendron; Murielle Richard; Michel Baguette; Jérôme G. Prunier
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

Data attached to the manuscript "Patterns of gene flow across multiple anthropogenic infrastructures: insights from a multi-species approach". There are four data sets, one per studied species, each including R scripts and original data frames. We also included additional R functions needed to run the R scripts.

20. Data from: Code and Data Schimmelradar manuscript 1.1

    • data-staging.niaid.nih.gov
    Updated Apr 3, 2025
    Cite
    Kortenbosch, Hylke (2025). Code and Data Schimmelradar manuscript 1.1 [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_14851614
    Explore at:
    Dataset updated
    Apr 3, 2025
    Dataset provided by
    Wageningen University & Research
    Authors
    Kortenbosch, Hylke
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Read me – Schimmelradar manuscript

    The code in this repository was written to analyse the data and generate figures for the manuscript “Land use drives spatial structure of drug resistance in a fungal pathogen”.

This repository consists of two original .csv raw data files, two .tif files that were minimally reformatted after being downloaded from LGN.nl and www.pdok.nl/introductie/-/article/basisregistratie-gewaspercelen-brp-, and nine scripts written in the R language. The remaining files are intermediate .tif and .csv files that skip the more computationally heavy steps of the analysis and facilitate its reproduction.

Data files:

    Schimmelradar_360_submission.csv: The raw phenotypic resistance spatial data from the air sample

    • Sample: an arbitrary sample code given to each of the participants

    • Area: A random number assigned to each of the 100 areas the Netherlands was split up into to facilitate an even spread of samples across the country during participant selection.

    • Logistics status: Variable used to indicate whether the sample was returned in good order, not otherwise used in the analysis.

    • Arrived back on: The date by which the sample arrived back at Wageningen University

• Quality seals: quality of the seals upon sample return; only samples with seals designated as good were processed (also see Supplement file – section A).

    • Start sampling: The date on which the trap was deployed and the stickers exposed to the air, recorded by the participant

    • End sampling: The date on which the trap was taken down and the stickers were re-covered and no longer exposed to the air, recorded by the participant

    • 3 back in area?: Binary indicating whether at least three samples have been returned in the respective area (see Area)

    • Batch: The date on which processing of the sample was started. To be more specific, the date at which Flamingo medium was poured over the seals of the sample and incubation was started.

• Lab processing: Binary indicating completion of lab processing

    • Tot ITR: A. fumigatus CFU count in the permissive layer of the itraconazole-treated plate

    • RES ITR: CFU count of colonies that had breached the surface of the itraconazole-treated layer after incubation and were visually (with the unaided eye) sporulating.

    • RF ITR: The itraconazole (~4 mg/L) resistance fraction = RES ITR/Tot ITR

    • Muccor ITR: Indication of the presence of Mucorales spp. growth on the itraconazole treatment plate

    • Tot VOR: A. fumigatus CFU count in the permissive layer of the voriconazole-treated plate

    • RES VOR: CFU count of colonies that had breached the surface of the voriconazole-treated layer after incubation and were visually (with the unaided eye) sporulating.

    • RF VOR: The voriconazole (~2 mg/L) resistance fraction = RES VOR/Tot VOR

    • Muccor VOR: Indication of the presence of Mucorales spp. growth on the voriconazole treatment plate

• Tot CON: CFU count on the untreated growth control plate

• Note: note on the sample based on either information given by the participant or observations in the lab. The "exclude" label was given if the sample had either too few (<25) or too many (>300) CFUs on one or more of the plates (also see Supplement file – section A).

    • Lat: Exact latitude of the address where the sample was taken. Not used in the published version of the code and hidden for privacy reasons.

    • Long: Exact longitude of the address where the sample was taken. Not used in the published version of the code and hidden for privacy reasons.

    • Round_Lat: Rounded latitude of the address where the sample was taken. Rounded down to two decimals (the equivalent of a 1 km2 area), so they could not be linked to a specific address. Used in the published version of the code.

    • Round_Long: Rounded longitude of the address where the sample was taken. Rounded down to two decimals (the equivalent of a 1 km2 area), so they could not be linked to a specific address. Used in the published version of the code.
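A minimal R sketch of how the resistance fractions and the CFU-based exclusion rule described above could be reproduced from this file; the column headers (which contain spaces) and the file location are assumptions taken from the list above:

```r
# Minimal sketch: read the phenotypic air-sample data and recompute the resistance
# fractions (RF = resistant CFU / total CFU) defined above.
rad <- read.csv("Schimmelradar_360_submission.csv", check.names = FALSE)

rad$rf_itra <- rad$`RES ITR` / rad$`Tot ITR`   # itraconazole resistance fraction
rad$rf_vori <- rad$`RES VOR` / rad$`Tot VOR`   # voriconazole resistance fraction

# Exclusion rule from the Note column: drop samples with <25 or >300 CFUs on any plate.
keep <- with(rad, `Tot ITR` >= 25 & `Tot ITR` <= 300 &
                  `Tot VOR` >= 25 & `Tot VOR` <= 300 &
                  `Tot CON` >= 25 & `Tot CON` <= 300)
summary(rad[keep, c("rf_itra", "rf_vori")])
```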

    Analysis_genotypic_schimmelradar_TR_types.csv: The genotype data inferred from gel electrophoresis for all resistant isolates

    • TR type: Indicates the length of the tandem repeats in bp, as judged from a gel. 34 bp, 46 bp, or multiples of 46.

    • Plate: 96-well plate on which the isolate was cultured

    • 96-well: well in which the isolate was cultured

    • Azole: Azole on which the isolate was grown and resistant to. Itraconazole (ITRA) or Voriconazole (VORI).

    • Sample: The air sample the isolate was taken from, corresponds to “Sample” in “Schimmelradar_360_submission.csv”.

    • Strata: The number that equates to “Area” in “Schimmelradar_360_submission.csv”.

• WT: A binary that indicates whether an isolate had a wildtype cyp51a promoter.

• TR34: A binary that indicates whether an isolate had a TR34 cyp51a promoter.

• TR46: A binary that indicates whether an isolate had a TR46 cyp51a promoter.

• TR46_3: A binary that indicates whether an isolate had a TR46*3 cyp51a promoter.

• TR46_4: A binary that indicates whether an isolate had a TR46*4 cyp51a promoter.

    Script 1 - generation_100_equisized_areas_NL

    NOTE: Running this code is not necessary for the other analyses, it was used primarily for sample selection. The area distribution was used during the analysis in script 2B, yet each sample is already linked to an area in “Schimmelradar_360_submission.csv". This script was written to generate a spatial polygons data frame of 100 equisized areas of the Netherlands. The registrations for the citizen science project Schimmelradar were binned into these areas to facilitate a relatively even distribution of samples throughout the country which can be seen in Figure S1. The spatial polygons data frame can be opened and displayed in open-source software such as QGIS. The package “spcosa” used to generate the areas has RJava as a dependency, so having Java installed is required to run this script. The R script uses a shapefile of the Netherlands from the tmap package to generate the areas within the Netherlands. Generating a similar distribution for other countries will require different shape files!
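The stratification idea can be sketched as follows; this is not the Script 1 code, and the use of tmap's NLD_prov polygons, the conversion to an sp object, and the equalArea argument of spcosa::stratify() are assumptions:

```r
# Minimal sketch of compact, equal-area stratification of the Netherlands into 100 areas.
# Requires a working Java installation because spcosa depends on rJava (see note above).
library(spcosa)
library(tmap)   # assumed source of a Netherlands shape (NLD_prov); other countries need other shapes
library(sf)

data(NLD_prov)                                # provinces of the Netherlands (sf object)
nl <- sf::as_Spatial(sf::st_union(NLD_prov))  # dissolve to a single national polygon

set.seed(1)                                   # the stratification is iterative, so fix the seed
areas <- stratify(nl, nStrata = 100, equalArea = TRUE, nTry = 5)
plot(areas)                                   # 100 compact, approximately equal-sized areas
```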

    Script 2 - Spatial_data_integration_fungalradar

    This script produces 4 data files that describe land use in the Netherlands: The three focal.RData files with land use and resistant/colony counts, as well as the “Predictor_raster_NL.tif” land use file.

    In this script, both the phenotypic and genotypic resistance spatial data from the air samples taken during the Fungal radar citizen science project are integrated with the land use and weather data used to model them. It is not recommended to run this code because the data extraction is fairly computationally demanding and it does not itself contain key statistical analyses. Rather it is used to generate the objects used for modelling and spatial predictions that are also included in this repository.

The phenotypic resistance is summarised in Table 1, which is generated in this script. Subsequently, the spatial data from the LGN22 and BRP datasets are integrated into the data. These datasets can be loaded from the "LGN2022.tif" and "Gewas22rast.tiff" raster files, respectively. Links to the webpages where these files can be downloaded can be found in the code.

Once the raster files are loaded, the code generates heatmaps and calculates the proportions of all the land use classes in both a 5 and a 10 km radius around every sample and across the country to make spatial predictions. Only the 10 km radius data are used in the later analysis; the 5 km radius was generated during an earlier stage of the analyses to test whether that radius would be more appropriate, and was left in for completeness. For documentation of the LGN22 data set, we refer to https://lgn.nl/documentatie and for BRP to https://nationaalgeoregister.nl/geonetwork/srv/dut/catalog.search#/metadata/44e6d4d3-8fc5-47d6-8712-33dd6d244eef; both of these online resources are in Dutch but can be readily translated. A list of the variables that were included from these datasets during model selection can be found in Table S3. Alongside land-use data, the code extracts weather data from data files that can be downloaded from https://cds.climate.copernicus.eu/datasets/sis-agrometeorological-indicators?tab=download for the Netherlands during the sampling window; dates and dimensions are listed within the code. The Weather_schimmelradar folder contains four subfolders, one for each weather variable that was considered during modelling: temperature, wind speed, precipitation and humidity. Each of these subfolders contains 44 .nc files that each cover the daily mean of the respective weather variable across the Netherlands for one of the 44 days of the sampling window the citizen scientists were given.
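As a rough illustration of the 10 km land-use fractions (not the repository's actual Script 2 code), a sketch using the terra package; the coordinate columns, CRS, and the class column returned by extract() are assumptions:

```r
# Minimal sketch: fraction of each land use class within a 10 km radius of every sample.
library(terra)

lgn <- rast("LGN2022.tif")                                   # LGN22 land use raster from the repository
rad <- read.csv("Schimmelradar_360_submission.csv", check.names = FALSE)

pts <- vect(rad, geom = c("Round_Long", "Round_Lat"), crs = "EPSG:4326")  # assumed WGS84 coordinates
pts <- project(pts, crs(lgn))                                # match the raster's CRS
buf <- buffer(pts, width = 10000)                            # 10 km radius around each sample

cells <- extract(lgn, buf)                                   # one row per raster cell, with an ID per buffer
fractions <- prop.table(table(cells$ID, cells[[2]]), margin = 1)  # class proportions per sample
round(fractions[1:2, ], 3)
```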

All spatial objects (weather + land use) are eventually merged into one predictor raster, "Predictor_raster_NL.tif". The land use fractions and weather data are subsequently integrated with the air sample data into a single spatial data frame along with the resistance data and saved into an R object, "Schimmelradar360spat_focal.RData". The script concludes by merging the cyp51a haplotype data with this object as well, to create two different objects: "Schimmelradar360spat_focal_TR_VORI.RData" with the haplotype data of the voriconazole-resistant isolates and "Schimmelradar360spat_focal_TR_ITRA.RData" with the haplotype data of the itraconazole-resistant isolates. These two datasets are modelled separately in scripts 5, 9 and 6, 8, respectively. This final section of the script also generates summary Table S2, which summarises the frequency of the cyp51a haplotypes per azole treatment.

    If the relevant objects are loaded
