57 datasets found
  1. Data from: R Manual for QCA

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 17, 2023
    Cite
    Mello, Patrick A. (2023). R Manual for QCA [Dataset]. http://doi.org/10.7910/DVN/KYF7VJ
    Explore at:
    Dataset updated
    Nov 17, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Mello, Patrick A.
    Description

    The R Manual for QCA comprises a PDF file that describes all the steps and code needed to prepare and conduct a Qualitative Comparative Analysis (QCA) study in R. It is complemented by an R script that can be customized as needed. The dataset further includes two files with sample data, for the set-theoretic analysis and for the visualization of QCA results. The R Manual for QCA is the online appendix to "Qualitative Comparative Analysis: An Introduction to Research Design and Application", Georgetown University Press, 2021.
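    A minimal sketch of the kind of crisp-set workflow such a manual walks through, using the QCA package; the data frame and the condition/outcome names (A, B, C, OUT) are hypothetical placeholders, not files from this dataset:

    ```r
    # Hypothetical calibrated data: three conditions and one outcome
    library(QCA)

    dat <- data.frame(
      A   = c(1, 1, 0, 0, 1),
      B   = c(1, 0, 1, 0, 1),
      C   = c(0, 1, 1, 0, 1),
      OUT = c(1, 1, 0, 0, 1)
    )

    # truth table with an inclusion cut-off, then Boolean minimization
    tt  <- truthTable(dat, outcome = "OUT", conditions = c("A", "B", "C"),
                      incl.cut = 0.8, show.cases = TRUE)
    sol <- minimize(tt, details = TRUE)
    sol
    ```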

  2. Crime Data Analysis

    • kaggle.com
    Updated Aug 9, 2024
    + more versions
    Cite
    Candace Gostinski (2024). Crime Data Analysis [Dataset]. https://www.kaggle.com/datasets/candacegostinski/crime-data-analysis
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 9, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Candace Gostinski
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    In a world of increasing crime, many organizations are interested in examining incident details to learn from and prevent future crime. Our client, based in Los Angeles County, was interested in exactly this. They asked us to examine the data to answer several questions; among them, what was the rate of increase or decrease in crime from 2020 to 2023, and which ethnicity or group of people was targeted the most.

    Our data was collected from Kaggle.com at the following link:

    https://www.kaggle.com/datasets/nathaniellybrand/los-angeles-crime-dataset-2020-present

    It was cleaned, examined for further errors, and the analysis was performed using RStudio. The results of this analysis are in the attached PDF entitled "crime_data_analysis_report." Please feel free to review the results as well as follow along with the dataset on your own machine.
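    A minimal sketch of the year-over-year comparison described above; the filename, date format, and column name (date_occurred) are assumptions, not part of this dataset's documented schema:

    ```r
    library(dplyr)
    library(lubridate)

    crime <- read.csv("crime_data.csv")                  # placeholder filename

    crime %>%
      mutate(year = year(mdy_hms(date_occurred))) %>%    # assumed date format
      filter(year %in% 2020:2023) %>%
      count(year) %>%
      mutate(pct_change = 100 * (n - lag(n)) / lag(n))   # change vs. prior year
    ```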

  3. DataSheet1_ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods

    • frontiersin.figshare.com
    pdf
    Updated Jun 10, 2023
    Cite
    Anders Hagen Jarmund; Torfinn Støve Madssen; Guro F. Giskeødegård (2023). DataSheet1_ALASCA: An R package for longitudinal and cross-sectional analysis of multivariate data by ASCA-based methods.pdf [Dataset]. http://doi.org/10.3389/fmolb.2022.962431.s001
    Explore at:
    pdf (available download formats)
    Dataset updated
    Jun 10, 2023
    Dataset provided by
    Frontiers
    Authors
    Anders Hagen Jarmund; Torfinn Støve Madssen; Guro F. Giskeødegård
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The increasing availability of multivariate data within biomedical research calls for appropriate statistical methods that can describe and model complex relationships between variables. The extended ANOVA simultaneous component analysis (ASCA+) framework combines general linear models and principal component analysis (PCA) to decompose and visualize the separate effects of experimental factors. It has recently been demonstrated how linear mixed models can be included in the framework to analyze data from longitudinal experimental designs with repeated measurements (RM-ASCA+). The ALASCA package for R makes the ASCA+ framework accessible for general use and includes multiple methods for validation and visualization. The package is especially useful for longitudinal data and the ability to easily adjust for covariates is an important strength. This paper demonstrates how the ALASCA package can be applied to gain insights into multivariate data from interventional as well as observational designs. Publicly available data sets from four studies are used to demonstrate the methods available (proteomics, metabolomics, and transcriptomics).

  4. Healthcare Device Data Analysis with R

    • kaggle.com
    zip
    Updated Oct 7, 2021
    Cite
    stanley888cy (2021). Healthcare Device Data Analysis with R [Dataset]. https://www.kaggle.com/stanley888cy/google-project-02
    Explore at:
    zip (353177 bytes; available download formats)
    Dataset updated
    Oct 7, 2021
    Authors
    stanley888cy
    Description

    Context

    Hi. This is my data analysis project and also an attempt at using R in my work. It is the capstone project for the Google Data Analytics Certificate course offered on Coursera (https://www.coursera.org/professional-certificates/google-data-analytics). It is an operational data analysis of data from a health monitoring device. For the detailed background story, please check the pdf file (Case 02.pdf) for reference.

    In this case study, I use personal health tracker data from Fitbit to evaluate how the health tracker device is used, and then determine if there are any trends or patterns.

    My data analysis focuses on two areas: exercise activity and sleeping habits. Exercise activity is a study of the relationship between activity type and calories consumed, while sleeping habits involves identifying patterns in users' sleep. In this analysis, I also try some linear regression models, so that the data can be explained in a quantitative way and predictions become easier.
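    A minimal sketch of the kind of linear regression mentioned above; the filename and column names (TotalSteps, Calories) are placeholders for the Fitbit daily activity data:

    ```r
    activity <- read.csv("dailyActivity.csv")    # placeholder filename

    # calories burned as a linear function of daily steps
    fit <- lm(Calories ~ TotalSteps, data = activity)
    summary(fit)

    plot(activity$TotalSteps, activity$Calories,
         xlab = "Total steps per day", ylab = "Calories burned")
    abline(fit, col = "red")                     # fitted regression line
    ```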

    I understand that I am new to data analysis and that my skills and code are at a very beginner level, but I am working hard to learn more in both R and the data science field. If you have any ideas or feedback, please feel free to comment.

    Stanley Cheng 2021-10-07

  5. Inflation- Unemployment Data & Analysis Codes (R)

    • data.mendeley.com
    Updated Sep 11, 2018
    Cite
    Hazar Altinbas (2018). Inflation- Unemployment Data & Analysis Codes (R) [Dataset]. http://doi.org/10.17632/v9679528f7.1
    Explore at:
    Dataset updated
    Sep 11, 2018
    Authors
    Hazar Altinbas
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data is used to examine the inflation–unemployment relationship for 18 countries after 1991. Inflation data is obtained from the World Bank database (https://data.worldbank.org/indicator/FP.CPI.TOTL.ZG) and unemployment data is obtained from the International Labour Organization (http://www.ilo.org/wesodata/).

    The analysis period differs across countries because of structural breaks determined by a single change point detection algorithm from the changepoint package of Killick & Eckley (2014). Granger causality is tested with the Toda & Yamamoto (1995) procedure. Integration levels are determined with three stationarity tests. VAR models are run with the vars package (Pfaff, Stigler & Pfaff, 2018) without trend and constant terms. The cointegration test is conducted with the urca package (Pfaff, Zivot, Stigler & Pfaff, 2016).

    All data files are .csv files. The analyst needs to change the country index (variable name: j) in order to see individual results. Findings can be seen in the article.
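    A minimal sketch of the workflow described above for a single country, assuming a hypothetical CSV with inflation and unemployment columns; the packages are those cited below:

    ```r
    library(changepoint)
    library(vars)
    library(urca)

    dat <- read.csv("country_data.csv")              # placeholder filename
    y   <- ts(dat[, c("inflation", "unemployment")], start = 1991)

    # single change point in mean inflation -> start of the analysis period
    cp <- cpts(cpt.mean(y[, "inflation"]))           # AMOC: at most one change point
    start.year <- if (length(cp)) 1991 + cp else 1991

    # stationarity check, VAR without trend/constant, and cointegration test
    summary(ur.df(y[, "inflation"], type = "none"))
    var.fit <- VAR(window(y, start = start.year), p = 2, type = "none")
    summary(ca.jo(y, type = "trace", K = 2))
    ```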

    Killick, R., & Eckley, I. (2014). changepoint: An R package for changepoint analysis. Journal of statistical software, 58(3), 1-19.

    Pfaff, B., Stigler, M., & Pfaff, M. B. (2018). Package ‘vars’. [Online] https://cran.r-project.org/web/packages/vars/vars.pdf

    Pfaff, B., Zivot, E., Stigler, M., & Pfaff, M. B. (2016). Package ‘urca’. Unit root and cointegration tests for time series data. R package version, 1-2.

    Toda, H. Y., & Yamamoto, T. (1995). Statistical inference in vector autoregressions with possibly integrated processes. Journal of econometrics, 66(1-2), 225-250.

  6. Replication Data for: Revisiting 'The Rise and Decline' in a Population of Peer Production Projects

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 22, 2023
    Cite
    TeBlunthuis, Nathan; Aaron Shaw; Benjamin Mako Hill (2023). Replication Data for: Revisiting 'The Rise and Decline' in a Population of Peer Production Projects [Dataset]. http://doi.org/10.7910/DVN/SG3LP1
    Explore at:
    Dataset updated
    Nov 22, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    TeBlunthuis, Nathan; Aaron Shaw; Benjamin Mako Hill
    Description

    This archive contains code and data for reproducing the analysis for "Replication Data for Revisiting 'The Rise and Decline' in a Population of Peer Production Projects". Depending on what you hope to do with the data, you probably do not want to download all of the files. Depending on your computation resources, you may not be able to run all stages of the analysis. The code for all stages of the analysis, including typesetting the manuscript and running the analysis, is in code.tar. If you only want to run the final analysis or to play with datasets used in the analysis of the paper, you want intermediate_data.7z or the uncompressed tab and csv files.

    The data files are created in a four-stage process. The first stage uses the program "wikiq" to parse mediawiki xml dumps and create tsv files that have edit data for each wiki. The second stage generates the all.edits.RDS file, which combines these tsvs into a dataset of edits from all the wikis. This file is expensive to generate and, at 1.5GB, is pretty big. The third stage builds smaller intermediate files that contain the analytical variables from these tsv files. The fourth stage uses the intermediate files to generate smaller RDS files that contain the results. Finally, knitr and latex typeset the manuscript. A stage will only run if the outputs from the previous stages do not exist, so if the intermediate files exist they will not be regenerated and only the final analysis will run. The exception is that stage 4, fitting models and generating plots, always runs. If you only want to replicate from the second stage onward, you want wikiq_tsvs.7z. If you want to replicate everything, you want wikia_mediawiki_xml_dumps.7z.001, wikia_mediawiki_xml_dumps.7z.002, and wikia_mediawiki_xml_dumps.7z.003. These instructions work backwards from building the manuscript using knitr, loading the datasets, running the analysis, to building the intermediate datasets.

    Building the manuscript using knitr: This requires working latex, latexmk, and knitr installations. Depending on your operating system you might install these packages in different ways; on Debian Linux you can run apt install r-cran-knitr latexmk texlive-latex-extra. Alternatively, you can upload the necessary files to a project on Overleaf.com. Download code.tar; this has everything you need to typeset the manuscript. Unpack the tar archive (on a unix system, tar xf code.tar) and navigate to code/paper_source. Install the R dependencies: in R, run install.packages(c("data.table","scales","ggplot2","lubridate","texreg")). On a unix system you should then be able to run make to build the manuscript generalizable_wiki.pdf. Otherwise, try uploading all of the files (including the tables, figure, and knitr folders) to a new project on Overleaf.com.

    Loading intermediate datasets: The intermediate datasets are found in the intermediate_data.7z archive. They can be extracted on a unix system using the command 7z x intermediate_data.7z. The files are 95MB uncompressed. These are RDS (R data set) files and can be loaded in R using readRDS, for example newcomer.ds <- readRDS("newcomers.RDS"). If you wish to work with these datasets using a tool other than R, you might prefer to work with the .tab files.

    Running the analysis: Fitting the models may not work on machines with less than 32GB of RAM. If you have trouble, you may find the functions in lib-01-sample-datasets.R useful to create stratified samples of data for fitting models; see line 89 of 02_model_newcomer_survival.R for an example. Download code.tar and intermediate_data.7z to your working folder and extract both archives (on a unix system, tar xf code.tar && 7z x intermediate_data.7z). Install the R dependencies: install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). On a unix system you can simply run regen.all.sh to fit the models, build the plots, and create the RDS files.

    Generating datasets - building the intermediate files: The intermediate files are generated from all.edits.RDS. This process requires about 20GB of memory. Download all.edits.RDS, userroles_data.7z, selected.wikis.csv, and code.tar. Unpack code.tar and userroles_data.7z (on a unix system, tar xf code.tar && 7z x userroles_data.7z). Install the R dependencies: in R, run install.packages(c("data.table","ggplot2","urltools","texreg","optimx","lme4","bootstrap","scales","effects","lubridate","devtools","roxygen2")). Then run 01_build_datasets.R.

    Building all.edits.RDS: The intermediate RDS files used in the analysis are created from all.edits.RDS. To replicate building all.edits.RDS, you only need to run 01_build_datasets.R when the int... Visit https://dataone.org/datasets/sha256%3Acfa4980c107154267d8eb6dc0753ed0fde655a73a062c0c2f5af33f237da3437 for complete metadata about this dataset.
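    A minimal sketch, with hypothetical column names (wiki.name, survives), of loading one intermediate RDS file and drawing a stratified subsample before model fitting; the archive's own sampling helpers live in lib-01-sample-datasets.R:

    ```r
    library(data.table)

    newcomers <- as.data.table(readRDS("newcomers.RDS"))

    # keep at most 1,000 newcomers per wiki to reduce memory pressure
    set.seed(42)
    newcomers.sample <- newcomers[, .SD[sample(.N, min(.N, 1000))], by = wiki.name]

    # simple logistic model on the subsample (columns are hypothetical)
    m <- glm(survives ~ 1, data = newcomers.sample, family = binomial())
    summary(m)
    ```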

  7. Health and Retirement Study (HRS)

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Damico, Anthony (2023). Health and Retirement Study (HRS) [Dataset]. http://doi.org/10.7910/DVN/ELEKOY
    Explore at:
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Damico, Anthony
    Description

    analyze the health and retirement study (hrs) with r. the hrs is the one and only longitudinal survey of american seniors. with a panel starting its third decade, the current pool of respondents includes older folks who have been interviewed every two years as far back as 1992. unlike cross-sectional or shorter panel surveys, respondents keep responding until, well, death do us part. paid for by the national institute on aging and administered by the university of michigan's institute for social research, if you apply for an interviewer job with them, i hope you like werther's original. figuring out how to analyze this data set might trigger your fight-or-flight synapses if you just start clicking around on michigan's website. instead, read pages numbered 10-17 (pdf pages 12-19) of this introduction pdf and don't touch the data until you understand figure a-3 on that last page. if you start enjoying yourself, here's the whole book. after that, it's time to register for access to the (free) data. keep your username and password handy, you'll need it for the top of the download automation r script. next, look at this data flowchart to get an idea of why the data download page is such a righteous jungle. but wait, good news: umich recently farmed out its data management to the rand corporation, who promptly constructed a giant consolidated file with one record per respondent across the whole panel. oh so beautiful. the rand hrs files make much of the older data and syntax examples obsolete, so when you come across stuff like instructions on how to merge years, you can happily ignore them - rand has done it for you. the health and retirement study only includes noninstitutionalized adults when new respondents get added to the panel (as they were in 1992, 1993, 1998, 2004, and 2010) but once they're in, they're in - respondents have a weight of zero for interview waves when they were nursing home residents; but they're still responding and will continue to contribute to your statistics so long as you're generalizing about a population from a previous wave (for example: it's possible to compute "among all americans who were 50+ years old in 1998, x% lived in nursing homes by 2010"). my source for that 411? page 13 of the design doc. wicked.
    this new github repository contains five scripts:
    - 1992 - 2010 download HRS microdata.R: loop through every year and every file, download, then unzip everything in one big party
    - import longitudinal RAND contributed files.R: create a SQLite database (.db) on the local disk, then load the rand, rand-cams, and both rand-family files into the database (.db) in chunks (to prevent overloading ram)
    - longitudinal RAND - analysis examples.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program, create two database-backed complex sample survey objects using a taylor-series linearization design, and perform a mountain of analysis examples with wave weights from two different points in the panel
    - import example HRS file.R: load a fixed-width file using only the sas importation script directly into ram with SAScii (http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html), parse through the IF block at the bottom of the sas importation script, blank out a number of variables, and save the file as an R data file (.rda) for fast loading later
    - replicate 2002 regression.R: connect to the sql database created by the 'import longitudinal RAND contributed files' program, create a database-backed complex sample survey object using a taylor-series linearization design, and exactly match the final regression shown in this document provided by analysts at RAND as an update of the regression on pdf page B76 of this document
    click here to view these five scripts. for more detail about the health and retirement study (hrs), visit: michigan's hrs homepage, rand's hrs homepage, the hrs wikipedia page, and a running list of publications using hrs. notes: exemplary work making it this far. as a reward, here's the detailed codebook for the main rand hrs file. note that rand also creates 'flat files' for every survey wave, but really, most every analysis you can think of is possible using just the four files imported with the rand importation script above. if you must work with the non-rand files, there's an example of how to import a single hrs (umich-created) file, but if you wish to import more than one, you'll have to write some for loops yourself. confidential to sas, spss, stata, and sudaan users: a tidal wave is coming. you can get water up your nose and be dragged out to sea, or you can grab a surf board. time to transition to r. :D
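    A minimal sketch of a database-backed, taylor-series-linearized survey design of the kind these scripts build; the table and variable names (hrs_rand, raehsamp, raestrat, r9wtresp, r9agey_e) are illustrative placeholders rather than the exact names used in the repository:

    ```r
    library(survey)

    hrs.design <-
        svydesign(
            ids = ~raehsamp,        # sampling cluster
            strata = ~raestrat,     # sampling stratum
            weights = ~r9wtresp,    # respondent weight for one wave
            nest = TRUE,
            data = "hrs_rand",      # table inside the SQLite database
            dbtype = "SQLite",
            dbname = "hrs.db"
        )

    # weighted mean of a hypothetical age variable from the database-backed design
    svymean(~r9agey_e, hrs.design, na.rm = TRUE)
    ```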

  8. Supplemental R Scripts and output files for Designing, understanding and modelling two-phase experiments with human subjects

    • researchdata.edu.au
    Updated Apr 7, 2022
    Cite
    Chris Brien (2022). Supplemental R Scripts and output files for Designing, understanding and modelling two-phase experiments with human subjects. Statistical Methods in Medical Research, 31(4), 626-645. [Dataset]. http://doi.org/10.25909/13135052.V1
    Explore at:
    Dataset updated
    Apr 7, 2022
    Dataset provided by
    The University of Adelaide
    Authors
    Chris Brien
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    R scripts and their pdf output files are provided for designing and analyzing the Farewell and Herzberg (2003) example of a human training experiment and for producing an alternative design that employs a row-column design for the second phase. (Also available with the peer-reviewed publication linked on the right.)


    The following files are available here:
    FHdesign.r:
    R script to produce and investigate the Farewell and Herzberg Plaid design (output in FHdesign.pdf).
    FHanal.r:
    R script to conduct a mixed-model analysis of the Farewell and Herzberg data (output in FHanal.pdf). This script requires the R data file plaid.dat.rda that contains the Farewell and Herzberg data in a suitable format and the file global.r that is sourced by FHanal.r and sets constants and functions used in the script.
    FHpower.r:
    R script to do a power simulation for the Farewell and Herzberg design (output in FHpower.pdf). This script requires the R data file plaid.dat.rda that contains the Farewell and Herzberg design and the file globalPower.r that sets constants and functions used in the script.
    globalPower.r:
    R script that sets constants and functions. It is sourced by the scripts FHpower.r and Altpower.r.
    AltRowColDesign.r:
    R script to produce and investigate an alternative second-phase design employing a row-column design and groups of patients assigned to different raters (output in AltRowColDesign.pdf).
    Altpower.r:
    R script to do a power simulation for the row-column design (output in Altpower.pdf). This script requires the R data file AltRowColDesign.RData, produced by AltRowColDesign.r, and the file globalPower.r that sets constants and functions used in the script (a simulation sketch follows this list).
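    In the spirit of the power-simulation scripts listed above (FHpower.r, Altpower.r), a minimal self-contained sketch of power estimation by simulation for a mixed-model analysis; the design, effect size, and variance components are hypothetical:

    ```r
    library(lme4)

    set.seed(1)
    design <- expand.grid(rater = factor(1:4), patient = factor(1:16))
    design$treat <- factor(rep(c("A", "B"), length.out = nrow(design)))

    simulate.once <- function(effect = 0.5, sd.patient = 1, sd.error = 1) {
      u <- rnorm(nlevels(design$patient), sd = sd.patient)
      y <- effect * (design$treat == "B") + u[design$patient] +
           rnorm(nrow(design), sd = sd.error)
      fit <- lmer(y ~ treat + (1 | patient), data = design)
      abs(coef(summary(fit))["treatB", "t value"]) > 2   # crude Wald criterion
    }

    mean(replicate(200, simulate.once()))                # estimated power
    ```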

  9. Assessing the impact of hints in learning formal specification: Research artifact

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jan 29, 2024
    Cite
    Macedo, Nuno; Cunha, Alcino; Campos, José Creissac; Sousa, Emanuel; Margolis, Iara (2024). Assessing the impact of hints in learning formal specification: Research artifact [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_10450608
    Explore at:
    Dataset updated
    Jan 29, 2024
    Dataset provided by
    Centro de Computação Gráfica
    INESC TEC
    Authors
    Macedo, Nuno; Cunha, Alcino; Campos, José Creissac; Sousa, Emanuel; Margolis, Iara
    License

    MIT License, https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    This artifact accompanies the SEET@ICSE article "Assessing the impact of hints in learning formal specification", which reports on a user study investigating the impact of different types of automated hints while learning a formal specification language, both in terms of immediate performance and learning retention and in terms of the students' emotional response. This research artifact provides all the material required to replicate this study (except for the proprietary questionnaires used to assess the emotional response and user experience), as well as the collected data and the data analysis scripts used for the discussion in the paper.

    Dataset

    The artifact contains the resources described below.

    Experiment resources

    The resources needed for replicating the experiment, namely in directory experiment:

    alloy_sheet_pt.pdf: the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment. The sheet was passed in Portuguese due to the population of the experiment.

    alloy_sheet_en.pdf: a version of the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment, translated into English.

    docker-compose.yml: a Docker Compose configuration file to launch Alloy4Fun populated with the tasks in directory data/experiment for the 2 sessions of the experiment.

    api and meteor: directories with source files for building and launching the Alloy4Fun platform for the study.

    Experiment data

    The task database used in our application of the experiment, namely in directory data/experiment:

    Model.json, Instance.json, and Link.json: JSON files to populate Alloy4Fun with the tasks for the 2 sessions of the experiment.

    identifiers.txt: the list of all (104) available participant identifiers that can participate in the experiment.

    Collected data

    Data collected in the application of the experiment as a simple one-factor randomised experiment in 2 sessions involving 85 undergraduate students majoring in CSE. The experiment was validated by the Ethics Committee for Research in Social and Human Sciences of the Ethics Council of the University of Minho, where the experiment took place. Data is shared in the shape of JSON and CSV files with a header row, namely in directory data/results:

    data_sessions.json: data collected from task-solving in the 2 sessions of the experiment, used to calculate variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1).

    data_socio.csv: data collected from socio-demographic questionnaire in the 1st session of the experiment, namely:

    participant identification: participant's unique identifier (ID);

    socio-demographic information: participant's age (AGE), sex (SEX, 1 through 4 for female, male, prefer not to disclose, and other, respectively), and average academic grade (GRADE, from 0 to 20; NA denotes preference not to disclose).

    data_emo.csv: detailed data collected from the emotional questionnaire in the 2 sessions of the experiment, namely:

    participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);

    detailed emotional response data: the differential in the 5-point Likert scale for each of the 14 measured emotions in the 2 sessions, ranging from -5 to -1 if decreased, 0 if maintained, from 1 to 5 if increased, or NA denoting failure to submit the questionnaire. Half of the emotions are positive (Admiration1 and Admiration2, Desire1 and Desire2, Hope1 and Hope2, Fascination1 and Fascination2, Joy1 and Joy2, Satisfaction1 and Satisfaction2, and Pride1 and Pride2), and half are negative (Anger1 and Anger2, Boredom1 and Boredom2, Contempt1 and Contempt2, Disgust1 and Disgust2, Fear1 and Fear2, Sadness1 and Sadness2, and Shame1 and Shame2). This detailed data was used to compute the aggregate data in data_emo_aggregate.csv and in the detailed discussion in Section 6 of the paper.

    data_umux.csv: data collected from the user experience questionnaires in the 2 sessions of the experiment, namely:

    participant identification: participant's unique identifier (ID);

    user experience data: summarised user experience data from the UMUX surveys (UMUX1 and UMUX2, as a usability metric ranging from 0 to 100).

    participants.txt: the list of participant identifiers that have registered for the experiment.

    Analysis scripts

    The analysis scripts required to replicate the analysis of the results of the experiment as reported in the paper, namely in directory analysis:

    analysis.r: An R script to analyse the data in the provided CSV files; each performed analysis is documented within the file itself.

    requirements.r: An R script to install the required libraries for the analysis script.

    normalize_task.r: A Python script to normalize the task JSON data from file data_sessions.json into the CSV format required by the analysis script.

    normalize_emo.r: A Python script to compute the aggregate emotional response in the CSV format required by the analysis script from the detailed emotional response data in the CSV format of data_emo.csv.

    Dockerfile: Docker script to automate the analysis script from the collected data.

    Setup

    To replicate the experiment and the analysis of the results, only Docker is required.

    If you wish to manually replicate the experiment and collect your own data, you'll need to install:

    A modified version of the Alloy4Fun platform, which is built in the Meteor web framework. This version of Alloy4Fun is publicly available in branch study of its repository at https://github.com/haslab/Alloy4Fun/tree/study.

    If you wish to manually replicate the analysis of the data collected in our experiment, you'll need to install:

    Python to manipulate the JSON data collected in the experiment. Python is freely available for download at https://www.python.org/downloads/, with distributions for most platforms.

    R software for the analysis scripts. R is freely available for download at https://cran.r-project.org/mirrors.html, with binary distributions available for Windows, Linux and Mac.

    Usage

    Experiment replication

    This section describes how to replicate our user study experiment, and collect data about how different hints impact the performance of participants.

    To launch the Alloy4Fun platform populated with tasks for each session, just run the following commands from the root directory of the artifact. The Meteor server may take a few minutes to launch, wait for the "Started your app" message to show.

    cd experiment
    docker-compose up

    This will launch Alloy4Fun at http://localhost:3000. The tasks are accessed through permalinks assigned to each participant. The experiment allows for up to 104 participants, and the list of available identifiers is given in file identifiers.txt. The group of each participant is determined by the last character of the identifier, either N, L, E or D. The task database can be consulted in directory data/experiment, in Alloy4Fun JSON files.

    In the 1st session, each participant was given one permalink that gives access to 12 sequential tasks. The permalink is simply the participant's identifier, so participant 0CAN would just access http://localhost:3000/0CAN. The next task is available after a correct submission to the current task or when a time-out occurs (5 mins). Each participant was assigned to a different treatment group, so depending on the permalink different kinds of hints are provided. Below are 4 permalinks, one for each hint group:

    Group N (no hints): http://localhost:3000/0CAN

    Group L (error locations): http://localhost:3000/CA0L

    Group E (counter-example): http://localhost:3000/350E

    Group D (error description): http://localhost:3000/27AD

    In the 2nd session, as in the 1st session, each permalink gave access to 12 sequential tasks, and the next task is available after a correct submission or a time-out (5 mins). The permalink is constructed by prepending the participant's identifier with P-, so participant 0CAN would access http://localhost:3000/P-0CAN. In the 2nd session all participants were expected to solve the tasks without any hints provided, so the permalinks from different groups are undifferentiated.

    Before the 1st session the participants should answer the socio-demographic questionnaire, that should ask the following information: unique identifier, age, sex, familiarity with the Alloy language, and average academic grade.

    Before and after both sessions the participants should answer the standard PrEmo 2 questionnaire. PrEmo 2 is published under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons licence (CC BY-NC-ND 4.0). This means that you are free to use the tool for non-commercial purposes as long as you give appropriate credit, provide a link to the license, and do not modify the original material. The original material, namely the depictions of the different emotions, can be downloaded from https://diopd.org/premo/. The questionnaire should ask for the unique user identifier, and for the attachment with each of the depicted 14 emotions, expressed in a 5-point Likert scale.

    After both sessions the participants should also answer the standard UMUX questionnaire. This questionnaire can be used freely, and should ask for the user unique identifier and answers for the standard 4 questions in a 7-point Likert scale. For information about the questions, how to implement the questionnaire, and how to compute the usability metric ranging from 0 to 100 score from the answers, please see the original paper:

    Kraig Finstad. 2010. The usability metric for user experience. Interacting with computers 22, 5 (2010), 323–327.
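    A minimal sketch of computing the 0-100 usability metric from the four 7-point UMUX items, assuming the usual scoring convention from Finstad (2010) (odd items positively worded, scored as response - 1; even items negatively worded, scored as 7 - response); the argument names are hypothetical:

    ```r
    umux_score <- function(q1, q2, q3, q4) {
      ((q1 - 1) + (7 - q2) + (q3 - 1) + (7 - q4)) * 100 / 24
    }

    # example: one participant's answers to the four items
    umux_score(q1 = 6, q2 = 2, q3 = 7, q4 = 1)   # ~ 91.7
    ```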

    Analysis of other applications of the experiment

    This section describes how to replicate the analysis of the data collected in an application of the experiment described in Experiment replication.

    The analysis script expects data in 4 CSV files,

  10. Data from Acoustic Primer Exercises: A Tutorial for Landscape Ecologists

    • figshare.com
    • search.datacite.org
    zip
    Updated May 31, 2023
    Cite
    Luis J. Villanueva-Rivera (2023). Data from Acoustic Primer Exercises: A Tutorial for Landscape Ecologists [Dataset]. http://doi.org/10.6084/m9.figshare.1040423.v1
    Explore at:
    zip (available download formats)
    Dataset updated
    May 31, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Luis J. Villanueva-Rivera
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data is part of the tutorial supplement to the paper "A Primer on Acoustic Analysis for Landscape Ecologists" by Villanueva-Rivera et al. featured in the Landscape Ecology special issue entitled "Soundscape Ecology" (vol. 26, pages 1233-1246, doi: 10.1007/s10980-011-9636-9). Accordingly, the exercises in the tutorial are meant to be undertaken while reading the article.

    Primer_Tutorial_1.3.1.pdf - pdf of the tutorial, version 1.3.1 (24june2014)
    Exercise1.zip - Files for exercise 1
    Exercise2.zip - Files for exercise 2
    The following zip files contain 1-minute versions of the files for exercise 3 (the original files were 15 minutes long). Each site was divided in 4 files:
    Ag1_1min_[number].zip - Files from the Ag1 site
    Ag2_1min_[number].zip - Files from the Ag2 site
    FNRFarm_1min_[number].zip - Files from the FNR Farm site
    Martell_1min_[number].zip - Files from the Martell site
    McCormick_1min_[number].zip - Files from the McCormick site
    PurdueWildlife_1min_[number].zip - Files from the Purdue Wildlife site
    Ross_1min_[number].zip - Files from the Ross site

    This dataset was revised on 26Jun2014 to correct the date of the Tutorial v 1.3.1.
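    A minimal sketch, assuming the tuneR and seewave packages, of loading one of the 1-minute recordings and inspecting its spectrogram before working through the exercises; the WAV filename is a placeholder:

    ```r
    library(tuneR)
    library(seewave)

    wav <- readWave("Ag1_1min_1.wav")   # placeholder file from Ag1_1min_[number].zip
    spectro(wav, flim = c(0, 10))       # spectrogram up to 10 kHz
    ```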

  11. Bike Sharing Data Analysis with R

    • kaggle.com
    zip
    Updated Sep 28, 2021
    Cite
    stanley888cy (2021). Bike Sharing Data Analysis with R [Dataset]. https://www.kaggle.com/stanley888cy/google-project-01
    Explore at:
    zip (189322255 bytes; available download formats)
    Dataset updated
    Sep 28, 2021
    Authors
    stanley888cy
    Description

    What is this? In this case study, I use bike-share company data to evaluate riding behaviour between members and casual riders, determine if there are any trends or patterns, and theorize what is causing them. I am then able to develop a recommendation based on those findings.

    Content: Hi. This is my first data analysis project and also my first time using R in my work. It is the capstone project for the Google Data Analytics Certificate course offered on Coursera (https://www.coursera.org/professional-certificates/google-data-analytics). It is an operational data analysis of a fictional bike-share company in Chicago. For the detailed background story, please check the pdf file (Case 01.pdf) for reference.

    In this case study, I use bike-share company data to evaluate riding behaviour between members and casual riders, determine if there are any trends or patterns, and theorize what is causing them through descriptive analysis. I am then able to develop a recommendation based on those findings.

    First, I give a background introduction, my business tasks and objectives, and how I obtained the data sources for analysis. This is followed by the R code I worked on in RStudio for data processing, cleaning and generating graphs for the subsequent analysis. Next come my analyses of the bike data, with graphs and charts generated with R ggplot2. At the end, I also provide some recommendations for the business tasks, based on the data findings.
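    A minimal sketch of the member-versus-casual comparison described above; the filename and column names (member_casual, ride_length_min) are hypothetical placeholders for the cleaned trip data:

    ```r
    library(dplyr)
    library(ggplot2)

    trips <- read.csv("divvy_trips_clean.csv")    # placeholder filename

    trips %>%
      group_by(member_casual) %>%
      summarise(mean_ride_min = mean(ride_length_min, na.rm = TRUE)) %>%
      ggplot(aes(x = member_casual, y = mean_ride_min, fill = member_casual)) +
      geom_col() +
      labs(x = "Rider type", y = "Average ride length (min)")
    ```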

    I understand that I am new to data analysis and that my skills and code are at a very beginner level, but I am working hard to learn more in both R and the data science field. If you have any ideas or feedback, please feel free to comment.

    Stanley Cheng 2021-09-30

  12. 96 wells fluorescence reading and R code statistic for analysis

    • zenodo.org
    bin, csv, doc, pdf
    Updated Aug 2, 2024
    Cite
    JVD Molino; JVD Molino (2024). 96 wells fluorescence reading and R code statistic for analysis [Dataset]. http://doi.org/10.5281/zenodo.1119285
    Explore at:
    doc, csv, pdf, bin (available download formats)
    Dataset updated
    Aug 2, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    JVD Molino; JVD Molino
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Overview

    Data points present in this dataset were obtained through the following steps: To assess the secretion efficiency of the constructs, 96 colonies from the selection plates were evaluated using the workflow presented in Figure Workflow. We picked transformed colonies and cultured them in 400 μL TAP medium for 7 days in deep-well plates (Corning Axygen®, No.: PDW500CS, Thermo Fisher Scientific Inc., Waltham, MA), covered with Breathe-Easy® (Sigma-Aldrich®). Cultivation was performed on a rotary shaker, set to 150 rpm, under constant illumination (50 μmol photons/m2s). Then a 100 μL sample was transferred to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA) and fluorescence was measured using an Infinite® M200 PRO plate reader (Tecan, Männedorf, Switzerland). Fluorescence was measured at excitation 575/9 nm and emission 608/20 nm. Supernatant samples were obtained by spinning the deep-well plates at 3000 × g for 10 min and transferring 100 μL from each well to a clear-bottom 96-well plate (Corning Costar, Tewksbury, MA, USA), followed by fluorescence measurement. To compare the constructs, R version 3.3.3 was used to perform one-way ANOVA (with Tukey's test), and to test statistical hypotheses the significance level was set at 0.05. Graphs were generated in RStudio v1.0.136. The code is deposited herein.
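    A minimal sketch of the one-way ANOVA with Tukey's test described above, run on one of the deposited CSV files; the column names (fluorescence, construct) are assumptions about the file layout:

    ```r
    dat <- read.csv("sup_raw.csv")
    dat$construct <- factor(dat$construct)    # assumed column layout

    fit <- aov(fluorescence ~ construct, data = dat)
    summary(fit)                              # one-way ANOVA table
    TukeyHSD(fit, conf.level = 0.95)          # pairwise Tukey comparisons
    ```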

    Info

    ANOVA_Turkey_Sub.R -> code for ANOVA analysis in R statistic 3.3.3

    barplot_R.R -> code to generate bar plot in R statistic 3.3.3

    boxplotv2.R -> code to generate boxplot in R statistic 3.3.3

    pRFU_+_bk.csv -> relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    sup_+_bl.csv -> supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    sup_raw.csv -> supernatant mCherry fluorescence dataset of 96 colonies for each construct.

    who_+_bl2.csv -> whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    who_raw.csv -> whole culture mCherry fluorescence dataset of 96 colonies for each construct.

    who_+_Chlo.csv -> whole culture chlorophyll fluorescence dataset of 96 colonies for each construct.

    Anova_Output_Summary_Guide.pdf -> Explain the ANOVA files content

    ANOVA_pRFU_+_bk.doc -> ANOVA of relative supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_sup_+_bk.doc -> ANOVA of supernatant mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_who_+_bk.doc -> ANOVA of whole culture mCherry fluorescence dataset of positive colonies, blanked with parental wild-type cc1690 cell of Chlamydomonas reinhardtii

    ANOVA_Chlo.doc -> ANOVA of whole culture chlorophyll fluorescence of all constructs, plus average and standard deviation values.

    Consider citing our work.

    Molino JVD, de Carvalho JCM, Mayfield SP (2018) Comparison of secretory signal peptides for heterologous protein expression in microalgae: Expanding the secretion portfolio for Chlamydomonas reinhardtii. PLoS ONE 13(2): e0192433. https://doi.org/10.1371/journal.pone.0192433

  13. Data from: Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies

    • data.niaid.nih.gov
    • zenodo.org
    • +1more
    zip
    Updated Dec 7, 2023
    + more versions
    Cite
    Dylan Westfall; Mullins James (2023). Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies [Dataset]. http://doi.org/10.5061/dryad.w3r2280w0
    Explore at:
    zip (available download formats)
    Dataset updated
    Dec 7, 2023
    Dataset provided by
    HIV Prevention Trials Network (http://www.hptn.org/)
    HIV Vaccine Trials Network (http://www.hvtn.org/)
    National Institute of Allergy and Infectious Diseases (http://www.niaid.nih.gov/)
    PEPFAR
    Authors
    Dylan Westfall; Mullins James
    License

    https://spdx.org/licenses/CC0-1.0.html

    Description

    Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimizations to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR, and the use of UMIs allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing to produce a highly accurate consensus sequence from each template. Handling of the large datasets produced from SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), that automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.

    Methods

    This serves as an overview of the analysis performed on PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies". Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005.

    For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub.

    The demultiplexed read collections from the chunked_demux pipeline or CCS read files from datasets which were not indexed (M1567, M004, M005) were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer) and further demultiplexes the reads to prepare data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz) as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub.

    The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz). Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Indentifying_Recombinant_Reads, and Figures. Each has an .Rmd file with the same name inside, which is used to collect, summarize, and analyze the data. All of these collections of code were written and executed in RStudio to track notes and summarize results.

    Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample that pass various criteria. These are used to help create Table 2 and as input for Indentifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, 2 fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py, which identifies any matched sequences that are different between the sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program.

    To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed and, for each family with discordant sUMI and dUMI sequences, the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence. By examining the alignment and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper.

    Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file from each sample was extracted from all of the tagged.tar.gz files, combined, and used as input to create a single dataset containing all UMI information from all samples. This file, dUMI_df.csv, was used as input for Figures.Rmd. Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each Figure. These were copied into Prism software to create the final figures for the paper.
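    A minimal sketch, in the spirit of Sequence_Analysis.Rmd, of decompressing the per-dataset consensus archives and counting sequences per FASTA file; the paths and the ">" header convention are assumptions about the archive layout:

    ```r
    archives <- list.files("Pipeline_Outputs", pattern = "^consensus_.*\\.tar\\.gz$",
                           full.names = TRUE)
    for (a in archives) untar(a, exdir = "consensus_all")

    fastas <- list.files("consensus_all", pattern = "\\.fasta$",
                         recursive = TRUE, full.names = TRUE)
    seq.counts <- sapply(fastas, function(f) sum(startsWith(readLines(f), ">")))
    head(sort(seq.counts, decreasing = TRUE))
    ```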

  14. Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden"

    • zenodo.org
    • data.niaid.nih.gov
    • +1more
    zip
    Updated Nov 21, 2024
    Cite
    Katharina Zinke; Katharina Zinke (2024). Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" [Dataset]. http://doi.org/10.5281/zenodo.14196539
    Explore at:
    zip (available download formats)
    Dataset updated
    Nov 21, 2024
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Katharina Zinke; Katharina Zinke
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Area covered
    Dresden
    Description

    Data supporting the Master thesis "Monitoring von Open Data Praktiken - Herausforderungen beim Auffinden von Datenpublikationen am Beispiel der Publikationen von Forschenden der TU Dresden" (Monitoring open data practices - challenges in finding data publications using the example of publications by researchers at TU Dresden) - Katharina Zinke, Institut für Bibliotheks- und Informationswissenschaften, Humboldt-Universität Berlin, 2023

    This ZIP-File contains the data the thesis is based on, interim exports of the results and the R script with all pre-processing, data merging and analyses carried out. The documentation of the additional, explorative analysis is also available. The actual PDFs and text files of the scientific papers used are not included as they are published open access.

    The folder structure is shown below with the file names and a brief description of the contents of each file. For details concerning the analysis approach, please refer to the master's thesis (publication to follow soon).

    ## Data sources

    Folder 01_SourceData/

    - PLOS-Dataset_v2_Mar23.csv (PLOS-OSI dataset)

    - ScopusSearch_ExportResults.csv (export of Scopus search results from Scopus)

    - ScopusSearch_ExportResults.ris (export of Scopus search results from Scopus)

    - Zotero_Export_ScopusSearch.csv (export of the file names and DOIs of the Scopus search results from Zotero)

    ## Automatic classification

    Folder 02_AutomaticClassification/

    - (NOT INCLUDED) PDFs folder (Folder for PDFs of all publications identified by the Scopus search, named AuthorLastName_Year_PublicationTitle_Title)

    - (NOT INCLUDED) PDFs_to_text folder (Folder for all texts extracted from the PDFs by ODDPub, named AuthorLastName_Year_PublicationTitle_Title)

    - PLOS_ScopusSearch_matched.csv (merge of the Scopus search results with the PLOS_OSI dataset for the files contained in both)

    - oddpub_results_wDOIs.csv (results file of the ODDPub classification)

    - PLOS_ODDPub.csv (merge of the results file of the ODDPub classification with the PLOS-OSI dataset for the publications contained in both)

    ## Manual coding

    Folder 03_ManualCheck/

    - CodeSheet_ManualCheck.txt (Code sheet with descriptions of the variables for manual coding)

    - ManualCheck_2023-06-08.csv (Manual coding results file)

    - PLOS_ODDPub_Manual.csv (Merge of the results file of the ODDPub and PLOS-OSI classification with the results file of the manual coding)

    ## Explorative analysis for the discoverability of open data

    Folder 04_FurtherAnalyses/

    Proof_of_of_Concept_Open_Data_Monitoring.pdf (Description of the explorative analysis of the discoverability of open data publications using the example of a researcher) - in German

    ## R-Script

    Analyses_MA_OpenDataMonitoring.R (R-Script for preparing, merging and analyzing the data and for performing the ODDPub algorithm)
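    A minimal sketch of the kind of merge performed in Analyses_MA_OpenDataMonitoring.R, joining the ODDPub results with the PLOS-OSI dataset; the join key and flag column (doi, is_open_data) are hypothetical placeholders:

    ```r
    oddpub <- read.csv("02_AutomaticClassification/oddpub_results_wDOIs.csv")
    plos   <- read.csv("01_SourceData/PLOS-Dataset_v2_Mar23.csv")

    merged <- merge(oddpub, plos, by = "doi")   # keep publications present in both
    table(merged$is_open_data)                  # hypothetical ODDPub flag
    ```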

  15. Supplement 1. R code and instructions to implement coincidence rate method for abundance estimation

    • datasetcatalog.nlm.nih.gov
    • wiley.figshare.com
    Updated Aug 10, 2016
    Cite
    O'Keefe, Joy M.; Walters, Brianne; Clement, Matthew J. (2016). Supplement 1. R code and instructions to implement coincidence rate method for abundance estimation. [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001585130
    Explore at:
    Dataset updated
    Aug 10, 2016
    Authors
    O'Keefe, Joy M.; Walters, Brianne; Clement, Matthew J.
    Description

    File List
    coincidence_functions.R (MD5: 3e84e1aee18a63675a591b86106d7a44)
    guide_to_computer_code.pdf (MD5: c116f7a3067ffa55be45e3dc6e8a11fb)
    example_data1.txt (MD5: f4a0fa3b1d63de0c534cef1462545da2)
    example_data2.txt (MD5: 4a3469bb0cdb2de0c293475bc95d1eaf)
    example_data3.txt (MD5: 3329f73cf552b84a9354305cc91dba9f)
    example_data4.txt (MD5: 63df7645a93420a0dce2b3e134c3015f)
    Description
    coincidence_functions.R – R code to simulate data, estimate abundance, and conduct power analysis, as described in the main text
    guide_to_computer_code.pdf – Explains how to use the R code
    example_data1.txt – example Indiana bat data used in text
    example_data2.txt – example Indiana bat data used in text
    example_data3.txt – example Indiana bat data used in text
    example_data4.txt – example Indiana bat data used in text
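    A minimal sketch of getting started with this supplement: source the function definitions and load one of the example Indiana bat files. The read.table arguments are assumptions about the file layout, and the abundance-estimation calls themselves are documented in guide_to_computer_code.pdf:

    ```r
    source("coincidence_functions.R")

    bats <- read.table("example_data1.txt", header = TRUE)   # assumed header row
    str(bats)   # inspect the example data before following the guide
    ```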

  16. Identification of a human blood biomarker of pharmacological 11β-hydroxysteroid dehydrogenase 1 inhibition

    • data-staging.niaid.nih.gov
    • data.niaid.nih.gov
    • +1more
    Updated Jul 11, 2024
    Cite
    Gómez, Cristina; Alimajstorovic, Zerin; Othonos, Nantia; Winter, Denise V.; White, Sarah; Lavery, Gareth G.; Tomlinson, Jeremy W.; Sinclair, Alexandra J.; Odermatt, Alex (2024). Identification of a human blood biomarker of pharmacological 11β-hydroxysteroid dehydrogenase 1 inhibition [Dataset]. https://data-staging.niaid.nih.gov/resources?id=zenodo_8403031
    Explore at:
    Dataset updated
    Jul 11, 2024
    Dataset provided by
    Metabolic Neurology, Institute of Metabolism and Systems Research, University of Birmingham, Birmingham, B15 2TT, UK
    Department for Biosciences, Nottingham Trent University, Nottingham, NG11 8NS UK
    Oxford Centre for Diabetes, Endocrinology and Metabolism, NIHR Oxford Biomedical Research Centre, University of Oxford, Churchill Hospital, Oxford, OX3 7LE UK
    Division of Molecular and Systems Toxicology, Department of Pharmaceutical Sciences, University of Basel, 4056 Basel, Switzerland
    Authors
    Gómez, Cristina; Alimajstorovic, Zerin; Othonos, Nantia; Winter, Denise V.; White, Sarah; Lavery, Gareth G.; Tomlinson, Jeremy W.; Sinclair, Alexandra J.; Odermatt, Alex
    License

    Attribution 4.0 (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data sets of "Identification of a human blood biomarker of pharmacological 11β-hydroxysteroid dehydrogenase 1 inhibition".

    The dataset (derived from https://doi.org/10.1111/bph.16251) contains the original figures and tables in PNG format (10.1111_bph.16251_Figure 1-4.PNG; 10.1111_bph.16251_Table1-2.PNG, and the supplemental information 10.1111_bph.16251_FigS1-S2.PNG; 10.1111_bph.16251_TableS1-S6.PNG), as well as the graphical abstract.PNG.

    Corresponding raw data and the subsequent data analysis obtained from LC-MS/MS analysis, reused data on THF, THE and allo-THE and clinical parameters, as well as the statistical evaluation of the obtained data, are provided as raw files and corresponding metadata files (FAIR principle).

    Fig 2:

    Two files in CSV format (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_4_3.csv and 310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_28_1.csv). A detailed description of the LC-MS/MS method is provided in PDF format (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_4_M_2.pdf). All further experiment-related information is provided as one metadata file (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_4_3_M.txt) in txt format and two files containing further related information (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_M_A1-2.pdf) in PDF format.

    Fig 3:

    Two files in CSV format (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_4_4.csv; 310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_28_2.csv). One metadata file in PDF format with a detailed LC-MS/MS method description (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_4_M_2.pdf). All further related information is provided as one metadata file in TXT format (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_4_4_M.txt). Cohort B-related information is provided as three PDF files (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_M_B1-3).

    Fig 4:

    Six files in CSV format (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_4_3-5.csv, 310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_26_1.csv and 214978_10.1111_bph.16251_CGC_Human_Biomarker_28_1-2.csv). Two metadata files in PDF format with detailed LC-MS/MS method descriptions (214978_10.1111_bph.16251_CGC_Human_Biomarker_4_M_2-3.pdf). All further related information is provided as one metadata file in TXT format (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_4_3-5_M.txt). Cohort-related information is provided as five PDF files (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_M_A1-2 and B1-3.pdf). The statistical analysis is provided as an R file (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_26_M_1_1.R).
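    The R file named above contains the archived statistical evaluation; without its contents, only a generic sketch of opening the accompanying CSV in R can be given here. The column names in the commented line are invented placeholders, not the actual variables.

        ## Hypothetical sketch; the real analysis is in
        ## 310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_26_M_1_1.R.
        dat <- read.csv("310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_26_1.csv")
        str(dat)                                   # inspect the measured variables first
        ## e.g. summarise an assumed concentration column by an assumed group column:
        ## aggregate(concentration ~ group, data = dat, FUN = median)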

    Tab 1:

    One file in CSV format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_1.csv). One metadata file in PDF format with a detailed LC-MS/MS method description (214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_M_1.pdf). All further related information is provided as one metadata file in TXT format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_1_M.txt). Cohort A-related information is provided as two PDF files (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_M_A1-2).

    Tab 2:

    One file in CSV format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_2.csv). One metadata file in PDF format with a detailed LC-MS/MS method description (214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_M_1.pdf). All further related information is provided as one metadata file in TXT format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_2_M.txt). Cohort B-related information is provided as three PDF files (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_M_B1-3).

    Fig S1:

    Four files in CSV format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_3-4.csv and 214978_10.1111_BPH.16251_CGC_Human_Biomarker_28_1-2.csv). One metadata file in PDF format with a detailed LC-MS/MS method description (214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_M_2.pdf). All further related information is provided as one metadata file in TXT format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_3-4_M.txt). Cohort A- and B-related information is provided as five PDF files (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_M_A1-2 and B1-3).

    Fig S2:

    Five files in CSV format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_3-4.csv, 310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_26_2.csv and 310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_28_1-2.csv) and one R file (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_26_2_M1.R). One metadata file in PDF format provides a detailed LC-MS/MS method description (214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_M_2.pdf). All further related information is provided as one metadata file in TXT format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_26_2_M.txt). Cohort A- and B-related information is provided as five PDF files (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_M_A1-2 and B1-3).

    Tab S1:

    One file in CSV format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_28_3.csv). All further related information is provided as one metadata file in TXT format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_28_3_M.txt). Cohort A-related information is provided as two PDF files (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_M_A1-2).

    Tab S2:

    One file in CSV format (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_28_4.csv). All further related information is provided as one metadata file in TXT format (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_28_4_M.txt). Cohort B-related information is provided as three PDF files (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_M_B1-3).

    Tab S3:

    One file in CSV format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_3.csv). A detailed description of the LC-MS/MS method is provided in PDF format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_M_2.pdf). All further experiment-related information is provided as one metadata file in TXT format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_26_3_M.txt) and two PDF files with further related information (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_M_A1-2.pdf).

    Tab S4:

    Three files in CSV format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_5.csv and 310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_26_1-2.csv). A detailed description of the LC-MS/MS methods is provided as two PDF files (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_M_2-3.pdf). All further experiment-related information is provided as one metadata file in TXT format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_26_4_M.txt). Cohort-related information is provided as two PDF files (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_M_A1-2). The statistical analysis is provided as an R file (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_26_M_1.R).

    Tab S5:

    One file in CSV format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_4.csv). One metadata file in PDF format with a detailed LC-MS/MS method description (214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_M_2.pdf). All further related information is provided as one metadata file in TXT format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_26_5_M.txt). Cohort B-related information is provided as three PDF files (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_M_B1-3).

    Tab S6:

    Four files in CSV format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_4-5.csv, 310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_28_4.csv and 310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_26_1.csv). One metadata file in PDF format with a detailed LC-MS/MS method description (214978_10.1111_BPH.16251_CGC_Human_Biomarker_4_M_2-3.pdf). All further related information is provided as one metadata file in TXT format (310030-214978_10.1111_BPH.16251_CGC_Human_Biomarker_26_6_M.txt). Cohort B-related information is provided as three PDF files (310030-214978_10.1111_bph.16251_CGC_Human_Biomarker_M_B1-3).

  17. H

    Replication Data for: Responsiveness of decision-makers to stakeholder...

    • dataverse.harvard.edu
    • dataone.org
    • +1more
    Updated May 11, 2023
    Cite
    Yuxuan Lei (2023). Replication Data for: Responsiveness of decision-makers to stakeholder preferences in the European Union legislative process [Dataset]. http://doi.org/10.7910/DVN/RH5H3H
    Explore at:
    CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    May 11, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Yuxuan Lei
    License

    CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Area covered
    European Union
    Description

    This dataset contains the original quantitative data files, analysis data, a codebook, R scripts, syntax for replication, the original RStudio output, and figures from a statistical program. The analyses can be found in Chapter 5 of my PhD dissertation, ‘Political Factors Affecting the EU Legislative Decision-Making Speed’. The data supporting the findings of this study are accessible and replicable; restrictions apply to the availability of these data, which were used under license for this study. The data files include:
    R script: Chapter 5 script.R
    Syntax for replication: Syntax for replication 5.0.docx
    Original RStudio output: The original output 5.0.pdf
    Codebook: Codebook 5.0.txt
    Analysis data: data5.0.xlsx
    Dataset: Original quantitative data for Chapter 5.xlsx
    Dataset: Codebook of policy responsiveness.pdf
    Figures: Chapter 5 Figures.zip
    Data analysis software: RStudio with R version 4.1.0 (2021-05-18) -- "Camp Pontanezen", Copyright (C) 2021 The R Foundation for Statistical Computing, Platform: x86_64-apple-darwin17.0 (64-bit)
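    The listing above names everything needed for replication. A plausible session is sketched below; the internal structure of the script is not documented here, so treat this as an illustration rather than the author's actual procedure.

        ## Hypothetical replication session under R 4.1.0, as stated in the listing.
        # install.packages("readxl")               # if the package is not yet installed
        library(readxl)
        d5  <- read_excel("data5.0.xlsx")                                  # analysis data
        raw <- read_excel("Original quantitative data for Chapter 5.xlsx") # original data
        source("Chapter 5 script.R")               # runs the Chapter 5 analyses end to end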

  18. There are five documents included. The first PDF document contains the data...

    • figshare.com
    pdf
    Updated Jun 24, 2025
    Cite
    Hengzhi Hu (2025). There are five documents included. The first PDF document contains the data analysis scripts and the analysis statement from the educational authority that regulates the research. Each page in this document is stamped for verification purposes. The second TXT document contains the data analysis scripts obtained from R program. The script is exactly the same as the one included in the first document. The third PDF document contains the official statement on the data access and confidentiality, with official stamp for verification. Due to ethical considerations and the regulation of local educational policies, the raw dataset for the study cannot be shared but can be accessed on-site with the educational authority. The last two documents are the speaking tasks used in the study, including a text presentation of task instructions and a test instruction recording. [Dataset]. http://doi.org/10.6084/m9.figshare.29390120.v1
    Explore at:
    pdf
    Available download formats
    Dataset updated
    Jun 24, 2025
    Dataset provided by
    Figsharehttp://figshare.com/
    Authors
    Hengzhi Hu
    License

    Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The data analysis transcript is specifically for the study titled "What Shapes Automated Ratings in Computer-Based English Speaking Tests? Perspectives from Analytic Complexity, Accuracy, Fluency, and Pronunciation Indices". This study adopts a quantitative, correlational research design to investigate the extent to which various linguistic features—namely Complexity, Accuracy, Fluency, and Pronunciation (CAFP)—predict Automated Ratings (ARs) in China’s Computer-Based English Speaking Test (CBEST) administered during Zhongkao. The aim is to uncover how these linguistic indices influence machine-generated scores and to evaluate the validity and fairness of automated assessment systems in high-stakes educational contexts.The CBEST format used in this study includes three task types: Reading-Aloud, Communicative Question & Answer, and Response to a Topic. These tasks are scored using an integrated system developed by iFlytek, which combines automatic speech recognition (ASR), deep learning models, and benchmarked manual expert evaluation. The assessment model has been officially recognized and is widely adopted in Chinese provinces for junior secondary school students.
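    Given the stated correlational design, the core analysis amounts to regressing automated ratings on the CAFP indices. A minimal R sketch is shown below; the file and column names are invented for illustration and are not the study's actual variables.

        ## Hypothetical regression of automated ratings (AR) on CAFP indices.
        cbest <- read.csv("cbest_scores.csv")      # placeholder file name
        fit <- lm(AR ~ complexity + accuracy + fluency + pronunciation, data = cbest)
        summary(fit)                               # each coefficient shows an index's contribution to AR
        cor(cbest[, c("AR", "complexity", "accuracy", "fluency", "pronunciation")])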

  19. d

    Data from: Data and Code for: “How is your thesis going?” – Ph.D. students’...

    • demo-b2find.dkrz.de
    Updated Mar 20, 2023
    Cite
    (2023). Data and Code for: “How is your thesis going?” – Ph.D. students’ perspectives on mental health and stress in academia. [Dataset]. http://demo-b2find.dkrz.de/dataset/628885cb-dcba-576c-a688-ca591af54985
    Explore at:
    Dataset updated
    Mar 20, 2023
    Description

    Data and code for the study [“How is your thesis going?” – Ph.D. students’ perspectives on mental health and stress in academia] by Julian Friedrich, Anna Bareis, Moritz Bross, Zoé Bürger, Álvaro Cortés Rodríguez, Nina Effenberger, Markus Kleinhansl, Fabienne Kremer, and Cornelius Schröder. See also the preprint: https://psyarxiv.com/uq9w5/ See the readme file for details. Contents:
    - Data (quantitative) (CSV)
    - Data (open questions) (CSV)
    - Data analysis (R code, PDF)
    - Notebooks (HTML codebook / R Markdown)
    - Preprocessing (R code)
    - Questionnaires (PDF)
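    A plausible first step with the shared files is to load the quantitative CSV and skim the preprocessing code; the file names below are placeholders, since the readme holds the actual names.

        ## Hypothetical loading step; consult the readme for the real file names.
        quant <- read.csv("data_quantitative.csv")    # quantitative survey data
        openq <- read.csv("data_open_questions.csv")  # open-question answers
        str(quant)                                    # variables are documented in the HTML codebook
        # source("preprocessing.R")                   # placeholder name for the preprocessing script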

  20. e

    Online survey data for the 2017 Aesthetic value project (NESP TWQ 3.2.3,...

    • catalogue.eatlas.org.au
    Updated Nov 22, 2019
    + more versions
    Cite
    Australian Institute of Marine Science (AIMS) (2019). Online survey data for the 2017 Aesthetic value project (NESP TWQ 3.2.3, Griffith Institute for Tourism Research) [Dataset]. https://catalogue.eatlas.org.au/geonetwork/srv/api/records/595f79c7-b553-4aab-9ad8-42c092508f81
    Explore at:
    www:link-1.0-http--downloaddata, www:link-1.0-http--related
    Available download formats
    Dataset updated
    Nov 22, 2019
    Dataset provided by
    Australian Institute of Marine Science (AIMS)
    Time period covered
    Jan 28, 2017 - Jan 28, 2018
    Description

    This dataset consists of three data folders containing all documents related to the online survey conducted within the NESP 3.2.3 project (Tropical Water Quality Hub), plus a survey format document showing how the survey was designed. Apart from participants’ demographic information, the survey consists of three sections: conjoint analysis, picture rating and an open question. The corresponding outcomes of these three sections were downloaded from the Qualtrics website and used for three different data analysis processes.

    Data related to the first section, “conjoint analysis”, is saved in the Conjoint analysis folder, which contains two sub-folders. The first includes a plan file in SAV format representing the design suggested by SPSS orthogonal analysis for testing beauty factors, plus the 9 photoshopped pictures used in the survey. The second (“Final results”) contains 1 SAV file named “data1”, the imported results of the conjoint analysis section in SPSS; 1 SPS file named “Syntax1”, the code used to run the conjoint analysis; 2 SAV files with the output of the conjoint analysis from SPSS; and 1 SPV file named “Final output” showing the results of further SPSS analysis based on the utility and importance data.
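    For readers without SPSS, the importance and utility outputs described here follow the standard conjoint logic: each attribute's relative importance is the range of its part-worth utilities divided by the sum of those ranges across attributes. The R sketch below illustrates only that calculation; the attribute names and utility values are invented, not taken from the archived SPSS output.

        ## Illustrative importance-weight calculation from part-worth utilities.
        ## Attribute names and values are invented; the SPV/SAV outputs hold the real ones.
        utilities <- list(
          coral_cover = c(-0.8, 0.1, 0.7),
          fish_life   = c(-0.5, 0.5),
          visibility  = c(-1.0, 1.0)
        )
        ranges     <- sapply(utilities, function(u) diff(range(u)))
        importance <- 100 * ranges / sum(ranges)   # relative importance in percent
        round(importance, 1)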

    Data related to the second section, “Picture rating”, is saved in the Picture rating folder, which includes two subfolders. One subfolder contains the 2500 pictures of the Great Barrier Reef used in the rating section. These pictures are organised by name and stored in two folders, “Survey Part 1” and “Survey Part 2”, which correspond to the two parts of the rating section. The other subfolder, “Rating results”, consists of one XLSX file with the survey results downloaded from the Qualtrics website.

    Finally, data related to the open question is saved in the “Open question” folder. It contains one CSV file and one PDF file recording participants’ answers to the open question, as well as one PNG file showing a screenshot of the Leximancer analysis outcome.

    Methods: This dataset resulted from the input and output of an online survey on how people assess the beauty of the Great Barrier Reef. The survey was designed for multiple purposes and includes three main sections: (1) conjoint analysis (ranking 9 photoshopped pictures to determine the relative importance weights of beauty attributes), (2) picture rating (2500 pictures to be rated) and (3) an open question on the factors that make a picture of the Great Barrier Reef beautiful in participants’ opinion (determining beauty factors from the tourist perspective). Pictures used in this survey were downloaded from public sources such as the websites of Tourism and Events Queensland and Tropical Tourism North Queensland, as well as tourist-sharing sources (i.e. Flickr). Flickr pictures were downloaded using the keywords “Great Barrier Reef”. About 10,000 pictures were downloaded in August and September 2017, and 2,500 pictures were then selected based on several research criteria: (1) underwater pictures of the GBR, (2) without humans, (3) taken 1-2 metres from the objects and (4) of high resolution.

    The survey was created on the Qualtrics website and launched on 4 October 2017 using the Qualtrics survey service. Each participant rated 50 pictures randomly selected from the pool of 2500 survey pictures. 772 survey completions were recorded, and 705 questionnaires were eligible for data analysis after filtering out unqualified questionnaires. Conjoint analysis data was imported into IBM SPSS in SAV format, and the output was saved in SPV format. For the automatic aesthetic rating, each of the 2500 Great Barrier Reef pictures was rated (on a 1-10 scale) by at least 10 participants; this dataset was saved in an XLSX file and used to train and test an Artificial Intelligence (AI)-based system for recognising and assessing the beauty of natural scenes. Answers to the open question were saved in an XLSX file and a PDF file for theme analysis with the Leximancer software.
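    Because every picture received at least 10 ratings on the 1-10 scale, a natural preparation step before training the AI system is to average the ratings per picture. The R sketch below assumes a long-format export with one row per rating; the file and column names are placeholders, not the actual layout of the archived XLSX file.

        ## Hypothetical aggregation of individual ratings to per-picture mean scores.
        library(readxl)
        ratings <- read_excel("rating_results.xlsx")   # placeholder for the XLSX rating export
        ## assumed columns: picture_id, rating (1-10)
        per_picture <- aggregate(rating ~ picture_id, data = ratings, FUN = mean)
        head(per_picture)                              # mean aesthetic score per picture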

    Further information can be found in the following publication: Becken, S., Connolly, R., Stantic, B., Scott, N., Mandal, R., & Le, D. (2018). Monitoring aesthetic value of the Great Barrier Reef by using innovative technologies and artificial intelligence. Griffith Institute for Tourism Research Report No 15.

    Format: The online survey dataset includes one PDF file representing the survey format with all sections and questions. It also contains three subfolders, each with multiple files. The Conjoint analysis subfolder contains an image of the 9 JPG pictures, 1 SAV-format file for the Orthoplan subroutine outcome, and 5 outcome documents (i.e. 3 SAV files, 1 SPS file, 1 SPV file). The Picture rating subfolder contains a capture of the 2500 pictures used in the survey and 1 Excel file with the rating results. The Open question subfolder includes 1 CSV file and 1 PDF file with participants’ answers, and one PNG file with the analysis outcome.

    Data Dictionary:

    Card 1: Picture design option number 1 suggested by SPSS orthogonal analysis.
    Importance value: The relative importance weight of each beauty attribute calculated by SPSS conjoint analysis.
    Utility: Score reflecting the influential valence and degree of each beauty attribute on the beauty score.
    Syntax: Code used to run the conjoint analysis in SPSS.
    Leximancer: Specialised software for qualitative data analysis.
    Concept map: A map showing the relationship between the concepts identified.
    Q1_1: Beauty score of picture Q1_1 by the corresponding participant (survey part 1).
    Q2.1_1: Beauty score of picture Q2.1_1 by the corresponding participant (survey part 2).
    Conjoint _1: Ranking of picture 1 designed for conjoint analysis by the corresponding participant.

    References: Becken, S., Connolly, R., Stantic, B., Scott, N., Mandal, R., & Le, D. (2018). Monitoring aesthetic value of the Great Barrier Reef by using innovative technologies and artificial intelligence. Griffith Institute for Tourism Research Report No 15.

    Data Location:

    This dataset is filed in the eAtlas enduring data repository at: data esp3\3.2.3_Aesthetic-value-GBR
