Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
Facebook
TwitterThe complexity of contexts and varied purposes for which biome donation are requested is unknown in South Africa. The aim of this study was to provide strategic data towards actualisation of whether a gastrointestinal (GIT) stool donor bank may be established as a collaborative between Western Cape Blood Services (WCBS) and the University of Cape Town (UCT).We designed a cross-sectional, questionnaire-based survey to determine willingness of WCBS blood donors to donate stool specimens for microbiome biobanking. The prospective observational pilot study was conducted between 1 June 2022 and 1 July 2022 at three WCBS donation centres in Cape Town, South Africa. Anonymous blood donors who met the inclusion criteria were provided with infographics on stool donation and a stool collection kit. Anonymised demographic and interview data was aggregated for descriptive purposes, and for statistical analysis.Analysis of responses from 209/231 blood donors demonstrated in a logistic regression model that compensation (p = 3.139e-05) and ' societal benefit outweighs inconvenience’ beliefs (p = 7.751e-05) were covariates significantly associated with willingness to donate stool. Age was borderline significant at a 5% level (p = 0.0556). Most willing stool donors indicated that donating stool samples would not affect blood donations (140/157, 90%). Factors decreasing willingness to donate were stool collection being unpleasant or embarrassing.The survey provides strategic data for the WCBS and UCT towards establishment of a stool bank and provided an understanding of the underlying determinants governing participants decision process with regards to becoming potential donors.
Facebook
TwitterThis dataset was created by Rajdeep Kaur Bajwa
Facebook
TwitterCC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R, a programming language, is an attractive tool for data visualization because it is free and open source. However, learning R can be intimidating and cumbersome for many. In this report, we introduce an R package called “smplot” for easy and elegant data visualization. The R package “smplot” generates graphs with defaults that are visually pleasing and informative. Although it requires basic knowledge of R and ggplot2, it significantly simplifies the process of plotting a bar graph, a violin plot, a correlation plot, a slope chart, a Bland-Altman plot and a raincloud plot. The aesthetics of the plots generated from the package are elegant, highly customisable and adhere to important practices of data visualization. The functions from smplot can be used in a modular fashion, thereby allowing the user to further customise the aesthetics. The smplot package is open source under the MIT license and available on Github (https://github.com/smin95/smplot), where updates will be posted. All the example figures in this report are reproducible and the codes and data are provided for the reader in a separate online guide (https://smin95.github.io/dataviz/).
Facebook
TwitterThe R Manual for QCA entails a PDF file that describes all the steps and code needed to prepare and conduct a Qualitative Comparative Analysis (QCA) study in R. This is complemented by an R Script that can be customized as needed. The dataset further includes two files with sample data, for the set-theoretic analysis and the visualization of QCA results. The R Manual for QCA is the online appendix to "Qualitative Comparative Analysis: An Introduction to Research Design and Application", Georgetown University Press, 2021.
Facebook
TwitterR scripts to re-analyze data published in Melo et al., 2020. We recommend using the R studio interface to run the code in these scripts. The script already contains the data table; therefore, there is no need to set the data tables in the script. Data table files are only available for better visualization between wide and long formats. Reference: Melo, M. B.; Favaro, V. M.; Oliveira, M. G. M (2020). The dorsal subiculum is required for the contextual fear consolidation in rats. Behavioural Brain Research. 390:112661. Doi: https://doi.org/10.1016/j.bbr.2020.112661.
Facebook
Twitterhttps://creativecommons.org/publicdomain/zero/1.0/https://creativecommons.org/publicdomain/zero/1.0/
This dataset was created by Nargis Karimova
Released under CC0: Public Domain
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
R scripts in this fileset are those used in the PLOS ONE publication "A snapshot of translational research funded by the National Institutes of Health (NIH): A case study using behavioral and social science research awards and Clinical and Translational Science Awards funded publications." The article can be accessed here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0196545This consists of all R scripts used for data cleaning, data manipulation, and statistical analysis used in the publication.There are eleven files in total:1. "Step1a.bBSSR.format.grants.and.publications.data.R" combines all bBSSR 2008-2014 grant award data and associated publications downloaded from NIH Reporter. 2. "Step1b.BSSR.format.grants.and.publications.data.R" combines all BSSR-only 2008-2014 grant award data and associated publications downloaded from NIH Reporter. 3. "Step2a.bBSSR.get.pubdates.transl.and.all.grants.R" queries PubMed and downloads associated bBSSR publication data.4. "Step2b.BSSR.get.pubdates.transl.and.all.grants.R" queries PubMed and downloads associated BSSR-only publication data.5. "Step3.summary.stats.R" performs summary statistics6. "Step4.time.to.first.publication.R" performs time to first publication analysis.7. "Step5.time.to.citation.analysis.R" performs time to first citation and time to overall citation analyses.8. "Step6.combine.NIH.iCite.data.R" combines NIH iCite citation data.9. "Step7.iCite.data.analysis.R" performs citation analysis on combined iCite data.10. "Step8.MeSH.descriptors.R" queries PubMed and pulls down all MeSH descriptors for all publications11. "Step9.CTSA.publications.R" compares the percent of translational publications among bBSSR, BSSR-only, and CTSA publications.
Facebook
TwitterThe Cyclistic Data Analytics Capstone Project delves into the realm of urban transportation, focusing on bike-sharing systems. Cyclistic, a fictional Chicago-based bike-share company, provides the backdrop for analyzing trends and using data analysis to help guide in converting more members towards the annual subscription.
Primary data is sourced from Cyclistic's own trip records, offering insights into ride patterns, user behavior and impact of seasonality.
This project was done in completion of the Google Data Analysis course on Coursera. It is a fantastic introduction to the world of data analysis and implements cleaning, transformation, analysis and visualization.. The project revolves around a real-world challenge facing a bike-sharing company in Chicago. By leveraging data analytics, it aims to optimize operations, growth in the compancy and sustained marketshare in the industry.
In summary, the Cyclistic Data Analytics Capstone Project showcases the transformative potential of data analytics in shaping sustainable and efficient urban transportation systems.
Facebook
TwitterThis is the R code used to run the statistical analysis on the collected data. (TXT)
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Discover how AI code interpreters are revolutionizing data visualization, reducing chart creation time from 20 to 5 minutes while simplifying complex statistical analysis.
Facebook
TwitterMIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
This artifact accompanies the SEET@ICSE article "Assessing the impact of hints in learning formal specification", which reports on a user study to investigate the impact of different types of automated hints while learning a formal specification language, both in terms of immediate performance and learning retention, but also in the emotional response of the students. This research artifact provides all the material required to replicate this study (except for the proprietary questionnaires passed to assess the emotional response and user experience), as well as the collected data and data analysis scripts used for the discussion in the paper.
Dataset
The artifact contains the resources described below.
Experiment resources
The resources needed for replicating the experiment, namely in directory experiment:
alloy_sheet_pt.pdf: the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment. The sheet was passed in Portuguese due to the population of the experiment.
alloy_sheet_en.pdf: a version the 1-page Alloy sheet that participants had access to during the 2 sessions of the experiment translated into English.
docker-compose.yml: a Docker Compose configuration file to launch Alloy4Fun populated with the tasks in directory data/experiment for the 2 sessions of the experiment.
api and meteor: directories with source files for building and launching the Alloy4Fun platform for the study.
Experiment data
The task database used in our application of the experiment, namely in directory data/experiment:
Model.json, Instance.json, and Link.json: JSON files with to populate Alloy4Fun with the tasks for the 2 sessions of the experiment.
identifiers.txt: the list of all (104) available participant identifiers that can participate in the experiment.
Collected data
Data collected in the application of the experiment as a simple one-factor randomised experiment in 2 sessions involving 85 undergraduate students majoring in CSE. The experiment was validated by the Ethics Committee for Research in Social and Human Sciences of the Ethics Council of the University of Minho, where the experiment took place. Data is shared the shape of JSON and CSV files with a header row, namely in directory data/results:
data_sessions.json: data collected from task-solving in the 2 sessions of the experiment, used to calculate variables productivity (PROD1 and PROD2, between 0 and 12 solved tasks) and efficiency (EFF1 and EFF2, between 0 and 1).
data_socio.csv: data collected from socio-demographic questionnaire in the 1st session of the experiment, namely:
participant identification: participant's unique identifier (ID);
socio-demographic information: participant's age (AGE), sex (SEX, 1 through 4 for female, male, prefer not to disclosure, and other, respectively), and average academic grade (GRADE, from 0 to 20, NA denotes preference to not disclosure).
data_emo.csv: detailed data collected from the emotional questionnaire in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID) and the assigned treatment (column HINT, either N, L, E or D);
detailed emotional response data: the differential in the 5-point Likert scale for each of the 14 measured emotions in the 2 sessions, ranging from -5 to -1 if decreased, 0 if maintained, from 1 to 5 if increased, or NA denoting failure to submit the questionnaire. Half of the emotions are positive (Admiration1 and Admiration2, Desire1 and Desire2, Hope1 and Hope2, Fascination1 and Fascination2, Joy1 and Joy2, Satisfaction1 and Satisfaction2, and Pride1 and Pride2), and half are negative (Anger1 and Anger2, Boredom1 and Boredom2, Contempt1 and Contempt2, Disgust1 and Disgust2, Fear1 and Fear2, Sadness1 and Sadness2, and Shame1 and Shame2). This detailed data was used to compute the aggregate data in data_emo_aggregate.csv and in the detailed discussion in Section 6 of the paper.
data_umux.csv: data collected from the user experience questionnaires in the 2 sessions of the experiment, namely:
participant identification: participant's unique identifier (ID);
user experience data: summarised user experience data from the UMUX surveys (UMUX1 and UMUX2, as a usability metric ranging from 0 to 100).
participants.txt: the list of participant identifiers that have registered for the experiment.
Analysis scripts
The analysis scripts required to replicate the analysis of the results of the experiment as reported in the paper, namely in directory analysis:
analysis.r: An R script to analyse the data in the provided CSV files; each performed analysis is documented within the file itself.
requirements.r: An R script to install the required libraries for the analysis script.
normalize_task.r: A Python script to normalize the task JSON data from file data_sessions.json into the CSV format required by the analysis script.
normalize_emo.r: A Python script to compute the aggregate emotional response in the CSV format required by the analysis script from the detailed emotional response data in the CSV format of data_emo.csv.
Dockerfile: Docker script to automate the analysis script from the collected data.
Setup
To replicate the experiment and the analysis of the results, only Docker is required.
If you wish to manually replicate the experiment and collect your own data, you'll need to install:
A modified version of the Alloy4Fun platform, which is built in the Meteor web framework. This version of Alloy4Fun is publicly available in branch study of its repository at https://github.com/haslab/Alloy4Fun/tree/study.
If you wish to manually replicate the analysis of the data collected in our experiment, you'll need to install:
Python to manipulate the JSON data collected in the experiment. Python is freely available for download at https://www.python.org/downloads/, with distributions for most platforms.
R software for the analysis scripts. R is freely available for download at https://cran.r-project.org/mirrors.html, with binary distributions available for Windows, Linux and Mac.
Usage
Experiment replication
This section describes how to replicate our user study experiment, and collect data about how different hints impact the performance of participants.
To launch the Alloy4Fun platform populated with tasks for each session, just run the following commands from the root directory of the artifact. The Meteor server may take a few minutes to launch, wait for the "Started your app" message to show.
cd experimentdocker-compose up
This will launch Alloy4Fun at http://localhost:3000. The tasks are accessed through permalinks assigned to each participant. The experiment allows for up to 104 participants, and the list of available identifiers is given in file identifiers.txt. The group of each participant is determined by the last character of the identifier, either N, L, E or D. The task database can be consulted in directory data/experiment, in Alloy4Fun JSON files.
In the 1st session, each participant was given one permalink that gives access to 12 sequential tasks. The permalink is simply the participant's identifier, so participant 0CAN would just access http://localhost:3000/0CAN. The next task is available after a correct submission to the current task or when a time-out occurs (5mins). Each participant was assigned to a different treatment group, so depending on the permalink different kinds of hints are provided. Below are 4 permalinks, each for each hint group:
Group N (no hints): http://localhost:3000/0CAN
Group L (error locations): http://localhost:3000/CA0L
Group E (counter-example): http://localhost:3000/350E
Group D (error description): http://localhost:3000/27AD
In the 2nd session, likewise the 1st session, each permalink gave access to 12 sequential tasks, and the next task is available after a correct submission or a time-out (5mins). The permalink is constructed by prepending the participant's identifier with P-. So participant 0CAN would just access http://localhost:3000/P-0CAN. In the 2nd sessions all participants were expected to solve the tasks without any hints provided, so the permalinks from different groups are undifferentiated.
Before the 1st session the participants should answer the socio-demographic questionnaire, that should ask the following information: unique identifier, age, sex, familiarity with the Alloy language, and average academic grade.
Before and after both sessions the participants should answer the standard PrEmo 2 questionnaire. PrEmo 2 is published under an Attribution-NonCommercial-NoDerivatives 4.0 International Creative Commons licence (CC BY-NC-ND 4.0). This means that you are free to use the tool for non-commercial purposes as long as you give appropriate credit, provide a link to the license, and do not modify the original material. The original material, namely the depictions of the diferent emotions, can be downloaded from https://diopd.org/premo/. The questionnaire should ask for the unique user identifier, and for the attachment with each of the depicted 14 emotions, expressed in a 5-point Likert scale.
After both sessions the participants should also answer the standard UMUX questionnaire. This questionnaire can be used freely, and should ask for the user unique identifier and answers for the standard 4 questions in a 7-point Likert scale. For information about the questions, how to implement the questionnaire, and how to compute the usability metric ranging from 0 to 100 score from the answers, please see the original paper:
Kraig Finstad. 2010. The usability metric for user experience. Interacting with computers 22, 5 (2010), 323–327.
Analysis of other applications of the experiment
This section describes how to replicate the analysis of the data collected in an application of the experiment described in Experiment replication.
The analysis script expects data in 4 CSV files,
Facebook
TwitterSupplementary Information:
FENO_MultipleCO_basic.csv (Raw Study Data) diagmeta.R (R functions for running the Multiple thresholds model) example.R (R code for running the model with the data)
Facebook
TwitterReproduces all analyses and plots using the dataset in S3 File. (RMD)
Facebook
TwitterThis child page contains a zipped folder which contains all items necessary to run trend models and produce results published in U.S. Geological Scientific Investigations Report 2021–XXXX [Tatge, W.S., Nustad, R.A., and Galloway, J.M., 2021, Evaluation of Salinity and Nutrient Conditions in the Heart River Basin, North Dakota, 1970-2020: U.S. Geological Survey Scientific Investigations Report 2021-XXXX, XX p.]. To run the R-QWTREND program in R 6 files are required and each is included in this child page: prepQWdataV4.txt, runQWmodelV4XXUEP.txt, plotQWtrendV4XXUEP.txt, qwtrend2018v4.exe, salflibc.dll, and StartQWTrendV4.R (Vecchia and Nustad, 2020). The folder contains: six items required to run the R–QWTREND trend analysis tool; a readme.txt file; a flowtrendData.RData file; an allsiteinfo.table.csv file, a folder called "scripts", and a folder called "waterqualitydata". The "scripts" folder contains the scripts that can be used to reproduce the results found in the USGS Scientific Investigations Report referenced above. The "waterqualitydata" folder contains .csv files with the naming convention of site_ions or site_nuts for major ions and nutrients constituents and contains machine readable files with the water-quality data used for the trend analysis at each site. R–QWTREND is a software package for analyzing trends in stream-water quality. The package is a collection of functions written in R (R Development Core Team, 2019), an open source language and a general environment for statistical computing and graphics. The following system requirements are necessary for using R–QWTREND: • Windows 10 operating system • R (version 3.4 or later; 64 bit recommended) • RStudio (version 1.1.456 or later). An accompanying report (Vecchia and Nustad, 2020) serves as the formal documentation for R–QWTREND. Vecchia, A.V., and Nustad, R.A., 2020, Time-series model, statistical methods, and software documentation for R–QWTREND—An R package for analyzing trends in stream-water quality: U.S. Geological Survey Open-File Report 2020–1014, 51 p., https://doi.org/10.3133/ofr20201014 R Development Core Team, 2019, R—A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, accessed December 7, 2020, at https://www.r-project.org.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Learn 6 essential cohort analysis reports to track SaaS growth, revenue trends, and customer churn. Data-driven insights with code examples for startup founders.
Facebook
TwitterMarket basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.
Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.
Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
Number of Attributes: 7
https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">
First, we need to load required libraries. Shortly I describe all libraries.
https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">
Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.
https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png">
https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">
After we will clear our data frame, will remove missing values.
https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">
To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Discover how Apache Arrow unifies fragmented data systems across tech companies, enabling faster analysis and insights for modern data architectures.
Facebook
Twitterhttp://opensource.org/licenses/BSD-3-Clausehttp://opensource.org/licenses/BSD-3-Clause
Data analysis is one of the most relevant steps of aquatic benthic environmental monitoring and research studies, and should be a fundamental consideration in both the planning (i.e., defining appropriate sampling design strategies) and implementation phases (application of appropriate standardized sampling procedures). A common objective of these studies is to identify relationships between environmental stressors and benthic bioindicator metrics. However, assessing these relationships is a complex process. Multivariate regression model adjustment coupled with forward and backward model selection routines is an appropriate complementary statistical analysis tool to test for the existence of statistically significant associations between a non-autocorrelated biological response and each variable within a group of environmental covariates included in a model. With this in mind, we developed SDesti, a user-friendly R package to analyze benthos data (number of individuals, biomass, chlorophyll concentration, or biological indices, excluding beta diversity metrics). SDesti contains four user accessible functions. AnalysisDescriptives() and Estimation() give information on the quality, homogeneity and representativeness of the data for one sampling campaign for one site. TimeLineAnalysisDescriptives() performs the descriptive analysis that usually precedes the adjustment of a regression model. TimeLineAnalysis() automatically adjusts an adequate regression model (linear, Poisson, quasipoisson, or negative binomial) and also returns the necessary measures and graphics to evaluate the quality of the adjustment and verify the model assumptions. SDesti greatly simplifies the process of data analysis and can be easily used by non-statisticians. The analytical package includes a complete manual that provides detailed information: on the data structure requirements, on the variable nomenclature rules and program operating procedures, on the data analysis (complemented with examples) and on the interpretation of the results (type ??SDesti on R console). SDesti eliminates redundancy, reduces human error and, coupled with a suitable sampling design, standard sampling and sample treatment procedures, it contributes to improve the consistency of the results in environmental studies. SDesti binary for windows users and installation instructions can be found below. Compiled for R 4.3.2 version. Refer to the program PDF manual for a detailed description of the data structures, functions, data analyses and interpretation of results. Type ??SDesti on R, or RStudio consoles and select the PDF file. Note: RStudio 2023.09.1 has a bug that delivers an error message when trying to open PDF vignettes (program manuals). Use R 4.3.2 console to open SDesti's PDF manual.
Facebook
TwitterAttribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.