Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R, which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge includes basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
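A minimal R sketch of the kind of exploration the Iris module describes (illustrative only, not the case study's own code), using the iris dataset built into R:

```r
# Summary statistics, correlation, and basic plots for the built-in iris data,
# mirroring the practice steps described above.
data(iris)
summary(iris)                                   # per-column summary statistics
cor(iris$Sepal.Length, iris$Petal.Length)       # correlation between two traits
hist(iris$Petal.Length, main = "Petal length", xlab = "Length (cm)")
plot(iris$Sepal.Length, iris$Petal.Length,
     col = iris$Species, pch = 19,
     xlab = "Sepal length (cm)", ylab = "Petal length (cm)")
```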
The Laboratory Temperament Assessment Battery (Lab-TAB) is a standardized observational assessment tool that is used to study temperament dimensions in young children through a series of episodes that mimic everyday situations. In Generation R, four Lab-TAB tasks were conducted, namely Stranger Approach, Bubble Blowing, Jumping Spider (preschool version) and Puppet Game (locomotor version). The Jumping Spider episode is designed to elicit a startle/fear reaction based on an unexpected event, which is generally a predominant source of fear among preschool children. The Puppet Game episode measures enjoyment in response to social stimulation. In the Bubble Blowing episode, the child engages in the pleasurable activity of blowing and popping bubbles and plays with the experimenter. The Stranger Approach episode measures child social fearfulness. The child's behavior was rated on a scale, with higher scores indicating a stronger expression of the measured dimension.
nyu-dice-lab/allenai_WildChat-1M-Full-CohereForAI_c4ai-command-r-plus-08-2024 dataset hosted on Hugging Face and contributed by the HF Datasets community
The data presented in this data file is a product of a journal publication. The dataset contains PCB sorption concentrations on encapsulants, PCB concentrations in the air and in wipe samples, model simulation of the PCB concentration gradient in the source and encapsulant layers on exposed surfaces of encapsulants and in room air at different times, and the ranking of the encapsulants’ performance. This dataset is associated with the following publication: Liu, X., Z. Guo, K. Krebs, N. Roache, R. Stinson, J. Nardin, R. Pope, C. Mocka, and R. Logan. Laboratory evaluation of PCBs encapsulation method. Indoor and Built Environment. Sage Publications, THOUSAND OAKS, CA, USA, 25(6): 895-915, (2016).
Argumentative skills are crucial for any individual at the personal and professional levels. In recent decades, there has been increasing concern about undergraduates' weak argumentative skills and their considerable difficulty in reworking and expressing their own reflections on a topic. In turn, this has implications for being a critical thinker, able to express an original point of view. Tailored interventions in Higher Education could constitute a powerful approach to promote argumentative skills and extend these skills to professional and personal life. In this regard, argument maps (AM) could prove to be a valuable support for the visualization of arguments. They do not just create associations between concepts; they trace the logical relationships between different statements, allowing readers to follow the chain of reasoning and understand it better. We conducted an experimental study to investigate how a learning path with AM could support students in increasing their level of text comprehension (CoT) competence, in terms of identifying the elements of an argumentative text, and critical thinking (CT), in terms of reconstructing meaning and building their own reflection.
Our preliminary descriptive analysis suggested that AM played a positive role in increasing students’ CoT and CT proficiency levels.
This Zenodo record documents the full analysis process with R (https://cran.r-project.org/bin/windows/base/) and is composed of the following datasets and scripts (a minimal loading sketch follows the list):
1. Comprehension of Text and AMs Results - ExpAM.xlsx
2. Critical Thinking Results - CriThink.xlsx
3. Argumentative skills in Forum - ExpForum.xlsx
4. Self-assessment Results - Dataset_Quest.xlsx
5. Data for Correlation and Regression - Dataset_CorRegr.xlsx
6. Descriptive Statistics - Preliminary Analysis.R
7. Inferential Statistics - Correlation and Regression.R
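A minimal sketch of how the correlation/regression dataset might be loaded and analyzed in R; the column names used here ("CoT", "CT") are placeholders, not the actual headers, so refer to the files and the .R scripts in the record.

```r
# Hypothetical loading sketch: read the correlation/regression dataset and fit a
# simple model. Column names are placeholders for illustration only.
library(readxl)

dat <- read_excel("Dataset_CorRegr.xlsx")
cor(dat$CoT, dat$CT, use = "complete.obs")   # placeholder column names
summary(lm(CT ~ CoT, data = dat))
```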
Any comments or improvements are welcome!
Survival data across density and frequency: These data were collected in the laboratory. They are the data used in Experiment 1. Details for this dataset and all datasets can be found in the README file. File: Kilgour_et_al_2018_Survival_Data.csv
Differences in aggression between two strains of Drosophila melanogaster: These data were collected in the laboratory and were used in Experiment 2 of the manuscript. File: Kilgour_et_al_2018_Strain_Differences_Aggro.csv
Survival data from Experiment 2: These data were collected in the laboratory. They were used to assess negative frequency-dependent selection in Experiment 2. File: Kilgour_et_al_2018_Strain_Survival_Aggro_Assay.csv
Fruit fly mass by Strain: These data were collected in the laboratory. Results of the associated analysis are found in the Supporting Information. They were used to assess any differences in mass between the strains across sex and time. File: Kilgour_et_al_2018_Strain_Differences_Mass.csv
Reproductive differences between strains of fruit flies: These da...
The data presented in this data file is a product of a journal publication. The dataset contains PCB sorption concentrations to settled dust due to dust/air partition and PCB migration concentration due to dust/source partitioning. This dataset is associated with the following publication: Liu, X., Z. Guo, K. Krebs, D. Greenwell, N. Roache, A. Stinson, J. Nardin, and R. Pope. Laboratory study of PCB transport from primary sources to settled dust. CHEMOSPHERE. Elsevier Science Ltd, New York, NY, USA, 169: 62-69, (2016).
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Ratifying Executive Committee Resolution No. 001 S. 2006, Entitled Approving The Collective Negotiation Agreement Between The Association Of HLURB Employees For Advancement And Development And The HLURB
https://spdx.org/licenses/CC0-1.0.html
Pathogen diversity resulting in quasispecies can enable persistence and adaptation to host defenses and therapies. However, accurate quasispecies characterization can be impeded by errors introduced during sample handling and sequencing, which can require extensive optimization to overcome. We present complete laboratory and bioinformatics workflows to overcome many of these hurdles. The Pacific Biosciences single molecule real-time platform was used to sequence PCR amplicons derived from cDNA templates tagged with universal molecular identifiers (SMRT-UMI). Optimized laboratory protocols were developed through extensive testing of different sample preparation conditions to minimize between-template recombination during PCR, and the use of UMIs allowed accurate template quantitation as well as removal of point mutations introduced during PCR and sequencing, producing a highly accurate consensus sequence from each template. Handling of the large datasets produced from SMRT-UMI sequencing was facilitated by a novel bioinformatic pipeline, Probabilistic Offspring Resolver for Primer IDs (PORPIDpipeline), that automatically filters and parses reads by sample, identifies and discards reads with UMIs likely created from PCR and sequencing errors, generates consensus sequences, checks for contamination within the dataset, and removes any sequence with evidence of PCR recombination or early cycle PCR errors, resulting in highly accurate sequence datasets. The optimized SMRT-UMI sequencing method presented here represents a highly adaptable and established starting point for accurate sequencing of diverse pathogens. These methods are illustrated through characterization of human immunodeficiency virus (HIV) quasispecies.
Methods
This serves as an overview of the analysis performed on PacBio sequence data that is summarized in Analysis Flowchart.pdf and was used as primary data for the paper by Westfall et al., "Optimized SMRT-UMI protocol produces highly accurate sequence datasets from diverse populations – application to HIV-1 quasispecies".
Five different PacBio sequencing datasets were used for this analysis: M027, M2199, M1567, M004, and M005
For the datasets which were indexed (M027, M2199), CCS reads from PacBio sequencing files and the chunked_demux_config files were used as input for the chunked_demux pipeline. Each config file lists the different Index primers added during PCR to each sample. The pipeline produces one fastq file for each Index primer combination in the config. For example, in dataset M027 there were 3–4 samples using each Index combination. The fastq files from each demultiplexed read set were moved to the sUMI_dUMI_comparison pipeline fastq folder for further demultiplexing by sample and consensus generation with that pipeline. More information about the chunked_demux pipeline can be found in the README.md file on GitHub.
The demultiplexed read collections from the chunked_demux pipeline or CCS read files from datasets which were not indexed (M1567, M004, M005) were each used as input for the sUMI_dUMI_comparison pipeline along with each dataset's config file. Each config file contains the primer sequences for each sample (including the sample ID block in the cDNA primer) and further demultiplexes the reads to prepare data tables summarizing all of the UMI sequences and counts for each family (tagged.tar.gz) as well as consensus sequences from each sUMI and rank 1 dUMI family (consensus.tar.gz). More information about the sUMI_dUMI_comparison pipeline can be found in the paper and the README.md file on GitHub.
The consensus.tar.gz and tagged.tar.gz files were moved from the sUMI_dUMI_comparison pipeline directory on the server to the Pipeline_Outputs folder in this analysis directory for each dataset and appended with the dataset name (e.g. consensus_M027.tar.gz). Also in this analysis directory is a Sample_Info_Table.csv containing information about how each of the samples was prepared, such as purification methods and number of PCRs. There are also three other folders: Sequence_Analysis, Indentifying_Recombinant_Reads, and Figures. Each has an .Rmd file with the same name inside, which is used to collect, summarize, and analyze the data. All of these collections of code were written and executed in RStudio to track notes and summarize results.
Sequence_Analysis.Rmd has instructions to decompress all of the consensus.tar.gz files, combine them, and create two fasta files, one with all sUMI and one with all dUMI sequences. Using these as input, two data tables were created that summarize all sequences and read counts for each sample that pass various criteria. These are used to help create Table 2 and as input for Indentifying_Recombinant_Reads.Rmd and Figures.Rmd. Next, two fasta files containing all of the rank 1 dUMI sequences and the matching sUMI sequences were created. These were used as input for the python script compare_seqs.py, which identifies any matched sequences that are different between the sUMI and dUMI read collections. This information was also used to help create Table 2. Finally, to populate the table with the number of sequences and bases in each sequence subset of interest, different sequence collections were saved and viewed in the Geneious program.
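A minimal R sketch of the decompress-and-combine step described above; the paths and file-name patterns are assumptions for illustration, not the actual Sequence_Analysis.Rmd code.

```r
# Decompress each consensus_<dataset>.tar.gz archive and combine all sUMI
# consensus sequences into a single fasta file; paths/patterns are assumed.
library(Biostrings)

archives <- list.files("Pipeline_Outputs", pattern = "^consensus_.*\\.tar\\.gz$",
                       full.names = TRUE)
for (a in archives) untar(a, exdir = "consensus_all")

sumi_files <- list.files("consensus_all", pattern = "sUMI.*\\.fasta$",
                         recursive = TRUE, full.names = TRUE)
all_sumi <- do.call(c, lapply(sumi_files, readDNAStringSet))
writeXStringSet(all_sumi, "all_sUMI_sequences.fasta")
```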
To investigate the cause of sequences where the sUMI and dUMI sequences do not match, tagged.tar.gz was decompressed, and for each family with discordant sUMI and dUMI sequences the reads from the UMI1_keeping directory were aligned using Geneious. Reads from dUMI families failing the 0.7 filter were also aligned in Geneious. The uncompressed tagged folder was then removed to save space. These read collections contain all of the reads in a UMI1 family and still include the UMI2 sequence. By examining the alignment, and specifically the UMI2 sequences, the site of the discordance and its cause were identified for each family as described in the paper. These alignments were saved as "Sequence Alignments.geneious". The counts of how many families were the result of PCR recombination were used in the body of the paper.
Using Identifying_Recombinant_Reads.Rmd, the dUMI_ranked.csv file from each sample was extracted from all of the tagged.tar.gz files, combined, and used as input to create a single dataset containing all UMI information from all samples. This file, dUMI_df.csv, was used as input for Figures.Rmd.
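A minimal R sketch of combining the per-sample dUMI_ranked.csv files into one table; the directory layout here is an assumption, not the actual Identifying_Recombinant_Reads.Rmd code.

```r
# Combine per-sample dUMI_ranked.csv files (already extracted from the
# tagged.tar.gz archives) into a single dataset; paths are assumptions.
library(readr)
library(dplyr)

ranked_files <- list.files("tagged_extracted", pattern = "dUMI_ranked\\.csv$",
                           recursive = TRUE, full.names = TRUE)
dUMI_df <- bind_rows(lapply(ranked_files, function(f) {
  read_csv(f, show_col_types = FALSE) %>%
    mutate(sample = basename(dirname(f)))   # record which sample each row came from
}))
write_csv(dUMI_df, "dUMI_df.csv")
```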
Figures.Rmd used dUMI_df.csv, sequence_counts.csv, and read_counts.csv as input to create draft figures and then individual datasets for each figure. These were copied into Prism software to create the final figures for the paper.
Flight line reflectance measurements from the Portable Remote Imaging Spectrometer (PRISM) instrument aboard the Tempus Applied Solutions Gulfstream-IV (G-IV) aircraft, taken as part of the NASA COral Reef Airborne Laboratory (CORAL) Earth Venture Suborbital-2 (EVS-2) mission designed to provide an extensive, uniform picture of coral reef composition. The CORAL mission surveyed parts of the reefs surrounding the Mariana Islands, Palau, portions of the Great Barrier Reef, the main Hawaiian Islands, and the Florida Keys.
U.S. Government Works: https://www.usa.gov/government-works
License information was derived automatically
Approving The Provincial Physical Framework Plan Of Camarines Norte
Summary
This dataset contains spectral reflectance values for laboratory mixtures that were created and analyzed at Arizona State University. These mixtures were made by combining red crystalline hematite, plagioclase, clinopyroxene (augite), and nontronite in varying abundances to understand how the spectral features are affected by variations in mineral abundances. Various files are described below.
Contents
SJacob_ASD_Mixture_Data.zip: full spectral reflectance data from the spectrometer at ASU for all mixtures discussed in the paper.
SJacob_Lab_Mixture_parameters.xlsx: an Excel spreadsheet documenting the abundances of each mineral in the various mixtures that were created. This file also includes band depth parameters that were calculated for each mixture and some corresponding plots (a generic band-depth sketch follows this file list).
SJacob_SupplementaryTables.xlsx: a file containing two tables that list the Mastcam and ChemCam observations (with sequence IDs) used in the main paper and the sol on which each observation was taken.
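A generic band-depth calculation in R, sketching the standard continuum-removal approach; this is not necessarily the exact procedure used to produce the parameters in the spreadsheet above.

```r
# Band depth of an absorption feature relative to a linear continuum fitted
# between two shoulder wavelengths; inputs are numeric vectors/scalars in
# consistent wavelength units.
band_depth <- function(wavelength, reflectance, center, left, right) {
  r_left  <- approx(wavelength, reflectance, xout = left)$y
  r_right <- approx(wavelength, reflectance, xout = right)$y
  r_cont  <- approx(c(left, right), c(r_left, r_right), xout = center)$y  # continuum at band center
  r_band  <- approx(wavelength, reflectance, xout = center)$y             # measured reflectance at band center
  1 - r_band / r_cont
}
```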
Attribution
If you use this dataset, please cite this DOI: 10.5281/zenodo.3827971 as well as the paper below:
Jacob et al. (2020) Spectral, Compositional, and Physical Properties of the Upper Murray Formation and Vera Rubin Ridge, Gale Crater, Mars. In review at Journal of Geophysical Research: Planets.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Subjects. A total of 515 male Long-Evans rats (Rattus norvegicus) between the ages of 2 months and 24 months were used for this study. Rats did not begin water restriction and training until their weight exceeded 225 g. The majority of rats in this study were pair housed. Animal use procedures were approved by the Princeton University Institutional Animal Care and Use Committee (IACUC) and carried out in accordance with National Institutes of Health standards.
Behavior. Rats were trained to perform an auditory evidence-accumulation decision task, the Poisson Clicks Task. For each session, rats were placed in a behavioral training box that itself was located within a sound attenuation chamber with active ventilation. The behavioral training box has three conical nose ports and two speakers. Each nose port is equipped with an infrared beam to detect nose poke events and a white LED that can illuminate the port. The left and right ports are equipped with a sipper tube and a solenoid valve that dispenses distilled water from a gravity-fed tank. The center port has an enlarged opening in the center to facilitate the rat’s ability to hold its nose in the port.
Trials are self-initiated by the rat, which is cued with an LED in the center port. When it is ready to perform a trial, the rat pokes its nose into the center port and holds it there. After a variable delay of silence, two streams of auditory clicks play, one from each of two speakers positioned above the left and right nose ports. In the “location” variant of the task, each speaker plays an independent stream of randomly timed white noise clicks. In this version of the task the first click is often a stereo click played simultaneously from both speakers. When the clicks stop playing, the center LED is extinguished and the rat is free to withdraw from the center port. The rat is rewarded with a small drop of water if it pokes its nose into the side port associated with the speaker that played the greater number of clicks. In an alternate “frequency” version of the task, all clicks are stereo and consist of a 3 ms pure tone, either low (6.5 kHz) or high (14.2 kHz). Here the rat must decide whether more high or low tone clicks were played. In the frequency version, high and low clicks are never allowed to overlap, and therefore there is no equivalent of the initial stereo click used in the location task.
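A minimal R sketch of generating randomly timed click trains for the location variant, under an assumed parameterization (gamma taken as the log ratio of right to left click rates with a fixed total rate; the lab's exact convention may differ):

```r
# Generate left/right Poisson click times for one trial. The gamma convention
# and the 40 Hz total rate are assumptions for illustration only.
make_click_trial <- function(gamma, duration_s = 1, total_rate_hz = 40) {
  rate_r <- total_rate_hz / (1 + exp(-gamma))   # right-speaker click rate
  rate_l <- total_rate_hz - rate_r              # left-speaker click rate
  list(left  = sort(runif(rpois(1, rate_l * duration_s), 0, duration_s)),
       right = sort(runif(rpois(1, rate_r * duration_s), 0, duration_s)))
}

make_click_trial(gamma = 1.5)   # more clicks expected on the right
```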
The Dataset. This dataset contains the behavioral data of 515 rats trained in the Brody Lab at Princeton University between 2009 and 2024 to perform the Poisson Clicks Task. Any work that uses this dataset should cite the manuscript in which it was published. Some of the labels in this dataset use the word bup, which means the same as click. This is the parsed dataset: it does not contain all trials the rats performed, as violation trials and trials with stimulus durations longer than 1 s are excluded. It also does not identify where the day breaks occur in the data. The full dataset will be made available in the future. Each file is a MATLAB .mat file named for the rat that provided the data and contains a single variable, ratdata:
ratdata =
parsed: [1×1 struct]
parsed_frozen: [1×1 struct]
task_type: 'location'
The task type identifies which version of the Poisson Clicks Task the rat trained on, Location or Frequency. The parsed field contains all trial data from unique, i.e. non-frozen, trials.
ratdata.parsed =
hh: [1 × n double]
bt: {1 × n cell}
nL: [1 × n double]
nR: [1 × n double]
sd: [1 × n double]
gr: [1 × n double]
ga: [1 × n double]
rg: [1 × n double]
Where n is the number of non-violation non-frozen trials the rat performed.
hh - is a vector identifying which trials the rat got correct: 0 if they got it wrong, 1 if they got it correct
bt - is a cell which contains information about the stimulus for each trial
nL - is a vector identifying how many left evidence clicks each trial contains
nR - is a vector identifying how many right evidence clicks each trial contains
sd - is a vector of the stimulus duration in seconds for each trial
gr - is a vector identifying which response the rat made: 0 if they responded left, 1 if they responded right
ga - is a vector of the gamma value used to generate the stimulus on each trial
rg - is a vector identifying how the reward was assigned on each trial: 0 if it was offered on the side that played more clicks, 1 if it was offered on the side with the higher generative Poisson click rate
For the rats in this dataset where the reward followed the trial’s gamma value, the reward was still offered on the side that played the greater number of clicks on 98.5% of trials.
At a minimum, bt is a structure that has two fields:
ratdata.parsed.bt{1} =
left: [1 x nL double]
right: [1 x nR double]
left - a vector of the left evidence click times in seconds relative to the stimulus onset
right - a vector of the right evidence click times in seconds relative to the stimulus onset
Where nL is the number of left evidence clicks and nR is the number of right evidence clicks. Additional fields can include:
real_T - the duration of the stimulus in seconds
bup_width - the duration of each click in milliseconds
base_freq - the lowest frequency used to make each click sound, in Hertz
bup_ramp - defines the taper that smooths the edges of each click
first_bup_stereo - 1 if the first click is a stereo click, 0 if not
avoid_collisions - 1 if click trains are selected where clicks do not overlap, 0 if only Poisson statistics are used to select click times
seed - the pseudorandom list seed used to generate the stimulus
is_probe_trial - 1 if the trial is a "probe" trial. These typically are defined to have a specific stimulus duration, for example 1 second, such that this duration is oversampled. Probe trials exist to facilitate other experiments such as optogenetics; however, no trials with optogenetic stimuli are included in this dataset
is_frozen - 1 if this is a frozen noise trial, 0 if it is not
tones - the specific frequencies used to construct the clicks, in Hertz
In the location version of the task typical settings are:
bup_width: 3
base_freq: 2000
bup_ramp: 2
first_bup_stereo: 1
avoid_collisions: 0
tones: [2000 4000 8000 16000 32000]
Whereas in the frequency version of the Poisson Clicks Task typical settings that differ are:
base_freq: [6500 14200]
first_bup_stereo: 0
tones: [6500 14200]
The base frequency and tone fields give the frequency of the tone used to identify left and right evidence clicks; here 6.5 kHz is the tone frequency for a left evidence click and 14.2 kHz is the tone frequency for a right evidence click. In the frequency task the first click is typically not a stereo click.
The parsed_frozen field contains similar information as the parsed field but for the frozen noise trials:
ratdata.parsed_frozen =
hh: [1 × n double]
bt: {1 × n cell}
nL: [1 × n double]
nR: [1 × n double]
sd: [1 × n double]
gr: [1 × n double]
ga: [1 × n double]
rg: [1 × n double]
seed: [1 × n double]
unique_seed: [1 × n_uf double]
gr_seed: [1 × n_uf double]
Where n is the number of frozen noise trials the rat performed and n_uf is the number of unique frozen noise trials performed.
seed - is a vector of the pseudorandom list seeds used to generate the stimulus for each trial
unique_seed - is a vector of the unique seeds used to generate the frozen noise trials
gr_seed - is a vector of the fraction of right responses the rat makes given each unique seed
The field names in this dataset are abbreviations:
hh - hit history, hit being a trial the subject got correct
bt - bup times, bup means the same as click
nL - number of Left
nR - number of Right
sd - stimulus duration
gr - go right, i.e. did the subject respond right
ga - gamma
rg - reward gamma
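As an illustration of working with the parsed fields described above, here is a minimal R sketch (assuming the vectors have been loaded into R, for example via the R.matlab package; this code is not part of the dataset):

```r
# Fraction of rightward responses as a function of the per-trial click difference.
# nL, nR, and gr are assumed to be equal-length numeric vectors from ratdata.parsed.
choice_by_click_diff <- function(nL, nR, gr) {
  click_diff <- nR - nL                                   # positive values favor "go right"
  bins <- cut(click_diff, breaks = seq(-20, 20, by = 4))  # coarse bins of click difference
  tapply(gr, bins, mean)                                  # proportion of right responses per bin
}
```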
Quantitative polymerase chain reaction (qPCR) has become a frequently used technique for quantifying enterococci in recreational surface waters, but there are several methodological options. Here we evaluated how three method permutations (type of mastermix, sample extract dilution, and use of controls in results calculation) affect method reliability among multiple laboratories with respect to sample interference. Multiple samples from each of 22 sites representing an array of habitat types were analyzed using EPA Method 1611 and 1609 reagents with full-strength and five-fold diluted extracts. The presence of interference was assessed three ways: using sample processing and PCR amplification controls; consistency of results across extract dilutions; and relative recovery of target genes from spiked enterococci in water samples compared to control matrices, with acceptable recovery defined as 50 to 200%. Method 1609, which is based on an environmental mastermix, was found to be superior to Method 1611, which is based on a universal mastermix. Method 1611 had over a 40% control assay failure rate with undiluted extracts and a 6% failure rate with diluted extracts. Method 1609 failed in only 11% and 3% of undiluted and diluted extract analyses, respectively. Use of sample processing control assay results in the delta-delta Ct method for calculating relative target gene recoveries increased the number of acceptable recovery results. Delta-delta tended to bias recoveries from apparently partially inhibitory samples on the high side, which could help avoid potential underestimates of enterococci, an important consideration in a public health context. Control assay and delta-delta recovery results were largely consistent across the range of habitats sampled and among laboratories. The methodological option that best balanced acceptable estimated target gene recoveries with method sensitivity and avoidance of underestimated enterococci densities was Method 1609 without extract dilution and using the delta-delta calculation method. The applicability of this method can be extended by the analysis of diluted extracts to sites where interference is indicated but, particularly in these instances, should be confirmed by augmenting the control assays with analyses of target gene recoveries from spiked target organisms. This dataset is associated with the following publication: Haugland, R., S. Siefring, M. Varma, K. Oshima, M. Sivaganesan, Y. Cao, M. Raith, J. Griffith, S. Weisberg, R. Noble, A.D. Blackwood, J. Kinzelman, T. Anan'eva, R. Bushon, E. Stelzer, V. Harwood, K. Gordon, and C. Sinigalliano. Multi-laboratory survey of qPCR enterococci analysis method performance in U.S. coastal and inland surface waters. JOURNAL OF MICROBIOLOGICAL METHODS. Elsevier Science Ltd, New York, NY, USA, 123(1): 114-125, (2016).
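A generic sketch of a delta-delta Ct relative-recovery calculation in R, illustrating the general approach only; it is not the exact EPA Method 1609/1611 calculation or its calibrator terms.

```r
# Relative target gene recovery (%) from a water sample vs. a control matrix,
# normalized by the sample processing control (SPC) assay; inputs are Ct values.
ddct_recovery <- function(ct_target_sample, ct_spc_sample,
                          ct_target_control, ct_spc_control) {
  ddct <- (ct_target_sample - ct_spc_sample) -
          (ct_target_control - ct_spc_control)
  100 * 2^(-ddct)   # the study treated 50-200% as acceptable recovery
}

ddct_recovery(32.1, 25.4, 31.0, 25.0)   # hypothetical Ct values
```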
**Inventory of pre-trained machine learning models of the Etalab AI Lab**
The publication of the inventory of pre-trained machine learning models is part of the roadmap of the Ministry of Transformation and Public Service (see p. 25 of the downloadable document here). This dataset lists the different algorithms trained by the Lab IA to date as part of the development of its shared tools (more information on the dedicated page of the Lab IA).
Details of what the inventory contains: For each algorithm, the column “link_model_card” provides a link to a description of the algorithm. We followed the description framework presented in Margaret Mitchell et al.’s “Model Cards for Model Reporting” paper (downloadable here). The column “link_depot_github” links to the GitHub repository containing the code that produced the algorithm. The column “model_entraine_open” has the value “no” if the trained model has not been released and “yes” if it has; in the latter case, the link to the trained model is given in the column “link_modele_entraine_si_pertinent”. The column “date_last_mise_a_day” indicates the date on which the model was last updated.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This record contains the lab study dataset and evaluation R source code from the paper "Sounds Good? Fast and Secure Contact Exchange in Groups" by Florentin Putz, Steffen Haesler, and Matthias Hollick in Proceedings of the ACM on Human-Computer Interaction (CSCW '24).
Abstract:
Trustworthy digital communication requires the secure exchange of contact information, but current approaches lack usability and scalability for larger groups of users. We evaluate the usability of two secure contact exchange systems: the current state of the art, SafeSlinger, and our newly designed protocol, PairSonic, which extends trust from physical encounters to spontaneous online communication. Our lab study (N=45) demonstrates PairSonic's superior usability, automating the tedious verification tasks from previous approaches via an acoustic out-of-band channel. Although participants significantly preferred our system, minimizing user effort surprisingly decreased the perceived security for some users, who associated security with complexity. We discuss user perceptions of the different protocol components and identify remaining usability barriers for CSCW application scenarios.
Dataset:
Our pseudonymous dataset contains usability, security, and preference scores, completion times, reported usage of nine types of social and collaborative tools, and seven demographic and control variables, for each of our 45 participants.
Analysis source code:
Our R Markdown source code includes the full reproducible code of our analysis. This code generates all statistical figures from our paper. The code can also be used to reproduce our quantitative results and tables.
Please refer to the README.md file and our paper for further details about the dataset and the lab study.
Acknowledgments:
This work has been funded by the LOEWE initiative (Hesse, Germany) within the emergenCITY center [LOEWE/1/12/519/03/05.001(0016)/72].
This dataset consists of Integrated Sounding System (ISS) sounding data from aboard the People's Republic of China Research Vessel (R/V) Kexue 1 which was operated by U.S. personnel. Soundings were taken four times daily (00, 06, 12, and 18 UTC) during participation in the IOP. UCAR/JOSS conducted no quality control on these data. This dataset includes pressure, temperature, dew point, relative humidity, wind speed, wind direction, and altitude taken at 10 second vertical intervals. Refer to the station README file for details. NOTE: This dataset has been updated (4/23/2002) with corrected data for the humidity measurement errors of the Vaisala RS80 radiosonde (Wang et al. 2002). Please see the README for additional information.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains raw and pre-processed EEG data from a mobile EEG study investigating the effects of cognitive task demands, motor demands, and environmental complexity on attentional processing (see below for experiment details).
All preprocessing and analysis code is deposited in the code directory. The entire MATLAB pipeline can be reproduced by executing the run_pipeline.m script. In order to run these scripts, you will need to ensure you have the required MATLAB toolboxes and R packages on your system. You will also need to adapt def_local.m to specify local paths to MATLAB and EEGLAB. Descriptive statistics and mixed-effects models can be reproduced in R by running the stat_analysis.R script.
See below for software details.
In addition to citing this dataset, please cite the original manuscript reporting data collection and experimental procedures.
For more information, see the dataset_description.json file.
ODC Open Database License (ODbL). For more information, see the LICENCE file.
Dataset is formatted according to the EEG-BIDS extension (Pernet et al., 2019) and the BIDS extension proposal for common electrophysiological derivatives (BEP021) v0.0.1, which can be found here:
Note that BEP021 is still a work in progress as of 2021-03-01.
Generally, you can find data in the .tsv files and descriptions in the accompanying .json files.
An important BIDS definition to consider is the "Inheritance Principle" (see 3.5 in the BIDS specification: http://bids.neuroimaging.io/bids_spec.pdf), which states:
Any metadata file (.json, .bvec, .tsv, etc.) may be defined at any directory level. The values from the top level are inherited by all lower levels unless they are overridden by a file at the lower level.
Forty-four healthy adults aged 18-40 performed an oddball task involving complex tone (piano and horn) stimuli in three settings: (1) sitting in a quiet room in the lab (LAB); (2) walking around a sports field (FIELD); (3) navigating a route through a university campus (CAMPUS).
Participants performed each environmental condition twice: once while attending to oddball stimuli (i.e. counting the number of presented deviant tones; COUNT), and once while disregarding or ignoring the tone stimuli (IGNORE).
EEG signals were recorded from 32 active electrodes using a Brain Vision LiveAmp 32 amplifier. See manuscript for further details.
MATLAB Version: 9.7.0.1319299 (R2019b) Update 5
MATLAB License Number: 678256
Operating System: Microsoft Windows 10 Enterprise Version 10.0 (Build 18363)
Java Version: Java 1.8.0_202-b08 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
The following toolboxes/helper functions were also used:
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale: LC_COLLATE=English_Australia.1252, LC_CTYPE=English_Australia.1252, LC_MONETARY=English_Australia.1252, LC_NUMERIC=C and LC_TIME=English_Australia.1252
attached base packages:
other attached packages:
loaded via a namespace (and not attached):
Piscine orthoreovirus genotype one (PRV-1) is the causative agent of heart and skeletal muscle inflammation (HSMI) in farmed Atlantic salmon (Salmo salar L.). The virus has also been found in Pacific salmonids in western North America, raising concerns about the risk to native salmon and trout. Here, we report the results of laboratory challenges using juvenile Chinook salmon, coho salmon, and rainbow trout injected with tissue homogenates from Atlantic salmon testing positive for PRV-1 or with control material. Fish were sampled at intervals to assess viral RNA transcript levels, hematocrit, erythrocytic inclusions, and histopathology. While PRV-1 replicated to high loads in all species, there was negligible mortality in any group. We observed a few erythrocytic inclusion bodies in fish from PRV-1 infected groups. At a few time points, hematocrits were significantly lower in the PRV-1 infected groups relative to controls but in no case was anemia noted. The most common histopathological finding was mild, focal myocarditis in both the non-infected controls and PRV-1 infected fish. All cardiac lesions were judged mild and none were consistent with those of HSMI. Together, these results suggest all three species are relatively susceptible to PRV-1 infection, but in no case did infection cause notable disease in these experiments.