100+ datasets found
  1. Data_Sheet_1_“R” U ready?: a case study using R to analyze changes in gene expression during evolution

    • frontiersin.figshare.com
    docx
    Updated Mar 22, 2024
    + more versions
    Cite
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_1_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s001
    Available download formats: docx
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
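    To give a flavor of the analyses the Iris module walks through, here is a minimal base-R sketch using the built-in iris dataset (the case study itself supplies its own guided code):

        summary(iris)                                   # summary statistics per column
        cor(iris$Sepal.Length, iris$Petal.Length)       # correlation between two traits
        hist(iris$Sepal.Length, main = "Sepal length")  # histogram
        plot(iris$Petal.Length, iris$Petal.Width,       # scatter plot, colored by species
             col = iris$Species, pch = 19,
             xlab = "Petal length", ylab = "Petal width")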

  2. Statistical Data Analysis using R

    • figshare.com
    txt
    Updated May 30, 2023
    Cite
    Statistical Data Analysis using R [Dataset]. https://figshare.com/articles/dataset/Statistical_Data_Analysis_using_R/5501035
    Available download formats: txt
    Dataset updated
    May 30, 2023
    Dataset provided by
    figshare
    Authors
    Samuel Barsanelli Costa
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    R scripts containing statistical data analysis of streamflow and sediment data, including Flow Duration Curves, Double Mass Analysis, Nonlinear Regression Analysis for Suspended Sediment Rating Curves, and Stationarity Tests, along with several plots.
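    As an illustration of the first of these analyses, a flow duration curve ranks discharge against exceedance probability. A minimal base-R sketch, assuming a CSV with a discharge column named q (the file and column names are assumptions, not taken from the scripts):

        q <- sort(read.csv("streamflow.csv")$q, decreasing = TRUE)  # assumed file/column
        p <- 100 * seq_along(q) / (length(q) + 1)   # exceedance probability (%), Weibull position
        plot(p, q, log = "y", type = "l",
             xlab = "Exceedance probability (%)", ylab = "Discharge")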

  3. Datasets used in the benchmarking study of MR methods

    • zenodo.org
    zip
    Updated Aug 4, 2024
    Cite
    Hu Xianghong; Hu Xianghong (2024). Datasets used in the benchmarking study of MR methods [Dataset]. http://doi.org/10.5281/zenodo.10929572
    Available download formats: zip
    Dataset updated
    Aug 4, 2024
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Hu Xianghong; Hu Xianghong
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We conducted a benchmarking analysis of 16 summary-level data-based MR methods for causal inference with five real-world genetic datasets, focusing on three key aspects: type I error control, the accuracy of causal effect estimates, and replicability and power.

    The datasets used in the MR benchmarking study can be downloaded here:

    1. "dataset-GWASATLAS-negativecontrol.zip": the GWASATLAS dataset for evaluation of type I error control in confounding scenario (a): Population stratification
    2. "dataset-NealeLab-negativecontrol.zip": the Neale Lab dataset for evaluation of type I error control in confounding scenario (a): Population stratification;
    3. "dataset-PanUKBB-negativecontrol.zip": the Pan UKBB dataset for evaluation of type I error control in confounding scenario (a): Population stratification;
    4. "dataset-Pleiotropy-negativecontrol": the dataset used for evaluation of type I error control in confounding scenario (b): Pleiotropy;
    5. "dataset-familylevelconf-negativecontrol.zip": the dataset used for evaluation of type I error control in confounding scenario (c): Family-level confounders;
    6. "dataset_ukb-ukb.zip": the dataset used for evaluation of the accuracy of causal effect estimates;
    7. "dataset-LDL-CAD_clumped.zip": the dataset used for evaluation of replicability and power;

    Each of the datasets contains the following files:

    1. "Tested Trait pairs": the exposure-outcome trait pairs to be analyzed;
    2. "MRdat" refers to the summary statistics after performing IV selection (p-value < 5e-05) and PLINK LD clumping with a clumping window size of 1000kb and an r^2 threshold of 0.001.
    3. "bg_paras" are the estimated background parameters "Omega" and "C" which will be used for MR estimation in MR-APSS.

    Note:

    1. Supplemental Tables S1-S7.xlsx provide the download links for the original GWAS summary-level data for the traits used as exposures or outcomes.
    2. The formatted datasets after quality control can be accessed at our GitHub website (https://github.com/YangLabHKUST/MRbenchmarking).
    3. The details on quality control of GWAS summary statistics, formatting GWASs, and LD clumping for IV selection can be found on the MR-APSS software tutorial on the MR-APSS website (https://github.com/YangLabHKUST/MR-APSS).
    4. R code for running MR methods is also available at https://github.com/YangLabHKUST/MRbenchmarking.
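    For orientation, here is a minimal sketch of a summary-level MR analysis on one trait pair using the standard fixed-effect inverse-variance weighted (IVW) estimator. The file name and column names (b.exp, b.out, se.out) are assumptions; the benchmarked methods themselves are available via the GitHub links above:

        d <- read.table("MRdat_trait_pair.txt", header = TRUE)  # assumed file and columns
        w <- 1 / d$se.out^2                                     # inverse-variance weights
        beta_ivw <- sum(w * d$b.exp * d$b.out) / sum(w * d$b.exp^2)
        se_ivw   <- sqrt(1 / sum(w * d$b.exp^2))
        p_ivw    <- 2 * pnorm(-abs(beta_ivw / se_ivw))
        c(beta = beta_ivw, se = se_ivw, p = p_ivw)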
  4. Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems

    • search.dataone.org
    • dataverse.harvard.edu
    Updated Nov 21, 2023
    Cite
    Monogan, Jamie (2023). Political Analysis Using R: Example Code and Data, Plus Data for Practice Problems [Dataset]. http://doi.org/10.7910/DVN/ARKOTI
    Dataset updated
    Nov 21, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Monogan, Jamie
    Description

    Each R script replicates all of the example code from one chapter from the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.

  5. Data from: AMSM Summary Papers- (Chapter One- Introduction)

    • osf.io
    Updated Oct 16, 2018
    Cite
    R. Dahman (2018). AMSM Summary Papers- (Chapter One- Introduction) [Dataset]. https://osf.io/tkby3
    Dataset updated
    Oct 16, 2018
    Dataset provided by
    Center For Open Science
    Authors
    R. Dahman
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    In the upcoming series of some 40 summary papers, I will present a comprehensive view of Applied Multivariate Statistical Modeling (AMSM). I will begin with a thorough introduction to AMSM. Then, I will explain univariate descriptive statistics, sampling distributions, estimation, and hypothesis testing. After that, I will give a comprehensive review of multivariate descriptive statistics, the multivariate normal distribution, and inferential statistics. With that accomplished, it will be time to discuss various models: ANOVA, MANOVA, Multiple Linear Regression, and Multivariate Linear Regression. Furthermore, we will discuss Principal Component Analysis, Factor Analysis, and Cluster Analysis. At the end of this series of summaries, an introduction to structural equation modeling (SEM) and correspondence analysis will be given. Readers should have basic knowledge of statistics and probability, in addition to some advanced knowledge of linear algebra; I have published summary papers in both disciplines (see the reference page).

  6. Summary statistics from "Sex-Specific Causal Relations between Steroid Hormones and Obesity—A Mendelian Randomization Study"

    • data.niaid.nih.gov
    • zenodo.org
    Updated Nov 15, 2021
    Cite
    Janne Pott (2021). Summary statistics from "Sex-Specific Causal Relations between Steroid Hormones and Obesity—A Mendelian Randomization Study" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5644895
    Dataset updated
    Nov 15, 2021
    Dataset authored and provided by
    Janne Pott
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GWAMA summary statistics of four steroid hormone levels and one steroid hormone ratio using a fixed-effect model.

    When using this data, please cite: Pott J, Horn K, Zeidler R, et al. Sex-Specific Causal Relations between Steroid Hormones and Obesity - A Mendelian Randomization Study. Metabolites 2021, 11, 738. https://doi.org/10.3390/metabo11110738

    All txt files contain the following columns (a minimal loading sketch follows the list):

    • markername
    • chr
    • bp_hg19 (base position according to hg19)
    • ea (effect allele)
    • oa (other allele)
    • eaf (effect allele frequency)
    • info (minimal info score across all used studies)
    • nSamples (sample size per SNP)
    • nStudies (number of studies)
    • beta (effect estimate)
    • se (standard error)
    • p (p-value)
    • I2 (SNP heterogeneity across studies)
    • phenotype (phenotype setting)
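    A minimal sketch of loading one of these files and extracting genome-wide significant SNPs (the file name is a placeholder):

        gw  <- read.table("summary_stats.txt", header = TRUE)   # placeholder file name
        sig <- gw[gw$p < 5e-8, c("markername", "chr", "bp_hg19", "beta", "se", "p")]
        sig[order(sig$p), ]                                     # strongest associations first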

  7. TUD R Cafe Plot-a-thon: 4TU.ResearchData statistics

    • data.4tu.nl
    zip
    Updated Oct 11, 2023
    Cite
    Hyeokjin Kwon (2023). TUD R Cafe Plot-a-thon: 4TU.ResearchData statistics [Dataset]. http://doi.org/10.4121/7b8ae119-47b9-4759-9c1f-90f70f94ba73.v1
    Available download formats: zip
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Hyeokjin Kwon
    License

    GNU General Public License 3.0: https://www.gnu.org/licenses/gpl-3.0.html

    Description

    This dataset is used to visualize the 4TU.ResearchData resources for the plot-a-thon.

  8. Data_Sheet_2_“R” U ready?: a case study using R to analyze changes in gene expression during evolution

    • figshare.com
    docx
    Updated Mar 22, 2024
    Cite
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder (2024). Data_Sheet_2_“R” U ready?: a case study using R to analyze changes in gene expression during evolution.docx [Dataset]. http://doi.org/10.3389/feduc.2024.1379910.s002
    Available download formats: docx
    Dataset updated
    Mar 22, 2024
    Dataset provided by
    Frontiers
    Authors
    Amy E. Pomeroy; Andrea Bixler; Stefanie H. Chen; Jennifer E. Kerr; Todd D. Levine; Elizabeth F. Ryder
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Same abstract as entry 1 (Data_Sheet_1): this supplement accompanies the same case study on using R to analyze LTEE gene expression data.

  9. Full summary statistics from 41 EWAS conducted for the EWAS Catalog

    • data.niaid.nih.gov
    • zenodo.org
    Updated Apr 9, 2021
    Cite
    EWAS Catalog team (2021). Full summary statistics from 41 EWAS conducted for the EWAS Catalog [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4672753
    Dataset updated
    Apr 9, 2021
    Dataset authored and provided by
    EWAS Catalog team
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Full summary statistics from 41 epigenome-wide association studies (EWAS) conducted by The EWAS Catalog team (www.ewascatalog.org). Meta-data are found in the "studies-full.csv" file and the results are in "full_stats.tar.gz". Unzipping "full_stats.tar.gz" reveals a folder containing 41 csv files, each with the full summary statistics from one EWAS. The results can be linked to the meta-data using the "Results_file" column in "studies-full.csv". These analyses were conducted using data extracted from the Gene Expression Omnibus (GEO) with the geograbi R package. For more information on the EWAS, please consult our paper: Battram, Thomas, et al. "The EWAS Catalog: A Database of Epigenome-wide Association Studies." OSF Preprints, 4 Feb. 2021. https://doi.org/10.31219/osf.io/837wn. Please cite the paper if you use this dataset.
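    A minimal sketch of unpacking the archive and linking one results file back to its meta-data row via the "Results_file" column (the extraction path layout is an assumption):

        studies <- read.csv("studies-full.csv")                 # meta-data, one row per EWAS
        untar("full_stats.tar.gz", exdir = "full_stats")        # reveals the 41 csv files
        res <- read.csv(file.path("full_stats", studies$Results_file[1]))  # assumed layout
        head(res)                                               # full summary stats of one EWAS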

  10. TUD R Cafe Plot-a-thon: A Visual Summary

    • data.4tu.nl
    zip
    Updated Oct 11, 2023
    Cite
    Rodrigo Revilla Llaca (2023). TUD R Cafe Plot-a-thon: A Visual Summary [Dataset]. http://doi.org/10.4121/5440fa35-b481-489f-8600-3b6c2d1be655.v1
    Available download formats: zip
    Dataset updated
    Oct 11, 2023
    Dataset provided by
    4TU.ResearchData
    Authors
    Rodrigo Revilla Llaca
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A visual summary of the contents of the 4TU Research Data Repository, in 4 plots.

  11. CONTENT -- Multi-context genetic modeling TWAS summary statistics

    • zenodo.org
    zip
    Updated Jun 20, 2022
    + more versions
    Cite
    Mike Thompson; Mary Grace Gordon; Andrew Lu; Eran Halperin; Alexander Gusev; Jimmie Ye Chun; Brunilda Balliu; Noah Zaitlen; Mike Thompson; Mary Grace Gordon; Andrew Lu; Eran Halperin; Alexander Gusev; Jimmie Ye Chun; Brunilda Balliu; Noah Zaitlen (2022). CONTENT -- Multi-context genetic modeling TWAS summary statistics [Dataset]. http://doi.org/10.5281/zenodo.5208183
    Available download formats: zip
    Dataset updated
    Jun 20, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Mike Thompson; Mary Grace Gordon; Andrew Lu; Eran Halperin; Alexander Gusev; Jimmie Ye Chun; Brunilda Balliu; Noah Zaitlen; Mike Thompson; Mary Grace Gordon; Andrew Lu; Eran Halperin; Alexander Gusev; Jimmie Ye Chun; Brunilda Balliu; Noah Zaitlen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    We provide the summary statistics from running CONTENT, the context-by-context approach, and UTMOST on over 22 phenotypes. The phenotypes are listed in the manuscript, and their respective studies and sample sizes can be found in a table in the supplementary section of the manuscript. All three methods were trained on GTEx v7 as well as CLUES, a single-cell RNA sequencing dataset of PBMCs. The data include the gene name, model, cross-validated R^2, prediction p-value, TWAS p-value, TWAS Z score, and a column titled "hFDR" indicating whether the association was statistically significant under hierarchical FDR. The benefits of employing such an approach for all methods are discussed in the manuscript.
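    A minimal sketch of filtering such a table to the associations flagged as significant under hierarchical FDR; the file and column names are assumptions based on the description above:

        twas <- read.csv("CONTENT_summary_stats.csv")   # placeholder file name
        hits <- subset(twas, hFDR == TRUE)              # assumes a logical flag; adjust if coded differently
        hits[order(hits$TWAS_p), c("gene", "model", "TWAS_p", "TWAS_Z")]  # assumed column names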

  12. Data from: A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Dec 16, 2023
    Cite
    Michael Kempf; Michael Kempf (2023). A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19 [Dataset]. http://doi.org/10.5281/zenodo.10396148
    Available download formats: zip
    Dataset updated
    Dec 16, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Michael Kempf; Michael Kempf
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Dec 16, 2023
    Area covered
    Levant
    Description

    Overview

    This dataset is the repository for the following paper submitted to Data in Brief:

    Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).

    The Data in Brief article contains the supplement information and is the related data paper to:

    Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).

    Description/abstract

    The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and currently the escalation of the so-called Israeli-Palestinian Conflict, which strained neighbouring countries like Jordan due to the influx of Syrian refugees and increases population vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.

    Folder structure

    The main folder after download contains all data; the following subfolders are stored as zipped files:

    “code” stores the nine code chunks described below, used to read, extract, process, analyse, and visualize the data.

    “MODIS_merged” contains the 16-days, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.

    “mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).

    “yield_productivity” contains .csv files of yield information for all countries listed above.

    “population” contains two files with the same name but different formats. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).

    “GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (by variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022, and a second contains the additional January and February 2023 data.

    “built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders containing the raw data and the already processed data: “raw_data” contains the unprocessed datasets, and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.

    Code structure

    1_MODIS_NDVI_hdf_file_extraction.R


    This is the first code chunk, which covers the extraction of MODIS data from the .hdf file format. The following packages must be installed, and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9 October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three different (spatial) time series and merge them later. Note that the time series are temporally consistent.
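    A minimal sketch of this extraction step with terra, assuming the NDVI layer is the first HDF subdataset of MOD13Q1 (adjust paths and the subdataset index to your download):

        library(terra)
        files <- list.files("your_directory_MODIS", pattern = "\\.hdf$", full.names = TRUE)
        for (f in files) {
          s    <- sds(f)        # open all HDF subdatasets
          ndvi <- s[[1]]        # NDVI is assumed to be the first subdataset in MOD13Q1
          writeRaster(ndvi, sub("\\.hdf$", "_NDVI.tif", f), overwrite = TRUE)
        }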


    2_MERGE_MODIS_tiles.R


    In this code, we load and merge the three different stacks to produce a large and consistent time series of NDVI imagery across the study area. We use the package gtools to load the files in natural numeric order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, of which we merge the first two (stack 1, stack 2) and store the result. We then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").


    3_CROP_MODIS_merged_tiles.R


    Now we want to crop the derived MODIS tiles to our study area. We use a mask, which is provided as a .shp file in the repository, named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced single cropped NDVI time series data from MODIS.
    The repository provides the already clipped and merged NDVI datasets.


    4_TREND_analysis_NDVI.R


    Now we want to perform trend analysis on the derived data. The data we load are tricky, as they contain a 16-day return period across a year for a period of 22 years. Growing season sums contain MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and mark all values significant at the 0.05 level. Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS, span 0.3).
    To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This was done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
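    A minimal sketch of the pixel-wise trend idea, assuming the merged and cropped layers are on disk; the layer grouping below is a placeholder, and the real grouping must follow the MODIS 16-day acquisition dates:

        library(terra)
        ndvi   <- rast(list.files("merged", pattern = "NDVI_merged_clip", full.names = TRUE))
        groups <- rep(seq_len(ceiling(nlyr(ndvi) / 6)), each = 6)[seq_len(nlyr(ndvi))]
        season_sum <- tapp(ndvi, index = groups, fun = sum)   # per-season sums (placeholder grouping)
        slope_fun <- function(v) {
          if (sum(!is.na(v)) < 3) return(c(NA, NA))
          m <- summary(lm(v ~ seq_along(v)))
          c(coef(m)[2, 1], coef(m)[2, 4])                     # slope and p-value
        }
        trend <- app(season_sum, slope_fun)                   # pixel-wise linear trend
        names(trend) <- c("slope", "p")
        z <- (season_sum - mean(season_sum)) / stdev(season_sum)  # z-scores, as in the text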


    5_BUILT_UP_change_raster.R


    Let us look at the landcover changes now. We work with the terra package and get raster data from: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3 March 2023, 100 m resolution, global coverage). Here, one can download the desired temporal coverage and reclassify it using the code after cropping to the individual study area. I summed up different rasters to characterize the built-up change in continuous values between 1975 and 2022.


    6_POPULATION_numbers_plot.R


    For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.


    7_YIELD_plot.R


    In this section, we use the country productivity data from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each single-country yield dataset is plotted in a ggplot and combined using the patchwork package in R.


    8_GLDAS_read_extract_trend


    The last code chunk provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9 October 2023). The raw data come in .nc file format, and various variables can be extracted from the spatraster collection using the [“^a variable name”] command. Each time the code is run, this variable name must be adjusted to the variable of interest (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 9 October 2023; alternatively, read a .nc file with the ncdf4 package in R and run print(nc), or use names() on the spatraster collection).
    Once a variable is chosen, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
    From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. Note that annual trends require the frequency of the time series analysis to be set to value = 12. For variables such as rainfall, which is measured as annual sums rather than means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code; here, 3-month subsets were chosen for growing seasons, e.g., March-May (MAM), June-August (JJA), September-November (SON), and December-February (DJF, including Jan/Feb of the consecutive year).
    From these data, mean values over 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and values significant at the 95% confidence level are marked with dots on the raster plot. Thanks to the availability of the GLDAS variables, this analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe.

  13. Data and Code for "A Ray-Based Input Distance Function to Model Zero-Valued Output Quantities: Derivation and an Empirical Application"

    • zenodo.org
    • data.niaid.nih.gov
    bin, zip
    Updated Jun 17, 2023
    Cite
    Juan José Price; Juan José Price; Arne Henningsen; Arne Henningsen (2023). Data and Code for "A Ray-Based Input Distance Function to Model Zero-Valued Output Quantities: Derivation and an Empirical Application" [Dataset]. http://doi.org/10.5281/zenodo.7882079
    Available download formats: bin, zip
    Dataset updated
    Jun 17, 2023
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Juan José Price; Juan José Price; Arne Henningsen; Arne Henningsen
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data and code archive provides all the data and code for replicating the empirical analysis that is presented in the journal article "A Ray-Based Input Distance Function to Model Zero-Valued Output Quantities: Derivation and an Empirical Application" authored by Juan José Price and Arne Henningsen and published in the Journal of Productivity Analysis (DOI: 10.1007/s11123-023-00684-1).

    We conducted the empirical analysis with the "R" statistical software (version 4.3.0) using the add-on packages "combinat" (version 0.0.8), "miscTools" (version 0.6.28), "quadprog" (version 1.5.8), "sfaR" (version 1.0.0), "stargazer" (version 5.2.3), and "xtable" (version 1.8.4), which are available on CRAN. We created the R package "micEconDistRay", which provides the functions for empirical analyses with ray-based input distance functions that we developed for the above-mentioned paper. This R package is also available on CRAN (https://cran.r-project.org/package=micEconDistRay).

    This replication package contains the following files and folders:

    • README
      This file
    • MuseumsDk.csv
      The original data obtained from the Danish Ministry of Culture and from Statistics Denmark. It includes the following variables:
      • museum: Name of the museum.
      • type: Type of museum (Kulturhistorisk museum = cultural history museum; Kunstmuseer = arts museum; Naturhistorisk museum = natural history museum; Blandet museum = mixed museum).
      • munic: Municipality, in which the museum is located.
      • yr: Year of the observation.
      • units: Number of visit sites.
      • resp: Whether or not the museum has special responsibilities (0 = no special responsibilities; 1 = at least one special responsibility).
      • vis: Number of (physical) visitors.
      • aarc: Number of articles published (archeology).
      • ach: Number of articles published (cultural history).
      • aah: Number of articles published (art history).
      • anh: Number of articles published (natural history).
      • exh: Number of temporary exhibitions.
      • edu: Number of primary school classes on educational visits to the museum.
      • ev: Number of events other than exhibitions.
      • ftesc: Scientific labor (full-time equivalents).
      • ftensc: Non-scientific labor (full-time equivalents).
      • expProperty: Running and maintenance costs [1,000 DKK].
      • expCons: Conservation expenditure [1,000 DKK].
      • ipc: Consumer Price Index in Denmark (the value for year 2014 is set to 1).
    • prepare_data.R
      This R script imports the data set MuseumsDk.csv, prepares it for the empirical analysis (e.g., removing unsuitable observations, preparing variables), and saves the resulting data set as DataPrepared.csv (a minimal sketch of this kind of preparation follows the list).
    • DataPrepared.csv
      This data set is prepared and saved by the R script prepare_data.R. It is used for the empirical analysis.
    • make_table_descriptive.R
      This R script imports the data set DataPrepared.csv and creates the LaTeX table /tables/table_descriptive.tex, which provides summary statistics of the variables that are used in the empirical analysis.
    • IO_Ray.R
      This R script imports the data set DataPrepared.csv, estimates a ray-based Translog input distance function with the 'optimal' ordering of outputs, imposes monotonicity on this distance function, creates the LaTeX table /tables/idfRes.tex that presents the estimated parameters of this function, and creates several figures in the folder /figures/ that illustrate the results.
    • IO_Ray_ordering_outputs.R
      This R script imports the data set DataPrepared.csv, estimates a ray-based Translog input distance function, imposes monotonicity for each of the 720 possible orderings of the outputs, and saves all the estimation results as (a huge) R object allOrderings.rds.
    • allOrderings.rds (not included in the ZIP file, uploaded separately)
      This is a saved R object created by the R script IO_Ray_ordering_outputs.R that contains the estimated ray-based Translog input distance functions (with and without monotonicity imposed) for each of the 720 possible orderings.
    • IO_Ray_model_averaging.R
      This R script loads the R object allOrderings.rds that contains the estimated ray-based Translog input distance functions for each of the 720 possible orderings, does model averaging, and creates several figures in the folder /figures/ that illustrate the results.
    • /tables/
      This folder contains the two LaTeX tables table_descriptive.tex and idfRes.tex (created by R scripts make_table_descriptive.R and IO_Ray.R, respectively) that provide summary statistics of the data set and the estimated parameters (without and with monotonicity imposed) for the 'optimal' ordering of outputs.
    • /figures/
      This folder contains 48 figures (created by the R scripts IO_Ray.R and IO_Ray_model_averaging.R) that illustrate the results obtained with the 'optimal' ordering of outputs and the model-averaged results and that compare these two sets of results.
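    A minimal sketch of one preparation step of the kind performed by prepare_data.R: deflating the nominal expenditure variables with the ipc column (2014 = 1). The exact steps in the replication package may differ:

        dat <- read.csv("MuseumsDk.csv")
        dat$expPropertyReal <- dat$expProperty / dat$ipc  # real running/maintenance costs
        dat$expConsReal     <- dat$expCons / dat$ipc      # real conservation expenditure
        summary(dat[, c("expPropertyReal", "expConsReal")])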
  14. CLM AWRA HRVs Uncertainty Analysis

    • researchdata.edu.au
    • data.gov.au
    • +2 more
    Updated Jul 10, 2017
    Cite
    Bioregional Assessment Program (2017). CLM AWRA HRVs Uncertainty Analysis [Dataset]. https://researchdata.edu.au/clm-awra-hrvs-uncertainty-analysis/2984398
    Dataset updated
    Jul 10, 2017
    Dataset provided by
    Data.gov: https://data.gov/
    Authors
    Bioregional Assessment Program
    Description

    Abstract

    This dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

    This dataset contains the data and scripts to generate the hydrological response variables for surface water in the Clarence Moreton subregion as reported in CLM261 (Gilfedder et al. 2016).

    Dataset History

    File CLM_AWRA_HRVs_flowchart.png shows the different files in this dataset and how they interact. The Python and R scripts were written by the BA modelling team to read, combine, and analyse the source datasets CLM AWRA model, CLM groundwater model V1, and CLM16swg Surface water gauging station data within the Clarence Moreton Basin, as detailed below, to create the hydrological response variables for surface water as reported in CLM2.6.1 (Gilfedder et al. 2016).

    R-script HRV_SWGW_CLM.R reads, for each model simulation, the outputs from the surface water model in netcdf format from file Qtot.nc (dataset CLM AWRA model) and the outputs from the groundwater model, flux_change.csv (dataset CLM groundwater model V1) and creates a set of files in subfolder /Output for each GaugeNr and simulation Year:

    CLM_GaugeNr_Year_all.csv and CLM_GaugeNR_Year_baseline.csv: the set of 9 HRVs for GaugeNr and Year for all 5000 simulations for baseline conditions

    CLM_GaugeNr_Year_CRDP.csv: the set of 9 HRVs for GaugeNr and Year for all 5000 simulations for CRDP conditions (=AWRA streamflow - MODFLOW change in SW-GW flux)

    CLM_GaugeNr_Year_minMax.csv: minimum and maximum of HRVs over all 5000 simulations

    Python script CLM_collate_DoE_Predictions.py collates that information into the following files, for each HRV and each maxtype (absolute maximum (amax), relative maximum (pmax), and time of absolute maximum change (tmax)):

    CLM_AWRA_HRV_maxtyp_DoE_Predictions: for each simulation and each gauge_nr, the maxtype of the HRV over the prediction period (2012 to 2102)

    CLM_AWRA_HRV_DoE_Observations: for each simulation and each gauge_nr, the HRV for the years for which observations are available

    CLM_AWRA_HRV_Observations: summary statistics of each HRV and the observed value (based on the dataset CLM16swg Surface water gauging station data within the Clarence Moreton Basin)

    CLM_AWRA_HRV_maxtyp_Predictions: summary statistics of each HRV

    R-script CLM_CreateObjectiveFunction.R calculates, for each HRV, the objective function value for all simulations and stores it in CLM_AWRA_HRV_ss.csv. This file is used by Python script CLM_AWRA_SI.py to generate figure CLM-2615-002-SI.png (sensitivity indices).

    The AWRA objective function is combined with the overall objective function from the groundwater model in dataset CLM Modflow Uncertainty Analysis (CLM_MF_DoE_ObjFun.csv) into csv file CLM_AWRA_HRV_oo.csv. This file is used to select behavioural simulations in Python script CLM-2615-001-top10.py. This script uses files CLM_NodeOrder.csv and BA_Visualisation.py to create the figures CLM-2616-001-HRV_10pct.png.

    Dataset Citation

    Bioregional Assessment Programme (2016) CLM AWRA HRVs Uncertainty Analysis. Bioregional Assessment Derived Dataset. Viewed 28 September 2017, http://data.bioregionalassessments.gov.au/dataset/e51a513d-fde7-44ba-830c-07563a7b2402.

    Dataset Ancestors

  15. Data from: Measuring Spatio-Temporal Civil War Dimensions Using Community-Based Dynamic Network Representation (CoDNet)

    • dataverse.harvard.edu
    Updated Jan 29, 2023
    Cite
    Ore Koren (2023). Measuring Spatio-Temporal Civil War Dimensions Using Community-Based Dynamic Network Representation (CoDNet) [Dataset]. http://doi.org/10.7910/DVN/0S9AFT
    Available download formats: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset updated
    Jan 29, 2023
    Dataset provided by
    Harvard Dataverse
    Authors
    Ore Koren
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This folder contains the script files and the underlying data used to aggregate, analyze, and create all the tables in the study “Measuring Spatio-Temporal Civil War Dimensions Using Community-Based Dynamic Network Representation (CoDNet).” These data include: 1. The .csv data file used to conduct the regressions, with all CoDNet-based variables included therein (“ccnet_12_19.csv”). 2. The .R script file used to estimate these models, as well as all robustness models in the appendix, and summary statistics. For any questions about the data or scripts, please contact Ore Koren at okoren@iu.edu.

  16. APS Employment Data 30 June 2011

    • researchdata.edu.au
    Updated May 12, 2013
    + more versions
    Cite
    Australian Public Service Commission (2013). APS Employment Data 30 June 2011 [Dataset]. https://researchdata.edu.au/aps-employment-data-june-2011/3386535
    Dataset updated
    May 12, 2013
    Dataset provided by
    Data.gov: https://data.gov/
    Authors
    Australian Public Service Commission
    License

    Attribution 3.0 (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/
    License information was derived automatically

    Description

    The Australian Public Service Statistical Bulletin 2010-11 presents a summary of employment under the Public Service Act 1999 at 30 June 2011 and during the 2010-11 financial year, as well as summary data for the past 15 years. This Excel dataset consists of the tables used to create the Statistical Bulletin. You can view the Bulletin online at http://www.apsc.gov.au/about-the-apsc/parliamentary/aps-statistical-bulletin/aps-statistical-bulletin-2010-11

  17. Seasonal and annual summary statistics of urbanization, vegetation, land surface temperature, and bioclimatic variables derived from remotely-sensed imagery in areas surrounding long-term bird monitoring locations in the greater Phoenix, Arizona, USA metropolitan area (1997-2023)

    • portal.edirepository.org
    • dataone.org
    bin, csv, txt
    Updated Jul 26, 2024
    Cite
    Jeffrey Haight; Fábio de Albuquerque; Amy Frazier (2024). Seasonal and annual summary statistics of urbanization, vegetation, land surface temperature, and bioclimatic variables derived from remotely-sensed imagery in areas surrounding long-term bird monitoring locations in the greater Phoenix, Arizona, USA metropolitan area (1997-2023) [Dataset]. http://doi.org/10.6073/pasta/9d44cd85f881586d6d06e7a7293e833c
    Available download formats: csv (8,351,604 bytes), txt (20,801 bytes), txt (37,120 bytes), bin (5,467 bytes)
    Dataset updated
    Jul 26, 2024
    Dataset provided by
    EDI
    Authors
    Jeffrey Haight; Fábio de Albuquerque; Amy Frazier
    Time period covered
    Dec 21, 1997 - Dec 20, 2023
    Area covered
    Variables measured
    vp, LST, lat, ppt, NDBI, NDVI, NISI, SAVI, dayl, long, and 13 more
    Description

    This data package consists of 26 years (1998-2023) of environmental data and 22 years (2000-2022) of bioclimatic data associated with CAP-LTER long-term point-count bird censusing sites (https://doi.org/10.6073/pasta/4777d7f0a899f506d6d4f9b5d535ba09), temporally aggregated by year and by four meteorological seasons (Winter, Spring, Summer, Fall). The environmental variables include land surface temperature (LST), three spectral indices of vegetation and water – the normalized difference vegetation index (NDVI), the soil adjusted vegetation index (SAVI), and the modified normalized difference water index (MNDWI) – and four spectral indices of impervious surface/urbanization. Impervious surface indices include the normalized difference built-up index (NDBI), the normalized difference impervious surface index (NDISI), the enhanced normalized difference impervious surface index (ENDISI), and the normalized impervious surface index (NISI). LST and all spectral indices were derived from annual and seasonal composites of 30-m resolution Landsat 5-9 Level-2 Surface Reflectance imagery. The seven bioclimatic variables (e.g., air temperature, precipitation) were sourced from 1-km resolution gridded estimates of daily climatic data from NASA Daymet V4. We created temporally-aggregated Daymet raster images by calculating mean pixel-values for each season and year, as well as seasonally and annually summed precipitation. We summarized the values of each environmental variable by generating variously-sized (100-m, 500-m, 1000-m) buffers around each bird point count location and extracting weighted mean values of each environmental variable, with each pixel's values weighted by the proportion of its area falling within the buffer. All imagery retrieval and data processing were completed with Google Earth Engine (Gorelick et al. 2017) and program R. A complete description of data processing methods, including the aggregation of imagery by year and season and the calculation of spectral indices, can be found in the data package metadata (see 'Methods and Protocols') and accompanying Javascript and R code.
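    A minimal R sketch of the buffer-and-weighted-mean extraction described above, using the terra package (the package choice and file names are assumptions; the authors used Google Earth Engine and R):

        library(terra)
        lst <- rast("lst_seasonal_composite.tif")   # one aggregated environmental layer
        pts <- vect("bird_count_locations.shp")     # point-count locations
        buf <- buffer(pts, width = 500)             # 500-m buffer around each point
        # weighted mean: pixel values weighted by the area fraction inside the buffer
        vals <- extract(lst, buf, fun = mean, weights = TRUE, na.rm = TRUE)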

  18. Data from: Dataset on "Argument maps as a proxy for critical thinking development: A Lab for undergraduate students"

    • data.niaid.nih.gov
    • recerca.uoc.edu
    • +1 more
    Updated Jan 4, 2024
    Cite
    Crudele, Francesca (2024). Dataset on "Argument maps as a proxy for critical thinking development: A Lab for undergraduate students" [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_8093919
    Dataset updated
    Jan 4, 2024
    Dataset provided by
    Raffaghelli, Juliana Elisa
    Crudele, Francesca
    Description

    Argumentative skills are crucial for any individual at the personal and professional levels. In recent decades, there has been increasing concern about undergraduates' weak argumentative skills and their considerable difficulty in reworking and expressing their own reflections on a topic. In turn, this has implications for being a critical thinker able to express an original point of view. Tailored interventions in Higher Education could constitute a powerful approach to promoting argumentative skills and extending them to professional and personal life. In this regard, argument maps (AM) could prove to be a valuable support for the visualization of arguments. They do not just create associations between concepts, but trace the logical relationships between different statements, allowing one to track the reasoning chain and understand it better. We conducted an experimental study to investigate how a path with AM could support students in increasing their level of text comprehension (CoT) competence, in terms of identifying the elements of an argumentative text, and critical thinking (CT), in terms of reconstructing meaning and building their own reflection.

    Our preliminary descriptive analysis suggested a positive role of AM in increasing students’ CoT and CT proficiency levels.

    This Zenodo record documents the full analysis process with R (https://cran.r-project.org/bin/windows/base/), composed of the following datasets and scripts:

    1. Comprehension of Text and AMs Results - ExpAM.xlsx

    2. Critical Thinking Results - CriThink.xlsx

    3. Argumentative skills in Forum - ExpForum.xlsx

    4. Selfassessment Results - Dataset_Quest.xlsx

    5. Data for Correlation and Regression - Dataset_CorRegr.xlsx

    6. Descriptive Statistics - Preliminary Analysis.R

    7. Inferential Statistics - Correlation and Regression.R

    Any comments or improvements are welcome!

  19. Summary Statistics from "Meta-GWAS of PCSK9 levels detects two novel loci at APOB and TM6SF2"

    • zenodo.org
    • data.niaid.nih.gov
    zip
    Updated Aug 3, 2022
    Cite
    Janne Pott; Janne Pott; Jesper R. Gadin; Elizabeth Theusch; Marcus E. Kleber; Graciela E. Delgado; Holger Kirsten; Stefanie M. Hauck; Ralph Burkhardt; Hubert Scharnagl; Ronald M. Krauss; Markus Loeffler; Winfried März; Joachim Thiery; Angela Siveira; Ferdinand M. van't Hooft; Markus Scholz; Jesper R. Gadin; Elizabeth Theusch; Marcus E. Kleber; Graciela E. Delgado; Holger Kirsten; Stefanie M. Hauck; Ralph Burkhardt; Hubert Scharnagl; Ronald M. Krauss; Markus Loeffler; Winfried März; Joachim Thiery; Angela Siveira; Ferdinand M. van't Hooft; Markus Scholz (2022). Summary Statistics from "Meta-GWAS of PCSK9 levels detects two novel loci at APOB and TM6SF2" [Dataset]. http://doi.org/10.5281/zenodo.5643551
    Available download formats: zip
    Dataset updated
    Aug 3, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Janne Pott; Janne Pott; Jesper R. Gadin; Elizabeth Theusch; Marcus E. Kleber; Graciela E. Delgado; Holger Kirsten; Stefanie M. Hauck; Ralph Burkhardt; Hubert Scharnagl; Ronald M. Krauss; Markus Loeffler; Winfried März; Joachim Thiery; Angela Siveira; Ferdinand M. van't Hooft; Markus Scholz; Jesper R. Gadin; Elizabeth Theusch; Marcus E. Kleber; Graciela E. Delgado; Holger Kirsten; Stefanie M. Hauck; Ralph Burkhardt; Hubert Scharnagl; Ronald M. Krauss; Markus Loeffler; Winfried März; Joachim Thiery; Angela Siveira; Ferdinand M. van't Hooft; Markus Scholz
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    GWAMA summary statistics of PCSK9 levels using a fixed-effect model. Genome-wide data are given for Europeans with statin adjustment and for Europeans without statin treatment (a subset of the population). In addition, locus-wide data of the PCSK9 gene locus for African-Americans without statin treatment are listed.

    When using this data, please cite: Pott J, Gadin J, Theusch E, et al. Meta-GWAS of PCSK9 levels detects two novel loci at APOB and TM6SF2. Hum Mol Genet. 2021 Sep 30:ddab279. doi: 10.1093/hmg/ddab279. PMID: 34590679

    All txt files contain the following columns:

    • markername
    • chr
    • bp_hg19 (base position according to hg19)
    • ea (effect allele)
    • oa (other allele)
    • eaf (effect allele frequency)
    • info (minimal info score across all used studies)
    • nSamples (sample size per SNP)
    • nStudies (number of studies)
    • beta (effect estimate)
    • se (standard error)
    • p (p-value)
    • I2 (SNP heterogeneity across studies)
    • phenotype (phenotype setting)
  20. Data from: Posterior predictive checks of coalescent models: P2C2M, an R package

    • zenodo.org
    • search.dataone.org
    • +2more
    application/gzip, txt
    Updated May 28, 2022
    Cite
    Michael Gruenstaeudl; Noah M. Reid; Gregory L. Wheeler; Bryan C. Carstens; Michael Gruenstaeudl; Noah M. Reid; Gregory L. Wheeler; Bryan C. Carstens (2022). Data from: Posterior predictive checks of coalescent models: P2C2M, an R package [Dataset]. http://doi.org/10.5061/dryad.n715n
    Available download formats: application/gzip, txt
    Dataset updated
    May 28, 2022
    Dataset provided by
    Zenodo: http://zenodo.org/
    Authors
    Michael Gruenstaeudl; Noah M. Reid; Gregory L. Wheeler; Bryan C. Carstens; Michael Gruenstaeudl; Noah M. Reid; Gregory L. Wheeler; Bryan C. Carstens
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    Bayesian inference operates under the assumption that the empirical data are a good statistical fit to the analytical model, but this assumption can be challenging to evaluate. Here, we introduce a novel R package that utilizes posterior predictive simulation to evaluate the fit of the multispecies coalescent model used to estimate species trees. We conduct a simulation study to evaluate the consistency of different summary statistics in comparing posterior and posterior predictive distributions, the use of simulation replication in reducing error rates, and the utility of parallel process invocation toward improving computation times. We also test P2C2M on two empirical data sets in which hybridization and gene flow are suspected of contributing to shared polymorphism, in violation of the coalescent model: Tamias chipmunks and Myotis bats. Our results indicate that (i) probability-based summary statistics display the lowest error rates, (ii) the implementation of simulation replication decreases the rate of type II errors, and (iii) our R package displays improved statistical power compared to previous implementations of this approach. When probabilistic summary statistics are used, P2C2M corroborates the assumption that genealogies collected from Tamias and Myotis are not a good fit to the multispecies coalescent model. Taken as a whole, our findings argue that an assessment of the fit of the multispecies coalescent model should accompany any phylogenetic analysis that estimates a species tree.
