Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
As high-throughput methods become more common, training undergraduates to analyze data must include having them generate informative summaries of large datasets. This flexible case study provides an opportunity for undergraduate students to become familiar with the capabilities of R programming in the context of high-throughput evolutionary data collected using macroarrays. The story line introduces a recent graduate hired at a biotech firm and tasked with analysis and visualization of changes in gene expression from 20,000 generations of the Lenski Lab’s Long-Term Evolution Experiment (LTEE). Our main character is not familiar with R and is guided by a coworker to learn about this platform. Initially this involves a step-by-step analysis of the small Iris dataset built into R which includes sepal and petal length of three species of irises. Practice calculating summary statistics and correlations, and making histograms and scatter plots, prepares the protagonist to perform similar analyses with the LTEE dataset. In the LTEE module, students analyze gene expression data from the long-term evolutionary experiments, developing their skills in manipulating and interpreting large scientific datasets through visualizations and statistical analysis. Prerequisite knowledge is basic statistics, the Central Dogma, and basic evolutionary principles. The Iris module provides hands-on experience using R programming to explore and visualize a simple dataset; it can be used independently as an introduction to R for biological data or skipped if students already have some experience with R. Both modules emphasize understanding the utility of R, rather than creation of original code. Pilot testing showed the case study was well-received by students and faculty, who described it as a clear introduction to R and appreciated the value of R for visualizing and analyzing large datasets.
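For readers new to R, a minimal sketch of the Iris-module exercises described above (summary statistics, a correlation, a histogram, and a scatter plot), using the iris dataset built into R:

# Iris module warm-up: summary statistics, correlation, and basic plots
data(iris)
summary(iris$Sepal.Length)                      # summary statistics
cor(iris$Sepal.Length, iris$Petal.Length)       # correlation between traits
hist(iris$Sepal.Length, main = "Sepal length")  # histogram
plot(iris$Petal.Length, iris$Sepal.Length,      # scatter plot, coloured
     col = iris$Species, pch = 19)              # by species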
MIT Licensehttps://opensource.org/licenses/MIT
License information was derived automatically
R scripts containing statistical data analyses for streamflow and sediment data, including Flow Duration Curves, Double Mass Analysis, Nonlinear Regression Analysis for Suspended Sediment Rating Curves, and Stationarity Tests, as well as several plots.
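As an illustration of one of these analyses, a minimal sketch of a flow duration curve in base R, using simulated discharge values (the actual scripts operate on the streamflow records in the dataset):

# Flow duration curve: discharge against exceedance probability
set.seed(1)
q <- sort(rlnorm(365, meanlog = 2, sdlog = 1), decreasing = TRUE)  # simulated daily flows
exceed <- 100 * seq_along(q) / (length(q) + 1)   # Weibull plotting position
plot(exceed, q, type = "l", log = "y",
     xlab = "Exceedance probability (%)", ylab = "Discharge")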
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We conducted a benchmarking analysis of 16 summary-level data-based MR methods for causal inference with five real-world genetic datasets, focusing on four key aspects: type I error control, the accuracy of causal effect estimates, replicability, and power.
The datasets used in the MR benchmarking study can be downloaded here:
Each of the datasets contains the following files:
Note:
Each R script replicates all of the example code from one chapter of the book. All required data for each script are also uploaded, as are all data used in the practice problems at the end of each chapter. The data are drawn from a wide array of sources, so please cite the original work if you ever use any of these data sets for research purposes.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
In some 40 upcoming summary papers, I will present a comprehensive view of Applied Multivariate Statistical Modeling (AMSM). First, I will start with a thorough introduction to AMSM. Then, I will explain univariate descriptive statistics, sampling distributions, estimation, and hypothesis testing. After that, I will give a comprehensive review of multivariate descriptive statistics, the multivariate normal distribution, and multivariate inferential statistics. Having accomplished that, it will be time to discuss various models: ANOVA, MANOVA, Multiple Linear Regression, and Multivariate Linear Regression. Furthermore, we will discuss Principal Component Analysis, Factor Analysis, and Cluster Analysis. At the end of this series of summaries, an introduction to structural equation modeling (SEM) and correspondence analysis will be given. As prerequisites, readers should have basic knowledge of statistics and probability, as well as some advanced knowledge of linear algebra. I have published summary papers in both disciplines; see the reference page.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GWAMA summary statistics of four steroid hormone levels and one steroid hormone ratio using a fixed-effect model.
When using this data, please cite: Pott J, Horn K, Zeidler R, et al. Sex-Specific Causal Relations between Steroid Hormones and Obesity - A Mendelian Randomization Study. Metabolites 2021, 11, 738. https://doi.org/10.3390/metabo11110738
All txt files contain the following columns:
markername
chr
bp_hg19 (base position according to hg19)
ea (effect allele)
oa (other allele)
eaf (effect allele frequency)
info (minimal info score across all used studies)
nSamples (sample size per SNP)
nStudies (number of studies)
beta (effect estimate)
se (standard error)
p (p-value)
I2 (SNP heterogeneity across studies)
phenotype (phenotype setting)
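A minimal sketch of loading one of these files in R and extracting genome-wide significant SNPs; the file name is hypothetical and tab separation is assumed, while the column names follow the list above:

# Read one GWAMA summary file and keep genome-wide significant SNPs
gwama <- read.table("hormone_summary.txt", header = TRUE, sep = "\t")  # hypothetical file name
sig <- subset(gwama, p < 5e-8)    # conventional genome-wide significance threshold
sig[order(sig$p), c("markername", "chr", "bp_hg19", "beta", "se", "p")]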
GNU General Public License v3.0 (GPL-3.0)https://www.gnu.org/licenses/gpl-3.0.html
This dataset is used to visualize the 4TU.ResearchData resources for the plot-a-thon.
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Full summary statistics from 41 epigenome-wide association studies (EWAS) conducted by The EWAS Catalog team (www.ewascatalog.org). Meta-data is found in the "studies-full.csv" file and the results are in "full_stats.tar.gz". Unzipping the "full_stats.tar.gz" file will reveal a folder containing 41 csv files, each with the full summary statistics from one EWAS. The results can be linked to the meta-data using the "Results_file" column in "studies-full.csv". These analyses were conducted using data extracted from the Gene Expression Omnibus (GEO). These data were extracted using the geograbi R package. For more information on the EWAS, please consult our paper: Battram, Thomas, et al. "The EWAS Catalog: A Database of Epigenome-wide Association Studies." OSF Preprints, 4 Feb. 2021. https://doi.org/10.31219/osf.io/837wn. Please cite the paper if you use this dataset.
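A minimal sketch of linking the results to the metadata in R, assuming "full_stats.tar.gz" extracts to a folder named "full_stats":

# Join one EWAS results file to its metadata row via "Results_file"
studies <- read.csv("studies-full.csv")
untar("full_stats.tar.gz")        # assumed to create ./full_stats/ with 41 csv files
res <- read.csv(file.path("full_stats", studies$Results_file[1]))
studies[1, ]                      # metadata for the loaded EWAS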
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A visual summary of the contents of the 4TU Research Data Repository, in 4 plots.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
We provide the summary statistics of running CONTENT, the context-by-context approach, and UTMOST on over 22 phenotypes. The phenotypes are listed in the manuscript, and their respective studies and sample sizes can be found in a table in the supplementary section of the manuscript. All three methods were trained on GTEx v7 as well as CLUES, a single-cell RNA sequencing dataset of PBMCs. The data include the gene name, model, cross-validated R^2, prediction p-value, TWAS p-value, TWAS Z-score, and a column titled "hFDR" indicating whether the association was statistically significant under hierarchical FDR. The benefits of employing such an approach for all methods can be found in the manuscript.
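A minimal sketch, with hypothetical file and column names inferred from the description above, of filtering one phenotype's results to the associations passing hierarchical FDR:

# Keep TWAS associations flagged as significant under hierarchical FDR
twas <- read.csv("CONTENT_phenotype1.csv")   # hypothetical file name
hits <- subset(twas, hFDR == TRUE)           # hFDR coding is an assumption
hits[order(hits$TWAS_p), ]                   # hypothetical column name TWAS_p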
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Overview
This dataset is the repository for the following paper submitted to Data in Brief:
Kempf, M. A dataset to model Levantine landcover and land-use change connected to climate change, the Arab Spring and COVID-19. Data in Brief (submitted: December 2023).
The Data in Brief article contains the supplement information and is the related data paper to:
Kempf, M. Climate change, the Arab Spring, and COVID-19 - Impacts on landcover transformations in the Levant. Journal of Arid Environments (revision submitted: December 2023).
Description/abstract
The Levant region is highly vulnerable to climate change, experiencing prolonged heat waves that have led to societal crises and population displacement. Since 2010, the area has been marked by socio-political turmoil, including the Syrian civil war and, currently, the escalation of the so-called Israeli-Palestinian conflict, which has strained neighbouring countries like Jordan through the influx of Syrian refugees and increased the population's vulnerability to governmental decision-making. Jordan, in particular, has seen rapid population growth and significant changes in land-use and infrastructure, leading to over-exploitation of the landscape through irrigation and construction. This dataset uses climate data, satellite imagery, and land cover information to illustrate the substantial increase in construction activity and highlights the intricate relationship between climate change predictions and current socio-political developments in the Levant.
Folder structure
The main folder after download contains all data; the following subfolders are stored as zipped files:
“code” stores the 9 code chunks described below, which read, extract, process, analyse, and visualize the data.
“MODIS_merged” contains the 16-day, 250 m resolution NDVI imagery merged from three tiles (h20v05, h21v05, h21v06) and cropped to the study area, n=510, covering January 2001 to December 2022 and including January and February 2023.
“mask” contains a single shapefile, which is the merged product of administrative boundaries, including Jordan, Lebanon, Israel, Syria, and Palestine (“MERGED_LEVANT.shp”).
“yield_productivity” contains .csv files of yield information for all countries listed above.
“population” contains two files with the same name but different formats. The .csv file is for processing and plotting in R. The .ods file is for enhanced visualization of population dynamics in the Levant (Socio_cultural_political_development_database_FAO2023.ods).
“GLDAS” stores the raw data of the NASA Global Land Data Assimilation System datasets that can be read, extracted (variable name), and processed using code “8_GLDAS_read_extract_trend” from the respective folder. One folder contains data from 1975-2022 and a second the additional January and February 2023 data.
“built_up” contains the landcover and built-up change data from 1975 to 2022. This folder is subdivided into two subfolders: “raw_data” contains the unprocessed datasets and “derived_data” stores the cropped built_up datasets at 5-year intervals, e.g., “Levant_built_up_1975.tif”.
Code structure
1_MODIS_NDVI_hdf_file_extraction.R
This is the first code chunk and covers the extraction of MODIS data from the .hdf file format. The following packages must be installed, and the raw data must be downloaded using a simple mass downloader, e.g., from Google Chrome. Packages: terra. Download MODIS data after registration from: https://lpdaac.usgs.gov/products/mod13q1v061/ or https://search.earthdata.nasa.gov/search (MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V061, last accessed 9 October 2023). The code reads a list of files, extracts the NDVI, and saves each file to a single .tif file with the indication “NDVI”. Because the study area is quite large, we have to load three spatially distinct time series and merge them later. Note that the time series are temporally consistent.
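A minimal sketch of this extraction step with terra (the subdataset index and paths are illustrative, and GDAL must be built with HDF4 support):

# Extract the NDVI subdataset from each MOD13Q1 .hdf and save as .tif
library(terra)
hdf_files <- list.files("your_directory_MODIS", pattern = "\\.hdf$", full.names = TRUE)
for (f in hdf_files) {
  s <- sds(f)            # all subdatasets in the HDF container
  ndvi <- s[1]           # NDVI is typically the first MOD13Q1 subdataset
  writeRaster(ndvi, sub("\\.hdf$", "_NDVI.tif", f), overwrite = TRUE)
}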
2_MERGE_MODIS_tiles.R
In this code, we load and merge the three stacks to produce a large, consistent time series of NDVI imagery across the study area. We use the package gtools to load the files in natural order (1, 2, 3, 4, 5, 6, etc.). Here, we have three stacks, of which we first merge two (stack 1, stack 2) and store the result; we then merge this stack with stack 3. We produce single files named NDVI_final_*consecutivenumber*.tif. Before saving the final output of single merged files, create a folder called “merged” and set the working directory to this folder, e.g., setwd("your directory_MODIS/merged").
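A minimal sketch of the merging step (the folder layout is illustrative; instead of setwd(), the output folder is given explicitly here):

# Merge the three tile stacks date by date into single NDVI images
library(terra)
library(gtools)
t1 <- mixedsort(list.files("h20v05", pattern = "_NDVI\\.tif$", full.names = TRUE))
t2 <- mixedsort(list.files("h21v05", pattern = "_NDVI\\.tif$", full.names = TRUE))
t3 <- mixedsort(list.files("h21v06", pattern = "_NDVI\\.tif$", full.names = TRUE))
for (i in seq_along(t1)) {
  m <- merge(merge(rast(t1[i]), rast(t2[i])), rast(t3[i]))  # merge two, then the third
  writeRaster(m, file.path("merged", sprintf("NDVI_final_%d.tif", i)), overwrite = TRUE)
}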
3_CROP_MODIS_merged_tiles.R
Now we crop the derived MODIS tiles to our study area. We use a mask, provided as a .shp file in the repository named "MERGED_LEVANT.shp". We load the merged .tif files and crop the stack with the vector. Saving to individual files, we name them “NDVI_merged_clip_*consecutivenumber*.tif”. We have now produced single cropped NDVI time series data from MODIS.
The repository provides the already clipped and merged NDVI datasets.
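A minimal sketch of this cropping step:

# Crop and mask the merged NDVI files with MERGED_LEVANT.shp
library(terra)
library(gtools)
v <- vect("MERGED_LEVANT.shp")
files <- mixedsort(list.files("merged", pattern = "^NDVI_final_", full.names = TRUE))
for (i in seq_along(files)) {
  clipped <- mask(crop(rast(files[i]), v), v)   # crop to extent, mask to outline
  writeRaster(clipped, file.path("merged", sprintf("NDVI_merged_clip_%d.tif", i)),
              overwrite = TRUE)
}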
4_TREND_analysis_NDVI.R
Now we want to perform trend analysis on the derived data. The data we load are tricky, as they contain a 16-day return period across a year for a period of 22 years. Growing season sums comprise MAM (March-May), JJA (June-August), and SON (September-November). December is represented as a single file, which means that the period DJF (December-February) is represented by 5 images instead of 6. For the last DJF period (December 2022), the data from January and February 2023 can be added. The code selects the respective images from the stack, depending on which period is under consideration. From these stacks, individual annually resolved growing season sums are generated and the slope is calculated. We can then extract the p-values of the trend and flag all values significant at the 0.05 level. Using the ggplot2 package and the melt function from the reshape2 package, we can create a plot of the reclassified NDVI trends together with a local smoother (LOESS) of value 0.3.
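A minimal sketch of the per-pixel trend step, assuming a stack of annual growing-season NDVI sums has already been built (the file name and year range are illustrative):

# Fit a linear trend per pixel; keep slope and p-value
library(terra)
annual_sums <- rast("NDVI_MAM_annual_sums.tif")   # hypothetical: one layer per year
years <- 2001:2022
trend_fun <- function(v) {
  if (sum(!is.na(v)) < 3) return(c(NA, NA))       # too few values to fit
  fit <- lm(v ~ years)
  c(coef(fit)[2], summary(fit)$coefficients[2, 4])  # slope, p-value
}
trend <- app(annual_sums, trend_fun)
slope_sig <- mask(trend[[1]], trend[[2]] < 0.05, maskvalues = FALSE)  # keep p < 0.05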
To increase comparability and understand the amplitude of the trends, z-scores were calculated and plotted, which show the deviation of the values from the mean. This has been done for the NDVI values as well as the GLDAS climate variables as a normalization technique.
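The z-score step itself is compact; a minimal sketch with illustrative values:

# z-scores: deviation of annual values from the series mean
x <- c(0.41, 0.44, 0.39, 0.47, 0.52)   # illustrative annual NDVI sums
z <- (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)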
5_BUILT_UP_change_raster.R
Let us look at the landcover changes now. We work with the terra package and get raster data from here: https://ghsl.jrc.ec.europa.eu/download.php?ds=bu (last accessed 3 March 2023, 100 m resolution, global coverage). One can download the temporal coverage that is aimed for and reclassify it using the code after cropping to the individual study area. Here, I summed up the different rasters to characterize the built-up change in continuous values between 1975 and 2022.
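A minimal sketch of this step (the file layout is illustrative):

# Crop each GHSL built-up epoch to the study area and sum them
library(terra)
v <- vect("MERGED_LEVANT.shp")
epochs <- list.files("built_up/raw_data", pattern = "\\.tif$", full.names = TRUE)
cropped <- lapply(epochs, function(f) mask(crop(rast(f), v), v))
built_up_change <- Reduce(`+`, cropped)   # continuous built-up change 1975-2022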
6_POPULATION_numbers_plot.R
For this plot, one needs to load the .csv-file “Socio_cultural_political_development_database_FAO2023.csv” from the repository. The ggplot script provided produces the desired plot with all countries under consideration.
7_YIELD_plot.R
In this section, we use the country productivity data from the supplement in the repository “yield_productivity” (e.g., "Jordan_yield.csv"). Each of the single country yield datasets is plotted with ggplot and the plots are combined using the patchwork package in R.
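A minimal sketch of the combination with patchwork; the column names (Year, Value) are assumptions about the .csv structure:

# One ggplot per country, combined side by side with patchwork
library(ggplot2)
library(patchwork)
jordan <- read.csv("yield_productivity/Jordan_yield.csv")
syria  <- read.csv("yield_productivity/Syria_yield.csv")
p1 <- ggplot(jordan, aes(Year, Value)) + geom_line() + ggtitle("Jordan")  # assumed columns
p2 <- ggplot(syria,  aes(Year, Value)) + geom_line() + ggtitle("Syria")
p1 + p2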
8_GLDAS_read_extract_trend
The last code chunk provides the basis for the trend analysis of the climate variables used in the paper. The raw data can be accessed at https://disc.gsfc.nasa.gov/datasets?keywords=GLDAS%20Noah%20Land%20Surface%20Model%20L4%20monthly&page=1 (last accessed 9 October 2023). The raw data come in .nc file format, and the various variables can be extracted from the SpatRaster collection using the [“^a variable name”] command. Each time you run the code, this variable name must be adjusted to the variable of interest (see this link for abbreviations: https://disc.gsfc.nasa.gov/datasets/GLDAS_CLSM025_D_2.0/summary, last accessed 9 October 2023; alternatively, run print(nc) when reading a .nc file with the ncdf4 package in R, or use names() on the SpatRaster collection).
Choosing one variable, the code uses the MERGED_LEVANT.shp mask from the repository to crop and mask the data to the outline of the study area.
From the processed data, trend analyses are conducted and z-scores are calculated following the code described above. However, annual trends require the frequency of the time series analysis to be set to value = 12. For variables such as rainfall, which is measured as annual sums and not means, the chunk r.sum=r.sum/12 has to be removed or set to r.sum=r.sum/1 to avoid calculating annual mean values (see other variables). Seasonal subsets can be calculated as described in the code. Here, 3-month subsets were chosen for the growing seasons, e.g., March-May (MAM), June-August (JJA), September-November (SON), and DJF (December-February, including Jan/Feb of the consecutive year).
From the data, mean values of 48 consecutive years are calculated and trend analyses are performed as described above. In the same way, p-values are extracted and values at the 95 % confidence level are marked with dots on the raster plot. This analysis can be performed with a much longer time series, other variables, and different spatial extents across the globe, given the availability of the GLDAS variables.
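A minimal sketch of the GLDAS workflow; the variable name "Tair_f_inst" is a placeholder to be adjusted as described above, and complete years of monthly layers are assumed:

# Read one GLDAS variable from the .nc files, crop, and build annual means
library(terra)
nc_files <- list.files("GLDAS", pattern = "\\.nc$", full.names = TRUE)
r <- rast(nc_files, subds = "Tair_f_inst")   # placeholder variable name
v <- vect("MERGED_LEVANT.shp")
r <- mask(crop(r, v), v)
idx <- rep(seq_len(nlyr(r) / 12), each = 12) # monthly layers -> year index
annual_mean <- tapp(r, idx, fun = mean)      # use sum for, e.g., rainfall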
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data and code archive provides all the data and code for replicating the empirical analysis that is presented in the journal article "A Ray-Based Input Distance Function to Model Zero-Valued Output Quantities: Derivation and an Empirical Application" authored by Juan José Price and Arne Henningsen and published in the Journal of Productivity Analysis (DOI: 10.1007/s11123-023-00684-1).
We conducted the empirical analysis with the "R" statistical software (version 4.3.0) using the add-on packages "combinat" (version 0.0.8), "miscTools" (version 0.6.28), "quadprog" (version 1.5.8), "sfaR" (version 1.0.0), "stargazer" (version 5.2.3), and "xtable" (version 1.8.4) that are available at CRAN. We created the R package "micEconDistRay" that provides the functions for empirical analyses with ray-based input distance functions that we developed for the above-mentioned paper. This R package is also available at CRAN (https://cran.r-project.org/package=micEconDistRay).
This replication package contains the following files and folders:
This dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.
This dataset contains the data and scripts to generate the hydrological response variables for surface water in the Clarence Moreton subregion as reported in CLM261 (Gilfedder et al. 2016).
File CLM_AWRA_HRVs_flowchart.png shows the different files in this dataset and how they interact. The Python and R scripts were written by the BA modelling team to read, combine, and analyse, as detailed below, the source datasets CLM AWRA model, CLM groundwater model V1, and CLM16swg Surface water gauging station data within the Clarence Moreton Basin, to create the hydrological response variables for surface water as reported in CLM2.6.1 (Gilfedder et al. 2016).
R-script HRV_SWGW_CLM.R reads, for each model simulation, the outputs from the surface water model in netcdf format from file Qtot.nc (dataset CLM AWRA model) and the outputs from the groundwater model, flux_change.csv (dataset CLM groundwater model V1) and creates a set of files in subfolder /Output for each GaugeNr and simulation Year:
CLM_GaugeNr_Year_all.csv and CLM_GaugeNr_Year_baseline.csv: the set of 9 HRVs for GaugeNr and Year for all 5000 simulations for baseline conditions
CLM_GaugeNr_Year_CRDP.csv: the set of 9 HRVs for GaugeNr and Year for all 5000 simulations for CRDP conditions (=AWRA streamflow - MODFLOW change in SW-GW flux)
CLM_GaugeNr_Year_minMax.csv: minimum and maximum of HRVs over all 5000 simulations
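A minimal sketch, not the BA team's actual script, of the read-and-combine pattern described above; the netCDF variable name ("Qtot") and the flux column name are assumptions:

# Read AWRA streamflow and subtract the MODFLOW change in SW-GW flux
library(ncdf4)
nc <- nc_open("Qtot.nc")
qtot <- ncvar_get(nc, "Qtot")          # assumed variable name
nc_close(nc)
flux <- read.csv("flux_change.csv")
q_crdp <- qtot - flux$flux             # CRDP = AWRA streamflow - flux change
write.csv(data.frame(q_baseline = qtot, q_crdp = q_crdp),
          "Output/CLM_GaugeNr_Year_CRDP.csv", row.names = FALSE)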
Python script CLM_collate_DoE_Predictions.py collates that information into the following files, for each HRV and each maxtype (absolute maximum (amax), relative maximum (pmax) and time of absolute maximum change (tmax)):
CLM_AWRA_HRV_maxtyp_DoE_Predictions: for each simulation and each gauge_nr, the maxtyp of the HRV over the prediction period (2012 to 2102)
CLM_AWRA_HRV_DoE_Observations: for each simulation and each gauge_nr, the HRV for the years that observations are available
CLM_AWRA_HRV_Observations: summary statistics of each HRV and the observed value (based on data set CLM16swg Surface water gauging station data within the Clarence Moreton Basin)
CLM_AWRA_HRV_maxtyp_Predictions: summary statistics of each HRV
R-script CLM_CreateObjectiveFunction.R calculates for each HRV the objective function value for all simulations and stores it in CLM_AWRA_HRV_ss.csv. This file is used by python script CLM_AWRA_SI.py to generate figure CLM-2615-002-SI.png (sensitivity indices).
The AWRA objective function is combined with the overall objective function from the groundwater model in dataset CLM Modflow Uncertainty Analysis (CLM_MF_DoE_ObjFun.csv) into csv file CLM_AWRA_HRV_oo.csv. This file is used to select behavioural simulations in python script CLM-2615-001-top10.py. This script uses files CLM_NodeOrder.csv and BA_Visualisation.py to create the figures CLM-2616-001-HRV_10pct.png.
Bioregional Assessment Programme (2016) CLM AWRA HRVs Uncertainty Analysis. Bioregional Assessment Derived Dataset. Viewed 28 September 2017, http://data.bioregionalassessments.gov.au/dataset/e51a513d-fde7-44ba-830c-07563a7b2402.
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements 20131204
Derived From Qld 100K mapsheets - Mount Lindsay
Derived From Qld 100K mapsheets - Helidon
Derived From Qld 100K mapsheets - Ipswich
Derived From CLM - Woogaroo Subgroup extent
Derived From CLM - Interpolated surfaces of Alluvium depth
Derived From CLM - Extent of Logan and Albert river alluvial systems
Derived From CLM - Bore allocations NSW v02
Derived From CLM - Bore allocations NSW
Derived From CLM - Bore assignments NSW and QLD summary tables
Derived From CLM - Geology NSW & Qld combined v02
Derived From CLM - Orara-Bungawalbin bedrock
Derived From CLM16gwl NSW Office of Water_GW licence extract linked to spatial locations_CLM_v3_13032014
Derived From CLM groundwater model hydraulic property data
Derived From CLM - Koukandowie FM bedrock
Derived From GEODATA TOPO 250K Series 3, File Geodatabase format (.gdb)
Derived From NSW Office of Water - National Groundwater Information System 20140701
Derived From CLM - Gatton Sandstone extent
Derived From CLM16gwl NSW Office of Water, GW licence extract linked to spatial locations in CLM v2 28022014
Derived From Bioregional Assessment areas v03
Derived From NSW Geological Survey - geological units DRAFT line work.
Derived From Mean Annual Climate Data of Australia 1981 to 2012
Derived From CLM Preliminary Assessment Extent Definition & Report( CLM PAE)
Derived From Qld 100K mapsheets - Caboolture
Derived From CLM - AWRA Calibration Gauges SubCatchments
Derived From CLM - NSW Office of Water Gauge Data for Tweed, Richmond & Clarence rivers. Extract 20140901
Derived From Qld 100k mapsheets - Murwillumbah
Derived From AHGFContractedCatchment - V2.1 - Bremer-Warrill
Derived From Bioregional Assessment areas v01
Derived From Bioregional Assessment areas v02
Derived From QLD Current Exploration Permits for Minerals (EPM) in Queensland 6/3/2013
Derived From Pilot points for prediction interpolation of layer 1 in CLM groundwater model
Derived From CLM - Bore water level NSW
Derived From Climate model 0.05x0.05 cells and cell centroids
Derived From CLM - New South Wales Department of Trade and Investment 3D geological model layers
Derived From CLM - Metgasco 3D geological model formation top grids
Derived From State Transmissivity Estimates for Hydrogeology Cross-Cutting Project
Derived From CLM - Extent of Bremer river and Warrill creek alluvial systems
Derived From NSW Catchment Management Authority Boundaries 20130917
Derived From QLD Department of Natural Resources and Mining Groundwater Database Extract 20131111
Derived From Qld 100K mapsheets - Esk
Derived From QLD Dept of Natural Resources and Mines, Groundwater Entitlements linked to bores and NGIS v4 28072014
Derived From BILO Gridded Climate Data: Daily Climate Data for each year from 1900 to 2012
Derived From CLM - Qld Surface Geology Mapsheets
Derived From NSW Office of Water Pump Test dataset
Derived From [CLM -
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
The Australian Public Service Statistical Bulletin 2010-11 presents a summary of employment under the Public Service Act 1999 at 30 June 2011 and during the 2010-11 financial year, as well as summary data for the past 15 years. This Excel dataset consists of the tables used to create the Statistical Bulletin. You can view the Bulletin online at http://www.apsc.gov.au/about-the-apsc/parliamentary/aps-statistical-bulletin/aps-statistical-bulletin-2010-11
This data package consists of 26 years (1998-2023) of environmental data and 22 years (2000-2022) of bioclimatic data associated with CAP-LTER long-term point-count bird censusing sites (https://doi.org/10.6073/pasta/4777d7f0a899f506d6d4f9b5d535ba09), temporally aggregated by year and by four meteorological seasons (Winter, Spring, Summer, Fall). The environmental variables include land surface temperature (LST), three spectral indices of vegetation and water – the normalized difference vegetation index (NDVI), the soil adjusted vegetation index (SAVI), and modified normalized difference water index (MNDWI) – and four spectral indices of impervious surface/urbanization. Impervious surface indices include the normalized difference built-up index (NDBI), the normalized difference impervious surface index (NDISI), the enhanced normalized difference impervious surface index (ENDISI), and the normalized impervious surface index (NISI). LST and all spectral indices were derived from annual and seasonal composites of 30-m resolution Landsat 5-9 Level-2 Surface Reflectance imagery. The seven bioclimatic variables (e.g., air temperature, precipitation) were sourced from 1-km resolution gridded estimates of daily climatic data from NASA Daymet V4. We created temporally-aggregated Daymet raster images by calculating mean pixel-values for each season and year, as well as seasonally and annually summed precipitation. We summarized the values of each environmental variable by generating variously-sized (100-m, 500-m, 1000-m) buffers around each bird point count location and extracting weighted mean values of each environmental variable, with each pixel's values weighted by the proportion of its area falling within the buffer. All imagery retrieval and data processing were completed with Google Earth Engine (Gorelick et al. 2017) and program R. A complete description of data processing methods, including the aggregation of imagery by year and season and the calculation of spectral indices, can be found in the data package metadata (see 'Methods and Protocols') and accompanying Javascript and R code.
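A minimal sketch of the buffer-extraction step with terra (the file names, raster layer, and buffer radius are illustrative):

# Weighted mean of a raster variable within 500-m buffers around points
library(terra)
ndvi <- rast("ndvi_annual_composite.tif")    # hypothetical Landsat-derived NDVI
pts  <- vect("bird_count_sites.shp")
buf  <- buffer(pts, width = 500)
# weights = TRUE weights each pixel by its fraction inside the buffer
vals <- extract(ndvi, buf, fun = mean, weights = TRUE, na.rm = TRUE)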
Argumentative skills are crucial for any individual at the personal and professional levels. In recent decades, there has been increasing concern about undergraduates' weak argumentative skills and their considerable difficulty in reworking and expressing their own reflections on a topic. In turn, this has implications for being a critical thinker, able to express an original point of view. Tailored interventions in Higher Education could constitute a powerful approach to promoting argumentative skills and extending these skills to professional and personal life. In this regard, argument maps (AM) could prove to be a valuable support to the visualization process of arguments. They don’t just create associations between concepts, but trace the logical relationships between different statements, allowing you to track the reasoning chain and understand it better. We conducted an experimental study to investigate how a path with AM could support students in increasing their level of text comprehension (CoT) competence, in terms of identifying the elements of an argumentative text, and critical thinking (CT), in terms of reconstructing meaning and building their own reflection.
Our preliminary descriptive analysis suggested a positive role for AM in increasing students’ CoT and CT proficiency levels.
This Zenodo record documents the full analysis process with R (https://cran.r-project.org/bin/windows/base/) and is composed of the following datasets and scripts:
Comprehension of Text and AMs Results - ExpAM.xlsx
Critical Thinking Results - CriThink.xlsx
Argumentative skills in Forum - ExpForum.xlsx
Self-assessment Results - Dataset_Quest.xlsx
Data for Correlation and Regression - Dataset_CorRegr.xlsx
Descriptive Statistics - Preliminary Analysis.R
Inferential Statistics - Correlation and Regression.R
Any comments or improvements are welcome!
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
GWAMA summary statistics of PCSK9 levels using a fixed-effect model. Genome-wide data are given for Europeans with statin adjustment and for Europeans without statin treatment only (a subset of the population). In addition, locus-wide data of the PCSK9 gene locus for African-Americans without statin treatment are listed.
When using this data, please cite: Pott J, Gadin J, Theusch E, et al. Meta-GWAS of PCSK9 levels detects two novel loci at APOB and TM6SF2. Hum Mol Genet. 2021 Sep 30:ddab279. doi: 10.1093/hmg/ddab279. PMID: 34590679
All txt files contain the following columns:
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Bayesian inference operates under the assumption that the empirical data are a good statistical fit to the analytical model, but this assumption can be challenging to evaluate. Here, we introduce a novel R package that utilizes posterior predictive simulation to evaluate the fit of the multispecies coalescent model used to estimate species trees. We conduct a simulation study to evaluate the consistency of different summary statistics in comparing posterior and posterior predictive distributions, the use of simulation replication in reducing error rates, and the utility of parallel process invocation towards improving computation times. We also test P2C2M on two empirical data sets in which hybridization and gene flow are suspected of contributing to shared polymorphism, in violation of the coalescent model: Tamias chipmunks and Myotis bats. Our results indicate that (i) probability-based summary statistics display the lowest error rates, (ii) the implementation of simulation replication decreases the rate of type II errors, and (iii) our R package displays improved statistical power compared to previous implementations of this approach. When probabilistic summary statistics are used, P2C2M corroborates the assumption that genealogies collected from Tamias and Myotis are not a good fit to the multispecies coalescent model. Taken as a whole, our findings argue that an assessment of the fit of the multispecies coalescent model should accompany any phylogenetic analysis that estimates a species tree.