Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2, and the sketch after this list). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Resize the window as desired and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
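The original script is distributed via the PowerPoint slide, so the version below is only a minimal sketch of the behavior described in steps 1 and 2, assuming the three-column layout (Replicate, Condition, Value); the plotting command mirrors the one shown in Note 2, minus the log scale.

# Minimal sketch of the protocol script; the original lives in the slide.
library(ggplot2)
data <- read.csv(file.choose())  # opens the dialog box described in step 2
graph <- ggplot(data, aes(x = Condition, y = Value))
graph + geom_boxplot(outlier.colour = 'black', colour = 'black') +
  geom_jitter(aes(col = Replicate)) +
  theme_bw()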
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it via the GUI: Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search field and click ‘Get List’. Select ‘ggplot2’ in the Package column and click ‘Install Selected’. Install all dependencies as well (a console alternative is shown after these notes).
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
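Note 1 describes the graphical package installer; from any R console, the equivalent is the standard command below (shown for convenience).

install.packages('ggplot2', dependencies = TRUE)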
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.
Eximpedia export-import trade data lets you search trade data and find active exporters, importers, buyers, suppliers, and manufacturers from over 209 countries.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The SAS code (Supplementary File 1) and R program code (Supplementary File 2) are provided. The analysis requires an input data file (Supplementary Files 3-5) prepared in Excel and saved in CSV format, although the data can be stored in other formats such as xlsx, txt, or xls. Economic values are entered manually in the SAS code, whereas for the R code they are stored in an Excel file (Supplementary File 6).
These data represent an update to the dataset "StateIO Two-Region Economic Input-Output Models for 50 U.S. States 2012-2017," based on the methods described by Li et al. (2022), published here with an expanded time series. These models were produced with the stateior R package, v0.4.0. Excel files (50 in total) are provided for two-region (State of Interest and Rest of U.S.) Make and Use tables for each U.S. state IO model for the years 2012-2023. Additional data files supporting this release include all intermediate and final products in native R format (.RDS), which can be opened directly in R or through the stateior package. See the stateior GitHub page for more details. https://dmap-data-commons-ord.s3.amazonaws.com/index.html#stateio/ All values are in current dollar years (e.g., "Make 2012" is the Make table in 2012 USD in a given model). For a description of the methods used and a survey of results, see Addendum 1 on the EPA Science Inventory page for the original publication. Please cite this dataset as: Young, Ben, Julie Chen, Jorge Vendries, and Wesley Ingwersen. 2025. "StateIO v0.4.0 Two-Region Economic Input-Output Models for 50 U.S. States: 2012-2023." Data.gov. https://doi.org/10.23719/1532211. This dataset is associated with the following publication: Li, M., J. Ferreira, C.D. Court, D. Meyer, M. Li, and W.W. Ingwersen. StateIO - Open Source Economic Input-Output Models for the 50 States of the United States of America. International Regional Science Review. SAGE Publications, Thousand Oaks, CA, USA, 46(4): 428-481, (2023).
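The .RDS files can be opened directly in base R; a one-line sketch (the file name here is illustrative, not from the release):

make_2012 <- readRDS('StateIO_Make_2012.rds')  # a two-region Make table, 2012 USD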
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for itemsets they are most likely to purchase. I was given a retailer's dataset in which the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow the business and offer customers itemset suggestions, so we can increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rules are most useful when you want to build associations between different objects in a set and find frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule 'bought computer mouse => bought mouse mat': support = P(mouse & mat) = 8/100 = 0.08; confidence = support / P(computer mouse) = 0.08/0.10 = 0.80; lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9. This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
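The same arithmetic in R, as a quick check of the numbers above:

n <- 100; mouse <- 10; mat <- 9; both <- 8
support    <- both / n               # 0.08
confidence <- support / (mouse / n)  # 0.80
lift       <- confidence / (mat / n) # ~8.9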
Number of Attributes: 7
First, we need to load the required libraries; each is described briefly in the sketch below.
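The exact library list was shown only in the original screenshot, so the set below is an assumption about what this workflow needs:

library(readxl)     # read the .xlsx input file
library(arules)     # transaction data structures and the Apriori algorithm
library(arulesViz)  # optional visualization of association rules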
Next, we load Assignment-1_Data.xlsx into R and inspect the dataset.
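A minimal sketch of the loading step (the object name 'retail' is my choice, not from the original screenshots):

retail <- readxl::read_excel('Assignment-1_Data.xlsx')
head(retail)  # inspect the first rows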
Next, we clean the data frame by removing rows with missing values.
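One way to do this, as a sketch:

retail <- retail[complete.cases(retail), ]  # drop rows containing NA values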
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together on one invoice form a single transaction.
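A sketch of the conversion and mining step; the column names 'BillNo' and 'Itemname' and the support/confidence thresholds are assumptions, not values from the original analysis:

trans_list <- split(retail$Itemname, retail$BillNo)  # group items per invoice
trans <- as(trans_list, 'transactions')              # arules transaction object
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5))
inspect(head(sort(rules, by = 'lift'), 5))           # top 5 rules by lift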
Sediment diatoms are widely used to track environmental histories of lakes and their watersheds, but merging datasets generated by different researchers for further large-scale studies is challenging because of the taxonomic discrepancies caused by rapidly evolving diatom nomenclature and taxonomic concepts. Here we collated five datasets of lake sediment diatoms from the northeastern USA using a harmonization process which included updating synonyms, tracking the identity of inconsistently identified taxa, and grouping those that could not be resolved taxonomically. The dataset consists of a Portable Document Format (.pdf) file of the Voucher Flora, six Microsoft Excel (.xlsx) data files, an R script, and five output Comma Separated Values (.csv) files. The Voucher Flora documents the morphological species concepts in the dataset using diatom images compiled into plates (NE_Lakes_Voucher_Flora_102421.pdf) and the translation scheme of the OTU codes to diatom scientific or provisional names with identification sources, references, and notes (VoucherFloraTranslation_102421.xlsx). The file Slide_accession_numbers_102421.xlsx has slide accession numbers in the ANS Diatom Herbarium. The “DiatomHarmonization_032222_files for R.zip” archive contains four Excel input data files, the R code, and a subfolder “OUTPUT” with five .csv files. The file Counts_original_long_102421.xlsx contains the original diatom count data in long format. The file Harmonization_102421.xlsx is the taxonomic harmonization scheme with notes and references. The file SiteInfo_031922.xlsx contains sampling site- and sample-level information. WaterQualityData_021822.xlsx is a supplementary file with water quality data. The R code (DiatomHarmonization_032222.R) was used to apply the harmonization scheme to the original diatom counts to produce the output files. The resulting output files are five files: four wide-format files containing diatom count data at different harmonization steps (Counts_1327_wide.csv, Step1_1327_wide.csv, Step2_1327_wide.csv, Step3_1327_wide.csv) and the summary of the Indicator Species Analysis (INDVAL_RESULT.csv). The harmonization scheme (Harmonization_102421.xlsx) can be further modified based on additional taxonomic investigations, while the associated R code (DiatomHarmonization_032222.R) provides a straightforward mechanism for diatom data versioning. This dataset is associated with the following publication: Potapova, M., S. Lee, S. Spaulding, and N. Schulte. A harmonized dataset of sediment diatoms from hundreds of lakes in the northeastern United States. Scientific Data. Springer Nature, New York, NY, 9(540): 1-8, (2022).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset provides basic information about the Linny-R software developed at TU Delft for industrial system optimization and describes its functionality for cluster modeling. Moreover, it presents technical and economic input data for the Linny-R models generated in the manuscript 'Exploring the emergence of waste recovery and exchange in industrial clusters'. It then presents the Linny-R models and their results in an Excel file.
A comprehensive Quality Assurance (QA) and Quality Control (QC) statistical framework consists of three major phases: Phase 1, preliminary exploration of the raw datasets, including time formatting and combining datasets of different lengths and different time intervals; Phase 2, QA of the datasets, including detecting and flagging duplicates, outliers, and extreme values; and Phase 3, development of time series of a desired frequency, imputation of missing values, visualization, and a final statistical summary. The time series data collected at the Billy Barr meteorological station (East River Watershed, Colorado) were analyzed. The developed statistical framework is suitable for both real-time and post-data-collection QA/QC analysis of meteorological datasets. The files in this data package include one Excel file converted to CSV format (Billy_Barr_raw_qaqc.csv) that contains the raw meteorological data, i.e., the input data for the QA/QC analysis. The second CSV file (Billy_Barr_1hr.csv) contains the QA/QC-processed and flagged meteorological data, i.e., the output of the QA/QC analysis. The last file (QAQC_Billy_Barr_2021-03-22.R) is an R script that implements the QA/QC and flagging process. The CSV data files included in this package provide the input and output of the R script.
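A sketch of how the three files fit together in an R session, assuming the script reads the raw CSV from the working directory (the actual I/O conventions are defined inside the script):

raw <- read.csv('Billy_Barr_raw_qaqc.csv')  # raw input data
source('QAQC_Billy_Barr_2021-03-22.R')      # run the QA/QC and flagging
out <- read.csv('Billy_Barr_1hr.csv')       # flagged hourly output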
Demographics: question Q2.
Academic buoyancy: question Q3 (Academic Buoyancy Scale; Martin & Marsh, 2008a, 2008b).
Well-being/positive mental health (MHC-SF): question Q4.
Personal growth: question Q5 (part of the Psychological Well-Being Scale (PWB)).
Psychopathology (HADS): question Q6.
Vitality: question Q7 (Subjective Vitality Scale; Ryan & Frederick, 1997).
Flourishing Scale: question Q8 (Diener, E., Wirtz, D., Tov, W., Kim-Prieto, C., Choi, D., Oishi, S., & Biswas-Diener, R., 2009).
Warwick-Edinburgh Mental Well-being Scale: question Q9.
Negative affect (MDES State): question Q10.
Well-being (SWLS): question Q11.
Psychological Well-Being Scale (PWB) - Goals: question Q12 (http://danrobertsgroup.com/wp-content/uploads/2018/02/PWB-Scale.pdf).
Basic Psychological Need Satisfaction Scale (Deci & Ryan, 2000; Gagné, 2003): question Q13 (Custers, A.F.J., Westerhof, G.J., Kuin, Y., & Riksen-Walraven, M. (2010). Need fulfillment in caring relationships: Its relation with well-being of residents in somatic nursing homes. Aging & Mental Health, 14:6, 731-739. DOI: 10.1080/13607861003713133).
Self-compassion: question Q14 (Self-Compassion Scale, Short Form).
Psychological Well-Being Scale (PWB) - Self-acceptance: question Q15.
SF-12: question Q16.
Coping (CISS): question Q18 (http://www.psychiatryinvestigation.org/journal/view.php?number=374).
Resilience (RS-NL): question Q19.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Background: Clean water is an essential part of a healthy human life and wellbeing. More recently, rapid population growth, high illiteracy rates, a lack of sustainable development, and climate change have made clean water provision a global challenge in developing countries. Discontinuity of the drinking water supply forces households either to use unsafe water storage materials or to use water from unsafe sources. The present study aimed to identify the determinants of water source types, use, quality of water, and sanitation perception of physical parameters among urban households in North-West Ethiopia.
Methods: A community-based cross-sectional study was conducted among households from February to March 2019. A pretested, structured, interview-based questionnaire was used to collect the data. Samples were selected randomly and proportionally to each kebele's households. MS Excel and R version 3.6.2 were used to enter and analyze the data, respectively. Descriptive statistics using frequencies and percentages were used to describe the sample data with respect to the predictor variables. Both bivariate and multivariate logistic regressions were used to assess the association between the independent and response variables (see the sketch below).
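An illustrative sketch of the analyses described, in base R; all file, variable, and column names here are assumptions for illustration:

df <- read.csv('household_survey.csv')
# bivariate association, e.g. education vs. improved source (0/1 coded):
chisq.test(table(df$educational_status, df$improved_source))
# multivariate logistic regression:
fit <- glm(improved_source ~ age_group + educational_status + monthly_income,
           family = binomial, data = df)
summary(fit)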
Results: Four hundred eighteen (418) households participated. Based on the study, 78.95% of households used improved and 21.05% used unimproved drinking water sources. Households' drinking water sources were significantly associated with the age of the participant (χ2 = 20.392, df = 3), educational status (χ2 = 19.358, df = 4), source of income (χ2 = 21.777, df = 3), monthly income (χ2 = 13.322, df = 3), availability of additional facilities (χ2 = 98.144, df = 7), cleanliness status (χ2 = 42.979, df = 4), scarcity of water (χ2 = 5.1388, df = 1), and family size (χ2 = 9.934, df = 2). The logistic regression analysis also indicated that these factors significantly determine the water source types used by the households. Factors such as availability of a toilet facility, household member type, and sex of the head of the household were not significantly associated with drinking water sources.
Conclusion: The use of drinking water from improved sources was determined by different demographic, socio-economic, sanitation, and hygiene-related factors. Therefore, the local, regional, and national governments and other supporting organizations should improve the accessibility and adequacy of drinking water from improved sources in the area.
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
The "yahoo_finance_dataset(2018-2023)" dataset is a financial dataset containing daily stock market data for multiple assets such as equities, ETFs, and indexes. It spans from April 1, 2018 to March 31, 2023, and contains 1257 rows and 7 columns. The data was sourced from Yahoo Finance, and the purpose of the dataset is to provide researchers, analysts, and investors with a comprehensive dataset that they can use to analyze stock market trends, identify patterns, and develop investment strategies. The dataset can be used for various tasks, including stock price prediction, trend analysis, portfolio optimization, and risk management. The dataset is provided in XLSX format, which makes it easy to import into various data analysis tools, including Python, R, and Excel.
The dataset includes the following columns:
- Date: The date on which the stock market data was recorded.
- Open: The opening price of the asset on the given date.
- High: The highest price of the asset on the given date.
- Low: The lowest price of the asset on the given date.
- Close*: The closing price of the asset on the given date. Note that this price does not take into account any after-hours trading that may have occurred after the market officially closed.
- Adj Close**: The adjusted closing price of the asset on the given date. This price takes into account any dividends, stock splits, or other corporate actions that may have occurred, which can affect the stock price.
- Volume: The total number of shares of the asset that were traded on the given date.
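Since the dataset ships as XLSX, a minimal sketch of importing it in R (the file name follows the dataset title; the exact column label may differ, e.g. 'Adj Close' versus the starred label above):

library(readxl)
prices <- read_excel('yahoo_finance_dataset(2018-2023).xlsx')
ret <- diff(log(prices[['Adj Close']]))  # daily log returns from the adjusted close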
Facebook
TwitterLength data were collected for wild-caught Noturus taylori from sample sites throughout the known distribution range. Habitat variables were collected from each 2x2 meter plot where Noturus taylori was captured during seasonal sampling.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel file containing input data for the case study. The Excel file is composed of one sheet for each exposure and a final sheet containing grouping information. Each exposure sheet is named with the exposure ID and contains two columns: the list of selected genes and the associated t-statistics, respectively. The final sheet contains two columns: one reporting the list of exposure IDs and the other the corresponding group. (XLSX 63 kb)
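A sketch of reading this layout into R; the file name is an assumption, since only the structure is described above:

library(readxl)
path <- 'case_study_input.xlsx'
sheets <- excel_sheets(path)
grouping <- read_excel(path, sheet = tail(sheets, 1))  # last sheet: exposure ID -> group
exposures <- lapply(head(sheets, -1), function(s) read_excel(path, sheet = s))
names(exposures) <- head(sheets, -1)                   # sheets are named by exposure ID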
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An Excel spreadsheet containing a sheet that shows overall killing data for all strains in the associated manuscript. It also contains a sheet for the subset of strains used in killing and sensitivity matrix clustering, including the calculations for the overall amount of killing activity shown in Figure 2.
This is part 2 of INDILACT; part 1 is published separately.
The objective of this study is to investigate how a customized voluntary waiting period before first insemination in primiparous dairy cows affects their milk production, fertility, and health during the first calving interval.
The data were registered between January 2019 and October 2022.
This data is archived:
- Metadata (publicly available)
- Raw data (.txt files) from the Swedish national herd recording scheme (SNDRS), operated by Växa Sverige: access restricted due to agreements with the principal owners of the data, Växa Sverige and the farms. Code lists are available in INDILACT part 1.
- Aggregated data (Excel files): access restricted due to agreements with the principal owners of the data, Växa Sverige and the farms.
- R scripts with statistical calculations (openly available)
Metadata (3 files):
- Metadata gentypning: the only new file type compared to INDILACT part 1; describes how this data category has been handled. The other file types have been handled in the same way as in INDILACT part 1.
- Metadata - del 2: general summary of the initial data handling used to aggregate files of the same type (dates etc.) into the Excel files used in the R scripts.
- DisCodes: division of the diagnoses into categories.
Raw data:
- 59 .txt files containing data retrieved from SNDRS on 8 separate occasions.
- Data from 18 Swedish farms from January 2019 to October 2022.
Aggregated data:
- 29 Excel files. The text files have been converted to Excel format, and all data from the same file type is aggregated into one file.
- Data collected from the farms by email and phone contact, about individual cows enrolled in the trial, from October 2020 to October 2022.
- One merged script derived from the initial data handling in R, in which relevant variables were calculated and aggregated for use in the statistical calculations.
R scripts with data handling and statistical calculations:
- "Data analysis part 2 - final": data handling to create the file used in the statistical calculations.
- "Part 2 - Binomial models - Fertility": statistical calculations of variables using binomial models.
- "Part 2 - glmmTMB models - Fertility": statistical calculations of variables using glmmTMB models.
- "Part 2 - linear models - Fertility": statistical calculations of fertility variables using linear models.
- "Part 2 - linear models": statistical calculations of milk variables using linear models.
Running the R scripts requires access to the restricted files. The files should be unpacked in a subdirectory "data" relative to the working directory for the scripts. See also the file "sessionInfo.txt" for information on R packages used.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Desktop analysis and examination of the format of six key food composition databases. Table footnotes:
a Food Standards Australia and New Zealand
b United States Department of Agriculture
c Food Standards Agency
d Separate databases for flavonoids, carotenoids, proanthocyanidins and isoflavones
e EuroFIR eBASIS contains bioactive data for the UK and Europe
f National Health Survey
g https://www.xyris.com.au/foodworks/fw_pro.html
h http://www.nutribase.com/highend.html
i http://www.foodresearch.ca/wp-content/uploads/2013/06/candat-features-1.pdf
j Tinuviel Software
i Downlees Systems
k Forestfield Software
l Kelicomp
m http://www.tinuvielsoftware.com/faqs.htm
n http://www.dietsoftware.com/canada.html
o Text file: a file that only contains text
p A file containing tables of information stored in columns and separated by tabs (can be exported into almost any spreadsheet program)
q Microsoft Excel spreadsheet
r Microsoft Access Database file: a database file with automated functions and queries
s American Standard Code for Information Interchange (a standard file type that can be used by many programs)
t Database File Format (this file type can be opened with Microsoft Excel and Access)
u Information to create Excel or PDF available
v Composition of Foods, Australia
w International Network of Food Data Systems
x User's guide states the food name is the most descriptive and recognisable name of the food referenced
y http://www.foodstandards.gov.au/science/monitoringnutrients/nutrientables/nuttab/Pages/NUTTAB-2010-electronic-database-files.aspx
z http://www.foodstandards.gov.au/science/monitoringnutrients/ausnut/ausnutdatafiles/Pages/default.aspx
aa http://ndb.nal.usda.gov/ndb/search/list
bb http://tna.europarchive.org/20110116113217/http://www.food.gov.uk/science/dietarysurveys/dietsurveys/
cc http://webprod3.hc-sc.gc.ca/cnf-fce/index-eng.jsp
This file describes the dataset used in Ou et al., "Air pollution control strategies directly limiting national health damages in the US." This work used the Global Change Assessment Model (GCAM) with state-level representation of the U.S. energy system (GCAM-USA). GCAM and GCAM-USA are developed and released by the University of Maryland/Pacific Northwest National Laboratory Joint Global Change Research Institute (JGCRI). For further details, see the GCAM documentation: jgcri.github.io/gcam-doc. The model source code is available at github.com/JGCRI/gcam-core. A modified version of GCAMv4.3 was used for this analysis; source code and input data specific to this paper are available upon request. This dataset contains Excel spreadsheets and an R script that link to comma-separated values (CSV) files extracted from the model output. The spreadsheets and scripts show the data and reproduce each of the figures in the paper. This dataset is associated with the following publication: Ou, Y., J. West, S. Smith, C. Nolte, and D. Loughlin. Air pollution control strategies directly limiting national health damages in the US. Nature Communications. Nature Publishing Group, London, UK, 11: 957, (2020).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Photo credit: taken from the Weekly Oil Bulletin homepage. License: taken from Copyright notice.
"To improve the transparency of oil prices and to strengthen the internal market, the European Commission's Oil Bulletin presents weekly consumer prices for petroleum products in EU countries."
The Weekly Oil Bulletin is a weekly publication, started in 1994, that contains information about petroleum product prices. Unfortunately, the data publication method is not up to modern standards, and the publishers do not seem to be up to the task of updating it [1].
Excel files with multiple sheets and different layouts are not an appropriate format for efficient data analysis, so I decided to refactor the available data into a single file for easy manipulation. All prices are in EUR, and the currency exchange rate is specified.
You can find the code to reproduce the dataset in dbt-eu-oil-bulletin; feel free to open an issue if you find errors or have any doubts about the procedures.
The dataset is distributed in the Apache Parquet format, and contains the following columns:
| Column | Type | Description |
|---|---|---|
| date | DATE | |
| country_name | STR | |
| country_code | STR | Two letter country code |
| product_name | STR | |
| price_units | STR | '1000L' or 't' (tonne) |
| euro_exch_rate | DECIMAL | |
| currency_code | STR | Three letter code with the original country currency price |
| price | DECIMAL | |
| taxes | BOOLEAN | Whether the price includes taxes or not |
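A minimal sketch of reading the Parquet file in R with the arrow package (the file name here is an assumption):

library(arrow)
bulletin <- read_parquet('eu_weekly_oil_bulletin.parquet')
# e.g., latest observation per country:
aggregate(date ~ country_code, data = bulletin, FUN = max)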
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
FireClay_data_table.xlsx: an Excel file with the raw data of 40Ar/39Ar single-fusion measurements on the Fire Clay tonstein. The data is fully reduced and defines the interference corrections used and the methodology of the mass discrimination.
Final_Bayes_Input_Data.xlsx: an Excel file with all input data for the model. This file contains: 238U/206Pb measured ratios, uncertainties, and MSWD (where available); 235U/207Pb measured ratios, uncertainties, and MSWD (where available); RX_FCs measured ratios, uncertainties, and MSWD (where available). The file also contains some notes and descriptions of each sample.
BayesCal_InputData.ipynb (Jupyter notebook): reads in the preliminary file from the pathway, modifies and propagates uncertainties using linear matrix algebra and sqrt(MSWD), and saves the final data frame to a user-defined Excel file.
Running_Input_Data Class.ipynb (Jupyter notebook): example of how to read in preliminary data and get the formatted final input dataset for the model.
Uni_delta.xlsx: Excel spreadsheet of delta ages between 235U/207Pb and 40Ar/39Ar and between 238U/206Pb and 40Ar/39Ar.
Uni_Res_Time.xlsx: Excel spreadsheet of residence times for all appropriate samples.
Uni_U_decay_constants.xlsx: Excel spreadsheet of posterior values and uncertainties for the 235U and 238U decay constants.
Uni_lam.xlsx: summary of all posterior values for the 40K decay scheme.
Uni_FCs.xlsx: summary of all posterior values for Fish Canyon sanidine.
Uni_u238_Xi.xlsx: summary of all posterior values for age perturbations for each 238U/206Pb age of each sample.
Uni_u235_Xi.xlsx: summary of all posterior values for age perturbations for each 235U/207Pb age of each sample.
Uni_Ar_Xi.xlsx: summary of all posterior values for age perturbations for each 40Ar/39Ar age of each sample.
Uni_K40K_samples.xlsx: summary of all posterior values for 40K/K ratios of each sample.
Uni_K40K_standards.xlsx: summary of all posterior values for the 40K/K ratio of Fish Canyon sanidine and the 40K/K of the decay-constant material.
Potassium Isotopic Variability and implications for age.ipynb: notebook for the summary plot of potassium isotopic variability (Figures S8 and S9 in the supplementary material); plots and summarizes the implications for a calculated 40Ar/39Ar age when accounting for the model variance in the 40K/K ratio between samples, the FCs neutron fluence monitor, and the material used to measure the total decay constant.
Calibration Comparison Figure.ipynb: notebook for plotting the comparison of the Min/Kuiper, Renne et al., and Bayesian calibrations as a function of age (Figure 3 in the manuscript).
Bayesian Calibration Paper Plots.ipynb: notebook for reading in the model outputs and plotting Figures 1, 2, and 4 in the associated manuscript.
Residence time and delta age plots.ipynb: notebook for plotting the output residence-time summary and the age-perturbing parameter values (Figures S5 and S6 in the supplementary material).
Ar_ages_and_uncertainties (Jupyter notebook): notebook for calculating ages and uncertainties given an R-value relationship between the unknown and Fish Canyon sanidine. Covariance between the total decay constant and FCs is included; the covariance of a measured R-value with both the FCs age and the total decay constant is assumed to be zero. Ages are given for the R-values reported in Table 9 of the manuscript and in section 4.5; all ages should agree.
BayesCal Comp (Jupyter notebook): notebook containing the entire code for the MCMC algorithm described in the manuscript "Bayesian Calibration of the 40K Decay Scheme and its implications for 40K-based geochronology". Some plots are given in this notebook, but the majority are stored as summary Excel files to be plotted elsewhere.
40K decay constant comparison.ipynb: notebook for plotting the model-estimated 40K decay constant against all other decay constants (Table 1 in the manuscript); also contains the calculation for updating the decay constant for variability in 40K/K.
Z-score_comparison.ipynb: notebook for calculating the Z-scores of the Renne et al. (2011) and Bayesian calibration ages of ACs, the Melilla sample, and FCs against the astronomical age estimates.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This model was developed and applied by the EFSA Working Group on the evaluation of substances used to remove microbial contamination from products of animal origin, during the preparatory work on the Scientific Opinion 'Evaluation of the safety and efficacy of the organic acids lactic and acetic acids to reduce microbiological surface contamination on pork carcasses and pork cuts' (see http://doi.org/10.2903/j.efsa.2018.5482).
The code (SAS and R) has been used to evaluate the efficacy of two organic acids, lactic and acetic acid, intended to be used individually by food business operators during processing to reduce microbiological surface contamination on carcasses and cuts from pork. The reduction is expressed as a log10 reduction, i.e., the difference between the means of the log10 concentrations of the control group and the treated group, with the corresponding 95% confidence interval (95% CI) when this information was available.
The code may be run using the input data from the Excel table 'Data extraction.xlsx'.
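A minimal sketch of the log10-reduction calculation described above, in R; the file is the one named in the text, but the column names are assumptions:

library(readxl)
d <- read_excel('Data extraction.xlsx')
# difference between mean log10 concentrations of control and treated groups:
tt <- t.test(log10(d$control_conc), log10(d$treated_conc))
log10_reduction <- tt$estimate[[1]] - tt$estimate[[2]]  # mean(control) - mean(treated)
tt$conf.int                                             # 95% CI of the difference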