Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Categorical scatterplots with R for biologists: a step-by-step guide
Benjamin Petre1, Aurore Coince2, Sophien Kamoun1
1 The Sainsbury Laboratory, Norwich, UK; 2 Earlham Institute, Norwich, UK
Weissgerber and colleagues (2015) recently stated that ‘as scientists, we urgently need to change our practices for presenting continuous data in small sample size studies’. They called for more scatterplot and boxplot representations in scientific papers, which ‘allow readers to critically evaluate continuous data’ (Weissgerber et al., 2015). In the Kamoun Lab at The Sainsbury Laboratory, we recently implemented a protocol to generate categorical scatterplots (Petre et al., 2016; Dagdas et al., 2016). Here we describe the three steps of this protocol: 1) formatting of the data set in a .csv file, 2) execution of the R script to generate the graph, and 3) export of the graph as a .pdf file.
Protocol
• Step 1: format the data set as a .csv file. Store the data in a three-column Excel file as shown in the PowerPoint slide. The first column ‘Replicate’ indicates the biological replicates. In the example, the month and year during which the replicate was performed are indicated. The second column ‘Condition’ indicates the conditions of the experiment (in the example, a wild type and two mutants called A and B). The third column ‘Value’ contains the continuous values. Save the Excel file as a .csv file (File -> Save as -> in ‘File Format’, select .csv). This .csv file is the input file to import into R.
• Step 2: execute the R script (see Notes 1 and 2, and the sketch after this list). Copy the script shown in the PowerPoint slide and paste it into the R console. Execute the script. In the dialog box, select the input .csv file from step 1. The categorical scatterplot will appear in a separate window. Dots represent the values for each sample; colors indicate replicates. Boxplots are superimposed; black dots indicate outliers.
• Step 3: save the graph as a .pdf file. Resize the window as desired and save the graph as a .pdf file (File -> Save as). See the PowerPoint slide for an example.
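The original script is distributed via the PowerPoint slide, so the version below is only a minimal sketch of the behavior described in steps 1 and 2, assuming the three-column layout (Replicate, Condition, Value); the plotting command mirrors the one shown in Note 2, minus the log scale.

# Minimal sketch of the protocol script; the original lives in the slide.
library(ggplot2)
data <- read.csv(file.choose())  # opens the dialog box described in step 2
graph <- ggplot(data, aes(x = Condition, y = Value))
graph + geom_boxplot(outlier.colour = 'black', colour = 'black') +
  geom_jitter(aes(col = Replicate)) +
  theme_bw()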
Notes
• Note 1: install the ggplot2 package. The R script requires the package ‘ggplot2’ to be installed. To install it via the GUI: Packages & Data -> Package Installer -> enter ‘ggplot2’ in the Package Search field and click ‘Get List’. Select ‘ggplot2’ in the Package column and click ‘Install Selected’. Install all dependencies as well (a console alternative is shown after these notes).
• Note 2: use a log scale for the y-axis. To use a log scale for the y-axis of the graph, use the command line below in place of command line #7 in the script.
graph + geom_boxplot(outlier.colour='black', colour='black') + geom_jitter(aes(col=Replicate)) + scale_y_log10() + theme_bw()
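Note 1 describes the graphical package installer; from any R console, the equivalent is the standard command below (shown for convenience).

install.packages('ggplot2', dependencies = TRUE)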
References
Dagdas YF, Belhaj K, Maqbool A, Chaparro-Garcia A, Pandey P, Petre B, et al. (2016) An effector of the Irish potato famine pathogen antagonizes a host autophagy cargo receptor. eLife 5:e10856.
Petre B, Saunders DGO, Sklenar J, Lorrain C, Krasileva KV, Win J, et al. (2016) Heterologous Expression Screens in Nicotiana benthamiana Identify a Candidate Effector of the Wheat Yellow Rust Pathogen that Associates with Processing Bodies. PLoS ONE 11(2):e0149035.
Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4):e1002128.
Eximpedia export-import trade data lets you search trade data and find active exporters, importers, buyers, suppliers, and manufacturers from over 209 countries.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
The SAS code (Supplementary File 1) and R program code (Supplementary File 2) are provided. The analysis requires an input data file (Supplementary Files 3-5) prepared in Excel and saved in CSV format, although the data can be stored in other formats such as xlsx, txt, or xls. Economic values are entered manually in the SAS code, whereas for the R code they are stored in an Excel file (Supplementary File 6).
These data represent an update to the dataset "StateIO Two-Region Economic Input-Output Models for 50 U.S. States 2012-2017," based on the methods described by Li et al. (2022), published here with an expanded time series. These models were produced with the stateior R package, v0.4.0. Excel files (50 in total) are provided for two-region (State of Interest and Rest of U.S.) Make and Use tables for each U.S. state IO model for the years 2012-2023. Additional data files supporting this release include all intermediate and final products in native R format (.RDS), which can be opened directly in R or through the stateior package. See the stateior GitHub page for more details. https://dmap-data-commons-ord.s3.amazonaws.com/index.html#stateio/ All values are in current dollar years (e.g., "Make 2012" is the Make table in 2012 USD in a given model). For a description of the methods used and a survey of results, see Addendum 1 on the EPA Science Inventory page for the original publication. Please cite this dataset as: Young, Ben, Julie Chen, Jorge Vendries, and Wesley Ingwersen. 2025. "StateIO v0.4.0 Two-Region Economic Input-Output Models for 50 U.S. States: 2012-2023." Data.gov. https://doi.org/10.23719/1532211. This dataset is associated with the following publication: Li, M., J. Ferreira, C.D. Court, D. Meyer, M. Li, and W.W. Ingwersen. StateIO - Open Source Economic Input-Output Models for the 50 States of the United States of America. International Regional Science Review. SAGE Publications, Thousand Oaks, CA, USA, 46(4): 428-481, (2023).
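The .RDS files can be opened directly in base R; a one-line sketch (the file name here is illustrative, not from the release):

make_2012 <- readRDS('StateIO_Make_2012.rds')  # a two-region Make table, 2012 USD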
Market basket analysis with Apriori algorithm
The retailer wants to target customers with suggestions for itemsets they are most likely to purchase. I was given a retailer's dataset in which the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow the business and offer customers itemset suggestions, so we can increase customer engagement, improve the customer experience, and identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.
Association rules are most useful when you want to build associations between different objects in a set and find frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing the retailer to identify relationships between items.
Assume there are 100 customers: 10 of them bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule 'bought computer mouse => bought mouse mat': support = P(mouse & mat) = 8/100 = 0.08; confidence = support / P(computer mouse) = 0.08/0.10 = 0.80; lift = confidence / P(mouse mat) = 0.80/0.09 ≈ 8.9. This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
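The same arithmetic in R, as a quick check of the numbers above:

n <- 100; mouse <- 10; mat <- 9; both <- 8
support    <- both / n               # 0.08
confidence <- support / (mouse / n)  # 0.80
lift       <- confidence / (mat / n) # ~8.9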
Number of Attributes: 7
First, we need to load the required libraries; each is described briefly in the sketch below.
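The exact library list was shown only in the original screenshot, so the set below is an assumption about what this workflow needs:

library(readxl)     # read the .xlsx input file
library(arules)     # transaction data structures and the Apriori algorithm
library(arulesViz)  # optional visualization of association rules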
Next, we load Assignment-1_Data.xlsx into R and inspect the dataset.
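A minimal sketch of the loading step (the object name 'retail' is my choice, not from the original screenshots):

retail <- readxl::read_excel('Assignment-1_Data.xlsx')
head(retail)  # inspect the first rows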
Next, we clean the data frame by removing rows with missing values.
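One way to do this, as a sketch:

retail <- retail[complete.cases(retail), ]  # drop rows containing NA values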
To apply association rule mining, we need to convert the data frame into transaction data, so that all items bought together on one invoice form a single transaction.
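A sketch of the conversion and mining step; the column names 'BillNo' and 'Itemname' and the support/confidence thresholds are assumptions, not values from the original analysis:

trans_list <- split(retail$Itemname, retail$BillNo)  # group items per invoice
trans <- as(trans_list, 'transactions')              # arules transaction object
rules <- apriori(trans, parameter = list(supp = 0.01, conf = 0.5))
inspect(head(sort(rules, by = 'lift'), 5))           # top 5 rules by lift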
Sediment diatoms are widely used to track environmental histories of lakes and their watersheds, but merging datasets generated by different researchers for further large-scale studies is challenging because of the taxonomic discrepancies caused by rapidly evolving diatom nomenclature and taxonomic concepts. Here we collated five datasets of lake sediment diatoms from the northeastern USA using a harmonization process which included updating synonyms, tracking the identity of inconsistently identified taxa, and grouping those that could not be resolved taxonomically. The dataset consists of a Portable Document Format (.pdf) file of the Voucher Flora, six Microsoft Excel (.xlsx) data files, an R script, and five output Comma Separated Values (.csv) files. The Voucher Flora documents the morphological species concepts in the dataset using diatom images compiled into plates (NE_Lakes_Voucher_Flora_102421.pdf) and the translation scheme of the OTU codes to diatom scientific or provisional names with identification sources, references, and notes (VoucherFloraTranslation_102421.xlsx). The file Slide_accession_numbers_102421.xlsx has slide accession numbers in the ANS Diatom Herbarium. The “DiatomHarmonization_032222_files for R.zip” archive contains four Excel input data files, the R code, and a subfolder “OUTPUT” with five .csv files. The file Counts_original_long_102421.xlsx contains the original diatom count data in long format. The file Harmonization_102421.xlsx is the taxonomic harmonization scheme with notes and references. The file SiteInfo_031922.xlsx contains sampling site- and sample-level information. WaterQualityData_021822.xlsx is a supplementary file with water quality data. The R code (DiatomHarmonization_032222.R) was used to apply the harmonization scheme to the original diatom counts to produce the output files. The resulting output files are five files: four wide-format files containing diatom count data at different harmonization steps (Counts_1327_wide.csv, Step1_1327_wide.csv, Step2_1327_wide.csv, Step3_1327_wide.csv) and the summary of the Indicator Species Analysis (INDVAL_RESULT.csv). The harmonization scheme (Harmonization_102421.xlsx) can be further modified based on additional taxonomic investigations, while the associated R code (DiatomHarmonization_032222.R) provides a straightforward mechanism for diatom data versioning. This dataset is associated with the following publication: Potapova, M., S. Lee, S. Spaulding, and N. Schulte. A harmonized dataset of sediment diatoms from hundreds of lakes in the northeastern United States. Scientific Data. Springer Nature, New York, NY, 9(540): 1-8, (2022).
Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/
License information was derived automatically
This dataset provides basic information about the Linny-R software developed at TU Delft for industrial system optimization and describes its functionality for cluster modeling. Moreover, it presents technical and economic input data for the Linny-R models generated in the manuscript 'Exploring the emergence of waste recovery and exchange in industrial clusters'. It then presents the Linny-R models and their results in an Excel file.
A comprehensive Quality Assurance (QA) and Quality Control (QC) statistical framework consists of three major phases: Phase 1, preliminary exploration of the raw datasets, including time formatting and combining datasets of different lengths and different time intervals; Phase 2, QA of the datasets, including detecting and flagging duplicates, outliers, and extreme values; and Phase 3, development of time series of a desired frequency, imputation of missing values, visualization, and a final statistical summary. The time series data collected at the Billy Barr meteorological station (East River Watershed, Colorado) were analyzed. The developed statistical framework is suitable for both real-time and post-data-collection QA/QC analysis of meteorological datasets. The files in this data package include one Excel file converted to CSV format (Billy_Barr_raw_qaqc.csv) that contains the raw meteorological data, i.e., the input data for the QA/QC analysis. The second CSV file (Billy_Barr_1hr.csv) contains the QA/QC-processed and flagged meteorological data, i.e., the output of the QA/QC analysis. The last file (QAQC_Billy_Barr_2021-03-22.R) is an R script that implements the QA/QC and flagging process. The CSV data files included in this package provide the input and output of the R script.
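A sketch of how the three files fit together in an R session, assuming the script reads the raw CSV from the working directory (the actual I/O conventions are defined inside the script):

raw <- read.csv('Billy_Barr_raw_qaqc.csv')  # raw input data
source('QAQC_Billy_Barr_2021-03-22.R')      # run the QA/QC and flagging
out <- read.csv('Billy_Barr_1hr.csv')       # flagged hourly output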
Demographics: question Q2.
Academic buoyancy: question Q3 (Academic Buoyancy Scale; Martin & Marsh, 2008a, 2008b).
Well-being/positive mental health (MHC-SF): question Q4.
Personal growth: question Q5 (part of the Psychological Well-Being Scale (PWB)).
Psychopathology (HADS): question Q6.
Vitality: question Q7 (Subjective Vitality Scale; Ryan & Frederick, 1997).
Flourishing Scale: question Q8 (Diener, E., Wirtz, D., Tov, W., Kim-Prieto, C., Choi, D., Oishi, S., & Biswas-Diener, R., 2009).
Warwick-Edinburgh Mental Well-being Scale: question Q9.
Negative affect (MDES State): question Q10.
Well-being (SWLS): question Q11.
Psychological Well-Being Scale (PWB) - Goals: question Q12 (http://danrobertsgroup.com/wp-content/uploads/2018/02/PWB-Scale.pdf).
Basic Psychological Need Satisfaction Scale (Deci & Ryan, 2000; Gagné, 2003): question Q13 (Custers, A.F.J., Westerhof, G.J., Kuin, Y., & Riksen-Walraven, M. (2010). Need fulfillment in caring relationships: Its relation with well-being of residents in somatic nursing homes. Aging & Mental Health, 14:6, 731-739. DOI: 10.1080/13607861003713133).
Self-compassion: question Q14 (Self-Compassion Scale, Short Form).
Psychological Well-Being Scale (PWB) - Self-acceptance: question Q15.
SF-12: question Q16.
Coping (CISS): question Q18 (http://www.psychiatryinvestigation.org/journal/view.php?number=374).
Resilience (RS-NL): question Q19.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Background: Clean water is an essential part of a healthy human life and wellbeing. More recently, rapid population growth, high illiteracy rates, a lack of sustainable development, and climate change have made clean water provision a global challenge in developing countries. Discontinuity of the drinking water supply forces households either to use unsafe water storage materials or to use water from unsafe sources. The present study aimed to identify the determinants of water source types, use, quality of water, and sanitation perception of physical parameters among urban households in North-West Ethiopia.
Methods: A community-based cross-sectional study was conducted among households from February to March 2019. A pretested, structured, interview-based questionnaire was used to collect the data. Samples were selected randomly and proportionally to each kebele's households. MS Excel and R version 3.6.2 were used to enter and analyze the data, respectively. Descriptive statistics using frequencies and percentages were used to describe the sample data with respect to the predictor variables. Both bivariate and multivariate logistic regressions were used to assess the association between the independent and response variables (see the sketch below).
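An illustrative sketch of the analyses described, in base R; all file, variable, and column names here are assumptions for illustration:

df <- read.csv('household_survey.csv')
# bivariate association, e.g. education vs. improved source (0/1 coded):
chisq.test(table(df$educational_status, df$improved_source))
# multivariate logistic regression:
fit <- glm(improved_source ~ age_group + educational_status + monthly_income,
           family = binomial, data = df)
summary(fit)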
Results: Four hundred eighteen (418) households participated. Based on the study, 78.95% of households used improved and 21.05% used unimproved drinking water sources. Households' drinking water sources were significantly associated with the age of the participant (χ2 = 20.392, df = 3), educational status (χ2 = 19.358, df = 4), source of income (χ2 = 21.777, df = 3), monthly income (χ2 = 13.322, df = 3), availability of additional facilities (χ2 = 98.144, df = 7), cleanliness status (χ2 = 42.979, df = 4), scarcity of water (χ2 = 5.1388, df = 1), and family size (χ2 = 9.934, df = 2). The logistic regression analysis also indicated that these factors significantly determine the water source types used by the households. Factors such as availability of a toilet facility, household member type, and sex of the head of the household were not significantly associated with drinking water sources.
Conclusion: The use of drinking water from improved sources was determined by different demographic, socio-economic, sanitation, and hygiene-related factors. Therefore, the local, regional, and national governments and other supporting organizations should improve the accessibility and adequacy of drinking water from improved sources in the area.
Open Database License (ODbL) v1.0 https://www.opendatacommons.org/licenses/odbl/1.0/
License information was derived automatically
The "yahoo_finance_dataset(2018-2023)" dataset is a financial dataset containing daily stock market data for multiple assets such as equities, ETFs, and indexes. It spans from April 1, 2018 to March 31, 2023, and contains 1257 rows and 7 columns. The data was sourced from Yahoo Finance, and the purpose of the dataset is to provide researchers, analysts, and investors with a comprehensive dataset that they can use to analyze stock market trends, identify patterns, and develop investment strategies. The dataset can be used for various tasks, including stock price prediction, trend analysis, portfolio optimization, and risk management. The dataset is provided in XLSX format, which makes it easy to import into various data analysis tools, including Python, R, and Excel.
The dataset includes the following columns:
- Date: The date on which the stock market data was recorded.
- Open: The opening price of the asset on the given date.
- High: The highest price of the asset on the given date.
- Low: The lowest price of the asset on the given date.
- Close*: The closing price of the asset on the given date. Note that this price does not take into account any after-hours trading that may have occurred after the market officially closed.
- Adj Close**: The adjusted closing price of the asset on the given date. This price takes into account any dividends, stock splits, or other corporate actions that may have occurred, which can affect the stock price.
- Volume: The total number of shares of the asset that were traded on the given date.
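Since the dataset ships as XLSX, a minimal sketch of importing it in R (the file name follows the dataset title; the exact column label may differ, e.g. 'Adj Close' versus the starred label above):

library(readxl)
prices <- read_excel('yahoo_finance_dataset(2018-2023).xlsx')
ret <- diff(log(prices[['Adj Close']]))  # daily log returns from the adjusted close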
Facebook
TwitterLength data were collected for wild-caught Noturus taylori from sample sites throughout the known distribution range. Habitat variables were collected from each 2x2 meter plot where Noturus taylori was captured during seasonal sampling.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Excel file containing input data for the case study. The Excel file is composed of one sheet for each exposure and a final sheet containing grouping information. Each exposure sheet is named with the exposure ID and contains two columns: the list of selected genes and the associated t-statistics, respectively. The final sheet contains two columns: one reporting the list of exposure IDs and the other the corresponding group. (XLSX 63 kb)
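A sketch of reading this layout into R; the file name is an assumption, since only the structure is described above:

library(readxl)
path <- 'case_study_input.xlsx'
sheets <- excel_sheets(path)
grouping <- read_excel(path, sheet = tail(sheets, 1))  # last sheet: exposure ID -> group
exposures <- lapply(head(sheets, -1), function(s) read_excel(path, sheet = s))
names(exposures) <- head(sheets, -1)                   # sheets are named by exposure ID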
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
An Excel spreadsheet containing a sheet that shows overall killing data for all strains in the associated manuscript. It also contains a sheet for the subset of strains used in killing and sensitivity matrix clustering, including the calculations for the overall amount of killing activity shown in Figure 2.
This is part 2 of INDILACT; part 1 is published separately.
The objective of this study is to investigate how a customized voluntary waiting period before first insemination in primiparous dairy cows affects their milk production, fertility, and health during the first calving interval.
The data were registered between January 2019 and October 2022.
This data is archived:
- Metadata (publicly available)
- Raw data (.txt files) from the Swedish national herd recording scheme (SNDRS), operated by Växa Sverige: access restricted due to agreements with the principal owners of the data, Växa Sverige and the farms. Code lists are available in INDILACT part 1.
- Aggregated data (Excel files): access restricted due to agreements with the principal owners of the data, Växa Sverige and the farms.
- R scripts with statistical calculations (openly available)
Metadata (3 files):
- Metadata gentypning: the only new file type compared to INDILACT part 1; describes how this data category has been handled. The other file types have been handled in the same way as in INDILACT part 1.
- Metadata - del 2: general summary of the initial data handling used to aggregate files of the same type (dates etc.) into the Excel files used in the R scripts.
- DisCodes: division of the diagnoses into categories.
Raw data:
- 59 .txt files containing data retrieved from SNDRS on 8 separate occasions.
- Data from 18 Swedish farms from January 2019 to October 2022.
Aggregated data:
- 29 Excel files. The text files have been converted to Excel format, and all data from the same file type is aggregated into one file.
- Data collected from the farms by email and phone contact, about individual cows enrolled in the trial, from October 2020 to October 2022.
- One merged script derived from the initial data handling in R, in which relevant variables were calculated and aggregated for use in the statistical calculations.
R scripts with data handling and statistical calculations:
- "Data analysis part 2 - final": data handling to create the file used in the statistical calculations.
- "Part 2 - Binomial models - Fertility": statistical calculations of variables using binomial models.
- "Part 2 - glmmTMB models - Fertility": statistical calculations of variables using glmmTMB models.
- "Part 2 - linear models - Fertility": statistical calculations of fertility variables using linear models.
- "Part 2 - linear models": statistical calculations of milk variables using linear models.
Running the R scripts requires access to the restricted files. The files should be unpacked in a subdirectory "data" relative to the working directory for the scripts. See also the file "sessionInfo.txt" for information on R packages used.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Desktop analysis and examination of the format of six key food composition databases. Table footnotes:
a Food Standards Australia and New Zealand
b United States Department of Agriculture
c Food Standards Agency
d Separate databases for flavonoids, carotenoids, proanthocyanidins and isoflavones
e EuroFIR eBASIS contains bioactive data for the UK and Europe
f National Health Survey
g https://www.xyris.com.au/foodworks/fw_pro.html
h http://www.nutribase.com/highend.html
i http://www.foodresearch.ca/wp-content/uploads/2013/06/candat-features-1.pdf
j Tinuviel Software
i Downlees Systems
k Forestfield Software
l Kelicomp
m http://www.tinuvielsoftware.com/faqs.htm
n http://www.dietsoftware.com/canada.html
o Text file: a file that only contains text
p A file containing tables of information stored in columns and separated by tabs (can be exported into almost any spreadsheet program)
q Microsoft Excel spreadsheet
r Microsoft Access Database file: a database file with automated functions and queries
s American Standard Code for Information Interchange (a standard file type that can be used by many programs)
t Database File Format (this file type can be opened with Microsoft Excel and Access)
u Information to create Excel or PDF available
v Composition of Foods, Australia
w International Network of Food Data Systems
x User's guide states the food name is the most descriptive and recognisable name of the food referenced
y http://www.foodstandards.gov.au/science/monitoringnutrients/nutrientables/nuttab/Pages/NUTTAB-2010-electronic-database-files.aspx
z http://www.foodstandards.gov.au/science/monitoringnutrients/ausnut/ausnutdatafiles/Pages/default.aspx
aa http://ndb.nal.usda.gov/ndb/search/list
bb http://tna.europarchive.org/20110116113217/http://www.food.gov.uk/science/dietarysurveys/dietsurveys/
cc http://webprod3.hc-sc.gc.ca/cnf-fce/index-eng.jsp
This file describes the dataset used in Ou et al., "Air pollution control strategies directly limiting national health damages in the US." This work used the Global Change Assessment Model (GCAM) with state-level representation of the U.S. energy system (GCAM-USA). GCAM and GCAM-USA are developed and released by the University of Maryland/Pacific Northwest National Laboratory Joint Global Change Research Institute (JGCRI). For further details, see the GCAM documentation: jgcri.github.io/gcam-doc. The model source code is available at github.com/JGCRI/gcam-core. A modified version of GCAMv4.3 was used for this analysis; source code and input data specific to this paper are available upon request. This dataset contains Excel spreadsheets and an R script that link to comma-separated values (CSV) files extracted from the model output. The spreadsheets and scripts show the data and reproduce each of the figures in the paper. This dataset is associated with the following publication: Ou, Y., J. West, S. Smith, C. Nolte, and D. Loughlin. Air pollution control strategies directly limiting national health damages in the US. Nature Communications. Nature Publishing Group, London, UK, 11: 957, (2020).
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Photo credit: taken from the Weekly Oil Bulletin homepage. License: taken from Copyright notice.
"To improve the transparency of oil prices and to strengthen the internal market, the European Commission's Oil Bulletin presents weekly consumer prices for petroleum products in EU countries."
The Weekly Oil Bulletin is a weekly publication, started in 1994, that contains information about petroleum product prices. Unfortunately, the data publication method is not up to modern standards, and the publishers do not seem to be up to the task of updating it [1].
Excel files with multiple sheets and different layouts are not an appropriate format for efficient data analysis, so I decided to refactor the available data into a single file for easy manipulation. All prices are in EUR, and the currency exchange rate is specified.
You can find the code to reproduce the dataset in dbt-eu-oil-bulletin; feel free to open an issue if you find errors or have any doubts about the procedures.
The dataset is distributed in the Apache Parquet format, and contains the following columns:
| Column | Type | Description |
|---|---|---|
| date | DATE | |
| country_name | STR | |
| country_code | STR | Two letter country code |
| product_name | STR | |
| price_units | STR | '1000L' or 't' (tonne) |
| euro_exch_rate | DECIMAL | |
| currency_code | STR | Three letter code with the original country currency price |
| price | DECIMAL | |
| taxes | BOOLEAN | Whether the price includes taxes or not |
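A minimal sketch of reading the Parquet file in R with the arrow package (the file name here is an assumption):

library(arrow)
bulletin <- read_parquet('eu_weekly_oil_bulletin.parquet')
# e.g., latest observation per country:
aggregate(date ~ country_code, data = bulletin, FUN = max)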
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
FireClay_data_table.xlsx: an Excel file with the raw data of 40Ar/39Ar single-fusion measurements on the Fire Clay tonstein. The data is fully reduced and defines the interference corrections used and the methodology of the mass discrimination.
Final_Bayes_Input_Data.xlsx: an Excel file with all input data for the model. This file contains: 238U/206Pb measured ratios, uncertainties, and MSWD (where available); 235U/207Pb measured ratios, uncertainties, and MSWD (where available); RX_FCs measured ratios, uncertainties, and MSWD (where available). The file also contains some notes and descriptions of each sample.
BayesCal_InputData.ipynb (Jupyter notebook): reads in the preliminary file from the pathway, modifies and propagates uncertainties using linear matrix algebra and sqrt(MSWD), and saves the final data frame to a user-defined Excel file.
Running_Input_Data Class.ipynb (Jupyter notebook): example of how to read in preliminary data and get the formatted final input dataset for the model.
Uni_delta.xlsx: Excel spreadsheet of delta ages between 235U/207Pb and 40Ar/39Ar and between 238U/206Pb and 40Ar/39Ar.
Uni_Res_Time.xlsx: Excel spreadsheet of residence times for all appropriate samples.
Uni_U_decay_constants.xlsx: Excel spreadsheet of posterior values and uncertainties for the 235U and 238U decay constants.
Uni_lam.xlsx: summary of all posterior values for the 40K decay scheme.
Uni_FCs.xlsx: summary of all posterior values for Fish Canyon sanidine.
Uni_u238_Xi.xlsx: summary of all posterior values for age perturbations for each 238U/206Pb age of each sample.
Uni_u235_Xi.xlsx: summary of all posterior values for age perturbations for each 235U/207Pb age of each sample.
Uni_Ar_Xi.xlsx: summary of all posterior values for age perturbations for each 40Ar/39Ar age of each sample.
Uni_K40K_samples.xlsx: summary of all posterior values for 40K/K ratios of each sample.
Uni_K40K_standards.xlsx: summary of all posterior values for the 40K/K ratio of Fish Canyon sanidine and the 40K/K of the decay-constant material.
Potassium Isotopic Variability and implications for age.ipynb: notebook for the summary plot of potassium isotopic variability (Figures S8 and S9 in the supplementary material); plots and summarizes the implications for a calculated 40Ar/39Ar age when accounting for the model variance in the 40K/K ratio between samples, the FCs neutron fluence monitor, and the material used to measure the total decay constant.
Calibration Comparison Figure.ipynb: notebook for plotting the comparison of the Min/Kuiper, Renne et al., and Bayesian calibrations as a function of age (Figure 3 in the manuscript).
Bayesian Calibration Paper Plots.ipynb: notebook for reading in the model outputs and plotting Figures 1, 2, and 4 in the associated manuscript.
Residence time and delta age plots.ipynb: notebook for plotting the output residence-time summary and the age-perturbing parameter values (Figures S5 and S6 in the supplementary material).
Ar_ages_and_uncertainties (Jupyter notebook): notebook for calculating ages and uncertainties given an R-value relationship between the unknown and Fish Canyon sanidine. Covariance between the total decay constant and FCs is included; the covariance of a measured R-value with both the FCs age and the total decay constant is assumed to be zero. Ages are given for the R-values reported in Table 9 of the manuscript and in section 4.5; all ages should agree.
BayesCal Comp (Jupyter notebook): notebook containing the entire code for the MCMC algorithm described in the manuscript "Bayesian Calibration of the 40K Decay Scheme and its implications for 40K-based geochronology". Some plots are given in this notebook, but the majority are stored as summary Excel files to be plotted elsewhere.
40K decay constant comparison.ipynb: notebook for plotting the model-estimated 40K decay constant against all other decay constants (Table 1 in the manuscript); also contains the calculation for updating the decay constant for variability in 40K/K.
Z-score_comparison.ipynb: notebook for calculating the Z-scores of the Renne et al. (2011) and Bayesian calibration ages of ACs, the Melilla sample, and FCs against the astronomical age estimates.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This model was developed and applied by the EFSA Working Group on the evaluation of substances used to remove microbial contamination from products of animal origin, during the preparatory work on the Scientific Opinion 'Evaluation of the safety and efficacy of the organic acids lactic and acetic acids to reduce microbiological surface contamination on pork carcasses and pork cuts' (see http://doi.org/10.2903/j.efsa.2018.5482).
The code (SAS and R) has been used to evaluate the efficacy of two organic acids, lactic and acetic acid, intended to be used individually by food business operators during processing to reduce microbiological surface contamination on carcasses and cuts from pork. The reduction is expressed as a log10 reduction, i.e., the difference between the means of the log10 concentrations of the control group and the treated group, with the corresponding 95% confidence interval (95% CI) when this information was available.
The code may be run using the input data from the Excel table 'Data extraction.xlsx'.
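A minimal sketch of the log10-reduction calculation described above, in R; the file is the one named in the text, but the column names are assumptions:

library(readxl)
d <- read_excel('Data extraction.xlsx')
# difference between mean log10 concentrations of control and treated groups:
tt <- t.test(log10(d$control_conc), log10(d$treated_conc))
log10_reduction <- tt$estimate[[1]] - tt$estimate[[2]]  # mean(control) - mean(treated)
tt$conf.int                                             # 95% CI of the difference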