92 datasets found

Supplement 1. R code demonstrating how to fit a logistic regression model,...
wiley.figshare.com
figshare.com
html
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
David I. Warton; Francis K. C. Hui (2023). Supplement 1. R code demonstrating how to fit a logistic regression model, with a random intercept term, and how to use resampling-based hypothesis testing for inference. [Dataset]. http://doi.org/10.6084/m9.figshare.3550407.v1
Explore at:
htmlAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.3550407.v1
Dataset updated
Jun 1, 2023
Dataset provided by
Wileyhttps://www.wiley.com/
Authors
David I. Warton; Francis K. C. Hui
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
File List glmmeg.R: R code demonstrating how to fit a logistic regression model, with a random intercept term, to randomly generated overdispersed binomial data. boot.glmm.R: R code for estimating P-values by applying the bootstrap to a GLMM likelihood ratio statistic. Description glmm.R is some example R code which show how to fit a logistic regression model (with or without a random effects term) and use diagnostic plots to check the fit. The code is run on some randomly generated data, which are generated in such a way that overdispersion is evident. This code could be directly applied for your own analyses if you read into R a data.frame called “dataset”, which has columns labelled “success” and “failure” (for number of binomial successes and failures), “species” (a label for the different rows in the dataset), and where we want to test for the effect of some predictor variable called “location”. In other cases, just change the labels and formula as appropriate. boot.glmm.R extends glmm.R by using bootstrapping to calculate P-values in a way that provides better control of Type I error in small samples. It accepts data in the same form as that generated in glmm.R.
n
Data from: Generalizable EHR-R-REDCap pipeline for a national...
data.niaid.nih.gov
explore.openaire.eu
+2more
zip
Updated Jan 9, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller (2022). Generalizable EHR-R-REDCap pipeline for a national multi-institutional rare tumor patient registry [Dataset]. http://doi.org/10.5061/dryad.rjdfn2zcm
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5061/dryad.rjdfn2zcm
Dataset updated
Jan 9, 2022
Dataset provided by
Harvard Medical School
Massachusetts General Hospital
Authors
Sophia Shalhout; Farees Saqlain; Kayla Wright; Oladayo Akinyemi; David Miller
License
https://spdx.org/licenses/CC0-1.0.htmlhttps://spdx.org/licenses/CC0-1.0.html
Description
Objective: To develop a clinical informatics pipeline designed to capture large-scale structured EHR data for a national patient registry.

Materials and Methods: The EHR-R-REDCap pipeline is implemented using R-statistical software to remap and import structured EHR data into the REDCap-based multi-institutional Merkel Cell Carcinoma (MCC) Patient Registry using an adaptable data dictionary.

Results: Clinical laboratory data were extracted from EPIC Clarity across several participating institutions. Labs were transformed, remapped and imported into the MCC registry using the EHR labs abstraction (eLAB) pipeline. Forty-nine clinical tests encompassing 482,450 results were imported into the registry for 1,109 enrolled MCC patients. Data-quality assessment revealed highly accurate, valid labs. Univariate modeling was performed for labs at baseline on overall survival (N=176) using this clinical informatics pipeline.

Conclusion: We demonstrate feasibility of the facile eLAB workflow. EHR data is successfully transformed, and bulk-loaded/imported into a REDCap-based national registry to execute real-world data analysis and interoperability.

Methods eLAB Development and Source Code (R statistical software):

eLAB is written in R (version 4.0.3), and utilizes the following packages for processing: DescTools, REDCapR, reshape2, splitstackshape, readxl, survival, survminer, and tidyverse. Source code for eLAB can be downloaded directly (https://github.com/TheMillerLab/eLAB).

eLAB reformats EHR data abstracted for an identified population of patients (e.g. medical record numbers (MRN)/name list) under an Institutional Review Board (IRB)-approved protocol. The MCCPR does not host MRNs/names and eLAB converts these to MCCPR assigned record identification numbers (record_id) before import for de-identification.

Functions were written to remap EHR bulk lab data pulls/queries from several sources including Clarity/Crystal reports or institutional EDW including Research Patient Data Registry (RPDR) at MGB. The input, a csv/delimited file of labs for user-defined patients, may vary. Thus, users may need to adapt the initial data wrangling script based on the data input format. However, the downstream transformation, code-lab lookup tables, outcomes analysis, and LOINC remapping are standard for use with the provided REDCap Data Dictionary, DataDictionary_eLAB.csv. The available R-markdown ((https://github.com/TheMillerLab/eLAB) provides suggestions and instructions on where or when upfront script modifications may be necessary to accommodate input variability.

The eLAB pipeline takes several inputs. For example, the input for use with the ‘ehr_format(dt)’ single-line command is non-tabular data assigned as R object ‘dt’ with 4 columns: 1) Patient Name (MRN), 2) Collection Date, 3) Collection Time, and 4) Lab Results wherein several lab panels are in one data frame cell. A mock dataset in this ‘untidy-format’ is provided for demonstration purposes (https://github.com/TheMillerLab/eLAB).

Bulk lab data pulls often result in subtypes of the same lab. For example, potassium labs are reported as “Potassium,” “Potassium-External,” “Potassium(POC),” “Potassium,whole-bld,” “Potassium-Level-External,” “Potassium,venous,” and “Potassium-whole-bld/plasma.” eLAB utilizes a key-value lookup table with ~300 lab subtypes for remapping labs to the Data Dictionary (DD) code. eLAB reformats/accepts only those lab units pre-defined by the registry DD. The lab lookup table is provided for direct use or may be re-configured/updated to meet end-user specifications. eLAB is designed to remap, transform, and filter/adjust value units of semi-structured/structured bulk laboratory values data pulls from the EHR to align with the pre-defined code of the DD.

Data Dictionary (DD)

EHR clinical laboratory data is captured in REDCap using the ‘Labs’ repeating instrument (Supplemental Figures 1-2). The DD is provided for use by researchers at REDCap-participating institutions and is optimized to accommodate the same lab-type captured more than once on the same day for the same patient. The instrument captures 35 clinical lab types. The DD serves several major purposes in the eLAB pipeline. First, it defines every lab type of interest and associated lab unit of interest with a set field/variable name. It also restricts/defines the type of data allowed for entry for each data field, such as a string or numerics. The DD is uploaded into REDCap by every participating site/collaborator and ensures each site collects and codes the data the same way. Automation pipelines, such as eLAB, are designed to remap/clean and reformat data/units utilizing key-value look-up tables that filter and select only the labs/units of interest. eLAB ensures the data pulled from the EHR contains the correct unit and format pre-configured by the DD. The use of the same DD at every participating site ensures that the data field code, format, and relationships in the database are uniform across each site to allow for the simple aggregation of the multi-site data. For example, since every site in the MCCPR uses the same DD, aggregation is efficient and different site csv files are simply combined.

Study Cohort

This study was approved by the MGB IRB. Search of the EHR was performed to identify patients diagnosed with MCC between 1975-2021 (N=1,109) for inclusion in the MCCPR. Subjects diagnosed with primary cutaneous MCC between 2016-2019 (N= 176) were included in the test cohort for exploratory studies of lab result associations with overall survival (OS) using eLAB.

Statistical Analysis

OS is defined as the time from date of MCC diagnosis to date of death. Data was censored at the date of the last follow-up visit if no death event occurred. Univariable Cox proportional hazard modeling was performed among all lab predictors. Due to the hypothesis-generating nature of the work, p-values were exploratory and Bonferroni corrections were not applied.
f
Comparing spatial regression to random forests for large environmental data...
plos.figshare.com
pdf
Updated Jun 1, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Eric W. Fox; Jay M. Ver Hoef; Anthony R. Olsen (2023). Comparing spatial regression to random forests for large environmental data sets [Dataset]. http://doi.org/10.1371/journal.pone.0229509
Explore at:
pdfAvailable download formats
Unique identifier
https://doi.org/10.1371/journal.pone.0229509
Dataset updated
Jun 1, 2023
Dataset provided by
PLOS ONE
Authors
Eric W. Fox; Jay M. Ver Hoef; Anthony R. Olsen
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Environmental data may be “large” due to number of records, number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates with nonlinear relationships, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records that are spatially autocorrelated. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. A primary application is mapping MMI predictions and prediction errors at 1.1 million perennial stream reaches across the conterminous United States. For the spatial regression model, we develop a novel transformation procedure that estimates Box-Cox transformations to linearize covariate relationships and handles possibly zero-inflated covariates. We find that the spatial regression model with transformations, and a subsequent selection of significant covariates, has cross-validation performance comparable to random forests. We also find that prediction interval coverage is close to nominal for each method, but that spatial regression prediction intervals tend to be narrower and have less variability than quantile regression forest prediction intervals. A simulation study is used to generalize results and clarify advantages of each modeling approach.
Data from: Data and scripts associated with a manuscript investigating...
osti.gov
Updated Feb 20, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Arnon, Shai; Bar-Zeev, Edo; Borton, Mikayla A.; Brooks, Scott; Chu, Rosalie; Danczak, Robert E.; Garayburu-Caruso, Vanessa A.; Goldman, Amy E.; Graham, Emily B.; Jones, Michael; Jones, Nikki; Lewandowski, Jorg; Meile, Christof; Morad, Joseph W.; Muller, Birgit M.; Powers-McCormack, Beck; Renteria, Lupita; Schalles, John; Schulz, Hanna; Stegen, James C.; Toyoda, Jason G.; Ward, Adam; Wells, Jacqueline R. (2024). Data and scripts associated with a manuscript investigating dissolved organic matter and microbial community linkages across seven globally distributed rivers [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/2319037
Explore at:
Dataset updated
Feb 20, 2024
Dataset provided by
United States Department of Energyhttp://energy.gov/
Office of Sciencehttp://www.er.doe.gov/
33.1805,35.6156|33.1805,35.6156|33.1805,35.6156|33.1805,35.6156|33.1805,35.615652.4764,13.6257|52.4764,13.6257|52.4764,13.6257|52.4764,13.6257|52.4764,13.625744.2065,-122.2566|44.2065,-122.2566|44.2065,-122.2566|44.2065,-122.2566|44.2065,-122.256631.3346,-81.4793|31.3346,-81.4793|31.3346,-81.4793|31.3346,-81.4793|31.3346,-81.479346.373,-119.272|46.373,-119.272|46.373,-119.272|46.373,-119.272|46.373,-119.27246.7386,-121.9181|46.7386,-121.9181|46.7386,-121.9181|46.7386,-121.9181|46.7386,-121.918135.9662,-84.3584|35.9662,-84.3584|35.9662,-84.3584|35.9662,-84.3584|35.9662,-84.3584
Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) (United States)
Authors
Arnon, Shai; Bar-Zeev, Edo; Borton, Mikayla A.; Brooks, Scott; Chu, Rosalie; Danczak, Robert E.; Garayburu-Caruso, Vanessa A.; Goldman, Amy E.; Graham, Emily B.; Jones, Michael; Jones, Nikki; Lewandowski, Jorg; Meile, Christof; Morad, Joseph W.; Muller, Birgit M.; Powers-McCormack, Beck; Renteria, Lupita; Schalles, John; Schulz, Hanna; Stegen, James C.; Toyoda, Jason G.; Ward, Adam; Wells, Jacqueline R.
Description
This data package is associated with the publication “Meta-metabolome ecology reveals that geochemistry and microbial functional potential are linked to organic matter development across seven rivers” submitted to Science of the Total Environment. This data package includes the data necessary to replicate the analyses presented within the manuscript to investigate dissolved organic matter (DOM) development across broad spatial distances and within divergent biomes. Specifically, we included the Fourier transform ion cyclotron mass spectrometry (FTICR-MS) data, geochemistry data, annotated metagenomic data, and results from ecological null modeling analyses in this data package. Additionally, we included the scripts necessary to generate the figures from the manuscript.Complete metagenomic data associated with this data package can be found at the National Center for Biotechnology (NCBI) under Bioproject PRJNA946291.This dataset consists of (1) four folders; (2) a file-level metadata (flmd) file; (3) a data dictionary (dd) file; (4) a factor sheet describing samples; and (5) a readme. The FTICR Data folder contains (1) the processed Fourier transform ion cyclotron mass spectrometry (FTICR-MS) data; (2) a transformation-weighted characteristics dendrogram generated from the FTICR-MS data; and (3) the script used to generate all FTICR-MS related figures. The Geochemical Data folder contains (1) the single geochemistry data filemore » and (2) the R script responsible for generating associated figures. The Metagenomic Data folder contains (1) annotation information across different levels; (2) carbohydrate active enzyme (CAZyme) information from the dbCAN database (Yin et al., 2012); (3) phylogenetic tree data (FASTAs, alignments, and tree file); and (4) the scripts necessary to analyze all of these data and generate figures. The Null Modeling Data folder contains (1) data generated during null modeling for each river and all rivers combined and (2) the R scripts necessary to process the data. All files are .csv, .pdf, .tsv, .tre, .faa, .afa, .tree, or .R.« less
t
Solar self-sufficient households as a driving factor for sustainability...
service.tib.eu
Updated Nov 14, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2024). Solar self-sufficient households as a driving factor for sustainability transformation - Vdataset - LDM [Dataset]. https://service.tib.eu/ldmservice/dataset/luh-solar-self-sufficient-households-as-a-driving-factor-for-sustainability-transformation
Explore at:
Dataset updated
Nov 14, 2024
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
To get the consumption model from Section 3.1, one needs load execute the file consumption_data.R. Load the data for the 3 Phases ./data/CONSUMPTION/PL1.csv, PL2.csv, PL3.csv, transform the data and build the model (starting line 225). The final consumption data can be found in one file for each year in ./data/CONSUMPTION/MEGA_CONS_list.Rdata To get the results for the optimization problem, one needs to execute the file analyze_data.R. It provides the functions to compare production and consumption data, and to optimize for the different values (PV, MBC,). To reproduce the figures one needs to execute the file visualize_results.R. It provides the functions to reproduce the figures. To calculate the solar radiation that is needed in the Section Production Data, follow file calculate_total_radiation.R. To reproduce the radiation data from from ERA5, that can be found in data.zip, do the following steps: 1. ERA5 - download the reanalysis datasets as GRIB file. For FDIR select "Total sky direct solar radiation at surface", for GHI select "Surface solar radiation downwards", and for ALBEDO select "Forecast albedo". 2. convert GRIB to csv with the file era5toGRID.sh 3. convert the csv file to the data that is used in this paper with the file convert_year_to_grid.R
F
Data from: Solar self-sufficient households as a driving factor for...
data.uni-hannover.de
.zip, r, rdata +2
Updated Dec 12, 2024
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Institut für Kartographie und Geoinformatik (2024). Solar self-sufficient households as a driving factor for sustainability transformation [Dataset]. https://data.uni-hannover.de/eu/dataset/19503682-5752-4352-97f6-511ae31d97df
Explore at:
rdata(426), rdata(1024592), r(21968), txt(1431), rdata(408277), text/x-sh(183), .zip, r(63854), r(24773), r(3406), r(6280)Available download formats
Dataset updated
Dec 12, 2024
Dataset authored and provided by
Institut für Kartographie und Geoinformatik
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
To get the consumption model from Section 3.1, one needs load execute the file consumption_data.R. Load the data for the 3 Phases ./data/CONSUMPTION/PL1.csv, PL2.csv, PL3.csv, transform the data and build the model (starting line 225). The final consumption data can be found in one file for each year in ./data/CONSUMPTION/MEGA_CONS_list.Rdata

To get the results for the optimization problem, one needs to execute the file analyze_data.R. It provides the functions to compare production and consumption data, and to optimize for the different values (PV, MBC,).

To reproduce the figures one needs to execute the file visualize_results.R. It provides the functions to reproduce the figures.

To calculate the solar radiation that is needed in the Section Production Data, follow file calculate_total_radiation.R.

To reproduce the radiation data from from ERA5, that can be found in data.zip, do the following steps: 1. ERA5 - download the reanalysis datasets as GRIB file. For FDIR select "Total sky direct solar radiation at surface", for GHI select "Surface solar radiation downwards", and for ALBEDO select "Forecast albedo". 2. convert GRIB to csv with the file era5toGRID.sh 3. convert the csv file to the data that is used in this paper with the file convert_year_to_grid.R
d
Data from: Reference transcriptomics of porcine peripheral immune cells...
catalog.data.gov
agdatacommons.nal.usda.gov
+3more
Updated Jun 5, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Agricultural Research Service (2025). Data from: Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing [Dataset]. https://catalog.data.gov/dataset/data-from-reference-transcriptomics-of-porcine-peripheral-immune-cells-created-through-bul-e667c
Explore at:
Dataset updated
Jun 5, 2025
Dataset provided by
Agricultural Research Service
Description
This dataset contains files reconstructing single-cell data presented in 'Reference transcriptomics of porcine peripheral immune cells created through bulk and single-cell RNA sequencing' by Herrera-Uribe & Wiarda et al. 2021. Samples of peripheral blood mononuclear cells (PBMCs) were collected from seven pigs and processed for single-cell RNA sequencing (scRNA-seq) in order to provide a reference annotation of porcine immune cell transcriptomics at enhanced, single-cell resolution. Analysis of single-cell data allowed identification of 36 cell clusters that were further classified into 13 cell types, including monocytes, dendritic cells, B cells, antibody-secreting cells, numerous populations of T cells, NK cells, and erythrocytes. Files may be used to reconstruct the data as presented in the manuscript, allowing for individual query by other users. Scripts for original data analysis are available at https://github.com/USDA-FSEPRU/PorcinePBMCs_bulkRNAseq_scRNAseq. Raw data are available at https://www.ebi.ac.uk/ena/browser/view/PRJEB43826. Funding for this dataset was also provided by NRSP8: National Animal Genome Research Program (https://www.nimss.org/projects/view/mrp/outline/18464). Resources in this dataset:Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells 10X Format. File Name: PBMC7_AllCells.zipResource Description: Zipped folder containing PBMC counts matrix, gene names, and cell IDs. Files are as follows: matrix of gene counts* (matrix.mtx.gx) gene names (features.tsv.gz) cell IDs (barcodes.tsv.gz) *The ‘raw’ count matrix is actually gene counts obtained following ambient RNA removal. During ambient RNA removal, we specified to calculate non-integer count estimations, so most gene counts are actually non-integer values in this matrix but should still be treated as raw/unnormalized data that requires further normalization/transformation. Data can be read into R using the function Read10X().Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells Metadata. File Name: PBMC7_AllCells_meta.csvResource Description: .csv file containing metadata for cells included in the final dataset. Metadata columns include: nCount_RNA = the number of transcripts detected in a cell nFeature_RNA = the number of genes detected in a cell Loupe = cell barcodes; correspond to the cell IDs found in the .h5Seurat and 10X formatted objects for all cells prcntMito = percent mitochondrial reads in a cell Scrublet = doublet probability score assigned to a cell seurat_clusters = cluster ID assigned to a cell PaperIDs = sample ID for a cell celltypes = cell type ID assigned to a cellResource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells PCA Coordinates. File Name: PBMC7_AllCells_PCAcoord.csvResource Description: .csv file containing first 100 PCA coordinates for cells. Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells t-SNE Coordinates. File Name: PBMC7_AllCells_tSNEcoord.csvResource Description: .csv file containing t-SNE coordinates for all cells.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells UMAP Coordinates. File Name: PBMC7_AllCells_UMAPcoord.csvResource Description: .csv file containing UMAP coordinates for all cells.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells t-SNE Coordinates. File Name: PBMC7_CD4only_tSNEcoord.csvResource Description: .csv file containing t-SNE coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - CD4 T Cells UMAP Coordinates. File Name: PBMC7_CD4only_UMAPcoord.csvResource Description: .csv file containing UMAP coordinates for only CD4 T cells (clusters 0, 3, 4, 28). A dataset of only CD4 T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells UMAP Coordinates. File Name: PBMC7_GDonly_UMAPcoord.csvResource Description: .csv file containing UMAP coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and UMAP coordinates used in publication can be re-assigned using this .csv file.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gamma Delta T Cells t-SNE Coordinates. File Name: PBMC7_GDonly_tSNEcoord.csvResource Description: .csv file containing t-SNE coordinates for only gamma delta T cells (clusters 6, 21, 24, 31). A dataset of only gamma delta T cells can be re-created from the PBMC7_AllCells.h5Seurat, and t-SNE coordinates used in publication can be re-assigned using this .csv file.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - Gene Annotation Information. File Name: UnfilteredGeneInfo.txtResource Description: .txt file containing gene nomenclature information used to assign gene names in the dataset. 'Name' column corresponds to the name assigned to a feature in the dataset.Resource Title: Herrera-Uribe & Wiarda et al. PBMCs - All Cells H5Seurat. File Name: PBMC7.tarResource Description: .h5Seurat object of all cells in PBMC dataset. File needs to be untarred, then read into R using function LoadH5Seurat().
D
The Response Scale Transformation Project
ssh.datastations.nl
datacatalogue.cessda.eu
ods, odt +3
Updated Dec 9, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
de . de Jonge; de . de Jonge; R. Veenhoven; R. Veenhoven (2020). The Response Scale Transformation Project [Dataset]. http://doi.org/10.17026/DANS-ZX5-P7PE
Explore at:
odt(29645), text/x-fixed-field(22452), ods(11942), ods(11139), ods(12335), ods(13098), text/x-fixed-field(59602), ods(11639), text/x-fixed-field(132938), ods(17048), ods(11734), ods(28694), text/x-fixed-field(1966), ods(14745), ods(11407), ods(14469), ods(12777), ods(12209), text/x-fixed-field(24328), ods(11706), text/x-fixed-field(50572), text/x-fixed-field(18563), odt(9924), text/x-fixed-field(10883), text/x-fixed-field(202843), text/x-fixed-field(2386), ods(97328), text/x-fixed-field(11385), ods(102829), tsv(72547), ods(12341), ods(10873), ods(14458), text/x-fixed-field(10660), ods(38470), text/x-fixed-field(20572), ods(11725), text/x-fixed-field(8864), text/x-fixed-field(165989), ods(14478), text/x-fixed-field(152269), ods(28686), text/x-fixed-field(24767), text/x-fixed-field(2484), text/x-fixed-field(2201), zip(75713), ods(16139), text/x-fixed-field(4196), text/x-fixed-field(14402), odt(923518), ods(11649), text/x-fixed-field(141193), ods(12861), ods(11589), ods(12903), ods(11757), text/x-fixed-field(49805), text/x-fixed-field(12389), text/x-fixed-field(16732), text/x-fixed-field(195127), text/x-fixed-field(122450), ods(11357), text/x-fixed-field(2288), ods(110889), text/x-fixed-field(4853), odt(14649), text/x-fixed-field(1928), ods(12818), ods(12681), ods(11897), text/x-fixed-field(20730), text/x-fixed-field(82219), ods(12707), ods(12159), ods(12189), text/x-fixed-field(12852), odt(866474), ods(12251), text/x-fixed-field(110342), ods(12822), ods(11213), text/x-fixed-field(56990), ods(11821), ods(11480), ods(103685), ods(11803), odt(128432), text/x-fixed-field(19990), ods(12672), ods(12570), text/x-fixed-field(15210), ods(12086), text/x-fixed-field(27258), odt(48839), text/x-fixed-field(3925), ods(12771), tsv(8015), tsv(1382), tsv(1156), tsv(55966), tsv(15059), tsv(22016), tsv(108268), tsv(39979), tsv(117623), tsv(176555), tsv(96630), tsv(13894), tsv(8531), tsv(10586), tsv(2795), tsv(1843), tsv(145500), tsv(879), tsv(9895), tsv(3784), tsv(123192), tsv(136784), tsv(12070), tsv(11112), tsv(17100), tsv(15037), tsv(15159), tsv(1230), tsv(938), tsv(10114), tsv(17282), tsv(3351), tsv(18397), tsv(24102), tsv(43030), tsv(20907), tsv(47877), tsv(173744)Available download formats
Unique identifier
https://doi.org/10.17026/DANS-ZX5-P7PE
Dataset updated
Dec 9, 2020
Dataset provided by
DANS Data Station Social Sciences and Humanities
Authors
de . de Jonge; de . de Jonge; R. Veenhoven; R. Veenhoven
License
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
In this project we have reviewed existing methods used to homogenize data and developed several new methods for dealing with this diversity in survey questions on the same subject. The project is a spin-off from the World Database of Happiness, the main aim of which is to collate and make available research findings on the subjective enjoyment of life and to prepare these data for research synthesis. The first methods we discuss were proposed in the book ‘Happiness in Nations’ and which were used at the inception of the World Database of Happiness. Some 10 years later a new method was introduced: the International Happiness Scale Interval Study (HSIS). Taking the HSIS as a basis the Continuum Approach was developed. Then, building on this approach, we developed the Reference Distribution Method.
Statistical analysis for: Mode I fracture of beech-adhesive bondline at...
zenodo.org
data.niaid.nih.gov
bin, csv, html, txt
Updated Oct 4, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Michael Burnard; Michael Burnard; Jaka Gašper Pečnik; Jaka Gašper Pečnik (2022). Statistical analysis for: Mode I fracture of beech-adhesive bondline at three different temperatures [Dataset]. http://doi.org/10.5281/zenodo.6839197
Explore at:
csv, html, bin, txtAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.6839197
Dataset updated
Oct 4, 2022
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Michael Burnard; Michael Burnard; Jaka Gašper Pečnik; Jaka Gašper Pečnik
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This dataset collects a raw dataset and a processed dataset derived from the raw dataset. There is a document containing the analytical code for statistical analysis of the processed dataset in .Rmd format and .html format.

The study examined some aspects of mechanical performance of solid wood composites. We were interested in certain properties of solid wood composites made using different adhesives with different grain orientations at the bondline, then treated at different temperatures prior to testing.

Performance was tested by assessing fracture energy and critical fracture energy, lap shear strength, and compression strength of the composites. This document concerns only the fracture properties, which are the focus of the related paper.

Notes:

* the raw data is provided in this upload, but the processing is not addressed here.
* the authors of this document are a subset of the authors of the related paper.
* this document and the related data files were uploaded at the time of submission for review. An update providing the doi of the related paper will be provided when it is available.
Z
Transformations in PubChem - Full Dataset
data.niaid.nih.gov
Updated Mar 14, 2025
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Zhang, Jian (Jeff) (2025). Transformations in PubChem - Full Dataset [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_5644560
Explore at:
Dataset updated
Mar 14, 2025
Dataset provided by
Cheng, Tiejun
Blanke, Gerd
Bolton, Evan
Thiessen, Paul
Schymanski, Emma
Zhang, Jian (Jeff)
Helmus, Rick
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This is an archive of the data contained in the "Transformations" section in PubChem for integration into patRoon and other workflows.

For further details see the ECI GitLab site: README and main "tps" folder.

Credits:

Concepts: E Schymanski, E Bolton, J Zhang, T Cheng;

Code (in R): E Schymanski, R Helmus, P Thiessen

Transformations: E Schymanski, J Zhang, T Cheng and many contributors to various lists!

PubChem infrastructure: PubChem team

Reaction InChI (RInChI) calculations (v1.0): Gerd Blanke (previous versions of these files)

Acknowledgements: ECI team who contributed to related efforts, especially: J. Krier, A. Lai, M. Narayanan, T. Kondic, P. Chirsir, E. Palm. All contributors to the NORMAN-SLE transformations!

March 2025 released as v0.2.0 since the dataset grew by >3000 entries! The stats are:

14 March 2025

Unique Transformation Entries: 10904# Unique Reactions by CID: 9152# Unique Reactions by IK: 9139# Unique Reactions by IKFB: 8574# Unique NORMAN-SLE Compounds by CID: 8207# Unique ChEMBL Compounds by CID: 1419# Unique Compounds (all) by CID: 9267# Unique Predecessors (all) by CID: 3724# Unique Successors (all) by CID: 7331# Range of XlogP Differences: -9.9,10# Range of Mass Differences: -957.97490813,820.227106427
Data from: FTICR-MS and Biochemical Transformation Data from Global Inland...
osti.gov
search.dataone.org
+2more
Updated Jan 1, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Chu, Rosalie; Danczak, Robert E.; Fansler, Sarah J.; Garayburu-Caruso, Vanessa A.; Goldman, Amy E.; Renteria, Lupita; Stegen, James C.; Tagestad, Jerry D.; Tfaily, Malak M.; Toyoda, Jason G. (2021). FTICR-MS and Biochemical Transformation Data from Global Inland River Water and Sediment Associated with: "Organic Matter Transformations are Disconnected Between Surface Water and the Hyporheic Zone" [Dataset]. https://www.osti.gov/dataexplorer/biblio/dataset/1839188-fticr-ms-biochemical-transformation-data-from-global-inland-river-water-sediment-associated-organic-matter-transformations-disconnected-between-surface-water-hyporheic-zone
Explore at:
Dataset updated
Jan 1, 2021
Dataset provided by
United States Department of Energyhttp://energy.gov/
Office of Sciencehttp://www.er.doe.gov/
Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) (United States)
38.8944,-78.1474|38.8944,-78.1474|38.8944,-78.1474|38.8944,-78.1474|38.8944,-78.147435.1986,-106.6431|35.1986,-106.6431|35.1986,-106.6431|35.1986,-106.6431|35.1986,-106.643146.253,-119.4772|46.253,-119.4772|46.253,-119.4772|46.253,-119.4772|46.253,-119.477235.9573,-84.2787|35.9573,-84.2787|35.9573,-84.2787|35.9573,-84.2787|35.9573,-84.278740.3108,-88.3227|40.3108,-88.3227|40.3108,-88.3227|40.3108,-88.3227|40.3108,-88.322731.8531,-88.1575|31.8531,-88.1575|31.8531,-88.1575|31.8531,-88.1575|31.8531,-88.157525.4098,-80.9642|25.4098,-80.9642|25.4098,-80.9642|25.4098,-80.9642|25.4098,-80.964242.1325,-74.6955|42.1325,-74.6955|42.1325,-74.6955|42.1325,-74.6955|42.1325,-74.695540.6151,-78.0069|40.6151,-78.0069|40.6151,-78.0069|40.6151,-78.0069|40.6151,-78.006944.2073,-122.2579|44.2073,-122.2579|44.2073,-122.2579|44.2073,-122.2579|44.2073,-122.257940.7801,-111.8054|40.7801,-111.8054|40.7801,-111.8054|40.7801,-111.8054|40.7801,-111.805441.7207,-111.8864|41.7207,-111.8864|41.7207,-111.8864|41.7207,-111.8864|41.7207,-111.886442.4718,-72.3311|42.4718,-72.3311|42.4718,-72.3311|42.4718,-72.3311|42.4718,-72.331140.013,-105.4589|40.013,-105.4589|40.013,-105.4589|40.013,-105.4589|40.013,-105.458935.9662,-84.3574|35.9662,-84.3574|35.9662,-84.3574|35.9662,-84.3574|35.9662,-84.357446.3763,-119.273|46.3763,-119.273|46.3763,-119.273|46.3763,-119.273|46.3763,-119.27348.1686,-117.3904|48.1686,-117.3904|48.1686,-117.3904|48.1686,-117.3904|48.1686,-117.390439.3397,-123.752|39.3397,-123.752|39.3397,-123.752|39.3397,-123.752|39.3397,-123.75243.9547,-71.7228|43.9547,-71.7228|43.9547,-71.7228|43.9547,-71.7228|43.9547,-71.722833.9557,-83.4377|33.9557,-83.4377|33.9557,-83.4377|33.9557,-83.4377|33.9557,-83.437731.1859,-84.4373|31.1859,-84.4373|31.1859,-84.4373|31.1859,-84.4373|31.1859,-84.437342.9875,-108.3982|42.9875,-108.3982|42.9875,-108.3982|42.9875,-108.3982|42.9875,-108.398238.5096,-122.8834|38.5096,-122.8834|38.5096,-122.8834|38.5096,-122.8834|38.5096,-122.883441.9504,-111.5805|41.9504,-111.5805|41.9504,-111.5805|41.9504,-111.5805|41.9504,-111.580537.2101,-80.4451|37.2101,-80.4451|37.2101,-80.4451|37.2101,-80.4451|37.2101,-80.445132.3273,-110.7595|32.3273,-110.7595|32.3273,-110.7595|32.3273,-110.7595|32.3273,-110.759545.7922,-121.9299|45.7922,-121.9299|45.7922,-121.9299|45.7922,-121.9299|45.7922,-121.929939.3118,-76.7175|39.3118,-76.7175|39.3118,-76.7175|39.3118,-76.7175|39.3118,-76.717537.0581,-119.256|37.0581,-119.256|37.0581,-119.256|37.0581,-119.256|37.0581,-119.25634.9509,-106.6814|34.9509,-106.6814|34.9509,-106.6814|34.9509,-106.6814|34.9509,-106.681443.6367,-123.5719|43.6367,-123.5719|43.6367,-123.5719|43.6367,-123.5719|43.6367,-123.571946.6794,-119.4626|46.6794,-119.4626|46.6794,-119.4626|46.6794,-119.4626|46.6794,-119.462640.6647,-105.2248|40.6647,-105.2248|40.6647,-105.2248|40.6647,-105.2248|40.6647,-105.224838.9603,-76.926|38.9603,-76.926|38.9603,-76.926|38.9603,-76.926|38.9603,-76.92644.3001,-120.8583|44.3001,-120.8583|44.3001,-120.8583|44.3001,-120.8583|44.3001,-120.858343.1232,-116.7745|43.1232,-116.7745|43.1232,-116.7745|43.1232,-116.7745|43.1232,-116.774537.0523,-119.1954|37.0523,-119.1954|37.0523,-119.1954|37.0523,-119.1954|37.0523,-119.195439.8596,-75.7841|39.8596,-75.7841|39.8596,-75.7841|39.8596,-75.7841|39.8596,-75.784131.6679,-81.8403|31.6679,-81.8403|31.6679,-81.8403|31.6679,-81.8403|31.6679,-81.840337.5728,-77.024|37.5728,-77.024|37.5728,-77.024|37.5728,-77.024|37.5728,-77.02433.1308,-79.8102|33.1308,-79.8102|33.1308,-79.8102|33.1308,-79.8102|33.1308,-79.810239.3623,-123.7348|39.3623,-123.7348|39.3623,-123.7348|39.3623,-123.7348|39.3623,-123.734839.8907,-105.9116|39.8907,-105.9116|39.8907,-105.9116|39.8907,-105.9116|39.8907,-105.911636.9613,-119.0283|36.9613,-119.0283|36.9613,-119.0283|36.9613,-119.0283|36.9613,-119.028339.1041,-96.6028|39.1041,-96.6028|39.1041,-96.6028|39.1041,-96.6028|39.1041,-96.602843.145,-89.4721|43.145,-89.4721|43.145,-89.4721|43.145,-89.4721|43.145,-89.472143.1091,-89.6403|43.1091,-89.6403|43.1091,-89.6403|43.1091,-89.6403|43.1091,-89.640346.6949,-119.4657|46.6949,-119.4657|46.6949,-119.4657|46.6949,-119.4657|46.6949,-119.465736.955,-119.0238|36.955,-119.0238|36.955,-119.0238|36.955,-119.0238|36.955,-119.023838.9224,-106.9514|38.9224,-106.9514|38.9224,-106.9514|38.9224,-106.9514|38.9224,-106.951446.7322,-117.1804|46.7322,-117.1804|46.7322,-117.1804|46.7322,-117.1804|46.7322,-117.180440.7846,-111.7955|40.7846,-111.7955|40.7846,-111.7955|40.7846,-111.7955|40.7846,-111.795533.7508,-111.5081|33.7508,-111.5081|33.7508,-111.5081|33.7508,-111.5081|33.7508,-111.508146.3728,-119.272|46.3728,-119.272|46.3728,-119.272|46.3728,-119.272|46.3728,-119.27245.2477,7.8043|45.2477,7.8043|45.2477,7.8043|45.2477,7.8043|45.2477,7.804341.8696,-72.7968|41.8696,-72.7968|41.8696,-72.7968|41.8696,-72.7968|41.8696,-72.796838.5448,-106.9501|38.5448,-106.9501|38.5448,-106.9501|38.5448,-106.9501|38.5448,-106.950142.5234,-71.1855|42.5234,-71.1855|42.5234,-71.1855|42.5234,-71.1855|42.5234,-71.185543.6482,-115.9889|43.6482,-115.9889|43.6482,-115.9889|43.6482,-115.9889|43.6482,-115.988937.2914,-75.9308|37.2914,-75.9308|37.2914,-75.9308|37.2914,-75.9308|37.2914,-75.930842.3919,-85.354|42.3919,-85.354|42.3919,-85.354|42.3919,-85.354|42.3919,-85.35439.8579,-75.7831|39.8579,-75.7831|39.8579,-75.7831|39.8579,-75.7831|39.8579,-75.783132.9614,-87.4083|32.9614,-87.4083|32.9614,-87.4083|32.9614,-87.4083|32.9614,-87.408338.9468,-96.4439|38.9468,-96.4439|38.9468,-96.4439|38.9468,-96.4439|38.9468,-96.443940.035,-105.5439|40.035,-105.5439|40.035,-105.5439|40.035,-105.5439|40.035,-105.543934.4425,-96.6226|34.4425,-96.6226|34.4425,-96.6226|34.4425,-96.6226|34.4425,-96.622630.2503,-97.7096|30.2503,-97.7096|30.2503,-97.7096|30.2503,-97.7096|30.2503,-97.709635.6918,-83.5041|35.6918,-83.5041|35.6918,-83.5041|35.6918,-83.5041|35.6918,-83.504138.875,-76.5465|38.875,-76.5465|38.875,-76.5465|38.875,-76.5465|38.875,-76.546539.0944,-77.9815|39.0944,-77.9815|39.0944,-77.9815|39.0944,-77.9815|39.0944,-77.981535.868,-106.5233|35.868,-106.5233|35.868,-106.5233|35.868,-106.5233|35.868,-106.523333.3377,-81.7182|33.3377,-81.7182|33.3377,-81.7182|33.3377,-81.7182|33.3377,-81.718244.2606,-122.1643|44.2606,-122.1643|44.2606,-122.1643|44.2606,-122.1643|44.2606,-122.164339.0877,-96.5848|39.0877,-96.5848|39.0877,-96.5848|39.0877,-96.5848|39.0877,-96.584833.3783,-97.7834|33.3783,-97.7834|33.3783,-97.7834|33.3783,-97.7834|33.3783,-97.783443.4314,-123.5856|43.4314,-123.5856|43.4314,-123.5856|43.4314,-123.5856|43.4314,-123.585632.5411,-87.7982|32.5411,-87.7982|32.5411,-87.7982|32.5411,-87.7982|32.5411,-87.798239.7585,-102.4492|39.7585,-102.4492|39.7585,-102.4492|39.7585,-102.4492|39.7585,-102.449242.7697,-70.9169|42.7697,-70.9169|42.7697,-70.9169|42.7697,-70.9169|42.7697,-70.916939.2387,-84.7124|39.2387,-84.7124|39.2387,-84.7124|39.2387,-84.7124|39.2387,-84.7124
Authors
Chu, Rosalie; Danczak, Robert E.; Fansler, Sarah J.; Garayburu-Caruso, Vanessa A.; Goldman, Amy E.; Renteria, Lupita; Stegen, James C.; Tagestad, Jerry D.; Tfaily, Malak M.; Toyoda, Jason G.
Description
This data package is associated with the publication "Organic Matter Transformations are Disconnected Between Surface Water and the Hyporheic Zone" submitted to Biogeosciences (Stegen et al., 2022). The study aims to understand how the diversity of OM transformations varies across surface and subsurface components of river corridors using inland surface water and sediments collected along river corridors across the contiguous United States. Sediment extracts and water samples were analyzed using ultrahigh resolution Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS). This dataset is comprised of one folder (WHONDR_S19S) which contains (1) a subfolder with R scripts used to process the data and to calculate biochemical transformations, (2) processed FTICR-MS data as csv files, sample collection metadata and climate data as csv files, (3) biochemical transformations profile, classifications and database as csv files, and (4) a readme file with more information regarding WHONDRS raw FTICR-MS data and processing scripts. Outside of the main folders there is a csv containing file-level metadata and a csv data dictionary defining column headers for all csv files contained in the data package. The samples were part of a WHONDRS (https://whondrs.pnnl.gov) study. The raw, unprocessed FTICR-MS data with additional data can be found at doi:10.15485/1729719 formore » sediments and doi:10.15485/1603775 for water. This data package contains the processed data used in the associated manuscript.« less
Z
Data and R-scripts for "Land-use trajectories for sustainable land system...
data.niaid.nih.gov
Updated Oct 14, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Dominic A. Martin (2021). Data and R-scripts for "Land-use trajectories for sustainable land system transformations: identifying leverage points in a global biodiversity hotspot" (V2) [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4601599
Explore at:
Dataset updated
Oct 14, 2021
Dataset authored and provided by
Dominic A. Martin
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Sustainable land system transformations are necessary to avert biodiversity and climate collapse. However, it remains unclear where entry points for transformations exist in complex land systems. Here, we conceptualize land systems along land-use trajectories, which allows us to identify and evaluate leverage points; i.e., entry points on the trajectory where targeted interventions have particular leverage to influence land-use decisions. We apply this framework in the biodiversity hotspot Madagascar. In the Northeast, smallholder agriculture results in a land-use trajectory originating in old-growth forests, spanning forest fragments, and reaching shifting hill rice cultivation and vanilla agroforests. Integrating interdisciplinary empirical data on seven taxa, five ecosystem services, and three measures of agricultural productivity, we assess trade-offs and co-benefits of land-use decisions at three leverage points along the trajectory. These trade-offs and co-benefits differ between leverage points: two leverage points are situated at the conversion of old-growth forests and forest fragments to shifting cultivation and agroforestry, resulting in considerable trade-offs, especially between endemic biodiversity and agricultural productivity. Here, interventions enabling smallholders to conserve forests are necessary. This is urgent since ongoing forest loss threatens to eliminate these leverage points due to path-dependency. The third leverage point allows for the restoration of land under shifting cultivation through vanilla agroforests and offers co-benefits between restoration goals and agricultural productivity. The co-occurring leverage points highlight that conservation and restoration are simultaneously necessary. Methodologically, the framework shows how leverage points can be identified, evaluated, and harnessed for land system transformations under the consideration of path-dependency along trajectories.
Market Basket Analysis
kaggle.com
Updated Dec 9, 2021
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
Explore at:
CroissantCroissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
Dataset updated
Dec 9, 2021
Dataset provided by
Kagglehttp://kaggle.com/
Authors
Aslan Ahmedov
Description
Market Basket Analysis

Market basket analysis with Apriori algorithm

The retailer wants to target customers with suggestions on itemset that a customer is most likely to purchase .I was given dataset contains data of a retailer; the transaction data provides data around all the transactions that have happened over a period of time. Retailer will use result to grove in his industry and provide for customer suggestions on itemset, we be able increase customer engagement and improve customer experience and identify customer behavior. I will solve this problem with use Association Rules type of unsupervised learning technique that checks for the dependency of one data item on another data item.

Introduction

Association Rule is most used when you are planning to build association in different objects in a set. It works when you are planning to find frequent patterns in a transaction database. It can tell you what items do customers frequently buy together and it allows retailer to identify relationships between the items.

An Example of Association Rules

Assume there are 100 customers, 10 of them bought Computer Mouth, 9 bought Mat for Mouse and 8 bought both of them. - bought Computer Mouth => bought Mat for Mouse - support = P(Mouth & Mat) = 8/100 = 0.08 - confidence = support/P(Mat for Mouse) = 0.08/0.09 = 0.89 - lift = confidence/P(Computer Mouth) = 0.89/0.10 = 8.9 This just simple example. In practice, a rule needs the support of several hundred transactions, before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.

Strategy

Data Import

Data Understanding and Exploration

Transformation of the data – so that is ready to be consumed by the association rules algorithm

Running association rules

Exploring the rules generated

Filtering the generated rules

Visualization of Rule

Dataset Description

File name: Assignment-1_Data

List name: retaildata

File format: . xlsx

Number of Row: 522065

Number of Attributes: 7

BillNo: 6-digit number assigned to each transaction. Nominal.

Itemname: Product name. Nominal.

Quantity: The quantities of each product per transaction. Numeric.

Date: The day and time when each transaction was generated. Numeric.

Price: Product price. Numeric.

CustomerID: 5-digit number assigned to each customer. Nominal.

Country: Name of the country where each customer resides. Nominal.

https://user-images.githubusercontent.com/91852182/145270162-fc53e5a3-4ad1-4d06-b0e0-228aabcf6b70.png">

Libraries in R

First, we need to load required libraries. Shortly I describe all libraries.

arules - Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules).

arulesViz - Extends package 'arules' with various visualization. techniques for association rules and item-sets. The package also includes several interactive visualizations for rule exploration.

tidyverse - The tidyverse is an opinionated collection of R packages designed for data science.

readxl - Read Excel Files in R.

plyr - Tools for Splitting, Applying and Combining Data.

ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.

knitr - Dynamic Report generation in R.

magrittr- Provides a mechanism for chaining commands with a new forward-pipe operator, %>%. This operator will forward a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.

dplyr - A fast, consistent tool for working with data frame like objects, both in memory and out of memory.

tidyverse - This package is designed to make it easy to install and load multiple 'tidyverse' packages in a single step.

https://user-images.githubusercontent.com/91852182/145270210-49c8e1aa-9753-431b-a8d5-99601bc76cb5.png">

Data Pre-processing

Next, we need to upload Assignment-1_Data. xlsx to R to read the dataset.Now we can see our data in R.

https://user-images.githubusercontent.com/91852182/145270229-514f0983-3bbb-4cd3-be64-980e92656a02.png"> https://user-images.githubusercontent.com/91852182/145270251-6f6f6472-8817-435c-a995-9bc4bfef10d1.png">

After we will clear our data frame, will remove missing values.

https://user-images.githubusercontent.com/91852182/145270286-05854e1a-2b6c-490e-ab30-9e99e731eacb.png">

To apply Association Rule mining, we need to convert dataframe into transaction data to make all items that are bought together in one invoice will be in ...
Source data for Brabham et al. 2024 bioRxiv
figshare.com
application/x-gzip
Updated Mar 27, 2025
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Matthew Moscou; Helen Brabham (2025). Source data for Brabham et al. 2024 bioRxiv [Dataset]. http://doi.org/10.6084/m9.figshare.28680800.v1
Explore at:
application/x-gzipAvailable download formats
Unique identifier
https://doi.org/10.6084/m9.figshare.28680800.v1
Dataset updated
Mar 27, 2025
Dataset provided by
Figsharehttp://figshare.com/
figshare
Authors
Matthew Moscou; Helen Brabham
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Source data for Brabham et al. (2024) bioRxiv that includes raw data, uncropped images, and scripts used for data analysis and figure preparation. https://doi.org/10.1101/2024.06.25.599845
Raw Data for IntoValue Dataset
zenodo.org
zip
Updated Feb 2, 2023
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Maia Salholz-Hillel; Maia Salholz-Hillel; Delwen Franzen; Delwen Franzen; Benjamin Gregory Carlisle; Benjamin Gregory Carlisle; Nico Riedel; Nico Riedel (2023). Raw Data for IntoValue Dataset [Dataset]. http://doi.org/10.5281/zenodo.7590083
Explore at:
zipAvailable download formats
Unique identifier
https://doi.org/10.5281/zenodo.7590083
Dataset updated
Feb 2, 2023
Dataset provided by
Zenodohttp://zenodo.org/
Authors
Maia Salholz-Hillel; Maia Salholz-Hillel; Delwen Franzen; Delwen Franzen; Benjamin Gregory Carlisle; Benjamin Gregory Carlisle; Nico Riedel; Nico Riedel
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
This data deposit includes large raw data used for the "IntoValue" dataset, which underlies several projects at the QUEST Center for Responsible Research in the Berlin Institute of Health (BIH) @ Charité. An initial version of the IntoValue dataset is available in Zenodo: https://doi.org/10.5281/zenodo.5141342. Based on this initial version, the dataset is actively developed and maintained in GitHub: https://github.com/maia-sh/intovalue-data. This Zenodo deposit serves to store large raw data files for individual trials and are used in that GitHub repository. These data are deposited for computational reproducibility and documentation; they are not intended to be used for additional projects and do not reflect the most current/accurate data available from each source.

This deposit contains raw data from the following sources:

PubMed (pubmed.zip): PubMed XML files are provided courtesty of the U.S. National Library of Medicine and were accessed via the Entrez Programming Utilities (E-utilities) API. The files were downloaded on 2021-08-15 and do not reflect the most current/accurate data available from NLM. The following scripts were used to download and create these files: get-pubmed.R; download-pubmed.R.

German Clinical Trials Registry (DRKS) (drks.zip): DRKS does not provide an API and was webscrapped on 2022-11-01. The following scripts were used to download and create these XML files: get-drks.R; drks-functions.R

ClinicalTrials.gov (ctgov.zip): ClinicalTrials.gov was accessed via the Clinical Trials Transformation Initiative (CTTI) Aggregate Content of ClinicalTrials.gov (AACT) via its PostgreSQL database API.The API was queried and CSV files were generated on 2022-11-01. The following scripts were used to download and create these files: get-process-aact.R.

ClinicalTrials.gov 2018 (ctgov_2018.zip): Additional trial data for 2018. ClinicalTrials.gov was accessed via the Clinical Trials Transformation Initiative (CTTI) Aggregate Content of ClinicalTrials.gov (AACT) via its PostgreSQL database API.The API was queried and CSV files were generated on 2022-11-01. The following scripts were used to download and create these files: get-process-aact.R.
o
Data from: Cooperation and coexpression: how coexpression networks shift in...
explore.openaire.eu
data.niaid.nih.gov
+3more
Updated Mar 19, 2018
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Sathvik X. Palakurty; John R. Stinchcombe; Michelle E. Afkhami (2018). Data from: Cooperation and coexpression: how coexpression networks shift in response to multiple mutualists [Dataset]. http://doi.org/10.5061/dryad.2hj343f
Explore at:
Unique identifier
https://doi.org/10.5061/dryad.2hj343f
Dataset updated
Mar 19, 2018
Authors
Sathvik X. Palakurty; John R. Stinchcombe; Michelle E. Afkhami
Description
A mechanistic understanding of community ecology requires tackling the nonadditive effects of multispecies interactions, a challenge that necessitates integration of ecological and molecular complexity-- namely moving beyond pairwise ecological interaction studies and the ‘gene at a time’ approach to mechanism. Here, we investigate the consequences of multispecies mutualisms for the structure and function of genome-wide coexpression networks for the first time, using the tractable and ecologically-important interaction between legume Medicago truncatula, rhizobia, and mycorrhizal fungi. First, we found that genes whose expression is affected nonadditively by multiple mutualists are more highly connected in gene networks than expected by chance and had 94% greater network centrality than genes showing additive effects, suggesting that nonadditive genes may be key players in the widespread transcriptomic responses to multispecies symbioses. Second, multispecies mutualisms substantially changed coexpression network structure of host plants and symbionts. Less than 50% of the plant and 10% of mycorrhizal fungi coexpression modules detected with rhizobia present were preserved in its absence, indicating that third-party mutualists can cause significant rewiring of plant and fungal molecular networks. Third, we identified unique sets of coexpressed genes that explain variation in plant performance only when multiple mutualists were present. Finally, an ‘across-symbiosis’ approach identified sets of coexpressed plant and mycorrhizal genes that were significantly associated with plant performance, were unique to the multiple mutualist context, and suggested coupled responses across the plant-mycorrhizal interaction to third-party mutualists. Taken together, these results show multispecies mutualism have substantial effects on the molecular interactions in host plants, microbes, and across symbiotic boundaries. Differential Coexpression ScriptThis script contains the use of previously normalized data to execute the DiffCoEx computational pipeline on an experiment with four treatment groups.differentialCoexpression.rNormalized Transformed Expression Count DataNormalized, transformed expression count data of Medicago truncatula and mycorrhizal fungi is given as an R data frame where the columns denote different genes and rows denote different samples. This data is used for downstream differential coexpression analyses.Expression_Data.zipNormalization and Transformation of Raw Count Data ScriptRaw count data is transformed and normalized with available R packages and RNA-Seq best practices.dataPrep.rRaw_Count_Data_Mycorrhizal_FungiRaw count data from HtSeq for mycorrhizal fungi reads are later transformed and normalized for use in differential coexpression analysis. 'R+' indicates that the sample was obtained from a plant grown in the presence of both mycorrhizal fungi and rhizobia. 'R-' indicates that the sample was obtained from a plant grown only in the presence of mycorrhizal fungi.Raw Count Data Medicago truncatulaRaw count data from HtSeq for Medicago truncatula reads are later transformed and normalized for use in differential coexpression analysis. 'M+R+' indicates that the sample was obtained from a plant grown in the presence of both mycorrhizal fungi and rhizobia. 'M+R-' indicates that the sample was obtained from a plant grown only in the presence of mycorrhizal fungi. 'M-R+' indicates that the sample was obtained from a plant grown only in the presence of rhizobia. 'M-R-' indicates that the sample was obtained from a plant grown in a sterile environment.Raw_Count_Data_Medicago_truncatula.zip
Pesticide and transformation product concentrations and risk quotients in...
datasets.ai
data.usgs.gov
+2more
55
Updated Aug 10, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
Department of the Interior (2024). Pesticide and transformation product concentrations and risk quotients in U.S. headwater streams [Dataset]. https://datasets.ai/datasets/pesticide-and-transformation-product-concentrations-and-risk-quotients-in-u-s-headwater-st
Explore at:
55Available download formats
Dataset updated
Aug 10, 2024
Dataset provided by
United States Department of the Interiorhttp://www.doi.gov/
Authors
Department of the Interior
Area covered
United States
Description
This dataset includes a subset of previously released pesticide data (Morace and others, 2020) from the U.S. Geological Survey (USGS) National Water Quality Assessment Program (NAWQA) Regional Stream Quality Assessment (RSQA) project and the corresponding hazard index results calculated using the R package toxEval, which are relevant to Mahler and others, 2020. Pesticide and transformation products were analyzed at the USGS National Water Quality Laboratory in Denver, Colorado. Files are grouped as pesticides (parent compounds), transformation products (degradate compounds), compounds with no Acute Invertebrate (AI) benchmarks, compounds with no Acute Non-Vascular Plant (ANVP) benchmarks, and compounds not evaluated through the toxEval R program. See Morace and others, 2020 for corresponding quality assurance or quality control data.
C2Metadata test files
openicpsr.org
spss, zip
Updated Aug 16, 2020
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
George Alter (2020). C2Metadata test files [Dataset]. http://doi.org/10.3886/E120642V1
Explore at:
spss, zipAvailable download formats
Unique identifier
https://doi.org/10.3886/E120642V1
Dataset updated
Aug 16, 2020
Dataset provided by
Inter-university Consortium for Political and Social Researchhttps://www.icpsr.umich.edu/web/pages/
Authors
George Alter
License
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
The C2Metadata (“Continuous Capture of Metadata”) Project automates one of the most burdensome aspects of documenting the provenance of research data: describing data transformations performed by statistical software. Researchers in many fields use statistical software (SPSS, Stata, SAS, R, Python) for data transformation and data management as well as analysis. Scripts used with statistical software are translated into an independent Structured Data Transformation Language (SDTL), which serves as an intermediate language for describing data transformations. SDTL can be used to add variable-level provenance to data catalogs and codebooks and to create “variable lineages” for auditing software operations. This repository provides examples of scripts and metadata for use in testing C2Metadata tools.
Predictive Models for In Vitro Toxicokinetic Parameters to Inform...
datasets.ai
catalog.data.gov
+1more
0, 21, 57
Updated Sep 4, 2024
+ more versions
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
U.S. Environmental Protection Agency (2024). Predictive Models for In Vitro Toxicokinetic Parameters to Inform High-throughput Risk-assessment_Prachi [Dataset]. https://datasets.ai/datasets/predictive-models-for-in-vitro-toxicokinetic-parameters-to-inform-high-throughput-risk-ass
Explore at:
57, 21, 0Available download formats
Dataset updated
Sep 4, 2024
Dataset provided by
United States Environmental Protection Agencyhttp://www.epa.gov/
Authors
U.S. Environmental Protection Agency
Description
The data used in this analysis was obtained from published literature and available through the high-throughput toxicokinetic (HTTK) R package. The dataset consists of 1486 chemicals that span a variety of use classes including pharmaceuticals, food-use chemicals, pesticides and industrial chemicals of which 1139 chemicals had experimental human in vitro fraction unbound data and 642 chemicals that had experimental human in vitro intrinsic clearance data. Structures were curated and obtained from the DSSTox database. The distribution of experimental values for fraction unbound and intrinsic clearance is shown in Supplementary Figure S1. Since the data were non-normally distributed they were appropriately transformed before any analysis was conducted. The details of the transformation and the transformed data distribution are presented in the results section and Supplementary Figures S2 and S3. A complete list of chemicals with CAS registry numbers (CASRN), DSSTox generic substance IDs (DTXSIDs), structure and experimental data for both parameters are included as supplemental data (1.ChemicalListData.csv and 1.ChemicalList-QSARready.sdf).

This dataset is associated with the following publication: Pradeep, P., G. Patlewicz, R. Pearce, J. Wambaugh, B. Wetmore, and R. Judson. Using Chemical Structure Information to Develop Predictive Models for In Vitro Toxicokinetic Parameters to Inform High-throughput Risk-assessment. Computational Toxicology. Elsevier B.V., Amsterdam, NETHERLANDS, 16: 100136, (2020).
g
R-scripts for uncertainty analysis v01
gimi9.com
researchdata.edu.au
+2more
Updated Apr 13, 2022
Share
Facebook
Twitter
Email
Click to copy link
Link copied
Cite
(2022). R-scripts for uncertainty analysis v01 [Dataset]. https://gimi9.com/dataset/au_322c38ef-272f-4e77-964c-a14259abe9cf/
Explore at:
Dataset updated
Apr 13, 2022
License
Attribution 3.0 (CC BY 3.0)https://creativecommons.org/licenses/by/3.0/
License information was derived automatically
Description
Abstract This dataset was created within the Bioregional Assessment Programme. Data has not been derived from any source datasets. Metadata has been compiled by the Bioregional Assessment Programme. This dataset contains a set of generic R scripts that are used in the propagation of uncertainty through numerical models. ## Dataset History The dataset contains a set of R scripts that are loaded as a library. The R scripts are used to carry out the propagation of uncertainty through numerical models. The scripts contain the functions to create the statistical emulators and do the necessary data transformations and backtransformations. The scripts are self-documenting and created by Dan Pagendam (CSIRO) and Warren Jin (CSIRO). ## Dataset Citation Bioregional Assessment Programme (2016) R-scripts for uncertainty analysis v01. Bioregional Assessment Source Dataset. Viewed 13 March 2019, http://data.bioregionalassessments.gov.au/dataset/322c38ef-272f-4e77-964c-a14259abe9cf.

Facebook

Twitter

Click to copy link

Link copied

Cite

David I. Warton; Francis K. C. Hui (2023). Supplement 1. R code demonstrating how to fit a logistic regression model, with a random intercept term, and how to use resampling-based hypothesis testing for inference. [Dataset]. http://doi.org/10.6084/m9.figshare.3550407.v1

Supplement 1. R code demonstrating how to fit a logistic regression model, with a random intercept term, and how to use resampling-based hypothesis testing for inference.

Explore at:

htmlAvailable download formats

Unique identifier

https://doi.org/10.6084/m9.figshare.3550407.v1

Dataset updated

Jun 1, 2023

Dataset provided by

Wileyhttps://www.wiley.com/

Authors

David I. Warton; Francis K. C. Hui

License

CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically

Description

File List glmmeg.R: R code demonstrating how to fit a logistic regression model, with a random intercept term, to randomly generated overdispersed binomial data. boot.glmm.R: R code for estimating P-values by applying the bootstrap to a GLMM likelihood ratio statistic. Description glmm.R is some example R code which show how to fit a logistic regression model (with or without a random effects term) and use diagnostic plots to check the fit. The code is run on some randomly generated data, which are generated in such a way that overdispersion is evident. This code could be directly applied for your own analyses if you read into R a data.frame called “dataset”, which has columns labelled “success” and “failure” (for number of binomial successes and failures), “species” (a label for the different rows in the dataset), and where we want to test for the effect of some predictor variable called “location”. In other cases, just change the labels and formula as appropriate. boot.glmm.R extends glmm.R by using bootstrapping to calculate P-values in a way that provides better control of Type I error in small samples. It accepts data in the same form as that generated in glmm.R.

Clear search

Close search

Google apps

Main menu

Supplement 1. R code demonstrating how to fit a logistic regression model,...

Data from: Generalizable EHR-R-REDCap pipeline for a national...

Comparing spatial regression to random forests for large environmental data...

Data from: Data and scripts associated with a manuscript investigating...

Solar self-sufficient households as a driving factor for sustainability...

Data from: Solar self-sufficient households as a driving factor for...

Data from: Reference transcriptomics of porcine peripheral immune cells...

The Response Scale Transformation Project

Statistical analysis for: Mode I fracture of beech-adhesive bondline at...

Transformations in PubChem - Full Dataset

14 March 2025

Data from: FTICR-MS and Biochemical Transformation Data from Global Inland...

Data and R-scripts for "Land-use trajectories for sustainable land system...

Market Basket Analysis

Market Basket Analysis

Introduction

An Example of Association Rules

Strategy

Dataset Description

Libraries in R

Data Pre-processing

Source data for Brabham et al. 2024 bioRxiv

Raw Data for IntoValue Dataset

Data from: Cooperation and coexpression: how coexpression networks shift in...

Pesticide and transformation product concentrations and risk quotients in...

C2Metadata test files

Predictive Models for In Vitro Toxicokinetic Parameters to Inform...

R-scripts for uncertainty analysis v01

Supplement 1. R code demonstrating how to fit a logistic regression model, with a random intercept term, and how to use resampling-based hypothesis testing for inference.