Additional file 2: Supplemental Figure 1. Flowcharts of the analytic samples for FOCUS 1.0 and FOCUS 2.0 survey waves. Supplemental Figure 2A. Box and whisker plots comparing FOCUS safety climate scores by size variables for FOCUS v1.0 departments. Supplemental Figure 2B. Box and whisker plots comparing FOCUS safety climate scores by size variables for FOCUS v2.0 departments.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset contains the prioritization provided by a panel of 15 experts for a set of 28 barrier categories across 8 different roles of the future energy system. A Delphi method was followed, and the scores from the three rounds carried out are included. The dataset also contains the scripts used to assess the results and the output of this assessment. The file is organized as follows: data folder: includes the scores given by the 15 experts in the 3 rounds, with every round in an individual folder. There is one file per expert with the scores, from -5 (not relevant at all) to 5 (completely relevant), per barrier (rows) and actor (columns). There is also a file describing the experts in terms of their position in the company, the type of company, and the country. fig folder: includes the figures created to assess the information provided by the experts. For each round, the following figures are created (in each respective folder): a boxplot with the distribution of scores per barrier and role; a heatmap with the mean scores per barrier and role; boxplots comparing the distributions provided by the experts of each group (depending on the keywords) per barrier and role; and a heatmap with the mean score per barrier and use case and with the prioritization per barrier and use case. Finally, bar plots with the mean score differences between rounds and boxplots comparing the score distributions are also provided. stat folder: includes the files with the results of the different statistical assessments carried out.
For each round, the following files are created (in each respective folder): the statistics used to assess the scores (intraclass correlation coefficient, inter-rater agreement, inter-rater agreement p-value, homogeneity of variances, average interquartile range, standard deviation of interquartile ranges, Friedman test p-value, and average power of the post hoc test) per barrier and per role; the results of the post hoc test of the Friedman test per barrier and per role; the average score per barrier and per role; the mean value of the scores provided by the experts grouped by the keywords per barrier and role; the p-value of the comparison of these two values; and the final prioritization of the barriers for the use case (averaging the scores or merging the critical sets). Finally, the differences between the means and standard deviations of the scores between two consecutive rounds are provided.
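As a rough illustration of one of the statistics listed above, the Friedman test can be computed with scipy; the expert-by-round score matrix below is simulated, not taken from the dataset:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical scores for one barrier/role combination:
# rows = 15 experts, columns = 3 Delphi rounds, values in [-5, 5].
rng = np.random.default_rng(0)
scores = rng.integers(-5, 6, size=(15, 3))

# Friedman test across rounds (repeated measures on the same experts).
stat, p_value = friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])
print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.3f}")
```

A small p-value would indicate that the score distributions shifted between rounds for that barrier and role.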
Custom license: https://entrepot.recherche.data.gouv.fr/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.15454/AGU4QE
WIDEa is R-based software that aims to provide users with a range of functionalities to explore, manage, clean and analyse "big" environmental and (in/ex situ) experimental data. These functionalities are the following: 1. Loading/reading different data types: basic (called normal), temporal, and infrared spectra of the mid/near region (called IR) with frequency (wavenumber) used as unit (in cm-1); 2. Interactive data visualization from a multitude of graph representations: 2D/3D scatter plot, box plot, histogram, bar plot, correlation matrix; 3. Manipulation of variables: concatenation of qualitative variables, transformation of quantitative variables by generic functions in R; 4. Application of mathematical/statistical methods; 5. Creation/management of data (named flag data) considered atypical; 6. Study of normal distribution model results for different strategies: calibration (checking assumptions on residuals), validation (comparison between measured and fitted values). The model form can be more or less complex: mixed effects, main/interaction effects, weighted residuals.
The bank has multiple banking products that it sells to customers, such as savings accounts, credit cards, and investments. It wants to know which customers will purchase its credit cards. For this, it has various kinds of information, including the demographic details of the customers and their banking behaviour. Once it can predict the chance that a customer will purchase a product, it wants to use this to target its offers.
In this part I will demonstrate how to build a model to predict which clients will subscribe to a term deposit using machine learning. The first part deals with the description and visualization of the analysed data, and the second moves on to data classification models.
- Desired target
- Data understanding
- Data preprocessing
- Machine learning model
- Prediction
- Comparing results
Predict if a client will subscribe (yes/no) to a term deposit — this is defined as a classification problem.
The dataset (Assignment-2_data.csv) used in this assignment contains bank customers' data. File name: Assignment-2_Data. File format: .csv. Number of rows: 45212. Number of attributes: 17 non-empty conditional attributes and one decision attribute.
https://user-images.githubusercontent.com/91852182/143783430-eafd25b0-6d40-40b8-ac5b-1c4f67ca9e02.png
https://user-images.githubusercontent.com/91852182/143783451-3e49b817-29a6-4108-b597-ce35897dda4a.png
Data pre-processing is a key step in machine learning, as the useful information that can be derived from a data set directly affects the model quality. It is therefore extremely important to do at least the necessary preprocessing of our data before feeding it into our model.
In this assignment, we are going to use Python to develop a predictive machine learning model. First, we will import some important and necessary libraries.
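A minimal sketch of such a first step (the miniature DataFrame below is a hypothetical stand-in for the real Assignment-2_data.csv, which is not loaded here):

```python
import pandas as pd
import numpy as np

# Hypothetical miniature stand-in for the bank dataset, used only to
# illustrate the overview calls; the real file is Assignment-2_data.csv.
df = pd.DataFrame({
    "age": [30, 45, np.nan, 52],
    "balance": [1000, -50, 300, np.nan],
    "y": ["no", "yes", "no", "no"],
})

# First look at the data: shape, column types, and summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())
```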
Below we can see that there are various numerical and categorical columns. The most important column here is y, the output variable (desired target): it tells us whether the client subscribed to a term deposit (binary: 'yes'/'no').
https://user-images.githubusercontent.com/91852182/143783456-78c22016-149b-4218-a4a5-765ca348f069.png
We must check whether our dataset has any missing values, and whether it has any duplicated values.
https://user-images.githubusercontent.com/91852182/143783471-a8656640-ec57-4f38-8905-35ef6f3e7f30.png
We can see that 'age' has 9 missing values and 'balance' has 3 missing values. Given that our dataset has around 45k rows, I will remove these rows from the dataset. Pictures 1 and 2 show the data before and after.
https://user-images.githubusercontent.com/91852182/143783474-b3898011-98e3-43c8-bd06-2cfcde714694.png
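A minimal sketch of this missing-value check and removal, using a hypothetical miniature frame in place of the real ~45k-row dataset:

```python
import pandas as pd
import numpy as np

# Hypothetical miniature stand-in for the bank dataset; the real data
# has ~45k rows with 9 missing 'age' and 3 missing 'balance' values.
df = pd.DataFrame({
    "age": [30, np.nan, 52, 41],
    "balance": [1000, 250, np.nan, 80],
    "y": ["no", "yes", "no", "no"],
})

print(df.isnull().sum())      # count missing values per column
print(df.duplicated().sum())  # count fully duplicated rows

# With only a handful of missing rows in a ~45k-row dataset,
# dropping them is a reasonable choice.
df_clean = df.dropna().reset_index(drop=True)
print(df_clean.shape)
```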
From the above analysis we can see that only 5289 people out of 45200 have subscribed, which is roughly 12%. Our dataset is highly imbalanced; we need to keep this in mind.
https://user-images.githubusercontent.com/91852182/143783534-a05020a8-611d-4da1-98cf-4fec811cb5d8.png
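The class balance can be checked with value_counts; the series below simply reproduces the reported counts (5289 subscribed out of 45200):

```python
import pandas as pd

# Hypothetical target column mirroring the reported class balance
# (5289 'yes' out of 45200, roughly 12%).
y = pd.Series(["yes"] * 5289 + ["no"] * (45200 - 5289))

counts = y.value_counts()
rate = counts["yes"] / len(y)
print(counts)
print(f"positive rate: {rate:.1%}")
```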
Our list of categorical variables.
https://user-images.githubusercontent.com/91852182/143783542-d40006cd-4086-4707-a683-f654a8cb2205.png
Our list of numerical variables.
https://user-images.githubusercontent.com/91852182/143783551-6b220f99-2c4d-47d0-90ab-18ede42a4ae5.png
In the above boxplot we can see some points at a very young age, as well as impossible ages. So, these outliers should be removed.
https://user-images.githubusercontent.com/91852182/143783564-ad0e2a27-5df5-4e04-b5d7-6d218cabd405.png
https://user-images.githubusercontent.com/91852182/143783589-5abf0a0b-8bab-4192-98c8-d2e04f32a5c5.png
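A sketch of this kind of age filtering; the 18-100 cutoffs are assumptions for illustration, not values given in the assignment:

```python
import pandas as pd

# Hypothetical ages including implausible values; the cutoffs below
# (18 and 100) are assumed plausibility bounds, not from the assignment.
df = pd.DataFrame({"age": [5, 25, 40, 160, 63]})

df = df[(df["age"] >= 18) & (df["age"] <= 100)].reset_index(drop=True)
print(df["age"].tolist())
```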
Now we don't have issues with this feature, so we can use it.
https://user-images.githubusercontent.com/91852182/143783599-5205eddb-a0f5-446d-9f45-cc1adbfcce67.png
https://user-images.githubusercontent.com/91852182/143783601-e520d59c-3b21-4627-a9bb-cac06f415a1e.png
https://user-images.githubusercontent.com/91852182/143783634-03e5a584-a6fb-4bcb-8dc5-1f3cc50f9507.png
https://user-images.githubusercontent.com/91852182/143783640-f6e71323-abbe-49c1-9935-35ffb2d10569.png
This attribute highly affects the output target (e.g., if duration=0 then y=’no’). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes...
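Accordingly, for a realistic model the duration column can be dropped before training; a minimal sketch (the miniature frame and column names are hypothetical):

```python
import pandas as pd

# Hypothetical frame including 'duration'; since call duration is not
# known before the call is made, we drop it for a realistic model.
df = pd.DataFrame({
    "age": [30, 45],
    "duration": [120, 0],
    "y": ["yes", "no"],
})

df_model = df.drop(columns=["duration"])
print(df_model.columns.tolist())
```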
This data release contains time series and plots summarizing mean monthly temperature (TAVE), total monthly precipitation (PPT), and runoff (RO) from the U.S. Geological Survey Monthly Water Balance Model at 115 National Wildlife Refuges within the U.S. Fish and Wildlife Service Mountain-Prairie Region (CO, KS, MT, NE, ND, SD, UT, and WY). These three variables are derived from two sets of statistically downscaled general circulation models from 1951 through 2099. The three variables (TAVE, PPT, and RO for refuge areas) were summarized for comparison across four 19-year periods: historic (1951-1969), baseline (1981-1999), 2050 (2041-2059), and 2080 (2071-2089). For each refuge, mean monthly plots, seasonal box plots, and annual envelope plots were produced for each of the four periods.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Objectives: This study aimed to explore the correlation between the sentiment of nursing notes and the one-year mortality of sepsis patients.
Methods: Box plots were used to compare the differences in sentiment polarity/sentiment subjectivity between groups. Multivariate logistic regression was used to explore the correlation between sentiment polarity/sentiment subjectivity and the one-year mortality of elderly sepsis patients. Ridge regression, XGBoost regression, and random forest were used to explore the importance of sentiment polarity and subjectivity for the one-year mortality of elderly sepsis patients. Restricted cubic splines (RCS) were used to explore whether there was a linear relationship between sentiment polarity or sentiment subjectivity and the one-year mortality of elderly sepsis patients. Kaplan-Meier (KM) curves were used to explore the relationship between sentiment polarity (or sentiment subjectivity) and one-year death.
Results: Compared with the control group, the one-year mortality group had lower sentiment polarity and higher sentiment subjectivity. Sentiment polarity and sentiment subjectivity were independently related to the one-year mortality of elderly sepsis patients. There was a linear relationship between sentiment polarity and one-year mortality, and a nonlinear relationship between sentiment subjectivity and one-year mortality.
Conclusions: The sentiment of nursing notes was correlated with the one-year mortality of elderly sepsis patients.
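As a rough sketch of comparing sentiment polarity between outcome groups (the study used box plots and logistic regression; the rank-based test and the simulated polarity scores below are illustrative assumptions only):

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical polarity scores (in [-1, 1]) for survivors vs. the
# one-year mortality group; the values are simulated, not study data.
rng = np.random.default_rng(42)
survivors = rng.normal(0.15, 0.10, 200)
mortality = rng.normal(0.05, 0.10, 120)

# A rank-based test stands in for the box-plot group comparison
# described in the study.
stat, p = mannwhitneyu(survivors, mortality)
print(f"Mann-Whitney U = {stat:.1f}, p = {p:.2e}")
```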
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Supplementary Figure 1A. Box and Whisker Plots of log Aldosterone to Renin Ratio, additionally adjusted for body mass index. Supplementary Figure 1B. Box and Whisker Plots of log Renin, additionally adjusted for body mass index. Supplementary Figure 1C. Box and Whisker Plots of log Aldosterone, additionally adjusted for body mass index. Supplementary Figure 2. Box and Whisker Plots of log ACE activity, additionally adjusted for body mass index.
Figure S1. CRISPR-KO screening data quality assessment. (A) Average correlation between sgRNA read-count replicates across cell lines. (B) Receiver operating characteristic (ROC) curve obtained from classifying fitness essential (FE) and non-essential genes based on the average logFC of their targeting sgRNAs. An example cell line, OVCAR-8, is shown. (C) Area under the ROC curve (AUROC) obtained for cell lines from classifying FE and non-essential genes based on the average logFC of their targeting sgRNAs. (D) Recall for sets of a priori known essential genes from MSigDB and from the literature when classifying FE and non-essential genes across cell lines (5% FDR). Each circle represents a cell line and is coloured by tissue type. Box and whisker plots show medians, inter-quartile ranges and 95% confidence intervals. (E) Genes ranked based on the average logFC of targeting sgRNAs for OVCAR-8 and enrichment of genes belonging to predefined sets of a priori known essential genes from MSigDB, at an FDR equal to 5% when classifying FE (second-to-last column) and non-essential genes (last column). Blue numbers at the bottom indicate the classification true positive rate (recall). Figure S2. Assessment of copy number bias before and after CRISPRcleanR correction across cell lines. sgRNA logFC values before and after CRISPRcleanR for eight cell lines are shown, classified based on copy number (amplified or deleted) and expression status. Copy number segments were identified using Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. Box and whisker plots show medians, inter-quartile ranges and 95% confidence intervals. Asterisks indicate significant associations between sgRNA logFC values (Welch's t-test, p < 0.005) and their different effect sizes accounting for the standard deviation (Cohen's d value), compared to the whole sgRNA library. Figure S3. CN-associated effect on sgRNA logFC values in highly biased cell lines.
For 3 cell lines, recall curves of non-essential genes, fitness essential genes, copy number (CN) amplified genes and CN-amplified non-expressed genes obtained when classifying genes based on the average logFC values of their targeting sgRNAs. Figure S4. Assessment of CN-associated bias across all cell lines. LogFC values of sgRNAs averaged within segments of equal copy number (CN). One plot per cell line, with the CN values at which significant differences (Welch's t-test, p < 0.05) with respect to the logFCs corresponding to CN = 2 are initially observed (bias starting point) and start to increase significantly and continuously (bias critical point). CN-associated bias is shown for all sgRNAs, when excluding FE genes and histones, and for non-expressed genes only. Box and whisker plots show medians, inter-quartile ranges and 95% confidence intervals. Figure S5. CRISPRcleanR correction varying the minimal number of genes required and the effect of fitness essential genes. Recall reduction of (A) amplified or (B) amplified non-expressed genes versus that of fitness essential and other a priori known essential genes, when comparing CRISPRcleanR corrections varying the minimal number of genes to be targeted by sgRNAs in a biased segment (default parameter is n = 3). Similar results were observed when performing the analysis including or excluding known essential genes. Figure S6. CRISPRcleanR performance across 342 cell lines from an independent dataset. Recall at 5% FDR of predefined sets of genes based on their uncorrected or corrected logFCs (coordinates on the two axes) averaged across targeting sgRNAs for 342 cell lines from Project Achilles. Figure S7. CRISPRcleanR performance in relation to data quality. The impact of data quality on recall at 5% false discovery rate (FDR), assessed following CRISPRcleanR correction for predefined sets of genes. Project Achilles data (n = 342 cell lines) were binned based on the quality of the uncorrected essentiality profiles.
This is obtained by measuring the recall at 5% FDR for predefined essential genes (from the Molecular Signature Database) and grouping the cell lines in 10 equidistant bins (1 lowest quality and 10 highest quality) when sorting them based on this value. Recall increment for fitness essential genes was greatest for the lower quality data, indicating that CRISPRcleanR can improve true signal of gene depletion in low quality data. Figure S8. Minimal impact of CRISPRcleanR on loss/gain-of-fitness effects. (A) The percentage of genes where the significance of their fitness effect (gain- or loss-of-fitness) is altered after CRISPRcleanR for Project Score and Project Achilles data. The upper row shows correction effects for all screened genes and the lower row for the subset of genes with a significant effect in the uncorrected data. Each dot is a separate cell line. Blue dots indicate the percentage of genes where significance is lost or gained post correction. Green dots indicate the percentage of genes where the fitness effect is distorted and the effect is opposite in the uncorrected data. (B) The majority of the loss-of-fitness genes impacted by correction are putative false positive effects affecting genes which are either not-expressed (FPKM < 0.5), amplified, known non-essential, or exhibit a mild phenotype in the screening data. (C) Summary of overall impact of CRISPRcleanR on fitness effects following correction when considering data for all cell lines. The colors reflect the percentage of genes with a loss-of-fitness, no phenotype or gain-of-fitness effect which are retained in the corrected data. Figure S9. CRISPRcleanR retains cancer driver gene dependencies in Project Score and Achilles data. (A) Each circle represents a tested cancer driver gene dependency (mutation or amplification of a copy number segment) and the statistical significance using MaGeCK before (x-axis) and after (y-axis) CRISPRcleanR correction, across the two screens. 
Plots in the first row show depletion FDR values pre/post-correction, whereas those in the second row show depletion FDR values pre-correction and enrichment FDR values post-correction. (B) Details of the tested genetic dependencies and whether they are shared before and after CRISPRcleanR correction at two different thresholds of statistical significance (5 and 10% FDR, respectively for 1st and 2nd row of plots). The third row indicates the type of alteration involving the cancer driver genes under consideration and the total number of cell lines with an alteration. (ZIP 191 kb)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Comparison of results on the optimal design of an industrial refrigeration system.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The comparison results of different algorithms on CEC2017 functions with D=30.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Comparison of results on the three-bar truss design problem.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Comparison of results on the welded beam design problem.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Comparison of results on the pressure vessel design problem.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Comparison of results on the speed reducer design problem.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The comparison results of different algorithms on 23 benchmark functions with D=30.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The comparison results of different algorithms on CEC2019 functions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Supplementary Figure 1. Comparison of Immune Cell Counts Between HIV-detected and HIV-undetected Individuals. Comparison of CD4+ T-cell counts, CD8+ T-cell counts, and CD4+/CD8+ ratios between individuals with detectable HIV (HIV_detected) and individuals with undetectable HIV (HIV_undetected). Each violin plot illustrates the distribution of the data, with the box plot representing the interquartile range (IQR) and the median indicated by the line within the box. (a) A highly significant difference in CD4+ T-cell counts is observed between the HIV_detected group and the HIV_undetected group (p
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The supplementary file includes the computational details of ClusterPSO and supplementary figures and tables. Length distribution of the results of CpGIS, CpGCluster, CPSORL, and ClusterPSO in the human genome (Figure A). Distribution of the results of CpG islands in the human genome (Figure B). XY charts comparing the true positive and false positive rates amongst the six methods for six contig sequences (Figure C). Box plot comparing the stability of five methods in six contig sequences (Figure D). Box plot of the O/E ratio for each interval length in the human genome (Figure E). Number of CpG islands located in gene regions identified with CPSORL and ClusterPSO (Table A). Performance measurements of ClusterPSO and CPSORL for all chromosomes in the human genome (Table B). Number of detected CpG islands overlapping true CpG islands for CpGcluster, CPSORL and ClusterPSO for all chromosomes in the human genome (Table C). (DOC)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
The objective of the max-cut problem is to partition the vertices of a graph into two subsets such that the total weight of the edges cut by the partition is maximum. Although it is an elementary graph partitioning problem, it is one of the most challenging combinatorial optimization problems, and its many application areas make it highly relevant. For this reason, the problem is solved here using the Harris Hawks Optimization algorithm (HHO). Though HHO has effectively solved some engineering optimization problems, it is sensitive to parameter settings and may converge slowly, potentially getting trapped in local optima. Thus, HHO is combined with some additional operators to solve the max-cut problem. Crossover and refinement operators are used to modify the fitness of the hawks so that they can provide precise results. A mutation mechanism along with an adjustment operator improves the outcome obtained from the updated hawks. An acceptance criterion is used to accept potential results, and then a repair operator is applied in the proposed approach. The proposed system provided comparatively better outcomes on the G-set dataset than other state-of-the-art algorithms: it obtained 533 more cuts than the discrete cuckoo search algorithm on 9 instances, 1036 more cuts than PSO-EDA on 14 instances, and 1021 more cuts than TSHEA on 9 instances. For four instances, however, the cuts are lower than those of PSO-EDA and TSHEA. In addition, statistical significance has been tested using the Wilcoxon signed-rank test to support the superior performance of the proposed method. In terms of solution quality, MC-HHO can produce outcomes that are quite competitive with other related state-of-the-art algorithms.
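A minimal sketch of the max-cut objective that such an algorithm maximizes, assuming a symmetric weighted adjacency matrix and a 0/1 partition vector (this is only the fitness evaluation, not the HHO search itself):

```python
import numpy as np

def cut_weight(weights: np.ndarray, partition: np.ndarray) -> float:
    """Total weight of edges whose endpoints lie in different subsets."""
    total = 0.0
    n = len(partition)
    for i in range(n):
        for j in range(i + 1, n):  # each undirected edge counted once
            if partition[i] != partition[j]:
                total += weights[i, j]
    return total

# Triangle with unit weights: separating one vertex from the other
# two cuts two edges, the best possible cut for this graph.
w = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
print(cut_weight(w, np.array([0, 1, 1])))  # 2.0
```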
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
Correlation between sentiment polarity/sentiment subjectivity and one-year mortality of elderly sepsis patients.