Additional file 2: Supplemental Figure 1. Flowcharts of the analytic samples for FOCUS 1.0 and FOCUS 2.0 survey waves. Supplemental Figure 2A. Box and whisker plots comparing FOCUS safety climate scores by size variables for FOCUS v.1.0 departments. Supplemental Figure 2B. Box and whisker plots comparing FOCUS safety climate scores by size variables for FOCUS v.2.0 departments.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Sheet 1 (Raw-Data): The raw data of the study, presenting the tagging results for the measures described in the paper. For each subject, it includes the following columns:
A. a sequential student ID;
B. an ID that defines a random group label and the notation;
C. the notation used: User Story or Use Case;
D. the case they were assigned to: IFA, Sim, or Hos;
E. the subject's exam grade (total points out of 100); empty cells mean that the subject did not take the first exam;
F. a categorical representation of the grade (L/M/H), where H is greater than or equal to 80, M is between 65 (included) and 80 (excluded), and L otherwise;
G. the total number of classes in the student's conceptual model;
H. the total number of relationships in the student's conceptual model;
I. the total number of classes in the expert's conceptual model;
J. the total number of relationships in the expert's conceptual model;
K-O. the total number of encountered situations of alignment, wrong representation, system-oriented, omitted, and missing (see tagging scheme below);
P. the researchers' judgement on how well the derivation process was explained by the student: well explained (a systematic mapping that can be easily reproduced), partially explained (vague indication of the mapping), or not present.
Tagging scheme:
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than a class, or
(ii) using a generic term (e.g., "user" instead of "urban planner");
System-oriented (SO) - A class in CM-Stud that denotes a technical
implementation aspect, e.g., access control. Classes that represent the legacy
system or the system under design (portal, simulator) are legitimate;
Omitted (OM) - A class in CM-Expert that does not appear in any way in
CM-Stud;
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert.
All the calculations and information provided in the following sheets
originate from that raw data.
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process derivation rigor category, and per exam grade category.
Sheet 3 (Size-Ratio):
The number of classes in the student model divided by the number of classes in the expert model is calculated (the size ratio). We provide box plots to allow a visual comparison of the shape of the distribution, its central value, and its variability for each group (by case, notation, process, and exam grade). The primary focus in this study is on the number of classes; however, we also provide the size ratio for the number of relationships between the student and expert models.
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations, completeness, and correctness, respectively. Correctness is defined as the ratio of classes in a student model that are fully aligned with the classes in the corresponding expert model. It is calculated by dividing the number of aligned concepts (AL) by the sum of the number of aligned concepts (AL), omitted concepts (OM), system-oriented concepts (SO), and wrong representations (WR). Completeness, on the other hand, is defined as the ratio of classes in a student model that are correctly or incorrectly represented over the number of classes in the expert model. It is calculated by dividing the sum of aligned concepts (AL) and wrong representations (WR) by the sum of the number of aligned concepts (AL), wrong representations (WR), and omitted concepts (OM). The overview is complemented with general diverging stacked bar charts that illustrate correctness and completeness.
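As a quick illustration of the two formulas above, the sketch below computes correctness and completeness from per-subject tag counts. The keys (AL, WR, SO, OM, MI) mirror the tagging scheme, and the counts are made up for the example; they are not the actual headers or values of the sheet.

```python
# Sketch: correctness and completeness as defined above.
# The keys mirror the tagging scheme; the counts are invented.
def correctness(c):
    return c["AL"] / (c["AL"] + c["OM"] + c["SO"] + c["WR"])

def completeness(c):
    return (c["AL"] + c["WR"]) / (c["AL"] + c["WR"] + c["OM"])

counts = {"AL": 8, "WR": 2, "SO": 1, "OM": 3, "MI": 2}
print(correctness(counts))   # 8 / 14  ≈ 0.571
print(completeness(counts))  # 10 / 13 ≈ 0.769
```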
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and mediated variables. The charts are based on the relative numbers of encountered situations for each student. In addition, a "Buffer" is calculated which solely serves the purpose of constructing the diverging stacked bar charts in Excel. Finally, at the bottom of each sheet, the significance (T-test) and effect size (Hedges' g) for both completeness and correctness are provided. Hedges' g was calculated with an online tool: https://www.psychometrica.de/effect_size.html (a sketch of the underlying formula is given after the sheet list below). The independent and moderating variables can be found as follows:
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC, US.
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS, IFA.
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the derivation process is explained - well explained, partially explained, not present.
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades, converted to the categorical values High, Medium, and Low.
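For readers who want to reproduce the effect size offline rather than via the online tool mentioned above, here is a minimal sketch of Hedges' g (standardized mean difference with the small-sample correction). The two example arrays are made up, not taken from the workbook; the corresponding T-test could be obtained with scipy.stats.ttest_ind(us, uc).

```python
# Sketch: Hedges' g for two independent groups (small-sample corrected),
# mirroring what the online calculator computes. Example values are invented.
import numpy as np

def hedges_g(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    # Pooled standard deviation from the two sample variances (ddof=1).
    s_pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                       / (na + nb - 2))
    d = (a.mean() - b.mean()) / s_pooled      # Cohen's d
    return d * (1 - 3 / (4 * (na + nb) - 9))  # small-sample correction

# Made-up correctness scores for two notation groups (US vs. UC):
us = [0.71, 0.65, 0.80, 0.58, 0.69]
uc = [0.62, 0.55, 0.66, 0.49, 0.60]
print(hedges_g(us, uc))
```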
Attribution 2.0 (CC BY 2.0): https://creativecommons.org/licenses/by/2.0/
License information was derived automatically
Representative bright field microscopic images of TDP-43 condensates (in Hepes buffer); bar, 25 µm (F), and quantification of condensate number (G). Box plots show the comparison of median and inter-quartile range (upper and lower quartiles) of all fields of view (FOV) from Min to Max (whiskers) of two replicates (≥ 22 FOV per condition). *p < 0.0332, **p < 0.0021 and ***p < 0.0002 by one-way ANOVA with Dunnett's multiple comparison test to Wt, comparing the respective concentration condition (5, 10, 20 µM). List of tagged entities: TARDBP (uniprot:Q13148), TARDBP (ncbigene:23435), TEV protease (uniprot:P04517), brightfield microscopy (bao:BAO_0000457), dose response design (obi:OBI_0001418)
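No analysis code accompanies this entry; as a rough sketch of the kind of comparison described (one-way ANOVA followed by Dunnett's test against the Wt control), the snippet below uses SciPy (version 1.11 or later for scipy.stats.dunnett) with invented condensate counts as placeholders.

```python
# Sketch: one-way ANOVA plus Dunnett's multiple comparison against Wt.
# Requires SciPy >= 1.11 for scipy.stats.dunnett; all counts are invented.
from scipy import stats

wt   = [12, 15, 14, 13, 16]   # condensate counts per FOV, Wt control (invented)
c5u  = [18, 21, 20, 19, 22]   # 5 µM condition (invented)
c10u = [25, 27, 24, 26, 28]   # 10 µM condition (invented)

f_stat, p_anova = stats.f_oneway(wt, c5u, c10u)   # global one-way ANOVA
dunnett = stats.dunnett(c5u, c10u, control=wt)    # each condition vs. Wt
print(p_anova, dunnett.pvalue)
```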
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of results on the three-bar truss design problem.
A bank has multiple banking products that it sells to customers, such as savings accounts, credit cards, investments, etc. It wants to know which customers will purchase its credit cards. For this, it has various kinds of information regarding the demographic details of the customers, their banking behaviour, etc. Once it can predict the chances that a customer will purchase a product, it wants to use this prediction to target those customers.
In this part I will demonstrate how to build a model to predict which clients will subscribe to a term deposit, using machine learning. In the first part we will deal with the description and visualization of the analysed data, and in the second we will move on to the classification models.
- Desired target
- Data understanding
- Data preprocessing
- Machine learning model
- Prediction
- Comparing results
Predict if a client will subscribe (yes/no) to a term deposit — this is defined as a classification problem.
The dataset (Assignment-2_data.csv) used in this assignment contains bank customers' data. File name: Assignment-2_Data. File format: .csv. Number of rows: 45212. Number of attributes: 17 non-empty conditional attributes and one decision attribute.
https://user-images.githubusercontent.com/91852182/143783430-eafd25b0-6d40-40b8-ac5b-1c4f67ca9e02.png
https://user-images.githubusercontent.com/91852182/143783451-3e49b817-29a6-4108-b597-ce35897dda4a.png
Data pre-processing is a key step in machine learning, as the useful information that can be derived from a dataset directly affects model quality, so it is important to perform at least the necessary preprocessing before feeding the data into our model.
In this assignment, we are going to use Python to develop a predictive machine learning model. First, we will import the necessary libraries.
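The write-up does not list the exact imports; a plausible minimal set for the steps that follow (data handling, plotting, and a baseline classifier), together with loading the file named above, might look like this. The sep argument may need adjusting depending on how the CSV is delimited.

```python
# Sketch of a typical import block and data load for this workflow.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# File name taken from the dataset description above; add sep=";" if the
# file turns out to be semicolon-delimited.
df = pd.read_csv("Assignment-2_data.csv")
print(df.shape)    # roughly 45k rows per the description
print(df.dtypes)   # numerical vs. categorical columns
```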
Below we can see that there are various numerical and categorical columns. The most important column here is y, which is the output variable (desired target): this will tell us if the client subscribed to a term deposit (binary: 'yes', 'no').
https://user-images.githubusercontent.com/91852182/143783456-78c22016-149b-4218-a4a5-765ca348f069.png
We must check whether our dataset has any missing values, and whether it has any duplicated rows.
https://user-images.githubusercontent.com/91852182/143783471-a8656640-ec57-4f38-8905-35ef6f3e7f30.png
We can see that 'age' has 9 missing values and 'balance' has 3 missing values as well. Given that our dataset has around 45k rows, I will simply remove these rows from the dataset. Pic 1 and 2 show the before and after.
https://user-images.githubusercontent.com/91852182/143783474-b3898011-98e3-43c8-bd06-2cfcde714694.png
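A minimal sketch of this check and clean-up, assuming the DataFrame df from the loading step above:

```python
# Check for missing values and duplicates, then drop the few affected rows.
print(df.isnull().sum())       # 'age' and 'balance' show a few missing entries
print(df.duplicated().sum())   # fully duplicated rows, if any

df = df.dropna(subset=["age", "balance"])
df = df.drop_duplicates()
print(df.shape)
```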
From the above analysis we can see that only 5289 people out of 45200 have subscribed, which is roughly 12%. Our dataset is therefore highly imbalanced; we need to keep this in mind.
https://user-images.githubusercontent.com/91852182/143783534-a05020a8-611d-4da1-98cf-4fec811cb5d8.png
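The class balance can be confirmed directly from the target column; handling it later, e.g. with class weights or resampling, is an option rather than something stated in the original write-up.

```python
# Class balance of the target column 'y'.
print(df["y"].value_counts())                  # absolute counts
print(df["y"].value_counts(normalize=True))    # roughly 88% 'no' vs. 12% 'yes'
```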
Our list of categorical variables.
https://user-images.githubusercontent.com/91852182/143783542-d40006cd-4086-4707-a683-f654a8cb2205.png
Our list of numerical variables.
https://user-images.githubusercontent.com/91852182/143783551-6b220f99-2c4d-47d0-90ab-18ede42a4ae5.png
In the boxplot above we can see some points at very young ages, as well as impossible ages, so these outliers need to be handled.
https://user-images.githubusercontent.com/91852182/143783564-ad0e2a27-5df5-4e04-b5d7-6d218cabd405.png
https://user-images.githubusercontent.com/91852182/143783589-5abf0a0b-8bab-4192-98c8-d2e04f32a5c5.png
Now we no longer have issues with this feature, so we can use it.
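The exact cut-offs used to clean the age column are not stated; a minimal sketch of a plausible filter (the 18-95 bounds are assumptions) is:

```python
# Remove rows with implausible ages; the 18-95 bounds are assumed,
# not taken from the original write-up.
before = len(df)
df = df[(df["age"] >= 18) & (df["age"] <= 95)]
print(f"Dropped {before - len(df)} rows with implausible ages")
```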
https://user-images.githubusercontent.com/91852182/143783599-5205eddb-a0f5-446d-9f45-cc1adbfcce67.png
https://user-images.githubusercontent.com/91852182/143783601-e520d59c-3b21-4627-a9bb-cac06f415a1e.png
https://user-images.githubusercontent.com/91852182/143783634-03e5a584-a6fb-4bcb-8dc5-1f3cc50f9507.png
https://user-images.githubusercontent.com/91852182/143783640-f6e71323-abbe-49c1-9935-35ffb2d10569.png
This attribute highly affects the output target (e.g., if duration=0 then y=’no’). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes...
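If the goal is a realistic model rather than a benchmark, one option is simply to drop the column before training. Purely as an illustration (the write-up is truncated before the modelling step, so logistic regression here is an assumed baseline, not the author's chosen model):

```python
# Drop 'duration' for a realistic (non-benchmark) model and fit a simple
# baseline. Logistic regression is an assumed choice, not the write-up's model.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X = pd.get_dummies(df.drop(columns=["y", "duration"]), drop_first=True)
y = (df["y"] == "yes").astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```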
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of results on the optimal design of an industrial refrigeration system.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
This database studies performance inconsistency in biomass HHV models based on ultimate analysis. The research null hypothesis is consistency in the rank of a biomass HHV model. Fifteen biomass models are trained and tested on four datasets. In each dataset, the rank invariability of these 15 models indicates performance consistency.
The database includes the datasets and source code used to analyze the performance consistency of the biomass HHV models. The datasets are stored in tabular form in an Excel workbook. The source code implements the biomass HHV machine learning models using MATLAB object-oriented programming (OOP). These machine learning models consist of eight regressions, four supervised learning models, and three neural networks.
An Excel workbook, "BiomassDataSetUltimate.xlsx," collects the research datasets in six worksheets. The first worksheet, "Ultimate," contains 908 HHV data points from 20 pieces of literature. The worksheet column names indicate the elements of the ultimate analysis on a % dry basis. The HHV column refers to the higher heating value in MJ/kg. The following worksheet, "Full Residuals," backs up the model-testing residuals based on the 20-fold cross-validations. The article (Kijkarncharoensin & Innet, 2021) verifies the performance consistency through these residuals. The other worksheets present the literature datasets used to train and test the model performance in many pieces of literature.
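Although the published analysis is in MATLAB, the worksheet can also be inspected outside MATLAB. A minimal sketch using pandas (the sheet name and the HHV column name are taken from the description above; whether the file parses exactly like this is not guaranteed):

```python
# Sketch: inspect the published workbook outside MATLAB.
# Requires pandas plus the openpyxl engine for .xlsx files.
import pandas as pd

ultimate = pd.read_excel("BiomassDataSetUltimate.xlsx", sheet_name="Ultimate")
print(ultimate.shape)              # the description reports 908 HHV records
print(ultimate.columns.tolist())   # ultimate-analysis elements (% dry basis) plus HHV
print(ultimate["HHV"].describe())  # higher heating value in MJ/kg
```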
A file named "SourceCodeUltimate.rar" collects the MATLAB machine learning models implemented in the article. The list of folders in this file reflects the class structure of the machine learning models. These classes extend the features of MATLAB's Statistics and Machine Learning Toolbox to support, e.g., k-fold cross-validation. The MATLAB script named "runStudyUltimate.m" is the article's main program for analyzing the performance consistency of the biomass HHV models through the ultimate analysis. The script directly loads the datasets from the Excel workbook and automatically fits the biomass models through the OOP classes.
The first section of the MATLAB script generates the most accurate model by optimizing the models' hyperparameters. It takes a few hours for the first run to train the machine learning models via this trial-and-error process. The trained models can be saved to a MATLAB .mat file and loaded back into the MATLAB workspace. The remaining script, separated by a script section break, performs the residual analysis to inspect the performance consistency. Furthermore, a figure of the biomass data in a 3D scatter plot and box plots of the prediction residuals are produced. Finally, the interpretations of these results are examined in the authors' article.
Reference: Kijkarncharoensin, A., & Innet, S. (2022). Performance inconsistency of the Biomass Higher Heating Value (HHV) Models derived from Ultimate Analysis [Manuscript in preparation]. University of the Thai Chamber of Commerce.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The comparison results of different algorithms on CEC2017 functions with D=30.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Supplementary Figure 1A. Box and Whisker Plots of log Aldosterone to Renin Ratio, additionally adjusted for body mass index. Supplementary Figure 1B. Box and Whisker Plots of log Renin, additionally adjusted for body mass index. Supplementary Figure 1C. Box and Whisker Plots of log Aldosterone, additionally adjusted for body mass index. Supplementary Figure 2. Box and Whisker Plots of log ACE activity, additionally adjusted for body mass index.
Figure S1. CRISPR-KO screening data quality assessment. (A) Average correlation between sgRNA read-count replicates across cell lines. (B) Receiver operating characteristic (ROC) curve obtained from classifying fitness essential (FE) and non-essential genes based on the average logFC of their targeting sgRNAs. An example cell line, OVCAR-8, is shown. (C) Area under the ROC (AUROC) curve obtained for cell lines from classifying FE and non-essential genes based on the average logFC of their targeting sgRNAs. (D) Recall for sets of a priori known essential genes from MSigDB and from literature when classifying FE and non-essential genes across cell lines (5% FDR). Each circle represents a cell line and is coloured by tissue type. Box and whisker plots show median, inter-quartile ranges and 95% confidence intervals. (E) Genes ranked based on the average logFC of targeting sgRNAs for OVCAR-8 and enrichment of genes belonging to predefined sets of a priori known essential genes from MSigDB, at an FDR equal to 5% when classifying FE (second last column) and non-essential genes (last column). Blue numbers at the bottom indicate the classification true positive rate (recall). Figure S2. Assessment of copy number bias before and after CRISPRcleanR correction across cell lines. sgRNA logFC values before and after CRISPRcleanR for eight cell lines are shown, classified based on copy number (amplified or deleted) and expression status. Copy number segments were identified using Genomics of Drug Sensitivity in Cancer (GDSC) and Cancer Cell Line Encyclopedia (CCLE) datasets. Box and whisker plots show median, inter-quartile ranges and 95% confidence intervals. Asterisks indicate significant associations between sgRNA logFC values (Welch's t-test, p < 0.005) and their different effect sizes accounting for the standard deviation (Cohen's D value), compared to the whole sgRNA library. Figure S3. CN-associated effect on sgRNA logFC values in highly biased cell lines. For 3 cell lines, recall curves of non-essential genes, fitness essential genes, copy number (CN) amplified and CN amplified non-expressed genes obtained when classifying genes based on the average logFC values of their targeting sgRNAs. Figure S4. Assessment of CN-associated bias across all cell lines. LogFC values of sgRNAs averaged within segments of equal copy number (CN). One plot per cell line, with CN values at which significant differences (Welch's t-test, p < 0.05) with respect to the logFCs corresponding to CN = 2 are initially observed (bias starting point) and start to significantly increase continuously (bias critical point). CN-associated bias is shown for all sgRNAs, when excluding FE genes and histones, and for non-expressed genes only. Box and whisker plots show median, inter-quartile ranges and 95% confidence intervals. Figure S5. CRISPRcleanR correction varying the minimal number of genes required and the effect of fitness essential genes. Recall reduction of (A) amplified or (B) amplified not-expressed genes versus that of fitness essential and other prior known essential genes, when comparing CRISPRcleanR correction varying the minimal number of genes to be targeted by sgRNAs in a biased segment (default parameter is n = 3). Similar results were observed when performing the analysis including or excluding known essential genes. Figure S6. CRISPRcleanR performances across 342 cell lines from an independent dataset. Recall at 5% FDR of predefined sets of genes based on their uncorrected or corrected logFCs (coordinates on the two axes) averaged across targeting sgRNAs for 342 cell lines from Project Achilles. Figure S7. CRISPRcleanR performances in relation to data quality. The impact of data quality on recall at 5% false discovery rate (FDR) assessed following CRISPRcleanR correction for predefined sets of genes. Project Achilles data (n = 342 cell lines) was binned based on the quality of the uncorrected essentiality profile. This is obtained by measuring the recall at 5% FDR for predefined essential genes (from the Molecular Signatures Database) and grouping the cell lines into 10 equidistant bins (1 lowest quality and 10 highest quality) when sorting them based on this value. Recall increment for fitness essential genes was greatest for the lower quality data, indicating that CRISPRcleanR can improve the true signal of gene depletion in low quality data. Figure S8. Minimal impact of CRISPRcleanR on loss/gain-of-fitness effects. (A) The percentage of genes where the significance of their fitness effect (gain- or loss-of-fitness) is altered after CRISPRcleanR for Project Score and Project Achilles data. The upper row shows correction effects for all screened genes and the lower row for the subset of genes with a significant effect in the uncorrected data. Each dot is a separate cell line. Blue dots indicate the percentage of genes where significance is lost or gained post correction. Green dots indicate the percentage of genes where the fitness effect is distorted and the effect is opposite in the uncorrected data. (B) The majority of the loss-of-fitness genes impacted by correction are putative false positive effects affecting genes which are either not-expressed (FPKM < 0.5), amplified, known non-essential, or exhibit a mild phenotype in the screening data. (C) Summary of the overall impact of CRISPRcleanR on fitness effects following correction when considering data for all cell lines. The colors reflect the percentage of genes with a loss-of-fitness, no phenotype or gain-of-fitness effect which are retained in the corrected data. Figure S9. CRISPRcleanR retains cancer driver gene dependencies in Project Score and Achilles data. (A) Each circle represents a tested cancer driver gene dependency (mutation or amplification of a copy number segment) and the statistical significance using MAGeCK before (x-axis) and after (y-axis) CRISPRcleanR correction, across the two screens. Plots in the first row show depletion FDR values pre/post-correction, whereas those in the second row show depletion FDR values pre-correction and enrichment FDR values post-correction. (B) Details of the tested genetic dependencies and whether they are shared before and after CRISPRcleanR correction at two different thresholds of statistical significance (5 and 10% FDR, respectively, for the 1st and 2nd rows of plots). The third row indicates the type of alteration involving the cancer driver genes under consideration and the total number of cell lines with an alteration. (ZIP 191 kb)
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of results on the welded beam design problem.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of results on the pressure vessel design problem.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The comparison results of different algorithms on 23 benchmark functions with D=30.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Comparison of results on the speed reducer design problem.
Values refer to the box plots of Figure 3A, showing the amount of dead cells/total cells in treated samples compared to relative control samples. Data were generated from 4 independent experiments (3 replicates/experiment, n = 12).
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The comparison results of different algorithms on CEC2019 functions.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Objectives: This study aimed to explore the correlation between the sentiment of nursing notes and the one-year mortality of sepsis patients. Methods: Box plots were used to compare the differences in sentiment polarity/sentiment subjectivity between groups. Multivariate logistic regression was used to explore the correlation between sentiment polarity/sentiment subjectivity and the one-year mortality of elderly sepsis patients. Ridge regression, XGBoost regression, and random forest were used to explore the importance of sentiment polarity and subjectivity for the one-year mortality of elderly sepsis patients. Restricted cubic splines (RCS) were used to explore whether there was a linear relationship between sentiment polarity or sentiment subjectivity and the one-year mortality of elderly sepsis patients. Kaplan-Meier (KM) curves were used to explore the relationship between sentiment polarity (or sentiment subjectivity) and one-year mortality. Results: Compared with the control group, the one-year mortality group had lower sentiment polarity and higher sentiment subjectivity. Sentiment polarity and sentiment subjectivity were independently related to the one-year mortality of elderly sepsis patients. There was a linear relationship between sentiment polarity and the one-year mortality of elderly sepsis patients, while there was a nonlinear relationship between sentiment subjectivity and the one-year mortality of elderly sepsis patients. Conclusions: The sentiment of nursing notes was correlated with the one-year mortality of elderly sepsis patients.
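The entry does not name the sentiment tool used; TextBlob is one library that exposes exactly these two measures (polarity in [-1, 1], subjectivity in [0, 1]), so a sketch of how such scores might be derived from a nursing note is shown below. The note text is fabricated for illustration.

```python
# Sketch: sentiment polarity and subjectivity of a fabricated nursing note.
# The study's actual sentiment tool is not stated; TextBlob is assumed here.
from textblob import TextBlob

note = "Patient remains alert and cooperative, tolerating diet well, no acute distress."
sentiment = TextBlob(note).sentiment
print(sentiment.polarity)      # ranges from -1 (negative) to +1 (positive)
print(sentiment.subjectivity)  # ranges from 0 (objective) to 1 (subjective)
```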
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Decomposition of concentration indices of stunting by background characteristics, India, 2015–16.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Prevalence (%) of stunting among under-five children by wealth quintile and background characteristics, India, 2015–16.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Concentration index of child stunting by background characteristics, India, 2019–21.