Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). An EDA comprises a set of statistical and data mining procedures used to describe data. We ran an EDA to provide statistical facts and inform conclusions. The mined facts supply the arguments that inform the Systematic Literature Review (SLR) of DL4SE.
The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers to the proposed research questions and to formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships in the Deep Learning literature reported in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state of the art of DL techniques employed in the software engineering context.
Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases process, or KDD (Fayyad et al., 1996). The KDD process extracts knowledge from a DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:
1. Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organized the data into the 35 features, or attributes, found in the repository. In other words, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data scale, type of tuning, learning algorithm, SE data, and so on.
2. Preprocessing. The preprocessing consisted of transforming the features into the correct type (nominal), removing outliers (papers that do not belong to DL4SE), and re-inspecting the papers to fill in information left missing by the normalization process. For instance, we normalized the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”, where “Other Metrics” refers to unconventional metrics found during the extraction. The same normalization was applied to other features such as “SE Data” and “Reproducibility Types”. This separation into more detailed classes supports a better understanding and classification of the papers by the data mining tasks and methods.
3. Transformation. In this stage, we did not apply any data transformation method except for the clustering analysis. We performed a Principal Component Analysis (PCA) to reduce the 35 features to 2 components for visualization purposes. PCA also allowed us to identify the number of clusters that exhibits the maximum reduction in variance; in other words, it helped us choose the number of clusters to use when tuning the explainable models (see the sketch after this list).
4. Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be to uncover hidden relationships among the extracted features (Correlations and Association Rules) and to categorize the DL4SE papers for a better segmentation of the state of the art (Clustering). A detailed explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.
5. Interpretation/Evaluation. We used the Knowledge Discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes. This reasoning process produces an argument support analysis (see this link).
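The following is a minimal sketch of the Transformation step (item 3 above) using scikit-learn rather than the authors' published RapidMiner pipeline; the CSV file name and the one-hot encoding of the nominal features are assumptions made for illustration.

```python
# Hypothetical re-creation of the Transformation step: one-hot encode the 35
# nominal features, project them onto 2 principal components, and inspect how
# the within-cluster variance (inertia) drops as the number of clusters grows.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

papers = pd.read_csv("dl4se_features.csv")      # hypothetical export of the 35 features
X = pd.get_dummies(papers)                      # nominal features -> binary indicators

coords = PCA(n_components=2).fit_transform(X)   # 2-D coordinates for visualization

# "Elbow" heuristic: choose the k after which the reduction in variance flattens.
for k in range(1, 11):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(inertia, 2))
```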
We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.
Overview of the most meaningful Association Rules. Rectangles represent both Premises and Conclusions. An arrow connecting a Premise to a Conclusion indicates that, given the premise, the conclusion is associated with it. For example, given that an author used Supervised Learning, we can conclude, with a certain Support and Confidence, that their approach is irreproducible.
Support = the number of occurrences in which the statement is true, divided by the total number of statements.
Confidence = the support of the statement divided by the number of occurrences of the premise.
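A toy Python illustration of these two definitions, using a hypothetical boolean table of paper attributes (the column names are invented for the example):

```python
# Support and confidence for the rule "Supervised Learning -> irreproducible",
# computed over a small, made-up table with one row per paper.
import pandas as pd

papers = pd.DataFrame({
    "supervised_learning": [True, True, True, False, True],
    "irreproducible":      [True, True, False, False, True],
})

premise   = papers["supervised_learning"]
statement = premise & papers["irreproducible"]       # premise and conclusion both hold

support    = statement.sum() / len(papers)           # occurrences of the statement / all records
confidence = statement.sum() / premise.sum()         # support of the statement / occurrences of the premise

print(f"support={support:.2f}, confidence={confidence:.2f}")   # support=0.60, confidence=0.75
```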
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
A typical approach to the joint analysis of two high-dimensional datasets is to decompose each data matrix into three parts: a low-rank common matrix that captures the shared information across datasets, a low-rank distinctive matrix that characterizes the individual information within a single dataset, and an additive noise matrix. Existing decomposition methods often focus on the orthogonality between the common and distinctive matrices, but inadequately consider the more necessary orthogonal relationship between the two distinctive matrices. The latter guarantees that no more shared information is extractable from the distinctive matrices. We propose decomposition-based canonical correlation analysis (D-CCA), a novel decomposition method that defines the common and distinctive matrices from the ℒ2 space of random variables rather than the conventionally used Euclidean space, with a careful construction of the orthogonal relationship between distinctive matrices. D-CCA represents a natural generalization of the traditional canonical correlation analysis. The proposed estimators of common and distinctive matrices are shown to be consistent and have reasonably better performance than some state-of-the-art methods in both simulated data and the real data analysis of breast cancer data obtained from The Cancer Genome Atlas.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset contains statistical data of International Trade Network (ITN) literature from 2003 to 2023. It includes the data sources, research content, and citation counts for each piece of literature (01_Comprehensive Statistics.xlsx). Additionally, for structure prediction (02_Structure Prediction.xlsx) and correlation analysis (03_Correlation Analysis.xlsx), a detailed classification of methodologies and analytical perspectives is provided. Finally, for each data source, we have compiled the total citation counts (04_citations_of_data.xlsx) and the total number of publications (05_publications_of_data.xlsx).
Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description: A dataset of supply chains used by the company DataCo Global was used for the analysis. The dataset allows the use of machine learning algorithms and R software. Areas of important registered activities: Provisioning, Production, Sales, and Commercial Distribution. It also allows the correlation of structured data with unstructured data for knowledge generation.
Data types: Structured data: DataCoSupplyChainDataset.csv; Unstructured data: tokenized_access_logs.csv (Clickstream).
Types of products: Clothing, Sports, and Electronic Supplies.
Additionally, a separate file, DescriptionDataCoSupplyChain.csv, provides the description of each variable in DataCoSupplyChainDataset.csv. Categories: Data Mining, Supply Chain Management, Machine Learning, Big Data Analytics.
Introduction: This study uses a non-linear model to explore the impact mechanism of change rates between internet search behavior and confirmed COVID-19 cases. The research background focuses on epidemic monitoring, leveraging internet search data as a real-time tool to capture public interest and predict epidemic development. The goal is to establish a widely applicable mathematical framework through the analysis of long-term disease data.
Methods: Data were sourced from the Baidu Index for COVID-19-related search behavior and confirmed COVID-19 case data from the National Health Commission of China. A logistic-based non-linear differential equation model was employed to analyze the mutual influence mechanism between confirmed case numbers and the rate of change in search behavior. Structural and operator relationships between variables were determined through segmented data fitting and regression analysis.
Results: The results indicated a significant non-linear correlation between search behavior and confirmed COVID-19 cases. The non-linear differential equation model constructed in this study successfully passed both structural and correlation tests, with dynamic data fitting showing a high degree of consistency. The study further quantified the mutual influence between search behavior and confirmed cases, revealing a strong feedback loop between the two: changes in search behavior significantly drove the growth of confirmed cases, while the increase in confirmed cases also stimulated the public's search behavior. This finding suggests that search behavior not only reflects the development trend of the epidemic but can also serve as an effective indicator for predicting the evolution of the pandemic.
Discussion: This study enriches the understanding of epidemic transmission mechanisms by quantifying the dynamic interaction between public search behavior and epidemic spread. Compared to simple prediction models, this study focuses more on stable common mechanisms and structural analysis, laying a foundation for future research on public health events.
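As a rough illustration of the kind of model described in the Methods, the sketch below fits a logistic-type differential equation, driven by an observed search series, to a confirmed-cases series using SciPy. The functional form, parameter names, and synthetic data are assumptions for illustration; the study's actual equations and Baidu Index data are not reproduced here.

```python
# Fit a logistic growth model for cumulative cases C(t) with an extra term driven
# by a search-behavior series S(t): dC/dt = r*C*(1 - C/K) + a*S(t).
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import curve_fit

t = np.arange(60, dtype=float)
cases_obs  = 1e4 / (1 + np.exp(-0.2 * (t - 30)))   # synthetic cumulative cases
search_obs = np.gradient(cases_obs) + 100.0         # synthetic search index

def model(t_eval, r, K, a):
    def rhs(C, ti):
        s = np.interp(ti, t, search_obs)            # look up the search signal at time ti
        return r * C * (1 - C / K) + a * s
    return odeint(rhs, cases_obs[0], t_eval).ravel()

(r, K, a), _ = curve_fit(model, t, cases_obs, p0=[0.1, 1.2e4, 0.05], maxfev=10000)
print(f"fitted r={r:.3f}, K={K:.0f}, a={a:.3f}")
```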
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Generalized estimating equations (GEE) incorporate a working correlation structure that is important because the more accurately this structure reflects the true structure, the more efficiently regression parameters may be estimated. Numerous criteria have therefore been proposed to select a working structure, although no criterion will always work better than all other criteria. In practice, it will be unknown which criterion will work best. Therefore, in this manuscript we propose how to utilize information from multiple criteria. We demonstrate the benefits of our proposed approach via a simulation study in a variety of settings and then in an application example.
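For context, the sketch below fits a GEE under two candidate working correlation structures and compares them with the QIC criterion, using statsmodels on synthetic longitudinal data. It does not reproduce the manuscript's procedure for combining multiple criteria, and the qic() method is assumed to be available in the installed statsmodels version.

```python
# Compare working correlation structures for a GEE fit via QIC (lower is preferred).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.genmod.cov_struct import Independence, Exchangeable

rng = np.random.default_rng(0)
n_subjects, n_times = 100, 4
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_times),
    "x": rng.normal(size=n_subjects * n_times),
})
subject_effect = np.repeat(rng.normal(scale=0.7, size=n_subjects), n_times)
df["y"] = 1.0 + 0.5 * df["x"] + subject_effect + rng.normal(size=len(df))

for name, cov in [("independence", Independence()), ("exchangeable", Exchangeable())]:
    res = sm.GEE.from_formula("y ~ x", groups="subject", data=df,
                              cov_struct=cov, family=sm.families.Gaussian()).fit()
    print(name, "QIC:", round(res.qic()[0], 2))   # qic() is assumed to return (QIC, QICu)
```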
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Joint and Individual Variation Explained (JIVE) is a model that decomposes multiple datasets obtained on the same subjects into shared structure, structure unique to each dataset, and noise. JIVE is an important tool for multimodal data integration in neuroimaging. The two most common algorithms are R.JIVE, an iterative approach, and AJIVE, which uses principal angle analysis. The joint structure in JIVE is defined by shared subspaces, but interpreting these subspaces can be challenging. In this paper, we reinterpret AJIVE as a canonical correlation analysis of principal component scores. This reformulation, which we call CJIVE, (1) provides an intuitive view of AJIVE; (2) uses a permutation test for the number of joint components; (3) can be used to predict subject scores for out-of-sample observations; and (4) is computationally fast. We conduct simulation studies that show CJIVE and AJIVE are accurate when the total signal ranks are correctly specified but generally inaccurate when the total ranks are too large. CJIVE and AJIVE can still extract joint signal even when the joint signal variance is relatively small. JIVE methods are applied to integrate functional connectivity (resting-state fMRI) and structural connectivity (diffusion MRI) from the Human Connectome Project. Surprisingly, the edges with the largest loadings in the joint component in functional connectivity do not coincide with the same edges in the structural connectivity, indicating more complex patterns than assumed in spatial priors. Using these loadings, we accurately predict joint subject scores in new participants. We also find joint scores are associated with fluid intelligence, highlighting the potential for JIVE to reveal important shared structure.
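A rough sketch of the core CJIVE idea described above (canonical correlation analysis applied to principal component scores of two datasets), here on synthetic data with scikit-learn; the signal ranks are assumptions, and the paper's permutation test and out-of-sample score prediction are not shown.

```python
# CCA of PC scores from two data blocks measured on the same subjects; large
# canonical correlations indicate joint (shared) structure.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 200
joint = rng.normal(size=(n, 2))                                 # shared signal
X1 = joint @ rng.normal(size=(2, 50)) + 0.5 * rng.normal(size=(n, 50))
X2 = joint @ rng.normal(size=(2, 80)) + 0.5 * rng.normal(size=(n, 80))

scores1 = PCA(n_components=10).fit_transform(X1)                # assumed total signal rank
scores2 = PCA(n_components=10).fit_transform(X2)

U, V = CCA(n_components=5).fit_transform(scores1, scores2)
canon_corr = [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(U.shape[1])]
print("canonical correlations:", np.round(canon_corr, 2))
```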
Many previous neuroimaging studies of neuronal structures in patients with obsessive-compulsive disorder (OCD) used univariate statistical tests on unimodal imaging measurements. Although the univariate methods revealed important aberrance of local morphometry in OCD patients, the covariance structure of the anatomical alterations remains unclear. Motivated by recent developments of multivariate techniques in the neuroimaging field, we applied a fusion method called “mCCA+jICA” to multimodal structural data of T1-weighted magnetic resonance imaging (MRI) and diffusion tensor imaging (DTI) of 30 unmedicated patients with OCD and 34 healthy controls. Amongst six highly correlated multimodal networks (p < 0.0001), we found significant alterations of the interrelated gray and white matter networks over occipital and parietal cortices, frontal interhemispheric connections and cerebella (False Discovery Rate q ≤ 0.05). In addition, we found white matter networks around basal ganglia tha...
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis. We observe that using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but that these genes replicate in GTEx matched tissue at a slightly higher rate.
It is well-known that correlations in microarray data represent a serious nuisance deteriorating the performance of gene selection procedures. This paper is intended to demonstrate that the correlation structure of microarray data provides a rich source of useful information. We discuss distinct correlation substructures revealed in microarray gene expression data by an appropriate ordering of genes. These substructures include stochastic proportionality of expression signals in a large percentage of all gene pairs, negative correlations hidden in ordered gene triples, and a long sequence of weakly dependent random variables associated with ordered pairs of genes. The reported striking regularities are of general biological interest, and they also have far-reaching implications for theory and practice of statistical methods of microarray data analysis. We illustrate the latter point with a method for testing differential expression of non-overlapping gene pairs. While designed for testing a different null hypothesis, this method provides an order of magnitude more accurate control of type 1 error rate compared to conventional methods of individual gene expression profiling. In addition, this method is robust to the technical noise. Quantitative inference of the correlation structure has the potential to extend the analysis of microarray data far beyond currently practiced methods.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Data sets used to prepare Figures 1-14 in the Journal of Chemical Physics article entitled "Structure of molten NaCl and the decay of the pair-correlations." The data sets refer to the measured and simulated structure and thermodynamic properties of molten NaCl.
Using a structural dynamic programming model, we investigate the relative importance of family background variables and individual specific abilities in explaining cross-sectional differences in schooling attainments and wages. Each type of ability is the sum of one component correlated with family background variables and a residual (orthogonal) component which is purely individual specific. Household background variables (especially parents' education) account for 68% of the explained cross-sectional variations in schooling attainments, while ability correlated with background variables accounts for 17% and pure individual specific ability accounts for 15%. Interestingly, individual differences in wages are mostly explained by pure individual specific abilities as they account for as much as 73% of the explained variations in wages. Family background variables account for only 19%, while ability endowments correlated with family background account for 8%.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This dataset is about books. It has 1 row and is filtered where the book is The ovary: a correlation of structure and function in mammals. It features 7 columns including author, publication date, language, and book publisher.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Most methods for structural equation modeling (SEM) have focused on the analysis of covariance matrices. However, “Historically, interesting psychological theories have been phrased in terms of correlation coefficients.” This might be because data in the social and behavioral sciences typically do not have predefined metrics. While proper methods for conducting correlation structure analysis have been developed, they emphasized either how to obtain consistent standard errors of parameter estimates or how to ensure that the model-implied matrix remains a correlation matrix. Motivated by the fundamental need for more efficient/accurate parameter estimates and greater power in statistical tests, this article explores the advantages of correlation structure analysis over its conventional covariance counterpart. Issues related to reparameterization and the placement of parameters are discussed. A new concept is introduced for comparing the efficiency/accuracy of parameter estimates that are not on the same scale. Via the analysis of many real datasets, meta results show that correlation structure analysis yields uniformly more accurate parameter estimates and more powerful statistical tests than its covariance-structure-analysis counterpart on parameters that are of substantive interest. The same pattern of results between the two model parameterizations is also found by Monte Carlo simulation. Issues related to correlation structure analysis and substantive elaboration of models that are not scale-invariant are discussed as well. The results are expected to promote technical and software developments of correlation structure analysis as well as its adoption in data analysis.
U.S. Government Works https://www.usa.gov/government-works
License information was derived automatically
The datasets for this investigation consist of microtremor array data collected at: 1) 18 sites in Salt Lake and Utah valleys, Utah, and 2) two sites as part of the Frontier Observatory for Research in Geothermal Energy (FORGE) near Milford, Utah. Each of the 18 sites in the Salt Lake and Utah valleys were acquired with four-sensor arrays with three-component (3C) sensors having flat response from 0.033 Hz to 50 Hz. The data acquired as part of the FORGE investigation used both 3C broadband and 5-Hz geophone sensors. Additional information on these datasets can be found in the supporting documentation provided in this data release as well as in the paper by Zhang and others (2019) that utilized these data.
This study determined the correlation between creative thinking aptitude, measured by the Torrance Test of Creative Thinking–Figural (TTCT–F), and five-year academic achievement. The TTCT–F was administered to 135 first-year medical students at a Tokyo-based medical school in 2018. Participants' academic records (annual GPAs over five years) were averaged, and data were analyzed in 2023. Pearson correlation coefficients examined the relationship between the TTCT–F Creativity Index and the five-year average GPA; multiple linear regression assessed the predictive value of TTCT–F components on GPA; canonical correlation analysis explored multivariate relationships. The Creativity Index demonstrated a weak, non-significant correlation with the five-year average GPA. Fluency, Originality, and Elaboration components were not significantly correlated, while Abstractness of Titles demonstrated a moderate positive correlation. Linear regression indicated that Abstractness of Titles signi...
Participants: We conducted a retrospective cohort study, administering the Torrance Test of Creative Thinking Figural (TTCT–F) in 2018 as a proctored and timed test to a cohort of 135 first-year medical students at Juntendo University Faculty of Medicine in Chiba, Japan. The participants took the test simultaneously and were between the ages of 18 and 23 years old at the time. The cohort comprised 42 women (31.1%) and 93 men (68.9%) (see Table 1). The study was approved by the Juntendo University Institutional Review Board and conducted in accordance with ethical guidelines to ensure participant confidentiality and data anonymity. All participants provided informed consent before participation and allowed access to their academic records for research purposes. No exclusion criteria were applied, and all first-year medical students in the cohort were eligible to participate. Data were collected and stored in compliance with ethical guidelines to ensure confidentiality and anonymity. Instr...
Can creative thinking predict academic success in medical education? Correlating Torrance Test of Creative Thinking scores and five-year GPAs of Japanese medical students
https://doi.org/10.5061/dryad.79cnp5j6p
The data set shows the following:
TTCT-F scores and categories are described above in the methods section.
GPA = Grade Point Average.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This data set includes the results of digital image correlation analysis applied to nine experiments (Table 1) on magma-tectonic interaction performed at the Helmholtz Laboratory for Tectonic Modelling (HelTec) of the GFZ German Research Centre for Geosciences in Potsdam in the framework of EPOS transnational access activities in 2017. The models use silicone oil (PDMS G30M, Rudolf et al., 2016) and quartz sand (G12, Rosenau et al., 2018) to simulate pre-, syn- and post-tectonic intrusion of granitic magma into upper crustal shear zones of simple shear and transtensional (15° obliquity) kinematics. Three reference experiments (simple shear, transtension, intrusion) are also reported. Detailed descriptions of the experiments can be found in Michail et al. (submitted), to which this data set is a supplement. The models have been monitored by means of digital image correlation (DIC) analysis including Particle Image Velocimetry (PIV; Adam et al., 2005) and Structure from Motion photogrammetry (SfM; Donnadieu et al., 2003; Westoby et al., 2012). DIC analysis yields quantitative model surface deformation information by means of 3D surface topography and displacements from which surface strain has been calculated. The data presented here are visualized as surface deformation maps and movies, as well as digital elevation and intrusion models. The results of a shape analysis of the model plutons are provided, too.
Gas production data from 63 wells in the Cottageville Gas Field, producing from Devonian shales, are studied in relationship to structure above and below producing horizons, isopach data and dip of producing shales, and basement structure trends. Gas production data are studied from several aspects including highest accumulated production, mean annual production, initial well pressure, and calculated loss ratio values for four different time periods. A trend correlation of these parameters is presented. The initial pressure trends correlate with all geological parameters, i.e., Devonian shale dip and strike, the 40 to 50° NE fracture facies trend, structure on the base of the Huron, structure on the top of the Onondaga, and the basement magnetic density data. Production data trends show greatest correlation with structure on the top of the Onondaga and with fracture facies trends from the Baler well. Production decline data in terms of loss ratio values show trends correlating with all geologic parameters except the Onondaga. Two loss ratio maps correlate with the structure on the bottom of the Huron. The strike of Onondaga structure correlates with the 40 to 50° NE fracture facies trend. These parameters may be generally viewed as the production maps representing free gas pockets and migration-accumulation trends; the loss ratios as possible permeability and migration trend indicators; and the geologic parameters as possible constraints or causative agents. The lack of correlation of geologic parameters with production data trends a few degrees west of north may be suggestive of a fault or faults in that direction, providing the correlative causative agent. This is not an unreasonable possibility from the production data maps. It is concluded that this approach could be useful in gas exploration and development evaluation of Appalachian Devonian shale gas fields. Similar relationships will be examined in the Eastern Kentucky Gas Field(s) Study presently in progress.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
This repository contains data from experimental tests on plastered rubble stone masonry walls conducted at École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland.
CC0 1.0 Universal Public Domain Dedication https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This deposition includes the diffraction images generated by the paired polystyrene spheres in random orientations. These images were used to determine and phase the single particle diffraction volume from their autocorrelation functions.
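For reference, a minimal NumPy sketch of the autocorrelation relationship mentioned above: the inverse Fourier transform of a diffraction intensity pattern gives the autocorrelation (Patterson function) of the object. The file name is hypothetical, and this is not the deposition's actual phasing pipeline.

```python
# Autocorrelation of the scattering object from a measured intensity |F(q)|^2,
# via the Wiener-Khinchin relation (inverse FFT of the intensity).
import numpy as np

intensity = np.load("diffraction_image.npy")   # hypothetical 2-D diffraction pattern
autocorr = np.fft.fftshift(np.fft.ifftn(np.fft.ifftshift(intensity))).real
print(autocorr.shape, autocorr.max())
```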