100+ datasets found
  1. Data Analysis for the Systematic Literature Review of DL4SE

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk (2024). Data Analysis for the Systematic Literature Review of DL4SE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4768586
    Explore at:
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    College of William and Mary
    Washington and Lee University
    Authors
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). EDA comprises a set of statistical and data mining procedures for describing data. We ran an EDA to provide statistical facts and inform conclusions; the mined facts yield arguments that inform the Systematic Literature Review of DL4SE.

    The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers to the proposed research questions and to formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships in the Deep Learning literature reported in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state of the art of DL techniques employed in the software engineering context.

    Our DL4SE-DA is a simplified version of classical Knowledge Discovery in Databases, or KDD (Fayyad et al., 1996). The KDD process extracts knowledge from a DL4SE structured database, which was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:

    Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organized the data into the 35 features or attributes found in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.

    Preprocessing. Preprocessing consisted of transforming the features into the correct type (nominal), removing outliers (papers that do not belong to DL4SE), and re-inspecting the papers to recover information that was missing after the normalization process. For instance, we normalized the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”, where “Other Metrics” refers to unconventional metrics found during extraction. The same normalization was applied to other features such as “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the papers by the data mining tasks or methods.

    Transformation. In this stage, we did not apply any data transformation method except for the clustering analysis. We performed a Principal Component Analysis (PCA) to reduce the 35 features to 2 components for visualization purposes. PCA also allowed us to identify the number of clusters that exhibits the maximum reduction in variance; in other words, it helped us choose the number of clusters to use when tuning the explainable models (a minimal sketch of this step is given after the stage descriptions).

    Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be to uncover hidden relationships among the extracted features (correlations and association rules) and to categorize the DL4SE papers for a better segmentation of the state of the art (clustering). A detailed explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.

    Interpretation/Evaluation. We used the knowledge discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by reasoning over the data mining outcomes, and this reasoning process produces an argument support analysis (see this link).
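
    As a rough illustration of the Transformation and clustering steps above, the sketch below runs PCA and k-means with scikit-learn on a placeholder papers-by-features matrix. It shows the idea only; the authors' actual pipelines were built in RapidMiner and published in their repository.

    ```python
    # Illustrative PCA + clustering sketch (scikit-learn), not the authors' RapidMiner pipeline.
    # X stands in for the papers-by-features matrix obtained after encoding the 35 nominal attributes.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(128, 35)).astype(float)  # placeholder for the encoded SLR data

    # Project the 35 features onto 2 principal components for visualization.
    coords = PCA(n_components=2).fit_transform(X)

    # Choose the number of clusters by looking for the largest drop in within-cluster variance.
    inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in range(1, 11)]
    drops = np.diff(inertias)            # drops[i] is the change going from k=i+1 to k=i+2 clusters
    best_k = int(np.argmin(drops)) + 2   # most negative difference = largest variance reduction
    labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

    print(coords.shape, best_k, np.bincount(labels))
    ```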

    We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.

    Overview of the most meaningful association rules: rectangles represent both premises and conclusions, and an arrow connecting a premise to a conclusion means that, given that premise, the conclusion is associated with it. For example, given that an author used supervised learning, we can conclude that their approach is irreproducible, with a certain support and confidence.

    Support = the number of occurrences in which the statement is true, divided by the total number of statements.
    Confidence = the number of occurrences in which the statement is true, divided by the number of occurrences of the premise (i.e., the support of the statement divided by the support of the premise).
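
    As a concrete illustration of these definitions, the snippet below computes the support and confidence of a single rule over a small boolean paper-by-attribute table; the column names and values are hypothetical placeholders, not the actual SLR attributes.

    ```python
    # Support and confidence of one rule (premise -> conclusion) over a boolean
    # paper-by-attribute table, following the definitions above.
    import pandas as pd

    papers = pd.DataFrame({
        "supervised_learning": [True, True, True, False, True],
        "irreproducible":      [True, True, False, False, True],
    })

    premise = papers["supervised_learning"]
    conclusion = papers["irreproducible"]
    statement = premise & conclusion                 # premise and conclusion hold together

    support = statement.sum() / len(papers)          # occurrences of the statement / all papers
    confidence = statement.sum() / premise.sum()     # support(statement) / support(premise)

    print(f"support={support:.2f}, confidence={confidence:.2f}")
    ```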

  2. Data from: D-CCA: A Decomposition-based Canonical Correlation Analysis for...

    • figshare.com
    zip
    Updated Feb 9, 2024
    Cite
    Hai Shu; Xiao Wang; Hongtu Zhu (2024). D-CCA: A Decomposition-based Canonical Correlation Analysis for High-Dimensional Datasets* [Dataset]. http://doi.org/10.6084/m9.figshare.7461734.v1
    Explore at:
    zip
    Dataset updated
    Feb 9, 2024
    Dataset provided by
    Taylor & Francis
    Authors
    Hai Shu; Xiao Wang; Hongtu Zhu
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    A typical approach to the joint analysis of two high-dimensional datasets is to decompose each data matrix into three parts: a low-rank common matrix that captures the shared information across datasets, a low-rank distinctive matrix that characterizes the individual information within a single dataset, and an additive noise matrix. Existing decomposition methods often focus on the orthogonality between the common and distinctive matrices, but inadequately consider the more necessary orthogonal relationship between the two distinctive matrices. The latter guarantees that no more shared information is extractable from the distinctive matrices. We propose decomposition-based canonical correlation analysis (D-CCA), a novel decomposition method that defines the common and distinctive matrices from the ℒ2 space of random variables rather than the conventionally used Euclidean space, with a careful construction of the orthogonal relationship between distinctive matrices. D-CCA represents a natural generalization of the traditional canonical correlation analysis. The proposed estimators of common and distinctive matrices are shown to be consistent and have reasonably better performance than some state-of-the-art methods in both simulated data and the real data analysis of breast cancer data obtained from The Cancer Genome Atlas.
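
    The snippet below gives a much-simplified illustration of the general idea (a CCA-based common part plus a remainder for each view); it is not the D-CCA estimator proposed in the paper, and the simulated matrices stand in for real data.

    ```python
    # Simplified split of two views into a CCA-based "common" part plus a remainder.
    # NOT the paper's D-CCA estimator; data are synthetic placeholders.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(1)
    shared = rng.normal(size=(200, 2))                               # latent signal shared by both views
    X1 = shared @ rng.normal(size=(2, 30)) + 0.3 * rng.normal(size=(200, 30))
    X2 = shared @ rng.normal(size=(2, 40)) + 0.3 * rng.normal(size=(200, 40))

    cca = CCA(n_components=2).fit(X1, X2)
    U, V = cca.transform(X1, X2)                                     # canonical variables of each view

    # Common part of X1: least-squares reconstruction of X1 from its canonical variables;
    # the remainder mixes distinctive structure and noise.
    beta1, *_ = np.linalg.lstsq(U, X1, rcond=None)
    C1 = U @ beta1
    D1 = X1 - C1

    print(C1.shape, D1.shape, round(float(np.corrcoef(U[:, 0], V[:, 0])[0, 1]), 3))
    ```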

  3. Statistical Dataset Supporting the Review Paper of International Trade...

    • figshare.com
    xlsx
    Updated Jul 14, 2024
    Cite
    Donghai Liu; Lingli Xing (2024). Statistical Dataset Supporting the Review Paper of International Trade Network Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.26300167.v1
    Explore at:
    xlsx
    Dataset updated
    Jul 14, 2024
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Donghai Liu; Lingli Xing
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The dataset contains statistical data of International Trade Network (ITN) literature from 2003 to 2023. It includes the data sources, research content, and citation counts for each piece of literature (01_Comprehensive Statistics.xlsx). Additionally, for structure prediction (02_Structure Prediction.xlsx) and correlation analysis (03_Correlation Analysis.xlsx), a detailed classification of methodologies and analytical perspectives is provided. Finally, for each data source, we have compiled the total citation counts (04_citations_of_data.xlsx) and the total number of publications (05_publications_of_data.xlsx).

  4. DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS

    • kaggle.com
    zip
    Updated Oct 31, 2023
    + more versions
    Cite
    Ali Noranian (2023). DataCo SMART SUPPLY CHAIN FOR BIG DATA ANALYSIS [Dataset]. https://www.kaggle.com/datasets/alinoranianesfahani/dataco-smart-supply-chain-for-big-data-analysis/data
    Explore at:
    zip (26920609 bytes)
    Dataset updated
    Oct 31, 2023
    Authors
    Ali Noranian
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    A dataset of supply chains used by the company DataCo Global was used for the analysis. The dataset allows the use of machine learning algorithms and R software. Areas of important registered activities: provisioning, production, sales, and commercial distribution. It also allows the correlation of structured data with unstructured data for knowledge generation.

    Data types: structured data in DataCoSupplyChainDataset.csv; unstructured data in tokenized_access_logs.csv (clickstream).

    Types of products: clothing, sports, and electronic supplies.

    Additionally, a separate file, DescriptionDataCoSupplyChain.csv, describes each of the variables of DataCoSupplyChainDataset.csv. Categories: Data Mining, Supply Chain Management, Machine Learning, Big Data Analytics.

  5. Data Sheet 1_Non-linear correlation analysis between internet searches and...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Apr 4, 2025
    Cite
    He, Yongzhang; Xia, Yixue; Wang, Yang; Huang, Fengxiang; Ran, Lingshi (2025). Data Sheet 1_Non-linear correlation analysis between internet searches and epidemic trends.xlsx [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0002036779
    Explore at:
    Dataset updated
    Apr 4, 2025
    Authors
    He, Yongzhang; Xia, Yixue; Wang, Yang; Huang, Fengxiang; Ran, Lingshi
    Description

    Introduction: This study uses a non-linear model to explore the impact mechanism of change rates between internet search behavior and confirmed COVID-19 cases. The research background focuses on epidemic monitoring, leveraging internet search data as a real-time tool to capture public interest and predict epidemic development. The goal is to establish a widely applicable mathematical framework through the analysis of long-term disease data.

    Methods: Data were sourced from the Baidu Index for COVID-19-related search behavior and confirmed COVID-19 case data from the National Health Commission of China. A logistic-based non-linear differential equation model was employed to analyze the mutual influence mechanism between confirmed case numbers and the rate of change in search behavior. Structural and operator relationships between variables were determined through segmented data fitting and regression analysis.

    Results: The results indicated a significant non-linear correlation between search behavior and confirmed COVID-19 cases. The non-linear differential equation model constructed in this study successfully passed both structural and correlation tests, with dynamic data fitting showing a high degree of consistency. The study further quantified the mutual influence between search behavior and confirmed cases, revealing a strong feedback loop between the two: changes in search behavior significantly drove the growth of confirmed cases, while the increase in confirmed cases also stimulated the public's search behavior. This finding suggests that search behavior not only reflects the development trend of the epidemic but can also serve as an effective indicator for predicting the evolution of the pandemic.

    Discussion: This study enriches the understanding of epidemic transmission mechanisms by quantifying the dynamic interaction between public search behavior and epidemic spread. Compared to simple prediction models, this study focuses more on stable common mechanisms and structural analysis, laying a foundation for future research on public health events.
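
    For orientation, the sketch below fits a generic logistic growth curve with SciPy. It is only a stand-in for the kind of logistic-based non-linear model described above; the paper's actual coupled case/search-index equations are not reproduced, and the data are synthetic placeholders.

    ```python
    # Generic logistic-growth fit; synthetic data, not the paper's coupled model.
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(t, K, r, t0):
        """Cumulative cases following dC/dt = r * C * (1 - C / K)."""
        return K / (1.0 + np.exp(-r * (t - t0)))

    t = np.arange(0, 60, 1.0)
    true_curve = logistic(t, K=10_000, r=0.25, t0=30)
    cases = true_curve + np.random.default_rng(2).normal(scale=150, size=t.size)

    (K_hat, r_hat, t0_hat), _ = curve_fit(logistic, t, cases, p0=[cases.max(), 0.1, t.mean()])
    print(f"K={K_hat:.0f}, r={r_hat:.3f}, t0={t0_hat:.1f}")
    ```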

  6. Data from: Approaches for the utilization of multiple criteria to select a...

    • tandf.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Philip M. Westgate (2023). Approaches for the utilization of multiple criteria to select a working correlation structure for use within generalized estimating equations [Dataset]. http://doi.org/10.6084/m9.figshare.7422707.v2
    Explore at:
    pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Taylor & Francis (https://taylorandfrancis.com/)
    Authors
    Philip M. Westgate
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Generalized estimating equations (GEE) incorporate a working correlation structure that is important because the more accurately this structure reflects the true structure, the more efficiently regression parameters may be estimated. Numerous criteria have therefore been proposed to select a working structure, although no criterion will always work better than all other criteria. In practice, it will be unknown which criterion will work best. Therefore, in this manuscript we propose how to utilize information from multiple criteria. We demonstrate the benefits of our proposed approach via a simulation study in a variety of settings and then in an application example.
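
    The sketch below illustrates the underlying selection problem with statsmodels: fitting GEE under two candidate working correlation structures and comparing QIC. It does not implement the multi-criterion combination approach proposed in the paper, and the clustered data are simulated.

    ```python
    # GEE under two candidate working correlation structures, compared by QIC.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n_clusters, n_per = 100, 4
    cluster = np.repeat(np.arange(n_clusters), n_per)
    x = rng.normal(size=n_clusters * n_per)
    cluster_effect = np.repeat(rng.normal(scale=0.8, size=n_clusters), n_per)  # induces within-cluster correlation
    y = 1.0 + 0.5 * x + cluster_effect + rng.normal(size=n_clusters * n_per)
    data = pd.DataFrame({"y": y, "x": x, "cluster": cluster})

    for name, cov in [("independence", sm.cov_struct.Independence()),
                      ("exchangeable", sm.cov_struct.Exchangeable())]:
        res = smf.gee("y ~ x", groups="cluster", data=data, cov_struct=cov).fit()
        # qic() returns (QIC, QICu) in recent statsmodels versions
        print(name, float(res.params["x"]), res.qic())
    ```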

  7. Data_Sheet_1_Interpretive JIVE: Connections with CCA and an application to...

    • frontiersin.figshare.com
    pdf
    Updated May 31, 2023
    Cite
    Raphiel J. Murden; Zhengwu Zhang; Ying Guo; Benjamin B. Risk (2023). Data_Sheet_1_Interpretive JIVE: Connections with CCA and an application to brain connectivity.PDF [Dataset]. http://doi.org/10.3389/fnins.2022.969510.s001
    Explore at:
    pdf
    Dataset updated
    May 31, 2023
    Dataset provided by
    Frontiers Media (http://www.frontiersin.org/)
    Authors
    Raphiel J. Murden; Zhengwu Zhang; Ying Guo; Benjamin B. Risk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Joint and Individual Variation Explained (JIVE) is a model that decomposes multiple datasets obtained on the same subjects into shared structure, structure unique to each dataset, and noise. JIVE is an important tool for multimodal data integration in neuroimaging. The two most common algorithms are R.JIVE, an iterative approach, and AJIVE, which uses principal angle analysis. The joint structure in JIVE is defined by shared subspaces, but interpreting these subspaces can be challenging. In this paper, we reinterpret AJIVE as a canonical correlation analysis of principal component scores. This reformulation, which we call CJIVE, (1) provides an intuitive view of AJIVE; (2) uses a permutation test for the number of joint components; (3) can be used to predict subject scores for out-of-sample observations; and (4) is computationally fast. We conduct simulation studies that show CJIVE and AJIVE are accurate when the total signal ranks are correctly specified but generally inaccurate when the total ranks are too large. CJIVE and AJIVE can still extract joint signal even when the joint signal variance is relatively small. JIVE methods are applied to integrate functional connectivity (resting-state fMRI) and structural connectivity (diffusion MRI) from the Human Connectome Project. Surprisingly, the edges with the largest loadings in the joint component in functional connectivity do not coincide with the same edges in the structural connectivity, indicating more complex patterns than assumed in spatial priors. Using these loadings, we accurately predict joint subject scores in new participants. We also find joint scores are associated with fluid intelligence, highlighting the potential for JIVE to reveal important shared structure.
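
    The following sketch mirrors the CJIVE reformulation described above (PCA on each dataset, then CCA on the principal component scores); the ranks, data, and variable names are illustrative placeholders rather than the authors' implementation.

    ```python
    # PCA on each dataset, then CCA on the PC scores to estimate the joint subspace.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(4)
    joint = rng.normal(size=(150, 3))                     # structure shared across the two modalities
    X = joint @ rng.normal(size=(3, 60)) + 0.5 * rng.normal(size=(150, 60))
    Y = joint @ rng.normal(size=(3, 80)) + 0.5 * rng.normal(size=(150, 80))

    rank_x, rank_y, rank_joint = 10, 10, 3                # assumed total and joint ranks
    scores_x = PCA(n_components=rank_x).fit_transform(X)
    scores_y = PCA(n_components=rank_y).fit_transform(Y)

    cca = CCA(n_components=rank_joint).fit(scores_x, scores_y)
    jx, jy = cca.transform(scores_x, scores_y)            # joint subject scores per modality
    canonical_corrs = [float(np.corrcoef(jx[:, k], jy[:, k])[0, 1]) for k in range(rank_joint)]
    print(np.round(canonical_corrs, 3))
    ```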

  8. Alterations of gray and white matter networks in patients with...

    • search.dataone.org
    • data.niaid.nih.gov
    • +3more
    Updated Apr 2, 2025
    Cite
    Seung-Goo Kim; Wi Hoon Jung; Sung Nyun Kim; Joon Hwan Jang; Jun Soo Kwon (2025). Alterations of gray and white matter networks in patients with obsessive-compulsive disorder: a multimodal fusion analysis of structural MRI and DTI using mCCA+jICA [Dataset]. http://doi.org/10.5061/dryad.5jv56
    Explore at:
    Dataset updated
    Apr 2, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Seung-Goo Kim; Wi Hoon Jung; Sung Nyun Kim; Joon Hwan Jang; Jun Soo Kwon
    Time period covered
    Apr 24, 2016
    Description

    Many previous neuroimaging studies on neuronal structures in patients with obsessive-compulsive disorder (OCD) used univariate statistical tests on unimodal imaging measurements. Although the univariate methods revealed important aberrance of local morphometry in OCD patients, the covariance structure of the anatomical alterations remains unclear. Motivated by recent developments of multivariate techniques in the neuroimaging field, we applied a fusion method called “mCCA+jICA” on multimodal structural data of T1-weighted magnetic resonance imaging (MRI) and diffusion tensor imaging (DTI) of 30 unmedicated patients with OCD and 34 healthy controls. Amongst six highly correlated multimodal networks (p < 0.0001), we found significant alterations of the interrelated gray and white matter networks over occipital and parietal cortices, frontal interhemispheric connections and cerebella (False Discovery Rate q ≤ 0.05). In addition, we found white matter networks around basal ganglia tha...

  9. Expression reflects population structure

    • plos.figshare.com
    pdf
    Updated Jun 4, 2023
    Cite
    Brielin C. Brown; Nicolas L. Bray; Lior Pachter (2023). Expression reflects population structure [Dataset]. http://doi.org/10.1371/journal.pgen.1007841
    Explore at:
    pdf
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Brielin C. Brown; Nicolas L. Bray; Lior Pachter
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Population structure in genotype data has been extensively studied, and is revealed by looking at the principal components of the genotype matrix. However, no similar analysis of population structure in gene expression data has been conducted, in part because a naïve principal components analysis of the gene expression matrix does not cluster by population. We identify a linear projection that reveals population structure in gene expression data. Our approach relies on the coupling of the principal components of genotype to the principal components of gene expression via canonical correlation analysis. Our method is able to determine the significance of the variance in the canonical correlation projection explained by each gene. We identify 3,571 significant genes, only 837 of which had been previously reported to have an associated eQTL in the GEUVADIS results. We show that our projections are not primarily driven by differences in allele frequency at known cis-eQTLs and that similar projections can be recovered using only several hundred randomly selected genes and SNPs. Finally, we present preliminary work on the consequences for eQTL analysis. We observe that using our projection co-ordinates as covariates results in the discovery of slightly fewer genes with eQTLs, but that these genes replicate in GTEx matched tissue at a slightly higher rate.

  10. Replication data for: Diverse Correlation Structures in Microarray Gene...

    • datamed.org
    • dataverse.harvard.edu
    Updated Oct 8, 2007
    Cite
    (2007). Replication data for: Diverse Correlation Structures in Microarray Gene Expression Data [Dataset]. https://datamed.org/display-item.php?repository=0012&idName=ID&id=56d4b887e4b0e644d313513b
    Explore at:
    Dataset updated
    Oct 8, 2007
    Description

    It is well-known that correlations in microarray data represent a serious nuisance deteriorating the performance of gene selection procedures. This paper is intended to demonstrate that the correlation structure of microarray data provides a rich source of useful information. We discuss distinct correlation substructures revealed in microarray gene expression data by an appropriate ordering of genes. These substructures include stochastic proportionality of expression signals in a large percentage of all gene pairs, negative correlations hidden in ordered gene triples, and a long sequence of weakly dependent random variables associated with ordered pairs of genes. The reported striking regularities are of general biological interest and they also have far-reaching implications for theory and practice of statistical methods of microarray data analysis. We illustrate the latter point with a method for testing differential expression of non-overlapping gene pairs. While designed for testing a different null hypothesis, this method provides an order of magnitude more accurate control of type 1 error rate compared to conventional methods of individual gene expression profiling. In addition, this method is robust to the technical noise. Quantitative inference of the correlation structure has the potential to extend the analysis of microarray data far beyond currently practiced methods.

  11. Data sets for "Structure of molten NaCl and the decay of the...

    • researchdata.bath.ac.uk
    jpeg, txt
    Updated Aug 26, 2022
    Cite
    Philip Salmon; Anita Zeidler (2022). Data sets for "Structure of molten NaCl and the decay of the pair-correlations" [Dataset]. http://doi.org/10.15125/BATH-01165
    Explore at:
    jpeg, txt
    Dataset updated
    Aug 26, 2022
    Dataset provided by
    University of Bath
    Authors
    Philip Salmon; Anita Zeidler
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Dataset funded by
    Japan Society for the Promotion of Science
    Engineering and Physical Sciences Research Council
    Description

    Data sets used to prepare Figures 1-14 in the Journal of Chemical Physics article entitled "Structure of molten NaCl and the decay of the pair-correlations." The data sets refer to the measured and simulated structure and thermodynamic properties of molten NaCl.

  12. Structural estimates of the intergenerational education correlation...

    • resodate.org
    Updated Oct 6, 2025
    Cite
    Christian Belzil (2025). Structural estimates of the intergenerational education correlation (replication data) [Dataset]. https://resodate.org/resources/aHR0cHM6Ly9qb3VybmFsZGF0YS56YncuZXUvZGF0YXNldC9zdHJ1Y3R1cmFsLWVzdGltYXRlcy1vZi10aGUtaW50ZXJnZW5lcmF0aW9uYWwtZWR1Y2F0aW9uLWNvcnJlbGF0aW9u
    Explore at:
    Dataset updated
    Oct 6, 2025
    Dataset provided by
    ZBW Journal Data Archive
    ZBW
    Journal of Applied Econometrics
    Authors
    Christian Belzil
    Description

    Using a structural dynamic programming model, we investigate the relative importance of family background variables and individual specific abilities in explaining cross-sectional differences in schooling attainments and wages. Each type of ability is the sum of one component correlated with family background variables and a residual (orthogonal) component which is purely individual specific. Household background variables (especially parents' education) account for 68% of the explained cross-sectional variations in schooling attainments, while ability correlated with background variables accounts for 17% and pure individual specific ability accounts for 15%. Interestingly, individual differences in wages are mostly explained by pure individual specific abilities as they account for as much as 73% of the explained variations in wages. Family background variables account for only 19%, while ability endowments correlated with family background account for 8%.

  13. Dataset of books called The ovary : a correlation of structure and function...

    • workwithdata.com
    Updated Apr 17, 2025
    Cite
    Work With Data (2025). Dataset of books called The ovary : a correlation of structure and function in mammals [Dataset]. https://www.workwithdata.com/datasets/books?f=1&fcol0=book&fop0=%3D&fval0=The+ovary+%3A+a+correlation+of+structure+and+function+in+mammals
    Explore at:
    Dataset updated
    Apr 17, 2025
    Dataset authored and provided by
    Work With Data
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset is about books. It has 1 row and is filtered where the book is The ovary : a correlation of structure and function in mammals. It features 7 columns including author, publication date, language, and book publisher.

  14. Data from: Parameterizing the LISREL Model as a Correlation Structure Model...

    • tandf.figshare.com
    txt
    Updated May 14, 2025
    Cite
    Ke-Hai Yuan; Zhiyong Zhang (2025). Parameterizing the LISREL Model as a Correlation Structure Model for More Efficient Parameter Estimates and More Powerful Statistical Tests [Dataset]. http://doi.org/10.6084/m9.figshare.28410180.v1
    Explore at:
    txt
    Dataset updated
    May 14, 2025
    Dataset provided by
    Taylor & Francis
    Authors
    Ke-Hai Yuan; Zhiyong Zhang
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Most methods for structural equation modeling (SEM) have focused on the analysis of covariance matrices. However, “Historically, interesting psychological theories have been phrased in terms of correlation coefficients.” This might be because data in the social and behavioral sciences typically do not have predefined metrics. While proper methods for conducting correlation structure analysis have been developed, they emphasized either how to obtain consistent standard errors of parameter estimates or how to ensure that the model-implied matrix remains a correlation matrix. Motivated by the fundamental need for more efficient/accurate parameter estimates and greater power in statistical tests, this article explores advantages of correlation structure analysis over its conventional covariance counterpart. Issues related to reparameterization and placement of parameters are discussed. A new concept is introduced for comparing the efficiency/accuracy of parameter estimates that are not on the same scale. Via the analysis of many real datasets, meta results show that correlation structure analysis yields uniformly more accurate parameter estimates and more powerful statistical tests than its covariance-structure-analysis counterpart on parameters that are of substantive interest. The same pattern of results between the two model parameterizations is also found by Monte Carlo simulation. Issues related to correlation structure analysis and substantive elaboration of models that are not scale-invariant are discussed as well. The results are expected to promote technical and software developments of correlation structure analysis as well as its adoption in data analysis.

  15. A Bayesian Monte-Carlo Inversion of Spatial Auto-Correlation (SPAC) for...

    • data.usgs.gov
    • datasets.ai
    • +2more
    Updated Apr 5, 2019
    + more versions
    Cite
    Zhang Huajun; Pankow Kristine; Stephenson William J (2019). A Bayesian Monte-Carlo Inversion of Spatial Auto-Correlation (SPAC) for Near-Surface Vs Structure Applied to both Broadband and Geophone Data - Data release [Dataset]. http://doi.org/10.5066/P9OXYQST
    Explore at:
    Dataset updated
    Apr 5, 2019
    Dataset provided by
    United States Geological Survey
    Authors
    Zhang Huajun; Pankow Kristine; Stephenson William J
    License

    U.S. Government Works: https://www.usa.gov/government-works
    License information was derived automatically

    Time period covered
    Sep 3, 2007 - Aug 18, 2017
    Description

    The datasets for this investigation consist of microtremor array data collected at: 1) 18 sites in Salt Lake and Utah valleys, Utah, and 2) two sites as part of the Frontier Observatory for Research in Geothermal Energy (FORGE) near Milford, Utah. Each of the 18 sites in the Salt Lake and Utah valleys were acquired with four-sensor arrays with three-component (3C) sensors having flat response from 0.033 Hz to 50 Hz. The data acquired as part of the FORGE investigation used both 3C broadband and 5-Hz geophone sensors. Additional information on these datasets can be found in the supporting documentation provided in this data release as well as in the paper by Zhang and others (2019) that utilized these data.

  16. Data from: Can creative thinking predict academic success in medical...

    • search.dataone.org
    • datadryad.org
    Updated Apr 10, 2025
    Cite
    Marcellus Nealy; Takeo Higuchi; Hiroyuki Daida; Yuichi Tomiki; Dennis Dew (2025). Can creative thinking predict academic success in medical education? Correlating Torrance Test of Creative Thinking scores and five-year GPAs of Japanese medical students [Dataset]. http://doi.org/10.5061/dryad.79cnp5j6p
    Explore at:
    Dataset updated
    Apr 10, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Marcellus Nealy; Takeo Higuchi; Hiroyuki Daida; Yuichi Tomiki; Dennis Dew
    Description

    This study determined the correlation between creative thinking aptitude, measured by the Torrance Test of Creative Thinking–Figural (TTCT–F), and five-year academic achievement. The TTCT–F was administered to 135 first-year medical students at a Tokyo-based medical school in 2018. Participants’ academic records (annual GPAs over five years) were averaged, and data were analyzed in 2023. Pearson correlation coefficients examined the relationship between the TTCT–F Creativity Index and the five-year average GPA; multiple linear regression assessed the predictive value of TTCT–F components on GPA; canonical correlation analysis explored multivariate relationships. The Creativity Index demonstrated a weak, non-significant correlation with the five-year average GPA. Fluency, Originality, and Elaboration components were not significantly correlated, while Abstractness of Titles demonstrated a moderate positive correlation. Linear regression indicated that Abstractness of Titles signi...

    Participants: We conducted a retrospective cohort study, administering the Torrance Test of Creative Thinking Figural (TTCT–F) in 2018 as a proctored and timed test to a cohort of 135 first-year medical students at Juntendo University Faculty of Medicine in Chiba, Japan. The participants took the test simultaneously and were between the ages of 18 and 23 years old at the time. The cohort comprised 42 women (31.1%) and 93 men (68.9%) (see Table 1). The study was approved by the Juntendo University Institutional Review Board and conducted in accordance with ethical guidelines to ensure participant confidentiality and data anonymity. All participants provided informed consent before participation and allowed access to their academic records for research purposes. No exclusion criteria were applied, and all first-year medical students in the cohort were eligible to participate. Data were collected and stored in compliance with ethical guidelines to ensure confidentiality and anonymity. Instr...

    Can creative thinking predict academic success in medical education? Correlating Torrance Test of Creative Thinking scores and five-year GPAs of Japanese medical students

    https://doi.org/10.5061/dryad.79cnp5j6p

    Description of the data and file structure

    Data Set

    The data set shows the following:

    1. An anonymous student number is assigned to each student from 1 to 135.
    2. Gender
    3. Torrance Test of Creative Thinking Figural (TTCT-F) Fluency Score
    4. TTCT-F Originality Score
    5. TTCT-F Elaboration Score
    6. TTCT-F Abstractness of Titles Score
    7. TTCT-F Premature Closure Score
    8. TTCT-F Sum Score
    9. TTCT-F Average Standard Score
    10. TTCT-F Creativity Index Score
    11. First year of medical school (M1) GPA - Fifth year of medical school (M5) GPA
    12. Average 5-year GPA

    TTCT-F scores and categories are described above in the methods section.

    GPA = Grade Point Average.

  17. Data from: Digital image correlation data from analogue modelling...

    • dataservices.gfz-potsdam.de
    Updated 2021
    Cite
    Maria Michail; Michael Rudolf; Matthias Rosenau; Alberto Riva; Piero Gianolla; Massimo Coltorti (2021). Digital image correlation data from analogue modelling experiments addressing magma emplacement along simple shear and transtensional fault zones [Dataset]. http://doi.org/10.5880/gfz.4.1.2021.004
    Explore at:
    Dataset updated
    2021
    Dataset provided by
    datacite
    GFZ Data Services
    Authors
    Maria Michail; Michael Rudolf; Matthias Rosenau; Alberto Riva; Piero Gianolla; Massimo Coltorti
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This data set includes the results of digital image correlation analysis applied to nine experiments (Table 1) on magma-tectonic interaction performed at the Helmholtz Laboratory for Tectonic Modelling (HelTec) of the GFZ German Research Centre for Geosciences in Potsdam in the framework of EPOS transnational access activities in 2017. The models use silicone oil (PDMS G30M, Rudolf et al., 2016) and quartz sand (G12, Rosenau et al., 2018) to simulate pre-, syn- and post-tectonic intrusion of granitic magma into upper crustal shear zones of simple shear and transtensional (15° obliquity) kinematics. Three reference experiments (simple shear, transtension, intrusion) are also reported. Detailed descriptions of the experiments can be found in Michail et al. (submitted), to which this data set is a supplement. The models have been monitored by means of digital image correlation (DIC) analysis including Particle Image Velocimetry (PIV; Adam et al., 2005) and Structure from Motion photogrammetry (SfM; Donnadieu et al., 2003; Westoby et al., 2012). DIC analysis yields quantitative model surface deformation information by means of 3D surface topography and displacements from which surface strain has been calculated. The data presented here are visualized as surface deformation maps and movies, as well as digital elevation and intrusion models. The results of a shape analysis of the model plutons are also provided.

  18. Pilot study of gas production analysis methods applied to Cottageville field...

    • data.wu.ac.at
    html
    Updated Sep 29, 2016
    Cite
    (2016). Pilot study of gas production analysis methods applied to Cottageville field [Dataset]. https://data.wu.ac.at/odso/edx_netl_doe_gov/MDA2M2I4NmYtNTBkNS00MGM1LTlkYWQtZjQ5NTcwMjMyZGFj
    Explore at:
    html
    Dataset updated
    Sep 29, 2016
    Description

    Gas production data from 63 wells in the Cottageville Gas Field, producing from Devonian shales, are studied in relationship to structure above and below producing horizons, isopach data and dip of producing shales, and basement structure trends. Gas production data are studied from several aspects including highest accumulated production, mean annual production, initial well pressure, and calculated loss ratio values for four different time periods. A trend correlation of these parameters is presented. The initial pressure trends correlate with all geological parameters, i.e., Devonian shale dip and strike, the 40 to 50° NE fracture facies trend, structure on the base of the Huron, structure on the top of the Onondaga, and the basement magnetic density data. Production data trends show greatest correlation with structure on the top of the Onondaga and with fracture facies trends from the Baler well. Production decline data in terms of loss ratio values show trends correlating with all geologic parameters except the Onondaga. Two loss ratio maps correlate with the structure on the bottom of the Huron. The strike of Onondaga structure correlates with the 40 to 50° NE fracture facies trend. These parameters may be generally viewed as follows: the production maps represent free gas pockets and migration-accumulation trends; the loss ratios are possible permeability and migration trend indicators; and the geologic parameters are possible constraints or causative agents. The lack of correlation of geologic parameters with production data trends a few degrees west of north may be suggestive of a fault or faults in that direction, providing the correlative causative agent. This is not an unreasonable possibility from the production data maps. It is concluded that this approach could be useful in gas exploration and development evaluation of Appalachian Devonian shale gas fields. Similar relationships will be examined in the Eastern Kentucky Gas Field(s) Study presently in progress.

  19. Experimental data on plastered rubble stone masonry walls

    • experiments.builtenvdata.eu
    • zenodo.org
    Updated Oct 24, 2024
    Cite
    Radhakrishna Achanta; Katrin Beyer; Michele Godio; Amir Rezaie (2024). Experimental data on plastered rubble stone masonry walls [Dataset]. http://doi.org/10.5281/zenodo.5052675
    Explore at:
    Dataset updated
    Oct 24, 2024
    Dataset provided by
    EUCENTRE
    Authors
    Radhakrishna Achanta; Katrin Beyer; Michele Godio; Amir Rezaie
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This repository contains data from experimental tests on plastered rubble stone masonry walls conducted at École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland.

  20. Data from: Single-particle structure determination by correlations of...

    • cxidb.org
    Updated Mar 25, 2013
    Cite
    D. Starodub (2013). Single-particle structure determination by correlations of snapshot X-ray diffraction patterns [Dataset]. http://doi.org/10.11577/1096925
    Explore at:
    Dataset updated
    Mar 25, 2013
    Authors
    D. Starodub
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    This deposition includes the diffraction images generated by the paired polystyrene spheres in random orientations. These images were used to determine and phase the single particle diffraction volume from their autocorrelation functions.
