Description of the data transformation methods for compositional data and forecasting models.
Principal component analysis (PCA) is a popular dimension-reduction method to reduce the complexity and obtain the informative aspects of high-dimensional datasets. When the data distribution is skewed, data transformation is commonly used prior to applying PCA. Such transformation is usually obtained from previous studies, prior knowledge, or trial-and-error. In this work, we develop a model-based method that integrates data transformation in PCA and finds an appropriate data transformation using the maximum profile likelihood. Extensions of the method to handle functional data and missing values are also developed. Several numerical algorithms are provided for efficient computation. The proposed method is illustrated using simulated and real-world data examples. Supplementary materials for this article are available online.
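The paper's model-based estimator is not reproduced here; as a minimal sketch of the general idea, the snippet below chooses a Box-Cox power transformation for each skewed variable by maximizing its (profile) log-likelihood and then applies PCA to the transformed data. The data and parameter choices are illustrative only.

```python
# Illustrative sketch only (not the paper's model-based method): pick a Box-Cox
# power transformation per skewed column by maximum (profile) likelihood,
# then run PCA on the transformed, standardized data.
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 5))   # hypothetical skewed data

# scipy's boxcox estimates lambda by maximizing the log-likelihood when lmbda is omitted
X_t = np.column_stack([stats.boxcox(X[:, j])[0] for j in range(X.shape[1])])
X_t = (X_t - X_t.mean(axis=0)) / X_t.std(axis=0)

pca = PCA(n_components=2).fit(X_t)
print(pca.explained_variance_ratio_)
```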
List of data transformation methods.
To achieve true data interoperability is to eliminate format and data model barriers, allowing you to seamlessly access, convert, and model any data, independent of format. The ArcGIS Data Interoperability extension is based on the powerful data transformation capabilities of the Feature Manipulation Engine (FME), giving you the data you want, when and where you want it. In this course, you will learn how to leverage the ArcGIS Data Interoperability extension within ArcCatalog and ArcMap, enabling you to directly read, translate, and transform spatial data according to your needs. In addition to components that allow you to work openly with a multitude of formats, the extension also provides a complex data model solution with a level of control that would otherwise require custom software. After completing this course, you will be able to:
Recognize when you need to use the Data Interoperability tool to view or edit your data.
Choose and apply the correct method of reading data with the Data Interoperability tool in ArcCatalog and ArcMap.
Choose the correct Data Interoperability tool and use it to convert your data between formats.
Edit a data model, or schema, using the Spatial ETL tool.
Perform any desired transformations on your data's attributes and geometry using the Spatial ETL tool.
Verify your data transformations before, during, and after a translation by inspecting your data.
Apply best practices when creating a workflow using the Data Interoperability extension.
Data transformation methods, hyperparameter optimization and feature selection used in prior studies.
According to our latest research, the global dbt for Regulated Data Transformations market size reached USD 1.32 billion in 2024, reflecting the rising adoption of compliant data transformation solutions across multiple regulated industries. The market is projected to grow at a CAGR of 19.7% through the forecast period, reaching USD 4.85 billion by 2033. This robust expansion is driven primarily by increasing regulatory scrutiny, the surge in data volumes, and the critical need for transparent, auditable, and secure data transformation processes in sectors such as healthcare, BFSI, and government.
The growth of the dbt for Regulated Data Transformations market is strongly influenced by the escalating complexity of regulatory requirements worldwide. Organizations in highly regulated industries are under constant pressure to ensure that their data transformation workflows are not only efficient but also fully traceable and compliant with standards such as GDPR, HIPAA, and SOX. The adoption of dbt (data build tool) frameworks provides a structured, auditable, and version-controlled approach to data transformation, which is essential for passing audits and avoiding hefty fines. Furthermore, the proliferation of data privacy laws and the growing emphasis on data governance have made it imperative for organizations to invest in compliant data transformation solutions, thereby fueling market demand.
Another significant growth factor is the rapid digital transformation initiatives being undertaken by enterprises globally. As organizations migrate their data infrastructure to the cloud and adopt advanced analytics, the volume and diversity of data being handled have increased exponentially. This surge necessitates robust data transformation pipelines that can handle large-scale, complex, and sensitive data while ensuring compliance with industry-specific regulations. dbt’s ability to automate, document, and test data transformations in a transparent manner makes it a preferred choice for organizations seeking to modernize their data operations without compromising regulatory compliance. The integration of dbt with leading cloud data platforms further accelerates its adoption, especially among enterprises prioritizing scalability and agility in their data ecosystems.
The increasing awareness of the risks associated with data breaches and non-compliance is also a key driver for the dbt for Regulated Data Transformations market. High-profile data breaches and regulatory penalties have underscored the importance of robust data governance and compliance-focused data management practices. dbt’s open-source and enterprise offerings empower organizations to implement standardized, repeatable, and auditable data transformation processes, mitigating the risk of human error and unauthorized data manipulation. This, in turn, enhances organizational resilience and builds stakeholder trust, further supporting market growth.
From a regional perspective, North America currently dominates the dbt for Regulated Data Transformations market, accounting for nearly 41% of the global revenue in 2024. This leadership position is attributed to the presence of stringent regulatory frameworks, a mature data analytics ecosystem, and a high concentration of large enterprises in the region. Europe follows closely, driven by the enforcement of GDPR and other data protection mandates. The Asia Pacific region is expected to witness the fastest growth over the forecast period, with a projected CAGR of 22.4%, fueled by rapid digitalization, expanding regulatory frameworks, and increasing investments in data infrastructure across emerging economies.
The dbt for Regulated Data Transformations market is segmented by component into Software and Services. The software segment currently holds the largest market share, accounting for more than 60% of total revenue in 2024. This dominance is attributed to the widespread adoption of dbt’s core software platform, which enables organizations to design, document, and test data transformation pipelines in a transparent, auditable manner. The software’s open-source roots and robust integration capabilities with leading cloud data warehouses have made it a staple in regulated industries seeking scalable and compliant data transformation solutions.
According to our latest research, the global manufacturing data transformation platform market size reached USD 5.7 billion in 2024, reflecting a robust demand for digital solutions in the sector. The market is anticipated to achieve a value of USD 18.2 billion by 2033, expanding at a striking CAGR of 13.6% during the forecast period. This significant growth is primarily driven by the increasing adoption of Industry 4.0 practices, which emphasize automation, data exchange, and advanced analytics in manufacturing environments. Enterprises across the globe are leveraging data transformation platforms to enhance operational efficiency, ensure data integrity, and enable real-time decision-making, thus fueling the expansion of this dynamic market.
One of the primary growth factors for the manufacturing data transformation platform market is the escalating need for real-time data integration and analytics within manufacturing operations. As manufacturers strive to maintain competitiveness in a rapidly evolving landscape, the ability to collect, process, and analyze data from diverse sources such as IoT devices, sensors, and legacy systems has become indispensable. Data transformation platforms provide the necessary infrastructure to convert raw data into actionable insights, enabling predictive maintenance, process optimization, and agile supply chain management. The integration of artificial intelligence and machine learning further amplifies the value proposition of these platforms by facilitating advanced analytics and automation, ultimately leading to reduced downtime and improved productivity across manufacturing plants.
Another critical driver is the increasing regulatory pressure and focus on quality management across various manufacturing industries. Compliance with stringent standards such as ISO, FDA, and automotive-specific regulations necessitates comprehensive data documentation, traceability, and reporting. Manufacturing data transformation platforms empower organizations to automate compliance processes, ensure data accuracy, and generate audit-ready reports with minimal manual intervention. Moreover, the growing trend towards digital twins and smart factories is pushing manufacturers to invest in platforms that can seamlessly integrate with enterprise resource planning (ERP), manufacturing execution systems (MES), and other digital tools. This interconnected ecosystem not only enhances operational transparency but also supports continuous improvement initiatives, propelling market growth.
The surge in demand for cloud-based deployment models is also shaping the trajectory of the manufacturing data transformation platform market. Cloud solutions offer unparalleled scalability, flexibility, and cost-efficiency, making them an attractive option for both large enterprises and small and medium-sized manufacturers. The ability to access and manage data remotely, coupled with robust security features, has accelerated the adoption of cloud platforms, especially in the wake of global disruptions such as the COVID-19 pandemic. As manufacturers increasingly embrace remote monitoring, collaboration, and digital workflows, the reliance on data transformation platforms is expected to intensify, further bolstering market expansion.
From a regional perspective, Asia Pacific is emerging as a pivotal market for manufacturing data transformation platforms, driven by rapid industrialization, government initiatives supporting digital transformation, and the presence of a large manufacturing base. North America and Europe continue to exhibit strong demand, fueled by advanced manufacturing practices and high investment in automation technologies. Meanwhile, Latin America and the Middle East & Africa are witnessing gradual adoption, supported by increasing awareness and modernization of manufacturing infrastructures. The global landscape is thus characterized by diverse adoption patterns, with each region contributing uniquely to the overall market growth.
The manufacturing data transformation platform market is segmented by component into software and services, each playing a crucial role in the ecosystem. Software solutions form the backbone of data transformation initiatives, offering functionalities such as data integration, cleansing, validation, and advanced analytics. These platforms are designed to handle the complexities of heterogeneous manufacturing environments.
Metamodels define a foundation for describing software system interfaces that can be used during software or data integration processes. The report is part of the BIZYCLE project, which examines the applicability of model-based methods, technologies and tools to large-scale industrial software and data integration scenarios. The developed metamodels are thus part of the overall BIZYCLE process, comprising semantic, structural, communication, behavior and property analysis, and aim at facilitating and improving standard integration practice. Therefore, the project framework is briefly introduced first, followed by the detailed metamodel and transformation descriptions as well as motivating and illustrative scenarios.
Functional diversity (FD) is an important component of biodiversity that quantifies differences in functional traits between organisms. However, FD studies are often limited by the availability of trait data, and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data are incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species, respectively). We ranked plots by FD values calculated from the full datasets and then from increasingly incomplete datasets, and compared the rankings to assess the accuracy of FD indices when applied to data with increasing amounts of missing values. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of removing trait data per plot or across the whole pool of species. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. However, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data.
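The study uses the R package "traitor"; purely as an illustration of the simulation design (remove trait values, recompute a functional diversity index, compare plot rankings), here is a hedged Python sketch with toy data, a deliberately simple FD index (mean pairwise trait distance), and illustrative missing-data fractions.

```python
# Illustrative sketch only (the study uses the R package "traitor"): remove a
# growing fraction of trait values and check how stable the plot ranking of a
# simple FD index (mean pairwise trait distance) remains.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_plots, n_species, n_traits = 12, 62, 4            # sizes borrowed from the plant dataset
traits = rng.normal(size=(n_species, n_traits))
presence = rng.random((n_plots, n_species)) < 0.3    # hypothetical community matrix

def fd(plot_mask, trait_matrix):
    t = trait_matrix[plot_mask]
    t = t[~np.isnan(t).any(axis=1)]                  # keep species with complete traits
    return pdist(t).mean() if len(t) > 2 else np.nan

full = [fd(p, traits) for p in presence]
for frac in (0.05, 0.15, 0.30):                      # fraction of trait values removed
    t_miss = traits.copy()
    t_miss[rng.random(traits.shape) < frac] = np.nan
    reduced = [fd(p, t_miss) for p in presence]
    rho, _ = spearmanr(full, reduced, nan_policy="omit")
    print(f"{frac:.0%} missing -> rank correlation with full data: {rho:.2f}")
```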
CoDa-RMSE and CoDa-MAPE (in brackets) values of NNETTS combined with different data transformation methods in the test set.
Background: Efficient transformation and regeneration methods are a priority for the successful application of genetic engineering to vegetatively propagated plants such as grape. The current methods for the production of transgenic grape plants are based on Agrobacterium-mediated transformation followed by regeneration from embryogenic callus. However, grape embryogenic calli are laborious to establish and the phenotype of the regenerated plants can be altered.
Results: Transgenic grape plants (V. vinifera, table-grape cultivars Silcora and Thompson Seedless) were produced using a method based on regeneration via organogenesis. In vitro proliferating shoots were cultured in the presence of increasing concentrations of N6-benzyl adenine. The apical dome of the shoot was removed at each transplantation which, after three months, produced meristematic bulk tissue characterized by a strong capacity to differentiate adventitious shoots. Slices prepared from the meristematic bulk were used for Agrobacterium-mediated transformation of grape plants with the gene DefH9-iaaM. After rooting on kanamycin-containing media and greenhouse acclimatization, transgenic plants were transferred to the field. At the end of the first year of field cultivation, DefH9-iaaM grape plants were phenotypically homogeneous and did not show any morphological alterations in vegetative growth. The expression of the DefH9-iaaM gene was detected in transgenic flower buds of both cultivars.
Conclusions: The phenotypic homogeneity of the regenerated plants highlights the validity of this method for both propagation and genetic transformation of table grape cultivars. Expression of the DefH9-iaaM gene takes place in young flower buds of transgenic plants from both grape cultivars.
According to our latest research, the global feature transformation platform market size reached USD 2.1 billion in 2024, with a robust year-over-year expansion driven by surging demand for advanced data analytics and machine learning capabilities. The market is expected to grow at a compelling CAGR of 18.7% from 2025 to 2033, reaching a projected value of USD 10.3 billion by 2033. This growth is primarily fueled by the increasing adoption of artificial intelligence (AI) and machine learning (ML) across diverse industry verticals, as organizations seek to extract actionable insights from complex and high-volume datasets, optimize business operations, and enhance customer experiences.
A key growth factor for the feature transformation platform market is the exponential rise in data generation across industries such as BFSI, healthcare, retail, and manufacturing. As organizations accumulate vast amounts of structured and unstructured data, the need for sophisticated tools to preprocess, transform, and engineer features becomes paramount. Feature transformation platforms enable data scientists and engineers to automate and streamline data preparation, ensuring higher quality inputs for ML models. This not only accelerates the model development lifecycle but also significantly improves the accuracy and reliability of predictive analytics. The proliferation of IoT devices and digital transformation initiatives is further amplifying the demand for these platforms, as businesses strive to harness real-time data for strategic decision-making.
Another significant driver is the increasing complexity of machine learning workflows, which necessitates advanced feature engineering and transformation capabilities. Traditional data preparation methods are often labor-intensive and prone to human error, resulting in suboptimal model performance. Feature transformation platforms address these challenges by providing automated, scalable, and reproducible processes for data preprocessing and feature engineering. These platforms integrate seamlessly with existing data pipelines and ML frameworks, empowering organizations to build more robust and interpretable models. The integration of cutting-edge technologies such as deep learning, natural language processing, and computer vision within these platforms is expanding their applicability across new use cases, further propelling market growth.
The growing emphasis on regulatory compliance and data governance is also contributing to the expansion of the feature transformation platform market. Industries such as BFSI and healthcare are subject to stringent data privacy and security regulations, requiring organizations to maintain transparency and traceability in data processing workflows. Feature transformation platforms offer comprehensive audit trails, version control, and data lineage features, enabling organizations to meet compliance requirements while maintaining agility in model development. As data privacy concerns continue to intensify, the adoption of secure and compliant feature transformation solutions is expected to rise, creating new avenues for market growth.
From a regional perspective, North America currently dominates the feature transformation platform market, accounting for the largest revenue share in 2024, followed closely by Europe and Asia Pacific. The strong presence of leading technology providers, early adoption of AI/ML technologies, and robust investment in digital infrastructure are key factors driving market growth in these regions. Asia Pacific is emerging as a high-growth market, with countries such as China, India, and Japan witnessing rapid digital transformation and increased focus on AI-driven innovation. Latin America and the Middle East & Africa are also expected to experience steady growth, supported by expanding IT ecosystems and rising awareness of data-driven decision-making.
In the realm of data analytics, the role of a Data Preparation Platform is becoming increasingly pivotal. These platforms are essential for transforming raw data into a format that is suitable for analysis, ensuring that data is clean, consistent, and ready for use in machine learning models. By automating the data preparation process, organizations can significantly reduce the time and effort required to prepare data.
A recently discovered universal rank-based matrix method to extract trends from noisy time series is described in Ierley and Kostinski (2019), but the formula for the output matrix elements, implemented there as open-access supplementary MATLAB code, is O(N^4), with N the matrix dimension. This can become prohibitively large for time series with hundreds of sample points or more. Based on recurrence relations, here we derive a much faster O(N^2) algorithm and provide code implementations in MATLAB and in open-source Julia. In some cases one has the output matrix and needs to solve an inverse problem to obtain the input matrix. A fast algorithm and code for this companion problem, also based on the recurrence relations, are given. Finally, in the narrower but common domains of (i) trend detection and (ii) parameter estimation of a linear trend, users require not the individual matrix elements but simply their accumulated mean value. For this latter case we provide a yet faster O(N) heuristic approximation that relies on a series of rank-one matrices. These algorithms are illustrated on a time series of high-energy cosmic rays with N > 4 x 10^4.
Although metagenomic sequencing is now the preferred technique to study microbiome-host interactions, analyzing and interpreting microbiome sequencing data presents challenges primarily attributed to the statistical specificities of the data (e.g., sparsity, over-dispersion, compositionality, inter-variable dependency). This mini review explores preprocessing and transformation methods applied in recent human microbiome studies to address microbiome data analysis challenges. Our results indicate a limited adoption of transformation methods targeting the statistical characteristics of microbiome sequencing data. Instead, there is a prevalent usage of relative and normalization-based transformations that do not account for the specific attributes of microbiome data. The information on preprocessing and transformations applied to the data before analysis was incomplete or missing in many publications, leading to reproducibility concerns, comparability issues, and questionable results. We hope this mini review will provide researchers and newcomers to the field of human microbiome research with an up-to-date point of reference for various data transformation tools and assist them in choosing the most suitable transformation method based on their research questions, objectives, and data characteristics.
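One transformation frequently discussed in this literature for compositional count data is the centered log-ratio (CLR). The sketch below is a minimal illustration only, using a simple pseudocount to handle the zeros typical of sparse sequencing tables; the toy counts are hypothetical.

```python
# Minimal sketch of a compositionally aware transformation for microbiome
# counts: the centered log-ratio (CLR), with a pseudocount to handle zeros.
import numpy as np

def clr(counts, pseudocount=0.5):
    """counts: samples x taxa matrix of non-negative counts."""
    x = counts + pseudocount                              # avoid log(0) in sparse data
    log_x = np.log(x)
    # subtracting the per-sample mean of the logs divides by the geometric mean
    return log_x - log_x.mean(axis=1, keepdims=True)

counts = np.array([[120,   0, 30,  5],
                   [ 10, 200,  0, 40]])
print(clr(counts))
```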
Abstract In this paper, the author presents fundamentally new methods for transforming the value of commodities into original and equilibrium prices of production using Karl Marx's five-sector tables from the third volume of Capital. Mathematical verification of the methods is performed using sequential iterations and the Wolfram Mathematica program. For the first time, the method of inverse transformation of production prices of commodities into value prices is presented. It is proved that the pricing systems based on the principles of value and price of production are not mutually exclusive. They complement each other, representing a single whole. A comprehensive solution to the transformation problem shows that Karl Marx does not have the mistakes attributed to him by critics. JEL classification: B14, B16, B24, E11, E20, E21, E22, P16, P17 Keywords: transformation problem, original transformation, the individual sphere of production, the equilibrium price of production, inverse transformation
Abstract In this paper, the author presents fundamentally new methods for transforming the value of commodities into original and equilibrium prices of production using Marx's five-sphere tables from the third volume of Capital. Mathematical verification of the models is performed by the method of sequential iterations using the Wolfram Mathematica program. The paper also presents the method of inverse transformation of prices of production of commodities into values. It is proved that pricing systems based on the principles of value and price of production are not mutually exclusive; they complement each other, representing a single whole. A comprehensive solution to the transformation problem shows that Marx does not have the mistakes attributed to him by critics. JEL classification: B14, B16, B24, E11, E20, E21, E22, P16, P17 Keywords: Karl Marx, transformation problem, mechanisms, original transformation, individual sphere of production, equilibrium price of production, real wages, direct and inverse transformation, iteration methods, mathematical verification.
Abstract Most of the commonly used fast Fourier transform subroutines cannot handle large data matrices because of the restriction imposed by the system's core memory. In this paper we present a two-dimensional FFT program (SW2DFFT) and its long write-up. SW2DFFT is a Fortran program capable of handling large data matrices, both square and rectangular. The data matrix is stored externally in direct-access mass storage. The program uses a stepwise approach in computing the large matrices based on the...
Title of program: SW2DFFT Catalogue Id: ABFB_v1_0
Nature of problem Any problem that requires Fourier Transformation of a large 2-D data matrix.
Versions of this program held in the CPC repository in Mendeley Data ABFB_v1_0; SW2DFFT; 10.1016/0010-4655(89)90108-2
This program has been imported from the CPC Program Library held at Queen's University Belfast (1969-2019)
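The CPC program itself is Fortran and works out-of-core; purely as an illustration of the row-column decomposition that makes such stepwise, block-by-block 2-D FFTs possible, here is a small NumPy sketch (not the SW2DFFT code).

```python
# Illustration only (SW2DFFT is a Fortran, out-of-core program): a 2-D FFT
# decomposes into 1-D FFTs along the rows followed by 1-D FFTs along the
# columns, which is what allows a large matrix to be processed in passes.
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 512))

rowwise = np.fft.fft(a, axis=1)            # pass 1: transform each row
two_d = np.fft.fft(rowwise, axis=0)        # pass 2: transform each column

assert np.allclose(two_d, np.fft.fft2(a))  # matches the direct 2-D transform
```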
Sufficient dimension reduction (SDR) techniques have proven to be very useful data analysis tools in various applications. Underlying many SDR techniques is a critical assumption that the predictors are elliptically contoured. When this assumption appears to be wrong, practitioners usually try variable transformation such that the transformed predictors become (nearly) normal. The transformation function is often chosen from the log and power transformation family, as suggested in the celebrated Box–Cox model. However, any parametric transformation can be too restrictive, causing the danger of model misspecification. We suggest a nonparametric variable transformation method after which the predictors become normal. To demonstrate the main idea, we combine this flexible transformation method with two well-established SDR techniques, sliced inverse regression (SIR) and inverse regression estimator (IRE). The resulting SDR techniques are referred to as TSIR and TIRE, respectively. Both simulation and real data results show that TSIR and TIRE have very competitive performance. Asymptotic theory is established to support the proposed method. The technical proofs are available as supplementary materials.
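The paper's TSIR/TIRE estimators are not reproduced here; as a rough sketch of the underlying idea, the snippet below applies a rank-based normal-score (quantile) transformation to each predictor, standing in for the paper's nonparametric transformation, and then runs a basic sliced inverse regression. The data-generating model, slicing choices, and function names are illustrative assumptions.

```python
# Rough illustration (not the paper's TSIR estimator): transform each predictor
# nonparametrically to normal scores, then apply a basic sliced inverse
# regression (SIR) to the transformed predictors.
import numpy as np
from sklearn.preprocessing import QuantileTransformer

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Basic SIR: whiten X, average it within slices of sorted y, and take the
    top eigenvectors of the weighted covariance of the slice means."""
    n, p = X.shape
    mu, Sigma = X.mean(0), np.cov(X, rowvar=False)
    L = np.linalg.cholesky(Sigma)
    Z = np.linalg.solve(L, (X - mu).T).T               # whitened predictors
    order = np.argsort(y)
    M = np.zeros((p, p))
    for chunk in np.array_split(order, n_slices):
        m = Z[chunk].mean(0)
        M += (len(chunk) / n) * np.outer(m, m)
    _, vecs = np.linalg.eigh(M)
    beta = np.linalg.solve(L.T, vecs[:, -n_dirs:])     # map back to the original scale
    return beta / np.linalg.norm(beta, axis=0)

rng = np.random.default_rng(0)
X_raw = rng.lognormal(size=(500, 4))                   # skewed, non-elliptical predictors
y = np.log1p(X_raw[:, 0]) + 0.1 * rng.standard_normal(500)

# nonparametric transform to (near) normal scores before SIR -- the "T" in TSIR
X_t = QuantileTransformer(output_distribution="normal", n_quantiles=200).fit_transform(X_raw)
print(sir_directions(X_t, y).ravel())
```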
This Application Programming Interface (API) provides instant conversion between HK 1980 Grid Coordinates (Northing and Easting) and WGS84 (ITRF96) Geodetic Coordinates (Latitude and Longitude). The conversion methods, parameters and formulas used in the coordinate conversion tool provided in this API are maintained by the Survey and Mapping Office, Lands Department. It is only applicable to coordinates within Hong Kong. Users SHOULD NOT use the results for applications requiring precise point positions. Transformation between datums does not improve accuracy. In most cases, the transformed coordinates would be less accurate, because errors in the transformation and projection computation are added to the results. Please seek advice from professional land surveyors. For enquiries, please contact the Geodetic Survey Section, Survey and Mapping Office, Lands Department. For details, please refer to the User Manual (English only): http://www.geodetic.gov.hk/transform/tformAPI_manual.pdf
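The snippet below is not the Lands Department API; it is an offline sketch of an equivalent conversion with pyproj, assuming EPSG:2326 for the HK 1980 Grid System and EPSG:4326 for WGS84 geodetic coordinates. The sample coordinate is hypothetical and results will differ slightly from the official tool.

```python
# Offline illustration only (not the Lands Department API): an approximate
# HK 1980 Grid -> WGS84 conversion with pyproj, assuming EPSG:2326 / EPSG:4326.
from pyproj import Transformer

# always_xy=True means (easting, northing) in and (longitude, latitude) out
transformer = Transformer.from_crs("EPSG:2326", "EPSG:4326", always_xy=True)

easting, northing = 836055.0, 819266.0      # hypothetical HK1980 grid point
lon, lat = transformer.transform(easting, northing)
print(f"lat={lat:.6f}, lon={lon:.6f}")
```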
Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). An Exploratory Data Analysis (EDA) comprises a set of statistical and data mining procedures to describe data. We ran an EDA to provide statistical facts and inform conclusions. The mined facts provide arguments that inform the Systematic Literature Review of DL4SE.
The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers to the proposed research questions and to formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships in the Deep Learning literature reported in Software Engineering. These hidden relationships are collected and analyzed to illustrate the state of the art of DL techniques employed in the software engineering context.
Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD, process (Fayyad et al., 1996). The KDD process extracts knowledge from the DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:
Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organized the data into the 35 features or attributes that you find in the repository. In fact, we manually engineered features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.
Preprocessing. The preprocessing applied consisted of transforming the features into the correct type (nominal), removing outliers (papers that do not belong to DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalized the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”. “Other Metrics” refers to unconventional metrics found during the extraction. Similarly, the same normalization was applied to other features such as “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the papers by the data mining tasks or methods.
Transformation. In this stage, we did not apply any data transformation method except for the clustering analysis. We performed a Principal Component Analysis to reduce the 35 features to 2 components for visualization purposes. Furthermore, PCA also allowed us to identify the number of clusters that exhibits the maximum reduction in variance; in other words, it helped us identify the number of clusters to be used when tuning the explainable models (a minimal sketch of this step is shown after the stage list below).
Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented towards uncovering hidden relationships among the extracted features (Correlations and Association Rules) and categorizing the DL4SE papers for a better segmentation of the state of the art (Clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.
Interpretation/Evaluation. We used the Knowledge Discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes. This reasoning process produced an argument support analysis (see this link).
We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.
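The actual pipelines live in the RapidMiner repository; purely as an illustrative Python sketch of the Transformation step described above, the snippet below projects hypothetical one-hot paper features onto two principal components and inspects how within-cluster variance drops as the number of clusters grows, which is one way to pick a cluster count.

```python
# Illustrative sketch only (the published analysis was done in RapidMiner):
# project 35 hypothetical binary paper features to 2 principal components and
# look for the elbow in within-cluster variance to choose a cluster count.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
papers = rng.integers(0, 2, size=(128, 35))      # hypothetical encoded paper features

components = PCA(n_components=2).fit_transform(papers)

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(components)
    print(k, round(km.inertia_, 1))              # look for the elbow in this curve
```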
Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.
Support = the number of occurrences in which the statement is true, divided by the total number of statements.
Confidence = the support of the statement divided by the number of occurrences of the premise.
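As a tiny worked illustration of these two measures, using a hypothetical rule ("Supervised Learning implies irreproducible") on toy records rather than the paper's extracted rules:

```python
# Tiny worked example of the support/confidence definitions above, on toy data.
records = [
    {"learning": "supervised", "reproducible": False},
    {"learning": "supervised", "reproducible": False},
    {"learning": "supervised", "reproducible": True},
    {"learning": "reinforcement", "reproducible": True},
    {"learning": "unsupervised", "reproducible": False},
]

premise = lambda r: r["learning"] == "supervised"
conclusion = lambda r: not r["reproducible"]

both = sum(1 for r in records if premise(r) and conclusion(r))
support = both / len(records)                                 # rule holds in 2 of 5 records
confidence = both / sum(1 for r in records if premise(r))     # 2 of the 3 supervised papers
print(f"support={support:.2f}, confidence={confidence:.2f}")
```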