60 datasets found
  1. 02.2 Transforming Data Using Extract, Transform, and Load Processes

    • training-iowadot.opendata.arcgis.com
    • hub.arcgis.com
    Updated Feb 18, 2017
    Cite
    Iowa Department of Transportation (2017). 02.2 Transforming Data Using Extract, Transform, and Load Processes [Dataset]. https://training-iowadot.opendata.arcgis.com/documents/bcf59a09380b4731923769d3ce6ae3a3
    Dataset updated
    Feb 18, 2017
    Dataset authored and provided by
    Iowa Department of Transportation
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Achieving true data interoperability means eliminating format and data model barriers, so that you can seamlessly access, convert, and model any data, independent of format. The ArcGIS Data Interoperability extension is based on the data transformation capabilities of the Feature Manipulation Engine (FME), giving you the data you want, when and where you want it.

    In this course, you will learn how to use the ArcGIS Data Interoperability extension within ArcCatalog and ArcMap to directly read, translate, and transform spatial data according to your own needs. In addition to components that let you work openly with a multitude of formats, the extension provides a complex data model solution with a level of control that would otherwise require custom software.

    After completing this course, you will be able to:

    • Recognize when you need to use the Data Interoperability tool to view or edit your data.
    • Choose and apply the correct method of reading data with the Data Interoperability tool in ArcCatalog and ArcMap.
    • Choose the correct Data Interoperability tool and use it to convert your data between formats.
    • Edit a data model, or schema, using the Spatial ETL tool.
    • Perform any desired transformations on your data's attributes and geometry using the Spatial ETL tool.
    • Verify your data transformations before, during, and after a translation by inspecting your data.
    • Apply best practices when creating a workflow with the Data Interoperability extension.
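
    The extract, transform, and load steps that the course teaches in the Spatial ETL tool can be sketched outside ArcGIS in plain Python. This is a minimal sketch, not FME or the Data Interoperability extension: the CSV text, the field names (`route_id`, `route_name`, `length_mi`), and the `routes` table are hypothetical, and an in-memory SQLite database stands in for the target format.

```python
import csv
import io
import sqlite3

# Hypothetical source records; the field names are illustrative, not from the course.
SOURCE_CSV = """route_id,route_name,length_mi
101,US 30,12.5
102,IA 1,8.25
"""

MI_TO_KM = 1.609344  # international mile in kilometres


def extract(text):
    """Extract: parse delimited text into one dict per record."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types, trim names, and convert miles to kilometres."""
    return [
        (int(r["route_id"]), r["route_name"].strip(), round(float(r["length_mi"]) * MI_TO_KM, 3))
        for r in rows
    ]


def load(records, conn):
    """Load: write the transformed records into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS routes (id INTEGER PRIMARY KEY, name TEXT, length_km REAL)"
    )
    conn.executemany("INSERT INTO routes VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
print(conn.execute("SELECT id, name, length_km FROM routes ORDER BY id").fetchall())
```

    Keeping each stage as a separate function mirrors the reader, transformer, and writer separation of an ETL workflow, so a stage can be inspected or swapped independently.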

  2. Data from: Integrating Data Transformation in Principal Components Analysis

    • tandf.figshare.com
    pdf
    Updated Jun 4, 2023
    Cite
    Mehdi Maadooliat; Jianhua Z. Huang; Jianhua Hu (2023). Integrating Data Transformation in Principal Components Analysis [Dataset]. http://doi.org/10.6084/m9.figshare.960499.v3
    Available download formats: pdf
    Dataset updated
    Jun 4, 2023
    Dataset provided by
    Taylor & Francis https://taylorandfrancis.com/
    Authors
    Mehdi Maadooliat; Jianhua Z. Huang; Jianhua Hu
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Principal component analysis (PCA) is a popular dimension-reduction method to reduce the complexity and obtain the informative aspects of high-dimensional datasets. When the data distribution is skewed, data transformation is commonly used prior to applying PCA. Such transformation is usually obtained from previous studies, prior knowledge, or trial-and-error. In this work, we develop a model-based method that integrates data transformation in PCA and finds an appropriate data transformation using the maximum profile likelihood. Extensions of the method to handle functional data and missing values are also developed. Several numerical algorithms are provided for efficient computation. The proposed method is illustrated using simulated and real-world data examples. Supplementary materials for this article are available online.
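
    The idea of choosing a transformation by maximum profile likelihood can be illustrated with the classical one-parameter Box-Cox family applied to a single positive variable. This is a simplified sketch, not the authors' method, which integrates the transformation into PCA itself; the candidate grid of λ values and the sample data are assumptions chosen for illustration.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y (log transform when lam == 0)."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def profile_log_likelihood(data, lam):
    """Profile log-likelihood of lam under a normal model for the transformed data.

    Additive constants are dropped; the last term is the log-Jacobian of the transform.
    """
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # maximum-likelihood variance estimate
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(y) for y in data)


def best_lambda(data, grid):
    """Grid value of lam that maximises the profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(data, lam))


grid = [i / 10 for i in range(-20, 21)]  # assumed search grid from -2.0 to 2.0
skewed = [math.exp(v) for v in (-2, -1, 0, 1, 2)]  # right-skewed, log-normal-shaped sample
print(best_lambda(skewed, grid))
```

    For this log-normal-shaped sample the search selects λ = 0, that is, the log transform, which is the transformation that makes these values symmetric.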

  3. Dbt For Regulated Data Transformations Market Research Report 2033

    • dataintelo.com
    csv, pdf, pptx
    Updated Sep 30, 2025
    Cite
    Dataintelo (2025). Dbt For Regulated Data Transformations Market Research Report 2033 [Dataset]. https://dataintelo.com/report/dbt-for-regulated-data-transformations-market
    Available download formats: pptx, csv, pdf
    Dataset updated
    Sep 30, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    dbt for Regulated Data Transformations Market Outlook

    According to our latest research, the global dbt for Regulated Data Transformations market size reached USD 1.32 billion in 2024, reflecting the rising adoption of compliant data transformation solutions across multiple regulated industries. The market is projected to grow at a CAGR of 19.7% through the forecast period, reaching USD 4.85 billion by 2033. This robust expansion is driven primarily by increasing regulatory scrutiny, the surge in data volumes, and the critical need for transparent, auditable, and secure data transformation processes in sectors such as healthcare, BFSI, and government.
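
    A CAGR links a start value V₀ and an end value Vₙ through Vₙ = V₀(1 + r)ⁿ. The sketch below implements both directions of that formula so the quoted figures can be checked; treating 2024 to 2033 as nine compounding periods is an assumption, since the description does not define the forecast period precisely.

```python
def cagr(start, end, periods):
    """Constant annual growth rate r such that end == start * (1 + r) ** periods."""
    return (end / start) ** (1.0 / periods) - 1.0


def project(start, rate, periods):
    """Value reached by compounding start at a constant rate for the given periods."""
    return start * (1.0 + rate) ** periods


# Market sizes in USD billion from the description; 9 periods (2024 -> 2033) is an assumption.
start_2024, end_2033, periods = 1.32, 4.85, 9
print(f"implied CAGR: {cagr(start_2024, end_2033, periods):.1%}")
print(f"projection at 19.7%: USD {project(start_2024, 0.197, periods):.2f} billion")
```

    The result is sensitive to the number of periods, so the base year behind a quoted CAGR matters as much as the rate itself.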

    The growth of the dbt for Regulated Data Transformations market is strongly influenced by the escalating complexity of regulatory requirements worldwide. Organizations in highly regulated industries are under constant pressure to ensure that their data transformation workflows are not only efficient but also fully traceable and compliant with standards such as GDPR, HIPAA, and SOX. The adoption of dbt (data build tool) frameworks provides a structured, auditable, and version-controlled approach to data transformation, which is essential for passing audits and avoiding hefty fines. Furthermore, the proliferation of data privacy laws and the growing emphasis on data governance have made it imperative for organizations to invest in compliant data transformation solutions, thereby fueling market demand.

    Another significant growth factor is the rapid digital transformation initiatives being undertaken by enterprises globally. As organizations migrate their data infrastructure to the cloud and adopt advanced analytics, the volume and diversity of data being handled have increased exponentially. This surge necessitates robust data transformation pipelines that can handle large-scale, complex, and sensitive data while ensuring compliance with industry-specific regulations. dbt’s ability to automate, document, and test data transformations in a transparent manner makes it a preferred choice for organizations seeking to modernize their data operations without compromising regulatory compliance. The integration of dbt with leading cloud data platforms further accelerates its adoption, especially among enterprises prioritizing scalability and agility in their data ecosystems.
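
    dbt expresses data tests such as `unique` and `not_null` declaratively and compiles each one to a SQL query that returns the failing rows; by default a test passes when that query returns nothing. The Python sketch below mimics that pass/fail logic on in-memory rows so the idea is visible without a warehouse. It is an analogue for illustration, not dbt's API, and the `patients` rows and helper names are hypothetical.

```python
def not_null(rows, column):
    """Failing rows: those where the column is missing (None)."""
    return [r for r in rows if r.get(column) is None]


def unique(rows, column):
    """Failing values: non-null values of the column that occur more than once."""
    seen, dupes = set(), set()
    for r in rows:
        v = r.get(column)
        if v is None:
            continue  # nulls are left to the not_null check
        if v in seen:
            dupes.add(v)
        seen.add(v)
    return sorted(dupes)


def run_tests(rows, tests):
    """Run (name, check, column) triples; a check passes when it reports no failures."""
    return {f"{name}({col})": len(check(rows, col)) == 0 for name, check, col in tests}


# Hypothetical records with one missing value and one duplicated key.
patients = [
    {"patient_id": 1, "dob": "1980-01-01"},
    {"patient_id": 2, "dob": None},
    {"patient_id": 2, "dob": "1975-06-30"},
]
results = run_tests(patients, [("not_null", not_null, "dob"), ("unique", unique, "patient_id")])
print(results)
```

    Returning the failing records rather than a bare boolean keeps the evidence for each failure available, which is what makes such checks useful in an audit trail.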

    The increasing awareness of the risks associated with data breaches and non-compliance is also a key driver for the dbt for Regulated Data Transformations market. High-profile data breaches and regulatory penalties have underscored the importance of robust data governance and compliance-focused data management practices. dbt’s open-source and enterprise offerings empower organizations to implement standardized, repeatable, and auditable data transformation processes, mitigating the risk of human error and unauthorized data manipulation. This, in turn, enhances organizational resilience and builds stakeholder trust, further supporting market growth.

    From a regional perspective, North America currently dominates the dbt for Regulated Data Transformations market, accounting for nearly 41% of the global revenue in 2024. This leadership position is attributed to the presence of stringent regulatory frameworks, a mature data analytics ecosystem, and a high concentration of large enterprises in the region. Europe follows closely, driven by the enforcement of GDPR and other data protection mandates. The Asia Pacific region is expected to witness the fastest growth over the forecast period, with a projected CAGR of 22.4%, fueled by rapid digitalization, expanding regulatory frameworks, and increasing investments in data infrastructure across emerging economies.

    Component Analysis

    The dbt for Regulated Data Transformations market is segmented by component into Software and Services. The software segment currently holds the largest market share, accounting for more than 60% of total revenue in 2024. This dominance is attributed to the widespread adoption of dbt’s core software platform, which enables organizations to design, document, and test data transformation pipelines in a transparent, auditable manner. The software’s open-source roots and robust integration capabilities with leading cloud data warehouses have made it a staple in regulated industries seeking scalable and compliant data transformation solutions. Enhance

  4. Data Transformation Tools Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 12, 2025
    Cite
    Data Insights Market (2025). Data Transformation Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/data-transformation-tools-1452290
    Available download formats: pdf, doc, ppt
    Dataset updated
    Feb 12, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The size of the Data Transformation Tools market was valued at USD XXX million in 2024 and is projected to reach USD XXX million by 2033, with an expected CAGR of XX% during the forecast period.

  5. Dataset for Privacy Exercises

    • kaggle.com
    zip
    Updated Apr 9, 2024
    Cite
    Shining (2024). Dataset for Privacy Exercises [Dataset]. https://www.kaggle.com/datasets/shiningana/dataset-for-privacy-exercises
    Available download formats: zip (7,327,312 bytes)
    Dataset updated
    Apr 9, 2024
    Authors
    Shining
    License

    Apache License, v2.0 https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    This dataset contains data from a hypothetical business and can be used to practice privacy-oriented data transformation and analysis skills.

    The dataset contains the following files/tables, all with comma-separated columns:

    1. customer_orders_for_privacy_exercises.csv: the business's customer orders.
    2. users_web_browsing_for_privacy_exercises.csv: data collected by the business website about its users.
    3. iot_example.csv: users' biometric data collected by a smart device.
    4. members.csv: data collected by a library about its users.
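
    A typical privacy exercise on files like these is to pseudonymize direct identifiers and coarsen quasi-identifiers before analysis. The sketch below uses a keyed hash (HMAC-SHA256) so the same customer always maps to the same token, and groups exact ages into ten-year bands. The column names (`customer_email`, `age`, `amount`), the records, and the key are hypothetical; the actual CSV headers may differ.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value):
    """Replace an identifier with a keyed hash so records can still be joined."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_age(age, width=10):
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical order records.
orders = [
    {"customer_email": "alice@example.com", "age": 34, "amount": 19.99},
    {"customer_email": "bob@example.com", "age": 57, "amount": 5.00},
]
masked = [
    {
        "customer_key": pseudonymize(o["customer_email"]),
        "age_band": generalize_age(o["age"]),
        "amount": o["amount"],
    }
    for o in orders
]
for row in masked:
    print(row)
```

    A keyed hash is used instead of a plain hash because common identifiers such as e-mail addresses can otherwise be re-identified by hashing candidate values.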

  6. Use of AI in administrative and data analysis tasks in the USA and UK 2023

    • statista.com
    Cite
    Statista, Use of AI in administrative and data analysis tasks in the USA and UK 2023 [Dataset]. https://www.statista.com/statistics/1453320/use-share-ai-routine-logic-based-tasks/
    Dataset authored and provided by
    Statista http://statista.com/
    Time period covered
    Jul 2023
    Area covered
    United Kingdom
    Description

    As of 2023, about ** percent of surveyed employees from companies in the United States of America and United Kingdom claim to use artificial intelligence (AI) in the logic-based task of data analysis. Approximately ** percent claim to use it for routine administrative tasks. These numbers are forecasted to grow, as the share of employees that wish to use the technology for both tasks is much higher, lying around ** percent.

  7. Data Transformation Software Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Feb 1, 2025
    Cite
    Data Insights Market (2025). Data Transformation Software Report [Dataset]. https://www.datainsightsmarket.com/reports/data-transformation-software-1370057
    Available download formats: ppt, pdf, doc
    Dataset updated
    Feb 1, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The size of the Data Transformation Software market was valued at USD XXX million in 2024 and is projected to reach USD XXX million by 2033, with an expected CAGR of XX% during the forecast period.

  8. Data Visualization Platform Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Sep 15, 2025
    Cite
    Data Insights Market (2025). Data Visualization Platform Report [Dataset]. https://www.datainsightsmarket.com/reports/data-visualization-platform-1940964
    Available download formats: doc, ppt, pdf
    Dataset updated
    Sep 15, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global Data Visualization Platform market is poised for substantial expansion, projected to reach an estimated $65,000 million by 2025 and exhibiting a robust Compound Annual Growth Rate (CAGR) of 12% through 2033. This impressive growth is largely propelled by the escalating demand for actionable insights from vast datasets across diverse industries. Key drivers include the burgeoning adoption of smart city initiatives, where real-time data analysis is crucial for optimizing urban infrastructure and services, and the increasing focus on ultimate digital materialization spaces, necessitating sophisticated tools for understanding complex digital environments. The platform's ability to transform raw data into understandable visual formats empowers organizations to make informed decisions, identify trends, and detect anomalies with greater efficiency, thereby driving its widespread integration into business intelligence strategies. The market segmentation reveals a strong preference for Flow Analysis and Mixed Data Analysis applications, reflecting the need to understand dynamic processes and integrate disparate data sources for comprehensive insights. While the market is characterized by its dynamic nature, with established players like Microsoft and Tableau leading the charge, emerging technologies and innovative startups are continuously shaping the competitive landscape. The dominant presence of North America, particularly the United States, in terms of market share underscores its advanced technological infrastructure and early adoption of data-driven strategies. However, the Asia Pacific region is anticipated to witness significant growth, fueled by rapid digitalization and increasing investments in data analytics solutions in countries like China and India. 

    Despite the promising outlook, challenges such as data security concerns and the need for skilled data professionals could temper the market's full potential, though these are being actively addressed through technological advancements and training initiatives. This report examines the dynamic and rapidly evolving Data Visualization Platform market, projecting its trajectory from a historical baseline in 2025 to a forecast period extending to 2033. With a particular focus on the Study Period of 2019-2033 and the Base Year of 2025, this analysis offers insights into market dynamics, technological advancements, and strategic opportunities.

  9. Data Center Transformation Market Size, Trends, Growth & Share Analysis 2030...

    • mordorintelligence.com
    pdf, excel, csv, ppt
    Updated Jul 2, 2025
    Cite
    Mordor Intelligence (2025). Data Center Transformation Market Size, Trends, Growth & Share Analysis 2030 [Dataset]. https://www.mordorintelligence.com/industry-reports/data-center-transformation-market
    Available download formats: pdf, excel, csv, ppt
    Dataset updated
    Jul 2, 2025
    Dataset authored and provided by
    Mordor Intelligence
    License

    https://www.mordorintelligence.com/privacy-policy

    Time period covered
    2019 - 2030
    Area covered
    Global
    Description

    The Data Center Transformation Market Report segments the industry by service (consolidation services, optimization services, and more), data center tier (Tier 1, Tier 2, Tier 3, Tier 4), end user (data center providers, enterprises), deployment model (on-premises, colocation, and more), and geography (North America, Europe, and more). The market forecasts are provided in terms of value (USD).

  10. Data Transformation Tools Report

    • marketresearchforecast.com
    doc, pdf, ppt
    Updated Jul 24, 2025
    Cite
    Market Research Forecast (2025). Data Transformation Tools Report [Dataset]. https://www.marketresearchforecast.com/reports/data-transformation-tools-544878
    Available download formats: doc, ppt, pdf
    Dataset updated
    Jul 24, 2025
    Dataset authored and provided by
    Market Research Forecast
    License

    https://www.marketresearchforecast.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    Unlock the potential of your data with our in-depth analysis of the booming Data Transformation Tools market. Discover key trends, growth forecasts (CAGR 15%), leading vendors (Informatica, MuleSoft, etc.), and regional market shares. Learn how this $15 billion market is reshaping data management strategies.

  11. Data Analysis for the Systematic Literature Review of DL4SE

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk (2024). Data Analysis for the Systematic Literature Review of DL4SE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4768586
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    College of William and Mary
    Washington and Lee University
    Authors
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that make up data analysis (Xia & Gong; 2014). EDA comprises a set of statistical and data mining procedures for describing data. We ran EDA to establish statistical facts and inform conclusions; the mined facts support the arguments made in the Systematic Literature Review of DL4SE.

    The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers to the proposed research questions and to formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships in the Deep Learning literature reported in Software Engineering. These hidden relationships are collected and analyzed to illustrate the state of the art of DL techniques employed in the software engineering context.

    Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD (Fayyad, et al; 1996). The KDD process extracts knowledge from a structured DL4SE database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:

    Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organized the data into 35 features, or attributes, which you can find in the repository. These features were engineered manually from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data scale, type of tuning, learning algorithm, and SE data, among others.

    Preprocessing. Preprocessing consisted of transforming the features into the correct type (nominal), removing outliers (papers that do not belong to DL4SE), and re-inspecting the papers to extract missing information revealed by the normalization process. For instance, we normalized the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”, where “Other Metrics” refers to unconventional metrics found during the extraction. The same normalization was applied to other features such as “SE Data” and “Reproducibility Types”. Separating features into more detailed classes helps the data mining tasks understand and classify the papers.

    Transformation. In this stage, we did not apply any data transformation method except for the clustering analysis. We performed a Principal Component Analysis (PCA) to reduce the 35 features to 2 components for visualization purposes. PCA also allowed us to identify the number of clusters that exhibits the maximum reduction in variance; in other words, it helped us identify the number of clusters to use when tuning the explainable models.

    Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be to uncover hidden relationships among the extracted features (Correlations and Association Rules) and to categorize the DL4SE papers for a better segmentation of the state of the art (Clustering). A detailed explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.

    Interpretation/Evaluation. We used the Knowledge Discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes, which produces an argument support analysis (see this link).
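
    Projecting many features onto two principal components for plotting, as in the Transformation stage, can be sketched without external libraries using power iteration with deflation. This is a generic illustration, not the authors' RapidMiner pipeline; the function names, the fixed starting vector, and the iteration count are assumptions, and power iteration converges slowly when the leading eigenvalues are nearly equal.

```python
import math


def covariance(rows):
    """Population covariance matrix (divide by n) of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / n for j in range(d)] for i in range(d)]


def mat_vec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_components(rows, k, iterations=500):
    """Leading k principal directions and their variances via power iteration with deflation."""
    cov = covariance(rows)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d  # fixed, deterministic starting vector
        for _ in range(iterations):
            w = mat_vec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in the remaining directions
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(cov, v)))  # Rayleigh quotient: variance along v
        components.append(v)
        variances.append(lam)
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components, variances


def project(rows, components):
    """Scores of each centred row on the components, e.g. 2-D coordinates for a scatter plot."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[j] - means[j]) * c[j] for j in range(d)) for c in components] for r in rows]


rows = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
components, variances = top_components(rows, 2)
print(variances, project(rows, components))
```

    The variances returned alongside the components show how much of the total spread a two-dimensional view retains.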

    We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.

    Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.

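    Support = the number of records for which the statement (premise and conclusion together) is true, divided by the total number of records.
    Confidence = the support of the statement divided by the support of its premise, i.e., the number of records containing both premise and conclusion divided by the number of records containing the premise.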
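
    Both measures can be computed directly over papers represented as sets of feature values. The papers below are hypothetical, and the rule mirrors the Supervised Learning example only in form, not in any reported numbers.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, premise, conclusion):
    """Support of premise and conclusion together, divided by the support of the premise."""
    return support(transactions, set(premise) | set(conclusion)) / support(transactions, premise)


# Hypothetical papers, each described by a set of feature values.
papers = [
    {"Supervised Learning", "Irreproducible"},
    {"Supervised Learning", "Irreproducible"},
    {"Supervised Learning", "Reproducible"},
    {"Unsupervised Learning", "Irreproducible"},
]
premise, conclusion = {"Supervised Learning"}, {"Irreproducible"}
print(support(papers, premise | conclusion), confidence(papers, premise, conclusion))
```

    Here the rule {Supervised Learning} → {Irreproducible} holds in 2 of 4 papers (support 0.5) and in 2 of the 3 papers that use supervised learning (confidence ≈ 0.67).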

  12. Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open...

    • plos.figshare.com
    docx
    Updated May 31, 2023
    Cite
    Sven Van Poucke; Zhongheng Zhang; Martin Schmitz; Milan Vukicevic; Margot Vander Laenen; Leo Anthony Celi; Cathy De Deyne (2023). Scalable Predictive Analysis in Critically Ill Patients Using a Visual Open Data Analysis Platform [Dataset]. http://doi.org/10.1371/journal.pone.0145791
    Available download formats: docx
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS http://plos.org/
    Authors
    Sven Van Poucke; Zhongheng Zhang; Martin Schmitz; Milan Vukicevic; Margot Vander Laenen; Leo Anthony Celi; Cathy De Deyne
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    With the accumulation of large amounts of health-related data, predictive analytics could stimulate the transformation of reactive medicine towards Predictive, Preventive and Personalized Medicine (PPPM), ultimately affecting both the cost and the quality of care. However, the high dimensionality and complexity of the data involved prevent data-driven methods from being easily translated into clinically relevant models. Additionally, applying cutting-edge predictive methods and data manipulation requires substantial programming skills, limiting their direct use by medical domain experts. This leaves a gap between potential and actual data usage. In this study, the authors address this problem by focusing on open, visual environments suited to use by the medical community. Moreover, we review code-free applications of big data technologies. As a showcase, a framework was developed for the meaningful use of data from critical care patients by integrating the MIMIC-II database in a data mining environment (RapidMiner) supporting scalable predictive analytics using visual tools (RapidMiner’s Radoop extension). Guided by the CRoss-Industry Standard Process for Data Mining (CRISP-DM), the ETL (Extract, Transform, Load) process was initiated by retrieving data from the MIMIC-II tables of interest. As a use case, the correlation between platelet count and ICU survival was quantitatively assessed. Using visual tools for ETL on Hadoop and predictive modeling in RapidMiner, we developed robust processes for the automatic building, parameter optimization, and evaluation of various predictive models under different feature selection schemes. Because these processes can be easily adopted in other projects, this environment is attractive for scalable predictive analytics in health research.
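
    The description does not say which statistic was used to relate platelet count to ICU survival. One common choice for a continuous variable and a binary outcome is the point-biserial correlation, which equals the Pearson correlation when the outcome is coded 0/1. The sketch below assumes that choice, and its values are hypothetical, not MIMIC-II data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    With y coded 0/1 this is the point-biserial correlation.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical platelet counts and ICU survival (1 = survived, 0 = died).
platelets = [250, 310, 90, 180, 40, 220]
survived = [1, 1, 0, 1, 0, 1]
print(round(pearson(platelets, survived), 3))
```

    A positive coefficient here means that higher platelet counts go with survival in this toy sample; it says nothing about significance, which would need a separate test.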

  13. Cyclist Google Data Analytics Capstone Project R

    • kaggle.com
    zip
    Updated Oct 17, 2022
    Cite
    ShrutiJainn (2022). Cyclist Google Data Analytics Capstone Project R [Dataset]. https://www.kaggle.com/datasets/shrutijainn/cyclist-google-data-analytics-capstone-project-r
    Available download formats: zip (13,225 bytes)
    Dataset updated
    Oct 17, 2022
    Authors
    ShrutiJainn
    Description

    Scenario

    The director of marketing at Cyclistic, a bike-share company in Chicago, believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently, in order to design a new marketing strategy to convert casual riders into annual members.

    Objective/Purpose

    Design marketing strategies aimed at converting casual riders into annual members. To do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. The manager and her team are interested in analyzing Cyclistic's historical bike trip data to identify trends.

    Business Task

    Identify trends to better understand user purchase behavior and recommend marketing strategies to convert casual riders into annual members.

    Data sources used

    We used Cyclistic’s historical trip data to analyze and identify trends, using 12 months of trip data from January 2021 to December 2021. This is public data that we use to explore how different customer types use Cyclistic bikes.

    Documentation of any cleaning or manipulation of data

    The dataset from January 2021 to December 2021 is more than 1 GB in size, which makes data manipulation and transformation in spreadsheets difficult. SQL or R is better suited to files of this size, and we used R for these steps. For that reason, I prepared the analysis using only the first quarter, i.e., January to March 2021.
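
    The first-quarter restriction and the member-versus-casual comparison can be sketched in Python, although the author worked in R. The column names `started_at`, `ended_at`, and `member_casual` are assumed from the public trip files, the timestamp format is an assumption, and the four trips are hypothetical.

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

# Hypothetical trips; the last one falls outside the first quarter.
trips = [
    {"started_at": "2021-01-05 08:00:00", "ended_at": "2021-01-05 08:12:00", "member_casual": "member"},
    {"started_at": "2021-02-14 13:00:00", "ended_at": "2021-02-14 13:45:00", "member_casual": "casual"},
    {"started_at": "2021-03-20 17:30:00", "ended_at": "2021-03-20 17:50:00", "member_casual": "member"},
    {"started_at": "2021-04-02 09:00:00", "ended_at": "2021-04-02 09:30:00", "member_casual": "casual"},
]


def ride_minutes(trip):
    """Ride length in minutes from the start and end timestamps."""
    start = datetime.strptime(trip["started_at"], FMT)
    end = datetime.strptime(trip["ended_at"], FMT)
    return (end - start).total_seconds() / 60


def q1_mean_ride_length(trips, year=2021):
    """Mean ride length per rider type for January to March of the given year."""
    lengths = {}
    for t in trips:
        start = datetime.strptime(t["started_at"], FMT)
        if start.year == year and start.month <= 3:
            lengths.setdefault(t["member_casual"], []).append(ride_minutes(t))
    return {rider: mean(values) for rider, values in lengths.items()}


print(q1_mean_ride_length(trips))
```

    Filtering by the parsed start timestamp rather than by file name keeps the quarter boundary explicit if the monthly files are later concatenated.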

  14. Data from: Nonparametric Variable Transformation in Sufficient Dimension...

    • tandf.figshare.com
    pdf
    Updated Jun 1, 2023
    Cite
    Qing Mai; Hui Zou (2023). Nonparametric Variable Transformation in Sufficient Dimension Reduction [Dataset]. http://doi.org/10.6084/m9.figshare.1323269.v2
    Available download formats: pdf
    Dataset updated
    Jun 1, 2023
    Dataset provided by
    Taylor & Francis https://taylorandfrancis.com/
    Authors
    Qing Mai; Hui Zou
    License

    Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Sufficient dimension reduction (SDR) techniques have proven to be very useful data analysis tools in various applications. Underlying many SDR techniques is a critical assumption that the predictors are elliptically contoured. When this assumption appears to be violated, practitioners usually try a variable transformation so that the transformed predictors become (nearly) normal. The transformation function is often chosen from the log and power transformation family, as suggested in the celebrated Box–Cox model. However, any parametric transformation can be too restrictive, risking model misspecification. We suggest a nonparametric variable transformation method after which the predictors become normal. To demonstrate the main idea, we combine this flexible transformation method with two well-established SDR techniques, sliced inverse regression (SIR) and inverse regression estimator (IRE). The resulting SDR techniques are referred to as TSIR and TIRE, respectively. Both simulation and real data results show that TSIR and TIRE have very competitive performance. Asymptotic theory is established to support the proposed method. The technical proofs are available as supplementary materials.
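
    A simple nonparametric route to normally distributed predictors is the rank-based inverse normal transform: replace each value by its rank, scale the rank into (0, 1), and apply the standard normal quantile function. This is a generic technique in the same spirit, not the transformation estimator proposed in the paper; ties are not handled, and the rank/(n + 1) scaling is one convention among several.

```python
from statistics import NormalDist


def rank_inverse_normal(values):
    """Map values to standard normal scores via their ranks (ties are not handled)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # rank / (n + 1) keeps every probability strictly inside (0, 1).
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


skewed = [0.1, 0.5, 1.0, 3.0, 20.0]
print([round(z, 3) for z in rank_inverse_normal(skewed)])
```

    Because only ranks are used, the transform is monotone and unaffected by how extreme the largest value is, which is what removes the skew.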

  15. Analysis of proportional data in reproductive and developmental toxicity...

    • catalog.data.gov
    Updated Nov 12, 2020
    Cite
    U.S. EPA Office of Research and Development (ORD) (2020). Analysis of proportional data in reproductive and developmental toxicity studies: comparison of logit transformation, arcsine square root transformation, and nonparametric analysis [Dataset]. https://catalog.data.gov/dataset/analysis-of-proportional-data-in-reproductive-and-developmental-toxicity-studies-compariso
    Dataset updated
    Nov 12, 2020
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    We conducted power calculations to compare different approaches (nonparametric, arcsine square root-transformed, logit-transformed, untransformed) for analyzing litter-based proportional data. A reproductive toxicity study with a control and one treated group provided data for two endpoints: prenatal loss and fertility by in utero insemination (IUI). Type I error and power were estimated from 10,000 simulations based on two-sample one-tailed t-tests with varying numbers of litters per group. To further compare the approaches, we conducted additional analyses with the mean proportions shifted toward zero to produce illustrative scenarios. Analyses based on logit-transformed proportions had greater power than those based on untransformed or arcsine square root-transformed proportions, or on nonparametric procedures.
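    The untransformed, arcsine square root, and logit scales compared above can be sketched for a single litter-level proportion as follows. Because the logit of 0 or 1 is infinite, the sketch applies an empirical-logit adjustment when the litter size is supplied; that adjustment, the function names, and the example litter are assumptions for illustration, not necessarily the choices made in the EPA study.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform asin(sqrt(p)) for a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))


def logit(p, n=None):
    """Logit transform log(p / (1 - p)).

    If the litter size n is given, p is first replaced by (x + 0.5) / (n + 1)
    with x = p * n affected pups. This empirical-logit adjustment (an assumption
    here) keeps the result finite when p is 0 or 1.
    """
    if n is not None:
        x = p * n
        p = (x + 0.5) / (n + 1)
    return math.log(p / (1 - p))


# Hypothetical litter with 2 of 12 implants lost.
p = 2 / 12
print("untransformed:", round(p, 4))
print("arcsine sqrt: ", round(arcsine_sqrt(p), 4))
print("logit:        ", round(logit(p, n=12), 4))
```

    The logit stretches proportions near 0 and 1 much more than the arcsine square root does, which is one plausible reason the logit scale behaved differently once mean proportions were shifted toward zero.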

  16. Data Center Transformation Report

    • archivemarketresearch.com
    doc, pdf, ppt
    Updated Feb 25, 2025
    Archive Market Research (2025). Data Center Transformation Report [Dataset]. https://www.archivemarketresearch.com/reports/data-center-transformation-47185
    Available download formats: doc, pdf, ppt
    Dataset updated
    Feb 25, 2025
    Dataset authored and provided by
    Archive Market Research
    License

    https://www.archivemarketresearch.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global data center transformation market is projected to reach $6,327.3 million by 2033, growing at a CAGR of 7.2% from 2025 to 2033. Market growth is attributed to the increasing adoption of cloud computing, big data, and artificial intelligence (AI), which are driving demand for more efficient and scalable data centers that can handle growing data volumes. Key drivers include the need for improved data center efficiency, increased capacity, and stronger security. Trends in the market include the adoption of cloud computing, big data, and AI. Restraints include the high cost of data center transformation and a shortage of skilled IT professionals.
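    The compound-growth arithmetic behind a CAGR projection can be checked by backing out the implied starting value from V_2033 = V_2025 × (1 + CAGR)^years. The sketch assumes eight annual compounding periods (2025 to 2033); the description does not state a base-year figure, so the implied value is a derived estimate, and `implied_base_value` is a hypothetical helper.

```python
def implied_base_value(final_value, cagr, years):
    """Invert V_end = V_start * (1 + cagr) ** years to recover V_start."""
    return final_value / (1 + cagr) ** years


# Figures quoted in the report: $6,327.3 million in 2033, 7.2% CAGR from 2025.
# Assumption: growth compounds once per year over 2033 - 2025 = 8 periods.
years = 2033 - 2025
base_2025 = implied_base_value(6327.3, 0.072, years)
print(f"Implied 2025 market size: ${base_2025:,.1f} million")
```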

  17. Soil images in DICOM format including Python programs for data...

    • search.dataone.org
    • datadryad.org
    Updated Apr 24, 2025
    Ralf Wieland (2025). Soil images in DICOM format including Python programs for data transformation, 3D analysis, CNN traininig, CNN analysis [Dataset]. http://doi.org/10.5061/dryad.66t1g1k0c
    Dataset updated
    Apr 24, 2025
    Dataset provided by
    Dryad Digital Repository
    Authors
    Ralf Wieland
    Time period covered
    Jan 1, 2020
    Description

    The study 'Use of Deep Learning for structural analysis of CT-images of soil samples' used a set of soil-sample CT images. All data and programs used are open source and were created with open-source software. Every processing step is implemented in Python programs included in the dataset.

  18. Market Basket Analysis

    • kaggle.com
    zip
    Updated Dec 9, 2021
    Aslan Ahmedov (2021). Market Basket Analysis [Dataset]. https://www.kaggle.com/datasets/aslanahmedov/market-basket-analysis
    Available download formats: zip (23875170 bytes)
    Dataset updated
    Dec 9, 2021
    Authors
    Aslan Ahmedov
    Description

    Market Basket Analysis

    Market basket analysis with the Apriori algorithm

    The retailer wants to target customers with suggestions for the itemsets they are most likely to purchase. I was given a dataset from a retailer; the transaction data covers all transactions that occurred over a period of time. The retailer will use the results to grow its business and offer customers itemset suggestions, which should increase customer engagement, improve the customer experience, and help identify customer behavior. I will solve this problem using association rules, an unsupervised learning technique that checks for the dependency of one data item on another.

    Introduction

    Association rules are most useful when you want to find associations between different objects in a set. They work by finding frequent patterns in a transaction database. They can tell you which items customers frequently buy together, allowing a retailer to identify relationships between items.
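    The frequent-pattern search described above is what the Apriori algorithm performs. The minimal Python sketch below (the analysis itself uses the R package arules) grows itemsets level by level and prunes any candidate that has an infrequent subset; the function name, the `min_support` threshold, and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support,
    the fraction of transactions containing it, is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    level = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        previous = set(level)
        # Join step: merge two frequent (k-1)-itemsets into a k-itemset.
        candidates = {a | b for a in previous for b in previous if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
        }
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
    return frequent


# Hypothetical baskets, one set of item names per invoice.
baskets = [
    {"mouse", "mouse mat"},
    {"mouse", "mouse mat", "cable"},
    {"mouse"},
    {"cable"},
]
for itemset, s in sorted(apriori(baskets, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), s)
```

    The pruning step is what makes Apriori practical: if {mouse, cable} is infrequent, no larger itemset containing it is ever counted.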

    An Example of Association Rules

    Assume there are 100 customers: 10 bought a computer mouse, 9 bought a mouse mat, and 8 bought both. For the rule {computer mouse} => {mouse mat}:
    • support = P(mouse & mat) = 8/100 = 0.08
    • confidence = support / P(mouse) = 0.08/0.10 = 0.80
    • lift = confidence / P(mat) = 0.80/0.09 ≈ 8.89
    This is just a simple example. In practice, a rule needs the support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions.
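    The arithmetic in this example can be reproduced from the raw counts. This sketch, using a hypothetical `rule_metrics` helper, computes support, confidence, and lift for a rule A => B from the number of transactions containing A, B, and both.

```python
def rule_metrics(n_total, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent => consequent."""
    support = n_both / n_total                    # P(A and B)
    confidence = n_both / n_antecedent            # P(B | A) = P(A and B) / P(A)
    lift = confidence / (n_consequent / n_total)  # P(B | A) / P(B)
    return support, confidence, lift


# 100 customers: 10 bought a mouse (A), 9 a mouse mat (B), 8 bought both.
support, confidence, lift = rule_metrics(100, 10, 9, 8)
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
# prints: support=0.08 confidence=0.80 lift=8.89
```

    Confidence divides by the antecedent's frequency, so it depends on the rule's direction; lift, equal to P(A and B) / (P(A) P(B)), is the same in both directions.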

    Strategy

    • Data Import
    • Data Understanding and Exploration
    • Transformation of the data so that it is ready to be consumed by the association rules algorithm
    • Running association rules
    • Exploring the rules generated
    • Filtering the generated rules
    • Visualization of the rules

    Dataset Description

    • File name: Assignment-1_Data
    • List name: retaildata
    • File format: .xlsx
    • Number of rows: 522065
    • Number of attributes: 7

      • BillNo: 6-digit number assigned to each transaction. Nominal.
      • Itemname: Product name. Nominal.
      • Quantity: The quantities of each product per transaction. Numeric.
      • Date: The day and time when each transaction was generated. Numeric.
      • Price: Product price. Numeric.
      • CustomerID: 5-digit number assigned to each customer. Nominal.
      • Country: Name of the country where each customer resides. Nominal.


    Libraries in R

    First, we load the required libraries, each briefly described below.

    • arules - Provides the infrastructure for representing, manipulating, and analyzing transaction data and patterns (frequent itemsets and association rules).
    • arulesViz - Extends package 'arules' with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration.
    • tidyverse - An opinionated collection of R packages designed for data science; the package makes it easy to install and load multiple 'tidyverse' packages in a single step.
    • readxl - Read Excel files in R.
    • plyr - Tools for splitting, applying, and combining data.
    • ggplot2 - A system for 'declaratively' creating graphics, based on "The Grammar of Graphics". You provide the data, tell 'ggplot2' how to map variables to aesthetics and which graphical primitives to use, and it takes care of the details.
    • knitr - Dynamic report generation in R.
    • magrittr - Provides a mechanism for chaining commands with a forward-pipe operator, %>%. This operator forwards a value, or the result of an expression, into the next function call/expression. There is flexible support for the type of right-hand side expressions.
    • dplyr - A fast, consistent tool for working with data-frame-like objects, both in memory and out of memory.


    Data Pre-processing

    Next, we read Assignment-1_Data.xlsx into R and inspect the data.


    We then clean the data frame by removing missing values.


    To apply association rule mining, we need to convert the data frame into transaction data so that all items bought together in one invoice will be in ...
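    The cleaning and conversion steps described for this dataset (drop incomplete records, then gather every item on the same invoice into one transaction) can be sketched in plain Python. The original analysis does this in R; the `to_transactions` helper and the sample rows here are hypothetical, and treating a missing BillNo or Itemname as the only kind of missing value is a simplifying assumption.

```python
from collections import defaultdict


def to_transactions(rows):
    """Group rows into {BillNo: set of Itemname}, skipping rows that are missing
    either field (a simple stand-in for removing missing values)."""
    baskets = defaultdict(set)
    for row in rows:
        bill, item = row.get("BillNo"), row.get("Itemname")
        if bill is None or item is None or item == "":
            continue  # incomplete record: drop it
        baskets[bill].add(item)
    return dict(baskets)


# Hypothetical rows shaped like the BillNo / Itemname columns of the dataset.
rows = [
    {"BillNo": "536365", "Itemname": "mouse"},
    {"BillNo": "536365", "Itemname": "mouse mat"},
    {"BillNo": "536366", "Itemname": "mouse"},
    {"BillNo": "536366", "Itemname": None},
]
print(to_transactions(rows))
```

    Using a set per invoice means an item bought in several quantities on one bill counts once, which is what support-based rule mining expects.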

  19. National Transformation Analysis Data Tables 1 to 12

    • open.canada.ca
    • datasets.ai
    html, xls, xlsx
    Updated Jul 24, 2024
    Government of Alberta (2024). National Transformation Analysis Data Tables 1 to 12 [Dataset]. https://open.canada.ca/data/dataset/2f2c8a98-ed4d-435b-ae03-63c21db84364
    Available download formats: xls, xlsx, html
    Dataset updated
    Jul 24, 2024
    Dataset provided by
    Government of Alberta
    License

    Open Government Licence - Canada 2.0: https://open.canada.ca/en/open-government-licence-canada
    License information was derived automatically

    Description

    The National Transformation Analysis Data tables allow land surveyors, engineers, and others to assess the quality of the transformation grids used to transform coordinate data from NAD27 to NAD83(Original) and vice versa, from NAD83(Original) to NAD83(CSRS) Epoch 2002 and vice versa, and from NAD83(Original) to NAD83(CSRS)v7 Epoch 2010 and vice versa.

  20. Data_Sheet_1_Overview of data preprocessing for machine learning...

    • datasetcatalog.nlm.nih.gov
    • frontiersin.figshare.com
    Updated Oct 5, 2023
    + more versions
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    D’Elia, Domenica; Stres, Blaž; Hron, Karel; Dhamo, Xhilda; Ibrahimi, Eliana; Berland, Magali; Shigdel, Rajesh; Marcos-Zambrano, Laura Judith; Simeon, Andrea; Lopes, Marta B. (2023). Data_Sheet_1_Overview of data preprocessing for machine learning applications in human microbiome research.pdf [Dataset]. https://datasetcatalog.nlm.nih.gov/dataset?q=0001030478
    Dataset updated
    Oct 5, 2023
    Authors
    D’Elia, Domenica; Stres, Blaž; Hron, Karel; Dhamo, Xhilda; Ibrahimi, Eliana; Berland, Magali; Shigdel, Rajesh; Marcos-Zambrano, Laura Judith; Simeon, Andrea; Lopes, Marta B.
    Description

    Although metagenomic sequencing is now the preferred technique for studying microbiome-host interactions, analyzing and interpreting microbiome sequencing data is challenging, primarily because of the statistical specificities of the data (e.g., sparsity, over-dispersion, compositionality, inter-variable dependency). This mini review explores preprocessing and transformation methods applied in recent human microbiome studies to address these challenges. Our results indicate limited adoption of transformation methods targeting the statistical characteristics of microbiome sequencing data. Instead, relative and normalization-based transformations, which do not account for the specific attributes of microbiome data, are prevalent. Information on the preprocessing and transformations applied before analysis was incomplete or missing in many publications, leading to reproducibility concerns, comparability issues, and questionable results. We hope this mini review will provide researchers and newcomers to human microbiome research with an up-to-date point of reference for data transformation tools and help them choose the most suitable transformation method for their research questions, objectives, and data characteristics.
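    To make the distinction above concrete: a relative-abundance transform only rescales counts to proportions, whereas a compositionally aware transform such as the centered log-ratio (CLR) works on log-ratios to the sample's geometric mean. The sketch below is illustrative only; CLR is one common example from compositional data analysis, not necessarily the method this review recommends, and the pseudocount of 0.5 used to handle zero counts is an assumption.

```python
import math


def relative_abundance(counts):
    """Divide each taxon count by the sample total (proportions sum to 1)."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log(x_i) minus the mean of log(x) across taxa.

    A pseudocount is added first because log(0) is undefined; zeros are
    common in sparse microbiome data.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical read counts for four taxa in one sample.
sample = [120, 0, 35, 845]
print([round(p, 3) for p in relative_abundance(sample)])
print([round(v, 3) for v in clr(sample)])
```

    CLR values sum to zero within a sample and are unconstrained in sign, so standard multivariate methods can be applied to them more safely than to proportions bound to the simplex; the choice of pseudocount, however, does influence the result for rare taxa.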
