100+ datasets found
  1. Ecommerce Dataset for Data Analysis

    • kaggle.com
    zip
    Updated Sep 19, 2024
    Cite
    Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code
    Available download formats: zip (2028853 bytes)
    Dataset updated
    Sep 19, 2024
    Authors
    Shrishti Manja
    Description

    This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

    About the Dataset:

    • CID (Customer ID): A unique identifier for each customer.
    • TID (Transaction ID): A unique identifier for each transaction.
    • Gender: The gender of the customer, categorized as Male or Female.
    • Age Group: Age group of the customer, divided into several ranges.
    • Purchase Date: The timestamp of when the transaction took place.
    • Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
    • Discount Availed: Indicates whether the customer availed any discount (Yes/No).
    • Discount Name: Name of the discount applied (e.g., FESTIVE50).
    • Discount Amount (INR): The amount of discount availed by the customer.
    • Gross Amount: The total amount before applying any discount.
    • Net Amount: The final amount after applying the discount.
    • Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
    • Location: The city where the purchase took place.
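    A quick sanity check on the columns listed above is to confirm that each row's Net Amount equals its Gross Amount minus the Discount Amount. The sketch below assumes that relationship (the column descriptions suggest it but do not state it as a rule), uses three made-up rows in place of the real file, and the helper name `inconsistent_rows` is hypothetical.

```python
import math

# Made-up sample rows using the column names listed above; not taken from the real file.
rows = [
    {"TID": "T1", "Discount Availed": "Yes", "Discount Amount (INR)": 50.0,
     "Gross Amount": 500.0, "Net Amount": 450.0},
    {"TID": "T2", "Discount Availed": "No", "Discount Amount (INR)": 0.0,
     "Gross Amount": 120.0, "Net Amount": 120.0},
    {"TID": "T3", "Discount Availed": "Yes", "Discount Amount (INR)": 30.0,
     "Gross Amount": 300.0, "Net Amount": 280.0},  # deliberately inconsistent
]


def inconsistent_rows(records, tol=0.01):
    """Return TIDs where Net Amount differs from Gross Amount - Discount Amount.

    The subtraction rule is an assumption about how the columns relate.
    """
    bad = []
    for r in records:
        expected = r["Gross Amount"] - r["Discount Amount (INR)"]
        if not math.isclose(r["Net Amount"], expected, abs_tol=tol):
            bad.append(r["TID"])
    return bad


print(inconsistent_rows(rows))
```

    With the real CSV, the same function could be fed rows from `csv.DictReader` after converting the three amount columns to floats.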

    Use Cases:

    1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
    2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
    3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
    4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.

    This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

    This is not a real dataset. It was generated using Python's Faker library for the sole purpose of learning.

  2. Data Analysis for the Systematic Literature Review of DL4SE

    • data.niaid.nih.gov
    • data-staging.niaid.nih.gov
    Updated Jul 19, 2024
    Cite
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk (2024). Data Analysis for the Systematic Literature Review of DL4SE [Dataset]. https://data.niaid.nih.gov/resources?id=zenodo_4768586
    Dataset updated
    Jul 19, 2024
    Dataset provided by
    College of William and Mary
    Washington and Lee University
    Authors
    Cody Watson; Nathan Cooper; David Nader; Kevin Moran; Denys Poshyvanyk
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Data Analysis is the process that supports decision-making and informs arguments in empirical studies. Descriptive statistics, Exploratory Data Analysis (EDA), and Confirmatory Data Analysis (CDA) are the approaches that compose Data Analysis (Xia & Gong, 2014). An Exploratory Data Analysis (EDA) comprises a set of statistical and data mining procedures to describe data. We ran EDA to provide statistical facts and inform conclusions. The mined facts support arguments that inform the Systematic Literature Review of DL4SE.

    The Systematic Literature Review of DL4SE requires formal statistical modeling to refine the answers for the proposed research questions and formulate new hypotheses to be addressed in the future. Hence, we introduce DL4SE-DA, a set of statistical processes and data mining pipelines that uncover hidden relationships among Deep Learning reported literature in Software Engineering. Such hidden relationships are collected and analyzed to illustrate the state-of-the-art of DL techniques employed in the software engineering context.

    Our DL4SE-DA is a simplified version of the classical Knowledge Discovery in Databases, or KDD (Fayyad et al., 1996). The KDD process extracts knowledge from a DL4SE structured database. This structured database was the product of multiple iterations of data gathering and collection from the inspected literature. The KDD process involves five stages:

    1. Selection. This stage was led by the taxonomy process explained in section xx of the paper. After collecting all the papers and creating the taxonomies, we organized the data into the 35 features or attributes found in the repository. We manually engineered these features from the DL4SE papers. Some of the features are venue, year published, type of paper, metrics, data-scale, type of tuning, learning algorithm, SE data, and so on.

    2. Preprocessing. The preprocessing consisted of transforming the features into the correct type (nominal), removing outliers (papers that do not belong to DL4SE), and re-inspecting the papers to extract missing information produced by the normalization process. For instance, we normalized the feature “metrics” into “MRR”, “ROC or AUC”, “BLEU Score”, “Accuracy”, “Precision”, “Recall”, “F1 Measure”, and “Other Metrics”. “Other Metrics” refers to unconventional metrics found during the extraction. The same normalization was applied to other features like “SE Data” and “Reproducibility Types”. This separation into more detailed classes contributes to a better understanding and classification of the papers by the data mining tasks or methods.

    3. Transformation. In this stage, we did not apply any data transformation method except for the clustering analysis. We performed a Principal Component Analysis (PCA) to reduce the 35 features to 2 components for visualization purposes. PCA also allowed us to identify the number of clusters that exhibits the maximum reduction in variance. In other words, it helped us identify the number of clusters to use when tuning the explainable models.

    4. Data Mining. In this stage, we used three distinct data mining tasks: Correlation Analysis, Association Rule Learning, and Clustering. We decided that the goal of the KDD process should be oriented to uncover hidden relationships among the extracted features (Correlations and Association Rules) and to categorize the DL4SE papers for a better segmentation of the state-of-the-art (Clustering). A clear explanation is provided in the subsection “Data Mining Tasks for the SLR of DL4SE”.

    5. Interpretation/Evaluation. We used the Knowledge Discovery process to automatically find patterns in our papers that resemble “actionable knowledge”. This actionable knowledge was generated by conducting a reasoning process on the data mining outcomes. This reasoning process produces an argument support analysis (see this link).

    We used RapidMiner as our software tool to conduct the data analysis. The procedures and pipelines were published in our repository.

    Overview of the most meaningful Association Rules. Rectangles are both Premises and Conclusions. An arrow connecting a Premise with a Conclusion implies that given some premise, the conclusion is associated. E.g., Given that an author used Supervised Learning, we can conclude that their approach is irreproducible with a certain Support and Confidence.

    Support = number of occurrences in which the statement is true divided by the total number of statements.

    Confidence = the support of the statement divided by the number of occurrences of the premise.
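    The two measures can be sketched in a few lines of Python. This is an illustration only: it uses the standard count-based form (records containing both premise and conclusion, divided by all records for support and by records containing the premise for confidence), the function name is hypothetical, and the four records are made up rather than taken from the DL4SE repository. The authors ran their analysis in RapidMiner, not with this code.

```python
def support_and_confidence(records, premise, conclusion):
    """Compute support and confidence of the rule premise -> conclusion.

    records: list of sets of items; premise and conclusion: sets of items.
    """
    n = len(records)
    premise_count = sum(1 for r in records if premise <= r)
    both_count = sum(1 for r in records if (premise | conclusion) <= r)
    support = both_count / n if n else 0.0
    confidence = both_count / premise_count if premise_count else 0.0
    return support, confidence


# Made-up paper records for illustration only.
papers = [
    {"supervised", "irreproducible"},
    {"supervised", "irreproducible"},
    {"supervised", "reproducible"},
    {"unsupervised", "irreproducible"},
]

s, c = support_and_confidence(papers, {"supervised"}, {"irreproducible"})
print(f"support={s:.2f} confidence={c:.2f}")
```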

  3. Exploratory data analysis of a clinical study group: Development of a...

    • plos.figshare.com
    txt
    Updated May 31, 2023
    Cite
    Bogumil M. Konopka; Felicja Lwow; Magdalena Owczarz; Łukasz Łaczmański (2023). Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data [Dataset]. http://doi.org/10.1371/journal.pone.0201950
    Available download formats: txt
    Dataset updated
    May 31, 2023
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Bogumil M. Konopka; Felicja Lwow; Magdalena Owczarz; Łukasz Łaczmański
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Thorough knowledge of the structure of analyzed data allows one to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Due to the multitude of available methods, selecting those which will work together well and facilitate data interpretation is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical dataset and perform a case study analysis on a set of 515 patients. The proposed procedure comprises several steps: 1) robust data normalization, 2) outlier detection with Mahalanobis (MD) and robust Mahalanobis distances (rMD), 3) hierarchical clustering with Ward’s algorithm, 4) Principal Component Analysis with biplot vectors. The analyzed set comprised elderly patients that participated in the PolSenior project. Each patient was characterized by over 40 biochemical and socio-geographical attributes. Introductory analysis showed that the case-study dataset comprises two clusters separated along the axis of sex hormone attributes. Further analysis was carried out separately for male and female patients. The optimal partitioning in the male set resulted in five subgroups. Two of them were related to diseased patients: 1) diabetes and 2) hypogonadism patients. Analysis of the female set suggested that it was more homogeneous than the male dataset. No evidence of pathological patient subgroups was found. In the study we showed that outlier detection with MD and rMD not only identifies outliers but can also assess the heterogeneity of a dataset. The case study proved that our procedure is well suited for identification and visualization of biologically meaningful patient subgroups.
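    To make the outlier-detection step concrete, here is a minimal sketch of the classical Mahalanobis distance for two attributes, using the closed-form inverse of a 2x2 covariance matrix. It is an illustration under stated assumptions: the six points are made up, the function name is hypothetical, no cut-off is applied because the description does not give one, and the robust variant (rMD) used by the authors is not shown.

```python
import math


def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix entries (divide by n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T with the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances


# Made-up measurements: five points near the origin and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0)]
distances = mahalanobis_2d(points)
farthest = max(range(len(points)), key=distances.__getitem__)
print(farthest, [round(d, 2) for d in distances])
```

    Because the far point also inflates the covariance estimate, its classical distance is smaller than its raw offset would suggest, which is one motivation for robust estimates such as rMD.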

  4. Youtube cookery channels viewers comments in Hinglish

    • zenodo.org
    csv
    Updated Jan 24, 2020
    + more versions
    Cite
    Abhishek Kaushik; Gagandeep Kaur (2020). Youtube cookery channels viewers comments in Hinglish [Dataset]. http://doi.org/10.5281/zenodo.2841848
    Available download formats: csv
    Dataset updated
    Jan 24, 2020
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Abhishek Kaushik; Gagandeep Kaur
    License

    Open Data Commons Attribution License (ODC-By) v1.0: https://www.opendatacommons.org/licenses/by/1.0/
    License information was derived automatically

    Area covered
    YouTube
    Description

    The data was collected from well-known cookery YouTube channels in India. The main focus was to collect viewers' comments written in Hinglish. The datasets are taken from the top two Indian cooking channels, Nisha Madhulika and Kabita’s Kitchen.

    The comments in both datasets are divided into seven categories:

    Label 1- Gratitude

    Label 2- About the recipe

    Label 3- About the video

    Label 4- Praising

    Label 5- Hybrid

    Label 6- Undefined

    Label 7- Suggestions and queries

    All the labelling has been done manually.

    Nisha Madhulika dataset:

    Dataset characteristics: Multivariate

    Number of instances: 4900

    Area: Cooking

    Attribute characteristics: Real

    Number of attributes: 3

    Date donated: March, 2019

    Associate tasks: Classification

    Missing values: Null

    Kabita Kitchen dataset:

    Dataset characteristics: Multivariate

    Number of instances: 4900

    Area: Cooking

    Attribute characteristics: Real

    Number of attributes: 3

    Date donated: March, 2019

    Associate tasks: Classification

    Missing values: Null

    Each channel has two separate dataset files: a preprocessing file and a main file.

    The preprocessing files were generated after preprocessing and exploratory data analysis on both datasets. Each preprocessing file includes:

    • Id
    • Comment text
    • Labels
    • Count of stop-words
    • Uppercase words
    • Hashtags
    • Word count
    • Char count
    • Average words
    • Numeric
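    Counts like the ones in the list above can be derived from the raw comment text with plain string operations. The sketch below is an assumption-laden illustration: the stop-word list is a tiny made-up one, "Average words" is read as average word length, "Numeric" is read as the number of all-digit tokens, and the function and key names are hypothetical. The authors' actual preprocessing code is not given.

```python
# Tiny illustrative stop-word list (assumption); the authors' list is not given.
STOP_WORDS = {"the", "is", "a", "an", "and", "to", "of", "for"}


def comment_features(text):
    """Compute simple per-comment counts similar to the columns listed above."""
    words = text.split()
    word_count = len(words)
    return {
        "stop_words": sum(1 for w in words if w.lower() in STOP_WORDS),
        "uppercase_words": sum(1 for w in words if w.isupper()),
        "hashtags": sum(1 for w in words if w.startswith("#")),
        "word_count": word_count,
        "char_count": len(text),
        # "Average words" is interpreted here as average word length (assumption).
        "avg_word_length": (sum(len(w) for w in words) / word_count) if word_count else 0.0,
        # "Numeric" is interpreted here as the number of all-digit tokens (assumption).
        "numeric": sum(1 for w in words if w.isdigit()),
    }


features = comment_features("Thank you for the AMAZING recipe #yummy 10")
print(features)
```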

    The main file includes:

    • Id
    • comment text
    • Labels

    Please cite the paper

    https://www.mdpi.com/2504-2289/3/3/37

    MDPI and ACS Style

    Kaur, G.; Kaushik, A.; Sharma, S. Cooking Is Creating Emotion: A Study on Hinglish Sentiments of Youtube Cookery Channels Using Semi-Supervised Approach. Big Data Cogn. Comput. 2019, 3, 37.

  5. Electronic Store Sales Data

    • kaggle.com
    zip
    Updated Jun 1, 2023
    Cite
    Saumay Dhaundiyal (2023). Electronic Store Sales Data [Dataset]. https://www.kaggle.com/saumaydhaundiyal/electronic-store-sales-data
    Available download formats: zip (4996940 bytes)
    Dataset updated
    Jun 1, 2023
    Authors
    Saumay Dhaundiyal
    Description

    The dataset contains sales data for an electronic store for 12 months in 12 different CSV files.

    You are expected to pre-process and clean the data and perform EDA on it.

    Questions the Business Owner would like answered.

    Question 1: What was the best month for sales?
    Question 2: Which city sold the most product?
    Question 3: What time should we display advertisements to maximize the likelihood of customers buying a product?
    Question 4: What products are most often sold together?
    Question 5: What product sold the most?
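    As an illustration of how Question 1 might be approached, the sketch below sums revenue per calendar month. The field names (`order_date`, `quantity`, `price_each`), the date format, and the four rows are all assumptions made for the example; the real column names in the 12 CSV files are not given in this description.

```python
from collections import defaultdict
from datetime import datetime

# Made-up order rows; field names and date format are assumptions.
orders = [
    {"order_date": "01/15/19 10:30", "quantity": 2, "price_each": 11.95},
    {"order_date": "01/20/19 18:05", "quantity": 1, "price_each": 700.00},
    {"order_date": "12/24/19 19:45", "quantity": 1, "price_each": 999.99},
    {"order_date": "12/26/19 09:10", "quantity": 3, "price_each": 14.95},
]


def sales_by_month(rows):
    """Sum quantity * price for each calendar month (1-12)."""
    totals = defaultdict(float)
    for row in rows:
        when = datetime.strptime(row["order_date"], "%m/%d/%y %H:%M")
        totals[when.month] += row["quantity"] * row["price_each"]
    return dict(totals)


totals = sales_by_month(orders)
best_month = max(totals, key=totals.get)
print(best_month, round(totals[best_month], 2))
```

    Grouping on `when.hour` instead of `when.month` would give a starting point for Question 3.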

  6. Orange dataset table

    • figshare.com
    xlsx
    Updated Mar 4, 2022
    Cite
    Rui Simões (2022). Orange dataset table [Dataset]. http://doi.org/10.6084/m9.figshare.19146410.v1
    Available download formats: xlsx
    Dataset updated
    Mar 4, 2022
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Rui Simões
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    The complete dataset used in the analysis comprises 36 samples, each described by 11 numeric features and 1 target. The attributes considered were caspase 3/7 activity, Mitotracker red CMXRos area and intensity (3 h and 24 h incubations with both compounds), Mitosox oxidation (3 h incubation with the referred compounds) and oxidation rate, DCFDA fluorescence (3 h and 24 h incubations with either compound) and oxidation rate, and DQ BSA hydrolysis. The target of each instance corresponds to one of the 9 possible classes (4 samples per class): Control, 6.25, 12.5, 25 and 50 µM for 6-OHDA and 0.03, 0.06, 0.125 and 0.25 µM for rotenone. The dataset is balanced, it does not contain any missing values and data was standardized across features. The small number of samples prevented a full and strong statistical analysis of the results. Nevertheless, it allowed the identification of relevant hidden patterns and trends.

    Exploratory data analysis, information gain, hierarchical clustering, and supervised predictive modeling were performed using Orange Data Mining version 3.25.1 [41]. Hierarchical clustering was performed using the Euclidean distance metric and weighted linkage. Cluster maps were plotted to relate the features with higher mutual information (in rows) with instances (in columns), with the color of each cell representing the normalized level of a particular feature in a specific instance. The information is grouped both in rows and in columns by a two-way hierarchical clustering method using the Euclidean distances and average linkage. Stratified cross-validation was used to train the supervised decision tree. A set of preliminary empirical experiments were performed to choose the best parameters for each algorithm, and we verified that, within moderate variations, there were no significant changes in the outcome. The following settings were adopted for the decision tree algorithm: minimum number of samples in leaves: 2; minimum number of samples required to split an internal node: 5; stop splitting when majority reaches: 95%; criterion: gain ratio. The performance of the supervised model was assessed using accuracy, precision, recall, F-measure and area under the ROC curve (AUC) metrics.

  7. Data from: The Often-Overlooked Power of Summary Statistics in Exploratory...

    • acs.figshare.com
    xlsx
    Updated Jun 8, 2023
    Cite
    Tahereh G. Avval; Behnam Moeini; Victoria Carver; Neal Fairley; Emily F. Smith; Jonas Baltrusaitis; Vincent Fernandez; Bonnie. J. Tyler; Neal Gallagher; Matthew R. Linford (2023). The Often-Overlooked Power of Summary Statistics in Exploratory Data Analysis: Comparison of Pattern Recognition Entropy (PRE) to Other Summary Statistics and Introduction of Divided Spectrum-PRE (DS-PRE) [Dataset]. http://doi.org/10.1021/acs.jcim.1c00244.s002
    Available download formats: xlsx
    Dataset updated
    Jun 8, 2023
    Dataset provided by
    ACS Publications
    Authors
    Tahereh G. Avval; Behnam Moeini; Victoria Carver; Neal Fairley; Emily F. Smith; Jonas Baltrusaitis; Vincent Fernandez; Bonnie. J. Tyler; Neal Gallagher; Matthew R. Linford
    License

    Attribution-NonCommercial 4.0 (CC BY-NC 4.0): https://creativecommons.org/licenses/by-nc/4.0/
    License information was derived automatically

    Description

    Unsupervised exploratory data analysis (EDA) is often the first step in understanding complex data sets. While summary statistics are among the most efficient and convenient tools for exploring and describing sets of data, they are often overlooked in EDA. In this paper, we show multiple case studies that compare the performance, including clustering, of a series of summary statistics in EDA. The summary statistics considered here are pattern recognition entropy (PRE), the mean, standard deviation (STD), 1-norm, range, sum of squares (SSQ), and X4, which are compared with principal component analysis (PCA), multivariate curve resolution (MCR), and/or cluster analysis. PRE and the other summary statistics are direct methods for analyzing data; they are not factor-based approaches. To quantify the performance of summary statistics, we use the concept of the “critical pair,” which is employed in chromatography. The data analyzed here come from different analytical methods. Hyperspectral images, including one of a biological material, are also analyzed. In general, PRE outperforms the other summary statistics, especially in image analysis, although a suite of summary statistics is useful in exploring complex data sets. While PRE results were generally comparable to those from PCA and MCR, PRE is easier to apply. For example, there is no need to determine the number of factors that describe a data set. Finally, we introduce the concept of divided spectrum-PRE (DS-PRE) as a new EDA method. DS-PRE increases the discrimination power of PRE. We also show that DS-PRE can be used to provide the inputs for the k-nearest neighbor (kNN) algorithm. We recommend PRE and DS-PRE as rapid new tools for unsupervised EDA.
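    Several of the summary statistics named above can be computed directly from a single spectrum. The sketch below is an illustration under assumptions: PRE is taken to be the Shannon entropy (base 2) of the spectrum after normalizing it to unit sum, which the description does not spell out; X4 and DS-PRE are omitted because their definitions are not given here; and the two four-channel spectra are made up.

```python
import math
import statistics


def summary_statistics(spectrum):
    """Return several summary statistics for one non-negative spectrum."""
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must have positive total intensity")
    # Normalize intensities to probabilities, then apply Shannon entropy (base 2).
    # Treating this as PRE is an assumption about the paper's definition.
    probs = [v / total for v in spectrum if v > 0]
    pre = sum(p * math.log2(1.0 / p) for p in probs)
    return {
        "PRE": pre,
        "mean": statistics.fmean(spectrum),
        "STD": statistics.stdev(spectrum),
        "1-norm": sum(abs(v) for v in spectrum),
        "range": max(spectrum) - min(spectrum),
        "SSQ": sum(v * v for v in spectrum),
    }


flat = [1.0, 1.0, 1.0, 1.0]    # intensity spread evenly over four channels
peaked = [4.0, 0.0, 0.0, 0.0]  # all intensity in one channel

flat_stats = summary_statistics(flat)
peaked_stats = summary_statistics(peaked)
print(flat_stats)
print(peaked_stats)
```

    The two spectra share the same mean and 1-norm but differ in entropy, STD, range, and SSQ, which shows why a suite of statistics can separate samples that a single statistic cannot.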

  8. Detailed characterization of the dataset.

    • figshare.com
    xls
    Updated Sep 26, 2024
    + more versions
    Cite
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda (2024). Detailed characterization of the dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0310707.t006
    Available download formats: xls
    Dataset updated
    Sep 26, 2024
    Dataset provided by
    PLOS ONE
    Authors
    Rodrigo Gutiérrez Benítez; Alejandra Segura Navarrete; Christian Vidal-Castro; Claudia Martínez-Araneda
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Over the last ten years, social media has become a crucial data source for businesses and researchers, providing a space where people can express their opinions and emotions. To analyze this data and classify emotions and their polarity in texts, natural language processing (NLP) techniques such as emotion analysis (EA) and sentiment analysis (SA) are employed. However, the effectiveness of these tasks using machine learning (ML) and deep learning (DL) methods depends on large labeled datasets, which are scarce in languages like Spanish. To address this challenge, researchers use data augmentation (DA) techniques to artificially expand small datasets. This study aims to investigate whether DA techniques can improve classification results using ML and DL algorithms for sentiment and emotion analysis of Spanish texts. Various text manipulation techniques were applied, including transformations, paraphrasing (back-translation), and text generation using generative adversarial networks, to small datasets such as song lyrics, social media comments, headlines from national newspapers in Chile, and survey responses from higher education students. The findings show that the Convolutional Neural Network (CNN) classifier achieved the most significant improvement, with an 18% increase using the Generative Adversarial Networks for Sentiment Text (SentiGan) on the Aggressiveness (Seriousness) dataset. Additionally, the same classifier model showed an 11% improvement using the Easy Data Augmentation (EDA) on the Gender-Based Violence dataset. The performance of the Bidirectional Encoder Representations from Transformers (BETO) also improved by 10% on the back-translation augmented version of the October 18 dataset, and by 4% on the EDA augmented version of the Teaching survey dataset. These results suggest that data augmentation techniques enhance performance by transforming text and adapting it to the specific characteristics of the dataset. 
    Through experimentation with various augmentation techniques, this research provides valuable insights into the analysis of subjectivity in Spanish texts and offers guidance for selecting algorithms and techniques based on dataset features.
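    For readers unfamiliar with text-transformation augmentation, the sketch below shows two simple word-level operations, random swap and random deletion, which are among the operations commonly grouped under Easy Data Augmentation. It is an illustration only: the description does not say which operations or parameters the study used, the function names and the Spanish sample sentence are made up, and operations that need a synonym resource are left out.

```python
import random


def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen positions n_swaps times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sentence = "me encanta esta canción tan alegre".split()
swapped = random_swap(sentence, 1, rng)
shortened = random_deletion(sentence, 0.2, rng)
print(" ".join(swapped))
print(" ".join(shortened))
```

    Each augmented sentence keeps the label of the original, which is how a small labeled dataset is expanded without new annotation.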

  9. EDA with AI Market Research Report 2033

    • growthmarketreports.com
    csv, pdf, pptx
    Updated Aug 29, 2025
    Cite
    Growth Market Reports (2025). EDA with AI Market Research Report 2033 [Dataset]. https://growthmarketreports.com/report/eda-with-ai-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Aug 29, 2025
    Dataset authored and provided by
    Growth Market Reports
    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    EDA with AI Market Outlook



    According to our latest research, the global EDA with AI market size reached USD 7.9 billion in 2024, reflecting robust demand for advanced automation in electronic design automation (EDA) powered by artificial intelligence. The sector is experiencing a strong compound annual growth rate (CAGR) of 18.2% from 2025 to 2033. By the end of 2033, the market is forecasted to reach USD 37.2 billion, driven by the increasing complexity of semiconductor devices, rapid growth in AI-enabled chip design, and the need for faster, more efficient design cycles. These advancements are further supported by the proliferation of IoT devices and the expansion of high-performance computing, which are contributing significantly to the market's expansion.
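    The growth figures quoted above can be cross-checked with the usual compound-growth formula, end = start × (1 + rate)^years. The sketch below is only a worked check: it assumes nine compounding years between the 2024 base value and the 2033 forecast, which the report does not state explicitly, and the function names are hypothetical. Depending on the base year the report actually uses, the quoted rate and end value may not reconcile exactly.

```python
def implied_cagr(start_value, end_value, periods):
    """Compound annual growth rate implied by two values `periods` years apart."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


def project(start_value, rate, periods):
    """Value after compounding `rate` for `periods` years."""
    return start_value * (1.0 + rate) ** periods


# Figures quoted above (USD billion); nine compounding years is an assumption.
years = 2033 - 2024
rate = implied_cagr(7.9, 37.2, years)
projected = project(7.9, 0.182, years)
print(f"implied CAGR: {rate:.1%}")
print(f"2033 value at 18.2%: {projected:.1f}")
```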




    One of the primary growth factors for the EDA with AI market is the escalating complexity of semiconductor designs, which demands more sophisticated solutions for verification, simulation, and optimization. Traditional EDA tools are struggling to keep pace with the miniaturization of nodes and the integration of multi-billion transistor chips. AI-powered EDA solutions are revolutionizing the industry by automating complex tasks such as floorplanning, routing, and verification, significantly reducing time-to-market and design errors. These AI-driven tools are also enabling predictive analytics and intelligent optimization, allowing design teams to anticipate bottlenecks and improve overall productivity. As chipmakers race to develop next-generation processors for applications like autonomous vehicles, 5G, and quantum computing, the adoption of AI-enhanced EDA tools is accelerating across the globe.




    Another critical growth driver is the increasing adoption of AI and machine learning across various industries, which is fueling demand for specialized hardware and custom chipsets. This trend is particularly evident in sectors such as automotive, healthcare, and consumer electronics, where smart devices and advanced driver-assistance systems (ADAS) require highly reliable and efficient silicon. The integration of AI into EDA workflows is not only improving design accuracy but also facilitating the development of application-specific integrated circuits (ASICs) and system-on-chip (SoC) solutions. Furthermore, the shift towards cloud-based EDA platforms is democratizing access to advanced design tools, enabling startups and small enterprises to compete alongside established industry players. As a result, the ecosystem for EDA with AI is becoming more vibrant and inclusive, spurring innovation at an unprecedented pace.




    The third major growth factor lies in the convergence of EDA with AI and emerging technologies such as the Internet of Things (IoT), edge computing, and 5G communications. The proliferation of connected devices is driving the need for power-efficient, high-performance chips capable of real-time data processing. AI-driven EDA solutions are uniquely positioned to address these requirements by optimizing designs for power, performance, and area (PPA) metrics. Additionally, the use of AI in verification and simulation is reducing the incidence of costly design respins, thereby lowering overall development costs. Strategic collaborations between EDA vendors, semiconductor foundries, and cloud service providers are further enhancing the capabilities of AI-powered design tools, paving the way for the next wave of semiconductor innovation.



    EDA Software plays a crucial role in the burgeoning EDA with AI market, as it forms the backbone of the design automation process. These software solutions are essential for managing the increasing complexity of chip designs, offering tools that automate routine tasks, enhance simulation accuracy, and enable predictive analytics. As the demand for custom and complex chips grows, the reliance on advanced EDA software will only intensify. The software's ability to incorporate machine learning algorithms that learn from historical design data, optimize layouts, and minimize errors is pivotal in maintaining competitive advantage in the fast-evolving semiconductor industry. As such, EDA Software is not just a tool but a strategic asset that drives innovation and efficiency in electronic design.




    From a regional perspective, Asia Pacific continues to dominate the EDA with AI market, accounting f

  10. Understanding Placement Factors

    • kaggle.com
    zip
    Updated Sep 25, 2024
    Cite
    Hetavi Ganatra (2024). Understanding Placement Factors [Dataset]. https://www.kaggle.com/datasets/hetavig/understanding-placement-factors/discussion
    Available download formats: zip (1632 bytes)
    Dataset updated
    Sep 25, 2024
    Authors
    Hetavi Ganatra
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    As a part of our Data Analytics assignment, we conducted a survey on factors that affect the placement of undergraduate students. The responses were processed and used for EDA, which helped us analyze the data and answer some crucial questions that concern students currently pursuing their UG. The code for the pre-processing and EDA is available at the links below:

    • Preprocessing: https://www.kaggle.com/code/hetavig/pre-processing
    • EDA-1: https://www.kaggle.com/code/hetavig/exploratory-data-analysis-1
    • EDA-2: https://www.kaggle.com/code/hetavig/exploratory-data-analysis-2
    • EDA-3: https://www.kaggle.com/code/hetavig/exploratory-data-analysis-3
    • EDA-4: https://www.kaggle.com/code/hetavig/exploratory-data-analysis-4
    • EDA-5: https://www.kaggle.com/code/hetavig/exploratory-data-analysis-5

  11. SEM regression for H1-5.

    • plos.figshare.com
    xls
    Updated Nov 4, 2024
    Cite
    Daan Kolkman; Gwendolyn K. Lee; Arjen van Witteloostuijn (2024). SEM regression for H1-5. [Dataset]. http://doi.org/10.1371/journal.pone.0309318.t004
    Available download formats: xls
    Dataset updated
    Nov 4, 2024
    Dataset provided by
    PLOS (http://plos.org/)
    Authors
    Daan Kolkman; Gwendolyn K. Lee; Arjen van Witteloostuijn
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Recent calls to take up data science either revolve around the superior predictive performance associated with machine learning or the potential of data science techniques for exploratory data analysis. Many believe that these strengths come at the cost of explanatory insights, which form the basis for theorization. In this paper, we show that this trade-off is false. When used as a part of a full research process, including inductive, deductive and abductive steps, machine learning can offer explanatory insights and provide a solid basis for theorization. We present a systematic five-step theory-building and theory-testing cycle that consists of: 1. Element identification (reduction); 2. Exploratory analysis (induction); 3. Hypothesis development (retroduction); 4. Hypothesis testing (deduction); and 5. Theorization (abduction). We demonstrate the usefulness of this approach, which we refer to as co-duction, in a vignette where we study firm growth with real-world observational data.

  12. Global EDA in Industrial Electronics Market Research Report: By Application...

    • wiseguyreports.com
    Updated Sep 15, 2025
    + more versions
    Cite
    (2025). Global EDA in Industrial Electronics Market Research Report: By Application (Automation, Control Systems, Signal Processing, Data Acquisition), By End Use (Manufacturing, Energy, Transportation, Telecommunications), By Product Type (Embedded Systems, Industrial Process Control, Industrial Communication Equipment), By Technology (Analog Electronics, Digital Electronics, Power Electronics) and By Regional (North America, Europe, South America, Asia Pacific, Middle East and Africa) - Forecast to 2035 [Dataset]. https://www.wiseguyreports.com/reports/eda-in-industrial-electronic-market
    Explore at:
    Dataset updated
    Sep 15, 2025
    License

    https://www.wiseguyreports.com/pages/privacy-policy

    Time period covered
    Sep 25, 2025
    Area covered
    Global
    Description
    BASE YEAR: 2024
    HISTORICAL DATA: 2019 - 2023
    REGIONS COVERED: North America, Europe, APAC, South America, MEA
    REPORT COVERAGE: Revenue Forecast, Competitive Landscape, Growth Factors, and Trends
    MARKET SIZE 2024: 2.26 (USD Billion)
    MARKET SIZE 2025: 2.45 (USD Billion)
    MARKET SIZE 2035: 5.5 (USD Billion)
    SEGMENTS COVERED: Application, End Use, Product Type, Technology, Regional
    COUNTRIES COVERED: US, Canada, Germany, UK, France, Russia, Italy, Spain, Rest of Europe, China, India, Japan, South Korea, Malaysia, Thailand, Indonesia, Rest of APAC, Brazil, Mexico, Argentina, Rest of South America, GCC, South Africa, Rest of MEA
    KEY MARKET DYNAMICS: Technological advancements, Growing automation demand, Rising electronic complexity, Increased adoption of IoT, Emergence of smart manufacturing
    MARKET FORECAST UNITS: USD Billion
    KEY COMPANIES PROFILED: Microchip Technology, Analog Devices, Synopsys, Cadence Design Systems, Texas Instruments, Infineon Technologies, Keysight Technologies, ANSYS, NXP Semiconductors, STMicroelectronics, Altium, Maxim Integrated, Rohm Semiconductor, Siemens, Broadcom, Mentor Graphics
    MARKET FORECAST PERIOD: 2025 - 2035
    KEY MARKET OPPORTUNITIES: AI integration for design optimization, IoT expansion in manufacturing systems, Rise in smart factory implementations, Increasing demand for energy-efficient solutions, Growth in autonomous industrial applications
    COMPOUND ANNUAL GROWTH RATE (CAGR): 8.5% (2025 - 2035)
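The quoted growth rate can be sanity-checked against the quoted market sizes. The sketch below is only an illustration: it assumes the CAGR is computed from the 2025 and 2035 market sizes over a ten-year span, which the listing does not state explicitly, and the function name `cagr` is chosen here for the example.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted in the listing, in USD billion.
size_2025 = 2.45
size_2035 = 5.5

# Assumption: the forecast period 2025 - 2035 is treated as 10 compounding years.
rate = cagr(size_2025, size_2035, years=10)
print(f"Implied CAGR 2025-2035: {rate:.1%}")
```

Under these assumptions the implied rate lands close to the quoted 8.5%; a small gap would be expected because the market sizes are themselves rounded.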
  13. Wafer Fabrication EDA Tools Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 1, 2025
    Cite
    Data Insights Market (2025). Wafer Fabrication EDA Tools Report [Dataset]. https://www.datainsightsmarket.com/reports/wafer-fabrication-eda-tools-504653
    Explore at:
    Available download formats: ppt, doc, pdf
    Dataset updated
    Jul 1, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The Wafer Fabrication EDA Tools market, valued at $1667 million in 2025, is projected to experience robust growth, driven by the increasing complexity of semiconductor designs and the rising demand for advanced process nodes. The market's Compound Annual Growth Rate (CAGR) of 6.4% from 2025 to 2033 reflects a consistent need for sophisticated Electronic Design Automation (EDA) tools to optimize wafer fabrication processes. Key drivers include the miniaturization of semiconductor devices, the proliferation of 5G and AI technologies fueling demand for high-performance chips, and the growing adoption of advanced packaging techniques. Leading players like Synopsys, Cadence, and Siemens EDA are at the forefront of innovation, continuously improving the accuracy, speed, and efficiency of their EDA tools to meet the evolving needs of the semiconductor industry. The market is also witnessing trends such as the integration of AI and machine learning into EDA workflows, enhancing design automation and optimization. While the market faces some restraints, such as high costs associated with advanced EDA tools and the complexities of software integration, the overall growth trajectory remains positive due to the continued technological advancements and increasing demand for high-performance computing. This growth is further fueled by strong regional demands, particularly in North America and Asia, where significant investments in semiconductor manufacturing facilities are occurring. The competitive landscape is characterized by both established industry giants and emerging players, leading to continuous innovation and improved tool capabilities. Despite the challenges of maintaining high accuracy in complex simulations and keeping up with the rapid pace of technological advancement, the wafer fabrication EDA tools market's expansion is likely to continue as the semiconductor industry progresses towards smaller, faster, and more energy-efficient chips. 
    The market's segmentation (while not detailed in the provided data) is likely to reflect different EDA tool categories, such as physical verification, layout design, and process simulation, each exhibiting distinct growth rates.

  14. Data from: A novel spatial prediction method integrating Exploratory Spatial...

    • ieee-dataport.org
    Updated Mar 19, 2025
    Cite
    Bingbo Gao (2025). A novel spatial prediction method integrating Exploratory Spatial Data Analysis into Random Forest for large scale daily air temperature mapping [Dataset]. https://ieee-dataport.org/documents/novel-spatial-prediction-method-integrating-exploratory-spatial-data-analysis-random
    Explore at:
    Dataset updated
    Mar 19, 2025
    Authors
    Bingbo Gao
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    environmental management

  15. Data from: Wrist-worn sensor validation for heart rate variability and...

    • data.mendeley.com
    • data.niaid.nih.gov
    Updated Jun 21, 2023
    + more versions
    Cite
    Simone Costantini (2023). Wrist-worn sensor validation for heart rate variability and electrodermal activity detection in a stressful driving environment [Dataset]. http://doi.org/10.17632/npnv4tsbg7.1
    Explore at:
    Dataset updated
    Jun 21, 2023
    Authors
    Simone Costantini
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    This dataset supports assessing the accuracy of the Empatica 4 (E4) wristband for the detection of heart rate variability (HRV) and electrodermal activity (EDA) metrics in stress-inducing conditions and driving scenarios of increasing risk. HRV and EDA signals were recorded over six experimental conditions (i.e., Baseline, Video Clip, Scream, No Risk Driving, Low-Risk Driving, and High-Risk Driving) by means of two measurement systems: the E4 device and a gold standard system. The raw quality of the physiological signals was enhanced by means of robust semi-automatic reconstruction algorithms. HRV time-domain parameters showed high accuracy in motion-free experimental conditions, while HRV frequency-domain parameters reported sufficient accuracy in almost every experimental condition.

    Folder 01 contains both HRV and EDA parameters for every experimental condition, according to the Gold Standard measurement system and the Empatica 4 device, in two separate Excel files.

    Folder 02 contains supplementary material on the assessment of signal quality.

    Folder 03 contains the Bland-Altman plot for each HRV and EDA parameter and for each condition (one .png file per parameter), and an Excel file that summarizes the numerical outcomes of the Bland-Altman analyses.
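For readers unfamiliar with Bland-Altman analysis, the following is a minimal sketch of the usual computation: the mean difference (bias) between two measurement systems and the 95% limits of agreement at bias ± 1.96 standard deviations. It is not the authors' code; the function name and the sample values are hypothetical and are not taken from the dataset.

```python
from statistics import mean, stdev


def bland_altman(reference, device):
    """Return (bias, lower_limit, upper_limit) for paired measurements."""
    if len(reference) != len(device):
        raise ValueError("paired samples must have equal length")
    # Difference of each pair: device minus reference.
    diffs = [d - r for r, d in zip(reference, device)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired values for one parameter (illustration only).
gold_standard = [52.0, 47.5, 60.1, 55.3, 49.8]
e4_wristband = [50.9, 48.2, 58.7, 56.0, 48.1]

bias, lower, upper = bland_altman(gold_standard, e4_wristband)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

A Bland-Altman plot then shows each pair's difference against the pair's mean, with horizontal lines at the bias and the two limits.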

  16. Electrodermal Activity (EDA) of Bi-cultural Visitors In Virtual Park...

    • gimi9.com
    Updated Dec 14, 2022
    + more versions
    Cite
    (2022). Electrodermal Activity (EDA) of Bi-cultural Visitors In Virtual Park Settings | gimi9.com [Dataset]. https://gimi9.com/dataset/eu_5f6276bf-6e08-4432-bf3b-9c77672976ba-envidat/
    Explore at:
    Dataset updated
    Dec 14, 2022
    Description

    This repository contains data on EDA measurements of visitors with different cultural backgrounds in virtual urban park settings. The parks are a Persian garden (Shiraz, Iran) and a historical park in Zurich, Switzerland. The cultural background of the visitors is Persian and Central European. The repository contains raw data from EDA, processed time series and statistical procedures.

  17. HadISD: Global sub-daily, surface meteorological station data, 1931-2023,...

    • data-search.nerc.ac.uk
    Updated Jul 24, 2021
    + more versions
    Cite
    (2021). HadISD: Global sub-daily, surface meteorological station data, 1931-2023, v3.4.0.2023f [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=dewpoint
    Explore at:
    Dataset updated
    Jul 24, 2021
    Description

    This is version v3.4.0.2023f of Met Office Hadley Centre's Integrated Surface Database, HadISD. These data are global sub-daily surface meteorological data. This update (v3.4.0.2023f) to HadISD corrects a long-standing bug, discovered in autumn 2023, whereby the neighbour checks (and associated [un]flagging for some other tests) were not being implemented. For more details see the posts on the HadISD blog: https://hadisd.blogspot.com/2023/10/bug-in-buddy-checks.html and https://hadisd.blogspot.com/2024/01/hadisd-v3402023f-future-look.html

    The quality controlled variables in this dataset are: temperature, dewpoint temperature, sea-level pressure, wind speed and direction, cloud data (total, low, mid and high level). Past significant weather and precipitation data are also included, but have not been quality controlled, so their quality and completeness cannot be guaranteed. Quality control flags and data values which have been removed during the quality control process are provided in the qc_flags and flagged_values fields, and ancillary data files show the station listing with IDs, names and location information.

    The data are provided as one NetCDF file per station. Files in the station_data folder have the format "station_code"_HadISD_HadOBS_19310101-20240101_v3.4.1.2023f.nc. The station codes can be found under the docs tab. The station codes file has five columns as follows: 1) station code, 2) station name, 3) station latitude, 4) station longitude, 5) station height.

    To keep informed about updates, news and announcements follow the HadOBS team on twitter @metofficeHadOBS. For more detailed information, e.g. bug fixes, routine updates and other exploratory analysis, see the HadISD blog: http://hadisd.blogspot.co.uk/

    References: When using the dataset in a paper you must cite the following papers (see Docs for link to the publications) and this dataset (using the "citable as" reference):
    Dunn, R. J. H., (2019), HadISD version 3: monthly updates, Hadley Centre Technical Note.
    Dunn, R. J. H., Willett, K. M., Parker, D. E., and Mitchell, L.: Expanding HadISD: quality-controlled, sub-daily station data from 1931, Geosci. Instrum. Method. Data Syst., 5, 473-491, doi:10.5194/gi-5-473-2016, 2016.
    Dunn, R. J. H., et al. (2012), HadISD: A Quality Controlled global synoptic report database for selected variables at long-term stations from 1973-2011, Clim. Past, 8, 1649-1679, 2012, doi:10.5194/cp-8-1649-2012
    Smith, A., N. Lott, and R. Vose, 2011: The Integrated Surface Database: Recent Developments and Partnerships. Bulletin of the American Meteorological Society, 92, 704–708, doi:10.1175/2011BAMS3015.1

    For a homogeneity assessment of HadISD please see the following reference:
    Dunn, R. J. H., K. M. Willett, C. P. Morice, and D. E. Parker. "Pairwise homogeneity assessment of HadISD." Climate of the Past 10, no. 4 (2014): 1501-1522. doi:10.5194/cp-10-1501-2014, 2014.
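The per-station file naming described above can be turned into a small helper. This is a sketch only: the function name and the placeholder station code are hypothetical, and the default period and version strings are copied verbatim from the filename pattern quoted in the description.

```python
def station_filename(station_code: str,
                     period: str = "19310101-20240101",
                     version: str = "v3.4.1.2023f") -> str:
    """Build a station file name following the pattern quoted in the description."""
    return f"{station_code}_HadISD_HadOBS_{period}_{version}.nc"


# "STATIONCODE" is a placeholder; real codes come from the station codes file.
print(station_filename("STATIONCODE"))
```

Real station codes would be read from the first column of the station codes file mentioned in the description.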

  18. Data from: Supplementary Material for "Sonification for Exploratory Data...

    • pub.uni-bielefeld.de
    • search.datacite.org
    Updated Feb 5, 2019
    Cite
    Thomas Hermann (2019). Supplementary Material for "Sonification for Exploratory Data Analysis" [Dataset]. https://pub.uni-bielefeld.de/record/2920448
    Explore at:
    Dataset updated
    Feb 5, 2019
    Authors
    Thomas Hermann
    License

    Open Database License (ODbL) v1.0: https://www.opendatacommons.org/licenses/odbl/1.0/
    License information was derived automatically

    Description

    Sonification for Exploratory Data Analysis

    Chapter 8: Sonification Models

    In Chapter 8 of the thesis, 6 sonification models are presented to give some examples for the framework of Model-Based Sonification, developed in Chapter 7. Sonification models determine the rendering of the sonification and possible interactions. The "model in mind" helps the user to interpret the sound with respect to the data.

    8.1 Data Sonograms

    Data Sonograms use spherical expanding shock waves to excite linear oscillators which are represented by point masses in model space.

    • Table 8.2, page 87: Sound examples for Data Sonograms
    File:
    Iris dataset: started in plot (a) at S0 (https://pub.uni-bielefeld.de/download/2920448/2920454) (b) at S1 (c) at S2
    10d noisy circle dataset: started in plot (c) at S0 (mean) (https://pub.uni-bielefeld.de/download/2920448/2920451) (d) at S1 (edge)
    10d Gaussian: plot (d) started at S0
    3 clusters: Example 1
    3 clusters: invisible columns used as output variables: Example 2 (https://pub.uni-bielefeld.de/download/2920448/2920450)
    Description:
    Data Sonogram Sound examples for synthetic datasets and the Iris dataset
    Duration:
    about 5 s
    8.2 Particle Trajectory Sonification Model

    This sonification model explores features of a data distribution by computing the trajectories of test particles which are injected into model space and move according to Newton's laws of motion in a potential given by the dataset.

    • Sound example: page 93, PTSM-Ex-1 Audification of 1 particle in the potential of phi(x).
    • Sound example: page 93, PTSM-Ex-2 Audification of a sequence of 15 particles in the potential of a dataset with 2 clusters.
    • Sound example: page 94, PTSM-Ex-3 Audification of 25 particles simultaneously in a potential of a dataset with 2 clusters.
    • Sound example: page 94, PTSM-Ex-4 Audification of 25 particles simultaneously in a potential of a dataset with 1 cluster.
    • Sound example: page 95, PTSM-Ex-5 sigma-step sequence for a mixture of three Gaussian clusters
    • Sound example: page 95, PTSM-Ex-6 sigma-step sequence for a Gaussian cluster
    • Sound example: page 96, PTSM-Iris-1 Sonification for the Iris Dataset with 20 particles per step.
    • Sound example: page 96, PTSM-Iris-2 Sonification for the Iris Dataset with 3 particles per step.
    • Sound example: page 96, PTSM-Tetra-1 Sonification for a 4d tetrahedron clusters dataset.
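The idea behind the Particle Trajectory Sonification Model, a test particle moving under Newton's laws in a potential defined by the dataset, can be illustrated with a toy simulation. Everything specific below is an assumption made for the example and is not taken from the thesis: the Gaussian-kernel potential, the integration scheme, the parameter values, and the choice of particle speed as the output signal.

```python
import math


def force(pos, data, sigma):
    """Force on a particle: negative gradient of
    V(x) = -sum_i exp(-|x - x_i|^2 / (2 * sigma^2)) (an assumed potential)."""
    fx = fy = 0.0
    for px, py in data:
        dx, dy = px - pos[0], py - pos[1]
        w = math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2)) / sigma ** 2
        fx += w * dx  # pulls the particle towards each data point
        fy += w * dy
    return fx, fy


def trajectory_signal(start, velocity, data, sigma=0.5, dt=0.01,
                      steps=2000, mass=1.0):
    """Integrate Newton's equations (semi-implicit Euler) and return the
    particle speed at every step as a 1-D signal."""
    x, y = start
    vx, vy = velocity
    signal = []
    for _ in range(steps):
        fx, fy = force((x, y), data, sigma)
        vx += dt * fx / mass
        vy += dt * fy / mass
        x += dt * vx
        y += dt * vy
        signal.append(math.hypot(vx, vy))
    return signal


# Hypothetical 2-D dataset with two small clusters.
points = [(-1.0, 0.0), (-1.1, 0.2), (-0.9, -0.1),
          (1.0, 0.0), (1.2, 0.1), (0.9, -0.2)]
signal = trajectory_signal(start=(0.0, 0.5), velocity=(0.0, 0.0), data=points)
print(len(signal), max(signal))
```

In an audification, a time series like `signal` would be scaled and played back as a waveform; how the thesis derives its audio from the trajectories may differ from this sketch.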
    8.3 Markov chain Monte Carlo Sonification

    The McMC Sonification Model defines an exploratory process in the domain of a given density p such that the acoustic representation summarizes features of p, particularly concerning the modes of p, by sound.
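As background, a Markov chain Monte Carlo process that explores a density p can be sketched with a random-walk Metropolis sampler. This is a generic illustration rather than the model from the thesis: the two-mode example density, the proposal width, and all names below are assumptions.

```python
import math
import random


def density(x):
    """Unnormalised mixture of two 1-D Gaussians (a hypothetical p with two modes)."""
    return math.exp(-0.5 * (x + 2.0) ** 2) + math.exp(-0.5 * (x - 2.0) ** 2)


def metropolis(p, x0, steps, step_size=1.0, seed=0):
    """Random-walk Metropolis: propose a Gaussian step and accept it with
    probability min(1, p(proposal) / p(current))."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(steps):
        proposal = x + rng.gauss(0.0, step_size)
        if rng.random() < min(1.0, p(proposal) / p(x)):
            x = proposal
        chain.append(x)
    return chain


chain = metropolis(density, x0=0.0, steps=5000)
print(min(chain), max(chain))
```

The chain spends most of its time near the modes of p, which is the kind of structure a sonification of the process would aim to make audible.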

    • Sound Example: page 105, MCMC-Ex-1 McMC Sonification, stabilization of amplitudes.
    • Sound Example: page 106, MCMC-Ex-2 Trajectory Audification for 100 McMC steps in 3 cluster dataset
    • McMC Sonification for Cluster Analysis, dataset with three clusters, page 107
    • McMC Sonification for Cluster
  19. Exploratory data analysis.

    • plos.figshare.com
    • datasetcatalog.nlm.nih.gov
    • +1more
    xls
    Updated Jun 5, 2023
    Cite
    Oscar Ngesa; Henry Mwambi; Thomas Achia (2023). Exploratory data analysis. [Dataset]. http://doi.org/10.1371/journal.pone.0103299.t001
    Explore at:
    Available download formats: xls
    Dataset updated
    Jun 5, 2023
    Dataset provided by
    PLOS ONE
    Authors
    Oscar Ngesa; Henry Mwambi; Thomas Achia
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Exploratory data analysis.

  20. HadISD: Global sub-daily, surface meteorological station data, 1931-2017,...

    • data-search.nerc.ac.uk
    • catalogue.ceda.ac.uk
    Updated Jul 24, 2021
    + more versions
    Cite
    (2021). HadISD: Global sub-daily, surface meteorological station data, 1931-2017, v2.0.2.2017f [Dataset]. https://data-search.nerc.ac.uk/geonetwork/srv/search?keyword=dewpoint
    Explore at:
    Dataset updated
    Jul 24, 2021
    Description

    This is version 2.0.2.2017f of Met Office Hadley Centre's Integrated Surface Database, HadISD. These data are global sub-daily surface meteorological data that extend HadISD v2.0.1.2016p to include 2017 and so span 1931-2017. This version replaces the preliminary version (v2.0.2.2017p), as the ISD data for 2017 are now finalised.

    The quality controlled variables in this dataset are: temperature, dewpoint temperature, sea-level pressure, wind speed and direction, cloud data (total, low, mid and high level). Past significant weather and precipitation data are also included, but have not been quality controlled, so their quality and completeness cannot be guaranteed. Quality control flags and data values which have been removed during the quality control process are provided in the qc_flags and flagged_values fields, and ancillary data files show the station listing with IDs, names and location information.

    The data are provided as one NetCDF file per station. Files in the station_data folder have the format "station_code"_HadISD_HadOBS_19310101-20171231_v2-0-2-2017f.nc. The station codes can be found under the docs tab or on the archive beside the station_data folder. The station codes file has five columns as follows: 1) station code, 2) station name, 3) station latitude, 4) station longitude, 5) station height.

    To keep informed about updates, news and announcements follow the HadOBS team on twitter @metofficeHadOBS. For more detailed information, e.g. bug fixes, routine updates and other exploratory analysis, see the HadISD blog: http://hadisd.blogspot.co.uk/ For a more detailed description of precipitation see: http://hadisd.blogspot.co.uk/2018/03/precipitation-in-hadisd.html

    References: When using the dataset in a paper you must cite the following papers (see Docs for link to the publications) and this dataset (using the "citable as" reference):
    Dunn, R. J. H., Willett, K. M., Parker, D. E., and Mitchell, L.: Expanding HadISD: quality-controlled, sub-daily station data from 1931, Geosci. Instrum. Method. Data Syst., 5, 473-491, doi:10.5194/gi-5-473-2016, 2016.
    Dunn, R. J. H., et al. (2012), HadISD: A Quality Controlled global synoptic report database for selected variables at long-term stations from 1973-2011, Clim. Past, 8, 1649-1679, 2012, doi:10.5194/cp-8-1649-2012
    Smith, A., N. Lott, and R. Vose, 2011: The Integrated Surface Database: Recent Developments and Partnerships. Bulletin of the American Meteorological Society, 92, 704–708, doi:10.1175/2011BAMS3015.1

    For a homogeneity assessment of HadISD please see the following reference:
    Dunn, R. J. H., K. M. Willett, C. P. Morice, and D. E. Parker. "Pairwise homogeneity assessment of HadISD." Climate of the Past 10, no. 4 (2014): 1501-1522. doi:10.5194/cp-10-1501-2014, 2014.

Cite
Shrishti Manja (2024). Ecommerce Dataset for Data Analysis [Dataset]. https://www.kaggle.com/datasets/shrishtimanja/ecommerce-dataset-for-data-analysis/code

Ecommerce Dataset for Data Analysis

Exploratory Data Analysis, Data Visualisation and Machine Learning

Explore at:
Available download formats: zip (2028853 bytes)
Dataset updated
Sep 19, 2024
Authors
Shrishti Manja
Description

This dataset contains 55,000 entries of synthetic customer transactions, generated using Python's Faker library. The goal behind creating this dataset was to provide a resource for learners like myself to explore, analyze, and apply various data analysis techniques in a context that closely mimics real-world data.

About the Dataset:
- CID (Customer ID): A unique identifier for each customer.
- TID (Transaction ID): A unique identifier for each transaction.
- Gender: The gender of the customer, categorized as Male or Female.
- Age Group: Age group of the customer, divided into several ranges.
- Purchase Date: The timestamp of when the transaction took place.
- Product Category: The category of the product purchased, such as Electronics, Apparel, etc.
- Discount Availed: Indicates whether the customer availed any discount (Yes/No).
- Discount Name: Name of the discount applied (e.g., FESTIVE50).
- Discount Amount (INR): The amount of discount availed by the customer.
- Gross Amount: The total amount before applying any discount.
- Net Amount: The final amount after applying the discount.
- Purchase Method: The payment method used (e.g., Credit Card, Debit Card, etc.).
- Location: The city where the purchase took place.

Use Cases:
1. Exploratory Data Analysis (EDA): This dataset is ideal for conducting EDA, allowing users to practice techniques such as summary statistics, visualizations, and identifying patterns within the data.
2. Data Preprocessing and Cleaning: Learners can work on handling missing data, encoding categorical variables, and normalizing numerical values to prepare the dataset for analysis.
3. Data Visualization: Use tools like Python’s Matplotlib, Seaborn, or Power BI to visualize purchasing trends, customer demographics, or the impact of discounts on purchase amounts.
4. Machine Learning Applications: After applying feature engineering, this dataset is suitable for supervised learning models, such as predicting whether a customer will avail a discount or forecasting purchase amounts based on the input features.

This dataset provides an excellent sandbox for honing skills in data analysis, machine learning, and visualization in a structured but flexible manner.

This is not a real dataset. It was generated using Python's Faker library for the sole purpose of learning.
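A first EDA step on a dataset like this could be a consistency check between the amount columns. The sketch below assumes that the CSV headers match the field names in the description and that Net Amount equals Gross Amount minus Discount Amount; neither is confirmed by the listing. The three sample rows are invented for illustration and are not from the dataset.

```python
import csv
import io

# Invented sample rows; column names are assumed from the dataset description.
SAMPLE = """CID,TID,Discount Availed,Discount Name,Discount Amount (INR),Gross Amount,Net Amount
1001,5001,Yes,FESTIVE50,50.0,500.0,450.0
1002,5002,No,,0.0,1200.0,1200.0
1003,5003,Yes,FESTIVE50,50.0,300.0,260.0
"""


def inconsistent_rows(rows, tolerance=0.01):
    """Return the TIDs whose net amount differs from gross minus discount."""
    bad = []
    for row in rows:
        gross = float(row["Gross Amount"])
        discount = float(row["Discount Amount (INR)"])
        net = float(row["Net Amount"])
        if abs((gross - discount) - net) > tolerance:
            bad.append(row["TID"])
    return bad


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
bad_tids = inconsistent_rows(rows)
# Share of transactions where a discount was availed.
discount_share = sum(r["Discount Availed"] == "Yes" for r in rows) / len(rows)
print(bad_tids, round(discount_share, 2))
```

To run the same check on the downloaded file, the `io.StringIO(SAMPLE)` source would be replaced by an opened CSV file, adjusting the column names if the real headers differ.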
