8 datasets found
  1. Shopping Mall Customer Data Segmentation Analysis

    • kaggle.com
    Updated Aug 4, 2024
    Cite
    DataZng (2024). Shopping Mall Customer Data Segmentation Analysis [Dataset]. https://www.kaggle.com/datasets/datazng/shopping-mall-customer-data-segmentation-analysis/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 4, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    DataZng
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Demographic Analysis of Shopping Behavior: Insights and Recommendations

    Dataset Information: The Shopping Mall Customer Segmentation Dataset comprises 15,079 unique entries, featuring Customer ID, age, gender, annual income, and spending score. This dataset assists in understanding customer behavior for strategic marketing planning.

    Cleaned Data Details: The data were cleaned and standardized into 15,079 unique entries with attributes including Customer ID, age, gender, annual income, and spending score. Marketing analysts can use it to develop better mall-specific marketing strategies.

    Challenges Faced: 1. Data Cleaning: Overcoming inconsistencies and missing values required meticulous attention. 2. Statistical Analysis: Interpreting demographic data accurately demanded collaborative effort. 3. Visualization: Crafting informative visuals to convey insights effectively posed design challenges.

    Research Topics: 1. Consumer Behavior Analysis: Exploring psychological factors driving purchasing decisions. 2. Market Segmentation Strategies: Investigating effective targeting based on demographic characteristics.

    Suggestions for Project Expansion: 1. Incorporate External Data: Integrate social media analytics or geographic data to enrich customer insights. 2. Advanced Analytics Techniques: Explore advanced statistical methods and machine learning algorithms for deeper analysis. 3. Real-Time Monitoring: Develop tools for agile decision-making through continuous customer behavior tracking. This summary outlines the demographic analysis of shopping behavior, highlighting key insights, dataset characteristics, team contributions, challenges, research topics, and suggestions for project expansion. Leveraging these insights can enhance marketing strategies and drive business growth in the retail sector.
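
    The clustering direction suggested above can be sketched with a minimal k-means pass over income and spending score. This is an illustrative sketch, not the dataset's own code: the points below are synthetic stand-ins for the CSV's annual-income and spending-score columns.

```python
import numpy as np

# Synthetic stand-ins for (annual income, spending score); in practice
# the two columns would be loaded from the Kaggle CSV instead.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal((30, 80), 5, size=(50, 2)),  # lower income, high spenders
    rng.normal((90, 20), 5, size=(50, 2)),  # higher income, low spenders
    rng.normal((60, 50), 5, size=(50, 2)),  # mid income, mid spenders
])

def kmeans(X, k, iters=25, seed=0):
    """Plain NumPy k-means: assign points to the nearest center, re-average."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([
            X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
            for j in range(k)
        ])
    return labels

labels = kmeans(X, k=3)
print(np.bincount(labels, minlength=3))  # points per segment
```

    The resulting labels can then be profiled per segment (mean age, income, spend) to drive the mall-specific strategies mentioned above.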

    References
    OpenAI. (2022). ChatGPT [Computer software]. https://openai.com/chatgpt
    Mustafa, Z. (2022). Shopping Mall Customer Segmentation Data [Data set]. Kaggle. https://www.kaggle.com/datasets/zubairmustafa/shopping-mall-customer-segmentation-data
    Donkeys. (n.d.). Kaggle Python API [Jupyter Notebook]. Kaggle. https://www.kaggle.com/code/donkeys/kaggle-python-api/notebook
    pandas-datareader. (n.d.). https://pypi.org/project/pandas-datareader/

  2. Spotify tracks

    • kaggle.com
    Updated Aug 29, 2024
    Cite
    BharathiD8 (2024). Spotify tracks [Dataset]. https://www.kaggle.com/datasets/bharathid8/spotify-tracks
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Aug 29, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    BharathiD8
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by BharathiD8.

    Released under Apache 2.0.


  3. Google Play Store_Cleaned

    • kaggle.com
    Updated Mar 26, 2023
    Cite
    Yash (2023). Google Play Store_Cleaned [Dataset]. https://www.kaggle.com/datasets/yash16jr/google-play-store-cleaned
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Mar 26, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Yash
    License

    CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset is the cleaned-up version of the Google Play Store Data dataset available on Kaggle. The EDA and data cleaning were performed using Python.

  4. Datasets for manuscript "Tracking end-of-life stage of chemicals: a scalable...

    • catalog.data.gov
    • s.cnmilf.com
    Updated May 30, 2023
    Cite
    U.S. EPA Office of Research and Development (ORD) (2023). Datasets for manuscript "Tracking end-of-life stage of chemicals: a scalable data-centric and chemical-centric approach" [Dataset]. https://catalog.data.gov/dataset/datasets-for-manuscript-tracking-end-of-life-stage-of-chemicals-a-scalable-data-centric-an
    Explore at:
    Dataset updated
    May 30, 2023
    Dataset provided by
    United States Environmental Protection Agency (http://www.epa.gov/)
    Description

    As described in the README.md file, the GitHub repository PRTR_transfers contains Python scripts that run a data-centric and chemical-centric framework for tracking end-of-life (EoL) chemical flow transfers, identifying potential EoL exposure scenarios, and performing Chemical Flow Analysis (CFA). The accompanying Extract, Transform, and Load (ETL) pipeline leverages publicly accessible Pollutant Release and Transfer Register (PRTR) systems belonging to Organisation for Economic Co-operation and Development (OECD) member countries.

    The Life Cycle Inventory (LCI) data obtained by the ETL is stored in a Structured Query Language (SQL) database called PRTR_transfers that can be connected to Machine Learning Operations (MLOps) in production environments, making the framework scalable for real-world applications. The data ingestion pipeline can supply data at an annual rate, ensuring labeled data can be ingested into data-driven models if retraining is needed, especially to address problems like data and concept drift that can drastically degrade the performance of data-driven models.

    The README also describes the Python libraries required to run the code, how to use it, the output files produced by the Python scripts, and how to obtain all manuscript figures (file Manuscript Figures-EDA.ipynb) and results. This dataset is associated with the following publication: Hernandez-Betancur, J.D., G.J. Ruiz-Mercado, and M. Martín. Tracking end-of-life stage of chemicals: A scalable data-centric and chemical-centric approach. Resources, Conservation and Recycling. Elsevier Science BV, Amsterdam, NETHERLANDS, 196: 107031, (2023).
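
    As a rough illustration of the extract-transform-load flow described above (not the repository's actual schema or code, which live in PRTR_transfers), a minimal sketch with SQLite; the rows, table, and column names here are made up:

```python
import sqlite3

# Illustrative ETL-to-SQL sketch only; the real PRTR_transfers schema and
# loading logic live in the GitHub repository, and these rows are invented.
records = [("Canada", "toluene", 2020, 12.5),
           ("Spain", "benzene", 2020, 3.2)]               # extract (stand-in data)
rows = [(c, s.upper(), y, kg) for c, s, y, kg in records]  # transform

conn = sqlite3.connect(":memory:")                         # load into SQL
conn.execute("CREATE TABLE transfers (country TEXT, substance TEXT,"
             " year INTEGER, amount_kg REAL)")
conn.executemany("INSERT INTO transfers VALUES (?, ?, ?, ?)", rows)
print(conn.execute("SELECT COUNT(*) FROM transfers").fetchone()[0])
```

    An annual ingestion run would append new PRTR rows the same way, which is what makes retraining against drift possible.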

  5. Cyclistic Bike - Data Analysis (Python)

    • kaggle.com
    Updated Sep 25, 2024
    Cite
    Amirthavarshini (2024). Cyclistic Bike - Data Analysis (Python) [Dataset]. https://www.kaggle.com/datasets/amirthavarshini12/cyclistic-bike-data-analysis-python/data
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Sep 25, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Amirthavarshini
    Description

    Conducted an in-depth analysis of Cyclistic bike-share data to uncover customer usage patterns and trends. Cleaned and processed raw data using Python libraries such as pandas and NumPy to ensure data quality. Performed exploratory data analysis (EDA) to identify insights, including peak usage times, customer demographics, and trip duration patterns. Created visualizations using Matplotlib and Seaborn to effectively communicate findings. Delivered actionable recommendations to enhance customer engagement and optimize operational efficiency.
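
    The EDA steps described above (peak usage times, trip-duration patterns by rider type) can be sketched in a few lines of pandas; the column names and rows here are hypothetical stand-ins, since the actual schema is not listed:

```python
import pandas as pd

# Hypothetical trip records; column names mimic typical bike-share exports
# and are assumptions, not this dataset's documented schema.
df = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2024-07-01 08:05", "2024-07-01 08:40",
        "2024-07-01 17:15", "2024-07-02 17:50",
    ]),
    "member_casual": ["member", "member", "casual", "member"],
    "ride_length_min": [12, 10, 25, 14],
})

df["hour"] = df["started_at"].dt.hour
peak = df.groupby("hour").size().idxmax()                      # busiest start hour
avg_by_type = df.groupby("member_casual")["ride_length_min"].mean()
print(peak, avg_by_type.to_dict())
```

    The same groupby pattern extends to weekday, station, and seasonal breakdowns before plotting with Matplotlib or Seaborn.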

  6. Bird Migration Dataset (Data Visualization / EDA)

    • kaggle.com
    Updated May 13, 2025
    Cite
    Sahir Maharaj (2025). Bird Migration Dataset (Data Visualization / EDA) [Dataset]. https://www.kaggle.com/datasets/sahirmaharajj/bird-migration-dataset-data-visualization-eda/data?select=bird_migration_with_origin_destination.csv
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    May 13, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sahir Maharaj
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains 10,000 synthetic records simulating the migratory behavior of various bird species across global regions. Each entry represents a single bird tagged with a tracking device and includes detailed information such as flight distance, speed, altitude, weather conditions, tagging information, and migration outcomes.

    The data was entirely synthetically generated using randomized yet realistic values based on known ranges from ornithological studies. It is ideal for practicing data analysis and visualization techniques without privacy concerns or real-world data access restrictions. Because it’s artificial, the dataset can be freely used in education, portfolio projects, demo dashboards, machine learning pipelines, or business intelligence training.

    With over 40 columns, this dataset supports a wide array of analysis types. Analysts can explore questions like “Do certain species migrate in larger flocks?”, “How does weather impact nesting success?”, or “What conditions lead to migration interruptions?”. Users can also perform geospatial mapping of start and end locations, cluster birds by behavior, or build time series models based on migration months and environmental factors.

    For data visualization, tools like Power BI, Python (Matplotlib/Seaborn/Plotly), or Excel can be used to create insightful dashboards and interactive charts.
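
    A question like "Do certain species migrate in larger flocks?" reduces to a groupby over the species column; the column names below are assumptions and the rows are toy data, since the real file has over 40 columns:

```python
import pandas as pd

# Toy rows with assumed column names standing in for the synthetic records.
df = pd.DataFrame({
    "species": ["Stork", "Stork", "Swallow", "Swallow", "Crane"],
    "flock_size": [40, 60, 200, 180, 15],
})

flock = df.groupby("species")["flock_size"].mean().sort_values(ascending=False)
print(flock.index[0], flock.iloc[0])  # species with the largest average flock
```

    Swapping in weather, altitude, or outcome columns answers the other questions above with the same pattern.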

    Join the Fabric Community DataViz Contest | May 2025: https://community.fabric.microsoft.com/t5/Power-BI-Community-Blog/%EF%B8%8F-Fabric-Community-DataViz-Contest-May-2025/ba-p/4668560

  7. Attention and Cognitive Workload

    • figshare.com
    csv
    Updated Jun 4, 2025
    Cite
    Rui Varandas; Inês Silveira; Hugo Gamboa (2025). Attention and Cognitive Workload [Dataset]. http://doi.org/10.6084/m9.figshare.28184417.v3
    Explore at:
    Available download formats: csv
    Dataset updated
    Jun 4, 2025
    Dataset provided by
    figshare
    Authors
    Rui Varandas; Inês Silveira; Hugo Gamboa
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description
    1. Attention and Cognitive Workload

    1.1. Experimental design
    Two standard cognitive tasks, N-Back and mental subtraction, were conducted using PsychoPy. The N-Back task is a working-memory task in which participants are presented with a sequence of stimuli and must indicate when the current stimulus matches the one from 'n' steps earlier in the sequence, with 'n' varying across levels. To avoid interference from reading the instructions, rest periods of 60 seconds were incorporated before, between, and after the two main tasks, along with a 20-second rest period between the explanation of the tasks and the procedure. Additionally, a 10-second rest period was introduced between the difficulty levels of the N-Back task and between the subtraction periods. The N-Back task was divided into 4 levels, each consisting of 60 trials. The mental subtraction task involved 20 periods of 10 seconds each, during which participants continuously subtracted a given number from the result of the previous subtraction while a visual cue was displayed.

    In the final stage, participants completed a practical learning task, a Python tutorial including both theoretical concepts and practical examples. During this phase of the data collection, physiological sensors were used and human-computer interaction (HCI) was also tracked.

    1.2. Data recording
    Data were collected from 8 volunteers (4 female), aged between 20 and 27 (average age = 22.9, standard deviation = 2.1). All participants were right-handed and reported no psychological or neurological conditions. None were on any medication except contraceptive pills.

    The data for subject 2 do not include the 2nd part of the acquisition (Python task) because the equipment stopped acquiring. Subject 3 has the 1st part (N-Back task and mental subtraction) and the 2nd part (Python tutorial) together in the First part folder (the file D1_S3_PB_description.json indicates the start and end of each task). Subject 4 has only the mental subtraction task in the 1st part of the acquisition, and for subject 8 the subtraction-task data are included in the 2nd part of the acquisition, along with the Python task.

    1.3. Data labelling
    Data labelling can be performed in two ways. To categorize data into cognitive workload levels and baseline, either the PB description JSON files or the task_results.csv files can be used. Separately, labelling of the data into cognitive states was carried out every 10 seconds by researchers in biomedical engineering, who used image captures of the participants at various instants of the experiment, response times, and signals from the respiration sensor to label each subject's state as bored, frustrated, interested, or at rest. These cognitive-state labels are stored in the cognitive_states_labels.txt files located within each subject's folder.

    1.4. Data description
    Biosignals include EEG, fNIRS (not converted to oxy- and deoxy-Hb), ECG, EDA, respiration (RIP), accelerometer (ACC), and push-button (PB) data. All signals have already been converted to physical units. In each biosignal file, the first column corresponds to the timestamps. For the first dataset, the biosignals folder is split into two parts: part 1 corresponds to the N-Back and mental subtraction tasks, and part 2 corresponds to the Python tutorial. The PB files can be inside each part of the Biosignals folder, in which case there are 2 files instead of 1.

    HCI features encompass keyboard, mouse, and screenshot data. A Python snippet for extracting screenshot files from the screenshots CSV file:

        import base64
        from os import mkdir
        from os.path import join

        file = '...'
        with open(file, 'r') as f:
            lines = f.readlines()

        mkdir('screenshot')  # create the output folder once, before the loop
        for line in lines[1:]:
            timestamp = line.split(',')[0]
            code = line.split(',')[-1][:-2]
            imgdata = base64.b64decode(code)
            filename = str(timestamp) + '.jpeg'
            with open(join('screenshot', filename), 'wb') as out:
                out.write(imgdata)

    A characterization file containing age and gender information for all subjects in each dataset is provided within the respective dataset folder (e.g., D1_subject-info.csv). Other complementary files include (i) descriptions of the push-buttons to help segment the signals (e.g., D1_S2_PB_description.json) and (ii) labelling (e.g., D1_S2_cognitive_states_labels.txt). The D1_Sx_task_results.csv files show the results for the N-Back task. A result of -1 means no answer, 0 a wrong answer, and 1 a right answer. For difficulty, 0 corresponds to baseline or rest periods, 1 to the 0-back task, 2 to 1-back, 3 to 2-back, and 4 to 3-back. For the mental subtraction task, only rest (0) and task (1) are distinguished. The response time is the time it took the subject to respond, and the key answer is the key the subject pressed ('y' for yes if, for example, in the 0-back task the letter shown on the screen was identical to the previous one, 'n' for no if it was not, and 'None' if there was no response). This file also provides the information needed to segment the signals into the different tasks and baselines.
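
    The coded columns of the D1_Sx_task_results.csv files can be decoded with simple lookup tables built from the description above; the sample rows here are illustrative, not real file contents:

```python
# Lookup tables for the coded columns of D1_Sx_task_results.csv, per the
# dataset description; the sample rows below are illustrative stand-ins.
DIFFICULTY = {0: "rest", 1: "0-back", 2: "1-back", 3: "2-back", 4: "3-back"}
RESULT = {-1: "no answer", 0: "wrong", 1: "right"}

rows = [(2, 1), (0, -1), (4, 0)]  # (difficulty, result) pairs
decoded = [(DIFFICULTY[d], RESULT[r]) for d, r in rows]
print(decoded)
```
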
  8. Adventure Works DW 2008

    • kaggle.com
    Updated Oct 5, 2024
    Cite
    James Vasanth (2024). Adventure Works DW 2008 [Dataset]. https://www.kaggle.com/datasets/jamesvasanth/adventure-works-dw-2008/code
    Explore at:
    Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
    Dataset updated
    Oct 5, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    James Vasanth
    Description

    The AdventureWorks DW 2008 dataset, originally provided by Microsoft, has been converted into CSV files for easier use, making it accessible for data exploration on platforms like Kaggle. The dataset is licensed under the Microsoft Public License (MS-PL), which is a permissive open-source license. This means you are free to use, modify, and share the dataset, whether for personal or commercial purposes, provided that you include the original license terms. However, it's important to note that the dataset is provided "as-is" without any warranty or guarantee from Microsoft.

    I really enjoy working with the AdventureWorks DW 2008 dataset. It offers a rich and well-structured environment that's perfect for writing and learning SQL queries. The data warehouse includes a variety of tables, such as facts and dimensions, making it an excellent resource for both beginners and experienced SQL users to practice querying and exploring relational databases.

    Now, with the dataset available in CSV format, it can be easily used with Python for exploratory data analysis (EDA), and it’s also well-suited for applying machine learning techniques such as regression, classification, and clustering.

    If you’re planning to dive into the data, all the best! It's a fantastic resource to learn from and experiment with. Cheers!
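
    A typical first EDA step with the CSVs is joining a fact table to a dimension and aggregating. A hedged sketch with toy frames; the table and column names are assumptions for illustration, not the warehouse's full schema:

```python
import pandas as pd

# Toy fact and dimension frames mimicking a star schema; names assumed.
fact_sales = pd.DataFrame({"ProductKey": [1, 1, 2],
                           "SalesAmount": [10.0, 20.0, 5.0]})
dim_product = pd.DataFrame({"ProductKey": [1, 2],
                            "EnglishProductName": ["Bike", "Helmet"]})

# Left-join the fact to its dimension, then total sales per product.
sales = fact_sales.merge(dim_product, on="ProductKey", how="left")
totals = sales.groupby("EnglishProductName")["SalesAmount"].sum()
print(totals.to_dict())
```

    The same merge-then-aggregate pattern mirrors the SQL joins the warehouse was designed for, and the joined frame feeds directly into regression, classification, or clustering experiments.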
