100+ datasets found
  1. New 1000 Sales Records Data 2

    • kaggle.com
    zip
    Updated Jan 12, 2023
    Cite
    Calvin Oko Mensah (2023). New 1000 Sales Records Data 2 [Dataset]. https://www.kaggle.com/datasets/calvinokomensah/new-1000-sales-records-data-2
    Available download formats: zip (49305 bytes)
    Authors
    Calvin Oko Mensah
    Description

    This dataset was downloaded from excelbianalytics.com, where it was generated with random VBA logic. I recently performed an extensive exploratory data analysis on it and added new columns, namely Unit margin, Order year, Order month, Order weekday, and Order_Ship_Days, which I think can help with analysis. I shared it because I thought it was a great dataset for newbies like myself to practice analytical processes on.
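
    As a rough illustration of how the five added columns could be derived, here is a minimal sketch. The input column names ("Order Date", "Ship Date", "Unit Price", "Unit Cost"), the m/d/Y date format, and the definition of unit margin as price minus cost are assumptions, not taken from the dataset page.

```python
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def add_derived_columns(row, date_format="%m/%d/%Y"):
    """Return a copy of one sales record with the five derived columns added.

    Input column names and the date format are assumed, not confirmed.
    """
    order_date = datetime.strptime(row["Order Date"], date_format)
    ship_date = datetime.strptime(row["Ship Date"], date_format)
    enriched = dict(row)
    # Assumed definition: margin earned on a single unit (price minus cost).
    enriched["Unit margin"] = round(
        float(row["Unit Price"]) - float(row["Unit Cost"]), 2)
    enriched["Order year"] = order_date.year
    enriched["Order month"] = order_date.month
    # weekday() returns 0 for Monday through 6 for Sunday.
    enriched["Order weekday"] = WEEKDAYS[order_date.weekday()]
    # Whole days between placing the order and shipping it.
    enriched["Order_Ship_Days"] = (ship_date - order_date).days
    return enriched

# Made-up record, for illustration only.
sample = {"Order Date": "1/12/2023", "Ship Date": "1/20/2023",
          "Unit Price": "9.33", "Unit Cost": "6.92"}
result = add_derived_columns(sample)
```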

  2. 5.7M+ Records -Most Comprehensive Football Dataset

    • kaggle.com
    zip
    Updated Sep 15, 2025
    Cite
    salimt (2025). 5.7M+ Records -Most Comprehensive Football Dataset [Dataset]. https://www.kaggle.com/datasets/xfkzujqjvx97n/football-datasets
    Available download formats: zip (85313220 bytes)
    Authors
    salimt
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    About Dataset – TL;DR

    Comprehensive football (soccer) data lake from Transfermarkt, clean and structured for analysis and machine learning.

    • 93,000+ players worldwide
    • 2,200+ clubs across all major leagues
    • 5.7M+ total records across 10 categories
    • 902,000+ market valuations
    • 1.9M+ player performance stats
    • 1.2M+ player transfer histories
    • 144,000+ injuries & 93,000+ national team appearances
    • 1.3M+ teammate relationships

    Everything in raw CSV format – perfect for EDA, ML, and advanced football analytics.

    The Most Comprehensive Transfermarkt Football Dataset

    A complete football data lake covering players, teams, transfers, performances, market values, injuries, and national team stats. Perfect for analysts, data scientists, researchers, and enthusiasts.

    🗺 Entity-Relationship Overview

    Here’s the high-level schema to help you understand the dataset structure:

    [Image: Transfermarkt Dataset ER Diagram, https://i.imgur.com/WXLIx3L.png]

    📊 Key Coverage

    • Players: 93,000+ professional players
    • Teams: 2,200+ clubs, 7,700+ club relationships
    • Data Volume: 5.7M+ total records
    • Global Scope: Major leagues and competitions worldwide

    🗂 Data Structure

    Organized into 10 well-structured CSV categories:

    Player Data (7 categories)

    • Player Profiles
    • Performances (matches, goals, assists, cards, minutes)
    • Market Values (historical valuations)
    • Transfer Histories
    • Injury Records
    • National Team Performances
    • Teammate Networks

    Team Data (3 categories)

    • Team Details (club info)
    • Competitions & Seasons
    • Parent/Child Team Relations

    🔗 What’s Inside?

    • 902K+ market value records to track valuation trends
    • 1.1M+ transfer histories with fees and movement
    • 1.9M+ performance stats across seasons and competitions
    • 144K+ injury records with days and matches missed
    • 93K+ national team appearances
    • 1.3M+ teammate relationships for chemistry analysis

    💡 Why This Dataset?

    Most football datasets are pre-processed and restrictive. This one is raw, rich, and flexible:

    • Build custom KPIs and models
    • Perform deep exploratory analysis (EDA)
    • Train machine learning and prediction pipelines
    • Combine with other football data sources

    🚀 Example Use Cases

    • Predictive Modeling – Player ratings, transfer value forecasts, injury risk
    • Data Visualization & Dashboards – Club comparisons, performance analytics
    • Scouting & Recruitment – Discover undervalued talent
    • Network Analysis – Teammate relationships and synergy

    🖥 Technical Details

    • Format: CSV files, UTF-8 encoded
    • Easy to Use: Ready for Python (pandas, numpy), R, SQL, BI tools
    • Scalable: 5.7M+ rows for big-data analysis
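
    To give a feel for how the CSV categories might be combined, the sketch below joins player profiles to market values and keeps each player's most recent valuation. The column names (player_id, player_name, date, value) and the sample rows are hypothetical stand-ins; check the real file headers before adapting it.

```python
import csv
import io

# Tiny in-memory stand-ins for two CSV categories (hypothetical columns and data).
PROFILES_CSV = """player_id,player_name
1,Player A
2,Player B
"""
MARKET_VALUES_CSV = """player_id,date,value
1,2023-01-01,1000000
1,2024-01-01,1500000
2,2023-06-01,400000
"""

def latest_market_values(profiles_file, values_file):
    """Map each player name to the most recent valuation found for that player."""
    names = {row["player_id"]: row["player_name"]
             for row in csv.DictReader(profiles_file)}
    latest = {}  # player_id -> (date, value)
    for row in csv.DictReader(values_file):
        current = latest.get(row["player_id"])
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if current is None or row["date"] > current[0]:
            latest[row["player_id"]] = (row["date"], int(row["value"]))
    # Keep only players that also appear in the profiles table.
    return {names[pid]: value
            for pid, (_, value) in latest.items() if pid in names}

result = latest_market_values(io.StringIO(PROFILES_CSV),
                              io.StringIO(MARKET_VALUES_CSV))
```

    With real files, the two io.StringIO objects would be replaced by open(...) handles on the corresponding CSVs.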

    💡 Working on a Cool Project?

    I’m always excited to collaborate on innovative football data projects. If you’ve got an idea, let’s make it happen together!

    📬 Contact Me

    • GitHub: @salimt
    • Issues: Feel free to use GitHub Issues if you’ve got dataset-specific questions.

    Support & Visibility

    If this dataset helps you:
    - Upvote on Kaggle
    - Star the GitHub repo
    - Share with others in the football analytics community

    Tags

    football analytics, soccer dataset, transfermarkt, sports analytics, machine learning, football research, player statistics

    🔥 Analyze football like never before. Your next AI or analytics project starts here.

  3. MyMart: A Comprehensive Sales Dataset

    • kaggle.com
    zip
    Updated Apr 8, 2024
    Cite
    Dave Darshan (2024). MyMart: A Comprehensive Sales Dataset [Dataset]. https://www.kaggle.com/datasets/davedarshan/mymart-a-comprehensive-sales-dataset
    Available download formats: zip (277198 bytes)
    Authors
    Dave Darshan
    Description

    This data is artificially generated. It can be used for practicing data visualization and analysis skills. Please note that since the data is generated randomly, it may not reflect real-world sales data accurately. However, it should serve as a good starting point for practicing data analysis and visualization.

    Description:

    • Sales Date: This column contains the date of each sale. The dates are generated for a period of 120 days starting from January 1, 2023.
    • Category: This column contains the category of the product sold. The categories include ‘Electronics’, ‘Clothing’, and ‘Home & Kitchen’.
    • Subcategory: This column contains the subcategory of the product sold. Each category has its own set of subcategories. For example, the ‘Electronics’ category includes subcategories such as ‘Communication’, ‘Computers’, and ‘Wearables’.
    • ProductName: This column contains the name of the product sold. Each subcategory has its own set of products. For example, the ‘Communication’ subcategory includes products such as ‘Walkie Talkie’, ‘Cell Phone’, and ‘Smart Phone’.
    • Salesperson: This column contains the name of the salesperson who made the sale. There are different salespersons assigned to each category.
    • Gender: This column contains the gender of the salesperson. The gender is determined based on the salesperson’s name.
    • Unit sold: This column contains the number of units of the product sold in the sale. The number of units sold is a random number between 1 and 100.
    • Original Price: This column contains the original price of the product. The original price is a random number between 10 and 1000.
    • Sales Price: This column contains the sales price of the product. The sales price is calculated as a random fraction of the original price, ensuring that the sales price is always slightly higher than the original price.

    For information on 'How to generate a dataset', click here.

  4. Web Analytics Dataset

    • kaggle.com
    zip
    Updated Oct 12, 2020
    Cite
    Merve Afranur ARTAR (2020). Web Analytics Dataset [Dataset]. https://www.kaggle.com/datasets/afranur/web-analytics-dataset
    Available download formats: zip (7376 bytes)
    Authors
    Merve Afranur ARTAR
    Description

    Dataset

    This dataset was created by Merve Afranur ARTAR

    Contents

  5. YouTube Dataset of all Data Science Channels🎓🧾

    • kaggle.com
    zip
    Updated Jun 21, 2024
    Cite
    Abhishek0032 (2024). YouTube Dataset of all Data Science Channels🎓🧾 [Dataset]. https://www.kaggle.com/datasets/abhishek0032/youtube-dataset-all-data-scienceanalyst-channels
    Available download formats: zip (732289 bytes)
    Authors
    Abhishek0032
    Area covered
    YouTube
    Description

    Description: This dataset contains detailed information about videos from various YouTube channels that specialize in data science and analytics. It includes metrics such as views, likes, comments, and publication dates. The dataset consists of 22862 rows, providing a robust sample for analyzing trends in content engagement, popularity of topics over time, and comparison of channels' performance.

    Column Descriptors:

    • Channel_Name: The name of the YouTube channel.
    • Title: The title of the video.
    • Published_date: The date when the video was published.
    • Views: The number of views the video has received.
    • Like_count: The number of likes the video has received.
    • Comment_Count: The number of comments on the video.

    This dataset contains information from the following YouTube channels:

    ['sentdex', 'freeCodeCamp.org', 'CampusX', 'Darshil Parmar', 'Keith Galli', 'Alex The Analyst', 'Socratica', 'Krish Naik', 'StatQuest with Josh Starmer', 'Nicholas Renotte', 'Leila Gharani', 'Rob Mulla', 'Ryan Nolan Data', 'techTFQ', 'Dataquest', 'WsCube Tech', 'Chandoo', 'Luke Barousse', 'Andrej Karpathy', 'Thu Vu data analytics', 'Guy in a Cube', 'Tableau Tim', 'codebasics', 'DeepLearningAI', 'Rishabh Mishra', 'ExcelIsFun', 'Kevin Stratvert', 'Ken Jee', 'Kaggle', 'Tina Huang']

    This dataset can be used for various analyses, including but not limited to:

    Identifying the most popular videos and channels in the data science field.

    Understanding viewer engagement trends over time.

    Comparing the performance of different types of content across multiple channels.

    Performing a comparison between different channels to find the best-performing ones.

    Identifying the best videos to watch for specific topics in data science and analytics.

    Conducting a detailed analysis of your favorite YouTube channel to understand its content strategy and performance.

    Note: The data is current as of the date of extraction and may not reflect real-time changes on YouTube. For any analysis, be sure to consider the date when the data was last updated to maintain accuracy and relevance.
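
    One way to compare channel performance is an engagement rate per channel. The sketch below uses the column names listed in the descriptors above; the rate definition (likes plus comments, divided by views) and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict

def engagement_by_channel(rows):
    """Rank channels by (likes + comments) / views, highest first."""
    totals = defaultdict(lambda: {"views": 0, "interactions": 0})
    for row in rows:
        channel = totals[row["Channel_Name"]]
        channel["views"] += int(row["Views"])
        channel["interactions"] += int(row["Like_count"]) + int(row["Comment_Count"])
    # Skip channels with zero total views to avoid dividing by zero.
    rates = {name: t["interactions"] / t["views"]
             for name, t in totals.items() if t["views"] > 0}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Made-up rows, for illustration only.
sample_rows = [
    {"Channel_Name": "Channel A", "Views": "1000", "Like_count": "90", "Comment_Count": "10"},
    {"Channel_Name": "Channel A", "Views": "1000", "Like_count": "40", "Comment_Count": "10"},
    {"Channel_Name": "Channel B", "Views": "500", "Like_count": "10", "Comment_Count": "0"},
]
ranking = engagement_by_channel(sample_rows)
```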

  6. Retail Store Sales: Dirty for Data Cleaning

    • kaggle.com
    zip
    Updated Jan 18, 2025
    Cite
    Ahmed Mohamed (2025). Retail Store Sales: Dirty for Data Cleaning [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/retail-store-sales-dirty-for-data-cleaning
    Available download formats: zip (226740 bytes)
    Authors
    Ahmed Mohamed
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Dirty Retail Store Sales Dataset

    Overview

    The Dirty Retail Store Sales dataset contains 12,575 rows of synthetic data representing sales transactions from a retail store. The dataset includes eight product categories with 25 items per category, each having static prices. It is designed to simulate real-world sales data, including intentional "dirtiness" such as missing or inconsistent values. This dataset is suitable for practicing data cleaning, exploratory data analysis (EDA), and feature engineering.

    File Information

    • File Name: retail_store_sales.csv
    • Number of Rows: 12,575
    • Number of Columns: 11

    Columns Description

    Column Name | Description | Example Values
    Transaction ID | A unique identifier for each transaction. Always present and unique. | TXN_1234567
    Customer ID | A unique identifier for each customer. 25 unique customers. | CUST_01
    Category | The category of the purchased item. | Food, Furniture
    Item | The name of the purchased item. May contain missing values or None. | Item_1_FOOD, None
    Price Per Unit | The static price of a single unit of the item. May contain missing or None values. | 4.00, None
    Quantity | The quantity of the item purchased. May contain missing or None values. | 1, None
    Total Spent | The total amount spent on the transaction. Calculated as Quantity * Price Per Unit. | 8.00, None
    Payment Method | The method of payment used. May contain missing or invalid values. | Cash, Credit Card
    Location | The location where the transaction occurred. May contain missing or invalid values. | In-store, Online
    Transaction Date | The date of the transaction. Always present and valid. | 2023-01-15
    Discount Applied | Indicates if a discount was applied to the transaction. May contain missing values. | True, False, None
    Categories and Items

    The dataset includes the following categories, each containing 25 items with corresponding codes, names, and static prices:

    Electric Household Essentials

    Item Code | Item Name | Price
    Item_1_EHE | Blender | 5.0
    Item_2_EHE | Microwave | 6.5
    Item_3_EHE | Toaster | 8.0
    Item_4_EHE | Vacuum Cleaner | 9.5
    Item_5_EHE | Air Purifier | 11.0
    Item_6_EHE | Electric Kettle | 12.5
    Item_7_EHE | Rice Cooker | 14.0
    Item_8_EHE | Iron | 15.5
    Item_9_EHE | Ceiling Fan | 17.0
    Item_10_EHE | Table Fan | 18.5
    Item_11_EHE | Hair Dryer | 20.0
    Item_12_EHE | Heater | 21.5
    Item_13_EHE | Humidifier | 23.0
    Item_14_EHE | Dehumidifier | 24.5
    Item_15_EHE | Coffee Maker | 26.0
    Item_16_EHE | Portable AC | 27.5
    Item_17_EHE | Electric Stove | 29.0
    Item_18_EHE | Pressure Cooker | 30.5
    Item_19_EHE | Induction Cooktop | 32.0
    Item_20_EHE | Water Dispenser | 33.5
    Item_21_EHE | Hand Blender | 35.0
    Item_22_EHE | Mixer Grinder | 36.5
    Item_23_EHE | Sandwich Maker | 38.0
    Item_24_EHE | Air Fryer | 39.5
    Item_25_EHE | Juicer | 41.0

    Furniture

    Item Code | Item Name | Price
    Item_1_FUR | Office Chair | 5.0
    Item_2_FUR | Sofa | 6.5
    Item_3_FUR | Coffee Table | 8.0
    Item_4_FUR | Dining Table | 9.5
    Item_5_FUR | Bookshelf | 11.0
    Item_6_FUR | Bed F...
  7. Job Dataset

    • kaggle.com
    zip
    Updated Sep 17, 2023
    Cite
    Ravender Singh Rana (2023). Job Dataset [Dataset]. https://www.kaggle.com/datasets/ravindrasinghrana/job-description-dataset
    Available download formats: zip (479575920 bytes)
    Authors
    Ravender Singh Rana
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Job Dataset

    This dataset provides a comprehensive collection of synthetic job postings to facilitate research and analysis in the field of job market trends, natural language processing (NLP), and machine learning. Created for educational and research purposes, this dataset offers a diverse set of job listings across various industries and job types.

    Descriptions for each of the columns in the dataset:

    1. Job Id: A unique identifier for each job posting.
    2. Experience: The required or preferred years of experience for the job.
    3. Qualifications: The educational qualifications needed for the job.
    4. Salary Range: The range of salaries or compensation offered for the position.
    5. Location: The city or area where the job is located.
    6. Country: The country where the job is located.
    7. Latitude: The latitude coordinate of the job location.
    8. Longitude: The longitude coordinate of the job location.
    9. Work Type: The type of employment (e.g., full-time, part-time, contract).
    10. Company Size: The approximate size or scale of the hiring company.
    11. Job Posting Date: The date when the job posting was made public.
    12. Preference: Special preferences or requirements for applicants (e.g., Only Male or Only Female, or Both)
    13. Contact Person: The name of the contact person or recruiter for the job.
    14. Contact: Contact information for job inquiries.
    15. Job Title: The job title or position being advertised.
    16. Role: The role or category of the job (e.g., software developer, marketing manager).
    17. Job Portal: The platform or website where the job was posted.
    18. Job Description: A detailed description of the job responsibilities and requirements.
    19. Benefits: Information about benefits offered with the job (e.g., health insurance, retirement plans).
    20. Skills: The skills or qualifications required for the job.
    21. Responsibilities: Specific responsibilities and duties associated with the job.
    22. Company Name: The name of the hiring company.
    23. Company Profile: A brief overview of the company's background and mission.

    Potential Use Cases:

    • Building predictive models to forecast job market trends.
    • Enhancing job recommendation systems for job seekers.
    • Developing NLP models for resume parsing and job matching.
    • Analyzing regional job market disparities and opportunities.
    • Exploring salary prediction models for various job roles.

    Acknowledgements:

    We would like to express our gratitude to the Python Faker library for its invaluable contribution to the dataset generation process. Additionally, we appreciate the guidance provided by ChatGPT in fine-tuning the dataset, ensuring its quality, and adhering to ethical standards.

    Note:

    Please note that the examples provided are fictional and for illustrative purposes. You can tailor the descriptions and examples to match the specifics of your dataset. It is not suitable for real-world applications and should only be used within the scope of research and experimentation. You can also reach me via email at: rrana157@gmail.com

  8. Delhi Metro Dataset- EDA & Data Visualization

    • kaggle.com
    zip
    Updated Nov 6, 2025
    Cite
    Nikhil Kumar Mishra (2025). Delhi Metro Dataset- EDA & Data Visualization [Dataset]. https://www.kaggle.com/datasets/nikhilkumar766/delhi-metro-dataset
    Available download formats: zip (3372272 bytes)
    Authors
    Nikhil Kumar Mishra
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Area covered
    Delhi
    Description

    📊 Delhi Metro Ridership & Operational Statistics Dataset

    Subtitle

    A comprehensive dataset representing ridership, ticket revenue, and operational performance of the Delhi Metro, one of the largest urban transit systems in the world.

    About the Dataset

    The Delhi Metro is a rapid transit system serving the National Capital Region (NCR) of India. It plays a crucial role in reducing traffic congestion and providing sustainable public transportation to millions of passengers every day.

    This dataset captures multiple performance indicators of the Delhi Metro network over time, including:

    • Total metro trips operated
    • Daily total passengers
    • Ticket revenue
    • Average passenger distance traveled per trip
    • Top stations based on passenger demand
    • Total stations operational

    These data points help in analyzing metro usage patterns, operational efficiency, and transit demand in the region.

    Why This Dataset is Useful

    This dataset enables research in:

    • Urban transport planning
    • Revenue & demand forecasting
    • Passenger travel behavior analysis
    • Transportation infrastructure optimization
    • Dashboard development & data storytelling
    • Academic machine learning projects

    Potential Use Cases

    • Time-series forecasting of passengers and revenue
    • Peak-hour identification using station-based data
    • Policy evaluation on fare changes and new lines
    • Visualization dashboards (e.g., Plotly/Dash, Power BI, Tableau)

    Source

    Data has been collected, cleaned, and aggregated using publicly available metro operational insights, news reports, and transit performance summaries released by the Delhi Metro Rail Corporation (DMRC).

    File Details

    Field | Description
    Date | Date of operation
    Total_Trips | Number of train trips operated on that day
    Total_Passengers | Total ridership for that day
    Total_Revenue | Ticketing revenue (₹ INR)
    Avg_Fare | Revenue divided by passengers
    Avg_Distance | Estimated average travel distance per passenger
    Passengers_per_Trip | Ridership divided by number of trips
    Revenue_Ticket | Ticket revenue per trip
    Ticket_Type (optional) | Type of ticket or trip category
    Top_Stations | Highest-demand stations on that day
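
    Two of the fields above are defined as simple ratios of other fields, so they can be recomputed as a consistency check. The sketch below follows the stated definitions of Avg_Fare and Passengers_per_Trip; the sample figures are invented for illustration and are not actual DMRC numbers.

```python
def derive_ratios(day):
    """Recompute the ratio fields for one day from the base fields."""
    passengers = day["Total_Passengers"]
    trips = day["Total_Trips"]
    return {
        # Avg_Fare: revenue divided by passengers.
        "Avg_Fare": day["Total_Revenue"] / passengers if passengers else None,
        # Passengers_per_Trip: ridership divided by number of trips.
        "Passengers_per_Trip": passengers / trips if trips else None,
    }

# Invented values, for illustration only.
sample_day = {"Total_Trips": 2000,
              "Total_Passengers": 5_000_000,
              "Total_Revenue": 100_000_000}
ratios = derive_ratios(sample_day)
```

    Comparing these recomputed values against the stored Avg_Fare and Passengers_per_Trip columns is a quick way to spot rows where the published ratios and base figures disagree.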


    Licensing

    License: CC BY 4.0 (Users must provide attribution when using the dataset)


  9. Car Prices Dataset

    • kaggle.com
    zip
    Updated Mar 30, 2024
    Cite
    GIRITHARAN MANI (2024). Car Prices Dataset [Dataset]. https://www.kaggle.com/datasets/mystifoe77/car-prices
    Available download formats: zip (19753181 bytes)
    Authors
    GIRITHARAN MANI
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    The Car Prices dataset contains detailed information about various car models, including their manufacturing year, make, model, trim, body type, transmission, and state of condition. With over 550,000 entries, this dataset is an excellent resource for exploring trends in car prices, analyzing market value fluctuations, and developing predictive models for the automotive industry.

    Features

    1. Year: Manufacturing year of the vehicle (Range: 1982-2015).
    2. Make: Brand of the car (Top makes: Ford - 17%, Chevrolet - 11%).
    3. Model: Specific car models, including Altima (3%), F-150 (3%), and others.
    4. Trim: Vehicle trim level or variant (Base - 10%, SE - 8%).
    5. Body: Body type, such as Sedan (36%), SUV (21%), etc.
    6. Transmission: Type of transmission (85% Automatic, 12% unknown).
    7. VIN: Vehicle Identification Number, unique to each car.
    8. State: Location where the vehicle is registered (e.g., CA - 13%, FL - 15%).
    9. Condition: Condition score of the car (1–5 scale).
    10. Odometer: Mileage covered by the car (Range: 1 to 1,000,000+ miles).

    Use Cases

    • Price Prediction: Build machine learning models to predict car prices based on features like make, model, year, and mileage.
    • Market Analysis: Identify trends in car preferences and sales across different regions.
    • Depreciation Study: Study how different brands and models lose value over time.

    Statistics Overview

    • Years Distribution: The dataset spans from 1982 to 2015, with most entries concentrated in the 2000s.
    • Popular Brands: Ford and Chevrolet dominate the entries.
    • Body Type Distribution: Sedans and SUVs make up over 50% of the dataset.
    • Condition Scores: Vehicles with a condition score of 4-5 are most common.

    Example Records

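    Year | Make | Model | Trim | Body | Transmission | State | Condition | Odometer
    2015 | Kia | Sorento | LX | SUV | Automatic | CA | 5 | 16,639
    2014 | BMW | 3 Series | 328i | Sedan | Automatic | CA | 4 | 13,310
    2015 | Nissan | Altima | 2.5 S | Sedan | Automatic | CA | 1 | 5,554
    2014 | Chevrolet | Camaro | LT | Convertible | Automatic | CA | 3 | 4,809
    2015 | Ford | Fusion | SE | Sedan | Automatic | CA | 2 | 5,559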

    License

    This dataset is available under the MIT License, making it suitable for both commercial and non-commercial use.

    Tags

    • Car Pricing
    • Automotive Market
    • Machine Learning
    • Price Prediction
    • Data Analysis

    Download Now and explore the intricacies of car prices with this rich and diverse dataset!

  10. Comprehensive Diabetes Clinical Dataset(100k rows)

    • kaggle.com
    zip
    Updated Jul 20, 2024
    Cite
    Priyam Choksi (2024). Comprehensive Diabetes Clinical Dataset(100k rows) [Dataset]. https://www.kaggle.com/datasets/priyamchoksi/100000-diabetes-clinical-dataset
    Available download formats: zip (917848 bytes)
    Authors
    Priyam Choksi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Detailed dataset comprising health and demographic data of 100,000 individuals, aimed at facilitating diabetes-related research and predictive modeling. This dataset includes information on gender, age, location, race, hypertension, heart disease, smoking history, BMI, HbA1c level, blood glucose level, and diabetes status.

    Dataset Use Cases

    This dataset can be used for various analytical and machine learning purposes, such as:

    1. Predictive Modeling: Build models to predict the likelihood of diabetes based on demographic and health-related features.
    2. Health Analytics: Analyze the correlation between different health metrics (e.g., BMI, HbA1c level) and diabetes.
    3. Demographic Studies: Examine the distribution of diabetes across different demographic groups and locations.
    4. Public Health Research: Identify risk factors for diabetes and target interventions to high-risk groups.
    5. Clinical Research: Study the relationship between comorbid conditions like hypertension and heart disease with diabetes.

    Potential Analyses

    • Descriptive Statistics: Summarize the dataset to understand the central tendencies and dispersion of features.
    • Correlation Analysis: Identify the relationships between features.
    • Classification Models: Use machine learning algorithms to classify individuals as diabetic or non-diabetic.
    • Trend Analysis: Analyze trends over the years to see how diabetes prevalence has changed.
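
    The correlation analysis mentioned above can be sketched with a plain Pearson coefficient. The BMI and HbA1c values below are invented for illustration and are not drawn from the dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either list is constant.
    return covariance / math.sqrt(spread_x * spread_y)

# Invented values, for illustration only.
bmi = [22.0, 27.5, 31.0, 35.5]
hba1c = [5.0, 5.6, 6.1, 6.8]
r = pearson(bmi, hba1c)
```

    A value of r close to 1 or -1 indicates a strong linear relationship; it says nothing about causation.
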
  11. E-Commerce Customer Behavior & Sales Analysis -TR

    • kaggle.com
    zip
    Updated Oct 29, 2025
    Cite
    UmutUygurr (2025). E-Commerce Customer Behavior & Sales Analysis -TR [Dataset]. https://www.kaggle.com/datasets/umuttuygurr/e-commerce-customer-behavior-and-sales-analysis-tr
    Available download formats: zip (138245 bytes)
    Authors
    UmutUygurr
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    🛒 E-Commerce Customer Behavior and Sales Dataset

    📊 Dataset Overview

    This comprehensive dataset contains 5,000 e-commerce transactions from a Turkish online retail platform, spanning from January 2023 to March 2024. The dataset provides detailed insights into customer demographics, purchasing behavior, product preferences, and engagement metrics.

    🎯 Use Cases

    This dataset is perfect for:

    • Customer Segmentation Analysis: Identify distinct customer groups based on behavior
    • Sales Forecasting: Predict future sales trends and patterns
    • Recommendation Systems: Build product recommendation engines
    • Customer Lifetime Value (CLV) Prediction: Estimate customer value
    • Churn Analysis: Identify customers at risk of leaving
    • Marketing Campaign Optimization: Target customers effectively
    • Price Optimization: Analyze price sensitivity across categories
    • Delivery Performance Analysis: Optimize logistics and shipping

    📁 Dataset Structure

    The dataset contains 18 columns with the following features:

    Order Information
    • Order_ID: Unique identifier for each order (ORD_XXXXXX format)
    • Date: Transaction date (2023-01-01 to 2024-03-26)

    Customer Demographics
    • Customer_ID: Unique customer identifier (CUST_XXXXX format)
    • Age: Customer age (18-75 years)
    • Gender: Customer gender (Male, Female, Other)
    • City: Customer city (10 major Turkish cities)

    Product Information
    • Product_Category: 8 categories (Electronics, Fashion, Home & Garden, Sports, Books, Beauty, Toys, Food)
    • Unit_Price: Price per unit (in TRY/Turkish Lira)
    • Quantity: Number of units purchased (1-5)

    Transaction Details
    • Discount_Amount: Discount applied (if any)
    • Total_Amount: Final transaction amount after discount
    • Payment_Method: Payment method used (5 types)

    Customer Behavior Metrics
    • Device_Type: Device used for purchase (Mobile, Desktop, Tablet)
    • Session_Duration_Minutes: Time spent on website (1-120 minutes)
    • Pages_Viewed: Number of pages viewed during session (1-50)
    • Is_Returning_Customer: Whether customer has purchased before (True/False)

    Post-Purchase Metrics
    • Delivery_Time_Days: Delivery duration (1-30 days)
    • Customer_Rating: Customer satisfaction rating (1-5 stars)

    📈 Key Statistics
    • Total Records: 5,000 transactions
    • Date Range: January 2023 - March 2024 (15 months)
    • Average Transaction Value: ~450 TRY
    • Customer Satisfaction: 3.9/5.0 average rating
    • Returning Customer Rate: 60%
    • Mobile Usage: 55% of transactions

    🔍 Data Quality
    ✅ No missing values
    ✅ Consistent formatting across all fields
    ✅ Realistic data distributions
    ✅ Proper data types for all columns
    ✅ Logical relationships between features

    💡 Sample Analysis Ideas
    • Customer Segmentation with K-Means Clustering: Segment customers based on spending, frequency, and recency
    • Sales Trend Analysis: Identify seasonal patterns and peak shopping periods
    • Product Category Performance: Compare revenue, ratings, and return rates across categories
    • Device-Based Behavior Analysis: Understand how device choice affects purchasing patterns
    • Predictive Modeling: Build models to predict customer ratings or purchase amounts
    • City-Level Market Analysis: Compare market performance across different cities

    🛠️ Technical Details
    • File Format: CSV (Comma-Separated Values)
    • Encoding: UTF-8
    • File Size: ~500 KB
    • Delimiter: Comma (,)

    📚 Column Descriptions

    Column Name | Data Type | Description | Example
    Order_ID | String | Unique order identifier | ORD_001337
    Customer_ID | String | Unique customer identifier | CUST_01337
    Date | DateTime | Transaction date | 2023-06-15
    Age | Integer | Customer age | 35
    Gender | String | Customer gender | Female
    City | String | Customer city | Istanbul
    Product_Category | String | Product category | Electronics
    Unit_Price | Float | Price per unit | 1299.99
    Quantity | Integer | Units purchased | 2
    Discount_Amount | Float | Discount applied | 129.99
    Total_Amount | Float | Final amount paid | 2469.99
    Payment_Method | String | Payment method | Credit Card
    Device_Type | String | Device used | Mobile
    Session_Duration_Minutes | Integer | Session time | 15
    Pages_Viewed | Integer | Pages viewed | 8
    Is_Returning_Customer | Boolean | Returning customer | True
    Delivery_Time_Days | Integer | Delivery duration | 3
    Customer_Rating | Integer | Satisfaction rating | 5

    🎓 Learning Outcomes

    By working with this dataset, you can learn:

    • Data cleaning and preprocessing techniques
    • Exploratory Data Analysis (EDA) with Python/R
    • Statistical analysis and hypothesis testing
    • Machine learning model development
    • Data visualization best practices
    • Business intelligence and reporting

    📝 Citation

    If you use this dataset in your research or project, please cite:

    E-Commerce Customer Behavior and Sales Dataset (2024)
    Turkish Online Retail Platform Data (2023-2024)
    Available on Kaggle

    ⚖️ License

    This dataset is released under the CC0: Public Domain license. You are free to use it for any purpose.
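
    The example values in the column descriptions are consistent with Total_Amount being Unit_Price * Quantity minus Discount_Amount (1299.99 * 2 - 129.99 = 2469.99). A small check along those lines might look like the sketch below; the formula is inferred from that single example row rather than documented, so treat it as an assumption.

```python
def total_matches(row, tolerance=0.01):
    """Check Total_Amount against Unit_Price * Quantity - Discount_Amount.

    The formula is inferred from the example row, not confirmed by the dataset page.
    """
    expected = (float(row["Unit_Price"]) * int(row["Quantity"])
                - float(row["Discount_Amount"]))
    # Compare with a small tolerance to absorb floating-point rounding.
    return abs(expected - float(row["Total_Amount"])) <= tolerance

# Values taken from the Example column of the column descriptions.
example = {"Unit_Price": "1299.99", "Quantity": "2",
           "Discount_Amount": "129.99", "Total_Amount": "2469.99"}
is_consistent = total_matches(example)
```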

    🤝 Contribution Found any issues or have suggestions? Feel free to provide feedback!

    📞 Contact For questions or collaborations, please reach out through Kaggle.

    Happy Analyzing! 🚀

    Keywords: e-c...
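
    As a quick sanity check on the column descriptions in this entry, the example values happen to be consistent with Total_Amount = Unit_Price * Quantity - Discount_Amount. The dataset description does not state this formula, so it is an assumption to verify against the real file. A minimal Python sketch using only the example values; the function names are hypothetical:

```python
from decimal import Decimal

# Example values copied from the column descriptions of this entry.
example_row = {
    "Unit_Price": "1299.99",
    "Quantity": "2",
    "Discount_Amount": "129.99",
    "Total_Amount": "2469.99",
}


def expected_total(row):
    """Assumed relationship (not stated by the dataset author):
    Total_Amount = Unit_Price * Quantity - Discount_Amount."""
    unit_price = Decimal(row["Unit_Price"])
    quantity = int(row["Quantity"])
    discount = Decimal(row["Discount_Amount"])
    return unit_price * quantity - discount


def total_matches(row):
    """True when the recorded total equals the assumed formula."""
    return expected_total(row) == Decimal(row["Total_Amount"])


print(expected_total(example_row), total_matches(example_row))
```

    Decimal is used instead of float so that the comparison on currency values is exact. Rows where the check fails would point either to a different pricing rule or to data errors.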

  12. Diabetes Dataset 2019

    • kaggle.com
    zip
    Updated Jul 1, 2023
    Cite
    Usama Raheem (2023). Diabetes Dataset 2019 [Dataset]. https://www.kaggle.com/datasets/usamaraheem/diabetes-dataset-2019
    Available download formats: zip (9118 bytes)
    Dataset updated
    Jul 1, 2023
    Authors
    Usama Raheem
    License

    http://opendatacommons.org/licenses/dbcl/1.0/

    Description

    Overview

    Dataset Title: Diabetes Dataset 2019
    Year: 2019

    Variables (Columns)

    • Pregnancies: Number of times pregnant
    • Glucose: Plasma glucose concentration (mg/dL)
    • BloodPressure: Diastolic blood pressure (mm Hg)
    • SkinThickness: Triceps skinfold thickness (mm)
    • Insulin: 2-Hour serum insulin (mu U/ml)
    • BMI: Body mass index (weight in kg / (height in m)^2)
    • DiabetesPedigreeFunction: Diabetes pedigree function (a measure of genetic influence)
    • Age: Age (years)
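
    The BMI variable is defined above as weight in kg divided by the square of height in m. A small Python sketch of that formula; the function name and the sample weight and height are illustrative and not taken from the dataset:

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kg / (height in m)^2, as defined in the variable list."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return weight_kg / height_m ** 2


# Illustrative inputs only (not a row from the dataset).
print(round(body_mass_index(70.0, 1.75), 1))
```

    The dataset already ships a BMI column, so a helper like this is mainly useful for checking units when combining it with other sources.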

    Outcome

    Binary variable indicating the presence (1) or absence (0) of diabetes

    Data Examples:

    The dataset contains multiple rows, with each row representing an individual case or patient. Each row includes information on the number of pregnancies, glucose levels, blood pressure, skin thickness, insulin levels, BMI, diabetes pedigree function, age, and outcome (diabetes presence or absence).

    This dataset is intended for studying the relationship between various factors (e.g., pregnancies, glucose levels, BMI) and the presence or absence of diabetes.

    Diabetes Dataset Analysis

    • Exploratory Data Analysis (EDA): Explore the distributions, relationships, and summary statistics of the variables.
    • Predictive Modeling: Develop predictive models to determine the likelihood of diabetes based on the given variables.
    • Feature Importance: Assess the importance of each variable in predicting the presence or absence of diabetes.
    • Risk Assessment: Identify key risk factors associated with diabetes based on the dataset.

  13. Sentiment Analysis Datasets

    • kaggle.com
    zip
    Updated Jan 11, 2024
    Cite
    willian oliveira (2024). Sentiment Analysis Datasets [Dataset]. https://www.kaggle.com/datasets/willianoliveiragibin/sentiment-analysis-datasets
    Available download formats: zip (49156 bytes)
    Dataset updated
    Jan 11, 2024
    Authors
    willian oliveira
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    The Social Media Sentiments Analysis Dataset offers a fascinating glimpse into the intricate tapestry of emotions, trends, and interactions prevalent across diverse social media platforms. This dataset serves as a snapshot of user-generated content, encompassing textual expressions, timestamps, hashtags, geographical locations, engagement metrics such as likes and retweets, and user identifiers. Each entry unveils a unique narrative—moments of surprise, excitement, admiration, thrill, contentment, and more—shared by individuals globally.

    Key Features

    Text: The user-generated content, a window into diverse sentiments.

    Sentiment: Emotions categorized for insightful analysis.

    Timestamp: Date and time details providing a temporal dimension.

    User: Unique identifiers of contributors, enabling user-specific insights.

    Platform: Indicates the social media platform of origin, allowing platform-specific analysis.

    Hashtags: Identifies trending topics and themes, unraveling popular narratives.

    Likes: Quantifies user engagement, reflecting content appreciation.

    Retweets: Reflects content popularity, showcasing the extent of its reach.

    Country: Geographical origin of each post, facilitating geographical analysis.

    Year, Month, Day, Hour: Temporal details for comprehensive temporal analysis.

    How to Utilize The Social Media Sentiments Analysis Dataset 📊

    The richness of the dataset allows for versatile analytical applications:

    Sentiment Analysis: Explore the emotional landscape by categorizing user-generated content into surprise, excitement, admiration, thrill, contentment, and more.

    Temporal Analysis: Investigate trends over time, identifying patterns, fluctuations, or recurring themes in social media content.

    User Behavior Insights: Analyze user engagement through likes and retweets, discovering popular content and user preferences.

    Platform-Specific Analysis: Examine variations in content across different social media platforms, understanding how sentiments vary.

    Hashtag Trends: Identify trending topics and themes by analyzing hashtags, uncovering popular or recurring ones.

    Geographical Analysis: Explore content distribution based on the country of origin, understanding regional variations in sentiment and topic preferences.

    User Identification: Utilize user identifiers to track specific contributors, analyzing the impact of influential users on sentiment trends.

    Cross-Analysis: Combine multiple features for in-depth insights. For example, analyze sentiment trends over time or across different platforms and countries.

    In conclusion, the Social Media Sentiments Analysis Dataset provides a robust foundation for nuanced explorations into the dynamic world of social media interactions, offering researchers and analysts a wealth of opportunities for comprehensive insights.

  14. Data from: Retail Sales Analysis

    • kaggle.com
    Updated Jun 23, 2024
    Cite
    Sahir Maharaj (2024). Retail Sales Analysis [Dataset]. https://www.kaggle.com/datasets/sahirmaharajj/retail-sales-analysis
    Explore at:
    Croissant. Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 23, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sahir Maharaj
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    This dataset contains a list of sales and movement data by item and department appended monthly.

    It is rich in information that can be leveraged for various data science applications. For instance, analyzing this dataset can offer insights into consumer behavior, such as preferences for specific types of beverages (e.g., wine, beer) during different times of the year. Furthermore, the dataset can be used to identify trends in sales and transfers, highlighting seasonal effects or the impact of certain suppliers on the market.

    One could start with exploratory data analysis (EDA) to understand the basic distribution of sales and transfers across different item types and suppliers. Time series analysis can provide insights into seasonal trends and sales forecasts. Cluster analysis might reveal groups of suppliers or items with similar sales patterns, which can be useful for targeted marketing and inventory management.

  15. Customer Shopping Trends Dataset

    • kaggle.com
    zip
    Updated Oct 5, 2023
    Cite
    Sourav Banerjee (2023). Customer Shopping Trends Dataset [Dataset]. https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset
    Available download formats: zip (149846 bytes)
    Dataset updated
    Oct 5, 2023
    Authors
    Sourav Banerjee
    Description

    Context

    The Customer Shopping Preferences Dataset offers valuable insights into consumer behavior and purchasing patterns. Understanding customer preferences and trends is critical for businesses to tailor their products, marketing strategies, and overall customer experience. This dataset captures a wide range of customer attributes including age, gender, purchase history, preferred payment methods, frequency of purchases, and more. Analyzing this data can help businesses make informed decisions, optimize product offerings, and enhance customer satisfaction. The dataset stands as a valuable resource for businesses aiming to align their strategies with customer needs and preferences. It's important to note that this dataset is a Synthetic Dataset Created for Beginners to learn more about Data Analysis and Machine Learning.

    Content

    This dataset encompasses various features related to customer shopping preferences, gathering essential information for businesses seeking to enhance their understanding of their customer base. The features include customer age, gender, purchase amount, preferred payment methods, frequency of purchases, and feedback ratings. Additionally, data on the type of items purchased, shopping frequency, preferred shopping seasons, and interactions with promotional offers is included. With a collection of 3900 records, this dataset serves as a foundation for businesses looking to apply data-driven insights for better decision-making and customer-centric strategies.

    Dataset Glossary (Column-wise)

    • Customer ID - Unique identifier for each customer
    • Age - Age of the customer
    • Gender - Gender of the customer (Male/Female)
    • Item Purchased - The item purchased by the customer
    • Category - Category of the item purchased
    • Purchase Amount (USD) - The amount of the purchase in USD
    • Location - Location where the purchase was made
    • Size - Size of the purchased item
    • Color - Color of the purchased item
    • Season - Season during which the purchase was made
    • Review Rating - Rating given by the customer for the purchased item
    • Subscription Status - Indicates if the customer has a subscription (Yes/No)
    • Shipping Type - Type of shipping chosen by the customer
    • Discount Applied - Indicates if a discount was applied to the purchase (Yes/No)
    • Promo Code Used - Indicates if a promo code was used for the purchase (Yes/No)
    • Previous Purchases - The total count of transactions concluded by the customer at the store, excluding the ongoing transaction
    • Payment Method - Customer's most preferred payment method
    • Frequency of Purchases - Frequency at which the customer makes purchases (e.g., Weekly, Fortnightly, Monthly)

    Structure of the Dataset

    https://i.imgur.com/6UEqejq.png

    Acknowledgement

    This dataset is a synthetic creation generated using ChatGPT to simulate a realistic customer shopping experience. Its purpose is to provide a platform for beginners and data enthusiasts, allowing them to create, enjoy, practice, and learn from a dataset that mirrors real-world customer shopping behavior. The aim is to foster learning and experimentation in a simulated environment, encouraging a deeper understanding of data analysis and interpretation in the context of consumer preferences and retail scenarios.

    Cover Photo by: Freepik

    Thumbnail by: Clothing icons created by Flat Icons - Flaticon

  16. Papers by Subject

    • kaggle.com
    zip
    Updated Aug 4, 2023
    Cite
    Arman Mohammadi (2023). Papers by Subject [Dataset]. https://www.kaggle.com/datasets/arplusman/papers-by-subject
    Available download formats: zip (18101984 bytes)
    Dataset updated
    Aug 4, 2023
    Authors
    Arman Mohammadi
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Overview

    This extensive dataset comprises approximately 50,000 academic papers along with their corresponding metadata, designed to facilitate various natural language processing (NLP) tasks such as classification and retrieval. The dataset covers a diverse range of research domains, including but not limited to computer science, biology, social sciences, engineering, and more. The list of all categories can be found here. With its comprehensive collection of academic papers and enriched metadata, this dataset serves as a valuable resource for researchers and data enthusiasts interested in advancing NLP applications in the academic domain.

    Key Features

    Metadata: The dataset includes essential metadata for each paper, such as the publish date, title, summary/abstract, author(s), and category. The metadata is meticulously curated to ensure accuracy and consistency, enabling researchers to swiftly extract valuable insights and conduct exploratory data analysis.

    Vast Paper Collection: With nearly 50,000 academic papers, this dataset encompasses a broad spectrum of research topics and domains, making it suitable for a wide range of NLP tasks, including but not limited to document classification, topic modeling, and document retrieval.

    Application Flexibility: The dataset is meticulously preprocessed and annotated, making it adaptable for various NLP applications. Researchers and practitioners can use it for tasks like sentiment analysis, keyword extraction, and more.

    Potential Use Cases

    Document Classification: Leverage this dataset to build powerful classifiers capable of categorizing academic papers into relevant research domains or topics. This can aid in automated content organization and information retrieval.

    Document Retrieval: Develop efficient retrieval models that can quickly identify and retrieve relevant papers based on user queries or specific keywords. Such models can streamline the research process and assist researchers in finding relevant literature faster.

    Topic Modeling: Use this dataset to perform topic modeling and extract meaningful topics or themes present within the academic papers. This can provide valuable insights into the prevailing research trends and interests within different disciplines.

    Recommendation Systems: Employ the dataset to build personalized recommendation systems that suggest relevant papers to researchers based on their previous interests or research focus.

    Acknowledgment

    We would like to express our gratitude to the authors and publishers of the academic papers included in this dataset for their valuable contributions to the research community. By making this dataset publicly available, we hope to foster advancements in natural language processing and support data-driven research across diverse domains.

    Disclaimer

    As the curators of this dataset, we have made every effort to ensure the accuracy and quality of the data. However, we cannot guarantee the absolute correctness of the information or the suitability of the dataset for any specific purpose. Users are encouraged to exercise their judgment and discretion while utilizing the dataset for their research projects.

    We sincerely hope that this dataset proves to be a valuable resource for the NLP community and contributes to the development of innovative solutions in academic research and beyond. Happy analyzing and modeling!

  17. Dataset Kaggle Survey 2018-2021

    • kaggle.com
    zip
    Updated Nov 16, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Anatoly Burenok (2021). Dataset Kaggle Survey 2018-2021 [Dataset]. https://www.kaggle.com/renokan/dataset-kaggle-survey-2018-2021
    Available download formats: zip (7119369 bytes)
    Dataset updated
    Nov 16, 2021
    Authors
    Anatoly Burenok
    Description

    Context

    This dataset contains Kaggle ML & DS Survey data for 2018-2021, cleaned and improved.

    In the original data (2018, 2019, 2020, 2021), answers to the questions were spread across different columns, and the questions and answer options could differ between years. Single-column and multi-column questions used the same header type: Q1, Q2 ...

    Improvements

    In this dataset, questions are grouped into SA / GA categories: single answers and group answers. Columns were also cleaned of stray spaces and inconsistent answer options.

    Large categories were modified: values were grouped or categorized as Other. A category is filled only where the value is empty, by replacement rather than by simple summation.

    Content

    This dataset contains the following:

    • kaggle_survey_2018-2021_header.csv: the tabular dataset containing the header data
    • kaggle_survey_2018-2021_data.csv: the tabular dataset containing the aggregated data from 2018 to 2021
    • code_samples.pdf: pdf file containing code examples

    Source

    • https://www.kaggle.com/c/kaggle-survey-2021
    • https://www.kaggle.com/c/kaggle-survey-2020
    • https://www.kaggle.com/c/kaggle-survey-2019
    • https://www.kaggle.com/kaggle/kaggle-survey-2018

  18. Employee Satisfaction Survey Data

    • kaggle.com
    zip
    Updated Dec 8, 2023
    Cite
    Zak (2023). Employee Satisfaction Survey Data [Dataset]. https://www.kaggle.com/datasets/redpen12/employees-satisfaction-analysis
    Available download formats: zip (142853 bytes)
    Dataset updated
    Dec 8, 2023
    Authors
    Zak
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The Employee Satisfaction Survey dataset is a comprehensive collection of information regarding employees within a company. It includes essential details such as employee identification numbers, self-reported satisfaction levels, performance evaluations, project involvement, work hours, tenure with the company, work accidents, promotions received in the last 5 years, departmental affiliations, and salary levels. This dataset offers valuable insights into the factors influencing employee satisfaction and can be used to analyze and understand various aspects of the workplace environment.

  19. Cafe Sales - Dirty Data for Cleaning Training

    • kaggle.com
    zip
    Updated Jan 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ahmed Mohamed (2025). Cafe Sales - Dirty Data for Cleaning Training [Dataset]. https://www.kaggle.com/datasets/ahmedmohamed2003/cafe-sales-dirty-data-for-cleaning-training
    Available download formats: zip (113510 bytes)
    Dataset updated
    Jan 17, 2025
    Authors
    Ahmed Mohamed
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/)
    License information was derived automatically

    Description

    Dirty Cafe Sales Dataset

    Overview

    The Dirty Cafe Sales dataset contains 10,000 rows of synthetic data representing sales transactions in a cafe. This dataset is intentionally "dirty," with missing values, inconsistent data, and errors introduced to provide a realistic scenario for data cleaning and exploratory data analysis (EDA). It can be used to practice cleaning techniques, data wrangling, and feature engineering.

    File Information

    • File Name: dirty_cafe_sales.csv
    • Number of Rows: 10,000
    • Number of Columns: 8

    Columns Description

    Column Name | Description | Example Values
    Transaction ID | A unique identifier for each transaction. Always present and unique. | TXN_1234567
    Item | The name of the item purchased. May contain missing or invalid values (e.g., "ERROR"). | Coffee, Sandwich
    Quantity | The quantity of the item purchased. May contain missing or invalid values. | 1, 3, UNKNOWN
    Price Per Unit | The price of a single unit of the item. May contain missing or invalid values. | 2.00, 4.00
    Total Spent | The total amount spent on the transaction. Calculated as Quantity * Price Per Unit. | 8.00, 12.00
    Payment Method | The method of payment used. May contain missing or invalid values (e.g., None, "UNKNOWN"). | Cash, Credit Card
    Location | The location where the transaction occurred. May contain missing or invalid values. | In-store, Takeaway
    Transaction Date | The date of the transaction. May contain missing or incorrect values. | 2023-01-01

    Data Characteristics

    1. Missing Values:

      • Some columns (e.g., Item, Payment Method, Location) may contain missing values represented as None or empty cells.
    2. Invalid Values:

      • Some rows contain invalid entries like "ERROR" or "UNKNOWN" to simulate real-world data issues.
    3. Price Consistency:

      • Prices for menu items are consistent but may have missing or incorrect values introduced.

    Menu Items

    The dataset includes the following menu items with their respective prices:

    Item | Price ($)
    Coffee | 2
    Tea | 1.5
    Sandwich | 4
    Salad | 5
    Cake | 3
    Cookie | 1
    Smoothie | 4
    Juice | 3
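
    Because each menu item has a single listed price and Total Spent is described as Quantity * Price Per Unit, a missing value in one of those columns can sometimes be recovered from the others. A minimal Python sketch, assuming rows are dicts keyed by the column names above and that missing or invalid cells have already been converted to None; the helper name fill_from_menu is hypothetical:

```python
# Prices copied from the menu table above.
MENU_PRICES = {
    "Coffee": 2.0, "Tea": 1.5, "Sandwich": 4.0, "Salad": 5.0,
    "Cake": 3.0, "Cookie": 1.0, "Smoothie": 4.0, "Juice": 3.0,
}


def fill_from_menu(row):
    """Return a copy of row with recoverable gaps filled in."""
    fixed = dict(row)
    price = fixed.get("Price Per Unit")
    qty = fixed.get("Quantity")
    total = fixed.get("Total Spent")

    # A known item determines its unit price.
    if price is None and fixed.get("Item") in MENU_PRICES:
        price = MENU_PRICES[fixed["Item"]]
    # Otherwise derive the price from total / quantity.
    if price is None and qty and total is not None:
        price = total / qty
    # Derive quantity or total from the other two values.
    if qty is None and price and total is not None:
        qty = round(total / price)
    if total is None and price is not None and qty is not None:
        total = price * qty

    fixed.update({"Price Per Unit": price, "Quantity": qty, "Total Spent": total})
    return fixed


# Illustrative row, not taken from the dataset.
row = {"Item": "Salad", "Quantity": None, "Price Per Unit": None, "Total Spent": 15.0}
print(fill_from_menu(row))
```

    The reverse lookup (item from price) is deliberately left out: Sandwich and Smoothie share a price of 4, and Cake and Juice share a price of 3, so a price alone does not always identify the item.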

    Use Cases

    This dataset is suitable for:

    • Practicing data cleaning techniques such as handling missing values, removing duplicates, and correcting invalid entries.
    • Exploring EDA techniques like visualizations and summary statistics.
    • Performing feature engineering for machine learning workflows.

    Cleaning Steps Suggestions

    To clean this dataset, consider the following steps:

    1. Handle Missing Values:

      • Fill missing numeric values with the median or mean.
      • Replace missing categorical values with the mode or "Unknown."
    2. Handle Invalid Values:

      • Replace invalid entries like "ERROR" and "UNKNOWN" with NaN or appropriate values.
    3. Date Consistency:

      • Ensure all dates are in a consistent format.
      • Fill missing dates with plausible values based on nearby records.
    4. Feature Engineering:

      • Create new columns, such as Day of the Week or Transaction Month, for further analysis.
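
    Some of the suggested steps can be sketched in plain Python. This is a minimal illustration, not the author's cleaning script: it assumes rows are dicts keyed by the column names listed earlier, treats "ERROR", "UNKNOWN", "None", and empty strings as missing, and assumes dates use the YYYY-MM-DD form shown in the example values. The function names and the sample row are hypothetical, and numeric imputation (median or mean) is left out.

```python
from datetime import datetime

# Markers treated as missing or invalid (an assumption based on the description).
INVALID = {"", "ERROR", "UNKNOWN", "None"}


def clean_cell(value):
    """Map missing or invalid markers to None; strip whitespace otherwise."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text in INVALID else text


def clean_row(raw):
    """Apply the suggested steps to one transaction."""
    row = {key: clean_cell(value) for key, value in raw.items()}

    # Missing categorical values become "Unknown".
    for column in ("Item", "Payment Method", "Location"):
        if row.get(column) is None:
            row[column] = "Unknown"

    # Date consistency and feature engineering.
    date_text = row.get("Transaction Date")
    try:
        parsed = datetime.strptime(date_text, "%Y-%m-%d") if date_text else None
    except ValueError:
        parsed = None  # incorrect date: leave missing rather than guess
    row["Transaction Date"] = parsed.date().isoformat() if parsed else None
    row["Day of the Week"] = parsed.strftime("%A") if parsed else None
    row["Transaction Month"] = parsed.month if parsed else None
    return row


# Illustrative row built from the example values in the column table.
raw = {
    "Transaction ID": "TXN_1234567",
    "Item": "ERROR",
    "Payment Method": "UNKNOWN",
    "Location": " Takeaway ",
    "Transaction Date": "2023-01-01",
}
print(clean_row(raw))
```

    Unparseable dates are set to None here instead of being filled from nearby records, since that step depends on how the rows are ordered in the real file.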

    License

    This dataset is released under the CC BY-SA 4.0 License. You are free to use, share, and adapt it, provided you give appropriate credit.

    Feedback

    If you have any questions or feedback, feel free to reach out through the dataset's discussion board on Kaggle.

  20. Daily Transactions Dataset

    • kaggle.com
    zip
    Updated May 14, 2024
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Prasad Patil (2024). Daily Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/prasad22/daily-transactions-dataset
    Available download formats: zip (34903 bytes)
    Dataset updated
    May 14, 2024
    Authors
    Prasad Patil
    Description

    The "Daily Transactions" dataset contains information on dummy transactions made by an individual on a daily basis. The dataset includes data on the products that were purchased, the amount spent on each product, the date and time of each transaction, the payment mode of each transaction, and the source of each record (Expense/Income).

    This dataset can be used to analyze purchasing behavior and money management, forecast expenses, and optimize savings and budgeting strategies. It is well-suited for data analysis and machine learning applications, where it can be used to train predictive models and support data-driven decisions.

    Column Descriptors

    • Date: The date and time when the transaction was made
    • Mode: The payment mode used for the transaction
    • Category: Each record is divided into a set of categories of transactions
    • Subcategory: Categories are further broken down into Subcategories of transactions
    • Note: A brief description of the transaction made
    • Amount: The transactional amount
    • Income/Expense: The indicator of each transaction representing either expense or income
    • Currency: All transactions are recorded in official currency of India
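
    One simple use of the Amount and Income/Expense columns is a net balance per category. A minimal Python sketch with made-up rows (not taken from the dataset); the label values "Income" and "Expense" are assumed from the column description, and the category values are hypothetical:

```python
from collections import defaultdict

# Made-up rows for illustration only.
transactions = [
    {"Category": "Salary", "Amount": 50000.0, "Income/Expense": "Income"},
    {"Category": "Food", "Amount": 250.0, "Income/Expense": "Expense"},
    {"Category": "Food", "Amount": 100.0, "Income/Expense": "Expense"},
]


def net_by_category(rows):
    """Sum amounts per category, counting expenses as negative."""
    totals = defaultdict(float)
    for row in rows:
        sign = 1.0 if row["Income/Expense"] == "Income" else -1.0
        totals[row["Category"]] += sign * row["Amount"]
    return dict(totals)


print(net_by_category(transactions))
```

    With the real file, the same function could be fed rows from csv.DictReader after converting Amount to float.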
