100+ datasets found
  1. E-commerce Business Transaction

    • kaggle.com
    Updated May 14, 2022
    Cite
    Gabriel Ramos (2022). E-commerce Business Transaction [Dataset]. https://www.kaggle.com/datasets/gabrielramos87/an-online-shop-business
    Dataset updated
    May 14, 2022
    Dataset provided by
    Kaggle
    Authors
    Gabriel Ramos
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    E-commerce has become a new channel to support business development. Through e-commerce, businesses can reach and establish a presence in a wider market by providing cheaper and more efficient distribution channels for their products or services. E-commerce has also changed the way people shop and consume products and services. Many people are turning to their computers or smart devices to order goods, which can easily be delivered to their homes.

    Content

    This is a sales transaction data set from a UK-based e-commerce (online retail) business, covering one year. This London-based shop has been selling gifts and homewares for adults and children through its website since 2007. Its customers come from all over the world and usually make direct purchases for themselves. There are also small businesses that buy in bulk and sell to other customers through retail outlet channels.

    The data set contains 500K rows and 8 columns. The following is the description of each column.

    1. TransactionNo (categorical): a six-digit unique number that defines each transaction. The letter “C” in the code indicates a cancellation.
    2. Date (numeric): the date when each transaction was generated.
    3. ProductNo (categorical): a five- or six-digit unique character string used to identify a specific product.
    4. Product (categorical): product/item name.
    5. Price (numeric): the price of each product per unit in pound sterling (£).
    6. Quantity (numeric): the quantity of each product per transaction. Negative values relate to cancelled transactions.
    7. CustomerNo (categorical): a five-digit unique number that defines each customer.
    8. Country (categorical): name of the country where the customer resides.

    A small percentage of the orders in the data set are cancellations. Most of these cancellations were due to out-of-stock conditions on some products. In this situation, customers tend to cancel an order because they want all products delivered at once.
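
    As a rough illustration of the cancellation rule described above, the sketch below separates cancelled rows from completed ones and totals revenue per transaction. It uses only the Python standard library. The sample rows are invented for illustration and are not records from the dataset; only the column names come from the description.

```python
from collections import defaultdict

# Hypothetical sample rows; column names follow the description above.
rows = [
    {"TransactionNo": "581482", "ProductNo": "22485", "Price": 21.47, "Quantity": 12},
    {"TransactionNo": "581482", "ProductNo": "22596", "Price": 10.65, "Quantity": 36},
    {"TransactionNo": "C581484", "ProductNo": "23843", "Price": 6.19, "Quantity": -2},
]


def is_cancellation(row):
    """A row is treated as a cancellation if its TransactionNo contains 'C'."""
    return "C" in str(row["TransactionNo"]).upper()


def revenue_by_transaction(rows):
    """Sum Price * Quantity per transaction, skipping cancelled rows."""
    totals = defaultdict(float)
    for row in rows:
        if is_cancellation(row):
            continue
        totals[row["TransactionNo"]] += row["Price"] * row["Quantity"]
    return dict(totals)


completed = revenue_by_transaction(rows)
cancelled = [r["TransactionNo"] for r in rows if is_cancellation(r)]
```

    Whether cancelled rows should be dropped or netted against the original order depends on the analysis; dropping them is only one possible choice.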

    Inspiration

    Information is a main asset of businesses nowadays. The success of a business in a competitive environment depends on its ability to acquire, store, and utilize information. Data is one of the main sources of information. Therefore, data analysis is an important activity for acquiring new and useful information. Analyze this dataset and try to answer the following questions.

    1. What was the sales trend over the months?
    2. What are the most frequently purchased products?
    3. How many products does the customer purchase in each transaction?
    4. Which customer segments are the most profitable?
    5. Based on your findings, what strategy could you recommend to the business to gain more profit?
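
    The first two questions can be approached with simple aggregations. The following is a minimal sketch using the standard library; the sample rows are invented, and the date format `%m/%d/%Y` is an assumption, since the description does not state how the Date column is encoded.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical sample rows; the date format is an assumption.
rows = [
    {"Date": "11/30/2019", "Product": "Mug", "Price": 5.0, "Quantity": 4},
    {"Date": "12/01/2019", "Product": "Mug", "Price": 5.0, "Quantity": 2},
    {"Date": "12/09/2019", "Product": "Lamp", "Price": 20.0, "Quantity": 1},
]


def monthly_revenue(rows, date_format="%m/%d/%Y"):
    """Total Price * Quantity per calendar month, keyed as 'YYYY-MM'."""
    totals = defaultdict(float)
    for row in rows:
        month = datetime.strptime(row["Date"], date_format).strftime("%Y-%m")
        totals[month] += row["Price"] * row["Quantity"]
    return dict(sorted(totals.items()))


def top_products(rows, n=3):
    """Products ranked by total units sold, most popular first."""
    units = Counter()
    for row in rows:
        units[row["Product"]] += row["Quantity"]
    return units.most_common(n)
```

    With the sample rows, `monthly_revenue(rows)` groups the first row under 2019-11 and the other two under 2019-12.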


  2. Financial Transactions Dataset for Fraud Detection

    • kaggle.com
    Updated May 2, 2025
    Cite
    Aryan Kumar (2025). Financial Transactions Dataset for Fraud Detection [Dataset]. https://www.kaggle.com/datasets/aryan208/financial-transactions-dataset-for-fraud-detection
    Dataset updated
    May 2, 2025
    Dataset provided by
    Kaggle
    Authors
    Aryan Kumar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains 5 million synthetically generated financial transactions designed to simulate real-world behavior for fraud detection research and machine learning applications. Each transaction record includes fields such as:

    • Transaction Details: ID, timestamp, sender/receiver accounts, amount, type (deposit, transfer, etc.)

    • Behavioral Features: time since last transaction, spending deviation score, velocity score, geo-anomaly score

    • Metadata: location, device used, payment channel, IP address, device hash

    • Fraud Indicators: binary fraud label (is_fraud) and type of fraud (e.g., money laundering, account takeover)

    The dataset follows realistic fraud patterns and behavioral anomalies, making it suitable for:

    • Binary and multiclass classification models

    • Fraud detection systems

    • Time-series anomaly detection

    • Feature engineering and model explainability
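
    To make one of the behavioral features above concrete, the sketch below derives a "time since last transaction" value per sender. This is only one plausible definition; the dataset description does not say how its own feature was computed. The tuple layout and ISO-formatted timestamps are assumptions for illustration.

```python
from datetime import datetime


def time_since_last(transactions):
    """Seconds since each sender's previous transaction (None for the first).

    `transactions` is a list of (sender_account, iso_timestamp) tuples.
    Timestamps must share one ISO format so that sorting them as strings
    matches chronological order.
    """
    last_seen = {}
    gaps = []
    for sender, stamp in sorted(transactions, key=lambda t: t[1]):
        ts = datetime.fromisoformat(stamp)
        previous = last_seen.get(sender)
        gap = None if previous is None else (ts - previous).total_seconds()
        gaps.append((sender, stamp, gap))
        last_seen[sender] = ts
    return gaps


# Hypothetical sample: account "A" transacts twice, 30 minutes apart.
sample = [
    ("A", "2023-01-01T10:00:00"),
    ("B", "2023-01-01T10:05:00"),
    ("A", "2023-01-01T10:30:00"),
]
gaps = time_since_last(sample)
```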

  3. Kaggle-Credit Card Fraud Dataset

    • paperswithcode.com
    Updated Sep 15, 2013
    Cite
    (2013). Kaggle-Credit Card Fraud Dataset [Dataset]. https://paperswithcode.com/dataset/kaggle-credit-card-fraud-dataset
    Dataset updated
    Sep 15, 2013
    Description

    The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

    It contains only numerical input variables which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features and more background information about the data. Features V1, V2, … V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and it takes value 1 in case of fraud and 0 otherwise.

    Given the class imbalance ratio, we recommend measuring the accuracy using the Area Under the Precision-Recall Curve (AUPRC). Confusion matrix accuracy is not meaningful for unbalanced classification.
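
    The recommendation above can be illustrated with a small standard-library sketch. It first shows why plain accuracy is uninformative at this class ratio, then computes average precision, a common step-wise estimate of the area under the precision-recall curve. The function name and the toy labels and scores are illustrative assumptions, not part of the dataset.

```python
# Class ratio quoted in the description: 492 frauds out of 284,807 rows.
fraud_rate = 492 / 284807
# A model that labels every transaction "not fraud" still scores this accuracy.
always_legit_accuracy = 1 - fraud_rate


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    labels: sequence of 0/1 (1 = fraud); scores: predicted fraud scores.
    Examples with equal scores are processed together, so the result
    does not depend on input order.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    total_pos = sum(label for _, label in pairs)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ap = 0.0
    tp = 0
    i = 0
    while i < len(pairs):
        tp_before = tp
        j = i
        # Consume every example that shares the current score threshold.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            j += 1
        precision = tp / j                       # j examples predicted positive
        recall_gain = (tp - tp_before) / total_pos
        ap += precision * recall_gain
        i = j
    return ap
```

    A scorer that gives every transaction the same score gets an average precision equal to the fraud rate, which is why this metric, unlike accuracy, exposes an uninformative model.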

  4. Online Retail Transaction Data

    • kaggle.com
    Updated Dec 21, 2023
    Cite
    The Devastator (2023). Online Retail Transaction Data [Dataset]. https://www.kaggle.com/datasets/thedevastator/online-retail-transaction-data
    Dataset updated
    Dec 21, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    The Devastator
    Description

    Online Retail Transaction Data

    UK Online Retail Sales and Customer Transaction Data

    By UCI [source]

    About this dataset

    Comprehensive Dataset on Online Retail Sales and Customer Data

    Welcome to this comprehensive dataset offering a wide array of information related to online retail sales. This data set provides an in-depth look at transactions, product details, and customer information documented by an online retail company based in the UK. The data ranges from granular details about each product sold to customer records from different countries.

    This transnational data set is a treasure trove of vital business insights as it meticulously catalogues all the transactions that happened during its span. It houses rich transactional records curated by a renowned non-store online retail company based in the UK known for selling unique all-occasion gifts. A considerable portion of its clientele includes wholesalers; ergo, this dataset can prove instrumental for companies looking for patterns or studying purchasing trends among such businesses.

    The available attributes within this dataset offer valuable pieces of information:

    • InvoiceNo: This attribute refers to invoice numbers that are six-digit integral numbers uniquely assigned to every transaction logged in this system. Transactions marked with 'c' at the beginning signify cancellations - adding yet another dimension for purchase pattern analysis.

    • StockCode: Stock Code corresponds with specific items as they're represented within the inventory system via 5-digit integral numbers; these allow easy identification and distinction between products.

    • Description: This refers to product names, giving users qualitative knowledge about what kind of items are being bought and sold frequently.

    • Quantity: These figures ascertain the volume of each product per transaction – important figures that can help understand buying trends better.

    • InvoiceDate: Invoice Dates detail when each transaction was generated down to precise timestamps – invaluable when conducting time-based trend analysis or segmentation studies.

    • UnitPrice: Unit prices represent how much each unit retails at — crucial for revenue calculations or cost-related analyses.

    Finally,

    • Country: This locational attribute shows where each customer hails from, adding geographical segmentation to your data investigation toolkit.

    This dataset was originally collated by Dr Daqing Chen, Director of the Public Analytics group based at the School of Engineering, London South Bank University. His research studies and business cases with this dataset have been published in various papers contributing to establishing a solid theoretical basis for direct, data and digital marketing strategies.

    Access to such records can support rich explorations and help formulate hypotheses about consumer behavior patterns among wholesalers. Whether it's managing inventory, studying transactional trends over time, or spotting cancellation patterns, this dataset is apt for multiple forms of retail analysis.

    How to use the dataset

    1. Sales Analysis:

    Sales data forms the backbone of this dataset, and it allows users to delve into various aspects of sales performance. You can use the Quantity and UnitPrice fields to calculate metrics like revenue, and further combine it with InvoiceNo information to understand sales over individual transactions.

    2. Product Analysis:

    Each product in this dataset comes with its unique identifier (StockCode) and its name (Description). You could analyse which products are most popular based on Quantity sold or look at popularity per transaction by considering both Quantity and InvoiceNo.

    3. Customer Segmentation:

    If you apply specific business logic to the transactions (such as calculating total amounts), you could then use standard machine learning methods or RFM (Recency, Frequency, Monetary) segmentation techniques, combined with 'CustomerID', to understand customer behavior better. Concatenating invoice numbers (which stand for separate transactions) per client will give insights about your clients as well.

    4. Geographical Analysis:

    The Country column enables analysts to study purchase patterns across different geographical locations.
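
    The sales analysis and customer segmentation ideas above can be sketched as follows. Revenue is taken as Quantity times UnitPrice, and recency, frequency and monetary values are derived per 'CustomerID'. The sample rows, the date format and the choice to skip cancelled invoices are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows using the column names described above.
rows = [
    {"InvoiceNo": "536365", "CustomerID": "17850", "Quantity": 6,
     "UnitPrice": 2.5, "InvoiceDate": "2010-12-01 08:26"},
    {"InvoiceNo": "536366", "CustomerID": "17850", "Quantity": 2,
     "UnitPrice": 5.0, "InvoiceDate": "2010-12-05 09:00"},
    {"InvoiceNo": "536367", "CustomerID": "13047", "Quantity": 4,
     "UnitPrice": 1.0, "InvoiceDate": "2010-12-02 10:00"},
    {"InvoiceNo": "C536379", "CustomerID": "13047", "Quantity": -1,
     "UnitPrice": 1.0, "InvoiceDate": "2010-12-03 10:00"},
]


def rfm(rows, as_of, date_format="%Y-%m-%d %H:%M"):
    """Return {customer: (recency_days, frequency, monetary)}.

    Invoices whose number starts with 'C' or 'c' (cancellations) are skipped.
    Frequency counts distinct invoices; monetary sums Quantity * UnitPrice.
    """
    last_purchase = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for row in rows:
        if row["InvoiceNo"].upper().startswith("C"):
            continue
        customer = row["CustomerID"]
        when = datetime.strptime(row["InvoiceDate"], date_format)
        last_purchase[customer] = max(when, last_purchase.get(customer, when))
        invoices[customer].add(row["InvoiceNo"])
        spend[customer] += row["Quantity"] * row["UnitPrice"]
    return {
        c: ((as_of - last_purchase[c]).days, len(invoices[c]), spend[c])
        for c in last_purchase
    }


scores = rfm(rows, as_of=datetime(2010, 12, 10))
```

    The same loop could be grouped by Country instead of 'CustomerID' for a geographical view.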

    Practical applications

    Understand what products sell best where - It can help drive tailored marketing strategies. Anomalies detection – Identify unusual behaviors that might lead frau...

  5. ‘Retail Transaction Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Retail Transaction Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-retail-transaction-data-9d6d/latest
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Retail Transaction Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/michalfr/retail-transaction-data on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    This dataset contains transactions and the products they contain, which were obtained by scanning receipts from retail establishments by numerous users. Products were categorized by our proprietary NLP model.

    Content

    Data was collected over a one-year period and contains product information from purchases made within that period, the product category inferred from the product name, information about the organization, the transaction to which each product belongs, and the user that uploaded the receipt.

    The total user count is 22. The total retail organization count is 179. The total transaction count is 805. The total product count is 7477.

    Acknowledgements

    @kserno

    Inspiration

    Product categorization, User Behaviour Analysis, Product Analysis, Product Price Comparison between Various Retail Stores, Prediction of Next Transaction

    --- Original source retains full ownership of the source dataset ---

  6. Credit Card Fraud Detection

    • test.researchdata.tuwien.at
    • zenodo.org
    • +1more
    csv, json, pdf +2
    Updated Apr 28, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Ajdina Grizhja (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82556/yvxj-9t22
    Available download formats: csv, pdf, text/markdown, txt, json
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    TU Wien
    Authors
    Ajdina Grizhja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 28, 2025
    Description

    Below is a draft DMP–style description of your credit‐card fraud detection experiment, modeled on the antiquities example:

    1. Dataset Description

    Research Domain
    This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.

    Purpose
    The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.

    Data Sources
    We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.

    Method of Dataset Preparation

    1. Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.

    2. Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).

    3. Splitting: Programmatically derived three subsets—training (70%), validation (15%), test (15%)—using range‐based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.

    4. Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non‐feature identifiers (actionnr, merchant_id).

    5. Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
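
    Steps 3 and 4 above can be sketched locally as follows. The description does not show DBRepo's own filtering interface, so this is an in-memory analogue rather than the author's code: the function names and sample records are hypothetical, and only the column names and the 70/15/15 proportions come from the text.

```python
def encode_flags(record, flag_columns):
    """Map "Y"/"N" flags to 1/0, leaving other columns untouched."""
    encoded = dict(record)
    for column in flag_columns:
        encoded[column] = 1 if record[column] == "Y" else 0
    return encoded


def range_split(records, key="actionnr", train_pct=70, validation_pct=15):
    """Split into train/validation/test by contiguous primary-key ranges.

    Integer percentages avoid floating-point rounding at the boundaries.
    """
    ordered = sorted(records, key=lambda r: r[key])
    n = len(ordered)
    train_end = n * train_pct // 100
    validation_end = train_end + n * validation_pct // 100
    return (ordered[:train_end],
            ordered[train_end:validation_end],
            ordered[validation_end:])


FLAGS = ["is_declined", "isforeigntransaction", "ishighriskcountry", "isfradulent"]

# Hypothetical records; only the column names come from the description.
records = [
    {"actionnr": i, "is_declined": "N", "isforeigntransaction": "Y",
     "ishighriskcountry": "N", "isfradulent": "Y" if i % 10 == 0 else "N"}
    for i in range(1, 101)
]
train_set, validation_set, test_set = range_split(
    [encode_flags(r, FLAGS) for r in records]
)
```

    A range-based split keeps each subset reproducible from the key bounds alone, but if the key order correlates with time or with the label, the subsets may differ in distribution.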

    2. Technical Details

    Dataset Structure

    • The raw data is a single CSV with columns:

      • actionnr (integer transaction ID)

      • merchant_id (string)

      • average_amount_transaction_day (float)

      • transaction_amount (float)

      • is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)

      • total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)

    Naming Conventions

    • All columns use lowercase snake_case.

    • Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.

    • Files in the code repo follow a clear structure:

      ├── data/         # local copies only; raw data lives in DBRepo 
      ├── notebooks/Task.ipynb 
      ├── models/rf_model_v1.joblib 
      ├── outputs/        # confusion_matrix.png, roc_curve.png, predictions.csv 
      ├── README.md 
      ├── requirements.txt 
      └── codemeta.json 
      

    Required Software

    • Python 3.9+

    • pandas, numpy (data handling)

    • scikit-learn (modeling, metrics)

    • matplotlib (visualizations)

    • dbrepo‐client.py (DBRepo API)

    • requests (TU WRD API)

    Additional Resources

    3. Further Details

    Data Limitations

    • Highly imbalanced: only ~0.17% of transactions are fraudulent.

    • Anonymized PCA features (V1–V28) hidden; we extended with domain features but cannot reverse engineer raw variables.

    • Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.

    Licensing and Attribution

    • Raw data: CC-0 (per Kaggle terms)

    • Code & notebooks: MIT License

    • Model artifacts & outputs: CC-BY 4.0

    • TU WRD records include ORCID identifiers for the author.

    Recommended Uses

    • Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.

    • Educational purposes: demonstrating model‐training pipelines, FAIR data practices.

    • Extension: adding time‐series or deep‐learning models.

    Known Issues

    • Possible temporal leakage if date/time features not handled correctly.

    • Model performance may degrade on live data due to concept drift.

    • Binary flags may oversimplify nuanced transaction outcomes.

  7. Synthetic Bank Transactions

    • kaggle.com
    zip
    Updated Mar 20, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    John Harris (2021). Synthetic Bank Transactions [Dataset]. https://www.kaggle.com/radistaleks/synthetic-bank-transactions
    Available download formats: zip (13820207 bytes)
    Dataset updated
    Mar 20, 2021
    Authors
    John Harris
    License

    Attribution-ShareAlike 4.0 (CC BY-SA 4.0): https://creativecommons.org/licenses/by-sa/4.0/
    License information was derived automatically

    Description

    Inspiration

    Many projects require datasets about bank transactions to test their systems. Unfortunately, it is hard to find a dataset that would have transaction product categorization which is important for many analytical projects.

    Content

    There are 4 datasets:

    • Clients: basic information about bank users.
    • Categories: standard transaction categories which are used by many banks worldwide.
    • Transactions: the core of the dataset, with basic information about each transaction, such as the second account in the transaction, category, amount, etc.
    • Subscriptions: information about subscriptions, in other words, transactions which are made automatically.

  8. Fraud Detection - Financial transactions

    • find.data.gov.scot
    csv
    Updated Mar 14, 2018
    Cite
    Deloitte Datathon 2018 (uSmart) (2018). Fraud Detection - Financial transactions [Dataset]. https://find.data.gov.scot/datasets/39167
    Available download formats: csv (470.6714 MB)
    Dataset updated
    Mar 14, 2018
    Dataset provided by
    Deloitte (https://deloitte.com/)
    Description

    Synthetic transactional data with labels for fraud detection. For more information, see: https://www.kaggle.com/ntnu-testimon/paysim1/version/2

  9. Fraud Detection Dataset

    • kaggle.com
    Updated Mar 9, 2025
    Cite
    RANJIT MANDAL (2025). Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/ranjitmandal/fraud-detection-dataset-csv
    Dataset updated
    Mar 9, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    RANJIT MANDAL
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Dataset Title:

    🔍 Online Payment Fraud Detection Dataset | Real-World Transactions 💳

    Description:

    Fraudulent transactions are a growing challenge for fintech companies. This dataset captures 51,000+ transactions, each labeled as fraudulent or legitimate, based on real-world patterns.

    It includes transaction details, user behavior, payment methods, and device usage, making it ideal for:

    ✅ Fraud detection modeling (classification)
    ✅ Feature engineering & anomaly detection
    ✅ Exploratory data analysis (EDA) & pattern recognition

    Columns Overview:

    Transaction Details: Amount, type, time, and payment method 💰
    User Behavior: Past fraud history, account age, recent activity 📊
    Device & Location: Device used, transaction location 🌍

    Use Cases:

    🚀 Train machine learning models to detect fraud
    📉 Analyze patterns of fraud in financial transactions
    🔎 Optimize fraud prevention strategies

    Ready to fight fraud? Let’s dive in! 🔥

  10. ‘Retail Transaction Data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 23, 2019
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2019). ‘Retail Transaction Data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-retail-transaction-data-5fa4/latest
    Dataset updated
    Feb 23, 2019
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Retail Transaction Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/regivm/retailtransactiondata on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    The data provides customer- and date-level transactions for a few years. It can be used to demonstrate any analysis that requires transaction information, such as RFM. The data also provides the response of customers to a promotion campaign.

    A highlight of this dataset is that you can evaluate the effectiveness of RFM groups by checking one of the business metrics: the response of customers.

    Content

    Transaction data provides customer_id, transaction date and Amount of purchase. Response data provides the response information of each of the customers. It is a binary variable indicating whether the customer responded to a campaign or not.
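
    One way to check RFM groups against the campaign response, as suggested above, is to compare response rates per group. The sketch below assumes the groups have already been assigned by some RFM scheme; the group labels, customer IDs and function name are hypothetical.

```python
from collections import defaultdict


def response_rate_by_group(groups, responses):
    """Share of responders per RFM group.

    groups: {customer_id: group label}; responses: {customer_id: 0 or 1}.
    Customers without a response record are ignored.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [responders, customers]
    for customer, group in groups.items():
        if customer not in responses:
            continue
        counts[group][0] += responses[customer]
        counts[group][1] += 1
    return {g: hit / total for g, (hit, total) in counts.items()}


# Hypothetical assignments and responses.
groups = {"CS1": "high", "CS2": "high", "CS3": "low", "CS4": "low"}
responses = {"CS1": 1, "CS2": 0, "CS3": 0, "CS4": 0}
rates = response_rate_by_group(groups, responses)
```

    If the grouping is useful, higher-value groups should show clearly different response rates from lower-value ones.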

    Acknowledgements

    Extremely thankful to the numerous kernel and data publishers of Kaggle and GitHub. Learnt a lot from these communities.

    Inspiration

    More innovative approaches for handling RFM Analysis.

    --- Original source retains full ownership of the source dataset ---

  11. ‘Store Transaction data’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 14, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Store Transaction data’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-store-transaction-data-2e60/3a5df53c/?iid=007-635&v=presentation
    Dataset updated
    Feb 14, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Analysis of ‘Store Transaction data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/iamprateek/store-transaction-data on 14 February 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Nielsen receives transaction level scanning data (POS Data) from its partner stores on a regular basis. Stores sharing POS data include bigger format store types such as supermarkets, hypermarkets as well as smaller traditional trade grocery stores (Kirana stores), medical stores etc. using a POS machine.

    While in a bigger format store, all items for all transactions are scanned using a POS machine, smaller and more localized shops do not have a 100% compliance rate in terms of scanning and inputting information into the POS machine for all transactions.

    A transaction involving a single packet of chips or a single piece of candy may not be scanned and recorded, either to spare the customer the inconvenience or because the store is crowded during rush hours.

    Thus, the data received from such stores is often incomplete and lacks complete information of all transactions completed within a day.

    Additionally, apart from incomplete transaction data in a day, it is observed that certain stores do not share data for all active days. Stores share data ranging from 2 to 28 days in a month. While it is possible to impute/extrapolate data for 2 days of a month using 28 days of actual historical data, the reverse is not recommended.

    Nielsen encourages you to create a model which can help impute/extrapolate data to fill in the missing data gaps in the store level POS data currently received.

    Content

    You are provided with a dataset that contains store-level data by brands and categories for select stores:

    Hackathon_ Ideal_Data - The file contains brand level data for 10 stores for the last 3 months. This can be referred to as the ideal data.

    Hackathon_Working_Data - This contains data for selected stores which are missing and/or incomplete.

    Hackathon_Mapping_File - This file is provided to help understand the column names in the data set.

    Hackathon_Validation_Data - This file contains the data stores and product groups for which you have to predict the Total_VALUE.

    Sample Submission - This file represents what needs to be uploaded as output by candidate in the same format. The sample data is provided in the file to help understand the columns and values required.

    Acknowledgements

    Nielsen Holdings plc (NYSE: NLSN) is a global measurement and data analytics company that provides the most complete and trusted view available of consumers and markets worldwide. Nielsen is divided into two business units. Nielsen Global Media, the arbiter of truth for media markets, provides media and advertising industries with unbiased and reliable metrics that create a shared understanding of the industry required for markets to function. Nielsen Global Connect provides consumer packaged goods manufacturers and retailers with accurate, actionable information and insights and a complete picture of the complex and changing marketplace that companies need to innovate and grow. Our approach marries proprietary Nielsen data with other data sources to help clients around the world understand what’s happening now, what’s happening next, and how to best act on this knowledge. An S&P 500 company, Nielsen has operations in over 100 countries, covering more than 90% of the world’s population.

    Know more: https://www.nielsen.com/us/en/

    Inspiration

    Build an imputation and/or extrapolation model to fill the missing data gaps for select stores by analyzing the data and determine which factors/variables/features can help best predict the store sales.
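
    As a naive baseline for the extrapolation task, the sketch below scales the mean of the reported days up to a full month and declines to extrapolate when too few days were reported. This is an illustrative starting point, not Nielsen's method; the function name and the `min_days` threshold of 14 are arbitrary assumptions.

```python
def extrapolate_monthly_value(daily_values, days_in_month, min_days=14):
    """Scale the mean observed daily value to a full month.

    daily_values: {day_of_month: value} for the days the store reported.
    Returns None when fewer than `min_days` days were reported, because
    extrapolating a month from very few days is unreliable.
    """
    if len(daily_values) < min_days:
        return None
    mean_daily = sum(daily_values.values()) / len(daily_values)
    return mean_daily * days_in_month


# Hypothetical stores: one reported 28 days, the other only 2.
well_covered = {day: 100.0 for day in range(1, 29)}
sparse = {1: 100.0, 2: 120.0}
```

    A fuller model would likely account for weekday effects, store type and brand mix rather than a flat daily mean.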

    --- Original source retains full ownership of the source dataset ---

  12. Fraud Detection Transactions Dataset

    • kaggle.com
    Updated Feb 21, 2025
    Cite
    Samay Ashar (2025). Fraud Detection Transactions Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/10816530
    Dataset updated
    Feb 21, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Samay Ashar
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset is designed to help data scientists and machine learning enthusiasts develop robust fraud detection models. It contains realistic synthetic transaction data, including user information, transaction types, risk scores, and more, making it ideal for binary classification tasks with models like XGBoost and LightGBM.

    📌 Key Features

    1. 21 features capturing various aspects of a financial transaction
    2. Realistic structure with numerical, categorical, and temporal data
    3. Binary fraud labels (0 = Not Fraud, 1 = Fraud)
    4. Designed for high accuracy with XGBoost and other ML models
    5. Useful for anomaly detection, risk analysis, and security research

    📌 Columns in the Dataset

    Column Name                   Description
    Transaction_ID                Unique identifier for each transaction
    User_ID                       Unique identifier for the user
    Transaction_Amount            Amount of money involved in the transaction
    Transaction_Type              Type of transaction (Online, In-Store, ATM, etc.)
    Timestamp                     Date and time of the transaction
    Account_Balance               User's current account balance before the transaction
    Device_Type                   Type of device used (Mobile, Desktop, etc.)
    Location                      Geographical location of the transaction
    Merchant_Category             Type of merchant (Retail, Food, Travel, etc.)
    IP_Address_Flag               Whether the IP address was flagged as suspicious (0 or 1)
    Previous_Fraudulent_Activity  Number of past fraudulent activities by the user
    Daily_Transaction_Count       Number of transactions made by the user that day
    Avg_Transaction_Amount_7d     User's average transaction amount in the past 7 days
    Failed_Transaction_Count_7d   Count of failed transactions in the past 7 days
    Card_Type                     Type of payment card used (Credit, Debit, Prepaid, etc.)
    Card_Age                      Age of the card in months
    Transaction_Distance          Distance between the user's usual location and transaction location
    Authentication_Method         How the user authenticated (PIN, Biometric, etc.)
    Risk_Score                    Fraud risk score computed for the transaction
    Is_Weekend                    Whether the transaction occurred on a weekend (0 or 1)
    Fraud_Label                   Target variable (0 = Not Fraud, 1 = Fraud)
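
    As a rough illustration of how these columns might be prepared for a binary classifier, the sketch below separates the Fraud_Label target from the remaining columns. Dropping Transaction_ID and User_ID as non-predictive identifiers is an assumption, not something the dataset description states, and the function name split_features_and_label and the sample rows are hypothetical.

```python
# Column names come from the dataset description; treating the two ID
# columns as non-predictive is an assumption made for this sketch.
ID_COLUMNS = {"Transaction_ID", "User_ID"}
TARGET = "Fraud_Label"


def split_features_and_label(rows):
    """Split row dicts into feature dicts and a list of 0/1 labels."""
    features, labels = [], []
    for row in rows:
        labels.append(int(row[TARGET]))
        features.append(
            {k: v for k, v in row.items() if k != TARGET and k not in ID_COLUMNS}
        )
    return features, labels


# Two made-up rows using a subset of the columns, for illustration only.
sample_rows = [
    {"Transaction_ID": "T1", "User_ID": "U1", "Transaction_Amount": "39.79",
     "Risk_Score": "0.85", "Is_Weekend": "0", "Fraud_Label": "0"},
    {"Transaction_ID": "T2", "User_ID": "U2", "Transaction_Amount": "1.19",
     "Risk_Score": "0.95", "Is_Weekend": "1", "Fraud_Label": "1"},
]

X, y = split_features_and_label(sample_rows)
print(y)             # labels as integers
print(sorted(X[0]))  # remaining feature names for the first row
```

    The values stay as strings here, as they would when read with csv.DictReader; converting numeric and categorical columns is left to whichever model library is used.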

    📌 Potential Use Cases

    1. Fraud detection model training
    2. Anomaly detection in financial transactions
    3. Risk scoring systems for banks and fintech companies
    4. Feature engineering and model explainability research
  13. Retail Transaction Dataset

    • opendatabay.com
    Updated Jun 24, 2025
    Cite
    Datasimple (2025). Retail Transaction Dataset [Dataset]. https://www.opendatabay.com/data/synthetic/ce827d4f-444a-4ffc-a50e-a769e596a2d3
    Dataset updated
    Jun 24, 2025
    Dataset authored and provided by
    Datasimple
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Area covered
    Food & Beverage Consumption
    Description

    This dataset contains 30,000 unique retail transactions, each representing a customer's shopping basket in a simulated grocery store environment. The data was generated with realistic product combinations and purchase patterns, suitable for association rule mining, recommendation systems and market basket analysis.

    Each row corresponds to a single transaction, listing:

    1. A unique transaction ID
    2. A customer ID
    3. The full list of products bought in that transaction
    4. The time of the transaction

    The dataset includes products across various categories such as beverages, snacks, dairy, household items, fruits, vegetables and frozen foods.

    This data is entirely synthetic and does not contain any real user information.
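
    To make the association rule mining use case concrete, here is a minimal sketch that computes support and confidence for item pairs. It assumes each transaction has already been parsed into a list of product names; the baskets, the pair_rules function and the min_support threshold are hypothetical and not taken from the dataset.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return {(a, b): (support, confidence of a -> b)} for frequent pairs.

    Pairs are stored in alphabetical order, so only the a -> b direction
    with a < b is reported.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeated items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
    return rules


# Made-up baskets; the real product names and file layout may differ.
baskets = [
    ["milk", "bread", "butter"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

rules = pair_rules(baskets)
for pair, (support, confidence) in sorted(rules.items()):
    print(pair, f"support={support:.2f}", f"confidence={confidence:.2f}")
```

    Counting only pairs keeps the sketch short; a full Apriori or FP-Growth implementation would extend the same counts to larger itemsets.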

    Original Data Source: Retail Transaction Dataset

  14. UPI Transactions 23-24

    • kaggle.com
    Updated Jun 19, 2024
    Cite
    Priyanshu Gautam (2024). UPI Transactions 23-24 [Dataset]. https://www.kaggle.com/datasets/priyanshugautam1214/upi-transactions-23-24
    Dataset updated
    Jun 19, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Priyanshu Gautam
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Priyanshu Gautam

    Released under Apache 2.0

  15. Credit Card Fraud Detection

    • zenodo.org
    csv
    Updated Dec 5, 2022
    Cite
    Luqi Liu; Luqi Liu (2022). Credit Card Fraud Detection [Dataset]. http://doi.org/10.5281/zenodo.7395559
    Available download formats: csv
    Dataset updated
    Dec 5, 2022
    Dataset provided by
    Zenodo (http://zenodo.org/)
    Authors
    Luqi Liu; Luqi Liu
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    This dataset is taken from https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud

    The dataset contains transactions made by credit cards in September 2013 by European cardholders.
    It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
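
    The imbalance figure can be checked directly from the two counts given above. The short calculation below also shows the share of legitimate transactions, which is the accuracy a model would reach by always predicting "not fraud"; that baseline is added here for illustration and is not part of the dataset description.

```python
frauds = 492
total = 284_807

fraud_pct = 100 * frauds / total             # share of the positive class
legit_per_fraud = (total - frauds) / frauds  # negatives per positive
baseline_accuracy = 100 - fraud_pct          # always predicting "not fraud"

print(f"fraud share: {fraud_pct:.4f}%")                 # about 0.1727%
print(f"legitimate per fraud: {legit_per_fraud:.1f}")   # about 577.9
print(f"always-negative accuracy: {baseline_accuracy:.2f}%")
```

    The quoted 0.172% is the same ratio truncated to three decimal places.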

  16. ‘Fraud detection bank dataset 20K records binary ’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Jan 28, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘Fraud detection bank dataset 20K records binary ’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-fraud-detection-bank-dataset-20k-records-binary-6287/e0c752fd/?iid=019-348&v=presentation
    Dataset updated
    Jan 28, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘Fraud detection bank dataset 20K records binary ’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/volodymyrgavrysh/fraud-detection-bank-dataset-20k-records-binary on 28 January 2022.

    --- Dataset description provided by original source is as follows ---

    Context

    Banks are often exposed to fraudulent transactions and constantly improve their systems to track them.

    Content

    Bank dataset that contains 20k+ transactions with 112 features (numerical)

    --- Original source retains full ownership of the source dataset ---

  17. ‘OpenSea Daily Ethereum Transactions’ analyzed by Analyst-2

    • analyst-2.ai
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com), ‘OpenSea Daily Ethereum Transactions’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-opensea-daily-ethereum-transactions-f433/latest
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘OpenSea Daily Ethereum Transactions’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ankanhore545/opensea-daily-transactions on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    This all-time data represents the raw on-chain activity of the tracked smart contracts.

    I am thankful that we could collect the data from the DappRadar platform (https://dappradar.com/ethereum/marketplaces/opensea). The data covers the 5 ETH smart contracts listed on that site.

    --- Original source retains full ownership of the source dataset ---

  18. Sales Dataset

    • kaggle.com
    Updated Mar 22, 2025
    Cite
    Shantanu Garg (2025). Sales Dataset [Dataset]. http://doi.org/10.34740/kaggle/dsv/11128791
    Dataset updated
    Mar 22, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Shantanu Garg
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    This dataset contains detailed sales transactions, including order details, revenue, profit, and customer information. It can be used for sales analysis, trend forecasting, and business intelligence insights. The data covers multiple product categories and is structured to facilitate easy analysis of sales performance across different locations and time periods.

  19. ‘OpenSea Daily Polygon Transactions’ analyzed by Analyst-2

    • analyst-2.ai
    Updated Feb 13, 2022
    Cite
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com) (2022). ‘OpenSea Daily Polygon Transactions’ analyzed by Analyst-2 [Dataset]. https://analyst-2.ai/analysis/kaggle-opensea-daily-polygon-transactions-a8d2/latest
    Dataset updated
    Feb 13, 2022
    Dataset authored and provided by
    Analyst-2 (analyst-2.ai) / Inspirient GmbH (inspirient.com)
    License

    Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
    License information was derived automatically

    Description

    Analysis of ‘OpenSea Daily Polygon Transactions’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ankanhore545/opensea-daily-polygon-transactions on 13 February 2022.

    --- Dataset description provided by original source is as follows ---

    This all-time data represents the raw on-chain activity of the tracked smart contracts.

    I am thankful that we could collect the data from the DappRadar platform (https://dappradar.com/polygon/marketplaces/opensea). The data covers the 1 Polygon smart contract listed on that site.

    --- Original source retains full ownership of the source dataset ---

  20. Transaction Data

    • kaggle.com
    Updated Jan 28, 2024
    Cite
    Jashandeep Kaur (2024). Transaction Data [Dataset]. https://www.kaggle.com/datasets/jshndeep/transaction-data/code
    Dataset updated
    Jan 28, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Jashandeep Kaur
    License

    Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Jashandeep Kaur

    Released under Apache 2.0

Cite
Gabriel Ramos (2022). E-commerce Business Transaction [Dataset]. https://www.kaggle.com/datasets/gabrielramos87/an-online-shop-business

E-commerce Business Transaction

Sales transaction of a UK-based e-commerce (online retail) for one year

145 scholarly articles cite this dataset (View in Google Scholar)
Dataset updated
May 14, 2022
Dataset provided by
Kaggle
Authors
Gabriel Ramos
License

https://creativecommons.org/publicdomain/zero/1.0/

Description

Context

E-commerce has become a new channel to support business development. Through e-commerce, businesses can reach and establish a wider market presence by providing cheaper and more efficient distribution channels for their products or services. E-commerce has also changed the way people shop and consume products and services. Many people are turning to their computers or smart devices to order goods, which can easily be delivered to their homes.

Content

This is a sales transaction data set from a UK-based e-commerce (online retail) business, covering one year. This London-based shop has been selling gifts and homewares for adults and children through its website since 2007. Its customers come from all over the world and usually make direct purchases for themselves. There are also small businesses that buy in bulk and sell to other customers through retail outlet channels.

The data set contains 500K rows and 8 columns. The following is the description of each column.

1. TransactionNo (categorical): a six-digit unique number that defines each transaction. The letter “C” in the code indicates a cancellation.
2. Date (numeric): the date when each transaction was generated.
3. ProductNo (categorical): a unique five- or six-digit code used to identify a specific product.
4. Product (categorical): product/item name.
5. Price (numeric): the price of each product per unit in pound sterling (£).
6. Quantity (numeric): the quantity of each product per transaction. Negative values relate to cancelled transactions.
7. CustomerNo (categorical): a five-digit unique number that defines each customer.
8. Country (categorical): name of the country where the customer resides.

A small percentage of the orders in the data set are cancellations. Most of these cancellations were due to out-of-stock conditions on some products. In this situation, customers tend to cancel an order because they want all products delivered at once.
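
The two cancellation markers described above (the letter “C” in TransactionNo and a negative Quantity) can be combined into a simple check. The sketch below assumes rows are dictionaries of strings keyed by the column names, as csv.DictReader would produce; the function names and sample rows are hypothetical, and the values are not taken from the dataset.

```python
def is_cancellation(row):
    """True if the row looks like a cancellation per the column notes."""
    return "C" in row["TransactionNo"] or int(row["Quantity"]) < 0


def cancellation_share(rows):
    """Fraction of rows flagged as cancellations (0.0 for empty input)."""
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(is_cancellation(r) for r in rows) / len(rows)


# Made-up rows for illustration only.
sample_rows = [
    {"TransactionNo": "581482", "Quantity": "12"},
    {"TransactionNo": "C581484", "Quantity": "-3"},
    {"TransactionNo": "581475", "Quantity": "1"},
    {"TransactionNo": "581476", "Quantity": "6"},
]

print(cancellation_share(sample_rows))  # 0.25
```

Checking both markers is a defensive choice; whether they always coincide in the actual file is not stated in the description.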

Inspiration

Information is a main asset of businesses nowadays. The success of a business in a competitive environment depends on its ability to acquire, store, and utilize information. Data is one of the main sources of information. Therefore, data analysis is an important activity for acquiring new and useful information. Analyze this dataset and try to answer the following questions.

1. How did sales trend over the months?
2. What are the most frequently purchased products?
3. How many products does the customer purchase in each transaction?
4. What are the most profitable customer segments?
5. Based on your findings, what strategy could you recommend to the business to gain more profit?
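
As a starting point for the first two questions, the sketch below sums revenue per month and counts the most frequently purchased products. It assumes rows are dictionaries of strings keyed by the column names; the date format is an assumption (pass the format the file actually uses as date_format), and the function names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def monthly_revenue(rows, date_format="%Y-%m-%d"):
    """Sum Price * Quantity per (year, month); negative quantities net out."""
    totals = defaultdict(float)
    for row in rows:
        when = datetime.strptime(row["Date"], date_format)
        totals[(when.year, when.month)] += float(row["Price"]) * int(row["Quantity"])
    return dict(sorted(totals.items()))


def top_products(rows, n=3):
    """Most frequently purchased products, counting rows with Quantity > 0."""
    counts = Counter(r["Product"] for r in rows if int(r["Quantity"]) > 0)
    return counts.most_common(n)


# Made-up orders for illustration only.
sample_orders = [
    {"Date": "2019-01-05", "Product": "Mug", "Price": "2.50", "Quantity": "4"},
    {"Date": "2019-01-20", "Product": "Lamp", "Price": "10.00", "Quantity": "1"},
    {"Date": "2019-02-03", "Product": "Mug", "Price": "2.50", "Quantity": "2"},
    {"Date": "2019-02-04", "Product": "Mug", "Price": "2.50", "Quantity": "-2"},
]

print(monthly_revenue(sample_orders))
print(top_products(sample_orders))
```

Letting cancelled rows subtract from the monthly total is one possible treatment; excluding them entirely is an equally reasonable choice, depending on the question being asked.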

Photo by CardMapr on Unsplash
