https://creativecommons.org/publicdomain/zero/1.0/
E-commerce has become a new channel to support business development. Through e-commerce, businesses can gain access to and establish a wider market presence by providing cheaper and more efficient distribution channels for their products or services. E-commerce has also changed the way people shop and consume products and services. Many people now turn to their computers or smart devices to order goods, which can easily be delivered to their homes.
This is one year of sales transaction data from a UK-based e-commerce (online retail) business. This London-based shop has been selling gifts and homewares for adults and children through its website since 2007. Its customers come from all over the world and usually make direct purchases for themselves. There are also small businesses that buy in bulk and sell to other customers through retail outlet channels.
The data set contains 500K rows and 8 columns. The following is a description of each column.
1. TransactionNo (categorical): a six-digit unique number that defines each transaction. The letter “C” in the code indicates a cancellation.
2. Date (numeric): the date when each transaction was generated.
3. ProductNo (categorical): a five- or six-digit unique code used to identify a specific product.
4. Product (categorical): product/item name.
5. Price (numeric): the price of each product per unit in pound sterling (£).
6. Quantity (numeric): the quantity of each product per transaction. Negative values correspond to cancelled transactions.
7. CustomerNo (categorical): a five-digit unique number that defines each customer.
8. Country (categorical): name of the country where the customer resides.
There is a small percentage of order cancellations in the data set. Most of these cancellations were due to out-of-stock conditions on some products. In these situations, customers tend to cancel the order because they want all products delivered at once.
Information is a key asset of businesses nowadays. The success of a business in a competitive environment depends on its ability to acquire, store, and utilize information. Data is one of the main sources of information. Therefore, data analysis is an important activity for acquiring new and useful information. Analyze this dataset and try to answer the following questions.
1. How was the sales trend over the months?
2. What are the most frequently purchased products?
3. How many products does the customer purchase in each transaction?
4. Which customer segments are the most profitable?
5. Based on your findings, what strategy could you recommend to the business to gain more profit?
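For instance, the monthly sales trend (question 1) can be sketched with pandas. The rows below are made-up stand-ins following the column description above, and cancellations are excluded via the “C” prefix on TransactionNo:

```python
import pandas as pd

# Tiny stand-in for the real file (columns follow the description above).
df = pd.DataFrame({
    "TransactionNo": ["536365", "C536379", "536520", "537000"],
    "Date": pd.to_datetime(["2019-01-05", "2019-01-07", "2019-01-20", "2019-02-03"]),
    "Price": [6.04, 6.04, 11.94, 7.24],
    "Quantity": [12, -12, 5, 10],
})

# Exclude cancellations: TransactionNo prefixed with "C".
sales = df[~df["TransactionNo"].str.startswith("C")].copy()

# Revenue per line item, aggregated by calendar month -> the monthly trend.
sales["Revenue"] = sales["Price"] * sales["Quantity"]
monthly = sales.groupby(sales["Date"].dt.to_period("M"))["Revenue"].sum()
print(monthly)
```

The same grouped frame can feed the other questions, e.g. grouping by product name for question 2.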
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 5 million synthetically generated financial transactions designed to simulate real-world behavior for fraud detection research and machine learning applications. Each transaction record includes fields such as:
Transaction Details: ID, timestamp, sender/receiver accounts, amount, type (deposit, transfer, etc.)
Behavioral Features: time since last transaction, spending deviation score, velocity score, geo-anomaly score
Metadata: location, device used, payment channel, IP address, device hash
Fraud Indicators: binary fraud label (is_fraud) and type of fraud (e.g., money laundering, account takeover)
The dataset follows realistic fraud patterns and behavioral anomalies, making it suitable for:
Binary and multiclass classification models
Fraud detection systems
Time-series anomaly detection
Feature engineering and model explainability
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data. Features V1, V2, … V28 are the principal components obtained with PCA; the only features not transformed with PCA are 'Time' and 'Amount'. The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent, cost-sensitive learning. The feature 'Class' is the response variable; it takes value 1 in case of fraud and 0 otherwise.
Given the class imbalance ratio, we recommend measuring performance with the Area Under the Precision-Recall Curve (AUPRC). Plain accuracy from the confusion matrix is not meaningful for unbalanced classification.
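AUPRC can be computed with scikit-learn's `average_precision_score`; the labels and scores below are toy stand-ins for real model output:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Toy labels/scores standing in for a classifier's output on imbalanced data:
# eight legitimate transactions, two frauds.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_score = np.array([0.1, 0.2, 0.05, 0.1, 0.3, 0.2, 0.15, 0.4, 0.8, 0.25])

# Area under the precision-recall curve (average precision).
auprc = average_precision_score(y_true, y_score)
print(f"AUPRC: {auprc:.3f}")
```

Unlike accuracy, this score is driven entirely by how well the rare positive class is ranked.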
By UCI [source]
Comprehensive Dataset on Online Retail Sales and Customer Data
Welcome to this comprehensive dataset offering a wide array of information related to online retail sales. This data set provides an in-depth look at transactions, product details, and customer information documented by an online retail company based in the UK. The scope of the data is vast, ranging from granular details about each product sold to extensive customer data from different countries.
This transnational data set is a treasure trove of vital business insights as it meticulously catalogues all the transactions that happened during its span. It houses rich transactional records curated by a renowned non-store online retail company based in the UK known for selling unique all-occasion gifts. A considerable portion of its clientele includes wholesalers; ergo, this dataset can prove instrumental for companies looking for patterns or studying purchasing trends among such businesses.
The available attributes within this dataset offer valuable pieces of information:
InvoiceNo: This attribute refers to invoice numbers that are six-digit integral numbers uniquely assigned to every transaction logged in this system. Transactions marked with 'c' at the beginning signify cancellations - adding yet another dimension for purchase pattern analysis.
StockCode: Stock Code corresponds with specific items as they're represented within the inventory system via 5-digit integral numbers; these allow easy identification and distinction between products.
Description: This refers to product names, giving users qualitative knowledge about what kind of items are being bought and sold frequently.
Quantity: These figures capture the volume of each product per transaction – important figures that can help understand buying trends better.
InvoiceDate: Invoice Dates detail when each transaction was generated down to precise timestamps – invaluable when conducting time-based trend analysis or segmentation studies.
UnitPrice: Unit prices represent how much each unit retails at — crucial for revenue calculations or cost-related analyses.
Country: This locational attribute shows where each customer hails from, adding geographical segmentation to your data investigation toolkit.
This dataset was originally collated by Dr Daqing Chen, Director of the Public Analytics group based at the School of Engineering, London South Bank University. His research studies and business cases with this dataset have been published in various papers contributing to establishing a solid theoretical basis for direct, data and digital marketing strategies.
Access to such records can enrich explorations or help formulate insightful hypotheses about consumer behavior patterns among wholesalers. Whether it's managing inventory, studying transactional trends over time, or spotting cancellation patterns, this dataset is apt for multiple forms of retail analysis.
1. Sales Analysis:
Sales data forms the backbone of this dataset, and it allows users to delve into various aspects of sales performance. You can use the Quantity and UnitPrice fields to calculate metrics like revenue, and further combine it with InvoiceNo information to understand sales over individual transactions.
2. Product Analysis:
Each product in this dataset comes with its unique identifier (StockCode) and its name (Description). You could analyse which products are most popular based on Quantity sold or look at popularity per transaction by considering both Quantity and InvoiceNo.
3. Customer Segmentation:
If you apply business logic to the transactions (such as calculating total amounts), you can use standard machine learning methods or RFM (Recency, Frequency, Monetary) segmentation techniques, combined with 'CustomerID', to understand your customer base better. Counting invoice numbers (which represent separate transactions) per client will give insights about your clients as well.
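A minimal RFM sketch with pandas, using made-up transaction lines in place of the real file:

```python
import pandas as pd

# Stand-in transaction lines; the real data has CustomerID, InvoiceNo,
# InvoiceDate, Quantity and UnitPrice as described above.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2, 3],
    "InvoiceNo": ["A1", "A2", "B1", "B1", "B2", "C1"],
    "InvoiceDate": pd.to_datetime(
        ["2011-11-01", "2011-12-05", "2011-10-10",
         "2011-10-10", "2011-12-08", "2011-06-01"]),
    "Quantity": [10, 5, 2, 1, 4, 20],
    "UnitPrice": [2.0, 3.0, 5.0, 7.0, 1.5, 0.5],
})
tx["Amount"] = tx["Quantity"] * tx["UnitPrice"]

# Recency is measured from the day after the last observed invoice.
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)
rfm = tx.groupby("CustomerID").agg(
    recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    frequency=("InvoiceNo", "nunique"),   # distinct invoices = transactions
    monetary=("Amount", "sum"),
)
print(rfm)
```

The resulting per-customer table can then be binned into R/F/M score groups or fed to a clustering algorithm.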
4. Geographical Analysis:
The Country column enables analysts to study purchase patterns across different geographical locations.
Practical applications
Understand what products sell best where – this can help drive tailored marketing strategies. Anomaly detection – identify unusual behaviors that might indicate fraud.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Retail Transaction Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/michalfr/retail-transaction-data on 13 February 2022.
--- Dataset description provided by original source is as follows ---
This dataset contains transactions and the products they contain, which were obtained by scanning receipts from retail establishments by numerous users. Products were categorized by our proprietary NLP model.
Data was collected over a one-year period and contains product information from purchases made within that period, product category inferred from product name, information about organization, transaction to which products belong to and user that uploaded receipt.
The total user count is 22. The total retail organization count is 179. The total transaction count is 805. The total product count is 7477.
@kserno
Product categorization, User Behaviour Analysis, Product Analysis, Product Price Comparison between Various Retail Stores, Prediction of Next Transaction
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Below is a draft DMP-style description of the credit-card fraud detection experiment, modeled on the antiquities example:
Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.
Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.
Data Sources
We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.
Method of Dataset Preparation
Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.
Data import: Uploaded the full CSV into DBRepo and assigned persistent identifiers (PIDs).
Splitting: Programmatically derived three subsets (training 70%, validation 15%, test 15%) using range-based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.
Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non-feature identifiers (actionnr, merchant_id).
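The cleaning and splitting steps above can be sketched in pandas on a tiny stand-in table (the actual subsets are derived by DBRepo filters, not shown here):

```python
import pandas as pd

# Stand-in rows; the real table uses actionnr as primary key and
# "Y"/"N" categorical flags as described above.
df = pd.DataFrame({
    "actionnr": range(1, 11),
    "is_declined": ["N", "N", "Y", "N", "N", "Y", "N", "N", "N", "Y"],
    "transaction_amount": [10.0, 25.0, 5.0, 40.0, 12.0, 7.0, 30.0, 22.0, 18.0, 9.0],
})

# Cleaning: "Y"/"N" -> 1/0.
df["is_declined"] = (df["is_declined"] == "Y").astype(int)

# Splitting: deterministic 70/15/15 range-based split on the primary key,
# so each subset is reproducible and citable.
n = len(df)
train = df[df["actionnr"] <= int(n * 0.70)]
val = df[(df["actionnr"] > int(n * 0.70)) & (df["actionnr"] <= int(n * 0.85))]
test = df[df["actionnr"] > int(n * 0.85)]
print(len(train), len(val), len(test))
```

Range-based splits on a key are stable across reruns, which is what makes per-subset PIDs meaningful.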
Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
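The modeling step might look roughly like this; the features and labels below are synthetic stand-ins, since the real splits live in DBRepo:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)

# Synthetic stand-in features/labels with a simple linear signal.
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300) > 1.2).astype(int)

X_train, y_train = X[:210], y[:210]   # ~70% train
X_test, y_test = X[210:], y[210:]     # held-out evaluation

# Fit on the training split, then score the held-out set.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]
print(f"test AUPRC: {average_precision_score(y_test, scores):.3f}")
```

In the real pipeline, hyperparameters would be tuned against the validation subset before the single final evaluation on the test subset.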
Dataset Structure
The raw data is a single CSV with columns:
actionnr (integer transaction ID)
merchant_id (string)
average_amount_transaction_day (float)
transaction_amount (float)
is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)
total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)
Naming Conventions
All columns use lowercase snake_case.
Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.
Files in the code repo follow a clear structure:
├── data/ # local copies only; raw data lives in DBRepo
├── notebooks/Task.ipynb
├── models/rf_model_v1.joblib
├── outputs/ # confusion_matrix.png, roc_curve.png, predictions.csv
├── README.md
├── requirements.txt
└── codemeta.json
Required Software
Python 3.9+
pandas, numpy (data handling)
scikit-learn (modeling, metrics)
matplotlib (visualizations)
dbrepo‐client.py (DBRepo API)
requests (TU WRD API)
Additional Resources
Original dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud
Scikit-learn docs: https://scikit-learn.org/stable
DBRepo API guide: via the starter notebook’s dbrepo_client.py template
TU WRD REST API spec: https://test.researchdata.tuwien.ac.at/api/docs
Data Limitations
Highly imbalanced: only ~0.17% of transactions are fraudulent.
Anonymized PCA features (V1–V28) are hidden; we extended with domain features but cannot reverse-engineer the raw variables.
Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.
Licensing and Attribution
Raw data: CC-0 (per Kaggle terms)
Code & notebooks: MIT License
Model artifacts & outputs: CC-BY 4.0
TU WRD records include ORCID identifiers for the author.
Recommended Uses
Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.
Educational purposes: demonstrating model‐training pipelines, FAIR data practices.
Extension: adding time‐series or deep‐learning models.
Known Issues
Possible temporal leakage if date/time features not handled correctly.
Model performance may degrade on live data due to concept drift.
Binary flags may oversimplify nuanced transaction outcomes.
Attribution-ShareAlike 4.0 (CC BY-SA 4.0)https://creativecommons.org/licenses/by-sa/4.0/
License information was derived automatically
Many projects require datasets about bank transactions to test their systems. Unfortunately, it is hard to find a dataset that would have transaction product categorization which is important for many analytical projects.
Here you have 4 datasets. Clients - basic information about bank users. Categories - standard transaction categories used by many banks worldwide. Transactions - the core of the dataset: basic information about transactions, such as the counterparty account, category, amount, etc. Subscriptions - information about subscriptions, in other words, transactions which are made automatically.
Synthetic transactional data with labels for fraud detection. For more information, see: https://www.kaggle.com/ntnu-testimon/paysim1/version/2
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
🔍 Online Payment Fraud Detection Dataset | Real-World Transactions 💳
Fraudulent transactions are a growing challenge for fintech companies. This dataset captures 51,000+ transactions, each labeled as fraudulent or legitimate, based on real-world patterns.
It includes transaction details, user behavior, payment methods, and device usage, making it ideal for:
✅ Fraud detection modeling (classification)
✅ Feature engineering & anomaly detection
✅ Exploratory data analysis (EDA) & pattern recognition
Transaction Details: Amount, type, time, and payment method 💰
User Behavior: Past fraud history, account age, recent activity 📊
Device & Location: Device used, transaction location 🌍
🚀 Train machine learning models to detect fraud
📉 Analyze patterns of fraud in financial transactions
🔎 Optimize fraud prevention strategies
Ready to fight fraud? Let’s dive in! 🔥
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Retail Transaction Data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/regivm/retailtransactiondata on 28 January 2022.
--- Dataset description provided by original source is as follows ---
The data provides customer- and date-level transactions for a few years. It can be used to demonstrate any analysis that requires transaction information, such as RFM. The data also provides customers' responses to a promotion campaign.
A highlight of this dataset is that you can evaluate the effectiveness of RFM groups by checking one of the business metrics: the response of customers.
The transaction data provides customer_id, transaction date and amount of purchase. The response data provides the response of each customer: a binary variable indicating whether the customer responded to a campaign or not.
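A minimal sketch of that evaluation idea, using made-up per-customer rows (a frequency measure plus the binary response) in place of a full RFM scoring:

```python
import pandas as pd

# Stand-in: per-customer frequency (derived from the transaction file)
# joined with the binary campaign response (from the response file).
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "frequency": [12, 3, 8, 1, 15, 2],
    "response": [1, 0, 1, 0, 1, 0],
})

# Bucket customers by frequency and compare response rates across buckets --
# the business-metric check of RFM group effectiveness described above.
customers["freq_group"] = pd.qcut(customers["frequency"], q=2, labels=["low", "high"])
rate = customers.groupby("freq_group", observed=True)["response"].mean()
print(rate)
```

With full RFM, the same comparison runs over R/F/M score combinations instead of a single frequency split.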
Extremely thankful to the numerous kernel and data publishers on Kaggle and GitHub. Learnt a lot from these communities.
More innovative approaches for handling RFM Analysis.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Store Transaction data’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/iamprateek/store-transaction-data on 14 February 2022.
--- Dataset description provided by original source is as follows ---
Nielsen receives transaction level scanning data (POS Data) from its partner stores on a regular basis. Stores sharing POS data include bigger format store types such as supermarkets, hypermarkets as well as smaller traditional trade grocery stores (Kirana stores), medical stores etc. using a POS machine.
While in a bigger-format store all items in all transactions are scanned using a POS machine, smaller and more localized shops do not have a 100% compliance rate in scanning and entering information into the POS machine for every transaction.
A transaction involving a single packet of chips or a single piece of candy may not be scanned and recorded, either to spare the customer the inconvenience or during rush hours when the store is crowded with customers.
Thus, the data received from such stores is often incomplete and lacks complete information of all transactions completed within a day.
Additionally, apart from incomplete transaction data in a day, it is observed that certain stores do not share data for all active days. Stores share data ranging from 2 to 28 days in a month. While it is possible to impute/extrapolate data for 2 days of a month using 28 days of actual historical data, the vice versa is not recommended.
Nielsen encourages you to create a model which can help impute/extrapolate data to fill in the missing data gaps in the store level POS data currently received.
You are provided with the dataset that contains store level data by brands and categories for select stores-
Hackathon_Ideal_Data - The file contains brand level data for 10 stores for the last 3 months. This can be referred to as the ideal data.
Hackathon_Working_Data - This contains data for selected stores which are missing and/or incomplete.
Hackathon_Mapping_File - This file is provided to help understand the column names in the data set.
Hackathon_Validation_Data - This file contains the data stores and product groups for which you have to predict the Total_VALUE.
Sample Submission - This file shows what the candidate needs to upload as output, in the same format. Sample data is provided in the file to help understand the columns and values required.
Nielsen Holdings plc (NYSE: NLSN) is a global measurement and data analytics company that provides the most complete and trusted view available of consumers and markets worldwide. Nielsen is divided into two business units. Nielsen Global Media, the arbiter of truth for media markets, provides media and advertising industries with unbiased and reliable metrics that create a shared understanding of the industry required for markets to function. Nielsen Global Connect provides consumer packaged goods manufacturers and retailers with accurate, actionable information and insights and a complete picture of the complex and changing marketplace that companies need to innovate and grow. Our approach marries proprietary Nielsen data with other data sources to help clients around the world understand what’s happening now, what’s happening next, and how to best act on this knowledge. An S&P 500 company, Nielsen has operations in over 100 countries, covering more than 90% of the world’s population.
Know more: https://www.nielsen.com/us/en/
Build an imputation and/or extrapolation model to fill the missing data gaps for select stores by analyzing the data and determine which factors/variables/features can help best predict the store sales.
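A naive extrapolation baseline, on hypothetical store-day values, simply scales the observed daily average up to the full month; a real submission would add brand/category features and seasonality:

```python
import pandas as pd

# Stand-in POS data: one store reporting only 4 days of a 30-day month.
pos = pd.DataFrame({
    "store": ["S1"] * 4,
    "day": [1, 2, 3, 4],
    "value": [100.0, 120.0, 80.0, 100.0],
})

# Baseline: estimated monthly total = observed daily average * days in month.
days_in_month = 30
observed = pos.groupby("store")["value"].agg(["sum", "count"])
observed["est_month_value"] = observed["sum"] / observed["count"] * days_in_month
print(observed)
```

Baselines like this give a floor to beat when judging whether a learned imputation model actually helps.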
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed to help data scientists and machine learning enthusiasts develop robust fraud detection models. It contains realistic synthetic transaction data, including user information, transaction types, risk scores, and more, making it ideal for binary classification tasks with models like XGBoost and LightGBM.
| Column Name | Description |
|---|---|
| Transaction_ID | Unique identifier for each transaction |
| User_ID | Unique identifier for the user |
| Transaction_Amount | Amount of money involved in the transaction |
| Transaction_Type | Type of transaction (Online, In-Store, ATM, etc.) |
| Timestamp | Date and time of the transaction |
| Account_Balance | User's current account balance before the transaction |
| Device_Type | Type of device used (Mobile, Desktop, etc.) |
| Location | Geographical location of the transaction |
| Merchant_Category | Type of merchant (Retail, Food, Travel, etc.) |
| IP_Address_Flag | Whether the IP address was flagged as suspicious (0 or 1) |
| Previous_Fraudulent_Activity | Number of past fraudulent activities by the user |
| Daily_Transaction_Count | Number of transactions made by the user that day |
| Avg_Transaction_Amount_7d | User's average transaction amount in the past 7 days |
| Failed_Transaction_Count_7d | Count of failed transactions in the past 7 days |
| Card_Type | Type of payment card used (Credit, Debit, Prepaid, etc.) |
| Card_Age | Age of the card in months |
| Transaction_Distance | Distance between the user's usual location and the transaction location |
| Authentication_Method | How the user authenticated (PIN, Biometric, etc.) |
| Risk_Score | Fraud risk score computed for the transaction |
| Is_Weekend | Whether the transaction occurred on a weekend (0 or 1) |
| Fraud_Label | Target variable (0 = Not Fraud, 1 = Fraud) |
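A training sketch for the binary task described above, using scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost/LightGBM and synthetic stand-in columns:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
n = 400

# Synthetic stand-ins for a few of the columns in the table above.
X = np.column_stack([
    rng.lognormal(3.0, 1.0, n),        # Transaction_Amount
    rng.integers(0, 2, n),             # IP_Address_Flag
    rng.integers(0, 5, n),             # Previous_Fraudulent_Activity
    rng.uniform(0, 100, n),            # Transaction_Distance
])
# Made-up label leaning on the flag and past-fraud columns, plus noise.
y = ((X[:, 1] + X[:, 2] / 4 + rng.normal(0, 0.5, n)) > 1.5).astype(int)

# Train on the first 300 rows, evaluate on the last 100.
clf = GradientBoostingClassifier(random_state=0).fit(X[:300], y[:300])
auc = roc_auc_score(y[300:], clf.predict_proba(X[300:])[:, 1])
print(f"holdout ROC-AUC: {auc:.3f}")
```

Swapping in XGBoost or LightGBM is a one-line change to the estimator; the fit/score interface is the same.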
CC0 1.0 Universal Public Domain Dedicationhttps://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains 30,000 unique retail transactions, each representing a customer's shopping basket in a simulated grocery store environment. The data was generated with realistic product combinations and purchase patterns, suitable for association rule mining, recommendation systems and market basket analysis.
Each row corresponds to a single transaction, listing:
A unique transaction ID
A customer ID
The full list of products bought in that transaction
The time of the transaction
The dataset includes products across various categories such as beverages, snacks, dairy, household items, fruits, vegetables and frozen foods.
This data is entirely synthetic and does not contain any real user information.
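As a starting point for market basket analysis, pairwise co-occurrence and support can be counted directly; the baskets below are made-up stand-ins for the transaction rows:

```python
from collections import Counter
from itertools import combinations

# Stand-in baskets: each inner list is the product list of one transaction.
baskets = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "eggs"],
]

# Count how often each unordered product pair is bought together --
# the raw counts behind association-rule metrics like support.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# Support = fraction of transactions containing the pair.
support = {pair: c / len(baskets) for pair, c in pair_counts.items()}
print(support[("bread", "milk")])
```

Confidence and lift follow from these same counts, and libraries such as mlxtend automate the full Apriori pipeline.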
Original Data Source: Retail Transaction Dataset
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Priyanshu Gautam
Released under Apache 2.0
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset from https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
The dataset contains transactions made by credit cards in September 2013 by European cardholders.
This dataset presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘Fraud detection bank dataset 20K records binary’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/volodymyrgavrysh/fraud-detection-bank-dataset-20k-records-binary on 28 January 2022.
--- Dataset description provided by original source is as follows ---
Banks are often exposed to fraudulent transactions and constantly improve systems to track them.
A bank dataset containing 20k+ transactions with 112 numerical features.
--- Original source retains full ownership of the source dataset ---
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘OpenSea Daily Ethereum Transactions’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ankanhore545/opensea-daily-transactions on 13 February 2022.
--- Dataset description provided by original source is as follows ---
This all-time data represents the raw on-chain activity of the tracked smart contracts.
I am thankful that we could collect the data from the DappRadar platform: https://dappradar.com/ethereum/marketplaces/opensea These are for 5 ETH smart contracts, as mentioned on the above site.
--- Original source retains full ownership of the source dataset ---
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains detailed sales transactions, including order details, revenue, profit, and customer information. It can be used for sales analysis, trend forecasting, and business intelligence insights. The data covers multiple product categories and is structured to facilitate easy analysis of sales performance across different locations and time periods.
Attribution 4.0 (CC BY 4.0)https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Analysis of ‘OpenSea Daily Polygon Transactions’ provided by Analyst-2 (analyst-2.ai), based on source dataset retrieved from https://www.kaggle.com/ankanhore545/opensea-daily-polygon-transactions on 13 February 2022.
--- Dataset description provided by original source is as follows ---
This all-time data represents the raw on-chain activity of the tracked smart contracts.
I am thankful that we could collect the data from the DappRadar platform: https://dappradar.com/polygon/marketplaces/opensea These are for 1 Polygon smart contract, as mentioned on the above site.
--- Original source retains full ownership of the source dataset ---
Apache License, v2.0https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
This dataset was created by Jashandeep Kaur
Released under Apache 2.0