100+ datasets found
  1. Fraud Detection Transactions Dataset

    • kaggle.com
    Updated Feb 21, 2025
    Cite: Samay Ashar (2025). Fraud Detection Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/samayashar/fraud-detection-transactions-dataset
    Download format: zip (2,104,444 bytes)
    Authors: Samay Ashar
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset is designed to help data scientists and machine learning enthusiasts develop robust fraud detection models. It contains realistic synthetic transaction data, including user information, transaction types, risk scores, and more, making it ideal for binary classification tasks with models like XGBoost and LightGBM.

    📌 Key Features

    1. 21 features capturing various aspects of a financial transaction
    2. Realistic structure with numerical, categorical, and temporal data
    3. Binary fraud labels (0 = Not Fraud, 1 = Fraud)
    4. Designed for high accuracy with XGBoost and other ML models
    5. Useful for anomaly detection, risk analysis, and security research

    📌 Columns in the Dataset

    Column Name: Description
    Transaction_ID: Unique identifier for each transaction
    User_ID: Unique identifier for the user
    Transaction_Amount: Amount of money involved in the transaction
    Transaction_Type: Type of transaction (Online, In-Store, ATM, etc.)
    Timestamp: Date and time of the transaction
    Account_Balance: User's current account balance before the transaction
    Device_Type: Type of device used (Mobile, Desktop, etc.)
    Location: Geographical location of the transaction
    Merchant_Category: Type of merchant (Retail, Food, Travel, etc.)
    IP_Address_Flag: Whether the IP address was flagged as suspicious (0 or 1)
    Previous_Fraudulent_Activity: Number of past fraudulent activities by the user
    Daily_Transaction_Count: Number of transactions made by the user that day
    Avg_Transaction_Amount_7d: User's average transaction amount in the past 7 days
    Failed_Transaction_Count_7d: Count of failed transactions in the past 7 days
    Card_Type: Type of payment card used (Credit, Debit, Prepaid, etc.)
    Card_Age: Age of the card in months
    Transaction_Distance: Distance between the user's usual location and the transaction location
    Authentication_Method: How the user authenticated (PIN, Biometric, etc.)
    Risk_Score: Fraud risk score computed for the transaction
    Is_Weekend: Whether the transaction occurred on a weekend (0 or 1)
    Fraud_Label: Target variable (0 = Not Fraud, 1 = Fraud)
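
    The dataset ships Is_Weekend, Daily_Transaction_Count and Avg_Transaction_Amount_7d precomputed, and the description does not say how they were derived. As an illustration only, the sketch below shows one plausible derivation from raw (user, timestamp, amount) records. The `Txn` record and `derive_features` function are hypothetical, and two conventions are assumptions: the 7-day window covers the 7 days before a transaction and excludes the transaction itself, and users with no earlier transactions get an average of 0.0.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Txn:
    user_id: str
    timestamp: datetime
    amount: float


def derive_features(txns: list[Txn]) -> list[dict]:
    """Return one dict of illustrative features per transaction, in input order."""
    out = []
    for t in txns:
        same_user = [o for o in txns if o.user_id == t.user_id]
        # Transactions by the same user on the same calendar date, including t itself.
        daily = sum(1 for o in same_user if o.timestamp.date() == t.timestamp.date())
        # Assumed window: the 7 days before t, excluding t itself.
        window = [
            o.amount
            for o in same_user
            if t.timestamp - timedelta(days=7) <= o.timestamp < t.timestamp
        ]
        out.append(
            {
                # datetime.weekday(): Monday is 0, so Saturday and Sunday are 5 and 6.
                "Is_Weekend": int(t.timestamp.weekday() >= 5),
                "Daily_Transaction_Count": daily,
                # Assumed default of 0.0 when the user has no earlier transactions.
                "Avg_Transaction_Amount_7d": sum(window) / len(window) if window else 0.0,
            }
        )
    return out
```

    The nested scan is quadratic in the number of transactions, which is fine for a sketch; on real data, sorting by user and timestamp and sliding a window over each user's history would be the usual approach.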

    📌 Potential Use Cases

    1. Fraud detection model training
    2. Anomaly detection in financial transactions
    3. Risk scoring systems for banks and fintech companies
    4. Feature engineering and model explainability research
  2. Financial Transactions Dataset for Fraud Detection

    • kaggle.com
    Updated May 2, 2025
    Cite: Aryan Kumar (2025). Financial Transactions Dataset for Fraud Detection [Dataset]. https://www.kaggle.com/datasets/aryan208/financial-transactions-dataset-for-fraud-detection
    Download format: zip (290,256,858 bytes)
    Authors: Aryan Kumar
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset contains 5 million synthetically generated financial transactions designed to simulate real-world behavior for fraud detection research and machine learning applications. Each transaction record includes fields such as:

    Transaction Details: ID, timestamp, sender/receiver accounts, amount, type (deposit, transfer, etc.)

    Behavioral Features: time since last transaction, spending deviation score, velocity score, geo-anomaly score

    Metadata: location, device used, payment channel, IP address, device hash

    Fraud Indicators: binary fraud label (is_fraud) and type of fraud (e.g., money laundering, account takeover)

    The dataset follows realistic fraud patterns and behavioral anomalies, making it suitable for:

    Binary and multiclass classification models

    Fraud detection systems

    Time-series anomaly detection

    Feature engineering and model explainability
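
    The behavioral features above are supplied precomputed, and the description does not define their units or grouping. As a hedged illustration of the simplest one, the sketch below computes "time since last transaction" in seconds per sender account; the per-sender grouping, the use of seconds, and the `time_since_last` helper are all assumptions.

```python
from datetime import datetime
from typing import Optional


def time_since_last(records: list[tuple[str, datetime]]) -> list[Optional[float]]:
    """Seconds since the same sender's previous transaction (None for the first).

    records: (sender_account, timestamp) pairs in any order; the result is
    aligned with the input order.
    """
    # Visit records chronologically, remembering each sender's latest timestamp.
    order = sorted(range(len(records)), key=lambda i: records[i][1])
    last_seen: dict[str, datetime] = {}
    result: list[Optional[float]] = [None] * len(records)
    for i in order:
        sender, ts = records[i]
        if sender in last_seen:
            result[i] = (ts - last_seen[sender]).total_seconds()
        last_seen[sender] = ts
    return result
```

    Returning None for a sender's first transaction keeps "no history" distinct from a gap of zero seconds; a model-ready version would still need to impute or encode those values.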

  3. Fraud Detection Dataset

    • kaggle.com
    Updated Mar 28, 2025
    Cite: Aman Ali Siddiqui (2025). Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/amanalisiddiqui/fraud-detection-dataset
    Download format: zip (186,385,521 bytes)
    Authors: Aman Ali Siddiqui
    License: MIT License (https://opensource.org/licenses/MIT); license information was derived automatically

    Description

    The dataset contains 6.3 million financial transaction records for fraud detection.

    Some of these records were flagged false by existing algorithms.

    Further work could engineer features that strengthen fraud detection algorithms and reveal where the existing algorithm falls short.

    CASH-IN: increasing the balance of an account by paying cash to a merchant.

    CASH-OUT: the opposite of CASH-IN; withdrawing cash from a merchant, which decreases the balance of the account.

    DEBIT: a process similar to CASH-OUT that involves sending money from the mobile money service to a bank account.

    PAYMENT: paying merchants for goods or services, which decreases the balance of the account and increases the balance of the receiver.

    TRANSFER: sending money to another user of the service through the mobile money platform.
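
    The type definitions above amount to a rule for how each transaction moves balances. The sketch below encodes that rule as a hypothetical `apply_transaction` helper. Two points are assumptions: the receiver is credited only for PAYMENT and TRANSFER (the types whose descriptions mention a receiver), and the merchant side of CASH-IN and CASH-OUT is not modelled. The type spellings follow this description; the actual file may spell them differently (for example with underscores).

```python
# Sign of the change to the originating customer's balance, per the definitions above.
ORIGIN_SIGN = {
    "CASH-IN": +1,   # customer pays cash to a merchant; account balance increases
    "CASH-OUT": -1,  # customer withdraws cash via a merchant; balance decreases
    "DEBIT": -1,     # money sent from the mobile money service to a bank account
    "PAYMENT": -1,   # paying a merchant for goods or services
    "TRANSFER": -1,  # sending money to another user of the service
}
# Types whose description says (or implies) that the receiver's balance increases.
CREDITS_RECEIVER = {"PAYMENT", "TRANSFER"}


def apply_transaction(balances: dict[str, float], txn_type: str,
                      orig: str, dest: str, amount: float) -> None:
    """Update balances in place according to the transaction-type semantics."""
    if txn_type not in ORIGIN_SIGN:
        raise ValueError(f"unknown transaction type: {txn_type}")
    balances[orig] = balances.get(orig, 0.0) + ORIGIN_SIGN[txn_type] * amount
    if txn_type in CREDITS_RECEIVER:
        balances[dest] = balances.get(dest, 0.0) + amount
```

    Replaying transactions through a rule like this is one way to check whether recorded balances are consistent with the stated semantics.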

    Citation for original work

  4. HEALTHCARE PROVIDER FRAUD DETECTION ANALYSIS

    • kaggle.com
    Updated May 9, 2019
    Cite: Rohit Anand Gupta (2019). HEALTHCARE PROVIDER FRAUD DETECTION ANALYSIS [Dataset]. https://www.kaggle.com/datasets/rohitrox/healthcare-provider-fraud-detection-analysis
    Download format: zip (26,631,783 bytes)
    Authors: Rohit Anand Gupta
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Project Objectives

    Provider fraud is one of the biggest problems facing Medicare. According to the government, total Medicare spending has increased exponentially because of fraudulent Medicare claims. Healthcare fraud is an organized crime in which providers, physicians, and beneficiaries act together to file fraudulent claims.

    Rigorous analysis of Medicare data has identified many physicians who commit fraud. One tactic is to use an ambiguous diagnosis code to justify the costliest procedures and drugs. Insurance companies are the institutions most vulnerable to these practices. As a result, insurance companies have raised their premiums, and healthcare is becoming more expensive day by day.

    Healthcare fraud and abuse take many forms. Some of the most common types of frauds by providers are:

    a) Billing for services that were not provided.

    b) Duplicate submission of a claim for the same service.

    c) Misrepresenting the service provided.

    d) Charging for a more complex or expensive service than was actually provided.

    e) Billing for a covered service when the service actually provided was not covered.

    Problem Statement

    The goal of this project is to "predict the potentially fraudulent providers" based on the claims they file. Along with this, we will also discover important variables that help detect the behaviour of potentially fraudulent providers. Further, we will study fraudulent patterns in providers' claims to understand their future behaviour.

    Introduction to the Dataset

    For this project, we consider the inpatient claims, outpatient claims, and beneficiary details of each provider. Their details are as follows:

    A) Inpatient Data

    This data provides insights about the claims filed for patients who are admitted to hospitals. It also provides additional details such as their admission and discharge dates and admission diagnosis code.

    B) Outpatient Data

    This data provides details about the claims filed for patients who visit hospitals but are not admitted.

    C) Beneficiary Details Data

    This data contains beneficiary KYC details such as health conditions, the region they belong to, etc.
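
    The prediction target is the provider, while the inpatient and outpatient data are claim-level, so claims have to be rolled up to one row per provider before a classifier can use them. The sketch below shows one common way to do that aggregation. The `Claim` fields (provider, beneficiary, reimbursed amount, inpatient flag) are hypothetical stand-ins for whatever the actual CSV headers are, and the chosen aggregates are illustrative rather than the project's own features.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Claim:
    provider: str       # hypothetical field names; check the real CSV headers
    beneficiary: str
    amount: float       # reimbursed amount for the claim
    inpatient: bool     # True for inpatient claims, False for outpatient


def provider_features(claims: list[Claim]) -> dict[str, dict[str, float]]:
    """Aggregate claim-level rows into one feature row per provider."""
    grouped: dict[str, list[Claim]] = defaultdict(list)
    for c in claims:
        grouped[c.provider].append(c)
    features = {}
    for provider, rows in grouped.items():
        n = len(rows)
        total = sum(r.amount for r in rows)
        features[provider] = {
            "n_claims": n,
            "total_amount": total,
            "mean_amount": total / n,
            # Distinct patients seen by the provider.
            "n_beneficiaries": len({r.beneficiary for r in rows}),
            # Fraction of the provider's claims that are inpatient claims.
            "inpatient_share": sum(r.inpatient for r in rows) / n,
        }
    return features
```

    The per-provider rows can then be joined with a provider-level fraud label to form a standard binary classification table.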

  5. Credit Card Fraud

    • kaggle.com
    Updated May 7, 2022
    Cite: Dhanush Narayanan R (2022). Credit Card Fraud [Dataset]. https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud
    Download format: zip (30,281,243 bytes)
    Authors: Dhanush Narayanan R
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description
    • Digital payments are evolving, but so are cyber criminals.

    • According to the Data Breach Index, more than 5 million records are stolen every day, a concerning statistic that shows fraud is still very common for both card-present and card-not-present payments.

    • In today’s digital world, where vast numbers of card transactions happen every day, detecting fraud is challenging.

    This dataset was sourced from an unnamed institute.

    Feature Explanation:

    distance_from_home - the distance from home where the transaction happened.

    distance_from_last_transaction - the distance from where the last transaction happened.

    ratio_to_median_purchase_price - ratio of the transaction's purchase price to the median purchase price.

    repeat_retailer - whether the transaction happened at the same retailer.

    used_chip - whether the transaction was made through the chip (credit card).

    used_pin_number - whether a PIN number was used for the transaction.

    online_order - whether the transaction is an online order.

    fraud - whether the transaction is fraudulent.
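
    The ratio_to_median_purchase_price feature can be made concrete with a short computation. The description does not say which purchases form the reference set for the median; the sketch below assumes it is a list of earlier purchase amounts (for example, the cardholder's history), and the `ratio_to_median` helper is hypothetical.

```python
from statistics import median


def ratio_to_median(amount: float, past_amounts: list[float]) -> float:
    """Ratio of a purchase amount to the median of a reference set of purchases.

    Which purchases form the reference set (e.g. the cardholder's history) is
    an assumption; the dataset description does not say.
    """
    if not past_amounts:
        raise ValueError("need at least one reference purchase")
    m = median(past_amounts)
    if m == 0:
        raise ValueError("median purchase price is zero")
    return amount / m
```

    For example, a purchase of 60.0 against past purchases of 10.0, 20.0 and 30.0 (median 20.0) gives a ratio of 3.0; values well above 1 mark purchases much larger than usual. The median, unlike the mean, is not pulled up by a single very large past purchase.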

  6. Synthetic Financial Datasets For Fraud Detection

    • kaggle.com
    Updated Apr 3, 2017
    Cite: Edgar Lopez-Rojas (2017). Synthetic Financial Datasets For Fraud Detection [Dataset]. https://www.kaggle.com/datasets/ealaxi/paysim1
    Download format: zip (186,385,561 bytes)
    Authors: Edgar Lopez-Rojas
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/); license information was derived automatically

    Description

    Context

    There is a lack of publicly available datasets on financial services, especially in the emerging domain of mobile money transactions. Financial datasets are important to many researchers, and in particular to us, as we perform research in the domain of fraud detection. Part of the problem is the intrinsically private nature of financial transactions, which leads to a lack of publicly available datasets.

    We present a synthetic dataset generated using the simulator called PaySim as an approach to such a problem. PaySim uses aggregated data from the private dataset to generate a synthetic dataset that resembles the normal operation of transactions and injects malicious behaviour to later evaluate the performance of fraud detection methods.

    Content

    PaySim simulates mobile money transactions based on a sample of real transactions extracted from one month of financial logs from a mobile money service implemented in an African country. The original logs were provided by a multinational company that provides the mobile financial service, which currently runs in more than 14 countries around the world.

    This synthetic dataset is scaled down to 1/4 of the original dataset and was created specifically for Kaggle.

    NOTE: Transactions that are detected as fraud are cancelled, so these balance columns (oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest) must not be used for fraud detection.

    Headers

    Below is a sample row, followed by an explanation of each header:

    1,PAYMENT,1060.31,C429214117,1089.0,28.69,M1591654462,0.0,0.0,0,0

    step - maps a unit of time in the real world. In this case, 1 step is 1 hour. Total steps: 744 (744 hours, i.e. a 31-day simulation).

    type - CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.

    amount - amount of the transaction in local currency.

    nameOrig - customer who started the transaction

    oldbalanceOrg - initial balance before the transaction

    newbalanceOrig - new balance after the transaction.

    nameDest - customer who is the recipient of the transaction

    oldbalanceDest - initial balance of the recipient before the transaction. Note that there is no information for customers whose names start with M (merchants).

    newbalanceDest - new balance of the recipient after the transaction. Note that there is no information for customers whose names start with M (merchants).

    isFraud - marks the transactions made by the fraudulent agents inside the simulation. In this dataset, the fraudulent agents aim to profit by taking control of customers' accounts and trying to empty the funds by transferring them to another account and then cashing out of the system.

    isFlaggedFraud - the business model aims to control massive transfers from one account to another and flags illegal attempts. An illegal attempt in this dataset is an attempt to transfer more than 200,000 in a single transaction.
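
    The header list above is enough to read the sample row, apply the stated isFlaggedFraud rule, and drop the balance columns that the note warns against. The sketch below does all three with the standard library. The rule re-implementation (type TRANSFER and amount strictly above 200,000) is based only on the wording above, and the simulator's own flagging logic may differ; the helper names are hypothetical.

```python
import csv
import io

# Column order as listed in the header explanation above.
COLUMNS = [
    "step", "type", "amount", "nameOrig", "oldbalanceOrg", "newbalanceOrig",
    "nameDest", "oldbalanceDest", "newbalanceDest", "isFraud", "isFlaggedFraud",
]
FLOAT_COLUMNS = ("amount", "oldbalanceOrg", "newbalanceOrig", "oldbalanceDest", "newbalanceDest")
# Balance columns that the NOTE above says must not be used for fraud detection.
BALANCE_COLUMNS = {"oldbalanceOrg", "newbalanceOrig", "oldbalanceDest", "newbalanceDest"}


def parse_row(line: str) -> dict:
    """Parse one headerless CSV line into a dict with typed values."""
    values = next(csv.reader(io.StringIO(line)))
    row = dict(zip(COLUMNS, values))
    for col in FLOAT_COLUMNS:
        row[col] = float(row[col])
    for col in ("step", "isFraud", "isFlaggedFraud"):
        row[col] = int(row[col])
    return row


def is_merchant(name: str) -> bool:
    """Names starting with 'M' are merchants, for which balances are not recorded."""
    return name.startswith("M")


def flagged_by_rule(row: dict) -> int:
    """Sketch of the isFlaggedFraud rule as described: a transfer above 200,000."""
    return int(row["type"] == "TRANSFER" and row["amount"] > 200_000)


def model_features(row: dict) -> dict:
    """Drop the target and the balance columns before training a detector."""
    return {k: v for k, v in row.items() if k != "isFraud" and k not in BALANCE_COLUMNS}
```

    For the sample row, oldbalanceOrg minus amount equals newbalanceOrig (1089.0 - 1060.31 = 28.69), and the recipient M1591654462 is a merchant, which is why both of its balance fields are 0.0.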

    Past Research

    There are 5 similar files that contain the runs of 5 different scenarios. These files are explained in more detail in chapter 7 of my PhD thesis (available at http://urn.kb.se/resolve?urn=urn:nbn:se:bth-12932).

    We ran PaySim several times using random seeds for 744 steps, representing each hour of one month of real time, which matches the original logs. Each run took around 45 minutes on an Intel i7 processor with 16 GB of RAM. The final result of a run contains approximately 24 million financial records divided into 5 categories: CASH-IN, CASH-OUT, DEBIT, PAYMENT and TRANSFER.

    Acknowledgements

    This work is part of the research project "Scalable resource-efficient systems for big data analytics" funded by the Knowledge Foundation (grant: 20140032) in Sweden.

    Please refer to this dataset using the following citations:

    PaySim first paper of the simulator:

    E. A. Lopez-Rojas, A. Elmir, and S. Axelsson. "PaySim: A financial mobile money simulator for fraud detection". In: The 28th European Modeling and Simulation Symposium-EMSS, Larnaca, Cyprus. 2016

  7. Bank Transaction Fraud Detection

    • kaggle.com
    Updated Feb 1, 2025
    Cite: Sagar Maru (2025). Bank Transaction Fraud Detection [Dataset]. https://www.kaggle.com/datasets/marusagar/bank-transaction-fraud-detection
    Download format: zip (26,701,215 bytes)
    Authors: Sagar Maru
    Description

    At LOL Bank Pvt. Ltd., ensuring the safety and integrity of financial transactions is a top priority. With more and more online transactions and digital banking activity, fraudulent transactions have become a significant threat to both the bank and its customers. Fraudulent activities, such as unauthorized account access, identity theft, and suspicious transaction patterns, cause financial losses and damage customer trust.

    To address this growing concern, LOL Bank Pvt. Ltd. is seeking a solution to detect and prevent fraudulent transactions in real time. This involves analyzing historical transaction records, including account details, transaction amounts, merchant records, and timestamps, to identify patterns indicative of fraudulent behaviour. The goal is to build a robust fraud detection system that can distinguish between legitimate and potentially fraudulent transactions, with minimal false positives.

    The solution should incorporate machine learning algorithms that learn from transaction history, allowing the system to recognize emerging fraud strategies and adapt to evolving threats. The system should be able to flag suspicious transactions in real time, providing bank employees with actionable insights so they can take prompt action. By improving its fraud detection capabilities, LOL Bank Pvt. Ltd. aims to protect customer assets, reduce financial losses, and maintain its reputation as a secure and trustworthy financial institution.
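
    The requirement of "minimal false positives" only becomes testable once it is tied to a metric. The problem statement does not name one; the generic sketch below computes three common choices from binary labels (1 = fraud): precision, recall, and false-positive rate. The `fraud_metrics` helper is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def fraud_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Precision, recall and false-positive rate for binary labels (1 = fraud)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        # Of the transactions flagged as fraud, the fraction that really were fraud.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of the real frauds, the fraction that were flagged.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Of the legitimate transactions, the fraction wrongly flagged.
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```

    Because fraud is rare, accuracy alone says little; tracking the false-positive rate alongside recall shows the trade-off between missed fraud and legitimate customers being flagged.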

    Here is the information on the columns:

    1. Customer_ID: A unique identifier for each customer in the bank's system.
    2. Customer_Name: The name of the customer making the transaction.
    3. Gender: The gender of the customer (e.g., Male, Female, Other).
    4. Age: The age of the customer at the time of the transaction.
    5. State: The state in which the customer resides.
    6. City: The city in which the customer lives.
    7. Bank_Branch: The specific bank branch where the customer holds their account.
    8. Account_Type: The type of account held by the customer (e.g., Savings, Checking).
    9. Transaction_ID: A unique identifier for each transaction.
    10. Transaction_Date: The date on which the transaction occurred.
    11. Transaction_Time: The exact time the transaction was initiated.
    12. Transaction_Amount: The monetary value of the transaction.
    13. Merchant_ID: A unique identifier for the merchant involved in the transaction.
    14. Transaction_Type: The nature of the transaction (e.g., Withdrawal, Deposit, Transfer).
    15. Merchant_Category: The category of the merchant (e.g., Retail, Online, Travel).
    16. Account_Balance: The balance of the customer's account after the transaction.
    17. Transaction_Device: The device used by the customer to perform the transaction (e.g., Mobile, Desktop).
    18. Transaction_Location: The geographical location (e.g., latitude, longitude) of the transaction.
    19. Device_Type: The type of device used for the transaction (e.g., Smartphone, Laptop).
    20. Is_Fraud: A binary indicator (1 or 0) indicating whether the transaction is fraudulent.
    21. Transaction_Currency: The currency used for the transaction (e.g., USD, EUR).
    22. Customer_Contact: The contact number of the customer.
    23. Transaction_Description: A brief description of the transaction (e.g., purchase, transfer).
    24. Customer_Email: The email address associated with the customer's account.

    These column descriptions give a clear understanding of the data to be used for fraud detection analysis.

    Detailed Information

    Problem Statement: Fraud Detection in Bank Transactions for LOL Bank Pvt. Ltd.

    At LOL Bank Pvt. Ltd., ensuring the security of customers' financial transactions is paramount. With the rise of digital banking, the growth in transaction volume has opened up more opportunities for fraudulent activities, which could significantly damage the bank's reputation and lead to substantial financial losses. The challenge is to accurately detect and prevent fraud while preserving a seamless banking experience for customers. The key aspects of this problem are as follows:

    Nature of the Problem:
    - Fraudulent transactions include unauthorized account access, money laundering, identity theft, and unusual transaction patterns.
    - Traditional fraud detection methods are often reactive, leading to delayed identification of fraud.
    - Fraudsters continuously evolve their tactics, making it harder to detect new forms of fraud using conventional methods.

    Data Available:
    - The dataset includes historical transaction data, including transaction details such as:
      - Transaction ID, ...

  8. Fraud Detection Dataset

    • kaggle.com
    Updated Feb 18, 2025
    Cite: Ziya (2025). Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/ziya07/fraud-detection-dataset
    Download format: zip (469,850 bytes)
    Authors: Ziya
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    This dataset is designed for click fraud detection in Cost-Per-Action (CPA) online advertising. It contains 5,000 click records, with features related to user behavior, device information, and interaction patterns. The dataset includes both legitimate and fraudulent clicks, allowing researchers and data scientists to develop and evaluate AI-based fraud detection models.

    Key Features

    • Click Behavior: Click duration, scroll depth, mouse movements, keystrokes detected
    • User & Device Info: Device type, browser, operating system, IP reputation
    • Network Security: VPN usage, proxy usage, IP address
    • Fraud Labels: is_fraudulent (1 = Fraudulent Click, 0 = Legitimate Click)

  9. Fastag Fraud Detection Datasets

    • kaggle.com
    Updated Jan 16, 2024
    Cite: Prathamesh Pradeep Dessai (2024). Fastag Fraud Detection Datasets [Dataset]. https://www.kaggle.com/datasets/thegoanpanda/fastag-fraud-detection-datesets-fictitious
    Download format: zip (108,830 bytes)
    Authors: Prathamesh Pradeep Dessai
    License: MIT License (https://opensource.org/licenses/MIT); license information was derived automatically

    Description

    Nature of Data: This dataset contains fictitious data designed for educational and testing purposes in fraud detection algorithms. It does not represent real-world financial transactions or individuals.

    Purpose of Creation: The dataset was generated to provide a realistic example for developing and evaluating fraud detection models without relying on sensitive real-world data. It's intended for students, researchers, and practitioners to practice data analysis and machine learning techniques in a safe environment.

  10. Financial Fraud Detection Dataset

    • kaggle.com
    Updated Mar 7, 2024
    Cite: Sri Harsha Eedala (2024). Financial Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/sriharshaeedala/financial-fraud-detection-dataset/discussion
    Format: Croissant (a format for machine-learning datasets; see mlcommons.org/croissant)
    Dataset provided by: Kaggle (http://kaggle.com/)
    Authors: Sri Harsha Eedala
    License: Attribution-ShareAlike 4.0 (CC BY-SA 4.0) (https://creativecommons.org/licenses/by-sa/4.0/); license information was derived automatically

    Description

    Dataset Overview

    Introduction

    This dataset presents a synthetic representation of mobile money transactions, crafted to mirror the complexities of real-world financial activities while integrating fraudulent behaviors for research purposes. Derived from a simulator named PaySim, which uses aggregated data from actual financial logs of a mobile money service in an African country, this dataset aims to fill the gap in publicly available financial datasets for fraud detection studies. It covers the transaction types CASH-IN, CASH-OUT, DEBIT, PAYMENT, and TRANSFER over a simulated period of one month (744 hourly steps), providing a comprehensive environment for evaluating fraud detection methods. By addressing the intrinsic privacy concerns associated with financial transactions, this dataset offers a unique resource for researchers and analysts in financial security and fraud detection; it is scaled to 1/4 of the original dataset size for efficient use on the Kaggle platform. Please note that transactions marked as fraudulent have been nullified, so the balance columns should not be used for fraud analysis. This dataset is a contribution to the field from the "Scalable resource-efficient systems for big data analytics" project, funded by the Knowledge Foundation in Sweden.

    Dataset Details

    PaySim synthesizes mobile money transactions using data derived from a month's worth of financial logs from a mobile money service operating in an African country. These logs were provided by a multinational company that offers this financial service across more than 14 countries globally.

    This synthetic dataset has been scaled to one-quarter the size of the original dataset and is specifically tailored for Kaggle.

    Important Note: Transactions identified as fraudulent are annulled. Hence, for fraud detection analysis, the following columns should not be utilized: oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest.

    Dataset Structure

    • step: Represents a unit of time in the real world, with 1 step equating to 1 hour. The total simulation spans 744 steps, equivalent to 31 days.
    • type: Transaction types include CASH-IN, CASH-OUT, DEBIT, PAYMENT, and TRANSFER.
    • amount: The transaction amount in the local currency.
    • nameOrig: The customer initiating the transaction.
    • oldbalanceOrg: The initial balance before the transaction.
    • newbalanceOrig: The new balance after the transaction.
    • nameDest: The transaction's recipient customer.
    • oldbalanceDest: The initial recipient's balance before the transaction. Not applicable for customers identified by 'M' (Merchants).
    • newbalanceDest: The new recipient's balance after the transaction. Not applicable for 'M' (Merchants).
    • isFraud: Identifies transactions conducted by fraudulent agents aiming to deplete customer accounts through transfers and cash-outs.
    • isFlaggedFraud: Flags large-scale, unauthorized transfers between accounts, with any single transaction exceeding 200,000 being considered illegal.
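
    Because one step is one hour, step can be split into a day index and an hour of day, which is often a more useful feature than the raw counter. The sketch below assumes steps are numbered from 1 (as in the sample row of the original PaySim description) and that step 1 is hour 0 of day 1; the description does not fix the origin, and `step_to_day_hour` is a hypothetical helper.

```python
HOURS_PER_DAY = 24
TOTAL_STEPS = 744  # from the description above


def step_to_day_hour(step: int) -> tuple[int, int]:
    """Map a 1-based simulation step (1 step = 1 hour) to (day, hour_of_day).

    Assumes step 1 is hour 0 of day 1; the description does not fix the origin.
    """
    if not 1 <= step <= TOTAL_STEPS:
        raise ValueError(f"step must be in 1..{TOTAL_STEPS}")
    zero_based = step - 1
    return zero_based // HOURS_PER_DAY + 1, zero_based % HOURS_PER_DAY
```

    Under this mapping the last step, 744, falls at hour 23 of day 31, since 744 = 31 × 24.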

    Previous Research and Acknowledgments

    This dataset has been generated through multiple runs of the PaySim simulator, each simulating a month of real-time transactions over 744 steps. Each run produced approximately 24 million financial records across the five transaction categories.

    This project is part of the "Scalable resource-efficient systems for big data analytics" research, supported by the Knowledge Foundation (grant: 20140032) in Sweden.

    For citations and further references, please use:

    E. A. Lopez-Rojas, A. Elmir, and S. Axelsson. "PaySim: A financial mobile money simulator for fraud detection". In: The 28th European Modeling and Simulation Symposium-EMSS, Larnaca, Cyprus. 2016

  11. Vehicle Insurance Claim Fraud Detection

    • kaggle.com
    Updated Dec 20, 2021
    Cite: Shivam Bansal (2021). Vehicle Insurance Claim Fraud Detection [Dataset]. https://www.kaggle.com/datasets/shivamb/vehicle-claim-fraud-detection
    Download format: zip (356,667 bytes)
    Authors: Shivam Bansal
    License: CC0 1.0 (https://creativecommons.org/publicdomain/zero/1.0/)

    Description

    Vehicle Insurance Fraud Detection

    Vehicle insurance fraud involves conspiring to make false or exaggerated claims involving property damage or personal injuries following an accident. Some common examples include staged accidents, where fraudsters deliberately “arrange” for accidents to occur; the use of phantom passengers, where people who were not even at the scene of the accident claim to have suffered grievous injury; and false personal injury claims, where personal injuries are grossly exaggerated.

    About this dataset

    This dataset contains vehicle data (attributes, model, accident details, etc.) along with policy details (policy type, tenure, etc.). The target is to detect whether a claim application is fraudulent; the target column is FraudFound_P.

  12. Fraud Detection in Financial Transactions

    • kaggle.com
    Updated Jan 17, 2025
    Cite: Darshan Dalvi (2025). Fraud Detection in Financial Transactions [Dataset]. https://www.kaggle.com/datasets/darshandalvi12/fraud-detection-in-financial-transactions
    Download format: zip (230,131,674 bytes)
    Authors: Darshan Dalvi
    License: MIT License (https://opensource.org/licenses/MIT); license information was derived automatically

    Description

    Credit Card Fraud Detection Dataset (Updated)

    This dataset contains 284,807 transactions from a credit card company, where 492 transactions are fraudulent. The data is highly imbalanced, with only a small fraction of transactions being fraudulent. The dataset is commonly used to build and evaluate fraud detection models.

    Dataset Details:

    • Number of Transactions: 284,807
    • Fraudulent Transactions: 492 (Highly Imbalanced)
    • Features:
      • 28 anonymized features (V1 to V28)
      • Transaction amount
      • Timestamp
    • Label:
      • 0: Legitimate
      • 1: Fraudulent

    Data Preprocessing:

    • SMOTE (Synthetic Minority Oversampling Technique) has been applied to address the class imbalance in the dataset, generating synthetic examples for the minority class (fraudulent transactions).
    • Additional Operations: Various preprocessing steps were performed, including data cleaning and feature engineering, to ensure the quality of the dataset for model training.
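
    With 492 fraudulent transactions out of 284,807 (about 0.17%), a classifier can score well by predicting "legitimate" for everything, which is why SMOTE was applied. The description does not say which implementation was used (libraries such as imbalanced-learn provide one). The plain-Python sketch below only illustrates the core SMOTE idea: a synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The `smote` function, its random choice of base samples, and its defaults are assumptions. Resampling is normally fitted on the training split only, so that synthetic points derived from test rows cannot leak into evaluation.

```python
import math
import random


def smote(minority: list[list[float]], n_new: int, k: int = 5,
          seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    For each new sample: pick a random minority point x, pick one of its k
    nearest minority neighbours nb, and return x + u * (nb - x) with u ~ U(0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority points (Euclidean).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic
```

    Because each new point lies between two real minority points, SMOTE fills in the minority region instead of duplicating rows, which reduces the overfitting that plain duplication tends to cause.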

    Processed Files:

    The dataset has been split into training and testing sets and saved in the following files:

    • X_train.csv: Feature data for the training set
    • X_test.csv: Feature data for the testing set
    • y_train.csv: Labels for the training set (fraudulent or legitimate)
    • y_test.csv: Labels for the testing set

    This updated dataset is ready to be used for training and evaluating machine learning models, specifically designed for credit card fraud detection tasks.

    This description highlights the key aspects of the dataset, including its preprocessing steps and the availability of the processed files for ease of use.

  13. Fraud Detection in E-Commerce Dataset

    • kaggle.com
    Updated Mar 3, 2025
    Cite
    Kevin Vagan (2025). Fraud Detection in E-Commerce Dataset [Dataset]. https://www.kaggle.com/datasets/kevinvagan/fraud-detection-dataset
    Available download formats: zip (176593375 bytes)
    Dataset updated: Mar 3, 2025
    Authors: Kevin Vagan
    License: MIT License (https://opensource.org/licenses/MIT)
    License information was derived automatically

    Description

    This is a fabricated dataset created by merging two datasets, Dataset1.csv and Dataset2.csv.

    The final dataset, merged_dataset.csv, is synthetic; missing values were handled with probabilistic imputation.
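    The description does not define "probabilistic imputation." One common interpretation is random hot-deck imputation, and the sketch below assumes that meaning. Each missing value is replaced by a value drawn at random from the observed values of the same column, so the column keeps its empirical distribution. Rows are assumed to be dicts, with None marking a missing value.

```python
import random


def impute_by_sampling(rows, column, seed=0):
    """Fill missing values (None) in `column` by drawing from that column's observed values."""
    rng = random.Random(seed)
    observed = [r[column] for r in rows if r[column] is not None]
    if not observed:
        raise ValueError(f"column {column!r} has no observed values to sample from")
    filled = []
    for r in rows:
        r = dict(r)  # copy so the caller's rows are not modified
        if r[column] is None:
            r[column] = rng.choice(observed)
        filled.append(r)
    return filled
```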

    Balancing the Dataset: The dataset, which was initially imbalanced, was balanced using the ROSE (Random Over-Sampling Examples) package to ensure equal representation of fraudulent and non-fraudulent transactions.
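    ROSE is an R package. It creates new examples with a smoothed bootstrap: it draws a row from a class and perturbs it with kernel noise. The Python sketch below is a simplified analogue, not the ROSE package itself, and it differs from ROSE in two ways. First, it generates exactly n_per_class rows for every class instead of sampling class labels at random. Second, it uses Gaussian noise scaled by a hypothetical shrink factor h times each feature's within-class standard deviation, whereas ROSE derives its own smoothing bandwidth.

```python
import random
import statistics


def rose_like_balance(X, y, n_per_class, h=0.5, seed=0):
    """Simplified ROSE-style smoothed bootstrap for numeric feature rows.

    For every class, draw n_per_class rows: pick a random row of that class and add
    Gaussian noise with standard deviation h * (within-class std of each feature).
    """
    rng = random.Random(seed)
    X_new, y_new = [], []
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        # population std of each feature inside this class (0.0 for a single row)
        stds = [statistics.pstdev(col) for col in zip(*rows)]
        for _ in range(n_per_class):
            base = rng.choice(rows)
            X_new.append([v + rng.gauss(0.0, h * s) for v, s in zip(base, stds)])
            y_new.append(label)
    return X_new, y_new
```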

    This dataset was used for my group school project report. The code for the project is available at https://github.com/slothislazy/DM_AOL

  14. 🔐 RiskVault: Financial Fraud Detection Dataset

    • kaggle.com
    zip
    Updated Jul 10, 2025
    Jainil Patel (2025). 🔐 RiskVault: Financial Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/jainilspatel/fraud-detection-dataset
    Available download formats: zip (401045 bytes)
    Dataset updated: Jul 10, 2025
    Authors: Jainil Patel
    License: https://creativecommons.org/publicdomain/zero/1.0/ (CC0 1.0)

    Description

    This dataset simulates real-world banking transactions labeled as fraudulent or legitimate. It includes detailed features such as transaction amount, time, type, location, device type, and historical user behavior, which makes it suitable for training and evaluating machine learning models for fraud detection. The features are:

    transaction_id: Unique identifier for each transaction

    customer_id: Anonymized customer ID

    transaction_amount: Value of the transaction in currency units

    transaction_type: Type of transaction (e.g., payment, transfer)

    transaction_time: Timestamp of when the transaction occurred

    transaction_location: Region where the transaction was initiated

    device_type: Device used (e.g., mobile, POS, desktop)

    previous_transactions_count: Number of recent transactions by the same customer

    is_fraud: Target label indicating fraud (1) or not (0)
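    The description does not define what "recent" means for previous_transactions_count. The sketch below assumes a hypothetical 24-hour look-back window. It derives such a count from raw (transaction_id, customer_id, transaction_time) records, using a per-customer queue.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def previous_transaction_counts(transactions, window=timedelta(hours=24)):
    """For each transaction, count the same customer's earlier transactions within `window`.

    transactions: iterable of (transaction_id, customer_id, transaction_time) tuples,
    where transaction_time is a datetime. Returns {transaction_id: count}.
    """
    recent = defaultdict(deque)  # customer_id -> times of that customer's recent transactions
    counts = {}
    for tx_id, cust, ts in sorted(transactions, key=lambda t: t[2]):
        q = recent[cust]
        # drop transactions that fell out of the look-back window
        while q and ts - q[0] > window:
            q.popleft()
        counts[tx_id] = len(q)
        q.append(ts)
    return counts
```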

    This dataset is ideal for binary classification tasks such as fraud detection using machine learning.

  15. Bank Fraud Dataset

    • kaggle.com
    zip
    Updated Aug 28, 2023
    Borna B (2023). Bank Fraud Dataset [Dataset]. https://www.kaggle.com/datasets/mohammadbolandraftar/my-dataset
    Available download formats: zip (11896802 bytes)
    Dataset updated: Aug 28, 2023
    Authors: Borna B
    Description

    The dataset has three parts: a training set, an unlabeled testing (unseen) set, and a clickstream dataset. All three are linked through a common identifier, "SESSION_ID", which connects user actions across the datasets. A session covers a client's online banking activities, such as signing in, updating passwords, viewing products, or adding items to the cart.

    Most fraud cases involve adding a new shipping address or changing the password. Visualizations can give further insight into the nature of the fraud.

    I also added two datasets named "train/test_dataset_combined". These are merged versions of the train and test datasets, combined on the "SESSION_ID" column. For more information, please refer to this link: https://www.kaggle.com/code/mohammadbolandraftar/combine-datasets-in-pandas

    In addition, I added the cleaned dataset after doing EDA. For more information about the EDA process, please refer to this link: https://www.kaggle.com/code/mohammadbolandraftar/a-deep-dive-into-fraud-detection-through-eda
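    The author's notebook linked above performs the merge in pandas. As an illustration only, the standard-library sketch below does two things. It groups clickstream events by SESSION_ID and attaches them to the session rows. It then measures what share of fraud sessions contain a given action, which is one way to check the shipping-address and password observation. The IS_FRAUD and ACTION column names and the action values are hypothetical; check the real files for the actual names.

```python
from collections import defaultdict


def attach_clickstream(sessions, clicks, key="SESSION_ID"):
    """Return copies of the session rows, each with an "events" list of its clickstream rows."""
    by_session = defaultdict(list)
    for event in clicks:
        by_session[event[key]].append(event)
    # .get avoids inserting empty entries for sessions with no clicks
    return [dict(s, events=by_session.get(s[key], [])) for s in sessions]


def share_with_action(merged, action, label_key="IS_FRAUD", action_key="ACTION"):
    """Fraction of fraud sessions (label 1) with at least one event whose action equals `action`."""
    fraud = [s for s in merged if int(s[label_key]) == 1]
    if not fraud:
        return 0.0
    hits = sum(any(e[action_key] == action for e in s["events"]) for s in fraud)
    return hits / len(fraud)
```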

  16. Transaction Data for fraud analysis

    • kaggle.com
    zip
    Updated Nov 1, 2023
    isaBBaggin (2023). Transaction Data for fraud analysis [Dataset]. https://www.kaggle.com/datasets/isabbaggin/transaction-fraudulent-financial-syntheticdata
    Available download formats: zip (214100 bytes)
    Dataset updated: Nov 1, 2023
    Authors: isaBBaggin
    License: Attribution-NonCommercial-NoDerivs 4.0 (CC BY-NC-ND 4.0), https://creativecommons.org/licenses/by-nc-nd/4.0/
    License information was derived automatically

    Description

    Creating a comprehensive dataset for fraud prevention and prescription in a bank involves collecting and generating various data points. In practice, data collection and generation can be a complex and time-consuming process. This dataset is much simpler than a real-world fraud detection dataset but can serve as a starting point for a notebook. You can then expand and refine it as needed. This Kaggle dataset contains synthetic sales data designed for data analytics practice and hackathons. The dataset is entirely computer-generated and does not contain any real-world information, ensuring privacy and data protection.

    Key Features:

    • Transaction_Id
    • Customer_Id
    • Merchant_Id
    • Amount
    • Transaction time
    • Card_type
    • Location
    • Purchase_category
    • Customer_Age
    • Is_fraudulent
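    A typical first step with a labeled table like this is to compare fraud rates across categories such as Card_type. The sketch below assumes two things: rows are dicts (for example, from csv.DictReader), and Is_fraudulent is encoded as 0/1. The card type values in the test are hypothetical.

```python
from collections import defaultdict


def fraud_rate_by(rows, group_key, label_key="Is_fraudulent"):
    """Return {group value: share of rows in that group whose label is 1}."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for r in rows:
        g = r[group_key]
        totals[g] += 1
        frauds[g] += int(r[label_key])  # works for 0/1 ints or "0"/"1" strings
    return {g: frauds[g] / totals[g] for g in totals}
```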

    Use Cases:

    Practice data cleaning and preprocessing techniques. Explore time series analysis and forecasting. Develop customer segmentation models. Investigate product performance and inventory management. Experiment with recommendation systems and personalized marketing.

    Note: This dataset is entirely synthetic and does not represent any real-world sales data. It is intended for educational and practice purposes only.

    Attribution: If you use this dataset in your work, please attribute it to the original creator.

    Creator: Ishita Biswas

  17. Credit Card Transactions Fraud Detection Dataset

    • kaggle.com
    zip
    Updated Oct 21, 2023
    Rupeswara Babu Sangoju (2023). Credit Card Transactions Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/rupeswarababusangoju/credit-card-transactions-fraud-detection-dataset
    Available download formats: zip (63490622 bytes)
    Dataset updated: Oct 21, 2023
    Authors: Rupeswara Babu Sangoju
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Rupeswara Babu Sangoju

    Released under Apache 2.0

  18. Online Payments Fraud Detection Dataset

    • kaggle.com
    zip
    Updated Apr 17, 2024
    Youssef Dessouky (2024). Online Payments Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/youssefdessouky/online-payments-fraud-detection-dataset
    Available download formats: zip (186385561 bytes)
    Dataset updated: Apr 17, 2024
    Authors: Youssef Dessouky
    Description

    Dataset

    This dataset was created by Youssef Dessouky

  19. Data from: Credit Card Transactions Dataset

    • kaggle.com
    zip
    Updated Jul 23, 2024
    Priyam Choksi (2024). Credit Card Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/priyamchoksi/credit-card-transactions-dataset
    Available download formats: zip (152554916 bytes)
    Dataset updated: Jul 23, 2024
    Authors: Priyam Choksi
    License: Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
    License information was derived automatically

    Description

    The Credit Card Transactions Dataset provides detailed records of credit card transactions, including information about transaction times, amounts, and associated personal and merchant details. This dataset has over 1.85M rows.

    How This Dataset Can Be Used:

    Fraud Detection: Use machine learning models to identify fraudulent transactions by examining patterns in transaction amounts, locations, and user profiles. Analyzing behavioral patterns can further strengthen fraud detection systems.

    Customer Segmentation: Segment customers based on spending patterns, location, and demographics. Tailor marketing strategies and personalized offers to these different customer segments for better engagement.

    Transaction Classification: Classify transactions into categories such as grocery or entertainment to understand spending behaviors. This can help improve recommendation systems by identifying transaction categories and preferences.

    Geospatial Analysis: Analyze transaction data geographically to map spending patterns and detect regional trends or anomalies based on latitude and longitude.

    Predictive Modeling: Build models to forecast future spending behavior using historical transaction data. Predict potential fraudulent activities and financial trends.

    Behavioral Analysis: Examine how factors like transaction amount, merchant type, and time influence spending behavior. Study the relationships between user demographics and transaction patterns.

    Anomaly Detection: Identify unusual transaction patterns that deviate from normal behavior to detect potential fraud early. Employ anomaly detection techniques to spot outliers and suspicious activities.
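    For the geospatial analysis described above, the distance between two latitude/longitude points is usually computed with the haversine formula. A typical pair would be a customer's home location and the merchant's location. The description does not say which columns hold these coordinates, so the sketch takes plain numbers. It assumes a spherical Earth with a mean radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

    An unusually large customer-to-merchant distance is one simple signal to feed into the fraud and anomaly detection uses listed above.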

  20. Healthcare Providers Data For Anomaly Detection

    • kaggle.com
    zip
    Updated Sep 6, 2020
    Tamil Selvan (2020). Healthcare Providers Data For Anomaly Detection [Dataset]. https://www.kaggle.com/datasets/tamilsel/healthcare-providers-data
    Available download formats: zip (9183945 bytes)
    Dataset updated: Sep 6, 2020
    Authors: Tamil Selvan
    License: Open Data Commons Database Contents License (DbCL) v1.0 (http://opendatacommons.org/licenses/dbcl/1.0/)

    Description

    Healthcare fraud is considered a challenge for many societies. Health care funding that could be spent on medicine, care for the elderly, or emergency room visits is instead lost to fraudulent activities by materialistic practitioners or patients. Healthcare fraud is also a major contributor to rising healthcare costs.

    Try out various unsupervised techniques to find the anomalies in the data.
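    One simple unsupervised baseline is Tukey's fences. This rule flags values that fall more than k·IQR below the first quartile or more than k·IQR above the third quartile. The sketch below applies it to a single numeric column; a count such as line_srvc_cnt would be a natural candidate. The value k = 1.5 is the conventional default, not something the dataset prescribes.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default "exclusive" method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]
```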

    Detailed Data File:

    The following variables are included in the detailed Physician and Other Supplier data file (see Appendix A for a condensed version of the variables included).

    npi – National Provider Identifier (NPI) for the performing provider on the claim. The provider NPI is the numeric identifier registered in NPPES.

    nppes_provider_last_org_name – When the provider is registered in NPPES as an individual (entity type code=’I’), this is the provider’s last name. When the provider is registered as an organization (entity type code = ‘O’), this is the organization's name.

    nppes_provider_first_name – When the provider is registered in NPPES as an individual (entity type code=’I’), this is the provider’s first name. When the provider is registered as an organization (entity type code = ‘O’), this will be blank.

    nppes_provider_mi – When the provider is registered in NPPES as an individual (entity type code=’I’), this is the provider’s middle initial. When the provider is registered as an organization (entity type code= ‘O’), this will be blank.

    nppes_credentials – When the provider is registered in NPPES as an individual (entity type code=’I’), these are the provider’s credentials. When the provider is registered as an organization (entity type code = ‘O’), this will be blank.

    nppes_provider_gender – When the provider is registered in NPPES as an individual (entity type code=’I’), this is the provider’s gender. When the provider is registered as an organization (entity type code = ‘O’), this will be blank.

    nppes_entity_code – Type of entity reported in NPPES. An entity code of ‘I’ identifies providers registered as individuals and an entity type code of ‘O’ identifies providers registered as organizations.

    nppes_provider_street1 – The first line of the provider’s street address, as reported in NPPES.

    nppes_provider_street – The second line of the provider’s street address, as reported in NPPES.

    nppes_provider_city – The city where the provider is located, as reported in NPPES.

    nppes_provider_zip – The provider’s zip code, as reported in NPPES.

    nppes_provider_state – The state where the provider is located, as reported in NPPES. The fifty U.S. states and the District of Columbia are reported by the state postal abbreviation. The following values are used for all other areas:

    'XX' = 'Unknown', 'AA' = 'Armed Forces Central/South America', 'AE' = 'Armed Forces Europe', 'AP' = 'Armed Forces Pacific', 'AS' = 'American Samoa', 'GU' = 'Guam', 'MP' = 'North Mariana Islands', 'PR' = 'Puerto Rico', 'VI' = 'Virgin Islands', 'ZZ' = 'Foreign Country'

    nppes_provider_country – The country where the provider is located, as reported in NPPES. The country code will be ‘US’ for any state or U.S. possession. For foreign countries (i.e., state values of ‘ZZ’), the provider country values include the following: AE = United Arab Emirates, AG = Antigua, AR = Argentina, AU = Australia, BO = Bolivia, BR = Brazil, CA = Canada, CH = Switzerland, CN = China, CO = Colombia, DE = Germany, ES = Spain, FR = France, GB = Great Britain, GR = Greece, HU = Hungary, IL = Israel, IN = India, IS = Iceland, IT = Italy, JO = Jordan, JP = Japan, KR = Korea, KW = Kuwait, KY = Cayman Islands, LB = Lebanon, MX = Mexico, NL = Netherlands, NO = Norway, NZ = New Zealand, PA = Panama, PK = Pakistan, RW = Rwanda, SA = Saudi Arabia, SY = Syria, TH = Thailand, TR = Turkey, VE = Venezuela.

    provider_type – Derived from the provider specialty code reported on the claim.

    medicare_participation_indicator – Identifies whether the provider participates in Medicare and/or accepts assignment of Medicare allowed amounts.

    place_of_service – Identifies whether the place of service submitted on the claims is a facility (value of ‘F’) or non-facility (value of ‘O’). Non-facility is generally an office setting; however other entities are included in non-facility.

    hcpcs_code – HCPCS code used to identify the specific medical service furnished by the provider.

    hcpcs_description – Description of the HCPCS code for the specific medical service furnished by the provider.

    hcpcs_drug_indicator – Identifies whether the HCPCS code for the specific service furnished by the provider is listed on the Medicare Part B Drug Average Sales Price (ASP) File.

    line_srvc_cnt – Number of services provided; note that the metrics used to count the number provided can vary from service to service.

    bene_unique_cnt – Number of distinct Medicare beneficiaries rec...
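    The entity-type rules in the data dictionary above also work as a data-quality check. For organization rows (nppes_entity_code 'O'), the individual-only fields should be blank. A small sketch, assuming each row is a dict keyed by the column names listed above and that blanks appear as empty strings or None:

```python
# Fields the data dictionary says are blank when the provider is an organization ('O').
INDIVIDUAL_ONLY_FIELDS = (
    "nppes_provider_first_name",
    "nppes_provider_mi",
    "nppes_credentials",
    "nppes_provider_gender",
)


def entity_field_violations(row):
    """Return individual-only fields that are non-blank in an organization ('O') row."""
    if row.get("nppes_entity_code") != "O":
        return []
    # None and whitespace-only strings both count as blank
    return [f for f in INDIVIDUAL_ONLY_FIELDS if str(row.get(f) or "").strip()]
```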

Samay Ashar (2025). Fraud Detection Transactions Dataset [Dataset]. https://www.kaggle.com/datasets/samayashar/fraud-detection-transactions-dataset

Fraud Detection Transactions Dataset

A high-quality synthetic dataset for fraud detection using XGBoost

35 scholarly articles cite this dataset (per Google Scholar)
Available download formats: zip (2104444 bytes)
Dataset updated: Feb 21, 2025
Authors: Samay Ashar
License: https://creativecommons.org/publicdomain/zero/1.0/ (CC0 1.0)

Description

This dataset is designed to help data scientists and machine learning enthusiasts develop robust fraud detection models. It contains realistic synthetic transaction data, including user information, transaction types, risk scores, and more, making it ideal for binary classification tasks with models like XGBoost and LightGBM.

📌 Key Features

  1. 21 features capturing various aspects of a financial transaction
  2. Realistic structure with numerical, categorical, and temporal data
  3. Binary fraud labels (0 = Not Fraud, 1 = Fraud)
  4. Designed for high accuracy with XGBoost and other ML models
  5. Useful for anomaly detection, risk analysis, and security research

📌 Columns in the Dataset

Transaction_ID: Unique identifier for each transaction
User_ID: Unique identifier for the user
Transaction_Amount: Amount of money involved in the transaction
Transaction_Type: Type of transaction (Online, In-Store, ATM, etc.)
Timestamp: Date and time of the transaction
Account_Balance: User's current account balance before the transaction
Device_Type: Type of device used (Mobile, Desktop, etc.)
Location: Geographical location of the transaction
Merchant_Category: Type of merchant (Retail, Food, Travel, etc.)
IP_Address_Flag: Whether the IP address was flagged as suspicious (0 or 1)
Previous_Fraudulent_Activity: Number of past fraudulent activities by the user
Daily_Transaction_Count: Number of transactions made by the user that day
Avg_Transaction_Amount_7d: User's average transaction amount in the past 7 days
Failed_Transaction_Count_7d: Count of failed transactions in the past 7 days
Card_Type: Type of payment card used (Credit, Debit, Prepaid, etc.)
Card_Age: Age of the card in months
Transaction_Distance: Distance between the user's usual location and the transaction location
Authentication_Method: How the user authenticated (PIN, Biometric, etc.)
Risk_Score: Fraud risk score computed for the transaction
Is_Weekend: Whether the transaction occurred on a weekend (0 or 1)
Fraud_Label: Target variable (0 = Not Fraud, 1 = Fraud)
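The dataset ships Is_Weekend and Avg_Transaction_Amount_7d already computed. To reproduce similar features from raw logs, one possible definition is sketched below. It makes two assumptions that the description does not confirm: the 7-day window ends just before the current transaction, and a user with no prior transactions in the window gets 0.0.

```python
from datetime import datetime, timedelta


def is_weekend(ts: datetime) -> int:
    """Return 1 if `ts` falls on a Saturday or Sunday, else 0."""
    return int(ts.weekday() >= 5)  # Monday is 0, Sunday is 6


def avg_amount_prior_7d(history, ts: datetime) -> float:
    """Mean amount of one user's transactions in the 7 days before `ts`, or 0.0 if none.

    history: list of (timestamp, amount) pairs for a single user.
    """
    start = ts - timedelta(days=7)
    window = [amount for t, amount in history if start <= t < ts]
    return sum(window) / len(window) if window else 0.0
```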

📌 Potential Use Cases

  1. Fraud detection model training
  2. Anomaly detection in financial transactions
  3. Risk scoring systems for banks and fintech companies
  4. Feature engineering and model explainability research