https://creativecommons.org/publicdomain/zero/1.0/
This dataset is designed to help data scientists and machine learning enthusiasts develop robust fraud detection models. It contains realistic synthetic transaction data, including user information, transaction types, risk scores, and more, making it ideal for binary classification tasks with models like XGBoost and LightGBM.
| Column Name | Description |
|---|---|
| Transaction_ID | Unique identifier for each transaction |
| User_ID | Unique identifier for the user |
| Transaction_Amount | Amount of money involved in the transaction |
| Transaction_Type | Type of transaction (Online, In-Store, ATM, etc.) |
| Timestamp | Date and time of the transaction |
| Account_Balance | User's current account balance before the transaction |
| Device_Type | Type of device used (Mobile, Desktop, etc.) |
| Location | Geographical location of the transaction |
| Merchant_Category | Type of merchant (Retail, Food, Travel, etc.) |
| IP_Address_Flag | Whether the IP address was flagged as suspicious (0 or 1) |
| Previous_Fraudulent_Activity | Number of past fraudulent activities by the user |
| Daily_Transaction_Count | Number of transactions made by the user that day |
| Avg_Transaction_Amount_7d | User's average transaction amount in the past 7 days |
| Failed_Transaction_Count_7d | Count of failed transactions in the past 7 days |
| Card_Type | Type of payment card used (Credit, Debit, Prepaid, etc.) |
| Card_Age | Age of the card in months |
| Transaction_Distance | Distance between the user's usual location and transaction location |
| Authentication_Method | How the user authenticated (PIN, Biometric, etc.) |
| Risk_Score | Fraud risk score computed for the transaction |
| Is_Weekend | Whether the transaction occurred on a weekend (0 or 1) |
| Fraud_Label | Target variable (0 = Not Fraud, 1 = Fraud) |
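One column in the table above, Is_Weekend, can be recomputed from Timestamp, which is a cheap sanity check after loading the file. The sketch below uses only the standard library. It assumes Timestamp is an ISO-style string such as `2023-01-07 12:30:00`; the actual format in the file is not stated here, and the function name `is_weekend` is hypothetical.

```python
from datetime import datetime


def is_weekend(timestamp: str) -> int:
    """Return 1 if the timestamp falls on Saturday or Sunday, else 0.

    Assumes an ISO-style string; adjust the parsing if the file differs.
    """
    parsed = datetime.fromisoformat(timestamp)
    # weekday(): Monday is 0, so Saturday and Sunday are 5 and 6.
    return 1 if parsed.weekday() >= 5 else 0


if __name__ == "__main__":
    for ts in ("2023-01-07 12:30:00", "2023-01-09 08:00:00"):
        print(ts, is_weekend(ts))
```

Comparing the recomputed value against the stored Is_Weekend column would show whether the two agree for every row.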
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Some of these records were flagged false by existing algorithms.
Further feature engineering could strengthen the fraud detection algorithms and reveal where the existing algorithm falls short.
CASH-IN: the process of increasing the balance of an account by paying in cash to a merchant.
CASH-OUT: the opposite of CASH-IN; cash is withdrawn from a merchant, which decreases the balance of the account.
DEBIT: a similar process to CASH-OUT; it involves sending money from the mobile money service to a bank account.
PAYMENT: the process of paying merchants for goods or services, which decreases the balance of the account and increases the balance of the receiver.
TRANSFER: the process of sending money to another user of the service through the mobile money platform.
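The balance effects described above can be sketched as a small lookup table. This is an illustration, not the dataset's generator: the names `BALANCE_SIGN` and `apply_transaction` are hypothetical, and treating DEBIT and TRANSFER as decreasing the originating account is an assumption inferred from the descriptions (they send money out of the account).

```python
# Signed effect of each transaction type on the originating account.
# CASH-IN, CASH-OUT and PAYMENT follow the descriptions directly;
# DEBIT and TRANSFER are assumed to decrease the balance.
BALANCE_SIGN = {
    "CASH-IN": 1,
    "CASH-OUT": -1,
    "DEBIT": -1,
    "PAYMENT": -1,
    "TRANSFER": -1,
}


def apply_transaction(balance: float, tx_type: str, amount: float) -> float:
    """Return the originating account's balance after one transaction."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    try:
        sign = BALANCE_SIGN[tx_type.upper()]
    except KeyError:
        raise ValueError(f"unknown transaction type: {tx_type!r}") from None
    return balance + sign * amount


if __name__ == "__main__":
    print(apply_transaction(100.0, "CASH-IN", 25.0))
    print(apply_transaction(100.0, "TRANSFER", 40.0))
```

A check like this can be used to flag rows whose recorded new balance does not match the old balance plus the signed amount.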
https://choosealicense.com/licenses/gpl/
Nigerian Financial Fraud Detection Dataset (Enhanced)
Overview
This is a comprehensive synthetic financial fraud detection dataset specifically engineered for the Nigerian fintech ecosystem. The dataset contains 5,000,000 transactions with 45 advanced features including sophisticated user behaviour analytics, device intelligence, risk scoring, and temporal patterns tailored for Nigerian financial fraud detection.
We have found that a lot of people are unable to use the full… See the full description on the dataset page: https://huggingface.co/datasets/electricsheepafrica/Nigerian-Financial-Transactions-and-Fraud-Detection-Dataset.
https://creativecommons.org/publicdomain/zero/1.0/
This dataset contains 5 million synthetically generated financial transactions designed to simulate real-world behavior for fraud detection research and machine learning applications. Each transaction record includes fields such as:
Transaction Details: ID, timestamp, sender/receiver accounts, amount, type (deposit, transfer, etc.)
Behavioral Features: time since last transaction, spending deviation score, velocity score, geo-anomaly score
Metadata: location, device used, payment channel, IP address, device hash
Fraud Indicators: binary fraud label (is_fraud) and type of fraud (e.g., money laundering, account takeover)
The dataset follows realistic fraud patterns and behavioral anomalies, making it suitable for:
Binary and multiclass classification models
Fraud detection systems
Time-series anomaly detection
Feature engineering and model explainability
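As an illustration of the behavioral features listed above, the sketch below computes "time since last transaction" per sender. It is an assumption about how such a feature could be derived, not the dataset's own method; the function name, the `(sender, timestamp)` tuple layout, and the ISO timestamp format are all hypothetical.

```python
from datetime import datetime


def time_since_last(transactions):
    """Return (sender, timestamp, gap_seconds) rows in chronological order.

    gap_seconds is None for a sender's first transaction.
    """
    last_seen = {}
    rows = []
    ordered = sorted(transactions, key=lambda t: datetime.fromisoformat(t[1]))
    for sender, ts in ordered:
        now = datetime.fromisoformat(ts)
        previous = last_seen.get(sender)
        gap = None if previous is None else (now - previous).total_seconds()
        rows.append((sender, ts, gap))
        last_seen[sender] = now
    return rows


if __name__ == "__main__":
    sample = [
        ("A", "2024-01-01T10:00:00"),
        ("B", "2024-01-01T10:05:00"),
        ("A", "2024-01-01T10:30:00"),
    ]
    for row in time_since_last(sample):
        print(row)
```

Very short gaps for the same sender are the kind of signal a velocity score would typically summarize.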
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fraud detected in Defence SA since 2012-13.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Below is a draft DMP-style description of your credit-card fraud detection experiment, modeled on the antiquities example:
Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit-card transactions in real time to reduce financial losses and improve trust in digital payment systems.
Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud-detection algorithms and support future research on anomaly detection in transaction data.
Data Sources
We used the publicly available credit-card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.
Method of Dataset Preparation
Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo's requirements.
Data import: Uploaded the full CSV into DBRepo and assigned persistent identifiers (PIDs).
Splitting: Programmatically derived three subsets (training 70%, validation 15%, test 15%) using range-based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.
Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from "Y"/"N" to 1/0 and dropped non-feature identifiers (actionnr, merchant_id).
Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held-out test set.
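The splitting and cleaning steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the helper names `range_split` and `flag_to_int` are hypothetical, the split is taken over sorted primary-key values to mimic range-based filters on actionnr, and the DBRepo upload and RandomForest training are left out.

```python
def range_split(keys, train_pct=70, val_pct=15):
    """Split primary-key values into contiguous train/validation/test ranges.

    Integer percentages avoid floating-point rounding at the boundaries;
    whatever remains after train and validation goes to the test range.
    """
    ordered = sorted(keys)
    n = len(ordered)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


def flag_to_int(value: str) -> int:
    """Map a "Y"/"N" flag to 1/0; any other value raises KeyError."""
    return {"Y": 1, "N": 0}[value.strip().upper()]


if __name__ == "__main__":
    train, val, test = range_split(range(1, 101))
    print(len(train), len(val), len(test))
    print(train[-1], val[0], test[0])
    print(flag_to_int("Y"), flag_to_int("N"))
```

Because the ranges are contiguous in key order, each subset can be described by a simple filter such as "actionnr between a and b", which is what makes it citable as its own materialized subset.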
Dataset Structure
The raw data is a single CSV with columns:
actionnr (integer transaction ID)
merchant_id (string)
average_amount_transaction_day (float)
transaction_amount (float)
is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)
total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)
Naming Conventions
All columns use lowercase snake_case.
Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.
Files in the code repo follow a clear structure:
├── data/ # local copies only; raw data lives in DBRepo
├── notebooks/Task.ipynb
├── models/rf_model_v1.joblib
├── outputs/ # confusion_matrix.png, roc_curve.png, predictions.csv
├── README.md
├── requirements.txt
└── codemeta.json
Required Software
Python 3.9+
pandas, numpy (data handling)
scikit-learn (modeling, metrics)
matplotlib (visualizations)
dbrepo-client.py (DBRepo API)
requests (TU WRD API)
Additional Resources
Original dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud
Scikit-learn docs: https://scikit-learn.org/stable
DBRepo API guide: via the starter notebook's dbrepo_client.py template
TU WRD REST API spec: https://test.researchdata.tuwien.ac.at/api/docs
Data Limitations
Highly imbalanced: only ~0.17% of transactions are fraudulent.
The PCA features (V1-V28) are anonymized; we extended them with domain features but cannot reverse-engineer the raw variables.
Time-bounded: covers only two days of transactions and may not capture seasonal patterns.
Licensing and Attribution
Raw data: CC-0 (per Kaggle terms)
Code & notebooks: MIT License
Model artifacts & outputs: CC-BY 4.0
TU WRD records include ORCID identifiers for the author.
Recommended Uses
Benchmarking new fraud-detection algorithms on a standard imbalanced dataset.
Educational purposes: demonstrating model-training pipelines, FAIR data practices.
Extension: adding time-series or deep-learning models.
Known Issues
Possible temporal leakage if date/time features are not handled correctly.
Model performance may degrade on live data due to concept drift.
Binary flags may oversimplify nuanced transaction outcomes.
purulalwani/Synthetic-Financial-Datasets-For-Fraud-Detection dataset hosted on Hugging Face and contributed by the HF Datasets community
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This synthetic insurance claim fraud detection dataset contains detailed records of claims, including incident specifics, claimant demographics, policy details, and fraud indicators. Designed for developing and testing machine learning models, it enables insurers and researchers to identify patterns of fraudulent activity and improve risk assessment strategies.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Nature of Data: This dataset contains fictitious data designed for educational and testing purposes in fraud detection algorithms. It does not represent real-world financial transactions or individuals.
Purpose of Creation: The dataset was generated to provide a realistic example for developing and evaluating fraud detection models without relying on sensitive real-world data. It's intended for students, researchers, and practitioners to practice data analysis and machine learning techniques in a safe environment.
Open Government Licence 2.0: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/
License information was derived automatically
Data on fraud investigation carried out by City of York Council: use of powers, number of investigators and investigations, and monetary value of fraud identified. This data is required to be published in order to meet the requirements of the Local Government Transparency Code legislation.
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
## Overview
Fraud Detection is a dataset for object detection tasks - it contains Mobile annotations for 2,590 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [MIT license](https://creativecommons.org/licenses/MIT).
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
kgauvin603/creditcard-fraud-detection dataset hosted on Hugging Face and contributed by the HF Datasets community
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Cifer Fraud Detection Mini Dataset
Overview
(cifer-fraud-detection-mini-dataset) The Cifer-Fraud-Detection-Mini-Dataset is a lightweight sample containing 20 transaction records, extracted from the full 21 million-row Cifer-Fraud-Detection-Dataset-AF. It is designed for quick experimentation of encrypted model training with Fully Homomorphic Encryption (FHE). Though small in size, this mini dataset retains the original schema and data structure inspired by… See the full description on the dataset page: https://huggingface.co/datasets/CiferAI/cifer-fraud-detection-mini-dataset.
https://www.technavio.com/content/privacy-notice
Fraud Detection And Prevention Market Size 2025-2029
The fraud detection and prevention market size is forecast to increase by USD 122.65 billion, at a CAGR of 30.1% between 2024 and 2029.
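To make the forecast figures concrete, the arithmetic below backs out the base-year market size that the stated increment and CAGR would imply. This is a rough sketch under loud assumptions that the source does not confirm: the increment is taken as end value minus a 2024 base, and growth compounds annually over five periods (2024 to 2029). The function name `implied_base` is hypothetical.

```python
def implied_base(increment: float, cagr: float, years: int) -> float:
    """Base value B such that B * (1 + cagr) ** years - B == increment."""
    growth = (1 + cagr) ** years
    return increment / (growth - 1)


if __name__ == "__main__":
    # Figures from the summary: USD 122.65 billion increase at a 30.1% CAGR.
    # Assumption: five annual compounding periods from a 2024 base.
    base = implied_base(122.65, 0.301, 5)
    print(f"implied 2024 base: USD {base:.1f} billion")
    print(f"implied 2029 size: USD {base + 122.65:.1f} billion")
```

If the report defines the base year or the compounding window differently, the implied figures change accordingly.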
The market is witnessing significant growth, driven by the increasing adoption of cloud-based services. Businesses are recognizing the benefits of cloud solutions, such as real-time fraud detection, scalability, and cost savings. Additionally, technological advancements in fraud detection and prevention solutions and services are enabling organizations to better protect their assets from sophisticated fraud schemes. However, the complex IT infrastructure of modern businesses poses a challenge in implementing and integrating these solutions effectively. The complexity of the IT infrastructure, which integrates cloud computing, big data, and mobile devices, creates a vast network of devices with insufficient security features.
To capitalize on market opportunities, companies must stay abreast of these trends and invest in advanced fraud detection technologies. Effective implementation and integration of these solutions, coupled with continuous innovation, will be crucial for businesses seeking to mitigate fraud risks and protect their reputation and financial stability. Furthermore, the constant evolution of fraud techniques necessitates continuous innovation and adaptation from solution providers. Encryption techniques and network security protocols form the foundation of robust cybersecurity defenses, while compliance regulations and penetration testing help identify vulnerabilities and strengthen security posture.
What will be the Size of the Fraud Detection And Prevention Market during the forecast period?
Explore in-depth regional segment analysis with market size data - historical 2019-2023 and forecasts 2025-2029 - in the full report.
The market continues to evolve, driven by the constant emergence of new threats and the need for advanced technologies to mitigate risks across various sectors. Real-time fraud alerts, anomaly detection systems, forensic accounting tools, and risk mitigation strategies are integrated into comprehensive solutions that adapt to the ever-changing fraud landscape. Entities rely on these tools to maintain regulatory compliance frameworks and incident response planning, ensuring access control management and vulnerability assessments are up-to-date. Machine learning algorithms and transaction monitoring tools enable the detection of suspicious activity, providing valuable insights into potential threats.
Intrusion detection systems and behavioral biometrics offer real-time protection against cyberattacks and payment fraud, while identity verification methods and risk scoring models help prevent account takeover and data loss. Cybersecurity threat intelligence and authentication protocols enhance the overall security strategy, providing a layered approach to fraud prevention. Fraud investigation techniques and loss prevention metrics enable entities to respond effectively to incidents and minimize the impact of data breaches. Social engineering countermeasures and payment fraud detection solutions further fortify the fraud prevention arsenal, ensuring continuous protection against evolving threats.
The ongoing dynamism of the market demands a proactive approach, with entities staying informed and agile to maintain a strong defense against fraudulent activities.
How is this Fraud Detection And Prevention Industry segmented?
The fraud detection and prevention industry research report provides comprehensive data (region-wise segment analysis), with forecasts and estimates in 'USD million' for the period 2025-2029, as well as historical data from 2019-2023 for the following segments.
Component: Solutions, Services
End-user: Large enterprise, SMEs
Application: Transaction monitoring, Compliance and risk management, Identity verification, Behavioral analytics, Others
Geography: North America (US, Canada); Europe (France, Germany, Italy, Russia, UK); APAC (China, India, Japan); Rest of World (ROW)
By Component Insights
The Solutions segment is estimated to witness significant growth during the forecast period. The market is experiencing significant growth due to escalating cyber threats, increasing regulatory compliance requirements, and the need to mitigate financial losses. Biometric authentication, encryption techniques, machine learning algorithms, and intrusion detection systems are among the key solutions driving market expansion. Regulatory frameworks, such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA), are mandating robust incident response planning, access control management, and data breach prevention strategies. Vulnerability assessments and
https://creativecommons.org/publicdomain/zero/1.0/
This dataset simulates real-world banking transactions, including both legitimate and fraudulent activity, with each transaction labeled accordingly. It covers transaction amount, time, type, location, device type, and historical user behavior, and is designed for training and evaluating binary classification models for fraud detection. It includes the following features:
transaction_id: Unique identifier for each transaction
customer_id: Anonymized customer ID
transaction_amount: Value of the transaction in currency units
transaction_type: Type of transaction (e.g., payment, transfer)
transaction_time: Timestamp of when the transaction occurred
transaction_location: Region where the transaction was initiated
device_type: Device used (e.g., mobile, POS, desktop)
previous_transactions_count: Number of recent transactions by the same customer
is_fraud: Target label indicating fraud (1) or not (0)
This dataset is ideal for binary classification tasks such as fraud detection using machine learning.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
The dataset is from https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
The dataset contains transactions made by credit cards in September 2013 by European cardholders.
This dataset presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.
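The imbalance can be checked directly from the two counts given above. The sketch below computes the fraud rate (roughly 0.17%) and the negative-to-positive ratio, which is a common starting point for a positive-class weight (for example XGBoost's `scale_pos_weight`); whether to use such a weight is a modeling choice the source does not prescribe. The function name `class_balance` is hypothetical.

```python
def class_balance(positives: int, total: int):
    """Return (positive rate, negatives-per-positive ratio)."""
    negatives = total - positives
    rate = positives / total
    neg_pos_ratio = negatives / positives
    return rate, neg_pos_ratio


if __name__ == "__main__":
    # Counts from the dataset description: 492 frauds in 284,807 transactions.
    rate, ratio = class_balance(492, 284_807)
    print(f"fraud rate: {rate:.4%}")
    print(f"negatives per positive: {ratio:.1f}")
```

With so few positives, plain accuracy is uninformative; a model that predicts "not fraud" for every row would still be right more than 99.8% of the time.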
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Vehicle Insurance Claim Fraud Detection Dataset is a tabular insurance fraud detection dataset that includes vehicle information, accident and insurance details, and claim details for vehicle insurance claims, and labels each claim as fraudulent or not.
2) Data Utilization
(1) The Vehicle Insurance Claim Fraud Detection Dataset has the following characteristics:
• Each row contains a variety of variables, including vehicle attributes, models, accident details, insurance type and duration, and claim history, as well as the target variable, FraudFound_P.
• The data are based on real insurance claim cases and are designed to be suitable for insurance fraud detection and classification model development.
(2) The Vehicle Insurance Claim Fraud Detection Dataset can be used for:
• Development of insurance fraud detection models: You can build a machine learning-based insurance fraud classification and prediction model by leveraging the various vehicle, accident, and insurance attributes.
• Analysis of fraud patterns and risk factors: You can use the billing data and fraud labels to analyze fraud patterns, risk factors, insurance policy improvements, and more.
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
## Overview
Cctv Fraud Detection is a dataset for object detection tasks - it contains Fraud annotations for 3,000 images.
## Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
## License
This dataset is available under the [CC BY 4.0 license](https://creativecommons.org/licenses/CC BY 4.0).
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset contains detailed synthetic payment transaction records, each labeled with ground-truth indicators of fraud. It includes transaction metadata, customer and merchant identifiers, payment methods, device and location context, and fraud reasons, making it ideal for developing and benchmarking machine learning models for payment fraud detection and risk mitigation.
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
This dataset provides detailed, interconnected banking transaction records, capturing sender and receiver relationships, transaction metadata, and anomaly flags. Designed for network analytics, it enables advanced anti-money laundering (AML) detection, fraud analysis, and financial behavior modeling by representing transactions as a directed graph. The flat structure ensures easy integration with machine learning and graph analytics tools.
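Representing flat transaction rows as a directed graph can be sketched with the standard library alone. The example below is illustrative only: the `(sender, receiver, amount)` tuple layout and the function names are assumptions, since the dataset's actual column names are not listed here.

```python
from collections import defaultdict


def build_graph(records):
    """Build adjacency: sender -> receiver -> total amount sent."""
    graph = defaultdict(lambda: defaultdict(float))
    for sender, receiver, amount in records:
        graph[sender][receiver] += amount
    return graph


def degrees(graph):
    """Return (out_degree, in_degree) dicts counting distinct counterparties."""
    out_degree = {node: len(neighbors) for node, neighbors in graph.items()}
    in_degree = defaultdict(int)
    for neighbors in graph.values():
        for receiver in neighbors:
            in_degree[receiver] += 1
    return out_degree, dict(in_degree)


if __name__ == "__main__":
    rows = [("A", "B", 100.0), ("A", "C", 50.0), ("B", "C", 25.0), ("A", "B", 10.0)]
    graph = build_graph(rows)
    out_degree, in_degree = degrees(graph)
    print(dict(graph["A"]), out_degree, in_degree)
```

Accounts with unusually high in-degree or out-degree are a typical first thing to inspect in AML-style network analysis; a dedicated graph library would be the natural next step for anything beyond simple counts.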