100+ datasets found
  1. Credit Card Fraud Detection

    • test.researchdata.tuwien.ac.at
    • zenodo.org
    • +1 more
    csv, json, pdf +2
    Updated Apr 28, 2025
    Cite
    Ajdina Grizhja (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82556/yvxj-9t22
    Available download formats: text/markdown, csv, pdf, txt, json
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    TU Wien
    Authors
    Ajdina Grizhja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 28, 2025
    Description

    Below is a draft DMP–style description of your credit‐card fraud detection experiment, modeled on the antiquities example:

    1. Dataset Description

    Research Domain
    This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.

    Purpose
    The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.

    Data Sources
    We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.

    Method of Dataset Preparation

    1. Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.

    2. Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).

    3. Splitting: Programmatically derived three subsets—training (70%), validation (15%), test (15%)—using range‐based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.

    4. Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non‐feature identifiers (actionnr, merchant_id).

    5. Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
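
    The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' notebook: the toy data frame, the `make_toy_frame` helper, and the forest settings are hypothetical stand-ins, and the split is done locally with pandas rather than through DBRepo subsets. Only the column names, the 70/15/15 range split on actionnr, and the "Y"/"N" to 1/0 mapping come from the description.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FLAG_COLUMNS = ["is_declined", "isforeigntransaction", "ishighriskcountry", "isfradulent"]
ID_COLUMNS = ["actionnr", "merchant_id"]
TARGET = "isfradulent"


def make_toy_frame(n_rows=200, seed=0):
    """Build a small synthetic stand-in for the raw CSV (hypothetical values)."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame({
        "actionnr": np.arange(1, n_rows + 1),
        "merchant_id": [f"m{i % 7}" for i in range(n_rows)],
        "transaction_amount": rng.gamma(2.0, 50.0, n_rows).round(2),
        "total_number_of_declines_day": rng.integers(0, 5, n_rows),
    })
    for column in FLAG_COLUMNS:
        frame[column] = rng.choice(["Y", "N"], size=n_rows, p=[0.2, 0.8])
    return frame


def split_by_key(frame, key="actionnr"):
    """Range-based 70/15/15 split on the primary key, done before the key is dropped."""
    ordered = frame.sort_values(key).reset_index(drop=True)
    train_end = len(ordered) * 70 // 100
    valid_end = len(ordered) * 85 // 100
    return ordered.iloc[:train_end], ordered.iloc[train_end:valid_end], ordered.iloc[valid_end:]


def clean(frame):
    """Map "Y"/"N" flags to 1/0 and drop the non-feature identifiers."""
    cleaned = frame.copy()
    for column in FLAG_COLUMNS:
        cleaned[column] = cleaned[column].map({"Y": 1, "N": 0}).astype(int)
    return cleaned.drop(columns=ID_COLUMNS)


train, valid, test = (clean(part) for part in split_by_key(make_toy_frame()))

# Fit on the training split; the validation split is used here only for a quick check.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(train.drop(columns=TARGET), train[TARGET])
valid_accuracy = model.score(valid.drop(columns=TARGET), valid[TARGET])
```

    Splitting before dropping actionnr matters because the key defines the ranges. With real data, accuracy would be a poor validation metric given the class imbalance.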

    2. Technical Details

    Dataset Structure

    • The raw data is a single CSV with columns:

      • actionnr (integer transaction ID)

      • merchant_id (string)

      • average_amount_transaction_day (float)

      • transaction_amount (float)

      • is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)

      • total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)

    Naming Conventions

    • All columns use lowercase snake_case.

    • Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.

    • Files in the code repo follow a clear structure:

      ├── data/         # local copies only; raw data lives in DBRepo 
      ├── notebooks/Task.ipynb 
      ├── models/rf_model_v1.joblib 
      ├── outputs/        # confusion_matrix.png, roc_curve.png, predictions.csv 
      ├── README.md 
      ├── requirements.txt 
      └── codemeta.json 
      

    Required Software

    • Python 3.9+

    • pandas, numpy (data handling)

    • scikit-learn (modeling, metrics)

    • matplotlib (visualizations)

    • dbrepo‐client.py (DBRepo API)

    • requests (TU WRD API)

    Additional Resources

    3. Further Details

    Data Limitations

    • Highly imbalanced: only ~0.17% of transactions are fraudulent.

    • Anonymized PCA features (V1-V28) are hidden; we extended the data with domain features but cannot reverse-engineer the raw variables.

    • Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.
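
    To make the imbalance concrete, the short calculation below uses only the two counts quoted in the description (284 807 transactions, 492 fraudulent). It is illustrative arithmetic, and the variable names are our own.

```python
TOTAL_TRANSACTIONS = 284_807  # from the dataset description
FRAUD_TRANSACTIONS = 492      # from the dataset description

fraud_rate = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS

# Accuracy of a trivial model that labels every transaction as legitimate.
majority_baseline_accuracy = 1 - fraud_rate

# Legitimate transactions per fraudulent one; a common starting point for class weights.
legit_per_fraud = (TOTAL_TRANSACTIONS - FRAUD_TRANSACTIONS) / FRAUD_TRANSACTIONS

print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-legit accuracy: {majority_baseline_accuracy:.4%}")
print(f"legitimate per fraud: {legit_per_fraud:.0f}")
```

    A classifier that never flags fraud would therefore score above 99.8% accuracy, which is why accuracy alone says little about performance on this data.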

    Licensing and Attribution

    • Raw data: CC-0 (per Kaggle terms)

    • Code & notebooks: MIT License

    • Model artifacts & outputs: CC-BY 4.0

    • TU WRD records include ORCID identifiers for the author.

    Recommended Uses

    • Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.

    • Educational purposes: demonstrating model‐training pipelines, FAIR data practices.

    • Extension: adding time‐series or deep‐learning models.

    Known Issues

    • Possible temporal leakage if date/time features are not handled correctly.

    • Model performance may degrade on live data due to concept drift.

    • Binary flags may oversimplify nuanced transaction outcomes.

  2. CerditCard fraud dataset

    • kaggle.com
    Updated Aug 2, 2025
    Cite
    Wasiq Ali (2025). CerditCard fraud dataset [Dataset]. https://www.kaggle.com/datasets/wasiqaliyasir/cerditcard-fraud-dataset
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Aug 2, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Wasiq Ali
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Credit Card Fraud Detection Dataset

    Uncover fraudulent transactions with this anonymized, PCA-transformed dataset. Perfect for building and testing fraud detection algorithms!

    Dataset Overview

    • Objective: Detect fraudulent credit card transactions using anonymized features

    • Samples: 1,000 transactions

    • Features: 7 columns (5 PCA components + Transaction Amount + Target)

    Class Distribution:

    • Legit (Class 0): 993 transactions (~99.3%)

    • Fraud (Class 1): 7 transactions (~0.7%)

    • Key Challenge: Extreme class imbalance – realistic representation of fraud patterns

    Features Description

    | Feature | Description | Characteristics |
    |---------|-------------|-----------------|
    | V1-V5 | Anonymized principal components | PCA-transformed numerical features; preserves transaction patterns while hiding sensitive details |
    | Amount | Transaction value | Highly variable (min: $0.20, max: $1,916.06); critical for fraud analysis |
    | Class | Target variable | Binary labels: 0 = Legitimate transaction, 1 = Fraudulent transaction |

    Key Insights & Patterns

    Fraud Indicators:

    • Fraudulent transactions occur across diverse amounts (low: $1.83 → high: $1,916)

    • No obvious amount threshold for fraud – requires nuanced modeling

    Sample fraud cases:

    1. V1:0.579, V2:-0.384, Amount:1916.06

    2. V1:1.023, V2:-0.638, Amount:1094.42

    Data Characteristics:
    V1-V5 Distributions:

    1. V1: Concentrated near zero (mean ≈ -0.1)

    2. V2: Wider spread (mean ≈ 0.05)

    3. V3-V5: Asymmetric distributions

    Amount Distribution:

    1. Right-skewed – most transactions < $500

    2. Fraud cases span low and high values

    Class Imbalance:

     - Severe skew: 993:7 legit-to-fraud ratio
    
     - Models must optimize for recall/precision over accuracy
    
    Analysis Challenges

    ⚠️ Class Imbalance: Standard accuracy metrics misleading

    🔍 Feature Interpretation: PCA components lack real-world context

    📊 Non-linear Patterns: Complex interactions between V1-V5

    ⚡ High Stakes: False negatives (missed fraud) costlier than false positives

    Recommended Applications

    Fraud Detection Models:

    Logistic Regression (with class weighting)

    Random Forests / XGBoost (handle non-linearities)

    Isolation Forests (anomaly detection)

    Evaluation Focus:

    Precision-Recall Curves > ROC-AUC

    F2-Score (prioritize recall)

    Confusion matrix analysis
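
    As a rough illustration of the evaluation focus above, the sketch below computes a confusion matrix, an F2-score, and average precision (a summary of the precision-recall curve) with scikit-learn. The ten labels and scores are invented for the example, and the 0.5 threshold is arbitrary.

```python
import numpy as np
from sklearn.metrics import average_precision_score, confusion_matrix, fbeta_score

# Hypothetical labels and model scores for ten transactions (two are fraud).
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.08, 0.12, 0.90, 0.40])
y_pred = (y_score >= 0.5).astype(int)  # 0.5 is an arbitrary illustrative threshold

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# beta=2 weights recall more heavily than precision.
f2 = fbeta_score(y_true, y_pred, beta=2)

# Average precision uses the raw scores, so it does not depend on the threshold.
pr_auc = average_precision_score(y_true, y_score)

accuracy = (tp + tn) / len(y_true)
```

    In this toy case accuracy is 0.8 even though half of the fraud cases are missed, while the F2-score of 0.5 reflects the missed case directly.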

    Advanced Techniques:

    SMOTE/ADASYN for oversampling

    Autoencoders for anomaly detection

    Feature engineering: Amount-to-Var ratios

    Dataset Source & Ethics

    Origin: Synthetic dataset mirroring real-world financial patterns

    Anonymization: Original features transformed via PCA for privacy compliance

    Bias Consideration: Geographic/cultural biases possible in source data

    Potential Use Cases

    🏦 Banking: Real-time transaction monitoring systems

    📱 FinTech Apps: Fraud detection APIs for payment gateways

    🎓 Education: Imbalanced classification tutorials

    🏆 Kaggle Competitions: Lightweight fraud detection challenge

    Example Project Idea

    "Minimalist Fraud Detector":

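    # python
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import RobustScaler

    # Scale features, oversample fraud to 30% of the majority class,
    # then fit a forest that weights fraud errors 15x.
    model = make_pipeline(
      RobustScaler(),
      SMOTE(sampling_strategy=0.3),
      RandomForestClassifier(class_weight={0:1, 1:15})
    )
    # Optimize for: Recall @ Precision > 0.85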
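
    One way to read the "Recall @ Precision > 0.85" target is: among all score thresholds whose precision clears the floor, report the best recall. The helper below is a sketch under that interpretation; `recall_at_precision` is a hypothetical name, and the demo labels and scores are invented.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def recall_at_precision(y_true, y_score, min_precision=0.85):
    """Return the highest recall among thresholds whose precision meets the floor."""
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    # Drop the final (precision=1, recall=0) point, which has no threshold.
    eligible = recall[:-1][precision[:-1] >= min_precision]
    return float(eligible.max()) if eligible.size else 0.0


# Hypothetical scores for six transactions (three are fraud).
demo_true = np.array([0, 0, 1, 0, 1, 1])
demo_score = np.array([0.1, 0.2, 0.3, 0.6, 0.8, 0.9])

strict = recall_at_precision(demo_true, demo_score, min_precision=0.85)
relaxed = recall_at_precision(demo_true, demo_score, min_precision=0.70)
```

    Relaxing the precision floor can only keep or raise the reported recall, which makes the trade-off between missed fraud and false alarms explicit.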
    

    Dataset Summary

    | Feature | Mean | Std | Min | Max |
    |----------|----------|----------|-----------|-----------|
    | V1 | -0.11 | 1.02 | -3.24 | 3.85 |
    | V2 | 0.05 | 1.01 | -2.94 | 2.60 |
    | V3 | 0.02 | 0.98 | -3.02 | 2.95 |
    | Amount | 250.32 | 190.19 | 0.20 | 1916.06 |

  3. Fraud Detection in Financial Transactions

    • kaggle.com
    Updated Jan 17, 2025
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Darshan Dalvi (2025). Fraud Detection in Financial Transactions [Dataset]. https://www.kaggle.com/datasets/darshandalvi12/fraud-detection-in-financial-transactions/discussion?sort=undefined
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jan 17, 2025
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Darshan Dalvi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Credit Card Fraud Detection Dataset (Updated)

    This dataset contains 284,807 transactions from a credit card company, where 492 transactions are fraudulent. The data is highly imbalanced, with only a small fraction of transactions being fraudulent. The dataset is commonly used to build and evaluate fraud detection models.

    Dataset Details:

    • Number of Transactions: 284,807
    • Fraudulent Transactions: 492 (Highly Imbalanced)
    • Features:
      • 28 anonymized features (V1 to V28)
      • Transaction amount
      • Timestamp
    • Label:
      • 0: Legitimate
      • 1: Fraudulent

    Data Preprocessing:

    • SMOTE (Synthetic Minority Oversampling Technique) has been applied to address the class imbalance in the dataset, generating synthetic examples for the minority class (fraudulent transactions).
    • Additional Operations: Various preprocessing steps were performed, including data cleaning and feature engineering, to ensure the quality of the dataset for model training.

    Processed Files:

    The dataset has been split into training and testing sets and saved in the following files:
    • X_train.csv: Feature data for the training set
    • X_test.csv: Feature data for the testing set
    • y_train.csv: Labels for the training set (fraudulent or legitimate)
    • y_test.csv: Labels for the testing set
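
    A minimal loader for the four files might look like the sketch below. The file names come from the description; the `load_splits` helper, the stand-in values, and the "Class" label column are assumptions made so the example runs without the real download.

```python
import tempfile
from pathlib import Path

import pandas as pd

SPLIT_FILES = ("X_train", "X_test", "y_train", "y_test")


def load_splits(data_dir):
    """Read the four processed CSV files from data_dir into a dict of DataFrames."""
    data_dir = Path(data_dir)
    frames = {name: pd.read_csv(data_dir / f"{name}.csv") for name in SPLIT_FILES}
    # Features and labels must line up row for row within each split.
    for part in ("train", "test"):
        if len(frames[f"X_{part}"]) != len(frames[f"y_{part}"]):
            raise ValueError(f"X_{part} and y_{part} have different row counts")
    return frames


# Demo on tiny stand-in files (hypothetical values; the "Class" column name is assumed).
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    pd.DataFrame({"V1": [0.1, -0.2, 0.3], "Amount": [10.0, 5.5, 99.0]}).to_csv(tmp_dir / "X_train.csv", index=False)
    pd.DataFrame({"Class": [0, 1, 0]}).to_csv(tmp_dir / "y_train.csv", index=False)
    pd.DataFrame({"V1": [0.4], "Amount": [20.0]}).to_csv(tmp_dir / "X_test.csv", index=False)
    pd.DataFrame({"Class": [1]}).to_csv(tmp_dir / "y_test.csv", index=False)
    splits = load_splits(tmp_dir)

# Class counts in the training labels, taken from the first column of y_train.
train_class_counts = splits["y_train"].iloc[:, 0].value_counts().to_dict()
```

    Because the description says SMOTE has already been applied, checking the class counts in each split after loading is a quick way to see which files contain synthetic minority samples.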

    This updated dataset is ready to be used for training and evaluating machine learning models, specifically designed for credit card fraud detection tasks.

    This description highlights the key aspects of the dataset, including its preprocessing steps and the availability of the processed files for ease of use.

  4. Financial Payment Services Fraud Dataset

    • cubig.ai
    Updated Jun 30, 2025
    Cite
    CUBIG (2025). Financial Payment Services Fraud Dataset [Dataset]. https://cubig.ai/store/products/547/financial-payment-services-fraud-dataset
    Dataset updated
    Jun 30, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
    Description

    1) Data Introduction
    • The Financial Payment Services Fraud Dataset is based on a simulation of real-world financial transactions and was created to detect fraudulent activities across various types of payments and transfers. It includes key financial data such as transaction time, type, amount, sender and recipient information, and account balances before and after each transaction. Each transaction is labeled as either fraudulent or legitimate.

    2) Data Utilization
    (1) Characteristics of the Financial Payment Services Fraud Dataset:
    • With its large-scale transaction records, detailed account information, and diverse transaction types, this dataset is well-suited for developing and testing financial fraud detection models.

    (2) Applications of the Financial Payment Services Fraud Dataset:
    • Real-time Fraud Detection: The dataset can be used to train machine learning classification models that quickly detect and prevent fraudulent transactions in real-world financial service environments.
    • Risky Transaction Pattern Analysis: By analyzing patterns according to transaction type, amount, and account, the dataset can support the advancement of fraud prevention policies and anomaly monitoring systems.

  5. Nigerian-Financial-Transactions-and-Fraud-Detection-Dataset

    • huggingface.co
    Cite
    Electric Sheep, Nigerian-Financial-Transactions-and-Fraud-Detection-Dataset [Dataset]. https://huggingface.co/datasets/electricsheepafrica/Nigerian-Financial-Transactions-and-Fraud-Detection-Dataset
    Dataset authored and provided by
    Electric Sheep
    License

    https://choosealicense.com/licenses/gpl/

    Area covered
    Nigeria
    Description

    Nigerian Financial Fraud Detection Dataset (Enhanced)

      Overview
    

    This is a comprehensive synthetic financial fraud detection dataset specifically engineered for the Nigerian fintech ecosystem. The dataset contains 5,000,000 transactions with 45 advanced features including sophisticated user behavior analytics, device intelligence, risk scoring, and temporal patterns tailored for Nigerian financial fraud detection.

      Key Highlights
    

    🇳🇬 Nigerian-Localized: Currency (NGN)… See the full description on the dataset page: https://huggingface.co/datasets/electricsheepafrica/Nigerian-Financial-Transactions-and-Fraud-Detection-Dataset.

  6. Fraudulent Transaction Identification

    • gomask.ai
    csv, json
    Updated Aug 20, 2025
    Cite
    GoMask.ai (2025). Fraudulent Transaction Identification [Dataset]. https://gomask.ai/marketplace/datasets/fraudulent-transaction-identification
    Available download formats: csv (10 MB), json
    Dataset updated
    Aug 20, 2025
    Dataset provided by
    GoMask.ai
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2024 - 2025
    Area covered
    Global
    Variables measured
    notes, currency, is_fraud, device_id, account_id, fraud_type, ip_address, customer_id, device_type, fraud_score, and 13 more
    Description

    This dataset provides detailed records of financial transactions, enriched with fraud detection indicators, device and location metadata, and merchant information. It is designed to help financial institutions identify and analyze fraudulent activities, supporting both real-time monitoring and historical pattern analysis for risk mitigation and compliance.

  7. Fraud Detection - Financial transactions

    • find.data.gov.scot
    csv
    Updated Mar 14, 2018
    Cite
    Deloitte Datathon 2018 (uSmart) (2018). Fraud Detection - Financial transactions [Dataset]. https://find.data.gov.scot/datasets/39167
    Available download formats: csv (470.6714 MB)
    Dataset updated
    Mar 14, 2018
    Dataset provided by
    Deloitte: https://deloitte.com/
    Description

    Synthetic transactional data with labels for fraud detection. For more information, see: https://www.kaggle.com/ntnu-testimon/paysim1/version/2

  8. Performance comparison with other credit card fraud detection dataset.

    • plos.figshare.com
    xls
    Updated Jul 16, 2025
    Cite
    Al Mahmud Siam; Pankaj Bhowmik; Md Palash Uddin (2025). Performance comparison with other credit card fraud detection dataset. [Dataset]. http://doi.org/10.1371/journal.pone.0326975.t015
    Available download formats: xls
    Dataset updated
    Jul 16, 2025
    Dataset provided by
    PLOS ONE
    Authors
    Al Mahmud Siam; Pankaj Bhowmik; Md Palash Uddin
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Performance comparison with other credit card fraud detection dataset.

  9. Online Payment Fraud Detection Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Online Payment Fraud Detection Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/online-payment-fraud-detection-market
    Available download formats: pdf, pptx, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Online Payment Fraud Detection Market Outlook



    The global online payment fraud detection market size was valued at USD 3.7 billion in 2023 and is projected to reach approximately USD 14.1 billion by 2032, growing at a robust CAGR of 16.2% during the forecast period. The rapid growth of e-commerce, increased digital transactions, and the rising sophistication of cyber-attacks are key factors driving the market's expansion. The market has seen significant growth owing to the necessity for secure online payment solutions to protect against fraud.
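
    As a quick consistency check on the figures above, the sketch below computes the growth rate implied by the two market sizes and the 2032 value implied by the stated rate. It assumes nine compounding years (2023 to 2032), which the report does not state explicitly; the function names are our own.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that takes start_value to end_value in `years`."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value after compounding start_value at `rate` for `years`."""
    return start_value * (1 + rate) ** years


YEARS = 2032 - 2023  # assumed number of compounding periods

# Rate implied by USD 3.7 billion growing to USD 14.1 billion.
cagr_from_sizes = implied_cagr(3.7, 14.1, YEARS)

# Size implied by compounding USD 3.7 billion at the stated 16.2%.
size_from_rate = project(3.7, 0.162, YEARS)

print(f"implied CAGR: {cagr_from_sizes:.2%}")
print(f"projected 2032 size: USD {size_from_rate:.2f} billion")
```

    Under that assumption the implied rate is close to 16% and the projected size is a little above USD 14 billion, so the quoted numbers agree to within rounding.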



    A critical growth factor for the online payment fraud detection market is the burgeoning volume of online transactions. With the proliferation of e-commerce platforms and online financial services, the sheer number of digital payments has skyrocketed. This surge in online transactions has inevitably led to an increase in fraud attempts, necessitating advanced fraud detection systems. Financial institutions and vendors are increasingly investing in robust fraud detection solutions to safeguard their operations and customer data, thereby propelling the market forward.



    Another significant driver is the technological advancement in fraud detection methods. The adoption of artificial intelligence (AI), machine learning (ML), and big data analytics has revolutionized the way online payment fraud is detected and prevented. These technologies offer real-time monitoring and predictive analytics, enabling organizations to identify and mitigate fraudulent activities proactively. The continuous evolution of these technologies promises further advancements, making fraud detection systems more efficient and reliable.



    Regulatory requirements and compliance standards are also contributing to market growth. Governments and regulatory bodies worldwide are implementing stringent guidelines to ensure the security of digital transactions. Compliance with these regulations necessitates the adoption of advanced fraud detection systems. For instance, the European Union's Revised Payment Services Directive (PSD2) mandates strong customer authentication for online payments, thereby fostering the demand for sophisticated fraud detection solutions.



    Account Takeover Fraud Detection Software plays a pivotal role in the evolving landscape of online payment security. As cybercriminals become more adept at exploiting vulnerabilities, businesses are increasingly turning to specialized software to detect and prevent unauthorized access to user accounts. This type of fraud detection software employs advanced algorithms and machine learning techniques to monitor user behavior and identify anomalies that may indicate account takeover attempts. By analyzing login patterns, device information, and transaction history, these solutions can effectively flag suspicious activities and prevent unauthorized access. The integration of such software into existing security frameworks is crucial for businesses aiming to protect their customers' accounts and maintain trust in their digital platforms.



    The regional outlook for the online payment fraud detection market suggests a varied growth pattern. North America currently holds the largest market share due to the high adoption rate of digital payments and stringent regulatory landscape. Europe follows closely, driven by compliance requirements and the proliferation of online transactions. The Asia Pacific region is anticipated to witness the fastest growth, fueled by the rapid expansion of e-commerce and increasing digitalization in emerging economies. In contrast, regions like Latin America and the Middle East & Africa are gradually catching up, with growing awareness and investments in fraud detection technologies.



    Component Analysis



    The online payment fraud detection market is segmented by components into software and services. The software segment dominates the market, accounting for the lion's share of revenue. This segment includes various solutions such as fraud analytics, biometric authentication, and transaction screening. The continuous innovation in software tools to identify and prevent fraudulent activities is a significant driver for this segment. Companies are investing heavily in developing AI and ML-based software tools that offer real-time detection and response to fraud attempts.



    The software segment's growth is further propelled by the increasing demand for integrated fraud detection solutio

  10. Payment Fraud Detection Dataset

    • gomask.ai
    csv, json
    Updated Jul 29, 2025
    Cite
    GoMask.ai (2025). Payment Fraud Detection Dataset [Dataset]. https://gomask.ai/marketplace/datasets/payment-fraud-detection-dataset
    Available download formats: json, csv (10 MB)
    Dataset updated
    Jul 29, 2025
    Dataset provided by
    GoMask.ai
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Time period covered
    2024 - 2025
    Area covered
    Global
    Variables measured
    amount, currency, device_id, entry_mode, ip_address, customer_id, fraud_label, merchant_id, customer_age, fraud_reason, and 9 more
    Description

    This dataset contains detailed synthetic payment transaction records, each labeled with ground-truth indicators of fraud. It includes transaction metadata, customer and merchant identifiers, payment methods, device and location context, and fraud reasons, making it ideal for developing and benchmarking machine learning models for payment fraud detection and risk mitigation.

  11. Credit Card Fraud Dataset

    • kaggle.com
    Updated Jun 22, 2024
    Cite
    Dylan Moraes (2024). Credit Card Fraud Dataset [Dataset]. https://www.kaggle.com/datasets/dylanmoraes/credit-card-fraud-dataset/discussion
    Explore at:
    Croissant: a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Jun 22, 2024
    Dataset provided by
    Kaggle: http://kaggle.com/
    Authors
    Dylan Moraes
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Overview

    This dataset contains synthetic credit card transaction data designed for fraud detection and machine learning research. With over 6.3 million transactions, it provides a realistic simulation of financial transaction patterns including both legitimate and fraudulent activities.

    Source

    This is a synthetic dataset generated to simulate credit card transaction behavior. The data represents financial transactions over a 30-day period (743 hours) with various transaction types including payments, transfers, cash-outs, debits, and cash-ins.

    Purpose

    The dataset is specifically designed for:
    • Training and testing fraud detection models
    • Anomaly detection research
    • Binary classification tasks
    • Imbalanced learning scenarios
    • Financial machine learning applications

    Column Descriptions

    • step: Maps a unit of time in the real world. 1 step represents 1 hour of time. Range: 1 to 743
    • type: Type of transaction (PAYMENT, TRANSFER, CASH_OUT, DEBIT, CASH_IN)
    • amount: Amount of the transaction in local currency
    • nameOrig: Customer ID who initiated the transaction
    • oldbalanceOrg: Initial balance before the transaction (origin account)
    • newbalanceOrig: New balance after the transaction (origin account)
    • nameDest: Recipient ID of the transaction
    • oldbalanceDest: Initial recipient balance before the transaction
    • newbalanceDest: New recipient balance after the transaction
    • isFraud: Binary flag indicating fraud (1 = fraud, 0 = legitimate)
    • isFlaggedFraud: Flag for illegal attempts to transfer more than 200,000 in a single transaction

    Dataset Statistics

    • Total Transactions: 6,362,620
    • Fraudulent Transactions: 8,213 (~0.13%)
    • Legitimate Transactions: 6,354,407 (~99.87%)
    • Time Period: 30 days (743 hours)
    • File Size: 493.53 MB

    Class Imbalance Note

    This dataset exhibits significant class imbalance with only 0.13% fraudulent transactions. This mirrors real-world fraud detection scenarios where fraudulent transactions are rare. Consider using techniques such as:
    • SMOTE (Synthetic Minority Over-sampling Technique)
    • Undersampling of majority class
    • Cost-sensitive learning
    • Ensemble methods
    • Anomaly detection algorithms
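
    Of the techniques listed, undersampling the majority class is the simplest to sketch with pandas alone. The helper below is a hypothetical illustration on a toy frame; `undersample_majority` and its `ratio` parameter are our own names, and only the isFraud column comes from the dataset description.

```python
import pandas as pd


def undersample_majority(frame, label="isFraud", ratio=1.0, seed=0):
    """Keep every minority row and a random sample of majority rows.

    ratio is the desired number of majority rows per minority row.
    """
    minority = frame[frame[label] == 1]
    majority = frame[frame[label] == 0]
    n_keep = min(len(majority), int(len(minority) * ratio))
    sampled = majority.sample(n=n_keep, random_state=seed)
    # Shuffle so the two classes are not grouped together.
    return pd.concat([minority, sampled]).sample(frac=1.0, random_state=seed)


# Toy frame: 5 fraud rows and 95 legitimate rows (hypothetical values).
toy = pd.DataFrame({"amount": range(100), "isFraud": [1] * 5 + [0] * 95})
balanced = undersample_majority(toy, ratio=3.0)
```

    Resampling of this kind is normally applied to the training split only, so that the test split keeps the original class distribution.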

    Model Suitability

    This dataset is well-suited for:
    • Logistic Regression
    • Random Forest
    • Gradient Boosting (XGBoost, LightGBM, CatBoost)
    • Neural Networks
    • Isolation Forest
    • Autoencoders
    • Support Vector Machines

    Quick Start Example

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Load the dataset
    df = pd.read_csv('/kaggle/input/credit-card-fraud-dataset/Fraud.csv')
    
    # Display basic information
    print(df.info())
    print(df.head())
    
    # Check fraud distribution
    print(df['isFraud'].value_counts())
    
    # Visualize fraud distribution
    plt.figure(figsize=(8, 5))
    sns.countplot(data=df, x='isFraud')
    plt.title('Distribution of Fraud vs Legitimate Transactions')
    plt.xlabel('Is Fraud (0=No, 1=Yes)')
    plt.ylabel('Count')
    plt.show()
    
    # Transaction type distribution
    plt.figure(figsize=(10, 6))
    sns.countplot(data=df, x='type', hue='isFraud')
    plt.title('Transaction Types by Fraud Status')
    plt.xticks(rotation=45)
    plt.show()
    

    Usage Tips

    1. Handle Class Imbalance: Use appropriate sampling techniques or algorithms designed for imbalanced data
    2. Feature Engineering: Consider creating features like transaction velocity, time-based patterns, and balance differences
    3. Evaluation Metrics: Use precision, recall, F1-score, and AUC-ROC rather than accuracy due to class imbalance
    4. Cross-validation: Use stratified k-fold to maintain class distribution across folds
    5. Transaction Patterns: Analyze transaction types - TRANSFER and CASH_OUT are more associated with fraud
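
    The feature-engineering and transaction-pattern tips above can be sketched as follows. The column names come from the column descriptions; the derived feature names, the toy rows, and the mapping of step 1 to hour 0 are assumptions made for illustration.

```python
import pandas as pd


def add_balance_features(frame):
    """Add balance-difference and time-of-day features to the transaction columns."""
    out = frame.copy()
    # Amount that left the origin account and amount that reached the destination.
    out["orig_delta"] = out["oldbalanceOrg"] - out["newbalanceOrig"]
    out["dest_delta"] = out["newbalanceDest"] - out["oldbalanceDest"]
    # Zero when the origin balance change equals the transaction amount.
    out["orig_mismatch"] = out["orig_delta"] - out["amount"]
    # step counts hours from 1; treating step 1 as hour 0 is an assumption.
    out["hour_of_day"] = (out["step"] - 1) % 24
    return out


# Four hypothetical transactions.
toy = pd.DataFrame({
    "step": [1, 25, 30, 31],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "TRANSFER"],
    "amount": [100.0, 50.0, 200.0, 75.0],
    "oldbalanceOrg": [100.0, 80.0, 200.0, 75.0],
    "newbalanceOrig": [0.0, 30.0, 200.0, 0.0],
    "oldbalanceDest": [0.0, 0.0, 10.0, 0.0],
    "newbalanceDest": [100.0, 50.0, 10.0, 0.0],
    "isFraud": [0, 0, 1, 1],
})
features = add_balance_features(toy)

# Share of fraudulent rows within each transaction type.
fraud_rate_by_type = features.groupby("type")["isFraud"].mean()
```

    On the full dataset, the same groupby gives a direct check of the tip that TRANSFER and CASH_OUT are more associated with fraud.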

    Update Frequency

    This is a static dataset with no planned future updates. It serves as a benchmark for fraud detection research and model development.

    Acknowledgments

    This dataset is made available under the MIT License for educational and research purposes in the field of fraud detection and financial machine learning.

  12. Data from: CreditCardTransactions Dataset

    • cubig.ai
    Updated Jul 8, 2025
    Cite
    CUBIG (2025). CreditCardTransactions Dataset [Dataset]. https://cubig.ai/store/products/554/creditcardtransactions-dataset
    Dataset updated
    Jul 8, 2025
    Dataset authored and provided by
    CUBIG
    License

    https://cubig.ai/store/terms-of-service

    Measurement technique
    Privacy-preserving data transformation via differential privacy, Synthetic data generation using AI techniques for model training
    Description

    1) Data Introduction
    • The Credit_Card_Transactions Dataset is a representative sample dataset for building fraud detection models. It includes anonymized real-world transaction data such as financial transaction type, amount, sender/receiver account balance, and fraud indicators.

    2) Data Utilization
    (1) Characteristics of the Credit_Card_Transactions Dataset:
    • This dataset provides individual transaction records on a row-by-row basis and reflects the real-world class imbalance problem: the percentage of fraudulent transactions (isFraud=1) is extremely low.
    • It is an unprocessed raw data structure that lets you directly use key variables such as transaction time, amount, and account change.

    (2) Applications of the Credit_Card_Transactions Dataset:
    • Binary classification modeling: Fraud detection models can be developed by applying imbalance-handling techniques such as SMOTE and undersampling, and evaluated with appropriate metrics such as F1-score and ROC-AUC.
    • Real-time anomaly detection: It can be used to build a real-time anomaly detection system through analysis of transaction patterns (amount, frequency, account change).

  13. Financial Transaction Fraud Features

    • gomask.ai
    csv, json
    Updated Jul 12, 2025
    Cite
    GoMask.ai (2025). Financial Transaction Fraud Features [Dataset]. https://gomask.ai/marketplace/datasets/financial-transaction-fraud-features
    Available download formats: csv (10 MB), json
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    GoMask.ai
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2024 - 2025
    Area covered
    Global
    Variables measured
    channel, currency, is_fraud, device_id, account_id, fraud_type, merchant_id, location_city, location_state, transaction_id, and 9 more
    Description

    This dataset provides a detailed, feature-rich record of synthetic banking transactions, including transaction metadata, account and merchant information, contextual behavioral features, and fraud labels. It is ideal for developing, training, and benchmarking machine learning models for fraud detection and anomaly analysis in financial services.

  14. C

    Credit Card Fraud Detection Platform Report

    • datainsightsmarket.com
    doc, pdf, ppt
    Updated Jul 7, 2025
    + more versions
    Cite
    Data Insights Market (2025). Credit Card Fraud Detection Platform Report [Dataset]. https://www.datainsightsmarket.com/reports/credit-card-fraud-detection-platform-1982870
    Explore at:
    Available download formats: pdf, doc, ppt
    Dataset updated
    Jul 7, 2025
    Dataset authored and provided by
    Data Insights Market
    License

    https://www.datainsightsmarket.com/privacy-policy

    Time period covered
    2025 - 2033
    Area covered
    Global
    Variables measured
    Market Size
    Description

    The global credit card fraud detection platform market is experiencing robust growth, driven by the increasing prevalence of digital transactions and the sophistication of fraudulent activities. The market, estimated at $15 billion in 2025, is projected to maintain a healthy Compound Annual Growth Rate (CAGR) of 15% from 2025 to 2033, reaching approximately $45 billion by 2033. This expansion is fueled by several key factors: the rising adoption of e-commerce and mobile payments, the increasing volume of online transactions, the growing need for robust security measures among businesses to protect customer data and prevent financial losses, and the continuous evolution of fraud techniques necessitating advanced detection capabilities. Furthermore, the increasing regulatory scrutiny and compliance requirements are pushing organizations to invest heavily in sophisticated fraud detection systems.

    The market is segmented by deployment (cloud-based and on-premise), by organization size (small, medium, and large enterprises), and by industry vertical (banking, financial services, and insurance; retail; healthcare; and others). Key players in this dynamic market include established companies like Kount, ClearSale, Stripe Radar, Riskified, and FICO, alongside emerging technology providers like Akkio and Dataiku. These companies are constantly innovating to improve detection accuracy, reduce false positives, and offer seamless integration with existing payment processing systems.

    While challenges remain, such as the rising complexity of fraud schemes and the need to balance security with user experience, the market is poised for continued strong growth, driven by technological advancements in machine learning, artificial intelligence, and big data analytics. The increasing adoption of real-time fraud detection and advanced analytics capabilities will further shape the market landscape in the coming years, creating opportunities for both established and emerging players.
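The growth figures quoted above can be sanity-checked with the standard compound-growth formula. The sketch below assumes 2025 is the base year, which gives eight compounding periods to 2033; the report itself does not state its convention, and the function names `project` and `cagr` are illustrative.

```python
def project(value, rate, years):
    """Compound `value` at annual `rate` for `years` periods."""
    return value * (1 + rate) ** years

def cagr(start, end, years):
    """Annual growth rate that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1 / years) - 1

# Figures quoted in the description: USD 15 billion in 2025, 15% CAGR, 2025 to 2033.
periods = 2033 - 2025  # assumption: 2025 is the base year
projected_2033 = project(15.0, 0.15, periods)
implied_rate = cagr(15.0, 45.0, periods)

print(f"projected 2033 size: {projected_2033:.1f} billion")
print(f"rate implied by 15 -> 45: {implied_rate:.1%}")
```

Under that assumption the projection comes out slightly above USD 45 billion and the implied rate slightly below 15%, so the quoted numbers are consistent once rounding is taken into account.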

  15. D

    Financial Anti Fraud Software Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Jan 7, 2025
    Cite
    Dataintelo (2025). Financial Anti Fraud Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/financial-anti-fraud-software-market
    Explore at:
    Available download formats: pptx, pdf, csv
    Dataset updated
    Jan 7, 2025
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Financial Anti-Fraud Software Market Outlook



    The global financial anti-fraud software market size was valued at USD 6.5 billion in 2023 and is projected to reach USD 15.8 billion by 2032, growing at a CAGR of 10.4% during the forecast period. The market is expected to witness significant growth driven by the increasing sophistication of cyber-attacks and the rising need for robust fraud detection mechanisms. Factors such as the rising digitization of financial transactions and stringent regulatory requirements are also contributing to the market's expansion.



    One of the primary growth factors for the financial anti-fraud software market is the increasing sophistication of cyber-attacks. As cybercriminals employ more advanced techniques, organizations are compelled to adopt equally advanced systems to detect and prevent fraudulent activities. The use of artificial intelligence (AI) and machine learning (ML) in these software solutions has enabled real-time analysis and detection of anomalies, making it more difficult for fraudsters to succeed. Moreover, as financial institutions increasingly rely on digital channels, the exposure to potential security breaches has surged, necessitating advanced anti-fraud measures.



    Another significant growth factor is the regulatory environment. Governments and regulatory bodies worldwide are implementing stringent policies to ensure the security of financial transactions. Compliance with these regulations requires financial institutions to adopt robust anti-fraud solutions. For instance, regulations like the General Data Protection Regulation (GDPR) in Europe and the Payment Card Industry Data Security Standard (PCI DSS) mandate rigorous data protection measures, which, in turn, drives the demand for advanced fraud detection software. The need for compliance not only mitigates risks but also builds customer trust.



    Additionally, the rising digitization of financial services is a substantial growth driver. The shift from traditional banking methods to digital platforms has led to an increase in online transactions. While this transition offers convenience and efficiency, it also opens up new avenues for fraud. Financial institutions are investing heavily in anti-fraud software to safeguard their digital platforms. This includes mobile banking, online transactions, and even cryptocurrency exchanges. As digital financial activities continue to grow, the market for anti-fraud solutions is expected to expand correspondingly.



    Fraud Risk Management Services play a crucial role in the financial sector by providing a comprehensive approach to identifying, assessing, and mitigating fraud risks. These services encompass a range of activities, including fraud risk assessments, the development of anti-fraud strategies, and the implementation of robust controls to prevent fraudulent activities. By leveraging data analytics and advanced technologies, fraud risk management services enable financial institutions to proactively detect and respond to potential threats. This proactive approach not only helps in minimizing financial losses but also enhances the overall security posture of organizations. As the financial landscape continues to evolve, the demand for specialized fraud risk management services is expected to rise, driven by the increasing complexity of fraud schemes and the need for compliance with regulatory requirements.



    On the regional front, North America currently holds the largest market share, driven by the high adoption rate of advanced technologies and stringent regulatory requirements. However, the Asia Pacific region is anticipated to witness the highest growth rate during the forecast period. Factors such as the rapid digitization of financial services, increasing internet penetration, and growing awareness about financial fraud are contributing to this growth. Countries like China and India are expected to be major contributors due to their large population base and increasing adoption of digital financial services.



    Component Analysis



    The financial anti-fraud software market can be segmented by component into software and services. The software segment holds the largest market share due to the increasing adoption of advanced fraud detection technologies by financial institutions. These software solutions incorporate advanced analytics, machine learning algorithms, and artificial intelligence to provide real-time fraud detection and prevention. Companies are continually investing in R&D to e

  16. D

    Ai Based Fraud Detection Tools Market Report | Global Forecast From 2025 To...

    • dataintelo.com
    csv, pdf, pptx
    Updated Oct 16, 2024
    Cite
    Dataintelo (2024). Ai Based Fraud Detection Tools Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/ai-based-fraud-detection-tools-market
    Explore at:
    Available download formats: pptx, pdf, csv
    Dataset updated
    Oct 16, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    AI-Based Fraud Detection Tools Market Outlook



    The global AI-based fraud detection tools market size was valued at approximately USD 6.5 billion in 2023 and is projected to reach USD 22.8 billion by 2032, growing at a robust CAGR of 15.1% during the forecast period. The significant growth factors driving this market include the increasing sophistication of fraudulent activities, the growing adoption of AI and machine learning technologies in various sectors, and the heightened demand for real-time fraud detection solutions.



    One of the primary growth factors for the AI-based fraud detection tools market is the rising complexity of fraudulent activities. In today's digital age, fraudsters are employing increasingly sophisticated techniques to breach security systems, making traditional detection methods inadequate. AI-based solutions, which leverage advanced algorithms and machine learning, are capable of analyzing large volumes of data to identify patterns and anomalies indicative of fraud. This capability is crucial for organizations seeking to protect their assets and maintain customer trust in an environment where cyber threats are continually evolving.



    Another significant growth driver is the widespread adoption of AI and machine learning technologies across various industries. Businesses are recognizing the potential of these technologies to enhance their fraud detection capabilities, leading to increased investments in AI-driven solutions. The banking and financial services sector, in particular, has been at the forefront of adopting AI-based fraud detection tools to combat financial crimes such as identity theft, credit card fraud, and money laundering. Furthermore, the retail and e-commerce sectors are increasingly implementing these tools to safeguard against fraudulent transactions and account takeovers.



    The growing demand for real-time fraud detection solutions is also propelling the market forward. Traditional fraud detection systems often rely on rule-based approaches that can be slow and reactive, allowing fraudulent activities to go undetected until significant damage has been done. In contrast, AI-based solutions can process and analyze data in real-time, enabling organizations to identify and respond to threats rapidly. This real-time capability is essential for minimizing losses and mitigating risks, particularly in sectors where the speed of transactions is critical, such as online retail and financial services.



    Regionally, North America currently dominates the AI-based fraud detection tools market, owing to the high adoption rate of advanced technologies and the presence of major industry players. However, other regions like Asia Pacific and Europe are also experiencing significant growth. Asia Pacific, in particular, is expected to exhibit the highest CAGR during the forecast period, driven by the increasing digitization of economies, rising internet penetration, and the growing awareness of cybersecurity threats. Europe is also witnessing substantial growth due to stringent regulatory requirements and the increasing focus on data privacy and security.



    Component Analysis



    The AI-based fraud detection tools market can be segmented by component into software, hardware, and services. The software segment is expected to hold the largest market share during the forecast period. This dominance can be attributed to the continuous advancements in AI algorithms and machine learning models, which enhance the accuracy and efficiency of fraud detection systems. Furthermore, the software solutions are designed to be scalable and easily integrated into existing systems, making them an attractive option for organizations of all sizes.



    Hardware components, though not as dominant as software, play a crucial role in the deployment of AI-based fraud detection systems. High-performance computing hardware, including GPUs and specialized AI processors, are essential for handling the large datasets and complex computations required for real-time fraud detection. As the demand for more powerful and efficient hardware grows, this segment is expected to see steady growth, particularly in large enterprises that require robust infrastructure to support their AI initiatives.



    The services segment, encompassing consulting, integration, and maintenance services, is also poised for significant growth. Organizations often lack the in-house expertise required to develop and implement AI-based fraud detection systems, leading to an increased reliance on external service providers. These services help organizations to customize and opti

  17. G

    Real-Time Fraudulent Transaction Alerts

    • gomask.ai
    csv, json
    Updated Jul 21, 2025
    Cite
    GoMask.ai (2025). Real-Time Fraudulent Transaction Alerts [Dataset]. https://gomask.ai/marketplace/datasets/real-time-fraudulent-transaction-alerts
    Explore at:
    Available download formats: json, csv (10 MB)
    Dataset updated
    Jul 21, 2025
    Dataset provided by
    GoMask.ai
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2024 - 2025
    Area covered
    Global
    Variables measured
    channel, alert_id, device_id, account_id, customer_id, fraud_score, alert_status, fraud_pattern, is_fraudulent, merchant_name, and 12 more
    Description

    This dataset provides comprehensive, real-time records of suspicious financial transactions flagged for potential fraud in digital banking platforms. It includes detailed transaction data, risk scores, fraud patterns, investigation outcomes, and contextual information such as device, channel, and location. Ideal for developing and benchmarking fraud detection models, auditing risk management processes, and supporting regulatory compliance.

  18. G

    Credit Card Fraud Detection

    • gomask.ai
    csv, json
    Updated Jul 12, 2025
    Cite
    GoMask.ai (2025). Credit Card Fraud Detection [Dataset]. https://gomask.ai/marketplace/datasets/credit-card-fraud-detection
    Explore at:
    Available download formats: csv (10 MB), json
    Dataset updated
    Jul 12, 2025
    Dataset provided by
    GoMask.ai
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2024 - 2025
    Area covered
    Global
    Variables measured
    is_fraud, entry_mode, card_number, merchant_id, cardholder_id, currency_code, cardholder_age, transaction_id, is_international, transaction_city, and 7 more
    Description

    This dataset provides detailed, labeled records of simulated credit card transactions, including transaction amounts, merchant and cardholder information, and fraud indicators. It is ideal for developing and benchmarking machine learning models aimed at detecting fraudulent activity and reducing financial risk in payment systems. The inclusion of transaction context and cardholder demographics supports advanced analytics and feature engineering.

  19. G

    Banking Transaction Graphs Dataset

    • gomask.ai
    csv, json
    Updated Jul 22, 2025
    Cite
    GoMask.ai (2025). Banking Transaction Graphs Dataset [Dataset]. https://gomask.ai/marketplace/datasets/banking-transaction-graphs-dataset
    Explore at:
    Available download formats: csv (10 MB), json
    Dataset updated
    Jul 22, 2025
    Dataset provided by
    GoMask.ai
    License

    CC0 1.0 Universal Public Domain Dedication (https://creativecommons.org/publicdomain/zero/1.0/)
    License information was derived automatically

    Time period covered
    2024 - 2025
    Area covered
    Global
    Variables measured
    amount, channel, currency, timestamp, origin_country, reference_note, transaction_id, is_international, transaction_type, sender_account_id, and 6 more
    Description

    This dataset provides detailed, interconnected banking transaction records, capturing sender and receiver relationships, transaction metadata, and anomaly flags. Designed for network analytics, it enables advanced anti-money laundering (AML) detection, fraud analysis, and financial behavior modeling by representing transactions as a directed graph. The flat structure ensures easy integration with machine learning and graph analytics tools.
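As a rough illustration of how flat transaction rows can be treated as a directed graph, the sketch below builds an adjacency list keyed by sender. The columns `transaction_id`, `sender_account_id`, and `amount` appear in the listed variables; `receiver_account_id` is an assumed name for the receiving side (it is not in the visible variable list), and the sample rows are made up.

```python
from collections import defaultdict

# Made-up sample rows; `receiver_account_id` is an assumed column name.
transactions = [
    {"transaction_id": "t1", "sender_account_id": "A", "receiver_account_id": "B", "amount": 120.0},
    {"transaction_id": "t2", "sender_account_id": "B", "receiver_account_id": "C", "amount": 115.0},
    {"transaction_id": "t3", "sender_account_id": "A", "receiver_account_id": "C", "amount": 40.0},
]

def build_graph(rows):
    """Map each sender to a list of (receiver, amount, transaction_id) edges."""
    graph = defaultdict(list)
    for row in rows:
        graph[row["sender_account_id"]].append(
            (row["receiver_account_id"], row["amount"], row["transaction_id"])
        )
    return graph

def out_degree(graph):
    """Number of outgoing transfers per sending account."""
    return {account: len(edges) for account, edges in graph.items()}

def reachable(graph, start):
    """Accounts reachable from `start` by following transfers (iterative DFS)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for receiver, _amount, _tx in graph.get(node, []):
            if receiver not in seen:
                seen.add(receiver)
                stack.append(receiver)
    return seen

graph = build_graph(transactions)
```

Reachability of this kind is one simple building block for tracing chains of transfers; a dedicated graph library would be a more practical choice for data of realistic size.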

  20. D

    Fraud Detection and Prevention (FDP) Software Market Report | Global...

    • dataintelo.com
    csv, pdf, pptx
    Updated Dec 3, 2024
    + more versions
    Cite
    Dataintelo (2024). Fraud Detection and Prevention (FDP) Software Market Report | Global Forecast From 2025 To 2033 [Dataset]. https://dataintelo.com/report/fraud-detection-and-prevention-fdp-software-market
    Explore at:
    Available download formats: pdf, pptx, csv
    Dataset updated
    Dec 3, 2024
    Dataset authored and provided by
    Dataintelo
    License

    https://dataintelo.com/privacy-and-policy

    Time period covered
    2024 - 2032
    Area covered
    Global
    Description

    Fraud Detection and Prevention (FDP) Software Market Outlook



    In 2023, the global market size for Fraud Detection and Prevention (FDP) software is projected to be valued at approximately USD 25 billion. This burgeoning market is anticipated to escalate with a compound annual growth rate (CAGR) of 11% from 2024 to 2032, reaching an estimated USD 58 billion by the end of the forecast period. The proliferation of digital transactions, coupled with the increasing sophistication of cyber threats, is propelling the adoption of FDP solutions across various industry sectors. The market's growth is further fueled by an escalating demand for advanced analytics and machine learning technologies, which are integral to modern fraud detection mechanisms.



    The burgeoning volume of online transactions, driven by the rapid uptake of e-commerce and digital payment solutions, is one of the primary growth factors of the FDP software market. As businesses transition to digital platforms, they face heightened exposure to fraud risks, necessitating robust fraud detection solutions. The expansion of the e-commerce sector has particularly intensified the need for comprehensive digital security strategies, as fraudulent activities such as identity theft, payment fraud, and account takeovers become increasingly prevalent. FDP software, leveraging advanced algorithms and real-time analytics, plays a pivotal role in mitigating such risks, thereby safeguarding businesses and consumers alike.



    Moreover, the increasing regulatory pressures worldwide are another significant driver for the FDP software market. Governments and regulatory bodies are intensifying their focus on data protection and financial integrity, mandating businesses to implement stringent fraud prevention measures. Compliance with regulations such as the GDPR in Europe and CCPA in California demands sophisticated fraud detection systems to ensure data privacy and security. Consequently, businesses are increasingly investing in FDP solutions to not only protect themselves from fraud but also to remain compliant with evolving legal requirements.



    Furthermore, technological advancements in artificial intelligence and machine learning are revolutionizing the fraud detection landscape, contributing to market growth. These technologies enable the development of intelligent systems capable of identifying suspicious activities with greater accuracy and speed. Machine learning models can learn from historical data to predict potential fraudulent activities, thus allowing businesses to proactively address security threats. The integration of AI in FDP solutions enhances their ability to adapt to new and ever-evolving fraud tactics, ensuring continuous protection for enterprises across various sectors.



    Regionally, North America holds a significant share of the FDP software market, primarily due to the high adoption of advanced technologies and the presence of key market players. The region's strong financial infrastructure and the prevalence of online transactions further boost the demand for FDP solutions. The Asia Pacific region is poised for the highest growth rate during the forecast period, driven by digital transformation initiatives across emerging economies and increasing internet penetration. In Europe, stringent data protection regulations and a high concentration of e-commerce activities are driving the adoption of FDP software. Latin America and the Middle East & Africa are also witnessing growing interest in fraud prevention solutions, although these regions are still developing in terms of technological infrastructure.



    Component Analysis



    In the Fraud Detection and Prevention software market, the component segment is bifurcated into software and services. The software component is further sub-divided into various types of applications and platforms that cater to different aspects of fraud detection, such as identity verification, transaction monitoring, and behavioral analysis. The software division constitutes the lion's share of the market, as businesses prioritize robust technological solutions to combat sophisticated fraud techniques. These software solutions leverage machine learning, data analytics, and artificial intelligence to deliver real-time insights and predictive analytics, which are essential for identifying and mitigating fraudulent activities swiftly.



    On the other hand, the services component encompasses support and maintenance services, consulting, and training. These services are critical for the effective deployment and functioning of FDP software solutions. Service providers offer expertise

Cite
Ajdina Grizhja (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82556/yvxj-9t22

Credit Card Fraud Detection

Explore at:
Available download formats: text/markdown, csv, pdf, txt, json
Dataset updated
Apr 28, 2025
Dataset provided by
TU Wien
Authors
Ajdina Grizhja
License

Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically

Time period covered
Apr 28, 2025
Description

Below is a draft DMP–style description of your credit‐card fraud detection experiment, modeled on the antiquities example:

1. Dataset Description

Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.

Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.

Data Sources
We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.

Method of Dataset Preparation

  1. Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.

  2. Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).

  3. Splitting: Programmatically derived three subsets—training (70%), validation (15%), test (15%)—using range‐based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.

  4. Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non‐feature identifiers (actionnr, merchant_id).

  5. Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
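The splitting and cleaning steps above can be sketched locally in plain Python. This is an illustrative analogue, not the DBRepo workflow itself: the range-based filter is approximated by slicing rows sorted on `actionnr`, the sample rows are made up, and the helper names `split_by_key` and `clean` are assumptions. The RandomForest step is left out to keep the sketch free of third-party dependencies.

```python
FLAG_COLUMNS = ["is_declined", "isforeigntransaction", "ishighriskcountry", "isfradulent"]
ID_COLUMNS = ["actionnr", "merchant_id"]

def split_by_key(rows, key="actionnr", train_pct=70, val_pct=15):
    """Split rows into three contiguous ranges of the sorted primary key."""
    ordered = sorted(rows, key=lambda r: r[key])
    n = len(ordered)
    train_end = n * train_pct // 100          # integer math avoids float rounding
    val_end = train_end + n * val_pct // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]

def clean(row):
    """Map "Y"/"N" flags to 1/0 and drop the non-feature identifier columns."""
    out = {k: v for k, v in row.items() if k not in ID_COLUMNS}
    for col in FLAG_COLUMNS:
        out[col] = 1 if out[col] == "Y" else 0
    return out

# Made-up rows using a subset of the columns named in the description.
rows = [
    {"actionnr": i, "merchant_id": f"m{i}", "transaction_amount": 10.0 * i,
     "is_declined": "N", "isforeigntransaction": "Y" if i % 2 else "N",
     "ishighriskcountry": "N", "isfradulent": "Y" if i % 10 == 0 else "N"}
    for i in range(1, 21)
]

train_rows, val_rows, test_rows = split_by_key(rows)
train_clean = [clean(r) for r in train_rows]
```

Because the split follows key order rather than random sampling, the fraud rate can differ between subsets; checking the label counts per subset before training is a sensible precaution.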

2. Technical Details

Dataset Structure

  • The raw data is a single CSV with columns:

    • actionnr (integer transaction ID)

    • merchant_id (string)

    • average_amount_transaction_day (float)

    • transaction_amount (float)

    • is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)

    • total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)

Naming Conventions

  • All columns use lowercase snake_case.

  • Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.

  • Files in the code repo follow a clear structure:

    ├── data/         # local copies only; raw data lives in DBRepo 
    ├── notebooks/Task.ipynb 
    ├── models/rf_model_v1.joblib 
    ├── outputs/        # confusion_matrix.png, roc_curve.png, predictions.csv 
    ├── README.md 
    ├── requirements.txt 
    └── codemeta.json 
    

Required Software

  • Python 3.9+

  • pandas, numpy (data handling)

  • scikit-learn (modeling, metrics)

  • matplotlib (visualizations)

  • dbrepo‐client.py (DBRepo API)

  • requests (TU WRD API)

Additional Resources

3. Further Details

Data Limitations

  • Highly imbalanced: only ~0.17% of transactions are fraudulent.

  • Anonymized PCA features (V1–V28) are hidden; we extended the data with domain features but cannot reverse-engineer the raw variables.

  • Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.
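The imbalance figure follows directly from the counts given under Data Sources (492 fraudulent transactions out of 284 807). A short sketch of the arithmetic, which also shows why plain accuracy is a weak metric here; the "always predict not fraud" baseline is a hypothetical comparison, not a model from this experiment:

```python
total = 284_807   # transactions reported under Data Sources
fraud = 492       # fraudulent transactions reported under Data Sources

fraud_rate = fraud / total

# Hypothetical baseline: a classifier that labels every transaction "not fraud".
baseline_accuracy = (total - fraud) / total
baseline_recall = 0 / fraud   # it never catches a fraudulent transaction

print(f"fraud rate: {fraud_rate:.4%}")
print(f"baseline accuracy: {baseline_accuracy:.4%}")
print(f"baseline recall on fraud: {baseline_recall:.0%}")
```

A model therefore has to be judged on fraud-class metrics such as precision, recall, or F1 rather than on overall accuracy.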

Licensing and Attribution

  • Raw data: CC-0 (per Kaggle terms)

  • Code & notebooks: MIT License

  • Model artifacts & outputs: CC-BY 4.0

  • DUWRD records include ORCID identifiers for the author.

Recommended Uses

  • Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.

  • Educational purposes: demonstrating model‐training pipelines, FAIR data practices.

  • Extension: adding time‐series or deep‐learning models.

Known Issues

  • Possible temporal leakage if date/time features are not handled correctly.

  • Model performance may degrade on live data due to concept drift.

  • Binary flags may oversimplify nuanced transaction outcomes.
