64 datasets found
  1. Bank Account Fraud Dataset Suite (NeurIPS 2022)

    • kaggle.com
    Updated Nov 29, 2023
    Cite
    Sérgio Jesus (2023). Bank Account Fraud Dataset Suite (NeurIPS 2022) [Dataset]. https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022
    Explore at:
    Croissant is a format for machine-learning datasets. Learn more about this at mlcommons.org/croissant.
    Dataset updated
    Nov 29, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Sérgio Jesus
    License

    Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
    License information was derived automatically

    Description

    The Bank Account Fraud (BAF) suite of datasets was published at NeurIPS 2022 and comprises a total of 6 different synthetic bank account fraud tabular datasets. BAF is a realistic, complete, and robust test bed to evaluate novel and existing methods in ML and fair ML, and the first of its kind!

    This suite of datasets is:
    - Realistic, based on a present-day real-world dataset for fraud detection;
    - Biased, each dataset has distinct controlled types of bias;
    - Imbalanced, this setting presents an extremely low prevalence of the positive class;
    - Dynamic, with temporal data and observed distribution shifts;
    - Privacy preserving, to protect the identity of potential applicants we have applied differential privacy techniques (noise addition), feature encoding and trained a generative model (CTGAN).

    Each dataset is composed of:
    - 1 million instances;
    - 30 realistic features used in the fraud detection use-case;
    - A column of “month”, providing temporal information about the dataset;
    - Protected attributes (age group, employment status and % income).

    Detailed information (datasheet) on the suite: https://github.com/feedzai/bank-account-fraud/blob/main/documents/datasheet.pdf

    Check out the GitHub repository for more resources and some example notebooks: https://github.com/feedzai/bank-account-fraud

    Read the NeurIPS 2022 paper here: https://arxiv.org/abs/2211.13358

    Learn more about Feedzai Research here: https://research.feedzai.com/

    Please use the following citation for the BAF dataset suite:

    @article{jesusTurningTablesBiased2022,
      title={Turning the {{Tables}}: {{Biased}}, {{Imbalanced}}, {{Dynamic Tabular Datasets}} for {{ML Evaluation}}},
      author={Jesus, S{\'e}rgio and Pombal, Jos{\'e} and Alves, Duarte and Cruz, Andr{\'e} and Saleiro, Pedro and Ribeiro, Rita P. and Gama, Jo{\~a}o and Bizarro, Pedro},
      journal={Advances in Neural Information Processing Systems},
      year={2022}
    }

  2. Vehicle Insurance Claim Fraud Detection

    • kaggle.com
    Updated Dec 20, 2021
    Cite
    Shivam Bansal (2021). Vehicle Insurance Claim Fraud Detection [Dataset]. https://www.kaggle.com/datasets/shivamb/vehicle-claim-fraud-detection
    Dataset updated
    Dec 20, 2021
    Dataset provided by
    Kaggle
    Authors
    Shivam Bansal
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Vehicle Insurance Fraud Detection

    Vehicle insurance fraud involves conspiring to make false or exaggerated claims involving property damage or personal injuries following an accident. Common examples include staged accidents, where fraudsters deliberately “arrange” for accidents to occur; phantom passengers, where people who were not even at the scene of the accident claim to have suffered grievous injury; and false personal injury claims, where injuries are grossly exaggerated.

    About this dataset

    This dataset contains vehicle details (attributes, model, accident details, etc.) along with policy details (policy type, tenure, etc.). The target, FraudFound_P, indicates whether a claim application is fraudulent.

  3. Credit Card Fraud Detection

    • test.researchdata.tuwien.ac.at
    • zenodo.org
    • +1 more
    csv, json, pdf +2
    Updated Apr 28, 2025
    Cite
    Ajdina Grizhja (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82556/yvxj-9t22
    Explore at:
    Available download formats: text/markdown, csv, pdf, txt, json
    Dataset updated
    Apr 28, 2025
    Dataset provided by
    TU Wien
    Authors
    Ajdina Grizhja
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Time period covered
    Apr 28, 2025
    Description

    Below is a draft DMP–style description of your credit‐card fraud detection experiment, modeled on the antiquities example:

    1. Dataset Description

    Research Domain
    This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.

    Purpose
    The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.

    Data Sources
    We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.

    Method of Dataset Preparation

    1. Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.

    2. Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).

    3. Splitting: Programmatically derived three subsets—training (70%), validation (15%), test (15%)—using range‐based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.

    4. Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non‐feature identifiers (actionnr, merchant_id).

    5. Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
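
    The splitting and cleaning steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `split_by_key` and `clean` and the toy rows are hypothetical, and the actual work was done through DBRepo range filters, not this code. Only the column names and the 70/15/15 proportions come from the description.

```python
# Hypothetical sketch of steps 3 and 4: range-based split on the primary key,
# then flag conversion and removal of non-feature identifiers.
FLAG_COLUMNS = ("is_declined", "isforeigntransaction", "ishighriskcountry", "isfradulent")
ID_COLUMNS = ("actionnr", "merchant_id")


def split_by_key(rows, key="actionnr", train_pct=70, val_pct=15):
    """Split rows into train/validation/test using contiguous ranges of `key`."""
    ordered = sorted(rows, key=lambda r: r[key])
    n_train = len(ordered) * train_pct // 100  # integer arithmetic avoids float rounding
    n_val = len(ordered) * val_pct // 100
    return (
        ordered[:n_train],
        ordered[n_train:n_train + n_val],
        ordered[n_train + n_val:],
    )


def clean(row):
    """Map "Y"/"N" flags to 1/0 and drop identifier columns."""
    return {
        name: (1 if value == "Y" else 0) if name in FLAG_COLUMNS else value
        for name, value in row.items()
        if name not in ID_COLUMNS
    }


# Toy rows (made up for illustration; not taken from the dataset).
rows = [
    {"actionnr": i, "merchant_id": f"m{i}", "transaction_amount": 10.0 * i,
     "is_declined": "N", "isfradulent": "Y" if i % 10 == 0 else "N"}
    for i in range(1, 21)
]
train, val, test = split_by_key(rows)
train_clean = [clean(r) for r in train]
```

    Splitting on the key before cleaning mirrors the order of the steps listed above; whether the real key ranges were contiguous in this way is an assumption.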

    2. Technical Details

    Dataset Structure

    • The raw data is a single CSV with columns:

      • actionnr (integer transaction ID)

      • merchant_id (string)

      • average_amount_transaction_day (float)

      • transaction_amount (float)

      • is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)

      • total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)

    Naming Conventions

    • All columns use lowercase snake_case.

    • Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.

    • Files in the code repo follow a clear structure:

      ├── data/         # local copies only; raw data lives in DBRepo 
      ├── notebooks/Task.ipynb 
      ├── models/rf_model_v1.joblib 
      ├── outputs/        # confusion_matrix.png, roc_curve.png, predictions.csv 
      ├── README.md 
      ├── requirements.txt 
      └── codemeta.json 
      

    Required Software

    • Python 3.9+

    • pandas, numpy (data handling)

    • scikit-learn (modeling, metrics)

    • matplotlib (visualizations)

    • dbrepo‐client.py (DBRepo API)

    • requests (TU WRD API)

    Additional Resources

    3. Further Details

    Data Limitations

    • Highly imbalanced: only ~0.17% of transactions are fraudulent.

    • The anonymized PCA features (V1 to V28) hide the raw variables; we extended the data with domain features but cannot reverse-engineer the original variables.

    • Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.

    Licensing and Attribution

    • Raw data: CC-0 (per Kaggle terms)

    • Code & notebooks: MIT License

    • Model artifacts & outputs: CC-BY 4.0

    • TU WRD records include ORCID identifiers for the author.

    Recommended Uses

    • Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.

    • Educational purposes: demonstrating model‐training pipelines, FAIR data practices.

    • Extension: adding time‐series or deep‐learning models.

    Known Issues

    • Possible temporal leakage if date/time features are not handled correctly.

    • Model performance may degrade on live data due to concept drift.

    • Binary flags may oversimplify nuanced transaction outcomes.

  4. credit card fraud detection dataset

    • kaggle.com
    Updated Jul 15, 2023
    + more versions
    Cite
    Muste A(M.A) (2023). credit card fraud detection dataset [Dataset]. https://www.kaggle.com/datasets/mustefaage22m014/credit-card-fraud-detection-dataset
    Dataset updated
    Jul 15, 2023
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Muste A(M.A)
    Description

    Dataset

    This dataset was created by Muste A(M.A)

    Contents

  5. Fraud Detection - Financial transactions

    • find.data.gov.scot
    csv
    Updated Mar 14, 2018
    Cite
    Deloitte Datathon 2018 (uSmart) (2018). Fraud Detection - Financial transactions [Dataset]. https://find.data.gov.scot/datasets/39167
    Explore at:
    Available download formats: csv (470.6714 MB)
    Dataset updated
    Mar 14, 2018
    Dataset provided by
    Deloitte (https://deloitte.com/)
    Description

    Synthetic transactional data with labels for fraud detection. For more information, see: https://www.kaggle.com/ntnu-testimon/paysim1/version/2

  6. Credit Card Fraud Detection

    • berd-platform.de
    csv
    Updated Jul 31, 2025
    + more versions
    Cite
    Kaggle (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82939/qcqqe-g6q16
    Explore at:
    Available download formats: csv
    Dataset updated
    Jul 31, 2025
    Dataset provided by
    Kaggle
    Description

    The dataset contains transactions made by credit cards in September 2013 by European cardholders. It presents transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. The dataset is 0.15 GB in size.
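
    As a quick check of the imbalance figures quoted above, the following sketch recomputes the fraud prevalence from the two counts. The class-weight lines are an added assumption: they use the common "balanced" heuristic, n_samples / (n_classes * class_count), which is not something the dataset description prescribes.

```python
# Counts taken from the description above.
n_total = 284_807
n_fraud = 492
n_legit = n_total - n_fraud

# Share of fraudulent transactions, in percent.
prevalence_pct = 100 * n_fraud / n_total

# How many legitimate transactions there are per fraudulent one.
legit_per_fraud = n_legit / n_fraud

# Assumed "balanced" class-weight heuristic: n_samples / (n_classes * class_count).
weight_fraud = n_total / (2 * n_fraud)
weight_legit = n_total / (2 * n_legit)

print(f"fraud prevalence: {prevalence_pct:.4f}%")
print(f"legitimate per fraud: {legit_per_fraud:.1f}")
print(f"class weights: fraud={weight_fraud:.1f}, legit={weight_legit:.3f}")
```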

    The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.

  7. fraud detection

    • kaggle.com
    Updated Jul 21, 2021
    Cite
    IAbhishekBhardwaj (2021). fraud detection [Dataset]. https://www.kaggle.com/datasets/iabhishekbhardwaj/fraud-detection
    Dataset updated
    Jul 21, 2021
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    IAbhishekBhardwaj
    Description

    Dataset

    This dataset was created by IAbhishekBhardwaj

    Contents

  8. CCFD_dataset

    • figshare.com
    xlsx
    Updated May 30, 2023
    Cite
    Nur Amirah Ishak; Keng-Hoong Ng; Gee-Kok Tong; Suraya Nurain Kalid; Kok-Chin Khor (2023). CCFD_dataset [Dataset]. http://doi.org/10.6084/m9.figshare.16695616.v3
    Explore at:
    Available download formats: xlsx
    Dataset updated
    May 30, 2023
    Dataset provided by
    Figshare (http://figshare.com/)
    Authors
    Nur Amirah Ishak; Keng-Hoong Ng; Gee-Kok Tong; Suraya Nurain Kalid; Kok-Chin Khor
    License

    CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
    License information was derived automatically

    Description

    The dataset was released by [1] and was collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of Université Libre de Bruxelles (ULB) on big data mining and fraud detection.

    [1] Dal Pozzolo, A., Caelen, O., Johnson, R. A., and Bontempi, G. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. 2015 IEEE Symposium Series on Computational Intelligence, pp. 159-166, doi: 10.1109/SSCI.2015.33

    Open-source copy on Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud

  9. Fraud Detection in Financial Transactions

    • kaggle.com
    Updated Jan 17, 2025
    Cite
    Darshan Dalvi (2025). Fraud Detection in Financial Transactions [Dataset]. https://www.kaggle.com/datasets/darshandalvi12/fraud-detection-in-financial-transactions/discussion?sort=undefined
    Dataset updated
    Jan 17, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Darshan Dalvi
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Credit Card Fraud Detection Dataset (Updated)

    This dataset contains 284,807 transactions from a credit card company, where 492 transactions are fraudulent. The data is highly imbalanced, with only a small fraction of transactions being fraudulent. The dataset is commonly used to build and evaluate fraud detection models.

    Dataset Details:

    • Number of Transactions: 284,807
    • Fraudulent Transactions: 492 (Highly Imbalanced)
    • Features:
      • 28 anonymized features (V1 to V28)
      • Transaction amount
      • Timestamp
    • Label:
      • 0: Legitimate
      • 1: Fraudulent

    Data Preprocessing:

    • SMOTE (Synthetic Minority Oversampling Technique) has been applied to address the class imbalance in the dataset, generating synthetic examples for the minority class (fraudulent transactions).
    • Additional Operations: Various preprocessing steps were performed, including data cleaning and feature engineering, to ensure the quality of the dataset for model training.
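
    To make the oversampling idea concrete, here is a simplified, standard-library sketch of SMOTE-style interpolation. It is not the code used to build this dataset: the function name `smote_like`, the toy points and the parameter values are all hypothetical, and a real pipeline would more likely rely on an established implementation such as the one in imbalanced-learn.

```python
import math
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority-class points (made up for illustration).
fraud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(fraud, n_new=6)
```

    Each synthetic point lies on a line segment between two existing minority points, which is why the technique adds no information beyond what the original fraud cases already contain.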

    Processed Files:

    The dataset has been split into training and testing sets and saved in the following files:
    - X_train.csv: Feature data for the training set
    - X_test.csv: Feature data for the testing set
    - y_train.csv: Labels for the training set (fraudulent or legitimate)
    - y_test.csv: Labels for the testing set

    This updated dataset is ready to be used for training and evaluating machine learning models, specifically designed for credit card fraud detection tasks.

    This description highlights the key aspects of the dataset, including its preprocessing steps and the availability of the processed files for ease of use.

  10. Credit Card Fraud Detection Dataset

    • kaggle.com
    zip
    Updated Dec 30, 2020
    Cite
    AYUSH VARSHNEY (2020). Credit Card Fraud Detection Dataset [Dataset]. https://www.kaggle.com/ayushvarshnay/credit-card-fraud-detection-dataset
    Explore at:
    Available download formats: zip (12076 bytes)
    Dataset updated
    Dec 30, 2020
    Authors
    AYUSH VARSHNEY
    Description

    Dataset

    This dataset was created by AYUSH VARSHNEY

    Contents

  11. Credit Card Fraud Detection Dataset

    • kaggle.com
    Updated Feb 17, 2024
    Cite
    Arshiya Kishore (2024). Credit Card Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/arshiyakishore/credit-card-fraud-detection-dataset/discussion?sort=undefined
    Dataset updated
    Feb 17, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Arshiya Kishore
    License

    MIT License: https://opensource.org/licenses/MIT
    License information was derived automatically

    Description

    Dataset

    This dataset was created by Arshiya Kishore

    Released under MIT

    Contents

  12. CerditCard fraud dataset

    • kaggle.com
    Updated Aug 2, 2025
    Cite
    Wasiq Ali (2025). CerditCard fraud dataset [Dataset]. https://www.kaggle.com/datasets/wasiqaliyasir/cerditcard-fraud-dataset
    Dataset updated
    Aug 2, 2025
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Wasiq Ali
    License

    Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
    License information was derived automatically

    Description

    Credit Card Fraud Detection Dataset

    Uncover fraudulent transactions with this anonymized, PCA-transformed dataset. Perfect for building and testing fraud detection algorithms!

    Dataset Overview

    • Objective: Detect fraudulent credit card transactions using anonymized features

    • Samples: 1,000 transactions

    • Features: 7 columns (5 PCA components + Transaction Amount + Target)

    Class Distribution:

    • Legit (Class 0): 993 transactions (~99.3%)

    • Fraud (Class 1): 7 transactions (~0.7%)

    • Key Challenge: Extreme class imbalance – realistic representation of fraud patterns

    Features Description

    | Feature | Description | Characteristics |
    |---------|-------------|-----------------|
    | V1-V5 | Anonymized principal components | PCA-transformed numerical features; preserves transaction patterns while hiding sensitive details |
    | Amount | Transaction value | Highly variable (min: $0.20, max: $1,916.06); critical for fraud analysis |
    | Class | Target variable | Binary labels: 0 = Legitimate transaction, 1 = Fraudulent transaction |

    Key Insights & Patterns

    Fraud Indicators:

    • Fraudulent transactions occur across diverse amounts (low: $1.83 → high: $1,916)

    • No obvious amount threshold for fraud – requires nuanced modeling

    Sample fraud cases:

    1. V1:0.579, V2:-0.384, Amount:1916.06

    2. V1:1.023, V2:-0.638, Amount:1094.42

    Data Characteristics:
    V1-V5 Distributions:

    1. V1: Concentrated near zero (mean ≈ -0.1)

    2. V2: Wider spread (mean ≈ 0.05)

    3. V3-V5: Asymmetric distributions

    Amount Distribution:

    1. Right-skewed – most transactions < $500

    2. Fraud cases span low and high values

    Class Imbalance:

     - Severe skew: 993:7 legit-to-fraud ratio
    
     - Models must optimize for recall/precision over accuracy
    
    Analysis Challenges

    ⚠️ Class Imbalance: Standard accuracy metrics misleading

    🔍 Feature Interpretation: PCA components lack real-world context

    📊 Non-linear Patterns: Complex interactions between V1-V5

    ⚡ High Stakes: False negatives (missed fraud) costlier than false positives

    Recommended Applications

    Fraud Detection Models:

    Logistic Regression (with class weighting)

    Random Forests / XGBoost (handle non-linearities)

    Isolation Forests (anomaly detection)

    Evaluation Focus:

    Precision-Recall Curves > ROC-AUC

    F2-Score (prioritize recall)

    Confusion matrix analysis
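
    To illustrate why the F2-score is suggested over plain accuracy here, the sketch below computes F-beta directly from confusion-matrix counts. The helper name `fbeta` and the example counts are hypothetical; only the 993:7 class split comes from the description.

```python
def fbeta(tp, fp, fn, beta=2.0):
    """F-beta from confusion-matrix counts; beta > 1 weights recall more than precision."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# A model that labels every transaction as legitimate on a 993:7 split:
# accuracy looks excellent, yet no fraud is caught.
accuracy_all_legit = 993 / 1000
f2_all_legit = fbeta(tp=0, fp=0, fn=7)

# A hypothetical model that catches 5 of the 7 frauds with 5 false alarms.
f2_detector = fbeta(tp=5, fp=5, fn=2)

print(accuracy_all_legit, f2_all_legit, round(f2_detector, 3))
```

    The always-legitimate baseline reaches 99.3% accuracy with an F2 of zero, which is the sense in which standard accuracy is misleading on this data.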

    Advanced Techniques:

    SMOTE/ADASYN for oversampling

    Autoencoders for anomaly detection

    Feature engineering: Amount-to-Var ratios

    Dataset Source & Ethics

    Origin: Synthetic dataset mirroring real-world financial patterns

    Anonymization: Original features transformed via PCA for privacy compliance

    Bias Consideration: Geographic/cultural biases possible in source data

    Potential Use Cases

    🏦 Banking: Real-time transaction monitoring systems

    📱 FinTech Apps: Fraud detection APIs for payment gateways

    🎓 Education: Imbalanced classification tutorials

    🏆 Kaggle Competitions: Lightweight fraud detection challenge

    Example Project Idea "Minimalist Fraud Detector":

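    # python
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import RobustScaler

    # Scale features, oversample the minority class (at fit time only), then classify.
    model = make_pipeline(
        RobustScaler(),
        SMOTE(sampling_strategy=0.3),
        RandomForestClassifier(class_weight={0: 1, 1: 15}),
    )
    # Optimize for: Recall @ Precision > 0.85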
    

    Dataset Summary

    | Feature | Mean | Std | Min | Max |
    |----------|----------|----------|-----------|-----------|
    | V1 | -0.11 | 1.02 | -3.24 | 3.85 |
    | V2 | 0.05 | 1.01 | -2.94 | 2.60 |
    | V3 | 0.02 | 0.98 | -3.02 | 2.95 |
    | Amount | 250.32 | 190.19 | 0.20 | 1916.06 |

  13. phishing-email-dataset

    • huggingface.co
    Cite
    Zefang Liu, phishing-email-dataset [Dataset]. https://huggingface.co/datasets/zefang-liu/phishing-email-dataset
    Authors
    Zefang Liu
    License

    https://choosealicense.com/licenses/lgpl-3.0/

    Description

    Phishing Email Dataset

    This dataset on Hugging Face is a direct copy of the 'Phishing Email Detection' dataset from Kaggle, shared under the GNU Lesser General Public License 3.0. The dataset was originally created by the user 'Cyber Cop' on Kaggle. For complete details, including licensing and usage information, please visit the original Kaggle page.

  14. Data from: Data Fraud Detection

    • kaggle.com
    zip
    Updated Dec 5, 2024
    Cite
    TienNguyen143 (2024). Data Fraud Detection [Dataset]. https://www.kaggle.com/datasets/tiennguyen143/data-fraud-detection
    Explore at:
    Available download formats: zip (0 bytes)
    Dataset updated
    Dec 5, 2024
    Authors
    TienNguyen143
    Description

    Dataset

    This dataset was created by TienNguyen143

    Contents

  15. Credit Card Fraud Detection

    • kaggle.com
    Updated Mar 25, 2020
    Cite
    kroder (2020). Credit Card Fraud Detection [Dataset]. https://www.kaggle.com/datasets/ankit256/credit-card-fraud-detection/code
    Dataset updated
    Mar 25, 2020
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    kroder
    Description

    Dataset

    This dataset was created by kroder

    Contents

  16. Credit Card Fraud Detection

    • kaggle.com
    Updated Nov 20, 2021
    Share
    FacebookFacebook
    TwitterTwitter
    Email
    Click to copy link
    Link copied
    Close
    Cite
    Sriseshagiri (2021). Credit Card Fraud Detection [Dataset]. https://www.kaggle.com/datasets/sriseshagiri/credit-card-fraud-detection
    Dataset updated
    Nov 20, 2021
    Dataset provided by
    Kaggle
    Authors
    Sriseshagiri
    Description

    Dataset

    This dataset was created by Sriseshagiri

    Contents

  17. Fraud detection payment system's dataset

    • kaggle.com
    Updated May 5, 2023
    Cite
    Andrew K (2023). Fraud detection payment system's dataset [Dataset]. https://www.kaggle.com/datasets/kornilovag94/payment-systems-transactions-synthetic-dataset
    Dataset updated
    May 5, 2023
    Dataset provided by
    Kaggle
    Authors
    Andrew K
    License

    Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
    License information was derived automatically

    Description

    Inspiration

    If you find this dataset useful, pls drop a like.

    Description

    Banks and Payment systems are often exposed to fraudulent transactions and constantly improve systems to track them.

    The synthetic dataset below contains 6.3 million transactions with 10 features.

  18. Fraud Challenge Data

    • kaggle.com
    Updated Nov 11, 2021
    Cite
    George Hayduke (2021). Fraud Challenge Data [Dataset]. https://www.kaggle.com/ban7002/fraud-challenge-data
    Dataset updated
    Nov 11, 2021
    Dataset provided by
    Kaggle
    Authors
    George Hayduke
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Context

    Simple fraud detection dataset. The target is EVENT_LABEL: 1 = fraud, 0 = not fraud.

    Content

    This is a great dataset for practicing classification tasks and for challenging students.

    Inspiration

    1. relatively rare event detection
    2. what variables are important
    3. dealing with high frequency categorical data like user agent, card bin, and postal codes.
  19. credit card fraud detection

    • kaggle.com
    Updated Oct 1, 2024
    Cite
    Gungun Shukla15 (2024). credit card fraud detection [Dataset]. https://www.kaggle.com/datasets/gungunshukla15/credit-card-fraud-detection/code
    Dataset updated
    Oct 1, 2024
    Dataset provided by
    Kaggle (http://kaggle.com/)
    Authors
    Gungun Shukla15
    License

    https://creativecommons.org/publicdomain/zero/1.0/

    Description

    Dataset

    This dataset was created by Gungun Shukla15

    Released under CC0: Public Domain

    Contents

  20. Credit Card Fraud Detection

    • kaggle.com
    Updated May 10, 2022
    Cite
    Emily Smith (2022). Credit Card Fraud Detection [Dataset]. https://www.kaggle.com/datasets/emilysmithh/credit-card-fraud-detection/data
    Dataset updated
    May 10, 2022
    Dataset provided by
    Kaggle
    Authors
    Emily Smith
    Description

    Dataset

    This dataset was created by Emily Smith

    Released under Data files © Original Authors

    Contents

Bank Account Fraud Dataset Suite (NeurIPS 2022)

Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation.

Explore at:
16 scholarly articles cite this dataset (View in Google Scholar)