64 datasets found

Bank Account Fraud Dataset Suite (NeurIPS 2022)
kaggle.com
Updated Nov 29, 2023
Cite
Sérgio Jesus (2023). Bank Account Fraud Dataset Suite (NeurIPS 2022) [Dataset]. https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 29, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sérgio Jesus
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0): https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
Description
The Bank Account Fraud (BAF) suite of datasets was published at NeurIPS 2022 and comprises a total of 6 different synthetic bank account fraud tabular datasets. BAF is a realistic, complete, and robust test bed to evaluate novel and existing methods in ML and fair ML, and the first of its kind!

This suite of datasets is:
- Realistic, based on a present-day real-world dataset for fraud detection;
- Biased, each dataset has distinct controlled types of bias;
- Imbalanced, this setting presents an extremely low prevalence of the positive class;
- Dynamic, with temporal data and observed distribution shifts;
- Privacy preserving, to protect the identity of potential applicants we have applied differential privacy techniques (noise addition), feature encoding and trained a generative model (CTGAN).

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3349776%2F4271ec763b04362801df2660c6e2ec30%2FScreenshot%20from%202022-11-29%2017-42-41.png?generation=1669743799938811&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3349776%2Faf502caf5b9e370b869b85c9d4642c5c%2FScreenshot%20from%202022-12-15%2015-17-59.png?generation=1671117525527314&alt=media

https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F3349776%2Ff3789bd484ee392d648b7809429134df%2FScreenshot%20from%202022-11-29%2017-40-58.png?generation=1669743681526133&alt=media

Each dataset is composed of:
- 1 million instances;
- 30 realistic features used in the fraud detection use-case;
- A column of “month”, providing temporal information about the dataset;
- Protected attributes (age group, employment status and % income).
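Because each dataset carries a “month” column, one natural way to evaluate under distribution shift is to train on earlier months and test on later ones. The following is a minimal sketch of such a temporal split; the toy rows, the `fraud_bool` label name, the eight-month range, and the cutoff at month 6 are all assumptions made for illustration and should be checked against the datasheet.

```python
# Hypothetical rows standing in for BAF records. Only the "month" column is
# taken from the description; "fraud_bool" and the 8-month range are assumed.
rows = [
    {"month": month, "fraud_bool": int(i == 0 and month % 3 == 0)}
    for month in range(8)
    for i in range(4)
]


def temporal_split(records, first_test_month):
    """Train on months before the cutoff, test on the cutoff month onward."""
    train = [r for r in records if r["month"] < first_test_month]
    test = [r for r in records if r["month"] >= first_test_month]
    return train, test


# The cutoff is an illustrative choice, not a value stated in the description.
train_rows, test_rows = temporal_split(rows, first_test_month=6)
print(len(train_rows), len(test_rows))
```

Splitting on time rather than at random keeps later months out of training, so the test score reflects how a model copes with the shifts the suite is designed to contain.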

Detailed information (datasheet) on the suite: https://github.com/feedzai/bank-account-fraud/blob/main/documents/datasheet.pdf

Check out the github repository for more resources and some example notebooks: https://github.com/feedzai/bank-account-fraud

Read the NeurIPS 2022 paper here: https://arxiv.org/abs/2211.13358

Learn more about Feedzai Research here: https://research.feedzai.com/

Please use the following citation for the BAF dataset suite:

@article{jesusTurningTablesBiased2022,
  title={Turning the {{Tables}}: {{Biased}}, {{Imbalanced}}, {{Dynamic Tabular Datasets}} for {{ML Evaluation}}},
  author={Jesus, S{\'e}rgio and Pombal, Jos{\'e} and Alves, Duarte and Cruz, Andr{\'e} and Saleiro, Pedro and Ribeiro, Rita P. and Gama, Jo{\~a}o and Bizarro, Pedro},
  journal={Advances in Neural Information Processing Systems},
  year={2022}
}
Vehicle Insurance Claim Fraud Detection
kaggle.com
Updated Dec 20, 2021
Cite
Shivam Bansal (2021). Vehicle Insurance Claim Fraud Detection [Dataset]. https://www.kaggle.com/datasets/shivamb/vehicle-claim-fraud-detection
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Dec 20, 2021
Dataset provided by
Kaggle
Authors
Shivam Bansal
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Vehicle Insurance Fraud Detection

Vehicle insurance fraud involves conspiring to make false or exaggerated claims involving property damage or personal injuries following an accident. Some common examples include staged accidents, where fraudsters deliberately “arrange” for accidents to occur; the use of phantom passengers, where people who were not even at the scene of the accident claim to have suffered grievous injury; and false personal injury claims, where personal injuries are grossly exaggerated.

About this dataset

This dataset contains vehicle data (attributes, model, accident details, etc.) along with policy details (policy type, tenure, etc.). The goal is to detect whether a claim application is fraudulent; the target column is FraudFound_P.
Credit Card Fraud Detection
test.researchdata.tuwien.ac.at
zenodo.org
+1 more
csv, json, pdf +2
Updated Apr 28, 2025
Cite
Ajdina Grizhja (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82556/yvxj-9t22
Explore at:
Available download formats: text/markdown, csv, pdf, txt, json
Unique identifier
https://doi.org/10.82556/yvxj-9t22
Dataset updated
Apr 28, 2025
Dataset provided by
TU Wien
Authors
Ajdina Grizhja
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Time period covered
Apr 28, 2025
Description
Below is a draft DMP–style description of your credit‐card fraud detection experiment, modeled on the antiquities example:

1. Dataset Description

Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.

Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.

Data Sources
We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.

Method of Dataset Preparation

Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.

Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).

Splitting: Programmatically derived three subsets—training (70%), validation (15%), test (15%)—using range‐based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.

Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non‐feature identifiers (actionnr, merchant_id).

Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
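The splitting and cleaning steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the toy records are invented, the helper names `split_by_key` and `clean` are hypothetical, and the split is done locally by sorting on `actionnr` and slicing, whereas the description says the subsets were materialized in DBRepo.

```python
# Column names are taken from the description; everything else is illustrative.
FLAG_COLUMNS = ["is_declined", "isforeigntransaction", "ishighriskcountry", "isfradulent"]
ID_COLUMNS = ["actionnr", "merchant_id"]


def split_by_key(records, key="actionnr", train_pct=70, val_pct=15):
    """Split into contiguous key ranges: training, validation, test."""
    ordered = sorted(records, key=lambda r: r[key])
    n = len(ordered)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


def clean(record):
    """Map "Y"/"N" flags to 1/0 and drop the non-feature identifiers."""
    cleaned = {}
    for name, value in record.items():
        if name in ID_COLUMNS:
            continue
        if name in FLAG_COLUMNS:
            cleaned[name] = 1 if value == "Y" else 0
        else:
            cleaned[name] = value
    return cleaned


# Hypothetical toy records, not rows from the actual dataset.
records = [
    {
        "actionnr": i,
        "merchant_id": f"m{i}",
        "transaction_amount": 10.0 * i,
        "is_declined": "N",
        "isforeigntransaction": "Y" if i % 2 else "N",
        "ishighriskcountry": "N",
        "isfradulent": "Y" if i % 10 == 0 else "N",
    }
    for i in range(1, 21)
]

# Split first (the key is still needed), then clean each subset.
train, val, test = ([clean(r) for r in part] for part in split_by_key(records))
print(len(train), len(val), len(test))
```

The split runs before cleaning because cleaning drops `actionnr`, which the range-based split depends on.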

2. Technical Details

Dataset Structure

The raw data is a single CSV with columns:

actionnr (integer transaction ID)

merchant_id (string)

average_amount_transaction_day (float)

transaction_amount (float)

is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)

total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)

Naming Conventions

All columns use lowercase snake_case.

Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.

Files in the code repo follow a clear structure:

├── data/                  # local copies only; raw data lives in DBRepo
├── notebooks/Task.ipynb
├── models/rf_model_v1.joblib
├── outputs/               # confusion_matrix.png, roc_curve.png, predictions.csv
├── README.md
├── requirements.txt
└── codemeta.json

Required Software

Python 3.9+

pandas, numpy (data handling)

scikit-learn (modeling, metrics)

matplotlib (visualizations)

dbrepo‐client.py (DBRepo API)

requests (TU WRD API)

Additional Resources

Original dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud

Scikit-learn docs: https://scikit-learn.org/stable

DBRepo API guide: via the starter notebook’s dbrepo_client.py template

TU WRD REST API spec: https://test.researchdata.tuwien.ac.at/api/docs

3. Further Details

Data Limitations

Highly imbalanced: only ~0.17% of transactions are fraudulent.

The PCA features (V1–V28) are anonymized and the underlying variables are hidden; we extended the data with domain features but cannot reverse-engineer the raw variables.

Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.

Licensing and Attribution

Raw data: CC-0 (per Kaggle terms)

Code & notebooks: MIT License

Model artifacts & outputs: CC-BY 4.0

TU WRD records include ORCID identifiers for the author.

Recommended Uses

Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.

Educational purposes: demonstrating model‐training pipelines, FAIR data practices.

Extension: adding time‐series or deep‐learning models.

Known Issues

Possible temporal leakage if date/time features are not handled correctly.

Model performance may degrade on live data due to concept drift.

Binary flags may oversimplify nuanced transaction outcomes.
credit card fraud detection dataset
kaggle.com
Updated Jul 15, 2023
+ more versions
Cite
Muste A(M.A) (2023). credit card fraud detection dataset [Dataset]. https://www.kaggle.com/datasets/mustefaage22m014/credit-card-fraud-detection-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 15, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Muste A(M.A)
Description
Dataset

This dataset was created by Muste A(M.A)

Contents
Fraud Detection - Financial transactions
find.data.gov.scot
csv
Updated Mar 14, 2018
Cite
Deloitte Datathon 2018 (uSmart) (2018). Fraud Detection - Financial transactions [Dataset]. https://find.data.gov.scot/datasets/39167
Explore at:
Available download formats: csv (470.6714 MB)
Dataset updated
Mar 14, 2018
Dataset provided by
Deloitte (https://deloitte.com/)
Description
Synthetic transactional data with labels for fraud detection. For more information, see: https://www.kaggle.com/ntnu-testimon/paysim1/version/2
Credit Card Fraud Detection
berd-platform.de
csv
Updated Jul 31, 2025
+ more versions
Cite
Kaggle (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82939/qcqqe-g6q16
Explore at:
Available download formats: csv
Unique identifier
https://doi.org/10.82939/qcqqe-g6q16
Dataset updated
Jul 31, 2025
Dataset provided by
Kaggle
Description
The dataset contains credit card transactions made by European cardholders in September 2013. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. The dataset is 0.15 GB in size.
The dataset has been collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.
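To make the imbalance concrete, the short calculation below derives the fraud rate from the two counts quoted above (roughly 0.17%), along with the accuracy a trivial classifier would reach by labelling every transaction legitimate. It is a worked illustration only; the variable names are not from the source.

```python
# Counts quoted in the dataset description.
total_transactions = 284_807
fraud_transactions = 492

# Share of the positive (fraud) class.
fraud_rate = fraud_transactions / total_transactions

# Accuracy of a baseline that predicts "legitimate" for every transaction.
always_legit_accuracy = 1 - fraud_rate

print(f"fraud rate: {fraud_rate:.4%}")
print(f"accuracy of always predicting 'legitimate': {always_legit_accuracy:.4%}")
```

A baseline that catches no fraud at all still scores above 99.8% accuracy, which is why accuracy alone says little on this dataset.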
fraud detection
kaggle.com
Updated Jul 21, 2021
Cite
IAbhishekBhardwaj (2021). fraud detection [Dataset]. https://www.kaggle.com/datasets/iabhishekbhardwaj/fraud-detection
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jul 21, 2021
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
IAbhishekBhardwaj
Description
Dataset

This dataset was created by IAbhishekBhardwaj

Contents
CCFD_dataset
figshare.com
xlsx
Updated May 30, 2023
Cite
Nur Amirah Ishak; Keng-Hoong Ng; Gee-Kok Tong; Suraya Nurain Kalid; Kok-Chin Khor (2023). CCFD_dataset [Dataset]. http://doi.org/10.6084/m9.figshare.16695616.v3
Explore at:
Available download formats: xlsx
Unique identifier
https://doi.org/10.6084/m9.figshare.16695616.v3
Dataset updated
May 30, 2023
Dataset provided by
Figshare (http://figshare.com/)
Authors
Nur Amirah Ishak; Keng-Hoong Ng; Gee-Kok Tong; Suraya Nurain Kalid; Kok-Chin Khor
License
CC0 1.0 Universal Public Domain Dedication: https://creativecommons.org/publicdomain/zero/1.0/
License information was derived automatically
Description
The dataset was released by [1]; it was collected and analysed during a research collaboration of Worldline and the Machine Learning Group (http://mlg.ulb.ac.be) of Université Libre de Bruxelles (ULB) on big data mining and fraud detection.

[1] Pozzolo, A. D., Caelen, O., Johnson, R. A., and Bontempi, G. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. 2015 IEEE Symposium Series on Computational Intelligence, pp. 159-166, doi: 10.1109/SSCI.2015.33

Open source on Kaggle: https://www.kaggle.com/mlg-ulb/creditcardfraud
Fraud Detection in Financial Transactions
kaggle.com
Updated Jan 17, 2025
Cite
Darshan Dalvi (2025). Fraud Detection in Financial Transactions [Dataset]. https://www.kaggle.com/datasets/darshandalvi12/fraud-detection-in-financial-transactions/discussion?sort=undefined
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 17, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Darshan Dalvi
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Credit Card Fraud Detection Dataset (Updated)

This dataset contains 284,807 transactions from a credit card company, where 492 transactions are fraudulent. The data is highly imbalanced, with only a small fraction of transactions being fraudulent. The dataset is commonly used to build and evaluate fraud detection models.

Dataset Details:

Number of Transactions: 284,807

Fraudulent Transactions: 492 (Highly Imbalanced)

Features:

28 anonymized features (V1 to V28)

Transaction amount

Timestamp

Label:

0: Legitimate

1: Fraudulent

Data Preprocessing:

SMOTE (Synthetic Minority Oversampling Technique) has been applied to address the class imbalance in the dataset, generating synthetic examples for the minority class (fraudulent transactions).

Additional Operations: Various preprocessing steps were performed, including data cleaning and feature engineering, to ensure the quality of the dataset for model training.
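For readers unfamiliar with SMOTE, the core idea is to create a synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified standard-library illustration of that idea, not the procedure used to build this dataset and not the imbalanced-learn implementation; the function name `smote_sketch`, the toy points, and the choice of `k=2` are assumptions.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen point (Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        # Pick a random position on the segment between the two points.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature fraud samples.
fraud_points = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(fraud_points, n_new=6)
print(len(new_points))
```

A common precaution is to oversample only the training split, so that synthetic points never appear in the data used for evaluation.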

Processed Files:

The dataset has been split into training and testing sets and saved in the following files:
- X_train.csv: Feature data for the training set
- X_test.csv: Feature data for the testing set
- y_train.csv: Labels for the training set (fraudulent or legitimate)
- y_test.csv: Labels for the testing set

This updated dataset is ready to be used for training and evaluating machine learning models, specifically designed for credit card fraud detection tasks.

Credit Card Fraud Detection Dataset
kaggle.com
zip
Updated Dec 30, 2020
Cite
AYUSH VARSHNEY (2020). Credit Card Fraud Detection Dataset [Dataset]. https://www.kaggle.com/ayushvarshnay/credit-card-fraud-detection-dataset
Explore at:
Available download formats: zip (12076 bytes)
Dataset updated
Dec 30, 2020
Authors
AYUSH VARSHNEY
Description
Dataset

This dataset was created by AYUSH VARSHNEY

Contents
Credit Card Fraud Detection Dataset
kaggle.com
Updated Feb 17, 2024
Cite
Arshiya Kishore (2024). Credit Card Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/arshiyakishore/credit-card-fraud-detection-dataset/discussion?sort=undefined
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Feb 17, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Arshiya Kishore
License
MIT License: https://opensource.org/licenses/MIT
License information was derived automatically
Description
Dataset

This dataset was created by Arshiya Kishore

Released under MIT

Contents
CerditCard fraud dataset
kaggle.com
Updated Aug 2, 2025
Cite
Wasiq Ali (2025). CerditCard fraud dataset [Dataset]. https://www.kaggle.com/datasets/wasiqaliyasir/cerditcard-fraud-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 2, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Wasiq Ali
License
Apache License, v2.0: https://www.apache.org/licenses/LICENSE-2.0
License information was derived automatically
Description
Credit Card Fraud Detection Dataset

Uncover fraudulent transactions with this anonymized, PCA-transformed dataset. Perfect for building and testing fraud detection algorithms!

Dataset Overview

Objective: Detect fraudulent credit card transactions using anonymized features

Samples: 1,000 transactions

Features: 7 columns (5 PCA components + Transaction Amount + Target)

Class Distribution:

Legit (Class 0): 993 transactions (~99.3%)

Fraud (Class 1): 7 transactions (~0.7%)

Key Challenge: Extreme class imbalance – realistic representation of fraud patterns

Features Description

| Feature | Description | Characteristics |
|---------|-------------|-----------------|
| V1-V5 | Anonymized principal components | PCA-transformed numerical features; preserves transaction patterns while hiding sensitive details |
| Amount | Transaction value | Highly variable (min: $0.20, max: $1,916.06); critical for fraud analysis |
| Class | Target variable | Binary labels: 0 = Legitimate transaction, 1 = Fraudulent transaction |

Key Insights & Patterns

Fraud Indicators:

Fraudulent transactions occur across diverse amounts (low: $1.83 → high: $1,916)

No obvious amount threshold for fraud – requires nuanced modeling

Sample fraud cases:

V1:0.579, V2:-0.384, Amount:1916.06

V1:1.023, V2:-0.638, Amount:1094.42

Data Characteristics:

V1-V5 Distributions:

V1: Concentrated near zero (mean ≈ -0.1)

V2: Wider spread (mean ≈ 0.05)

V3-V5: Asymmetric distributions

Amount Distribution:

Right-skewed – most transactions < $500

Fraud cases span low and high values

Class Imbalance:

Severe skew: 993:7 legit-to-fraud ratio

Models must optimize for recall/precision over accuracy

Analysis Challenges

⚠️ Class Imbalance: Standard accuracy metrics misleading

🔍 Feature Interpretation: PCA components lack real-world context

📊 Non-linear Patterns: Complex interactions between V1-V5

⚡ High Stakes: False negatives (missed fraud) costlier than false positives

Recommended Applications

Fraud Detection Models:

Logistic Regression (with class weighting)

Random Forests / XGBoost (handle non-linearities)

Isolation Forests (anomaly detection)

Evaluation Focus:

Precision-Recall Curves > ROC-AUC

F2-Score (prioritize recall)

Confusion matrix analysis
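The F2-score mentioned above is the F-beta score with beta = 2, which weights recall more heavily than precision. Below is a minimal sketch computed directly from confusion-matrix counts; the counts are hypothetical and the helper `fbeta_score` is written here for illustration rather than imported from a library.

```python
def fbeta_score(tp, fp, fn, beta=2.0):
    """F-beta from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 5 frauds caught, 5 false alarms, 2 frauds missed.
tp, fp, fn = 5, 5, 2
f1 = fbeta_score(tp, fp, fn, beta=1.0)
f2 = fbeta_score(tp, fp, fn, beta=2.0)
print(f"F1={f1:.3f}  F2={f2:.3f}")
```

In this example recall (5/7) is higher than precision (5/10), so F2 comes out above F1; a model that misses many frauds would be penalized more under F2 than under F1.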

Advanced Techniques:

SMOTE/ADASYN for oversampling

Autoencoders for anomaly detection

Feature engineering: Amount-to-Var ratios

Dataset Source & Ethics

Origin: Synthetic dataset mirroring real-world financial patterns

Anonymization: Original features transformed via PCA for privacy compliance

Bias Consideration: Geographic/cultural biases possible in source data

Potential Use Cases

🏦 Banking: Real-time transaction monitoring systems

📱 FinTech Apps: Fraud detection APIs for payment gateways

🎓 Education: Imbalanced classification tutorials

🏆 Kaggle Competitions: Lightweight fraud detection challenge

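Example Project Idea

"Minimalist Fraud Detector":

# python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import RobustScaler

# Scale features, oversample the fraud class, then fit a class-weighted forest
model = make_pipeline(
    RobustScaler(),
    SMOTE(sampling_strategy=0.3),
    RandomForestClassifier(class_weight={0: 1, 1: 15}),
)

Optimize for: Recall @ Precision > 0.85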

Dataset Summary

| Feature | Mean | Std | Min | Max |
|----------|----------|----------|-----------|-----------|
| V1 | -0.11 | 1.02 | -3.24 | 3.85 |
| V2 | 0.05 | 1.01 | -2.94 | 2.60 |
| V3 | 0.02 | 0.98 | -3.02 | 2.95 |
| Amount | 250.32 | 190.19 | 0.20 | 1916.06 |
phishing-email-dataset
huggingface.co
Cite
Zefang Liu, phishing-email-dataset [Dataset]. https://huggingface.co/datasets/zefang-liu/phishing-email-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Authors
Zefang Liu
License
https://choosealicense.com/licenses/lgpl-3.0/
Description
Phishing Email Dataset

This dataset on Hugging Face is a direct copy of the 'Phishing Email Detection' dataset from Kaggle, shared under the GNU Lesser General Public License 3.0. The dataset was originally created by the user 'Cyber Cop' on Kaggle. For complete details, including licensing and usage information, please visit the original Kaggle page.
Data from: Data Fraud Detection
kaggle.com
zip
Updated Dec 5, 2024
Cite
TienNguyen143 (2024). Data Fraud Detection [Dataset]. https://www.kaggle.com/datasets/tiennguyen143/data-fraud-detection
Explore at:
Available download formats: zip (0 bytes)
Dataset updated
Dec 5, 2024
Authors
TienNguyen143
Description
Dataset

This dataset was created by TienNguyen143

Contents
Credit Card Fraud Detection
kaggle.com
Updated Mar 25, 2020
Cite
kroder (2020). Credit Card Fraud Detection [Dataset]. https://www.kaggle.com/datasets/ankit256/credit-card-fraud-detection/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Mar 25, 2020
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
kroder
Description
Dataset

This dataset was created by kroder

Contents
Credit Card Fraud Detection
kaggle.com
Updated Nov 20, 2021
Cite
Sriseshagiri (2021). Credit Card Fraud Detection [Dataset]. https://www.kaggle.com/datasets/sriseshagiri/credit-card-fraud-detection
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 20, 2021
Dataset provided by
Kaggle
Authors
Sriseshagiri
Description
Dataset

This dataset was created by Sriseshagiri

Contents
Fraud detection payment system's dataset
kaggle.com
Updated May 5, 2023
Cite
Andrew K (2023). Fraud detection payment system's dataset [Dataset]. https://www.kaggle.com/datasets/kornilovag94/payment-systems-transactions-synthetic-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 5, 2023
Dataset provided by
Kaggle
Authors
Andrew K
License
Attribution 4.0 (CC BY 4.0): https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Description
Inspiration

If you find this dataset useful, please drop a like.

Description

Banks and payment systems are often exposed to fraudulent transactions and constantly improve their systems to track them.

The synthetic dataset below contains 6.3 million transactions with 10 features.
Fraud Challenge Data
kaggle.com
Updated Nov 11, 2021
Cite
George Hayduke (2021). Fraud Challenge Data [Dataset]. https://www.kaggle.com/ban7002/fraud-challenge-data
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 11, 2021
Dataset provided by
Kaggle
Authors
George Hayduke
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Context

Simple fraud detection dataset. The target is EVENT_LABEL: 1 = fraud, 0 = not fraud.

Content

This is a great dataset to practice classification tasks with and challenge students with.

Inspiration

relatively rare event detection

what variables are important

dealing with high frequency categorical data like user agent, card bin, and postal codes.
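One simple way to handle categorical columns with many distinct values, such as postal codes, is frequency encoding: each category is replaced by the share of rows in which it appears. The sketch below assumes that reading of the last point and uses invented postal codes; `frequency_encode` is a hypothetical helper, and other encodings (for example hashing or target encoding) may suit the data better.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


# Hypothetical postal codes, not values from the dataset.
postal_codes = ["10001", "94105", "10001", "60601", "10001", "94105"]
encoded = frequency_encode(postal_codes)
print(encoded)
```

In practice the frequencies would usually be computed on the training rows only and then applied to validation and test rows, so that no information leaks from the evaluation data.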
credit card fraud detection
kaggle.com
Updated Oct 1, 2024
Cite
Gungun Shukla15 (2024). credit card fraud detection [Dataset]. https://www.kaggle.com/datasets/gungunshukla15/credit-card-fraud-detection/code
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Oct 1, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Gungun Shukla15
License
https://creativecommons.org/publicdomain/zero/1.0/
Description
Dataset

This dataset was created by Gungun Shukla15

Released under CC0: Public Domain

Contents
Credit Card Fraud Detection
kaggle.com
Updated May 10, 2022
Cite
Emily Smith (2022). Credit Card Fraud Detection [Dataset]. https://www.kaggle.com/datasets/emilysmithh/credit-card-fraud-detection/data
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
May 10, 2022
Dataset provided by
Kaggle
Authors
Emily Smith
Description
Dataset

This dataset was created by Emily Smith

Released under Data files © Original Authors

Contents

Bank Account Fraud Dataset Suite (NeurIPS 2022)

Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation.

Explore at:

16 scholarly articles cite this dataset (View in Google Scholar)
