Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Annual data on the nature of fraud and computer misuse offences. Data for the years ending March 2021 and March 2022 come from the Telephone-operated Crime Survey for England and Wales (TCSEW).
Payment card fraud, covering both credit and debit cards, is forecast to grow by over ** billion U.S. dollars between 2022 and 2028. Outside the United States in particular, the amount of fraudulent payments almost doubled from 2014 to 2021. In total, fraudulent card payments reached ** billion U.S. dollars in 2021. Card fraud losses across the world increased by more than ** percent between 2020 and 2021, the largest increase since 2018.
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Estimates from the Crime Survey for England and Wales (CSEW) on fraud and computer misuse. The tables also include Home Office police recorded crime data on the number of online offences recorded by the police. They also include Action Fraud figures broken down by police force area.
These tables were formerly known as Experimental tables.
Please note: this set of tables is no longer produced. All content previously released in these tables has been, or will be, redistributed among other sets of tables.
The dataset is generated using the Faker library to simulate transaction data. It contains several columns that represent both user and transaction information, including features for detecting fraudulent activities. The data includes a mix of categorical, numerical, and datetime values, which need to be processed for machine learning.
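The mixed categorical, numerical and datetime columns must be turned into numbers before a model can use them. Below is a minimal plain-Python sketch of that step. It assumes hypothetical column names (`amount`, `category`, `timestamp`) and a hypothetical category vocabulary, because the actual Faker-generated schema is not listed. The row is a hand-written stand-in for Faker output. The sketch one-hot encodes the category, expands the timestamp into hour and weekday, and keeps the amount as a float.

```python
from datetime import datetime

# Hypothetical vocabulary; the real dataset's categories may differ.
CATEGORIES = ["grocery", "electronics", "travel"]


def encode_row(row: dict) -> list[float]:
    """Turn one raw transaction (hypothetical columns) into a numeric feature vector."""
    ts = datetime.fromisoformat(row["timestamp"])  # datetime column
    one_hot = [1.0 if row["category"] == c else 0.0 for c in CATEGORIES]  # categorical
    return [
        float(row["amount"]),  # numerical column, kept as-is
        float(ts.hour),        # hour of day
        float(ts.weekday()),   # 0 = Monday ... 6 = Sunday
        *one_hot,
    ]


raw = {"amount": "129.99", "category": "travel", "timestamp": "2024-03-15T02:47:00"}
print(encode_row(raw))  # [129.99, 2.0, 4.0, 0.0, 0.0, 1.0]
```

With pandas, the same idea maps onto `pd.get_dummies` and the `.dt` accessor. The plain version just makes each transformation explicit.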
In 2022, the District of Columbia was the state with the highest rate of consumer fraud and other related problems, with a rate of ***** reports per 100,000 of the population. North Dakota had the lowest rate of consumer fraud reports in that year, at *** reports per 100,000 of the population.
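A rate per 100,000 divides report counts by population, so states of very different sizes can be compared. The calculation is sketched below. The counts are made-up illustration values, not the masked figures above.

```python
def rate_per_100k(reports: int, population: int) -> float:
    """Reports per 100,000 residents: reports * 100,000 / population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return reports * 100_000 / population  # multiply first to keep integer-exact input


# Illustrative numbers only (not the source's figures).
print(rate_per_100k(reports=1_500, population=750_000))  # 200.0
```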
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Data showing fraud statistics in Plymouth.
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/
License information was derived automatically
The Bank Account Fraud (BAF) suite of datasets was published at NeurIPS 2022 and comprises six synthetic bank account fraud tabular datasets. BAF is a realistic, complete, and robust test bed for evaluating novel and existing methods in ML and fair ML, and the first of its kind!
This suite of datasets is:
- Realistic, based on a present-day real-world dataset for fraud detection;
- Biased, each dataset has distinct controlled types of bias;
- Imbalanced, with an extremely low prevalence of the positive class;
- Dynamic, with temporal data and observed distribution shifts;
- Privacy preserving; to protect the identity of potential applicants, we applied differential privacy techniques (noise addition) and feature encoding, and trained a generative model (CTGAN).
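Positives are extremely rare here, so accuracy tells little. One common evaluation approach is as follows. First, choose the score threshold that keeps the false-positive rate within a fixed budget, and report recall there. Then compare false-positive rates across protected groups to check for bias. Below is a minimal pure-Python sketch. The 5% budget is an assumed setting. The scores, labels and the groups `"A"`/`"B"` (standing in for a protected attribute such as age group) are hypothetical toy data.

```python
def recall_at_fpr(scores, labels, max_fpr=0.05):
    """Choose a threshold flagging at most max_fpr of negatives; return (threshold, recall)."""
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    pos = [s for s, y in zip(scores, labels) if y == 1]
    if not neg or not pos:
        raise ValueError("need at least one positive and one negative")
    k = int(max_fpr * len(neg))  # number of false positives allowed
    # Flag scores strictly above the (k+1)-th highest negative score,
    # so at most k negatives are flagged (FPR <= max_fpr).
    threshold = neg[k] if k < len(neg) else float("-inf")
    recall = sum(s > threshold for s in pos) / len(pos)
    return threshold, recall


def fpr_by_group(scores, labels, groups, threshold):
    """False-positive rate within each protected-attribute group at one threshold."""
    counts = {}  # group -> [false positives, negatives]
    for s, y, g in zip(scores, labels, groups):
        if y == 0:
            c = counts.setdefault(g, [0, 0])
            c[0] += s > threshold
            c[1] += 1
    return {g: fp / n for g, (fp, n) in counts.items()}


# Hypothetical toy data: 20 negatives, 3 positives.
scores = [i / 20 for i in range(20)] + [0.97, 0.9, 0.5]
labels = [0] * 20 + [1] * 3
groups = ["A" if i % 2 == 0 else "B" for i in range(20)] + ["A"] * 3

threshold, recall = recall_at_fpr(scores, labels, max_fpr=0.05)
print(threshold, recall)                                # 0.9 0.3333333333333333
print(fpr_by_group(scores, labels, groups, threshold))  # {'A': 0.0, 'B': 0.1}
```

In the toy output, the single false positive falls in group B. The overall FPR meets the 5% budget, yet the groups are treated unequally, which is the kind of disparity a biased dataset variant is meant to expose.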
Each dataset is composed of:
- 1 million instances;
- 30 realistic features used in the fraud detection use case;
- A “month” column, providing temporal information about the dataset;
- Protected attributes (age group, employment status, and % income).
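The data shifts over time and each dataset carries a month column, so a temporal split keeps the test set strictly in the future of the training set. Below is a minimal sketch over rows represented as dictionaries. The label column name `fraud_bool` and the cut between months 5 and 6 are assumptions, not stated above. The rows are toy data. Printing the fraud prevalence per split is a cheap way to spot a distribution shift.

```python
def temporal_split(rows, cutoff_month=6):
    """Train on months before the cutoff, test on the cutoff month and later."""
    train = [r for r in rows if r["month"] < cutoff_month]
    test = [r for r in rows if r["month"] >= cutoff_month]
    return train, test


def prevalence(rows, label="fraud_bool"):  # label column name is an assumption
    """Share of positive labels in a list of rows."""
    return sum(r[label] for r in rows) / len(rows) if rows else 0.0


# Toy rows: 8 months x 10 rows; fraud is twice as common in months 6-7.
rows = [
    {"month": m, "fraud_bool": int(i < (2 if m >= 6 else 1))}
    for m in range(8)
    for i in range(10)
]
train, test = temporal_split(rows)
print(len(train), len(test))                # 60 20
print(prevalence(train), prevalence(test))  # 0.1 0.2
```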
Detailed information (datasheet) on the suite: https://github.com/feedzai/bank-account-fraud/blob/main/documents/datasheet.pdf
Check out the GitHub repository for more resources and some example notebooks: https://github.com/feedzai/bank-account-fraud
Read the NeurIPS 2022 paper here: https://arxiv.org/abs/2211.13358
Learn more about Feedzai Research here: https://research.feedzai.com/
Please use the following citation for the BAF dataset suite:
@article{jesusTurningTablesBiased2022,
title={Turning the {{Tables}}: {{Biased}}, {{Imbalanced}}, {{Dynamic Tabular Datasets}} for {{ML Evaluation}}},
author={Jesus, S{\'e}rgio and Pombal, Jos{\'e} and Alves, Duarte and Cruz, Andr{\'e} and Saleiro, Pedro and Ribeiro, Rita P. and Gama, Jo{\~a}o and Bizarro, Pedro},
journal={Advances in Neural Information Processing Systems},
year={2022}
}
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Fraud detected in Defence SA in the 2022-23 financial year.
https://data.gov.tw/license
Provides telecommunications fraud case data. These are preliminary statistics compiled at the beginning of each quarter and are for reference only. The authoritative figures are those in this department's annual crime statistics.
Card fraud losses across the world increased by more than ** percent between 2020 and 2021, the largest increase since 2018. Merchants and card acquirers were estimated to have lost well over ** billion U.S. dollars. According to the source, roughly ** billion U.S. dollars of that came from the United States alone. Note that these figures include both credit card fraud and debit card fraud. The source does not separate the two, and it does not provide such a breakdown for the United States either, a country known for its reliance on credit cards.
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Data from the Crime Survey for England and Wales (CSEW) and the National Fraud Intelligence Bureau (NFIB), including numbers of incidents and characteristics of victims.
https://www.sci-tech-today.com/privacy-policy
E-Commerce Fraud Statistics: When you shop online, you probably think about getting the best deal, fast delivery, or whether the product will match the description. But there’s a whole other side to e-commerce that most shoppers never see: the world of fraud. And trust me, these numbers will shock you, because they shocked me. These e-commerce fraud statistics aren’t just random figures in a report; they show the real damage that scammers are causing to businesses and even to regular customers like us.
Over the years, online shopping fraud has grown from the occasional stolen credit card into a multi-billion-dollar global problem. We’re talking billions lost every single year, and it’s only getting worse. These statistics tell a story about how criminals work, where they strike the most, and which types of fraud cost businesses the most money. If you’ve ever wondered just how big the problem is, or what kinds of tricks fraudsters are using, let’s get started.
Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
Below is a draft data management plan (DMP)-style description of a credit-card fraud detection experiment, modeled on the antiquities example:
Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.
Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.
Data Sources
We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.
Method of Dataset Preparation
Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.
Data import: Uploaded the full CSV into DBRepo and assigned persistent identifiers (PIDs).
Splitting: Programmatically derived three subsets (training 70%, validation 15%, test 15%) using range-based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.
Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non-feature identifiers (actionnr, merchant_id).
Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
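The splitting and cleaning steps can be sketched in plain Python on rows read as dictionaries (for example via `csv.DictReader`). The exact range boundaries used in DBRepo are not given. This sketch assumes contiguous 70/15/15 ranges in actionnr order, computed with integer arithmetic. It splits before dropping actionnr, because the split depends on that column. The toy rows are hypothetical.

```python
FLAGS = ["is_declined", "isforeigntransaction", "ishighriskcountry", "isfradulent"]
DROP = ["actionnr", "merchant_id"]  # non-feature identifiers


def range_split(rows, percents=(70, 15, 15)):
    """Split rows into train/validation/test by contiguous ranges of actionnr."""
    ordered = sorted(rows, key=lambda r: int(r["actionnr"]))
    n = len(ordered)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    return (
        ordered[:n_train],
        ordered[n_train:n_train + n_val],
        ordered[n_train + n_val:],  # remainder goes to test
    )


def clean(row):
    """Map Y/N flags to 1/0 (KeyError on any other value) and drop identifiers."""
    out = {k: v for k, v in row.items() if k not in DROP}
    for flag in FLAGS:
        out[flag] = {"Y": 1, "N": 0}[out[flag]]
    return out


# Hypothetical toy rows with actionnr 1..20
rows = [
    {"actionnr": str(i), "merchant_id": f"M{i % 3}", "transaction_amount": "10.0",
     "is_declined": "N", "isforeigntransaction": "Y" if i % 4 == 0 else "N",
     "ishighriskcountry": "N", "isfradulent": "Y" if i == 7 else "N"}
    for i in range(1, 21)
]
train, val, test = range_split(rows)
train = [clean(r) for r in train]
print(len(train), len(val), len(test))  # 14 3 3
```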
Dataset Structure
The raw data is a single CSV with columns:
actionnr (integer transaction ID)
merchant_id (string)
average_amount_transaction_day (float)
transaction_amount (float)
is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)
total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)
Naming Conventions
All columns use lowercase snake_case.
Subsets are named creditcard_training, creditcard_validation, and creditcard_test in DBRepo.
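The renaming step and the lowercase snake_case convention can be checked mechanically. Below is a small sketch using Python's `re` module. The regular expression is one assumed reading of "snake_case": a leading letter, then lowercase letters or digits, with single underscores between words. The raw header `Average Amount/transaction/day` is a hypothetical example. Note that camelCase boundaries are not split.

```python
import re

# Assumed definition: starts with a letter, lowercase alphanumerics, single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def to_snake_case(name: str) -> str:
    """Lowercase a header and collapse each run of non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def check_columns(columns):
    """Return the column names that are not valid lowercase snake_case."""
    return [c for c in columns if not SNAKE_CASE.match(c)]


print(to_snake_case("Average Amount/transaction/day"))  # average_amount_transaction_day
print(check_columns(["actionnr", "Transaction Amount", "is__declined"]))
```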
Files in the code repo follow a clear structure:
├── data/ # local copies only; raw data lives in DBRepo
├── notebooks/Task.ipynb
├── models/rf_model_v1.joblib
├── outputs/ # confusion_matrix.png, roc_curve.png, predictions.csv
├── README.md
├── requirements.txt
└── codemeta.json
Required Software
Python 3.9+
pandas, numpy (data handling)
scikit-learn (modeling, metrics)
matplotlib (visualizations)
dbrepo_client.py (DBRepo API)
requests (TU WRD API)
Additional Resources
Original dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud
Scikit-learn docs: https://scikit-learn.org/stable
DBRepo API guide: via the starter notebook’s dbrepo_client.py template
TU WRD REST API spec: https://test.researchdata.tuwien.ac.at/api/docs
Data Limitations
Highly imbalanced: only ~0.17% of transactions are fraudulent.
Anonymized PCA features (V1-V28) hidden; we extended the data with domain features but cannot reverse-engineer the raw variables.
Time-bounded: covers only two days of transactions, so it may not capture seasonal patterns.
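The imbalance figure follows from the counts under Data Sources: 492 / 284,807 ≈ 0.17%. One common mitigation is to weight each class inversely to its frequency. The sketch below reproduces, in plain Python, the formula scikit-learn documents for `class_weight="balanced"`: n_samples / (n_classes × class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Counts from the dataset description: 284,807 transactions, 492 fraudulent.
labels = [1] * 492 + [0] * (284_807 - 492)
print(f"fraud share: {492 / 284_807:.4%}")  # fraud share: 0.1727%
weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in weights.items()})  # {1: 289.438, 0: 0.501}
```

With these weights, each class contributes the same total weight (half of n_samples) to the training loss.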
Licensing and Attribution
Raw data: CC-0 (per Kaggle terms)
Code & notebooks: MIT License
Model artifacts & outputs: CC-BY 4.0
TU WRD records include ORCID identifiers for the author.
Recommended Uses
Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.
Educational purposes: demonstrating model‐training pipelines, FAIR data practices.
Extension: adding time‐series or deep‐learning models.
Known Issues
Possible temporal leakage if date/time features are not handled correctly.
Model performance may degrade on live data due to concept drift.
Binary flags may oversimplify nuanced transaction outcomes.
Medicaid Fraud Control Units (MFCU or Unit) investigate and prosecute Medicaid fraud as well as patient abuse and neglect in health care facilities. OIG certifies, and annually recertifies, each MFCU. OIG collects information about MFCU operations and assesses whether they comply with statutes, regulations, and OIG policy. OIG also analyzes MFCU performance based on 12 published performance standards and recommends program improvements where appropriate.
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
This dataset contains 284,807 transactions from a credit card company, where 492 transactions are fraudulent. The data is highly imbalanced, with only a small fraction of transactions being fraudulent. The dataset is commonly used to build and evaluate fraud detection models.
The dataset has been split into training and testing sets and saved in the following files:
- X_train.csv: feature data for the training set
- X_test.csv: feature data for the testing set
- y_train.csv: labels for the training set (fraudulent or legitimate)
- y_test.csv: labels for the testing set
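A quick sanity check after loading a split is to confirm that the features and labels line up and that the fraud share is still tiny. Below is a minimal standard-library sketch. It assumes each file has one header row, that the X files are all numeric, and that the labels sit in the first column of the y files as 0/1. None of this is stated in the description. The function name `load_split` is hypothetical.

```python
import csv
from pathlib import Path


def load_split(directory, split):
    """Read X_<split>.csv and y_<split>.csv (assumed: one header row, numeric values)."""
    directory = Path(directory)
    with open(directory / f"X_{split}.csv", newline="") as fx:
        X = [[float(v) for v in row] for row in list(csv.reader(fx))[1:] if row]
    with open(directory / f"y_{split}.csv", newline="") as fy:
        y = [int(float(row[0])) for row in list(csv.reader(fy))[1:] if row]
    if len(X) != len(y):
        raise ValueError(f"{split}: {len(X)} feature rows but {len(y)} labels")
    return X, y


# Usage (paths are hypothetical):
# X_train, y_train = load_split("data", "train")
# print(sum(y_train) / len(y_train))  # fraud share in the training split
```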
This updated dataset is ready to be used for training and evaluating machine learning models, specifically designed for credit card fraud detection tasks.
This description highlights the key aspects of the dataset, including its preprocessing steps and the availability of the processed files for ease of use.
https://www.archivemarketresearch.com/privacy-policy
The global credit card fraud detection platform market is experiencing robust growth, driven by the escalating volume of digital transactions and the increasing sophistication of fraud techniques. The market, valued at approximately $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033. This substantial growth is fueled by several key factors. The rising adoption of e-commerce and mobile payments creates a larger attack surface for fraudsters, necessitating advanced detection solutions. Furthermore, the increasing prevalence of sophisticated fraud schemes, such as synthetic identity theft and account takeover, demands more intelligent and adaptive fraud detection systems. The market is segmented by screening type (manual and automatic) and application (personal and enterprise), with automatic screening and enterprise applications driving the majority of growth due to their scalability and efficiency. The competitive landscape is dynamic, with established players like FICO, Mastercard, and Visa competing alongside innovative startups such as Forter and Feedzai. These companies continuously develop AI-powered solutions leveraging machine learning and big data analytics to identify and prevent fraudulent transactions effectively. Regional growth varies, with North America and Europe currently holding significant market share, but Asia-Pacific is expected to experience rapid expansion in the coming years due to rising digital adoption and economic growth in countries like India and China. The continued growth of the credit card fraud detection platform market hinges on several factors. The increasing demand for real-time fraud detection capabilities is driving the adoption of cloud-based solutions and the integration of advanced analytics. Regulatory compliance requirements, particularly around data privacy and security, also contribute to market growth. However, challenges remain. 
The cost of implementing and maintaining these sophisticated systems can be prohibitive for smaller businesses. Moreover, the constant evolution of fraud techniques necessitates ongoing investment in research and development to stay ahead of emerging threats. The market’s future trajectory will depend on the continued innovation in fraud detection technologies, the ability to adapt to evolving fraud tactics, and the successful integration of these solutions across various industries and geographies.
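The headline projection is simple compounding: V_2033 = V_2025 × (1 + r)^8 with r = 0.18. The check below shows what the report's two figures imply together. It is arithmetic on the stated numbers, not an independent forecast.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound a starting value at a constant annual growth rate."""
    return value * (1 + cagr) ** years


# $15B in 2025 growing at 18% per year through 2033 (8 compounding years)
print(f"${project(15.0, 0.18, 2033 - 2025):.1f}B")  # $56.4B
```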
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Credit Card Fraud Detection Dataset
Uncover fraudulent transactions with this anonymized, PCA-transformed dataset. Perfect for building and testing fraud detection algorithms!
Objective: Detect fraudulent credit card transactions using anonymized features
Samples: 1,000 transactions
Features: 7 columns (5 PCA components + Transaction Amount + Target)
Class Distribution:
Legit (Class 0): 993 transactions (~99.3%)
Fraud (Class 1): 7 transactions (~0.7%)
Key Challenge: Extreme class imbalance – realistic representation of fraud patterns
| Feature | Description | Characteristics |
|---------|-------------|-----------------|
| V1-V5 | Anonymized principal components | PCA-transformed numerical features; preserve transaction patterns while hiding sensitive details |
| Amount | Transaction value | Highly variable (min: $0.20, max: $1,916.06); critical for fraud analysis |
| Class | Target variable | Binary labels: 0 = legitimate transaction, 1 = fraudulent transaction |
Key Insights & Patterns
Fraud Indicators:
Fraudulent transactions occur across diverse amounts (low: $1.83 → high: $1,916)
No obvious amount threshold for fraud – requires nuanced modeling
Examples:
V1: 0.579, V2: -0.384, Amount: 1916.06
V1: 1.023, V2: -0.638, Amount: 1094.42
V1-V5 Distributions:
V1: Concentrated near zero (mean ≈ -0.1)
V2: Wider spread (mean ≈ 0.05)
V3-V5: Asymmetric distributions
Amount Distribution:
Fraud cases span low and high values
Class Imbalance:
- Severe skew: 993:7 legit-to-fraud ratio
- Models must optimize for recall/precision over accuracy
⚠️ Class Imbalance: Standard accuracy metrics misleading
🔍 Feature Interpretation: PCA components lack real-world context
📊 Non-linear Patterns: Complex interactions between V1-V5
⚡ High Stakes: False negatives (missed fraud) costlier than false positives
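Accuracy misleads here, as a baseline that labels every transaction as legitimate shows on this dataset's 993:7 class split. The plain-Python sketch below scores that baseline. It reaches 99.3% accuracy while catching no fraud at all.

```python
def metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels, fraud = 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


y_true = [0] * 993 + [1] * 7  # class distribution from the description
always_legit = [0] * 1000     # baseline that never flags fraud
print(metrics(y_true, always_legit))  # (0.993, 0.0, 0.0)
```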
Recommended Applications
Fraud Detection Models:
Logistic Regression (with class weighting)
Random Forests / XGBoost (handle non-linearities)
Isolation Forests (anomaly detection)
Evaluation Focus:
Precision-Recall Curves > ROC-AUC
F2-Score (prioritize recall)
Confusion matrix analysis
Advanced Techniques:
SMOTE/ADASYN for oversampling
Autoencoders for anomaly detection
Feature engineering: Amount-to-Var ratios
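The recommended F2-score is the F-beta score with β = 2, F_β = (1 + β²)·P·R / (β²·P + R), which weights recall more heavily than precision. A plain-Python sketch follows. `sklearn.metrics.fbeta_score(y_true, y_pred, beta=2)` computes the same quantity directly from label arrays.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Both pairs have the same F1 (about 0.643), but F2 rewards the high-recall one.
print(round(f_beta(0.5, 0.9), 3))  # 0.776
print(round(f_beta(0.9, 0.5), 3))  # 0.549
```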
Dataset Source & Ethics
Origin: Synthetic dataset mirroring real-world financial patterns
Anonymization: Original features transformed via PCA for privacy compliance
Bias Consideration: Geographic/cultural biases possible in source data
Potential Use Cases
🏦 Banking: Real-time transaction monitoring systems
📱 FinTech Apps: Fraud detection APIs for payment gateways
🎓 Education: Imbalanced classification tutorials
🏆 Kaggle Competitions: Lightweight fraud detection challenge
Example Project Idea "Minimalist Fraud Detector":
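# Python (requires scikit-learn and imbalanced-learn)
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import RobustScaler

# imblearn's pipeline applies SMOTE only when fitting, never when predicting
model = make_pipeline(
    RobustScaler(),                                      # scaling robust to outliers such as Amount
    SMOTE(sampling_strategy=0.3),                        # oversample fraud to 30% of the legit count
    RandomForestClassifier(class_weight={0: 1, 1: 15}),  # penalize missed fraud 15x more
)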
Optimize for: Recall @ Precision > 0.85
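One way to act on the "Recall @ Precision > 0.85" target is a threshold scan. Scan candidate thresholds, keep those whose precision exceeds 0.85, and take the one with the highest recall. Below is a minimal pure-Python sketch on hypothetical toy scores. On real data, `sklearn.metrics.precision_recall_curve` returns precision/recall pairs for every threshold far more efficiently.

```python
def best_threshold(scores, labels, min_precision=0.85):
    """Return (threshold, precision, recall) maximizing recall with precision > min_precision.

    A sample is flagged when score >= threshold. Returns None if no threshold qualifies.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return None
    best = None
    for t in sorted(set(scores)):
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(flagged)
        precision = tp / len(flagged)
        recall = tp / total_pos
        if precision > min_precision and (best is None or recall > best[2]):
            best = (t, precision, recall)
    return best


# Hypothetical toy scores and labels
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]
labels = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
print(best_threshold(scores, labels))  # (0.7, 1.0, 0.8333333333333334)
```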
Dataset Summary
| Feature | Mean | Std | Min | Max |
|----------|----------|----------|-----------|-----------|
| V1 | -0.11 | 1.02 | -3.24 | 3.85 |
| V2 | 0.05 | 1.01 | -2.94 | 2.60 |
| V3 | 0.02 | 0.98 | -3.02 | 2.95 |
| Amount | 250.32 | 190.19 | 0.20 | 1916.06 |
This dataset was created by Dileep
https://cubig.ai/store/terms-of-service
1) Data Introduction
• The Financial Payment Services Fraud Data Dataset is based on a real-world financial transaction simulation and was collected to detect fraudulent activities across various types of payments and transfers. It includes key financial data such as transaction time, type, amount, sender and recipient information, and account balances before and after each transaction. Each transaction is labeled as either fraudulent or legitimate.
2) Data Utilization
(1) Characteristics of the Financial Payment Services Fraud Data Dataset:
• With its large-scale transaction records, detailed account information, and diverse transaction types, this dataset is well-suited for developing and testing financial fraud detection models.
(2) Applications of the Financial Payment Services Fraud Data Dataset:
• Real-time Fraud Detection: The dataset can be used to train machine learning classification models that quickly detect and prevent fraudulent transactions in real-world financial service environments.
• Risky Transaction Pattern Analysis: By analyzing patterns according to transaction type, amount, and account, the dataset can support the advancement of fraud prevention policies and anomaly monitoring systems.
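Each record carries balances before and after the transaction. A simple engineered feature is therefore the mismatch between each balance change and the amount moved, since a nonzero residual can mark records worth a closer look. Below is a sketch with hypothetical field names (`sender_before`, `recipient_after`, etc.), because the dataset's actual column names are not given here.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical field names; map them to the dataset's real columns.
    amount: float
    sender_before: float
    sender_after: float
    recipient_before: float
    recipient_after: float


def balance_errors(tx: Transaction) -> tuple[float, float]:
    """Residuals that are 0 when both balances move by exactly `amount`."""
    sender_err = tx.sender_before - tx.amount - tx.sender_after
    recipient_err = tx.recipient_before + tx.amount - tx.recipient_after
    return round(sender_err, 2), round(recipient_err, 2)


ok = Transaction(500.0, 1200.0, 700.0, 50.0, 550.0)
odd = Transaction(500.0, 500.0, 0.0, 0.0, 0.0)  # recipient balance never moves
print(balance_errors(ok))   # (0.0, 0.0)
print(balance_errors(odd))  # (0.0, 500.0)
```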