100+ datasets found

Nature of crime: fraud and computer misuse
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Apr 8, 2025
Cite
Office for National Statistics (2025). Nature of crime: fraud and computer misuse [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/natureofcrimefraudandcomputermisuse
Explore at:
Available download formats: xlsx
Dataset updated
Apr 8, 2025
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Annual data on the nature of fraud and computer misuse offences. Data for the year ending March 2021 and March 2022 are from the Telephone-operated Crime Survey for England and Wales (TCSEW).
Card fraud in the U.S. versus rest of the world 2014-2023, with global...
statista.com
Updated Jun 25, 2025
Cite
Statista (2025). Card fraud in the U.S. versus rest of the world 2014-2023, with global forecasts 2028 [Dataset]. https://www.statista.com/statistics/1264329/value-fraudulent-card-transactions-worldwide/
Explore at:
Dataset updated
Jun 25, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 2024
Area covered
United States
Description
Payment card fraud - including both credit cards and debit cards - is forecast to grow by over ** billion U.S. dollars between 2022 and 2028. Especially outside the United States, the amount of fraudulent payments almost doubled from 2014 to 2021. In total, fraudulent card payments reached ** billion U.S. dollars in 2021. Card fraud losses across the world increased by more than ** percent between 2020 and 2021, the largest increase since 2018.
Crime in England and Wales: Additional tables on fraud and cybercrime
ons.gov.uk
xlsx
Updated Apr 25, 2019
Cite
Office for National Statistics (2019). Crime in England and Wales: Additional tables on fraud and cybercrime [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/crimeinenglandandwalesexperimentaltables
Explore at:
Available download formats: xlsx
Dataset updated
Apr 25, 2019
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Estimates from Crime Survey for England and Wales (CSEW) on fraud and computer misuse. Also data from Home Office police recorded crime on the number of online offences recorded by the police and Action Fraud figures broken down by police force area.

These tables were formerly known as Experimental tables.

Please note: This set of tables is no longer produced. All content previously released within these tables has been, or will be, redistributed among other sets of tables.
Fraud Detection Dataset
kaggle.com
Updated Nov 9, 2024
Cite
Sameerk (2024). Fraud Detection Dataset [Dataset]. https://www.kaggle.com/datasets/sameerk2004/fraud-detection-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 9, 2024
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sameerk
Description
The dataset is generated using the Faker library to simulate transaction data. It contains several columns that represent both user and transaction information, including features for detecting fraudulent activities. The data includes a mix of categorical, numerical, and datetime values, which need to be processed for machine learning.
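The description notes that the categorical, numerical, and datetime columns need processing before modeling. A minimal sketch of that step follows, using only the standard library. The column names (`amount`, `payment_method`, `timestamp`) and the sample rows are hypothetical, since the listing does not give the schema.

```python
from datetime import datetime

# Hypothetical rows; the real column names are not given in the listing.
rows = [
    {"amount": "120.50", "payment_method": "card", "timestamp": "2024-03-01 14:05:00"},
    {"amount": "15.00", "payment_method": "wallet", "timestamp": "2024-03-02 02:30:00"},
    {"amount": "980.00", "payment_method": "card", "timestamp": "2024-03-03 23:45:00"},
]


def encode_rows(rows):
    """Turn mixed-type records into purely numeric feature dicts."""
    # Fix the category vocabulary up front so every row gets the same columns.
    methods = sorted({r["payment_method"] for r in rows})
    encoded = []
    for r in rows:
        ts = datetime.strptime(r["timestamp"], "%Y-%m-%d %H:%M:%S")
        features = {
            "amount": float(r["amount"]),  # numerical: parse to float
            "hour": ts.hour,               # datetime: hour of day
            "weekday": ts.weekday(),       # datetime: 0 = Monday
        }
        # categorical: one-hot encode the payment method
        for m in methods:
            features[f"payment_method_{m}"] = 1 if r["payment_method"] == m else 0
        encoded.append(features)
    return encoded


encoded = encode_rows(rows)
```

Fixing the category vocabulary before encoding keeps the feature columns identical across rows, which most tabular models expect.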
Consumer fraud report rate, by state U.S. 2022
statista.com
Updated Jul 11, 2025
Cite
Statista (2025). Consumer fraud report rate, by state U.S. 2022 [Dataset]. https://www.statista.com/statistics/302313/consumer-fraud-report-rate-in-the-us/
Explore at:
Dataset updated
Jul 11, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
2022
Area covered
United States
Description
In 2022, the District of Columbia was the state with the highest rate of consumer fraud and other related problems, with a rate of ***** reports per 100,000 of the population. North Dakota had the lowest rate of consumer fraud reports in that year, at *** reports per 100,000 of the population.
Fraud Statistics - Dataset - data.gov.uk
ckan.publishing.service.gov.uk
Updated Dec 19, 2016
Cite
ckan.publishing.service.gov.uk (2016). Fraud Statistics - Dataset - data.gov.uk [Dataset]. https://ckan.publishing.service.gov.uk/dataset/plymouth-city-council-fraud-statistics-2015
Explore at:
Dataset updated
Dec 19, 2016
Dataset provided by
CKAN (https://ckan.org/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Data showing fraud statistics in Plymouth.
Bank Account Fraud Dataset Suite (NeurIPS 2022)
kaggle.com
Updated Nov 29, 2023
Cite
Sérgio Jesus (2023). Bank Account Fraud Dataset Suite (NeurIPS 2022) [Dataset]. https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Nov 29, 2023
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Sérgio Jesus
License
Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) (https://creativecommons.org/licenses/by-nc-sa/4.0/)
License information was derived automatically
Description
The Bank Account Fraud (BAF) suite of datasets was published at NeurIPS 2022 and comprises a total of 6 different synthetic bank account fraud tabular datasets. BAF is a realistic, complete, and robust test bed for evaluating novel and existing methods in ML and fair ML, and the first of its kind!

This suite of datasets is:
- Realistic, based on a present-day real-world dataset for fraud detection;
- Biased, each dataset has distinct controlled types of bias;
- Imbalanced, this setting presents an extremely low prevalence of the positive class;
- Dynamic, with temporal data and observed distribution shifts;
- Privacy preserving, to protect the identity of potential applicants we have applied differential privacy techniques (noise addition), feature encoding and trained a generative model (CTGAN).

Each dataset is composed of:
- 1 million instances;
- 30 realistic features used in the fraud detection use-case;
- A column of “month”, providing temporal information about the dataset;
- Protected attributes (age group, employment status and % income).
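Because each dataset carries a “month” column and exhibits distribution shift over time, a time-ordered split is one natural way to evaluate models on it. The sketch below is an assumption about how one might do this, not the authors' evaluation protocol. The record layout, the `is_fraud` label name, and the cutoff month are all hypothetical.

```python
def split_by_month(records, first_test_month):
    """Train on earlier months and test on later ones, so no future data leaks into training."""
    train = [r for r in records if r["month"] < first_test_month]
    test = [r for r in records if r["month"] >= first_test_month]
    return train, test


# Hypothetical toy records: one per month, months numbered 0..7.
records = [{"month": m, "is_fraud": int(m % 4 == 0)} for m in range(8)]
train, test = split_by_month(records, first_test_month=6)
```

Splitting on time rather than at random keeps the test set strictly later than the training set, which is closer to how a deployed fraud model is used.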

Detailed information (datasheet) on the suite: https://github.com/feedzai/bank-account-fraud/blob/main/documents/datasheet.pdf

Check out the github repository for more resources and some example notebooks: https://github.com/feedzai/bank-account-fraud

Read the NeurIPS 2022 paper here: https://arxiv.org/abs/2211.13358

Learn more about Feedzai Research here: https://research.feedzai.com/

Please use the following citation for the BAF dataset suite:
@article{jesusTurningTablesBiased2022,
  title={Turning the {{Tables}}: {{Biased}}, {{Imbalanced}}, {{Dynamic Tabular Datasets}} for {{ML Evaluation}}},
  author={Jesus, S{\'e}rgio and Pombal, Jos{\'e} and Alves, Duarte and Cruz, Andr{\'e} and Saleiro, Pedro and Ribeiro, Rita P. and Gama, Jo{\~a}o and Bizarro, Pedro},
  journal={Advances in Neural Information Processing Systems},
  year={2022}
}
Fraud Detection 2022-23 - Dataset - data.sa.gov.au
data.sa.gov.au
Updated Jul 1, 2022
Cite
(2022). Fraud Detection 2022-23 - Dataset - data.sa.gov.au [Dataset]. https://data.sa.gov.au/data/dataset/fraud-detection-2022-23-defencesa
Explore at:
Dataset updated
Jul 1, 2022
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Area covered
South Australia
Description
Fraud detected in Defence SA for 2022-23 Financial Year.
Fraud Statistics
data.wu.ac.at
data.gov.uk
csv
Updated Dec 19, 2016
Cite
Plymouth City Council (2016). Fraud Statistics [Dataset]. https://data.wu.ac.at/schema/data_gov_uk/ZGQwMTRiNDctYWFmNC00Mjk2LThkMWMtZTY4MzBjMDAzZWI0
Explore at:
Available download formats: csv
Dataset updated
Dec 19, 2016
Dataset provided by
Plymouth City Council
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Data showing fraud statistics in Plymouth.
Telecommunication scam criminal data
data.gov.tw
api, csv
Cite
National Police Administration, Telecommunication scam criminal data [Dataset]. https://data.gov.tw/en/datasets/98176
Explore at:
Available download formats: api, csv
Dataset authored and provided by
National Police Administration
License
https://data.gov.tw/license
Description
Provides telecommunications fraud case data. (These are preliminary statistics compiled at the beginning of each quarter and are for reference only; the accurate figures are those in this department's annual crime statistics.)
Annual card fraud - credit cards and debit cards combined - worldwide...
statista.com
Updated Jun 30, 2025
Cite
Statista (2025). Annual card fraud - credit cards and debit cards combined - worldwide 2014-2023 [Dataset]. https://www.statista.com/statistics/1394119/global-card-fraud-losses/
Explore at:
Dataset updated
Jun 30, 2025
Dataset authored and provided by
Statista (http://statista.com/)
Time period covered
Dec 2024
Area covered
Worldwide
Description
Card fraud losses across the world increased by more than ** percent between 2020 and 2021, the largest increase since 2018. It was estimated that merchants and card acquirers lost well over ** billion U.S. dollars, with roughly ** billion U.S. dollars, according to the source, coming from the United States alone. Note that the figures provided here include both credit card fraud and debit card fraud. The source does not separate the two, and also did not provide figures on the United States - a country known for its reliance on credit cards.
Nature of fraud and computer misuse in England and Wales: appendix tables
ons.gov.uk
cy.ons.gov.uk
xlsx
Updated Nov 6, 2024
Cite
Office for National Statistics (2024). Nature of fraud and computer misuse in England and Wales: appendix tables [Dataset]. https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/datasets/natureoffraudandcomputermisuseinenglandandwalesappendixtables
Explore at:
Available download formats: xlsx
Dataset updated
Nov 6, 2024
Dataset provided by
Office for National Statistics (http://www.ons.gov.uk/)
License
Open Government Licence 3.0 (http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/)
License information was derived automatically
Description
Data from the Crime Survey for England and Wales (CSEW) and the National Fraud Intelligence Bureau (NFIB), including numbers of incidents and characteristics of victims.
E-Commerce Fraud Statistics And Facts (2025)
sci-tech-today.com
Updated Aug 19, 2025
Cite
Sci-Tech Today (2025). E-Commerce Fraud Statistics And Facts (2025) [Dataset]. https://www.sci-tech-today.com/stats/e-commerce-fraud-statistics/
Explore at:
Dataset updated
Aug 19, 2025
Dataset authored and provided by
Sci-Tech Today
License
https://www.sci-tech-today.com/privacy-policy
Time period covered
2022 - 2032
Area covered
Global
Description
Introduction

E-Commerce Fraud Statistics: When you shop online, you probably think about getting the best deal, fast delivery, or whether the product will match the description. But there's a whole other side to e-commerce that most shoppers never see: the world of fraud. And trust me, these numbers will shock you, because they shocked me. These e-commerce fraud statistics aren't just random figures in a report; they show the real damage that scammers are causing to businesses and even regular customers like us.

Over the years, fraud in online shopping has gone from stolen credit cards to a multi-billion-dollar global problem. We're talking billions lost every single year, and it's only getting worse. These statistics tell a story about how criminals work, where they strike the most, and which types of fraud cost businesses the most money. If you've ever wondered just how big the problem is, or what kinds of tricks fraudsters are using, let's get started.
Credit Card Fraud Detection
test.researchdata.tuwien.ac.at
zenodo.org
+1more
csv, json, pdf +2
Updated Apr 28, 2025
Cite
Ajdina Grizhja (2025). Credit Card Fraud Detection [Dataset]. http://doi.org/10.82556/yvxj-9t22
Explore at:
Available download formats: text/markdown, csv, pdf, txt, json
Unique identifier
https://doi.org/10.82556/yvxj-9t22
Dataset updated
Apr 28, 2025
Dataset provided by
TU Wien
Authors
Ajdina Grizhja
License
Attribution 4.0 (CC BY 4.0) (https://creativecommons.org/licenses/by/4.0/)
License information was derived automatically
Time period covered
Apr 28, 2025
Description
Below is a draft DMP–style description of your credit‐card fraud detection experiment, modeled on the antiquities example:

1. Dataset Description

Research Domain
This work resides in the domain of financial fraud detection and applied machine learning. We focus on detecting anomalous credit‐card transactions in real time to reduce financial losses and improve trust in digital payment systems.

Purpose
The goal is to train and evaluate a binary classification model that flags potentially fraudulent transactions. By publishing both the code and data splits via FAIR repositories, we enable reproducible benchmarking of fraud‐detection algorithms and support future research on anomaly detection in transaction data.

Data Sources
We used the publicly available credit‐card transaction dataset from Kaggle (original source: https://www.kaggle.com/mlg-ulb/creditcardfraud), which contains anonymized transactions made by European cardholders over two days in September 2013. The dataset includes 284 807 transactions, of which 492 are fraudulent.

Method of Dataset Preparation

Schema validation: Renamed columns to snake_case (e.g. transaction_amount, is_declined) so they conform to DBRepo’s requirements.

Data import: Uploaded the full CSV into DBRepo, assigned persistent identifiers (PIDs).

Splitting: Programmatically derived three subsets—training (70%), validation (15%), test (15%)—using range‐based filters on the primary key actionnr. Each subset was materialized in DBRepo and assigned its own PID for precise citation.

Cleaning: Converted the categorical flags (is_declined, isforeigntransaction, ishighriskcountry, isfradulent) from “Y”/“N” to 1/0 and dropped non‐feature identifiers (actionnr, merchant_id).

Modeling: Trained a RandomForest classifier on the training split, tuned on validation, and evaluated on the held‐out test set.
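The splitting and cleaning steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the author's DBRepo code: the rows are made up, and the split boundaries are computed from the sorted `actionnr` values rather than taken from the source.

```python
FLAG_COLUMNS = ["is_declined", "isforeigntransaction", "ishighriskcountry", "isfradulent"]
ID_COLUMNS = ["actionnr", "merchant_id"]


def range_split(rows, key="actionnr", train_pct=70, val_pct=15):
    """Split rows into train/validation/test by contiguous ranges of the primary key."""
    ordered = sorted(rows, key=lambda r: r[key])
    n = len(ordered)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


def clean(row):
    """Map "Y"/"N" flags to 1/0 and drop non-feature identifiers."""
    out = {k: v for k, v in row.items() if k not in ID_COLUMNS}
    for col in FLAG_COLUMNS:
        out[col] = 1 if row[col] == "Y" else 0
    return out


# Made-up rows for illustration only.
rows = [
    {
        "actionnr": i,
        "merchant_id": f"M{i}",
        "transaction_amount": 10.0 * i,
        "is_declined": "N",
        "isforeigntransaction": "Y" if i % 2 else "N",
        "ishighriskcountry": "N",
        "isfradulent": "Y" if i % 10 == 0 else "N",
    }
    for i in range(1, 21)
]

train, val, test = range_split(rows)
train_clean = [clean(r) for r in train]
```

Splitting before dropping `actionnr` matters here, because the range filter needs the primary key that the cleaning step removes.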

2. Technical Details

Dataset Structure

The raw data is a single CSV with columns:

actionnr (integer transaction ID)

merchant_id (string)

average_amount_transaction_day (float)

transaction_amount (float)

is_declined, isforeigntransaction, ishighriskcountry, isfradulent (binary flags)

total_number_of_declines_day, daily_chargeback_avg_amt, sixmonth_avg_chbk_amt, sixmonth_chbk_freq (numeric features)

Naming Conventions

All columns use lowercase snake_case.

Subsets are named creditcard_training, creditcard_validation, creditcard_test in DBRepo.

Files in the code repo follow a clear structure:

├── data/                      # local copies only; raw data lives in DBRepo
├── notebooks/Task.ipynb
├── models/rf_model_v1.joblib
├── outputs/                   # confusion_matrix.png, roc_curve.png, predictions.csv
├── README.md
├── requirements.txt
└── codemeta.json

Required Software

Python 3.9+

pandas, numpy (data handling)

scikit-learn (modeling, metrics)

matplotlib (visualizations)

dbrepo‐client.py (DBRepo API)

requests (TU WRD API)

Additional Resources

Original dataset: https://www.kaggle.com/mlg-ulb/creditcardfraud

Scikit-learn docs: https://scikit-learn.org/stable

DBRepo API guide: via the starter notebook’s dbrepo_client.py template

TU WRD REST API spec: https://test.researchdata.tuwien.ac.at/api/docs

3. Further Details

Data Limitations

Highly imbalanced: only ~0.17% of transactions are fraudulent.

Anonymized PCA features (V1–V28) hidden; we extended with domain features but cannot reverse engineer raw variables.

Time‐bounded: only covers two days of transactions, may not capture seasonal patterns.
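The imbalance figure follows directly from the counts quoted earlier in this description (492 fraudulent out of 284 807 transactions). The short calculation below also illustrates why plain accuracy is a weak metric on such data: a hypothetical classifier that labels every transaction as legitimate would already score very high.

```python
total = 284_807
fraud = 492

fraud_rate = fraud / total
# Accuracy of a hypothetical baseline that always predicts "legitimate".
baseline_accuracy = (total - fraud) / total

print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-legitimate accuracy: {baseline_accuracy:.4%}")
```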

Licensing and Attribution

Raw data: CC-0 (per Kaggle terms)

Code & notebooks: MIT License

Model artifacts & outputs: CC-BY 4.0

TU WRD records include ORCID identifiers for the author.

Recommended Uses

Benchmarking new fraud‐detection algorithms on a standard imbalanced dataset.

Educational purposes: demonstrating model‐training pipelines, FAIR data practices.

Extension: adding time‐series or deep‐learning models.

Known Issues

Possible temporal leakage if date/time features not handled correctly.

Model performance may degrade on live data due to concept drift.

Binary flags may oversimplify nuanced transaction outcomes.
Medicaid Fraud Control Units (MFCUs)
catalog.data.gov
healthdata.gov
+2more
Updated Aug 11, 2025
Cite
Department of Health & Human Services (2025). Medicaid Fraud Control Units (MFCUs) [Dataset]. https://catalog.data.gov/dataset/medicaid-fraud-control-units-mfcu-annual-spending-and-performance-statistics-ddfe3
Explore at:
Dataset updated
Aug 11, 2025
Dataset provided by
United States Department of Health and Human Services (http://www.hhs.gov/)
Description
Medicaid Fraud Control Units (MFCU or Unit) investigate and prosecute Medicaid fraud as well as patient abuse and neglect in health care facilities. OIG certifies, and annually recertifies, each MFCU. OIG collects information about MFCU operations and assesses whether they comply with statutes, regulations, and OIG policy. OIG also analyzes MFCU performance based on 12 published performance standards and recommends program improvements where appropriate.
Fraud Detection in Financial Transactions
kaggle.com
Updated Jan 17, 2025
Cite
Darshan Dalvi (2025). Fraud Detection in Financial Transactions [Dataset]. https://www.kaggle.com/datasets/darshandalvi12/fraud-detection-in-financial-transactions/discussion?sort=undefined
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jan 17, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Darshan Dalvi
License
MIT License (https://opensource.org/licenses/MIT)
License information was derived automatically
Description
Credit Card Fraud Detection Dataset (Updated)

This dataset contains 284,807 transactions from a credit card company, where 492 transactions are fraudulent. The data is highly imbalanced, with only a small fraction of transactions being fraudulent. The dataset is commonly used to build and evaluate fraud detection models.

Dataset Details:

Number of Transactions: 284,807

Fraudulent Transactions: 492 (Highly Imbalanced)

Features:

28 anonymized features (V1 to V28)

Transaction amount

Timestamp

Label:

0: Legitimate

1: Fraudulent

Data Preprocessing:

SMOTE (Synthetic Minority Oversampling Technique) has been applied to address the class imbalance in the dataset, generating synthetic examples for the minority class (fraudulent transactions).

Additional Operations: Various preprocessing steps were performed, including data cleaning and feature engineering, to ensure the quality of the dataset for model training.
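For readers unfamiliar with SMOTE, the core idea is to create each synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbours. The following is a simplified standard-library sketch of that idea with made-up points. It is not the code used to prepare this dataset, and practical work would normally rely on a maintained implementation such as the one in imbalanced-learn.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Made-up 2-D minority-class points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=6)
```

Oversampling of this kind is usually applied to the training split only, so that synthetic points never appear in the data used for evaluation.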

Processed Files:

The dataset has been split into training and testing sets and saved in the following files:
- X_train.csv: Feature data for the training set
- X_test.csv: Feature data for the testing set
- y_train.csv: Labels for the training set (fraudulent or legitimate)
- y_test.csv: Labels for the testing set

This updated dataset is ready to be used for training and evaluating machine learning models, specifically designed for credit card fraud detection tasks.

This description highlights the key aspects of the dataset, including its preprocessing steps and the availability of the processed files for ease of use.
Credit Card Fraud Detection Platform Report
archivemarketresearch.com
doc, pdf, ppt
Updated Mar 14, 2025
Cite
Archive Market Research (2025). Credit Card Fraud Detection Platform Report [Dataset]. https://www.archivemarketresearch.com/reports/credit-card-fraud-detection-platform-57120
Explore at:
Available download formats: ppt, doc, pdf
Dataset updated
Mar 14, 2025
Dataset authored and provided by
Archive Market Research
License
https://www.archivemarketresearch.com/privacy-policy
Time period covered
2025 - 2033
Area covered
Global
Variables measured
Market Size
Description
The global credit card fraud detection platform market is experiencing robust growth, driven by the escalating volume of digital transactions and the increasing sophistication of fraud techniques. The market, valued at approximately $15 billion in 2025, is projected to exhibit a Compound Annual Growth Rate (CAGR) of 18% from 2025 to 2033. This substantial growth is fueled by several key factors. The rising adoption of e-commerce and mobile payments creates a larger attack surface for fraudsters, necessitating advanced detection solutions. Furthermore, the increasing prevalence of sophisticated fraud schemes, such as synthetic identity theft and account takeover, demands more intelligent and adaptive fraud detection systems. The market is segmented by screening type (manual and automatic) and application (personal and enterprise), with automatic screening and enterprise applications driving the majority of growth due to their scalability and efficiency. The competitive landscape is dynamic, with established players like FICO, Mastercard, and Visa competing alongside innovative startups such as Forter and Feedzai. These companies continuously develop AI-powered solutions leveraging machine learning and big data analytics to identify and prevent fraudulent transactions effectively. Regional growth varies, with North America and Europe currently holding significant market share, but Asia-Pacific is expected to experience rapid expansion in the coming years due to rising digital adoption and economic growth in countries like India and China.

The continued growth of the credit card fraud detection platform market hinges on several factors. The increasing demand for real-time fraud detection capabilities is driving the adoption of cloud-based solutions and the integration of advanced analytics. Regulatory compliance requirements, particularly around data privacy and security, also contribute to market growth. However, challenges remain. The cost of implementing and maintaining these sophisticated systems can be prohibitive for smaller businesses. Moreover, the constant evolution of fraud techniques necessitates ongoing investment in research and development to stay ahead of emerging threats. The market’s future trajectory will depend on the continued innovation in fraud detection technologies, the ability to adapt to evolving fraud tactics, and the successful integration of these solutions across various industries and geographies.
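As a worked illustration of the growth figures quoted in this description, compounding the stated 2025 value at the stated CAGR gives an implied 2033 market size. This is simple arithmetic under the assumption of annual compounding over eight years (2025 to 2033); the report excerpt itself does not state a 2033 figure.

```python
base_value_bn = 15.0   # approximate 2025 market value in billion USD (from the description)
cagr = 0.18            # compound annual growth rate (from the description)
years = 2033 - 2025    # assumption: eight annual compounding periods

implied_2033_bn = base_value_bn * (1 + cagr) ** years
print(f"Implied 2033 market size: ${implied_2033_bn:.1f} billion")
```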
CerditCard fraud dataset
kaggle.com
Updated Aug 2, 2025
Cite
Wasiq Ali (2025). CerditCard fraud dataset [Dataset]. https://www.kaggle.com/datasets/wasiqaliyasir/cerditcard-fraud-dataset
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Aug 2, 2025
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Wasiq Ali
License
Apache License, v2.0 (https://www.apache.org/licenses/LICENSE-2.0)
License information was derived automatically
Description
Credit Card Fraud Detection Dataset

Uncover fraudulent transactions with this anonymized, PCA-transformed dataset. Perfect for building and testing fraud detection algorithms!

Dataset Overview

Objective: Detect fraudulent credit card transactions using anonymized features

Samples: 1,000 transactions

Features: 7 columns (5 PCA components + Transaction Amount + Target)

Class Distribution:

Legit (Class 0): 993 transactions (~99.3%)

Fraud (Class 1): 7 transactions (~0.7%)

Key Challenge: Extreme class imbalance – realistic representation of fraud patterns

Features Description

| Feature | Description | Characteristics |
|---------|-------------|-----------------|
| V1-V5 | Anonymized principal components | PCA-transformed numerical features; preserves transaction patterns while hiding sensitive details |
| Amount | Transaction value | Highly variable (min: $0.20, max: $1,916.06); critical for fraud analysis |
| Class | Target variable | Binary labels: 0 = Legitimate transaction, 1 = Fraudulent transaction |

Key Insights & Patterns

Fraud Indicators:

Fraudulent transactions occur across diverse amounts (low: $1.83 → high: $1,916)

No obvious amount threshold for fraud – requires nuanced modeling

Sample fraud cases:

V1:0.579, V2:-0.384, Amount:1916.06

V1:1.023, V2:-0.638, Amount:1094.42

Data Characteristics:

V1-V5 Distributions:

V1: Concentrated near zero (mean ≈ -0.1)

V2: Wider spread (mean ≈ 0.05)

V3-V5: Asymmetric distributions

Amount Distribution:

Right-skewed – most transactions < $500

Fraud cases span low and high values

Class Imbalance:

- Severe skew: 993:7 legit-to-fraud ratio
- Models must optimize for recall/precision over accuracy

Analysis Challenges

⚠️ Class Imbalance: Standard accuracy metrics misleading

🔍 Feature Interpretation: PCA components lack real-world context

📊 Non-linear Patterns: Complex interactions between V1-V5

⚡ High Stakes: False negatives (missed fraud) costlier than false positives

Recommended Applications

Fraud Detection Models:

Logistic Regression (with class weighting)

Random Forests / XGBoost (handle non-linearities)

Isolation Forests (anomaly detection)

Evaluation Focus:

Precision-Recall Curves > ROC-AUC

F2-Score (prioritize recall)

Confusion matrix analysis

Advanced Techniques:

SMOTE/ADASYN for oversampling

Autoencoders for anomaly detection

Feature engineering: Amount-to-Var ratios
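The F2-score listed under Evaluation Focus is the F-beta score with beta = 2, which weights recall more heavily than precision. A small sketch that computes it from confusion-matrix counts follows; the counts are made up for illustration and do not come from this dataset.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from confusion-matrix counts; beta > 1 favours recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 6 frauds caught, 4 false alarms, 1 fraud missed.
score = f_beta(tp=6, fp=4, fn=1)
```

With these counts precision is 0.6 and recall is about 0.857, and the F2-score lands closer to recall than the balanced F1 would.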

Dataset Source & Ethics

Origin: Synthetic dataset mirroring real-world financial patterns

Anonymization: Original features transformed via PCA for privacy compliance

Bias Consideration: Geographic/cultural biases possible in source data

Potential Use Cases

🏦 Banking: Real-time transaction monitoring systems

📱 FinTech Apps: Fraud detection APIs for payment gateways

🎓 Education: Imbalanced classification tutorials

🏆 Kaggle Competitions: Lightweight fraud detection challenge

Example Project Idea "Minimalist Fraud Detector":

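    # python
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.preprocessing import RobustScaler

    # Scale, oversample the fraud class, then fit a class-weighted forest.
    model = make_pipeline(
        RobustScaler(),
        SMOTE(sampling_strategy=0.3),
        RandomForestClassifier(class_weight={0: 1, 1: 15}),
    )

Optimize for: Recall @ Precision > 0.85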

Dataset Summary

| Feature | Mean | Std | Min | Max |
|----------|----------|----------|-----------|-----------|
| V1 | -0.11 | 1.02 | -3.24 | 3.85 |
| V2 | 0.05 | 1.01 | -2.94 | 2.60 |
| V3 | 0.02 | 0.98 | -3.02 | 2.95 |
| Amount | 250.32 | 190.19 | 0.20 | 1916.06 |
Credit card fraud detection
kaggle.com
Updated Jun 19, 2019
Cite
Dileep (2019). Credit card fraud detection [Dataset]. https://www.kaggle.com/datasets/dileep070/anomaly-detection
Explore at:
Croissant (a format for machine-learning datasets; learn more at mlcommons.org/croissant)
Dataset updated
Jun 19, 2019
Dataset provided by
Kaggle (http://kaggle.com/)
Authors
Dileep
Description
Dataset

This dataset was created by Dileep

Contents
Financial Payment Services Fraud Dataset
cubig.ai
Updated Jun 30, 2025
Cite
CUBIG (2025). Financial Payment Services Fraud Dataset [Dataset]. https://cubig.ai/store/products/547/financial-payment-services-fraud-dataset
Explore at:
Dataset updated
Jun 30, 2025
Dataset authored and provided by
CUBIG
License
https://cubig.ai/store/terms-of-service
Measurement technique
Synthetic data generation using AI techniques for model training, Privacy-preserving data transformation via differential privacy
Description
1) Data Introduction
• The Financial Payment Services Fraud Dataset is based on a real-world financial transaction simulation and was collected to detect fraudulent activities across various types of payments and transfers. It includes key financial data such as transaction time, type, amount, sender and recipient information, and account balances before and after each transaction. Each transaction is labeled as either fraudulent or legitimate.

2) Data Utilization
(1) Characteristics of the Financial Payment Services Fraud Dataset:
• With its large-scale transaction records, detailed account information, and diverse transaction types, this dataset is well-suited for developing and testing financial fraud detection models.

(2) Applications of the Financial Payment Services Fraud Dataset:
• Real-time Fraud Detection: The dataset can be used to train machine learning classification models that quickly detect and prevent fraudulent transactions in real-world financial service environments.
• Risky Transaction Pattern Analysis: By analyzing patterns according to transaction type, amount, and account, the dataset can support the advancement of fraud prevention policies and anomaly monitoring systems.

